<a href="https://www.kaggle.com/code/yunasheng/automate-your-eda-and-understand-the-data-quickly?scriptVersionId=164343541" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<div style="text-align: center"><img src="https://miro.medium.com/v2/resize:fit:1400/format:webp/0*VT2hHvhHJpuJxSrD.jpeg" width="100%" height="100%" alt="Retrieve&Re-Rank pipeline"></div>

Whether you are a seasoned pro or just dipping your toes into the world of data with Python, as a data analyst, data scientist, or under any other title, these libraries are must-knows. Manual exploratory data analysis (EDA) can be time-consuming, especially when we are diving into new data. It can take hours of our day to analyze and understand a dataset.

In this article, I will share `five` different `Python-supported packages` that will transform the way you analyze data. These packages automate data-analysis tasks and perform EDA quickly and easily. They also give an overview of the data and deliver a thorough analysis that lets you visualize the data and gain valuable insights.

In [1]:
# import the necessary libraries
import pandas as pd
import numpy as np
from IPython.display import display, HTML  # IPython.core.display is deprecated for these imports

# widen the notebook container
display(HTML("<style>.container { width:90% !important; }</style>"))

# read the data
data = pd.read_csv('/kaggle/input/housing-prices-dataset/Housing.csv')
data.columns = data.columns.str.upper()  # convert the column names to uppercase
data


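| | PRICE | AREA | BEDROOMS | BATHROOMS | STORIES | MAINROAD | GUESTROOM | BASEMENT | HOTWATERHEATING | AIRCONDITIONING | PARKING | PREFAREA | FURNISHINGSTATUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13300000 | 7420 | 4 | 2 | 3 | yes | no | no | no | yes | 2 | yes | furnished |
| 1 | 12250000 | 8960 | 4 | 4 | 4 | yes | no | no | no | yes | 3 | no | furnished |
| 2 | 12250000 | 9960 | 3 | 2 | 2 | yes | no | yes | no | no | 2 | yes | semi-furnished |
| 3 | 12215000 | 7500 | 4 | 2 | 2 | yes | no | yes | no | yes | 3 | yes | furnished |
| 4 | 11410000 | 7420 | 4 | 1 | 2 | yes | yes | yes | no | yes | 2 | no | furnished |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 540 | 1820000 | 3000 | 2 | 1 | 1 | yes | no | yes | no | no | 2 | no | unfurnished |
| 541 | 1767150 | 2400 | 3 | 1 | 1 | no | no | no | no | no | 0 | no | semi-furnished |
| 542 | 1750000 | 3620 | 2 | 1 | 1 | yes | no | no | no | no | 0 | no | unfurnished |
| 543 | 1750000 | 2910 | 3 | 1 | 1 | no | no | no | no | no | 0 | no | furnished |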


The dataset consists of 545 rows. There are 12 independent variables (bedrooms, bathrooms, parking, etc.) that describe attributes related to the house price, and one dependent target variable (PRICE).
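
The column counts above are easy to confirm in pandas. The sketch below is only an illustration: it rebuilds the first five preview rows as a stand-in frame, and the names `sample`, `target`, and `features` are assumptions introduced here. On Kaggle you would run the same checks on `data` itself.

```python
import io
import pandas as pd

# Stand-in for `data`: the first five rows from the preview above.
csv_text = """PRICE,AREA,BEDROOMS,BATHROOMS,STORIES,MAINROAD,GUESTROOM,BASEMENT,HOTWATERHEATING,AIRCONDITIONING,PARKING,PREFAREA,FURNISHINGSTATUS
13300000,7420,4,2,3,yes,no,no,no,yes,2,yes,furnished
12250000,8960,4,4,4,yes,no,no,no,yes,3,no,furnished
12250000,9960,3,2,2,yes,no,yes,no,no,2,yes,semi-furnished
12215000,7500,4,2,2,yes,no,yes,no,yes,3,yes,furnished
11410000,7420,4,1,2,yes,yes,yes,no,yes,2,no,furnished
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Separate the target from the independent variables.
target = "PRICE"
features = [col for col in sample.columns if col != target]

print(sample.shape)    # (rows, columns)
print(len(features))   # number of independent variables
print(sample.dtypes)   # numeric columns vs. text columns (yes/no, furnishing status)
```

Running `sample.dtypes` also makes it clear which columns are categorical text and would need encoding before modelling.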

# 1. Dtale
Dtale is an open-source library that offers comprehensive insights into your data through a user-friendly interface for exploration and visualization. It covers almost every aspect of EDA that we require, and the great thing about this package is that we can also export the code for each visualization we create. To get started, install the package with `pip install` and run the code below.

In [2]:
%%capture
# install the dtale package
!pip install dtale  # from inside a Jupyter notebook
# pip install dtale  # from a terminal (no leading "!")
!pip install ipython

In [3]:
!pip install pyngrok

Collecting pyngrok
  Downloading pyngrok-7.1.2-py3-none-any.whl.metadata (7.6 kB)
Downloading pyngrok-7.1.2-py3-none-any.whl (22 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.1.2


In [4]:
# import the library
import dtale
import dtale.app as dtale_app

# launch the Dtale viewer for the DataFrame
d = dtale.show(data)
d



After executing the code above, the results are displayed in your editor or browser. Dtale offers a comprehensive analysis: we can check for missing values and duplicates, inspect min/max/mean values, run correlation analysis, plot various graphs, and much more.
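
For comparison, here is roughly what a few of those checks look like when written by hand in pandas. This is a minimal sketch on a small stand-in frame (three numeric columns taken from the preview rows); the variable names are assumptions introduced for illustration, and it does not reproduce Dtale's own implementation.

```python
import pandas as pd

# Small stand-in frame (values taken from the first preview rows).
sample = pd.DataFrame({
    "PRICE": [13300000, 12250000, 12250000, 12215000, 11410000],
    "AREA": [7420, 8960, 9960, 7500, 7420],
    "BEDROOMS": [4, 4, 3, 4, 4],
})

missing_per_column = sample.isna().sum()          # missing values per column
duplicate_rows = int(sample.duplicated().sum())   # number of fully duplicated rows
summary = sample.agg(["min", "max", "mean"])      # min/max/mean per column
correlations = sample.corr()                      # pairwise Pearson correlations

print(missing_per_column)
print(duplicate_rows)
print(summary)
print(correlations)
```

Each of these is a single line, but repeating them for every new dataset (and plotting the results) is exactly the work the automated tools take over.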

# 2. Sweetviz
If you’re looking to save and document your data analysis findings in HTML format, Sweetviz is an ideal choice. It produces an interactive report that lets you dive deeper into your data and discover valuable insights. Simply install the package and run the code below.

In [5]:
%%capture
# install the package
!pip install sweetviz  # from inside a Jupyter notebook

In [6]:
# analysis report with sweetviz
import sweetviz as sv

# run the analysis and assign the target feature
sweet_report = sv.analyze(data, target_feat='PRICE')

# view the analysis result in the browser (saved as SWEETVIZ_REPORT.html by default)
sweet_report.show_html()

# save the analysis result to a named HTML file
sweet_report.show_html('analysis_with_sweetviz.html', scale=0.92)

Report SWEETVIZ_REPORT.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
Report analysis_with_sweetviz.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.


In [7]:
from IPython.display import IFrame

# Assuming the Sweetviz report file is named 'analysis_with_sweetviz.html'
report_file = 'analysis_with_sweetviz.html'

# Display the Sweetviz report using IFrame
IFrame(src=report_file, width=1000, height=500)

# 3. Pandas-profiling (ydata-profiling)

Pandas-profiling, now published as `ydata-profiling`, provides extensive insight into your data with very little effort. It covers all the fundamental analysis you need with minimal code. Like Sweetviz, it can also save your EDA as an HTML report, which makes the analysis easy to document and share.

In [8]:
%%capture
# install the packages
!pip install ydata-profiling
# !pip install pydantic
# !pip install pydantic-settings

In [9]:
# import the library
from ydata_profiling import ProfileReport

# view the analysis result inside jupyter
prof = ProfileReport(data, title="DATA ANALYSIS REPORT")
prof

# save the analysis to html
# prof.to_file(output_file='PANDAS_PROFILING_ANALYSIS.html')



`ProfileReport` also accepts an optional `minimal` argument (not set in the code above), which can be `True` or `False`. When set to `True`, the report shows only the "Overview" and "Variables" sections (listed in the top-right corner of the report). When set to `False`, it provides a more detailed analysis, including `Correlations, Missing Values, and Sample` data. In addition, clicking 'Toggle details' at the bottom right of each variable shows detailed statistics for that feature.
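
The `minimal` switch can be wrapped in a small helper, sketched below. The function name `build_profile` and its `quick` parameter are hypothetical names introduced here for illustration; only `ProfileReport` and its `title` and `minimal` arguments come from ydata-profiling.

```python
import pandas as pd


def build_profile(df: pd.DataFrame, quick: bool = True):
    """Return a ydata-profiling report; `quick` is passed through as `minimal`."""
    # Imported lazily so the helper can be defined even before
    # ydata-profiling has been installed in the environment.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, title="DATA ANALYSIS REPORT", minimal=quick)


# Usage (assumes `data` is the DataFrame loaded earlier):
# build_profile(data, quick=True)    # shorter report
# build_profile(data, quick=False)   # full report
```

Starting with the shorter report and switching to the full one only when needed can save time on larger datasets, since the extra sections are the more expensive ones to compute.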