# Exploratory Data Analysis Using Sweetviz
Sweetviz is an open-source Python library for quick exploratory data analysis (EDA). It produces dense, attractive visualizations with only a few lines of code. The output is a fully self-contained, interactive HTML application, laid out by default for a 1080p widescreen display, that opens in your default web browser. At the time of writing, the latest version of Sweetviz is 2.1.0, released on April 1, 2021.

Sweetviz makes it quick to visualize a target variable and to compare train and test data across several aspects. Its compare_intra() function also compares two subsets of the same data set.

Sweetviz has three main functions for creating reports:
    • analyze(...)
    • compare(...)
    • compare_intra(...)


In [1]:
# Import the libraries
import pandas as pd
import numpy as np
import sweetviz as st

In [2]:
# Load the data
df = pd.read_csv("data_sample.csv")
df.head()

Unnamed: 0,Year,Period,Crop,Area,Production,Yield,Area Under Irrigation (%),EYear
0,1950,1950-51,Wheat,9.75,6.46,663,33.99,51
1,1951,1951-52,Wheat,9.47,6.18,653,35.76,52
2,1952,1952-53,Wheat,9.83,7.5,763,37.15,53
3,1953,1953-54,Wheat,10.68,8.02,750,36.16,54
4,1954,1954-55,Wheat,11.26,9.04,803,35.0,55


In [25]:
# Generate a report with Sweetviz.
# analyze() takes the DataFrame as input and returns a report object.
report = st.analyze(df)
# With default arguments, show_html() writes the report to "SWEETVIZ_REPORT.html"
# and opens it in the default web browser.
report.show_html()

Report SWEETVIZ_REPORT.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.


In [4]:
# Next, compare the train and test data.
# First check the total size of the DataFrame.
df.shape

(340, 8)

In [5]:
# Load train_test_split from scikit-learn
from sklearn.model_selection import train_test_split
# Split the data into train and test sets, holding out 20% for testing
train, test = train_test_split(df, test_size=0.2)

In [6]:
#training data Shape
train.shape

(272, 8)

In [7]:
#test data Shape
test.shape

(68, 8)
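
The 272/68 split above follows from simple arithmetic. The sketch below reproduces it with the standard library only. It assumes, as I understand scikit-learn's behavior for a fractional test_size with no train_size given, that the test count is rounded up and the remainder goes to training. The helper name split_sizes is hypothetical and is not part of scikit-learn.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives whatever rows are left.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 340 rows with a 20% hold-out, as in the notebook
n_train, n_test = split_sizes(340, 0.2)
print(n_train, n_test)
```

For the 340-row DataFrame this gives 272 training rows and 68 test rows, matching the shapes reported by train.shape and test.shape.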

In [26]:
# Compare the two data sets with Sweetviz's compare() function.
# Each argument pairs a DataFrame with the label shown in the report.
compare = st.compare([train, "Training Data"], [test, "Test Data"])
# Write the report to a named file and open it in the browser
compare.show_html("Train_Test_Report.html",
                  open_browser=True,
                  layout='widescreen',
                  scale=None)

Report Train_Test_Report.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.


Now let's compare two subsets of the same data frame, split on a categorical feature, using Sweetviz's compare_intra() function.

In [11]:
df.Crop.unique()

array(['Wheat', 'Rice', 'Jowar', 'Bajra', 'Maize'], dtype=object)

In [16]:
others=['Rice', 'Jowar', 'Bajra', 'Maize']
others

['Rice', 'Jowar', 'Bajra', 'Maize']

In [27]:
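# Split df on the condition Crop == "Wheat" and compare the two subsets.
# The first name labels rows where the condition is True, the second labels the rest.
comp_report = st.compare_intra(df, df["Crop"] == "Wheat", ["Wheat", "Others"])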
comp_report.show_html('SWEETVIZ_COMPARE_TWO.html', 
                      open_browser=True, 
                      layout='widescreen', 
                      scale=None)

Report SWEETVIZ_COMPARE_TWO.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.
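
The condition passed to compare_intra() is an ordinary boolean pandas Series with one entry per row. The sketch below uses pandas alone to show the two subsets that such a condition defines. The toy DataFrame and its values are made up for illustration and are not taken from data_sample.csv.

```python
import pandas as pd

# Hypothetical miniature of the crop data (illustrative values only)
toy = pd.DataFrame({
    "Crop": ["Wheat", "Rice", "Wheat", "Maize", "Jowar"],
    "Yield": [663, 668, 653, 547, 353],
})

# Boolean Series: True where the row is Wheat, False otherwise
is_wheat = toy["Crop"] == "Wheat"

# Rows where the condition holds form one subset; the remaining rows form the other
wheat_rows = toy[is_wheat]
other_rows = toy[~is_wheat]

print(len(wheat_rows), len(other_rows))
```

Inspecting the two subsets this way before building the report is a cheap check that the condition selects the rows you expect.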



In [5]:
from autoviz.AutoViz_Class import AutoViz_Class
AV = AutoViz_Class()

Imported AutoViz_Class version: 0.0.81. Call using:
    from autoviz.AutoViz_Class import AutoViz_Class
    AV = AutoViz_Class()
    AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=0,
                            lowess=False,chart_format='svg',max_rows_analyzed=150000,max_cols_analyzed=30)
Note: verbose=0 or 1 generates charts and displays them in your local Jupyter notebook.
      verbose=2 saves plots in your local machine under AutoViz_Plots directory and does not display charts.


In [53]:
import dtale

In [54]:
dff=pd.read_csv("data_sample.csv")
dff.head()

Unnamed: 0,Year,Period,Crop,Area,Production,Yield,Area Under Irrigation (%),EYear
0,1950,1950-51,Wheat,9.75,6.46,663,33.99,51
1,1951,1951-52,Wheat,9.47,6.18,653,35.76,52
2,1952,1952-53,Wheat,9.83,7.5,763,37.15,53
3,1953,1953-54,Wheat,10.68,8.02,750,36.16,54
4,1954,1954-55,Wheat,11.26,9.04,803,35.0,55


In [55]:
# Start a D-Tale instance for the DataFrame loaded in the previous cell
output = dtale.show(dff)

In [59]:
# DtaleData has no to_html() method; open the running D-Tale instance in a browser tab instead
output.open_browser()

Executing shutdown due to inactivity...



In [57]:
output

