# Install

To use sweetviz, install the package from PyPI with pip. Inside a notebook, prefix the command with `!` so that it runs as a shell command.



In [1]:
!pip install sweetviz==2.1.4

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sweetviz==2.1.4
  Downloading sweetviz-2.1.4-py3-none-any.whl (15.1 MB)
Installing collected packages: sweetviz
Successfully installed sweetviz-2.1.4


# Getting Started
Once installed, you just need to `import` the module. Then, using sweetviz is a simple two-step process:
1. Create a `DataframeReport` object using one of: `analyze()`, `compare()` or `compare_intra()`
2. Use a `show_xxx()` function (`show_html()` or `show_notebook()`) to render the report. You can now use either **html** or **notebook** report options, as well as apply **scaling**.

Let's get started by importing sweetviz and pandas. The HCC dataset, which we will use throughout this notebook, is loaded in the next section:

In [2]:
import pandas as pd
import sweetviz as sv

# Read the data
Next, load the HCC dataset. Here we read the file directly from our GitHub repository. Alternatively, you can download the file, upload it to your working directory, and read it with `pd.read_csv('hcc.csv')`. For other ways to load data into Google Colab, see this post: https://towardsdatascience.com/7-ways-to-load-external-data-into-google-colab-7ba73e7d5fc7.

In [3]:
df = pd.read_csv('https://raw.githubusercontent.com/Data-Centric-AI-Community/awesome-data-centric-ai/master/medium/data-profiling-tools/data/hcc.csv')
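
If you switch between the two loading options described above, a small helper can pick the local copy when it exists and fall back to the repository URL otherwise. This is only a sketch: the function name `load_hcc` and its parameters are our own illustration, not part of pandas or sweetviz.

```python
from pathlib import Path

import pandas as pd

# Same URL as in the cell above
HCC_URL = (
    "https://raw.githubusercontent.com/Data-Centric-AI-Community/"
    "awesome-data-centric-ai/master/medium/data-profiling-tools/data/hcc.csv"
)


def load_hcc(local_path="hcc.csv", url=HCC_URL):
    """Hypothetical helper: read the local file if present, else read from the URL."""
    path = Path(local_path)
    source = path if path.exists() else url
    return pd.read_csv(source)


# Usage (reads from GitHub unless hcc.csv is in the working directory):
# df = load_hcc()
```
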

# Generate and show the Report

In [4]:
report = sv.analyze(df)
report.show_html('report.html')

Report report.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.


In [5]:
report.show_notebook() 
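
The layout and scaling options mentioned in Getting Started can be passed to either display function. Below is a minimal sketch, assuming the `filepath`, `open_browser`, `layout` and `scale` parameters of `show_html()` as described in the sweetviz README; the helper name `render_compact_report` and the output file name are hypothetical.

```python
import pandas as pd


def render_compact_report(df: pd.DataFrame, path: str = "report_vertical.html"):
    """Hypothetical helper: save a sweetviz report with a vertical layout at 80% scale."""
    # Imported here so the helper can be defined before sweetviz is installed
    import sweetviz as sv

    report = sv.analyze(df)
    # open_browser=False only writes the file; layout accepts "widescreen" or "vertical"
    report.show_html(filepath=path, open_browser=False, layout="vertical", scale=0.8)
    return report


# Usage (assumes `df` is the HCC DataFrame loaded earlier):
# report = render_compact_report(df)
# report.show_notebook(layout="vertical", scale=0.8)
```

A smaller scale is mostly useful when the default widescreen report does not fit in the notebook output area.
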

# Additional Features
Let's encode the "Outcome" feature as a numeric `Survival` column so that it can be passed as the target feature, and then compare the insights across the "Gender" categories ("Male" and "Female").

In [6]:
df.Outcome = pd.Categorical(df.Outcome)
df['Survival'] = df.Outcome.cat.codes
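
To check which integer each outcome receives, here is a small stand-alone sketch on toy data (the values below are made up for illustration and are not read from the HCC file). pandas sorts string categories alphabetically, so "Alive" becomes 0 and "Dead" becomes 1, and a missing value gets the code -1.

```python
import pandas as pd

# Toy stand-in for the "Outcome" column (hypothetical values)
toy = pd.DataFrame({"Outcome": ["Alive", "Dead", "Alive", None, "Dead"]})
toy["Outcome"] = pd.Categorical(toy["Outcome"])
toy["Survival"] = toy["Outcome"].cat.codes

# Map each integer code back to its category label
mapping = dict(enumerate(toy["Outcome"].cat.categories))
print(mapping)                    # {0: 'Alive', 1: 'Dead'}
print(toy["Survival"].tolist())   # [0, 1, 0, -1, 1]
```

Note that with this encoding a `Survival` value of 1 corresponds to "Dead", which matters when reading the target statistics in the report.
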

In [7]:
df.head()

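| Unnamed: 0 | Gender | Age | Alcohol | Hallmark | PS | Encephalopathy | Hemoglobin | HBeAg | MCV | Total_Bil | O2 | Dir_Bil | Ferritin | Outcome | Survival |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | 67 | Yes | AYes | Active | | 13.7 | No | 106.6 | 2.1 | 999 | 0.5 | | Alive | 0 |
| 1 | Female | 62 | No | BYes | Active | | | No | 103.4 | | 999 | | | Alive | 0 |
| 2 | Male | 78 | Yes | CYes | Ambulatory | | 8.9 | No | 79.8 | 0.4 | 999 | 0.1 | 16.0 | Alive | 0 |
| 3 | Male | 77 | Yes | DYes | Active | | 13.4 | No | 97.1 | 0.4 | 999 | 0.2 | | Dead | 1 |
| 4 | Male | 76 | Yes | EYes | Active | | 14.3 | No | 95.1 | 0.7 | 999 | | 22.0 | Alive | 0 |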


## Create a Comparison Report

In [8]:
comparison_report = sv.compare_intra(df, df["Gender"] == 'Male', ["Male", "Female"], 'Survival')
comparison_report.show_notebook() 
