# Author - Mayank Singh
Data Analyst|MySQL|Python|Web Crawling and Scraping
|EDA|Visualization|CBAP|Agile Cert

Creation Date - 09-10-2022


This file contains an exploratory data analysis (EDA), built with pandas-profiling, of donations given to the Aam Aadmi Party (AAP) during FY 2020-21. The source of the data is MyNeta.info (https://myneta.info/).

# Importing Libraries

## Basic Info

We installed pandas-profiling with "pip install pandas-profiling[notebook]" rather than "pip install -U pandas-profiling". The Jupyter widgets extension (used for progress bars and the interactive widget-based report) only works once the corresponding extensions are installed and activated, and the [notebook] variant of the command takes care of that.
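
Before running the cells below, it can help to confirm that the required packages are importable. The following is a minimal sketch using only the standard library; the assumption that widget support comes from a module named ipywidgets is not stated in this notebook, so adjust the list to your environment.

```python
from importlib.util import find_spec

# Module names to look for. "ipywidgets" is an assumption about what the
# [notebook] install provides; it is not taken from this notebook.
MODULES = ["pandas", "pandas_profiling", "ipywidgets"]


def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: find_spec(name) is not None for name in names}


status = check_modules(MODULES)
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Any module reported as missing would need to be installed before the import cell can run.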

In [1]:
import pandas as pd
import numpy as np
from pandas_profiling import ProfileReport

# Reading File

In [2]:
df = pd.read_csv("C:\\Users\\mayank.singh27\\PycharmProjects\\pythonProject\\TestCode\\AAP_Donation_Data_fy20-21_Cleaned_New1.csv")

In [3]:
df.drop(columns='Unnamed: 0', inplace=True)  # drop the unwanted column from the DataFrame
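
A column named 'Unnamed: 0' usually appears when a CSV file was saved together with its index. As a possible alternative to dropping it afterwards, the sketch below reads that column back as the index. The two-row CSV text is made up for illustration and is not the donation data.

```python
import io

import pandas as pd

# Hypothetical CSV text that was written with its index (note the empty
# first header field); it stands in for the real file.
csv_text = ",Name,Amount\n0,Donor A,100.0\n1,Donor B,250.0\n"

# Default read: the saved index comes back as a column called "Unnamed: 0".
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the index, so nothing needs dropping.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
```

Either approach leaves the same data columns; reading with index_col=0 simply avoids the extra drop step.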

In [4]:
df.head()  # overview of the DataFrame

| | Name | Address | Pan | Amount | Contribution Mode | Financial |
|---|---|---|---|---|---|---|
| 0 | Bharatha Swamukti Samsthe | 108 Sarang of Unitech Parkway Apts 1st Main Ro... | Y | 12000000.0 | Cheque State Bank of India Mahalakshmi Layout ... | 2020-21 |
| 1 | Prudent Electoral Trust | G-15 Hans Bhawan 1 Bahadur Shah Zafar Marg New... | Y | 10000000.0 | Cheque HDFC Bank Kailash Building K.G. Marg Ne... | 2020-21 |
| 2 | Bharatha Swamukti Samsthe | 108 Sarang of Unitech Parkway Apts 1st Main Ro... | Y | 9900000.0 | Cheque State Bank of India Mahalakshmi Layout ... | 2020-21 |
| 3 | Bharatha Swamukti Samsthe | 108 Sarang of Unitech Parkway Apts 1st Main Ro... | Y | 9500000.0 | Cheque State Bank of India Mahalakshmi Layout ... | 2020-21 |
| 4 | Bharatha Swamukti Samsthe | 108 Sarang of Unitech Parkway Apts 1st Main Ro... | Y | 9100000.0 | Cheque State Bank of India Mahalakshmi Layout ... | 2020-21 |


## Starting with Pandas-Profiling

pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding.

For each column, the following information (whenever relevant for the column type) is presented in an interactive HTML report:

Type inference: detect the types of columns in a DataFrame

Essentials: type, unique values, indication of missing values

Quantile statistics: minimum value, Q1, median, Q3, maximum, range, interquartile range

Descriptive statistics: mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness

Most frequent and extreme values

Histograms: categorical and numerical

Correlations: high correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik, Auto)

Missing values: through counts, matrix, heatmap and dendrograms

Duplicate rows: list of the most common duplicated rows

Text analysis: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic)

File and Image analysis: file sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata
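
Several of the statistics listed above can be reproduced by hand, which helps when reading the report. The sketch below computes a few of them with plain pandas on a small made-up series. It illustrates the definitions only and is not necessarily how pandas-profiling computes them internally.

```python
import pandas as pd

# Made-up amounts for illustration only (not the donation data).
amounts = pd.Series([10.0, 20.0, 20.0, 40.0, 110.0], name="Amount")

q1, median, q3 = amounts.quantile([0.25, 0.5, 0.75])
stats = {
    "minimum": amounts.min(),
    "Q1": q1,
    "median": median,
    "Q3": q3,
    "maximum": amounts.max(),
    "range": amounts.max() - amounts.min(),
    "interquartile range": q3 - q1,
    "mean": amounts.mean(),
    "standard deviation": amounts.std(),  # sample standard deviation (ddof=1)
    "coefficient of variation": amounts.std() / amounts.mean(),
    "median absolute deviation": (amounts - amounts.median()).abs().median(),
    "skewness": amounts.skew(),
    "kurtosis": amounts.kurt(),
}
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```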

The report contains three additional sections:

Overview: mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint)

Alerts: a comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others)

Reproduction: technical details about the analysis (time, version and configuration)
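
The figures in the Overview section can be approximated directly with pandas. The sketch below is a rough equivalent on a made-up frame; the exact definitions used by the report (for example, how memory is measured) may differ.

```python
import numpy as np
import pandas as pd

# Made-up frame for illustration: one missing value and one duplicated row.
toy = pd.DataFrame(
    {
        "Name": ["Donor A", "Donor B", "Donor B", "Donor C"],
        "Amount": [100.0, 250.0, 250.0, np.nan],
    }
)

n_rows, n_cols = toy.shape
missing_cells = int(toy.isna().sum().sum())
overview = {
    "number of records": n_rows,
    "number of variables": n_cols,
    "missing cells": missing_cells,
    "missing cells (%)": 100 * missing_cells / (n_rows * n_cols),
    "duplicate rows": int(toy.duplicated().sum()),
    "memory footprint (bytes)": int(toy.memory_usage(deep=True).sum()),
}
for key, value in overview.items():
    print(f"{key}: {value}")
```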

## Way 1 of generating Pandas-Profiling Report in an embedded HTML format

In [5]:
profile = ProfileReport(df, title="Pandas Profiling Report")

In [6]:
profile




## Way 2 of generating Pandas-Profiling Report in an embedded widget format

In [7]:
profile.to_widgets()


## Way 3 - The HTML report can be directly embedded in a cell in a similar fashion:

In [8]:
profile.to_notebook_iframe()
profile



# Exporting the report to a file
To generate an HTML report file, assign the ProfileReport to a variable and call its to_file() function:

profile.to_file("your_report.html")
Alternatively, the report’s data can be obtained as a JSON file:

#### As a JSON string
json_data = profile.to_json()

#### As a file
profile.to_file("your_report.json")
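
The two to_file() calls above can be wrapped in a small helper. This is only a sketch: export_report, its stem argument and its out_dir argument are hypothetical names, and the helper relies solely on the to_file() method shown above, which is called once with an .html path and once with a .json path.

```python
from pathlib import Path


def export_report(profile, stem="your_report", out_dir="."):
    """Write the report as HTML and JSON and return the two output paths.

    `profile` is expected to be a ProfileReport; anything with a
    to_file(path) method works. `export_report` is a hypothetical helper,
    not part of pandas-profiling.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"{stem}.html", out_dir / f"{stem}.json"]
    for path in paths:
        # Same call as in the text above, once per file extension.
        profile.to_file(str(path))
    return paths
```

With the profile object created earlier, export_report(profile) would write your_report.html and your_report.json to the current directory.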

# Deeper profiling
The contents, behaviour and appearance of the report are easily customizable. The example code below loads the explorative configuration file, which includes many features for text analysis (length distribution, word distribution and character/unicode information), files (file size, creation time) and images (dimensions, EXIF information).

In [9]:
profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)

In [10]:
profile




# Sample and duplicates
Explicitly showing a dataset’s sample and duplicate rows can be disabled, to guarantee the report does not directly leak any data:

report = df.profile_report(duplicates=None, samples=None)

In [11]:
report = df.profile_report(duplicates=None, samples=None)
report
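
To see what those two sections would otherwise expose, the rows can be inspected with plain pandas before deciding to disable them. The sketch below uses a made-up frame; how many sample rows the report shows depends on its configuration, so head(10) here is an assumption.

```python
import pandas as pd

# Made-up frame for illustration (not the donation data).
toy = pd.DataFrame(
    {
        "Name": ["Donor A", "Donor B", "Donor B", "Donor C"],
        "Amount": [100.0, 250.0, 250.0, 75.0],
    }
)

# Rows that a sample section could show verbatim (first rows of the data).
sample_rows = toy.head(10)

# Rows that occur more than once, with how often each one occurs.
duplicate_counts = (
    toy[toy.duplicated(keep=False)]
    .groupby(list(toy.columns))
    .size()
    .reset_index(name="count")
)

print(sample_rows)
print(duplicate_counts)
```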




### Customizing the visualizations
Plot rendering options
Arguments can be passed to the underlying matplotlib visualization engine through the plot argument when computing the profile. The key-value pair image_format: "png" changes the image format from the default (SVG) to PNG, and dpi sets the resolution of the images, for example dpi: 200. An example would be:

In [12]:
profile = ProfileReport(
    df,
    title="Pandas Profiling Report",
    explorative=True,
    plot={"dpi": 200, "image_format": "png"},
)
profile
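
The effect of dpi on a PNG image is simple arithmetic: pixels = inches × dpi. The sketch below works that out under the assumption of matplotlib's default figure size of 6.4 × 4.8 inches; the figure sizes pandas-profiling actually uses may differ, and pixel_size is a hypothetical helper.

```python
def pixel_size(width_in, height_in, dpi):
    """Return the (width, height) in pixels of a figure saved at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed figure size: matplotlib's default of 6.4 x 4.8 inches.
for dpi in (100, 200, 800):
    width_px, height_px = pixel_size(6.4, 4.8, dpi)
    print(f"dpi={dpi}: {width_px} x {height_px} px")
```

A higher dpi gives sharper raster images at the cost of a larger report file.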


