## 1. Load a dataset

`eda_report` expects its input data as a pandas `DataFrame`.
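
If your data is not already a `DataFrame`, it can be wrapped in one first. The following is a minimal sketch; the records and the name `own_data` are hypothetical and only illustrate the expected input type.

```python
import pandas as pd

# Hypothetical records; replace these with your own data.
records = {
    "mpg": [18.0, 15.0, 18.0],
    "cylinders": [8, 8, 8],
    "origin": ["usa", "usa", "usa"],
}

# Each dictionary key becomes a column, each list holds that column's values.
own_data = pd.DataFrame(records)

print(type(own_data).__name__)
print(own_data.dtypes)
```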

In [1]:
# Using the auto-mpg sample dataset
import seaborn as sns
data = sns.load_dataset('mpg')
data.head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |


## 2. Get statistics and graphs

In [2]:
from eda_report.multivariate import MultiVariable

x = MultiVariable(data)
x.corr_type

100%|██████████████████████████████████████████| 21/21 [00:07<00:00,  2.85it/s]


{('displacement', 'model_year'): 'weak negative correlation (-0.37)',
 ('mpg', 'displacement'): 'strong negative correlation (-0.80)',
 ('cylinders', 'model_year'): 'weak negative correlation (-0.35)',
 ('displacement', 'acceleration'): 'moderate negative correlation (-0.54)',
 ('mpg', 'horsepower'): 'strong negative correlation (-0.78)',
 ('cylinders', 'acceleration'): 'moderate negative correlation (-0.51)',
 ('mpg', 'cylinders'): 'strong negative correlation (-0.78)',
 ('displacement', 'horsepower'): 'strong positive correlation (0.90)',
 ('cylinders', 'displacement'): 'very strong positive correlation (0.95)',
 ('cylinders', 'horsepower'): 'strong positive correlation (0.84)',
 ('horsepower', 'weight'): 'strong positive correlation (0.86)',
 ('mpg', 'weight'): 'strong negative correlation (-0.83)',
 ('horsepower', 'model_year'): 'weak negative correlation (-0.42)',
 ('horsepower', 'acceleration'): 'moderate negative correlation (-0.69)',
 ('mpg', 'model_year'): 'moderate positive c
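
The output above maps each pair of numeric columns to a short description of their correlation (7 numeric columns give the 21 pairs shown in the progress bar). A rough sketch of how such a mapping could be produced with plain pandas follows. It is not `eda_report`'s implementation: the use of Pearson correlation, the helper names `describe_correlation` and `pairwise_correlations`, and the cut-offs (0.5, 0.7, 0.9) are all assumptions, chosen only to be consistent with the labels visible above.

```python
import itertools

import pandas as pd


def describe_correlation(r: float) -> str:
    """Label a correlation coefficient. The cut-offs are assumptions."""
    strength = abs(r)
    if strength >= 0.9:
        label = "very strong"
    elif strength >= 0.7:
        label = "strong"
    elif strength >= 0.5:
        label = "moderate"
    else:
        label = "weak"
    direction = "positive" if r >= 0 else "negative"
    return f"{label} {direction} correlation ({r:.2f})"


def pairwise_correlations(frame: pd.DataFrame) -> dict:
    """Describe the correlation of every pair of numeric columns."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()  # Pearson correlation is the pandas default
    return {
        (a, b): describe_correlation(corr.loc[a, b])
        for a, b in itertools.combinations(numeric.columns, 2)
    }


# Small made-up frame, used only to exercise the helpers.
sample = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10], "z": [5, 3, 4, 1, 2]}
)
descriptions = pairwise_correlations(sample)
for pair, description in descriptions.items():
    print(pair, description)
```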

In [3]:
x.show_correlation_heatmap()

In [4]:
x.show_joint_scatterplot()

In [5]:
from eda_report.univariate import Variable

horse_power = Variable(data['horsepower'])
horse_power.statistics

Number of observations    392.000000
Average                   104.469388
Standard Deviation         38.491160
Minimum                    46.000000
Lower Quartile             75.000000
Median                     93.500000
Upper Quartile            126.000000
Maximum                   230.000000
Skewness                    1.087326
Kurtosis                    0.696947
Name: horsepower, dtype: float64
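
For comparison, a similar summary can be assembled directly from pandas `Series` methods. This is only a sketch under stated assumptions: the helper name `summarize` is hypothetical, missing values are assumed to be dropped before summarising, and pandas' default (sample) definitions of standard deviation, skewness and kurtosis are used, which may differ from what `eda_report` computes internally.

```python
import pandas as pd


def summarize(values: pd.Series) -> pd.Series:
    """Collect common descriptive statistics for a numeric Series."""
    clean = values.dropna()  # assumption: missing values are ignored
    return pd.Series(
        {
            "Number of observations": clean.count(),
            "Average": clean.mean(),
            "Standard Deviation": clean.std(),
            "Minimum": clean.min(),
            "Lower Quartile": clean.quantile(0.25),
            "Median": clean.median(),
            "Upper Quartile": clean.quantile(0.75),
            "Maximum": clean.max(),
            "Skewness": clean.skew(),
            "Kurtosis": clean.kurt(),
        },
        name=values.name,
    )


# Made-up values with one missing entry, used only to exercise the helper.
sample = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, None], name="sample")
summary = summarize(sample)
print(summary)
```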

In [6]:
horse_power.show_graphs()

## 3. Create an EDA report

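You can customize the following:

- `title`: the title of the report (default: `'Exploratory Data Analysis Report'`)
- `graph_color`: the color used for the graphs (default: `'orangered'`)
- `output_filename`: the name of the file the report is saved to (default: `'eda-report.docx'`)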

In [7]:
from eda_report import get_word_report
get_word_report(data)

[INFO 17:51:50.376] Assessing correlation in numeric variables...
100%|██████████████████████████████████████████| 21/21 [00:07<00:00,  2.79it/s]
[INFO 17:52:03.677] Done. Summarising each variable...
100%|████████████████████████████████████████████| 9/9 [00:01<00:00,  5.35it/s]
[INFO 17:52:05.599] Done. Results saved as 'eda-report.docx'
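
The call above uses the defaults. A sketch of overriding them might look like the following. It assumes the three options listed earlier are accepted as keyword arguments by `get_word_report`, and that `graph_color` accepts a named color such as `'steelblue'`; neither is shown in this walkthrough. The wrapper `build_report` and the option values are hypothetical.

```python
# Option names come from the list in this section; the values are illustrative.
report_options = {
    "title": "Auto MPG Exploratory Analysis",
    "graph_color": "steelblue",
    "output_filename": "auto-mpg-report.docx",
}


def build_report(data, **overrides):
    """Create the Word report with custom settings (requires eda_report)."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from eda_report import get_word_report

    # Assumption: the options are passed as keyword arguments.
    get_word_report(data, **{**report_options, **overrides})


# Usage, once `data` has been loaded as in section 1:
# build_report(data)
# build_report(data, output_filename="another-report.docx")
```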
