# Getting Started with EDA

Before you develop a scorecard model, it is important to understand the data you are working with. In this tutorial, we walk through the exploratory data analysis (EDA) module included in `yasc`.

## Imports

First, we import `yasc` and the EDA functions used in this tutorial, and check the package version.

```python
# imports
import yasc
from yasc.data import german_data
from yasc.eda import (
    missing_stat,
    numeric_stat,
    categorical_stat,
    corr_analysis,
    describe,
)
import pandas as pd
import numpy as np

# show version
yasc.__version__
```

## Load data

Here we use `german_data()` to load the German credit data and show its first five rows.

```python
# load German credit data
data = german_data()
data.head()
```

## Check missing values I

The `missing_stat()` function finds the columns that contain missing values and reports their missing rates.

```python
missing_stat(data)
```

```python
# only include columns with missing values
missing_stat(data, only_missing_columns=True)
```
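
To make the idea of a missing-value summary concrete, here is a minimal pandas sketch of the kind of table such a function produces. It is not `yasc`'s implementation: the helper name `missing_summary` and its output columns are hypothetical, and we assume "missing rate" means the fraction of rows that are NaN in a column.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame, only_missing_columns: bool = False) -> pd.DataFrame:
    """Hypothetical helper (not part of yasc): missing count and rate per column."""
    mask = df.isna()
    summary = pd.DataFrame({
        "missing_count": mask.sum(),
        # the mean of a boolean mask is the fraction of rows that are missing
        "missing_rate": mask.mean(),
    })
    if only_missing_columns:
        # keep only columns with at least one missing value
        summary = summary[summary["missing_count"] > 0]
    return summary


toy = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["a", "b", "c"]})
print(missing_summary(toy))
print(missing_summary(toy, only_missing_columns=True))
```

The `only_missing_columns` flag here simply filters rows of the summary, which mirrors the behavior described for `missing_stat()` without claiming anything about how it is implemented.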

## Check missing values II

The German credit data happens to have no missing values, so let's create a data frame with missing values to test the `missing_stat()` function.

```python
# create a data frame with missing values
df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, 5, 6, 7], 'c': [8, 9, np.nan, 10]})
```

```python
missing_stat(df1)
```

```python
# check missing statistics of a single column
missing_stat(df1, "b")
```
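
As a sanity check on `df1`, the expected missing rates can be worked out by hand: column `a` has no NaN, while `b` and `c` each have 1 missing value out of 4, a rate of 0.25. The plain pandas sketch below reproduces those numbers; it only cross-checks the arithmetic and does not show the exact output format of `missing_stat()`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, 5, 6, 7], 'c': [8, 9, np.nan, 10]})

# per-column missing rate: a -> 0.0, b -> 0.25, c -> 0.25
rates = df1.isna().mean()
print(rates)

# a single column, comparable to missing_stat(df1, "b")
b_count = int(df1["b"].isna().sum())
b_rate = float(df1["b"].isna().mean())
print(b_count, b_rate)  # 1 0.25
```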

## Describe numeric columns

```python
# check statistics of numeric columns
numeric_stat(data)
```
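
For readers who want to see what "statistics of numeric columns" typically covers, the sketch below builds a similar table with plain pandas: select the numeric columns, run `DataFrame.describe()`, and append a missing-value count. The helper `numeric_summary` and the toy columns are hypothetical, and the exact statistics `numeric_stat()` reports may differ.

```python
import pandas as pd


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not part of yasc): per-column stats for numeric columns."""
    numeric = df.select_dtypes(include="number")
    # describe() gives count, mean, std, min, quartiles and max;
    # transpose so that each row corresponds to one column of df
    summary = numeric.describe().T
    summary["missing"] = numeric.isna().sum()
    return summary


toy = pd.DataFrame({
    "age": [25, 40, 33, 51],
    "amount": [1000.0, 2500.0, None, 1800.0],
    "purpose": ["car", "tv", "car", "radio"],  # non-numeric, so it is skipped
})
print(numeric_summary(toy))
```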

## Describe categorical columns

```python
# check statistics of categorical columns
categorical_stat(data)
```
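
Categorical columns are usually summarized by their number of distinct values, the most frequent level and how often it occurs. The pandas sketch below computes those quantities; the helper `categorical_summary`, its column names, and the assumption that "categorical" means "non-numeric dtype" are all hypothetical and not taken from `yasc`.

```python
import pandas as pd


def categorical_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not part of yasc): stats for non-numeric columns."""
    categorical = df.select_dtypes(exclude="number")
    rows = {}
    for col in categorical.columns:
        # value_counts() sorts levels by frequency, most frequent first
        counts = categorical[col].value_counts(dropna=True)
        rows[col] = {
            "n_unique": categorical[col].nunique(dropna=True),
            "top": counts.index[0] if not counts.empty else None,
            "top_freq": int(counts.iloc[0]) if not counts.empty else 0,
            "missing": int(categorical[col].isna().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


toy = pd.DataFrame({
    "purpose": ["car", "tv", "car", None],
    "housing": ["own", "rent", "own", "own"],
    "age": [25, 40, 33, 51],  # numeric, so it is skipped
})
print(categorical_summary(toy))
```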

## Generate descriptive statistics

The `describe()` function generates descriptive statistics for the observed data. Beyond what `pandas.core.frame.DataFrame.describe()` provides, it also reports the missing values in each column and each column's type (numeric or categorical).

```python
describe(df1)
```

```python
# get descriptive statistics of the German credit data
describe(data)
```
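
The extra information that `describe()` adds on top of pandas can be approximated by hand, which helps when reading its output. This sketch is an assumption-based illustration, not `yasc`'s code: the helper `describe_sketch` is hypothetical, and it labels a column "numeric" when pandas reports a numeric dtype and "categorical" otherwise.

```python
import numpy as np
import pandas as pd


def describe_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not part of yasc): pandas describe plus missing info and type."""
    # include="all" covers numeric and non-numeric columns in one table
    summary = df.describe(include="all").T
    summary["missing"] = df.isna().sum()
    summary["missing_rate"] = df.isna().mean()
    summary["type"] = [
        "numeric" if pd.api.types.is_numeric_dtype(df[col]) else "categorical"
        for col in summary.index
    ]
    return summary


df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, 5, 6, 7], 'c': [8, 9, np.nan, 10]})
print(describe_sketch(df1))
```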

## Correlation analysis

```python
corr, ax = corr_analysis(data, show_plot=True)
```
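
The first value returned above is a correlation table. A comparable matrix can be computed directly with pandas, as in the sketch below. We assume Pearson correlation over numeric columns only; which method and which columns `corr_analysis()` actually uses is not stated here, and the plotting step (the returned `ax`) is left out. The helper `corr_sketch` and the toy data are hypothetical.

```python
import pandas as pd


def corr_sketch(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Hypothetical helper (not part of yasc): correlation matrix of numeric columns."""
    numeric = df.select_dtypes(include="number")
    # DataFrame.corr supports "pearson", "kendall" and "spearman"
    return numeric.corr(method=method)


toy = pd.DataFrame({
    "duration": [6, 12, 24, 36, 48],
    "amount": [1000, 2000, 4000, 6000, 8000],  # exactly proportional to duration
    "age": [60, 45, 30, 50, 25],
    "purpose": ["car", "tv", "car", "radio", "tv"],  # non-numeric, so it is skipped
})
corr = corr_sketch(toy)
print(corr.round(2))
```

Because `amount` is an exact linear function of `duration` in this toy data, their Pearson correlation is 1, and the matrix is symmetric with ones on the diagonal.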