# Exploratory Data Analysis: Loan Payments

The aim of this project is to conduct exploratory data analysis (EDA) on a database of loan payments for a financial institution.

The code block below loads the `loan_payments` SQL table into a pandas DataFrame using the `db_utils.py` script included in the root directory of this project.

In [1]:
import db_utils as dbu
import pandas as pd

credentials = dbu.load_yaml('credentials.yaml')
data = dbu.RDSDatabaseConnector(credentials)
data.start_sqlalchemy_engine()
df = data.get_data('loan_payments')  # load the SQL table into a pandas DataFrame
pd.set_option('display.max_columns', 50)  # the table has 43 columns; raise the display limit so none are truncated

#### Columns with Null Data
- funded_amount
- term
- int_rate
- employment_length
- mths_since_last_delinq
- mths_since_last_record
- last_payment_date
- next_payment_date
- last_credit_pull_date
- collections_12_mths_ex_med
- mths_since_last_major_derog
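
A list like the one above can be produced by counting missing values per column. The following is a minimal sketch on a small stand-in frame; the sample values and the `loan_amount` column are made up for illustration and are not taken from the `loan_payments` table.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the loan data; all values here are illustrative assumptions.
sample = pd.DataFrame({
    "loan_amount": [8000, 13200, 16000],
    "funded_amount": [8000.0, np.nan, 16000.0],
    "term": ["36 months", None, "60 months"],
})

null_counts = sample.isna().sum()                      # missing values per column
null_percent = (sample.isna().mean() * 100).round(2)   # share of rows missing, in %

# Keep only the columns that contain at least one null
columns_with_nulls = null_counts[null_counts > 0].index.tolist()
print(columns_with_nulls)
```

Applied to the real `df`, the same two calls (`isna().sum()` and `isna().mean()`) would give both the count and the percentage of nulls, which helps when deciding whether to impute or drop a column.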

#### Columns to Convert
- term: category OK, null data present
- grade: category OK
- sub_grade: category OK
- employment_length: category OK, null data present
- home_ownership: category OK
- verification_status: category OK
- loan_status: category OK
- payment_plan: category OK, but 99.9% of values are n and only 1 response is y, so the column is potentially unnecessary
- purpose: category OK
- application_type: category OK, but the column is unnecessary since it contains only 1 distinct response
- issue_date: period
- earliest_credit_line: period
- last_payment_date: period
- next_payment_date: period
- last_credit_pull_date: period
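
Observations such as "only 1 response y" or "only 1 type of response" can be checked with `value_counts` and `nunique`. This is a hedged sketch on made-up data; the `"INDIVIDUAL"` value and the row counts are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in; the values below are illustrative, not from the real table.
sample = pd.DataFrame({
    "payment_plan": ["n", "n", "n", "y"],
    "application_type": ["INDIVIDUAL"] * 4,
})

# Relative frequency of each response in a candidate category column
plan_share = sample["payment_plan"].value_counts(normalize=True)
print(plan_share)

# Columns holding a single distinct value carry no information for analysis
constant_columns = [
    col for col in sample.columns
    if sample[col].nunique(dropna=False) == 1
]
print(constant_columns)
```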


In [2]:
# Converting categorical data columns from object to category data type
category_columns = ["term", "grade", "sub_grade", "employment_length", "home_ownership", "verification_status", "loan_status", "payment_plan", "purpose", "application_type"]
convert_to_categories = dbu.DataTransform(df, category_columns)
df = convert_to_categories.to_category()
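
The implementation of `DataTransform.to_category` lives in `db_utils.py` and is not shown here. As a rough idea of what such a conversion might look like, the sketch below uses a hypothetical standalone function built on `DataFrame.astype`; it is an assumption, not the project's actual code.

```python
import pandas as pd

def to_category(frame, columns):
    """Return a copy of `frame` with the given columns cast to the category dtype.

    Hypothetical helper for illustration; not the DataTransform method itself.
    """
    return frame.astype({col: "category" for col in columns})

# Toy data; values are illustrative only.
sample = pd.DataFrame({
    "grade": ["A", "B", "A"],
    "term": ["36 months", None, "60 months"],
})
sample = to_category(sample, ["grade", "term"])
print(sample.dtypes)
```

Nulls survive the conversion as missing values rather than becoming a category of their own, so columns such as `term` and `employment_length` still need their nulls handled separately.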

In [3]:
# Converting columns containing dates from object(month, year) to period(M) data type
date_columns = ["issue_date", "earliest_credit_line", "last_payment_date", "next_payment_date", "last_credit_pull_date"]
convert_to_periods = dbu.DataTransform(df, date_columns)
df = convert_to_periods.to_period()
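
As with the category conversion, `DataTransform.to_period` is defined in `db_utils.py` and not shown here. The sketch below is one plausible way to do it with `pd.to_datetime` followed by `.dt.to_period("M")`. The function name, the `date_format` parameter, and the `"Jan-2021"` style of date string are all assumptions; the real table may store dates differently.

```python
import pandas as pd

def to_period(frame, columns, date_format="%b-%Y"):
    """Return a copy of `frame` with the given columns converted to monthly periods.

    Hypothetical helper for illustration; `date_format` is an assumed layout.
    """
    converted = frame.copy()
    for col in columns:
        # Unparseable or missing entries become NaT instead of raising
        parsed = pd.to_datetime(converted[col], format=date_format, errors="coerce")
        converted[col] = parsed.dt.to_period("M")
    return converted

# Toy data; values are illustrative only.
sample = pd.DataFrame({"issue_date": ["Jan-2021", "Mar-2020", None]})
sample = to_period(sample, ["issue_date"])
print(sample.dtypes)
```

A monthly period suits these columns because the source values carry only a month and a year, so a full timestamp would imply a day-level precision the data does not have.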