# Loan Data Approval Notebook

## Objectives
- The client wants to understand the trends in their customer base and identify which variables are most closely correlated with loan approval.

## Inputs
- 'outputs/loan_data.csv'

## Outputs
- A correlation study that can be used to build the Streamlit dashboard

## Change Working Directory

The notebooks are stored in a subfolder, so when running a notebook in the editor we need to change the working directory from its current folder to its parent folder.

We access the current directory with os.getcwd()

In [1]:
import os
current_dir = os.getcwd()
current_dir

'c:\\Users\\Bartek\\Desktop\\Predictive-Analysis\\jupyter_notebooks'

- We use os.path.dirname() to get the parent directory
- Then we call the os.chdir() function, which sets the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'c:\\Users\\Bartek\\Desktop\\Predictive-Analysis'
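Running the `os.chdir` cell a second time would move one more level up, out of the project. A minimal sketch of a guard against that, assuming the notebooks folder is named `jupyter_notebooks` as in the path shown above; `move_to_project_root` is a hypothetical helper, not part of the original notebook:

```python
import os
from pathlib import Path


def move_to_project_root(notebook_dir_name="jupyter_notebooks"):
    """Change to the parent folder only while the cwd is still the notebooks folder."""
    cwd = Path.cwd()
    if cwd.name == notebook_dir_name:
        # Still inside the notebooks subfolder: step up once.
        os.chdir(cwd.parent)
    # Otherwise the cwd is left untouched, so re-running the cell is harmless.
    return Path.cwd()


project_root = move_to_project_root()
print(f"Current directory: {project_root}")
```

Calling the helper repeatedly leaves the working directory where it is once the project root has been reached.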

# Load Data

In [4]:
import pandas as pd
loan_data = pd.read_csv("outputs/loan_data.csv")
loan_data = loan_data.drop(['Loan_ID'] , axis=1)

# Data Exploration
We aim to become better acquainted with the dataset by examining the types and distribution of variables, identifying missing data, and understanding the significance of these variables in a business context.

In [5]:
from ydata_profiling import ProfileReport
pandas_report = ProfileReport(df=loan_data)
pandas_report.to_notebook_iframe()

ImportError: DLL load failed while importing _path: The specified module could not be found.
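If the profiling report cannot be rendered, as in the output above, the same first questions (variable types, missing data, number of distinct values) can be answered with plain pandas. The sketch below uses a small made-up DataFrame; `quick_overview` is a hypothetical helper and the values are illustrative only:

```python
import pandas as pd


def quick_overview(df):
    """Summarise dtype, missing values and distinct values for each column."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "unique": df.nunique(),
        }
    )


# Hypothetical toy data; column names mirror the notebook, values are invented.
toy = pd.DataFrame(
    {
        "Credit_History": [1.0, None, 0.0, 1.0],
        "LoanAmount": [128.0, 66.0, None, 120.0],
        "Loan_Status": ["Y", "N", "Y", "Y"],
    }
)

overview = quick_overview(toy)
print(overview)
```

In the notebook the helper would be called as `quick_overview(loan_data)`; `loan_data.describe()` then covers the distribution of the numerical variables.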

# Correlation Study
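We use .corr() with the Spearman method and sort the coefficients by absolute value in descending order. Pearson coefficients can be obtained the same way with method='pearson'. Loan_Status is first mapped to 1 (Y) and 0 (N) so that it can be correlated with the numerical variables, and .corr() excludes missing values pairwise. The first entry of the sorted result is Loan_Status itself, so it is dropped, leaving a Pandas Series of coefficients between each variable and Loan_Status.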

In [None]:
loan_data['Loan_Status'] = loan_data['Loan_Status'].map({'Y': 1, 'N': 0})
numerical_columns = loan_data.select_dtypes(include=['float', 'int']).columns
corr_spearman = loan_data[numerical_columns].corr(method='spearman')['Loan_Status'].sort_values(key=abs, ascending=False)[1:].head(10)
corr_spearman

Credit_History       0.618937
CoapplicantIncome    0.124820
LoanAmount           0.055227
Loan_Amount_Term    -0.048417
ApplicantIncome     -0.032439
Name: Loan_Status, dtype: float64

The results contain one moderately strong correlation (Credit_History, 0.62) and several weak ones. Ideally we would keep only strongly correlated variables, but since that is not possible here we take the 3 variables most correlated with Loan_Status.

In [None]:
# Only Spearman coefficients are computed above, so the top 3 come from that Series alone
import_variables = set(corr_spearman[:3].index.to_list())
import_variables

{'CoapplicantIncome', 'Credit_History', 'LoanAmount'}
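If Pearson coefficients are computed as well, the two rankings can be merged by taking the union of the top variables from each method. The sketch below shows one way to do that on a made-up DataFrame; `top_correlations` is a hypothetical helper, the values are invented, and `n=2` is an arbitrary choice for the example:

```python
import pandas as pd

# Hypothetical toy data; column names mirror the notebook, values are invented.
toy = pd.DataFrame(
    {
        "Credit_History": [1, 1, 0, 1, 0, 1, 1, 0],
        "ApplicantIncome": [5000, 3000, 4000, 6000, 2500, 3500, 8000, 4200],
        "LoanAmount": [120, 100, 200, 150, 90, 110, 300, 180],
        "Loan_Status": ["Y", "Y", "N", "Y", "N", "Y", "N", "N"],
    }
)
toy["Loan_Status"] = toy["Loan_Status"].map({"Y": 1, "N": 0})


def top_correlations(df, target, method, n=10):
    """Rank numerical columns by absolute correlation with the target."""
    corr = df.select_dtypes(include="number").corr(method=method)[target]
    corr = corr.drop(target)  # drop the target's correlation with itself
    return corr.sort_values(key=abs, ascending=False).head(n)


corr_spearman = top_correlations(toy, "Loan_Status", "spearman")
corr_pearson = top_correlations(toy, "Loan_Status", "pearson")

# Union of the top 2 variables from each method.
import_variables = set(corr_pearson[:2].index.to_list() + corr_spearman[:2].index.to_list())
print(import_variables)
```

Dropping the target by name rather than by position keeps the result correct even if another variable happens to tie with it at the top of the ranking.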

## Categorical Bar Chart
Plot bar charts for all the object (categorical) fields in the dataset, showing the count of each category split by Loan_Status.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

loan_data_raw = pd.read_csv("inputs/datasets/raw/loan_data.csv").drop(['Loan_ID'] , axis=1)
cat_num = loan_data_raw.select_dtypes(include=['object']).columns.to_list()
fig, ax = plt.subplots(4, 2, figsize=(12,15))

for index, cat_col in enumerate(cat_num):
    row, col = index//2, index%2
    sns.countplot(x=cat_col, data=loan_data_raw, hue='Loan_Status', ax=ax[row, col])

plt.subplots_adjust(hspace=1)
plt.show()

ImportError: DLL load failed while importing _path: The specified module could not be found.
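By default `sns.countplot` draws raw counts, so comparing approval rates between categories of different sizes is easier with row-normalised proportions. A minimal sketch with `pd.crosstab`, which also works when the plotting libraries fail to import; the toy DataFrame and the `Property_Area` column name are assumptions for illustration:

```python
import pandas as pd

# Hypothetical toy data; the column name and values are invented for the example.
toy = pd.DataFrame(
    {
        "Property_Area": ["Urban", "Rural", "Urban", "Semiurban", "Rural", "Urban"],
        "Loan_Status": ["Y", "N", "Y", "Y", "Y", "N"],
    }
)

# normalize="index" makes each row sum to 1, giving the share of Y and N per category.
proportions = pd.crosstab(toy["Property_Area"], toy["Loan_Status"], normalize="index")
print(proportions)
```

In the notebook the same call could be looped over the categorical columns of `loan_data_raw`.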

## Numerical Acceptance Study

Study how loan approval relates to each numerical variable, to see whether there are patterns at high or low values, for example in credit history.

In [None]:
for col in loan_data_raw.select_dtypes(['float', 'int']):
    sns.displot(x=col, data=loan_data_raw, hue='Loan_Status', kde=True, bins=30)
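The distribution plots can be complemented by a numerical summary per outcome, which makes differences between approved and rejected applications easier to quote. A minimal sketch on made-up values, assuming the same `Loan_Status` and `LoanAmount` column names as the notebook:

```python
import pandas as pd

# Hypothetical toy data; values are invented for the example.
toy = pd.DataFrame(
    {
        "LoanAmount": [100, 120, 200, 150, 90, 300],
        "Loan_Status": ["Y", "Y", "N", "Y", "N", "N"],
    }
)

# Count, median and mean of the variable for each loan outcome.
summary = toy.groupby("Loan_Status")["LoanAmount"].agg(["count", "median", "mean"])
print(summary)
```

The median is included alongside the mean because income and loan amount columns are often skewed, which is an assumption about the data rather than something verified here.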

## Conclusions and Next steps
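The interpretations of the correlations and the plots converge.
- An approved applicant typically has a positive Credit_History, the variable most strongly correlated with Loan_Status (Spearman 0.62).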