### About Company:
--------------------------------------------------------------------------------------------------------------
BC Finance deals in all types of home loans and has a presence across urban, semi-urban and rural areas.
A customer first applies for a home loan, after which the company validates the customer's eligibility for the loan.

### Problem:
--------------------------------------------------------------------------------------------------------------
The company wants to automate the loan eligibility process (in real time) based on the customer details provided while
filling out the online application form.

The details that users need to provide include:

| Detail             | Description                                            |
|--------------------|--------------------------------------------------------|
| Gender             | Gender of the applicant (Male/Female)                  |
| Married            | Applicant's marital status (Yes/No)                    |
| Dependents         | Number of dependents                                   |
| Education          | Applicant's education level (Graduate/Not Graduate)    |
| Self_Employed      | Whether the applicant is self-employed (Yes/No)        |
| ApplicantIncome    | Applicant's monthly income                             |
| CoapplicantIncome  | Co-applicant's monthly income                          |
| LoanAmount         | Loan amount requested (in thousands)                   |
| Loan_Amount_Term   | Term of the loan (in days)                             |
| Credit_History     | Credit history meets guidelines (1 - Yes, 0 - No)      |
| Property_Area      | Location of property (Urban/Semiurban/Rural)           |


Based on a given dataset, a machine learning model needs to be trained to predict whether a loan will be approved or not (folder location: "./Machine-Learning-382-Project-1/Data").


--------------------------
----------------------

# Step 1 Problem statement
BC Finance offers home loans to consumers and determines eligibility manually. Loan approval decisions are influenced by a variety of personal details, including income, credit history, employment status, and other factors.  Understanding the correlations between these details and loan approval outcomes is crucial for developing a predictive model to automatically forecast loan approval eligibility.

--------------------------
----------------------

# Step 2 Hypothesis
The personal details of loan applicants exhibit correlations with loan approval outcomes. Analyzing these correlations using historical loan data will allow us to develop a predictive model that accurately forecasts loan approval based on applicant characteristics and historical patterns.
 
A few possible hypotheses to consider include the following:
1. Applicants with higher income may be more likely to be eligible for loans
2. Applicants who are not self-employed may be more likely to be approved for loans
3. Applicants applying for smaller amounts may be more likely to be approved for loans
4. Applicants with a good credit history will be more likely to be approved
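
One way to probe hypotheses like these is to compare approval rates across the categories of a single column. The sketch below is only illustrative: the `toy` frame is made-up data and `approval_rate_by` is a hypothetical helper, not part of the project code. It assumes the target column is `Loan_Status` with values `Y`/`N`, as in the preparation code later in this notebook.

```python
import pandas as pd

# Hypothetical toy sample; the real data would come from the project's CSV.
toy = pd.DataFrame({
    "Credit_History": [1.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    "Self_Employed": ["No", "Yes", "No", "No", "Yes", "No"],
    "Loan_Status": ["Y", "Y", "N", "Y", "N", "N"],
})


def approval_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Share of approved loans (Loan_Status == 'Y') per category of `column`."""
    approved = df["Loan_Status"].eq("Y")
    return approved.groupby(df[column]).mean()


rates = approval_rate_by(toy, "Credit_History")
print(rates)
```

A large gap between the groups would be consistent with the hypothesis, although a raw rate comparison says nothing about confounding between columns.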


--------------------------
----------------------

# Step 3 Getting the system ready

### <u>Loading the data</u>

In [59]:
import pandas as pd
from category_encoders import OneHotEncoder
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
import dash
from dash import html, dcc, Input, Output
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import joblib
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


# Non-Feature Preparation

In [60]:
def prepare_data(data):
    """Load the raw CSV, impute missing values, drop outliers and one-hot encode."""
    df_raw_new = pd.read_csv(data)

    # Loan_ID is an identifier, not a predictive feature
    if 'Loan_ID' in df_raw_new.columns:
        df_raw_new = df_raw_new.drop(columns='Loan_ID')

    # Treat credit history as a category so it is one-hot encoded below
    df_raw_new['Credit_History'] = df_raw_new['Credit_History'].astype(object)

    # Impute: mean for the loan amount, most frequent value for the other columns
    df_raw_new['LoanAmount'] = df_raw_new['LoanAmount'].fillna(df_raw_new['LoanAmount'].mean())
    mode_cols = ['Loan_Amount_Term', 'Credit_History', 'Gender', 'Married', 'Dependents', 'Self_Employed']
    for col in mode_cols:
        df_raw_new[col] = df_raw_new[col].fillna(df_raw_new[col].mode()[0])

    # Keep only rows below the outlier thresholds
    below = lambda col, val: df_raw_new[col] < val
    income_mask = below('ApplicantIncome', 10000)
    coapplicant_income_mask = below('CoapplicantIncome', 5701)
    loan_amount_mask = below('LoanAmount', 260)

    # .copy() gives an independent frame, so the assignment below
    # does not raise SettingWithCopyWarning
    df_raw_new1 = df_raw_new[income_mask & coapplicant_income_mask & loan_amount_mask].copy()

    # Encode the target as 1 (approved) / 0 (not approved)
    if 'Loan_Status' in df_raw_new1.columns:
        df_raw_new1['Loan_Status'] = df_raw_new1['Loan_Status'].apply(lambda s: 1 if s == 'Y' else 0)

    ohe = OneHotEncoder(
        use_cat_names=True,
        cols=['Gender', 'Dependents', 'Married', 'Education', 'Self_Employed', 'Credit_History', 'Property_Area']
    )

    encoded_df = ohe.fit_transform(df_raw_new1)

    return encoded_df
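
The imputation and filtering steps in `prepare_data` can be seen in isolation on a tiny frame. This is a minimal sketch on made-up values (the `toy` frame is hypothetical), using plain pandas only. It also shows why taking a `.copy()` of the filtered rows matters: the later column assignment then works on an independent frame instead of a slice.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen so that each step has a visible effect.
toy = pd.DataFrame({
    "LoanAmount": [100.0, np.nan, 300.0, 140.0],
    "Gender": ["Male", None, "Male", "Female"],
    "ApplicantIncome": [4000, 5000, 12000, 3000],
})

# Numeric gap: fill with the column mean (computed from the non-missing values).
toy["LoanAmount"] = toy["LoanAmount"].fillna(toy["LoanAmount"].mean())

# Categorical gap: fill with the most frequent value.
toy["Gender"] = toy["Gender"].fillna(toy["Gender"].mode()[0])

# Keep rows below both thresholds; .copy() detaches the result from `toy`.
keep = (toy["ApplicantIncome"] < 10000) & (toy["LoanAmount"] < 260)
filtered = toy[keep].copy()

# Safe to add or overwrite columns on the copy.
filtered["Loan_Status"] = 1
print(filtered)
```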

In [61]:
prepare_data('../data/raw_data.csv')




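|  | Gender_Male | Gender_Female | Married_No | Married_Yes | Dependents_0 | Dependents_1 | Dependents_3+ | Dependents_2 | Education_Graduate | Education_Not Graduate | ... | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History_1.0 | Credit_History_0.0 | Property_Area_Urban | Property_Area_Rural | Property_Area_Semiurban | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | ... | 5849 | 0.0 | 146.412162 | 360.0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 4583 | 1508.0 | 128.000000 | 360.0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | ... | 3000 | 0.0 | 66.000000 | 360.0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 3 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | ... | 2583 | 2358.0 | 120.000000 | 360.0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 4 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | ... | 6000 | 0.0 | 141.000000 | 360.0 | 1 | 0 | 1 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 609 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | ... | 2900 | 0.0 | 71.000000 | 360.0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 610 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 4106 | 0.0 | 40.000000 | 180.0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 611 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 8072 | 240.0 | 253.000000 | 360.0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 612 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | ... | 7583 | 0.0 | 187.000000 | 360.0 | 1 | 0 | 1 | 0 | 0 | 1 |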


# Feature-Encoded Preparation

In [62]:
def prepare_data1(data):
    """Same cleaning as prepare_data, plus engineered features before encoding."""
    df_raw_new = pd.read_csv(data)

    # Loan_ID is an identifier, not a predictive feature
    if 'Loan_ID' in df_raw_new.columns:
        df_raw_new = df_raw_new.drop(columns='Loan_ID')

    # Treat credit history as a category so it is one-hot encoded below
    df_raw_new['Credit_History'] = df_raw_new['Credit_History'].astype(object)

    # Impute: mean for the loan amount, most frequent value for the other columns
    df_raw_new['LoanAmount'] = df_raw_new['LoanAmount'].fillna(df_raw_new['LoanAmount'].mean())
    mode_cols = ['Loan_Amount_Term', 'Credit_History', 'Gender', 'Married', 'Dependents', 'Self_Employed']
    for col in mode_cols:
        df_raw_new[col] = df_raw_new[col].fillna(df_raw_new[col].mode()[0])

    # Keep only rows below the outlier thresholds
    below = lambda col, val: df_raw_new[col] < val
    income_mask = below('ApplicantIncome', 10000)
    coapplicant_income_mask = below('CoapplicantIncome', 5701)
    loan_amount_mask = below('LoanAmount', 260)

    # .copy() gives an independent frame, so the assignments below
    # do not raise SettingWithCopyWarning
    df_raw_new1 = df_raw_new[income_mask & coapplicant_income_mask & loan_amount_mask].copy()

    # Encode the target as 1 (approved) / 0 (not approved)
    if 'Loan_Status' in df_raw_new1.columns:
        df_raw_new1['Loan_Status'] = df_raw_new1['Loan_Status'].apply(lambda s: 1 if s == 'Y' else 0)

    # Engineered features
    df_raw_new1['Total_Income'] = df_raw_new1['ApplicantIncome'] + df_raw_new1['CoapplicantIncome']
    df_raw_new1['Loan_Term_Category'] = pd.cut(
        df_raw_new1['Loan_Amount_Term'],
        bins=[0, 180, 360, 600],
        labels=['Short-term', 'Medium-term', 'Long-term']
    ).astype(object)  # plain strings, so the encoder treats it like the other categoricals
    df_raw_new1['Income_Stability'] = df_raw_new1[['ApplicantIncome', 'CoapplicantIncome']].std(axis=1)
    df_raw_new1['Loan_to_Income_ratio'] = df_raw_new1['LoanAmount'] / df_raw_new1['Total_Income']

    ohe = OneHotEncoder(
        use_cat_names=True,
        cols=['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Credit_History', 'Property_Area', 'Loan_Term_Category']
    )

    encoded_df = ohe.fit_transform(df_raw_new1)

    return encoded_df
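
To make the four engineered columns concrete, here is a small sketch that applies the same expressions to two made-up applicants (the `toy` frame is hypothetical). Two details are worth noticing: `pd.cut` bins are right-inclusive by default, so a term of exactly 180 falls into `Short-term`, and the row-wise standard deviation of two values is simply their absolute difference divided by the square root of 2.

```python
import pandas as pd

# Two hypothetical applicants.
toy = pd.DataFrame({
    "ApplicantIncome": [3000, 5000],
    "CoapplicantIncome": [1000.0, 0.0],
    "LoanAmount": [120.0, 100.0],
    "Loan_Amount_Term": [180.0, 360.0],
})

# Combined household income.
toy["Total_Income"] = toy["ApplicantIncome"] + toy["CoapplicantIncome"]

# Bucket the term; intervals are (0, 180], (180, 360], (360, 600].
toy["Loan_Term_Category"] = pd.cut(
    toy["Loan_Amount_Term"],
    bins=[0, 180, 360, 600],
    labels=["Short-term", "Medium-term", "Long-term"],
)

# Sample standard deviation across the two income columns, per row.
toy["Income_Stability"] = toy[["ApplicantIncome", "CoapplicantIncome"]].std(axis=1)

# Requested amount relative to combined income.
toy["Loan_to_Income_ratio"] = toy["LoanAmount"] / toy["Total_Income"]

print(toy)
```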

In [63]:
prepare_data1('../data/raw_data.csv')


In [64]:

from tensorflow import keras  # keras.Input avoids a clash with dash's Input imported above

df = prepare_data('../data/raw_data.csv')
X = df.drop(['Loan_Status'], axis=1)
y = df['Loan_Status']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

def create_model():
    # 21 input features after one-hot encoding
    model = Sequential([
        keras.Input(shape=(21,)),
        Dense(21, activation='relu'),
        Dense(8, activation='relu'),
        Dense(8, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# scikeras takes the builder through `model`; `build_fn` is deprecated
model = KerasClassifier(model=create_model, epochs=25, batch_size=32, verbose=0)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = cross_val_score(model, X_train_scaled, y_train, cv=kfold)

print("Cross-validation Accuracy: %.2f%% (+/- %.2f%%)" % (results.mean() * 100, results.std() * 100))




Cross-validation Accuracy: 80.38% (+/- 1.53%)
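
The reported figure is the mean and standard deviation of the per-fold accuracies. The sketch below reproduces that reporting pattern on synthetic data, with two stated substitutions: a logistic regression stands in for the Keras network so the example runs quickly, and the scaler is placed inside a scikit-learn pipeline so that it is refit on the training part of every fold. All `demo_` names and the synthetic data are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, standing in for the loan features.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

# Scaling happens inside the pipeline, so each fold fits its own scaler.
demo_pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class balance similar in every split.
demo_kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

demo_scores = cross_val_score(demo_pipeline, X_demo, y_demo, cv=demo_kfold, scoring="accuracy")

print("Cross-validation Accuracy: %.2f%% (+/- %.2f%%)"
      % (demo_scores.mean() * 100, demo_scores.std() * 100))
```

A small standard deviation across folds suggests the score does not depend heavily on which rows end up in which fold.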


In [None]:



from tensorflow import keras  # keras.Input avoids a clash with dash's Input imported above

df = prepare_data1('../data/raw_data.csv')
X = df.drop(['Loan_Status'], axis=1)
y = df['Loan_Status']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

def create_model():
    # 27 input features after feature engineering and one-hot encoding
    model = Sequential([
        keras.Input(shape=(27,)),
        Dense(27, activation='relu'),
        Dense(16, activation='relu'),
        Dense(16, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# scikeras takes the builder through `model`; `build_fn` is deprecated
model = KerasClassifier(model=create_model, epochs=15, batch_size=32, verbose=0)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = cross_val_score(model, X_train_scaled, y_train, cv=kfold)

print("Cross-validation Accuracy: %.2f%% (+/- %.2f%%)" % (results.mean() * 100, results.std() * 100))

In [None]:
app = dash.Dash(__name__)

# Define layout
app.layout = html.Div([
    html.H1("Loan Application"),
    html.Div([
        html.Label("Input 1"),
        dcc.Input(id='input1', type='text')
    ]),
    html.Div([
        html.Label("Input 2"),
        dcc.Input(id='input2', type='text')
    ]),
    html.Div([
        html.Label("Input 3"),
        dcc.Input(id='input3', type='text')
    ]),
    html.Div([
        html.Label("Input 4"),
        dcc.Input(id='input4', type='text')
    ]),
    html.Div([
        html.Label("Input 5"),
        dcc.Input(id='input5', type='text')
    ]),
    html.Div([
        html.Label("Input 6"),
        dcc.Input(id='input6', type='text')
    ]),
    html.Div([
        html.Label("Input 7"),
        dcc.Input(id='input7', type='text')
    ]),
    html.Div([
        html.Label("Input 8"),
        dcc.Input(id='input8', type='text')
    ]),
    html.Div([
        html.Label("Input 9"),
        dcc.Input(id='input9', type='text')
    ]),
    html.Div([
        html.Label("Input 10"),
        dcc.Input(id='input10', type='text')
    ]),
    html.Div([
        html.Label("Input 11"),
        dcc.Input(id='input11', type='text')
    ]),
    html.Div([
        html.Label("Input 12"),
        dcc.Input(id='input12', type='text')
    ]),
    
    html.Button('Submit', id='submit-val', n_clicks=0),
    html.Div(id='output-div')
])

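# Callback to handle button click and write to CSV.
# The text fields are passed as State rather than Input, so only a button
# click triggers a write (typing in a field does not).
@app.callback(
    Output('output-div', 'children'),
    [Input('submit-val', 'n_clicks')],
    [dash.State('input{}'.format(i), 'value') for i in range(1, 13)]
)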
def update_output(n_clicks, *inputs):
    if n_clicks > 0:
        # Create a DataFrame with the input data
        data = {'Input {}'.format(i): [val] for i, val in enumerate(inputs, start=1)}
        df = pd.DataFrame(data)
        
        # Append data to CSV file
        df.to_csv('../data/loan_applications.csv', mode='a', header=False, index=False)
        
        return 'Data added to CSV successfully.'

if __name__ == '__main__':
    app.run_server(debug=True)
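
Because the callback appends with `header=False`, the resulting CSV never gets a header row. One possible refinement, sketched here outside of Dash, is to write the header only when the file does not exist yet. The helper name `append_application` and the sample values are hypothetical, and the example writes to a temporary directory rather than to `../data/loan_applications.csv`.

```python
import os
import tempfile

import pandas as pd


def append_application(values, csv_path):
    """Append one form submission as a row; write the header only for a new file."""
    columns = ["Input {}".format(i) for i in range(1, len(values) + 1)]
    row = pd.DataFrame([list(values)], columns=columns)
    is_new_file = not os.path.exists(csv_path)
    row.to_csv(csv_path, mode="a", header=is_new_file, index=False)


# Usage: two submissions end up as two rows under a single header.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "loan_applications.csv")
    append_application(["Male", "Yes", "0"], path)
    append_application(["Female", "No", "1"], path)
    saved = pd.read_csv(path)

print(saved)
```

Inside the callback, the same helper could be called with the tuple of field values in place of the inline `to_csv` call.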
