# Models

# Table of contents:
* 1 [Preparation](#intro-bullet)
* 2 [Logistic Regression](#first-bullet)
* 3 [Random Forest](#second-bullet)
* 4 [Naive Bayes](#third-bullet)
* 5 [LightGBM](#fourth-bullet)
    * 5.1 [Plain LightGBM](#fifth-bullet)
    * 5.2 [LightGBM with LR](#sixth-bullet)
    * 5.3 [LightGBM with RF](#seventh-bullet)
    * 5.4 [LightGBM with NB](#eigth-bullet)
* 6 [XGBM](#nineth-bullet)
    * 6.1 [Plain XGBM](#tenth-bullet)
    * 6.2 [XGBM with LR](#eleventh-bullet)
    * 6.3 [XGBM with RF](#12-bullet)
    * 6.4 [XGBM with NB](#13-bullet)

## 1. Preparation <a class="anchor" id="intro-bullet"></a>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
train_data = pd.read_csv("/Users/michael/Documents/GitHub/Home_Credit_Default_Risk_Project/train.csv")
test_data = pd.read_csv("/Users/michael/Documents/GitHub/Home_Credit_Default_Risk_Project/test.csv")

In [3]:
test_data = test_data.drop(columns='Unnamed: 0')
train_data = train_data.drop(columns='Unnamed: 0')
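
The `Unnamed: 0` column dropped above is what pandas typically produces when a DataFrame is saved together with its index and then read back without `index_col`. A minimal sketch, assuming that is where the column came from here; the two-row CSV text is made up for illustration and is not taken from the project files:

```python
import io

import pandas as pd

# Hypothetical CSV text, shaped like a file written by DataFrame.to_csv() with the index included.
csv_text = ",SK_ID_CURR,TARGET\n0,100002,1\n1,100003,0\n"

# Read without index_col: the unlabeled first column shows up as 'Unnamed: 0'.
with_extra = pd.read_csv(io.StringIO(csv_text))

# Read with index_col=0: the first column becomes the index, so there is nothing to drop.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
```

If the files were written this way, passing `index_col=0` to `pd.read_csv` (or writing them with `index=False`) would make the explicit `drop` step unnecessary.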

## 2. Logistic Regression <a class="anchor" id="first-bullet"></a>

In [4]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def LogRegModel(train, test):
    # Extract the ids (kept for reference; not used below)
    train_ids = train['SK_ID_CURR']
    test_ids = test['SK_ID_CURR']

    # Extract the labels for training and evaluation
    labels = train['TARGET']
    test_labels = test['TARGET']
    # Remove the target; note that SK_ID_CURR stays in the feature matrix
    train = train.drop(columns=['TARGET'])
    test = test.drop(columns=['TARGET'])
    # Make the model with the specified regularization parameter
    log_reg = LogisticRegression(C=0.0001)
    # Train on the training data
    log_reg.fit(train, labels)
    # Keep only the second column: the predicted probability of TARGET = 1
    log_reg_pred = log_reg.predict_proba(test)[:, 1]
    print('Train/Test split results:')
    print("ROC", roc_auc_score(test_labels, log_reg_pred))

In [5]:
LogRegModel(train_data,test_data)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Train/Test split results:
ROC 0.6308539127767936
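
The warning above suggests either raising `max_iter` or scaling the data. A minimal sketch of both, assuming a `StandardScaler` placed in front of the same `LogisticRegression(C=0.0001)`; the data is synthetic (`make_classification`), so the printed score says nothing about the Home Credit data, and `max_iter=1000` is an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; not the notebook's train/test files.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
# Spread the features over very different scales, as raw amounts and flags would be.
X = X * np.logspace(0, 6, num=X.shape[1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize each feature, then fit the same regularized model with a higher iteration cap.
scaled_log_reg = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=0.0001, max_iter=1000),
)
scaled_log_reg.fit(X_train, y_train)

# Second column of predict_proba is the probability of the positive class.
scaled_pred = scaled_log_reg.predict_proba(X_test)[:, 1]
scaled_auc = roc_auc_score(y_test, scaled_pred)
print("ROC", scaled_auc)
```

Putting the scaler inside the pipeline means it is fitted on the training rows only and then applied unchanged to the test rows.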


## 3. Random Forest <a class="anchor" id="second-bullet"></a>

In [6]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def RanForModel(train, test):
    # 'sqrt' is what 'auto' meant for classifiers; 'auto' was removed in scikit-learn 1.3
    rf = RandomForestClassifier(n_estimators=500,
                                max_depth=10, min_samples_split=20,
                                min_samples_leaf=6,
                                max_features='sqrt')
    # Extract the ids (kept for reference; not used below)
    train_ids = train['SK_ID_CURR']
    test_ids = test['SK_ID_CURR']

    # Extract the labels for training and evaluation
    labels = train['TARGET']
    test_labels = test['TARGET']
    # Remove the target; note that SK_ID_CURR stays in the feature matrix
    train = train.drop(columns=['TARGET'])
    test = test.drop(columns=['TARGET'])

    rf.fit(X=train, y=labels)
    # Keep only the second column: the predicted probability of TARGET = 1
    ran_for_pred = rf.predict_proba(test)[:, 1]

    print('Train/Test split results:')
    print("ROC", roc_auc_score(test_labels, ran_for_pred))




In [7]:
RanForModel(train_data,test_data)

Train/Test split results:
ROC 0.7329243023766238


In [8]:
train_data


| | SK_ID_CURR | TARGET | NAME_CONTRACT_TYPE | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | AMT_CREDIT | AMT_ANNUITY | AMT_GOODS_PRICE | ... | HOUSETYPE_MODE_terraced house | WALLSMATERIAL_MODE_Block | WALLSMATERIAL_MODE_Mixed | WALLSMATERIAL_MODE_Monolithic | WALLSMATERIAL_MODE_Others | WALLSMATERIAL_MODE_Panel | WALLSMATERIAL_MODE_Stone, brick | WALLSMATERIAL_MODE_Wooden | EMERGENCYSTATE_MODE_No | EMERGENCYSTATE_MODE_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 249678 | 0 | 0 | 0 | 1 | 0 | 279000.0 | 1223010.0 | 51948.0 | 1125000.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 424164 | 0 | 0 | 1 | 0 | 0 | 135000.0 | 862560.0 | 30559.5 | 720000.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 243765 | 0 | 0 | 1 | 0 | 0 | 315000.0 | 571500.0 | 18567.0 | 571500.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 3 | 314764 | 0 | 0 | 1 | 0 | 0 | 90000.0 | 675000.0 | 22437.0 | 675000.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 203759 | 0 | 0 | 1 | 1 | 0 | 270000.0 | 301896.0 | 23490.0 | 252000.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 246003 | 103605 | 0 | 1 | 1 | 1 | 1 | 112500.0 | 180000.0 | 9000.0 | 180000.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 246004 | 135717 | 0 | 0 | 0 | 0 | 0 | 99000.0 | 215640.0 | 18576.0 | 180000.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 246005 | 422234 | 1 | 0 | 0 | 1 | 0 | 202500.0 | 450000.0 | 35554.5 | 450000.0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 246006 | 151121 | 1 | 0 | 0 | 0 | 0 | 157500.0 | 1575000.0 | 41679.0 | 1575000.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |


## 4. Naive Bayes <a class="anchor" id="third-bullet"></a>

In [9]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import ComplementNB
from sklearn.metrics import roc_auc_score

def NaiveBayModel(train, test):
    clf = BernoulliNB()

    # Extract the ids (kept for reference; not used below)
    train_ids = train['SK_ID_CURR']
    test_ids = test['SK_ID_CURR']

    # Extract the labels for training and evaluation
    labels = train['TARGET']
    test_labels = test['TARGET']
    # Remove the target; note that SK_ID_CURR stays in the feature matrix
    train = train.drop(columns=['TARGET'])
    test = test.drop(columns=['TARGET'])

    clf.fit(X=train, y=labels)
    # Keep only the second column: the predicted probability of TARGET = 1
    clf_pred = clf.predict_proba(test)[:, 1]

    print('Train/Test split results:')
    print("ROC", roc_auc_score(test_labels, clf_pred))



In [10]:
NaiveBayModel(train_data,test_data)

Train/Test split results:
ROC 0.6301445318536083
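
The three model functions above repeat the same steps: split off `TARGET`, fit, take the second `predict_proba` column, and score with ROC AUC. They also leave `SK_ID_CURR` in the feature matrix. A minimal sketch of a shared helper that makes both points explicit; `evaluate_model`, its `drop_cols` parameter, `make_frame`, and the synthetic columns `FLAG_A` and `FLAG_B` are hypothetical names introduced here, not part of the notebook:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import BernoulliNB


def evaluate_model(model, train, test, target="TARGET", drop_cols=()):
    """Fit `model` on `train` and return its ROC AUC on `test` (hypothetical helper)."""
    cols_to_drop = [target, *drop_cols]
    X_train = train.drop(columns=cols_to_drop)
    X_test = test.drop(columns=cols_to_drop)
    model.fit(X_train, train[target])
    # Second column of predict_proba is the probability of the positive class.
    pred = model.predict_proba(X_test)[:, 1]
    return roc_auc_score(test[target], pred)


# Synthetic stand-in frames; not the notebook's train/test data.
rng = np.random.default_rng(0)


def make_frame(n_rows, start_id):
    flag = rng.integers(0, 2, size=n_rows)
    noise = rng.integers(0, 2, size=n_rows)
    # The target agrees with FLAG_A about 80% of the time.
    target = np.where(rng.random(n_rows) < 0.8, flag, 1 - flag)
    return pd.DataFrame({
        "SK_ID_CURR": np.arange(start_id, start_id + n_rows),
        "FLAG_A": flag,
        "FLAG_B": noise,
        "TARGET": target,
    })


demo_train = make_frame(400, 100000)
demo_test = make_frame(200, 200000)

# Pass the identifier in drop_cols so it is not used as a feature.
auc = evaluate_model(BernoulliNB(), demo_train, demo_test, drop_cols=["SK_ID_CURR"])
print("ROC", auc)
```

Any estimator with `fit` and `predict_proba` can be passed in, so the same call would cover the logistic regression and random forest cells. Leaving `drop_cols` empty reproduces the notebook's current behavior of keeping `SK_ID_CURR` as a feature.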


## 5. LightGBM <a class="anchor" id="fourth-bullet"></a>

## 5.1 Plain LightGBM <a class="anchor" id="fifth-bullet"></a>

## 5.2 LightGBM with LR <a class="anchor" id="sixth-bullet"></a>

## 5.3 LightGBM with RF <a class="anchor" id="seventh-bullet"></a>

## 5.4 LightGBM with NB <a class="anchor" id="eigth-bullet"></a>

## 6. XGBM <a class="anchor" id="nineth-bullet"></a>


## 6.1 Plain XGBM <a class="anchor" id="tenth-bullet"></a>

## 6.2 XGBM with LR <a class="anchor" id="eleventh-bullet"></a>

## 6.3 XGBM with RF <a class="anchor" id="12-bullet"></a>

## 6.4 XGBM with NB <a class="anchor" id="13-bullet"></a>