# <a id='toc1_'></a>[Loan Default Prediction (Predict)](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [Loan Default Prediction (Predict)](#toc1_)    
  - [Libraries](#toc1_1_)    
  - [Load configs and transformers](#toc1_2_)    
  - [Read Data](#toc1_3_)    
  - [Transform new set](#toc1_4_)    
  - [Save predictions](#toc1_5_)    


## <a id='toc1_1_'></a>[Libraries](#toc0_)

In [34]:
# Basic python
import numpy as np
from collections import Counter
import pickle
import sys

# Data manipulation
import pandas as pd
import datetime as dt
import polars as pl

# Visualization
import matplotlib.pyplot as plt

# Make the local modules in ../src importable
sys.path.append('../src/')

# Own modules
import helpers as hp

## <a id='toc1_2_'></a>[Load configs and transformers](#toc0_)

In [None]:
# Small helper so that every file handle is closed after loading
def load_pickle(path):
    with open(path, 'rb') as fp:
        return pickle.load(fp)

# Load configs
features = load_pickle('../configs/feature_names.pkl')

# Transformers
MF_imputer = load_pickle('../configs/simple_imputer.pkl')
OH_transformer = load_pickle('../configs/onehot.pkl')
KNN_imputer = load_pickle('../configs/knn_imputer.pkl')
Scaler = load_pickle('../configs/R_scaler.pkl')

# Logistic regression model
clf = load_pickle('../models/finalized_model.pkl')

## <a id='toc1_3_'></a>[Read Data](#toc0_)

In [36]:
test_path = "../data/raw/"
file = "blind_samples.csv"

test = pl.read_csv(test_path + file)

## <a id='toc1_4_'></a>[Transform new set](#toc0_)

In [59]:
# Separate as done in preprocessing: set aside rows missing at least alpha of their columns
alpha = 0.8
null_count = pl.sum_horizontal(pl.all().is_null())
threshold = alpha * len(test.columns)
null_set = test.filter(null_count >= threshold)
null_set.write_csv("../data/test/missing_predictions.csv")  # write_csv returns None, so call it separately
known_set = test.filter(null_count < threshold)

In [None]:
# Drop the target column
target = "default"
X_test = known_set.drop(target).to_pandas()

# All required transformations
dropper = hp.Select_features(features)
dropper.fit(X_test)
X_test = dropper.transform(X_test)
X_test = MF_imputer.transform(X_test)
X_test = OH_transformer.transform(X_test)
X_test.loc[:, :] = KNN_imputer.transform(X_test)
X_test.loc[:, :] = Scaler.transform(X_test)
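
The `hp.Select_features` helper is not shown in this notebook. As a rough illustration only, a column selector with the same `fit`/`transform` interface might look like the sketch below. The class name `SelectFeaturesSketch`, its behavior, and the sample columns are assumptions, not the actual implementation in `helpers`.

```python
import pandas as pd


class SelectFeaturesSketch:
    """Hypothetical stand-in for hp.Select_features: keeps a fixed list of columns."""

    def __init__(self, features):
        self.features = list(features)

    def fit(self, X, y=None):
        # Fail early if the new data lacks a column expected by later steps
        missing = [c for c in self.features if c not in X.columns]
        if missing:
            raise ValueError(f"Missing columns: {missing}")
        return self

    def transform(self, X):
        # Select the columns in the stored order and return a copy
        return X[self.features].copy()


# Made-up frame with one extra column ("id") that should be dropped
raw = pd.DataFrame({"id": [1, 2], "income": [10.0, 20.0], "age": [30, 40]})
selector = SelectFeaturesSketch(["age", "income"]).fit(raw)
selected = selector.transform(raw)
print(list(selected.columns))
```

Selecting by a stored list also fixes the column order, which matters when the fitted imputers and scaler expect columns in the order they were given during fitting.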

## <a id='toc1_5_'></a>[Save predictions](#toc0_)

In [None]:
# Predict and save
predictions = clf.predict(X_test)
known_set.with_columns(pl.Series("default", predictions)).write_csv("../data/test/predicted_loan.csv")
