<a id='table_of_contents'></a>

# Table of Contents
&emsp; [Import Libraries](#import_libraries) <br> <br>
&emsp; [Load Data](#load_data) <br> <br>
[1. Data Understanding & Feature Analysis](#data_understanding_feature_analysis) <br> <br>
&emsp; [1.1 Visualization](#visualization) <br> <br>
&emsp; [1.2 Feature Analysis](#feature_analysis) <br> <br>
[2. Data Wrangling & Preprocessing](#data_wrangling_preprocessing) <br> <br>
&emsp; [2.1 Bryan Ling Zehao](#bryan_ling_zehao) <br> <br>
&emsp; &emsp; [2.11 Remove Unwanted Columns](#remove_unwanted_columns_bryan) <br> <br>
&emsp; &emsp; [2.12 Handle Missing Values](#handle_missing_values_bryan) <br> <br>
&emsp; &emsp; [2.13 Categorical Data Encoding](#categorical_data_encoding_bryan) <br> <br>
&emsp; &emsp; [2.14 Data Imputation](#data_imputation_bryan) <br> <br>
&emsp; &emsp; [2.15 Feature Selection](#feature_selection_bryan) <br> <br>
&emsp; [2.2 Chai Xiang Zhi](#chai_xiang_zhi) <br> <br>
&emsp; &emsp; [2.21 Feature Imputation](#feature_imputation_xz) <br> <br>
&emsp; &emsp; [2.22 Feature Binning & Discretisation](#feature_binning_and_discretisation_xz) <br> <br>
&emsp; &emsp; [2.23 Feature Engineering & Combination](#feature_engineering_and_combination_xz) <br> <br>
&emsp; &emsp; [2.24 Drop Features](#drop_featues_xz) <br> <br>
&emsp; &emsp; [2.25 Datatype Transformation](#datatype_transformation_xz) <br> <br>
&emsp; &emsp; [2.26 Numerical Feature Normalisation](#numerical_feature_normalisation_xz) <br> <br>
&emsp; &emsp; [2.27 Categorical Feature Encoding](#categorical_feature_encoding_xz) <br> <br>
[3. Exploratory Data Analysis (EDA)](#eda) <br> <br>
[4. Model Development](#model_development) <br> <br>
&emsp; [4.1 Baseline Models Overview](#baseline_models_overview) <br> <br>
&emsp; [4.2 Hyperparameter Tuning](#hyperparameter_tuning) <br> <br>
[5. Model Evaluation](#model_evaluation) <br> <br>
&emsp; [5.1 K-fold Cross Validation](#k_fold_cross_validation) <br> <br>
&emsp; [5.2 F1 Score](#f1_score) <br> <br>
&emsp; [5.3 Confusion Matrix](#confusion_matrix) <br> <br>
&emsp; [5.4 ROC](#roc) <br> <br>
[6. Model Deployment](#model_deployment) <br> <br>
&emsp; [6.1 Final Modelling using entire Training Dataset](#f) <br> <br>
&emsp; [6.2 Predict Test Set](#f) <br> <br>

<a id='import_libraries'></a>
## Import Libraries

###### &emsp; &emsp; &emsp; &nbsp; &nbsp;[Table of Contents](#table_of_contents)

##### &emsp; &emsp; Data Understanding and Feature Analysis Libraries

In [None]:
# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd

# Data visualisation
import seaborn as sns
import matplotlib.pyplot as plt

# Features summary
from fast_ml import eda
from fast_ml.utilities import display_all

# Missing data visualisation
import missingno as msno

##### &emsp; &emsp; Data Preprocessing Libraries

In [None]:
# Multivariate feature imputation method
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier

# Categorical feature encoding
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OrdinalEncoder

# Feature selection
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif, RFE

# Feature binning on funder & installer
from nltk.tokenize import word_tokenize
from nltk.stem.snowball import SnowballStemmer

# Numerical feature normalisation
from scipy.stats import zscore

##### &emsp; &emsp; Modelling & Model Evaluation Libraries

In [None]:
# Modelling
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from catboost import CatBoostClassifier
from xgboost import XGBClassifier

# Hyperparameter tuning
from sklearn.model_selection import GridSearchCV

# Model saving
import pickle

# Model evaluation
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    accuracy_score,
    auc,
    classification_report,
    confusion_matrix,
    roc_curve,
)

<a id='load_data'></a>
## Load Data
###### &emsp; &emsp; &emsp; &nbsp; &nbsp;[Table of Contents](#table_of_contents)

In [None]:
# Load a separate copy of the training set for each group member
df_xz = pd.read_csv('./dataset/data_mining_water_table.csv')
df_bryan = pd.read_csv('./dataset/data_mining_water_table.csv')

---
<a id='data_understanding_feature_analysis'></a>
# 1. Data Understanding & Feature Analysis
###### &emsp; &emsp; &emsp; &nbsp; &nbsp;[Table of Contents](#table_of_contents)

---
<a id='data_wrangling_preprocessing'></a>
# 2. Data Wrangling & Preprocessing
###### &emsp; &emsp; &emsp; &nbsp; &nbsp;[Table of Contents](#table_of_contents)

<a id='bryan_ling_zehao'></a>
## &emsp; 2.1 Bryan Ling Zehao
###### &emsp; &emsp; &emsp; &nbsp; &nbsp;[Table of Contents](#table_of_contents)

<a id='remove_unwanted_columns_bryan'></a>
### &emsp; &emsp; 2.11 Remove Unwanted Columns

<a id='handle_missing_values_bryan'></a>
### &emsp; &emsp; 2.12 Handle Missing Values

<a id='categorical_data_encoding_bryan'></a>
### &emsp; &emsp; 2.13 Categorical Data Encoding

<a id='data_imputation_bryan'></a>
### &emsp; &emsp; 2.14 Data Imputation

<a id='feature_selection_bryan'></a>
### &emsp; &emsp; 2.15 Feature Selection
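
The notebook imports `chi2` and `mutual_info_classif` for feature selection, but how they are applied is not shown here. A minimal sketch of scoring features with both, assuming features are already encoded as non-negative integers (the toy data and variable names below are hypothetical):

```python
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif

# Hypothetical toy data: three non-negative, integer-encoded features
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = np.column_stack([
    y,                             # feature 0 copies the target (fully informative)
    rng.integers(0, 3, size=200),  # feature 1 is noise
    rng.integers(0, 5, size=200),  # feature 2 is noise
])

# chi2 requires non-negative feature values; it returns one score and p-value per feature
chi2_scores, p_values = chi2(X, y)

# Mutual information, treating every feature as discrete
mi_scores = mutual_info_classif(X, y, discrete_features=True, random_state=0)

# Feature indices sorted from highest to lowest chi-squared score
ranking = np.argsort(chi2_scores)[::-1]
print(ranking, mi_scores.round(3))
```

Features with low scores under both tests are candidates for removal, though the cut-off is a judgment call.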

<a id='chai_xiang_zhi'></a>
## &emsp; 2.2 Chai Xiang Zhi
###### &emsp; &emsp; &emsp; &nbsp; &nbsp;[Table of Contents](#table_of_contents)

<a id='feature_imputation_xz'></a>
### &emsp; &emsp; 2.21 Feature Imputation
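
The preprocessing imports include `IterativeImputer`, which models each feature with missing values as a function of the other features. A minimal sketch on a hypothetical toy array, using the default estimator (the estimator and settings actually used in this notebook are not shown, so treat these as assumptions):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables IterativeImputer
from sklearn.impute import IterativeImputer

# Hypothetical toy data: the second column is roughly twice the first
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, np.nan],
    [np.nan, 10.0],
])

# Each missing entry is predicted from the other column, refined over several rounds
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X)

print(X_imputed)
```

Observed values are left unchanged; only the `NaN` entries are filled in.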

<a id='feature_binning_and_discretisation_xz'></a>
### &emsp; &emsp; 2.22 Feature Binning & Discretisation

<a id='feature_engineering_and_combination_xz'></a>
### &emsp; &emsp; 2.23 Feature Engineering & Combination

<a id='drop_featues_xz'></a>
### &emsp; &emsp; 2.24 Drop Features

<a id='datatype_transformation_xz'></a>
### &emsp; &emsp; 2.25 Datatype Transformation

<a id='numerical_feature_normalisation_xz'></a>
### &emsp; &emsp; 2.26 Numerical Feature Normalisation
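
The notebook imports `zscore` from `scipy.stats` for this step. A minimal sketch of z-score normalisation on a hypothetical DataFrame (the column names and values are illustrative, not taken from the dataset):

```python
import pandas as pd
from scipy.stats import zscore

# Hypothetical numerical features on very different scales
df = pd.DataFrame({
    "feature_a": [0.0, 50.0, 100.0, 250.0],
    "feature_b": [10.0, 1200.0, 800.0, 1500.0],
})

# Standardise each column: subtract its mean, divide by its (population) standard deviation
df_scaled = pd.DataFrame(
    zscore(df.to_numpy(), axis=0),
    columns=df.columns,
    index=df.index,
)

print(df_scaled.round(3))
```

After scaling, every column has mean 0 and standard deviation 1, so no feature dominates distance-based models such as k-nearest neighbours purely because of its units.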

<a id='categorical_feature_encoding_xz'></a>
### &emsp; &emsp; 2.27 Categorical Feature Encoding
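
`OrdinalEncoder` is among the imported encoders. A minimal sketch of encoding one categorical column with an explicit category order (the column name, categories and their order are hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical categorical feature with a natural order
df = pd.DataFrame({"quantity": ["dry", "enough", "insufficient", "enough"]})

# Passing the order explicitly avoids the default alphabetical ordering
order = [["dry", "insufficient", "enough"]]
encoder = OrdinalEncoder(categories=order)

# fit_transform expects a 2-D input, hence the double brackets
df["quantity_encoded"] = encoder.fit_transform(df[["quantity"]]).ravel()

print(df)
```

With this order, `dry` maps to 0, `insufficient` to 1 and `enough` to 2.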

---
<a id='eda'></a>
# 3. Exploratory Data Analysis (EDA)
###### &emsp; &emsp; &emsp; &nbsp; &nbsp;[Table of Contents](#table_of_contents)

---
<a id='model_development'></a>
# 4. Model Development
###### &emsp; &emsp; &emsp; &nbsp; &nbsp;[Table of Contents](#table_of_contents)

<a id='baseline_models_overview'></a>
## &emsp; 4.1 Baseline Models Overview
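
A minimal sketch of a baseline comparison, assuming a simple hold-out split and default hyperparameters. It uses synthetic data and only the scikit-learn models from the import list; the CatBoost and XGBoost classifiers are left out to keep the example dependency-free:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed training set
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

# Fit each model with default settings and record its validation accuracy
baseline_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    baseline_accuracy[name] = accuracy_score(y_val, model.predict(X_val))

for name, acc in sorted(baseline_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```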

<a id='hyperparameter_tuning'></a>
## &emsp; 4.2 Hyperparameter Tuning
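
The notebook imports `GridSearchCV` for tuning. A minimal sketch of an exhaustive grid search over a decision tree, using synthetic data (the parameter grid and scoring metric are assumptions, not the values used in this project):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class problem standing in for the real training set
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)

# Hypothetical grid: every combination is evaluated with 5-fold cross-validation
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="f1_macro",
)
search.fit(X, y)

# By default the best combination is refitted on all of X, y
best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```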

---
<a id='model_evaluation'></a>
# 5. Model Evaluation
###### &emsp; &emsp; &emsp; &nbsp; &nbsp;[Table of Contents](#table_of_contents)

<a id='k_fold_cross_validation'></a>
## &emsp; 5.1 K-fold Cross Validation
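
A minimal sketch of stratified k-fold cross-validation with `cross_val_score`, on synthetic data. The choice of five folds, the model and the accuracy metric are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the preprocessed training set
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Stratification keeps the class proportions similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)

# One score per fold; the spread indicates how stable the estimate is
print(scores.round(3), scores.mean().round(3), scores.std().round(3))
```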

<a id='f1_score'></a>
## &emsp; 5.2 F1 Score
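
The F1 score is the harmonic mean of precision and recall, equivalently $2TP / (2TP + FP + FN)$. A minimal sketch on hypothetical binary labels; for a multi-class target, an `average` argument such as `"macro"` would be needed:

```python
from sklearn.metrics import f1_score

# Hypothetical labels: 2 true positives, 0 false positives, 1 false negative
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# precision = 2/2, recall = 2/3, so F1 = 2*2 / (2*2 + 0 + 1) = 0.8
f1 = f1_score(y_true, y_pred)
print(round(f1, 2))  # 0.8
```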

<a id='confusion_matrix'></a>
## &emsp; 5.3 Confusion Matrix
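
A minimal sketch of computing a confusion matrix with `confusion_matrix`, using hypothetical three-class labels. Rows correspond to true classes and columns to predicted classes:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical class labels, two samples per true class
labels = ["a", "b", "c"]
y_true = ["a", "b", "c", "a", "b", "c"]
y_pred = ["a", "b", "b", "a", "c", "c"]

# Fixing `labels` pins the row/column order of the matrix
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 1 1]]
```

The imported `ConfusionMatrixDisplay` can render the same matrix as a plot.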

<a id='roc'></a>
## &emsp; 5.4 ROC
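
A minimal sketch of computing an ROC curve and its area with the imported `roc_curve` and `auc`, on hypothetical binary labels and scores. For a multi-class target, a one-vs-rest curve per class would be one option:

```python
from sklearn.metrics import auc, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# False and true positive rates at each decision threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

# 3 of the 4 positive/negative pairs are ranked correctly, so AUC = 0.75
print(roc_auc)  # 0.75
```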

---
<a id='model_deployment'></a>
# 6. Model Deployment
###### &emsp; &emsp; &emsp; &nbsp; &nbsp;[Table of Contents](#table_of_contents)