# Part 4: Model Training Pipeline
In the previous notebook, we settled on a model algorithm after validating it, and we are now ready to formalize the training pipeline from start to finish. The training pipeline takes the raw dataset as input and performs both feature engineering and model training in a single pipeline. Specifically, we will do the following:

- Importing the raw dataset from the "/data/raw" directory
- Splitting the data into training and validation datasets
- Using our feature engineering and model algorithm code to build an end-to-end training pipeline
- Saving the model as a serialized pickle file
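
The steps above can be sketched end to end on a small synthetic dataset. This is only a minimal illustration, not the pipeline built later in this notebook: the columns, the random values, and the preprocessing choices are assumptions made for the example, and scikit-learn's own `OneHotEncoder` stands in for the `category_encoders` one so that the snippet is self-contained.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the raw dataset (hypothetical columns and random values)
rng = np.random.default_rng(42)
n_rows = 200
df_demo = pd.DataFrame({
    "Age": rng.uniform(1, 80, n_rows),
    "Fare": rng.uniform(5, 500, n_rows),
    "Sex": rng.choice(["male", "female"], n_rows),
    "Survived": rng.integers(0, 2, n_rows),
})

# Separating the target from the predictors
X_demo = df_demo.drop(columns=["Survived"])
y_demo = df_demo["Survived"]

# Splitting the data into training and validation datasets
X_train, X_val, y_train, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Feature engineering: scale numeric columns, one-hot encode categorical ones
preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["Age", "Fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Sex"]),
])

# Chaining feature engineering and the model into one pipeline
demo_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", RandomForestClassifier(n_estimators=75, random_state=42)),
])
demo_pipeline.fit(X_train, y_train)

# Serializing the fitted pipeline; in-memory bytes are used here,
# whereas pickle.dump would write the same content to a file
serialized = pickle.dumps(demo_pipeline)
restored = pickle.loads(serialized)
val_predictions = restored.predict(X_val)
```

Because the preprocessing lives inside the pipeline, the restored object can be given raw, untransformed rows and will apply the same feature engineering before predicting.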

The algorithm we will use is the Random Forest Classifier, with the following tuned hyperparameters and scores:
1. Best hyperparameters: `{'max_depth': 15, 'min_samples_leaf': 1, 'min_samples_split': 10, 'n_estimators': 75}`
2. Average accuracy score: 82%
3. Average ROC AUC score: 81%
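
As a brief sketch, the hyperparameters listed above can be passed straight to scikit-learn's `RandomForestClassifier` by unpacking the dictionary. The variable names and the `random_state` value are assumptions added for reproducibility of the example; they are not taken from the earlier notebook.

```python
from sklearn.ensemble import RandomForestClassifier

# Best hyperparameters as reported above
best_params = {
    'max_depth': 15,
    'min_samples_leaf': 1,
    'min_samples_split': 10,
    'n_estimators': 75,
}

# random_state is an assumption made here so repeated runs give the same forest
rfc_sketch = RandomForestClassifier(**best_params, random_state=42)
```

Keeping the hyperparameters in a single dictionary makes it easy to reuse the same settings when the classifier is placed inside a pipeline.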

In [None]:
# Importing the necessary Python libraries
import warnings
import numpy as np
import pandas as pd
from datetime import datetime
from category_encoders.one_hot import OneHotEncoder
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score, mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Hiding any warnings
warnings.filterwarnings('ignore')

# Adjusting Pandas output
pd.set_option("display.max_columns", None)

In [None]:
# Loading in the training data
df_raw = pd.read_csv('../data/raw/titanic-train-raw.csv')

# Separating the target variable ('Survived') from the predictor features
X = df_raw.drop(columns=['Survived'])
y = df_raw[['Survived']]