### Importing Necessary Modules
This part of the model building process needs three modules: "joblib", "sklearn" and "pandas". The "joblib" module is used to save the trained machine learning model.

Predicting whether a passenger survived is a binary classification problem with only 2 outcomes, "yes" or "no".

The Logistic Regression model performed well as a benchmark.

Now the Random Forest Classifier from sklearn is used to build a Random Forest model, which is then compared with the Logistic Regression model.

In [1]:
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

### Read in the Data
The model is trained on the training set's features and labels.

Both are loaded with the pandas "read_csv" function.

In [2]:
address1 = r"...\...\...\titanic_EDA\Split_data\train_features.csv"

address2 = r"...\...\...\titanic_EDA\Split_data\train_labels.csv"

train_features = pd.read_csv(address1)
train_labels = pd.read_csv(address2)
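
Because "read_csv" always returns a DataFrame, a labels file with a single column loads as a two-dimensional table, which is why the labels are flattened with `.values.ravel()` before fitting. The sketch below illustrates this on a tiny in-memory CSV; the column name "Survived" and the four values are assumptions made up for the example, not the contents of the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for train_labels.csv; column name and values are assumptions.
csv_text = "Survived\n0\n1\n1\n0\n"
labels = pd.read_csv(io.StringIO(csv_text))

print(labels.shape)            # two-dimensional: rows x one column

# .values gives the underlying NumPy array, ravel() flattens it to one dimension.
flat = labels.values.ravel()
print(flat.shape)
```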

### Hyperparameters & Cross Validation
Hyperparameters are settings whose values control the learning process. They are chosen before training rather than learned from the data.

For this Random Forest model, two hyperparameters are tuned: "n_estimators" and "max_depth".

A Random Forest Classifier is essentially a collection of Decision Trees that are built independently of each other.

The "n_estimators" parameter sets the number of Trees, and the "max_depth" parameter limits how deep each Tree can grow from its Root Node. A "max_depth" of None places no limit on the depth.

3 different estimator values and 6 different depths are tried, which gives 18 combinations.

GridSearchCV evaluates each combination with 5-fold cross validation, so 90 Random Forest models are fitted in total, and the combination with the best mean score is chosen.

In [3]:
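random_forest = RandomForestClassifier()

# Candidate values for the grid search: 3 x 6 = 18 combinations.
parameters = {
    'n_estimators': [5, 50, 250],
    'max_depth': [2, 4, 8, 16, 32, None]  # None lets each tree grow without a depth limit
}

# Evaluate every combination with 5-fold cross validation.
cross_validation = GridSearchCV(random_forest, parameters, cv=5)

# ravel() flattens the single-column labels DataFrame into the 1-D array sklearn expects.
cross_validation.fit(train_features, train_labels.values.ravel())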
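
The number of models fitted during the search can be checked by hand. The short sketch below rebuilds the same grid with the standard library and counts the combinations and the cross-validation fits; the variable names `n_folds`, `n_candidates` and `n_fits` are introduced here only for illustration.

```python
from itertools import product

# Same grid as in the cell above.
parameters = {
    'n_estimators': [5, 50, 250],
    'max_depth': [2, 4, 8, 16, 32, None],
}
n_folds = 5  # matches cv = 5 in GridSearchCV

# Every pairing of an n_estimators value with a max_depth value is one candidate.
combinations = list(product(*parameters.values()))
n_candidates = len(combinations)

# Each candidate is trained once per fold during cross validation.
n_fits = n_candidates * n_folds

print(n_candidates, n_fits)
```

This counts only the cross-validation fits. With its default settings, GridSearchCV also refits the winning combination once more on the whole training set.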

### Print Function
The function below is an extra step that prints the results of the cross validation in a readable format.

It prints the best hyperparameter combination first, followed by the mean score of every combination tried. The value after "+/-" is twice the standard deviation of the score across the folds.

In [4]:
def print_results(results):
    print('BEST PARAMS: {}\n'.format(results.best_params_))

    mean_score = results.cv_results_['mean_test_score']
    std_score = results.cv_results_['std_test_score']
    
    for mean, std, params in zip(mean_score, std_score, results.cv_results_['params']):
        print('{} (+/-{}) for {}'.format(round(mean, 3), round(std * 2, 3), params))

In [5]:
print_results(cross_validation)

BEST PARAMS: {'max_depth': 4, 'n_estimators': 50}

0.794 (+/-0.086) for {'max_depth': 2, 'n_estimators': 5}
0.803 (+/-0.077) for {'max_depth': 2, 'n_estimators': 50}
0.792 (+/-0.091) for {'max_depth': 2, 'n_estimators': 250}
0.813 (+/-0.106) for {'max_depth': 4, 'n_estimators': 5}
0.826 (+/-0.103) for {'max_depth': 4, 'n_estimators': 50}
0.811 (+/-0.09) for {'max_depth': 4, 'n_estimators': 250}
0.809 (+/-0.045) for {'max_depth': 8, 'n_estimators': 5}
0.818 (+/-0.067) for {'max_depth': 8, 'n_estimators': 50}
0.82 (+/-0.061) for {'max_depth': 8, 'n_estimators': 250}
0.803 (+/-0.067) for {'max_depth': 16, 'n_estimators': 5}
0.815 (+/-0.059) for {'max_depth': 16, 'n_estimators': 50}
0.807 (+/-0.076) for {'max_depth': 16, 'n_estimators': 250}
0.79 (+/-0.061) for {'max_depth': 32, 'n_estimators': 5}
0.815 (+/-0.074) for {'max_depth': 32, 'n_estimators': 50}
0.811 (+/-0.072) for {'max_depth': 32, 'n_estimators': 250}
0.822 (+/-0.113) for {'max_depth': None, 'n_estimators': 5}
0.811 (+/-0.074)
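
Printed in grid order, the best score is easy to overlook. One possible way to rank the results is to put them in a pandas DataFrame and sort by the mean score. The sketch below does this with three rows copied from the output above; `sample_results` and `ranked` are names made up for the example, and in the notebook the full table would come from `cross_validation.cv_results_` instead.

```python
import pandas as pd

# Three rows copied from the printed output above, for illustration only.
sample_results = {
    'max_depth': [2, 4, 8],
    'n_estimators': [5, 50, 250],
    'mean_test_score': [0.794, 0.826, 0.82],
}

# Sort so that the best-scoring combination comes first.
ranked = (
    pd.DataFrame(sample_results)
    .sort_values('mean_test_score', ascending=False)
    .reset_index(drop=True)
)
print(ranked)
```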

### Best Model
The model that achieved the best cross validation score, together with its hyperparameter values, is shown below.

In [6]:
cross_validation.best_estimator_
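
Alongside `best_estimator_`, a fitted GridSearchCV object also exposes `best_params_` and `best_score_`. The sketch below shows all three on a small synthetic dataset; the data from `make_classification`, the reduced grid and the `random_state` values are assumptions chosen so the example runs quickly, and are not the settings used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the Titanic training data (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Deliberately tiny grid so the sketch runs quickly.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {'n_estimators': [5, 20], 'max_depth': [2, None]},
    cv=5,
)
search.fit(X, y)

# With the default refit=True, the best model is retrained on all of X and y.
best_model = search.best_estimator_
print(search.best_params_)           # the winning hyperparameter combination
print(round(search.best_score_, 3))  # its mean cross-validated accuracy
print(best_model.predict(X[:5]))     # the refitted model can predict directly
```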

### Saving the Model

The best model is now saved to disk with the joblib module so that it can be reloaded later without retraining.

In [None]:
joblib.dump(cross_validation.best_estimator_, r'...\...\...\titanic_EDA\Models\RF_model.pkl')
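
A saved model can be read back with `joblib.load`. The sketch below shows a full save and reload round trip under stated assumptions: a small model trained on synthetic data stands in for the tuned model, and a temporary directory stands in for the real Models folder.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data and a small model stand in for the tuned Titanic model (assumption).
X, y = make_classification(n_samples=100, n_features=6, random_state=0)
model = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'RF_model.pkl')  # placeholder location
    joblib.dump(model, model_path)    # write the fitted model to disk
    loaded = joblib.load(model_path)  # read it back as a new object

# The reloaded model should reproduce the original predictions.
same = (loaded.predict(X) == model.predict(X)).all()
print(same)
```

Files written by joblib should only be loaded from trusted sources, and ideally with the same sklearn version that was used to save them.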