# Decision Tree Model for Customer Satisfaction Prediction

This notebook implements a decision tree model to predict customer satisfaction using the Santander Customer Satisfaction dataset. The following steps were taken:

1. **Data Loading**:
    - The training and test datasets are loaded using pandas.

2. **Exploratory Data Analysis**:
    - The first few rows of the dataset are displayed to understand its structure.
    - Null values are counted to assess data quality.
    - Summary statistics and information about the dataset are shown.
    - Value counts are printed for any object-typed (categorical) columns. All 371 columns are numeric (`float64` or `int64`), so this step produces no output.
    - A correlation matrix is computed to identify relationships between features and the target variable.

3. **Feature Selection**:
    - The three features with the strongest positive correlation with the target are identified as candidate predictors.
    - Features with low correlation (absolute value < 0.1) are listed as potentially irrelevant variables.

4. **Model Training**:
    - The data is split into training and validation sets (70% training, 30% validation).
    - A baseline decision tree is fit and its feature importances are inspected.
    - Several decision tree models are defined with varying hyperparameters for comparison.

5. **Model Evaluation**:
    - Each model is trained and evaluated using accuracy, F1 score, ROC AUC score, and confusion matrix.
    - Results are stored and displayed to identify the best-performing model.

6. **Hyperparameter Tuning**:
    - A 5-fold cross-validated grid search over `max_depth`, `min_samples_split`, and `criterion`, scored by ROC AUC, is run on the best model.

7. **Model Training with Best Hyperparameters**:
    - **Updating Parameters**:
        - Model 1 is redefined with the suggested `criterion='entropy'` and `max_depth=5`, plus `class_weight='balanced'` and `min_samples_leaf=2`, which were not part of the grid. Models 2 to 4 are unchanged.
    - **Re-evaluation**:
        - Each model is retrained and evaluated using accuracy, F1 score, ROC AUC score, and confusion matrix, and the grid search is repeated on the new best model.
        - Results are stored to compare with the initial models.
    - **Submission CSV**:
        - `submission.csv` is regenerated with the re-tuned model's predicted probabilities. The file contains only `ID` and `TARGET`; the performance metrics are printed, not saved.

8. **Exploring Different Tuning Options**:
    - Models are redefined with a new set of hyperparameters (for example, `class_weight='balanced'` for Models 1 and 4), and the grid search is repeated on the new best model.
    - **Submission CSV for the Second Comparison**:
        - `submission.csv` is regenerated from the model tuned in this round. Because every call writes to the same file name, only the most recent predictions are kept.

9. **Final Evaluation**:
    - After each grid search, the best hyperparameter-tuned model is evaluated on the validation set and its performance metrics are printed.
    - A classification report is generated for detailed performance analysis.

10. **Submission Generation**:
    - A function, `generate_submission`, writes a submission file containing the predicted probability of the positive class (`TARGET = 1`) for each test `ID`.

11. **Comparison of Results**:
    - Results from the initial models and both tuning rounds are compared to identify the best model and approach.


## Importing Libraries

In [1]:
import numpy as np # Import NumPy for numerical operations
import pandas as pd # Import pandas for data manipulation

## Load Datasets

In [2]:
# Load the training and test datasets
train = pd.read_csv('C:\\Users\\ayush\\Downloads\\Santander Customer Satisfaction - TRAIN.csv')
test = pd.read_csv('C:\\Users\\ayush\\Downloads\\Santander Customer Satisfaction - TEST-Without TARGET.csv')


In [3]:
# Display the first few rows of the training dataset
print(train.head())

   ID  var3  var15  imp_ent_var16_ult1  imp_op_var39_comer_ult1  \
0   1     2     23                 0.0                      0.0   
1   3     2     34                 0.0                      0.0   
2   4     2     23                 0.0                      0.0   
3   8     2     37                 0.0                    195.0   
4  10     2     39                 0.0                      0.0   

   imp_op_var39_comer_ult3  imp_op_var40_comer_ult1  imp_op_var40_comer_ult3  \
0                      0.0                      0.0                      0.0   
1                      0.0                      0.0                      0.0   
2                      0.0                      0.0                      0.0   
3                    195.0                      0.0                      0.0   
4                      0.0                      0.0                      0.0   

   imp_op_var40_efect_ult1  imp_op_var40_efect_ult3  ...  \
0                      0.0                      0.0  ...

## Exploratory Data Analysis (EDA)

In [4]:
# Count and display the number of null values in each column
null_counts = train.isnull().sum()
print(null_counts)

ID                         0
var3                       0
var15                      0
imp_ent_var16_ult1         0
imp_op_var39_comer_ult1    0
                          ..
saldo_medio_var44_hace3    0
saldo_medio_var44_ult1     0
saldo_medio_var44_ult3     0
var38                      0
TARGET                     0
Length: 371, dtype: int64


In [5]:
# Display statistical summaries of the training dataset
train.describe()

| | ID | var3 | var15 | imp_ent_var16_ult1 | imp_op_var39_comer_ult1 | imp_op_var39_comer_ult3 | imp_op_var40_comer_ult1 | imp_op_var40_comer_ult3 | imp_op_var40_efect_ult1 | imp_op_var40_efect_ult3 | ... | saldo_medio_var33_hace2 | saldo_medio_var33_hace3 | saldo_medio_var33_ult1 | saldo_medio_var33_ult3 | saldo_medio_var44_hace2 | saldo_medio_var44_hace3 | saldo_medio_var44_ult1 | saldo_medio_var44_ult3 | var38 | TARGET |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | ... | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 | 76020.0 |
| mean | 75964.050723 | -1523.199277 | 33.212865 | 86.208265 | 72.363067 | 119.529632 | 3.55913 | 6.472698 | 0.412946 | 0.567352 | ... | 7.935824 | 1.365146 | 12.21558 | 8.784074 | 31.505324 | 1.858575 | 76.026165 | 56.614351 | 117235.8 | 0.039569 |
| std | 43781.947379 | 39033.462364 | 12.956486 | 1614.757313 | 339.315831 | 546.266294 | 93.155749 | 153.737066 | 30.604864 | 36.513513 | ... | 455.887218 | 113.959637 | 783.207399 | 538.439211 | 2013.125393 | 147.786584 | 4040.337842 | 2852.579397 | 182664.6 | 0.194945 |
| min | 1.0 | -999999.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5163.75 | 0.0 |
| 25% | 38104.75 | 2.0 | 23.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 67870.61 | 0.0 |
| 50% | 76043.0 | 2.0 | 28.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 106409.2 | 0.0 |
| 75% | 113748.75 | 2.0 | 40.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 118756.3 | 0.0 |
| max | 151838.0 | 238.0 | 105.0 | 210000.0 | 12888.03 | 21024.81 | 8237.82 | 11073.57 | 6600.0 | 6600.0 | ... | 50003.88 | 20385.72 | 138831.63 | 91778.73 | 438329.22 | 24650.01 | 681462.9 | 397884.3 | 22034740.0 | 1.0 |
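`isnull()` reports no missing values, but the summary above shows a minimum of -999999 for `var3` while its quartiles are all 2.0. That pattern suggests (an assumption the notebook does not verify) that -999999 is a placeholder for missing data. A minimal sketch for counting and masking such a sentinel, using a small made-up frame instead of the real data:

```python
import numpy as np
import pandas as pd

# Assumed sentinel value, taken from the var3 minimum in describe()
SENTINEL = -999999

# Small illustrative frame; in the notebook this would be `train`
df = pd.DataFrame({"var3": [2, 2, SENTINEL, 8, 2], "var15": [23, 34, 23, 37, 39]})

# Count sentinel occurrences per column
sentinel_counts = (df == SENTINEL).sum()
print(sentinel_counts)

# Replace the sentinel with NaN so that isnull() treats it as missing
df_clean = df.replace(SENTINEL, np.nan)
print(df_clean.isnull().sum())
```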


In [6]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76020 entries, 0 to 76019
Columns: 371 entries, ID to TARGET
dtypes: float64(111), int64(260)
memory usage: 215.2 MB


In [7]:
# Display value counts for categorical (object-typed) columns.
# Every column in this dataset is numeric, so this loop prints nothing.
for col in train.select_dtypes(include=['object']).columns:
    print(f"Value counts for {col}:")
    print(train[col].value_counts())
    print("\n")

In [8]:
# Calculate the correlation matrix for the training dataset
correlation = train.corr()
target_corr = correlation["TARGET"].sort_values(ascending=False)
# Skip index 0 (TARGET itself, correlation 1.0) and take the next three:
# the features with the strongest positive correlation with TARGET
highly_correlated_features = target_corr.index[1:4]
print("Top 3 features most correlated with TARGET:\n", highly_correlated_features.tolist())

Top 3 features most correlated with TARGET:
 ['var36', 'var15', 'ind_var8_0']


In [9]:
target_corr = correlation["TARGET"].sort_values(ascending=True)

# Identify variables with low correlation (absolute value < 0.1)
low_corr_features = target_corr[abs(target_corr) < 0.1].index.tolist()

# Keep the first three. Because target_corr is sorted by signed value, these are the
# three lowest (most negative) correlations below the threshold, not the ones closest to zero
irrelevant_features = low_corr_features[:3]

print("3 irrelevant variables:\n", irrelevant_features)

3 irrelevant variables:
 ['num_var4', 'num_var35', 'ind_var13']
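Both cells above rank by signed correlation, so `index[1:4]` returns only the strongest positive relationships and `low_corr_features[:3]` returns the most negative values under the threshold. If the intent is "strongest in either direction" and "closest to zero", ranking by absolute value is one option. A sketch on made-up correlation values (in the notebook, `target_corr` would be `correlation["TARGET"]`):

```python
import pandas as pd

# Made-up correlations with TARGET, including TARGET itself
target_corr = pd.Series(
    {"TARGET": 1.0, "a": 0.10, "b": -0.18, "c": 0.02, "d": -0.09, "e": 0.0, "f": 0.05}
)

# Drop the target, then work with absolute correlation
abs_corr = target_corr.drop("TARGET").abs()

# Strongest relationships in either direction
top3 = abs_corr.sort_values(ascending=False).head(3).index.tolist()

# Weakest relationships (closest to zero), restricted to |r| < 0.1
weakest3 = abs_corr[abs_corr < 0.1].sort_values().head(3).index.tolist()

print(top3)      # ['b', 'a', 'd']
print(weakest3)  # ['e', 'c', 'f']
```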


## Model Training and Feature Importance

In [10]:
from sklearn.tree import DecisionTreeClassifier # Import DecisionTreeClassifier for modeling
from sklearn.model_selection import train_test_split, GridSearchCV # Import functions for splitting data and hyperparameter tuning

In [51]:
# Define features and target variable for modeling
X = train.drop(columns=["TARGET"]) # Features (input data)
y = train["TARGET"] # Target variable (output)

In [21]:
# Split the data into training and validation sets (70% train, 30% validation)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)
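With only about 4% positives (the `TARGET` mean in `describe()` is 0.0396), an unstratified split can leave the validation set with a slightly different positive rate than the training set. Passing `stratify=y` to `train_test_split` holds the ratio fixed. A sketch on synthetic labels with the same 4% rate:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with a 4% positive rate (made-up data)
rng = np.random.default_rng(0)
y = np.array([1] * 40 + [0] * 960)
X = rng.normal(size=(len(y), 3))

# stratify=y keeps the class ratio the same in both splits
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(f"train positive rate: {y_tr.mean():.3f}, validation positive rate: {y_va.mean():.3f}")
```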


In [22]:
# Initialize the Decision Tree model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train) # Fit the model to the training data

DecisionTreeClassifier(random_state=42)

In [23]:
# Get feature importance from the trained model
feature_importances = pd.DataFrame({
    "feature": X.columns,
    "importance": model.feature_importances_
}).sort_values(by="importance", ascending=False)

In [24]:
# Get the top 3 most important features
most_important_features = feature_importances["feature"].head(3).tolist()
print("Top Important Features:\n", most_important_features)

Top Important Features:
 ['ID', 'var38', 'var15']
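`ID` ranks as the most important feature even though it is a row identifier. Because `X` is built by dropping only `TARGET`, the tree is free to split on `ID`, and any signal it finds there is unlikely to generalize to new customers (an interpretation; the notebook does not test it). A minimal sketch of excluding it, on a toy frame whose values are made up:

```python
import pandas as pd

# Toy frame standing in for `train` (values are made up)
train_demo = pd.DataFrame({
    "ID": [1, 3, 4, 8],
    "var15": [23, 34, 23, 37],
    "var38": [1000.0, 2000.0, 1500.0, 1200.0],
    "TARGET": [0, 0, 0, 1],
})

# Exclude both the label and the row identifier from the features
X_demo = train_demo.drop(columns=["ID", "TARGET"])
y_demo = train_demo["TARGET"]

print(X_demo.columns.tolist())  # ['var15', 'var38']
```

`generate_submission` reads `ID` from `test_df` directly, so dropping it from the feature list would not affect the `ID` column of the output file.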


## Model Evaluation - Comparing Different Models

In [25]:
# Define multiple models with different hyperparameters for comparison
models = {
    "Model 1": DecisionTreeClassifier(max_depth=5, criterion="gini", min_samples_leaf=5),
    "Model 2": DecisionTreeClassifier(max_depth=10, criterion="gini", min_samples_leaf=10),
    "Model 3": DecisionTreeClassifier(max_depth=None, criterion="entropy", min_samples_leaf=20),
    "Model 4": DecisionTreeClassifier(max_depth=10, criterion="entropy", min_samples_leaf=10),
}
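None of these models fixes `random_state`. `DecisionTreeClassifier` randomly permutes the candidate features at each split, so refitting with the same hyperparameters can choose a different split when two candidates score equally. That is one plausible reason Models 2 to 4 report slightly different ROC AUC values in the first and second rounds despite unchanged settings (Model 2: 0.779357, then 0.780172). A sketch on synthetic data showing that a fixed seed makes refits identical:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced data (made up for illustration)
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.96], random_state=0)

# Two trees with the same seed and data produce identical fits
tree_a = DecisionTreeClassifier(max_depth=10, min_samples_leaf=10, random_state=42).fit(X, y)
tree_b = DecisionTreeClassifier(max_depth=10, min_samples_leaf=10, random_state=42).fit(X, y)

same = (tree_a.predict_proba(X) == tree_b.predict_proba(X)).all()
print(same)
```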

In [26]:
# Import metrics for evaluation
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, confusion_matrix, classification_report

In [36]:
# Initialize an empty list to store results
results = []
# Iterate over each model, fit it, and evaluate its performance
for name, model in models.items():
    model.fit(X_train, y_train) # Fit model on training data
    val_predictions = model.predict(X_val) # Make predictions on validation data

    # Calculate evaluation metrics
    accuracy = accuracy_score(y_val, val_predictions)
    f1 = f1_score(y_val, val_predictions)
    roc_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    confusion = confusion_matrix(y_val, val_predictions)

    # Store results
    results.append({
        "Model": name,
        "Accuracy": accuracy,
        "F1 Score": f1,
        "ROC AUC": roc_auc,
        "Confusion Matrix": confusion
    })

# Convert results to a DataFrame for easy viewing
results_df = pd.DataFrame(results)
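This fit-and-score loop is repeated verbatim in two later cells. One way to avoid the duplication is a small helper; the name `evaluate_models` is hypothetical. The sketch also passes `zero_division=0` to `f1_score` so that models with no positive predictions score 0 without a warning, and it runs on synthetic data in place of the Santander split:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(models, X_train, y_train, X_val, y_val):
    """Fit each model and return a DataFrame of validation metrics (hypothetical helper)."""
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        preds = model.predict(X_val)
        proba = model.predict_proba(X_val)[:, 1]
        rows.append({
            "Model": name,
            "Accuracy": accuracy_score(y_val, preds),
            "F1 Score": f1_score(y_val, preds, zero_division=0),
            "ROC AUC": roc_auc_score(y_val, proba),
            "Confusion Matrix": confusion_matrix(y_val, preds),
        })
    return pd.DataFrame(rows)


# Usage on synthetic data (stand-in for X_train, X_val, y_train, y_val)
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
demo_models = {
    "Depth 3": DecisionTreeClassifier(max_depth=3, random_state=42),
    "Depth 6": DecisionTreeClassifier(max_depth=6, random_state=42),
}
results_demo = evaluate_models(demo_models, X_tr, y_tr, X_va, y_va)
print(results_demo[["Model", "Accuracy", "F1 Score", "ROC AUC"]])
```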


In [31]:
print(results_df)

     Model  Accuracy  F1 Score   ROC AUC          Confusion Matrix
0  Model 1  0.959835  0.000000  0.807686    [[21890, 1], [915, 0]]
1  Model 2  0.958037  0.018462  0.779357   [[21840, 51], [906, 9]]
2  Model 3  0.957292  0.058027  0.705577  [[21802, 89], [885, 30]]
3  Model 4  0.958257  0.016529  0.775468   [[21846, 45], [907, 8]]


In [32]:
# Identify the best model based on ROC AUC score
best_model_name = results_df.loc[results_df['ROC AUC'].idxmax()]['Model']
best_model = models[best_model_name]
print(f"\nBest Model: {best_model_name}")


Best Model: Model 1


## Hyperparameter Tuning

In [33]:
# Hyperparameter tuning using Grid Search for the best model
param_grid = {
    'max_depth': [None, 5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'criterion': ['gini', 'entropy']
}

grid_search = GridSearchCV(best_model, param_grid, scoring='roc_auc', cv=5) # Define grid search
grid_search.fit(X_train, y_train)


GridSearchCV(cv=5,
             estimator=DecisionTreeClassifier(max_depth=5, min_samples_leaf=5),
             param_grid={'criterion': ['gini', 'entropy'],
                         'max_depth': [None, 5, 10, 15],
                         'min_samples_split': [2, 5, 10]},
             scoring='roc_auc')
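For a classifier, an integer `cv` in `GridSearchCV` uses stratified K-fold splits without shuffling, so each fold keeps the class ratio. The grid above varies only `max_depth`, `min_samples_split`, and `criterion`; `min_samples_leaf` stays at the base model's value and `class_weight` is never searched. A sketch that makes the folds explicit and adds `class_weight` to the grid, on synthetic data (the grid values are illustrative assumptions, not the notebook's):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for X_train / y_train
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.95], random_state=0)

param_grid = {
    "max_depth": [3, 5, 10],
    "min_samples_leaf": [5, 20],
    "criterion": ["gini", "entropy"],
    "class_weight": [None, "balanced"],  # lets the search decide whether to reweight
}

# Shuffled, stratified folds with a fixed seed for reproducibility
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, scoring="roc_auc", cv=cv
)
grid_search.fit(X, y)
print(grid_search.best_params_, round(grid_search.best_score_, 4))
```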

## Evaluation of Best Hyperparameter-Tuned Model

In [34]:
# Best model from Grid Search
best_hyper_model = grid_search.best_estimator_
best_params = grid_search.best_params_
print(f"Best Hyperparameters for {best_model_name}: {best_params}")

# Evaluate the best model
best_val_predictions = best_hyper_model.predict(X_val)

# Calculate metrics for the best hyperparameter-tuned model
best_accuracy = accuracy_score(y_val, best_val_predictions)
best_f1 = f1_score(y_val, best_val_predictions)
best_roc_auc = roc_auc_score(y_val, best_hyper_model.predict_proba(X_val)[:, 1])
best_confusion = confusion_matrix(y_val, best_val_predictions)

# Print evaluation metrics for the best hyperparameter-tuned model
print(f"\nBest Hyperparameter-Tuned Model Accuracy: {best_accuracy:.4f}")
print(f"Best Hyperparameter-Tuned Model F1 Score: {best_f1:.4f}")
print(f"Best Hyperparameter-Tuned Model ROC AUC Score: {best_roc_auc:.4f}")
print("Best Hyperparameter-Tuned Model Confusion Matrix:")
print(best_confusion)
print("Best Hyperparameter-Tuned Model Classification Report:")
print(classification_report(y_val, best_val_predictions))

Best Hyperparameters for Model 1: {'criterion': 'entropy', 'max_depth': 5, 'min_samples_split': 10}

Best Hyperparameter-Tuned Model Accuracy: 0.9597
Best Hyperparameter-Tuned Model F1 Score: 0.0000
Best Hyperparameter-Tuned Model ROC AUC Score: 0.8002
Best Hyperparameter-Tuned Model Confusion Matrix:
[[21887     4]
 [  915     0]]
Best Hyperparameter-Tuned Model Classification Report:
              precision    recall  f1-score   support

           0       0.96      1.00      0.98     21891
           1       0.00      0.00      0.00       915

    accuracy                           0.96     22806
   macro avg       0.48      0.50      0.49     22806
weighted avg       0.92      0.96      0.94     22806
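The tuned tree scores a ROC AUC of 0.8002 yet has an F1 of 0.0. `predict` picks the class with the higher predicted probability, which for two classes amounts to a 0.5 cut-off, and only 4 validation rows cross it, all of them negatives. ROC AUC measures ranking across every threshold, so lowering the cut-off is one way to trade accuracy for recall. A sketch that picks the F1-maximizing threshold from `precision_recall_curve` on synthetic data; in practice the threshold should be chosen on data that is not also used for the final score:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (about 5% positives)
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.95], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

model = DecisionTreeClassifier(max_depth=5, criterion="entropy", random_state=42).fit(X_tr, y_tr)
proba = model.predict_proba(X_va)[:, 1]

# precision and recall have one more entry than thresholds; drop the final point
precision, recall, thresholds = precision_recall_curve(y_va, proba)
p, r = precision[:-1], recall[:-1]
f1 = np.divide(2 * p * r, p + r, out=np.zeros_like(p), where=(p + r) > 0)

best_idx = int(np.argmax(f1))
best_threshold = thresholds[best_idx]

# Apply the chosen threshold instead of the default 0.5
default_preds = (proba >= 0.5).astype(int)
tuned_preds = (proba >= best_threshold).astype(int)
print(f"threshold={best_threshold:.3f}")
print(f"F1 at 0.5: {f1_score(y_va, default_preds, zero_division=0):.3f}")
print(f"F1 at tuned threshold: {f1_score(y_va, tuned_preds, zero_division=0):.3f}")
```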



## Submission File Generation

In [35]:
# Function to generate submission file for the test dataset
def generate_submission(model, test_df, features):
    predictions = model.predict_proba(test_df[features])[:, 1]
    submission = pd.DataFrame({
        'ID': test_df['ID'],
        'TARGET': predictions
    })
    submission.to_csv('submission.csv', index=False)
    print("Submission file created: submission.csv")
# Generate the submission file
generate_submission(best_hyper_model, test, X.columns)


Submission file created: submission.csv
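`generate_submission` always writes to `submission.csv`, so the calls in later tuning rounds overwrite this file. A variant that takes the output path as a parameter keeps one file per round; the `path` argument and the returned frame are additions, not part of the notebook. Shown with a toy model and made-up frames:

```python
import os
import tempfile

import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def generate_submission(model, test_df, features, path="submission.csv"):
    """Write ID and positive-class probability to `path` and return the frame."""
    predictions = model.predict_proba(test_df[features])[:, 1]
    submission = pd.DataFrame({"ID": test_df["ID"], "TARGET": predictions})
    submission.to_csv(path, index=False)
    print(f"Submission file created: {path}")
    return submission


# Toy training/test frames (made up) with the same column layout
train_demo = pd.DataFrame({"ID": [1, 2, 3, 4], "var15": [23, 40, 25, 60], "TARGET": [0, 1, 0, 1]})
test_demo = pd.DataFrame({"ID": [5, 6], "var15": [30, 55]})
features = ["var15"]

clf = DecisionTreeClassifier(max_depth=1, random_state=42)
clf.fit(train_demo[features], train_demo["TARGET"])

# One file per tuning round instead of a single overwritten file
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "submission_round1.csv")
sub = generate_submission(clf, test_demo, features, path=out_path)
```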


## Model Training with Best Hyperparameters

In [37]:
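# Update Model 1 with the grid-search suggestions (criterion='entropy', max_depth=5) and additionally
# try class_weight='balanced' and min_samples_leaf=2, which were not in the grid; Models 2-4 are unchanged
models = {
    "Model 1": DecisionTreeClassifier(max_depth=5, criterion="entropy", class_weight='balanced', min_samples_leaf=2),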
    "Model 2": DecisionTreeClassifier(max_depth=10, criterion="gini", min_samples_leaf=10),
    "Model 3": DecisionTreeClassifier(max_depth=None, criterion="entropy", min_samples_leaf=20),
    "Model 4": DecisionTreeClassifier(max_depth=10, criterion="entropy", min_samples_leaf=10),
}
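Model 1 now uses `class_weight='balanced'`. In scikit-learn's balanced mode, each class is weighted by `n_samples / (n_classes * np.bincount(y))`; with a training target mean of about 0.0396, that gives the positive class a weight of roughly 12.6 against about 0.52 for the negative class. A sketch using `compute_class_weight` on a toy label vector:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels with 4% positives (made up)
y = np.array([0] * 96 + [1] * 4)

weights = compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)

# Same formula written out: n_samples / (n_classes * count per class)
manual = len(y) / (2 * np.bincount(y))
print(weights, manual)  # both approximately [0.5208, 12.5]
```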


In [38]:
results = []

for name, model in models.items():
    model.fit(X_train, y_train)
    val_predictions = model.predict(X_val)

    # Calculate evaluation metrics
    accuracy = accuracy_score(y_val, val_predictions)
    f1 = f1_score(y_val, val_predictions)
    roc_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    confusion = confusion_matrix(y_val, val_predictions)

    # Store results
    results.append({
        "Model": name,
        "Accuracy": accuracy,
        "F1 Score": f1,
        "ROC AUC": roc_auc,
        "Confusion Matrix": confusion
    })

# Convert results to DataFrame
results_df = pd.DataFrame(results)

In [39]:
print(results_df)

     Model  Accuracy  F1 Score   ROC AUC             Confusion Matrix
0  Model 1  0.734544  0.179675  0.799795  [[16089, 5802], [252, 663]]
1  Model 2  0.958037  0.018462  0.780172      [[21840, 51], [906, 9]]
2  Model 3  0.957467  0.060078  0.705046     [[21805, 86], [884, 31]]
3  Model 4  0.958213  0.014478  0.776497      [[21846, 45], [908, 7]]
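Balancing the classes cuts Model 1's accuracy from about 0.96 to 0.73 while raising F1 from 0 to 0.18. Working the metrics out from its confusion matrix shows the trade: the model now flags 663 of the 915 positives, at the cost of 5,802 false positives. A sketch that recomputes the reported figures from the matrix above:

```python
import numpy as np

# Confusion matrix for the balanced Model 1, copied from the output above
# (rows = actual class, columns = predicted class)
cm = np.array([[16089, 5802],
               [252, 663]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# Equivalent to 2 * tp / (2 * tp + fp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.6f} precision={precision:.4f} recall={recall:.4f} f1={f1:.6f}")
```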


In [40]:
# Identify the best model based on ROC AUC score
best_model_name = results_df.loc[results_df['ROC AUC'].idxmax()]['Model']
best_model = models[best_model_name]
print(f"\nBest Model: {best_model_name}")


Best Model: Model 1


In [41]:
# Hyperparameter tuning using Grid Search for the best model
param_grid = {
    'max_depth': [None, 5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'criterion': ['gini', 'entropy']
}

grid_search = GridSearchCV(best_model, param_grid, scoring='roc_auc', cv=5)
grid_search.fit(X_train, y_train)


GridSearchCV(cv=5,
             estimator=DecisionTreeClassifier(class_weight='balanced',
                                              criterion='entropy', max_depth=5,
                                              min_samples_leaf=2),
             param_grid={'criterion': ['gini', 'entropy'],
                         'max_depth': [None, 5, 10, 15],
                         'min_samples_split': [2, 5, 10]},
             scoring='roc_auc')

In [42]:
# Best model from Grid Search
best_hyper_model = grid_search.best_estimator_
best_params = grid_search.best_params_
print(f"Best Hyperparameters for {best_model_name}: {best_params}")

# Evaluate the best model
best_val_predictions = best_hyper_model.predict(X_val)

# Calculate metrics for the best hyperparameter-tuned model
best_accuracy = accuracy_score(y_val, best_val_predictions)
best_f1 = f1_score(y_val, best_val_predictions)
best_roc_auc = roc_auc_score(y_val, best_hyper_model.predict_proba(X_val)[:, 1])
best_confusion = confusion_matrix(y_val, best_val_predictions)

# Print evaluation metrics for the best hyperparameter-tuned model
print(f"\nBest Hyperparameter-Tuned Model Accuracy: {best_accuracy:.4f}")
print(f"Best Hyperparameter-Tuned Model F1 Score: {best_f1:.4f}")
print(f"Best Hyperparameter-Tuned Model ROC AUC Score: {best_roc_auc:.4f}")
print("Best Hyperparameter-Tuned Model Confusion Matrix:")
print(best_confusion)
print("Best Hyperparameter-Tuned Model Classification Report:")
print(classification_report(y_val, best_val_predictions))

Best Hyperparameters for Model 1: {'criterion': 'entropy', 'max_depth': 5, 'min_samples_split': 5}

Best Hyperparameter-Tuned Model Accuracy: 0.7345
Best Hyperparameter-Tuned Model F1 Score: 0.1797
Best Hyperparameter-Tuned Model ROC AUC Score: 0.7998
Best Hyperparameter-Tuned Model Confusion Matrix:
[[16089  5802]
 [  252   663]]
Best Hyperparameter-Tuned Model Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.73      0.84     21891
           1       0.10      0.72      0.18       915

    accuracy                           0.73     22806
   macro avg       0.54      0.73      0.51     22806
weighted avg       0.95      0.73      0.82     22806



In [43]:
# Regenerate the submission file with the re-tuned model
# (generate_submission is defined in an earlier cell)
generate_submission(best_hyper_model, test, X.columns)


Submission file created: submission.csv


## Exploring Alternative Hyperparameter Settings

In [44]:
# Try alternative hyperparameter settings to see how the results change
models = {
    "Model 1": DecisionTreeClassifier(max_depth=5, criterion="entropy", class_weight='balanced', min_samples_leaf=5),
    "Model 2": DecisionTreeClassifier(max_depth=2, criterion="gini", min_samples_leaf=10),
    "Model 3": DecisionTreeClassifier(max_depth=10, criterion="entropy", min_samples_leaf=20),
    "Model 4": DecisionTreeClassifier(max_depth=5, criterion="gini", class_weight='balanced', min_samples_leaf=10),
}

In [45]:
results = []

for name, model in models.items():
    model.fit(X_train, y_train)
    val_predictions = model.predict(X_val)

    # Calculate evaluation metrics
    accuracy = accuracy_score(y_val, val_predictions)
    f1 = f1_score(y_val, val_predictions)
    roc_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    confusion = confusion_matrix(y_val, val_predictions)

    # Store results
    results.append({
        "Model": name,
        "Accuracy": accuracy,
        "F1 Score": f1,
        "ROC AUC": roc_auc,
        "Confusion Matrix": confusion
    })

# Convert results to DataFrame
results_df = pd.DataFrame(results)

In [46]:
print(results_df)

     Model  Accuracy  F1 Score   ROC AUC             Confusion Matrix
0  Model 1  0.734544  0.179675  0.799795  [[16089, 5802], [252, 663]]
1  Model 2  0.959879  0.000000  0.765357       [[21891, 0], [915, 0]]
2  Model 3  0.959397  0.004301  0.777571      [[21878, 13], [913, 2]]
3  Model 4  0.784136  0.203527  0.803696  [[17254, 4637], [286, 629]]
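Model 2 (depth 2) predicts no positives at all, and its accuracy of 0.959879 is exactly the share of negatives in the validation set: 21,891 of 22,806. That is the score of a classifier that always predicts class 0, which is why accuracy alone cannot separate these models. A sketch of that baseline with scikit-learn's `DummyClassifier`, on labels with the same class counts as `y_val` (fit on those labels only to illustrate the arithmetic):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

# Labels with the same class counts as y_val (21,891 negatives, 915 positives)
y_val_demo = np.array([0] * 21891 + [1] * 915)
X_dummy = np.zeros((len(y_val_demo), 1))  # DummyClassifier ignores the features

baseline = DummyClassifier(strategy="most_frequent").fit(X_dummy, y_val_demo)
preds = baseline.predict(X_dummy)

print(round(accuracy_score(y_val_demo, preds), 6))   # 0.959879
print(f1_score(y_val_demo, preds, zero_division=0))  # 0.0
```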


In [47]:
# Identify the best model based on ROC AUC score
best_model_name = results_df.loc[results_df['ROC AUC'].idxmax()]['Model']
best_model = models[best_model_name]
print(f"\nBest Model: {best_model_name}")


Best Model: Model 4


In [48]:
# Hyperparameter tuning using Grid Search for the best model
param_grid = {
    'max_depth': [None, 5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'criterion': ['gini', 'entropy']
}

grid_search = GridSearchCV(best_model, param_grid, scoring='roc_auc', cv=5)
grid_search.fit(X_train, y_train)


GridSearchCV(cv=5,
             estimator=DecisionTreeClassifier(class_weight='balanced',
                                              max_depth=5,
                                              min_samples_leaf=10),
             param_grid={'criterion': ['gini', 'entropy'],
                         'max_depth': [None, 5, 10, 15],
                         'min_samples_split': [2, 5, 10]},
             scoring='roc_auc')

In [49]:
# Best model from Grid Search
best_hyper_model = grid_search.best_estimator_
best_params = grid_search.best_params_
print(f"Best Hyperparameters for {best_model_name}: {best_params}")

# Evaluate the best model
best_val_predictions = best_hyper_model.predict(X_val)

# Calculate metrics for the best hyperparameter-tuned model
best_accuracy = accuracy_score(y_val, best_val_predictions)
best_f1 = f1_score(y_val, best_val_predictions)
best_roc_auc = roc_auc_score(y_val, best_hyper_model.predict_proba(X_val)[:, 1])
best_confusion = confusion_matrix(y_val, best_val_predictions)

# Print evaluation metrics for the best hyperparameter-tuned model
print(f"\nBest Hyperparameter-Tuned Model Accuracy: {best_accuracy:.4f}")
print(f"Best Hyperparameter-Tuned Model F1 Score: {best_f1:.4f}")
print(f"Best Hyperparameter-Tuned Model ROC AUC Score: {best_roc_auc:.4f}")
print("Best Hyperparameter-Tuned Model Confusion Matrix:")
print(best_confusion)
print("Best Hyperparameter-Tuned Model Classification Report:")
print(classification_report(y_val, best_val_predictions))

Best Hyperparameters for Model 4: {'criterion': 'entropy', 'max_depth': 5, 'min_samples_split': 2}

Best Hyperparameter-Tuned Model Accuracy: 0.7345
Best Hyperparameter-Tuned Model F1 Score: 0.1797
Best Hyperparameter-Tuned Model ROC AUC Score: 0.7998
Best Hyperparameter-Tuned Model Confusion Matrix:
[[16089  5802]
 [  252   663]]
Best Hyperparameter-Tuned Model Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.73      0.84     21891
           1       0.10      0.72      0.18       915

    accuracy                           0.73     22806
   macro avg       0.54      0.73      0.51     22806
weighted avg       0.95      0.73      0.82     22806



In [50]:
# Regenerate the submission file with the model tuned in this round
# (generate_submission is defined in an earlier cell)
generate_submission(best_hyper_model, test, X.columns)

Submission file created: submission.csv
