# Table of Contents

<a class="anchor" id="top"></a>


1. [Importing Libraries & Data](#1.-Importing-Libraries-&-Data) <br><br>

2. [Modelling](#2.-Modelling)

   2.1 [Hyperparameter Tuning](#2.1-Hyperparameter-Tuning) <br>
   
   2.2 [Combining Models](#2.2-Combining-Models) <br><br>
   
3. [Final Predictions](#3.-Final-Predictions) <br><br>

4. [Export](#4.-Export) <br>


# 1. Importing Libraries & Data

In [None]:
# Libraries for Data Manipulation
import pandas as pd

# Importing classification models
from sklearn.linear_model import LogisticRegression, SGDClassifier  # Linear models
from sklearn.tree import DecisionTreeClassifier  
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier, StackingClassifier, VotingClassifier  # Ensemble classifiers
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import SVC 
from sklearn.model_selection import RandomizedSearchCV 
from sklearn.gaussian_process import GaussianProcessClassifier  
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis  # Discriminant analysis models
from xgboost import XGBClassifier  
from lightgbm import LGBMClassifier
  
# Utility imports
import itertools  
from sklearn.metrics import f1_score

# Importing custom metrics module
import metrics as m

# Display Settings
pd.set_option('display.max_columns', None)

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings("ignore")
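The custom `metrics` module imported above is not shown in this notebook. Judging only from the output printed further down (a TRAIN block and a VALIDATION block, each containing a scikit-learn style classification report), a minimal sketch of what `m.metrics` might do could look like the following. The helper name `build_report`, the `width` parameter, and the toy labels are assumptions made for illustration, not the project's actual implementation.

```python
from sklearn.metrics import classification_report


def build_report(title, y_true, y_pred, width=70):
    """Return a titled classification report as one string (hypothetical helper)."""
    lines = [
        "_" * width,
        title.center(width),
        "-" * width,
        # zero_division=0 reports 0.00 for classes that are never predicted
        classification_report(y_true, y_pred, zero_division=0),
    ]
    return "\n".join(lines)


def metrics(y_train, train_pred, y_val, val_pred):
    """Print the training report followed by the validation report."""
    print(build_report("TRAIN", y_train, train_pred))
    print(build_report("VALIDATION", y_val, val_pred))


# Toy labels, invented purely to show the call signature
metrics([1, 2, 2, 1], [1, 2, 1, 1], [1, 2], [2, 2])
```

Keeping the formatting in one helper means both splits are always printed in the same layout, which makes train and validation scores easy to compare side by side.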

**Import Data**

In [2]:
# Load training and validation datasets for features and target variables

# Features for training and validation
X_train = pd.read_csv('./project_data/x_train_all.csv', index_col='Claim Identifier')
X_val = pd.read_csv('./project_data/x_val_all.csv', index_col='Claim Identifier')

# Target variables for training and validation
y_train = pd.read_csv('./project_data/y_train.csv', index_col='Claim Identifier')
y_val = pd.read_csv('./project_data/y_val.csv', index_col='Claim Identifier')

In [3]:
# Load the test dataset with the 'Claim Identifier' as the index
test = pd.read_csv('./project_data/test_all.csv', index_col='Claim Identifier')

**Features**

In [4]:
# Define lists of numerical features for final selection and testing
final_num = ['Accident Month', 'Accident Year', 'Average Weekly Wage', 'Birth Year', 'First Hearing Year']  
# Selected numerical features for final use

try_num = ['Accident Day', 'Assembly Day', 'C-2 Day', 'IME-4 Count', 'IME-4 Count Log']  
# Numerical features under consideration for testing or further analysis

In [5]:
# Define lists of categorical features for final selection and testing
final_categ = ['Attorney/Representative Bin', 'C-3 Date Binary', 'Carrier Name freq',  
               'Industry Code', 'WCIO Cause of Injury Code', 'WCIO Nature of Injury Code', 
               'WCIO Part Of Body Code']  
# Selected categorical features for final use

try_categ = ['County of Injury freq', 'District Name freq', 'Medical Fee Region freq']  
# Categorical features under consideration for testing or further analysis

In [None]:
# Extract the final features along with a selected test feature from training and validation datasets
X_train = X_train1[final_num + final_categ + [try_num[2]]]

# Validation set with the same selected features for consistency
X_val = X_val1[final_num + final_categ + [try_num[2]]]

| Model | Features                               | macro-F1 |
| ----- | -------------------------------------- | -------- |
| RF    | final_num + final_categ                | 0.38     |
| RF    | final_num + final_categ + [try_num[0]] | 0.39     |
| RF    | final_num + final_categ + [try_num[1]] | 0.39     |
| RF    | final_num + final_categ + [try_num[2]] | 0.39     |
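The macro-F1 values in the table above are unweighted means of the per-class F1 scores, which is what `f1_score(..., average="macro")` computes. The sketch below works this out by hand in plain Python on invented toy labels (the function names `per_class_f1` and `macro_f1` are hypothetical) to show why macro-F1 can stay low even when accuracy looks good on imbalanced classes.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy example: 8 samples of a frequent class, 2 of a rare one,
# and a model that always predicts the frequent class.
y_true = [2] * 8 + [7] * 2
y_pred = [2] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```

The rare class contributes an F1 of 0 and counts as much as the frequent class, so the macro average drops to about 0.44 despite 80% accuracy.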

# 2. Modelling
Modelling means training machine learning algorithms to learn patterns from the training data and predict the target for unseen claims. It covers selecting algorithms, tuning hyperparameters, and evaluating performance. In this section, we fit several baseline classifiers, compare them on the validation set, and refine the most promising ones.

<a href="#top">Top &#129033;</a>
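Each model in this section follows the same pattern: fit on the training split, predict on both splits, and compare the scores. A minimal sketch of that pattern as a reusable helper is shown below. The name `fit_and_score` and the synthetic dataset are assumptions for illustration only; the notebook itself calls `m.metrics` instead.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_and_score(model, X_train, y_train, X_val, y_val):
    """Fit a fresh copy of `model` and return macro-F1 on both splits."""
    # np.ravel flattens a single-column target (e.g. a one-column DataFrame)
    y_train, y_val = np.ravel(y_train), np.ravel(y_val)
    fitted = clone(model).fit(X_train, y_train)
    return {
        "train_macro_f1": f1_score(y_train, fitted.predict(X_train), average="macro"),
        "val_macro_f1": f1_score(y_val, fitted.predict(X_val), average="macro"),
    }


# Synthetic stand-in data; the real notebook uses the claims dataset
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

scores = fit_and_score(DecisionTreeClassifier(random_state=0), X_tr, y_tr, X_va, y_va)
print(scores)
```

A large gap between the two numbers is the usual sign of overfitting, which is the pattern the tree-based models below show with perfect training scores.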

In [5]:
# Print the list of column names in the training features dataset
print(X_train.columns.to_list(), '\n')

# Print the number of columns in the training features dataset
print(len(X_train.columns))

['Accident Year', 'Average Weekly Wage', 'First Hearing Year', 'C-2 Year', 'C-2 Month', 'Birth Year', 'Assembly Year', 'IME-4 Count', 'Assembly Day', 'Attorney/Representative', 'C-3 Date Binary', 'WCIO Nature of Injury Code', 'Carrier Name', 'County of Injury', 'Industry Code', 'Medical Fee Region', 'WCIO Cause of Injury Code', 'WCIO Part Of Body Code'] 

18


**Baseline Model**

In [None]:
# Initialize the Logistic Regression model and fit it to the training data
model = LogisticRegression()
model.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset
train_pred = model.predict(X_train)

# Predict the target values for the validation dataset
val_pred = model.predict(X_val)

In [None]:
# Evaluate the model's performance using the custom metrics function on both training and validation predictions
m.metrics(y_train, train_pred, y_val, val_pred)

              precision    recall  f1-score   support

           1       0.00      0.00      0.00      9981
           2       0.72      0.95      0.82    232862
           3       0.21      0.01      0.02     55125
           4       0.53      0.67      0.59    118806
           5       0.22      0.00      0.01     38624
           6       0.00      0.00      0.00      3369
           7       0.00      0.00      0.00        77
           8       0.00      0.00      0.00       376

    accuracy                           0.66    459220
   macro avg       0.21      0.20      0.18    459220
weighted avg       0.55      0.66      0.57    459220

              precision    recall  f1-score   support

           1       0.00      0.00      0.00      2495
           2       0.72      0.95      0.82     58216
           3       0.21      0.01      0.02     13781
           4       0.53      0.66      0.59     29701
           5       0.17      0.00      0.01      9656
           6       0.00 

**SGD Classifier**

In [None]:
# Initialize the SGDClassifier model and fit it to the training data
sgd = SGDClassifier()
sgd.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the SGDClassifier model
train_pred_sgd = sgd.predict(X_train)

# Predict the target values for the validation dataset using the SGDClassifier model
val_pred_sgd = sgd.predict(X_val)

In [None]:
# Evaluate the performance of the SGDClassifier model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_sgd, y_val, val_pred_sgd)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.00      0.00      0.00      9981
           2       0.73      0.99      0.84    232862
           3       0.12      0.00      0.00     55125
           4       0.59      0.73      0.65    118806
           5       0.22      0.00      0.00     38624
           6       0.00      0.00      0.00      3369
           7       0.00      0.00      0.00        77
           8       0.00      0.00      0.00       376

    accuracy                           0.69    459220
   macro avg       0.21      0.21      0.19    459220
weighted avg       0.56      0.69      0.60    459220

______________________________________________________________________
                                VALIDATION                       

**Decision Tree Classifier**

In [9]:
# Initialize the DecisionTreeClassifier model and fit it to the training data
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

In [10]:
# Predict the target values for the training dataset using the DecisionTreeClassifier model
train_pred_dt = dt.predict(X_train)

# Predict the target values for the validation dataset using the DecisionTreeClassifier model
val_pred_dt = dt.predict(X_val)

In [11]:
# Evaluate the performance of the DecisionTreeClassifier model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_dt, y_val, val_pred_dt)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      1.00      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                       

**Random Forest Classifier**

In [35]:
# Initialize the RandomForestClassifier model and fit it to the training data
rf = RandomForestClassifier()
rf.fit(X_train, y_train)

In [36]:
# Predict the target values for the training dataset using the RandomForestClassifier model
train_pred_rf = rf.predict(X_train)

# Predict the target values for the validation dataset using the RandomForestClassifier model
val_pred_rf = rf.predict(X_val)

In [37]:
# Evaluate the performance of the RandomForestClassifier model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_rf, y_val, val_pred_rf)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      1.00      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                       

In [None]:
results = {}

# Loop over all combinations of try_num and try_categ
for num_combination_size in range(len(try_num) + 1):
    for categ_combination_size in range(len(try_categ) + 1):
        # Get all combinations of specified size for try_num and try_categ
        num_combinations = list(itertools.combinations(try_num, num_combination_size))
        categ_combinations = list(itertools.combinations(try_categ, categ_combination_size))
        
        for num_subset in num_combinations:
            for categ_subset in categ_combinations:
                # Create the feature set by combining final and selected try features
                selected_features = final_num + final_categ + list(num_subset) + list(categ_subset)
                
                # Subset the data
                X_train_subset = X_train[selected_features]
                X_val_subset = X_val[selected_features]
                
                # Train the Random Forest model and evaluate it on the validation set
                rf = RandomForestClassifier(random_state = 1)
                rf.fit(X_train_subset, y_train)
                
                train_pred_rf = rf.predict(X_train_subset)
                val_pred_rf = rf.predict(X_val_subset)
                
                class_report = m.metrics(y_train, train_pred_rf, y_val, val_pred_rf)
                # Calculate macro F1 score
                f1_macro = f1_score(y_val, val_pred_rf, average="macro")

                
                # Store the f1_macro with the feature set as the key
                results[f"{num_subset} + {categ_subset}"] = f1_macro
                
                print(f"Combination: {num_subset} + {categ_subset} | Macro F1 Score: {f1_macro}")
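The nested loops above try every subset of `try_num` together with every subset of `try_categ`. A minimal sketch of the same enumeration, using only the two candidate lists defined earlier, shows how many Random Forest fits that implies. The helper name `all_subsets` is hypothetical, and the iteration order differs from the notebook's loop, although the set of combinations is the same.

```python
import itertools

try_num = ['Accident Day', 'Assembly Day', 'C-2 Day', 'IME-4 Count', 'IME-4 Count Log']
try_categ = ['County of Injury freq', 'District Name freq', 'Medical Fee Region freq']


def all_subsets(items):
    """Yield every subset of `items`, from the empty tuple to the full tuple."""
    for size in range(len(items) + 1):
        yield from itertools.combinations(items, size)


# Pair every numerical subset with every categorical subset
pairs = list(itertools.product(all_subsets(try_num), all_subsets(try_categ)))

print(len(pairs))  # 256, i.e. 2**5 * 2**3
print(pairs[0])    # ((), ())
```

The first pair is the empty selection, which corresponds to training on `final_num + final_categ` alone. Since every pair triggers a full Random Forest fit, the exhaustive search costs 256 fits in total.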


______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      0.99      0.99      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      0.99      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                       

              precision    recall  f1-score   support

           1       0.72      0.42      0.53      2027
           2       0.85      0.96      0.90     54746
           3       0.47      0.08      0.14     13184
           4       0.70      0.88      0.78     28386
           5       0.69      0.53      0.60      9275
           6       0.08      0.00      0.00       802
           7       0.00      0.00      0.00        18
           8       0.67      0.12      0.21        80

    accuracy                           0.78    108518
   macro avg       0.52      0.37      0.39    108518
weighted avg       0.74      0.78      0.74    108518

Combination: () + ('County of Injury freq', 'District Name freq') | Macro F1 Score: 0.39466647569389673
______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recal

              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      1.00      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.72      0.41      0.52      2027
           2       0.8

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      1.00      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                       

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      0.99      0.99      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      0.99      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                       

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      1.00      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                       

              precision    recall  f1-score   support

           1       0.72      0.41      0.52      2027
           2       0.85      0.96      0.90     54746
           3       0.51      0.06      0.11     13184
           4       0.69      0.89      0.78     28386
           5       0.70      0.52      0.60      9275
           6       0.00      0.00      0.00       802
           7       0.00      0.00      0.00        18
           8       0.55      0.07      0.13        80

    accuracy                           0.78    108518
   macro avg       0.50      0.36      0.38    108518
weighted avg       0.74      0.78      0.73    108518

Combination: ('Accident Day', 'C-2 Day') + ('Medical Fee Region freq',) | Macro F1 Score: 0.38021181651261493
______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision   

              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      1.00      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.72      0.41      0.53      2027
           2       0.8

              precision    recall  f1-score   support

           1       0.72      0.41      0.52      2027
           2       0.85      0.96      0.90     54746
           3       0.49      0.07      0.12     13184
           4       0.69      0.89      0.78     28386
           5       0.69      0.52      0.59      9275
           6       0.00      0.00      0.00       802
           7       0.00      0.00      0.00        18
           8       0.64      0.09      0.15        80

    accuracy                           0.78    108518
   macro avg       0.51      0.37      0.38    108518
weighted avg       0.74      0.78      0.73    108518

Combination: ('Assembly Day', 'C-2 Day') + ('Medical Fee Region freq',) | Macro F1 Score: 0.3827390480497937
______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    

              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      1.00      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.73      0.41      0.53      2027
           2       0.8

              precision    recall  f1-score   support

           1       0.72      0.41      0.52      2027
           2       0.85      0.96      0.90     54746
           3       0.50      0.08      0.14     13184
           4       0.70      0.89      0.78     28386
           5       0.71      0.55      0.62      9275
           6       0.00      0.00      0.00       802
           7       0.00      0.00      0.00        18
           8       0.62      0.10      0.17        80

    accuracy                           0.78    108518
   macro avg       0.51      0.37      0.39    108518
weighted avg       0.75      0.78      0.74    108518

Combination: ('C-2 Day', 'IME-4 Count') + ('Medical Fee Region freq',) | Macro F1 Score: 0.392842913482426
______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    re

              precision    recall  f1-score   support

           1       1.00      0.99      0.99      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      0.99      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.72      0.42      0.53      2027
           2       0.8

              precision    recall  f1-score   support

           1       0.73      0.41      0.53      2027
           2       0.85      0.96      0.90     54746
           3       0.48      0.07      0.12     13184
           4       0.69      0.89      0.78     28386
           5       0.69      0.53      0.60      9275
           6       0.25      0.00      0.00       802
           7       0.00      0.00      0.00        18
           8       0.56      0.06      0.11        80

    accuracy                           0.78    108518
   macro avg       0.53      0.37      0.38    108518
weighted avg       0.74      0.78      0.73    108518

Combination: ('Accident Day', 'Assembly Day') + ('District Name freq', 'Medical Fee Region freq') | Macro F1 Score: 0.3807502745726553
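
To make the macro F1 figure concrete, the snippet below recomputes it from the per-class F1 values printed in the validation report directly above. Macro averaging is the unweighted mean over classes, so the two classes with an F1 of 0.00 pull the score down no matter how few samples they contain, while the support-weighted average is dominated by class 2. The inputs are the rounded values from the report, so the results only agree with the logged scores to about two decimals.

```python
# Per-class F1 scores copied from the validation report above (classes 1 to 8).
per_class_f1 = [0.53, 0.90, 0.12, 0.78, 0.60, 0.00, 0.00, 0.11]
# Matching supports (number of validation samples per class), from the same report.
support = [2027, 54746, 13184, 28386, 9275, 802, 18, 80]

# Macro F1: every class counts equally.
macro_f1 = sum(per_class_f1) / len(per_class_f1)

# Weighted F1: every class counts in proportion to its support.
weighted_f1 = sum(f * s for f, s in zip(per_class_f1, support)) / sum(support)

print(f"macro F1    ~ {macro_f1:.2f}")
print(f"weighted F1 ~ {weighted_f1:.2f}")
```

The gap between the two averages is why macro F1 is the more demanding metric for this imbalanced target.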
______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      1.00      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                       

              precision    recall  f1-score   support

           1       0.73      0.42      0.53      2027
           2       0.85      0.96      0.90     54746
           3       0.48      0.09      0.15     13184
           4       0.70      0.89      0.79     28386
           5       0.72      0.57      0.63      9275
           6       0.75      0.00      0.01       802
           7       0.00      0.00      0.00        18
           8       0.71      0.12      0.21        80

    accuracy                           0.78    108518
   macro avg       0.62      0.38      0.40    108518
weighted avg       0.75      0.78      0.74    108518

Combination: ('Accident Day', 'IME-4 Count Log') + ('District Name freq', 'Medical Fee Region freq') | Macro F1 Score: 0.4022968204435338
______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      1.00      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                       

              precision    recall  f1-score   support

           1       0.72      0.42      0.53      2027
           2       0.85      0.96      0.90     54746
           3       0.50      0.09      0.15     13184
           4       0.70      0.89      0.79     28386
           5       0.71      0.56      0.63      9275
           6       0.00      0.00      0.00       802
           7       0.00      0.00      0.00        18
           8       0.67      0.10      0.17        80

    accuracy                           0.78    108518
   macro avg       0.52      0.38      0.40    108518
weighted avg       0.75      0.78      0.74    108518

Combination: ('Assembly Day', 'IME-4 Count Log') + ('District Name freq', 'Medical Fee Region freq') | Macro F1 Score: 0.3958411565779913
______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8108
           2       1.00      1.00      1.00    218985
           3       1.00      1.00      1.00     52736
           4       1.00      1.00      1.00    113543
           5       1.00      1.00      1.00     37098
           6       1.00      1.00      1.00      3209
           7       1.00      1.00      1.00        74
           8       1.00      1.00      1.00       319

    accuracy                           1.00    434072
   macro avg       1.00      1.00      1.00    434072
weighted avg       1.00      1.00      1.00    434072

______________________________________________________________________
                                VALIDATION                       

In [None]:
# Find the feature combination with the highest macro F1 score from the 'results' dictionary.
best_features = max(results, key=results.get)  # 'max' returns the key (feature combination) with the highest value (macro F1 score) from the results dictionary.


# Print the best feature combination and its corresponding macro F1 score.
print("Best feature combination:", best_features)
print("Best macro F1 score:", results[best_features])
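
Looking only at the single best combination hides how close the runners-up are. The sketch below ranks all combinations instead. The `results` dictionary here is a small stand-in: its key layout (a pair of tuples) is an assumption, while the scores are the ones printed in the log above.

```python
# Stand-in for the notebook's `results` dict. The key structure
# (numeric_features, categorical_features) is assumed; scores come from the log above.
results = {
    (("C-2 Day", "IME-4 Count"), ("Medical Fee Region freq",)): 0.392842913482426,
    (("Accident Day", "Assembly Day"),
     ("District Name freq", "Medical Fee Region freq")): 0.3807502745726553,
    (("Accident Day", "IME-4 Count Log"),
     ("District Name freq", "Medical Fee Region freq")): 0.4022968204435338,
    (("Assembly Day", "IME-4 Count Log"),
     ("District Name freq", "Medical Fee Region freq")): 0.3958411565779913,
}

# Same selection as in the cell above: the key with the highest score.
best_features = max(results, key=results.get)

# Full ranking, highest macro F1 first.
ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
for combo, score in ranked[:3]:
    print(f"{score:.4f}  {combo}")
```

In this sample the top two combinations differ by less than 0.01 macro F1, which is small enough that the ranking could change with a different random seed.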

In [None]:
import play_song as song
song.play_('audio.mp3')

**Gradient Boosting Classifier**

In [None]:
# Initialize the GradientBoostingClassifier model and fit it to the training data
gb = GradientBoostingClassifier()
gb.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the GradientBoostingClassifier model
train_pred_gb = gb.predict(X_train)

# Predict the target values for the validation dataset using the GradientBoostingClassifier model
val_pred_gb = gb.predict(X_val)

In [None]:
# Evaluate the performance of the GradientBoostingClassifier model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_gb, y_val, val_pred_gb)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.72      0.47      0.57      9981
           2       0.85      0.96      0.90    232862
           3       0.50      0.06      0.11     55125
           4       0.69      0.88      0.77    118806
           5       0.70      0.56      0.62     38624
           6       0.67      0.00      0.01      3369
           7       0.77      0.22      0.34        77
           8       0.23      0.09      0.13       376

    accuracy                           0.78    459220
   macro avg       0.64      0.40      0.43    459220
weighted avg       0.75      0.78      0.74    459220

______________________________________________________________________
                                VALIDATION                       
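
The reports in this notebook come from the custom `m.metrics` helper, whose source is not shown here. As a rough guide to what such a helper might do, the sketch below prints a scikit-learn classification report for each split and returns both macro F1 scores. The function name `report_train_val` and the toy labels are hypothetical; the real module may differ.

```python
from sklearn.metrics import classification_report, f1_score


def report_train_val(y_train, train_pred, y_val, val_pred):
    """Hypothetical stand-in for m.metrics: print both reports, return macro F1 scores."""
    splits = [("TRAIN", y_train, train_pred), ("VALIDATION", y_val, val_pred)]
    for name, y_true, y_pred in splits:
        print("_" * 70)
        print(name.center(70))
        print("-" * 70)
        # zero_division=0 reports 0.00 for classes that are never predicted
        print(classification_report(y_true, y_pred, zero_division=0))
    train_f1 = f1_score(y_train, train_pred, average="macro")
    val_f1 = f1_score(y_val, val_pred, average="macro")
    return train_f1, val_f1


# Tiny made-up labels, only to show the call.
train_f1, val_f1 = report_train_val([1, 2, 2, 3], [1, 2, 2, 3],
                                    [1, 2, 3, 3], [1, 2, 2, 3])
```

Returning the two scores makes it easy to log the train/validation gap for each model instead of reading it off the printed tables.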

**AdaBoost Classifier**

In [None]:
# Initialize the AdaBoostClassifier model and fit it to the training data
ab = AdaBoostClassifier()
ab.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the AdaBoostClassifier model
train_pred_ab = ab.predict(X_train)

# Predict the target values for the validation dataset using the AdaBoostClassifier model
val_pred_ab = ab.predict(X_val)

In [None]:
# Evaluate the performance of the AdaBoostClassifier model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_ab, y_val, val_pred_ab)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.19      0.57      0.29      9981
           2       0.84      0.88      0.86    232862
           3       0.50      0.02      0.04     55125
           4       0.64      0.86      0.74    118806
           5       0.56      0.35      0.43     38624
           6       0.00      0.00      0.00      3369
           7       0.00      0.00      0.00        77
           8       0.12      0.63      0.21       376

    accuracy                           0.71    459220
   macro avg       0.36      0.41      0.32    459220
weighted avg       0.71      0.71      0.67    459220

______________________________________________________________________
                                VALIDATION                       

**SVC**

In [None]:
# Initialize the Support Vector Classifier (SVC) model and fit it to the training data
svc = SVC()
svc.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the Support Vector Classifier (SVC) model
train_pred_svc = svc.predict(X_train)

# Predict the target values for the validation dataset using the Support Vector Classifier (SVC) model
val_pred_svc = svc.predict(X_val)

In [None]:
# Evaluate the performance of the Support Vector Classifier (SVC) model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_svc, y_val, val_pred_svc)

**Gaussian Process Classifier**

In [None]:
## Not run: fitting GaussianProcessClassifier on the full training set kills the kernel

In [None]:
# gau_p = GaussianProcessClassifier()
# gau_p.fit(X_train, y_train)

In [None]:
# train_pred_gau_p = gau_p.predict(X_train)
# val_pred_gau_p = gau_p.predict(X_val)

In [None]:
# m.metrics(y_train, train_pred_gau_p, y_val, val_pred_gau_p)

**Linear Discriminant Analysis**

In [None]:
# Initialize the Linear Discriminant Analysis (LDA) model and fit it to the training data
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the Linear Discriminant Analysis (LDA) model
train_pred_lda = lda.predict(X_train)

# Predict the target values for the validation dataset using the Linear Discriminant Analysis (LDA) model
val_pred_lda = lda.predict(X_val)

In [None]:
# Evaluate the performance of the Linear Discriminant Analysis (LDA) model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_lda, y_val, val_pred_lda)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.41      0.39      0.40      9981
           2       0.69      0.93      0.79    232862
           3       0.29      0.02      0.03     55125
           4       0.55      0.44      0.49    118806
           5       0.48      0.38      0.43     38624
           6       0.08      0.15      0.11      3369
           7       0.00      0.00      0.00        77
           8       0.09      0.59      0.15       376

    accuracy                           0.63    459220
   macro avg       0.32      0.36      0.30    459220
weighted avg       0.58      0.63      0.58    459220

______________________________________________________________________
                                VALIDATION                       

**Quadratic Discriminant Analysis**

In [None]:
# Initialize the Quadratic Discriminant Analysis (QDA) model and fit it to the training data
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the Quadratic Discriminant Analysis (QDA) model
train_pred_qda = qda.predict(X_train)

# Predict the target values for the validation dataset using the Quadratic Discriminant Analysis (QDA) model
val_pred_qda = qda.predict(X_val)

In [None]:
# Evaluate the performance of the Quadratic Discriminant Analysis (QDA) model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_qda, y_val, val_pred_qda)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.21      0.61      0.31      9981
           2       0.73      0.88      0.80    232862
           3       0.23      0.08      0.12     55125
           4       0.56      0.05      0.10    118806
           5       0.53      0.35      0.42     38624
           6       0.04      0.40      0.07      3369
           7       0.00      0.92      0.00        77
           8       0.02      0.94      0.03       376

    accuracy                           0.52    459220
   macro avg       0.29      0.53      0.23    459220
weighted avg       0.59      0.52      0.49    459220

______________________________________________________________________
                                VALIDATION                       

**XGB Classifier**

In [13]:
# xgb = XGBClassifier()
# xgb.fit(X_train, y_train)

In [None]:
# train_pred_xgb = xgb.predict(X_train)
# val_pred_xgb = xgb.predict(X_val)

In [None]:
# m.metrics(y_train, train_pred_xgb , y_val, val_pred_xgb)

**LGBM Classifier**

In [None]:
## Not run: fitting LGBMClassifier on the full training set kills the kernel

In [None]:
# lgbm = LGBMClassifier()
# lgbm.fit(X_train, y_train)

In [None]:
# train_pred_lgbm = lgbm.predict(X_train)
# val_pred_lgbm = lgbm.predict(X_val)

In [None]:
# m.metrics(y_train, train_pred_lgbm , y_val, val_pred_lgbm)

## 2.1 Hyperparameter Tuning
Hyperparameter tuning searches over settings that are fixed before training, such as the number of trees or the maximum tree depth, to improve validation performance. In this section, we run a randomized search (`RandomizedSearchCV`) over a Random Forest parameter grid, scored by macro F1 with 5-fold cross-validation.

<a href="#top">Top &#129033;</a>

In [None]:
# Set up the parameter grid for hyperparameter tuning
param_grid = {
    'n_estimators': [100, 200, 300, 500],            # Number of trees in the forest
    'max_depth': [None, 10, 20, 30, 50],             # Maximum depth of each tree
    'min_samples_split': [2, 5, 10],                 # Minimum number of samples required to split an internal node
    'min_samples_leaf': [1, 2, 4],                   # Minimum number of samples required to be at a leaf node
    'max_features': ['sqrt', 'log2', None],          # Number of features to consider when looking for the best split
    'bootstrap': [True, False]                       # Whether bootstrap samples are used when building trees
}


### structure:
# {'parameter_name': values_for_the_parameter}
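
To see why a randomized search is used rather than an exhaustive grid search, it helps to count the candidates. The sketch below uses scikit-learn's `ParameterGrid` to enumerate the grid defined above and compares the number of model fits against the notebook's `RandomizedSearchCV` settings (`n_iter=100`, `cv=5`).

```python
from sklearn.model_selection import ParameterGrid

# Same grid as defined above.
param_grid = {
    'n_estimators': [100, 200, 300, 500],
    'max_depth': [None, 10, 20, 30, 50],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None],
    'bootstrap': [True, False],
}

n_combinations = len(ParameterGrid(param_grid))  # 4 * 5 * 3 * 3 * 3 * 2
cv_folds = 5
n_iter = 100

grid_fits = n_combinations * cv_folds   # exhaustive grid search
random_fits = n_iter * cv_folds         # randomized search

print(f"Combinations in the grid: {n_combinations}")
print(f"Fits for a full grid search: {grid_fits}")
print(f"Fits for the randomized search: {random_fits}")
```

The randomized search samples roughly 9% of the grid, which keeps the run an order of magnitude cheaper at the cost of possibly missing the best combination.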

In [None]:
# Define the RandomForestClassifier model
model = RandomForestClassifier()


# Set up RandomizedSearchCV for hyperparameter tuning
random_search = RandomizedSearchCV(
    estimator=model,         # Model to tune
    param_distributions=param_grid,  # Hyperparameter space
    n_iter=100,              # Number of parameter combinations to try
    scoring='f1_macro',      # Metric to evaluate model performance
    cv=5,                    # Number of folds in cross-validation
    verbose=2,               # Level of verbosity
    random_state=42          # Seed for reproducibility
)

In [None]:
# Fit the model using RandomizedSearchCV
random_search.fit(X_train, y_train)

Fitting 5 folds for each of 100 candidates, totalling 500 fits
[CV] END bootstrap=True, max_depth=30, max_features=None, min_samples_leaf=4, min_samples_split=2, n_estimators=500; total time=23.3min
[CV] END bootstrap=True, max_depth=30, max_features=None, min_samples_leaf=4, min_samples_split=2, n_estimators=500; total time=23.8min
[CV] END bootstrap=True, max_depth=30, max_features=None, min_samples_leaf=4, min_samples_split=2, n_estimators=500; total time=33.1min
[CV] END bootstrap=True, max_depth=30, max_features=None, min_samples_leaf=4, min_samples_split=2, n_estimators=500; total time=23.6min
[CV] END bootstrap=True, max_depth=30, max_features=None, min_samples_leaf=4, min_samples_split=2, n_estimators=500; total time=23.5min
[CV] END bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=100; total time= 1.9min

[CV] END bootstrap=False, max_depth=None, max_features=log2, min_samples_leaf=1, min_samples_split=5, n_estimators=200; total time= 3.0min
[CV] END bootstrap=False, max_depth=None, max_features=None, min_samples_leaf=2, min_samples_split=10, n_estimators=300; total time=13.8min
[CV] END bootstrap=False, max_depth=None, max_features=None, min_samples_leaf=2, min_samples_split=10, n_estimators=300; total time=13.4min
[CV] END bootstrap=False, max_depth=None, max_features=None, min_samples_leaf=2, min_samples_split=10, n_estimators=300; total time=13.5min
[CV] END bootstrap=False, max_depth=None, max_features=None, min_samples_leaf=2, min_samples_split=10, n_estimators=300; total time=13.6min
[CV] END bootstrap=False, max_depth=None, max_features=None, min_samples_leaf=2, min_samples_split=10, n_estimators=300; total time=13.6min
[CV] END bootstrap=True, max_depth=10, max_features=log2, min_samples_leaf=2, min_samples_split=2, n_estimators=300; total time= 1.3min

[CV] END bootstrap=True, max_depth=30, max_features=log2, min_samples_leaf=4, min_samples_split=5, n_estimators=100; total time= 1.0min


KeyboardInterrupt: 
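
The log above shows single fits taking over 20 minutes, and the search ends in a `KeyboardInterrupt` before it completes. One common way to get feedback sooner is to tune on a stratified subsample, try fewer candidates, and let the forest build its trees in parallel. The sketch below illustrates the idea on synthetic data; the sample size, the reduced grid, and the fold count are illustrative assumptions, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for X_train / y_train (the real data is far larger).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)

# Tune on a stratified 50% subsample so class proportions are preserved.
X_sub, _, y_sub, _ = train_test_split(X, y, train_size=0.5,
                                      stratify=y, random_state=42)

# Reduced, illustrative search space.
small_grid = {
    'n_estimators': [20, 50],
    'max_depth': [None, 10],
    'min_samples_leaf': [1, 4],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(n_jobs=-1, random_state=42),  # parallel tree building
    param_distributions=small_grid,
    n_iter=4,             # 4 of the 8 possible combinations
    scoring='f1_macro',
    cv=3,
    random_state=42,
)
search.fit(X_sub, y_sub)

print("Best parameters:", search.best_params_)
print("Best CV macro F1:", round(search.best_score_, 3))
```

Once a promising region is found on the subsample, the best few candidates can be refit on the full training set for a final comparison.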

In [None]:
# Print the best hyperparameters found by RandomizedSearchCV (only available after the search above has run to completion)
print("Best parameters found: ", random_search.best_params_)

In [None]:
# Generate predictions for both training and validation datasets using the best model from random search
train_pred_randoms = random_search.predict(X_train)
val_pred_randoms = random_search.predict(X_val)

In [None]:
# Evaluate model performance on training and validation sets 
m.metrics(y_train, train_pred_randoms , y_val, val_pred_randoms)

## 2.2 Combining Models
Combining models, also known as ensemble learning, integrates the predictions of several machine learning models to improve predictive performance. In this section, we build a soft-voting classifier and a stacking classifier from the same base estimators and evaluate them with the same training and validation metrics used for the single models.

<a href="#top">Top &#129033;</a>

In [71]:
# Wrap SGDClassifier in CalibratedClassifierCV so that it exposes predict_proba, which soft voting requires
calibrated_sgd = CalibratedClassifierCV(SGDClassifier())

# Define the base estimators for the ensemble models (commented-out entries are currently excluded)
estimators = [
    # ('sgd', calibrated_sgd),  # SGDClassifier with probability calibration
    ('rf', RandomForestClassifier()),
    ('dt', DecisionTreeClassifier()),
    ('gb', GradientBoostingClassifier()),
    # ('ab', AdaBoostClassifier()),
]

**Voting Classifier**

In [72]:
# Initialize a VotingClassifier with the base estimators and 'soft' voting (predicts the class with the highest average predicted probability across estimators)
voting_clf = VotingClassifier(estimators=estimators, voting='soft')
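
To illustrate what soft voting computes, the sketch below fits a small `VotingClassifier` on synthetic data and reproduces its output by hand: the predicted probabilities of the fitted base models are averaged, and the class with the highest average wins. The data and the two base models are illustrative assumptions; no `weights` are passed, so the average is unweighted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier

# Small synthetic 3-class problem, used only for the demonstration.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

demo_voting = VotingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=20, random_state=0)),
        ('dt', DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    voting='soft',
)
demo_voting.fit(X, y)

# Average the fitted base models' class probabilities by hand.
manual_proba = np.mean(
    [est.predict_proba(X) for est in demo_voting.estimators_], axis=0
)
# Pick the class with the highest average probability.
manual_pred = demo_voting.classes_[np.argmax(manual_proba, axis=1)]

print("Matches predict_proba:", np.allclose(manual_proba, demo_voting.predict_proba(X)))
print("Matches predict:", bool((manual_pred == demo_voting.predict(X)).all()))
```

Because every base model contributes its full probability vector, an overconfident model (such as an unpruned decision tree) can dominate the average.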

In [73]:
# Train the VotingClassifier on the training data (X_train and y_train)
voting_clf.fit(X_train, y_train)

# ~40 min to run

In [74]:
# Make predictions on the training and validation sets using the trained VotingClassifier
train_pred_voting = voting_clf.predict(X_train)  # Predictions for the training set
val_pred_voting = voting_clf.predict(X_val)  # Predictions for the validation set

In [75]:
# Evaluate the performance of the VotingClassifier on both the training and validation sets using the custom metrics function
m.metrics(y_train, train_pred_voting, y_val, val_pred_voting)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8559
           2       1.00      1.00      1.00    229184
           3       1.00      1.00      1.00     54456
           4       1.00      1.00      1.00    117955
           5       1.00      1.00      1.00     38551
           6       1.00      1.00      1.00      3357
           7       1.00      1.00      1.00        78
           8       1.00      1.00      1.00       362

    accuracy                           1.00    452502
   macro avg       1.00      1.00      1.00    452502
weighted avg       1.00      1.00      1.00    452502

______________________________________________________________________
                                VALIDATION                       

In [None]:
import play_song as song
song.play_('audio.mp3')

**Stacking Classifier**

In [None]:
# Initialize a StackingClassifier with a list of base estimators and a LogisticRegression as the meta-model (final estimator)
stacking_clf = StackingClassifier(
    estimators=estimators,  # List of base models
    final_estimator=LogisticRegression()  # Logistic Regression as the meta-model to combine base model predictions
)
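
A stacking classifier trains its final estimator on cross-validated predictions of the base models and then refits each base model on the full training set, so with the default 5-fold setting every base model is fitted six times. That goes a long way toward explaining long run times. The sketch below, on synthetic data with illustrative base models, shows the meta-features the final estimator actually receives.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Small synthetic 3-class problem, used only for the demonstration.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

demo_stack = StackingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=20, random_state=0)),
        ('dt', DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,  # fewer folds than the default of 5, to keep the demo fast
)
demo_stack.fit(X, y)

# Each base model contributes one probability column per class.
meta_features = demo_stack.transform(X)
print("Meta-feature matrix shape:", meta_features.shape)
```

With two base models and three classes the final estimator sees six columns; lowering `cv` is one lever for reducing training time, at the cost of noisier meta-features.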

In [None]:
# Train the StackingClassifier on the training data (X_train and y_train)
stacking_clf.fit(X_train, y_train)

# ~3h to run

In [None]:
# Make predictions on the training and validation sets using the trained StackingClassifier
train_pred_stack = stacking_clf.predict(X_train)  # Predictions for the training set
val_pred_stack = stacking_clf.predict(X_val)  # Predictions for the validation set

In [None]:
# Evaluate the model's performance on both the training and validation sets using the custom metrics function
m.metrics(y_train, train_pred_stack, y_val, val_pred_stack)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.99      0.87      0.93      9981
           2       0.97      1.00      0.98    232862
           3       0.99      0.89      0.93     55125
           4       0.99      1.00      0.99    118806
           5       0.99      0.98      0.99     38624
           6       0.99      0.99      0.99      3369
           7       0.00      0.00      0.00        77
           8       0.98      0.99      0.98       376

    accuracy                           0.98    459220
   macro avg       0.86      0.84      0.85    459220
weighted avg       0.98      0.98      0.98    459220

______________________________________________________________________
                                VALIDATION                       

# 3. Final Predictions
In this section, we generate the final predictions for the test set with the trained voting classifier, map the numeric class codes back to their original labels, and check the distribution of the predicted classes.

<a href="#top">Top &#129033;</a>

In [77]:
# Select only the columns present in the training dataset, in the same order, to ensure feature consistency
# (.copy() avoids a SettingWithCopyWarning when the prediction column is added in the next cell)
test = test[X_train.columns].copy()
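
Selecting `X_train.columns` from the test set raises a `KeyError` if any training column is missing. A quick check beforehand makes the mismatch explicit. The sketch below uses two made-up frames (`X_train_demo`, `test_demo`) as stand-ins for the real data.

```python
import pandas as pd

# Made-up frames standing in for X_train and test.
X_train_demo = pd.DataFrame(
    {'Age at Injury': [30], 'Gender': [1], 'IME-4 Count Log': [0.0]}
)
test_demo = pd.DataFrame(
    {'Gender': [0, 1], 'Extra Column': [5, 6], 'Age at Injury': [41, 52]}
)

# Columns the model expects but the test set lacks, and vice versa.
missing = X_train_demo.columns.difference(test_demo.columns)
extra = test_demo.columns.difference(X_train_demo.columns)
print("Missing from test:", list(missing))
print("Not used by the model:", list(extra))

# reindex keeps the training column order; missing columns are filled with NaN
# instead of raising, so they can be inspected or imputed deliberately.
aligned = test_demo.reindex(columns=X_train_demo.columns)
print(aligned)
```

Whether a missing column should be imputed or treated as an error depends on the preprocessing pipeline, so this check is only a diagnostic.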

In [78]:
# Predict the 'Claim Injury Type' for the test dataset using the trained model and assign the predictions to the corresponding column
test['Claim Injury Type'] = voting_clf.predict(test)

**Map Predictions to Original Values**

In [None]:
# Map numeric codes to descriptive labels for the 'Claim Injury Type' column in the test dataset
label_mapping = {
    1: "1. CANCELLED",
    2: "2. NON-COMP",
    3: "3. MED ONLY",
    4: "4. TEMPORARY",
    5: "5. PPD SCH LOSS",
    6: "6. PPD NSL",
    7: "7. PTD",
    8: "8. DEATH"
}

# Replace numeric values in 'Claim Injury Type' with the corresponding labels from the mapping
test['Claim Injury Type'] = test['Claim Injury Type'].replace(label_mapping)
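
`Series.replace` leaves any value without a mapping entry unchanged, so an unexpected class code would pass through silently. As an alternative sketch, `Series.map` turns unmapped codes into `NaN`, which makes them easy to detect. The shortened mapping and the sample codes below are illustrative.

```python
import pandas as pd

# Shortened version of the mapping above, for illustration only.
label_mapping = {1: "1. CANCELLED", 2: "2. NON-COMP", 3: "3. MED ONLY"}

# Sample predictions; 9 is deliberately not in the mapping.
codes = pd.Series([2, 1, 3, 9], name='Claim Injury Type')

labels = codes.map(label_mapping)          # unmapped codes become NaN
unmapped = codes[labels.isna()].unique()   # codes that had no label

print(labels.tolist())
print("Unmapped codes:", list(unmapped))
```

An empty `unmapped` array confirms that every predicted code was translated before export.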

**Check each category inside the target**

In [None]:
# Count how many test rows were assigned to each predicted 'Claim Injury Type' class
test['Claim Injury Type'].value_counts() 

Claim Injury Type
2. NON-COMP        299163
4. TEMPORARY        46014
3. MED ONLY         29315
1. CANCELLED         6978
5. PPD SCH LOSS      6191
6. PPD NSL            218
8. DEATH               90
7. PTD                  6
Name: count, dtype: int64

# 4. Export
This section exports the final test-set predictions to a CSV file so that they can be submitted and compared across runs. Each file is saved under a descriptive name that records the model and feature set used, which makes it easy to match a submission to its Kaggle score in the tables below.

<a href="#top">Top &#129033;</a>

**Select Columns for predictions**

In [81]:
# Extract the predicted 'Claim Injury Type' column from the test dataset
predictions = test['Claim Injury Type']

**Export**

In [82]:
# Assign a descriptive name for easy reference
name = 'voting_clf_all_scaled_3'

In [83]:
# Save the predictions to a CSV file.
predictions.to_csv(f'./predictions/{name}.csv')
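
`to_csv` does not create missing folders, so the export fails if `./predictions/` does not exist yet. The sketch below creates the folder first and reads the file back to confirm its layout. It writes to a temporary directory, and the identifiers, the file name, and the assumption that the index is named `Claim Identifier` are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up predictions indexed by claim identifier.
predictions_demo = pd.Series(
    ["2. NON-COMP", "4. TEMPORARY"],
    index=pd.Index([1001, 1002], name='Claim Identifier'),
    name='Claim Injury Type',
)

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / 'predictions'
    out_dir.mkdir(parents=True, exist_ok=True)   # create the folder if needed
    out_path = out_dir / 'demo_submission.csv'

    predictions_demo.to_csv(out_path)

    # Read the file back to confirm the header and index survived the round trip.
    reloaded = pd.read_csv(out_path, index_col='Claim Identifier')

print(reloaded)
```

Checking the reloaded file before submitting catches a missing index column or header early.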

__*<center>Models*__ 
    
| Model | Feature Selection | Parameters | Kaggle Score |
| ----- | ----------------- | ---------- | -------------|
| Voting (sgd_rf_dt_gb_ab)  | 2 | - | 0.37300 |
| Stacking (sgd_rf_dt_gb_ab) | 2 | - | 0.40255 |
| RF (agedrop_ime4drop_birthyear_drop_ime4log)| 3 | - | 0.41072 |
| Voting (sgd_rf_dt_gb_ab),  (agedrop_ime4drop_birthyear_drop_ime4log)| 3 | - | 0.34477 |
| RF (all_scaled_new_encoding_agedrop_ime4drop_ime4log) | 4 | - | 0.39087 |
| RF (all_scaled) | 5 | - | 0.38734 |
| Voting (all_scaled) | 5 | - | 0.37536 |
| RF (all_scaled) | 3 | - | 0.38814 |
| Voting (all_scaled) | 3 | - | 0.31799 |
    
<br><br>
    
    
__*<center>Models K-Fold*__ 

| Model | Feature Selection | Log | Parameters | Kaggle Score | Fold |
| ----- | ------------------ | --- | ---------- | ------------ | ---- |
| LogReg | - | - | -  | 0.21122 | 5 |
| RF | 1 | X | - | 0.29078 | 5 |
| XGB | 1 | X | - | 0.20642 | 10 |
| RF | - | - | - | 0.26616 | 5 |
    
<br><br>
    
__*<center>Models w/ Stratified K-Fold*__   
    
| Model | Feature Selection | Log | Parameters | Kaggle Score | Fold | 
| ----- | ------------------ | --- | ---------- | ------------ | ---- |
| RF | - | - | - | 0.26912 | 10 |
| DT | - | - | - | 0.14236 | 10 |
| DT | - | X | - | 0.15589 | 10 |

<br><br>
    
**Features for Feature Selection 1**

['C-2 Day', 'Accident Year', 'Birth Year', 'Assembly Month',
            'C-2 Month', 'Average Weekly Wage', 'Age at Injury', 
            'C-2 Year', 'Number of Dependents', 'Accident Day', 
            'Assembly Year', 'First Hearing Year', 'IME-4 Count', 
            'Assembly Day', 'Accident Month', 
            'WCIO Cause of Injury Code', 'Gender', 
            'COVID-19 Indicator', 'WCIO Part Of Body Code', 
            'County of Injury', 'Attorney/Representative', 
            'Carrier Type', 'District Name', 'Medical Fee Region', 
            'Zip Code', 'Carrier Name', 'C-3 Date Binary', 
            'Alternative Dispute Resolution', 
            'WCIO Nature of Injury Code', 'Industry Code']
    
    
**Features for Feature Selection 2**    
    
['Age at Injury',
 'Average Weekly Wage',
 'Assembly Year',
 'C-2 Month',
 'C-2 Year',
 'First Hearing Year',
 'IME-4 Count Log',
 'Attorney/Representative',
 'Carrier Name',
 'Carrier Name Log',
 'Carrier Type',
 'County of Injury',
 'District Name',
 'Gender',
 'Industry Code',
 'Medical Fee Region',
 'WCIO Cause of Injury Code',
 'WCIO Nature of Injury Code',
 'WCIO Part Of Body Code',
 'C-3 Date Binary']

    
**Features for Feature Selection 3**
    
['Age at Injury',
 'Average Weekly Wage',
 'Assembly Year',
 'C-2 Month',
 'C-2 Year',
 'First Hearing Year',
 'IME-4 Count Log',
 'Attorney/Representative',
 'Carrier Name',
 'Carrier Type',
 'County of Injury',
 'District Name',
 'Gender',
 'Industry Code',
 'Medical Fee Region',
 'WCIO Cause of Injury Code',
 'WCIO Nature of Injury Code',
 'WCIO Part Of Body Code',
 'C-3 Date Binary']
    
    
**Features for Feature Selection 4**
    
['Age at Injury',
 'Average Weekly Wage',
 'Assembly Year',
 'C-2 Month',
 'C-2 Year',
 'First Hearing Year',
 'IME-4 Count Log',
 'Attorney/Representative',
 'Carrier Name',
 'County of Injury',
 'District Name',
 'Industry Code',
 'Medical Fee Region',
 'WCIO Cause of Injury Code',
 'WCIO Nature of Injury Code',
 'WCIO Part Of Body Code',
 'C-3 Date Binary',
 'Carrier Type_1A. PRIVATE',
 'Carrier Type_2A. SIF',
 'Carrier Type_3A. SELF PUBLIC',
 'Carrier Type_4A. SELF PRIVATE',
 'Carrier Type_5. SPECIAL FUND',
 'Gender_F',
 'Gender_M',
 'Gender_U',
 'Alternative Dispute Resolution',
 'Claim Injury Type']
    
    
**Features for Feature Selection 5**   
    
['Accident Year',
 'Average Weekly Wage',
 'First Hearing Year',
 'C-2 Year',
 'C-2 Month',
 'Birth Year',
 'Assembly Year',
 'IME-4 Count',
 'Assembly Day',
 'Attorney/Representative',
 'C-3 Date Binary',
 'WCIO Nature of Injury Code',
 'Carrier Name',
 'County of Injury',
 'Industry Code',
 'Medical Fee Region',
 'WCIO Cause of Injury Code',
 'WCIO Part Of Body Code']