# Table of Contents

<a class="anchor" id="top"></a>


1. [Importing Libraries & Data](#1.-Importing-Libraries-&-Data) <br><br>

2. [Modelling](#2.-Modelling)

   2.1 [Hyperparameter Tuning](#2.1-Hyperparameter-Tuning) <br>
   
   2.2 [Combining Models](#2.2-Combining-Models) <br><br>
   
3. [Final Predictions](#3.-Final-Predictions) <br><br>

4. [Export](#4.-Export) <br>


# 1. Importing Libraries & Data

In [36]:
# Libraries for Data Manipulation
import numpy as np  # used to build the hyperparameter ranges in section 2.1
import pandas as pd

# Importing classification models
from sklearn.linear_model import LogisticRegression, SGDClassifier  # Linear models
from sklearn.tree import DecisionTreeClassifier  
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier, StackingClassifier, VotingClassifier  # Ensemble classifiers
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import SVC 
from sklearn.model_selection import RandomizedSearchCV 
from sklearn.gaussian_process import GaussianProcessClassifier  
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis  # Discriminant analysis models
from xgboost import XGBClassifier  
from lightgbm import LGBMClassifier  
# from catboost import CatBoostClassifier

# Importing custom metrics module
import metrics as m

# Display Settings
pd.set_option('display.max_columns', None)

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings("ignore")

**Import Data**

In [33]:
# Load training and validation datasets for features and target variables

# Features for training and validation
X_train = pd.read_csv('./project_data/x_train_all_feat.csv', index_col='Claim Identifier')
X_val = pd.read_csv('./project_data/x_val_all_feat.csv', index_col='Claim Identifier')

# Target variables for training and validation
y_train = pd.read_csv('./project_data/y_train.csv', index_col='Claim Identifier')
y_val = pd.read_csv('./project_data/y_val.csv', index_col='Claim Identifier')

In [49]:
# Load the test dataset with the 'Claim Identifier' as the index
test = pd.read_csv('./project_data/test_all_feat.csv', index_col='Claim Identifier')

In [38]:
features = ['Age at Injury', 'Average Weekly Wage', 'Assembly Year', 'C-2 Month', 'C-2 Year', 
            'First Hearing Year', 'IME-4 Count Log', 'Attorney/Representative', 'Carrier Name', 
            'Carrier Type freq', 'County of Injury', 'District Name freq', 'Gender Enc', 'Industry Code', 
            'Medical Fee Region', 'WCIO Cause of Injury Code', 'WCIO Nature of Injury Code', 
            'WCIO Part Of Body Code', 'C-3 Date Binary']

X_train = X_train[features]
X_val = X_val[features]

# 2. Modelling

<a href="#top">Top &#129033;</a>

In [5]:
# Print the list of column names in the training features dataset
print(X_train.columns.to_list(), '\n')

# Print the number of columns in the training features dataset
print(len(X_train.columns))

['Accident Year', 'Average Weekly Wage', 'First Hearing Year', 'C-2 Year', 'C-2 Month', 'Birth Year', 'Assembly Year', 'IME-4 Count', 'Assembly Day', 'Attorney/Representative', 'C-3 Date Binary', 'WCIO Nature of Injury Code', 'Carrier Name', 'County of Injury', 'Industry Code', 'Medical Fee Region', 'WCIO Cause of Injury Code', 'WCIO Part Of Body Code'] 

18


**Baseline Model**

In [None]:
# Initialize the Logistic Regression model and fit it to the training data
model = LogisticRegression()
model.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset
train_pred = model.predict(X_train)

# Predict the target values for the validation dataset
val_pred = model.predict(X_val)

In [None]:
# Evaluate the model's performance using the custom metrics function on both training and validation predictions
m.metrics(y_train, train_pred, y_val, val_pred)

              precision    recall  f1-score   support

           1       0.00      0.00      0.00      9981
           2       0.72      0.95      0.82    232862
           3       0.21      0.01      0.02     55125
           4       0.53      0.67      0.59    118806
           5       0.22      0.00      0.01     38624
           6       0.00      0.00      0.00      3369
           7       0.00      0.00      0.00        77
           8       0.00      0.00      0.00       376

    accuracy                           0.66    459220
   macro avg       0.21      0.20      0.18    459220
weighted avg       0.55      0.66      0.57    459220

              precision    recall  f1-score   support

           1       0.00      0.00      0.00      2495
           2       0.72      0.95      0.82     58216
           3       0.21      0.01      0.02     13781
           4       0.53      0.66      0.59     29701
           5       0.17      0.00      0.01      9656
           6       0.00 

**SGD Classifier**

In [None]:
# Initialize the SGDClassifier model and fit it to the training data
sgd = SGDClassifier()
sgd.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the SGDClassifier model
train_pred_sgd = sgd.predict(X_train)

# Predict the target values for the validation dataset using the SGDClassifier model
val_pred_sgd = sgd.predict(X_val)

In [None]:
# Evaluate the performance of the SGDClassifier model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_sgd, y_val, val_pred_sgd)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.00      0.00      0.00      9981
           2       0.73      0.99      0.84    232862
           3       0.12      0.00      0.00     55125
           4       0.59      0.73      0.65    118806
           5       0.22      0.00      0.00     38624
           6       0.00      0.00      0.00      3369
           7       0.00      0.00      0.00        77
           8       0.00      0.00      0.00       376

    accuracy                           0.69    459220
   macro avg       0.21      0.21      0.19    459220
weighted avg       0.56      0.69      0.60    459220

______________________________________________________________________
                                VALIDATION                       

**Decision Tree Classifier**

In [6]:
# Initialize the DecisionTreeClassifier model and fit it to the training data
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

In [7]:
# Predict the target values for the training dataset using the DecisionTreeClassifier model
train_pred_dt = dt.predict(X_train)

# Predict the target values for the validation dataset using the DecisionTreeClassifier model
val_pred_dt = dt.predict(X_val)

In [8]:
# Evaluate the performance of the DecisionTreeClassifier model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_dt, y_val, val_pred_dt)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      1.00      1.00      8559
           2       1.00      1.00      1.00    229184
           3       1.00      1.00      1.00     54456
           4       1.00      1.00      1.00    117955
           5       1.00      1.00      1.00     38551
           6       1.00      1.00      1.00      3357
           7       1.00      1.00      1.00        78
           8       1.00      1.00      1.00       362

    accuracy                           1.00    452502
   macro avg       1.00      1.00      1.00    452502
weighted avg       1.00      1.00      1.00    452502

______________________________________________________________________
                                VALIDATION                       

**Random Forest Classifier**

In [39]:
# Initialize the RandomForestClassifier model and fit it to the training data
rf = RandomForestClassifier()
rf.fit(X_train, y_train)

In [40]:
# Predict the target values for the training dataset using the RandomForestClassifier model
train_pred_rf = rf.predict(X_train)

# Predict the target values for the validation dataset using the RandomForestClassifier model
val_pred_rf = rf.predict(X_val)

In [41]:
# Evaluate the performance of the RandomForestClassifier model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_rf, y_val, val_pred_rf)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      0.99      1.00      8559
           2       1.00      1.00      1.00    229184
           3       1.00      1.00      1.00     54456
           4       1.00      1.00      1.00    117955
           5       1.00      1.00      1.00     38551
           6       1.00      1.00      1.00      3357
           7       1.00      1.00      1.00        78
           8       1.00      1.00      1.00       362

    accuracy                           1.00    452502
   macro avg       1.00      1.00      1.00    452502
weighted avg       1.00      1.00      1.00    452502

______________________________________________________________________
                                VALIDATION                       

**Gradient Boosting Classifier**

In [None]:
# Initialize the GradientBoostingClassifier model and fit it to the training data
gb = GradientBoostingClassifier()
gb.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the GradientBoostingClassifier model
train_pred_gb = gb.predict(X_train)

# Predict the target values for the validation dataset using the GradientBoostingClassifier model
val_pred_gb = gb.predict(X_val)

In [None]:
# Evaluate the performance of the GradientBoostingClassifier model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_gb, y_val, val_pred_gb)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.72      0.47      0.57      9981
           2       0.85      0.96      0.90    232862
           3       0.50      0.06      0.11     55125
           4       0.69      0.88      0.77    118806
           5       0.70      0.56      0.62     38624
           6       0.67      0.00      0.01      3369
           7       0.77      0.22      0.34        77
           8       0.23      0.09      0.13       376

    accuracy                           0.78    459220
   macro avg       0.64      0.40      0.43    459220
weighted avg       0.75      0.78      0.74    459220

______________________________________________________________________
                                VALIDATION                       

**AdaBoost Classifier**

In [None]:
# Initialize the AdaBoostClassifier model and fit it to the training data
ab = AdaBoostClassifier()
ab.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the AdaBoostClassifier model
train_pred_ab = ab.predict(X_train)

# Predict the target values for the validation dataset using the AdaBoostClassifier model
val_pred_ab = ab.predict(X_val)

In [None]:
# Evaluate the performance of the AdaBoostClassifier model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_ab, y_val, val_pred_ab)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.19      0.57      0.29      9981
           2       0.84      0.88      0.86    232862
           3       0.50      0.02      0.04     55125
           4       0.64      0.86      0.74    118806
           5       0.56      0.35      0.43     38624
           6       0.00      0.00      0.00      3369
           7       0.00      0.00      0.00        77
           8       0.12      0.63      0.21       376

    accuracy                           0.71    459220
   macro avg       0.36      0.41      0.32    459220
weighted avg       0.71      0.71      0.67    459220

______________________________________________________________________
                                VALIDATION                       

**SVC**

In [None]:
# Initialize the Support Vector Classifier (SVC) model and fit it to the training data
svc = SVC()
svc.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the Support Vector Classifier (SVC) model
train_pred_svc = svc.predict(X_train)

# Predict the target values for the validation dataset using the Support Vector Classifier (SVC) model
val_pred_svc = svc.predict(X_val)

In [None]:
# Evaluate the performance of the Support Vector Classifier (SVC) model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_svc, y_val, val_pred_svc)

**Gaussian Process Classifier**

In [None]:
## kills the kernel

In [None]:
# gau_p = GaussianProcessClassifier()
# gau_p.fit(X_train, y_train)

In [None]:
# train_pred_gau_p = gau_p.predict(X_train)
# val_pred_gau_p = gau_p.predict(X_val)

In [None]:
# m.metrics(y_train, train_pred_gau_p, y_val, val_pred_gau_p)

**Linear Discriminant Analysis**

In [None]:
# Initialize the Linear Discriminant Analysis (LDA) model and fit it to the training data
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the Linear Discriminant Analysis (LDA) model
train_pred_lda = lda.predict(X_train)

# Predict the target values for the validation dataset using the Linear Discriminant Analysis (LDA) model
val_pred_lda = lda.predict(X_val)

In [None]:
# Evaluate the performance of the Linear Discriminant Analysis (LDA) model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_lda, y_val, val_pred_lda)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.41      0.39      0.40      9981
           2       0.69      0.93      0.79    232862
           3       0.29      0.02      0.03     55125
           4       0.55      0.44      0.49    118806
           5       0.48      0.38      0.43     38624
           6       0.08      0.15      0.11      3369
           7       0.00      0.00      0.00        77
           8       0.09      0.59      0.15       376

    accuracy                           0.63    459220
   macro avg       0.32      0.36      0.30    459220
weighted avg       0.58      0.63      0.58    459220

______________________________________________________________________
                                VALIDATION                       

**Quadratic Discriminant Analysis**

In [None]:
# Initialize the Quadratic Discriminant Analysis (QDA) model and fit it to the training data
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train, y_train)

In [None]:
# Predict the target values for the training dataset using the Quadratic Discriminant Analysis (QDA) model
train_pred_qda = qda.predict(X_train)

# Predict the target values for the validation dataset using the Quadratic Discriminant Analysis (QDA) model
val_pred_qda = qda.predict(X_val)

In [None]:
# Evaluate the performance of the Quadratic Discriminant Analysis (QDA) model using the custom metrics function for both training and validation predictions
m.metrics(y_train, train_pred_qda, y_val, val_pred_qda)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.21      0.61      0.31      9981
           2       0.73      0.88      0.80    232862
           3       0.23      0.08      0.12     55125
           4       0.56      0.05      0.10    118806
           5       0.53      0.35      0.42     38624
           6       0.04      0.40      0.07      3369
           7       0.00      0.92      0.00        77
           8       0.02      0.94      0.03       376

    accuracy                           0.52    459220
   macro avg       0.29      0.53      0.23    459220
weighted avg       0.59      0.52      0.49    459220

______________________________________________________________________
                                VALIDATION                       

**XGB Classifier**

In [13]:
# xgb = XGBClassifier()
# xgb.fit(X_train, y_train)

In [None]:
# train_pred_xgb = xgb.predict(X_train)
# val_pred_xgb = xgb.predict(X_val)

In [None]:
# m.metrics(y_train, train_pred_xgb , y_val, val_pred_xgb)

**LGBM Classifier**

In [None]:
## kills the kernel

In [None]:
# lgbm = LGBMClassifier()
# lgbm.fit(X_train, y_train)

In [None]:
# train_pred_lgbm = lgbm.predict(X_train)
# val_pred_lgbm = lgbm.predict(X_val)

In [None]:
# m.metrics(y_train, train_pred_lgbm , y_val, val_pred_lgbm)

## 2.1 Hyperparameter Tuning

<a href="#top">Top &#129033;</a>

In [None]:
param_dist = {
    'n_estimators': np.arange(50, 501, 50),          
    'max_depth': [None] + list(np.arange(10, 111, 10)),  
    'min_samples_split': [2, 5, 10, 20],             
    'min_samples_leaf': [1, 2, 4, 10],              
    'max_features': ['sqrt', 'log2', None],        
    'bootstrap': [True, False]                      
}


### structure:
# {'parameter_name': values_for_the_parameter}

In [None]:
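# Estimator to tune. RandomForestClassifier is an assumption here: the parameter
# names in param_dist match it. Swap in whichever model you want to use.
model = RandomForestClassifier()

random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=100,               # number of sampled combinations (higher means longer runtime)
    scoring='f1_macro',
    cv=5,                     # number of folds
    verbose=2,
    random_state=42
)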
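
To get a feel for the cost of this search before launching it, the sketch below counts how many combinations the full grid would contain and how many model fits the randomized search performs. It rewrites `param_dist` with plain Python ranges so that it runs without NumPy, and it assumes scikit-learn's default `refit=True` (one extra fit on the full training set at the end).

```python
from math import prod

# Candidate values per hyperparameter, mirroring param_dist above
# (plain lists instead of np.arange so the sketch has no dependencies).
param_dist = {
    "n_estimators": list(range(50, 501, 50)),
    "max_depth": [None] + list(range(10, 111, 10)),
    "min_samples_split": [2, 5, 10, 20],
    "min_samples_leaf": [1, 2, 4, 10],
    "max_features": ["sqrt", "log2", None],
    "bootstrap": [True, False],
}

n_iter = 100  # sampled combinations, as in the search above
cv = 5        # folds per sampled combination

# Size of the exhaustive grid that RandomizedSearchCV samples from
grid_size = prod(len(values) for values in param_dist.values())

# One fit per (combination, fold) plus the final refit on all training data
total_fits = n_iter * cv + 1

print(f"full grid:  {grid_size} combinations")
print(f"sampled:    {n_iter} ({n_iter / grid_size:.2%} of the grid)")
print(f"model fits: {total_fits}")
```

With roughly 450k training rows, each of those fits is a full forest, so lowering `n_iter` or `cv` for a first pass is a reasonable way to keep the runtime manageable.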

In [None]:
random_search.fit(X_train, y_train)

In [None]:
print("Best parameters found: ", random_search.best_params_)

In [None]:
train_pred_randoms = random_search.predict(X_train)
val_pred_randoms = random_search.predict(X_val)

In [None]:
m.metrics(y_train, train_pred_randoms , y_val, val_pred_randoms)
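
The search above is scored with `f1_macro`, the unweighted mean of the per-class F1 scores, so a rare class such as 7 or 8 weighs as much as class 2. The sketch below is a plain-Python illustration on made-up toy labels (not the project data) of why accuracy can look acceptable while macro F1 stays low. The helper names `per_class_f1` and `macro_f1` are hypothetical and are not part of the custom `metrics` module.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels (made up): class 2 dominates, class 7 is rare and never predicted.
y_true = [2, 2, 2, 2, 2, 2, 4, 4, 4, 7]
y_pred = [2, 2, 2, 2, 2, 2, 4, 4, 2, 2]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("per-class F1:", per_class_f1(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 4))
```

The missed rare class contributes an F1 of 0 to the average, which pulls the macro score well below the accuracy.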

## 2.2 Combining Models

<a href="#top">Top &#129033;</a>

In [20]:
# Initialize a CalibratedClassifierCV with SGDClassifier
calibrated_sgd = CalibratedClassifierCV(SGDClassifier())

# Define a list of estimators for the ensemble model 
estimators = [
    ('sgd', calibrated_sgd),  # SGDClassifier with probability calibration
    ('rf', RandomForestClassifier()),  
    ('dt', DecisionTreeClassifier()),  
    ('gb', GradientBoostingClassifier()),  
    ('ab', AdaBoostClassifier()),  
]

**Voting Classifier**

In [21]:
# Initialize a VotingClassifier with a list of base estimators and 'soft' voting (weighted average of predicted probabilities)
voting_clf = VotingClassifier(estimators=estimators, voting='soft')
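
With `voting='soft'`, every base estimator must expose `predict_proba`, which is presumably why `SGDClassifier` is wrapped in `CalibratedClassifierCV` above. The sketch below illustrates the idea for a single sample in plain Python: soft voting averages the class probabilities, while hard voting counts each model's top class. The probabilities are made up and `soft_vote` and `hard_vote` are hypothetical helpers, not scikit-learn functions.

```python
from collections import Counter


def soft_vote(probas, classes, weights=None):
    """Average per-model class probabilities; return (winning class, averages)."""
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    avg = [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(len(classes))
    ]
    best = max(range(len(classes)), key=avg.__getitem__)
    return classes[best], avg


def hard_vote(probas, classes):
    """Majority vote over each model's most probable class."""
    votes = [classes[max(range(len(classes)), key=p.__getitem__)] for p in probas]
    return Counter(votes).most_common(1)[0][0]


classes = [2, 3, 4]
# Made-up probabilities from three hypothetical models for one claim
probas = [
    [0.90, 0.05, 0.05],  # very confident in class 2
    [0.40, 0.10, 0.50],  # slightly prefers class 4
    [0.40, 0.10, 0.50],  # slightly prefers class 4
]

soft_label, avg = soft_vote(probas, classes)
hard_label = hard_vote(probas, classes)
print("soft vote:", soft_label, [round(a, 3) for a in avg])
print("hard vote:", hard_label)
```

In this toy case the two schemes disagree: one confident model outweighs two lukewarm ones under soft voting, whereas hard voting follows the majority.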

In [22]:
# Train the VotingClassifier on the training data (X_train and y_train)
voting_clf.fit(X_train, y_train)

# ~40 min to run

In [23]:
# Make predictions on the training and validation sets using the trained VotingClassifier
train_pred_voting = voting_clf.predict(X_train)  # Predictions for the training set
val_pred_voting = voting_clf.predict(X_val)  # Predictions for the validation set

In [24]:
# Evaluate the performance of the VotingClassifier on both the training and validation sets using the custom metrics function
m.metrics(y_train, train_pred_voting, y_val, val_pred_voting)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       1.00      0.88      0.94      8559
           2       0.99      1.00      1.00    229184
           3       1.00      0.98      0.99     54456
           4       1.00      1.00      1.00    117955
           5       1.00      1.00      1.00     38551
           6       1.00      1.00      1.00      3357
           7       1.00      1.00      1.00        78
           8       1.00      0.99      1.00       362

    accuracy                           1.00    452502
   macro avg       1.00      0.98      0.99    452502
weighted avg       1.00      1.00      0.99    452502

______________________________________________________________________
                                VALIDATION                       

In [25]:
import play_song as song
song.play_('audio.mp3')





**Stacking Classifier**

In [None]:
# Initialize a StackingClassifier with a list of base estimators and a LogisticRegression as the meta-model (final estimator)
stacking_clf = StackingClassifier(
    estimators=estimators,  # List of base models
    final_estimator=LogisticRegression()  # Logistic Regression as the meta-model to combine base model predictions
)

In [None]:
# Train the StackingClassifier on the training data (X_train and y_train)
stacking_clf.fit(X_train, y_train)

# ~3h to run

In [None]:
# Make predictions on the training and validation sets using the trained StackingClassifier
train_pred_stack = stacking_clf.predict(X_train)  # Predictions for the training set
val_pred_stack = stacking_clf.predict(X_val)  # Predictions for the validation set

In [None]:
# Evaluate the model's performance on both the training and validation sets using the custom metrics function
m.metrics(y_train, train_pred_stack, y_val, val_pred_stack)

______________________________________________________________________
                                TRAIN                                 
----------------------------------------------------------------------
              precision    recall  f1-score   support

           1       0.99      0.87      0.93      9981
           2       0.97      1.00      0.98    232862
           3       0.99      0.89      0.93     55125
           4       0.99      1.00      0.99    118806
           5       0.99      0.98      0.99     38624
           6       0.99      0.99      0.99      3369
           7       0.00      0.00      0.00        77
           8       0.98      0.99      0.98       376

    accuracy                           0.98    459220
   macro avg       0.86      0.84      0.85    459220
weighted avg       0.98      0.98      0.98    459220

______________________________________________________________________
                                VALIDATION                       

# 3. Final Predictions

<a href="#top">Top &#129033;</a>

In [50]:
# Select only the columns from the test dataset that are present in the training dataset to ensure feature consistency
test = test[X_train.columns]

In [51]:
# Predict the 'Claim Injury Type' for the test dataset using the trained model and assign the predictions to the corresponding column
test['Claim Injury Type'] = rf.predict(test)

**Map Predictions to Original Values**

In [52]:
label_mapping = {
    1: "1. CANCELLED",
    2: "2. NON-COMP",
    3: "3. MED ONLY",
    4: "4. TEMPORARY",
    5: "5. PPD SCH LOSS",
    6: "6. PPD NSL",
    7: "7. PTD",
    8: "8. DEATH"
}

test['Claim Injury Type'] = test['Claim Injury Type'].replace(label_mapping)

**Check each category inside the target**

In [53]:
test['Claim Injury Type'].value_counts() 

Claim Injury Type
2. NON-COMP        300797
4. TEMPORARY        82581
3. MED ONLY          3163
1. CANCELLED         1433
5. PPD SCH LOSS         1
Name: count, dtype: int64
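
The counts above contain only five of the eight target classes. A small check like the sketch below makes that explicit before exporting. It uses only the standard library, copies the counts from the output above, and the helper name `missing_classes` is hypothetical.

```python
# Mapping from encoded labels to the original class names (as defined above)
label_mapping = {
    1: "1. CANCELLED",
    2: "2. NON-COMP",
    3: "3. MED ONLY",
    4: "4. TEMPORARY",
    5: "5. PPD SCH LOSS",
    6: "6. PPD NSL",
    7: "7. PTD",
    8: "8. DEATH",
}

# Predicted class counts, copied from the value_counts() output above
observed = {
    "2. NON-COMP": 300797,
    "4. TEMPORARY": 82581,
    "3. MED ONLY": 3163,
    "1. CANCELLED": 1433,
    "5. PPD SCH LOSS": 1,
}


def missing_classes(counts, expected_labels):
    """Return the expected labels that were never predicted."""
    return [label for label in expected_labels if counts.get(label, 0) == 0]


total = sum(observed.values())
never_predicted = missing_classes(observed, label_mapping.values())

print("rows predicted:", total)
print("never predicted:", never_predicted)
print(f"share of '2. NON-COMP': {observed['2. NON-COMP'] / total:.1%}")
```

Classes that never appear in the predictions score an F1 of 0 on any data that contains them, so they cap the achievable macro F1.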

# 4. Export

<a href="#top">Top &#129033;</a>

**Select Columns for predictions**

In [54]:
# Extract the target variable 'Claim Injury Type' from the test dataset for prediction
predictions = test['Claim Injury Type']

**Export**

In [31]:
# Assign a descriptive name for easy reference
name = 'voting_clf_all_scaled'

In [32]:
# Save the predictions to a CSV file.
predictions.to_csv(f'./predictions/{name}.csv')

__*<center>Models*__ 
    
| Model | Feature Selection | Parameters | Kaggle Score |
| ----- | ----------------- | ---------- | -------------|
| Voting (sgd_rf_dt_gb_ab)  | 2 | - | 0.37300 |
| Stacking (sgd_rf_dt_gb_ab) | 2 | - | 0.40255 |
| RF (agedrop_ime4drop_birthyear_drop_ime4log)| 3 | - | 0.41072 |
| Voting (sgd_rf_dt_gb_ab),  (agedrop_ime4drop_birthyear_drop_ime4log)| 3 | - | 0.34477 |
| RF (all_scaled_new_encoding_agedrop_ime4drop_ime4log) | 4 | - | 0.39087 |
| RF (all_scaled) | 5 | - | 0.38734 |
| Voting (all_scaled) | 5 | - | 0.37536 |
    
<br><br>
    
    
__*<center>Models K-Fold*__ 

| Model | Feature Selection | Log | Parameters | Kaggle Score | Fold |
| ----- | ------------------ | --- | ---------- | ------------ | ---- |
| LogReg | - | - | -  | 0.21122 | 5 |
| RF | 1 | X | - | 0.29078 | 5 |
| XGB | 1 | X | - | 0.20642 | 10 |
| RF | - | - | - | 0.26616 | 5 |
    
<br><br>
    
__*<center>Models w/ Stratified K-Fold*__   
    
| Model | Feature Selection | Log | Parameters | Kaggle Score | Fold | 
| ----- | ------------------ | --- | ---------- | ------------ | ---- |
| RF | - | - | - | 0.26912 | 10 |
| DT | - | - | - | 0.14236 | 10 |
| DT | - | X | - | 0.15589 | 10 |

<br><br>
    
**Features for Feature Selection 1**

['C-2 Day', 'Accident Year', 'Birth Year', 'Assembly Month',
            'C-2 Month', 'Average Weekly Wage', 'Age at Injury', 
            'C-2 Year', 'Number of Dependents', 'Accident Day', 
            'Assembly Year', 'First Hearing Year', 'IME-4 Count', 
            'Assembly Day', 'Accident Month', 
            'WCIO Cause of Injury Code', 'Gender', 
            'COVID-19 Indicator', 'WCIO Part Of Body Code', 
            'County of Injury', 'Attorney/Representative', 
            'Carrier Type', 'District Name', 'Medical Fee Region', 
            'Zip Code', 'Carrier Name', 'C-3 Date Binary', 
            'Alternative Dispute Resolution', 
            'WCIO Nature of Injury Code', 'Industry Code']
    
    
**Features for Feature Selection 2**    
    
['Age at Injury',
 'Average Weekly Wage',
 'Assembly Year',
 'C-2 Month',
 'C-2 Year',
 'First Hearing Year',
 'IME-4 Count Log',
 'Attorney/Representative',
 'Carrier Name',
 'Carrier Name Log',
 'Carrier Type',
 'County of Injury',
 'District Name',
 'Gender',
 'Industry Code',
 'Medical Fee Region',
 'WCIO Cause of Injury Code',
 'WCIO Nature of Injury Code',
 'WCIO Part Of Body Code',
 'C-3 Date Binary']

    
 **Features for Feature Selection 3**       
    
['Age at Injury',
 'Average Weekly Wage',
 'Assembly Year',
 'C-2 Month',
 'C-2 Year',
 'First Hearing Year',
 'IME-4 Count Log',
 'Attorney/Representative',
 'Carrier Name',
 'Carrier Type',
 'County of Injury',
 'District Name',
 'Gender',
 'Industry Code',
 'Medical Fee Region',
 'WCIO Cause of Injury Code',
 'WCIO Nature of Injury Code',
 'WCIO Part Of Body Code',
 'C-3 Date Binary']
    
    
 **Features for Feature Selection 4**   
    
['Age at Injury',
 'Average Weekly Wage',
 'Assembly Year',
 'C-2 Month',
 'C-2 Year',
 'First Hearing Year',
 'IME-4 Count Log',
 'Attorney/Representative',
 'Carrier Name',
 'County of Injury',
 'District Name',
 'Industry Code',
 'Medical Fee Region',
 'WCIO Cause of Injury Code',
 'WCIO Nature of Injury Code',
 'WCIO Part Of Body Code',
 'C-3 Date Binary',
 'Carrier Type_1A. PRIVATE',
 'Carrier Type_2A. SIF',
 'Carrier Type_3A. SELF PUBLIC',
 'Carrier Type_4A. SELF PRIVATE',
 'Carrier Type_5. SPECIAL FUND',
 'Gender_F',
 'Gender_M',
 'Gender_U',
 'Alternative Dispute Resolution',
 'Claim Injury Type']
    
    
**Features for Feature Selection 5**   
    
['Accident Year',
 'Average Weekly Wage',
 'First Hearing Year',
 'C-2 Year',
 'C-2 Month',
 'Birth Year',
 'Assembly Year',
 'IME-4 Count',
 'Assembly Day',
 'Attorney/Representative',
 'C-3 Date Binary',
 'WCIO Nature of Injury Code',
 'Carrier Name',
 'County of Injury',
 'Industry Code',
 'Medical Fee Region',
 'WCIO Cause of Injury Code',
 'WCIO Part Of Body Code']