# SIIM-ISIC: A Beginner's Ensemble Approach

> This notebook presents a simple approach to the SIIM-ISIC Melanoma Classification problem. <br>
> It blends my main model submission with a simple XGBoost model trained on patient metadata to improve the final blended submission.<br>
> NOTE: This strategy scored nearly **94.68%** on the leaderboard, which gave me a good initial push in the competition.

In [None]:
import numpy as np
import pandas as pd

import xgboost as xgb

In [None]:
train = pd.read_csv('../input/siim-isic-melanoma-classification/train.csv')
test = pd.read_csv('../input/siim-isic-melanoma-classification/test.csv')

In [None]:
train.head()

In [None]:
train.target.value_counts()

> The data is highly imbalanced: the vast majority of samples belong to target class 0 (benign).
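
To put a number on the imbalance, a minimal sketch follows. It uses a small hypothetical `target` Series as a stand-in for `train.target`, so the printed values are illustrative only. The negative-to-positive ratio it computes is the kind of value usually passed to XGBoost as `scale_pos_weight`.

```python
import pandas as pd

# Hypothetical stand-in for train['target']: 8 benign (0) and 2 malignant (1) rows.
target = pd.Series([0, 0, 0, 1, 0, 0, 1, 0, 0, 0], name="target")

counts = target.value_counts()
n_negative = int(counts.get(0, 0))
n_positive = int(counts.get(1, 0))

# Share of positive samples and the negative/positive ratio.
positive_rate = n_positive / len(target)
neg_pos_ratio = n_negative / n_positive

print(f"positive rate: {positive_rate:.2%}")
print(f"negative/positive ratio: {neg_pos_ratio:.2f}")
```

Running the same calculation on `train.target` should give the class counts that appear in the model's `scale_pos_weight` argument further down.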

### Handling missing values in Train and Test Datasets 

In [None]:
train['sex'] = train['sex'].fillna('na')
train['age_approx'] = train['age_approx'].fillna(0)
train['anatom_site_general_challenge'] = train['anatom_site_general_challenge'].fillna('na')

In [None]:
test['sex'] = test['sex'].fillna('na')
test['age_approx'] = test['age_approx'].fillna(0)
test['anatom_site_general_challenge'] = test['anatom_site_general_challenge'].fillna('na')
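
Since the same fill rules are applied to both frames, they could also be kept in one place. The sketch below is one possible way to do that, assuming the fill values from the cells above; the helper `fill_missing` and the toy frame are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Fill values taken from the cells above.
FILL_VALUES = {
    "sex": "na",
    "age_approx": 0,
    "anatom_site_general_challenge": "na",
}

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the metadata columns filled (hypothetical helper)."""
    return df.fillna(value=FILL_VALUES)

# Hypothetical toy frame with one missing value per column.
toy = pd.DataFrame({
    "sex": ["male", np.nan],
    "age_approx": [45.0, np.nan],
    "anatom_site_general_challenge": [np.nan, "torso"],
})

filled = fill_missing(toy)
print(filled)
```

With this helper, `train = fill_missing(train)` and `test = fill_missing(test)` would apply identical rules to both datasets.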

### Feature Engineering

In [None]:
train['sex'] = train['sex'].astype("category").cat.codes +1
train['anatom_site_general_challenge'] = train['anatom_site_general_challenge'].astype("category").cat.codes +1
train.head()

In [None]:
test['sex'] = test['sex'].astype("category").cat.codes +1
test['anatom_site_general_challenge'] = test['anatom_site_general_challenge'].astype("category").cat.codes +1
test.head()
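
One caveat with encoding train and test separately: `cat.codes` numbers the categories found in each frame on its own, so the same label can receive different codes if one frame lacks a category. The sketch below shows one way to guard against this by building a shared category list first. The toy frames and the `site` column are hypothetical; this is an illustration, not the encoding used for the submission above.

```python
import pandas as pd

# Hypothetical toy frames: the test frame has no 'head/neck' rows.
toy_train = pd.DataFrame({"site": ["head/neck", "torso", "na", "torso"]})
toy_test = pd.DataFrame({"site": ["na", "torso"]})

# Encoding the test frame on its own shifts the codes ('na' becomes 1, 'torso' becomes 2).
independent = toy_test["site"].astype("category").cat.codes + 1

# Build one sorted category list from both frames so the codes are shared.
categories = sorted(set(toy_train["site"]) | set(toy_test["site"]))
shared_dtype = pd.CategoricalDtype(categories=categories)

toy_train["site_code"] = toy_train["site"].astype(shared_dtype).cat.codes + 1
toy_test["site_code"] = toy_test["site"].astype(shared_dtype).cat.codes + 1

print(independent.tolist())
print(toy_train["site_code"].tolist())
print(toy_test["site_code"].tolist())
```

With the shared dtype, a label such as `torso` maps to the same integer in both frames, which is what a model trained on the encoded train set expects at prediction time.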

### Data Manipulation for training and prediction

In [None]:
x_train = train[['sex', 'age_approx','anatom_site_general_challenge']]
y_train = train['target']

In [None]:
x_test = test[['sex', 'age_approx','anatom_site_general_challenge']]

In [None]:
train_DMatrix = xgb.DMatrix(x_train, label= y_train)
test_DMatrix = xgb.DMatrix(x_test)

### Building a simple XGBoost model for training 
(Hyperparameter tuning was done beforehand; the best values found are used here.)

In [None]:
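xgb_model = xgb.XGBClassifier(n_estimators=2000,
                        max_depth=8,
                        # binary objective: scale_pos_weight only applies to binary classification
                        objective='binary:logistic',
                        random_state=0,
                        n_jobs=-1,
                        learning_rate=0.15,
                        # ratio of negative to positive training samples
                        scale_pos_weight=(32542/584))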

In [None]:
xgb_model.fit(x_train, y_train)

In [None]:
xgb_pred_result = xgb_model.predict_proba(x_test)[:,1]
print(xgb_pred_result)

In [None]:
xgb_df = pd.DataFrame({
        "image_name": test["image_name"],
        "target": xgb_pred_result
    })

xgb_df.to_csv('tuned_XGBClassifier_submission.csv', index=False)

### Loading older submission files

In [None]:
main_submission = pd.read_csv('../input/my-siim-isic-submissions/my_siim_isic_main_submission.csv')
efficient_b7 = pd.read_csv('../input/my-siim-isic-submissions/EfficientNetB7_submission.csv')
efficient_b7_blend_6 = pd.read_csv('../input/my-siim-isic-submissions/EfficientNetB7_submission_Blend_6.csv')
model_blend_0_6 = pd.read_csv('../input/my-siim-isic-submissions/submission_models_blended_0-6.csv')

In [None]:
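# Weighted blend (weights sum to 1); assumes all three files list the images in the same row order
final_target = main_submission.target * 0.85 + efficient_b7.target * 0.05 + xgb_df.target * 0.10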
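
The weighted sum above lines rows up by position, so it relies on every submission file listing the images in the same order. A more defensive variant, sketched below with hypothetical toy submissions and illustrative weights, merges on `image_name` before blending.

```python
import pandas as pd

# Hypothetical toy submissions; the second one is in a different row order.
sub_a = pd.DataFrame({"image_name": ["img_1", "img_2", "img_3"],
                      "target": [0.10, 0.80, 0.30]})
sub_b = pd.DataFrame({"image_name": ["img_3", "img_1", "img_2"],
                      "target": [0.50, 0.20, 0.60]})

# Illustrative weights; they should sum to 1 so the blend stays in [0, 1].
weights = {"a": 0.85, "b": 0.15}
assert abs(sum(weights.values()) - 1.0) < 1e-9

# Align rows by image name; validate guards against duplicated names.
merged = sub_a.merge(sub_b, on="image_name", suffixes=("_a", "_b"),
                     validate="one_to_one")
assert len(merged) == len(sub_a), "some images are missing from one submission"

merged["target"] = (weights["a"] * merged["target_a"]
                    + weights["b"] * merged["target_b"])
blend = merged[["image_name", "target"]]
print(blend)
```

The same pattern extends to three files by chaining another `merge` on `image_name` before computing the weighted sum.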

In [None]:
result = pd.DataFrame({
        "image_name": test["image_name"],
        "target": final_target
    })

result.to_csv('final_submission_blend.csv', index=False)

> This is a simple way to blend my main submission with a simple XGBoost model. 
> The blend improved my final score, and it performed much better after hyperparameter tuning. 

Read my other notebooks at:
https://www.kaggle.com/blurredmachine/notebooks<br>
Competition Link:
https://www.kaggle.com/c/siim-isic-melanoma-classification

I hope you like this approach and that it helps someone get a good score in the competition.<br>
I am continuously updating this notebook with new features and simpler approaches so that beginners can understand the concepts easily.


### Consider upvoting if it was helpful! 😃