# <center>IMPACT PROJECT - GESTAMP</center> 
## <center>Defect Detection using Machine Learning</center> 
### <center>Random Search</center>
<center>Group 14</center> 

<img 
    src="https://www.gestamp.com/getattachment/c8d61c0f-e752-4156-8002-97e21ab43a3f/Imag2-2" width="2400" height="1000" align="center"/>

This notebook can be run on any of the 4 datasets.

## <center>Table of Contents</center>
1. [Split Dataset](#1)
2. [Hyperparameter Tuning: Random Search](#2)
3. [Model Training](#3)
4. [Model Testing and Evaluation](#4)


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import StratifiedKFold, RandomizedSearchCV, train_test_split
from sklearn.metrics import roc_auc_score
import xgboost as xgb

In [2]:
# Load dataset
data = pd.read_csv('/content/drive/Shareddrives/Capstone/EDA_Modelling/binary_strat1_le_ss.csv')


<a id='1'>**Split Dataset**</a>

In [3]:
X = data.drop('Defect', axis=1)
y = data['Defect']

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
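
The split above passes `stratify=y` so that the train and test sets keep roughly the same share of defective parts. The following is a minimal sketch of how that could be checked. It uses a synthetic stand-in dataset (the names `demo`, `feature_a`, `feature_b` and the 10% defect rate are assumptions for illustration, not properties of the real data).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset: 1000 rows, roughly 10% positives.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=1000),
    "feature_b": rng.normal(size=1000),
    "Defect": (rng.random(1000) < 0.1).astype(int),
})

X_demo = demo.drop("Defect", axis=1)
y_demo = demo["Defect"]

# Same split settings as the notebook cell above.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Positive-class rate in each subset; stratification keeps them close.
rates = {
    "full": float(y_demo.mean()),
    "train": float(y_tr.mean()),
    "test": float(y_te.mean()),
}
print(rates)
```

On the real data, the same comparison can be made with `y_train.mean()` and `y_test.mean()`.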


<a id='2'>**Hyperparameter Tuning: Random Search**</a>


**Initialize XGBoost Classifier**

In [5]:
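# Base estimator; n_estimators and learning_rate are overridden by the search below.
# Note: 'gpu_hist' requires a GPU. XGBoost >= 2.0 expects tree_method='hist' with device='cuda' instead.
xgb_classifier = xgb.XGBClassifier(
    random_state=42,
    n_estimators=100,
    learning_rate=0.09,
    objective='binary:logistic',
    tree_method='gpu_hist',
)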


In [8]:
# Define the hyperparameter search space
param_grid = {
    'n_estimators': np.arange(50, 100),    # 50, 51, ..., 99
    'max_depth': np.arange(3, 10),         # 3, 4, ..., 9
    'learning_rate': [0.1, 0.01, 0.001],
}
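
To give a sense of how much of this space a random search actually visits, here is a small sketch that counts the possible combinations and draws 30 of them with scikit-learn's `ParameterSampler`. The variable names `space`, `n_total` and `sampled`, and the seed, are assumptions for illustration. When every entry is a list or array, as here, the sampler draws without replacement.

```python
import numpy as np
from sklearn.model_selection import ParameterSampler

# Same search space as in the cell above.
space = {
    "n_estimators": np.arange(50, 100),   # 50 candidate values
    "max_depth": np.arange(3, 10),        # 7 candidate values
    "learning_rate": [0.1, 0.01, 0.001],  # 3 candidate values
}

# Total number of distinct combinations: 50 * 7 * 3.
n_total = int(np.prod([len(values) for values in space.values()]))

# Draw 30 candidate settings, as a random search with n_iter=30 would.
sampled = list(ParameterSampler(space, n_iter=30, random_state=42))

print("combinations in the space:", n_total)
print("combinations sampled:", len(sampled))
print("fraction explored: {:.1%}".format(len(sampled) / n_total))
print("first sample:", sampled[0])
```

With 1050 combinations and 30 draws, the search evaluates just under 3% of the space, which is the trade-off random search makes against an exhaustive grid.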

In [None]:
# Run a random search with stratified 5-fold cross-validation, scored by ROC AUC
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
random_search = RandomizedSearchCV(
    xgb_classifier,
    param_distributions=param_grid,
    scoring='roc_auc',
    cv=kfold,
    n_iter=30,
    random_state=42,  # makes the sampled combinations reproducible
)
random_search.fit(X_train, y_train)
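
Beyond `best_params_`, a fitted search also exposes per-candidate scores in `cv_results_`, which helps judge how much the best setting really stands out. The sketch below shows one way to tabulate them. To stay self-contained and fast it swaps in scikit-learn's `GradientBoostingClassifier`, synthetic data from `make_classification`, and a smaller search space. All of these are assumptions for illustration, not the notebook's actual model or data.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic, imbalanced binary problem standing in for the defect data.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=42
)

# Reduced search space so the example runs in a few seconds.
demo_space = {
    "n_estimators": np.arange(10, 30),
    "max_depth": np.arange(2, 5),
    "learning_rate": [0.1, 0.01, 0.001],
}

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
demo_search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),  # stand-in for XGBClassifier
    param_distributions=demo_space,
    scoring="roc_auc",
    cv=cv,
    n_iter=5,
    random_state=42,
)
demo_search.fit(X_demo, y_demo)

# One row per sampled candidate, best-ranked first.
results = pd.DataFrame(demo_search.cv_results_)
summary = results[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")

print(summary)
print("Best cross-validated AUC: {:.4f}".format(demo_search.best_score_))
```

Applied to the notebook's own search, the same table would come from `pd.DataFrame(random_search.cv_results_)`.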

In [None]:
best_params = random_search.best_params_
best_params

In [None]:
xgb.plot_importance(random_search.best_estimator_)

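In [None]:
# Plot feature importance of the best estimator found by the search
ax = xgb.plot_importance(random_search.best_estimator_)
ax.set_xlabel('Importance (number of splits)')
plt.show()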

<a id='3'>**Model Training**</a>

In [None]:
# Train the model with the best hyperparameter combination.
# Note: random_search.best_estimator_ is already refit on X_train (refit=True by default).
xgb_classifier = xgb.XGBClassifier(**best_params, random_state=42)
xgb_classifier.fit(X_train, y_train)


In [None]:
# Getting the feature importances from the model
importance_dict = xgb_classifier.get_booster().get_score(importance_type='weight')

# Creating a DataFrame from the importances
importance_df = pd.DataFrame(list(importance_dict.items()), columns=['Feature', 'Importance'])

# Sorting the DataFrame by importance (descending order)
importance_df = importance_df.sort_values('Importance', ascending=False)

# Printing the importance table
print(importance_df)

<a id='4'>**Model Testing and Evaluation**</a>

In [None]:
# Predict defect probabilities on the held-out test set and compute ROC AUC
y_pred_prob = xgb_classifier.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_pred_prob)
print("AUC: {:.4f}".format(auc))
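
ROC AUC measures how well the predicted probabilities rank defective parts above non-defective ones, independent of any decision threshold. The toy example below illustrates that reading and shows how a confusion matrix could be obtained once a threshold is chosen. The labels, scores and the 0.5 threshold are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Made-up labels and predicted probabilities for four parts.
y_true_demo = np.array([0, 0, 1, 1])
y_score_demo = np.array([0.1, 0.4, 0.35, 0.8])

# AUC equals the fraction of (positive, negative) pairs ranked correctly:
# here 3 of the 4 pairs have the positive scored higher than the negative.
auc_demo = roc_auc_score(y_true_demo, y_score_demo)

# Turning probabilities into decisions requires a threshold (0.5 is an assumption).
threshold = 0.5
y_pred_demo = (y_score_demo >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

print("AUC: {:.4f}".format(auc_demo))
print("TN={}, FP={}, FN={}, TP={}".format(tn, fp, fn, tp))
```

For the notebook's model, the same calls would take `y_test` and `y_pred_prob` in place of the toy arrays.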