## <u style="color: blue">7. Gridsearch</u> 

As noted in the main notebook, this notebook contains the grid search runs used to select the best-performing model for classifying disaster-related tweets. The candidate models are:
- Decision Tree
- Random Forest
- Logistic Regression
- XGBoost
- SVM

In [1]:
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

from sklearn.linear_model import LogisticRegression
from Decision_Tree import DecisionTree
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
import xgboost as xgb

In [2]:
df_train = pd.read_csv('./CSV/train_tweets_cleaned.csv')

In [3]:
df_train.head()

Unnamed: 0,id,text,target,transformed_text,stemmed_text
0,1,Deeds Reason ALLAH Forgive,1,"['Deeds', 'Reason', 'ALLAH', 'Forgive']",deed reason allah forgiv
1,4,Forest fire near La Ronge Sask Canada,1,"['Forest', 'fire', 'near', 'La', 'Ronge', 'Sas...",forest fire near la rong sask canada
2,5,residents asked shelter place notified officer...,1,"['residents', 'asked', 'shelter', 'place', 'no...",resid ask shelter place notifi offic evacu she...
3,6,receive evacuation orders California,1,"['receive', 'evacuation', 'orders', 'California']",receiv evacu order california
4,7,sent photo Ruby smoke pours school,1,"['sent', 'photo', 'Ruby', 'smoke', 'pours', 's...",sent photo rubi smoke pour school


In [4]:
vectorizer = TfidfVectorizer(max_features=6678)
X = vectorizer.fit_transform(df_train['stemmed_text'].fillna('')).toarray()
# X_test = vectorizer.transform(df_test['stemmed_text']).toarray()

y = df_train['target']

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)

Grid search over the candidate models:

In [6]:
# One entry per candidate model, with the hyperparameter grid to explore
models_params = [
    {
        'model': DecisionTreeClassifier(),
        'params': {'criterion': ['gini', 'entropy']},
    },
    {
        'model': RandomForestClassifier(),
        'params': {'criterion': ['gini', 'entropy']},
    },
    {
        # Empty grid: XGBoost is evaluated with its default parameters only
        'model': xgb.XGBClassifier(),
        'params': {},
    },
    {
        'model': svm.SVC(),
        'params': {'kernel': ['rbf', 'poly', 'sigmoid']},
    },
    {
        'model': LogisticRegression(),
        'params': {'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']},
    },
]

In [7]:
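scores = []

for model_param in models_params:
    # 5-fold cross-validated search (the GridSearchCV default) on the training set
    clf = GridSearchCV(model_param['model'], model_param['params'], n_jobs=-1, verbose=2)
    clf.fit(X_train, y_train)
    scores.append({
        # Keep the tuned estimator, not the untuned one passed to the search
        'model': clf.best_estimator_,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })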

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 1 candidates, totalling 5 fits
Fitting 5 folds for each of 3 candidates, totalling 15 fits


In [None]:
# for i in scores:
#     print('model: ', i['model'])
#     print('score: ', i['best_score'])
#     print('parameters: ', i['best_params'])
#     print()
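
As an alternative to printing each entry, the list of result dictionaries can be turned into a small table sorted by score. The sketch below assumes entries shaped like those appended to `scores`. The model names and numbers are placeholders made up for illustration, not results from this notebook.

```python
import pandas as pd

# Placeholder entries shaped like the dicts appended to `scores`;
# the names and numbers are made up for illustration, not real results.
demo_scores = [
    {'model': 'model_a', 'best_score': 0.60, 'best_params': {'p': 1}},
    {'model': 'model_b', 'best_score': 0.80, 'best_params': {'p': 2}},
    {'model': 'model_c', 'best_score': 0.70, 'best_params': {}},
]

# One row per model, best cross-validation score first
summary = (
    pd.DataFrame(demo_scores)
    .sort_values('best_score', ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

With the real `scores` list, the same two calls would put the best-scoring model in the first row.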

Confusion matrix for each tested model:

In [None]:
# Confusion matrix, classification report and accuracy for each model
for i in scores:
    model = i['model']
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print('model: ', model)
    print('confusion matrix: ')
    print(confusion_matrix(y_test, y_pred))
    print('classification report: ')
    print(classification_report(y_test, y_pred))
    print('accuracy: ', accuracy_score(y_test, y_pred))
    print()
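
To make the link between a confusion matrix and the figures in the classification report explicit, here is a minimal sketch on hand-made labels (hypothetical data, not the tweet dataset). For binary labels, scikit-learn lays the matrix out as `[[tn, fp], [fn, tp]]`, so `ravel()` returns the four counts in that order.

```python
from sklearn.metrics import confusion_matrix

# Hand-made labels for illustration only (1 = disaster, 0 = not a disaster)
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # share of predicted disasters that are real
recall = tp / (tp + fn)     # share of real disasters that were detected
print(tn, fp, fn, tp, precision, recall)
```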

The best model is the SVM, with a best cross-validation score of 0.77 and the following parameters: {'kernel': 'rbf'}

In [None]:
model = svm.SVC()
params = {
    'kernel': ['rbf', 'poly', 'sigmoid'],
    'C': [0.1, 1, 100, 1000],
    'gamma': [0.1, 0.01, 0.001]
}

models_scores = []

clf = GridSearchCV(model, params, cv=5, n_jobs=-1, verbose=2)
clf.fit(X_train, y_train)
models_scores.append({
    'model': model,
    'best_score': clf.best_score_,
    'best_params': clf.best_params_
})

In [None]:
for i in models_scores:
    print('model: ', i['model'])
    print('score: ', i['best_score'])
    print('parameters: ', i['best_params'])
    print()
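
Beyond the single best combination, `GridSearchCV` keeps the score of every candidate in its `cv_results_` attribute. The sketch below shows one way to inspect it. It uses a small synthetic dataset from `make_classification` and a reduced grid so that it runs quickly, both of which are assumptions for illustration and not the notebook's data or grid.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the TF-IDF features (assumption, not the tweet data)
X_demo, y_demo = make_classification(n_samples=120, n_features=10, random_state=0)

# Reduced grid: 3 values of C x 2 values of gamma = 6 candidates
demo_grid = {'kernel': ['rbf'], 'C': [0.1, 1, 100], 'gamma': [0.1, 0.01]}
search = GridSearchCV(SVC(), demo_grid, cv=5)
search.fit(X_demo, y_demo)

# One row per candidate, best-ranked first
columns = ['param_C', 'param_gamma', 'mean_test_score', 'std_test_score', 'rank_test_score']
results = pd.DataFrame(search.cv_results_)[columns].sort_values('rank_test_score')
print(results)
```

Looking at `std_test_score` next to `mean_test_score` helps judge whether the top candidates are really distinguishable or within fold-to-fold noise.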

Now that the grid search is complete, we can return to the first notebook and apply these results.

## <u style="color: blue">[8. Model evaluation](./Eye_Of_Emergency.ipynb#8.-Evaluation-des-modèles)</u>

In [None]:
model = svm.SVC(kernel='rbf', C=100, gamma=0.01)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred))

In [None]:
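# Split the raw text first, then fit TF-IDF on the training fold only (SVC needs numeric input)
X_text, y = df_train['stemmed_text'].fillna(''), df_train['target']
X_train, X_test, y_train, y_test = train_test_split(X_text, y, test_size=0.2, stratify=y)
vectorizer = TfidfVectorizer(max_features=6678)
X_train, X_test = vectorizer.fit_transform(X_train), vectorizer.transform(X_test)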

In [None]:
model = svm.SVC(kernel='rbf', C=100, gamma=0.01)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred))
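
Since `SVC` only accepts numeric features, one way to train directly on a raw text column is to chain the vectorizer and the classifier in a scikit-learn pipeline. The sketch below is a possible arrangement, not the notebook's method. The toy corpus stands in for `stemmed_text` and is entirely hypothetical, while the `SVC` parameters are the ones used in the cells above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy corpus standing in for df_train['stemmed_text'] (hypothetical data)
texts = [
    "forest fire near town", "flood evacu order issu",
    "earthquak damag build", "wildfir smoke school close",
    "love sunni day beach", "new song great",
    "watch movi tonight", "coffe friend morn",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF is fitted inside the pipeline, so it only sees the data passed to fit()
pipe = make_pipeline(TfidfVectorizer(), SVC(kernel='rbf', C=100, gamma=0.01))
pipe.fit(texts, labels)

# New raw strings go through the same fitted vocabulary before prediction
preds = pipe.predict(["fire evacu town", "great movi"])
print(preds)
```

A pipeline like this can also be passed to `GridSearchCV`, in which case the vectorizer is refitted on each training fold and the validation folds do not influence the vocabulary.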