# Model Interpretation

At this point we have selected the SVM as our preferred model for making predictions. We will now study its behaviour by analyzing the articles it misclassifies, which should give us some insight into how the model works.

In [168]:
import pickle
import pandas as pd
import numpy as np
import random
import sklearn
from joblib import load

Let's load what we need and then get the predictions on the test set:

In [170]:
# Folder with the pickles produced in the feature engineering step
path_pickles = "C:/Users/asus-pc/Documents/PBA/Tugas Akhir/03. Feature Engineering/Pickles_title/"

def load_pickle(path):
    """Load a single pickled object from disk."""
    with open(path, 'rb') as data:
        return pickle.load(data)

# Dataframe
df = load_pickle(path_pickles + "df.pickle")

# X_train, X_test, y_train, y_test
X_train = load_pickle(path_pickles + "X_train.pickle")
X_test = load_pickle(path_pickles + "X_test.pickle")
y_train = load_pickle(path_pickles + "y_train.pickle")
y_test = load_pickle(path_pickles + "y_test.pickle")

# features_train, labels_train, features_test, labels_test
features_train = load_pickle(path_pickles + "features_train.pickle")
labels_train = load_pickle(path_pickles + "labels_train.pickle")
features_test = load_pickle(path_pickles + "features_test.pickle")
labels_test = load_pickle(path_pickles + "labels_test.pickle")

# SVM Model
path_model = "C:/Users/asus-pc/Documents/PBA/Tugas Akhir/04. Model Training/Models/best_svc.pickle"
svc_model = load_pickle(path_model)

# Category mapping dictionary
category_codes = {
    'notification of information': 0,
    'donation': 1,
    'criticism': 2,
    'hoax': 3,
}

category_names = {
    0: 'notification of information',
    1: 'donation',
    2: 'criticism',
    3: 'hoax'
}

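ModuleNotFoundError: No module named 'sklearn.svm.classes'

Note: this error means the model was pickled with an older scikit-learn release than the one installed here. The module `sklearn.svm.classes` was made private (`sklearn.svm._classes`) in scikit-learn 0.22, so the pickle has to be loaded with a scikit-learn version matching the one used for training, or the model has to be retrained and saved again. Judging by the execution counts, the outputs below come from an earlier run in which the model loaded correctly.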

In [153]:
predictions = svc_model.predict(features_test)

In [136]:
print(pickle.format_version)

4.0


Now we'll create the Test Set dataframe with the actual and predicted categories:

In [107]:
# Indexes of the test set
index_X_test = X_test.index

# We get them from the original df (as a copy, so adding columns does not touch df)
df_test = df.loc[index_X_test].copy()

# Add the predictions
df_test['prediction'] = predictions

# Clean columns
df_test = df_test[['title', 'label', 'label_code', 'prediction']]

# Decode the predicted label codes into category names
df_test['label_predicted'] = df_test['prediction'].map(category_names)

# Clean columns again
df_test = df_test[['title', 'label', 'label_predicted']]

In [108]:
df_test.head()

| | title | label | label_predicted |
|---|---|---|---|
| 2182 | Update Corona Indonesia 24 Oktober 2020 dan Se... | notification of information | notification of information |
| 1279 | Pertamina Diminta Lihat Fluktuasi Harga Minyak... | criticism | criticism |
| 1729 | UPDATE 17 Januari: Ada 145.482 Kasus Aktif Cov... | notification of information | notification of information |
| 1477 | Sebaran 4.002 Kasus Positif Hari Ini, DKI-Jaba... | donation | donation |
| 1964 | UPDATE Corona 31 Maret di 32 Provinsi: Kasus B... | notification of information | notification of information |


Let's get the misclassified articles:

In [109]:
condition = (df_test['label'] != df_test['label_predicted'])

df_misclassified = df_test[condition]

# Number of misclassified articles (not displayed: a cell only shows its last expression)
len(df_misclassified)
df_misclassified.head(50)

| | title | label | label_predicted |
|---|---|---|---|
| 1454 | Satgas Covid-19 dan Polda Jabar Diminta Panggi... | criticism | hoax |
| 1323 | Polri Tak Akan Berikan Sanksi Hukum Ataupun De... | donation | hoax |
| 360 | Buntut Kasus Kerumunan Massa Rizieq Shihab: Gi... | criticism | hoax |
| 945 | Kominfo Catat Ada 2 Ribu Lebih Konten Hoax Ten... | hoax | criticism |
| 1324 | Polri Tidak Berikan Izin Keramaian Pelaksanaan... | donation | criticism |
| 157 | Bagikan Tips Cegah Corona, Jokowi: Musuh Terbe... | hoax | criticism |
| 605 | FAKTA-FAKTA Pasien Positif Corona di Prabumuli... | hoax | notification of information |
| 1119 | Narapidana Sumbangkan Hasil Karyanya untuk Pen... | donation | criticism |
| 15 | 2 Warga Indonesia Positif Corona, Berikut Fakt... | hoax | criticism |
| 235 | Berikan Apresiasi, Yurianto Sebut Banyak Masya... | donation | criticism |
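To see which category pairs the model confuses most often, the misclassified rows can be tallied by (actual, predicted) pair. The following is a minimal sketch that uses only the ten rows displayed above; the helper name `count_confusions` is hypothetical, and in the notebook the pairs would instead come from `zip(df_misclassified['label'], df_misclassified['label_predicted'])`.

```python
from collections import Counter

def count_confusions(pairs):
    """Count how often each (actual, predicted) pair occurs, most frequent first."""
    return Counter(pairs).most_common()

# (actual, predicted) pairs copied from the ten misclassified rows shown above
pairs = [
    ('criticism', 'hoax'),
    ('donation', 'hoax'),
    ('criticism', 'hoax'),
    ('hoax', 'criticism'),
    ('donation', 'criticism'),
    ('hoax', 'criticism'),
    ('hoax', 'notification of information'),
    ('donation', 'criticism'),
    ('hoax', 'criticism'),
    ('donation', 'criticism'),
]

for (actual, predicted), n in count_confusions(pairs):
    print(f'{actual} -> {predicted}: {n}')
```

In these ten rows, `criticism` is the most frequent wrong prediction (6 of 10), which suggests looking first at what separates it from `donation` and `hoax`.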


Let's look at a sample of three misclassified articles. We'll first define a helper function that prints a single article:

In [111]:
def output_article(row_article):
    print('Actual Category: %s' %(row_article['label']))
    print('Predicted Category: %s' %(row_article['label_predicted']))
    print('-------------------------------------------')
    print('Text: ')
    print('%s' %(row_article['title']))

We'll draw three random indices from the misclassified articles, using a fixed seed so the sample is reproducible:

In [112]:
random.seed(8)
list_samples = random.sample(list(df_misclassified.index), 3)
list_samples

[901, 22, 1441]
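Fixing the seed is what makes the same three articles come up on every run. Below is a minimal sketch of the same idea with a local `random.Random` instance, which leaves the global random state untouched. The function name `sample_indices` is hypothetical, and the example index values are simply the ones from the misclassified rows displayed earlier.

```python
import random

def sample_indices(indices, k, seed):
    """Draw k distinct indices reproducibly using a local, seeded generator."""
    rng = random.Random(seed)
    return rng.sample(list(indices), k)

# Index values taken from the displayed misclassified rows (illustration only)
example_index = [1454, 1323, 360, 945, 1324, 157, 605, 1119, 15, 235]

first = sample_indices(example_index, 3, seed=8)
second = sample_indices(example_index, 3, seed=8)
print(first == second)  # True: same seed, same sample
```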

First case:

In [113]:
output_article(df_misclassified.loc[list_samples[0]])

Actual Category: donation
Predicted Category: criticism
-------------------------------------------
Text: 
Kementerian Keuangan Berikan Fasilitas Penundaan Pembayaran Cukai


Second case:

In [114]:
output_article(df_misclassified.loc[list_samples[1]])

Actual Category: donation
Predicted Category: criticism
-------------------------------------------
Text: 
3 Tempat yang Berpotensi Jadi Titik Penularan Corona, Achmad Yurianto Berikan Tips untuk Tetap Aman


Third case:

In [115]:
output_article(df_misclassified.loc[list_samples[2]])

Actual Category: criticism
Predicted Category: donation
-------------------------------------------
Text: 
Rumah Sakit Swasta di Kendal Diminta Terima Pasien ODP Virus Corona


In all three cases the correct category is ambiguous, because the title contains concepts from both the actual and the predicted category. Errors of this kind are unavoidable, and we do not expect the model to reach 100% accuracy on such articles.
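One way to check whether a misclassified title is really borderline is to compare the model's two highest class scores: a small gap suggests the model was undecided between two categories. The sketch below assumes per-class scores are available, for example from `svc_model.decision_function(features_test)` if the SVM exposes one score per category. The function name `top_two_margin` and the example scores are made up for illustration and are not real model output.

```python
def top_two_margin(scores):
    """Return the gap between the highest and second-highest class score."""
    ordered = sorted(scores, reverse=True)
    return ordered[0] - ordered[1]

# Hypothetical score rows, one value per category (not real model output)
confident = [2.9, 0.1, -0.5, 1.0]
borderline = [0.2, 2.1, 2.0, -0.3]

print(top_two_margin(confident))
print(top_two_margin(borderline))
```

Sorting the misclassified articles by this margin would put the most ambiguous titles first, which makes manual review quicker.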