# Assignment 3

## Instructions
- Run this notebook on ```Google Colab(preferable)```
- Write your code and analysis in the indicated cells.
- Ensure that this notebook runs without errors when the cells are run in sequence.
- Do not attempt to change the contents of other cells. 

## Packages Used
- sklearn [link](https://scikit-learn.org/)
- Keras [link](https://keras.io/guides/)

## Submission
- Rename the notebook to `<roll_number>_Assignment3_Q3.ipynb`.


## Question 3
Fake news is a widespread problem and there are many methods for combating it.
You have to build a fake news detection system using a ML model. Train any ML model (ANN, LSTM) over the given Dataset.
The dataset has short statements spoken by people and has the meta-information and corresponding label for those sentences. 
Your target is label column which has 6 labels(in the increasing order of truthfullness): pants-fire, false, barely-true, half-true, mostly-true, true.

The features are 'statement', 'subject', 'speaker', 'job', 'state', 'party', 'barely_true_c', 'false_c', 'half_true_c', 'mostly_true_c', 'pants_on_fire_c', 'venue' and the target is column "label".

The statement is made by speaker whose job, party are given along with 6 columns which are an account of the  type of news(labels) the person has shared before. 
The person who has shared fake content before is likely to share it in future and this can be accounted by the ML model as a feature. Column barely_true_c contains how many barely_true news has the speaker shared (and so is with column X_c, value of X_c is number of X the person shared).


You have to perform two tasks:
* task1: Binary classification <br>
Classify the given news as true/false. Take the labels pants-fire, false, barely-true as false and rest (half-true, mostly-true, true) as true.
* task2: Six-way classification <br>
Classify the given news into six-classes "pants-fire, false, barely-true, half-true, mostly-true, true".

For each of the tasks:
1) Experiment with depth of network and try to fine-tune hyperparameters reporting your observations. <br>
2) Report the accuracy, f1-score, confusion matrix on train, val and test sets. <br>
3) Experiment with bag-of-words, glove and bert embeddings(code given in the below notebook) and report results. <br> Comment on what is the affect of embedding on the results.

The pre-processing code is provided, you need to write the training and test.

Note: You are supposed to train on trainset, fine-tune on val and just eval on test set. If found that you trained on val/test sets, the penalty will be incurred.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
# !pip install numpy
# !pip install tensorflow
# !pip install re
# !pip install nltk
# !pip install keras
# !pip install sklearn
!pip install sentence_transformers



In [63]:
# Importing libraries
import numpy as np
import pandas as pd
from tensorflow import keras  #feel free to use any other library
import numpy as np

import re
import nltk
import numpy as np
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
nltk.download('stopwords')
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer
from keras.utils import np_utils

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.optimizers import SGD
from keras.callbacks import EarlyStopping
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.metrics import multilabel_confusion_matrix
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score,f1_score
from keras.layers import Dense,Input,LSTM
from keras.models import Model

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [None]:
train = pd.read_csv('q3_data/train.csv')
val = pd.read_csv('q3_data/val.csv')
test = pd.read_csv('q3_data/test.csv')

In [None]:
# Dropping the 'id' column
train.drop('id', axis = 1, inplace = True)
test.drop('id', axis = 1, inplace = True)
val.drop('id', axis = 1, inplace = True)

In [None]:
train.head(5)

Unnamed: 0,label,statement,subject,speaker,job,state,party,barely_true_c,false_c,half_true_c,mostly_true_c,pants_on_fire_c,venue
0,False,Says the Annies List political group supports ...,abortion,dwayne-bohac,State representative,Texas,republican,0,1,0,0,0,a mailer
1,half-true,When did the decline of coal start? It started...,"energy,history,job-accomplishments",scott-surovell,State delegate,Virginia,democrat,0,0,1,1,0,a floor speech.
2,mostly-true,"Hillary Clinton agrees with John McCain ""by vo...",foreign-policy,barack-obama,President,Illinois,democrat,70,71,160,163,9,Denver
3,False,Health care reform legislation is likely to ma...,health-care,blog-posting,,,none,7,19,3,5,44,a news release
4,half-true,The economic turnaround started at the end of ...,"economy,jobs",charlie-crist,,Florida,democrat,15,9,20,19,2,an interview on CNN


In [None]:
# Checking the shape of data
print(train.shape)
print(val.shape)
print(test.shape)

(10269, 13)
(1284, 13)
(1283, 13)


## Clean and pre-process data
* Replace missing values
* Remove numbers and special characters
* Convert to upper-case

We experiment with two types of processing, one directly appending the other attributes like subject, job, state, party to sentence and then applying bag of words on it.

Other being encoding sentence with glove embeddings and passing just that.

In [None]:

def dataPreprocessing(data):
    '''Function for cleaning the dataset
    '''
    corpus = []
    # Missing values
    data["job"].fillna("no-job", inplace = True)
    data["state"].fillna("no-state", inplace = True)

    for x in range(data.shape[0]):
        statement = re.sub('[^a-zA-Z]', ' ', data['statement'][x]) # Removing all numbers and special characters
        statement = statement.lower() # Converting uppercase to lowercase
        statement = statement.split()
        
        # you can experiment with any other stemmers
        ps = PorterStemmer()
        statement = [ps.stem(word) for word in statement if not word in set(stopwords.words('english'))] # Stemming the dataset and removing stopwords
        statement = ' '.join(statement)
        subject = data['subject'][x].replace(',', ' ')
        speaker = data['speaker'][x]
        job = data['job'][x].lower()
        # job = job.replace(' ', '-')
        state = data['state'][x].lower()
        party = data['party'][x].lower()
        corpus.append(statement + ' '  + subject + ' ' + job + ' ' + state + ' ' + party)
    return corpus

In [None]:
x_train = dataPreprocessing(train)
x_val = dataPreprocessing(val) 
x_test = dataPreprocessing(test) 

In [None]:
len(x_train), len(x_val), len(x_test)

(10269, 1284, 1283)

In [None]:
corpus = x_train + x_val + x_test

## Using bag-of-words embedding


In [None]:
# Converting the corpus into bag-of-words
cv = CountVectorizer(max_features = 8000)
X = cv.fit_transform(corpus).toarray()

In [None]:
X.shape

(12836, 8000)

In [None]:
train.columns

Index(['label', 'statement', 'subject', 'speaker', 'job', 'state', 'party',
       'barely_true_c', 'false_c', 'half_true_c', 'mostly_true_c',
       'pants_on_fire_c', 'venue'],
      dtype='object')

In [None]:
# Selecting the columns 'barely_true_c',	'false_c',	'half_true_c',	'mostly_true_c',	'pants_on_fire_c'
label_cols = ['barely_true_c', 'false_c', 'half_true_c', 'mostly_true_c',
       'pants_on_fire_c']
x_train2 = train[label_cols]
x_val2 = val[label_cols]
x_test2 = test[label_cols]

In [None]:
# Stacking x_train and x_train2 horizontally
x_train_bow = np.hstack((X[:len(x_train)], x_train2))
x_val_bow = np.hstack((X[len(x_train):len(x_train)+len(x_val)], x_val2))
x_test_bow = np.hstack((X[len(x_train)+len(x_val):], x_test2))

In [None]:
x_train_bow.shape

(10269, 8005)

## Use of Glove Embedding


download glove embeddings from 'https://nlp.stanford.edu/data/glove.6B.zip','glove.6B.zip'
and place in your current working folder


In [None]:
!unzip "/content/gdrive/MyDrive/SMAI/Assignment 3/glove.6B.zip" -d "glove"

Archive:  /content/gdrive/MyDrive/SMAI/Assignment 3/glove.6B.zip
  inflating: glove/glove.6B.50d.txt  
  inflating: glove/glove.6B.100d.txt  
  inflating: glove/glove.6B.200d.txt  
  inflating: glove/glove.6B.300d.txt  


In [None]:
emmbed_dict = {}
with open('glove/glove.6B.200d.txt','r') as f:
  for line in f:
    values = line.split()
    word = values[0]
    vector = np.asarray(values[1:],'float32')
    emmbed_dict[word]=vector


In [None]:
emmbed_dict['oov'] = np.zeros(200)

In [None]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

In [None]:
from nltk.tokenize import word_tokenize
from keras.preprocessing.sequence import pad_sequences
nltk.download('punkt')
def dataPreprocessing_glove(data):
    corpus = []
    # Missing values
    data["job"].fillna("no-job", inplace = True)
    data["state"].fillna("no-state", inplace = True)

    for x in range(data.shape[0]):
        statement = re.sub('[^a-zA-Z]', ' ', data['statement'][x]) # Removing all numbers and special characters
        statement = statement.lower() # Converting uppercase to lowercase
        statement = word_tokenize(statement)

        embed_statement = []
        for w in statement:
            if w in emmbed_dict:
                embed_statement.append(emmbed_dict[w])
            else:
                embed_statement.append(emmbed_dict['oov'])
         
        # bonus: Think how you can encode the below features(hint: look upon label encoding or training your own word2vec or any other embedding model)
    
#         subject = data['subject'][x].replace(',', ' ')
#         speaker = data['speaker'][x]
#         job = data['job'][x].lower()
#         # job = job.replace(' ', '-')
#         state = data['state'][x].lower()
#         party = data['party'][x].lower()
        corpus.append(embed_statement)
    corpus = np.array(corpus)
    corpus=pad_sequences(corpus,padding='pre',maxlen=40)

    return corpus

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [None]:
x_train_glove = dataPreprocessing_glove(train)
x_val_glove = dataPreprocessing_glove(val) 
x_test_glove = dataPreprocessing_glove(test) 



In [None]:
x_train_glove=x_train_glove.reshape(x_train_glove.shape[0],8000)
x_val_glove=x_val_glove.reshape(x_val_glove.shape[0],8000)
x_test_glove=x_test_glove.reshape(x_test_glove.shape[0],8000)

In [None]:
x_train_glove = np.hstack((x_train_glove, x_train2))
x_val_glove = np.hstack((x_val_glove, x_val2))
x_test_glove = np.hstack((x_test_glove, x_test2))

## Use of bert embeddings
note: we used our pre-processed code for bow which has the attributed appended to end the end of sentence. 

In [None]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

x_train_bert = np.hstack((model.encode(x_train), x_train2))
x_val_bert = np.hstack((model.encode(x_val), x_val2))
x_test_bert = np.hstack((model.encode(x_test), x_test2))

Now use the above 3 types of embedded inputs(bow, glove, bert embeddings) for the 2 classification tasks and compare their outputs


# Six-way classification

## Preprocessing

In [None]:
num_classes = 6
# Preprocessing function for the labels
def categorize(data):
    y = data["label"].tolist()

    # Encoding the Dependent Variable
    labelencoder_y = LabelEncoder()
    y = labelencoder_y.fit_transform(y)

    # Converting to binary class matrix
    y = np_utils.to_categorical(y, num_classes)
    return y

In [None]:
y_train_six_way = categorize(train)
y_test_six_way = categorize(test)
y_val_six_way = categorize(val)

Build a model and pass bow, glove and bert embedded inputs: x_train_bow, x_train_glove, x_train_bert(similarly validate for val and report results on test)


## Model

In [None]:
def define_bag_of_words_6_way_model():
  model = Sequential()
  model.add(Dense(1024, activation='relu'))
  model.add(Dense(512, activation='relu'))
  model.add(Dense(128, activation='relu'))
  model.add(Dense(32, activation='relu'))

  model.add(Dense(6, activation='softmax'))
  opt = SGD(learning_rate=0.01, momentum=0.9)
  model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
  return model

In [73]:

def define_glove_6_way_model(shapes):
    input = Input(shape=shapes)
    ip_layer1=keras.layers.Conv1D(16,4,padding="same",activation='relu')(input)
    
    ip_layer1=keras.layers.MaxPooling1D(pool_size=2,strides=1,padding='same')(ip_layer1)
    LSTM_Layer_1 = LSTM(128,return_sequences=True)(ip_layer1)
    LSTM_Layer_2 = LSTM(64)(LSTM_Layer_1)
    dense_layer = Dense(6, activation='sigmoid')(LSTM_Layer_2)
    model= Model(inputs=input, outputs=dense_layer)
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
    return model

In [None]:
def define_bag_of_words_binary_model():
  model = Sequential()
  model.add(Dense(1024, activation='relu'))
  model.add(Dense(512, activation='relu'))
  model.add(Dense(128, activation='relu'))
  model.add(Dense(32, activation='relu'))

  model.add(Dense(2, activation='softmax'))
  opt = SGD(learning_rate=0.01, momentum=0.9)
  model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
  return model

In [None]:
bag_of_words_6_way_model = define_bag_of_words_6_way_model()
es = EarlyStopping(monitor="val_accuracy",mode="auto",verbose=1,patience=3,restore_best_weights=True)
history = bag_of_words_6_way_model.fit(x_train_bow, y_train_six_way,batch_size=32,validation_data=(x_val_bow,y_val_six_way),epochs=10,callbacks=[es])


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 8: early stopping


In [None]:
loss, acc = bag_of_words_6_way_model.evaluate(x_test_bow,y_test_six_way)
print(acc)

0.41465315222740173


In [None]:
def predict(model,x):
  y = model.predict(x)
  predictions=[]
  for i in y:
    a = np.zeros(6)
    a[np.argmax(i,0)]=1
    predictions.append(a)
  return predictions

In [None]:
def report_multilabel_metrics(model,x,y):
  predictions = predict(model,x)
  cm = multilabel_confusion_matrix(y, predictions)
  print("Confusion Matrix\n",cm)
  print("Accuracy:",accuracy_score(y, predictions))
  print("F1 score:",f1_score(y, predictions,average='weighted'))

In [None]:
print("Report for Training data")
report_multilabel_metrics(bag_of_words_6_way_model,x_train_bow, y_train_six_way)
print("============================")
print("Report for Validation data")
report_multilabel_metrics(bag_of_words_6_way_model,x_val_bow,y_val_six_way)
print("============================")
print("Report for Test data")
report_multilabel_metrics(bag_of_words_6_way_model,x_test_bow,y_test_six_way)

Report for Training data
Confusion Matrix
 [[[7813  458]
  [1142  856]]

 [[8580    6]
  [1257  426]]

 [[8196  416]
  [ 945  712]]

 [[4543 3603]
  [ 377 1746]]

 [[7567  736]
  [1054  912]]

 [[9338   89]
  [ 533  309]]]
Accuracy: 0.4831044892394586
F1 score: 0.48320785869246236
Report for Validation data
Confusion Matrix
 [[[ 956   65]
  [ 163  100]]

 [[1109    6]
  [ 140   29]]

 [[ 971   76]
  [ 154   83]]

 [[ 537  499]
  [  58  190]]

 [[ 939   94]
  [ 160   91]]

 [[1144   24]
  [  89   27]]]
Accuracy: 0.40498442367601245
F1 score: 0.39965359772247966
Report for Test data
Confusion Matrix
 [[[ 958   75]
  [ 154   96]]

 [[1069    3]
  [ 178   33]]

 [[ 994   75]
  [ 142   72]]

 [[ 540  476]
  [  61  206]]

 [[ 929  105]
  [ 157   92]]

 [[1174   17]
  [  59   33]]]
Accuracy: 0.4146531566640686
F1 score: 0.4030868441048496


In [78]:
x_train_glove1 = np.expand_dims(x_train_glove,-1)
glove_6_way_model = define_glove_6_way_model(x_train_glove1.shape[1:])
es = EarlyStopping(monitor="val_accuracy",mode="auto",verbose=1,patience=3,restore_best_weights=True)
history = glove_6_way_model.fit(x_train_glove, y_train_six_way,batch_size=32,validation_data=(x_val_glove,y_val_six_way),epochs=10,callbacks=[es])


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [80]:
loss, acc = glove_6_way_model.evaluate(x_test_glove,y_test_six_way)
print(acc)

0.4489477872848511


In [81]:
print("Report for Training data")
report_multilabel_metrics(glove_6_way_model,x_train_glove, y_train_six_way)
print("============================")
print("Report for Validation data")
report_multilabel_metrics(glove_6_way_model,x_val_glove,y_val_six_way)
print("============================")
print("Report for Test data")
report_multilabel_metrics(glove_6_way_model,x_test_glove,y_test_six_way)

Report for Training data
Confusion Matrix
 [[[7332  939]
  [1052  946]]

 [[8575   11]
  [1354  329]]

 [[7736  876]
  [ 953  704]]

 [[6603 1543]
  [ 980 1143]]

 [[6348 1955]
  [ 681 1285]]

 [[9230  197]
  [ 501  341]]]
Accuracy: 0.4623624500925114
F1 score: 0.451604935888765
Report for Validation data
Confusion Matrix
 [[[ 907  114]
  [ 146  117]]

 [[1115    0]
  [ 138   31]]

 [[ 948   99]
  [ 135  102]]

 [[ 821  215]
  [ 116  132]]

 [[ 795  238]
  [  92  159]]

 [[1142   26]
  [  65   51]]]
Accuracy: 0.46105919003115264
F1 score: 0.4531706458021525
Report for Test data
Confusion Matrix
 [[[ 915  118]
  [ 133  117]]

 [[1069    3]
  [ 178   33]]

 [[ 962  107]
  [ 126   88]]

 [[ 817  199]
  [ 130  137]]

 [[ 778  256]
  [  95  154]]

 [[1167   24]
  [  45   47]]]
Accuracy: 0.4489477786438036
F1 score: 0.43635391164489756


In [None]:
x_train_bert1 = np.expand_dims(x_train_bert,-1)
bert_6_way_model = define_glove_6_way_model(x_train_bert1.shape[1:])
es = EarlyStopping(monitor="val_accuracy",mode="auto",verbose=1,patience=3,restore_best_weights=True)
history = bert_6_way_model.fit(x_train_bert, y_train_six_way,batch_size=32,validation_data=(x_val_bert,y_val_six_way),epochs=10,callbacks=[es])


In [None]:
loss, acc = bert_6_way_model.evaluate(x_test_bert,y_test_six_way)
print(acc)

In [None]:
print("Report for Training data")
report_multilabel_metrics(bert_6_way_model,x_train_bert, y_train_six_way)
print("============================")
print("Report for Validation data")
report_multilabel_metrics(bert_6_way_model,x_val_bert,y_val_six_way)
print("============================")
print("Report for Test data")
report_multilabel_metrics(bert_6_way_model,x_test_bert,y_test_six_way)

In [None]:
## write your code here
# Initialize hyperparameters
# Create model
# train
# test
# report accuracy, f1-score and confusion matrix

# Binary Classification

## Preprocessing

In [None]:
num_classes = 2

In [None]:
# Function for preprocessing labels
def dataPreprocessingBinary(data):
    y = data["label"].tolist()

    # Changing the 'half-true', 'mostly-true', barely-true', 'pants-fire' labels to True/False for Binary Classification
    for x in range(len(y)):
        if(y[x] == 'half-true'):
            y[x] = 'True'
        elif(y[x] == 'mostly-true'):
            y[x] = 'True'
        elif(y[x] == 'barely-true'):
            y[x] = 'False'
        elif(y[x] == 'pants-fire'):
            y[x] = 'False'

    # Converting the lables into binary class matrix
    labelencoder_y = LabelEncoder()
    y = labelencoder_y.fit_transform(y)
    y = np_utils.to_categorical(y, num_classes)
    return y

In [None]:
y_train_binary = dataPreprocessingBinary(train)
y_test_binary = dataPreprocessingBinary(test)
y_val_binary = dataPreprocessingBinary(val)

In [57]:
def report_metrics(model,x,y):
  y_pred = model.predict(x)
  predictions=[]
  for i in y_pred:
    a = np.zeros(2)
    a[np.argmax(i,0)]=1
    predictions.append(a)
    
  cm = multilabel_confusion_matrix(y, predictions)
  print("Confusion Matrix\n",cm)
  print("Accuracy:",accuracy_score(y, predictions))
  print("F1 score:",f1_score(y, predictions,average=None))

In [None]:
bag_of_words_binary_model = define_bag_of_words_binary_model()
es = EarlyStopping(monitor="val_accuracy",mode="auto",verbose=1,patience=3,restore_best_weights=True)
history = bag_of_words_binary_model.fit(x_train_bow, y_train_binary,batch_size=32,validation_data=(x_val_bow,y_val_binary),epochs=10,callbacks=[es])


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 6: early stopping


In [None]:
loss, acc = bag_of_words_binary_model.evaluate(x_test_bow,y_test_binary)
print(acc)

0.7404520511627197


In [58]:
print("Report for Training data")
report_metrics(bag_of_words_binary_model,x_train_bow, y_train_binary)
print("============================")
print("Report for Validation data")
report_metrics(bag_of_words_binary_model,x_val_bow,y_val_binary)
print("============================")
print("Report for Test data")
report_metrics(bag_of_words_binary_model,x_test_bow,y_test_binary)

Report for Training data
Confusion Matrix
 [[[5027  745]
  [1753 2744]]

 [[2744 1753]
  [ 745 5027]]]
Accuracy: 0.7567435972343948
F1 score: [0.6872026  0.80098789]
Report for Validation data
Confusion Matrix
 [[[566 102]
  [262 354]]

 [[354 262]
  [102 566]]]
Accuracy: 0.7165109034267912
F1 score: [0.66044776 0.75668449]
Report for Test data
Confusion Matrix
 [[[624 103]
  [230 326]]

 [[326 230]
  [103 624]]]
Accuracy: 0.7404520654715511
F1 score: [0.66192893 0.78937381]


In [71]:
from keras.models import Model
def define_glove_binary_model(shapes):
    input = Input(shape=shapes)
    ip_layer1=keras.layers.Conv1D(16,4,padding="same",activation='relu')(input)
    
    ip_layer1=keras.layers.MaxPooling1D(pool_size=2,strides=1,padding='same')(ip_layer1)
    LSTM_Layer_1 = LSTM(128,return_sequences=True)(ip_layer1)
    LSTM_Layer_2 = LSTM(64)(LSTM_Layer_1)
    dense_layer = Dense(2, activation='sigmoid')(LSTM_Layer_2)
    model= Model(inputs=input, outputs=dense_layer)
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
    return model

In [72]:
x_train_glove1 = np.expand_dims(x_train_glove,-1)
glove_binary_model = define_glove_binary_model(x_train_glove1.shape[1:])
es = EarlyStopping(monitor="val_accuracy",mode="auto",verbose=1,patience=3,restore_best_weights=True)
history = glove_binary_model.fit(x_train_glove, y_train_binary,batch_size=32,validation_data=(x_val_glove,y_val_binary),epochs=10,callbacks=[es])


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [75]:
loss, acc = glove_binary_model.evaluate(x_test_glove,y_test_binary)
print(acc)

0.7326578497886658


In [76]:
print("Report for Training data")
report_metrics(glove_binary_model,x_train_glove, y_train_binary)
print("============================")
print("Report for Validation data")
report_metrics(glove_binary_model,x_val_glove,y_val_binary)
print("============================")
print("Report for Test data")
report_metrics(glove_binary_model,x_test_glove,y_test_binary)

Report for Training data
Confusion Matrix
 [[[4854  918]
  [1793 2704]]

 [[2704 1793]
  [ 918 4854]]]
Accuracy: 0.7360015580874476
F1 score: [0.66609188 0.78170545]
Report for Validation data
Confusion Matrix
 [[[557 111]
  [264 352]]

 [[352 264]
  [111 557]]]
Accuracy: 0.7079439252336449
F1 score: [0.65245598 0.74815312]
Report for Test data
Confusion Matrix
 [[[608 119]
  [224 332]]

 [[332 224]
  [119 608]]]
Accuracy: 0.7326578332034295
F1 score: [0.65938431 0.77998717]


## Model
Build a model and pass bow, glove and bert embedded inputs: x_train_bow, x_train_glove, x_train_bert(similarly validate for val and report results on test)


In [None]:
## write your code here
# Initialize hyperparameters
# Create model
# train
# test
# report accuracy, f1-score and confusion matrix