## What this file is for

It is possible to capture the hidden states of a transformer at any layer.  The ones captured here are a 796 dimensional embedding at the output,  representing each token in a statement.

Classification using these was attempted but was not successful.

In [19]:
# Importing necessary libraries
import pandas as pd
from datetime import datetime
import sklearn
import torch
import torch.nn as nn
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device
from transformers import BertTokenizer, BertModel

In [9]:
# procedure for getting the data sets and formatting them for the transformer
 

def prepareDataset( filename):
     
    ReadSet=pd.read_excel(filename )

    ReadSet['text']=ReadSet['Statement']
    ReadSet['labels']=ReadSet['Label']
    
    ReadSet=ReadSet.drop(['ID','Label','Statement','Subject','Speaker','Job','From','Affiliation','PantsTotal','NotRealTotal','BarelyTotal','HalfTotal','MostlyTotal','RealTotal','Context'],axis=1)
     
    return ReadSet


In [10]:
# preparing the training dataset
train=prepareDataset( 'train-clean.xlsx')
# and display for inspecting
train

Unnamed: 0,text,labels
0,President Obama is a Muslim.,0
1,An independent payment advisory board created ...,0
2,U.S. Sen. Bill Nelson was the deciding vote fo...,2
3,Large phone companies and their trade associat...,4
4,RIPTA has really some of the fullest buses for...,4
...,...,...
10261,The Georgia Dome has returned $10 billion in e...,1
10262,Then-Gov. Carl Sanders put 56 percent of the s...,4
10263,Nathan Deal saved the HOPE scholarship program.,4
10264,John Faso took money from fossil fuel companie...,3


In [11]:
# preparing the evaluation/validation dataset
Eval=prepareDataset('valid-clean.xlsx')
# and display for inspecting
Eval

Unnamed: 0,text,labels
0,New Jerseys once-broken pension system is now ...,3
1,The new health care law will cut $500 billion ...,2
2,"For thousands of public employees, Wisconsin G...",3
3,Because as a Senator Toomey stood up for Wall ...,4
4,The governors budget proposal reduces the stat...,5
...,...,...
1279,You can import as many hemp products into this...,5
1280,Says when Republicans took over the state legi...,3
1281,Wisconsin's laws ranked the worst in the world...,2
1282,"There currently are 825,000 student stations s...",4


In [12]:
# preparing the test set dataset
test=prepareDataset('test-clean.xlsx')
test

Unnamed: 0,text,labels
0,"In a lawsuit between private citizens, a Flori...",4
1,Obama-Nelson economic record: Job creation ......,4
2,Says George LeMieux even compared Marco Rubio ...,2
3,Gene Green is the NRAs favorite Democrat in Co...,2
4,"In labor negotiations with city employees, Mil...",2
...,...,...
1277,Says Milwaukee County Executive Chris Abele sp...,1
1278,"The words subhuman mongrel, which Ted Nugent c...",5
1279,California's Prop 55 prevents $4 billion in ne...,2
1280,Says One of the states largest governments mad...,0


#  Capturing the hidden layers 

The simplest way to do this is with the Huggingface Transformers library
https://huggingface.co/transformers/model_doc/bert.html

At this point you should terminate this notebook and start running the remaining steps from this point

##### Using  BertModel
The sentence vector can be capture with outputs[1]

The 1X6 classification vector can be capture with outputs[0]

adding output_hidden_states=True  allows you to capture all hidden states in output[1]


In [20]:
model_class='bert'  # bert or roberta or albert
model_version='bert-base-cased' #bert-base-cased, roberta-base, roberta-large, albert-base-v2 OR albert-large-v2
output_folder='./TunedModels/'+model_class+'/'+model_version+"/"

# Load pre-trained model (weights)
CheckPoint='checkpoint-161-epoch-4'  #epoch 2
preSavedCheckpoint=output_folder+CheckPoint
model =BertModel.from_pretrained(preSavedCheckpoint) # ,output_hidden_states=True  if you wish

In [21]:
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained(preSavedCheckpoint)

#for testing it (uncomment the following 2 lines)
#tokens_tensor = torch.tensor(tokenizer.encode("Hello, my dog is very cute")).unsqueeze(0)
#tokens_tensor

In [22]:
# place model in evaluation mode
# This is IMPORTANT to have reproducible results during evaluation!
# it deactivates the DropOut modules
model.eval()

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(28996, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          

In [23]:
# use this to check that the model is working

with torch.no_grad():
        tokens_tensor = torch.tensor(tokenizer.encode("Hello, my dog is very cute")).unsqueeze(0)
        outputs = model(tokens_tensor)
        classification = outputs[0]
        sentenceVector = outputs[1]


print(classification)
print('_________________________')
print(classification.size())
print('_____ Sentence Vector_____')
print(sentenceVector)
print('_________________________')
print(sentenceVector.size())


tensor([[[-0.2796, -0.6438,  0.0258,  ..., -0.0553,  0.6021,  0.4983],
         [-0.1477, -0.1678,  0.7997,  ..., -0.1568,  0.7950,  0.6561],
         [ 0.3069,  0.5455,  1.1903,  ..., -0.1290, -0.0614,  0.7561],
         ...,
         [-0.3995,  0.2563,  0.1365,  ...,  0.7624,  0.4063,  0.4120],
         [ 0.1616, -0.3017,  0.4328,  ..., -0.3868,  0.6118,  0.2967],
         [ 0.4040, -0.6882,  0.8193,  ..., -0.3489,  0.5836, -0.0777]]])
_________________________
torch.Size([1, 9, 768])
_____ Sentence Vector_____
tensor([[-0.3891,  0.2266,  0.9752, -0.6544,  0.1215, -0.5357,  0.6574,  0.0100,
         -0.3325, -0.0315, -0.0804,  0.8440, -0.2456, -0.8986, -0.5873, -0.2237,
          0.1653,  0.1258, -0.9787,  0.0976, -0.3274, -0.9096,  0.5075, -0.1409,
          0.2600, -0.0811,  0.1712,  0.9795, -0.2689,  0.9629,  0.1324, -0.6161,
         -0.6890, -0.8535,  0.1986,  0.3653, -0.9317, -0.2481,  0.3966, -0.4678,
         -0.1731,  0.6675,  0.0152, -0.2752,  0.1034, -0.1125,  0.0279, -0.2

In [24]:
sentenceVector.numpy()

array([[-0.38907713,  0.22658618,  0.9751551 , -0.6543652 ,  0.12148222,
        -0.5356933 ,  0.6574321 ,  0.01001288, -0.3325216 , -0.03150015,
        -0.08035869,  0.84399843, -0.24559331, -0.8986449 , -0.587316  ,
        -0.22366308,  0.16534932,  0.12576467, -0.97867686,  0.09760844,
        -0.32740077, -0.9095873 ,  0.5075457 , -0.14091869,  0.2600149 ,
        -0.08106023,  0.17119773,  0.9795097 , -0.26889402,  0.9629115 ,
         0.13242915, -0.616064  , -0.6890336 , -0.8535495 ,  0.19860597,
         0.36531705, -0.93170583, -0.24810933,  0.39660627, -0.4678403 ,
        -0.17307971,  0.66752434,  0.01520166, -0.27515113,  0.10336161,
        -0.1125436 ,  0.02793297, -0.25013205, -0.09938671,  0.9962989 ,
         0.20001042,  0.9803463 ,  0.07254764,  0.25343397,  0.4116171 ,
         0.03744793,  0.7586972 , -0.05051298, -0.16765298, -0.10250951,
         0.23956813,  0.00301847, -0.00553937,  0.51250166, -0.58163184,
        -0.41187403, -0.45626318,  0.39665696,  0.4

In [25]:
# We capture the vectors of every statement, and save it with this procedure

SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/Vectors/"

def saveVectors(set,filename):
    statementVectors=[] # we collect the sentence vectors in this array
    print ("Fetching Vectors...", end='')
    for row in range(len(set)):
        with torch.no_grad():
            text=train.iloc[row,0]
            tokens_tensor = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
            outputs = model(tokens_tensor)    
            sentenceVector = outputs[1]
            statementVectors.append(sentenceVector[0].numpy())
     
    print('Saving...',end='')
    fileOut = pd.DataFrame(data= statementVectors)
    fileOut.to_csv(SavesDirectory+filename+'.tsv', sep='\t',  index=False)
     
    print('Saving Complete!')
    

In [26]:
saveVectors(train,'trainOut')

Fetching Vectors...Saving...Saving Complete!


In [27]:
saveVectors(Eval,'evalOut')

Fetching Vectors...Saving...Saving Complete!


In [28]:
saveVectors(test,'testOut')

Fetching Vectors...Saving...Saving Complete!


#  Adding the reputation vector to the statement Vector

In [1]:
import pandas as pd
import torch
import torch.nn as nn
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

device(type='cuda', index=0)

In [6]:

train=pd.read_excel('train-clean-Reputation.xlsx' )
train=train.iloc[:,:-1].astype(float)
train=train/200  #for scaling
#train

model_class='bert'  # bert or roberta or albert
model_version='bert-base-cased' #bert-base-cased, roberta-base, roberta-large, albert-base-v2 OR albert-large-v2
SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/Vectors/"
TF_Output=pd.read_csv( SavesDirectory+'trainOut.tsv', sep='\t')

train=pd.concat([train,TF_Output], axis=1)

train

Unnamed: 0,PantsTotal,NotRealTotal,BarelyTotal,HalfTotal,MostlyTotal,RealTotal,0,1,2,3,...,758,759,760,761,762,763,764,765,766,767
0,0.00,0.000,0.000,0.000,0.005,0.0,-0.353628,0.144715,0.958464,-0.681585,...,0.463839,0.643878,-0.712667,0.672926,-0.043116,0.436722,-0.138033,0.965976,0.144237,0.848983
1,0.01,0.000,0.000,0.000,0.005,0.0,0.144056,-0.214195,0.095764,0.488308,...,-0.051924,0.318266,-0.376326,0.485823,-0.274125,-0.069330,0.407660,-0.244027,0.078874,-0.227743
2,0.01,0.000,0.000,0.000,0.005,0.0,-0.225200,0.035485,0.773057,-0.188975,...,0.115203,0.861145,-0.733366,0.676177,-0.448811,0.543391,0.091833,0.594777,0.602495,0.173886
3,0.00,0.000,0.000,0.005,0.000,0.0,0.309293,-0.111637,0.026551,0.546971,...,0.599675,-0.301488,0.279851,0.116902,-0.345542,0.248941,0.455902,-0.292546,-0.026388,-0.631524
4,0.00,0.000,0.000,0.005,0.000,0.0,0.175661,0.124550,0.784903,-0.417388,...,0.610993,0.133260,-0.258388,0.465152,-0.376480,-0.024658,0.114852,0.673887,0.473460,0.106824
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10260,0.00,0.005,0.000,0.010,0.000,0.0,0.247382,-0.038540,-0.220711,0.490467,...,0.101806,0.142231,0.082679,-0.188128,-0.736837,0.289984,0.260620,-0.663452,-0.147648,-0.314752
10261,0.00,0.005,0.000,0.010,0.000,0.0,-0.158101,-0.000162,0.912721,-0.635446,...,0.594023,0.451269,-0.065867,0.616350,0.004871,0.076593,0.162441,0.884791,0.198261,0.514505
10262,0.00,0.005,0.000,0.010,0.000,0.0,0.047984,-0.056740,0.280595,0.227913,...,-0.020549,0.670855,-0.584498,0.435903,-0.434411,0.462449,0.254470,-0.284243,0.449133,-0.215167
10263,0.00,0.000,0.005,0.000,0.000,0.0,-0.225869,-0.048088,0.832683,-0.486722,...,0.019364,0.678839,-0.327944,0.195281,-0.392379,-0.510906,0.112248,0.907283,-0.540267,0.543526


In [7]:
TrainLables=pd.read_excel('train-clean-Reputation.xlsx' )
TrainLables=TrainLables.iloc[:,-1] 

TrainLables=pd.get_dummies(TrainLables)
TrainLables

Unnamed: 0,0,1,2,3,4,5
0,1,0,0,0,0,0
1,1,0,0,0,0,0
2,0,0,1,0,0,0
3,0,0,0,0,1,0
4,0,0,0,0,1,0
...,...,...,...,...,...,...
10260,0,1,0,0,0,0
10261,0,0,0,0,1,0
10262,0,0,0,0,1,0
10263,0,0,0,1,0,0


In [8]:
input=torch.tensor(train.values)
del(train)
input

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.9660,  0.1442,  0.8490],
        [ 0.0100,  0.0000,  0.0000,  ..., -0.2440,  0.0789, -0.2277],
        [ 0.0100,  0.0000,  0.0000,  ...,  0.5948,  0.6025,  0.1739],
        ...,
        [ 0.0000,  0.0050,  0.0000,  ..., -0.2842,  0.4491, -0.2152],
        [ 0.0000,  0.0000,  0.0050,  ...,  0.9073, -0.5403,  0.5435],
        [ 0.0000,  0.0000,  0.0000,  ..., -0.2191, -0.5112,  0.0335]],
       dtype=torch.float64)

In [9]:
targets=torch.tensor(TrainLables.astype(float).values)
del(TrainLables)
targets

tensor([[1., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        ...,
        [0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1., 0.]], dtype=torch.float64)

In [10]:
 
size= torch.tensor(input[0].size())
InputSize=size.item()

OutputSize=torch.tensor(targets[0].size()).item()

print('input size:', InputSize)
print('output size:', OutputSize)

input size: 774
output size: 6


In [11]:

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        
         
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(InputSize, 120)  # input size 32
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, OutputSize)  #classifies 'outputsize' different classes

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x)) 
        x = torch.tanh(self.fc3(x)).double()
        return x

    

#now we use it

net = Net()

In [26]:
# here we  setup the neural network parameters
# pick an optimizer (Simple Gradient Descent)

learning_rate = 4e-4
criterion = nn.MSELoss()  #computes the loss Function

import torch.optim as optim

# creating optimizer
#optimizer = optim.SGD(net.parameters(), lr=learning_rate)
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)


In [27]:
for epoch in range(300):  
        
    optimizer.zero_grad()   # zero the gradient buffers
    output = net(input.float())

    loss = criterion(output, targets)
    print('Loss:', loss, ' at epoch:', epoch)

    loss.backward()  #backprop
    optimizer.step()    # Does the update

Loss: tensor(0.0857, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 0
Loss: tensor(0.2733, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 1
Loss: tensor(0.1323, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 2
Loss: tensor(0.1271, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 3
Loss: tensor(0.1663, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 4
Loss: tensor(0.1570, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 5
Loss: tensor(0.1189, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 6
Loss: tensor(0.0937, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 7
Loss: tensor(0.1065, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 8
Loss: tensor(0.1257, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 9
Loss: tensor(0.1238, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 10
Loss: tensor(0.1067, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 11
Loss: tensor(0

Loss: tensor(0.0820, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 99
Loss: tensor(0.0820, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 100
Loss: tensor(0.0820, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 101
Loss: tensor(0.0819, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 102
Loss: tensor(0.0819, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 103
Loss: tensor(0.0819, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 104
Loss: tensor(0.0819, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 105
Loss: tensor(0.0819, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 106
Loss: tensor(0.0819, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 107
Loss: tensor(0.0819, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 108
Loss: tensor(0.0819, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 109
Loss: tensor(0.0819, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoc

Loss: tensor(0.0814, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 199
Loss: tensor(0.0814, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 200
Loss: tensor(0.0814, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 201
Loss: tensor(0.0813, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 202
Loss: tensor(0.0813, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 203
Loss: tensor(0.0813, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 204
Loss: tensor(0.0813, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 205
Loss: tensor(0.0813, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 206
Loss: tensor(0.0813, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 207
Loss: tensor(0.0813, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 208
Loss: tensor(0.0813, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 209
Loss: tensor(0.0813, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epo

Loss: tensor(0.0807, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 297
Loss: tensor(0.0807, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 298
Loss: tensor(0.0807, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 299


In [25]:
#save the FCNN model

stage='NNetworkStatementVector/'
SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/"+stage
PATH = SavesDirectory+'Tanh_MSE_adam1.pth'

torch.save(net.state_dict(), PATH)

# more on saving pytorch networks: https://pytorch.org/docs/stable/notes/serialization.html

In [29]:
#load previously saved FCNN model 

stage='NNetworkStatementVector/'
SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/"+stage
PATH = SavesDirectory+'Tanh_MSE_adam1.pth'

net = Net()
net.load_state_dict(torch.load(PATH))

<All keys matched successfully>

In [14]:
# load the test data

TestData=pd.read_excel('test-clean-Reputation.xlsx' )
TestData=TestData.iloc[:,:-1].astype(float)
TestData=TestData/200

SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/Vectors/"
TF_Output=pd.read_csv( SavesDirectory+'testOut.tsv', sep='\t')

TestData=pd.concat([TestData,TF_Output], axis=1)


TestData=torch.tensor(TestData.values)
TestData

tensor([[ 0.0100,  0.0050,  0.0200,  ...,  0.9660,  0.1442,  0.8490],
        [ 0.0100,  0.0050,  0.0200,  ..., -0.2440,  0.0789, -0.2277],
        [ 0.0100,  0.0050,  0.0200,  ...,  0.5948,  0.6025,  0.1739],
        ...,
        [ 0.0050,  0.0000,  0.0000,  ...,  0.9461,  0.1329,  0.6838],
        [ 0.0000,  0.0000,  0.0050,  ...,  0.9595, -0.4551,  0.7177],
        [ 0.0000,  0.0000,  0.0050,  ...,  0.8213, -0.4577,  0.7748]],
       dtype=torch.float64)

In [15]:
labels=pd.read_excel('test-clean-Reputation.xlsx' )
labels=labels.iloc[:,-1] 
labelsOneHot=pd.get_dummies(labels)
 
TestLables =torch.tensor(labelsOneHot.values)
TestLables

tensor([[0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 0],
        ...,
        [0, 0, 1, 0, 0, 0],
        [1, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0]], dtype=torch.uint8)

In [30]:
correct = 0
total = 0


Y=[]  #target
Pred=[]  #predicted

with torch.no_grad():
    for row in range(len(TestData)):
        outputs = net(TestData[row,:].float())
        result=0
        total+=1
        if outputs[0]<outputs[1]:result=1
        if outputs[result]<outputs[2]:result=2
        if outputs[result]<outputs[3]:result=3
        if outputs[result]<outputs[4]:result=4
        if outputs[result]<outputs[5]:result=5
        
        if labelsOneHot.iloc[row,result]==1: correct+=1
        
        Y.append(result)
        Pred.append(labels.iloc[row])
        
        print(result, end=' ')

                
print('Correct:', correct, 'out of:', total )
print('Accuracy of the network : ',( 100 * correct / total))

2 1 2 4 3 5 3 4 1 3 5 1 5 3 3 3 2 4 1 3 5 3 3 3 1 1 1 4 1 4 2 3 4 4 1 1 3 2 3 3 4 3 4 2 1 3 4 1 3 2 3 3 5 5 1 3 4 5 2 5 3 5 4 5 5 1 5 3 4 1 2 1 3 4 1 1 3 3 4 5 3 3 3 5 4 3 3 1 4 2 3 5 3 5 5 3 3 3 1 1 2 2 3 1 4 3 3 1 5 5 5 3 4 3 3 3 4 4 4 3 3 3 2 3 3 5 3 3 3 4 1 4 1 3 3 1 1 2 1 4 5 5 4 3 2 4 1 1 3 2 5 4 4 3 5 2 1 3 3 3 1 4 1 1 2 2 0 1 3 3 2 2 1 1 3 3 2 3 2 3 1 2 5 5 2 1 4 3 1 2 1 3 3 2 2 4 4 1 5 5 4 2 1 0 0 0 1 2 0 0 0 1 0 0 1 1 1 1 1 4 3 4 4 3 3 3 3 1 4 3 3 5 2 3 3 3 2 1 1 3 3 2 1 1 2 5 1 1 3 1 2 3 3 4 3 2 4 3 4 2 2 3 3 3 2 1 1 3 1 3 3 1 1 2 3 3 5 3 3 3 4 5 2 2 5 4 3 1 3 3 1 3 4 1 1 3 4 3 5 3 3 2 0 2 3 2 1 4 3 4 1 3 1 4 2 3 3 3 1 3 1 2 1 3 0 3 3 2 1 5 1 3 0 2 1 5 1 4 1 3 3 1 3 3 1 4 3 1 0 1 3 0 1 1 1 1 1 4 3 1 1 1 2 1 3 0 1 1 1 4 1 1 3 1 1 1 2 4 2 3 4 2 4 5 3 1 5 1 3 1 4 4 4 1 4 5 4 5 2 1 3 5 5 5 5 3 5 2 1 3 3 4 4 1 5 2 5 1 1 5 3 3 3 1 1 1 4 2 1 5 3 3 5 5 3 4 5 1 5 1 2 1 1 3 3 3 3 1 3 4 2 5 1 1 4 4 5 2 2 4 1 1 1 4 5 2 5 1 3 4 3 3 1 1 1 2 1 3 3 1 4 1 3 3 2 3 2 2 5 5 2 4 4 5 4 2 2 3 1 3 

In [23]:
from sklearn import metrics 
print(metrics.confusion_matrix(Y,Pred))

[[  5   2   3   5   1   3]
 [ 24  67  55  51  64  48]
 [ 10  38  36  36  26  31]
 [ 29  75  57 100  81  71]
 [ 14  33  44  35  44  26]
 [  9  31  26  39  33  30]]


In [24]:
target_names = ['Pants', 'False', 'Barely-True','Half-True','Mostly-True','True']

print(metrics.classification_report(Y, Pred,target_names =target_names))

              precision    recall  f1-score   support

       Pants       0.05      0.26      0.09        19
       False       0.27      0.22      0.24       309
 Barely-True       0.16      0.20      0.18       177
   Half-True       0.38      0.24      0.29       413
 Mostly-True       0.18      0.22      0.20       196
        True       0.14      0.18      0.16       168

    accuracy                           0.22      1282
   macro avg       0.20      0.22      0.19      1282
weighted avg       0.26      0.22      0.23      1282

