# Reputation based detection

This notebook tests whether including the speaker's reputation vector biases the classification of a statement.

The basic assumption is that if a frequent liar says something true, it will tend to be marked as a lie. Similarly, a lie from a frequent truth-speaker will tend to be marked as truth.

We first load the trained model, then test it on a collection of statements whose labels do not match their speakers' usual level of truthfulness (truths from liars and lies from "honest" speakers), and finally check the results.

In [1]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import sklearn
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

## Now we load the rows whose label does not reflect the reputation of the speaker

These rows were selected from the test set and include only cases where EITHER habitual liars are speaking a mostly-true or true statement OR predominantly honest speakers are lying.

They were selected in Excel using a positivity ratio: a large negative value marks the speaker as a habitual liar, and a large positive value marks the speaker as somewhat honest.

Positivity Ratio was computed as follows: `(Truths*1.5) + MostlyTruths - False - (Pants*1.5)`
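The same ratio could be computed in pandas instead of Excel. The following is a minimal sketch, not the author's spreadsheet: it assumes the speaker history columns are named as in the tables shown in this notebook, and that `NotRealTotal` holds the "False" counts (that mapping is an assumption). The helper name `positivity_ratio` is hypothetical, and the three sample rows reuse counts displayed in this notebook's tables.

```python
import pandas as pd

# Three speakers' history counts, copied from the table displayed in this notebook.
history = pd.DataFrame({
    "PantsTotal":   [105, 11, 0],
    "NotRealTotal": [43, 41, 1],   # assumed to be the "False" count
    "MostlyTotal":  [5, 40, 3],
    "Truths":       [6, 23, 8],
})

def positivity_ratio(df):
    """(Truths*1.5) + MostlyTruths - False - (Pants*1.5), computed per row."""
    return (df["Truths"] * 1.5 + df["MostlyTotal"]
            - df["NotRealTotal"] - df["PantsTotal"] * 1.5)

history["PositivityRatio"] = positivity_ratio(history)
print(history)
```

Under these assumptions the first speaker scores -186.5 (a habitual liar), while the other two score 17.0 and 14.0 (somewhat honest).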


In [2]:
ReadSet = pd.read_excel('test_reputationExceptions.xlsx')

# Keep only the six reputation counts: drop the metadata, the label, and the ratio
test = ReadSet.drop(['ID','Label','Statement','Subject','Speaker','Job','From','Affiliation','Context','PositivityRatio'], axis=1)
test

Unnamed: 0,PantsTotal,NotRealTotal,BarelyTotal,HalfTotal,MostlyTotal,Truths
0,105,43,11,8,5,6
1,105,43,11,8,5,6
2,61,114,63,51,37,14
3,61,114,63,51,37,14
4,61,114,63,51,37,14
...,...,...,...,...,...,...
220,11,41,26,32,40,23
221,11,41,26,32,40,23
222,11,41,26,32,40,23
223,0,1,3,6,3,8


In [3]:
# Normalise the counts to the scale used by the trained network (divide by 200)
test = test / 200
test

Unnamed: 0,PantsTotal,NotRealTotal,BarelyTotal,HalfTotal,MostlyTotal,Truths
0,0.525,0.215,0.055,0.040,0.025,0.030
1,0.525,0.215,0.055,0.040,0.025,0.030
2,0.305,0.570,0.315,0.255,0.185,0.070
3,0.305,0.570,0.315,0.255,0.185,0.070
4,0.305,0.570,0.315,0.255,0.185,0.070
...,...,...,...,...,...,...
220,0.055,0.205,0.130,0.160,0.200,0.115
221,0.055,0.205,0.130,0.160,0.200,0.115
222,0.055,0.205,0.130,0.160,0.200,0.115
223,0.000,0.005,0.015,0.030,0.015,0.040


## Now we can test how well these reputations tend to taint the statements.

We expect to see many truths classified as lies or mostly lies, and vice versa.

In [8]:
#  adding the Fully Connected Neural Network to include the reputation vector
InputSize=6
OutputSize=6

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        
         
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(InputSize, 24)   
        self.fc2 = nn.Linear(24, 12)
        self.fc3 = nn.Linear(12, OutputSize)  #classifies 'outputsize' different classes

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        x = torch.sigmoid(self.fc2(x)) 
        x = torch.sigmoid(self.fc3(x)).double()
        return x

    

# Load the previously trained FCNN weights.
# map_location='cpu' lets a checkpoint saved on a GPU load on a CPU-only machine.
PATH = '../Sigmoid_linearMSE_adam_4427.pth'

net = Net()
net.load_state_dict(torch.load(PATH, map_location='cpu'))

<All keys matched successfully>

In [9]:
TestData=torch.tensor(test.values)
TestData

tensor([[0.5250, 0.2150, 0.0550, 0.0400, 0.0250, 0.0300],
        [0.5250, 0.2150, 0.0550, 0.0400, 0.0250, 0.0300],
        [0.3050, 0.5700, 0.3150, 0.2550, 0.1850, 0.0700],
        ...,
        [0.0550, 0.2050, 0.1300, 0.1600, 0.2000, 0.1150],
        [0.0000, 0.0050, 0.0150, 0.0300, 0.0150, 0.0400],
        [0.0150, 0.0250, 0.0300, 0.0350, 0.0300, 0.0450]], dtype=torch.float64)

In [10]:
 
labels=ReadSet['Label']
labelsOneHot=pd.get_dummies(labels)
labelsOneHot



Unnamed: 0,0,1,2,3,4,5
0,0,0,0,0,0,1
1,0,0,0,0,0,1
2,0,0,0,0,1,0
3,0,0,0,0,0,1
4,0,0,0,0,0,1
...,...,...,...,...,...,...
220,1,0,0,0,0,0
221,0,1,0,0,0,0
222,1,0,0,0,0,0
223,0,1,0,0,0,0


In [11]:
correct = 0
total = 0

Y = []     # target labels
Pred = []  # predicted labels

with torch.no_grad():
    for row in range(len(TestData)):
        outputs = net(TestData[row, :].float())
        # Predicted class = index of the largest output (first index wins ties)
        result = int(torch.argmax(outputs))
        total += 1

        if labelsOneHot.iloc[row, result] == 1:
            correct += 1

        Y.append(labels.iloc[row])
        Pred.append(result)

        print(result, end=' ')

print('Correct:', correct, 'out of:', total)
print('Accuracy of the network : ', (100 * correct / total))

0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Correct: 77 out of: 225
Accuracy of the network :  34.22222222222222
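
The row-by-row loop above could also be written as a single batched forward pass. The sketch below is only an illustration: `standin` is a hypothetical network with the same layer sizes as `Net` but random weights (the trained `.pth` file is not loaded here), and `batch` is made-up input, so the printed classes say nothing about the real model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in with the same layer sizes as Net; random weights, NOT the trained model.
standin = nn.Sequential(
    nn.Linear(6, 24), nn.Sigmoid(),
    nn.Linear(24, 12), nn.Sigmoid(),
    nn.Linear(12, 6), nn.Sigmoid(),
)

batch = torch.rand(8, 6)  # eight hypothetical reputation vectors

with torch.no_grad():
    scores = standin(batch)              # shape (8, 6): one score per class per row
    batched_pred = scores.argmax(dim=1)  # predicted class index for every row at once
    # Same prediction computed one row at a time, as in the loop above
    looped_pred = torch.stack([standin(row).argmax() for row in batch])

print(batched_pred.tolist())
```

With the real model, `net(TestData.float()).argmax(dim=1)` would play the role of `batched_pred`, and the accuracy would follow from comparing it with the label column.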


In [14]:
from sklearn import metrics
 
print(metrics.confusion_matrix(Y,Pred))

[[ 0 23  0  0  2  0]
 [ 0 77  0  0 19  0]
 [ 0 55  0  0 12  0]
 [ 0  1  0  0  0  0]
 [ 3 19  0  0  0  0]
 [ 3 11  0  0  0  0]]


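 ###### In this case the model labels most entries as False (class 1) regardless of their true label; the remaining predictions are Mostly-True or Pants.
 ###### 33 of the 188 statements labelled Pants, False, or Barely-True are predicted as Mostly-True, and all 36 statements labelled Mostly-True or True are predicted as Pants or False.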
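
The counts behind this reading can be recomputed from the confusion matrix printed above. This is a small sketch that assumes a particular grouping, which the notebook does not define explicitly: Pants, False, and Barely-True count as the "lie" side, Mostly-True and True as the "truth" side, and Half-True is left out.

```python
import numpy as np

# Confusion matrix copied from the output above (rows = true label, columns = prediction).
cm = np.array([
    [0, 23, 0, 0,  2, 0],   # Pants
    [0, 77, 0, 0, 19, 0],   # False
    [0, 55, 0, 0, 12, 0],   # Barely-True
    [0,  1, 0, 0,  0, 0],   # Half-True
    [3, 19, 0, 0,  0, 0],   # Mostly-True
    [3, 11, 0, 0,  0, 0],   # True
])

lie_side = [0, 1, 2]   # assumed grouping: Pants, False, Barely-True
truth_side = [4, 5]    # assumed grouping: Mostly-True, True

lies_total = cm[lie_side].sum()
lies_as_truth = cm[np.ix_(lie_side, truth_side)].sum()
truths_total = cm[truth_side].sum()
truths_as_lie = cm[np.ix_(truth_side, lie_side)].sum()

print(f"lies predicted as truths: {lies_as_truth}/{lies_total}")      # 33/188
print(f"truths predicted as lies: {truths_as_lie}/{truths_total}")    # 36/36
```

Under this grouping, every true statement from a habitual liar is pushed to the lie side, while only about one in six lies from an honest speaker is pushed to the truth side.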
 

 

In [15]:
target_names = ['Pants', 'False', 'Barely-True','Half-True','Mostly-True','True']

print(metrics.classification_report(Y, Pred,target_names =target_names))

              precision    recall  f1-score   support

       Pants       0.00      0.00      0.00        25
       False       0.41      0.80      0.55        96
 Barely-True       0.00      0.00      0.00        67
   Half-True       0.00      0.00      0.00         1
 Mostly-True       0.00      0.00      0.00        22
        True       0.00      0.00      0.00        14

    accuracy                           0.34       225
   macro avg       0.07      0.13      0.09       225
weighted avg       0.18      0.34      0.23       225

