In this notebook we first classify the statements using the transformer. This is our first classification task.

The output classification vector from the transformer is then saved and used as input to the FCNN. This is our second classification task.


In [1]:
# Importing necessary libraries
import pandas as pd
from datetime import datetime
import sklearn.metrics  # ensures sklearn.metrics.accuracy_score is available
import torch
import torch.nn as nn
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

device(type='cuda', index=0)

In [2]:
from simpletransformers.classification import ClassificationModel

## Preparing the dataset

Some pre-processing to the dataset has already been done in preparation for various tests, so this processing is not from scratch.

In [3]:
# Read a dataset and format it for the transformer:
# simpletransformers expects a 'text' column and a 'labels' column.
def prepareDataset(filename):
    ReadSet = pd.read_excel(filename)

    # Copy the statement and its label into the expected columns
    ReadSet['text'] = ReadSet['Statement']
    ReadSet['labels'] = ReadSet['Label']

    # Drop every other column, including the originals
    ReadSet = ReadSet.drop(['ID', 'Label', 'Statement', 'Subject', 'Speaker', 'Job', 'From', 'Affiliation', 'PantsTotal', 'NotRealTotal', 'BarelyTotal', 'HalfTotal', 'MostlyTotal', 'RealTotal', 'Context'], axis=1)

    return ReadSet


In [4]:
# preparing the training dataset
train=prepareDataset( 'train-clean.xlsx')
# and display for inspecting
train

Unnamed: 0,text,labels
0,President Obama is a Muslim.,0
1,An independent payment advisory board created ...,0
2,U.S. Sen. Bill Nelson was the deciding vote fo...,2
3,Large phone companies and their trade associat...,4
4,RIPTA has really some of the fullest buses for...,4
...,...,...
10094,The Georgia Dome has returned $10 billion in e...,1
10095,Then-Gov. Carl Sanders put 56 percent of the s...,4
10096,Nathan Deal saved the HOPE scholarship program.,4
10097,John Faso took money from fossil fuel companie...,3


In [5]:
# preparing the evaluation/validation dataset
Eval=prepareDataset('valid-clean.xlsx')
# and display for inspecting
Eval

Unnamed: 0,text,labels
0,New Jerseys once-broken pension system is now ...,3
1,The new health care law will cut $500 billion ...,2
2,"For thousands of public employees, Wisconsin G...",3
3,Because as a Senator Toomey stood up for Wall ...,4
4,The governors budget proposal reduces the stat...,5
...,...,...
1267,You can import as many hemp products into this...,5
1268,Says when Republicans took over the state legi...,3
1269,Wisconsin's laws ranked the worst in the world...,2
1270,"There currently are 825,000 student stations s...",4


In [6]:
# preparing the test set dataset
test=prepareDataset('test-clean.xlsx')
test

Unnamed: 0,text,labels
0,"In a lawsuit between private citizens, a Flori...",4
1,Obama-Nelson economic record: Job creation a...,4
2,Says George LeMieux even compared Marco Rubio ...,2
3,Gene Green is the NRAs favorite Democrat in Co...,2
4,"In labor negotiations with city employees, Mil...",2
...,...,...
1250,Says Milwaukee County Executive Chris Abele sp...,1
1251,"The words subhuman mongrel, which Ted Nugent c...",5
1252,California's Prop 55 prevents $4 billion in ne...,2
1253,Says One of the states largest governments mad...,0


## Setting up the transformer for fine tuning

This is where changes are made to optimise the model.

The simpletransformers library is the quickest way to do this at the time of writing.
For more information on the settings and their default values, see:
https://github.com/ThilinaRajapakse/simpletransformers#default-settings 

###### Please do read that reference before changing any parameters. Don't try to be a hero!

In [7]:
#Set the model being used here
model_class='bert'  # bert or roberta or albert
model_version='bert-base-cased' #bert-base-cased, roberta-base, roberta-large, albert-base-v2 OR albert-large-v2


output_folder='./TunedModels/'+model_class+'/'+model_version+"/"
cache_directory= "./TunedModels/"+model_class+"/"+model_version+"/cache/"
labels_count=6  # the number of classification classes

print('model variables were set up: ')

model variables were set up: 


In [8]:
# use this to test if writing to the directories is working

import os
print(os.getcwd())
print(output_folder)
print(cache_directory)

testWrite=train.head(30)
 
testWrite.to_csv(output_folder+'DeleteThisToo.tsv', sep='\t')
testWrite.to_csv(cache_directory+'DeleteThisToo.tsv', sep='\t')

del(testWrite)

G:\0 finalThesis\CleanedText
./TunedModels/bert/bert-base-cased/
./TunedModels/bert/bert-base-cased/cache/


In [9]:
 
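save_every_steps = 1285
# This value was chosen assuming a training batch size of 8, where any number
# above 1284 saves the model only at the end of every epoch.
# Note: train_batch_size is set to 64 below, which gives 158 steps per epoch
# (see the training progress bar), so the step-based save never triggers here.
# Saving the model mid-training very often will consume disk space fast.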

train_args={
    "output_dir":output_folder,
    "cache_dir":cache_directory,
    'reprocess_input_data': True,
    'overwrite_output_dir': True,
    'num_train_epochs': 2,
    "save_steps": save_every_steps, 
    "learning_rate": 2e-5,
    "train_batch_size": 64,
    "eval_batch_size": 8,
    "evaluate_during_training_steps": 312,
    "max_seq_length": 64,
    "n_gpu": 1,
}

# Create a ClassificationModel
model = ClassificationModel(model_class, model_version, num_labels=labels_count, args=train_args) 

# You can set class weights by using the optional weight argument
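
The relationship between batch size, steps per epoch and `save_steps` can be checked with a little arithmetic. The following is a minimal sketch, not part of simpletransformers: the helper name `steps_per_epoch` is hypothetical, and it assumes one optimizer step per batch with the last, partial batch kept (no gradient accumulation).

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


# Number of training examples; the feature-conversion progress bar
# in this notebook reports 10099.
n_train = 10099

for batch_size in (8, 64):
    per_epoch = steps_per_epoch(n_train, batch_size)
    total = per_epoch * 2  # num_train_epochs = 2 in train_args
    print(f"batch_size={batch_size}: {per_epoch} steps/epoch, {total} steps in total")
```

With a batch size of 64 this gives 158 steps per epoch, which matches the "Current iteration" progress bar during training, and 316 steps over two epochs, which matches the name of the `checkpoint-316-epoch-2` checkpoint.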

### Loading a saved model (based on above args{})

If you stopped training, you can continue from a previously saved checkpoint.
The next cell allows you to load a model from any checkpoint.
Training then runs for the number of epochs set in train_args{}, continuing the tuning from your checkpoint.

###### HOWEVER
It will overwrite previous checkpoints!
Example: if you load an epoch-3 checkpoint and train again, the first new epoch overwrites the existing epoch-1 checkpoint. That checkpoint then holds the equivalent of a 4th epoch, even though it still has epoch-1 in its name.
###### SO BE CAREFUL
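
One way to guard against this is to copy the checkpoint folder somewhere else before resuming. The following is a minimal sketch using only the standard library; the helper name `backup_checkpoint` and the `_backup` suffix are assumptions made for illustration, not part of simpletransformers.

```python
import shutil
from pathlib import Path


def backup_checkpoint(checkpoint_dir: str, suffix: str = "_backup") -> Path:
    """Copy a checkpoint folder next to the original and return the copy's path.

    Raises FileExistsError if the backup folder already exists, so an
    earlier backup is never silently overwritten.
    """
    source = Path(checkpoint_dir)
    target = source.with_name(source.name + suffix)
    shutil.copytree(source, target)  # fails if target already exists
    return target
```

For example, `backup_checkpoint(output_folder + 'checkpoint-316-epoch-2')` would create a `checkpoint-316-epoch-2_backup` folder beside the original before training is resumed.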

In [16]:
# loading a previously saved model based on this particular Transformer Class and model_name

# loading the checkpoint that gave the best result
CheckPoint='checkpoint-316-epoch-2' 


preSavedCheckpoint=output_folder+CheckPoint

print('Loading model, please wait...')
model = ClassificationModel( model_class, preSavedCheckpoint, num_labels=labels_count, args=train_args) 
print('model in use is :', preSavedCheckpoint )

Loading model, please wait...
model in use is : ./TunedModels/bert/bert-base-cased/checkpoint-316-epoch-2


## Training the Transformer

Skip the next cell if you want to go directly to the evaluation without training.

In [10]:
# Train the model
current_time = datetime.now()
model.train_model(train)
print("Training time: ", datetime.now() - current_time)

Converting to features started. Cache is not used.


HBox(children=(FloatProgress(value=0.0, max=10099.0), HTML(value='')))


Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic


HBox(children=(FloatProgress(value=0.0, description='Epoch', max=2.0, style=ProgressStyle(description_width='i…

HBox(children=(FloatProgress(value=0.0, description='Current iteration', max=158.0, style=ProgressStyle(descri…

Running loss: 1.855415



Running loss: 1.711620



Running loss: 1.670800


HBox(children=(FloatProgress(value=0.0, description='Current iteration', max=158.0, style=ProgressStyle(descri…

Running loss: 1.612583

Training of bert model complete. Saved to ./TunedModels/bert/bert-base-cased/.
Training time:  0:06:22.348176


## Evaluating the training

In [17]:
TrainResult, TrainModel_outputs, wrong_predictions = model.eval_model(train, acc=sklearn.metrics.accuracy_score)

EvalResult, EvalModel_outputs, wrong_predictions = model.eval_model(Eval, acc=sklearn.metrics.accuracy_score)

TestResult, TestModel_outputs, wrong_predictions = model.eval_model(test, acc=sklearn.metrics.accuracy_score)

print('Training Result:', TrainResult['acc'])
#print('Model Out:', TrainModel_outputs)

print('Eval Result:', EvalResult['acc'])
#print('Model Out:', EvalModel_outputs)

print('Test Set Result:', TestResult['acc'])
#print('Model Out:', TestModel_outputs)

Features loaded from cache at ./TunedModels/bert/bert-base-cased/cache/cached_dev_bert_64_6_10099


HBox(children=(FloatProgress(value=0.0, max=1263.0), HTML(value='')))


{'mcc': 0.17723501405117215, 'acc': 0.3291414991583325, 'eval_loss': 1.6195976609572096}
Features loaded from cache at ./TunedModels/bert/bert-base-cased/cache/cached_dev_bert_64_6_1272


HBox(children=(FloatProgress(value=0.0, max=159.0), HTML(value='')))


{'mcc': 0.10218785027119269, 'acc': 0.2672955974842767, 'eval_loss': 1.6896684942005564}
Features loaded from cache at ./TunedModels/bert/bert-base-cased/cache/cached_dev_bert_64_6_1255


HBox(children=(FloatProgress(value=0.0, max=157.0), HTML(value='')))


{'mcc': 0.08938332836121785, 'acc': 0.2597609561752988, 'eval_loss': 1.6725422295795125}
Training Result: 0.3291414991583325
Eval Result: 0.2672955974842767
Test Set Result: 0.2597609561752988


In [18]:
Pred=[]

countCorrect=0

for row in range(TestModel_outputs.shape[0]):
    outputs=TestModel_outputs[row]
    #print(test.iloc[row,0])
    print(outputs, end=' ')
    
    result=0
    if outputs[0]<outputs[1]:result=1
    if outputs[result]<outputs[2]:result=2
    if outputs[result]<outputs[3]:result=3
    if outputs[result]<outputs[4]:result=4
    if outputs[result]<outputs[5]:result=5
    Pred.append(result)
    print(result, ' ',test.iloc[row,1], end=' ')
    if result==test.iloc[row,1]:
        countCorrect+=1
        print('Match',countCorrect)
    print('')

print(countCorrect)

[ 0.2775879   0.5595703   0.28515625 -0.25268555 -0.3251953  -0.26660156] 1   4 
[ 0.09289551  0.6328125   0.41748047  0.1862793  -0.6479492  -0.5097656 ] 1   4 
[-0.00778198  0.4309082   0.45263672  0.16296387 -0.14196777 -0.47094727] 2   2 Match 1

[-1.1396484   0.33984375 -0.11975098  0.49975586  0.4086914   0.6386719 ] 5   2 
[-0.8334961   0.46166992  0.63378906  0.60595703  0.28808594 -0.11645508] 2   2 Match 2

[-1.8007812  -0.08428955 -0.3347168   0.7314453   1.2578125   0.6621094 ] 4   5 
[-0.07501221  0.43237305  0.33398438  0.21179199 -0.04544067 -0.41577148] 1   3 
[-1.0322266   0.41845703  0.47924805  0.62841797  0.2841797   0.01258087] 3   2 
[ 0.3112793   0.4387207   0.07104492 -0.3083496  -0.58251953 -0.32836914] 1   1 Match 3

[-0.44140625  0.5         0.63720703  0.4074707  -0.17443848 -0.2154541 ] 2   0 
[-1.6318359   0.05465698 -0.359375    0.47875977  0.90966797  0.95947266] 5   5 Match 4

[-0.5463867   0.59228516  0.7211914   0.5527344   0.08154297 -0.27685547] 2  

[ 0.68603516  0.37524414  0.20898438 -0.39941406 -1.0322266  -0.5600586 ] 0   5 
[ 0.00383568  0.45922852  0.38745117  0.19946289 -0.4272461  -0.59277344] 1   1 Match 48

[-1.1123047   0.19580078  0.04800415  0.6015625   0.65625     0.10675049] 4   1 
[-0.96435547  0.07806396  0.00664139  0.30200195  0.52783203  0.43066406] 4   0 
[ 0.16369629  0.5083008   0.10180664 -0.09558105 -0.42578125 -0.2006836 ] 1   0 
[ 0.30151367  0.39404297  0.0880127  -0.25561523 -0.33642578 -0.27929688] 1   0 
[-1.6230469   0.109375   -0.11517334  0.67529297  0.75634766  0.6586914 ] 4   4 Match 49

[-1.515625    0.24475098  0.35253906  0.7216797   0.79052734  0.35839844] 4   4 Match 50

[ 0.02891541  0.6020508   0.65478516  0.33984375 -0.41137695 -0.8442383 ] 2   3 
[-1.2646484   0.17907715  0.23303223  0.7451172   0.51220703  0.19189453] 3   3 Match 51

[-1.2138672   0.47607422  0.15148926  0.6713867   0.20690918  0.48876953] 3   5 
[-0.95166016  0.3178711   0.24536133  0.64160156  0.24816895  0.17980957]

[-1.5078125   0.35742188  0.2734375   0.72021484  0.67285156  0.43359375] 3   1 
[-1.6650391  -0.02114868  0.02050781  0.80810547  0.61621094  0.5283203 ] 3   3 Match 94

[-1.7226562   0.17590332 -0.10168457  0.46264648  0.8857422   0.7919922 ] 4   4 Match 95

[-0.80126953  0.40820312  0.45874023  0.55615234  0.3239746  -0.07055664] 3   2 
[-1.1250000e+00  3.2739258e-01  3.7084961e-01  7.7734375e-01
  3.9794922e-01 -1.1640787e-04] 3   4 
[-0.10791016  0.69921875  0.66845703  0.4074707  -0.41235352 -0.52441406] 1   1 Match 96

[-1.5078125  -0.0385437  -0.4182129   0.5366211   1.2714844   0.78808594] 4   4 Match 97

[ 0.09265137  0.56152344  0.42358398  0.18408203 -0.16833496 -0.61572266] 1   1 Match 98

[-1.1201172   0.3918457   0.36669922  0.51416016  0.3166504   0.15014648] 3   2 
[-0.5395508   0.31323242  0.27563477  0.44458008  0.16003418 -0.20422363] 3   0 
[-1.0546875   0.13195801  0.09515381  0.58447266  0.6381836   0.20263672] 4   5 
[-1.0683594   0.18041992 -0.27416992  0.24182


[-0.30126953  0.42749023  0.27075195  0.20056152 -0.12792969 -0.04125977] 1   1 Match 124

[-1.2929688   0.12524414  0.24963379  0.5625      0.43115234  0.21557617] 3   2 
[-1.1376953   0.09204102  0.06130981  0.6557617   0.6699219   0.3395996 ] 4   3 
[ 0.19421387  0.35302734  0.45336914  0.08807373 -0.61816406 -0.5810547 ] 2   0 
[-1.7685547   0.11779785  0.06347656  0.7758789   1.0732422   0.43139648] 4   5 
[ 0.20690918  0.5136719   0.24523926 -0.14819336 -0.2788086  -0.26733398] 1   4 
[-0.41577148  0.5419922   0.6621094   0.625      -0.16223145 -0.30517578] 2   3 
[-1.3408203   0.3022461  -0.00556564  0.49951172  1.0410156   0.49804688] 4   5 
[-1.1035156   0.3215332   0.44482422  0.6621094   0.53125     0.22583008] 3   4 
[-1.2041016  -0.09594727 -0.32104492  0.4855957   0.97753906  0.58740234] 4   5 
[-0.68359375  0.3322754   0.6748047   0.640625    0.16723633 -0.34472656] 2   0 
[-1.1308594   0.16674805  0.30981445  0.83496094  0.32250977  0.12347412] 3   4 
[-1.6171875  -0.0

[-1.8486328   0.18835449 -0.07537842  0.85595703  0.7373047   0.6586914 ] 3   4 
[-1.0927734   0.16882324  0.15930176  0.6694336   0.32348633  0.20275879] 3   4 
[ 0.11456299  0.5180664   0.28320312  0.05020142 -0.3046875  -0.47192383] 1   0 
[ 0.7036133   0.28710938  0.11364746 -0.4555664  -0.9790039  -0.54003906] 0   0 Match 156

[ 0.44848633  0.51416016  0.31176758 -0.19836426 -0.58251953 -0.8652344 ] 1   2 
[-1.5322266   0.08099365 -0.01353455  0.78808594  0.80810547  0.52001953] 4   1 
[-1.5654297   0.29638672  0.08947754  0.71435547  0.6484375   0.5058594 ] 3   1 
[-1.5878906   0.15026855  0.39648438  0.70166016  0.5810547   0.5957031 ] 3   4 
[-1.5722656   0.09350586  0.05023193  0.75097656  1.0361328   0.57666016] 4   3 
[-0.8803711   0.48754883  0.14221191  0.24987793  0.27807617  0.30786133] 1   5 
[-1.7392578  -0.11499023 -0.37963867  0.77978516  1.2490234   0.8144531 ] 4   4 Match 157

[-0.4194336   0.375      -0.19702148 -0.28100586 -0.02104187  0.5751953 ] 5   1 
[-1.1787

[-1.8339844   0.16015625  0.10882568  0.9169922   0.8383789   0.61572266] 3   5 
[-1.4160156  -0.03096008 -0.25048828  0.5019531   1.1367188   0.74072266] 4   4 Match 191

[-1.2011719   0.26049805  0.22692871  0.7631836   0.70458984  0.14758301] 3   2 
[-1.6777344  -0.05325317 -0.3095703   0.76904297  1.2880859   0.66259766] 4   4 Match 192

[-0.8930664   0.20239258 -0.1751709   0.31958008  0.17285156  0.16882324] 3   4 
[-1.2685547   0.2619629  -0.19616699  0.53515625  0.94921875  0.50341797] 4   4 Match 193

[-2.0507812e+00  9.0694427e-04 -4.0014648e-01  6.5185547e-01
  1.2421875e+00  1.0673828e+00] 4   4 Match 194

[-1.1865234   0.02705383  0.12561035  0.7026367   0.6738281   0.36987305] 3   3 Match 195

[-1.6767578   0.2211914   0.19104004  0.6621094   0.9760742   0.68408203] 4   1 
[-0.7348633   0.36401367 -0.14941406  0.07177734  0.37158203  0.64501953] 5   1 
[-0.39916992  0.65527344  0.5209961   0.60253906 -0.22827148 -0.3647461 ] 1   1 Match 196

[-1.15625     0.27368164 -0.08

[-0.27929688  0.42382812  0.6489258   0.66064453 -0.17468262 -0.67089844] 3   1 
[-1.1787109   0.05853271  0.21728516  0.9394531   0.60302734  0.03411865] 3   3 Match 225

[ 0.24633789  0.50146484  0.14477539 -0.11010742 -0.5727539  -0.4951172 ] 1   2 
[-1.4111328   0.2310791   0.37426758  0.91748047  0.6347656   0.2524414 ] 3   1 
[ 0.13806152  0.6152344   0.3630371   0.12988281 -0.31420898 -0.5878906 ] 1   4 
[-0.25024414  0.45776367  0.10113525 -0.01238251 -0.24645996  0.02839661] 1   2 
[-0.8120117   0.3100586   0.52978516  0.80859375  0.13842773 -0.31567383] 3   1 
[-1.5732422   0.06591797 -0.04122925  0.66552734  0.9770508   0.6435547 ] 4   3 
[-0.82128906  0.0075798  -0.04504395  0.45458984  0.39770508  0.13269043] 3   4 
[-0.81152344  0.4519043   0.47851562  0.56933594  0.07495117  0.02748108] 3   1 
[-1.3046875   0.0021534  -0.11279297  0.49536133  0.75878906  0.60546875] 4   2 
[-1.8232422  -0.04727173 -0.19372559  0.94677734  0.93652344  0.7495117 ] 3   4 
[-0.22888184  0.33

[-1.6494141   0.2775879   0.19543457  0.73535156  0.6845703   0.5239258 ] 3   4 
[-0.4038086   0.55615234  0.3232422   0.17944336 -0.38354492  0.05270386] 1   0 
[-1.7050781   0.04724121 -0.09667969  0.76660156  0.80908203  0.6816406 ] 4   4 Match 264

[-1.3613281  -0.01357269 -0.12792969  0.7949219   0.48828125  0.29663086] 3   4 
[-1.1953125   0.3137207   0.13110352  0.6171875   0.45996094  0.5991211 ] 3   5 
[-1.5966797  -0.08856201 -0.00597     0.96533203  0.8364258   0.2553711 ] 3   3 Match 265

[ 0.65966797  0.50878906  0.2631836  -0.37280273 -0.7680664  -0.7949219 ] 0   0 Match 266

[ 0.16870117  0.4152832   0.26538086  0.12347412 -0.41333008 -0.44702148] 1   3 
[-1.2695312   0.11627197 -0.1673584   0.59228516  0.8378906   0.65527344] 4   0 
[-1.6904297   0.08862305 -0.16540527  0.63916016  1.0419922   0.91064453] 4   3 
[-1.328125    0.18383789 -0.2590332   0.36254883  0.7138672   0.7084961 ] 4   5 
[-1.9257812  -0.12322998 -0.44604492  0.85058594  1.1552734   1.1572266 ] 5   2

[-1.0566406   0.22705078  0.37817383  0.69189453  0.18457031  0.15161133] 3   3 Match 302

[-0.9301758   0.32910156  0.50878906  0.6640625   0.15039062  0.05825806] 3   2 
[-0.30981445  0.48168945  0.47045898  0.37036133 -0.20410156 -0.29858398] 1   3 
[ 0.16210938  0.48828125  0.2915039  -0.01626587 -0.4555664  -0.3137207 ] 1   1 Match 303

[-1.7373047   0.01007843 -0.19799805  0.74365234  1.109375    0.9067383 ] 4   1 
[-1.4892578   0.12670898  0.17919922  0.79589844  0.7885742   0.32958984] 3   5 
[-0.99853516  0.4404297   0.28393555  0.6767578   0.23779297  0.07849121] 3   1 
[-0.23461914  0.50341797  0.23022461  0.05453491 -0.1973877  -0.09588623] 1   0 
[-1.9453125  -0.06335449 -0.2902832   0.72216797  1.2236328   1.0185547 ] 4   0 
[-1.0869141   0.39257812  0.6904297   0.8876953   0.5341797  -0.19970703] 3   5 
[-0.80126953  0.5336914   0.28125     0.59472656  0.140625    0.08221436] 3   1 
[-0.53808594  0.53125     0.23339844 -0.04318237 -0.05764771  0.37573242] 1   4 
[-0.0209
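
The chain of `if` statements in the cell above picks the index of the largest score, which is an arg-max over the six outputs. A minimal dependency-free sketch of the same selection is shown here; the function name `predict_class` is hypothetical, and the sample rows are copied from the output above. Because the comparisons are strict, ties go to the lowest index, which is also how NumPy's `argmax` behaves.

```python
def predict_class(scores):
    """Return the index of the largest score; ties go to the lowest index."""
    best = 0
    for i in range(1, len(scores)):
        if scores[best] < scores[i]:  # strict comparison, as in the if-chain
            best = i
    return best


# Two rows copied from the printed test outputs
row_a = [0.2775879, 0.5595703, 0.28515625, -0.25268555, -0.3251953, -0.26660156]
row_b = [-1.1396484, 0.33984375, -0.11975098, 0.49975586, 0.4086914, 0.6386719]

print(predict_class(row_a))  # 1
print(predict_class(row_b))  # 5
```

With the NumPy array returned by `eval_model`, the whole loop could likely be replaced by `TestModel_outputs.argmax(axis=1)`.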

In [19]:
from sklearn import metrics
print(metrics.confusion_matrix(test['labels'],Pred))

[[  7  41   9  17   9   8]
 [  6  69  13  72  43  30]
 [  5  56  31  82  32  15]
 [  4  45  19 105  65  17]
 [  2  33  10  89  78  37]
 [  4  31   6  54  75  36]]
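
The accuracy reported earlier can be recovered from this matrix: correct predictions sit on the diagonal, each row sums to the number of true examples of a class, and each column sums to the number of predictions of that class. A minimal sketch in plain Python, with the matrix copied from the output above (variable names are illustrative):

```python
# Confusion matrix copied from the output above:
# rows = true class, columns = predicted class
cm = [
    [7, 41, 9, 17, 9, 8],
    [6, 69, 13, 72, 43, 30],
    [5, 56, 31, 82, 32, 15],
    [4, 45, 19, 105, 65, 17],
    [2, 33, 10, 89, 78, 37],
    [4, 31, 6, 54, 75, 36],
]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n_classes))
accuracy = correct / total

# Recall: diagonal over row sum; precision: diagonal over column sum
recall = [cm[i][i] / sum(cm[i]) for i in range(n_classes)]
precision = [cm[i][i] / sum(row[i] for row in cm) for i in range(n_classes)]

print(f"accuracy = {correct}/{total} = {accuracy:.4f}")
print("recall   :", [round(r, 2) for r in recall])
print("precision:", [round(p, 2) for p in precision])
```

This gives 326 correct predictions out of 1255, about 0.26, in agreement with the test-set accuracy printed by `eval_model`.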


In [20]:
target_names = ['Pants', 'False', 'Barely-True','Half-True','Mostly-True','True']

print(metrics.classification_report(test['labels'], Pred,target_names =target_names))

              precision    recall  f1-score   support

       Pants       0.25      0.08      0.12        91
       False       0.25      0.30      0.27       233
 Barely-True       0.35      0.14      0.20       221
   Half-True       0.25      0.41      0.31       255
 Mostly-True       0.26      0.31      0.28       249
        True       0.25      0.17      0.21       206

    accuracy                           0.26      1255
   macro avg       0.27      0.24      0.23      1255
weighted avg       0.27      0.26      0.25      1255



In [21]:
# saving the output of the models to CSVs
#these are 1X6 classification vectors

SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/Saves/"
print('Saving...')
trainOut = pd.DataFrame(data= TrainModel_outputs )
trainOut.to_csv(SavesDirectory+'trainOut.tsv', sep='\t',  index=False)

evalOut = pd.DataFrame(data= EvalModel_outputs )
evalOut.to_csv(SavesDirectory+'evalOut.tsv', sep='\t',  index=False)

testOut = pd.DataFrame(data= TestModel_outputs )
testOut.to_csv(SavesDirectory+'testOut.tsv', sep='\t',  index=False)

print('Saving Complete on',datetime.now() ,'in:', SavesDirectory)

Saving...
Saving Complete on 2020-04-19 12:57:07.866949 in: ./TunedModels/bert/bert-base-cased/Saves/


In [22]:
del(model)
#del(train,Eval,test)
del(trainOut,evalOut,testOut)
torch.cuda.empty_cache()

# Adding the reputation vector

This section takes the outputs of the transformer used above and uses them together with the speaker's reputation to enhance the classification.

Before running this section, it is suggested that you restart the kernel and resume execution from this cell. Otherwise the neural net will likely raise an error caused by some unreleased variable used by the simpletransformers library.

In [1]:
import pandas as pd
import torch
import torch.nn as nn
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

device(type='cuda', index=0)

In [2]:

train=pd.read_excel('train-clean-Reputation.xlsx' )
train=train.iloc[:,:-1].astype(float)
train=train/200  #for scaling
#train

model_class='bert'  # bert or roberta or albert
model_version='bert-base-cased' #bert-base-cased, roberta-base, roberta-large, albert-base-v2 OR albert-large-v2
SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/Saves/"
TF_Output=pd.read_csv( SavesDirectory+'trainOut.tsv', sep='\t')

train=pd.concat([train,TF_Output], axis=1)

train

Unnamed: 0,PantsTotal,NotRealTotal,BarelyTotal,HalfTotal,MostlyTotal,RealTotal,0,1,2,3,4,5
0,0.005,0.000,0.00,0.000,0.000,0.0,0.767578,0.405273,0.121094,-0.474121,-0.929688,-0.497314
1,0.005,0.000,0.01,0.000,0.000,0.0,0.082458,0.471680,0.402588,0.063293,-0.761230,-0.521484
2,0.005,0.000,0.01,0.000,0.000,0.0,-0.357910,0.583984,0.496826,0.019791,-0.104431,-0.106079
3,0.000,0.000,0.00,0.000,0.005,0.0,-1.498047,0.184692,0.141968,0.796387,0.678223,0.501953
4,0.000,0.000,0.00,0.000,0.005,0.0,-1.678711,0.028656,-0.227539,0.613770,1.092773,0.895020
...,...,...,...,...,...,...,...,...,...,...,...,...
10094,0.000,0.005,0.00,0.000,0.010,0.0,-0.728516,0.396484,0.320557,0.609375,0.066589,-0.038055
10095,0.000,0.005,0.00,0.000,0.010,0.0,-1.663086,0.326660,0.301758,0.840332,0.741699,0.410400
10096,0.000,0.005,0.00,0.000,0.010,0.0,-0.478516,0.354004,0.528809,0.556152,0.115173,-0.348633
10097,0.000,0.000,0.00,0.005,0.000,0.0,-0.354004,0.335693,0.586426,0.459229,-0.136353,-0.257568
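
Each input row for the FCNN is the six reputation counts, divided by 200, followed by the six transformer outputs, giving 12 features. A minimal plain-Python sketch of that assembly for a single row; the helper name `build_features` is hypothetical, the raw counts are made up for illustration, and the transformer scores are the first row of the table above as displayed.

```python
SCALE = 200.0  # same divisor as `train / 200` in the cell above


def build_features(reputation_counts, transformer_scores, scale=SCALE):
    """Scale the reputation counts and append the transformer outputs."""
    scaled = [count / scale for count in reputation_counts]
    return scaled + list(transformer_scores)


# Illustrative raw counts (assumed): one earlier "Pants" rating, nothing else
counts = [1, 0, 0, 0, 0, 0]
scores = [0.767578, 0.405273, 0.121094, -0.474121, -0.929688, -0.497314]

features = build_features(counts, scores)
print(len(features))  # 12
print(features[:6])   # [0.005, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The order matters: the FCNN sees the reputation values in positions 0 to 5 and the transformer scores in positions 6 to 11, so the same order has to be used for any data passed to the network later.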


In [3]:
TrainLables=pd.read_excel('train-clean-Reputation.xlsx' )
TrainLables=TrainLables.iloc[:,-1] 

TrainLables=pd.get_dummies(TrainLables)
TrainLables

Unnamed: 0,0,1,2,3,4,5
0,1,0,0,0,0,0
1,1,0,0,0,0,0
2,0,0,1,0,0,0
3,0,0,0,0,1,0
4,0,0,0,0,1,0
...,...,...,...,...,...,...
10094,0,1,0,0,0,0
10095,0,0,0,0,1,0
10096,0,0,0,0,1,0
10097,0,0,0,1,0,0
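
`pd.get_dummies` turns each integer label into a six-element indicator row, as the table above shows. The same encoding can be sketched without pandas; the function name `one_hot` is hypothetical and the labels are the first five training labels shown earlier.

```python
def one_hot(label, n_classes=6):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < n_classes:
        raise ValueError(f"label {label} outside 0..{n_classes - 1}")
    return [1.0 if i == label else 0.0 for i in range(n_classes)]


labels = [0, 0, 2, 4, 4]  # first five training labels in this notebook
targets = [one_hot(label) for label in labels]

for label, row in zip(labels, targets):
    print(label, row)
```

Unlike `get_dummies`, which only creates columns for label values that actually occur in the data, this sketch always produces `n_classes` columns.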


In [4]:
input=torch.tensor(train.values)
 
input

tensor([[ 0.0050,  0.0000,  0.0000,  ..., -0.4741, -0.9297, -0.4973],
        [ 0.0050,  0.0000,  0.0100,  ...,  0.0633, -0.7612, -0.5215],
        [ 0.0050,  0.0000,  0.0100,  ...,  0.0198, -0.1044, -0.1061],
        ...,
        [ 0.0000,  0.0050,  0.0000,  ...,  0.5562,  0.1152, -0.3486],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.4592, -0.1364, -0.2576],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.2426,  0.4626,  0.6909]],
       dtype=torch.float64)

In [5]:
targets=torch.tensor(TrainLables.astype(float).values)
 
targets

tensor([[1., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        ...,
        [0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1., 0.]], dtype=torch.float64)

In [6]:
 
# Number of input features and output classes, read from the tensor shapes
InputSize = input.shape[1]

OutputSize = targets.shape[1]

print('input size:', InputSize)
print('output size:', OutputSize)

input size: 12
output size: 6


In [7]:

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        
         
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(InputSize, 24)   
        self.fc2 = nn.Linear(24, 12)
        self.fc3 = nn.Linear(12, OutputSize)  #classifies 'outputsize' different classes

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x)) 
        x = torch.tanh(self.fc3(x)).double()
        return x

    

#now we use it

net = Net()

In [30]:
# Here we set up the training parameters for the neural network
# and pick an optimizer (Adam; plain SGD is left commented out below)

learning_rate = 9e-4
criterion = nn.MSELoss()  #computes the loss Function

import torch.optim as optim

# creating optimizer
#optimizer = optim.SGD(net.parameters(), lr=learning_rate)
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
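
`nn.MSELoss` with its default settings averages the squared differences over every element of the batch. A minimal hand-computed sketch for one sample; the function name `mse` and the prediction values are made up for illustration.

```python
def mse(predictions, targets):
    """Mean of squared element-wise differences over all rows and columns."""
    squared = [
        (p - t) ** 2
        for pred_row, target_row in zip(predictions, targets)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(squared) / len(squared)


# One sample whose true class is 2 (one-hot target), with an assumed prediction
target = [[0.0, 0.0, 1.0, 0.0, 0.0, 0.0]]
prediction = [[0.0, 0.0, 0.5, 0.5, 0.0, 0.0]]

print(mse(prediction, target))
```

Here two of the six elements are off by 0.5 each, so the loss is (0.25 + 0.25) / 6, roughly 0.083.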


In [31]:
for epoch in range(500):  
        
    optimizer.zero_grad()   # zero the gradient buffers
    output = net(input.float())

    loss = criterion(output, targets)
    print('Loss:', loss, ' at epoch:', epoch)

    loss.backward()  #backprop
    optimizer.step()    # Does the update

Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 0
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 1
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 2
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 3
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 4
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 5
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 6
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 7
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 8
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 9
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 10
Loss: tensor(0.0959, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 11
Loss: tensor(0

Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 106
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 107
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 108
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 109
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 110
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 111
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 112
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 113
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 114
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 115
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 116
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epo

Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 212
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 213
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 214
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 215
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 216
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 217
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 218
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 219
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 220
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 221
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 222
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epo

Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 315
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 316
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 317
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 318
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 319
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 320
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 321
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 322
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 323
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 324
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 325
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epo

Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 430
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 431
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 432
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 433
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 434
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 435
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 436
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 437
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 438
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 439
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epoch: 440
Loss: tensor(0.0958, dtype=torch.float64, grad_fn=<MseLossBackward>)  at epo

In [45]:
#load previously saved FCNN model 

stage='NNetwork6WayClass/'
SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/"+stage
PATH = SavesDirectory+'4885.pth'  # saved FCNN weights (state_dict)

net = Net()
net.load_state_dict(torch.load(PATH))

<All keys matched successfully>

In [35]:
correct = 0
total = 0

countCorrect0=0
countCorrect1=0
count0=0
count1=0
labels=pd.read_excel('train-clean-Reputation.xlsx' )

Y=[]  #target
Pred=[]  #predicted

with torch.no_grad():
    for row in range(len(input)):
        outputs = net(input[row,:].float())
        result=0
        total+=1
        if outputs[0]<outputs[1]:result=1
        if outputs[result]<outputs[2]:result=2
        if outputs[result]<outputs[3]:result=3
        if outputs[result]<outputs[4]:result=4
        if outputs[result]<outputs[5]:result=5
        
        if TrainLables.iloc[row,result]==1: correct+=1
        
        Y.append(labels.iloc[row, -1])  # true label is the last column
        Pred.append(result)
        
        print(result, end=' ')
        
    
print('Correct:', correct, 'out of:', total )
print('Accuracy of the network : ',( 100 * correct / total))

0 2 2 4 4 5 3 4 1 4 5 5 4 4 3 3 3 2 3 3 1 1 3 1 1 1 5 1 4 3 5 1 1 4 1 0 3 3 3 4 2 3 2 1 4 2 4 1 1 2 5 5 1 1 5 5 5 1 1 2 1 2 5 2 5 5 5 5 5 1 5 5 5 5 5 1 5 5 5 5 2 2 2 5 4 5 5 5 3 5 3 5 3 4 1 0 0 0 0 5 2 5 2 1 5 5 5 1 5 5 5 5 5 5 5 5 5 5 5 5 5 3 3 3 4 3 3 3 3 1 1 4 1 3 3 3 1 3 3 3 5 1 4 4 2 4 1 2 1 2 5 4 1 4 5 5 1 1 2 2 1 2 1 1 1 2 1 1 5 1 1 1 1 1 3 3 2 2 2 2 2 2 5 4 4 4 4 0 2 2 2 3 3 4 4 4 4 4 4 2 2 2 3 3 3 3 3 3 2 2 2 3 1 3 0 2 2 2 2 4 4 4 4 1 3 0 0 4 5 3 3 5 2 3 3 1 2 2 2 1 1 1 1 1 2 2 1 1 2 2 2 3 3 3 3 4 4 4 2 2 2 2 2 2 3 1 3 3 3 1 3 1 1 2 3 3 3 3 2 2 2 5 2 4 4 4 3 1 3 3 4 3 3 0 1 5 5 5 5 5 3 3 0 0 0 3 1 4 3 4 1 3 2 4 4 4 3 3 3 4 1 2 2 1 2 2 3 3 3 3 1 3 3 2 5 5 4 4 4 3 2 5 3 3 3 3 3 3 3 3 4 2 1 1 4 1 4 3 4 2 3 2 3 3 0 4 4 3 4 4 3 3 1 1 4 2 4 3 3 4 4 3 4 4 4 1 4 4 1 4 4 1 4 5 4 1 2 3 3 5 5 5 5 4 5 5 1 4 4 4 4 1 4 2 5 2 2 5 5 3 1 1 1 1 4 2 4 5 4 4 3 3 4 3 4 4 3 1 1 3 4 3 3 4 1 1 4 4 4 3 4 2 3 4 4 5 3 4 4 3 4 4 5 3 3 4 3 4 4 1 1 3 4 4 4 1 3 1 3 4 3 4 3 1 3 4 4 4 5 1 3 4 3 3 4 4 4 3 4 3 

 4 5 3 4 4 5 3 4 5 5 3 3 4 2 1 1 5 4 1 4 4 4 3 4 5 5 4 4 4 4 4 1 4 3 3 4 4 3 1 4 4 4 4 1 1 2 2 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 5 1 4 3 5 5 4 5 0 0 5 5 4 3 2 4 1 5 1 1 1 3 1 2 2 5 4 4 5 4 5 4 3 4 4 3 2 2 5 4 4 3 1 5 5 3 5 5 5 5 1 4 2 5 5 4 4 5 2 1 5 1 3 2 4 2 4 1 2 4 3 4 4 4 4 1 3 1 4 5 5 5 5 3 0 0 5 3 4 5 5 5 4 3 5 3 2 2 2 2 2 2 4 1 4 1 3 3 4 3 4 3 3 3 4 4 4 5 5 3 5 4 1 5 5 3 0 1 4 2 2 5 2 4 2 5 5 2 3 1 4 4 4 2 4 2 5 5 0 2 4 4 5 5 4 2 4 5 5 1 3 4 0 0 4 0 0 4 1 3 3 3 3 3 3 3 3 1 1 4 5 5 3 5 5 2 5 3 4 3 2 5 5 4 5 5 2 5 3 4 3 2 2 1 3 5 2 5 3 3 3 5 2 2 5 5 2 2 3 5 1 5 2 3 3 3 5 5 5 5 3 5 3 5 5 2 5 2 5 2 5 5 1 1 1 1 1 4 4 3 3 5 5 4 3 1 3 4 1 4 4 4 4 4 4 2 4 1 4 4 2 4 5 5 3 3 3 3 3 3 2 3 4 5 5 5 5 0 0 0 3 3 4 1 4 4 4 4 4 3 4 4 3 2 4 4 4 3 3 1 3 3 4 3 4 5 5 5 5 3 3 3 5 5 0 0 4 4 5 1 1 0 3 3 3 3 3 2 2 2 4 2 4 4 4 4 5 1 4 1 5 3 5 3 3 3 3 5 1 0 4 4 1 4 4 3 4 4 4 1 4 2 5 0 0 0 5 4 3 3 2 2 1 5 2 1 5 2 5 1 1 1 3 3 4 3 5 2 2 1 4 3 4 3 3 4 4 4 3 5 5 5 5 5 1 0 3 3 3 2 3 3 2 5 4 4 4 4 4 4 4 4 4 4 4 4 2 5

 2 2 3 3 1 1 5 5 2 1 1 5 4 3 3 5 3 3 3 3 0 4 1 3 1 2 5 4 5 1 3 4 5 5 5 5 5 5 5 3 3 5 5 2 1 2 1 1 1 5 3 1 4 2 1 4 5 3 5 1 5 5 3 4 5 1 1 2 4 3 1 4 4 5 5 2 0 2 1 4 1 1 4 4 4 4 4 4 4 5 3 5 5 5 1 3 5 5 5 3 3 5 5 5 5 5 5 5 3 5 5 3 1 5 5 3 5 5 1 5 5 5 1 5 3 1 2 5 4 1 4 3 3 3 1 3 1 4 4 4 2 4 4 1 3 4 5 4 0 5 1 1 1 2 2 3 3 2 3 1 1 4 2 5 3 2 2 4 5 4 4 2 3 2 2 2 2 2 5 2 1 3 3 3 2 5 3 3 5 2 2 5 2 3 3 2 1 2 2 2 1 3 1 2 2 3 3 3 1 1 1 2 2 4 2 5 5 5 1 5 3 5 5 3 1 1 5 5 1 5 1 5 3 1 5 3 5 1 5 5 1 5 5 5 1 5 1 1 1 4 4 5 4 4 4 4 4 4 5 5 1 4 2 5 0 3 3 3 3 3 3 1 4 2 2 3 3 3 3 3 3 5 3 4 1 2 1 5 1 3 5 2 1 3 2 3 3 1 3 1 2 1 5 3 1 5 3 5 1 1 1 2 5 5 1 1 1 2 1 1 1 4 5 5 5 3 5 3 1 1 2 1 2 1 2 1 1 1 2 1 2 1 2 1 1 1 2 2 2 1 1 2 1 1 2 2 1 2 2 2 1 1 2 1 5 1 1 3 3 3 3 3 3 3 1 3 3 3 3 1 1 4 3 4 3 1 3 5 2 5 5 5 5 5 5 4 5 1 4 1 3 3 3 3 3 3 5 3 5 3 3 5 5 5 1 3 0 5 5 5 2 4 0 0 0 1 1 3 3 4 4 4 1 4 4 3 4 5 1 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 1 1 1 1 5 1 1 5 5 1 1 1 1 1 1 1 1 1 1 1 1 5 5 1 1 1 1 1 5 1 4 5 5 2 3 3 5 4 3 3 4 1
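
The chain of `if` comparisons in the loop above is a hand-written argmax: it keeps the index of the largest of the six outputs, and on a tie it keeps the earliest index. The sketch below reproduces that logic in plain Python on made-up scores and checks it against a one-line equivalent. The function names and the sample values are illustrative and do not come from the notebook.

```python
def chained_argmax(scores):
    """Index of the largest score, written like the notebook's chain of comparisons."""
    result = 0
    for i in range(1, len(scores)):
        # strict '<' means the earlier index wins when two scores are equal
        if scores[result] < scores[i]:
            result = i
    return result


def builtin_argmax(scores):
    """Same index via max(); max() also returns the first maximal item on ties."""
    return max(range(len(scores)), key=lambda i: scores[i])


# made-up output vectors for a 6-class network (assumed values, for illustration only)
examples = [
    [0.1, 0.7, 0.2, -0.3, 0.0, 0.5],
    [-1.2, -0.4, -0.4, -2.0, -0.9, -3.1],  # tie between classes 1 and 2
]
for scores in examples:
    assert chained_argmax(scores) == builtin_argmax(scores)
    print(chained_argmax(scores))
```

In PyTorch, `torch.argmax(outputs)` should return the same index, so the five comparisons could probably be replaced by `result = int(torch.argmax(outputs))`.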

In [36]:
# load the validation data

ValidData=pd.read_excel('valid-clean-Reputation.xlsx' )
# drop the last column (the label) and scale the remaining features by a constant 200
ValidData=ValidData.iloc[:,:-1].astype(float)
ValidData=ValidData/200

# append the transformer's saved output for the validation set as extra features
SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/Saves/"
TF_Output=pd.read_csv( SavesDirectory+'evalOut.tsv', sep='\t')

ValidData=pd.concat([ValidData,TF_Output], axis=1)


ValidData=torch.tensor(ValidData.values)
ValidData


tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.5371,  0.3501,  0.4424],
        [ 0.0050,  0.0000,  0.0100,  ...,  0.3889, -0.3740, -0.5796],
        [ 0.0000,  0.0000,  0.0050,  ...,  0.4365, -0.3293, -0.5288],
        ...,
        [ 0.0000,  0.0000,  0.0150,  ...,  0.5327,  0.9238,  0.9131],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.7422,  1.2305,  0.8237],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.5117,  1.0781,  0.8599]],
       dtype=torch.float64)

In [37]:
labels=pd.read_excel('valid-clean-Reputation.xlsx' )

labels=labels.iloc[:,-1] 
labelsOneHot=pd.get_dummies(labels)
labelsOneHot

Unnamed: 0,0,1,2,3,4,5
0,0,0,0,1,0,0
1,0,0,1,0,0,0
2,0,0,0,1,0,0
3,0,0,0,0,1,0
4,0,0,0,0,0,1
...,...,...,...,...,...,...
1267,0,0,0,0,0,1
1268,0,0,0,1,0,0
1269,0,0,1,0,0,0
1270,0,0,0,0,1,0


In [38]:
ValidLables =torch.tensor(labelsOneHot.values)
ValidLables

tensor([[0, 0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 1, 0, 0],
        ...,
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 1, 0, 0]], dtype=torch.uint8)

In [39]:
correct = 0
total = 0

Y=[]  #target
Pred=[]  #predicted

with torch.no_grad():
    for row in range(len(ValidData)):
        outputs = net(ValidData[row,:].float())
        total+=1
        # pick the class with the largest output (a hand-written argmax)
        result=0
        if outputs[0]<outputs[1]:result=1
        if outputs[result]<outputs[2]:result=2
        if outputs[result]<outputs[3]:result=3
        if outputs[result]<outputs[4]:result=4
        if outputs[result]<outputs[5]:result=5
        
        # the prediction is correct when the one-hot label has a 1 at that index
        if labelsOneHot.iloc[row,result]==1: correct+=1
        
        Y.append(labels.iloc[row])
        Pred.append(result)
        
        print(result, end=' ')
        
    
print('Correct:', correct, 'out of:', total )
print('Accuracy of the network : ',( 100 * correct / total))

3 2 3 3 5 5 5 4 5 1 3 5 1 1 2 4 2 2 3 3 2 0 2 2 4 2 3 1 3 5 4 2 3 2 1 3 3 2 2 3 5 4 4 1 3 3 1 4 4 3 4 4 4 1 4 3 1 3 1 1 3 3 4 4 4 3 4 4 3 4 3 3 4 4 4 4 4 1 1 4 4 4 4 3 3 3 1 4 3 1 3 3 3 4 4 1 4 4 1 4 3 4 4 4 4 3 3 3 1 5 1 1 2 4 4 4 2 1 4 1 1 3 3 1 1 5 4 0 1 5 5 5 5 3 3 5 3 2 3 4 3 1 1 1 3 3 2 5 5 0 0 1 0 0 0 1 0 0 0 1 0 4 4 5 1 5 2 1 3 1 5 1 4 4 5 5 1 1 4 5 5 5 3 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 2 2 5 3 4 1 4 1 2 2 3 3 3 5 3 3 3 3 3 5 0 1 1 2 3 1 3 5 3 5 1 1 3 5 3 3 5 1 3 5 5 3 3 4 1 4 4 4 1 4 3 4 4 3 3 1 1 4 5 3 2 1 4 4 4 1 2 2 2 3 2 2 4 5 1 1 2 0 4 2 4 1 4 1 1 2 2 3 2 5 2 0 2 1 3 1 2 2 3 3 2 1 2 4 3 4 4 5 1 3 2 1 1 3 1 1 5 5 3 1 2 2 2 3 2 3 3 3 5 5 5 4 3 2 4 4 2 5 3 1 1 0 1 1 1 1 4 1 1 1 3 1 2 1 4 1 1 2 1 4 1 2 4 4 2 1 3 1 1 4 1 1 2 4 1 1 1 4 4 1 2 0 4 5 1 1 5 3 4 5 3 2 2 2 4 2 2 5 5 1 2 5 1 1 3 0 4 2 5 3 3 3 3 3 1 3 1 1 3 1 3 3 5 1 2 2 1 2 2 2 5 3 5 1 2 5 1 3 0 4 3 2 3 2 5 1 2 3 3 4 5 5 1 1 1 3 2 2 2 2 1 5 1 4 3 5 5 5 1 5 1 1 0 2 5 1 2 1 1 1 4 3 2 5 3 2 0 1 5 4 5 5 1 

In [40]:
# load the test data

TestData=pd.read_excel('test-clean-Reputation.xlsx' )
# drop the last column (the label) and scale the remaining features by a constant 200
TestData=TestData.iloc[:,:-1].astype(float)
TestData=TestData/200

# append the transformer's saved output for the test set as extra features
SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/Saves/"
TF_Output=pd.read_csv( SavesDirectory+'testOut.tsv', sep='\t')

TestData=pd.concat([TestData,TF_Output], axis=1)


TestData=torch.tensor(TestData.values)
TestData

tensor([[ 0.0000,  0.0050,  0.0100,  ..., -0.2527, -0.3252, -0.2666],
        [ 0.0000,  0.0050,  0.0100,  ...,  0.1863, -0.6479, -0.5098],
        [ 0.0000,  0.0050,  0.0100,  ...,  0.1630, -0.1420, -0.4709],
        ...,
        [ 0.0000,  0.0000,  0.0050,  ...,  0.8877,  0.2993, -0.2554],
        [ 0.0050,  0.0000,  0.0000,  ...,  0.6519,  0.7334,  0.5640],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.7949,  0.3220, -0.1009]],
       dtype=torch.float64)

In [41]:
labels=pd.read_excel('test-clean-Reputation.xlsx' )

labels=labels.iloc[:,-1] 
labelsOneHot=pd.get_dummies(labels)
labelsOneHot

TestLables =torch.tensor(labelsOneHot.values)
TestLables

tensor([[0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 0],
        ...,
        [0, 0, 1, 0, 0, 0],
        [1, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0]], dtype=torch.uint8)

In [46]:
correct = 0
total = 0


Y=[]  #target
Pred=[]  #predicted

with torch.no_grad():
    for row in range(len(TestData)):
        outputs = net(TestData[row,:].float())
        result=0
        total+=1
        if outputs[0]<outputs[1]:result=1
        if outputs[result]<outputs[2]:result=2
        if outputs[result]<outputs[3]:result=3
        if outputs[result]<outputs[4]:result=4
        if outputs[result]<outputs[5]:result=5
        
        if labelsOneHot.iloc[row,result]==1: correct+=1
        
        Y.append(labels.iloc[row])
        Pred.append(result)

        
        
        print(result, end=' ')
        
       
print('Correct:', correct, 'out of:', total )
print('Accuracy of the network : ',( 100 * correct / total))

1 3 2 2 4 5 1 5 2 3 5 5 5 3 3 1 1 1 1 2 1 1 2 2 2 4 3 3 0 2 3 2 1 2 1 2 2 3 3 2 3 3 1 0 3 2 1 1 3 1 3 4 1 5 0 5 3 4 1 4 3 4 4 3 3 4 4 4 5 3 4 3 3 3 1 3 1 4 3 4 3 3 3 1 1 3 3 1 4 3 4 3 4 4 4 4 3 4 4 4 3 4 1 4 4 4 1 4 4 3 4 4 4 4 4 2 4 3 3 4 3 3 1 5 1 4 4 4 4 2 2 4 4 4 4 3 2 2 2 3 0 2 4 3 2 1 3 4 4 4 4 5 5 2 5 5 3 2 0 1 0 0 0 2 2 1 0 3 4 3 3 5 3 5 5 1 5 2 5 3 5 3 0 3 1 0 5 2 1 1 1 1 0 4 4 4 4 5 5 0 0 0 0 0 0 0 0 1 0 0 1 0 0 2 5 4 0 3 3 5 5 3 5 3 5 3 5 1 4 1 3 3 1 5 3 3 3 3 3 3 3 3 3 3 3 3 4 3 2 3 3 1 5 0 0 4 3 3 5 5 1 5 1 4 4 3 4 2 0 5 4 2 2 2 0 4 1 5 5 2 2 4 3 1 4 1 1 4 2 5 5 3 2 2 5 1 5 3 2 4 0 5 5 2 1 5 3 1 4 3 3 0 2 2 2 2 2 2 2 5 2 3 3 3 2 2 4 3 5 5 5 1 1 4 2 3 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1 1 2 2 1 2 4 1 1 1 1 1 4 4 4 4 1 4 1 2 1 2 1 5 2 0 2 4 1 1 3 5 3 2 2 4 2 2 2 1 1 3 3 3 3 3 1 2 4 1 3 3 1 3 2 2 1 2 1 3 1 1 3 5 3 4 2 1 5 1 2 5 1 1 2 3 2 4 4 3 4 4 5 1 5 5 4 2 3 5 2 5 5 5 3 1 2 3 3 4 5 5 3 1 1 5 3 1 1 2 1 4 5 3 1 1 3 5 5 2 2 5 5 2 5 5 2 3 3 5 3 3 1 5 1 5 1 1 1 4 4 

In [47]:
from sklearn import metrics 
print(metrics.confusion_matrix(Y,Pred))

[[ 41  27  10   9   2   2]
 [  7 135  29  23  21  18]
 [  6  40  88  50  13  24]
 [  0  40  22 135  33  25]
 [  0  23  27  56 109  34]
 [  3  23  13  32  30 105]]
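
The accuracy and the per-class scores can be read directly off this matrix, because `metrics.confusion_matrix(Y, Pred)` puts the true labels on the rows and the predictions on the columns. As a minimal sketch in plain Python (the variable names are illustrative; the numbers are copied from the output above):

```python
# confusion matrix copied from the output above: rows = true label, columns = prediction
cm = [
    [41, 27, 10, 9, 2, 2],
    [7, 135, 29, 23, 21, 18],
    [6, 40, 88, 50, 13, 24],
    [0, 40, 22, 135, 33, 25],
    [0, 23, 27, 56, 109, 34],
    [3, 23, 13, 32, 30, 105],
]
n = len(cm)

total = sum(sum(row) for row in cm)          # number of test statements
correct = sum(cm[i][i] for i in range(n))    # the diagonal holds the correct predictions
accuracy = correct / total

support = [sum(row) for row in cm]                                  # true count per class
predicted = [sum(cm[r][c] for r in range(n)) for c in range(n)]     # predicted count per class
recall = [cm[i][i] / support[i] for i in range(n)]
precision = [cm[i][i] / predicted[i] for i in range(n)]

print(correct, total, round(accuracy, 4))
for i in range(n):
    print(i, support[i], round(precision[i], 2), round(recall[i], 2))
```

The diagonal sums to 613 correct predictions out of 1255, an accuracy of about 48.8%.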


In [48]:
# class names in label order 0..5
target_names = ['Pants', 'False', 'Barely-True', 'Half-True', 'Mostly-True', 'True']

print(metrics.classification_report(Y, Pred, target_names=target_names))

              precision    recall  f1-score   support

       Pants       0.72      0.45      0.55        91
       False       0.47      0.58      0.52       233
 Barely-True       0.47      0.40      0.43       221
   Half-True       0.44      0.53      0.48       255
 Mostly-True       0.52      0.44      0.48       249
        True       0.50      0.51      0.51       206

    accuracy                           0.49      1255
   macro avg       0.52      0.48      0.49      1255
weighted avg       0.50      0.49      0.49      1255
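
The `macro avg` row is the plain mean of the six per-class scores, while `weighted avg` weights each class by its support, so the larger classes count for more. The sketch below recomputes both rows in plain Python from the confusion matrix printed earlier; the helper names `macro` and `weighted` are illustrative and are not part of scikit-learn.

```python
# confusion matrix from the test run: rows = true label, columns = prediction
cm = [
    [41, 27, 10, 9, 2, 2],
    [7, 135, 29, 23, 21, 18],
    [6, 40, 88, 50, 13, 24],
    [0, 40, 22, 135, 33, 25],
    [0, 23, 27, 56, 109, 34],
    [3, 23, 13, 32, 30, 105],
]
n = len(cm)
support = [sum(row) for row in cm]
predicted = [sum(cm[r][c] for r in range(n)) for c in range(n)]

precision = [cm[i][i] / predicted[i] for i in range(n)]
recall = [cm[i][i] / support[i] for i in range(n)]
# F1 = 2*TP / (2*TP + FP + FN), and (TP + FP) + (TP + FN) = predicted + support
f1 = [2 * cm[i][i] / (support[i] + predicted[i]) for i in range(n)]


def macro(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted(values):
    """Mean over classes, weighted by each class's support."""
    return sum(v * s for v, s in zip(values, support)) / sum(support)


for label, values in (("precision", precision), ("recall", recall), ("f1", f1)):
    print(label, round(macro(values), 2), round(weighted(values), 2))
```

The support-weighted recall is the same number as the overall accuracy, which is why both show 0.49 in the report.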



In [29]:
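# save the FCNN model's weights (its state_dict)

stage='NNetwork6WayClass/'
SavesDirectory='./TunedModels/'+model_class+'/'+model_version+"/"+stage
# PATH is expected to be defined already; the assignment below is left commented out
#PATH = SavesDirectory+'4885.pth'

torch.save(net.state_dict(), PATH)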

# more on saving pytorch networks: https://pytorch.org/docs/stable/notes/serialization.html