# Compare predictions
The goal is to extract the predictions that BERT-base gets right on the dev set and pass only this subset to NetBERT, to check whether NetBERT performs at least as well as BERT-base on it. Then, extract the predictions BERT-base gets wrong and see where NetBERT improves: on which specific cases, on which classes in particular, and on which types of sentences (e.g., badly written or unclear ones).
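Once both models' predictions are aligned row by row, the comparison comes down to two counts. On the right-predictions subset BERT-base scores 100% by construction, so "at least as well" means NetBERT makes no regressions there. On the wrong-predictions subset BERT-base scores 0%, so every NetBERT hit is a fix. Below is a minimal sketch. It assumes a DataFrame with the gold label in `Class` and two prediction columns, `bert_pred` and `netbert_pred`. These column names are hypothetical and not taken from the notebook.

```python
import pandas as pd


def compare_models(df: pd.DataFrame) -> dict:
    """Count where NetBERT regresses on or fixes BERT-base predictions.

    Assumes hypothetical columns: 'Class' (gold label), 'bert_pred', 'netbert_pred'.
    """
    bert_ok = df["bert_pred"] == df["Class"]
    netbert_ok = df["netbert_pred"] == df["Class"]
    return {
        # BERT-base right, NetBERT wrong: NetBERT does worse on these rows.
        "regressions": int((bert_ok & ~netbert_ok).sum()),
        # BERT-base wrong, NetBERT right: NetBERT improves on these rows.
        "fixes": int((~bert_ok & netbert_ok).sum()),
        # NetBERT accuracy restricted to each BERT-base subset.
        "netbert_acc_on_bert_right": float(netbert_ok[bert_ok].mean()),
        "netbert_acc_on_bert_wrong": float(netbert_ok[~bert_ok].mean()),
    }


# Illustrative toy rows, not real dev-set data.
toy = pd.DataFrame({
    "Class": ["Data Sheets", "Release Notes", "End User Guides", "Data Sheets"],
    "bert_pred": ["Data Sheets", "Release Notes", "Data Sheets", "Release Notes"],
    "netbert_pred": ["Data Sheets", "Data Sheets", "End User Guides", "Release Notes"],
})
result = compare_models(toy)
print(result)
```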

In [16]:
import os
import json

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

## BERT-base predictions

In [17]:
dirpath = '/raid/antoloui/Master-thesis/Code/Extrinsic_evaluation/Classification/output/bert_base_cased/'

with open(os.path.join(dirpath, 'map_classes.json')) as f:
    class_mappings = json.load(f)
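`json.load` always returns object keys as strings, which is why the cells below look classes up with `str(row.Class_id)`. If integer lookups are more convenient, the mapping can be re-keyed once. Here is a minimal sketch. The two-entry mapping is made up for illustration and is not the content of `map_classes.json`.

```python
import json

# Hypothetical stand-in for map_classes.json (the real file's content is not shown).
raw = '{"0": "Data Sheets", "1": "Release Notes"}'
class_mappings = json.loads(raw)

# JSON object keys are always strings, so an integer id does not match directly.
assert 0 not in class_mappings
assert class_mappings[str(0)] == "Data Sheets"

# Re-key once with int keys to avoid repeated str() conversions later.
class_mappings_int = {int(k): v for k, v in class_mappings.items()}
print(class_mappings_int[1])
```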

### Right predictions

In [18]:
# Load right predictions from BERT-base
df_bert_right = pd.read_csv(os.path.join(dirpath, 'preds_right.csv'), index_col=0)

# Map the numeric class ids to class names (JSON keys are strings).
df_bert_right['Class'] = df_bert_right.apply(lambda row: class_mappings[str(row.Class_id)], axis=1)
df_bert_right['Prediction'] = df_bert_right.apply(lambda row: class_mappings[str(row.Prediction_id)], axis=1)

# Keep only the sentence and its gold class, then save for evaluation with NetBERT.
to_drop = ['Class_id', 'Prediction_id', 'Prediction']
df_bert_right.drop(to_drop, axis=1, inplace=True)
df_bert_right.to_csv(os.path.join(dirpath, 'eval_right_preds.csv'))
df_bert_right

|     | Sentence | Class |
|-----|----------|-------|
| 0 | Steps in using cisco wsa | End User Guides |
| 1 | compatible version between ASA 5520 and ASDM | Install & Upgrade Guides |
| 2 | 4500-X netflow multiple exporter | Configuration (Guides, Examples & TechNotes) |
| 3 | CISCO WSA AsyncOS API | End User Guides |
| 4 | blocked messages ESA | End User Guides |
| ... | ... | ... |
| 783 | DNAC App Policy | End User Guides |
| 784 | iosxe release schedule | Data Sheets |
| 785 | NXOS train tracker | Release Notes |
| 786 | nxos n5k HA support | Release Notes |

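The row-wise `apply` above calls a Python lambda once per row. `Series.map` with a dict performs the same lookup column-wise and is the more idiomatic pandas form. This sketch assumes `Class_id` and `Prediction_id` hold integer ids and that `class_mappings` has string keys, as loaded from JSON. The toy rows are illustrative only. One behavioral difference: with `map`, an id missing from the mapping becomes `NaN`, whereas the dict lookup inside `apply` raises `KeyError`.

```python
import pandas as pd


def add_class_names(df: pd.DataFrame, class_mappings: dict) -> pd.DataFrame:
    """Return a copy of df with 'Class' and 'Prediction' name columns added."""
    out = df.copy()
    # Cast ids to str because JSON object keys are strings.
    out["Class"] = out["Class_id"].astype(str).map(class_mappings)
    out["Prediction"] = out["Prediction_id"].astype(str).map(class_mappings)
    return out


# Hypothetical mapping and rows for illustration.
mappings = {"0": "Data Sheets", "1": "Release Notes"}
toy = pd.DataFrame({
    "Sentence": ["example query a", "example query b"],
    "Class_id": [0, 1],
    "Prediction_id": [0, 0],
})
named = add_class_names(toy, mappings)
# Keep only what NetBERT needs: the sentence and its gold class.
print(named[["Sentence", "Class"]])
```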

### Wrong predictions

In [19]:
# Load wrong predictions from BERT-base
df_bert_wrong = pd.read_csv(os.path.join(dirpath, 'preds_wrong.csv'), index_col=0)

# Map the numeric class ids to class names (JSON keys are strings).
df_bert_wrong['Class'] = df_bert_wrong.apply(lambda row: class_mappings[str(row.Class_id)], axis=1)
df_bert_wrong['Prediction'] = df_bert_wrong.apply(lambda row: class_mappings[str(row.Prediction_id)], axis=1)

# Keep only the sentence and its gold class, then save for evaluation with NetBERT.
to_drop = ['Class_id', 'Prediction_id', 'Prediction']
df_bert_wrong.drop(to_drop, axis=1, inplace=True)
df_bert_wrong.to_csv(os.path.join(dirpath, 'eval_wrong_preds.csv'))
df_bert_wrong

|     | Sentence | Class |
|-----|----------|-------|
| 0 | nexus 5000 copp | Release Notes |
| 1 | CUCM self care portal what is a valid pin | Configuration (Guides, Examples & TechNotes) |
| 2 | dx80 current firmware | Data Sheets |
| 3 | IOS XE 16.x 3.x | Install & Upgrade Guides |
| 4 | Connector and Cable Specifications: 10/100/100... | Install & Upgrade Guides |
| ... | ... | ... |
| 150 | catalyst 9500 license activation | Data Sheets |
| 151 | Introduction to Cisco Prime Collaboration Prov... | Install & Upgrade Guides |
| 152 | Catalyst 2960-X 48 GigE PoE 740W 4 x 1G SFP LA... | Data Sheets |
| 153 | UCCX 12.0(1) image | End User Guides |

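To see which classes BERT-base confuses, the `Prediction` column is useful, but the cell above drops it before saving. This minimal sketch counts (gold class, predicted class) pairs. It assumes it runs on the wrong predictions *before* the `drop`, while both `Class` and `Prediction` are still present. The toy rows are illustrative only.

```python
import pandas as pd


def top_confusions(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Count (Class, Prediction) pairs and return the n most frequent."""
    counts = df.groupby(["Class", "Prediction"]).size()
    return counts.sort_values(ascending=False).head(n)


# Illustrative toy rows, not real dev-set data.
toy = pd.DataFrame({
    "Class": ["Data Sheets", "Data Sheets", "Release Notes"],
    "Prediction": ["Release Notes", "Release Notes", "End User Guides"],
})
confusions = top_confusions(toy)
print(confusions)
```

For the full class-by-class matrix instead of a ranked list, `pd.crosstab(df["Class"], df["Prediction"])` gives the same counts as a table.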

### Full predictions

In [20]:
# Stack right and wrong predictions back into the full dev set.
df_bert = pd.concat([df_bert_right, df_bert_wrong], ignore_index=True)
df_bert.to_csv(os.path.join(dirpath, 'eval_preds.csv'))
df_bert

|     | Sentence | Class |
|-----|----------|-------|
| 0 | Steps in using cisco wsa | End User Guides |
| 1 | compatible version between ASA 5520 and ASDM | Install & Upgrade Guides |
| 2 | 4500-X netflow multiple exporter | Configuration (Guides, Examples & TechNotes) |
| 3 | CISCO WSA AsyncOS API | End User Guides |
| 4 | blocked messages ESA | End User Guides |
| ... | ... | ... |
| 938 | catalyst 9500 license activation | Data Sheets |
| 939 | Introduction to Cisco Prime Collaboration Prov... | Install & Upgrade Guides |
| 940 | Catalyst 2960-X 48 GigE PoE 740W 4 x 1G SFP LA... | Data Sheets |
| 941 | UCCX 12.0(1) image | End User Guides |

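The concatenated frame no longer records which rows BERT-base got right. If NetBERT is later evaluated on `eval_preds.csv` as a whole, a flag column makes it possible to split its results back into the two subsets. Here is a sketch, using a hypothetical column name `bert_correct` and toy rows.

```python
import pandas as pd


def concat_with_flag(right: pd.DataFrame, wrong: pd.DataFrame) -> pd.DataFrame:
    """Stack both subsets and mark each row with BERT-base's correctness."""
    return pd.concat(
        [right.assign(bert_correct=True), wrong.assign(bert_correct=False)],
        ignore_index=True,
    )


# Illustrative toy subsets.
right = pd.DataFrame({"Sentence": ["query a", "query b"], "Class": ["Data Sheets", "Release Notes"]})
wrong = pd.DataFrame({"Sentence": ["query c"], "Class": ["End User Guides"]})
full = concat_with_flag(right, wrong)
print(full)
```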

## NetBERT predictions

### Evaluate on BERT-base right predictions only
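BERT-base is correct on every row of this subset, so any NetBERT error here is a regression. Grouping by class shows where such regressions concentrate. This sketch assumes NetBERT's predicted class names are stored in a hypothetical `netbert_pred` column next to the gold `Class`. The toy rows are illustrative only.

```python
import pandas as pd


def per_class_accuracy(df: pd.DataFrame, pred_col: str = "netbert_pred") -> pd.Series:
    """NetBERT accuracy per gold class, lowest first."""
    correct = df[pred_col] == df["Class"]
    return correct.groupby(df["Class"]).mean().sort_values()


# Illustrative toy rows, not real NetBERT output.
toy = pd.DataFrame({
    "Class": ["Data Sheets", "Data Sheets", "Release Notes", "Release Notes"],
    "netbert_pred": ["Data Sheets", "Release Notes", "Release Notes", "Release Notes"],
})
acc = per_class_accuracy(toy)
print(acc)
```

The same function applied to the wrong-predictions subset gives NetBERT's per-class fix rate, since BERT-base accuracy there is 0% in every class.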