**Preprocessing step for predicted data**
* input:  file containing source, target and prediction, one trio per line (columns separated by tabs)
* output: file containing a CoNLL-friendly table, one word per line (columns separated by spaces)

Original work for /home/getalp/sfeirj/scripts/preprocess_for_conlleval.py
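
As a minimal sketch of the transformation described above, the snippet below turns one tab-separated line into one space-separated line per word. The function name `to_conll_lines` and the sample sentence are invented for illustration, and the sketch assumes exactly three tab-separated fields (the file read further down also carries a sentence index).

```python
def to_conll_lines(line):
    """Turn one 'source<TAB>target<TAB>prediction' line into CoNLL-style lines."""
    source, target, prediction = line.rstrip("\n").split("\t")
    words = source.split(" ")
    gold = target.split(" ")
    pred = prediction.split(" ")
    # zip stops at the shortest sequence; length mismatches are not handled here.
    return [f"{w} {g} {p}" for w, g, p in zip(words, gold, pred)]


# Invented example line: four words, four gold tags, four predicted tags.
example = "He told the Guardian\tB-P O B-N I-N\tB-P O O B-N"
for conll_line in to_conll_lines(example):
    print(conll_line)
```

Each printed line holds a word, its gold tag and its predicted tag, which is the column order used in the rest of this notebook.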

In [101]:
import pandas as pd
from tqdm import tqdm

## Build predictions dataframe

In [67]:
path = "/run/user/71447/gvfs/sftp:host=decore0.imag.fr,user=sfeirj/home/getalp/sfeirj/data/test_predictions"
predictions = pd.read_csv(path, delimiter="\t", names=["sentence_idx","source","target","prediction"])
predictions.describe()

  


Unnamed: 0,sentence_idx
count,9479.0
mean,4740.0
std,2736.495935
min,1.0
25%,2370.5
50%,4740.0
75%,7109.5
max,9479.0


In [68]:
len(predictions[predictions["target"] == predictions["prediction"]])

3871

## Observe targets and predictions with different lengths

In [95]:
df_diff = pd.DataFrame(columns=["sentence_idx","source","target","prediction"])
cter = 0
for idx,row in predictions.iterrows():
    src_len = len(row["source"].split(" "))
    pred_len = len(row["prediction"].split(" "))
    if (src_len != pred_len):
        df_diff.loc[cter] = row
        print((src_len, pred_len))
        cter += 1
print(cter)

(37, 36)
(64, 63)
(63, 62)
(63, 62)
(61, 54)
(61, 55)
(61, 54)
(61, 62)
(61, 55)
(60, 54)
(60, 47)
(58, 57)
(53, 52)
(98, 73)
(98, 73)
(94, 200)
(89, 164)
(87, 62)
(86, 54)
(82, 64)
(80, 61)
(80, 61)
(79, 78)
(79, 61)
(76, 51)
(75, 50)
(75, 50)
(74, 50)
(74, 49)
(74, 49)
(73, 48)
(72, 47)
(71, 46)
(71, 72)
(70, 45)
(70, 46)
(70, 45)
(70, 45)
(70, 45)
(70, 45)
(69, 44)
(68, 43)
(66, 41)
(65, 40)
(151, 58)
(115, 72)
(108, 83)
(108, 83)
(104, 105)
(100, 75)
50


In [93]:
df_diff

Unnamed: 0,sentence_idx,source,target,prediction
0,8408,When the bowels of the Earth became too tight ...,O O O O O O O O O O O O O B-N O O O O O O O B-...,O B-N I-N I-N I-N I-N O O O O O O O O O O B-P ...
1,9226,I think a number of the reforms that have come...,B-P O O O O B-N I-N I-N I-N I-N I-N I-N I-N I-...,B-P O O O O O O O O O O B-N I-N I-N I-N I-N I-...
2,9230,In fact his visit has somewhat been ignored an...,O O B-N I-N O O O O O O O O O O O O O B-N O O ...,O O B-N I-N O O O O O O O O O O O O O B-N O O ...
3,9232,"A month ago , Hertz , of Park Ridge , N.J. , s...",O O O O B-N I-N I-N I-N I-N I-N I-N I-N O O B-...,O O O O O O O O O O O O O O B-P O O B-N O O O ...
4,9243,It is because that will not only make the soci...,O O O B-N O O O O O O O O O O O B-P O O O O O ...,B-P O O O O O O O O O O O O O O B-P O O O O O ...
5,9244,Officials in the Iraqi Department of Defense s...,O O O O O O O O O O O O O O O O O O O B-N I-N ...,O O O O O O O O O O O O O O O O O O O O O O O ...
6,9245,`` There is an underlying concern on the part ...,O O O O O O O O O O O O O O O O O O O O B-N I-...,O O O O O O O B-N I-N I-N I-N I-N I-N I-N I-N ...
7,9246,what we have speculated is that it might be a ...,O O O O O O O O O B-N I-N I-N I-N I-N I-N I-N ...,O B-P O O O O B-P O O O O O O O O O O O B-N I-...
8,9251,He also told the British Guardian that he took...,B-P O O B-N I-N I-N O B-P O B-N I-N I-N O B-N ...,B-P O O B-N I-N I-N I-N I-N I-N I-N I-N I-N I-...
9,9252,8 13/16 % to 8 11/16 % one month ; 8 13/16 % t...,O O O O O O O O O O O O O O O O O O O O O O O ...,O O O O O O O O O O O O O O O O O O O O O O O ...
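
The same length check can probably be done without an explicit loop. The sketch below is one possible way, run on a small invented dataframe (`toy`) that stands in for `predictions`; the column names follow the notebook, everything else is an assumption.

```python
import pandas as pd

# Toy stand-in for the `predictions` dataframe (values are invented).
toy = pd.DataFrame({
    "sentence_idx": [1, 2, 3],
    "source": ["a b c", "d e", "f g h i"],
    "target": ["O O O", "B-N O", "O O B-P O"],
    "prediction": ["O O O", "B-N", "O O B-P O O"],
})

# Count space-separated tokens per cell with the pandas string accessor.
src_len = toy["source"].str.split(" ").str.len()
pred_len = toy["prediction"].str.split(" ").str.len()

# Boolean mask of sentences whose prediction length differs from the source length.
mismatch = src_len != pred_len
toy_diff = toy[mismatch]
print(len(toy_diff))
```

Indexing with the boolean mask keeps the original row index, so the mismatching sentences can still be traced back to their position in the full dataframe.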


## What to do with different lengths?

### Joint evaluation

#### Build dataframe

In [104]:
df_conll_format = pd.DataFrame(columns=["sentence_idx", "word", "target", "prediction"])
cter = 0

for idx,row in tqdm(predictions.iterrows()):
    splitted_source = row["source"].split(" ")
    splitted_target = row["target"].split(" ")
    splitted_prediction = row["prediction"].split(" ")

    # If prediction and target have the same length
    if len(splitted_target) == len(splitted_prediction):
        for i in range(len(splitted_target)):
            df_conll_format.loc[cter] = [row["sentence_idx"], splitted_source[i],
                                         splitted_target[i], splitted_prediction[i]]
            cter += 1

    # If target is longer
    elif len(splitted_target) > len(splitted_prediction):
        for i in range(len(splitted_prediction)):
            df_conll_format.loc[cter] = [row["sentence_idx"], splitted_source[i],
                                         splitted_target[i], splitted_prediction[i]]
            cter += 1
        # Pad the missing predictions with "O" tags
        for i in range(len(splitted_prediction), len(splitted_target)):
            df_conll_format.loc[cter] = [row["sentence_idx"], splitted_source[i],
                                         splitted_target[i], "O"]
            cter += 1

    # If prediction is longer, drop the extra predicted tags
    else:
        for i in range(len(splitted_target)):
            df_conll_format.loc[cter] = [row["sentence_idx"], splitted_source[i],
                                         splitted_target[i], splitted_prediction[i]]
            cter += 1

    # Write an empty row after every sentence (one empty string per column)
    df_conll_format.loc[cter] = ["", "", "", ""]
    cter += 1


[tqdm progress output truncated: throughput fell from about 92 it/s at the start to about 2 s/it after 8,986 iterations, a little over one hour into the run]

In [119]:
df_conll_format.tail()

Unnamed: 0,sentence_idx,word,target,prediction
169576,9479,rights,O,O
169577,9479,of,O,O
169578,9479,goods,O,O
169579,9479,.,O,O
169580,9479,.,O,O
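
The progress output above suggests that the loop gets slower as `df_conll_format` grows, which is what one would expect when a dataframe is enlarged one row at a time with `.loc`. A possible alternative, sketched here under assumptions, is to collect plain Python lists and build the dataframe once at the end. The helper `build_conll_rows` and the `toy` data are hypothetical. The sketch assumes source and target always have the same number of tokens, and it applies the same rule as the cell above: pad a short prediction with "O" and truncate a long one.

```python
import pandas as pd


def build_conll_rows(sentence_idx, source, target, prediction):
    """Rows for one sentence: pad a short prediction with "O", truncate a long one.

    Assumes source and target have the same number of tokens.
    """
    words = source.split(" ")
    gold = target.split(" ")
    pred = prediction.split(" ")
    # Pad with "O" up to the target length, then cut anything beyond it.
    pred = (pred + ["O"] * len(gold))[:len(gold)]
    rows = [[sentence_idx, w, g, p] for w, g, p in zip(words, gold, pred)]
    rows.append(["", "", "", ""])  # empty separator row after the sentence
    return rows


# Toy stand-in for `predictions` (invented values).
toy = pd.DataFrame({
    "sentence_idx": [1, 2],
    "source": ["a b c", "d e"],
    "target": ["O B-N O", "B-P O"],
    "prediction": ["O B-N", "B-P O O"],
})

all_rows = []
for row in toy.itertuples(index=False):
    all_rows.extend(build_conll_rows(row.sentence_idx, row.source, row.target, row.prediction))

# Build the dataframe once instead of growing it row by row.
toy_conll = pd.DataFrame(all_rows, columns=["sentence_idx", "word", "target", "prediction"])
print(toy_conll)
```

Appending to a list is cheap, so the total cost should stay roughly proportional to the number of words; this has not been timed on the real data.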


#### Save dataframe

In [112]:
df_conll_format.to_csv("../../data/test_predictions_for_eval", sep=' ', index=False, header=False)
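
One caveat, stated as an assumption rather than something checked on this data: `to_csv` applies CSV quoting rules by default, so a token containing a double quote may be written with extra quotes, and the empty separator row may come out as a line of bare delimiters rather than a truly empty line. A minimal sketch of writing the file by hand instead is shown below; `write_conll`, the sample rows and the temporary path are all hypothetical.

```python
import os
import tempfile

# Invented rows: (sentence_idx, word, target, prediction); None marks a sentence boundary.
rows = [
    (1, "He", "B-P", "B-P"),
    (1, "said", "O", "O"),
    None,
    (2, '"', "O", "O"),
]


def write_conll(rows, path):
    """Write one space-separated line per row and an empty line per boundary."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            if row is None:
                handle.write("\n")
            else:
                handle.write(" ".join(str(field) for field in row) + "\n")


out_path = os.path.join(tempfile.mkdtemp(), "toy_predictions_for_eval")
write_conll(rows, out_path)

with open(out_path, encoding="utf-8") as handle:
    content = handle.read()
print(content)
```

Writing the lines directly keeps every token exactly as it appears in the source sentence, at the cost of bypassing pandas for the final step.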

### Separate evaluation

#### Build dataframe

In [118]:
df_good_length = pd.DataFrame(columns=["sentence_idx", "word", "target", "prediction"])
df_bad_length  = pd.DataFrame(columns=["sentence_idx", "word", "target", "prediction"])
good_cter = 0
bad_cter  = 0

for idx,row in tqdm(predictions.iterrows()):
    splitted_source = row["source"].split(" ")
    splitted_target = row["target"].split(" ")
    splitted_prediction = row["prediction"].split(" ")

    # If prediction and target have the same length
    if len(splitted_target) == len(splitted_prediction):
        for i in range(len(splitted_target)):
            df_good_length.loc[good_cter] = [row["sentence_idx"], splitted_source[i],
                                             splitted_target[i], splitted_prediction[i]]
            good_cter += 1
        # Write empty row
        df_good_length.loc[good_cter] = ["", "", "", ""]
        good_cter += 1

    # If target is longer
    elif len(splitted_target) > len(splitted_prediction):
        for i in range(len(splitted_prediction)):
            df_bad_length.loc[bad_cter] = [row["sentence_idx"], splitted_source[i],
                                           splitted_target[i], splitted_prediction[i]]
            bad_cter += 1
        # Pad the missing predictions with "O" tags
        for i in range(len(splitted_prediction), len(splitted_target)):
            df_bad_length.loc[bad_cter] = [row["sentence_idx"], splitted_source[i],
                                           splitted_target[i], "O"]
            bad_cter += 1
        # Write empty row
        df_bad_length.loc[bad_cter] = ["", "", "", ""]
        bad_cter += 1

    # If prediction is longer, drop the extra predicted tags
    else:
        for i in range(len(splitted_target)):
            df_bad_length.loc[bad_cter] = [row["sentence_idx"], splitted_source[i],
                                           splitted_target[i], splitted_prediction[i]]
            bad_cter += 1
        # Write empty row
        df_bad_length.loc[bad_cter] = ["", "", "", ""]
        bad_cter += 1



[tqdm progress output truncated: throughput fell from about 80 it/s at the start to about 2.9 s/it after 9,368 iterations, about 1 h 21 min into the run]
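
The good/bad split itself does not need the per-word loop: it can be decided once per sentence, before any conversion. A minimal sketch on invented data follows; `n_tokens`, `toy`, `toy_good` and `toy_bad` are hypothetical names, and only the column names come from the notebook.

```python
import pandas as pd

# Toy stand-in for `predictions` (invented values).
toy = pd.DataFrame({
    "sentence_idx": [1, 2, 3],
    "source": ["a b c", "d e", "f g"],
    "target": ["O B-N O", "B-P O", "O O"],
    "prediction": ["O B-N O", "B-P", "O O O"],
})


def n_tokens(column):
    """Number of space-separated tokens in each cell of a string column."""
    return column.str.split(" ").str.len()


# True where target and prediction have the same number of tags.
same_length = n_tokens(toy["target"]) == n_tokens(toy["prediction"])
toy_good = toy[same_length]
toy_bad = toy[~same_length]
print(len(toy_good), len(toy_bad))
```

Each part could then go through the same per-sentence conversion, which would keep the padding and truncation rule in a single place.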

#### Save dataframe

In [120]:
df_good_length.to_csv("../../data/good_test_predictions_for_eval", sep=' ', index=False, header=False)
df_bad_length.to_csv("../../data/bad_test_predictions_for_eval", sep=' ', index=False, header=False)