# Distinguishing comparison in sentences

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1yLxIhWBUiq2jHk-5npeB_EXhJDVOLYZg)

## Loading dependencies and code

In [1]:
!pip install pytorch-pretrained-bert

Collecting pytorch-pretrained-bert
  Downloading https://files.pythonhosted.org/packages/d7/e0/c08d5553b89973d9a240605b9c12404bcf8227590de62bae27acbcfe076b/pytorch_pretrained_bert-0.6.2-py3-none-any.whl (123kB)

For simplicity, download the data and the code needed to train the BERT model from Google Drive (the same code is available in the Git repository).  
Link:  
https://drive.google.com/open?id=1lID_vPscxUu1zZY1UDU1jHUYG4uJW7m0

In [27]:
!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1lID_vPscxUu1zZY1UDU1jHUYG4uJW7m0' -O for_bert.zip

--2020-04-04 23:54:12--  https://docs.google.com/uc?export=download&id=1lID_vPscxUu1zZY1UDU1jHUYG4uJW7m0
Resolving docs.google.com (docs.google.com)... 172.217.5.78, 2607:f8b0:4007:803::200e
Connecting to docs.google.com (docs.google.com)|172.217.5.78|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-08-14-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/41s1ead5s9qpl20rmscre2bsrubpbcbl/1586044425000/13476312262238289650/*/1lID_vPscxUu1zZY1UDU1jHUYG4uJW7m0?e=download [following]
--2020-04-04 23:54:13--  https://doc-08-14-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/41s1ead5s9qpl20rmscre2bsrubpbcbl/1586044425000/13476312262238289650/*/1lID_vPscxUu1zZY1UDU1jHUYG4uJW7m0?e=download
Resolving doc-08-14-docs.googleusercontent.com (doc-08-14-docs.googleusercontent.com)... 216.58.193.193, 2607:f8b0:4007:80b::2001
Connecting to doc-08-14-docs.googleusercontent.com (doc-08-14-docs.googleuse

In [28]:
!unzip for_bert.zip

Archive:  for_bert.zip
   creating: data/
  inflating: data/data.csv           
  inflating: data/held-out-data.csv  
  inflating: data_extraction.py      


## Loading data

In [0]:
import pandas as pd
import numpy as np
from data_extraction import ExtractMiddlePart, ExtractRawSentence
from pytorch_pretrained_bert import BertTokenizer
from multiprocessing import Pool, cpu_count
from tqdm import trange
from tqdm.notebook import tqdm as tqdm_notebook
from sklearn.metrics import classification_report

In [0]:
names = ["object_a", "object_b", "sentence", "most_frequent_label"]

df_train = pd.read_csv("data/data.csv", usecols=names)
df_test = pd.read_csv("data/held-out-data.csv", usecols=names)

### Data preparation

In [0]:
label_list = ["BETTER", "WORSE", "NONE"]
num_labels = len(label_list)
label2idx = {}
idx2label = {}
for i, label in enumerate(label_list):
  label2idx[label] = i
  idx2label[i] = label

MAX_SEQ_LENGTH = 78

DATA_DIR = "pp_data/"
MODEL_DIR = 'model/'

In [0]:
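def get_bert_format(df, label2idx, mode='full'):
  """Build the frame fed to BERT; mode is 'full' (raw sentence) or 'mid' (middle part)."""
  df_bert = pd.DataFrame()
  df_bert['id'] = range(len(df))
  # Map string labels (BETTER / WORSE / NONE) to integer class ids.
  df_bert['label'] = df['most_frequent_label'].replace(label2idx)
  # df_bert['alpha'] = ['a']*df.shape[0]
  df_bert['obj_a'] = df["object_a"]
  df_bert['obj_b'] = df["object_b"]
  if mode == 'full':
    df_bert["sentence"] = ExtractRawSentence().transform(df)
  elif mode == 'mid':
    df_bert["sentence"] = ExtractMiddlePart().transform(df)
  else:
    # Fail loudly instead of silently returning a frame without a "sentence" column.
    raise ValueError(f"Unknown mode: {mode!r}")
  return df_bert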

df_train_bert = get_bert_format(df_train, label2idx, mode='full')
df_test_bert = get_bert_format(df_test, label2idx, mode='full')

Loading pre-trained model tokenizer (vocabulary)

In [7]:
tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)
# tokenizer = BertTokenizer.from_pretrained(MODEL_DIR + 'vocab.txt', do_lower_case=False)

100%|██████████| 213450/213450 [00:00<00:00, 1190270.94B/s]


Percentiles for lengths of token sequences using full sentences

In [6]:
lengths = []

for row in df_train_bert.iterrows():
  example = row[1]
  tokens_a = tokenizer.tokenize(example["sentence"])
  tokens_b = tokenizer.tokenize(example["obj_a"] + " " + example["obj_b"])
  total_length = len(tokens_a) + len(tokens_b) + 3
  lengths.append(total_length)

len_ser = pd.Series(lengths)
len_ser.quantile([.9, .95, .96, .97, .98, .99, 1.0])

0.90     61.20
0.95     78.00
0.96     85.00
0.97     91.00
0.98    101.84
0.99    117.00
1.00    226.00
dtype: float64
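
The percentiles above line up with the choice of `MAX_SEQ_LENGTH = 78`: it equals the 95th percentile, so roughly 5% of the full-sentence inputs are truncated. A minimal sketch of that trade-off, using a small made-up list of lengths rather than the real data (the helper `truncated_share` is hypothetical and not part of the notebook):

```python
def truncated_share(lengths, max_seq_length):
    """Return the fraction of sequences longer than max_seq_length."""
    too_long = sum(1 for n in lengths if n > max_seq_length)
    return too_long / len(lengths)

# Made-up token counts, for illustration only (not the notebook's data).
toy_lengths = [20, 35, 41, 48, 55, 60, 62, 70, 78, 120]

for cutoff in (61, 78, 128):
    print(cutoff, truncated_share(toy_lengths, cutoff))
```

A larger cutoff truncates fewer examples but makes every padded batch longer, which is the cost being balanced here.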

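Percentiles for lengths of token sequences using middle parts of the sentences. The code is the same as above; the different output comes from re-running it after rebuilding `df_train_bert` with `mode='mid'`.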

In [11]:
lengths = []

for row in df_train_bert.iterrows():
  example = row[1]
  tokens_a = tokenizer.tokenize(example["sentence"])
  tokens_b = tokenizer.tokenize(example["obj_a"] + " " + example["obj_b"])
  total_length = len(tokens_a) + len(tokens_b) + 3
  lengths.append(total_length)

len_ser = pd.Series(lengths)
len_ser.quantile([.9, .95, .96, .97, .98, .99, 1.0])

0.90     27.00
0.95     33.00
0.96     36.00
0.97     39.00
0.98     44.00
0.99     55.42
1.00    153.00
dtype: float64

In [0]:
train_examples_for_processing = [(example[1], MAX_SEQ_LENGTH, tokenizer) for example in df_train_bert.iterrows()]
test_examples_for_processing = [(example[1], MAX_SEQ_LENGTH, tokenizer) for example in df_test_bert.iterrows()]

In [0]:
def convert_example_to_feature(example_row):
  example, max_seq_length, tokenizer = example_row
  
  tokens_a = tokenizer.tokenize(example["sentence"])
  
  tokens_b = tokenizer.tokenize(example["obj_a"] + " " + example["obj_b"])
  total_length = len(tokens_a) + len(tokens_b) + 3
  if total_length > max_seq_length:
    tokens_a = tokens_a[:(max_seq_length - (len(tokens_b) + 3))]
  
  tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
  segment_ids = [0] * len(tokens)
  
  if tokens_b:
    tokens += tokens_b + ["[SEP]"]
    segment_ids += [1] * (len(tokens_b) + 1)
    
  input_ids = tokenizer.convert_tokens_to_ids(tokens)
  
  # The mask has 1 for real tokens and 0 for padding tokens. Only real
  # tokens are attended to.
  input_mask = [1] * len(input_ids)
  
  # Zero-pad up to the sequence length.
  padding = [0] * (max_seq_length - len(input_ids))
  input_ids += padding
  input_mask += padding
  segment_ids += padding

  assert len(input_ids) == max_seq_length
  assert len(input_mask) == max_seq_length
  assert len(segment_ids) == max_seq_length

  label_id = example["label"]

  return {'input_ids': input_ids,
          'input_mask': input_mask,
          'segment_ids': segment_ids,
          'label_id': label_id}
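
The function above packs the sentence and the object pair into the usual two-segment BERT layout, truncating only the sentence. The sketch below reproduces that layout on plain strings so the three parallel lists are easy to inspect. It is an illustration under simplifying assumptions: whitespace splitting stands in for the WordPiece tokenizer, tokens are kept as strings instead of ids, and `build_pair_layout` is a hypothetical name.

```python
def build_pair_layout(tokens_a, tokens_b, max_seq_length):
    """Lay out two token lists as [CLS] A [SEP] B [SEP], padded to a fixed length."""
    # Only the first segment is truncated; 3 slots are reserved for special tokens.
    budget = max_seq_length - len(tokens_b) - 3
    tokens_a = tokens_a[:max(budget, 0)]

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_mask = [1] * len(tokens)

    # Pad all three lists to the same fixed length.
    pad = max_seq_length - len(tokens)
    return (tokens + ["[PAD]"] * pad,
            segment_ids + [0] * pad,
            input_mask + [0] * pad)

# Whitespace splitting stands in for the WordPiece tokenizer here.
tokens, segment_ids, input_mask = build_pair_layout(
    "Python is easier than Java".split(), "Python Java".split(), max_seq_length=12)
print(tokens)
print(segment_ids)
print(input_mask)
```

Segment id 0 marks the sentence (including `[CLS]` and the first `[SEP]`), segment id 1 marks the object pair, and the mask is 0 only on padding.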

In [10]:
process_count = max(1, cpu_count() - 1)  # keep at least one worker on single-core machines
print(f'Preparing to convert {len(df_train_bert)} examples..')
print(f'Spawning {process_count} processes..')
with Pool(process_count) as p:
  train_features = list(tqdm_notebook(p.imap(convert_example_to_feature, train_examples_for_processing), total=len(df_train_bert)))
print(f'Preparing to convert {len(df_test_bert)} examples..')
print(f'Spawning {process_count} processes..')
with Pool(process_count) as p:
  test_features = list(tqdm_notebook(p.imap(convert_example_to_feature, test_examples_for_processing), total=len(df_test_bert)))

Preparing to convert 5759 examples..
Spawning 1 processes..


Preparing to convert 1440 examples..
Spawning 1 processes..



In [0]:
# with open(DATA_DIR + "train_features.pkl", "wb") as f:
#   pickle.dump(train_features, f)
# with open(DATA_DIR + "test_features.pkl", "wb") as f:
#   pickle.dump(test_features, f)

In [0]:
def get_tensors(features, with_labels=True):
  all_input_ids = torch.tensor([f["input_ids"] for f in features], dtype=torch.long)
  all_input_mask = torch.tensor([f["input_mask"] for f in features], dtype=torch.long)
  all_segment_ids = torch.tensor([f["segment_ids"] for f in features], dtype=torch.long)
  if with_labels:
    all_label_ids = torch.tensor([f["label_id"] for f in features], dtype=torch.long)
    return all_input_ids, all_input_mask, all_segment_ids, all_label_ids
  else:
    return all_input_ids, all_input_mask, all_segment_ids

### BERT

In [0]:
import torch
import pickle
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler, TensorDataset
from torch.nn import CrossEntropyLoss

import os
from pytorch_pretrained_bert import BertForSequenceClassification
from pytorch_pretrained_bert.optimization import BertAdam


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [0]:
BERT_MODEL = 'bert-base-cased'

CACHE_DIR = 'cache/'

TRAIN_BATCH_SIZE = 24
EVAL_BATCH_SIZE = 32
LEARNING_RATE = 1e-5

NUM_TRAIN_EPOCHS = 30

RANDOM_SEED = 42
GRADIENT_ACCUMULATION_STEPS = 1
WARMUP_PROPORTION = 0.1

CONFIG_NAME = "config.json"
WEIGHTS_NAME = "pytorch_model.bin"

In [0]:
# if os.path.exists(MODEL_DIR) and os.listdir(MODEL_DIR):
#   raise ValueError("Model directory ({}) already exists and is not empty.".format(MODEL_DIR))
if not os.path.exists(MODEL_DIR):
  os.makedirs(MODEL_DIR)

The BERT model is pretrained. We either load the bert-base-cased model and train it for our task or load the model we had trained previously.

In [16]:
model = BertForSequenceClassification.from_pretrained(BERT_MODEL, cache_dir=CACHE_DIR, num_labels=num_labels)
# model = BertForSequenceClassification.from_pretrained(MODEL_DIR, cache_dir=CACHE_DIR, num_labels=num_labels)

100%|██████████| 404400730/404400730 [00:10<00:00, 38983968.38B/s]


In [17]:
model.to(device)

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(28996, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): BertLayerNorm()
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): BertLayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
   

### Training

In [0]:
num_train_optimization_steps = int(len(df_train_bert) / TRAIN_BATCH_SIZE / GRADIENT_ACCUMULATION_STEPS) * NUM_TRAIN_EPOCHS
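
The step count above can be checked by hand. The sketch below redoes the arithmetic with the numbers that appear in this notebook (5759 training examples, batch size 24, 30 epochs); the variable names are chosen for the example only.

```python
num_examples = 5759        # training rows reported in the conversion log
train_batch_size = 24
grad_accum_steps = 1
num_epochs = 30
warmup_proportion = 0.1

# drop_last=True discards the final partial batch, hence floor division.
steps_per_epoch = num_examples // train_batch_size // grad_accum_steps
total_steps = steps_per_epoch * num_epochs
warmup_steps = int(total_steps * warmup_proportion)

print(steps_per_epoch, total_steps, warmup_steps)  # 239 7170 717
```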

In [0]:
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
    ]
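
The grouping above works by substring matching on parameter names: biases and LayerNorm parameters are exempt from weight decay, and everything else gets a decay of 0.01. A small sketch on hypothetical parameter names (shaped like those of a BERT classifier, but written out by hand for the example):

```python
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']

# Hypothetical parameter names, shaped like those of a BERT classifier.
names = [
    'bert.embeddings.word_embeddings.weight',
    'bert.embeddings.LayerNorm.weight',
    'bert.encoder.layer.0.attention.self.query.weight',
    'bert.encoder.layer.0.attention.self.query.bias',
    'classifier.weight',
    'classifier.bias',
]

# A name is exempt from decay if any no_decay pattern occurs inside it.
decayed = [n for n in names if not any(nd in n for nd in no_decay)]
not_decayed = [n for n in names if any(nd in n for nd in no_decay)]

print(decayed)
print(not_decayed)
```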

In [0]:
optimizer = BertAdam(optimizer_grouped_parameters,
                     lr=LEARNING_RATE,
                     warmup=WARMUP_PROPORTION,
                     t_total=num_train_optimization_steps)
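
The `warmup` and `t_total` arguments shape the learning rate over training. The sketch below shows the rough shape, assuming the optimizer's default behaviour is a linear warmup followed by a linear decay to zero. The function `warmup_linear` here is written for illustration and is an approximation, not the library's code.

```python
def warmup_linear(progress, warmup=0.1):
    """Learning-rate multiplier for a fraction `progress` of training completed."""
    if progress < warmup:
        return progress / warmup                        # linear ramp from 0 to 1
    return max((1.0 - progress) / (1.0 - warmup), 0.0)  # linear decay to 0

base_lr = 1e-5
total_steps = 7170  # int(5759 / 24) * 30, the value passed as t_total
for step in (0, 358, 717, 3585, 7170):
    lr = base_lr * warmup_linear(step / total_steps)
    print(step, f"{lr:.2e}")
```

Under this assumption the learning rate peaks at `LEARNING_RATE` after the first 10% of the steps and reaches zero at the last step.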

In [0]:
all_input_ids, all_input_mask, all_segment_ids, all_label_ids = get_tensors(train_features)

train_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=TRAIN_BATCH_SIZE, drop_last=True)

In [22]:
global_step = 0
nb_tr_steps = 0
tr_loss = 0

model.train()
for _ in trange(int(NUM_TRAIN_EPOCHS), desc="Epoch"):
    tr_loss = 0
    nb_tr_examples, nb_tr_steps = 0, 0
    for step, batch in enumerate(tqdm_notebook(train_dataloader, desc="Iteration")):
        batch = tuple(t.to(device) for t in batch)
        input_ids, input_mask, segment_ids, label_ids = batch

        logits = model(input_ids, segment_ids, input_mask, labels=None)

        loss_fct = CrossEntropyLoss()
        loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1))

        if GRADIENT_ACCUMULATION_STEPS > 1:
            loss = loss / GRADIENT_ACCUMULATION_STEPS

        loss.backward()
        print("\r%f" % loss.item(), end='')  # format the Python float, not the tensor
        
        tr_loss += loss.item()
        nb_tr_examples += input_ids.size(0)
        nb_tr_steps += 1
        if (step + 1) % GRADIENT_ACCUMULATION_STEPS == 0:
            optimizer.step()
            optimizer.zero_grad()
            global_step += 1

Epoch:   0%|          | 0/30 [00:00<?, ?it/s]
0.834589
Epoch:   3%|▎         | 1/30 [02:12<1:03:50, 132.10s/it]
0.678435
0.473400
Epoch:   7%|▋         | 2/30 [04:24<1:01:39, 132.14s/it]
0.307587
0.148673
Epoch:  10%|█         | 3/30 [06:37<59:36, 132.47s/it]
0.293608
0.141626
Epoch:  13%|█▎        | 4/30 [08:51<57:31, 132.76s/it]
0.086600
0.040857
Epoch:  17%|█▋        | 5/30 [11:04<55:24, 132.98s/it]
0.060530
0.111425
Epoch:  20%|██        | 6/30 [13:18<53:15, 133.15s/it]
0.011343
0.007578
Epoch:  23%|██▎       | 7/30 [15:31<51:05, 133.27s/it]
0.072120
0.009614
Epoch:  27%|██▋       | 8/30 [17:45<48:52, 133.32s/it]
0.043905
0.002171
Epoch:  30%|███       | 9/30 [19:58<46:41, 133.38s/it]
0.003283
0.006176
Epoch:  33%|███▎      | 10/30 [22:12<44:29, 133.45s/it]
0.000971
0.001949
Epoch:  37%|███▋      | 11/30 [24:25<42:15, 133.46s/it]
0.001413
0.112302
Epoch:  40%|████      | 12/30 [26:39<40:01, 133.43s/it]
0.002048
0.001486
Epoch:  43%|████▎     | 13/30 [28:52<37:47, 133.40s/it]
0.000879
0.001151
Epoch:  47%|████▋     | 14/30 [31:05<35:35, 133.46s/it]
0.034648
0.002822
Epoch:  50%|█████     | 15/30 [33:19<33:21, 133.44s/it]
0.000993
0.000925
Epoch:  53%|█████▎    | 16/30 [35:32<31:08, 133.45s/it]
0.003743
0.001227
Epoch:  57%|█████▋    | 17/30 [37:46<28:54, 133.39s/it]
0.000472
0.000645
Epoch:  60%|██████    | 18/30 [39:59<26:40, 133.37s/it]
0.001225
0.000515
Epoch:  63%|██████▎   | 19/30 [42:12<24:27, 133.40s/it]
0.205060
0.001791
Epoch:  67%|██████▋   | 20/30 [44:26<22:14, 133.41s/it]
0.000904
0.000402
Epoch:  70%|███████   | 21/30 [46:39<20:01, 133.46s/it]
0.000174
0.000295
Epoch:  73%|███████▎  | 22/30 [48:53<17:47, 133.45s/it]
0.000325
0.011351
Epoch:  77%|███████▋  | 23/30 [51:06<15:34, 133.45s/it]
0.000198
0.001719
Epoch:  80%|████████  | 24/30 [53:20<13:21, 133.51s/it]
0.000758
0.000283
Epoch:  83%|████████▎ | 25/30 [55:33<11:07, 133.53s/it]
0.000494
0.000572
Epoch:  87%|████████▋ | 26/30 [57:47<08:54, 133.69s/it]
0.000268
0.000333
Epoch:  90%|█████████ | 27/30 [1:00:01<06:40, 133.61s/it]
0.000620
0.000286
Epoch:  93%|█████████▎| 28/30 [1:02:15<04:27, 133.62s/it]
0.000280
0.000207
Epoch:  97%|█████████▋| 29/30 [1:04:28<02:13, 133.63s/it]
0.000638
0.001353
Epoch: 100%|██████████| 30/30 [1:06:42<00:00, 133.42s/it]
0.000274

Saving the trained model

In [23]:
model_to_save = model.module if hasattr(model, 'module') else model

output_model_file = os.path.join(MODEL_DIR, WEIGHTS_NAME)
output_config_file = os.path.join(MODEL_DIR, CONFIG_NAME)

torch.save(model_to_save.state_dict(), output_model_file)
model_to_save.config.to_json_file(output_config_file)
tokenizer.save_vocabulary(MODEL_DIR)

'model/vocab.txt'

### Evaluation

In [0]:
all_input_ids, all_input_mask, all_segment_ids, all_label_ids = get_tensors(test_features)

eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids)
eval_sampler = SequentialSampler(eval_data)
eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=EVAL_BATCH_SIZE)

In [25]:
model.eval()
eval_loss = 0
nb_eval_steps = 0
preds = []

for input_ids, input_mask, segment_ids, label_ids in tqdm_notebook(eval_dataloader, desc="Evaluating"):
    input_ids = input_ids.to(device)
    input_mask = input_mask.to(device)
    segment_ids = segment_ids.to(device)
    label_ids = label_ids.to(device)

    with torch.no_grad():
        logits = model(input_ids, segment_ids, input_mask, labels=None)

    # create eval loss and other metric required by the task
    loss_fct = CrossEntropyLoss()
    tmp_eval_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1))

    eval_loss += tmp_eval_loss.mean().item()
    nb_eval_steps += 1
    if len(preds) == 0:
        preds.append(logits.detach().cpu().numpy())
    else:
        preds[0] = np.append(
            preds[0], logits.detach().cpu().numpy(), axis=0)

eval_loss = eval_loss / nb_eval_steps
preds = preds[0]
preds = np.argmax(preds, axis=1)
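
The last step above turns each row of logits into a class id by taking the position of the largest value. A dependency-free sketch of the same idea on made-up logits, mapping the ids back to label names (the helper `argmax` and the numbers are illustrative only):

```python
label_list = ["BETTER", "WORSE", "NONE"]

# Made-up logits for three sentences, one row per sentence.
toy_logits = [
    [2.1, -0.3, 0.4],
    [-1.0, 0.2, 1.7],
    [0.1, 0.9, -0.5],
]

def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

pred_ids = [argmax(row) for row in toy_logits]
pred_labels = [label_list[i] for i in pred_ids]
print(pred_ids, pred_labels)
```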


Classification report for the BERT model using full sentences

In [26]:
print(classification_report(all_label_ids.numpy(), preds, target_names=label_list, digits=3))

              precision    recall  f1-score   support

      BETTER      0.782     0.817     0.799       273
       WORSE      0.667     0.622     0.643       119
        NONE      0.937     0.933     0.935      1048

    accuracy                          0.885      1440
   macro avg      0.795     0.791     0.793      1440
weighted avg      0.885     0.885     0.885      1440



Classification report for the BERT model using middle parts of the sentences (the same evaluation code, re-run after preparing the data with `mode='mid'` and retraining)

In [44]:
print(classification_report(all_label_ids.numpy(), preds, target_names=label_list, digits=3))

              precision    recall  f1-score   support

      BETTER      0.775     0.795     0.785       273
       WORSE      0.590     0.580     0.585       119
        NONE      0.922     0.918     0.920      1048

    accuracy                          0.867      1440
   macro avg      0.762     0.764     0.763      1440
weighted avg      0.867     0.867     0.867      1440
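
The `macro avg` and `weighted avg` rows can be recomputed from the per-class rows, which also makes the gap between the two input variants easy to compare. The sketch below uses the rounded F1 scores and supports printed in the reports, so results can differ from the reports in the last digit.

```python
# Per-class F1 and support copied from the two reports above.
support = {"BETTER": 273, "WORSE": 119, "NONE": 1048}
f1_full = {"BETTER": 0.799, "WORSE": 0.643, "NONE": 0.935}
f1_mid = {"BETTER": 0.785, "WORSE": 0.585, "NONE": 0.920}

def macro_avg(f1):
    """Unweighted mean over classes."""
    return sum(f1.values()) / len(f1)

def weighted_avg(f1, support):
    """Mean over classes weighted by the number of test examples."""
    total = sum(support.values())
    return sum(f1[c] * support[c] for c in f1) / total

for name, f1 in (("full", f1_full), ("mid", f1_mid)):
    print(name, round(macro_avg(f1), 3), round(weighted_avg(f1, support), 3))
```

The macro average treats the small WORSE class the same as the large NONE class, which is why it sits well below the weighted average in both reports.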

