# **Custom Legal Bert**

## Install

Install required packages and dependencies:

In [None]:
# !pip install -r requirements.txt

Install transformers from source (required for tokenizers dependencies):

In [None]:
# !pip install git+https://github.com/huggingface/transformers

In [None]:
# !pip install scikit-learn

## Creating Training Data

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np

In [3]:
import os

# Build one DataFrame from the per-company sentence and label files.
frames = []
for file in os.listdir("./data/ToS/Sentences"):
    if file.endswith(".txt"):
        print(file)
        sentence_path = './data/ToS/Sentences/' + file
        label_path = './data/ToS/Labels/' + file
        df_new = pd.read_csv(sentence_path, sep='\n', names=['sentences'])
        df_new['labels'] = pd.read_csv(label_path, sep='\n', names=['labels'])
        # Binary classification: label -1 becomes 0, every other label becomes 1
        df_new['labels_converted'] = np.where(df_new['labels'] == -1, 0, 1)
        frames.append(df_new)

# ignore_index=False keeps each file's own row index, as in the output below
df = pd.concat(frames, ignore_index=False, axis=0)

Viber.txt
Nintendo.txt
Tinder.txt
Dropbox.txt
Microsoft.txt
Betterpoints_UK.txt
Airbnb.txt
musically.txt
Crowdtangle.txt
TripAdvisor.txt
Deliveroo.txt
Moves-app.txt
Spotify.txt
Supercell.txt
9gag.txt
Booking.txt
Headspace.txt
Fitbit.txt
Syncme.txt
Vimeo.txt
Oculus.txt
Endomondo.txt
Instagram.txt
LindenLab.txt
WorldOfWarcraft.txt
YouTube.txt
Academia.txt
Yahoo.txt
WhatsApp.txt
Google.txt
Zynga.txt
Facebook.txt
Amazon.txt
Vivino.txt
Netflix.txt
PokemonGo.txt
Skype.txt
Snap.txt
eBay.txt
Masquerade.txt
Twitter.txt
LinkedIn.txt
Skyscanner.txt
Duolingo.txt
TrueCaller.txt
Uber.txt
Rovio.txt
Atlas.txt
Evernote.txt
Onavo.txt
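
The loading cell above relies on `pd.read_csv(..., sep='\n')`, which some more recent pandas releases may reject. As a hedged alternative, the sketch below reads one sentence per line and one label per line with the standard library only. The helper names `read_lines` and `to_binary` are hypothetical and not part of the repository, and the temporary files merely stand in for one `Sentences`/`Labels` pair.

```python
import os
import tempfile


def read_lines(path):
    """Return the non-empty lines of a text file, without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def to_binary(raw_labels):
    """Map -1 to 0 and every other label to 1, mirroring the np.where call above."""
    return [0 if int(value) == -1 else 1 for value in raw_labels]


# Tiny demonstration: temporary files stand in for one Sentences/Labels pair.
with tempfile.TemporaryDirectory() as tmp:
    sentence_path = os.path.join(tmp, "sentences.txt")
    label_path = os.path.join(tmp, "labels.txt")
    with open(sentence_path, "w", encoding="utf-8") as handle:
        handle.write("first clause .\nsecond clause .\n")
    with open(label_path, "w", encoding="utf-8") as handle:
        handle.write("-1\n1\n")
    sentences = read_lines(sentence_path)
    labels = to_binary(read_lines(label_path))

# Pair each sentence with its converted label.
rows = list(zip(sentences, labels))
print(rows)
```

The resulting list of `(sentence, label)` pairs can be passed to `pd.DataFrame(rows, columns=['sentences', 'label'])` if a DataFrame is needed.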


In [4]:
df.shape

(9414, 3)

In [5]:
df

Unnamed: 0,sentences,labels,labels_converted
0,thanks for sending us good vibes by using the ...,-1,0
1,"you may be surprised , but we will refer to al...",-1,0
2,"the terms of use -lrb- or , the `` terms '' -r...",-1,0
3,the language of the terms will seem legal -lrb...,-1,0
4,"when you use our services , in addition to enj...",1,1
...,...,...,...
142,the failure of onavo to enforce any right or p...,-1,0
143,the section headings in the agreement are incl...,-1,0
144,"`` including '' , whether capitalized or not ,...",-1,0
145,this agreement may not be assigned by you with...,-1,0


In [6]:
df.isna().sum()

sentences           0
labels              0
labels_converted    0
dtype: int64

In [7]:
clauses_df = df.drop(columns=['labels'])

In [8]:
clauses_df.head()

Unnamed: 0,sentences,labels_converted
0,thanks for sending us good vibes by using the ...,0
1,"you may be surprised , but we will refer to al...",0
2,"the terms of use -lrb- or , the `` terms '' -r...",0
3,the language of the terms will seem legal -lrb...,0
4,"when you use our services , in addition to enj...",1


In [9]:
clauses_df.shape

(9414, 2)

In [10]:
clauses_df.rename(columns={'labels_converted':'label'}, inplace=True)

In [11]:
clauses_df.head()

Unnamed: 0,sentences,label
0,thanks for sending us good vibes by using the ...,0
1,"you may be surprised , but we will refer to al...",0
2,"the terms of use -lrb- or , the `` terms '' -r...",0
3,the language of the terms will seem legal -lrb...,0
4,"when you use our services , in addition to enj...",1


In [12]:
clauses_df.to_csv('./data/ToS/tos_clauses.csv', index=False)

## Terms of Service

#### Compute pretrain loss
To compute the per-example and average pretrain loss across the full dataset, run the `run_glue.py` script with the arguments specified in the example.
- Pass a file containing the full dataset to `validation_file`.
- Pass `ptl=True`.
- The script requires a `train_file` but does not use it when `ptl=True`, so any file can be passed in this case.

Running the `run_glue.py` script with `ptl=True` writes the per-example pretrain loss (in the same order as the examples in `validation_file`) to the file `per_ex_pretrain_loss.csv` in `output_dir`. The script also prints the average pretrain loss across the `validation_file` examples.


*Calculate domain specificity (DS) scores*

To calculate the domain specificity (DS) score of a task, take the difference between the average pretrain loss on BERT (double) and on Custom Legal-BERT: $$\overline{L}_{\text{BERT (double)}} - \overline{L}_{\text{Custom Legal-BERT}}$$

The per-example losses can also be used to calculate the DS score of a specific task example $i$, by taking the difference between the pretrain loss of example $i$ on BERT (double) and on Custom Legal-BERT: $$L^{(i)}_{\text{BERT (double)}} - L^{(i)}_{\text{Custom Legal-BERT}}$$
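
The two DS formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming the per-example losses of both models have already been loaded into two lists in the same example order. The function name `ds_scores` and the loss values are made up for the example and are not produced by the script.

```python
from statistics import mean


def ds_scores(loss_general, loss_legal):
    """Return (per-example DS scores, task-level DS score).

    loss_general: per-example pretrain losses under BERT (double)
    loss_legal:   per-example pretrain losses under Custom Legal-BERT
    """
    if len(loss_general) != len(loss_legal):
        raise ValueError("both loss lists must cover the same examples in the same order")
    # Example-level DS: loss on the general model minus loss on the legal model
    per_example = [g - s for g, s in zip(loss_general, loss_legal)]
    # Task-level DS: difference of the two average losses
    task_level = mean(loss_general) - mean(loss_legal)
    return per_example, task_level


# Illustrative values only, not real losses
bert_double = [2.0, 3.0, 4.0]
custom_legalbert = [1.5, 2.0, 4.5]
per_example, task_level = ds_scores(bert_double, custom_legalbert)
print(per_example)
print(task_level)
```

A positive score means the example (or task) has lower pretrain loss under Custom Legal-BERT than under BERT (double).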

Set environment variable to disable tokenizers parallelism:

In [1]:
%env TOKENIZERS_PARALLELISM=false

env: TOKENIZERS_PARALLELISM=false


In [13]:
# Download model from Hugging Face model repository
!python classification/run_glue.py \
  --model_name_or_path zlucia/custom-legalbert \
  --train_file data/ToS/tos_clauses.csv \
  --validation_file data/ToS/tos_clauses.csv \
  --ptl=True \
  --max_seq_length 128 \
  --output_dir logs/ToS/custom-legalbert \
  --overwrite_output_dir

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
10/14/2022 20:33:06 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_l

[INFO|modeling_utils.py:2156] 2022-10-14 20:33:08,206 >> loading weights file pytorch_model.bin from cache at /Users/lavina/.cache/huggingface/hub/models--zlucia--custom-legalbert/snapshots/fd49a135d7b327a315e3ffea31c2be1b40685315/pytorch_model.bin
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

In [None]:
# !pip install importlib-metadata==4.13.0  # works around "AttributeError: 'EntryPoints' object has no attribute 'get'"

#### Finetune

To finetune on the dataset, run the `run_glue.py` script with the arguments specified in the example. The hyperparameters specified are the same as those from the paper.
- Pass a file containing the train split to `train_file` and a file containing the split to evaluate/predict on (dev or test split) to `validation_file`.
- Pass `do_train` to train on `train_file`, `do_eval` to evaluate on `validation_file`, and `do_predict` to predict on `validation_file`. 

Running the `run_glue.py` script with `do_train` and `do_eval` trains the specified model on `train_file`, evaluates the trained model on `validation_file`, and writes the trained model and tokenizer files, together with the evaluation results in `eval_results.txt`, to `output_dir`. Passing `do_predict` writes the class label predictions on `validation_file` to the file `predictions.csv` in `output_dir`. The script also prints the evaluation results on `validation_file` (evaluation F1, evaluation loss, etc.).
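
For readers unfamiliar with the reported metric, the sketch below shows how a binary F1 score can be computed from gold labels and predicted labels. It assumes that the evaluation F1 is the F1 of the positive class (label 1). The function `binary_f1` and the toy labels are illustrative and are not taken from `run_glue.py`.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 true positives, 1 false positive, 1 false negative
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
score = binary_f1(y_true, y_pred)
print(score)
```

Because the positive class is a minority in this dataset, F1 is more informative here than plain accuracy, which a model could inflate by always predicting label 0.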

### Split training data into two sets

In [14]:
train, dev = train_test_split(clauses_df, test_size=0.2, random_state=42, stratify=clauses_df[['label']])

In [15]:
train.head()

Unnamed: 0,sentences,label
5,content license and intellectual property rights,0
232,reactivated skype credit is not refundable .,0
196,spotify may change the price for the paid subs...,1
18,the term of your licenses under this eula shal...,0
181,the arbitrator may award declaratory or injunc...,0


In [17]:
dev.head()

Unnamed: 0,sentences,label
50,uber reserves the right to withhold or deduct ...,0
208,niantic 's failure to enforce any right or pro...,0
275,14.3 if you feel that any member you interact ...,0
182,blizzard entertainment has the right to obtain...,0
151,myfitnesspal does not -lrb- i -rrb- guarantee ...,0


In [18]:
train.to_csv("./data/ToS/tos_clauses_train.csv", index=False)
dev.to_csv("./data/ToS/tos_clauses_dev.csv", index=False)

In [19]:
# Download model from Hugging Face model repository
!python classification/run_glue.py \
  --model_name_or_path zlucia/custom-legalbert \
  --train_file data/ToS/tos_clauses_train.csv \
  --validation_file data/ToS/tos_clauses_dev.csv \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --max_seq_length 128 \
  --per_device_train_batch_size=16 \
  --learning_rate=1e-5 \
  --num_train_epochs=2.0 \
  --output_dir logs/ToS/custom-legalbert \
  --overwrite_output_dir \
  --logging_steps 50

10/14/2022 22:34:45 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=50,
evaluation_strategy=steps,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip

[INFO|modeling_utils.py:2156] 2022-10-14 22:34:47,139 >> loading weights file pytorch_model.bin from cache at /Users/lavina/.cache/huggingface/hub/models--zlucia--custom-legalbert/snapshots/fd49a135d7b327a315e3ffea31c2be1b40685315/pytorch_model.bin
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'eval_loss': 0.2681363523006439, 'eval_f1': 0.43189368770764125, 'eval_runtime': 1032.1814, 'eval_samples_per_second': 1.824, 'eval_steps_per_second': 0.229, 'epoch': 0.21}
{'loss': 0.2639, 'learn

{'eval_loss': 0.15559203922748566, 'eval_f1': 0.7386934673366835, 'eval_runtime': 1379.93, 'eval_samples_per_second': 1.365, 'eval_steps_per_second': 0.171, 'epoch': 0.74}
{'loss': 0.1499, 'learning_rate': 5.753715498938429e-06, 'epoch': 0.85}


{'loss': 0.1307, 'learning_rate': 4.16135881104034e-06, 'epoch': 1.17}          
 58%|█████████████████████               | 550/942 [17:56:31<2:53:31, 26.56s/it]
[INFO|trainer.py:726] 2022-10-15 16:31:20,746 >> The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: sentences. If sentences are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2022-10-15 16:31:20,750 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2022-10-15 16:31:20,750 >>   Num examples = 1883
[INFO|trainer.py:2912] 2022-10-15 16:31:20,750 >>   Batch size = 8

10/15/2022 17:28:03 - INFO - /Users/lavina/opt/anaconda3/envs/CustomLegalBERT

10/15/2022 21:54:29 - INFO - /Users/lavina/opt/anaconda3/envs/CustomLegalBERT/lib/python3.7/site-packages/datasets/metric.py -   Removing /Users/lavina/.cache/huggingface/metrics/f1/default/default_experiment-1-0.arrow
{'eval_loss': 0.1568325012922287, 'eval_f1': 0.7851851851851852, 'eval_runtime': 1035.1003, 'eval_samples_per_second': 1.819, 'eval_steps_per_second': 0.228, 'epoch': 1.8}
 90%|██████████████████████████████████▎   | 850/942 [23:19:39<41:31, 27.08s/it]
{'loss': 0.1038, 'learning_rate': 4.45859872611465e-07, 'epoch': 1.91}
