<center><h2>ALTeGraD 2022<br>Lab Session 3: NLP Frameworks</h2> 15 / 11 / 2022<br> M. Kamal Eddine, H. Abdine<br><br>


<b>Student name:</b> Baptiste PASQUIER

</center>

In this lab you will learn how to use Fairseq and HuggingFace Transformers, two of the libraries most widely used by researchers and developers to pretrain and finetune language models. You will use them to finetune a pretrained French language model ($RoBERTa_{small}^{fr}$) on the sentiment analysis dataset CLS_Books, where each review is labeled as positive or negative.

In the first part of this lab, you will finetune the given model on the CLS_Books dataset using <b>Fairseq</b> by following these steps:<br>

 1- <b>Tokenize the reviews</b> (Train, Valid and Test) using the trained sentencepiece tokenizer provided alongside the pretrained model [use the sentencepiece library and set the parameter <b>out_type=str</b> in the encode function].<br>
 2- <b>Binarize the tokenized reviews and their labels</b> using the preprocess Python script provided in Fairseq.<br>
 3- <b>Finetune the pretrained $RoBERTa_{small}^{fr}$ model</b> using the train Python script provided in Fairseq.<br>
 
 Finally, you will finish the first part by training a randomly initialized $RoBERTa_{small}^{fr}$ model on the CLS_Books dataset and comparing its results against those of the pretrained model while <b>visualizing the accuracies on tensorboard</b>.

 In the second part of this lab, you will use <b>HuggingFace's transformers</b> library to perform the same finetuning done previously with Fairseq.



# <b>Part 1: Fairseq</b>

## <b>Preparing the environment and installing libraries, model and data</b>

In this section, we will set up the environment on Google Colab (first cell), download the pretrained model (second cell) and the finetuning dataset (third cell). In case you are using your personal computer, make sure to:

1- Use Ubuntu (or any similar Linux distribution) or macOS. <b> P.S. In case you have Windows, please use Google Colab. We won't respond to any question regarding errors on Windows. </b>

2- <b>Use Anaconda</b> and create a new environment if you have already installed Fairseq, since we will be using a slightly modified version of this library.

3- <b>Do not run the following three cells</b>. Instead, use their content on your personal command line.

In [None]:
!mkdir altegrad.lab3 && cd altegrad.lab3 && mkdir libs
%cd altegrad.lab3/libs
!git clone https://github.com/hadi-abdine/fairseq
!pip install git+https://github.com/hadi-abdine/fairseq
!git clone https://github.com/huggingface/transformers.git
!pip install git+https://github.com/huggingface/transformers.git
!pip install datasets
!pip install evaluate
!pip install sentencepiece
!pip install tensorboardX
%load_ext tensorboard

In [None]:
!cd .. && mkdir models
%cd ../models
!wget -c "https://onedrive.live.com/download?cid=AE69638675180117&resid=AE69638675180117%21267604&authkey=ANaaKIDigQPyJlM" -O "model_fairseq.zip"
!unzip model_fairseq.zip
!rm model_fairseq.zip
!rm -rf __MACOSX/

In [None]:
!cd .. && mkdir data
%cd ../data
!wget -c "https://onedrive.live.com/download?cid=AE69638675180117&resid=AE69638675180117%21267606&authkey=AA8Td6LoeijplD4" -O "cls.books.zip"
!unzip cls.books.zip
!rm cls.books.zip
!rm -rf __MACOSX/
%cd ..

## <b> Number of parameters of the model</b>

In this section you have to compute the number of parameters of $RoBERTa_{small}^{fr}$ using PyTorch. (<b>Hint:</b> you can check the architecture of the model using model['model'])

In [4]:
import torch

n_parameters = 0
model = torch.load("models/RoBERTa_small_fr/model.pt")
# your code here


def number_parameters(list_layers):
    """Print and return the total number of scalars in the named checkpoint tensors."""
    result = sum(model["model"][layer].numel() for layer in list_layers)
    print(result)
    return result


print("Embedding parameters :")
n_embedding = number_parameters(
    [
        "encoder.sentence_encoder.embed_tokens.weight",
        "encoder.sentence_encoder.embed_positions.weight",
        # 'encoder.sentence_encoder.layernorm_embedding.weight',
        # 'encoder.sentence_encoder.layernorm_embedding.bias'
    ]
)

n_attention_layer = 0

print("Attention layer parameters :")
for i in [0, 1, 2, 3]:
    n_attention_layer += number_parameters(
        [
            f"encoder.sentence_encoder.layers.{i}.self_attn.k_proj.weight",
            f"encoder.sentence_encoder.layers.{i}.self_attn.k_proj.bias",
            f"encoder.sentence_encoder.layers.{i}.self_attn.v_proj.weight",
            f"encoder.sentence_encoder.layers.{i}.self_attn.v_proj.bias",
            f"encoder.sentence_encoder.layers.{i}.self_attn.q_proj.weight",
            f"encoder.sentence_encoder.layers.{i}.self_attn.q_proj.bias",
            f"encoder.sentence_encoder.layers.{i}.self_attn.out_proj.weight",
            f"encoder.sentence_encoder.layers.{i}.self_attn.out_proj.bias",
            # f'encoder.sentence_encoder.layers.{i}.self_attn_layer_norm.weight',
            # f'encoder.sentence_encoder.layers.{i}.self_attn_layer_norm.bias',
            f"encoder.sentence_encoder.layers.{i}.fc1.weight",
            f"encoder.sentence_encoder.layers.{i}.fc1.bias",
            f"encoder.sentence_encoder.layers.{i}.fc2.weight",
            f"encoder.sentence_encoder.layers.{i}.fc2.bias",
            # f'encoder.sentence_encoder.layers.{i}.final_layer_norm.weight',
            # f'encoder.sentence_encoder.layers.{i}.final_layer_norm.bias'
        ]
    )

print("Total 4 Attention layers parameters :")
print(n_attention_layer)

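# Note: this total covers the embeddings and the four encoder layers
# (attention + feed-forward); the commented-out LayerNorm tensors and any
# output-head tensors in the checkpoint are not included.
n_parameters = n_embedding + n_attention_layer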

print("Total model parameters :")
print(n_parameters)

Embedding parameters :
16516096
Attention layer parameters :
1575936
1575936
1575936
1575936
Total 4 Attention layers parameters :
6303744
Total model parameters :
22819840
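
As a cross-check on a hand-picked key list like the one above, the sizes of all tensors in a state dict can simply be summed. The sketch below is a minimal illustration on a toy dictionary of shapes (the names and shapes are made up and do not describe $RoBERTa_{small}^{fr}$); with the real checkpoint one could build the same mapping as `{name: tuple(t.shape) for name, t in model["model"].items()}`, assuming every entry is a tensor. The helper name `count_parameters` is hypothetical.

```python
import math


def count_parameters(shapes, skip_substrings=()):
    """Sum the number of scalars over all entries of a {name: shape} mapping.

    Entries whose name contains any string in `skip_substrings` are ignored,
    which makes it easy to compare totals with and without, e.g., LayerNorm.
    """
    total = 0
    for name, shape in shapes.items():
        if any(sub in name for sub in skip_substrings):
            continue
        total += math.prod(shape)
    return total


# Toy shapes with made-up names, for illustration only
toy_shapes = {
    "embed.weight": (10, 4),
    "layer.fc.weight": (4, 4),
    "layer.fc.bias": (4,),
    "layer.layer_norm.weight": (4,),
}

n_all = count_parameters(toy_shapes)
n_without_norm = count_parameters(toy_shapes, skip_substrings=("layer_norm",))
print(n_all, n_without_norm)
```

Comparing the two totals shows how much a family of tensors (here the toy LayerNorm weight) contributes to the overall count.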


## <b>Tokenizing the reviews</b>

In this section we will tokenize the finetuning dataset using the sentencepiece tokenizer. We have three splits in our dataset: train, valid and test sets.

In this task you have to use the trained sentencepiece tokenizer (RoBERTa_small_fr/sentencepiece.bpe.model) to tokenize the three files <b>train.review</b>, <b>valid.review</b> and <b>test.review</b> and output the three files <b>train.spm.review</b>, <b>valid.spm.review</b> and <b>test.spm.review</b> containing the tokenized reviews.

In [5]:
import sentencepiece as spm

s = spm.SentencePieceProcessor(
    model_file="models/RoBERTa_small_fr/sentencepiece.bpe.model"
)

SPLITS = ["train", "test", "valid"]
SENTS = "review"

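for split in SPLITS:
    # Read the raw reviews, one review per line
    with open("data/cls.books/" + split + "." + SENTS, "r", encoding="utf-8") as f:
        reviews = f.readlines()

    # Encode each review into a list of subword pieces (strings), then join them with spaces
    reviews = s.encode(reviews, out_type=str)
    reviews = [" ".join(review) for review in reviews]

    # The first line should look something like this:
    # ▁An ci enne ▁VS ▁Nouvelle ▁version ▁plus
    print(reviews[0])

    # Write one tokenized review per line
    with open("data/cls.books/" + split + ".spm." + SENTS, "w", encoding="utf-8") as f:
        for review in reviews:
            f.write(review + "\n")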

▁Ce ▁livre ▁est ▁tout ▁simplement ▁mag ique ▁! ▁il ▁vaut ▁le ▁de tour ▁suspens e , ▁ humour , ▁magie , ▁triste sse , ▁courage , son t ▁parfaitement ▁re group és ▁dans ▁cet ▁ ouvrage ▁... ▁Un ▁vrai ▁chef - d ' oeuvre ▁! ▁Bien ▁que ▁le ▁sort ▁de ▁Harry ▁soit ▁pré visible , ▁en ▁raison ▁de ▁sa ▁sortie ▁quasi ▁simultan ée ▁avec ▁les ▁deux ▁tome s ▁suivant s ▁; ▁ce ▁livre ▁est ▁un ▁rég al ▁que ▁se ▁soit ▁pour ▁les ▁petits ▁ou ▁les ▁grands ▁!!! ▁Johann e ▁Kat hle en ▁Row ling ▁a ▁su ▁ali er ▁plusieurs ▁histoire s ▁qui ▁para issent ▁totalement ▁différentes ▁mais ▁avec ▁une ▁fin ▁qui ▁leur ▁est ▁commun ne ▁et ▁quasi ▁intro uv able ▁jusqu ' à ▁la ▁fin ▁! ▁Elle ▁à ▁également ▁su ▁comment ▁faire ▁pour ▁att ir er ▁une ▁génération ▁de ▁non ▁lecteur s ▁avec ▁ses ▁livres ▁" é norm es " ▁et ▁sans ▁images ▁! ▁Entre ▁nous ▁il ▁ya ▁de ▁quoi ▁se ▁demander ▁qui ▁le ▁magic ien ▁Harry ▁ou ▁elle ▁? ▁Bravo ▁!!!
▁J ' ai ▁lu ▁ce ▁livre ▁car ▁dans ▁ma ▁ville , ▁tout ▁le ▁monde ▁s ' en ▁sert ▁et ▁le ▁commande . ▁C
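
In the output above, the `▁` character marks a piece that was preceded by whitespace in the original text, while pieces without it continue the previous word. A minimal sketch of how a tokenized line maps back to plain text, assuming default SentencePiece whitespace handling (the helper name `detokenize` is hypothetical and does not use the sentencepiece library):

```python
def detokenize(spm_line):
    """Undo the space-joined piece format written to the *.spm.review files.

    The spaces between pieces are only separators, so they are dropped first;
    each '▁' marker is then turned back into a real space.
    """
    pieces = spm_line.split(" ")
    return "".join(pieces).replace("▁", " ").strip()


example = "▁Ce ▁livre ▁est ▁tout ▁simplement ▁mag ique ▁!"
print(detokenize(example))
```

This is a quick way to sanity-check a few lines of the `*.spm.review` files against the original reviews.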

## <b>Binarizing the finetuning dataset</b>

In this section, you have to binarize the CLS_Books dataset using the <b>fairseq/fairseq_cli/preprocess.py</b> script:

1- Binarize the tokenized reviews and put the output in <b>data/cls-books-bin/input0</b>. Note: our pretrained model's embedding matrix contains only the embeddings of the tokens listed in the dictionary <b>dict.txt</b>, so the reviews must be binarized with this dictionary.

2- Binarize the labels (train.label, valid.label and test.label files) and put the output in <b>data/cls-books-bin/label</b>.

Use `!python libs/fairseq/fairseq_cli/preprocess.py --help` to get details about the arguments and visit the Fairseq GitHub repository for further help.
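
To illustrate why the dictionary matters, the sketch below checks which pieces of a tokenized line are absent from a dictionary. It assumes the usual Fairseq `dict.txt` layout of one `<token> <count>` pair per line; the dictionary entries, counts and helper names are made up for illustration. Pieces missing from the dictionary would typically be mapped to the unknown symbol during binarization.

```python
def load_fairseq_dict(lines):
    """Parse dict.txt-style lines ('<token> <count>') into a set of tokens."""
    vocab = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last space so the token itself is kept intact
        token, _count = line.rsplit(" ", 1)
        vocab.add(token)
    return vocab


def out_of_vocab(tokenized_line, vocab):
    """Return the pieces of a space-separated tokenized line missing from vocab."""
    return [piece for piece in tokenized_line.split() if piece not in vocab]


# Made-up dictionary entries and counts, for illustration only
toy_dict_lines = ["▁Ce 120\n", "▁livre 95\n", "▁est 300\n"]
vocab = load_fairseq_dict(toy_dict_lines)
missing = out_of_vocab("▁Ce ▁livre ▁est ▁magique", vocab)
print(missing)
```

With the real files, the lines would come from `models/RoBERTa_small_fr/dict.txt` and the `*.spm.review` files produced in the previous section.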

In [6]:
!(python libs/fairseq/fairseq_cli/preprocess.py \
    --only-source \
    --trainpref data/cls.books/train.spm.review \
    --validpref data/cls.books/valid.spm.review \
    --testpref data/cls.books/test.spm.review \
    --srcdict models/RoBERTa_small_fr/dict.txt \
    --destdir data/cls.books/input0 \
    --workers 8)#fill me - binarize the tokenized reviews

!(python libs/fairseq/fairseq_cli/preprocess.py \
    --only-source \
    --trainpref data/cls.books/train.label \
    --validpref data/cls.books/valid.label \
    --testpref data/cls.books/test.label \
    --destdir data/cls.books/label \
    --workers 8)#fill me - binarize the labels

2022-11-22 01:57:29 | INFO | fairseq_cli.preprocess | Namespace(no_progress_bar=False, log_interval=100, log_format=None, log_file=None, aim_repo=None, aim_run_hash=None, tensorboard_logdir=None, wandb_project=None, azureml_logging=False, seed=1, cpu=False, tpu=False, bf16=False, memory_efficient_bf16=False, fp16=False, memory_efficient_fp16=False, fp16_no_flatten_grads=False, fp16_init_scale=128, fp16_scale_window=None, fp16_scale_tolerance=0.0, on_cpu_convert_precision=False, min_loss_scale=0.0001, threshold_loss_scale=None, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, user_dir=None, empty_cache_freq=0, all_gather_list_size=16384, model_parallel_size=1, quantization_config_path=None, profile=False, reset_logging=False, suppress_crashes=False, use_plasma_view=False, plasma_path='/tmp/plasma', criterion='cross_entropy', tokenizer=None, bpe=None, optimizer=None, lr_scheduler='fixed', scoring='bleu', task='translation', source_lang=None, target_lang=None, tr

## <b>Finetuning $RoBERTa_{small}^{fr}$</b>

In this section you will use the <b>fairseq/fairseq_cli/train.py</b> Python script to finetune the pretrained model on the CLS_Books dataset (binarized data) for three different seeds: 0, 1 and 2. 

Make sure to use the following hyperparameters: $\textit{batch size}: 8, \textit{max number of epochs}: 5, \textit{optimizer}: Adam, \textit{max learning rate}: 1e-05,  \textit{warmup ratio}: 0.06, \textit{learning rate scheduler}: linear$
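
The warmup ratio has to be turned into a number of updates before it can be passed to a trainer that counts warmup in steps. The sketch below works through the arithmetic for the training-set size of 1800 used in this lab's configuration and shows a plain linear warmup followed by a linear decay to zero. It is an illustration of the schedule's shape under those assumptions, not Fairseq's own implementation; the function name `lr_at` is hypothetical.

```python
MAX_EPOCH = 5
N_TRAIN = 1800        # number of training reviews used in this lab's configuration
BATCH_SIZE = 8
PEAK_LR = 1e-5
WARMUP_RATIO = 0.06

# Total number of optimizer updates and the share of them spent warming up
total_updates = int(MAX_EPOCH * N_TRAIN / BATCH_SIZE)
warmup_updates = int(WARMUP_RATIO * total_updates)


def lr_at(step):
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    if step < warmup_updates:
        return PEAK_LR * step / warmup_updates
    remaining = (total_updates - step) / (total_updates - warmup_updates)
    return PEAK_LR * max(0.0, remaining)


print(total_updates, warmup_updates)
print(lr_at(0), lr_at(warmup_updates), lr_at(total_updates))
```

The learning rate therefore peaks at the end of the warmup phase and reaches zero at the last update.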

In [7]:
DATA_SET = "books"
TASK = "sentence_prediction"  # fill me, sentence prediction task on fairseq
MODEL = "RoBERTa_small_fr"
DATA_PATH = "data/cls.books"  # fill me
MODEL_PATH = "models/RoBERTa_small_fr/model.pt"  # fill me
MAX_EPOCH = 5  # fill me
MAX_SENTENCES = 8  # fill me, batch size
MAX_UPDATE = int(
    MAX_EPOCH * 1800 / MAX_SENTENCES
)  # fill me, n_epochs * n_train_examples / total batch size
LR = 1e-5  # fill me
VALID_SUBSET = "valid,test"  # for simplicity we validate on both the valid and test sets, then pick the test value corresponding to the best validation score.
METRIC = "accuracy"  # fill me, use the accuracy metric
NUM_CLASSES = 2  # fill me, number of classes
SEEDS = 3
CUDA_VISIBLE_DEVICES = 0
WARMUP = int(0.06 * MAX_UPDATE)  # fill me, warmup ratio=6% of the whole training (--warmup-updates expects a number of updates)

In [8]:
for SEED in range(SEEDS):
  TENSORBOARD_LOGS= 'tensorboard_logs/'+TASK+'/'+DATA_SET+'/'+MODEL+'_ms'+str(MAX_SENTENCES)+'_mu'+str(MAX_UPDATE)+'_lr'+str(LR)+'_me'+str(MAX_EPOCH)+'/'+str(SEED)
  SAVE_DIR= 'checkpoints/'+TASK+'/'+DATA_SET+'/'+MODEL+'_ms'+str(MAX_SENTENCES)+'_mu'+str(MAX_UPDATE)+'_lr'+str(LR)+'_me'+str(MAX_EPOCH)+'/'+str(SEED)
  !(python libs/fairseq/fairseq_cli/train.py $DATA_PATH \
                --restore-file $MODEL_PATH \
                --batch-size $MAX_SENTENCES \
                --task $TASK \
                --update-freq 1 \
                --seed $SEED \
                --reset-optimizer --reset-dataloader --reset-meters \
                --init-token 0 \
                --separator-token 2 \
                --arch roberta_small \
                --criterion sentence_prediction \
                --num-classes $NUM_CLASSES \
                --weight-decay 0.01 \
                --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-08 \
                --maximize-best-checkpoint-metric \
                --best-checkpoint-metric $METRIC \
                --save-dir $SAVE_DIR \
                --lr-scheduler polynomial_decay \
                --lr $LR \
                --max-update $MAX_UPDATE \
                --total-num-update $MAX_UPDATE \
                --no-epoch-checkpoints \
                --no-last-checkpoints \
                --tensorboard-logdir $TENSORBOARD_LOGS \
                --log-interval 5 \
                --warmup-updates $WARMUP \
                --max-epoch $MAX_EPOCH \
                --keep-best-checkpoints 1 \
                --max-positions 256 \
                --valid-subset $VALID_SUBSET \
                --shorten-method 'truncate' \
                --no-save \
                --distributed-world-size 1)


2022-11-22 01:57:34 | INFO | fairseq_cli.train | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 5, 'log_format': None, 'log_file': None, 'aim_repo': None, 'aim_run_hash': None, 'tensorboard_logdir': 'tensorboard_logs/sentence_prediction/books/RoBERTa_small_fr_ms8_mu1125_lr1e-05_me5/0', 'wandb_project': None, 'azureml_logging': False, 'seed': 0, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'us

## <b>Random $RoBERTa_{small}^{fr}$ model training:</b>

In this section you have to train a randomly initialized checkpoint of the model $RoBERTa_{small}^{fr}$ using the same settings as before (<b>Hint:</b> a nonexistent model path will not raise an error) 

In [9]:
DATA_SET = "books"
TASK = "sentence_prediction"  # fill me, sentence prediction task on fairseq
MODEL = "RoBERTa_small_fr_random"
DATA_PATH = "data/cls.books"  # fill me
MODEL_PATH = "xxx"  # fill me
MAX_EPOCH = 5  # fill me
MAX_SENTENCES = 8  # fill me, batch size
MAX_UPDATE = int(
    MAX_EPOCH * 1800 / MAX_SENTENCES
)  # fill me, n_epochs * n_train_examples / total batch size
LR = 1e-5  # fill me
VALID_SUBSET = "valid,test"  # for simplicity we validate on both the valid and test sets, then pick the test value corresponding to the best validation score.
METRIC = "accuracy"  # fill me, use the accuracy metric
NUM_CLASSES = 2  # fill me, number of classes
SEEDS = 3
CUDA_VISIBLE_DEVICES = 0
WARMUP = int(0.06 * MAX_UPDATE)  # fill me, warmup ratio=6% of the whole training (--warmup-updates expects a number of updates)

In [10]:
for SEED in range(SEEDS):
  TENSORBOARD_LOGS= 'tensorboard_logs/'+TASK+'/'+DATA_SET+'/'+MODEL+'_ms'+str(MAX_SENTENCES)+'_mu'+str(MAX_UPDATE)+'_lr'+str(LR)+'_me'+str(MAX_EPOCH)+'/'+str(SEED)
  SAVE_DIR= 'checkpoints/'+TASK+'/'+DATA_SET+'/'+MODEL+'_ms'+str(MAX_SENTENCES)+'_mu'+str(MAX_UPDATE)+'_lr'+str(LR)+'_me'+str(MAX_EPOCH)+'/'+str(SEED)
  !(python libs/fairseq/fairseq_cli/train.py $DATA_PATH \
                --restore-file $MODEL_PATH \
                --batch-size $MAX_SENTENCES \
                --task $TASK \
                --update-freq 1 \
                --seed $SEED \
                --reset-optimizer --reset-dataloader --reset-meters \
                --init-token 0 \
                --separator-token 2 \
                --arch roberta_small \
                --criterion sentence_prediction \
                --num-classes $NUM_CLASSES \
                --weight-decay 0.01 \
                --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-08 \
                --maximize-best-checkpoint-metric \
                --best-checkpoint-metric $METRIC \
                --save-dir $SAVE_DIR \
                --lr-scheduler polynomial_decay \
                --lr $LR \
                --max-update $MAX_UPDATE \
                --total-num-update $MAX_UPDATE \
                --no-epoch-checkpoints \
                --no-last-checkpoints \
                --tensorboard-logdir $TENSORBOARD_LOGS \
                --log-interval 5 \
                --warmup-updates $WARMUP \
                --max-epoch $MAX_EPOCH \
                --keep-best-checkpoints 1 \
                --max-positions 256 \
                --valid-subset $VALID_SUBSET \
                --shorten-method 'truncate' \
                --no-save \
                --distributed-world-size 1)

2022-11-22 01:59:55 | INFO | fairseq_cli.train | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 5, 'log_format': None, 'log_file': None, 'aim_repo': None, 'aim_run_hash': None, 'tensorboard_logdir': 'tensorboard_logs/sentence_prediction/books/RoBERTa_small_fr_random_ms8_mu1125_lr1e-05_me5/0', 'wandb_project': None, 'azureml_logging': False, 'seed': 0, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': Fal

## <b>Tensorboard Visualisation</b>

In this section we will use tensorboard to visualize the training, validation and test accuracies. <b>Include and analyse in your report a screenshot of the test accuracy of the six models</b>.

In [None]:
%tensorboard --logdir tensorboard_logs

# <b>Part 2: HuggingFace's Transformers</b>

In this part of the lab, we will finetune a HuggingFace checkpoint of our $RoBERTa_{small}^{fr}$ on the CLS_Books dataset. As in the first part, we will start by downloading the HuggingFace checkpoint and <b>preparing a JSON version of the CLS_Books dataset</b> (which is suitable for finetuning HuggingFace checkpoints). Again, if you are using your personal computer, do not run the following cell; instead, use its content to download the files on your computer, since, depending on your operating system, running this cell may produce errors. 

In [None]:
%cd models
!wget -c "https://onedrive.live.com/download?cid=AE69638675180117&resid=AE69638675180117%21267607&authkey=APJub1wVzVLAoR8" -O "model_huggingface.zip"
!unzip model_huggingface.zip
!rm model_huggingface.zip
!rm -rf __MACOSX/

%cd ../data
!mkdir cls.books-json

%cd ..

## <b>Converting the CLS_Books dataset to json line files</b>

Unlike with Fairseq, you do not need to perform tokenization and binarization yourself when using the Hugging Face Transformers library. However, in order to use the script implemented in the Transformers library, you need to convert your data to JSON Lines files (one for each split: train, valid and test).

For instance, each line of your file will contain exactly one sample, consisting of the review (accessed by the key <i>sentence1</i>) and its label (accessed by the key <i>label</i>). Below you can find an example from the <i>valid.json</i> file.

Note that these instructions are not valid for all kinds of tasks. For other types of tasks (supported in Hugging Face) you have to refer to their GitHub repository for more details.<br>

---------------------------------------------------------------------
<i>
{"sentence1":"Seul ouvrage fran\u00e7ais sur le th\u00e8me Produits Structur\u00e9s \/ fonds \u00e0 formule, il permet de fa\u00e7on p\u00e9dagogique d'appr\u00e9hender parfaitement les m\u00e9canismes financiers utilis\u00e9s. Une r\u00e9f\u00e9rence pour ceux qui veulent comprendre les technicit\u00e9s de base et les raisons de l'engouement des investisseurs sur ces actifs \u00e0 hauteur de plusieurs milliards d'euros.","label":"1"}<br>
{"sentence1":"Livre tr\u00e8s int\u00e9ressant !  mais si comme moi vous cherchez des \"infos\" sur les techniques de sorties et autres \"modes d'emploi\", afin de vivre par vous m\u00eame ce genre d'exp\u00e9rience, c'est pas le bon livre.  \u00e7a ne lui enl\u00e8ve d'ailleurd rien \u00e0 son int\u00earet.","label":"0"}
</i>

---------------------------------------------------------------------
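
A file in this format can be checked by parsing it line by line. The sketch below is a minimal example, assuming the two keys described above; it reads from an in-memory file containing a shortened version of the second sample, and the helper name `read_json_lines` is hypothetical.

```python
import io
import json


def read_json_lines(handle):
    """Parse a JSON Lines stream and check that every sample has the expected keys."""
    samples = []
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        sample = json.loads(line)
        missing = {"sentence1", "label"} - sample.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
        samples.append(sample)
    return samples


# Shortened version of a sample line, kept in its escaped form
toy_file = io.StringIO(
    r'{"sentence1":"Livre tr\u00e8s int\u00e9ressant !","label":"0"}' "\n"
)
samples = read_json_lines(toy_file)
print(samples[0]["sentence1"], samples[0]["label"])
```

The `\u00e8`-style escapes are ordinary JSON string escapes, so they are decoded back to accented characters when the line is parsed.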



In [13]:
import json

SPLITS = ["train", "test", "valid"]

for split in SPLITS:
    with open("data/cls.books/" + split + ".review", "r") as f:
        reviews = f.readlines()
    with open("data/cls.books/" + split + ".label", "r") as f:
        labels = f.readlines()
    with open("data/cls.books-json/" + split + ".json", "w") as f:
        # fill the gap here to create train.json, valid.json and test.json
        # Write one JSON object per line; strip only the trailing newline of each field
        for review, label in zip(reviews, labels):
            dico = {"sentence1": review.rstrip("\n"), "label": label.rstrip("\n")}
            json.dump(dico, f)
            f.write("\n")

## <b>Finetuning $RoBERTa_{small}^{fr}$ using the Transformers Library</b>

In order to finetune the model using HuggingFace, you need to use the <b>run_glue.py</b> Python script located in the transformers library. For more details, refer to <a href="https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification" target="_blank">the Huggingface/transformers repository on GitHub</a>. Make sure to use the same hyperparameters as in the first part of this lab.

In [14]:
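TASK = "sentence_prediction"  # only used to name SAVE_DIR; repeated here so this cell does not depend on Part 1
DATA_SET = "books"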
TRAIN_FILE = "data/cls.books-json/train.json"
VALIDATION_FILE = "data/cls.books-json/valid.json"
TEST_FILE = "data/cls.books-json/test.json"
MODEL = "RoBERTa_small_fr_huggingface"
MODEL_PATH = "models/RoBERTa_small_fr_HuggingFace"  # checkpoint directory (contains config.json)
MAX_SENTENCES = 8  # fill me, batch size.
LR = 1e-5  # fill me, learning rate
MAX_EPOCH = 5  # fill me
NUM_CLASSES = 2  # fill me
SEEDS = 3
CUDA_VISIBLE_DEVICES = 0

In [15]:
for SEED in range(SEEDS):
  SAVE_DIR= 'checkpoints/'+TASK+'/'+DATA_SET+'/'+MODEL+'_ms'+str(MAX_SENTENCES)+'_lr'+str(LR)+'_me'+str(MAX_EPOCH)+'/'+str(SEED)
  !(python libs/transformers/examples/pytorch/text-classification/run_glue.py \
    --model_name_or_path $MODEL_PATH \
    --train_file $TRAIN_FILE \
    --validation_file $VALIDATION_FILE \
    --test_file $TEST_FILE \
    --output_dir $SAVE_DIR \
    --do_train \
    --do_eval \
    --do_predict \
    --evaluation_strategy epoch \
    --per_device_train_batch_size $MAX_SENTENCES \
    --per_device_eval_batch_size $MAX_SENTENCES \
    --learning_rate $LR \
    --lr_scheduler_type polynomial \
    --weight_decay 0.01 \
    --optim adamw_hf \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-08 \
    --num_train_epochs $MAX_EPOCH \
    --warmup_ratio 0.06 \
    --seed $SEED \
    --save_strategy no \
  )#fill me 

11/22/2022 02:02:25 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.98,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=F

In [None]:
%tensorboard --logdir checkpoints