**Copyright 2021 Antoine SIMOULIN.**

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

# Evaluate GPT-fr 🇫🇷 on FLUE

<img src="https://raw.githubusercontent.com/AntoineSimoulin/gpt-fr/main/imgs/logo.png" alt="GPT-fr logo" width="200">

**GPT-fr** is a GPT model for French developed by [Quantmetry](https://www.quantmetry.com/) and the [Laboratoire de Linguistique Formelle (LLF)](http://www.llf.cnrs.fr/en).

In this notebook, we provide a minimal script to evaluate the model on the FLUE benchmark ([Le et al., 2020a](#le-2020-en), [2020b](#le-2020-fr)). FLUE is a benchmark designed to compare and evaluate NLP models for French.

If you're opening this notebook on Colab, you will probably need to install 🤗 Transformers, 🤗 Tokenizers and 🤗 Datasets. We also provide scripts to download the data and fine-tune the model. They are based on the ones provided with the [FLUE benchmark](https://github.com/getalp/Flaubert).

In [None]:
%%capture
!pip install git+https://github.com/huggingface/transformers.git
!pip install tokenizers
!pip install datasets
# Fetch the raw files: github.com/.../tree/... URLs return HTML pages, not the scripts themselves
!test -f download_flue_data.sh || wget -q https://raw.githubusercontent.com/AntoineSimoulin/gpt-fr/main/scripts/download_flue_data.sh
!test -f run_flue.py || wget -q https://raw.githubusercontent.com/AntoineSimoulin/gpt-fr/main/scripts/run_flue.py
!test -f spinner.sh || wget -q https://raw.githubusercontent.com/AntoineSimoulin/gpt-fr/main/scripts/spinner.sh
!chmod +x ./download_flue_data.sh
!chmod +x ./spinner.sh

## Requirements

In [None]:
import torch
import transformers
from transformers import GPT2Tokenizer, GPT2LMHeadModel

In [None]:
# Check that a GPU is available and print the library versions
print('Pytorch version ...............{}'.format(torch.__version__))
print('Transformers version ..........{}'.format(transformers.__version__))
print('GPU available .................{}'.format('\u2705' if torch.cuda.is_available() else '\u274c'))
print('Available devices .............{}'.format(torch.cuda.device_count()))
# torch.cuda.current_device() raises an error when no CUDA device is present
if torch.cuda.is_available():
    print('Current cuda device: ..........{}'.format(torch.cuda.current_device()))

Pytorch version ...............1.9.0+cu102
Transformers version ..........4.11.0.dev0
GPU available .................✅
Available devices .............1
Current cuda device: ..........0


## Download and prepare data

FLUE includes 6 tasks with varying levels of difficulty, degrees of formality, and amounts of training samples:
* The Cross-Lingual Sentiment (**CLS**) task is sentiment classification on Amazon reviews. Each subtask (books, DVD, music) is a binary classification task (positive/negative).
* The Cross-lingual Adversarial Dataset for Paraphrase Identification (**PAWSX**) is a paraphrase identification task on sentence pairs. The goal is to predict whether the two sentences of each pair are semantically equivalent.
* The Cross-lingual NLI (**XNLI**) is a natural language inference task: given a premise (p) and a hypothesis (h), the goal is to determine whether p entails h, contradicts h, or neither (neutral).
* The **Parsing and Part-of-Speech Tagging** tasks aim to infer constituency and dependency syntactic trees as well as part-of-speech tags.
* The Word Sense Disambiguation (**WSD**) task is a classification task that aims to predict the sense of words in a given context according to a specific sense inventory.

In [None]:
TASK = 'CLS-Books' #@param ["CLS-Books", "CLS-DVD", "CLS-Music", "PAWSX", "XNLI", "Parsing-Dep", "Parsing-Const", "WSD-Verb", "WSD-Nouns"]
TASK_NAME = TASK.lower().split('-')[0]
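`TASK_NAME` is obtained by lower-casing `TASK` and keeping only the part before the first hyphen, so the three CLS subtasks share the task name `cls` while the subtask itself is selected through the data paths, which use the full `TASK`. The snippet below applies the same rule to every dropdown option as a quick sanity check; the helper `task_name` is only an illustrative wrapper around the notebook's one-liner.

```python
# Options offered by the TASK dropdown in the cell above.
TASKS = ["CLS-Books", "CLS-DVD", "CLS-Music", "PAWSX", "XNLI",
         "Parsing-Dep", "Parsing-Const", "WSD-Verb", "WSD-Nouns"]


def task_name(task: str) -> str:
    """Same rule as the notebook: lower-case, then keep the text before the first '-'."""
    return task.lower().split('-')[0]


for task in TASKS:
    print(f"{task:<14} -> {task_name(task)}")
```

Note that `Parsing-Dep`/`Parsing-Const` and `WSD-Verb`/`WSD-Nouns` also collapse to a single task name each.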

In [None]:
# We only download the data for the selected task. To download all FLUE data,
# remove the `-t $TASK` flag. `TASK` can be one of "CLS-Books", "CLS-DVD", "CLS-Music",
# "PAWSX", "XNLI", "Parsing-Dep", "Parsing-Const", "WSD-Verb", "WSD-Nouns".
# The parsing data are distributed under licenses that require creating an account,
# so they must be downloaded manually.
# Please refer to https://dokufarm.phil.hhu.de/spmrl2014/ for instructions.

!test -d ./flue_data || mkdir ./flue_data
!./download_flue_data.sh -d ./flue_data -t $TASK

[34m[1m⣽[m Downloading CLS
[34m[1m⣻[m Preprocessing CLS


## Evaluate on FLUE

In [None]:
MODEL = 'asi/gpt-fr-cased-small' #@param {type:"string"}
MAX_SEQ_LENGTH = 256 #@param {type:"integer"}
#@markdown Batch sizes and learning rates can list several values separated by "/"; together they define the grid for the cross-validation hyperparameter search.
BATCH_SIZES = "8" #@param {type:"string"}
LEARNING_RATES = "5e-5/3e-5/2e-5/5e-6/1e-6" #@param {type:"string"}
NUM_TRAIN_EPOCHS = 4 #@param {type:"integer"}
#@markdown For the CLS task, the training set is small: the standard deviation across runs may be high, and the choice of random seed may impact results.
N_SEEDS = 5 #@param {type:"integer"}
#@markdown If a batch does not fit into device memory, reduce the batch size and increase `GRAD_ACCUMULATION_STEPS`. The effective batch size equals `GRAD_ACCUMULATION_STEPS * BATCH_SIZE`.
GRAD_ACCUMULATION_STEPS = 1 #@param {type:"integer"}
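The form accepts several "/"-separated values for the batch size and the learning rate, and repeats training over `N_SEEDS` seeds. The sketch below shows one plausible way such a grid expands into individual runs, and how the effective batch size follows from gradient accumulation. It assumes every (batch size, learning rate) pair is trained once per seed; the helpers `parse_grid` and `expand_runs` are hypothetical, and `run_flue.py` may organise its search differently.

```python
from itertools import product


def parse_grid(values, cast=float):
    """Split a "/"-separated form value such as "5e-5/3e-5" into a list of numbers."""
    return [cast(v) for v in str(values).split("/") if v.strip()]


def expand_runs(batch_sizes, learning_rates, n_seeds, grad_accumulation_steps=1):
    """Hypothetical expansion: one run per (batch size, learning rate, seed) triple."""
    runs = []
    for bs, lr, seed in product(parse_grid(batch_sizes, int),
                                parse_grid(learning_rates, float),
                                range(n_seeds)):
        runs.append({
            "batch_size": bs,
            "learning_rate": lr,
            "seed": seed,
            # Gradients are accumulated over several mini-batches before each
            # optimizer step, so one update effectively sees bs * steps examples.
            "effective_batch_size": bs * grad_accumulation_steps,
        })
    return runs


runs = expand_runs("8", "5e-5/3e-5/2e-5/5e-6/1e-6", n_seeds=5, grad_accumulation_steps=1)
print(len(runs))  # 1 batch size x 5 learning rates x 5 seeds = 25 runs
print(runs[0])
```

With the default form values this amounts to 25 fine-tuning runs, which is worth keeping in mind when estimating the total training time on a single Colab GPU.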

In [None]:
!python run_flue.py \
    --train_file /content/flue_data/$TASK/train.tsv \
    --validation_file /content/flue_data/$TASK/dev.tsv \
    --predict_file /content/flue_data/$TASK/test.tsv \
    --model_name_or_path $MODEL \
    --tokenizer_name $MODEL \
    --output_dir /content/flue_data/$TASK \
    --max_seq_length $MAX_SEQ_LENGTH \
    --do_train \
    --do_eval \
    --do_predict \
    --task_name $TASK_NAME \
    --learning_rates $LEARNING_RATES \
    --batch_sizes $BATCH_SIZES \
    --gradient_accumulation_steps $GRAD_ACCUMULATION_STEPS \
    --num_train_epochs $NUM_TRAIN_EPOCHS \
    --n_seeds $N_SEEDS

09/06/2021 16:13:20 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
greater_is_better=None,
group_by_length=False,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=/content/flue_data/CLS-Books/runs/Sep06_16-13-20_9b1263df06e8,
logging_first_step=False,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
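Because the CLS training sets are small, each configuration is repeated over several seeds. A minimal sketch of how per-seed dev scores could be summarised (mean and sample standard deviation) and used to pick a configuration is shown below. The `dev_scores` values are placeholders, not results from this notebook, and selecting by highest mean dev score is an assumption rather than a description of what `run_flue.py` does.

```python
from statistics import mean, stdev


def summarize(scores_by_config):
    """Return {config: (mean, sample std)} over the per-seed scores of each config."""
    return {cfg: (mean(s), stdev(s) if len(s) > 1 else 0.0)
            for cfg, s in scores_by_config.items()}


def best_config(scores_by_config):
    """Pick the configuration with the highest mean dev score across seeds."""
    summary = summarize(scores_by_config)
    return max(summary, key=lambda cfg: summary[cfg][0])


# Placeholder dev accuracies keyed by (batch_size, learning_rate); NOT real results.
dev_scores = {
    (8, 5e-5): [0.50, 0.70, 0.60],
    (8, 5e-6): [0.64, 0.66, 0.65],
}
for cfg, (m, s) in summarize(dev_scores).items():
    print(f"bs={cfg[0]} lr={cfg[1]:g}: {m:.3f} +/- {s:.3f}")
print("best:", best_config(dev_scores))
```

Reporting the standard deviation next to the mean makes it visible when a single lucky seed, rather than the hyperparameters, drives a score.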

## References

><div id="le-2020-en">Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab:
<a href="https://www.aclweb.org/anthology/2020.lrec-1.302/">FlauBERT: Unsupervised Language Model Pre-training for French</a>. LREC 2020: 2479-2490</div>

><div id="le-2020-fr">Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab:
<a href="https://www.aclweb.org/anthology/2020.jeptalnrecital-taln.26/">FlauBERT : des modèles de langue contextualisés pré-entraînés pour le français</a> (FlauBERT : Unsupervised Language Model Pre-training for French). JEP-TALN-RECITAL (2) 2020: 268-278</div>

