## Main ideas

* Create zero-shot baseline
* Train XLM-R on one language and zero-shot to another
* Create learning curves to see how much labelled data we need to beat the baseline
* Add youtube videos from the course
* Add id2label and label2id, e.g. "terrible", "poor", "ok", "good", "great"
* Add try it out
* Add learning objectives

🤗

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# For HF machines
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"

## Fine-tuning your first Transformer!

In this notebook we'll take a look at fine-tuning a multilingual Transformer model called XLM-RoBERTa for text classification. At the end of this notebook you should:

* Be able to load and process a dataset from the Hugging Face Hub
* Know how to create a baseline with the zero-shot classification pipeline
* Know how to fine-tune and evaluate pretrained model on your data
* Know how to push a model to the Hugging Face Hub

## Setup

If you're running this notebook on Google Colab or locally, you'll need a few dependencies installed. You can install them with `pip` as follows:

In [None]:
#! pip install datasets transformers sentencepiece

To be able to share your model with the community and generate results like the one shown in the picture below via the inference API, there are a few more steps to follow.

First you have to store your authentication token from the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!) then execute the following cell and input your username and password:

In [3]:
from huggingface_hub import notebook_login

notebook_login()

Then you need to install Git-LFS. Uncomment and execute the following cell:

In [None]:
# !apt install git-lfs

## The dataset

In this tutorial we'll use the [Multilingual Amazon Reviews Corpus](https://huggingface.co/datasets/amazon_reviews_multi) (or MARC for short). This is a large-scale collection of Amazon product reviews in several languages: English, Japanese, German, French, Spanish, and Chinese. 

We can download the dataset from the Hugging Face Hub with the 🤗 Datasets library, but first let's take a look at the available subsets (also called configs):

In [3]:
from datasets import get_dataset_config_names

dataset_name = "amazon_reviews_multi"
langs = get_dataset_config_names(dataset_name)
langs

['all_languages', 'de', 'en', 'es', 'fr', 'ja', 'zh']

Okay, we can see the language codes associated with each language, as well as an `all_languages` subset which presumably concatenates all the languages together. Let's begin by downloading the German subset with the `load_dataset()` function from 🤗 Datasets:

In [7]:
from datasets import load_dataset

marc_de = load_dataset(path=dataset_name, name="de")
marc_de

Reusing dataset amazon_reviews_multi (/data/.cache/hf/datasets/amazon_reviews_multi/de/1.0.0/724e94f4b0c6c405ce7e476a6c5ef4f87db30799ad49f765094cf9770e0f7609)


  0%|          | 0/3 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 200000
    })
    validation: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 5000
    })
    test: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 5000
    })
})

One cool feature of 🤗 Datasets is that `load_dataset()` will cache the files at `~/.cache/huggingface/dataset/`, so you won't need to re-download the dataset the next time your run the notebook. We can see that `german_dataset` is a `DatasetDict` object which is similar to a Python dictionary, with each key corresponding to a different split. 

We can access one ot these splits just like an ordinary dictionary:

In [8]:
train_ds = marc_de["train"]
train_ds

Dataset({
    features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
    num_rows: 200000
})

This returns a `Dataset` object which behaves like a Python container, so we can query its length:

In [9]:
len(train_ds)

200000

or access a single example by its index:

In [10]:
train_ds[0]

{'stars': 1,
 'product_id': 'product_de_0865382',
 'product_category': 'sports',
 'review_title': 'Leider nach 1 Jahr kaputt',
 'review_id': 'de_0203609',
 'review_body': 'Armband ist leider nach 1 Jahr kaputt gegangen',
 'reviewer_id': 'reviewer_de_0267719',
 'language': 'de'}

This certainly looks like an Amazon product review (in this case the `review_body` text translates to "Bracelet is unfortunately broken after 1 year") and we can see the number of stars associated with the review, as well as some metadata like the language and product category. We can also see that a single row is represented as a dictionary, where the keys are the same as the column names:

In [11]:
train_ds.column_names

['review_id',
 'product_id',
 'reviewer_id',
 'stars',
 'review_body',
 'review_title',
 'language',
 'product_category']

We can also access several rows with a slice:

In [15]:
german_dataset["train"][:3]

{'review_id': ['de_0203609', 'de_0559494', 'de_0238777'],
 'product_id': ['product_de_0865382',
  'product_de_0678997',
  'product_de_0372235'],
 'reviewer_id': ['reviewer_de_0267719',
  'reviewer_de_0783625',
  'reviewer_de_0911426'],
 'stars': [1, 1, 1],
 'review_body': ['Armband ist leider nach 1 Jahr kaputt gegangen',
  'In der Lieferung war nur Ein Akku!',
  'Ein Stern, weil gar keine geht nicht. Es handelt sich um gebraucht Waren, die Stein haben so ein Belag drauf, wo man sich dabei denken kann, dass jemand schon die benutzt und nicht Mal richtig gewaschen. Bei ein paar ist die Qualität Mangelhaft, siehe Bild. Ein habe ich ausprobiert, richtig gewaschen, dann verfärbt sich..... Wärme halt nicht lange. Deswegen wird es zurückgeschickt.'],
 'review_title': ['Leider nach 1 Jahr kaputt',
  'EINS statt ZWEI Akkus!!!',
  'Achtung Abzocke'],
 'language': ['de', 'de', 'de'],
 'product_category': ['sports', 'home_improvement', 'drugstore']}

and note that now we get a list of values for each column. This is because 🤗 Datasets is based on Apache Arrow, which defines a typed columnar format that is very memory efficient. We can see the types that are used to represent our dataset by accessing the `features` attribute:

In [16]:
train_ds.features

{'review_id': Value(dtype='string', id=None),
 'product_id': Value(dtype='string', id=None),
 'reviewer_id': Value(dtype='string', id=None),
 'stars': Value(dtype='int32', id=None),
 'review_body': Value(dtype='string', id=None),
 'review_title': Value(dtype='string', id=None),
 'language': Value(dtype='string', id=None),
 'product_category': Value(dtype='string', id=None)}

Now that we've had a quick look at the objects in 🤗 Datasets, let's explore the data in more detail by using our favourite tool - Pandas!

## From Datasets to DataFrames and back

🤗 Datasets is designed to be interoperable with libraries like Pandas, as well as NumPy, PyTorch, TensorFlow, and JAX. To enable the conversion between various third-party libraries, 🤗 Datasets provides a Dataset.set_format() function. This function only changes the output format of the dataset, so you can easily switch to another format without affecting the underlying data format which is Apache Arrow. The formatting is done in-place, so let’s convert our dataset to Pandas and look at a random sample:

In [17]:
from IPython.display import display, HTML

marc_de.set_format("pandas")
df = marc_de["train"][:]
# Create a random sample
sample = df.sample(n=5, random_state=42)
display(HTML(sample.to_html()))

Unnamed: 0,review_id,product_id,reviewer_id,stars,review_body,review_title,language,product_category
119737,de_0970901,product_de_0712478,reviewer_de_0308094,3,Ist ok ...blondierung quillt schnell auf,Ok,de,beauty
72272,de_0042217,product_de_0734686,reviewer_de_0904358,2,Kein typischer Geruch oder Geschmack von einem Ghee! Ich würde es nicht wieder kaufen oder weiter empfehlen. Konkurrenz Produkt fand ich besser.,Kein typischer Geruch oder Geschmack von einem Ghee !,de,grocery
158154,de_0278932,product_de_0388890,reviewer_de_0940030,4,Dieses Buch hat mir sehr geholfen mit dem ersten Schlupf und der weiteren Aufzucht. Kann ich nur weiter empfehlen.,Sehr hilfreich,de,book
65426,de_0737352,product_de_0560586,reviewer_de_0632435,2,"super Schale, wunderschön, gutes Produkt ABER Der Saugnapf geht von der Schale runter, da die Maße des Saugnapf Ringes nicht passen. Man muss aufpassen dass man den nicht dauernd neu aufsetzen muss.",der Saugnapf hält nicht,de,baby_product
30074,de_0455430,product_de_0375951,reviewer_de_0482228,1,"Artikel ist niemals angekommen, habe ihn aber bezahlt! Und dann steht noch dort ich hätte unterschrieben, als er angeblich angekommen sei! null Sterne! Unglaublich 😒",Artikel ist niemals angekommen!!,de,book


We can see that the column headers are the same as we saw in the Arrow format and from the reviews we can see that negative reviews are associated with a lower star rating. Since we're now dealing with a `pandas.DataFrame` we can easily query our dataset. For example, let's see what the distribution of reviews per product category looks like: 

In [19]:
df["product_category"].value_counts()

home                        26063
wireless                    19964
sports                      13748
home_improvement            12408
apparel                     10178
toy                          9781
pc                           8577
drugstore                    8075
lawn_and_garden              7426
beauty                       7162
electronics                  7114
other                        6460
furniture                    6334
kitchen                      5787
automotive                   5321
pet_products                 5028
book                         4927
office_product               4343
baby_product                 4070
shoes                        3568
luggage                      3256
digital_video_download       2970
personal_care_appliances     2836
grocery                      2737
digital_ebook_purchase       2720
jewelry                      2380
camera                       1906
watch                        1706
video_games                  1219
industrial_sup

Okay, the `home`, `wireless`, and `sports` categories seem to be the most popular. How about the distribution of star ratings?

In [21]:
df["stars"].value_counts()

1    40000
2    40000
3    40000
4    40000
5    40000
Name: stars, dtype: int64

In this case we can see that the dataset is balanced across each star rating, which will make it somewhat easier to evaluate our models on. Imbalanced datasets are much more common in the real-world and in these cases some additional tricks like up- or down-sampling are usually needed.

Now that we've got a rough idea about the kind of data we're dealing with, let's reset the output format from `pandas` back to `arrow`:

In [22]:
marc_de.reset_format()

## Filtering for a product category

Although we could go ahead and fine-tune a Transformer model on the whole set of 200,000 German reviews, this will take several hours on a single GPU. So instead, we'll focus on fine-tuning a model for a single product category! In 🤗 Datasets, we can filter data very quickly by using the `Dataset.filter()` method. This method expects a function that returns Boolean values, in our case `True` if the `product_category` matches the chosen category and `False` otherwise. Here's one way to implement this, and we'll pick the `sports` category as the domain to train on:

In [31]:
product_category = "sports"

def filter_for_product(example, product_category=product_category):
    return example["product_category"] == product_category

Now when we pass `filter_for_product()` to `Dataset.filter()` we get a filtered dataset:

In [35]:
product_dataset = marc_de.filter(filter_for_product)
product_dataset

Loading cached processed dataset at /data/.cache/hf/datasets/amazon_reviews_multi/de/1.0.0/724e94f4b0c6c405ce7e476a6c5ef4f87db30799ad49f765094cf9770e0f7609/cache-415aaa5af094e2c4.arrow
Loading cached processed dataset at /data/.cache/hf/datasets/amazon_reviews_multi/de/1.0.0/724e94f4b0c6c405ce7e476a6c5ef4f87db30799ad49f765094cf9770e0f7609/cache-60d769fb1e7b4f2f.arrow
Loading cached processed dataset at /data/.cache/hf/datasets/amazon_reviews_multi/de/1.0.0/724e94f4b0c6c405ce7e476a6c5ef4f87db30799ad49f765094cf9770e0f7609/cache-9618290228399c16.arrow


DatasetDict({
    train: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 13748
    })
    validation: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 339
    })
    test: Dataset({
        features: ['review_id', 'product_id', 'reviewer_id', 'stars', 'review_body', 'review_title', 'language', 'product_category'],
        num_rows: 329
    })
})

Yep, this looks good - we have 13,748 reviews in the train split which agrees the number we saw in the distribution of categories earlier. Let's do a quick sanity check by taking a look at a few samples. Here 🤗 Datasets provides `Dataset.shuffle()` and `Dataset.select()` functions that we can chain to get a random sample:

In [44]:
product_dataset["train"].shuffle(seed=42).select(range(3))[:]

Loading cached shuffled indices for dataset at /data/.cache/hf/datasets/amazon_reviews_multi/de/1.0.0/724e94f4b0c6c405ce7e476a6c5ef4f87db30799ad49f765094cf9770e0f7609/cache-34cfe66dac005389.arrow


{'review_id': ['de_0592068', 'de_0659764', 'de_0617399'],
 'product_id': ['product_de_0646545',
  'product_de_0628607',
  'product_de_0596523'],
 'reviewer_id': ['reviewer_de_0065351',
  'reviewer_de_0938057',
  'reviewer_de_0996678'],
 'stars': [5, 2, 3],
 'review_body': ['Dieses aufblasbare Sofa ist sehr einfach aufzubauen (einfach in Wind halten) und leicht wieder einzupacken. Es war gut verpackt (eine Tasche mit Tragegurt war dabei), hat Aufbewahrungsmöglichkeiten an der Rechten Seite und einen Hering zum Befestigen am oberen Rand. Es ist sehr bequem und die Preis/Leistung ist einfach super! Ich kann es wirklich nur empfehlen :)',
  'Leider nach ca. 1 Jahr ist die Schnalle abgerissen. Schade!!!',
  'An sich ist das X-Bike nicht schlecht bis auf die Verarbeitung vom Computer. Sehr zu bemängeln habe ich aber die Pedalen bzw. Die Kugellager darin, am zweiten Tag und nach circa 30 km haben sich die Kugellager aufgelöst. Pedale lassen dich nicht mehr drehen.'],
 'review_title': ['Sehr b

Okay, now that we have our corpus of sports reviews, let's do one last bit of data preparation: creating label mappings from star ratings to human readable strings.

## Re-mapping the labels

During training, 🤗 Transformers expects the labels to be ordered, starting from 0 to N. But we've seen that our star ratings range from 1-5, so let's fix that. While we're at it, we'll create a mapping between the label IDs and names, which will be handy later on when we want to run inference with our model. First we'll define the label mapping from ID to name:

In [47]:
labels = ["terrible", "poor", "ok", "good", "great"]
id2label = {idx:label for idx, label in enumerate(labels)}
id2label

{0: 'terrible', 1: 'poor', 2: 'ok', 3: 'good', 4: 'great'}

We can then apply this mapping to our whole dataset by using the `Dataset.map()` method. Similar to the `Dataset.filter()` method, this one expects a function which receives examples as input, but returns a Python dictionary as output. The keys of the dictionary correspond to the columns, while the values correspond to the column entries. The following function creates two new columns:

* A `labels` column which is the star rating shifted down by one
* A `label_name` column which provides a nice string for each rating

In [48]:
def map_labels(example):
    # Shift labels to start from 0
    label_id = example["stars"] - 1
    return {"labels": label_id, "label_name": id2label[label_id]}

To apply this mapping, we simply feed it to `Dataset.map` as follows:

In [51]:
product_dataset = product_dataset.map(map_labels)
# Peek at the first example
product_dataset["train"][0]

  0%|          | 0/13748 [00:00<?, ?ex/s]

  0%|          | 0/339 [00:00<?, ?ex/s]

  0%|          | 0/329 [00:00<?, ?ex/s]

{'stars': 1,
 'product_id': 'product_de_0865382',
 'product_category': 'sports',
 'review_title': 'Leider nach 1 Jahr kaputt',
 'label_name': 'terrible',
 'labels': 0,
 'review_id': 'de_0203609',
 'review_body': 'Armband ist leider nach 1 Jahr kaputt gegangen',
 'reviewer_id': 'reviewer_de_0267719',
 'language': 'de'}

Great, it works! We'll also need the reverse label mapping later, so let's define it here: 

In [52]:
label2id = {v:k for k,v in id2label.items()}

## Creating a strong baseline

In [18]:
from transformers import pipeline 

zeroshot_classifier = pipeline("zero-shot-classification", model="joeddav/xlm-roberta-large-xnli", device=0)

Some weights of the model checkpoint at joeddav/xlm-roberta-large-xnli were not used when initializing XLMRobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [21]:
zeroshot_classifier("Dieser Wifi-Router ist perfect!", candidate_labels=["terrible","poor","ok","good","great"])

{'sequence': 'Dieser Wifi-Router ist perfect!',
 'labels': ['great', 'good', 'ok', 'poor', 'terrible'],
 'scores': [0.4981793165206909,
  0.3877113461494446,
  0.1132708266377449,
  0.0005077120731584728,
  0.00033083048765547574]}

In [22]:
def compute_zeroshot_preds(examples):
    preds = zeroshot_classifier(examples["review_body"], candidate_labels=[0,1,2,3,4])
    return {"zeroshot_prediction": preds["labels"][0]}

In [23]:
wireless_test_dataset = wireless_dataset["test"].map(compute_zeroshot_preds)
wireless_test_dataset

  0%|          | 0/491 [00:00<?, ?ex/s]



Dataset({
    features: ['review_id', 'product_id', 'reviewer_id', 'labels', 'review_body', 'review_title', 'language', 'product_category', 'zeroshot_prediction'],
    num_rows: 491
})

In [24]:
wireless_test_dataset["zeroshot_prediction"][:10]

[0, 2, 1, 0, 0, 0, 1, 2, 1, 0]

In [25]:
wireless_test_dataset["labels"][0]

0

In [28]:
from sklearn.metrics import mean_absolute_error

mean_absolute_error(wireless_test_dataset["labels"], wireless_test_dataset["zeroshot_prediction"])

1.3217922606924644

## From text to tokens

In [30]:
from transformers import AutoTokenizer

model_checkpoint = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

In [31]:
def tokenize_reviews(examples):
    return tokenizer(examples["review_body"], truncation=True, max_length=512)

In [39]:
tokenized_dataset = wireless_dataset.map(tokenize_reviews, batched=True)
tokenized_dataset

  0%|          | 0/20 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?ba/s]

DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'input_ids', 'labels', 'language', 'product_category', 'product_id', 'review_body', 'review_id', 'review_title', 'reviewer_id'],
        num_rows: 19964
    })
    validation: Dataset({
        features: ['attention_mask', 'input_ids', 'labels', 'language', 'product_category', 'product_id', 'review_body', 'review_id', 'review_title', 'reviewer_id'],
        num_rows: 500
    })
    test: Dataset({
        features: ['attention_mask', 'input_ids', 'labels', 'language', 'product_category', 'product_id', 'review_body', 'review_id', 'review_title', 'reviewer_id'],
        num_rows: 491
    })
})

## Load model

In [33]:
from transformers import AutoModelForSequenceClassification

num_labels = 5
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

Some weights of the model checkpoint at xlm-roberta-base were not used when initializing XLMRobertaForSequenceClassification: ['lm_head.decoder.weight', 'lm_head.dense.weight', 'roberta.pooler.dense.bias', 'lm_head.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.dense

## Create metrics

In [44]:
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"MAE": mean_absolute_error(labels, predictions)}

## Create Trainer

In [35]:
%env TOKENIZERS_PARALLELISM=false

env: TOKENIZERS_PARALLELISM=false


In [41]:
from transformers import TrainingArguments

model_name = model_checkpoint.split("/")[-1]
batch_size = 16
num_train_epochs = 2
logging_steps = len(tokenized_dataset["train"]) // (batch_size * num_train_epochs)

args = TrainingArguments(
    f"{model_name}-finetuned-marc",
    evaluation_strategy = "epoch",
    save_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_train_epochs,
    weight_decay=0.01,
    logging_steps=logging_steps,
    push_to_hub=True,
)

Loading cached shuffled indices for dataset at /data/.cache/hf/datasets/amazon_reviews_multi/de/1.0.0/724e94f4b0c6c405ce7e476a6c5ef4f87db30799ad49f765094cf9770e0f7609/cache-d522bf90a337de81.arrow


In [42]:
from transformers import Trainer 

trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Cloning https://huggingface.co/lewtun/xlm-roberta-base-finetuned-marc-19964-samples into local empty directory.


In [45]:
trainer.evaluate()

The following columns in the evaluation set  don't have a corresponding argument in `XLMRobertaForSequenceClassification.forward` and have been ignored: review_body, review_id, reviewer_id, product_id, product_category, language, review_title.
***** Running Evaluation *****
  Num examples = 500
  Batch size = 16


{'eval_loss': 1.6095925569534302,
 'eval_MAE': 2.104,
 'eval_runtime': 1.6249,
 'eval_samples_per_second': 307.72,
 'eval_steps_per_second': 19.694}

In [46]:
trainer.train()

The following columns in the training set  don't have a corresponding argument in `XLMRobertaForSequenceClassification.forward` and have been ignored: review_body, review_id, reviewer_id, product_id, product_category, language, review_title.
***** Running training *****
  Num examples = 19964
  Num Epochs = 2
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 2496


Epoch,Training Loss,Validation Loss,Mae
1,1.017,0.948932,0.486
2,0.8903,0.94739,0.47


The following columns in the evaluation set  don't have a corresponding argument in `XLMRobertaForSequenceClassification.forward` and have been ignored: review_body, review_id, reviewer_id, product_id, product_category, language, review_title.
***** Running Evaluation *****
  Num examples = 500
  Batch size = 16
Saving model checkpoint to xlm-roberta-base-finetuned-marc-19964-samples/checkpoint-1248
Configuration saved in xlm-roberta-base-finetuned-marc-19964-samples/checkpoint-1248/config.json
Model weights saved in xlm-roberta-base-finetuned-marc-19964-samples/checkpoint-1248/pytorch_model.bin
tokenizer config file saved in xlm-roberta-base-finetuned-marc-19964-samples/checkpoint-1248/tokenizer_config.json
Special tokens file saved in xlm-roberta-base-finetuned-marc-19964-samples/checkpoint-1248/special_tokens_map.json
tokenizer config file saved in xlm-roberta-base-finetuned-marc-19964-samples/tokenizer_config.json
Special tokens file saved in xlm-roberta-base-finetuned-marc-19964-s

TrainOutput(global_step=2496, training_loss=1.0097694710279121, metrics={'train_runtime': 626.4529, 'train_samples_per_second': 63.737, 'train_steps_per_second': 3.984, 'total_flos': 3456130803242208.0, 'train_loss': 1.0097694710279121, 'epoch': 2.0})

In [49]:
trainer.push_to_hub(commit_message="Training complete", blocking=False)

Saving model checkpoint to xlm-roberta-base-finetuned-marc-19964-samples
Configuration saved in xlm-roberta-base-finetuned-marc-19964-samples/config.json
Model weights saved in xlm-roberta-base-finetuned-marc-19964-samples/pytorch_model.bin
tokenizer config file saved in xlm-roberta-base-finetuned-marc-19964-samples/tokenizer_config.json
Special tokens file saved in xlm-roberta-base-finetuned-marc-19964-samples/special_tokens_map.json


OSError: On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean


## Zero-shot cross-lingual evaluation

In [72]:
def evaluate_corpus(lang):
    dataset = load_dataset(dataset_name, lang, split="test")
    dataset = dataset.rename_column("stars", "labels")
    dataset = dataset.map(map_labels)
    tokenized_dataset = dataset.map(tokenize_reviews, batched=True)
    preds = trainer.evaluate(eval_dataset=tokenized_dataset)
    return {"MAE": preds["eval_MAE"]}

In [73]:
evaluate_corpus("en")

Reusing dataset amazon_reviews_multi (/data/.cache/hf/datasets/amazon_reviews_multi/en/1.0.0/724e94f4b0c6c405ce7e476a6c5ef4f87db30799ad49f765094cf9770e0f7609)


  0%|          | 0/5000 [00:00<?, ?ex/s]

  0%|          | 0/5 [00:00<?, ?ba/s]

The following columns in the evaluation set  don't have a corresponding argument in `XLMRobertaForSequenceClassification.forward` and have been ignored: reviewer_id, product_id, review_title, review_body, review_id, product_category, language.
***** Running Evaluation *****
  Num examples = 5000
  Batch size = 16


{'MAE': 0.874}

In [74]:
evaluate_corpus("en")

Reusing dataset amazon_reviews_multi (/data/.cache/hf/datasets/amazon_reviews_multi/fr/1.0.0/724e94f4b0c6c405ce7e476a6c5ef4f87db30799ad49f765094cf9770e0f7609)


  0%|          | 0/5000 [00:00<?, ?ex/s]

  0%|          | 0/5 [00:00<?, ?ba/s]

The following columns in the evaluation set  don't have a corresponding argument in `XLMRobertaForSequenceClassification.forward` and have been ignored: reviewer_id, product_id, review_title, review_body, review_id, product_category, language.
***** Running Evaluation *****
  Num examples = 5000
  Batch size = 16


{'MAE': 0.8742}

## Using your fine-tuned model

In [75]:
classifier = pipeline("text-classification", model=trainer.model, tokenizer=trainer.tokenizer, device=0)

In [76]:
classifier("I love this book!")

[{'label': 'LABEL_3', 'score': 0.30068162083625793}]

In [77]:
classifier("Ich hasse dieses Buch!")

[{'label': 'LABEL_0', 'score': 0.372832715511322}]

In [78]:
classifier("J'adore ce livre")

[{'label': 'LABEL_3', 'score': 0.2572416365146637}]