# Assignment 2
In this second assignment, you are challenged to employ Hugging Face transformers for the same classification task as in the first assignment.

You should explore Hugging Face models to find a pre-trained model that is suitable and promising for fine-tuning on your task. It makes sense to pick one that has been pre-trained on the same language and/or text genre.

As a bonus, you can also employ a domain adaptation approach.

You should compare the performance of your model(s) with the ones developed for the first assignment. For the final delivery, prepare a short presentation (max 10 slides) documenting your approach.

## Imports

In [3]:
! pip install transformers datasets accelerate
#! pip install --upgrade accelerate

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
Collecting datasets
  Downloading datasets-2.12.0-py3-none-any.whl (474 kB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)

In [4]:
import json
import math

import numpy as np
import pandas as pd
from datasets import Dataset, DatasetDict, load_dataset, load_metric
from huggingface_hub import notebook_login
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    DataCollatorWithPadding,
    TextClassificationPipeline,
    Trainer,
    TrainingArguments,
    pipeline,
)

## Loading dataset

In [7]:
oos = False

In [10]:
# Importing the dataset

def get_df_hf(oos=False) :
    with open('data_full.json') as json_file: 
        data_dict = json.load(json_file) 

    train_data = data_dict['train']
    val_data = data_dict['val']
    test_data = data_dict['test']

    oos_train = data_dict['oos_train']
    oos_val = data_dict['oos_val']
    oos_test = data_dict['oos_test']


    train_df = pd.DataFrame(train_data, columns =['query', 'label'])
    val_df = pd.DataFrame(val_data, columns =['query', 'label'])
    test_df = pd.DataFrame(test_data, columns =['query', 'label'])

    train_oos_df = pd.DataFrame(oos_train,columns=['query','label'])
    val_oos_df = pd.DataFrame(oos_val,columns=['query','label'])
    test_oos_df = pd.DataFrame(oos_test,columns=['query','label'])

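    if oos:
        # Concatenate dataframes to treat oos as one more intent;
        # ignore_index avoids an extra index column in Dataset.from_pandas
        train_df = pd.concat([train_df, train_oos_df], ignore_index=True)
        val_df = pd.concat([val_df, val_oos_df], ignore_index=True)
        test_df = pd.concat([test_df, test_oos_df], ignore_index=True)

    # Build one id <-> label mapping from the training split and reuse it for
    # every split, so the same intent always gets the same id
    unique_labels = train_df['label'].unique()
    labels_dict = {i: v for i, v in enumerate(unique_labels)}
    label2id = {v: i for i, v in labels_dict.items()}

    train_df['label'] = train_df['label'].map(label2id)
    val_df['label'] = val_df['label'].map(label2id)
    test_df['label'] = test_df['label'].map(label2id)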
    
    return Dataset.from_pandas(train_df), Dataset.from_pandas(val_df), Dataset.from_pandas(test_df), labels_dict

train_df, val_df, test_df, label_mapping = get_df_hf(oos)
train_valid_test_dataset = DatasetDict({
    'train': train_df,
    'validation': val_df,
    'test': test_df
})

train_valid_test_dataset


DatasetDict({
    train: Dataset({
        features: ['query', 'label'],
        num_rows: 15000
    })
    validation: Dataset({
        features: ['query', 'label'],
        num_rows: 3000
    })
    test: Dataset({
        features: ['query', 'label'],
        num_rows: 4500
    })
})
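
Label ids are only meaningful if every split uses the same label-to-id mapping. The sketch below uses made-up toy labels (not taken from `data_full.json`) to show how encoding each split by order of first appearance can disagree between splits, and how a single mapping built from the training split avoids that. The helper name `encode_by_first_appearance` is hypothetical.

```python
def encode_by_first_appearance(labels):
    """Assign ids in order of first appearance within this list only."""
    mapping = {}
    return [mapping.setdefault(label, len(mapping)) for label in labels]


# Toy labels, invented for illustration only.
train_labels = ["alarm", "weather", "alarm", "timer"]
val_labels = ["weather", "timer", "alarm"]

# Encoding each split on its own: "weather" is 1 in train but 0 in validation.
train_ids_independent = encode_by_first_appearance(train_labels)
val_ids_independent = encode_by_first_appearance(val_labels)

# Encoding with one mapping built from the training split.
label2id = {}
for label in train_labels:
    label2id.setdefault(label, len(label2id))
val_ids_shared = [label2id[label] for label in val_labels]

print(train_ids_independent)
print(val_ids_independent)
print(val_ids_shared)
```

Independent encoding happens to work when all splits list the intents in the same order, but the shared mapping does not depend on that.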

## Fine-tune a classifier
Models used:
- *xlm-roberta-base (fine-tuned on Amazon Massive)*

### xlm-roberta-base

#### Tokenizer

In [None]:
model_name = "cartesinus/xlm-r-base-amazon-massive-intent"

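def preprocess_function(sample):
    # Truncate only; padding is applied per batch by the DataCollatorWithPadding
    # passed to the Trainer, so padding=True and return_tensors='pt' are not needed here
    return tokenizer(sample["query"], truncation=True)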

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenized_dataset = train_valid_test_dataset.map(preprocess_function, batched=True)

In [None]:
tokenized_dataset

DatasetDict({
    train: Dataset({
        features: ['query', 'label', 'input_ids', 'attention_mask'],
        num_rows: 15000
    })
    validation: Dataset({
        features: ['query', 'label', 'input_ids', 'attention_mask'],
        num_rows: 3000
    })
    test: Dataset({
        features: ['query', 'label', 'input_ids', 'attention_mask'],
        num_rows: 4500
    })
})
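
Sequences in a batch must have the same length before they can be stacked into a tensor. A common approach is dynamic padding: pad each batch only up to its own longest sequence. The sketch below is a conceptual illustration of that idea in plain Python, not the library's implementation; `pad_batch`, the token ids and the padding id are assumptions made for the example.

```python
def pad_batch(batch, pad_token_id):
    """Pad every sequence in `batch` to the length of the longest one."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_token_id] * (max_len - len(ids)) for ids in batch]
    # 1 marks real tokens, 0 marks padding that the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token ids; the padding id 1 is assumed for illustration.
batch = [[0, 4865, 2], [0, 4865, 70, 1348, 2]]
padded = pad_batch(batch, pad_token_id=1)
print(padded["input_ids"])
print(padded["attention_mask"])
```

Padding per batch rather than to a global maximum keeps short queries from being processed with many useless padding tokens.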

#### Load the pretrained model

In [None]:
num_labels = 150 if not oos else 151

In [None]:
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels, id2label = label_mapping, ignore_mismatched_sizes=True)

Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at cartesinus/xlm-r-base-amazon-massive-intent and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([60, 768]) in the checkpoint and torch.Size([150, 768]) in the model instantiated
- classifier.out_proj.bias: found shape torch.Size([60]) in the checkpoint and torch.Size([150]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
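
The warning says that only the output projection of the classifier was re-initialized, because the checkpoint has 60 output labels and this task needs 150. As a rough worked example, the number of freshly initialized parameters follows from the shapes printed in the warning; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Number of weights plus biases in a dense layer."""
    return in_features * out_features + out_features


hidden_size = 768  # second dimension of the shapes in the warning

old_head = linear_params(hidden_size, 60)   # output layer stored in the checkpoint
new_head = linear_params(hidden_size, 150)  # output layer created for this task

print(old_head, new_head)
```

These are the only weights that start from scratch; the encoder and the rest of the classifier are loaded from the checkpoint.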


#### Train the model using a Trainer

In [None]:
# Workaround for a Hugging Face issue: reinstall transformers and accelerate
! pip uninstall -y transformers accelerate
! pip install transformers accelerate

Found existing installation: transformers 4.29.2
Uninstalling transformers-4.29.2:
  Successfully uninstalled transformers-4.29.2
Found existing installation: accelerate 0.19.0
Uninstalling accelerate-0.19.0:
  Successfully uninstalled accelerate-0.19.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Using cached transformers-4.29.2-py3-none-any.whl (7.1 MB)
Collecting accelerate
  Using cached accelerate-0.19.0-py3-none-any.whl (219 kB)
Installing collected packages: transformers, accelerate
Successfully installed accelerate-0.19.0 transformers-4.29.2


In [None]:
metric = load_metric("accuracy")

# TODO compute different metrics
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# TODO hyperparameters tuning
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    evaluation_strategy="epoch", # run validation at the end of each epoch
    save_strategy="epoch",
    load_best_model_at_end=True,
)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)


In [None]:
# On a Google Colab GPU this takes about 10 minutes
trainer.train()

You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 3.7587 | 1.41671 | 0.877667 |
| 2 | 1.1732 | 0.541769 | 0.940667 |
| 3 | 0.5156 | 0.390296 | 0.951333 |


TrainOutput(global_step=2814, training_loss=1.524562328693794, metrics={'train_runtime': 703.1885, 'train_samples_per_second': 63.994, 'train_steps_per_second': 4.002, 'total_flos': 940999238299008.0, 'train_loss': 1.524562328693794, 'epoch': 3.0})
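
The `global_step` value can be checked by hand. A small sketch, assuming a single device and no gradient accumulation (neither is set in the training arguments); `total_steps` is a hypothetical helper.

```python
import math


def total_steps(num_examples, batch_size, epochs):
    """Optimizer steps when the last, smaller batch of each epoch is kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# 15000 training examples, batch size 16, 3 epochs, as configured above.
classifier_steps = total_steps(15000, 16, 3)
print(classifier_steps)
```

With 15000 examples and a batch size of 16 there are 938 steps per epoch, so 3 epochs give the 2814 steps reported by the trainer.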

In [None]:
trainer.evaluate()

{'eval_loss': 0.39029553532600403,
 'eval_accuracy': 0.9513333333333334,
 'eval_runtime': 7.4456,
 'eval_samples_per_second': 402.92,
 'eval_steps_per_second': 25.25,
 'epoch': 3.0}
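
The accuracy reported here is the share of examples whose highest logit matches the true label. A minimal plain-Python sketch of that computation on invented toy logits; the helper names `argmax` and `accuracy` are made up for the example.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits, labels):
    """Fraction of rows whose argmax equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy values, invented for illustration: three examples, three classes.
toy_logits = [[2.0, 0.1, -1.0], [0.3, 1.5, 0.2], [0.9, 0.8, 0.1]]
toy_labels = [0, 1, 1]

toy_accuracy = accuracy(toy_logits, toy_labels)
print(toy_accuracy)
```

Read the other way, an accuracy of 0.951333 on the 3000 validation examples corresponds to 2854 correct predictions.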

In [None]:
trainer.predict(test_dataset=tokenized_dataset["test"])

PredictionOutput(predictions=array([[ 6.6990213 ,  0.7862378 , -0.82716596, ..., -0.4360419 ,
        -0.67726606,  0.4648793 ],
       [ 5.993173  ,  0.5582445 , -0.80416477, ..., -0.18876764,
        -0.3759189 ,  0.43482333],
       [ 6.606549  ,  0.8590205 , -0.96354413, ..., -0.2881303 ,
        -0.4971538 ,  0.46892655],
       ...,
       [ 0.06003368, -0.68513346, -1.71185   , ..., -0.899356  ,
        -0.78991413,  6.781199  ],
       [ 0.20897834, -0.5338833 , -1.5847536 , ..., -0.7006661 ,
        -0.38084835,  6.2793016 ],
       [ 0.21770513, -0.58311075, -1.9088244 , ..., -0.7813802 ,
        -0.5526543 ,  6.909775  ]], dtype=float32), label_ids=array([  0,   0,   0, ..., 149, 149, 149]), metrics={'test_loss': 0.4176858365535736, 'test_accuracy': 0.9402222222222222, 'test_runtime': 9.3297, 'test_samples_per_second': 482.331, 'test_steps_per_second': 30.226})
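
The `predictions` array holds raw logits, one row per example and one column per intent. To read them as probabilities, a softmax can be applied to each row. The sketch below uses a short invented row rather than the truncated values printed above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy row, invented for illustration (a real row here has 150 entries).
toy_logits = [6.7, 0.8, -0.8, 0.5]
probs = softmax(toy_logits)
best = max(range(len(probs)), key=probs.__getitem__)

print(best, round(probs[best], 4))
```

The predicted intent is the index with the highest probability, which is the same index as the highest logit.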

#### Saving the model and loading an existing model

In [27]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
trainer.save_model("/content/drive/MyDrive/NLP/xml-r-base-amazon-massive")

In [None]:
tokenizer2 = AutoTokenizer.from_pretrained("/content/drive/MyDrive/NLP/xml-r-base-amazon-massive")
model2 = AutoModelForSequenceClassification.from_pretrained("/content/drive/MyDrive/NLP/xml-r-base-amazon-massive", num_labels=num_labels)

In [None]:
accuracy_metric = load_metric("accuracy")
precision_metric = load_metric("precision")
recall_metric = load_metric("recall")
f1_metric = load_metric("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    precision = precision_metric.compute(predictions=predictions, references=labels, average="macro") 
    recall = recall_metric.compute(predictions=predictions, references=labels, average="macro") 
    f1 = f1_metric.compute(predictions=predictions, references=labels, average="macro") 
    return { "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1 }


trainer2 = Trainer(
    model=model2,
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

In [None]:
trainer2.evaluate()

Trainer is attempting to log a value of "{'accuracy': 0.9402222222222222}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'precision': 0.9430025578544524}" of type <class 'dict'> for key "eval/precision" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'recall': 0.9402222222222224}" of type <class 'dict'> for key "eval/recall" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'f1': 0.939717697385228}" of type <class 'dict'> for key "eval/f1" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.


{'eval_loss': 0.417685866355896,
 'eval_accuracy': {'accuracy': 0.9402222222222222},
 'eval_precision': {'precision': 0.9430025578544524},
 'eval_recall': {'recall': 0.9402222222222224},
 'eval_f1': {'f1': 0.939717697385228},
 'eval_runtime': 12.5145,
 'eval_samples_per_second': 359.582,
 'eval_steps_per_second': 44.988}
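
The Tensorboard warnings and the nested values in this output have the same cause: each metric's `compute()` returns a one-entry dict such as `{'accuracy': 0.94}`, so `compute_metrics` returns dicts inside a dict. One possible fix is to unwrap the values before returning them. A minimal sketch; `flatten_metrics` is a hypothetical helper, not part of `transformers` or `datasets`.

```python
def flatten_metrics(nested):
    """Unwrap {'accuracy': {'accuracy': 0.94}} into {'accuracy': 0.94}."""
    flat = {}
    for name, value in nested.items():
        if isinstance(value, dict):
            # Each metric result is a one-entry dict keyed by the metric name.
            flat[name] = value[name]
        else:
            flat[name] = value
    return flat


# Values copied from the evaluation output above.
nested = {
    "accuracy": {"accuracy": 0.9402222222222222},
    "precision": {"precision": 0.9430025578544524},
    "recall": {"recall": 0.9402222222222224},
    "f1": {"f1": 0.939717697385228},
}

flat = flatten_metrics(nested)
print(flat)
```

Returning `flatten_metrics({...})` at the end of `compute_metrics` should give the Trainer plain floats that it can log as scalars.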

In [None]:
pipe = TextClassificationPipeline(model=model2, tokenizer=tokenizer2)
pipe('set the alarm at 5 o clock')

[{'label': 'alarm', 'score': 0.8824137449264526}]

## Domain Adaptation
Fine-tune the masked language model on our dataset before training the classifier. Models used:
- *xlm-roberta-base (fine-tuned on Amazon Massive)*

### xlm-roberta-base

In [5]:
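model_checkpoint = "cartesinus/xlm-r-base-amazon-massive-intent"
# The checkpoint is a sequence classifier: its classification head is dropped
# and the masked-LM head is newly initialized (see the warning below)
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)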

Some weights of the model checkpoint at cartesinus/xlm-r-base-amazon-massive-intent were not used when initializing XLMRobertaForMaskedLM: ['classifier.out_proj.bias', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.weight']
- This IS expected if you are initializing XLMRobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaForMaskedLM were not initialized from the model checkpoint at cartesinus/xlm-r-base-amazon-massive-intent and are newly initialized: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weig

#### Tokenizer

In [11]:
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
def tokenize_function(examples):
    result = tokenizer(examples["query"])
    if tokenizer.is_fast:
        result["word_ids"] = [result.word_ids(i) for i in range(len(result["input_ids"]))]
    return result

tokenized_datasets = train_valid_test_dataset.map(
    tokenize_function, batched=True, remove_columns=["query", "label"]
)
tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids'],
        num_rows: 15000
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids'],
        num_rows: 3000
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids'],
        num_rows: 4500
    })
})

#### Pre-processing dataset

In [12]:
chunk_size = 8

def group_texts(examples):
    # Concatenate all texts
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    # Compute length of concatenated texts
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the last chunk if it's smaller than chunk_size
    total_length = (total_length // chunk_size) * chunk_size
    # Split into chunks of chunk_size
    result = {
        k: [t[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
        for k, t in concatenated_examples.items()
    }
    # Create a new labels column
    result["labels"] = result["input_ids"].copy()
    return result
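
To see what this chunking does, the sketch below applies the same logic to a few invented token lists. It is a self-contained variant that takes `chunk_size` as a parameter instead of reading the global variable; the ids are toy values.

```python
def group_texts(examples, chunk_size):
    """Concatenate all sequences, then cut them into equal-sized chunks."""
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_length = len(concatenated["input_ids"])
    # Drop the remainder that does not fill a whole chunk.
    total_length = (total_length // chunk_size) * chunk_size
    result = {
        k: [t[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
        for k, t in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result


# Three toy sequences with 4 + 5 + 3 = 12 tokens in total.
toy = {
    "input_ids": [[0, 11, 12, 2], [0, 21, 22, 23, 2], [0, 31, 2]],
    "attention_mask": [[1] * 4, [1] * 5, [1] * 3],
}

out = group_texts(toy, chunk_size=5)
print(out["input_ids"])
```

With 12 tokens and a chunk size of 5, two chunks are produced and the last two tokens are dropped; chunks can also span the boundary between two queries.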

In [13]:
# Chunking with group_texts is disabled here; each query stays a separate example
lm_datasets = tokenized_datasets  # .map(group_texts, batched=True)
lm_datasets

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids'],
        num_rows: 15000
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids'],
        num_rows: 3000
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'word_ids'],
        num_rows: 4500
    })
})

In [14]:
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

In [None]:
'''
# JUST TO SEE HOW MASKING WORKS
samples = [lm_datasets["train"][i] for i in range(2)]
for sample in samples:
    _ = sample.pop("word_ids")

for chunk in data_collator(samples)["input_ids"]:
    print(f"\n'>>> {tokenizer.decode(chunk)}'")'''

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.



'>>> [CLS] what [MASK] [MASK] i use to [MASK]'

'>>> i love you if i were [MASK] italian'
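
The masking itself can be sketched in plain Python. With `mlm_probability=0.15`, roughly 15% of the non-special tokens are selected as prediction targets; following the usual BERT-style recipe, a selected token is replaced by the mask token 80% of the time, by a random token 10% of the time, and left unchanged otherwise. This is a conceptual sketch, not the collator's actual code; `mask_tokens`, the token ids, the mask id and the vocabulary size are assumptions made for the example.

```python
import random


def mask_tokens(input_ids, mask_id, vocab_size, special_ids,
                mlm_probability=0.15, seed=0):
    """Return (masked_ids, labels) for one sequence."""
    rng = random.Random(seed)
    masked = list(input_ids)
    labels = [-100] * len(input_ids)  # -100 marks positions the loss ignores
    for i, tok in enumerate(input_ids):
        # Special tokens are never selected; others are selected at random.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            masked[i] = mask_id                    # replace with the mask token
        elif roll < 0.9:
            masked[i] = rng.randrange(vocab_size)  # replace with a random token
        # otherwise keep the original token
    return masked, labels


# Made-up ids; 0 and 2 play the role of the start and end tokens.
ids = [0, 4865, 70, 1348, 99, 2]
masked, labels = mask_tokens(ids, mask_id=250001, vocab_size=250002,
                             special_ids={0, 2})
print(masked)
print(labels)
```

Because the selection is random, a different seed (or, in the real collator, a different call) masks different positions.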


#### Hugging Face login

In [15]:
notebook_login()

#### Fine-tune

In [16]:
batch_size = 64
# Show the training loss with every epoch
logging_steps = len(lm_datasets["train"]) // batch_size
model_name = model_checkpoint.split("/")[-1]

training_args = TrainingArguments(
    output_dir=f"{model_name}-finetuned-clinc150",
    overwrite_output_dir=True,
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    push_to_hub=True,
    fp16=True,
    logging_steps=logging_steps,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_datasets["train"],
    eval_dataset=lm_datasets["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

Cloning https://huggingface.co/lelleib/xlm-r-base-amazon-massive-intent-finetuned-clinc150 into local empty directory.


In [17]:
eval_results = trainer.evaluate()
print(f">>> Perplexity: {math.exp(eval_results['eval_loss']):.2f}")

You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


>>> Perplexity: 215309147740.82


In [18]:
trainer.train()



| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 8.1668 | 4.932704 |
| 2 | 4.7891 | 4.043367 |
| 3 | 4.139 | 3.809586 |


Adding files tracked by Git LFS: ['tokenizer.json']. This may take a bit of time if the files are large.


TrainOutput(global_step=705, training_loss=5.689801197187275, metrics={'train_runtime': 308.6347, 'train_samples_per_second': 145.803, 'train_steps_per_second': 2.284, 'total_flos': 552768897435840.0, 'train_loss': 5.689801197187275, 'epoch': 3.0})

In [19]:
eval_results = trainer.evaluate()
print(f">>> Perplexity: {math.exp(eval_results['eval_loss']):.2f}")

>>> Perplexity: 42.84
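
Perplexity is the exponential of the mean cross-entropy loss, so the two can be converted into each other. The sketch below uses the evaluation loss logged after epoch 3 and the perplexity printed above; `perplexity` is a hypothetical helper.

```python
import math


def perplexity(mean_cross_entropy):
    """Perplexity is exp of the mean cross-entropy loss (natural log)."""
    return math.exp(mean_cross_entropy)


# Evaluation loss logged at the end of epoch 3.
ppl_epoch3 = perplexity(3.809586)

# The other direction: the loss that corresponds to the printed perplexity.
loss_for_reported = math.log(42.84)

print(f"{ppl_epoch3:.2f}", f"{loss_for_reported:.4f}")
```

The two numbers do not match exactly. A likely explanation is that the collator draws a new random mask on every pass over the data, so separate `evaluate()` calls see differently masked inputs and report slightly different losses.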


In [20]:
trainer.push_to_hub()

To https://huggingface.co/lelleib/xlm-r-base-amazon-massive-intent-finetuned-clinc150
   d2494a0..21d8b90  main -> main

To https://huggingface.co/lelleib/xlm-r-base-amazon-massive-intent-finetuned-clinc150
   21d8b90..502ea9d  main -> main



'https://huggingface.co/lelleib/xlm-r-base-amazon-massive-intent-finetuned-clinc150/commit/21d8b900b882a9f9b1a31315b42031d6bd4a5058'

#### Testing the fine-tuned LM

In [22]:
text = "Set the <mask> at 5 am"
#text = input()
mask_filler = pipeline(
    "fill-mask", model="lelleib/xlm-r-base-amazon-massive-intent-finetuned-clinc150"
)
preds = mask_filler(text)

for pred in preds:
    print(f">>> {pred['sequence']}")

>>> Set the timer at 5 am
>>> Set the day at 5 am
>>> Set the alarm at 5 am
>>> Set the tire at 5 am
>>> Set the balance at 5 am


#### Fine-tune the classification head

In [23]:
checkpoint = 'lelleib/xlm-r-base-amazon-massive-intent-finetuned-clinc150'
num_labels = 150 if not oos else 151

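def preprocess_function(sample):
    # Truncate only; padding is applied per batch by the DataCollatorWithPadding
    # passed to the Trainer, so padding=True and return_tensors='pt' are not needed here
    return tokenizer(sample["query"], truncation=True)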

accuracy_metric = load_metric("accuracy")
precision_metric = load_metric("precision")
recall_metric = load_metric("recall")
f1_metric = load_metric("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    precision = precision_metric.compute(predictions=predictions, references=labels, average="macro") 
    recall = recall_metric.compute(predictions=predictions, references=labels, average="macro") 
    f1 = f1_metric.compute(predictions=predictions, references=labels, average="macro") 
    return { "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1 }

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=num_labels, id2label = label_mapping)

tokenized_dataset = train_valid_test_dataset.map(preprocess_function, batched=True)

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    evaluation_strategy="epoch", # run validation at the end of each epoch
    save_strategy="epoch",
    load_best_model_at_end=True,
)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

Some weights of the model checkpoint at lelleib/xlm-r-base-amazon-massive-intent-finetuned-clinc150 were not used when initializing XLMRobertaForSequenceClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at lelleib/xlm-r-base-amazon-massive-intent-finetuned-clinc150 and are newly initialized: ['classifier.out_p

In [24]:
trainer.train()

You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 3.9855 | 1.678155 | {'accuracy': 0.868} | {'precision': 0.8850659679596582} | {'recall': 0.8680000000000001} | {'f1': 0.8531261817261424} |
| 2 | 1.4266 | 0.691456 | {'accuracy': 0.9363333333333334} | {'precision': 0.941559558776277} | {'recall': 0.9363333333333337} | {'f1': 0.9359640973118818} |
| 3 | 0.6715 | 0.485715 | {'accuracy': 0.946} | {'precision': 0.949260801413065} | {'recall': 0.9460000000000002} | {'f1': 0.9458459171810941} |


Trainer is attempting to log a value of "{'accuracy': 0.868}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'precision': 0.8850659679596582}" of type <class 'dict'> for key "eval/precision" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'recall': 0.8680000000000001}" of type <class 'dict'> for key "eval/recall" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'f1': 0.8531261817261424}" of type <class 'dict'> for key "eval/f1" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'accur

TrainOutput(global_step=2814, training_loss=1.736971282416739, metrics={'train_runtime': 790.8087, 'train_samples_per_second': 56.904, 'train_steps_per_second': 3.558, 'total_flos': 940999238299008.0, 'train_loss': 1.736971282416739, 'epoch': 3.0})

In [25]:
trainer.evaluate()

Trainer is attempting to log a value of "{'accuracy': 0.946}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'precision': 0.949260801413065}" of type <class 'dict'> for key "eval/precision" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'recall': 0.9460000000000002}" of type <class 'dict'> for key "eval/recall" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'f1': 0.9458459171810941}" of type <class 'dict'> for key "eval/f1" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.


{'eval_loss': 0.48571541905403137,
 'eval_accuracy': {'accuracy': 0.946},
 'eval_precision': {'precision': 0.949260801413065},
 'eval_recall': {'recall': 0.9460000000000002},
 'eval_f1': {'f1': 0.9458459171810941},
 'eval_runtime': 5.4975,
 'eval_samples_per_second': 545.7,
 'eval_steps_per_second': 34.197,
 'epoch': 3.0}
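
Macro averaging computes precision, recall and F1 per class and then takes the unweighted mean over classes. The plain-Python sketch below illustrates this on invented toy predictions; it is not the code behind the metrics loaded with `load_metric`, and `macro_scores` is a hypothetical helper.

```python
def macro_scores(predictions, references):
    """Unweighted mean of per-class precision, recall and F1."""
    classes = sorted(set(references) | set(predictions))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(predictions, references))
        tp = sum(p == c and y == c for p, y in pairs)
        fp = sum(p == c and y != c for p, y in pairs)
        fn = sum(p != c and y == c for p, y in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {"precision": sum(precisions) / n,
            "recall": sum(recalls) / n,
            "f1": sum(f1s) / n}


# Toy data, invented for illustration: three classes with two examples each.
references = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]

scores = macro_scores(predictions, references)
toy_accuracy = sum(p == y for p, y in zip(predictions, references)) / len(references)
print(scores, toy_accuracy)
```

When every class has the same number of examples, macro recall equals accuracy. That would explain why recall and accuracy coincide in the results above, up to floating-point rounding.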

In [26]:
trainer.predict(test_dataset=tokenized_dataset["test"])

PredictionOutput(predictions=array([[ 6.020978  ,  0.5688809 ,  0.00798655, ..., -0.70020473,
        -1.6227876 , -0.77312243],
       [ 5.556945  ,  0.14889519, -0.21704945, ..., -0.41389477,
        -1.3567059 , -0.5901651 ],
       [ 5.880918  ,  0.48578033, -0.2000524 , ..., -0.40615407,
        -1.6565032 , -0.8557249 ],
       ...,
       [-0.29908958, -0.46137875,  0.1749099 , ...,  0.1726649 ,
         0.5752851 ,  5.9724936 ],
       [-0.35870802,  0.0106652 , -0.07618536, ..., -0.43468016,
         0.37064326,  4.7026234 ],
       [-0.41692457, -0.23218518,  0.3254148 , ..., -0.19444035,
         0.71625865,  5.878188  ]], dtype=float32), label_ids=array([  0,   0,   0, ..., 149, 149, 149]), metrics={'test_loss': 0.5189417600631714, 'test_accuracy': {'accuracy': 0.938}, 'test_precision': {'precision': 0.9416083159729415}, 'test_recall': {'recall': 0.9380000000000002}, 'test_f1': {'f1': 0.9373993819337788}, 'test_runtime': 8.5101, 'test_samples_per_second': 528.785, 'test_ste

In [28]:
trainer.save_model("/content/drive/MyDrive/NLP/xml-r-base-amazon-massive-finetuned-clinc150")

In [29]:
tokenizer2 = AutoTokenizer.from_pretrained("/content/drive/MyDrive/NLP/xml-r-base-amazon-massive-finetuned-clinc150")
model2 = AutoModelForSequenceClassification.from_pretrained("/content/drive/MyDrive/NLP/xml-r-base-amazon-massive-finetuned-clinc150", num_labels=num_labels)

In [30]:
trainer2 = Trainer(
    model=model2,
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)
trainer2.evaluate()

Trainer is attempting to log a value of "{'accuracy': 0.938}" of type <class 'dict'> for key "eval/accuracy" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'precision': 0.9416083159729415}" of type <class 'dict'> for key "eval/precision" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'recall': 0.9380000000000002}" of type <class 'dict'> for key "eval/recall" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.
Trainer is attempting to log a value of "{'f1': 0.9373993819337788}" of type <class 'dict'> for key "eval/f1" as a scalar. This invocation of Tensorboard's writer.add_scalar() is incorrect so we dropped this attribute.


{'eval_loss': 0.5189417600631714,
 'eval_accuracy': {'accuracy': 0.938},
 'eval_precision': {'precision': 0.9416083159729415},
 'eval_recall': {'recall': 0.9380000000000002},
 'eval_f1': {'f1': 0.9373993819337788},
 'eval_runtime': 10.9803,
 'eval_samples_per_second': 409.826,
 'eval_steps_per_second': 51.274}

In [31]:
pipe = TextClassificationPipeline(model=model2, tokenizer=tokenizer2) #, return_all_scores=True)

Xformers is not installed correctly. If you want to use memorry_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


In [32]:
pipe('tell how much money i have')