<a href="https://colab.research.google.com/github/RebeccaKessler/Machine_Learning/blob/main/Codes/CamemBert_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install sentencepiece
!pip install accelerate -U
!pip install pandas numpy matplotlib
!pip install scikit-learn seaborn
!pip install optuna


## Import Packages

In [2]:
from transformers import Trainer, TrainingArguments, CamembertTokenizer, CamembertForSequenceClassification, pipeline
import torch
from torch.utils.data import Dataset
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold, train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

## Define Functions

In [3]:
# Define compute metrics function
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(-1)
    accuracy = accuracy_score(labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, predictions, average='weighted')
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }
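One consequence of `average='weighted'` shows up in every log below: the Recall column always equals Accuracy. Weighting each class's recall by its support gives sum over classes of (n_c / N) · (TP_c / n_c) = (sum of TP_c) / N, which is exactly accuracy. A minimal pure-Python sketch that checks this identity; the toy labels are made up for illustration and the functions are re-implementations, not scikit-learn's code:

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_recall(y_true, y_pred):
    """Per-class recall, averaged with weights equal to each class's share of y_true."""
    n = len(y_true)
    total = 0.0
    for cls, n_cls in Counter(y_true).items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        total += (n_cls / n) * (tp / n_cls)
    return total

# Hypothetical labels for six classes (not taken from the dataset)
y_true = [0, 0, 1, 1, 1, 2, 3, 4, 5, 5]
y_pred = [0, 1, 1, 1, 2, 2, 3, 5, 5, 0]
print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred))
```

Weighted precision and F1 have no such identity, which is why only those two columns differ from Accuracy.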

In [4]:
# Define a PyTorch Dataset that tokenizes each sentence on the fly
class CustomDataset(Dataset):
    def __init__(self, data, tokenizer, max_length=128):
        self.data = data
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sentence = str(self.data.iloc[idx]['sentence'])
        label = int(self.data.iloc[idx]['encoded_labels'])

        encoding = self.tokenizer.encode_plus(
            sentence,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }
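`encode_plus(..., max_length=128, padding='max_length', truncation=True)` makes every example exactly 128 token ids long and marks real tokens with 1 in the attention mask (`flatten()` then turns the `(1, 128)` tensors into `(128,)`). A pure-Python sketch of that padding/truncation step, assuming the sentence is already converted to ids; the pad id is a parameter with a hypothetical value, and the sketch ignores that the real tokenizer truncates before adding the `<s>`/`</s>` special tokens:

```python
def pad_or_truncate(ids, max_length=128, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(ids[:max_length])      # truncation=True: cut long sequences
    mask = [1] * len(ids)             # 1 marks a real token
    n_pad = max_length - len(ids)
    ids += [pad_id] * n_pad           # padding='max_length': fill up to max_length
    mask += [0] * n_pad               # 0 marks padding, which attention ignores
    return ids, mask

short_ids, short_mask = pad_or_truncate([5, 42, 17, 6], max_length=8, pad_id=1)
long_ids, long_mask = pad_or_truncate(list(range(20)), max_length=8, pad_id=1)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Padding every sentence to 128 keeps the Dataset simple; padding per batch (for example with `transformers.DataCollatorWithPadding`) could save compute on short sentences, at the cost of a slightly different setup.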

## Fine-tune the CamemBert Model

In [5]:
# Load the data
url = 'https://raw.githubusercontent.com/RebeccaKessler/Machine_Learning/main/training_data.csv'
data = pd.read_csv(url)

In [6]:
# Apply LabelEncoder to labels
label_encoder = LabelEncoder()
data['encoded_labels'] = label_encoder.fit_transform(data['difficulty'])
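`LabelEncoder` assigns integer ids in sorted order of the unique labels (its `classes_` attribute is sorted), so model class `k` corresponds to `label_encoder.classes_[k]`. A pure-Python sketch of the same mapping, assuming the `difficulty` column holds the six CEFR levels A1 to C2 (an assumption; the notebook itself only fixes `num_labels=6`):

```python
def fit_label_mapping(labels):
    """Mimic LabelEncoder: sorted unique labels -> ids 0..n-1, plus the inverse list."""
    classes = sorted(set(labels))
    to_id = {label: i for i, label in enumerate(classes)}
    return classes, to_id

# Hypothetical labels (CEFR levels assumed)
labels = ["B2", "A1", "C1", "A2", "B1", "C2", "A1"]
classes, to_id = fit_label_mapping(labels)
encoded = [to_id[label] for label in labels]     # like fit_transform
decoded = [classes[i] for i in encoded]          # like inverse_transform
print(classes)
print(encoded)
```

Because the mapping depends only on the set of labels, re-fitting a new encoder on another dataset gives the same ids as long as that dataset contains exactly the same six labels.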

In [7]:
# Load the tokenizer
tokenizer = CamembertTokenizer.from_pretrained('camembert-base')

# Create empty lists for statistics
accuracy_list = []
precision_list = []
recall_list = []
f1_list = []


# K-Fold cross-validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(kf.split(data)):
    print(f"Training fold {fold+1}")
    train_data = data.iloc[train_idx]
    val_data = data.iloc[val_idx]

    # Tokenize the datasets
    train_dataset = CustomDataset(train_data, tokenizer)
    eval_dataset = CustomDataset(val_data, tokenizer)

    # Load pre-trained model
    model = CamembertForSequenceClassification.from_pretrained('camembert-base', num_labels=6)

    # Define training arguments
    training_args = TrainingArguments(
        output_dir=f'./results_fold_{fold}',
        learning_rate=0.00015,
        num_train_epochs=7,
        per_device_train_batch_size=16,
        warmup_steps=1000,
        weight_decay=0.05,
        logging_dir=f'./logs_fold_{fold}',
        logging_steps=10,
        evaluation_strategy="steps",
        eval_steps=100,
        save_strategy="steps",
        save_steps=500,
        fp16=True,
    )

    # Initialize and define trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics
    )

    # Train and evaluate the model
    trainer.train()
    eval_result = trainer.evaluate()
    accuracy_list.append(eval_result['eval_accuracy'])
    precision_list.append(eval_result['eval_precision'])
    recall_list.append(eval_result['eval_recall'])
    f1_list.append(eval_result['eval_f1'])


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Training fold 1



Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 100 | 1.7545 | 1.730477 | 0.31875 | 0.371088 | 0.31875 | 0.266463 |
| 200 | 1.3397 | 1.351768 | 0.38125 | 0.408268 | 0.38125 | 0.300056 |
| 300 | 1.1439 | 1.249445 | 0.460417 | 0.464862 | 0.460417 | 0.432468 |
| 400 | 0.9721 | 1.075975 | 0.54375 | 0.535395 | 0.54375 | 0.529675 |
| 500 | 0.9198 | 1.122346 | 0.491667 | 0.533407 | 0.491667 | 0.458403 |
| 600 | 0.9875 | 1.084997 | 0.529167 | 0.539281 | 0.529167 | 0.529006 |
| 700 | 0.9372 | 1.120717 | 0.529167 | 0.542952 | 0.529167 | 0.524739 |
| 800 | 0.6812 | 1.220028 | 0.533333 | 0.551245 | 0.533333 | 0.533472 |
| 900 | 0.9162 | 1.264355 | 0.520833 | 0.541123 | 0.520833 | 0.520539 |
| 1000 | 0.8666 | 1.570242 | 0.470833 | 0.492426 | 0.470833 | 0.448899 |




Training fold 2


Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 100 | 1.7505 | 1.740816 | 0.35625 | 0.451404 | 0.35625 | 0.263141 |
| 200 | 1.3787 | 1.350074 | 0.466667 | 0.527643 | 0.466667 | 0.427248 |
| 300 | 1.155 | 1.195463 | 0.489583 | 0.468511 | 0.489583 | 0.464459 |
| 400 | 1.0686 | 1.139733 | 0.529167 | 0.536204 | 0.529167 | 0.52466 |
| 500 | 1.0115 | 1.141538 | 0.527083 | 0.539088 | 0.527083 | 0.527125 |
| 600 | 0.9271 | 1.178185 | 0.532292 | 0.551138 | 0.532292 | 0.51431 |
| 700 | 0.9407 | 1.08383 | 0.544792 | 0.547521 | 0.544792 | 0.540727 |
| 800 | 0.9134 | 1.182113 | 0.554167 | 0.568148 | 0.554167 | 0.550805 |
| 900 | 0.8584 | 1.215135 | 0.544792 | 0.555831 | 0.544792 | 0.539701 |
| 1000 | 0.6815 | 1.338314 | 0.5375 | 0.544179 | 0.5375 | 0.528739 |


Training fold 3


Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 100 | 1.7533 | 1.730479 | 0.402083 | 0.503023 | 0.402083 | 0.340765 |
| 200 | 1.449 | 1.33216 | 0.479167 | 0.493615 | 0.479167 | 0.4681 |
| 300 | 1.146 | 1.327953 | 0.414583 | 0.444986 | 0.414583 | 0.394914 |
| 400 | 1.239 | 1.167745 | 0.482292 | 0.515603 | 0.482292 | 0.465251 |
| 500 | 1.0034 | 1.234345 | 0.467708 | 0.543198 | 0.467708 | 0.460758 |
| 600 | 0.9349 | 1.085659 | 0.55 | 0.557837 | 0.55 | 0.545169 |
| 700 | 0.9266 | 1.079704 | 0.551042 | 0.558167 | 0.551042 | 0.538216 |
| 800 | 0.7589 | 1.205244 | 0.517708 | 0.571726 | 0.517708 | 0.522328 |
| 900 | 0.8753 | 1.185056 | 0.551042 | 0.561288 | 0.551042 | 0.538934 |
| 1000 | 0.7528 | 1.196425 | 0.571875 | 0.600224 | 0.571875 | 0.567252 |


Training fold 4


Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 100 | 1.7518 | 1.730541 | 0.375 | 0.405155 | 0.375 | 0.277002 |
| 200 | 1.3777 | 1.326081 | 0.466667 | 0.457903 | 0.466667 | 0.428816 |
| 300 | 1.1643 | 1.184575 | 0.49375 | 0.50003 | 0.49375 | 0.476795 |
| 400 | 0.9875 | 1.135844 | 0.492708 | 0.492834 | 0.492708 | 0.45573 |
| 500 | 0.902 | 1.092596 | 0.5375 | 0.559108 | 0.5375 | 0.515293 |
| 600 | 0.9761 | 1.054222 | 0.527083 | 0.553796 | 0.527083 | 0.521344 |
| 700 | 0.9735 | 1.078629 | 0.53125 | 0.550828 | 0.53125 | 0.5226 |
| 800 | 0.7354 | 1.15908 | 0.533333 | 0.528093 | 0.533333 | 0.524604 |
| 900 | 0.8095 | 1.189128 | 0.535417 | 0.573669 | 0.535417 | 0.52882 |
| 1000 | 0.6873 | 1.41905 | 0.479167 | 0.540663 | 0.479167 | 0.474619 |




Training fold 5


Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 100 | 1.7596 | 1.745336 | 0.271875 | 0.353052 | 0.271875 | 0.162137 |
| 200 | 1.3691 | 1.348234 | 0.4 | 0.347845 | 0.4 | 0.317172 |
| 300 | 1.0477 | 1.23661 | 0.453125 | 0.470326 | 0.453125 | 0.429189 |
| 400 | 1.0727 | 1.178716 | 0.477083 | 0.487083 | 0.477083 | 0.450694 |
| 500 | 0.8167 | 1.064532 | 0.545833 | 0.538302 | 0.545833 | 0.53428 |
| 600 | 0.879 | 1.051603 | 0.536458 | 0.540424 | 0.536458 | 0.533119 |
| 700 | 0.8793 | 1.093736 | 0.532292 | 0.564423 | 0.532292 | 0.524744 |
| 800 | 0.7475 | 1.065691 | 0.557292 | 0.585308 | 0.557292 | 0.561425 |
| 900 | 0.7902 | 1.436191 | 0.44375 | 0.479044 | 0.44375 | 0.411756 |
| 1000 | 0.6078 | 1.254987 | 0.513542 | 0.568733 | 0.513542 | 0.515525 |




In [8]:
# Compute overall statistics of the model
overall_accuracy = sum(accuracy_list) / len(accuracy_list)
overall_precision = sum(precision_list) / len(precision_list)
overall_recall = sum(recall_list) / len(recall_list)
overall_f1 = sum(f1_list) / len(f1_list)

print(f"Overall Accuracy: {overall_accuracy:.4f}")
print(f"Overall Precision: {overall_precision:.4f}")
print(f"Overall Recall: {overall_recall:.4f}")
print(f"Overall F1 Score: {overall_f1:.4f}")

Overall Accuracy: 0.5835
Overall Precision: 0.6066
Overall Recall: 0.5835
Overall F1 Score: 0.5849
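The cell above reports only the mean over the five folds; adding the spread shows how much each score depends on the particular split. A minimal sketch using the standard-library `statistics` module. It expects per-fold lists like the `accuracy_list`, `precision_list`, `recall_list` and `f1_list` built in the K-fold loop; the numbers in the example are placeholders, not results from this notebook:

```python
from statistics import mean, stdev

def summarize_folds(metric_lists):
    """Map metric name -> (mean, sample standard deviation) across folds."""
    return {name: (mean(values), stdev(values)) for name, values in metric_lists.items()}

# Placeholder per-fold scores, for illustration only
example = {"accuracy": [0.50, 0.52, 0.54, 0.56, 0.58]}
for name, (m, s) in summarize_folds(example).items():
    print(f"{name}: {m:.4f} ± {s:.4f}")
```

In the notebook this would be called as `summarize_folds({"accuracy": accuracy_list, "precision": precision_list, "recall": recall_list, "f1": f1_list})`.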


## Re-train on Full Dataset

In [9]:
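# The K folds partition the rows, so the last fold's train_data + val_data is the full dataset;
# use `data` directly rather than variables left over from the cross-validation loop
combined_data = data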
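Whether the final training set is built by concatenating the last fold's `train_data` and `val_data` or taken directly from `data`, it contains every row exactly once: K-fold cross-validation splits the (shuffled) row indices into K disjoint validation blocks, and each fold trains on the complement of its block. A simplified pure-Python re-implementation for illustration; it mirrors how `KFold` hands the remainder rows to the first folds, but its shuffle differs from scikit-learn's, so the actual indices will not match:

```python
import random

def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs; validation blocks are disjoint and cover all rows."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # not the same shuffle as scikit-learn
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):         # spread the remainder over the first folds
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(kfold_indices(12, n_splits=5))
# Every row is validated exactly once...
all_val = sorted(i for _, val in folds for i in val)
print(all_val == list(range(12)))
# ...and any single fold's train + val indices make up the full dataset
train_last, val_last = folds[-1]
print(sorted(train_last + val_last) == list(range(12)))
```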

In [10]:
# Load the pre-trained model
model = CamembertForSequenceClassification.from_pretrained('camembert-base', num_labels=6)

# Load the tokenizer
tokenizer = CamembertTokenizer.from_pretrained('camembert-base')

# Tokenize the dataset
final_dataset = CustomDataset(combined_data, tokenizer)

# Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    learning_rate=0.00015,
    num_train_epochs=7,
    per_device_train_batch_size=16,
    warmup_steps=1000,
    weight_decay=0.05,
    logging_dir='./logs',
    logging_steps=10,
    evaluation_strategy="no",
    save_strategy="steps",
    save_steps=500,
    fp16=True,
)

# Re-initialize and define the trainer
final_trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=final_dataset,
    compute_metrics=None
)

# Retrain the model on the whole dataset
final_trainer.train()

Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


| Step | Training Loss |
|---:|---:|
| 10 | 1.7921 |
| 20 | 1.7904 |
| 30 | 1.7863 |
| 40 | 1.7955 |
| 50 | 1.7884 |
| 60 | 1.7829 |
| 70 | 1.7846 |
| 80 | 1.774 |
| 90 | 1.7623 |
| 100 | 1.7568 |


TrainOutput(global_step=2100, training_loss=0.7715630920728048, metrics={'train_runtime': 424.9513, 'train_samples_per_second': 79.068, 'train_steps_per_second': 4.942, 'total_flos': 2210212240588800.0, 'train_loss': 0.7715630920728048, 'epoch': 7.0})
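The reported `global_step=2100` can be checked by hand. Assuming a single GPU and no gradient accumulation (the arguments above set none), the Trainer runs ceil(N / batch_size) optimizer steps per epoch. `train_samples_per_second × train_runtime` ≈ 79.068 × 424.95 ≈ 33,600 samples over 7 epochs, i.e. N = 4,800 rows, giving 300 steps per epoch and 2,100 steps in total. The same arithmetic for one cross-validation fold (3,840 training rows) gives 1,680 steps, so `warmup_steps=1000` spends about 60% of each fold in linear warm-up, and the per-fold logs shown earlier only cover the first 1,000 of those steps. A small sketch of the calculation:

```python
import math

def total_training_steps(n_samples, batch_size, epochs):
    """Optimizer steps on one device without gradient accumulation (last partial batch kept)."""
    return math.ceil(n_samples / batch_size) * epochs

full_run = total_training_steps(4800, 16, 7)               # re-training on all 4,800 rows
fold_run = total_training_steps(4800 - 4800 // 5, 16, 7)   # one of the 5 CV folds
warmup_share = 1000 / fold_run
print(full_run, fold_run, round(warmup_share, 2))
```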

In [11]:
# Save the final fine-tuned model and tokenizer
model.save_pretrained('./final_model')
tokenizer.save_pretrained('./final_model')

('./final_model/tokenizer_config.json',
 './final_model/special_tokens_map.json',
 './final_model/sentencepiece.bpe.model',
 './final_model/added_tokens.json')

## Make Predictions

In [12]:
# Load the unlabelled data
url = 'https://raw.githubusercontent.com/RebeccaKessler/Machine_Learning/main/unlabelled_test_data.csv'
unlabelled_data = pd.read_csv(url)

In [13]:
# Load the saved fine-tuned model and tokenizer
model_path = './final_model'
model = CamembertForSequenceClassification.from_pretrained(model_path)
tokenizer = CamembertTokenizer.from_pretrained(model_path)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [14]:
# Create a prediction pipeline
device = 0 if torch.cuda.is_available() else -1
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer, framework='pt', device=device)

# Predict labels for the unlabelled data
predictions = classifier(unlabelled_data['sentence'].tolist())

# Decode the numeric labels to original labels using the loaded LabelEncoder
predicted_labels = [label_encoder.inverse_transform([int(pred['label'].split('_')[-1])])[0] for pred in predictions]

# Create a new DataFrame with predictions
results_df = pd.DataFrame({
    'id': unlabelled_data['id'],
    'difficulty': predicted_labels
})

# Save the results to a new CSV file
results_df.to_csv('predicted_difficulties.csv', index=False)

print("Predictions saved to 'predicted_difficulties.csv'")


Predictions saved to 'predicted_difficulties.csv'
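The pipeline returns generic names such as `'LABEL_3'` because the model config was created without label names; the list comprehension above parses the trailing integer and maps it back through the `LabelEncoder`. A minimal pure-Python sketch of that decoding step with a format check added; `classes` is a hypothetical stand-in for `label_encoder.classes_`:

```python
def decode_pipeline_label(label, classes):
    """Turn a generic pipeline label like 'LABEL_3' into the original class name."""
    prefix, _, index = label.rpartition("_")
    if prefix != "LABEL" or not index.isdigit():
        raise ValueError(f"unexpected label format: {label!r}")
    return classes[int(index)]

classes = ["A1", "A2", "B1", "B2", "C1", "C2"]   # hypothetical stand-in for label_encoder.classes_
predictions = [{"label": "LABEL_0", "score": 0.91}, {"label": "LABEL_5", "score": 0.55}]
print([decode_pipeline_label(p["label"], classes) for p in predictions])
```

An alternative, not used in this notebook, is to pass `id2label` and `label2id` to `from_pretrained` when creating the model so the names are stored in its config and the pipeline returns them directly.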


## Re-train on Extended Dataset

In [15]:
# Load extended data set
url = 'https://raw.githubusercontent.com/RebeccaKessler/Machine_Learning/main/combined_random_french_sentences.csv'
full_data = pd.read_csv(url)

In [16]:
# Apply LabelEncoder to labels
label_encoder = LabelEncoder()
full_data['encoded_labels'] = label_encoder.fit_transform(full_data['difficulty'])

In [17]:
# Load pre-trained model
model = CamembertForSequenceClassification.from_pretrained('camembert-base', num_labels=6)

# Load the tokenizer
tokenizer = CamembertTokenizer.from_pretrained('camembert-base')

# Tokenize dataset
full_dataset = CustomDataset(full_data, tokenizer)

# Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    learning_rate=0.00015,
    num_train_epochs=7,
    per_device_train_batch_size=16,
    warmup_steps=1000,
    weight_decay=0.05,
    logging_dir='./logs',
    logging_steps=10,
    evaluation_strategy='no',
    save_strategy="steps",
    save_steps=500,
    fp16=True,
)

# Re-initialize and define the trainer
final_trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=full_dataset,
    compute_metrics=None
)

# Retrain the model on the extended dataset
final_trainer.train()


Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


| Step | Training Loss |
|---:|---:|
| 10 | 1.7909 |
| 20 | 1.7934 |
| 30 | 1.7926 |
| 40 | 1.7895 |
| 50 | 1.7803 |
| 60 | 1.7822 |
| 70 | 1.7603 |
| 80 | 1.7481 |
| 90 | 1.7281 |
| 100 | 1.7163 |


TrainOutput(global_step=4291, training_loss=0.35101464429822066, metrics={'train_runtime': 823.9365, 'train_samples_per_second': 83.259, 'train_steps_per_second': 5.208, 'total_flos': 4512516657868800.0, 'train_loss': 0.35101464429822066, 'epoch': 7.0})

In [18]:
# Save the fine-tuned model and tokenizer
model_path_full = "./fine_tuned_model_full"
model.save_pretrained(model_path_full)
tokenizer.save_pretrained(model_path_full)

('./fine_tuned_model_full/tokenizer_config.json',
 './fine_tuned_model_full/special_tokens_map.json',
 './fine_tuned_model_full/sentencepiece.bpe.model',
 './fine_tuned_model_full/added_tokens.json')

## Make Predictions with the Extended Model

In [19]:
# Load the unlabelled data
url = 'https://raw.githubusercontent.com/RebeccaKessler/Machine_Learning/main/unlabelled_test_data.csv'
unlabelled_data = pd.read_csv(url)

In [20]:
# Load the fine-tuned model and tokenizer for predictions
model = CamembertForSequenceClassification.from_pretrained(model_path_full)
tokenizer = CamembertTokenizer.from_pretrained(model_path_full)

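# Define prediction pipeline (on GPU if available, as in the first prediction step)
device = 0 if torch.cuda.is_available() else -1
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer, framework='pt', device=device)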

# Predict the labels of the unlabelled data
predictions = classifier(unlabelled_data['sentence'].tolist())
predicted_labels = [label_encoder.inverse_transform([int(pred['label'].split('_')[-1])])[0] for pred in predictions]

# Create a new dataframe with predictions
results_df = pd.DataFrame({
    'id': unlabelled_data['id'],
    'difficulty': predicted_labels
})
results_df.to_csv('predicted_difficulties_full.csv', index=False)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


## Hyperparameter Optimization

In [21]:
import optuna

In [22]:
# Load the data
url = 'https://raw.githubusercontent.com/RebeccaKessler/Machine_Learning/main/training_data.csv'
data = pd.read_csv(url)
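The learning rate in the objective below is sampled log-uniformly between 1e-4 and 2e-4: the value is drawn uniformly in log space and then exponentiated. A stdlib sketch of that idea, not Optuna's implementation (Optuna's default TPE sampler starts with random draws and then favors regions that scored well in earlier trials):

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw x such that log(x) is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
samples = [sample_log_uniform(1e-4, 2e-4, rng) for _ in range(1000)]
# Under log-uniform sampling, about half the draws fall below the geometric midpoint
geometric_mid = math.sqrt(1e-4 * 2e-4)
below_mid = sum(s < geometric_mid for s in samples) / len(samples)
print(round(below_mid, 2))
```

With a range spanning only a factor of two, log-uniform and uniform sampling behave almost the same; the log scale matters more for wide ranges such as 1e-6 to 1e-3.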

In [None]:
# Define the objective function for hyperparameter optimization with Optuna
def objective(trial):
    # Define hyperparameter search space
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 2e-4, log=True)
    per_device_train_batch_size = trial.suggest_categorical("per_device_train_batch_size", [8, 16, 32])
    num_train_epochs = trial.suggest_int("num_train_epochs", 3, 7)

    # Define training arguments
    training_args = TrainingArguments(
        output_dir='./results',
        num_train_epochs=num_train_epochs,
        learning_rate=learning_rate,
        per_device_train_batch_size=per_device_train_batch_size,
        logging_dir='./logs',
        logging_steps=10,
        warmup_steps=1000,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        fp16=True
    )

    # Load the tokenizer
    tokenizer = CamembertTokenizer.from_pretrained('camembert-base')

    # Encode the labels
    label_encoder = LabelEncoder()
    data['encoded_labels'] = label_encoder.fit_transform(data['difficulty'])

    # Split the dataset into training and validation sets
    train_data, val_data = train_test_split(data, test_size=0.2, stratify=data['encoded_labels'], random_state=42)

    # Tokenize datasets
    train_dataset = CustomDataset(train_data, tokenizer)
    eval_dataset = CustomDataset(val_data, tokenizer)

    # Load the pre-trained model
    model = CamembertForSequenceClassification.from_pretrained('camembert-base', num_labels=6)

    # Initialize and define the trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics
    )

    # Train and evaluate the model
    trainer.train()
    eval_result = trainer.evaluate()
    return eval_result["eval_accuracy"]

# Create and optimize the Optuna study
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)

# Print the best combination of parameters
print(f"Best trial accuracy: {study.best_trial.value}")
print(f"Best parameters: {study.best_trial.params}")

[I 2024-05-20 07:25:26,692] A new study created in memory with name: no-name-462f2924-9f0c-408d-801c-6c67a67daea3
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1.2547 | 1.14539 | 0.510417 | 0.517697 | 0.510417 | 0.503358 |
| 2 | 1.26 | 1.192852 | 0.445833 | 0.553739 | 0.445833 | 0.406171 |
| 3 | 1.0683 | 1.204332 | 0.523958 | 0.531801 | 0.523958 | 0.506286 |
| 4 | 0.7211 | 1.182934 | 0.563542 | 0.572406 | 0.563542 | 0.558251 |
| 5 | 0.5707 | 1.455378 | 0.561458 | 0.574313 | 0.561458 | 0.562886 |
| 6 | 0.1506 | 1.94188 | 0.56875 | 0.5912 | 0.56875 | 0.572187 |
| 7 | 0.0163 | 2.151232 | 0.569792 | 0.58246 | 0.569792 | 0.572959 |


[I 2024-05-20 07:34:00,526] Trial 0 finished with value: 0.5697916666666667 and parameters: {'learning_rate': 0.00016168459055575829, 'per_device_train_batch_size': 8, 'num_train_epochs': 7}. Best is trial 0 with value: 0.5697916666666667.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1.2052 | 1.162163 | 0.5 | 0.486322 | 0.5 | 0.483649 |
| 2 | 1.2129 | 1.290667 | 0.43125 | 0.508879 | 0.43125 | 0.36966 |
| 3 | 0.9044 | 1.158098 | 0.535417 | 0.543876 | 0.535417 | 0.533591 |
| 4 | 0.7538 | 1.251276 | 0.548958 | 0.559118 | 0.548958 | 0.550179 |
| 5 | 0.4708 | 1.60959 | 0.546875 | 0.564074 | 0.546875 | 0.548378 |
| 6 | 0.0925 | 1.865296 | 0.563542 | 0.588891 | 0.563542 | 0.567165 |


[I 2024-05-20 07:42:39,448] Trial 1 finished with value: 0.5635416666666667 and parameters: {'learning_rate': 0.00015163262020418634, 'per_device_train_batch_size': 8, 'num_train_epochs': 6}. Best is trial 0 with value: 0.5697916666666667.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1.604 | 1.532946 | 0.398958 | 0.40358 | 0.398958 | 0.312531 |
| 2 | 1.2008 | 1.180415 | 0.486458 | 0.46784 | 0.486458 | 0.467187 |
| 3 | 0.9912 | 1.228545 | 0.455208 | 0.485059 | 0.455208 | 0.442654 |


[I 2024-05-20 07:46:12,403] Trial 2 finished with value: 0.4552083333333333 and parameters: {'learning_rate': 0.000152477541940944, 'per_device_train_batch_size': 32, 'num_train_epochs': 3}. Best is trial 0 with value: 0.5697916666666667.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1.2406 | 1.185301 | 0.471875 | 0.464707 | 0.471875 | 0.44403 |
| 2 | 1.2557 | 1.230299 | 0.433333 | 0.503988 | 0.433333 | 0.381631 |
| 3 | 0.7535 | 1.101826 | 0.552083 | 0.558476 | 0.552083 | 0.553256 |


[I 2024-05-20 07:50:29,281] Trial 3 finished with value: 0.5520833333333334 and parameters: {'learning_rate': 0.00013440706325232795, 'per_device_train_batch_size': 8, 'num_train_epochs': 3}. Best is trial 0 with value: 0.5697916666666667.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1.2208 | 1.141358 | 0.520833 | 0.509685 | 0.520833 | 0.501194 |
| 2 | 1.4383 | 1.448151 | 0.346875 | 0.26048 | 0.346875 | 0.234771 |
| 3 | 0.9185 | 1.112006 | 0.548958 | 0.565185 | 0.548958 | 0.545003 |
| 4 | 0.4823 | 1.256259 | 0.566667 | 0.571573 | 0.566667 | 0.561109 |
| 5 | 0.2898 | 1.375755 | 0.572917 | 0.599097 | 0.572917 | 0.576738 |




[I 2024-05-20 07:57:49,770] Trial 4 finished with value: 0.5729166666666666 and parameters: {'learning_rate': 0.00011286342636746662, 'per_device_train_batch_size': 8, 'num_train_epochs': 5}. Best is trial 4 with value: 0.5729166666666666.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1.6097 | 1.554721 | 0.395833 | 0.431407 | 0.395833 | 0.344192 |
| 2 | 1.1627 | 1.173416 | 0.490625 | 0.479356 | 0.490625 | 0.476014 |
| 3 | 0.9352 | 1.150064 | 0.49375 | 0.526382 | 0.49375 | 0.480685 |


[I 2024-05-20 07:59:58,371] Trial 5 finished with value: 0.49375 and parameters: {'learning_rate': 0.00018306657889602206, 'per_device_train_batch_size': 32, 'num_train_epochs': 3}. Best is trial 4 with value: 0.5729166666666666.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1.1548 | 1.135983 | 0.492708 | 0.487162 | 0.492708 | 0.486534 |
| 2 | 1.2264 | 1.160465 | 0.517708 | 0.540042 | 0.517708 | 0.510209 |
| 3 | 0.9676 | 1.194731 | 0.535417 | 0.541947 | 0.535417 | 0.53208 |
| 4 | 0.6977 | 1.219162 | 0.561458 | 0.585021 | 0.561458 | 0.567085 |


[I 2024-05-20 08:04:30,416] Trial 6 finished with value: 0.5614583333333333 and parameters: {'learning_rate': 0.00016791808565411084, 'per_device_train_batch_size': 8, 'num_train_epochs': 4}. Best is trial 4 with value: 0.5729166666666666.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1.2426 | 1.233514 | 0.476042 | 0.487454 | 0.476042 | 0.432595 |
| 2 | 1.1499 | 1.299601 | 0.429167 | 0.459542 | 0.429167 | 0.388676 |
| 3 | 0.9334 | 1.22988 | 0.48125 | 0.525144 | 0.48125 | 0.473787 |
| 4 | 0.8598 | 1.12154 | 0.547917 | 0.58409 | 0.547917 | 0.549057 |
| 5 | 0.6988 | 1.348965 | 0.533333 | 0.560359 | 0.533333 | 0.530769 |
| 6 | 0.393 | 1.357532 | 0.5625 | 0.582522 | 0.5625 | 0.566734 |


[I 2024-05-20 08:09:36,309] Trial 7 finished with value: 0.5625 and parameters: {'learning_rate': 0.00016711626024801964, 'per_device_train_batch_size': 16, 'num_train_epochs': 6}. Best is trial 4 with value: 0.5729166666666666.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1.1985 | 1.159828 | 0.5 | 0.494841 | 0.5 | 0.491808 |
| 2 | 1.8622 | 1.810817 | 0.164583 | 0.027088 | 0.164583 | 0.046519 |
| 3 | 1.7937 | 1.794467 | 0.164583 | 0.027088 | 0.164583 | 0.046519 |
| 4 | 1.7954 | 1.792487 | 0.164583 | 0.027088 | 0.164583 | 0.046519 |
| 5 | 1.7945 | 1.792063 | 0.164583 | 0.027088 | 0.164583 | 0.046519 |


[I 2024-05-20 08:15:27,454] Trial 8 finished with value: 0.16458333333333333 and parameters: {'learning_rate': 0.00017369313273270159, 'per_device_train_batch_size': 8, 'num_train_epochs': 5}. Best is trial 4 with value: 0.5729166666666666.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 1.3275 | 1.311947 | 0.434375 | 0.445215 | 0.434375 | 0.358567 |
| 2 | 1.0893 | 1.177874 | 0.482292 | 0.506085 | 0.482292 | 0.460025 |
| 3 | 0.957 | 1.144054 | 0.523958 | 0.543304 | 0.523958 | 0.520735 |
| 4 | 0.8629 | 1.162653 | 0.540625 | 0.539554 | 0.540625 | 0.531937 |
| 5 | 0.4692 | 1.236807 | 0.563542 | 0.579472 | 0.563542 | 0.565289 |


[I 2024-05-20 08:20:01,504] Trial 9 finished with value: 0.5635416666666667 and parameters: {'learning_rate': 0.00013128694667746682, 'per_device_train_batch_size': 16, 'num_train_epochs': 5}. Best is trial 4 with value: 0.5729166666666666.
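In every epoch row of every trial, the Recall column equals Accuracy. This is expected if recall is averaged with `average="weighted"`, as the head's `precision_recall_fscore_support` import suggests. Weighting each class's recall TP_k / n_k by its support share n_k / N cancels n_k, leaving Σ TP_k / N, which is accuracy. The Recall column therefore adds no information. Macro-averaged recall would be more informative if the classes are imbalanced.

The plain-Python sketch below checks this identity on a small, hypothetical label set. It does not use the notebook's data.

```python
import math
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class support."""
    n = len(y_true)
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # recall_k = hits[k] / support[k]; weighting by support[k] / n cancels
    # support[k], so the sum collapses to (total hits) / n = accuracy.
    return sum((support[k] / n) * (hits[k] / support[k]) for k in support)

# Hypothetical toy integer labels (not the notebook's data).
y_true = [0, 0, 1, 2, 2, 2, 5, 5]
y_pred = [0, 1, 1, 2, 5, 2, 5, 0]
acc, w_rec = accuracy(y_true, y_pred), weighted_recall(y_true, y_pred)
print(acc, w_rec, math.isclose(acc, w_rec))
```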


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 1.3645 | 1.348166 | 0.401042 | 0.36814 | 0.401042 | 0.318868 |
| 2 | 1.11 | 1.232006 | 0.466667 | 0.493078 | 0.466667 | 0.44005 |
| 3 | 0.866 | 1.255418 | 0.482292 | 0.510091 | 0.482292 | 0.477854 |
| 4 | 0.8125 | 1.127237 | 0.5375 | 0.558742 | 0.5375 | 0.537631 |
| 5 | 0.5592 | 1.306281 | 0.555208 | 0.55922 | 0.555208 | 0.543419 |
| 6 | 0.3624 | 1.449194 | 0.567708 | 0.573144 | 0.567708 | 0.564639 |
| 7 | 0.1266 | 1.704041 | 0.578125 | 0.599406 | 0.578125 | 0.57739 |


[I 2024-05-20 08:26:38,667] Trial 10 finished with value: 0.578125 and parameters: {'learning_rate': 0.00010502108881020471, 'per_device_train_batch_size': 16, 'num_train_epochs': 7}. Best is trial 10 with value: 0.578125.
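Trial 10 shows a pattern that recurs across these trials. Validation loss bottoms out early (1.127237 at epoch 4) and then climbs to 1.704041 while training loss drops to 0.1266, yet accuracy keeps inching up. Which epoch counts as "best" therefore depends on the monitored metric.

In `transformers`, `Trainer` can handle this with `EarlyStoppingCallback(early_stopping_patience=...)` combined with `load_best_model_at_end=True` and `metric_for_best_model` in `TrainingArguments`. The plain-Python sketch below replays trial 10's logged values to show what patience-based stopping would pick for each metric. The function `early_stop` is hypothetical and only mimics the patience logic.

```python
def early_stop(values, patience, higher_is_better):
    """Replay per-epoch metric values; return (stop_epoch, best_epoch), 1-indexed."""
    best_value, best_epoch, bad_epochs = None, 0, 0
    for epoch, value in enumerate(values, start=1):
        improved = best_value is None or (
            value > best_value if higher_is_better else value < best_value
        )
        if improved:
            best_value, best_epoch, bad_epochs = value, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch  # training would halt after this epoch
    return len(values), best_epoch  # all epochs ran

# Validation loss and accuracy logged for trial 10 (epochs 1-7).
val_loss = [1.348166, 1.232006, 1.255418, 1.127237, 1.306281, 1.449194, 1.704041]
val_acc = [0.401042, 0.466667, 0.482292, 0.5375, 0.555208, 0.567708, 0.578125]

print("monitor loss:", early_stop(val_loss, patience=2, higher_is_better=False))
print("monitor accuracy:", early_stop(val_acc, patience=2, higher_is_better=True))
```

With patience 2, monitoring loss would stop after epoch 6 and restore the epoch-4 checkpoint (accuracy 0.5375). Monitoring accuracy would run all seven epochs and keep epoch 7.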


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 1.367 | 1.356315 | 0.420833 | 0.417001 | 0.420833 | 0.3446 |
| 2 | 1.0932 | 1.205986 | 0.465625 | 0.484015 | 0.465625 | 0.438395 |
| 3 | 0.9345 | 1.195978 | 0.509375 | 0.54461 | 0.509375 | 0.490381 |
| 4 | 0.8463 | 1.237566 | 0.54375 | 0.544817 | 0.54375 | 0.533999 |
| 5 | 0.5554 | 1.341948 | 0.561458 | 0.565547 | 0.561458 | 0.553779 |
| 6 | 0.4019 | 1.385425 | 0.576042 | 0.590438 | 0.576042 | 0.579963 |
| 7 | 0.114 | 1.651396 | 0.569792 | 0.590903 | 0.569792 | 0.572529 |


[I 2024-05-20 08:32:35,257] Trial 11 finished with value: 0.5697916666666667 and parameters: {'learning_rate': 0.0001030286950223985, 'per_device_train_batch_size': 16, 'num_train_epochs': 7}. Best is trial 10 with value: 0.578125.
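The value Optuna records for each trial matches the accuracy of the final epoch, not the best one. Trial 11 reached 0.576042 at epoch 6 but is scored 0.5697916666666667 from epoch 7. Whether this is intended depends on the objective function, which is not shown in this excerpt.

The sketch below compares "last epoch" and "best epoch" scoring on the logged accuracies of trials 10 and 11. The helpers `score_last` and `score_best` are hypothetical.

```python
# Per-epoch validation accuracy copied from the logs of trials 10 and 11.
trials = {
    10: [0.401042, 0.466667, 0.482292, 0.5375, 0.555208, 0.567708, 0.578125],
    11: [0.420833, 0.465625, 0.509375, 0.54375, 0.561458, 0.576042, 0.569792],
}

def score_last(accs):
    """What the logged trial values correspond to: the final epoch."""
    return accs[-1]

def score_best(accs):
    """Alternative: the best epoch, e.g. what load_best_model_at_end keeps
    when metric_for_best_model is accuracy."""
    best = max(range(len(accs)), key=accs.__getitem__)
    return accs[best], best + 1  # (accuracy, 1-indexed epoch)

for trial_id, accs in trials.items():
    print(trial_id, score_last(accs), score_best(accs))
```

For these two trials the ranking does not change, and trial 10 still wins. However, the gap between them shrinks from about 0.0083 to about 0.0021. With noisier trials, scoring the last epoch can reorder the search results.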


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 1.3826 | 1.377594 | 0.408333 | 0.381165 | 0.408333 | 0.323935 |
| 2 | 1.1007 | 1.296531 | 0.423958 | 0.456804 | 0.423958 | 0.390656 |
| 3 | 0.9093 | 1.162722 | 0.515625 | 0.523983 | 0.515625 | 0.507224 |
| 4 | 0.7527 | 1.172946 | 0.536458 | 0.535606 | 0.536458 | 0.532523 |
| 5 | 0.5556 | 1.217887 | 0.577083 | 0.578424 | 0.577083 | 0.572996 |
