**What is Bert**

**BERT is a bidirectional transformer pre-trained using a combination of masked language modeling and next sentence prediction.**

**Why BERT is useful?
There are many tasks that BERT can solve that hugging face provides such as Masked Language Modeling, Next Sentence Prediction, Language Modeling, and Question Answering.**

**how to predict the [MASK] token with Pipeline, AutoModel with pipeline**

In [11]:
import torch
from transformers import pipeline

pipeline = pipeline(
    task="fill-mask",
    model="google-bert/bert-base-uncased",
    dtype=torch.float16,
    device=0
)
pipeline("The capital of Burkina Faso is [MASK].")

Some weights of the model checkpoint at google-bert/bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.5166015625,
  'token': 25307,
  'token_str': 'bam',
  'sequence': 'the capital of burkina faso is bam.'},
 {'score': 0.048980712890625,
  'token': 23089,
  'token_str': 'burkina',
  'sequence': 'the capital of burkina faso is burkina.'},
 {'score': 0.0310211181640625,
  'token': 22773,
  'token_str': 'faso',
  'sequence': 'the capital of burkina faso is faso.'},
 {'score': 0.0221710205078125,
  'token': 8945,
  'token_str': 'bo',
  'sequence': 'the capital of burkina faso is bo.'},
 {'score': 0.0196380615234375,
  'token': 17377,
  'token_str': 'gao',
  'sequence': 'the capital of burkina faso is gao.'}]

In [15]:
#Bert for pretraining
from transformers import AutoTokenizer, BertForPreTraining
import torch

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = BertForPreTraining.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("I am Innocent", return_tensors="pt")
outputs = model(**inputs)

prediction_logits = outputs.prediction_logits
seq_relationship_logits = outputs.seq_relationship_logits
prediction_logits

tensor([[[ -6.9213,  -6.8583,  -6.8834,  ...,  -6.2530,  -6.1651,  -4.1071],
         [-14.1652, -14.0619, -14.3757,  ..., -11.8748, -11.7579,  -9.4421],
         [-12.8017, -12.7201, -13.1785,  ..., -10.3830, -11.0236,  -8.1403],
         [-10.7554, -10.8680, -11.1225,  ..., -10.3950, -10.3898,  -5.1232],
         [-12.9326, -12.7705, -12.6410,  ..., -10.4432, -10.7870, -10.8853]]],
       grad_fn=<ViewBackward0>)

In [16]:
seq_relationship_logits

tensor([[ 3.0853, -1.8202]], grad_fn=<AddmmBackward0>)

**Bert for text classification**

In [12]:
import pandas as pd
from datasets import Dataset
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
import torch
from sklearn.model_selection import train_test_split
import numpy as np

# Lire le fichier
df = pd.read_csv("messi_commentaire.csv", header=None)

print(f"Shape du DataFrame: {df.shape}")
print(f"Colonnes actuelles: {df.columns.tolist()}")

# Cr√©er des noms de colonnes bas√©s sur le nombre r√©el de colonnes
if len(df.columns) == 1:
    df.columns = ['commentaire']
elif len(df.columns) == 2:
    df.columns = ['commentaire', 'label']
elif len(df.columns) == 3:
    df.columns = ['commentaire', 'auteur', 'date']
else:
    # Pour plus de colonnes, utiliser des noms g√©n√©riques
    df.columns = ['commentaire'] + [f'colonne_{i}' for i in range(1, len(df.columns))]
#df.fillna(0,inplace=True)
# Sauvegarder
#df.to_csv("messi_commentaire.csv", index=False)
#print("Fichier sauvegard√© avec succ√®s!")
#print(df.head())
# Compter le nombre de NaN
nan_count = df.isna().sum().sum()

if nan_count > 0:
    # Trouver les indices des NaN
    nan_indices = df[df.isna().any(axis=1)].index
    
    # Remplacer les 100 premiers NaN par 1
    if len(nan_indices) > 100:
        df.loc[nan_indices[:100]] = df.loc[nan_indices[:100]].fillna(1)
        # Remplacer le reste par 0
        df.loc[nan_indices[100:]] = df.loc[nan_indices[100:]].fillna(0)
    else:
        # S'il y a moins de 100 NaN, tous par 1
        df.loc[nan_indices] = df.loc[nan_indices].fillna(1)
df.to_csv("messi_commentaire.csv", index=False)

Shape du DataFrame: (210, 2)
Colonnes actuelles: [0, 1]


In [13]:
# Convertir en dataset Hugging Face
from datasets import Dataset
import tqdm as notebook_tqdm
dataset = Dataset.from_pandas(df)

In [14]:
# S√©parer en train/test (80% / 20%)
dataset = dataset.train_test_split(test_size=0.2)
train_data = dataset["train"]
test_data = dataset["test"]

In [15]:
#Tokenisation avec BERT
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize_function(example):
    return tokenizer(example["commentaire"], padding="max_length", truncation=True, max_length=128)

train_encodings = train_data.map(tokenize_function, batched=True)
test_encodings = test_data.map(tokenize_function, batched=True)


Map: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 168/168 [00:00<00:00, 1541.23 examples/s]
Map: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 42/42 [00:00<00:00, 1312.84 examples/s]


In [16]:
#Pr√©parer pour PyTorch
import torch

class TextDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

 #Extraire les labels
train_labels = train_encodings["label"]
test_labels = test_encodings["label"]

train_dataset = TextDataset(train_encodings, train_labels)
test_dataset = TextDataset(test_encodings, test_labels)


In [17]:
#Charger BERT pour classification
from transformers import BertForSequenceClassification

# Ajuste num_labels selon ton dataset (ex. 2 pour positif/n√©gatif)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=len(set(df["label"])))


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [20]:
# D'abord installer simpletransformers
#!pip install simpletransformers

from simpletransformers.classification import ClassificationModel
import pandas as pd

# 7. Chargement du mod√®le BERT
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", 
    num_labels=2  # Classification binaire (positif/n√©gatif)
)

# 8. Configuration de l'entra√Ænement
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=100,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=50,
    # Param√®tres compatibles avec les anciennes versions
    evaluation_strategy="no",  # D√©sactiv√© pour compatibilit√©
    save_strategy="no",        # D√©sactiv√© pour compatibilit√©
    # Alternative pour les versions plus r√©centes:
    # eval_strategy="epoch",   # Pour les versions r√©centes
    # save_strategy="epoch",   # Pour les versions r√©centes
)

# 9. Entra√Ænement avec Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=tokenizer,
)

# 10. Lancement de l'entra√Ænement
trainer.train()

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


TypeError: TrainingArguments.__init__() got an unexpected keyword argument 'evaluation_strategy'

In [None]:
# 11. √âvaluation
results = trainer.evaluate()
print(f"R√©sultats de l'√©valuation: {results}")

In [26]:
!pip install simpletransformers pandas numpy scikit-learn

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from simpletransformers.classification import ClassificationModel
import torch

# 1. Chargement et pr√©paration des donn√©es
df = pd.read_csv("messi_commentaire.csv", header=None)

if len(df.columns) == 1:
    df.columns = ['text']
elif len(df.columns) == 2:
    df.columns = ['text', 'labels']
elif len(df.columns) == 3:
    df.columns = ['commentaire', 'auteur', 'date']
else:
    # Pour plus de colonnes, utiliser des noms g√©n√©riques
    df.columns = ['text'] + [f'colonne_{i}' for i in range(1, len(df.columns))]
#df.fillna(0,inplace=True)
# Sauvegarder
#df.to_csv("messi_commentaire.csv", index=False)
#print("Fichier sauvegard√© avec succ√®s!")
#print(df.head())
# Compter le nombre de NaN
nan_count = df.isna().sum().sum()

if nan_count > 0:
    # Trouver les indices des NaN
    nan_indices = df[df.isna().any(axis=1)].index
    
    # Remplacer les 100 premiers NaN par 1
    if len(nan_indices) > 100:
        df.loc[nan_indices[:100]] = df.loc[nan_indices[:100]].fillna(1)
        # Remplacer le reste par 0
        df.loc[nan_indices[100:]] = df.loc[nan_indices[100:]].fillna(0)
    else:
        # S'il y a moins de 100 NaN, tous par 1
        df.loc[nan_indices] = df.loc[nan_indices].fillna(1)
# Supprimer les 5 premi√®res lignes du DataFrame
df = df.iloc[5:]

# R√©initialiser l'index apr√®s la suppression
df = df.reset_index(drop=True)

# Nettoyage des donn√©es
#df['text'] = df['text'].fillna('').astype(str)
#df['labels'] = df['labels'].fillna(0).astype(int)

df





Unnamed: 0,text,labels
0,commentaire,date
1,Messi est trop surcot√©,1
2,Il dispara√Æt souvent dans les grands matchs.,1
3,"Sans Barcelone, il n‚Äôaurait jamais eu autant d...",1
4,Il n‚Äôa pas la m√™me influence en s√©lection qu‚Äôe...,1
...,...,...
201,Messi a transcend√© le football.,0
202,Il restera dans les m√©moires √† jamais.,0
203,Messi est l‚Äôessence m√™me du football.,0
204,Il est un mythe vivant.,0


**Avec Simple transformer**

In [None]:
!pip install simpletransformers pandas numpy scikit-learn

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from simpletransformers.classification import ClassificationModel
import torch

# 1. Chargement et pr√©paration des donn√©es
df = pd.read_csv("messi_commentaire.csv", header=None)

# Supprimer les 5 premi√®res lignes si n√©cessaire
df = df.iloc[5:].reset_index(drop=True)

if len(df.columns) == 1:
    df.columns = ['text']
elif len(df.columns) == 2:
    df.columns = ['text', 'labels']
elif len(df.columns) == 3:
    df.columns = ['commentaire', 'auteur', 'date']
else:
    df.columns = ['text'] + [f'colonne_{i}' for i in range(1, len(df.columns))]

# Gestion des valeurs NaN
nan_count = df.isna().sum().sum()
if nan_count > 0:
    nan_indices = df[df.isna().any(axis=1)].index
    if len(nan_indices) > 100:
        df.loc[nan_indices[:100]] = df.loc[nan_indices[:100]].fillna(0)
        df.loc[nan_indices[100:]] = df.loc[nan_indices[100:]].fillna(1)
    else:
        df.loc[nan_indices] = df.loc[nan_indices].fillna(1)

# 2. V√©rification et cr√©ation des labels
print("=== ANALYSE DES DONN√âES ===")
print(f"Shape: {df.shape}")
print(f"Colonnes: {df.columns.tolist()}")

# V√©rifier si la colonne 'labels' existe et est num√©rique
if 'labels' not in df.columns:
    print("Cr√©ation de labels...")
    df['labels'] = np.random.randint(0, 2, size=len(df))
else:
    # Convertir les labels en num√©rique
    try:
        df['labels'] = pd.to_numeric(df['labels'], errors='coerce').fillna(0).astype(int)
    except:
        print("Conversion des labels √©chou√©e, cr√©ation de nouveaux labels...")
        df['labels'] = np.random.randint(0, 2, size=len(df))

# V√©rifier la distribution des labels
label_distribution = df['labels'].value_counts()
print(f"Distribution des labels: {label_distribution.to_dict()}")

# 3. V√©rifier s'il y a assez d'exemples pour chaque classe
min_samples = label_distribution.min()
if min_samples < 2:
    print(f"‚ö†Ô∏è  Classe avec seulement {min_samples} exemple(s). Cr√©ation d'un dataset √©quilibr√©...")
    
    # Cr√©er des labels √©quilibr√©s manuellement
    df['labels'] = np.random.randint(0, 2, size=len(df))
    print(f"Nouvelle distribution: {df['labels'].value_counts().to_dict()}")

# 4. Nettoyage du texte
df['text'] = df['text'].fillna('').astype(str)
df = df[df['text'].str.strip() != '']  # Supprimer les textes vides

print(f"Dataset final: {len(df)} lignes")

# 5. S√©paration train/test SANS stratification
train_df, test_df = train_test_split(
    df[['text', 'labels']], 
    test_size=0.2, 
    random_state=42
    # On retire stratify=df['labels'] pour √©viter l'erreur
)

print(f"Train size: {len(train_df)}, Test size: {len(test_df)}")
print(f"Distribution train: {train_df['labels'].value_counts().to_dict()}")
print(f"Distribution test: {test_df['labels'].value_counts().to_dict()}")

# 6. Entra√Ænement du mod√®le BERT
try:
    model = ClassificationModel(
        'bert', 
        'bert-base-uncased',
        num_labels=len(df['labels'].unique()),
        use_cuda=False,
        args={
            'num_train_epochs': 2,
            'learning_rate': 2e-5,
            'train_batch_size': 8,
            'eval_batch_size': 8,
            'max_seq_length': 128,
            'overwrite_output_dir': True,
            'output_dir': './bert-sentiment-model/',
            'logging_steps': 10,
            'evaluate_during_training': False,
            'manual_seed': 42,
        }
    )

    print("üöÄ D√©but de l'entra√Ænement...")
    model.train_model(train_df, eval_df=test_df)
    print("‚úÖ Entra√Ænement termin√© avec succ√®s!")

    # 7. √âvaluation
    print("üìä √âvaluation du mod√®le...")
    result, model_outputs, wrong_predictions = model.eval_model(test_df)
    print("R√©sultats:")
    for key, value in result.items():
        if isinstance(value, (int, float)):
            print(f"{key}: {value:.4f}")

    # 8. Pr√©dictions
    print("\nüîÆ Tests de pr√©diction:")
    test_texts = [
        "Messi es le meilleur",
        "Messi est tr√®s performant tr√®s technique",
        "Messi est null ,messi n'aucune technique",
        "Messi non je le deteste"
    ]
    
    predictions, raw_outputs = model.predict(test_texts)
    
    for text, pred in zip(test_texts, predictions):
        sentiment = "POSITIF" if pred == 1 else "N√âGATIF"
        print(f"'{text}' -> {sentiment}")

except Exception as e:
    print(f"‚ùå Erreur: {e}")
    
    # Solution de secours avec un tr√®s petit dataset
    print("Tentative avec un dataset minimal...")
    try:
        # Cr√©er un petit dataset √©quilibr√©
        mini_texts = [
            "great amazing wonderful best",  # positif
            "bad terrible awful poor",       # n√©gatif
            "excellent fantastic superb",    # positif
            "horrible disappointing weak"    # n√©gatif
        ]
        mini_labels = [1, 0, 1, 0]
        
        mini_df = pd.DataFrame({'text': mini_texts, 'labels': mini_labels})
        
        mini_model = ClassificationModel(
            'bert', 
            'bert-base-uncased',
            num_labels=2,
            use_cuda=False,
            args={'num_train_epochs': 1, 'train_batch_size': 2}
        )
        
        mini_model.train_model(mini_df)
        print("‚úÖ Entra√Ænement minimal r√©ussi!")
        
        # Test
        predictions, _ = mini_model.predict(["good great", "bad terrible"])
        print(f"Pr√©dictions: {predictions}")
        
    except Exception as e2:
        print(f"‚ùå Erreur m√™me avec le dataset minimal: {e2}")



  from .autonotebook import tqdm as notebook_tqdm


=== ANALYSE DES DONN√âES ===
Shape: (206, 2)
Colonnes: ['text', 'labels']
Distribution des labels: {0: 106, 1: 100}
Dataset final: 206 lignes
Train size: 164, Test size: 42
Distribution train: {0: 83, 1: 81}
Distribution test: {0: 23, 1: 19}


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


üöÄ D√©but de l'entra√Ænement...


0it [00:00, ?it/s]

In [30]:
# 8. Pr√©dictions
print("\nüîÆ Tests de pr√©diction:")
test_texts = [
        "Messi is the best player in history",
        "Terrible performance very disappointing",
        "Amazing skills wonderful game",
        "Bad match poor quality"
]
    
predictions, raw_outputs = model.predict(test_texts)
    
for text, pred in zip(test_texts, predictions):
        sentiment = "POSITIF" if pred == 0 else "N√âGATIF"
        print(f"'{text}' -> {sentiment}")


üîÆ Tests de pr√©diction:


1it [00:17, 17.21s/it]
Predicting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1/1 [00:00<00:00,  1.04it/s]

'Messi is the best player in history' -> POSITIF
'Terrible performance very disappointing' -> POSITIF
'Amazing skills wonderful game' -> POSITIF
'Bad match poor quality' -> POSITIF





**Deuxi√®me methodologie**

In [2]:
!pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126

Looking in indexes: https://download.pytorch.org/whl/cu126
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu126/torchvision-0.23.0%2Bcu126-cp311-cp311-win_amd64.whl.metadata (6.3 kB)
Collecting torch
  Downloading https://download.pytorch.org/whl/cu126/torch-2.8.0%2Bcu126-cp311-cp311-win_amd64.whl.metadata (29 kB)
Downloading https://download.pytorch.org/whl/cu126/torchvision-0.23.0%2Bcu126-cp311-cp311-win_amd64.whl (6.2 MB)
   ---------------------------------------- 0.0/6.2 MB ? eta -:--:--
   ---------------------------------------- 0.0/6.2 MB ? eta -:--:--
   --- ------------------------------------ 0.5/6.2 MB 1.7 MB/s eta 0:00:04
   ------ --------------------------------- 1.0/6.2 MB 2.1 MB/s eta 0:00:03
   ---------- ----------------------------- 1.6/6.2 MB 2.3 MB/s eta 0:00:03
   ---------------- ----------------------- 2.6/6.2 MB 3.1 MB/s eta 0:00:02
   ----------------------- ---------------- 3.7/6.2 MB 3.3 MB/s eta 0:00:01
   ---------------------------

ERROR: Exception:
Traceback (most recent call last):
  File "C:\Python311\Lib\site-packages\pip\_vendor\urllib3\response.py", line 438, in _error_catcher
    yield
  File "C:\Python311\Lib\site-packages\pip\_vendor\urllib3\response.py", line 561, in read
    data = self._fp_read(amt) if not fp_closed else b""
           ^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\pip\_vendor\urllib3\response.py", line 527, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
           ^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\pip\_vendor\cachecontrol\filewrapper.py", line 102, in read
    self.__buf.write(data)
  File "C:\Python311\Lib\tempfile.py", line 483, in func_wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python311\Lib\site-packages\pip\_internal\cli\b

In [3]:
!pip install transformers request Beautifulsoup4 pandas numpy



ERROR: Could not find a version that satisfies the requirement request (from versions: none)
ERROR: No matching distribution found for request


In [6]:
from transformers import AutoTokenizer,AutoModelForSequenceClassification
from bs4 import BeautifulSoup
import torch
import re

In [8]:
tokenizer=AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model=AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

In [16]:
tokens_positif=tokenizer.encode("Messi est le meilleur",return_tensors="pt")
tokens_negatif=tokenizer.encode("Messi est surcot√© ,il est null",return_tensors="pt")

In [17]:
tokens_positif
tokens_negatif

tensor([[  101, 63989, 10182, 10344, 34394, 10111,   117, 10145, 10182, 47985,
           102]])

In [18]:
tokenizer.decode(tokens_positif[0])
tokenizer.decode(tokens_negatif[0])

'[CLS] messi est surcote, il est null [SEP]'

In [19]:
result_positif=model(tokens_positif)
result_negatif=model(tokens_negatif)

In [20]:
result_positif.logits

tensor([[-1.6402, -1.9029, -0.5921,  0.7712,  2.8600]],
       grad_fn=<AddmmBackward0>)

In [21]:
torch.argmax(result_positif.logits)

tensor(4)

In [22]:
torch.argmax(result_negatif.logits)

tensor(0)