## Fine-tune for Sentiment Analysis

As a first step, we install the specific libraries necessary to make this example work.

* accelerate is a distributed training library for PyTorch by HuggingFace. It lets you train models on multiple GPUs or CPUs in parallel (distributed configurations), which can significantly speed up training when multiple GPUs are available (we won't use it in our example).
* peft is a Python library by HuggingFace for efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. PEFT methods only fine-tune a small number of (extra) model parameters, thereby greatly decreasing the computational and storage costs.
* bitsandbytes, by Tim Dettmers, is a lightweight wrapper around custom CUDA functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions. It makes it possible to run models stored in 4-bit precision: the weights are stored in 4 bits, while the computation still happens in 16- or 32-bit precision, and any compute dtype can be chosen (float16, bfloat16, float32, and so on).
* transformers is a Python library for natural language processing (NLP). It provides a number of pre-trained models for NLP tasks such as text classification, question answering, and machine translation.
* trl is a full stack library by HuggingFace providing a set of tools to train transformer language models with Reinforcement Learning, from the Supervised Fine-tuning step (SFT), Reward Modeling step (RM) to the Proximal Policy Optimization (PPO) step.
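
To make the storage saving from low-bit weights concrete, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate only: `weight_memory_mb` is a hypothetical helper, the 66-million parameter count is an assumed round figure (roughly the size of a DistilBERT-class model), and real quantized checkpoints carry extra overhead such as quantization constants.

```python
def weight_memory_mb(n_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed to store the weights alone, in megabytes."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 1e6


n_params = 66_000_000  # assumed round figure, not measured in this notebook

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_mb(n_params, bits):8.1f} MB")
```

Going from 32-bit to 4-bit storage divides the weight footprint by eight; activations and optimizer state are not included in this estimate.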

## Installations and imports

In [None]:
!pip install -q -U "torch==2.1.2" tensorboard
!pip install -q -U git+https://github.com/huggingface/trl@a3c5b7178ac4f65569975efadc97db2f3749c65e
!pip install -q -U git+https://github.com/huggingface/peft@4a1559582281fc3c9283892caea8ccef1d6f5a4f
!pip install -q -U "transformers==4.36.2" "datasets==2.16.1" "accelerate==0.26.1" "bitsandbytes==0.42.0"

In [1]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [2]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
import pandas as pd
from tqdm import tqdm
import numpy as np
import torch
import torch.nn as nn
import transformers
from sklearn.metrics import (accuracy_score,
                             classification_report,
                             confusion_matrix)
from sklearn.model_selection import train_test_split


In [4]:
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification,
                          AutoTokenizer,
                          BitsAndBytesConfig,
                          Trainer,
                          TrainingArguments,
                          pipeline,
                          logging)

In [5]:
from peft import LoraConfig, PeftConfig
from datasets import Dataset
from trl import SFTTrainer
from trl import setup_chat_format


In [6]:
# AutoModelForSequenceClassification, AutoTokenizer, and LoraConfig are already imported above
from peft import get_peft_model


# Ensure everything is on CPU
device = torch.device("cpu")

# Load model and tokenizer, explicitly setting model to CPU
model_name = "distilbert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.to(device)  # Move model to CPU
tokenizer = AutoTokenizer.from_pretrained(model_name)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'pre_classifier.weight', 'classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [8]:
# Load dataset
filename = "all-data.csv"
df = pd.read_csv(filename, names=["sentiment", "text"], encoding="utf-8", encoding_errors="replace")

#Convert sentiment labels to numbers (required for training)
label_map = {"positive": 2, "neutral": 1, "negative": 0}
df["label"] = df["sentiment"].map(label_map)

# Ensure there are no missing labels
if df["label"].isnull().any():
    raise ValueError("Error: Some labels are missing from dataset mapping!")

# Split dataset into training, test, and evaluation sets
X_train, X_test = [], []
for sentiment in ["positive", "neutral", "negative"]:
    train, test = train_test_split(df[df.sentiment == sentiment], train_size=300, test_size=300, random_state=42)
    X_train.append(train)
    X_test.append(test)

# Shuffle the training rows but keep the original row indices for now
X_train = pd.concat(X_train).sample(frac=1, random_state=10)
X_test = pd.concat(X_test)

# Build the evaluation set from rows used in neither the training nor the test set.
# This relies on the original indices, so the training index is reset only afterwards.
used_idx = set(X_train.index) | set(X_test.index)
X_eval = df[~df.index.isin(used_idx)].groupby("sentiment", group_keys=False).apply(lambda x: x.sample(n=50, random_state=10, replace=True))
X_train = X_train.reset_index(drop=True)

# Create train/eval datasets
train_df = X_train[["text", "label"]]
eval_df = X_eval[["text", "label"]]
test_df = X_test[["text"]]

# Define Tokenization Function
def preprocess_function(examples):
    return tokenizer(examples["text"], padding=True, truncation=True, max_length=512)


# Convert pandas DataFrame to Hugging Face Dataset format
train_data = Dataset.from_pandas(train_df).map(preprocess_function, batched=True)
eval_data = Dataset.from_pandas(eval_df).map(preprocess_function, batched=True)

# Debug: Ensure dataset is not empty
print(f" Training set size: {len(train_data)}, Evaluation set size: {len(eval_data)}")
if len(train_data) == 0 or len(eval_data) == 0:
    raise ValueError(" Error: Training or Evaluation dataset is empty!")

Map: 100%|██████████| 900/900 [00:00<00:00, 6570.73 examples/s]
Map: 100%|██████████| 150/150 [00:00<00:00, 8350.97 examples/s]

 Training set size: 900, Evaluation set size: 150





In [9]:
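def evaluate(y_true, y_pred):
    # Accepts either sentiment strings or numeric class ids (0, 1, 2)
    mapping = {'positive': 2, 'neutral': 1, 'none': 1, 'negative': 0}
    def map_func(x):
        # Strings are mapped to ids (unknown strings default to neutral);
        # numeric ids pass through unchanged
        return mapping.get(x, 1) if isinstance(x, str) else int(x)

    y_true = np.vectorize(map_func)(y_true)
    y_pred = np.vectorize(map_func)(y_pred)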

    # Calculate accuracy
    accuracy = accuracy_score(y_true=y_true, y_pred=y_pred)
    print(f'Accuracy: {accuracy:.3f}')

    # Generate accuracy report
    unique_labels = set(y_true)  # Get unique labels

    for label in unique_labels:
        label_indices = [i for i in range(len(y_true))
                         if y_true[i] == label]
        label_y_true = [y_true[i] for i in label_indices]
        label_y_pred = [y_pred[i] for i in label_indices]
        accuracy = accuracy_score(label_y_true, label_y_pred)
        print(f'Accuracy for label {label}: {accuracy:.3f}')

    # Generate classification report
    class_report = classification_report(y_true=y_true, y_pred=y_pred)
    print('\nClassification Report:')
    print(class_report)

    # Generate confusion matrix
    conf_matrix = confusion_matrix(y_true=y_true, y_pred=y_pred, labels=[0, 1, 2])
    print('\nConfusion Matrix:')
    print(conf_matrix)

## Testing the model without fine-tuning

In [10]:
# Load Pretrained Model and Tokenizer for Classification
model_name = "distilbert-base-uncased"  # Alternatively, use "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Configure LoRA for Efficient Fine-Tuning
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    target_modules=[
        "q_lin",  # Query projection in DistilBERT
        "k_lin",  # Key projection in DistilBERT
        "v_lin",  # Value projection in DistilBERT
        "out_lin",  # Output projection from attention
        "ffn.lin1",  # First layer in feed-forward network
        "ffn.lin2",  # Second layer in feed-forward network
    ],
    task_type="SEQ_CLS",  # Sequence classification
)

# Apply LoRA to the model
model = get_peft_model(model, peft_config)
device = torch.device("cpu")  # Use CPU only
model.to(device)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'pre_classifier.weight', 'classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


'NoneType' object has no attribute 'cadam32bit_grad_fp32'


PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0-5): 6 x TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): lora.Linear(
                  (base_layer): Linear(in_features=768, out_features=768, bias=True)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.1, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=768

In [11]:
# Define sentiment label mapping
label_mapping = {0: "negative", 1: "neutral", 2: "positive"}

def predict_sentiment(test_df, model, tokenizer, mapping):
    """
    Perform inference using a fine-tuned model and return sentiment analysis labels.

    Args:
        test_df (DataFrame): DataFrame containing test text samples.
        model (transformers.PreTrainedModel): Fine-tuned model.
        tokenizer (transformers.PreTrainedTokenizer): Tokenizer for text preprocessing.
        mapping (dict): Mapping from numeric labels to sentiment strings.

    Returns:
        list: Predicted sentiment labels as strings.
    """
    model.eval()  # Set model to evaluation mode
    y_pred = []

    for text in tqdm(test_df["text"]):
        # Tokenize input text and convert to tensors, ensuring it's on CPU
        inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
        # Move all tensors to CPU explicitly
        for key in inputs:
            inputs[key] = inputs[key].to(torch.device("cpu"))
        # Disable gradient calculation for efficient inference
        with torch.no_grad():
            outputs = model(**inputs)
        logits = outputs.logits.to("cpu")
        # Get the predicted numeric label
        numeric_label = torch.argmax(logits, dim=1).item()
        # Map the numeric label to sentiment string
        sentiment_label = mapping[numeric_label]
        y_pred.append(sentiment_label)

    return y_pred


y_pred_sentiment = predict_sentiment(X_test, model, tokenizer, label_mapping)
print("Predicted sentiment labels:")
print(y_pred_sentiment)

100%|██████████| 900/900 [01:22<00:00, 10.93it/s]

Predicted sentiment labels:
['positive', 'positive', 'positive', 'negative', 'positive', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'positive', 'positive', 'negative', 'positive', 'positive', 'positive', 'negative', 'negative', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'positive', 'negative', 'positive',
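
The last step of the prediction function, turning the model's logits into a sentiment string, can be illustrated without a model. The sketch below is plain Python with made-up logits; `softmax` and `logits_to_label` are hypothetical helpers, not part of transformers. It also shows why `argmax` can be taken on the raw logits: softmax preserves their ordering.

```python
import math

ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for one sentence, ordered negative / neutral / positive
label, prob = logits_to_label([-0.3, 0.1, 1.2])
print(label, round(prob, 3))
```

Before fine-tuning, the classification head is randomly initialized, so these probabilities carry no real signal.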




In [12]:
# Define sentiment label mapping
label_mapping = {"negative": 0, "neutral": 1, "positive": 2}

# Convert predicted sentiment labels (strings) to numerical labels
y_pred_numeric = np.array([label_mapping[label] for label in y_pred_sentiment])

# Convert true labels to numpy array
y_true = np.array(X_test["label"].values)

# Ensure lengths match (to avoid dimension mismatches)
if len(y_true) != len(y_pred_numeric):
    print(f"⚠️ Warning: Mismatched lengths! y_true: {len(y_true)}, y_pred: {len(y_pred_numeric)}")
    min_length = min(len(y_true), len(y_pred_numeric))
    y_true = y_true[:min_length]
    y_pred_numeric = y_pred_numeric[:min_length]

# Compute overall accuracy
accuracy = accuracy_score(y_true, y_pred_numeric)
print(f"\n Overall Accuracy: {accuracy:.3f}")

# Print classification report
print("\n Classification Report:")
print(classification_report(y_true, y_pred_numeric, target_names=["negative", "neutral", "positive"]))

# Print confusion matrix
print("\n Confusion Matrix:")
print(confusion_matrix(y_true, y_pred_numeric, labels=[0, 1, 2]))

# Compute accuracy for each label
id_to_name = {v: k for k, v in label_mapping.items()}
for label in np.unique(y_true):
    mask = y_true == label  # Boolean mask selecting samples whose true label is `label`
    label_accuracy = accuracy_score(y_true[mask], y_pred_numeric[mask])
    print(f" Accuracy for label {label} ({id_to_name[label]}): {label_accuracy:.3f}")


 Overall Accuracy: 0.310

 Classification Report:
              precision    recall  f1-score   support

    negative       0.28      0.20      0.23       300
     neutral       0.00      0.00      0.00       300
    positive       0.32      0.73      0.44       300

    accuracy                           0.31       900
   macro avg       0.20      0.31      0.23       900
weighted avg       0.20      0.31      0.23       900


 Confusion Matrix:
[[ 59   0 241]
 [ 71   0 229]
 [ 80   0 220]]
 Accuracy for label 0 (negative): 0.197
 Accuracy for label 1 (neutral): 0.000
 Accuracy for label 2 (positive): 0.733
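
The per-label accuracies printed above are the per-class recall values, and they can be read directly off the confusion matrix: each is the diagonal entry divided by its row sum. A small plain-Python check using the matrix printed above (the helper names are illustrative):

```python
# Rows are true classes, columns are predicted classes (negative, neutral, positive)
conf_matrix = [
    [59, 0, 241],  # true negative
    [71, 0, 229],  # true neutral
    [80, 0, 220],  # true positive
]


def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correct predictions (the diagonal) over all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


print([round(a, 3) for a in per_class_accuracy(conf_matrix)])  # [0.197, 0.0, 0.733]
print(round(overall_accuracy(conf_matrix), 3))                 # 0.31
```

The empty middle column shows that the untuned model never predicts "neutral", which is why overall accuracy sits near the 1-in-3 chance level.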


## Fine-tuning

In the next cell we set everything up for fine-tuning. We configure and initialize a trainer that fine-tunes the model with the Parameter-Efficient Fine-Tuning (PEFT) method, which should save time because it updates far fewer parameters than the model contains. PEFT trains a small set of (additional) model parameters while keeping most of the pre-trained parameters frozen, which significantly reduces both computational and storage costs. This strategy also mitigates catastrophic forgetting, which often occurs during full fine-tuning of LLMs.

PEFTConfig:

The peft_config object specifies the parameters for PEFT. The following are some of the most important parameters:

* lora_alpha: The scaling factor for the LoRA update matrices. The low-rank update is multiplied by lora_alpha / r before it is added to the frozen weights.
* lora_dropout: The dropout probability for the LoRA update matrices.
* r: The rank of the LoRA update matrices.
* bias: Which bias parameters are trained. The possible values are none, all, and lora_only.
* task_type: The type of task that the model is being trained for. Possible values include SEQ_CLS (used in this notebook), CAUSAL_LM, SEQ_2_SEQ_LM, and TOKEN_CLS.
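
A rough count shows how few parameters these settings add. The sketch below assumes the standard LoRA factorization (a rank-r pair of matrices per adapted linear layer) and uses the values from this notebook (r=8, lora_alpha=16, hidden size 768, 6 transformer blocks). The feed-forward width of 3072 is an assumption based on the usual DistilBERT configuration, and `lora_param_count` is a hypothetical helper. The classification head, which is also trained for SEQ_CLS tasks, is not counted.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


r, lora_alpha = 8, 16
scaling = lora_alpha / r  # factor applied to the low-rank update B @ A

dim, hidden_dim, n_layers = 768, 3072, 6  # hidden_dim is an assumed value

per_layer = (
    4 * lora_param_count(dim, dim, r)       # q_lin, k_lin, v_lin, out_lin
    + lora_param_count(dim, hidden_dim, r)  # ffn.lin1
    + lora_param_count(hidden_dim, dim, r)  # ffn.lin2
)
total_lora = n_layers * per_layer

print(f"scaling = {scaling}")
print(f"one {dim}x{dim} projection: {lora_param_count(dim, dim, r)} LoRA params "
      f"vs {dim * dim} frozen weights")
print(f"LoRA params across {n_layers} layers: {total_lora}")
```

Each 768x768 projection gains 12,288 trainable parameters next to 589,824 frozen ones, about 2% of the layer.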

TrainingArguments:

The training_arguments object specifies the parameters for training the model. The following are some of the most important parameters:

* output_dir: The directory where the training logs and checkpoints will be saved.
* num_train_epochs: The number of epochs to train the model for.
* per_device_train_batch_size: The number of samples in each batch on each device.
* gradient_accumulation_steps: The number of batches to accumulate gradients before updating the model parameters.
* optim: The optimizer to use for training the model.
* save_steps: The number of steps after which to save a checkpoint.
* logging_steps: The number of steps after which to log the training metrics.
* learning_rate: The learning rate for the optimizer.
* weight_decay: The weight decay parameter for the optimizer.
* fp16: Whether to use 16-bit floating-point precision.
* bf16: Whether to use BFloat16 precision.
* max_grad_norm: The maximum gradient norm.
* max_steps: The maximum number of steps to train the model for.
* warmup_ratio: The proportion of the training steps to use for warming up the learning rate.
* group_by_length: Whether to group the training samples by length.
* lr_scheduler_type: The type of learning rate scheduler to use.
* report_to: The tools to report the training metrics to.
* evaluation_strategy: The strategy for evaluating the model during training.

SFTTrainer:

The SFTTrainer is a custom trainer class from the TRL library. It is used to train large language models (also using the PEFT method).

The SFTTrainer object is initialized with the following arguments:

* model: The model to be trained.
* train_dataset: The training dataset.
* eval_dataset: The evaluation dataset.
* peft_config: The PEFT configuration.
* dataset_text_field: The name of the text field in the dataset.
* tokenizer: The tokenizer to use.
* args: The training arguments.
* packing: Whether to pack the training samples.
* max_seq_length: The maximum sequence length.

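Once the trainer object is initialized, calling its train() method trains the model. The cell below uses the standard Trainer class from transformers rather than SFTTrainer, because the LoRA adapters were already attached to the model with get_peft_model; it takes the model, args, train_dataset, eval_dataset, and tokenizer arguments.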

In [13]:
output_dir = "./model_output"  # Define a directory for model output

training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    gradient_checkpointing=False,
    optim="adamw_torch",
    save_steps=100,  # Save every 100 steps to show progress
    logging_steps=10,  # Log progress every 10 steps
    logging_dir="./logs",  # Directory for logs
    logging_first_step=True,  # Log first step
    logging_strategy="steps",  # Log progress every few steps
    learning_rate=2e-5,
    weight_decay=0.01,
    max_grad_norm=1.0,
    warmup_ratio=0.1,
    group_by_length=False,
    lr_scheduler_type="linear",
    report_to="tensorboard",
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_arguments,
    train_dataset=train_data,
    eval_dataset=eval_data,
    tokenizer=tokenizer,
)

# Train with built-in progress bar
trainer.train()


  0%|          | 0/168 [00:00<?, ?it/s]You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
  1%|          | 1/168 [00:08<24:04,  8.65s/it]

{'loss': 1.0868, 'learning_rate': 1.1764705882352942e-06, 'epoch': 0.02}


  6%|▌         | 10/168 [00:11<01:03,  2.49it/s]

{'loss': 1.1052, 'learning_rate': 1.1764705882352942e-05, 'epoch': 0.18}


 12%|█▏        | 20/168 [00:14<00:43,  3.41it/s]

{'loss': 1.1, 'learning_rate': 1.960264900662252e-05, 'epoch': 0.35}


 18%|█▊        | 30/168 [00:17<00:40,  3.42it/s]

{'loss': 1.1044, 'learning_rate': 1.827814569536424e-05, 'epoch': 0.53}


 24%|██▍       | 40/168 [00:20<00:37,  3.42it/s]

{'loss': 1.0912, 'learning_rate': 1.6953642384105963e-05, 'epoch': 0.71}


 30%|██▉       | 50/168 [00:23<00:33,  3.49it/s]

{'loss': 1.0857, 'learning_rate': 1.5629139072847682e-05, 'epoch': 0.88}


                                                
 34%|███▍      | 57/168 [00:28<02:17,  1.24s/it]

{'eval_loss': 1.0791126489639282, 'eval_runtime': 1.5431, 'eval_samples_per_second': 97.207, 'eval_steps_per_second': 12.313, 'epoch': 0.99}


 36%|███▌      | 60/168 [00:29<01:05,  1.64it/s]

{'loss': 1.0804, 'learning_rate': 1.4304635761589404e-05, 'epoch': 1.06}


 42%|████▏     | 70/168 [00:32<00:29,  3.37it/s]

{'loss': 1.0749, 'learning_rate': 1.2980132450331127e-05, 'epoch': 1.24}


 48%|████▊     | 80/168 [00:34<00:25,  3.51it/s]

{'loss': 1.0762, 'learning_rate': 1.1655629139072849e-05, 'epoch': 1.42}


 54%|█████▎    | 90/168 [00:37<00:22,  3.53it/s]

{'loss': 1.0628, 'learning_rate': 1.033112582781457e-05, 'epoch': 1.59}


 60%|█████▉    | 100/168 [00:40<00:19,  3.53it/s]

{'loss': 1.0673, 'learning_rate': 9.006622516556293e-06, 'epoch': 1.77}


 65%|██████▌   | 110/168 [00:44<00:17,  3.31it/s]

{'loss': 1.0622, 'learning_rate': 7.682119205298014e-06, 'epoch': 1.95}


                                                 
 67%|██████▋   | 113/168 [00:46<00:15,  3.65it/s]

{'eval_loss': 1.055053472518921, 'eval_runtime': 0.79, 'eval_samples_per_second': 189.883, 'eval_steps_per_second': 24.052, 'epoch': 2.0}


 71%|███████▏  | 120/168 [00:48<00:17,  2.78it/s]

{'loss': 1.0561, 'learning_rate': 6.357615894039736e-06, 'epoch': 2.12}


 77%|███████▋  | 130/168 [00:51<00:11,  3.33it/s]

{'loss': 1.0459, 'learning_rate': 5.033112582781458e-06, 'epoch': 2.3}


 83%|████████▎ | 140/168 [00:54<00:07,  3.50it/s]

{'loss': 1.0433, 'learning_rate': 3.708609271523179e-06, 'epoch': 2.48}


 89%|████████▉ | 150/168 [00:57<00:05,  3.52it/s]

{'loss': 1.0484, 'learning_rate': 2.384105960264901e-06, 'epoch': 2.65}


 95%|█████████▌| 160/168 [01:00<00:02,  3.51it/s]

{'loss': 1.0451, 'learning_rate': 1.0596026490066227e-06, 'epoch': 2.83}


                                                 
100%|██████████| 168/168 [01:03<00:00,  2.65it/s]

{'eval_loss': 1.043080449104309, 'eval_runtime': 0.7851, 'eval_samples_per_second': 191.068, 'eval_steps_per_second': 24.202, 'epoch': 2.97}
{'train_runtime': 63.4989, 'train_samples_per_second': 42.52, 'train_steps_per_second': 2.646, 'train_loss': 1.0705707640874953, 'epoch': 2.97}





TrainOutput(global_step=168, training_loss=1.0705707640874953, metrics={'train_runtime': 63.4989, 'train_samples_per_second': 42.52, 'train_steps_per_second': 2.646, 'train_loss': 1.0705707640874953, 'epoch': 2.97})
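
The step count and learning rates in the log above follow from the training arguments. The sketch below reproduces them in plain Python for this notebook's settings (900 training samples, batch size 8, 2 gradient accumulation steps, 3 epochs, warmup_ratio 0.1, learning rate 2e-5, linear scheduler). The rounding rules are an assumption about how the pinned transformers version derives these values, chosen because they match the logged numbers; other versions may differ. Both helper functions are hypothetical.

```python
import math


def total_update_steps(n_samples, batch_size, grad_accum, epochs):
    """Optimizer updates for a run: batches per epoch, grouped by accumulation."""
    batches_per_epoch = math.ceil(n_samples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)


def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


total = total_update_steps(n_samples=900, batch_size=8, grad_accum=2, epochs=3)
warmup = math.ceil(total * 0.1)
print(total, warmup)  # 168 17

for step in (1, 10, 20, 160):
    print(step, linear_schedule_lr(step, 2e-5, warmup, total))
```

With 17 warmup steps, the rate peaks at 2e-5 between the logged steps 10 and 20 and then decays linearly to zero at step 168.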

In [14]:
model = model.cpu()  # This should move all parameters to CPU

# Optionally, verify that all parameters are on CPU:
for name, param in model.named_parameters():
    if param.device.type != "cpu":
        print(f"Parameter {name} is on {param.device}")

os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # os is already imported above

# Manual inference function (fully CPU-only execution)
def predict(test_df, model, tokenizer):
    """
    Perform inference using a fine-tuned model, ensuring full CPU execution.

    Args:
        test_df (DataFrame): DataFrame containing test text samples.
        model (transformers.PreTrainedModel): Fine-tuned model.
        tokenizer (transformers.PreTrainedTokenizer): Tokenizer for text preprocessing.

    Returns:
        list: Predicted sentiment labels.
    """
    model.eval()  # Set model to evaluation mode
    y_pred = []

    for text in tqdm(test_df["text"]):
        # Tokenize input text and convert to tensors, ensuring it's on CPU
        inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

        # Move all tensors to CPU explicitly
        for key in inputs:
            inputs[key] = inputs[key].to(torch.device("cpu"))

        # Disable gradient calculation for efficient inference
        with torch.no_grad():
            outputs = model(**inputs)

        logits = outputs.logits.to("cpu")  # Ensure logits are on CPU
        predicted_label = torch.argmax(logits, dim=1).item()  # Get highest probability label
        y_pred.append(predicted_label)

    return y_pred

In [15]:
# Run inference on CPU
y_pred = predict(X_test, model, tokenizer)

100%|██████████| 900/900 [01:24<00:00, 10.63it/s]


In [None]:
evaluate(y_true, y_pred)


Accuracy: 0.847
Accuracy for label 0: 0.890
Accuracy for label 1: 0.870
Accuracy for label 2: 0.780

Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.89      0.92       300
           1       0.73      0.87      0.79       300
           2       0.88      0.78      0.83       300

    accuracy                           0.85       900
   macro avg       0.86      0.85      0.85       900
weighted avg       0.86      0.85      0.85       900


Confusion Matrix:
[[267  31   2]
 [ 10 261  29]
 [  1  65 234]]





## Save Model

In [18]:
# Save model
model.save_pretrained("sentiment_model")

# Save tokenizer (use the correct tokenizer class)
tokenizer.save_pretrained("sentiment_model")

('sentiment_model/tokenizer_config.json',
 'sentiment_model/special_tokens_map.json',
 'sentiment_model/vocab.txt',
 'sentiment_model/added_tokens.json',
 'sentiment_model/tokenizer.json')