# **Fine-tuning a BERT model to rate reviews (text-classification task)**

In [1]:
!pip install --upgrade transformers datasets evaluate huggingface_hub torch

Collecting transformers
  Downloading transformers-4.44.2-py3-none-any.whl.metadata (43 kB)
Collecting datasets
  Downloading datasets-2.21.0-py3-none-any.whl.metadata (21 kB)
Collecting evaluate
  Downloading evaluate-0.4.2-py3-none-any.whl.metadata (9.3 kB)
Collecting huggingface_hub
  Downloading huggingface_hub-0.24.6-py3-none-any.whl.metadata (13 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading transformers-4.44.2-py

### Import libraries:

In [4]:
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
import evaluate
import numpy as np


### Load the dataset:

In [None]:
dataset = load_dataset("yelp_review_full")

In [7]:
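# Print the second training example (index 1) and its label.
# Indexing the row first avoids loading the whole column into memory.
print(dataset['train'][1]['text'])
print(dataset['train'][1]['label'])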

Unfortunately, the frustration of being Dr. Goldberg's patient is a repeat of the experience I've had with so many other doctors in NYC -- good doctor, terrible staff.  It seems that his staff simply never answers the phone.  It usually takes 2 hours of repeated calling to get an answer.  Who has time for that or wants to deal with it?  I have run into this problem with many other doctors and I just don't get it.  You have office workers, you have patients with medical needs, why isn't anyone answering the phone?  It's incomprehensible and not work the aggravation.  It's with regret that I feel that I have to give Dr. Goldberg 2 stars.
1
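The printed label is `1` even though the review mentions 2 stars, because `yelp_review_full` stores ratings as zero-based class indices (0 to 4). A minimal sketch of the conversion follows; `label_to_stars` and `stars_to_label` are hypothetical helpers written for illustration and are not part of the `datasets` library.

```python
NUM_CLASSES = 5  # five rating classes, stored as labels 0 to 4


def label_to_stars(label: int) -> int:
    """Convert a zero-based class label into a 1-5 star rating."""
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label must be in [0, {NUM_CLASSES - 1}], got {label}")
    return label + 1


def stars_to_label(stars: int) -> int:
    """Convert a 1-5 star rating into a zero-based class label."""
    if not 1 <= stars <= NUM_CLASSES:
        raise ValueError(f"stars must be in [1, {NUM_CLASSES}], got {stars}")
    return stars - 1


# The example review above has label 1, which corresponds to 2 stars.
print(label_to_stars(1))
```

The same offset applies in reverse when interpreting model predictions: an `argmax` of 4 means a 5-star prediction.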


### Tokenize the data:

In [10]:

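# Use the same checkpoint as the model below so the vocabulary and casing match.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenizer_function(examples):
    # Pad or truncate every review to the model's maximum length (512 tokens for BERT).
    return tokenizer(examples["text"], padding="max_length", truncation=True)

# batched=True passes batches of rows to the function, which speeds up tokenization.
tokenized_datasets = dataset.map(tokenizer_function, batched=True)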



Map:   0%|          | 0/650000 [00:00<?, ? examples/s]

Map:   0%|          | 0/50000 [00:00<?, ? examples/s]
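With `padding="max_length"` and `truncation=True`, every review ends up with the same number of token IDs: shorter ones are padded and longer ones are cut. The toy function below illustrates the idea on plain lists of integers. It is only a sketch, not the tokenizer's actual implementation (the real tokenizer also takes care of special tokens); `pad_or_truncate` is a hypothetical helper, and the pad ID of 0 and the length of 8 are values assumed for illustration, whereas BERT's real limit is 512 tokens.

```python
def pad_or_truncate(token_ids, max_length=8, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = list(token_ids[:max_length])             # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad             # padding: fill the rest with pad_id
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# A short sequence is padded; a long one is truncated.
short_ids, short_mask = pad_or_truncate([101, 2023, 102])
long_ids, long_mask = pad_or_truncate(list(range(1, 13)))
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Padding everything to the maximum length keeps the code simple, at the cost of spending compute on padding tokens for short reviews.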

### Reduce dataset size for training and testing:

In [19]:
# Keep a small, reproducible random subset so training finishes quickly.
red_train_data = tokenized_datasets['train'].shuffle(seed=42).select(range(1000))
red_test_data = tokenized_datasets['test'].shuffle(seed=42).select(range(500))
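`shuffle(seed=42).select(range(1000))` draws a reproducible random subset: the seed fixes the permutation, and `select` keeps its first rows. The same idea in plain Python is sketched below. `sample_indices` is a hypothetical helper, and Python's `random` module will not reproduce the exact permutation that `datasets` generates; only the principle is the same.

```python
import random


def sample_indices(num_rows, subset_size, seed=42):
    """Return subset_size distinct row indices, reproducible for a given seed."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the order is repeatable
    return indices[:subset_size]          # keep the first subset_size shuffled rows


# Sizes taken from the notebook: 650,000 training rows and 50,000 test rows.
train_idx = sample_indices(650_000, 1000)
test_idx = sample_indices(50_000, 500)
print(len(train_idx), len(test_idx))
```

Because the subset is random rather than the first 1,000 rows, it is less likely to inherit any ordering present in the original file.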

### Fine-tune the BERT model on our dataset:

In [20]:
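metric = evaluate.load("accuracy")
# Five output labels, one per star rating (labels 0-4 correspond to 1-5 stars).
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=5)

def compute_metrics(eval_pred):
    # eval_pred unpacks into the model's logits and the reference labels.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Default hyperparameters; checkpoints are written to the "test_trainer" directory.
training_args = TrainingArguments(output_dir="test_trainer")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=red_train_data,
    eval_dataset=red_test_data,
    compute_metrics=compute_metrics,  # pass the function, not the metric object
)
trainer.train()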

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



TrainOutput(global_step=375, training_loss=1.6281943359375, metrics={'train_runtime': 305.1992, 'train_samples_per_second': 9.83, 'train_steps_per_second': 1.229, 'total_flos': 789354427392000.0, 'train_loss': 1.6281943359375, 'epoch': 3.0})
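The numbers in `TrainOutput` can be checked by hand. The sketch below assumes the `TrainingArguments` defaults of 3 epochs and a per-device batch size of 8 on a single device, which is consistent with the reported throughput (9.83 samples/s divided by 1.229 steps/s is about 8). It also compares the average training loss with ln 5, the cross-entropy of a uniform guess over five classes.

```python
import math

num_examples = 1000  # size of red_train_data
batch_size = 8       # assumed default per-device batch size, single device
num_epochs = 3       # matches 'epoch': 3.0 in the output

# Optimizer steps: one per batch, repeated for every epoch.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 375, matching global_step

# Throughput: examples processed divided by the reported train_runtime.
samples_per_second = num_examples * num_epochs / 305.1992
print(round(samples_per_second, 2))  # 9.83, matching train_samples_per_second

# Cross-entropy of guessing each of the 5 classes with probability 1/5.
chance_loss = math.log(5)
print(round(chance_loss, 4))  # 1.6094
print(1.6281943359375 > chance_loss)  # True
```

The average training loss (about 1.63) sits slightly above the uniform-guess value (about 1.61), which suggests the model learned little from this small subset with the default settings. Evaluating on `red_test_data` would be needed to confirm that.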

### Save the model to the Hugging Face Hub:

In [23]:
from huggingface_hub import login
login()



In [26]:
model.push_to_hub("bfz/bert_based_model_for_rating_reviews")

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/bfz/bert_based_model_for_rating_reviews/commit/eb2af20805cdd03a04a3bc0b5bbf9aece05095b9', commit_message='Upload BertForSequenceClassification', commit_description='', oid='eb2af20805cdd03a04a3bc0b5bbf9aece05095b9', pr_url=None, pr_revision=None, pr_num=None)
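`model.push_to_hub` uploads the model weights and configuration only. To let others load the checkpoint together with its tokenizer, the tokenizer can be pushed to the same repository. A minimal sketch, assuming `model` and `tokenizer` are the objects created earlier; `push_model_and_tokenizer` is a hypothetical helper, and the call at the bottom is left commented out because it needs a logged-in session.

```python
def push_model_and_tokenizer(model, tokenizer, repo_id):
    """Upload both artifacts to the same Hub repository.

    Relies only on the push_to_hub(repo_id) method that transformers
    models and tokenizers both provide.
    """
    model_commit = model.push_to_hub(repo_id)
    tokenizer_commit = tokenizer.push_to_hub(repo_id)
    return model_commit, tokenizer_commit


# Requires login() to have succeeded first:
# push_model_and_tokenizer(model, tokenizer, "bfz/bert_based_model_for_rating_reviews")
```

With both artifacts in one repository, `AutoTokenizer.from_pretrained` and `AutoModelForSequenceClassification.from_pretrained` can be pointed at the same repository name.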