<a href="https://colab.research.google.com/github/Rmostert/LLM-use-cases/blob/main/analyzing_car_reviews_with_llms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook I use a variety of pre-trained Hugging Face LLMs to carry out a series of tasks on car reviews: classifying the sentiment of a review, translating text, answering a customer question, and summarizing text.


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%pip install evaluate

Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.6


In [None]:
# Import necessary packages
import pandas as pd
import torch
import evaluate
from transformers import logging, pipeline
logging.set_verbosity(logging.WARNING)

In [None]:
# Read in the review data

car_reviews = pd.read_csv('/content/drive/MyDrive/datasets/car_reviews.csv',sep=';')
display(car_reviews)

Unnamed: 0,Review,Class
0,I am very satisfied with my 2014 Nissan NV SL....,POSITIVE
1,The car is fine. It's a bit loud and not very ...,NEGATIVE
2,"My first foreign car. Love it, I would buy ano...",POSITIVE
3,I've come across numerous reviews praising the...,NEGATIVE
4,I've been dreaming of owning an SUV for quite ...,POSITIVE


The first task is to classify reviews as positive or negative (sentiment analysis). On the Hugging Face [website](https://huggingface.co/models?sort=trending&search=Helsinki-NLP%2Fopus-mt-es-en) there are over 1500 models that can be reused for different tasks, such as image classification, sentiment analysis, text summarization or translation (reusing a pre-trained model in this way is what we would call transfer learning). I'll be using the DistilBERT base uncased finetuned SST-2 model. Details about this model, including its evaluation results (for benchmarking), can be found [here](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).

In [None]:
sentiment_classifier = pipeline(task='sentiment-analysis',model='distilbert-base-uncased-finetuned-sst-2-english')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu


In [None]:
# The pipeline expects a list of inputs
reviews = car_reviews['Review'].to_list()

In [None]:
real_labels = car_reviews['Class'].to_list()

In [None]:
# The accuracy and F1 metrics expect integer labels: map 'POSITIVE' to 1 and everything else to 0
references = [1 if label == "POSITIVE" else 0 for label in real_labels]

In [None]:
predicted_labels = sentiment_classifier(reviews)

In [None]:
predicted_labels

[{'label': 'POSITIVE', 'score': 0.9293978214263916},
 {'label': 'POSITIVE', 'score': 0.8654265403747559},
 {'label': 'POSITIVE', 'score': 0.9994640946388245},
 {'label': 'NEGATIVE', 'score': 0.9935314059257507},
 {'label': 'POSITIVE', 'score': 0.9986565113067627}]

In [None]:
predictions = [1 if item['label'] == "POSITIVE" else 0 for item in predicted_labels]

In [None]:
accuracy = evaluate.load('accuracy')
f1 = evaluate.load('f1')

In [None]:
accuracy_result = accuracy.compute(predictions=predictions, references=references)

In [None]:
print(accuracy_result)

{'accuracy': 0.8}


In [None]:
f1_result = f1.compute(predictions=predictions, references=references)

In [None]:
print(f1_result)

{'f1': 0.8571428571428571}
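
To see where these two numbers come from, the metrics can be recomputed by hand. The sketch below does not call the classifier; it hard-codes the five true and predicted labels shown in the outputs above, and the variable names (`refs`, `preds`, `manual_accuracy`, `manual_f1`) are my own illustrative choices.

```python
# Labels copied from the notebook outputs above (1 = POSITIVE, 0 = NEGATIVE).
refs = [1, 0, 1, 0, 1]    # true classes from the 'Class' column
preds = [1, 1, 1, 0, 1]   # classes predicted by the sentiment pipeline

pairs = list(zip(preds, refs))
tp = sum(1 for p, r in pairs if p == 1 and r == 1)  # true positives
fp = sum(1 for p, r in pairs if p == 1 and r == 0)  # false positives
fn = sum(1 for p, r in pairs if p == 0 and r == 1)  # false negatives
tn = sum(1 for p, r in pairs if p == 0 and r == 0)  # true negatives

manual_accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
manual_f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={manual_accuracy:.4f}, f1={manual_f1:.4f}")
```

The single error is the second review, a negative review predicted as positive. It lowers precision to 3/4 while recall stays at 1, which is why F1 is higher than accuracy here.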


The next task is to translate the reviews from English into Spanish. We'll use the `opus-mt-en-es` model on Hugging Face. For this task we also have reference translations for evaluating how well the model translates the first two sentences of the first review.


In [None]:
text_to_translate = '.'.join(reviews[0].split('.')[:2])

In [None]:
text_to_translate

'I am very satisfied with my 2014 Nissan NV SL. I use this van for my business deliveries and personal use'
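
Note that splitting on `'.'` and re-joining drops the final period, while the reference translations end with one. If a more like-for-like comparison is wanted, a small helper could keep it. This is only a sketch: `first_sentences` is a hypothetical helper, the third sentence in the sample is a placeholder, and the naive split would also break on abbreviations or decimal numbers.

```python
def first_sentences(text: str, n: int = 2) -> str:
    """Return the first n period-terminated sentences, keeping the final period."""
    # Naive sentence split: every '.' is treated as a sentence boundary.
    parts = [part.strip() for part in text.split('.') if part.strip()]
    if not parts:
        return ''
    return '. '.join(parts[:n]) + '.'


# Sample text: the first two sentences are from the review above,
# the third is a made-up placeholder.
sample = ("I am very satisfied with my 2014 Nissan NV SL. "
          "I use this van for my business deliveries and personal use. "
          "Placeholder third sentence.")
print(first_sentences(sample))
```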

In [None]:
# Import reference text

with open('/content/drive/MyDrive/datasets/reference_translations.txt', 'r', encoding='utf-8') as f:
    reference_translations = f.readlines()
reference_translations = [line.strip() for line in reference_translations]

In [None]:
reference_translations

['Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.',
 'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.']

In [None]:
translation_model = pipeline(task='translation_en_to_es',
                             model='Helsinki-NLP/opus-mt-en-es')

Device set to use cpu


In [None]:
translation_output = translation_model(text_to_translate)

In [None]:
translated_review = translation_output[0]['translation_text']

In [None]:
print(translated_review)

Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal


In [None]:
# I will use the BLEU score to evaluate the translation. The evaluate package is a convenient way to evaluate Hugging Face models.

bleu = evaluate.load("bleu")

In [None]:
bleu_score = bleu.compute(
    predictions=[translated_review],
    references=[reference_translations]
)

In [None]:
print(bleu_score)

{'bleu': 0.7671176261207451, 'precisions': [0.9047619047619048, 0.85, 0.7368421052631579, 0.6111111111111112], 'brevity_penalty': 1.0, 'length_ratio': 1.0, 'translation_length': 21, 'reference_length': 21}
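
To make the BLEU output less of a black box, here is a simplified re-implementation for a single prediction with multiple references. It is a sketch under assumptions, not the code used by `evaluate`: the regex tokenizer and the function names (`tokenize`, `ngram_counts`, `simple_bleu`) are my own, there is no smoothing, and the brevity penalty uses the reference closest in length, which may differ from the library's choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: words and single punctuation marks become tokens.
    # The evaluate BLEU metric has its own default tokenizer.
    return re.findall(r"\w+|[^\w\s]", text)


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(prediction, references, max_order=4):
    pred = tokenize(prediction)
    refs = [tokenize(ref) for ref in references]
    if not pred:
        return 0.0, []
    precisions = []
    for n in range(1, max_order + 1):
        pred_counts = ngram_counts(pred, n)
        # Clip each n-gram by its highest count in any single reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        matched = sum(min(c, max_ref[g]) for g, c in pred_counts.items())
        total = sum(pred_counts.values())
        precisions.append(matched / total if total else 0.0)
    # Brevity penalty against the reference closest in length to the prediction.
    ref_len = min((len(ref) for ref in refs),
                  key=lambda n_tok: (abs(n_tok - len(pred)), n_tok))
    bp = 1.0 if len(pred) > ref_len else math.exp(1 - ref_len / len(pred))
    if min(precisions) == 0:
        return 0.0, precisions
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    score = bp * math.exp(sum(math.log(p) for p in precisions) / max_order)
    return score, precisions


prediction = ("Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta "
              "para mis entregas de negocios y uso personal")
reference_texts = [
    "Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.",
    "Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.",
]
score, precisions = simple_bleu(prediction, reference_texts)
print(round(score, 4), [round(p, 4) for p in precisions])
```

On this example the sketch gives the same four n-gram precisions and a score of about 0.767, matching the `evaluate` output above. The unigram precision of 19/21 reflects the two predicted tokens ("de" and "negocios") that appear in neither reference.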


The third use case is question answering. For example, we might ask which particular features a user liked in a product.

In [None]:
second_review = reviews[1]

In [None]:
print(second_review)

The car is fine. It's a bit loud and not very powerful. On one hand, compared to its peers, the interior is well-built. The transmission failed a few years ago, and the dealer replaced it under warranty with no issues. Now, about 60k miles later, the transmission is failing again. It sounds like a truck, and the issues are well-documented. The dealer tells me it is normal, refusing to do anything to resolve the issue. After owning the car for 4 years, there are many other vehicles I would purchase over this one. Initially, I really liked what the brand is about: ride quality, reliability, etc. But I will not purchase another one. Despite these concerns, I must say, the level of comfort in the car has always been satisfactory, but not worth the rest of issues found.


In [None]:
qa_model = pipeline(task='question-answering',model='deepset/minilm-uncased-squad2')

Some weights of the model checkpoint at deepset/minilm-uncased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


In [None]:
question = "What did he like about the brand?"
context = second_review


In [None]:
qa_model_output = qa_model(question=question, context=context)

In [None]:
answer = qa_model_output['answer']

In [None]:
print(answer)

ride quality, reliability


The qualities the reviewer liked about the brand were ride quality and reliability. However, this particular car did not live up to them.
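
It is worth remembering that this kind of question answering is extractive: the answer is a span copied out of the context, and besides `answer` the pipeline output also carries a `score` and the `start` and `end` character offsets of that span. The sketch below illustrates the relationship between offsets and answer using a mock output dictionary; the offsets are computed with `str.find` for illustration, since the real values are not shown in this notebook.

```python
# Excerpt of the second review shown above.
context_excerpt = ("Initially, I really liked what the brand is about: "
                   "ride quality, reliability, etc. But I will not purchase another one.")
answer_text = "ride quality, reliability"

# Hypothetical stand-in for the pipeline output (score omitted on purpose).
start = context_excerpt.find(answer_text)
mock_output = {"answer": answer_text, "start": start, "end": start + len(answer_text)}

# Slicing the context with the offsets recovers the answer verbatim.
span = context_excerpt[mock_output["start"]:mock_output["end"]]
print(span)
```

Because the answer is always a literal slice of the review, the model cannot paraphrase or combine information from different sentences.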

The last task is to summarize a review. Again, Hugging Face makes these types of tasks very easy: you just substitute the correct task and model in the pipeline parameters. It's not even necessary to do tokenization as a separate pre-processing step, because the pipeline handles it.

In [None]:
last_review = reviews[-1]
print(last_review)

I've been dreaming of owning an SUV for quite a while, but I've been driving cars that were already paid for during an extended period. I ultimately made the decision to transition to a brand-new car, which, of course, involved taking on new payments. However, given that I don't drive extensively, I was inclined to avoid a substantial financial commitment. The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. I am VERY satisfied overall. I find myself needing to exercise extra caution when making lane changes, particularly owing to the blind spots resulting from the small side windows situated towards the rear of the vehicle. To address this concern, I am actively engaged in making adjustments to my mirrors and consciously reducing the frequency of lane changes. The

In [None]:
summarizer = pipeline(task='summarization', model="cnicu/t5-small-booksum")

# Pass the long text to the model. max_new_tokens=55 caps the number of generated tokens, which controls the length of the summary
output = summarizer(last_review, max_new_tokens=55)

Device set to use cpu


In [None]:
summarized_text = output[0]['summary_text']

In [None]:
print(summarized_text)


the Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. I have hauled 12 bags of mulch in the back with the seats down and could have held more. I find myself
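
Because generation stops at the token limit, the summary ends mid-sentence ("I find myself"). One possible post-processing step, sketched below, is to cut the text back to the last complete sentence. `trim_to_last_sentence` is a hypothetical helper of my own, not part of `transformers`, and it simply looks for the last `.`, `!` or `?`.

```python
def trim_to_last_sentence(text: str) -> str:
    """Drop any trailing fragment after the last sentence-ending punctuation mark."""
    last_end = max(text.rfind(mark) for mark in ".!?")
    if last_end == -1:
        # No sentence boundary found: return the text unchanged.
        return text
    return text[:last_end + 1]


# Summary text copied from the output above.
raw_summary = ("the Nissan Rogue provides me with the desired SUV experience without "
               "burdening me with an exorbitant payment; the financial arrangement is "
               "quite reasonable. I have hauled 12 bags of mulch in the back with the "
               "seats down and could have held more. I find myself")
trimmed = trim_to_last_sentence(raw_summary)
print(trimmed)
```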
