# Analyzing Car Reviews with LLMs

![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking their services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to respond to a series of tasks, e.g. classifying the sentiment in a car’s text review, answering a customer question, summarizing or translating text, etc.


## Before you start

In order to complete the project you may wish to install some Hugging Face libraries such as `transformers` and `evaluate`.

In [1]:
#!pip install transformers
#!pip install evaluate
#!pip install xformers
!pip install rouge_score

from transformers import logging
logging.set_verbosity(logging.WARNING)

Defaulting to user installation because normal site-packages is not writeable


# Classify Car Reviews
- Text/Sentiment Classification LLM (encoder-only architecture)

In [2]:
# Start your code here!

# Import the function for loading Hugging Face pipelines
from transformers import pipeline
import pandas as pd


# Read the reviews (semicolon-delimited CSV)
df = pd.read_csv('data/car_reviews.csv', delimiter=';')
#print(df.head(5))

train = df['Review']

# Define the model name
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

# Load the pipeline for sentiment-analysis or text-classification
classifier = pipeline("sentiment-analysis", model=model_name)

predicted_labels = [classifier(x) for x in train]
print(predicted_labels)

label = [x[0]['label'] for x in predicted_labels]

print(label)


Xformers is not installed correctly. If you want to use memorry_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


[[{'label': 'POSITIVE', 'score': 0.9293975830078125}], [{'label': 'POSITIVE', 'score': 0.8654279708862305}], [{'label': 'POSITIVE', 'score': 0.9994640946388245}], [{'label': 'NEGATIVE', 'score': 0.9935314059257507}], [{'label': 'POSITIVE', 'score': 0.9986565113067627}]]
['POSITIVE', 'POSITIVE', 'POSITIVE', 'NEGATIVE', 'POSITIVE']


In [3]:
# Map string labels to integers (1 = POSITIVE, 0 = NEGATIVE) with a lambda and a list comprehension
predictions = list(map(lambda x: 1 if x == 'POSITIVE' else 0,label))
print(predictions)

true_labels = [1 if x == 'POSITIVE' else 0 for x in df['Class']]
print(true_labels)


[1, 1, 1, 0, 1]
[1, 0, 1, 0, 1]
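The `1 if x == 'POSITIVE' else 0` mapping silently turns any unexpected label into 0. As an alternative, a lookup table fails loudly on an unknown label. A minimal sketch; `LABEL_TO_ID` and `encode_labels` are hypothetical names, not part of the original notebook:

```python
# Hypothetical lookup table for the two labels this model emits
LABEL_TO_ID = {"NEGATIVE": 0, "POSITIVE": 1}

def encode_labels(labels):
    """Map 'POSITIVE'/'NEGATIVE' strings to 1/0; raise KeyError on anything else."""
    return [LABEL_TO_ID[label] for label in labels]

# Labels as printed by the classification cell above
label = ['POSITIVE', 'POSITIVE', 'POSITIVE', 'NEGATIVE', 'POSITIVE']
predictions = encode_labels(label)
print(predictions)
```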


In [4]:
import evaluate

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

accuracy_result = accuracy.compute(references=true_labels, predictions=predictions)
accuracy_result = accuracy_result['accuracy']

f1_result = f1.compute(references=true_labels, predictions=predictions)
f1_result = f1_result['f1']

print(f"Accuracy: {accuracy_result}")
print(f"F1 Score: {f1_result}")

Accuracy: 0.8
F1 Score: 0.8571428571428571
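Both metrics can be checked by hand from the two lists printed earlier. The sketch below recomputes them in plain Python, treating 1 (POSITIVE) as the positive class; the variable names are illustrative:

```python
predictions = [1, 1, 1, 0, 1]
true_labels = [1, 0, 1, 0, 1]

pairs = list(zip(predictions, true_labels))
tp = sum(p == 1 and t == 1 for p, t in pairs)  # true positives
fp = sum(p == 1 and t == 0 for p, t in pairs)  # false positives
fn = sum(p == 0 and t == 1 for p, t in pairs)  # false negatives

accuracy_by_hand = sum(p == t for p, t in pairs) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_by_hand = 2 * precision * recall / (precision + recall)

print(accuracy_by_hand, precision, recall, f1_by_hand)
```

The single error is the second review, a NEGATIVE predicted as POSITIVE. That gives 3 true positives, 1 false positive, and 0 false negatives, so precision is 0.75, recall is 1.0, and F1 is about 0.857.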


# Translate a Car Review
- Language Translation LLM (encoder-decoder architecture)

In [5]:
# Extract the first two sentences of the first review
new = df['Review'][0].split('.')  # split the review into sentences on '.'
print(new)

print(end='\n')

first_two = new[0:2]

rev_1 = ".".join(first_two)
print(rev_1)

['I am very satisfied with my 2014 Nissan NV SL', ' I use this van for my business deliveries and personal use', ' Camping, road trips, etc', ' We dont have any children so I store most of the seats in my warehouse', ' I wanted the passenger van for the rear air conditioning', ' We drove our van from Florida to California for a Cross Country trip in 2014', ' We averaged about 18 mpg', ' We drove thru a lot of rain and It was a very comfortable and stable vehicle', ' The V8 Nissan Titan engine is a 500k mile engine', ' It has been tested many times by delivery and trucking companies', ' This is why Nissan gives you a 5 year or 100k mile bumper to bumper warranty', ' Many people are scared about driving this van because of its size', ' But with front and rear sonar sensors, large mirrors and the back up camera', ' It is easy to drive', ' The front and rear sensors also monitor the front and rear sides of the bumpers making it easier to park close to objects', ' Our Nissan NV is a Tow Mon
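Splitting on `'.'` works for this review, but it also splits decimals such as `1.5`, ignores `!` and `?`, and drops the final period. A minimal sketch of a slightly more robust heuristic follows; `first_sentences` is a hypothetical helper, and the sample text is the opening of the first review as reconstructed from the output above. It is still a heuristic, not a real sentence tokenizer.

```python
import re

def first_sentences(text, n=2):
    """Return the first n sentences of text (heuristic split)."""
    # Split on whitespace that follows '.', '!' or '?'.
    # The punctuation stays attached to its sentence.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return " ".join(sentences[:n])

sample = (
    "I am very satisfied with my 2014 Nissan NV SL. "
    "I use this van for my business deliveries and personal use. "
    "Camping, road trips, etc."
)
print(first_sentences(sample))
```

Unlike `".".join(first_two)`, this keeps the period at the end of the second sentence.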

In [6]:
# Load a Hugging Face LLM into a pipeline for English-to-Spanish translation
from transformers import pipeline

model_name = "Helsinki-NLP/opus-mt-en-es"

# Define the pipeline for English-to-Spanish translation
translator = pipeline('translation', model=model_name, max_length=30)

# Translate the input text
translations = translator(rev_1)


In [7]:
print(translations)

[{'translation_text': 'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal'}]


In [8]:
# Access the output to print the translated text in Spanish
translated_review = translations[0]['translation_text']
print(f"Translated Texts: {translated_review}")

print(end='\n')

Translated Texts: Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal



In [9]:
# Load the reference translations (one per line)
with open('data/reference_translations.txt', 'r') as file:
    # Read the content of the file into a variable
    references = file.read()
    references = references.split('\n')
print(references)

['Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.', 'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.']
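The file read above evidently has no trailing newline, since the printed list contains exactly two entries. If it did, `split('\n')` would append an empty string, and that empty string would be passed to BLEU as a third reference. A small sketch that guards against this; `parse_references` is a hypothetical helper, and `raw` mimics the file contents with a trailing newline added on purpose:

```python
def parse_references(raw_text):
    """Return the non-empty, whitespace-stripped lines of raw_text."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]

# The two reference translations shown above, plus a deliberate trailing newline
raw = (
    "Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta "
    "para mis entregas comerciales y uso personal.\n"
    "Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta "
    "para mis entregas comerciales y uso personal.\n"
)

print(len(raw.split("\n")))        # naive split keeps a trailing empty string
references = parse_references(raw)
print(len(references))
```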


In [10]:
import evaluate
bleu = evaluate.load("bleu")

# Note: each prediction takes its own list of references, so the reference list is nested in another list
results = bleu.compute(predictions=[translated_review], references=[references])
print(results)
print(end='\n')
bleu_score = results['bleu']
print(f"Bleu score: {bleu_score}")

{'bleu': 0.7671176261207451, 'precisions': [0.9047619047619048, 0.85, 0.7368421052631579, 0.6111111111111112], 'brevity_penalty': 1.0, 'length_ratio': 1.0, 'translation_length': 21, 'reference_length': 21}

Bleu score: 0.7671176261207451
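The headline BLEU value can be reproduced from the other fields of the result: it is the brevity penalty multiplied by the geometric mean of the 1- to 4-gram precisions. A minimal sketch using the numbers printed above (the variable names are illustrative):

```python
import math

# Values copied from the bleu.compute() output above
precisions = [0.9047619047619048, 0.85, 0.7368421052631579, 0.6111111111111112]
translation_length = 21
reference_length = 21

# Brevity penalty: 1 if the translation is at least as long as the reference,
# otherwise exp(1 - reference_length / translation_length)
if translation_length >= reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)

# Geometric mean of the n-gram precisions
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

bleu_by_hand = brevity_penalty * geo_mean
print(bleu_by_hand)
```

Because the lengths match, the brevity penalty is 1 and the score is just the geometric mean, which is pulled down mostly by the lower 3-gram and 4-gram precisions.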


# Ask a question about a car review
- Extractive Question Answering LLM (encoder-only architecture)

In [11]:
# The second review serves as context for the extractive QA LLM
display(df.head(2))
rev2_context = df['Review'][1]
print(rev2_context)


|   | Review | Class |
|---|--------|-------|
| 0 | I am very satisfied with my 2014 Nissan NV SL.... | POSITIVE |
| 1 | The car is fine. It's a bit loud and not very ... | NEGATIVE |


The car is fine. It's a bit loud and not very powerful. On one hand, compared to its peers, the interior is well-built. The transmission failed a few years ago, and the dealer replaced it under warranty with no issues. Now, about 60k miles later, the transmission is failing again. It sounds like a truck, and the issues are well-documented. The dealer tells me it is normal, refusing to do anything to resolve the issue. After owning the car for 4 years, there are many other vehicles I would purchase over this one. Initially, I really liked what the brand is about: ride quality, reliability, etc. But I will not purchase another one. Despite these concerns, I must say, the level of comfort in the car has always been satisfactory, but not worth the rest of issues found.


In [12]:
# loading a Hugging Face LLM into a pipeline for question-answering (QA).

model_name = "deepset/minilm-uncased-squad2"

# Load the model pipeline for question-answering
qa_model = pipeline('question-answering', model=model_name)

question = "What did he like about the brand?"
context = rev2_context

# Pass the necessary inputs to the LLM pipeline for question-answering
outputs = qa_model(question=question, context=context)

# Access and print the answer
answer = outputs['answer']
print(answer)

ride quality, reliability


___

In [None]:
"""
# Optional: extractive QA with Auto classes

#  Import auto classes 
import torch
from transformers import AutoTokenizer
from transformers import AutoModelForQuestionAnswering

# Instantiate model and tokenizer
model_ckp = "deepset/minilm-uncased-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_ckp)
model = AutoModelForQuestionAnswering.from_pretrained(model_ckp)

# Define context and question, and tokenize them
context = rev2_context
print(f"Context:\n{context}")
question = "What did he like about the brand?"
inputs = tokenizer(question, context, return_tensors="pt")

# Perform inference and extract answer from raw outputs
with torch.no_grad():
  outputs = model(**inputs)
start_idx = torch.argmax(outputs.start_logits)
end_idx = torch.argmax(outputs.end_logits) + 1
answer_span = inputs["input_ids"][0][start_idx:end_idx]

# Decode and show answer
answer = tokenizer.decode(answer_span)
print("Answer: ", answer)
"""

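The optional Auto-class variant takes the argmax of the start and end logits independently, which can in principle yield an end position before the start and therefore an empty answer. One common remedy is to pick the pair with the highest combined score subject to `start <= end`. A sketch in plain Python; `best_span`, `max_answer_len`, and the toy logits are made up for illustration and do not come from the model:

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end), end inclusive, maximizing start + end logit with start <= end."""
    best_score = float("-inf")
    best = (0, 0)
    for start, start_logit in enumerate(start_logits):
        # Only consider end positions at or after start, within the length limit
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logit + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best

# Toy logits: independent argmax gives start=3 but end=1, an invalid span
start_logits = [0.1, 0.2, 0.3, 2.0, 0.1]
end_logits = [0.0, 1.5, 0.2, 0.4, 1.2]
print(best_span(start_logits, end_logits))
```

Since the returned end index is inclusive, slicing the token ids would use `input_ids[start:end + 1]`, matching the `+ 1` in the optional code.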
# Summarize and analyze a car review:
- Text Summarization LLM (encoder-decoder architecture)

In [13]:
# Summarize the last review in the dataset
rev_last = df['Review'][4]
print(rev_last)

I've been dreaming of owning an SUV for quite a while, but I've been driving cars that were already paid for during an extended period. I ultimately made the decision to transition to a brand-new car, which, of course, involved taking on new payments. However, given that I don't drive extensively, I was inclined to avoid a substantial financial commitment. The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. I am VERY satisfied overall. I find myself needing to exercise extra caution when making lane changes, particularly owing to the blind spots resulting from the small side windows situated towards the rear of the vehicle. To address this concern, I am actively engaged in making adjustments to my mirrors and consciously reducing the frequency of lane changes. The

In [14]:
# Load a Hugging Face LLM into a pipeline for text summarization
from transformers import pipeline

model_name = 't5-small'

# Load the model pipeline for text summarization
summarizer = pipeline("summarization", model=model_name)

# Pass the long text to the model to summarize it
outputs = summarizer(rev_last, min_length=50, max_length=55)

# Access and print the summarized text in the outputs variable
summarized_text = outputs[0]['summary_text'] 
print(summarized_text)

the Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment . the financial arrangement is quite reasonable; the handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down


In [15]:
# Evaluate the results

import evaluate
rouge = evaluate.load("rouge")

predictions = [summarized_text]

# No separate human-written summary is used here, so the full review serves as the reference
references = [rev_last]

results = rouge.compute(predictions=predictions,references=references)
print(results)

{'rouge1': 0.38565022421524664, 'rouge2': 0.3619909502262443, 'rougeL': 0.37668161434977576, 'rougeLsum': 0.37668161434977576}
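To make these numbers easier to interpret, here is a simplified sketch of ROUGE-1: unigram overlap expressed as precision, recall, and F-measure. `rouge1` is a hypothetical helper with a basic lowercase alphanumeric tokenizer, so its values may differ slightly from the `evaluate` implementation, which (as far as I know) reports the F-measure by default.

```python
import re
from collections import Counter

def rouge1(prediction, reference):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))

    # Each shared word counts at most as often as it appears in both texts
    overlap = sum((pred_counts & ref_counts).values())
    precision = overlap / max(sum(pred_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision, recall, f1 = rouge1("the cat sat", "the cat sat on the mat")
print(precision, recall, f1)
```

In the toy example every predicted word appears in the reference, so precision is 1.0, but only half of the reference is covered, so recall is 0.5. The same effect likely explains the scores above: a summary capped at 55 tokens is compared against the full review, so recall, and with it the F-measure, is limited by the length ratio rather than by summary quality alone.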
