![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking its services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to handle a series of tasks, such as classifying the sentiment of a car review, answering a customer question, and summarizing or translating text.


In [56]:
# Import necessary packages
import pandas as pd
import torch
from sklearn.metrics import accuracy_score, f1_score
from transformers import pipeline

from transformers import logging
logging.set_verbosity(logging.WARNING)

In [57]:
# Classify the sentiment of each car review with the default sentiment-analysis pipeline
sentiment_model = pipeline("sentiment-analysis")

df = pd.read_csv("data/car_reviews.csv", sep=";")
texts = df["Review"].tolist()
true_labels = df["Class"].map({"POSITIVE": 1, "NEGATIVE": 0}).tolist()

predicted_labels = sentiment_model(texts)
predictions = [1 if pred["label"] == "POSITIVE" else 0 for pred in predicted_labels]

accuracy_result = accuracy_score(true_labels, predictions)
f1_result = f1_score(true_labels, predictions)

print(accuracy_result, f1_result)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


0.8 0.8571428571428571
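
The two numbers printed above are the accuracy and the F1 score of the positive class. As a minimal sketch of how they are derived, the snippet below recomputes both from confusion-matrix counts in plain Python. The label lists are hypothetical (they are not read from `car_reviews.csv`) and were chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical labels for illustration only (1 = POSITIVE, 0 = NEGATIVE)
true_labels = [1, 0, 1, 1, 0]
predictions = [1, 1, 1, 1, 0]

pairs = list(zip(true_labels, predictions))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, f1)
```

Because F1 combines precision and recall for the positive class only, it can sit above the accuracy when most errors are false positives, as in this toy example.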


In [58]:
import nltk
from nltk.tokenize import sent_tokenize
from nltk.translate.bleu_score import sentence_bleu
from transformers import MarianMTModel, MarianTokenizer

# Sentence-tokenizer data used by sent_tokenize
nltk.download('punkt')

model_name = "Helsinki-NLP/opus-mt-en-es"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

[nltk_data] Downloading package punkt to /home/repl/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [59]:
first_review = df["Review"].iloc[0]
print(first_review)

I am very satisfied with my 2014 Nissan NV SL. I use this van for my business deliveries and personal use. Camping, road trips, etc. We dont have any children so I store most of the seats in my warehouse. I wanted the passenger van for the rear air conditioning. We drove our van from Florida to California for a Cross Country trip in 2014. We averaged about 18 mpg. We drove thru a lot of rain and It was a very comfortable and stable vehicle. The V8 Nissan Titan engine is a 500k mile engine. It has been tested many times by delivery and trucking companies. This is why Nissan gives you a 5 year or 100k mile bumper to bumper warranty. Many people are scared about driving this van because of its size. But with front and rear sonar sensors, large mirrors and the back up camera. It is easy to drive. The front and rear sensors also monitor the front and rear sides of the bumpers making it easier to park close to objects. Our Nissan NV is a Tow Monster. It pulls our 5000 pound travel trailer li

In [60]:
# sent_tokenize needs the punkt data; newer NLTK releases look for punkt_tab
nltk.download('punkt')
nltk.download('punkt_tab')
sent_tokenize(first_review)

[nltk_data] Downloading package punkt to /home/repl/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/repl/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


['I am very satisfied with my 2014 Nissan NV SL.',
 'I use this van for my business deliveries and personal use.',
 'Camping, road trips, etc.',
 'We dont have any children so I store most of the seats in my warehouse.',
 'I wanted the passenger van for the rear air conditioning.',
 'We drove our van from Florida to California for a Cross Country trip in 2014.',
 'We averaged about 18 mpg.',
 'We drove thru a lot of rain and It was a very comfortable and stable vehicle.',
 'The V8 Nissan Titan engine is a 500k mile engine.',
 'It has been tested many times by delivery and trucking companies.',
 'This is why Nissan gives you a 5 year or 100k mile bumper to bumper warranty.',
 'Many people are scared about driving this van because of its size.',
 'But with front and rear sonar sensors, large mirrors and the back up camera.',
 'It is easy to drive.',
 'The front and rear sensors also monitor the front and rear sides of the bumpers making it easier to park close to objects.',
 'Our Nissan 

In [61]:
first_two_sentences = " ".join(sent_tokenize(first_review)[:2])
print(first_two_sentences)

I am very satisfied with my 2014 Nissan NV SL. I use this van for my business deliveries and personal use.


In [62]:
inputs = tokenizer(first_two_sentences, return_tensors="pt", truncation=True, max_length=512)
inputs

{'input_ids': tensor([[   33,   675,   310, 11684,    41,   125,  2262, 40906,    25, 25366,
         36957,     3,    33,   268,    58,  1999,    23,   125,   770, 34543,
            10,   429,   268,     3,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1]])}
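
The tokenizer output above has two parts: `input_ids` (one integer per subword token) and `attention_mask` (1 for every real token). With a single sentence there is nothing to pad, so the mask is all ones. The sketch below shows, in plain Python, what padding a batch would look like. The first sequence is the one printed above; the second sequence, the `pad_batch` helper, and the `pad_id` value are assumptions made for illustration. In practice the tokenizer handles this itself when called with `padding=True`.

```python
def pad_batch(sequences, pad_id):
    """Right-pad token id lists to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Token ids of the two-sentence input shown above
first = [33, 675, 310, 11684, 41, 125, 2262, 40906, 25, 25366,
         36957, 3, 33, 268, 58, 1999, 23, 125, 770, 34543,
         10, 429, 268, 3, 0]
# Hypothetical shorter sequence, used only to force padding
second = [33, 268, 58, 1999, 3, 0]

# pad_id is an assumed value for this sketch
ids, mask = pad_batch([first, second], pad_id=65000)
print(len(ids[0]), len(ids[1]))
print(mask[1])
```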

In [63]:
# Unpack the full tokenizer output so the attention mask is passed along with the input ids
translated_tokens = model.generate(**inputs, max_length=512, num_beams=4, early_stopping=True)
translated_tokens

tensor([[65000,  1686,   239, 22669,    29,   155, 40906,   716,  1239,   175,
           399,  8356,  9652,   135, 26327,    24,  1071,  3678,     9,     4,
          4553,    11,   694,   429,     3,     0]])

In [64]:
translated_review = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)
translated_review

'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal.'

In [65]:
with open("data/reference_translations.txt", "r", encoding="utf-8") as f:
    reference_texts = f.readlines()

reference_sentences = [line.strip().split() for line in reference_texts if line.strip()]
print(reference_sentences)
reference = reference_sentences[0]  # The file holds alternative reference translations of the same text; use the first
print(reference)

[['Estoy', 'muy', 'satisfecho', 'con', 'mi', 'Nissan', 'NV', 'SL', '2014.', 'Utilizo', 'esta', 'camioneta', 'para', 'mis', 'entregas', 'comerciales', 'y', 'uso', 'personal.'], ['Estoy', 'muy', 'satisfecho', 'con', 'mi', 'Nissan', 'NV', 'SL', '2014.', 'Uso', 'esta', 'furgoneta', 'para', 'mis', 'entregas', 'comerciales', 'y', 'uso', 'personal.']]
['Estoy', 'muy', 'satisfecho', 'con', 'mi', 'Nissan', 'NV', 'SL', '2014.', 'Utilizo', 'esta', 'camioneta', 'para', 'mis', 'entregas', 'comerciales', 'y', 'uso', 'personal.']


In [66]:
candidate = translated_review.split()
print(candidate)

['Estoy', 'muy', 'satisfecho', 'con', 'mi', 'Nissan', 'NV', 'SL', '2014.', 'Uso', 'esta', 'camioneta', 'para', 'mis', 'entregas', 'de', 'negocios', 'y', 'uso', 'personal.']


In [67]:
bleu_score = sentence_bleu([reference], candidate)
bleu_score

0.6514613449066712
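
To make the score above less of a black box, here is a minimal from-scratch sketch of sentence-level BLEU with a single reference: modified n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It is a simplified stand-in for `sentence_bleu`, not NLTK's implementation (no smoothing, one reference only, and it assumes every n-gram order has at least one match). `simple_bleu` and `ngrams` are hypothetical helper names; the token lists are the `reference` and `candidate` printed in the cells above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(reference, candidate, max_n=4):
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: only candidates shorter than the reference are penalized
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


reference = ("Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta "
             "para mis entregas comerciales y uso personal.").split()
candidate = ("Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta "
             "para mis entregas de negocios y uso personal.").split()

score = simple_bleu(reference, candidate)
print(round(score, 4))  # 0.6515
```

The candidate is one token longer than the reference (20 vs. 19), so the brevity penalty is 1 and the score is driven entirely by the n-gram precisions (17/20, 14/19, 11/18 and 8/17).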

In [68]:
print(reference)

['Estoy', 'muy', 'satisfecho', 'con', 'mi', 'Nissan', 'NV', 'SL', '2014.', 'Utilizo', 'esta', 'camioneta', 'para', 'mis', 'entregas', 'comerciales', 'y', 'uso', 'personal.']


In [69]:
import evaluate
bleu = evaluate.load("bleu")

In [70]:
# Reuse the `bleu` metric loaded in the previous cell, this time on raw strings
candidate = "Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal."
reference = "Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal."

# `predictions` is a list of strings, one per sentence;
# `references` has one entry per prediction (a string, or a list of alternative reference strings)

bleu_score = bleu.compute(predictions=[candidate], references=[reference])
print(bleu_score)


{'bleu': 0.6888074582865503, 'precisions': [0.8636363636363636, 0.7619047619047619, 0.65, 0.5263157894736842], 'brevity_penalty': 1.0, 'length_ratio': 1.0476190476190477, 'translation_length': 22, 'reference_length': 21}
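
The `evaluate` score (about 0.689) differs from the NLTK score computed earlier (about 0.651) even though the sentences are identical. A likely explanation is tokenization: the earlier cell split on whitespace, which leaves `2014.` and `personal.` as single tokens, whereas the `bleu` metric applies its own tokenizer that separates punctuation (note `translation_length: 22` versus 20 whitespace tokens). The sketch below uses a simple regex as a stand-in for that tokenizer (an assumption, not the metric's actual tokenizer) and recomputes the four modified precisions; `tokenize` and `modified_precision` are hypothetical helper names.

```python
import re
from collections import Counter


def tokenize(text):
    # Words and individual punctuation marks become separate tokens
    return re.findall(r"\w+|[^\w\s]", text)


def modified_precision(reference, candidate, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    matches = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matches / sum(cand.values())


candidate = tokenize("Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta "
                     "para mis entregas de negocios y uso personal.")
reference = tokenize("Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta "
                     "para mis entregas comerciales y uso personal.")

precisions = [modified_precision(reference, candidate, n) for n in range(1, 5)]
print(len(candidate), len(reference))
print(precisions)
```

With this tokenization the lengths come out as 22 and 21 and the precisions as 19/22, 16/21, 13/20 and 10/19, which agrees with the `precisions` list in the output above.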


In [71]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Use QA-specific names so the translation tokenizer/model loaded earlier are not overwritten
qa_model_name = "deepset/minilm-uncased-squad2"
qa_tokenizer = AutoTokenizer.from_pretrained(qa_model_name)
qa_model = AutoModelForQuestionAnswering.from_pretrained(qa_model_name)

question = "What did he like about the brand?"
context = df["Review"].iloc[1]  # 2nd review

inputs = qa_tokenizer(question, context, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = qa_model(**inputs)

# Most likely start and end positions of the answer span
start_idx = torch.argmax(outputs.start_logits)
end_idx = torch.argmax(outputs.end_logits) + 1  # +1 so the slice includes the end token

answer_tokens = inputs["input_ids"][0][start_idx:end_idx]
answer = qa_tokenizer.decode(answer_tokens, skip_special_tokens=True)
answer

Some weights of the model checkpoint at deepset/minilm-uncased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


'ride quality, reliability'
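
Taking the `argmax` of the start and end logits independently works here, but nothing guarantees that the chosen end position comes after the start. A common safeguard is to score every valid `(start, end)` pair and keep the best one. The sketch below illustrates this in plain Python; `best_span`, the logits, and `max_answer_len` are hypothetical and are not outputs of the model above.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair, end inclusive, with the highest combined score."""
    best_score, best_pair = float("-inf"), (0, 0)
    for i, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit
        last = min(i + max_answer_len, len(end_logits))
        for j in range(i, last):
            score = start_score + end_logits[j]
            if score > best_score:
                best_score, best_pair = score, (i, j)
    return best_pair


# Hypothetical logits where independent argmax gives an invalid span
start_logits = [0.1, 2.0, 0.3, 5.0, 0.2]
end_logits = [0.0, 4.0, 0.5, 1.0, 3.0]

naive = (start_logits.index(max(start_logits)), end_logits.index(max(end_logits)))
print("independent argmax:", naive)  # end comes before start
print("best valid span:", best_span(start_logits, end_logits))
```

The nested loop is quadratic in the sequence length, which is acceptable for a sketch; limiting the search to the top few start and end candidates is a common way to reduce the cost.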

In [72]:
# Summarize the last review (pipeline was imported in the first cell)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
last_review = df["Review"].iloc[-1]
summary_output = summarizer(last_review, max_length=55, min_length=50, do_sample=False)
summarized_text = summary_output[0]['summary_text']

# Score the summary with a toxicity classifier; top_k=None returns a score for every label
texts = [summarized_text]
tox_model = pipeline("text-classification", model="unitary/toxic-bert", top_k=None)
toxicity_scores = tox_model(texts)

# Keep the score of the 'toxic' label for the summary
tox_values = [score['score'] for score in toxicity_scores[0] if score['label'] == 'toxic']
max_toxicity = max(tox_values) if tox_values else 0.0


Device set to use cpu
Device set to use cpu


In [73]:
texts, tox_values, max_toxicity

(['The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. The engine delivers strong'],
 [0.000554619706235826],
 0.000554619706235826)

In [74]:
import evaluate

toxicity = evaluate.load("toxicity")
regard = evaluate.load("regard")

Device set to use cpu
Device set to use cpu


In [75]:
texts = [summarized_text]  # Ensure it's a list
toxicity_result = toxicity.compute(predictions=texts)
regard_result = regard.compute(data=texts)
toxicity_result,regard_result

({'toxicity': [0.00013863427739124745]},
 {'regard': [[{'label': 'positive', 'score': 0.6263338923454285},
    {'label': 'neutral', 'score': 0.20273476839065552},
    {'label': 'other', 'score': 0.12291575223207474},
    {'label': 'negative', 'score': 0.04801557958126068}]]})

In [76]:
regard_result["regard"]

[[{'label': 'positive', 'score': 0.6263338923454285},
  {'label': 'neutral', 'score': 0.20273476839065552},
  {'label': 'other', 'score': 0.12291575223207474},
  {'label': 'negative', 'score': 0.04801557958126068}]]

In [77]:
texts = [summarized_text]

toxicity_result = toxicity.compute(predictions=texts)
regard_result = regard.compute(data=texts)

max_toxicity = max(toxicity_result["toxicity"])
max_regard = max(score_dict["score"] for score_dict in regard_result["regard"][0])
top_regard_label = max(regard_result["regard"][0], key=lambda x: x["score"])["label"]

In [78]:
print("Max toxicity:", max_toxicity)
print("Max regard score:", max_regard)
print("Top regard label:", top_regard_label)

Max toxicity: 0.00013863427739124745
Max regard score: 0.6263338923454285
Top regard label: positive
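
The post-processing above can be wrapped in a small helper. The sketch below takes a list of `{'label', 'score'}` dictionaries in the shape shown in the `regard` output and returns the top label with its score. `top_label` is a hypothetical name, and the scores are copied from the output above.

```python
def top_label(scores):
    """Return (label, score) of the highest-scoring entry."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# Scores copied from the regard output shown earlier
regard_scores = [
    {"label": "positive", "score": 0.6263338923454285},
    {"label": "neutral", "score": 0.20273476839065552},
    {"label": "other", "score": 0.12291575223207474},
    {"label": "negative", "score": 0.04801557958126068},
]

label, score = top_label(regard_scores)
print(label, score)
```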
