![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking their services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to respond to a series of tasks, e.g. classifying the sentiment in a car’s text review, answering a customer question, summarizing or translating text, etc.


## Before you start

In order to complete the project you may wish to install some Hugging Face libraries such as `transformers` and `evaluate`.

In [1]:
!pip install transformers
!pip install evaluate
!pip install pandas

import evaluate
import pandas as pd
import torch
from transformers import (AutoModelForQuestionAnswering, AutoModelForSeq2SeqLM,
                          AutoModelForSequenceClassification, AutoTokenizer, logging)

# Only show warnings and errors from transformers
logging.set_verbosity(logging.WARNING)



# First Task

### Importing data

In [3]:
SELECT * FROM 'data/car_reviews.csv'

Unnamed: 0,column0,column1
0,Review,Class
1,I am very satisfied with my 2014 Nissan NV SL....,POSITIVE
2,The car is fine. It's a bit loud and not very ...,NEGATIVE
3,"My first foreign car. Love it, I would buy ano...",POSITIVE
4,I've come across numerous reviews praising the...,NEGATIVE
5,I've been dreaming of owning an SUV for quite ...,POSITIVE
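
The SQL cell above produces a DataFrame named `df` whose first row holds the original header (`Review`, `Class`), which is why later cells skip row 0. A minimal pandas sketch of the same layout follows. The in-memory sample, its `;` delimiter, and the review texts are assumptions made for illustration; the real file's delimiter is not shown in this notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/car_reviews.csv (delimiter and rows are assumed)
sample_csv = io.StringIO(
    "Review;Class\n"
    "Great van, very reliable.;POSITIVE\n"
    "Too loud and underpowered.;NEGATIVE\n"
)

# header=None keeps the header line as row 0, matching the table shown above
df = pd.read_csv(sample_csv, sep=";", header=None, names=["column0", "column1"])

# Skip row 0 to get the actual reviews and their labels
texts = df["column0"][1:].tolist()
labels = df["column1"][1:].tolist()
print(texts, labels)
```

Passing `header=0` instead would let pandas use `Review` and `Class` as column names, at the cost of changing the `df['column0'][1:]` indexing used in the rest of the notebook.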


### Selecting a model and extracting outputs for sentiment analysis

In [4]:
# Model name on the Hugging Face Hub
model_name = 'distilbert/distilbert-base-uncased-finetuned-sst-2-english'

# Instantiate the tokenizer for this model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Instantiate the sentiment classification model
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Input texts: skip row 0, which holds the original header ("Review")
texts = df['column0'][1:].tolist()

# Tokenize each review separately
inputs = [tokenizer(text, return_tensors='pt') for text in texts]

# Run the model on each review; no gradients are needed for inference
with torch.no_grad():
    outputs = [model(**inp) for inp in inputs]

# Raw class scores (logits) for each review
logits = [output.logits for output in outputs]

# Predicted class = index of the largest logit (0 = NEGATIVE, 1 = POSITIVE)
predicted_labels = [torch.argmax(logit, dim=1).item() for logit in logits]

In [5]:
predicted_labels

[1, 1, 1, 0, 1]
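
Each predicted label is the index of the largest logit. Since softmax preserves ordering, taking the argmax of the logits gives the same class as taking the argmax of the probabilities. The sketch below shows this in plain Python; the logit values are made up, and the `id2label` mapping is written out by hand to mirror the SST-2 model's convention (0 = NEGATIVE, 1 = POSITIVE).

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


# Label mapping written out by hand for illustration
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Made-up logits for a single review: [score for class 0, score for class 1]
example_logits = [-2.0, 3.0]
probs = softmax(example_logits)

pred_from_logits = argmax(example_logits)
pred_from_probs = argmax(probs)
print(id2label[pred_from_logits], probs)
```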

### Converting the true labels to {0, 1}

In [6]:
real_label = df['column1'][1:]
real_label = [1 if real=='POSITIVE' else 0 for real in real_label]
real_label

[1, 0, 1, 0, 1]

### Evaluating the predictions against the true labels

In [7]:
# Load the accuracy and F1 metrics
accuracy = evaluate.load('accuracy')
f1 = evaluate.load('f1')

# Compute accuracy and F1 score
accuracy_result = accuracy.compute(references=real_label, predictions=predicted_labels)['accuracy']
print(accuracy_result)
f1_result = f1.compute(references=real_label, predictions=predicted_labels)['f1']
print(f1_result)

0.8
0.8571428571428571
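
As a sanity check, both scores can be reproduced by hand from the two label lists printed earlier. This is a plain-Python sketch of the standard binary accuracy and F1 definitions, not the internals of `evaluate`.

```python
real_label = [1, 0, 1, 0, 1]
predicted_labels = [1, 1, 1, 0, 1]

pairs = list(zip(real_label, predicted_labels))

# Confusion counts for the positive class (label 1)
tp = sum(1 for r, p in pairs if r == 1 and p == 1)
fp = sum(1 for r, p in pairs if r == 0 and p == 1)
fn = sum(1 for r, p in pairs if r == 1 and p == 0)

# Accuracy: fraction of predictions that match the true label
accuracy_by_hand = sum(1 for r, p in pairs if r == p) / len(pairs)

# F1: harmonic mean of precision and recall
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_by_hand = 2 * precision * recall / (precision + recall)

print(accuracy_by_hand, f1_by_hand)
```

The single error is the second review, a true NEGATIVE predicted as POSITIVE, which lowers precision to 3/4 while recall stays at 1.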


# Second Task

### Translation

In [8]:
# importing references
references = pd.read_csv('data/reference_translations.txt', names=['Spanish'])
references

Unnamed: 0,Spanish
0,Estoy muy satisfecho con mi Nissan NV SL 2014....
1,Estoy muy satisfecho con mi Nissan NV SL 2014....


In [9]:
# Take the first two reviews and translate them from English to Spanish
texts_1 = df['column0'][1:3]

# Translation model name on the Hugging Face Hub
model_name_translation = 'Helsinki-NLP/opus-mt-tc-big-en-es'

# Instantiate the tokenizer for this model
tokenizer_1 = AutoTokenizer.from_pretrained(model_name_translation)

# Instantiate the translation model
model_1 = AutoModelForSeq2SeqLM.from_pretrained(model_name_translation)

# Encode each input text as token IDs
input_ids = [tokenizer_1.encode(text, return_tensors='pt') for text in texts_1]

# Generate translations; max_length=25 caps each output at 25 tokens, so longer translations are cut off
translate_ids = [model_1.generate(input_id, max_length=25) for input_id in input_ids]

# Decode the generated token IDs back into text
outputs_1 = [tokenizer_1.decode(translate_id[0], skip_special_tokens=True) for translate_id in translate_ids]

### Calculating the BLEU score

In [10]:
# converting dataframe to list for comparison
reference = references['Spanish'].to_list()

In [11]:
print(reference)
print(outputs_1)
print(len(reference[0].split()))
print(len(outputs_1[0].split()))

['Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.', 'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.']
['Estoy muy satisfecho con mi 2014 Nissan NV. Tengo esta camioneta para mis entregas de negocios y uso personal. La', 'El coche está bien. Es un poco ruidoso y no muy potente. Por un lado, en comparación con sus']
19
20


In [12]:
# predictions to new variable
translated_review = outputs_1

# Instantiate bleu score
bleu = evaluate.load('bleu')

# compute bleu
bleu_score = bleu.compute(predictions=translated_review, references=reference)
bleu_score

{'bleu': 0.2503836001392257,
 'precisions': [0.5227272727272727,
  0.2857142857142857,
  0.2,
  0.13157894736842105],
 'brevity_penalty': 1.0,
 'length_ratio': 1.0476190476190477,
 'translation_length': 44,
 'reference_length': 42}
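
Note how the inputs are paired. The printed `reference` list appears to hold two alternative Spanish translations of the same first review, while `outputs_1` holds translations of two different reviews; a flat list of references pairs them with the predictions one-to-one by position. If both references are meant for a single prediction, the `bleu` metric in `evaluate` also accepts one list of references per prediction, e.g. `references=[[ref_a, ref_b]]`. The sketch below illustrates how multiple references are combined, using clipped unigram precision only. It is a simplified stand-in with whitespace tokenization, not the full BLEU computed by `evaluate`; the function name and the toy sentences are illustrative.

```python
from collections import Counter


def clipped_unigram_precision(prediction, references):
    """Share of predicted words found in the references, with each word's count
    clipped to the maximum number of times it occurs in any single reference."""
    pred_counts = Counter(prediction.split())

    # For every word, the highest count seen in any one reference
    max_ref_counts = Counter()
    for ref in references:
        for token, count in Counter(ref.split()).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)

    clipped = sum(min(count, max_ref_counts[token]) for token, count in pred_counts.items())
    total = sum(pred_counts.values())
    return clipped / total if total else 0.0


# Toy example: a degenerate prediction scored against two references
toy_prediction = "the the the the the the the"
toy_references = ["the cat is on the mat", "there is a cat on the mat"]
toy_precision = clipped_unigram_precision(toy_prediction, toy_references)
print(toy_precision)
```

Clipping is what stops a prediction from scoring well by repeating a common word: "the" appears at most twice in any one reference, so only two of the seven predicted words count as matches.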

# Third Task

### Question answering

In [13]:
# Question answering model name on the Hugging Face Hub
model_name_qa = "deepset/minilm-uncased-squad2"

# Instantiate the tokenizer for this model
tokenizer_qa = AutoTokenizer.from_pretrained(model_name_qa)

# Question text
question = "What did he like about the brand?"

# Context: the second review (row 0 holds the original header)
context = df['column0'][2]

# Tokenize the question and context together as one input pair
inputs = tokenizer_qa(question, context, return_tensors='pt')

# Instantiate the model
model = AutoModelForQuestionAnswering.from_pretrained(model_name_qa)

# Predict start and end logits for every token position
outputs = model(**inputs)

# Most likely start and end token positions (+1 so the slice includes the end token)
start_idx = torch.argmax(outputs.start_logits, dim=1)
end_idx = torch.argmax(outputs.end_logits, dim=1) + 1

In [14]:
# Decode the answer span from the tokenized input, using the QA model's own tokenizer
answer = tokenizer_qa.decode(inputs['input_ids'][0][start_idx:end_idx])
answer

'ride quality, reliability'
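
The span extraction can be illustrated without loading a model. An extractive QA model scores every token as a possible answer start and as a possible answer end; the answer is the slice between the two highest-scoring positions. In this sketch the token list and both logit vectors are made up for illustration.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up tokenized input: question tokens, separator, then context tokens
tokens = ["[CLS]", "what", "did", "he", "like", "?", "[SEP]",
          "ride", "quality", ",", "reliability", "[SEP]"]

# Made-up scores, one per token, for "answer starts here" and "answer ends here"
start_logits = [0.1, -1.0, -1.2, -0.8, -0.5, -1.1, -2.0, 4.2, 1.0, -0.3, 0.9, -2.0]
end_logits = [0.0, -1.1, -0.9, -1.0, -0.7, -1.3, -2.1, 0.2, 1.5, -0.4, 3.8, -1.9]

start = argmax(start_logits)
end = argmax(end_logits) + 1  # +1 because Python slices exclude the end index

# An end position before the start position would give an empty answer
answer_tokens = tokens[start:end] if end > start else []
print(answer_tokens)
```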

# Fourth Task

### Summarization

In [15]:
# Take the last review
texts_2 = df.iloc[-1, 0]

# Summarization model name on the Hugging Face Hub
model_name_summarization = 't5-small'

# Instantiate the tokenizer for this model
tokenizer_2 = AutoTokenizer.from_pretrained(model_name_summarization)

# Instantiate the summarization model
model_2 = AutoModelForSeq2SeqLM.from_pretrained(model_name_summarization)

# Prepend the "summarize: " task prefix that T5 expects, then encode the text
input_id = tokenizer_2.encode("summarize: " + texts_2, return_tensors='pt')

# Generate the summary, capped at 55 tokens
summarize_ids = model_2.generate(input_id, max_length=55)

# Decode the generated token IDs back into text
summarized_text = tokenizer_2.decode(summarize_ids[0], skip_special_tokens=True)
summarized_text

'the Nissan Rogue provides me with the desired experience without burdening me with an exorbitant payment. the financial arrangement is quite reasonable.'