![image](car.jpeg)

**Car-ing is sharing**, a car sales and rental dealership, is taking its services to the next level with **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to handle a series of tasks, such as classifying the sentiment of a car review, answering a customer question, and summarizing or translating text.


## Before you start

To complete the project, you may need to install some Hugging Face libraries, such as `transformers` and `evaluate`.

In [32]:
!pip install transformers
!pip install evaluate

from transformers import logging
logging.set_verbosity(logging.WARNING)

Defaulting to user installation because normal site-packages is not writeable


In [33]:
import pandas as pd 
from sklearn.metrics import accuracy_score
import evaluate

# EDA

In [34]:
df=pd.read_csv("data/car_reviews.csv", sep=';')
df.head()

|   | Review | Class |
|---|---|---|
| 0 | I am very satisfied with my 2014 Nissan NV SL.... | POSITIVE |
| 1 | The car is fine. It's a bit loud and not very ... | NEGATIVE |
| 2 | My first foreign car. Love it, I would buy ano... | POSITIVE |
| 3 | I've come across numerous reviews praising the... | NEGATIVE |
| 4 | I've been dreaming of owning an SUV for quite ... | POSITIVE |


In [35]:
df.describe()

|   | Review | Class |
|---|---|---|
| count | 5 | 5 |
| unique | 5 | 2 |
| top | I am very satisfied with my 2014 Nissan NV SL.... | POSITIVE |
| freq | 1 | 3 |


In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Review  5 non-null      object
 1   Class   5 non-null      object
dtypes: object(2)
memory usage: 208.0+ bytes


In [37]:
# Use dedicated dtypes: 'string' for the free-text reviews, 'category' for the two-valued label
df = df.astype({'Review': 'string', 'Class': 'category'})
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   Review  5 non-null      string  
 1   Class   5 non-null      category
dtypes: category(1), string(1)
memory usage: 297.0 bytes


# Classify Sentiment

## Design Model

In [38]:
print(df.Review.to_list()[1])

The car is fine. It's a bit loud and not very powerful. On one hand, compared to its peers, the interior is well-built. The transmission failed a few years ago, and the dealer replaced it under warranty with no issues. Now, about 60k miles later, the transmission is failing again. It sounds like a truck, and the issues are well-documented. The dealer tells me it is normal, refusing to do anything to resolve the issue. After owning the car for 4 years, there are many other vehicles I would purchase over this one. Initially, I really liked what the brand is about: ride quality, reliability, etc. But I will not purchase another one. Despite these concerns, I must say, the level of comfort in the car has always been satisfactory, but not worth the rest of issues found.


In [39]:
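from transformers import pipeline

# No model is specified, so the pipeline falls back to its default sentiment model
classifier = pipeline("text-classification")
# Classify every review in one call; each result is a dict with 'label' and 'score'
preds = classifier(df.Review.to_list())
predicted_labels = [pred['label'] for pred in preds]
# Encode labels as integers: NEGATIVE -> 0, anything else (here POSITIVE) -> 1
predictions = [0 if i == 'NEGATIVE' else 1 for i in predicted_labels]
print(predicted_labels)
print(predictions)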


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


['POSITIVE', 'POSITIVE', 'POSITIVE', 'NEGATIVE', 'POSITIVE']
[1, 1, 1, 0, 1]
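
The encoding above maps every label other than `NEGATIVE` to 1, so an unexpected label (for example from a different model) would silently count as positive. A minimal sketch of a stricter mapping follows. The names `LABEL_TO_ID` and `encode_labels` are hypothetical helpers, not part of the original notebook, and the label list is copied from the output above.

```python
# Hypothetical helper, not part of the original notebook:
# an explicit mapping that rejects labels it does not know.
LABEL_TO_ID = {"NEGATIVE": 0, "POSITIVE": 1}


def encode_labels(labels):
    """Map sentiment label strings to integers, failing loudly on unknown labels."""
    unknown = sorted(set(labels) - set(LABEL_TO_ID))
    if unknown:
        raise ValueError(f"Unexpected labels: {unknown}")
    return [LABEL_TO_ID[label] for label in labels]


# Labels copied from the pipeline output above
predicted_labels = ['POSITIVE', 'POSITIVE', 'POSITIVE', 'NEGATIVE', 'POSITIVE']
encoded = encode_labels(predicted_labels)
print(encoded)
```

With the five labels shown above, this should produce the same integer list as the list comprehension; the difference only appears when a label outside the mapping turns up.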


## Model Evaluation

In [40]:
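# Ground-truth labels as strings, plus an integer encoding (NEGATIVE -> 0, POSITIVE -> 1)
true_labels = df.Class.to_list()
true = [0 if i == 'NEGATIVE' else 1 for i in true_labels]
# Accuracy is computed on the string labels
accuracy_result = accuracy_score(true_labels, predicted_labels)
# F1 is computed on the integer-encoded labels
f1 = evaluate.load("f1")
f1_result = f1.compute(references=true, predictions=predictions)['f1']
print(accuracy_result)
print(f1_result)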

0.8
0.8571428571428571
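
As a sanity check, both scores can be reproduced by hand from the five labels shown earlier. The sketch below is not part of the original notebook: it hard-codes the integer-encoded labels from the cells above and assumes POSITIVE (1) is the positive class for F1. It also assumes at least one positive prediction and one positive reference, so no division by zero occurs.

```python
# Integer-encoded labels taken from the cells above (NEGATIVE -> 0, POSITIVE -> 1)
true = [1, 0, 1, 0, 1]
predictions = [1, 1, 1, 0, 1]

pairs = list(zip(true, predictions))

# Confusion counts with POSITIVE (1) as the positive class
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

# Accuracy: share of reviews whose predicted label matches the true label
accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

# F1: harmonic mean of precision and recall
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_manual = 2 * precision * recall / (precision + recall)

print(accuracy)
print(f1_manual)
```

Here the only error is review 1, a NEGATIVE review predicted as POSITIVE. That gives 4 correct out of 5 (accuracy 0.8), precision 3/4, recall 3/3, and an F1 of 6/7, about 0.857, matching the values reported above.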


# Translate

# Formulate Question

# Summarize