
### **Exercise: Sentiment Analysis and Key Insights Extraction from Ford Car Reviews**

### **Problem Statement:**
You have been provided with a dataset of Ford car reviews. Using LangChain and the concepts you’ve learned, complete the following tasks:

1. **Sentiment Analysis**: Analyze the sentiment of each review, categorize it as positive, neutral, or negative, and store the result.
2. **Key Insights Extraction**: Extract key pieces of information from each review, such as the pros and cons mentioned, and the specific features the reviewer liked or disliked (e.g., vehicle performance, comfort, price).

You will build a LangChain-based solution that leverages language models to automatically extract this information and provide a structured summary of the reviews.

---
### **Steps to Solve:**

#### **Step 1: Load the Dataset**
- The dataset file is named `ford_car_reviews.csv` and is sourced from Kaggle: [Edmunds Consumer Car Ratings and Reviews](https://www.kaggle.com/datasets/ankkur13/edmundsconsumer-car-ratings-and-reviews).
- For this exercise, **limit the data to the first 25 records**, for example by calling `df.head(25)` or `df.iloc[:25]` after loading the data into a DataFrame.
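
As a rough illustration of the 25-record limit, the sketch below compares two ways of getting there with pandas. It reads from an in-memory stand-in with made-up rows rather than the real `ford_car_reviews.csv`, so the data and the 40-row size are assumptions for demonstration only.

```python
import io
import pandas as pd

# Stand-in for ford_car_reviews.csv: 40 synthetic rows (made-up data)
# using the Review and Rating column names from this exercise.
rows = ["Review,Rating"] + [f"Sample review {i},{(i % 5) + 1}" for i in range(40)]
csv_file = io.StringIO("\n".join(rows))

# Option 1: stop parsing after 25 data rows.
df_first = pd.read_csv(csv_file, nrows=25)

# Option 2: read the whole file, then keep the first 25 rows.
csv_file.seek(0)
df_full = pd.read_csv(csv_file)
df_head = df_full.head(25)

print(len(df_first), len(df_head))
```

Both options give the same 25 rows; `nrows` avoids parsing the rest of the file, which matters more for large datasets.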

#### **Step 2: Define the Sentiment Analysis Task**
- Use LangChain to create a pipeline to classify the sentiment of each review.
- Define prompts that can guide the model to evaluate the sentiment. For example:
  - "Given the following car review, classify the sentiment as positive, neutral, or negative."
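
A prompt like this does not guarantee a clean label: replies can differ in case or carry trailing punctuation. Below is a minimal sketch of a post-processing step. The helper `normalize_sentiment` and the `Unknown` fallback are hypothetical names introduced here, not part of LangChain or the exercise.

```python
import re

VALID_LABELS = ("positive", "neutral", "negative")

def normalize_sentiment(raw: str) -> str:
    """Map a free-form model reply to Positive, Neutral, Negative, or Unknown."""
    # Lowercase and extract alphabetic words so "Positive." and " positive\n" match.
    words = re.findall(r"[a-z]+", raw.lower())
    found = [w for w in words if w in VALID_LABELS]
    # Accept the reply only if exactly one distinct label appears.
    if len(set(found)) == 1:
        return found[0].capitalize()
    return "Unknown"

print(normalize_sentiment("Positive."))                 # Positive
print(normalize_sentiment("The sentiment is negative")) # Negative
print(normalize_sentiment("positive or negative"))      # Unknown
```

Treating ambiguous replies as `Unknown` rather than guessing keeps questionable rows easy to find and re-run.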

#### **Step 3: Key Insights Extraction**
- Use LangChain to create a pipeline to extract pros, cons, and notable features from each review. Define prompts such as:
  - "What are the pros and cons of the vehicle described in the following review?"
  - "What specific features of the vehicle does the reviewer like or dislike?"
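
Replies to prompts like these often wrap the requested JSON in extra text. The sketch below uses only the standard library to show one way of pulling out the object and filling in missing keys. `parse_insights` and the key names are illustrative assumptions; inside a chain, LangChain's `JsonOutputParser` can handle the parsing instead.

```python
import json

KEYS = ("pros", "cons", "liked_features", "disliked_features")

def parse_insights(reply: str) -> dict:
    """Return the JSON object found in a model reply, with every expected key present."""
    empty = {key: [] for key in KEYS}
    # Take the text between the first "{" and the last "}" to skip surrounding chatter.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        return empty
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return empty
    # Missing keys default to an empty list.
    return {key: data.get(key, []) for key in KEYS}

reply = 'Sure! {"pros": ["handling"], "cons": ["battery draining"]} Hope this helps.'
insights = parse_insights(reply)
fallback = parse_insights("No JSON here.")
print(insights)
print(fallback)
```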

#### **Step 4: Update the DataFrame with New Information**
- Run the pipeline for each review and collect the sentiment and insights.
- Once the analysis and extraction are complete, update the original DataFrame with additional columns to include:
  - Sentiment (positive, neutral, negative)
  - Pros
  - Cons
  - Liked_Features
  - Disliked_Features
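
One possible shape for this update step is sketched below. The two `fake_*` functions are hypothetical stand-ins for the LangChain calls so the example runs offline, and the two reviews are made-up data.

```python
import pandas as pd

df = pd.DataFrame({"Review": ["Love the handling.", "Battery keeps draining."]})

# Hypothetical stand-ins for the model calls (not real LangChain chains).
def fake_sentiment(review: str) -> str:
    return "Negative" if "draining" in review else "Positive"

def fake_insights(review: str) -> dict:
    if "draining" in review:
        return {"Pros": [], "Cons": ["battery draining"],
                "Liked_Features": [], "Disliked_Features": ["battery"]}
    return {"Pros": ["handling"], "Cons": [],
            "Liked_Features": ["handling"], "Disliked_Features": []}

# One new column for the sentiment label.
df["Sentiment"] = df["Review"].apply(fake_sentiment)

# Expand each insights dict into its own columns, aligned on the original index.
insights = pd.DataFrame(df["Review"].apply(fake_insights).tolist(), index=df.index)
df = df.join(insights)

print(df.columns.tolist())
```

Building the insights frame with the original index keeps each row of extracted fields attached to the review it came from.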

---

### **Example Output:**

```json
{
  "Review_Date": "03/07/13",
  "Vehicle_Title": "2006 Ford Mustang Coupe",
  "Review_Text": "With the expected arrival of our 6th child...",
  "Rating": 4.125,
  "Sentiment": "Positive",
  "Pros": "Good driving experience, Large seating capacity, Great options",
  "Cons": "None mentioned",
  "Liked_Features": ["Driving experience", "Seating capacity", "Options available"],
  "Disliked_Features": []
}
```

# **Solution:**

### By: Neha Roy Choudhury, 25PGAI0096

In [87]:
import pandas as pd
file_path = '/content/ford_car_reviews.csv'
# Read only the first 25 reviews, as the exercise requires
df = pd.read_csv(file_path, engine='python', nrows=25)
df.head(25)

| Unnamed: 0.1 | Unnamed: 0 | Review_Date | Author_Name | Vehicle_Title | Review_Title | Review | Rating |
|---|---|---|---|---|---|---|---|
| 0 | 0 | on 06/06/18 14:19 PM (PDT) | Vicki | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | 2006 Mustang GT | Doesn’t disappoint | 5.0 |
| 1 | 1 | on 08/12/17 06:06 AM (PDT) | Tom | 2006 Ford Mustang Coupe V6 Standard 2dr Coupe ... | DREAM CAR | I bought mine 4/17 with 98K. Have been wantin... | 3.0 |
| 2 | 2 | on 06/15/17 05:43 AM (PDT) | Ray | 2006 Ford Mustang Coupe V6 Premium 2dr Coupe (... | Great Ride | There will always be a 05-09 mustang for sale... | 5.0 |
| 3 | 3 | on 05/18/17 17:33 PM (PDT) | Don Watson | 2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4... | I have wanted a Mustang for 40 years. | I bought my car from an auction I work at ( A... | 5.0 |
| 4 | 4 | on 01/03/16 18:03 PM (PST) | One owner | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | One owner | I bought this car spankin new and i still am ... | 5.0 |
| 5 | 5 | on 10/24/15 12:40 PM (PDT) | Adam | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | Poor | Lots of problems with Ford these days, sensor... | 3.0 |
| 6 | 6 | on 10/29/11 04:57 AM (PDT) | amos247 | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | 06 mustang gt with a few mods | Bought mine used 20k on it and have added a S... | 4.625 |
| 7 | 7 | on 07/25/11 12:15 PM (PDT) | dave3012 | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | 2006 Ford Mustang GT Premium 2dr Coupe (4.6L 8... | I bought my preowned 06 a few weeks back and ... | 4.375 |
| 8 | 8 | on 07/21/11 11:28 AM (PDT) | ronnzy98 | 2006 Ford Mustang Coupe GT Deluxe 2dr Coupe (4... | Get Rid of it before 100K | I drive 50 miles each way to work and traded ... | 3.5 |
| 9 | 9 | on 12/06/10 00:00 AM (PST) | Anonymous | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | Get this car. | This car is just awesome. The 4.6L V8 makes ... | 4.625 |


In [88]:
%%capture --no-stderr
%pip install --quiet -U langchain_groq langchain_core langchain_community tavily-python

In [89]:
import os, getpass
from dotenv import load_dotenv
load_dotenv(".env", override=True)

False

In [90]:
def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("GROQ_API_KEY")

In [91]:
from langchain_groq import ChatGroq
# Other model IDs tried: llama-3.1-8b-instant, llama3-groq-8b-8192-tool-use-preview, llama3-groq-70b-8192-tool-use-preview
model_id = "llama3-8b-8192"
llm = ChatGroq(model_name=model_id, temperature=0)

In [92]:
import pandas as pd
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser

# Sentiment Analysis Chain
sentiment_chain = (ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that analyzes text."),
    ("human", "Given the following car review, classify the sentiment as positive, neutral, or negative in a single word: {review}")
]) | llm | StrOutputParser())

# Insight Extraction Chain
insights_chain = (ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that analyzes text and provides insights in JSON format."),
    ("human", "What are the pros and cons of the vehicle described in the following review? Provide your answer as a JSON object with keys 'pros', 'cons', 'liked_features', and 'disliked_features': {review}")
]) | llm | JsonOutputParser())

# Helper functions to get the sentiment and insights for a single review
def get_sentiment(review):
    try:
        return sentiment_chain.invoke({"review": review}).strip()  # Remove surrounding whitespace
    except Exception:
        return "Error"  # Placeholder label when the model call fails

def get_insights(review):
    try:
        insights = insights_chain.invoke({"review": review})
        # Fall back to defaults for any key the model left out
        return {
            "pros": insights.get("pros", "None mentioned"),
            "cons": insights.get("cons", "None mentioned"),
            "liked_features": insights.get("liked_features", []),
            "disliked_features": insights.get("disliked_features", [])
        }
    except Exception:
        # Return defaults when the model call or JSON parsing fails
        return {"pros": "None mentioned", "cons": "None mentioned", "liked_features": [], "disliked_features": []}


# Apply functions to DataFrame and merge results
df["Sentiment"] = df["Review"].apply(get_sentiment)
insights_df = df["Review"].apply(get_insights).apply(pd.Series)
df = pd.concat([df, insights_df], axis=1).rename(columns={
    "pros": "Pros", "cons": "Cons", "liked_features": "Liked_Features", "disliked_features": "Disliked_Features"
})


In [93]:
df.head()

| Unnamed: 0.1 | Unnamed: 0 | Review_Date | Author_Name | Vehicle_Title | Review_Title | Review | Rating | Sentiment | Pros | Cons | Liked_Features | Disliked_Features |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | on 06/06/18 14:19 PM (PDT) | Vicki | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | 2006 Mustang GT | Doesn’t disappoint | 5.0 | Neutral | None mentioned | None mentioned | [] | [] |
| 1 | 1 | on 08/12/17 06:06 AM (PDT) | Tom | 2006 Ford Mustang Coupe V6 Standard 2dr Coupe ... | DREAM CAR | I bought mine 4/17 with 98K. Have been wantin... | 3.0 | Neutral | [Sounds good, Great mileage, Good power, Smoki... | [Orneriest transmission I've ever used, Diffic... | [Engine, Appearance] | [Transmission, Ride quality, Noise level] |
| 2 | 2 | on 06/15/17 05:43 AM (PDT) | Ray | 2006 Ford Mustang Coupe V6 Premium 2dr Coupe (... | Great Ride | There will always be a 05-09 mustang for sale... | 5.0 | Positive. | [fairly reasonable, great investment] | [] | [price, investment potential] | [] |
| 3 | 3 | on 05/18/17 17:33 PM (PDT) | Don Watson | 2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4... | I have wanted a Mustang for 40 years. | I bought my car from an auction I work at ( A... | 5.0 | Positive | [The car is a beast and can outperform other s... | [] | [V6 engine, Air aid cold air injector, Throttl... | [] |
| 4 | 4 | on 01/03/16 18:03 PM (PST) | One owner | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | One owner | I bought this car spankin new and i still am ... | 5.0 | Positive. | [hugs the road, does whatever you ask at a mom... | [alternator needed to be fixed, repairs like t... | [handling, performance, reliability] | [alternator, maintenance costs] |


In [94]:
# Select the row at position 22
val = df.iloc[22]

# Convert the row to a dictionary
val_dict = val.to_dict()

# Display the dictionary
val_dict

{'Unnamed: 0': 22,
 'Review_Date': ' on 11/12/09 08:34 AM (PST)',
 'Author_Name': 'stevie ',
 'Vehicle_Title': '2006 Ford Mustang Coupe GT Premium 2dr Coupe (4.6L 8cyl 5M)',
 'Review_Title': 'great car bad battery and window system',
 'Review': " I love to drive the car and to look at the car.  I have had problems with the battery draining.  I've jumped it 5 times.  I've also had problems with the driver side window.  I had to take it to the dealership 2 times for the same issue.  The 3rd time it was out of warranty and the dealership didn't want to take ownership of the problem.",
 'Rating': 4.5,
 'Sentiment': 'Negative.',
 'Pros': ['I love to drive the car', 'I love to look at the car'],
 'Cons': ['battery draining', 'driver side window problems'],
 'Liked_Features': ['design', 'driving experience'],
 'Disliked_Features': ['battery life', 'driver side window reliability']}
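
The row above pairs a 4.5 rating with a `Negative.` label, which suggests that comparing the model's sentiment against the reviewer's own rating can surface rows worth a second look. The sketch below shows one way to flag such rows. The helper `is_mismatch` and the 4.0 and 2.0 thresholds are assumptions chosen for illustration, and the sample rows reuse rating and sentiment pairs shown in the outputs above.

```python
def is_mismatch(rating: float, sentiment: str, high: float = 4.0, low: float = 2.0) -> bool:
    """Flag a high rating labeled negative, or a low rating labeled positive."""
    # Strip whitespace and a trailing period so "Negative." matches "negative".
    label = sentiment.strip().rstrip(".").lower()
    if rating >= high and label == "negative":
        return True
    if rating <= low and label == "positive":
        return True
    return False

rows = [
    {"Rating": 5.0, "Sentiment": "Positive."},
    {"Rating": 3.0, "Sentiment": "Neutral"},
    {"Rating": 4.5, "Sentiment": "Negative."},
]

flagged = [row for row in rows if is_mismatch(row["Rating"], row["Sentiment"])]
print(flagged)
```

A flagged row is not necessarily wrong: a reviewer can rate a car highly while the text dwells on problems, as in the battery and window complaints above.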