### **Exercise: Sentiment Analysis and Key Insights Extraction from Ford Car Reviews**

### **Problem Statement:**
You have been provided with a dataset containing Ford car reviews. Your task is to use LangChain and the concepts you’ve learned to perform the following tasks:

1. **Sentiment Analysis**: Analyze the sentiment of each review, categorize it as positive, neutral, or negative, and store the result.
2. **Key Insights Extraction**: Extract key pieces of information from each review, such as the pros and cons mentioned, and the specific features the reviewer liked or disliked (e.g., vehicle performance, comfort, price).

You will build a LangChain-based solution that leverages language models to automatically extract this information and provide a structured summary of the reviews. 

---
### **Steps to Solve:**

#### **Step 1: Load the Dataset**
- The dataset file is named `ford_car_reviews.csv` and is sourced from Kaggle: [Edmunds Consumer Car Ratings and Reviews](https://www.kaggle.com/datasets/ankkur13/edmundsconsumer-car-ratings-and-reviews).
- For this exercise, **limit the data to the first 25 records**. For example, pass `nrows=25` to `pd.read_csv`, or call `df.head(25)` or `df.iloc[:25]` after loading the data into a DataFrame.

#### **Step 2: Define the Sentiment Analysis Task**
- Use LangChain to create a pipeline to classify the sentiment of each review.
- Define prompts that can guide the model to evaluate the sentiment. For example:
  - "Given the following car review, classify the sentiment as positive, neutral, or negative."

#### **Step 3: Key Insights Extraction**
- Use LangChain to create a pipeline to extract pros, cons, and notable features from each review. Define prompts such as:
  - "What are the pros and cons of the vehicle described in the following review?"
  - "What specific features of the vehicle does the reviewer like or dislike?"

#### **Step 4: Update the DataFrame with New Information**
- Run the pipeline for each review and collect the sentiment and insights.
- Once the analysis and extraction are complete, update the original DataFrame with additional columns to include:
  - Sentiment (positive, neutral, negative)
  - Pros
  - Cons
  - Liked_Features
  - Disliked_Features

---

### **Example Output:**

```json
{
  "Review_Date": "03/07/13",
  "Vehicle_Title": "2006 Ford Mustang Coupe",
  "Review_Text": "With the expected arrival of our 6th child...",
  "Rating": 4.125,
  "Sentiment": "Positive",
  "Pros": "Good driving experience, Large seating capacity, Great options",
  "Cons": "None mentioned",
  "Liked_Features": ["Driving experience", "Seating capacity", "Options available"],
  "Disliked_Features": []
}
```

### Solution

In [1]:
# Optional: create a virtual environment and install the dependencies
# !python -m venv python_venv
# %pip install --quiet langchain-community==0.3.0 langgraph==0.2.22 langchain-groq==0.2.0 python-dotenv pandas

In [2]:
import os, json, re, getpass
from dotenv import load_dotenv

load_dotenv("../.env", override=True)
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("GROQ API Key: ")

#### Load Data

In [3]:
import pandas as pd

# Load the first 25 records from the dataset
df = pd.read_csv('ford_car_reviews.csv', nrows=25)
df = df.drop(columns="Unnamed: 0")
df.head()

|  | Review_Date | Author_Name | Vehicle_Title | Review_Title | Review | Rating |
|---|---|---|---|---|---|---|
| 0 | on 06/06/18 14:19 PM (PDT) | Vicki | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | 2006 Mustang GT | Doesn’t disappoint | 5.0 |
| 1 | on 08/12/17 06:06 AM (PDT) | Tom | 2006 Ford Mustang Coupe V6 Standard 2dr Coupe ... | DREAM CAR | I bought mine 4/17 with 98K. Have been wantin... | 3.0 |
| 2 | on 06/15/17 05:43 AM (PDT) | Ray | 2006 Ford Mustang Coupe V6 Premium 2dr Coupe (... | Great Ride | There will always be a 05-09 mustang for sale... | 5.0 |
| 3 | on 05/18/17 17:33 PM (PDT) | Don Watson | 2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4... | I have wanted a Mustang for 40 years. | I bought my car from an auction I work at ( A... | 5.0 |
| 4 | on 01/03/16 18:03 PM (PST) | One owner | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | One owner | I bought this car spankin new and i still am ... | 5.0 |
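
The `Review_Date` values shown above carry extra text (`on ... PM (PDT)`), while the example output in the problem statement uses a bare `MM/DD/YY` date. A minimal sketch of one way to extract just the date part, assuming every value contains a two-digit month/day/year triple; the helper name `extract_review_date` is hypothetical and not part of the original solution:

```python
import re
from typing import Optional

# Matches a two-digit month/day/year triple such as "06/06/18".
_DATE_PATTERN = re.compile(r"\b(\d{2}/\d{2}/\d{2})\b")


def extract_review_date(raw: str) -> Optional[str]:
    """Return the MM/DD/YY part of a raw Review_Date string, or None if absent."""
    match = _DATE_PATTERN.search(raw)
    return match.group(1) if match else None


print(extract_review_date("on 06/06/18 14:19 PM (PDT)"))  # 06/06/18
```

It could be applied to the DataFrame with `df["Review_Date"].apply(extract_review_date)`.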


#### Setup LLM

In [4]:
import os, json, re, getpass
from dotenv import load_dotenv

load_dotenv(".env", override=True)
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("GROQ API Key: ")

In [5]:
# LLM
from langchain_groq import ChatGroq

# Other Groq model IDs to try: llama-3.1-8b-instant, llama3-groq-8b-8192-tool-use-preview, llama3-groq-70b-8192-tool-use-preview
model_id = "llama3-8b-8192"
llm = ChatGroq(model_name=model_id, temperature=0)

#### Define the Sentiment Analysis Task

In [6]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

# Define the prompt template for sentiment analysis
system_message = """
Given the following car review, classify the sentiment as positive, neutral, or negative.
Return a structured JSON object with the attribute 'sentiment'.  output should be in markdown format, ```json\n <json response>\n.
"""

template = ChatPromptTemplate([
    ("system", system_message),
    ("human", "{user_input}"),
])
sentiment_analysis_chain = template | llm | JsonOutputParser()

# Create a function to analyze the sentiment of each review
def analyze_sentiment(review_text):
    response = sentiment_analysis_chain.invoke({'user_input':review_text})
    return response['sentiment']

# Apply sentiment analysis on the first 25 reviews
df_out = df.copy(deep=True)
df_out['Sentiment'] = df_out['Review'].apply(analyze_sentiment)

In [7]:
display(df_out.Sentiment.value_counts())
display(df_out.head())

Sentiment
positive    13
negative     7
mixed        3
neutral      2
Name: count, dtype: int64

|  | Review_Date | Author_Name | Vehicle_Title | Review_Title | Review | Rating | Sentiment |
|---|---|---|---|---|---|---|---|
| 0 | on 06/06/18 14:19 PM (PDT) | Vicki | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | 2006 Mustang GT | Doesn’t disappoint | 5.0 | neutral |
| 1 | on 08/12/17 06:06 AM (PDT) | Tom | 2006 Ford Mustang Coupe V6 Standard 2dr Coupe ... | DREAM CAR | I bought mine 4/17 with 98K. Have been wantin... | 3.0 | mixed |
| 2 | on 06/15/17 05:43 AM (PDT) | Ray | 2006 Ford Mustang Coupe V6 Premium 2dr Coupe (... | Great Ride | There will always be a 05-09 mustang for sale... | 5.0 | positive |
| 3 | on 05/18/17 17:33 PM (PDT) | Don Watson | 2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4... | I have wanted a Mustang for 40 years. | I bought my car from an auction I work at ( A... | 5.0 | positive |
| 4 | on 01/03/16 18:03 PM (PST) | One owner | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | One owner | I bought this car spankin new and i still am ... | 5.0 | positive |
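
The value counts above include a `mixed` label, which falls outside the three categories the exercise asks for. A minimal sketch of one way to map model output onto the allowed labels; the helper name `normalize_sentiment` and the choice to fold unknown labels into `neutral` are assumptions, not part of the original solution:

```python
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def normalize_sentiment(label, fallback="neutral"):
    """Lower-case and strip a label; map anything outside the allowed set to `fallback`."""
    if not isinstance(label, str):
        return fallback
    cleaned = label.strip().lower()
    return cleaned if cleaned in ALLOWED_SENTIMENTS else fallback


print(normalize_sentiment("Positive"))  # positive
print(normalize_sentiment("mixed"))     # neutral
```

It could be applied with `df_out['Sentiment'].apply(normalize_sentiment)`. Whether `mixed` should really count as `neutral` is a judgment call; tightening the prompt so the model may only answer with the three labels is an alternative.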


#### Key Insights Extraction

In [8]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

# Define the prompt template for key insights extraction
system_message = """
Given the following car review, What are the pros and cons of the vehicle described? Also, list any specific features that the reviewer liked or disliked.
Return a structured JSON object with the attributes 'Pros', 'Cons', 'Liked_Features', 'Disliked_Features'.  output should be in markdown format, ```json\n <json response>\n.
value can be empty if no values to fill in. ALWAYS respond with a valid json. even if user input is empty.
"""
template = ChatPromptTemplate([
    ("system", system_message),
    ("human", "Review: {user_input} \n\n end of review"),
])
extract_key_insights_chain = template | llm | JsonOutputParser()

# Create a function to extract key insights from each review
def extract_key_insights(review_text):
    json_output = extract_key_insights_chain.invoke({'user_input':review_text})
    return pd.Series(json_output)

# Apply key insights extraction
key_insights = df_out['Review'].apply(extract_key_insights)

# Join the expanded columns back to the DataFrame
df_out_final = df_out.join(key_insights)

In [9]:
df_out_final.head()

|  | Review_Date | Author_Name | Vehicle_Title | Review_Title | Review | Rating | Sentiment | Pros | Cons | Liked_Features | Disliked_Features |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | on 06/06/18 14:19 PM (PDT) | Vicki | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | 2006 Mustang GT | Doesn’t disappoint | 5.0 | neutral |  |  |  |  |
| 1 | on 08/12/17 06:06 AM (PDT) | Tom | 2006 Ford Mustang Coupe V6 Standard 2dr Coupe ... | DREAM CAR | I bought mine 4/17 with 98K. Have been wantin... | 3.0 | mixed | [Good power, Great mileage, Smokin' hot lookin... | [Orneriest transmission I've ever used, Diffic... | [Engine sounds good] | [Transmission requires a lot of finesse, Gear ... |
| 2 | on 06/15/17 05:43 AM (PDT) | Ray | 2006 Ford Mustang Coupe V6 Premium 2dr Coupe (... | Great Ride | There will always be a 05-09 mustang for sale... | 5.0 | positive | [Fairly reasonable price] | [] | [] | [] |
| 3 | on 05/18/17 17:33 PM (PDT) | Don Watson | 2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4... | I have wanted a Mustang for 40 years. | I bought my car from an auction I work at ( A... | 5.0 | positive | The reviewer loves the car and mentions that i... | No cons mentioned in the review. | [V6 engine, Air aid cold air injector, Throttl... | [] |
| 4 | on 01/03/16 18:03 PM (PST) | One owner | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | One owner | I bought this car spankin new and i still am ... | 5.0 | positive | [Hugs the road, Responsive and does whatever y... | [Alternator needed to be replaced, Tires and b... | [] | [Alternator needed to be replaced] |
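
In the output above, `Pros` and `Cons` come back as lists for some reviews and as plain strings or missing values for others. A minimal sketch of one way to coerce each field to a list of strings before further analysis; the helper name `to_str_list` is hypothetical and not part of the original solution:

```python
def to_str_list(value):
    """Coerce a model-returned field to a list of non-empty strings."""
    if value is None:
        return []
    if isinstance(value, float) and value != value:  # NaN check without numpy
        return []
    if isinstance(value, str):
        text = value.strip()
        return [text] if text else []
    if isinstance(value, (list, tuple)):
        return [str(item).strip() for item in value if str(item).strip()]
    return [str(value)]


print(to_str_list("No cons mentioned in the review."))  # ['No cons mentioned in the review.']
print(to_str_list(["Good power", " Great mileage "]))   # ['Good power', 'Great mileage']
print(to_str_list(None))                                # []
```

Applying it to each of the four insight columns, for example `df_out_final['Pros'].apply(to_str_list)`, would give every row the same shape.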


#### Leveraging Structured Output Generation in LangChain

In [10]:
model_id = "llama3-8b-8192"
llm = ChatGroq(model_name=model_id, temperature=0)

In [11]:
import typing
from pydantic import BaseModel, Field
class ReviewAnalysis(BaseModel):
    """Sentiment analysis and insight extraction for a user review.

    Given the following car review, what are the pros and cons of the vehicle described?
    Also, list any specific features that the reviewer liked or disliked."""

    Sentiment: typing.Literal["positive", "negative", "neutral"] = Field(description="The sentiment for the user review")
    Pros: typing.Optional[typing.List[str]] = Field(
        default=None, description="Any pros the user has discussed in the review"
    )
    Cons: typing.Optional[typing.List[str]] = Field(
        default=None, description="Any cons the user has discussed in the review"
    )
    Liked_Features: typing.Optional[typing.List[str]] = Field(
        default=None, description="Any features the user has liked as per the review"
    )
    Disliked_Features: typing.Optional[typing.List[str]] = Field(
        default=None, description="Any features the user has disliked as per the review"
    )

In [12]:
structured_llm = llm.with_structured_output(ReviewAnalysis)

In [13]:
# .model_dump() converts the Pydantic model instance to a dictionary
structured_llm.invoke(df['Review'][1]).model_dump()

{'Sentiment': 'neutral',
 'Pros': ['great mileage', 'good power'],
 'Cons': ['difficult transmission', 'harsh ride', 'road noise'],
 'Liked_Features': ['appearance', 'fun to drive'],
 'Disliked_Features': ['transmission', 'ride']}

In [14]:
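# Create a function to extract key insights and sentiment from each review
def analyze_review(review_text):
    try:
        json_output = structured_llm.invoke(review_text).model_dump()
        return pd.Series(json_output)
    except Exception as exc:
        # Report the failure and return an all-None row with the same fields,
        # so that apply() still expands the results into columns
        print(f"Review analysis failed: {exc}")
        return pd.Series({name: None for name in ReviewAnalysis.model_fields})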

In [15]:
# Run on the first 10 reviews only; later reviews raise an error, likely because they exceed the model's context length
response = df['Review'][:10].apply(analyze_review)
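
If the failures on later reviews are indeed caused by long inputs, as the comment above suspects, one option is to cap the review length before calling the model. A minimal sketch, assuming a character budget is an acceptable stand-in for a token limit; `truncate_review` and the 6000-character default are hypothetical and not derived from the model's tokenizer:

```python
def truncate_review(text, max_chars=6000):
    """Cut `text` to at most `max_chars` characters, preferring a word boundary."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    if text[max_chars].isspace():
        # The cut already falls on a word boundary.
        return clipped.rstrip()
    # Otherwise drop the trailing partial word, if there is a space to cut at.
    head, sep, _ = clipped.rpartition(" ")
    return head.rstrip() if sep else clipped


print(truncate_review("great car with a smooth ride", max_chars=12))  # great car
```

Each review could then be passed through `truncate_review` before `analyze_review`. Truncation discards the end of long reviews, so the extracted insights may miss points made late in the text.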

In [16]:
# Join the expanded columns back to the DataFrame
df_out_pyd = df[:10].join(response)

In [17]:
df_out_pyd.head()

|  | Review_Date | Author_Name | Vehicle_Title | Review_Title | Review | Rating | Sentiment | Pros | Cons | Liked_Features | Disliked_Features |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | on 06/06/18 14:19 PM (PDT) | Vicki | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | 2006 Mustang GT | Doesn’t disappoint | 5.0 | neutral |  |  |  |  |
| 1 | on 08/12/17 06:06 AM (PDT) | Tom | 2006 Ford Mustang Coupe V6 Standard 2dr Coupe ... | DREAM CAR | I bought mine 4/17 with 98K. Have been wantin... | 3.0 | neutral | [great mileage, good power] | [difficult transmission, harsh ride, road noise] | [appearance, fun to drive] | [transmission, ride] |
| 2 | on 06/15/17 05:43 AM (PDT) | Ray | 2006 Ford Mustang Coupe V6 Premium 2dr Coupe (... | Great Ride | There will always be a 05-09 mustang for sale... | 5.0 | positive | [great investment] |  |  |  |
| 3 | on 05/18/17 17:33 PM (PDT) | Don Watson | 2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4... | I have wanted a Mustang for 40 years. | I bought my car from an auction I work at ( A... | 5.0 | positive | [I bought my car from an auction I work at (Ad... |  | [v6 with an air aid cold air injector, throttl... |  |
| 4 | on 01/03/16 18:03 PM (PST) | One owner | 2006 Ford Mustang Coupe GT Premium 2dr Coupe (... | One owner | I bought this car spankin new and i still am ... | 5.0 | positive | [This car hugs the road and does whatever you ... |  |  |  |
