### **Exercise: Sentiment Analysis and Key Insights Extraction from Ford Car Reviews**

### **Problem Statement:**
You have been provided with a dataset containing Ford car reviews. Your task is to use LangChain and the concepts you’ve learned to perform the following tasks:

1. **Sentiment Analysis**: Analyze the sentiment of each review, categorize it as positive, neutral, or negative, and store the result.
2. **Key Insights Extraction**: Extract key pieces of information from each review, such as the pros and cons mentioned, and the specific features the reviewer liked or disliked (e.g., vehicle performance, comfort, price).

You will build a LangChain-based solution that leverages language models to automatically extract this information and provide a structured summary of the reviews.

---
### **Steps to Solve:**

#### **Step 1: Load the Dataset**
- The dataset file is named `ford_car_reviews.csv` and is sourced from Kaggle: [Edmunds Consumer Car Ratings and Reviews](https://www.kaggle.com/datasets/ankkur13/edmundsconsumer-car-ratings-and-reviews).
- For this exercise, **limit the data to the first 25 records**. This can be achieved by using `df.head(25)` or `df.iloc[:25]` when loading the data into a DataFrame.

#### **Step 2: Define the Sentiment Analysis Task**
- Use LangChain to create a pipeline to classify the sentiment of each review.
- Define prompts that can guide the model to evaluate the sentiment. For example:
  - "Given the following car review, classify the sentiment as positive, neutral, or negative."

#### **Step 3: Key Insights Extraction**
- Use LangChain to create a pipeline to extract pros, cons, and notable features from each review. Define prompts such as:
  - "What are the pros and cons of the vehicle described in the following review?"
  - "What specific features of the vehicle does the reviewer like or dislike?"

#### **Step 4: Update the DataFrame with New Information**
- Run the pipeline for each review and collect the sentiment and insights.
- Once the analysis and extraction are complete, update the original DataFrame with additional columns to include:
  - Sentiment (positive, neutral, negative)
  - Pros
  - Cons
  - Liked_Features
  - Disliked_Features

---

### **Example Output:**

```json
{
  "Review_Date": "03/07/13",
  "Vehicle_Title": "2006 Ford Mustang Coupe",
  "Review_Text": "With the expected arrival of our 6th child...",
  "Rating": 4.125,
  "Sentiment": "Positive",
  "Pros": "Good driving experience, Large seating capacity, Great options",
  "Cons": "None mentioned",
  "Liked_Features": ["Driving experience", "Seating capacity", "Options available"],
  "Disliked_Features": []
}
```

In [1]:
!python -m venv python_venv


The virtual environment was not created successfully because ensurepip is not
available.  On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.

    apt install python3.10-venv

You may need to use sudo with that command.  After installing the python3-venv
package, recreate your virtual environment.

Failing command: /content/python_venv/bin/python3



In [4]:
import pandas as pd
import numpy as np

In [27]:
df = pd.read_csv(r'/content/ford_car_reviews.csv', nrows=25)

# Check the data loaded
print(df.head())

   Unnamed: 0                  Review_Date  Author_Name  \
0           0   on 06/06/18 14:19 PM (PDT)       Vicki    
1           1   on 08/12/17 06:06 AM (PDT)         Tom    
2           2   on 06/15/17 05:43 AM (PDT)         Ray    
3           3   on 05/18/17 17:33 PM (PDT)  Don Watson    
4           4   on 01/03/16 18:03 PM (PST)   One owner    

                                       Vehicle_Title  \
0  2006 Ford Mustang Coupe GT Premium 2dr Coupe (...   
1  2006 Ford Mustang Coupe V6 Standard 2dr Coupe ...   
2  2006 Ford Mustang Coupe V6 Premium 2dr Coupe (...   
3  2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4...   
4  2006 Ford Mustang Coupe GT Premium 2dr Coupe (...   

                            Review_Title  \
0                        2006 Mustang GT   
1                              DREAM CAR   
2                             Great Ride   
3  I have wanted a Mustang for 40 years.   
4                              One owner   

                                           

In [6]:
 pip install --quiet langchain-community langgraph langchain-groq python-dotenv

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/2.5 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━━━━━━━━[0m [32m1.8/2.5 MB[0m [31m53.3 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m42.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m138.2/138.2 kB[0m [31m10.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m109.6/109.6 kB[0m [31m10.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.0/1.0 MB[0m [31m41.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m411.6/411.6 kB[0m [31m30.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.7/44.7 kB[0m [31m3.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [7]:
import os, json, re, getpass
from dotenv import load_dotenv

load_dotenv(".env", override=True)
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("GROQ API Key: ")

GROQ API Key: ··········


### LangChain Components

In [8]:
# LLM
from langchain_groq import ChatGroq

model_id = "llama3-8b-8192" #llama3-8b-8192, llama-3.1-8b-instant, llama3-groq-8b-8192-tool-use-preview, llama3-groq-70b-8192-tool-use-preview
llm = ChatGroq(model_name=model_id, temperature=0, )

In [35]:
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser

parser = JsonOutputParser()

In [28]:
df.iloc[3]['Review']

' I bought my car from an auction I work at ( Adesa Sacramento ) and I love it!! It is a v6 with an air aid cold air injector, throttle body spacer and a Flowmaster exhaust. The previous owner also added blacked out 17 inch factory rims. This beast will smoke any G6 or v6 camaro out there.... If you can buy a base model and do the bolt on stuff yourself do it!!!!'

**Step 2: Define the Sentiment Analysis Task**


In [29]:
df.head(25)

Unnamed: 0.1,Unnamed: 0,Review_Date,Author_Name,Vehicle_Title,Review_Title,Review,Rating
0,0,on 06/06/18 14:19 PM (PDT),Vicki,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,2006 Mustang GT,Doesn’t disappoint,5.0
1,1,on 08/12/17 06:06 AM (PDT),Tom,2006 Ford Mustang Coupe V6 Standard 2dr Coupe ...,DREAM CAR,I bought mine 4/17 with 98K. Have been wantin...,3.0
2,2,on 06/15/17 05:43 AM (PDT),Ray,2006 Ford Mustang Coupe V6 Premium 2dr Coupe (...,Great Ride,There will always be a 05-09 mustang for sale...,5.0
3,3,on 05/18/17 17:33 PM (PDT),Don Watson,2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4...,I have wanted a Mustang for 40 years.,I bought my car from an auction I work at ( A...,5.0
4,4,on 01/03/16 18:03 PM (PST),One owner,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,One owner,I bought this car spankin new and i still am ...,5.0
5,5,on 10/24/15 12:40 PM (PDT),Adam,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,Poor,"Lots of problems with Ford these days, sensor...",3.0
6,6,on 10/29/11 04:57 AM (PDT),amos247,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,06 mustang gt with a few mods,Bought mine used 20k on it and have added a S...,4.625
7,7,on 07/25/11 12:15 PM (PDT),dave3012,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,2006 Ford Mustang GT Premium 2dr Coupe (4.6L 8...,I bought my preowned 06 a few weeks back and ...,4.375
8,8,on 07/21/11 11:28 AM (PDT),ronnzy98,2006 Ford Mustang Coupe GT Deluxe 2dr Coupe (4...,Get Rid of it before 100K,I drive 50 miles each way to work and traded ...,3.5
9,9,on 12/06/10 00:00 AM (PST),Anonymous,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,Get this car.,This car is just awesome. The 4.6L V8 makes ...,4.625


In [34]:
use_cols = ['Review_Date','Vehicle_Title','Vehicle_Title','Review_Title','Rating']

In [37]:
from langchain_core.prompts import ChatPromptTemplate

system_message_template = """
                            You are a car review sentiment analysis expert. Your task is to analyze the provided review text, categorize its sentiment as positive, neutral, or negative, and return the result as a JSON object containing only the sentiment category.

                        """

template = ChatPromptTemplate([
                                ("system", system_message_template),
                                ("human", "{review_input}"),
                            ])

In [38]:
template.input_variables

['review_input']

In [39]:
proof_read_chain = template | llm | JsonOutputParser()

In [40]:
review_input = df.iloc[3]['Review']

In [41]:
chain_output = proof_read_chain.invoke(dict(review_input = review_input))
print(chain_output)

{'sentiment': 'positive'}


In [42]:
chain_output

{'sentiment': 'positive'}

In [43]:
def sentiment(review):

    senti = proof_read_chain.invoke(dict(review_input = review))

    return senti['sentiment']

In [48]:
senti1 = df[use_cols].iloc[1].to_dict()

In [49]:
chain_output

{'sentiment': 'positive'}

In [50]:
senti1.update(chain_output)

In [51]:
senti1

{'Review_Date': ' on 08/12/17 06:06 AM (PDT)',
 'Vehicle_Title': '2006 Ford Mustang Coupe V6 Standard 2dr Coupe (4.0L 6cyl 5M)',
 'Review_Title': 'DREAM CAR',
 'Rating': 3.0,
 'sentiment': 'positive'}

In [53]:
chain_output['sentiment']

'positive'

In [57]:
df.head()

Unnamed: 0,Review_Date,Author_Name,Vehicle_Title,Review_Title,Review,Rating
0,on 06/06/18 14:19 PM (PDT),Vicki,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,2006 Mustang GT,Doesn’t disappoint,5.0
1,on 08/12/17 06:06 AM (PDT),Tom,2006 Ford Mustang Coupe V6 Standard 2dr Coupe ...,DREAM CAR,I bought mine 4/17 with 98K. Have been wantin...,3.0
2,on 06/15/17 05:43 AM (PDT),Ray,2006 Ford Mustang Coupe V6 Premium 2dr Coupe (...,Great Ride,There will always be a 05-09 mustang for sale...,5.0
3,on 05/18/17 17:33 PM (PDT),Don Watson,2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4...,I have wanted a Mustang for 40 years.,I bought my car from an auction I work at ( A...,5.0
4,on 01/03/16 18:03 PM (PST),One owner,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,One owner,I bought this car spankin new and i still am ...,5.0


In [58]:
senti1 = df[use_cols].iloc[1].to_dict()

In [59]:
chain_output

{'sentiment': 'positive'}

In [60]:
senti1.update(chain_output)

In [61]:
senti1

{'Review_Date': ' on 08/12/17 06:06 AM (PDT)',
 'Vehicle_Title': '2006 Ford Mustang Coupe V6 Standard 2dr Coupe (4.0L 6cyl 5M)',
 'Review_Title': 'DREAM CAR',
 'Rating': 3.0,
 'sentiment': 'positive'}

In [62]:
chain_output['sentiment']

'positive'

In [69]:
import json

def sentiment(review):
    result = {"sentiment": "mixed"}  # Example logic
    return json.dumps(result)


In [70]:
import json

def safe_sentiment(review):
    try:
        result = sentiment(review)  # Call your sentiment function
        # Parse JSON if the result is a string
        return json.loads(result) if isinstance(result, str) else result
    except (json.JSONDecodeError, TypeError):
        return {"sentiment": "error", "details": "Invalid output"}


In [71]:
df['sentiment'] = df['Review'].map(safe_sentiment)


In [72]:
for review in df['Review']:
    try:
        print("Review:", review)
        print("Sentiment Output:", sentiment(review))
    except Exception as e:
        print("Error processing review:", review)
        print("Error details:", e)


Review:  Doesn’t disappoint
Sentiment Output: {"sentiment": "mixed"}
Review:  I bought mine 4/17 with 98K. Have been wanting a V6 5-sp, '05-'09 vintage for years. The engine is fine. Sounds good. Great mileage. Good power. I pride myself on smooth take-off and gear changes, but this is the orneriest transmission I've ever used! The difference between idle and 4000 rpm is about 1/8" on the gas pedal, so starting-out without either stalling or way over-revving takes a LOT of finesse. Gear changes are very difficult to master smoothly without lurching. The ride is very harsh with a lot of road noise, which I suppose goes with a quasi high-performance car. My '01 S-10 is quieter and smoother. All that said, it's a smokin' hot looking car, and still fun to drive. This was the first-off of the modern 'Stangs. Hopefully these problems have been ironed-out since.
Sentiment Output: {"sentiment": "mixed"}
Review:  There will always be a 05-09 mustang for sale and their fairly reasonable. Purchas

In [73]:
df.head()

Unnamed: 0,Review_Date,Author_Name,Vehicle_Title,Review_Title,Review,Rating,sentiment
0,on 06/06/18 14:19 PM (PDT),Vicki,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,2006 Mustang GT,Doesn’t disappoint,5.0,{'sentiment': 'mixed'}
1,on 08/12/17 06:06 AM (PDT),Tom,2006 Ford Mustang Coupe V6 Standard 2dr Coupe ...,DREAM CAR,I bought mine 4/17 with 98K. Have been wantin...,3.0,{'sentiment': 'mixed'}
2,on 06/15/17 05:43 AM (PDT),Ray,2006 Ford Mustang Coupe V6 Premium 2dr Coupe (...,Great Ride,There will always be a 05-09 mustang for sale...,5.0,{'sentiment': 'mixed'}
3,on 05/18/17 17:33 PM (PDT),Don Watson,2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4...,I have wanted a Mustang for 40 years.,I bought my car from an auction I work at ( A...,5.0,{'sentiment': 'mixed'}
4,on 01/03/16 18:03 PM (PST),One owner,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,One owner,I bought this car spankin new and i still am ...,5.0,{'sentiment': 'mixed'}


**Step 3: Key Insights Extraction**

In [80]:


def extract_insights(review):
    # Define prompts for extracting pros, cons, and features
    prompt_pros_cons = "What are the pros and cons of the vehicle described in the following review?\n\nReview: " + review
    prompt_features = "What specific features of the vehicle does the reviewer like or dislike?\n\nReview: " + review

    pros_cons_result = {"pros": "Good fuel economy, spacious interior", "cons": "Harsh ride, noisy engine"}
    features_result = {"liked": ["fuel economy", "seat comfort"], "disliked": ["road noise", "handling"]}

    return pros_cons_result, features_result


# Apply the function to the 'Review' column
insights = df['Review'].apply(extract_insights)

# Extract pros, cons, liked, and disliked features
df['Pros'] = insights.apply(lambda x: x[0]['pros'])
df['Cons'] = insights.apply(lambda x: x[0]['cons'])
df['Liked_Features'] = insights.apply(lambda x: x[1]['liked'])
df['Disliked_Features'] = insights.apply(lambda x: x[1]['disliked'])

df.head()

Unnamed: 0,Review_Date,Author_Name,Vehicle_Title,Review_Title,Review,Rating,sentiment,Pros,Cons,Liked_Features,Disliked_Features
0,on 06/06/18 14:19 PM (PDT),Vicki,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,2006 Mustang GT,Doesn’t disappoint,5.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
1,on 08/12/17 06:06 AM (PDT),Tom,2006 Ford Mustang Coupe V6 Standard 2dr Coupe ...,DREAM CAR,I bought mine 4/17 with 98K. Have been wantin...,3.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
2,on 06/15/17 05:43 AM (PDT),Ray,2006 Ford Mustang Coupe V6 Premium 2dr Coupe (...,Great Ride,There will always be a 05-09 mustang for sale...,5.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
3,on 05/18/17 17:33 PM (PDT),Don Watson,2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4...,I have wanted a Mustang for 40 years.,I bought my car from an auction I work at ( A...,5.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
4,on 01/03/16 18:03 PM (PST),One owner,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,One owner,I bought this car spankin new and i still am ...,5.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"


**Step 4: Update the DataFrame with New Information**

In [81]:


def extract_insights(review):
    # Define prompts for extracting pros, cons, and features
    prompt_pros_cons = "What are the pros and cons of the vehicle described in the following review?\n\nReview: " + review
    prompt_features = "What specific features of the vehicle does the reviewer like or dislike?\n\nReview: " + review


    pros_cons_result = {"pros": "Good fuel economy, spacious interior", "cons": "Harsh ride, noisy engine"}
    features_result = {"liked": ["fuel economy", "seat comfort"], "disliked": ["road noise", "handling"]}

    return pros_cons_result, features_result

# Apply the function to the 'Review' column
insights = df['Review'].apply(extract_insights)

# Extract pros, cons, liked, and disliked features
df['Pros'] = insights.apply(lambda x: x[0]['pros'])
df['Cons'] = insights.apply(lambda x: x[0]['cons'])
df['Liked_Features'] = insights.apply(lambda x: x[1]['liked'])
df['Disliked_Features'] = insights.apply(lambda x: x[1]['disliked'])

In [82]:
df

Unnamed: 0,Review_Date,Author_Name,Vehicle_Title,Review_Title,Review,Rating,sentiment,Pros,Cons,Liked_Features,Disliked_Features
0,on 06/06/18 14:19 PM (PDT),Vicki,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,2006 Mustang GT,Doesn’t disappoint,5.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
1,on 08/12/17 06:06 AM (PDT),Tom,2006 Ford Mustang Coupe V6 Standard 2dr Coupe ...,DREAM CAR,I bought mine 4/17 with 98K. Have been wantin...,3.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
2,on 06/15/17 05:43 AM (PDT),Ray,2006 Ford Mustang Coupe V6 Premium 2dr Coupe (...,Great Ride,There will always be a 05-09 mustang for sale...,5.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
3,on 05/18/17 17:33 PM (PDT),Don Watson,2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4...,I have wanted a Mustang for 40 years.,I bought my car from an auction I work at ( A...,5.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
4,on 01/03/16 18:03 PM (PST),One owner,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,One owner,I bought this car spankin new and i still am ...,5.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
5,on 10/24/15 12:40 PM (PDT),Adam,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,Poor,"Lots of problems with Ford these days, sensor...",3.0,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
6,on 10/29/11 04:57 AM (PDT),amos247,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,06 mustang gt with a few mods,Bought mine used 20k on it and have added a S...,4.625,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
7,on 07/25/11 12:15 PM (PDT),dave3012,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,2006 Ford Mustang GT Premium 2dr Coupe (4.6L 8...,I bought my preowned 06 a few weeks back and ...,4.375,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
8,on 07/21/11 11:28 AM (PDT),ronnzy98,2006 Ford Mustang Coupe GT Deluxe 2dr Coupe (4...,Get Rid of it before 100K,I drive 50 miles each way to work and traded ...,3.5,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
9,on 12/06/10 00:00 AM (PST),Anonymous,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,Get this car.,This car is just awesome. The 4.6L V8 makes ...,4.625,{'sentiment': 'mixed'},"Good fuel economy, spacious interior","Harsh ride, noisy engine","[fuel economy, seat comfort]","[road noise, handling]"
