### **Exercise: Sentiment Analysis and Key Insights Extraction from Ford Car Reviews**

### **Problem Statement:**
You have been provided with a dataset containing Ford car reviews. Using LangChain and the concepts you’ve learned, complete the following tasks:

1. **Sentiment Analysis**: Analyze the sentiment of each review, categorize it as positive, neutral, or negative, and store the result.
2. **Key Insights Extraction**: Extract key pieces of information from each review, such as the pros and cons mentioned, and the specific features the reviewer liked or disliked (e.g., vehicle performance, comfort, price).

You will build a LangChain-based solution that leverages language models to automatically extract this information and provide a structured summary of the reviews.

---
### **Steps to Solve:**

#### **Step 1: Load the Dataset**
- The dataset file is named `ford_car_reviews.csv` and is sourced from Kaggle: [Edmunds Consumer Car Ratings and Reviews](https://www.kaggle.com/datasets/ankkur13/edmundsconsumer-car-ratings-and-reviews).
- For this exercise, **limit the data to the first 25 records**. You can do this with `df.head(25)` or `df.iloc[:25]` after loading the data into a DataFrame, or by passing `nrows=25` to `pd.read_csv`.

#### **Step 2: Define the Sentiment Analysis Task**
- Use LangChain to create a pipeline to classify the sentiment of each review.
- Define prompts that can guide the model to evaluate the sentiment. For example:
  - "Given the following car review, classify the sentiment as positive, neutral, or negative."

#### **Step 3: Key Insights Extraction**
- Use LangChain to create a pipeline to extract pros, cons, and notable features from each review. Define prompts such as:
  - "What are the pros and cons of the vehicle described in the following review?"
  - "What specific features of the vehicle does the reviewer like or dislike?"
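
One way to keep the two Step 3 prompts and the JSON key names in sync is to generate the system prompt from a single table of fields. The sketch below is a minimal illustration, not the exercise's required solution: `EXTRACTION_FIELDS` and `build_extraction_prompt` are hypothetical names, the keys follow the column names listed in Step 4, and the descriptions are illustrative wording.

```python
# Hypothetical field table: keys mirror the Step 4 column names,
# descriptions are illustrative wording only.
EXTRACTION_FIELDS = {
    "Pros": "advantages of the vehicle mentioned in the review",
    "Cons": "drawbacks of the vehicle mentioned in the review",
    "Liked_Features": "specific features the reviewer liked",
    "Disliked_Features": "specific features the reviewer disliked",
}


def build_extraction_prompt(fields=EXTRACTION_FIELDS):
    """Assemble a system prompt that names every expected JSON key."""
    lines = [
        "You are an expert in review analysis.",
        "Extract the following attributes from the car review you receive:",
    ]
    for number, (key, description) in enumerate(fields.items(), start=1):
        lines.append(f"{number}. {key}: {description}")
    lines.append(
        "Return a JSON object with exactly these keys: " + ", ".join(fields) + "."
    )
    return "\n".join(lines)


print(build_extraction_prompt())
```

Because each key is written once, a copy-paste slip between the "liked" and "disliked" descriptions is easier to spot.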

#### **Step 4: Update the DataFrame with New Information**
- Run the pipeline for each review and collect the sentiment and insights.
- Once the analysis and extraction are complete, update the original DataFrame with additional columns to include:
  - Sentiment (positive, neutral, negative)
  - Pros
  - Cons
  - Liked_Features
  - Disliked_Features
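
The update in Step 4 can be sketched without calling a model at all. In the example below, `fake_analyze` is a hypothetical stand-in for the LLM calls and the two-row frame is made-up data; only the pandas mechanics are the point, and this is one possible approach rather than the required one.

```python
import pandas as pd


def fake_analyze(review):
    """Hypothetical stand-in for the LLM calls; returns fixed placeholder values."""
    return {
        "Sentiment": "positive",
        "Pros": "placeholder pro",
        "Cons": "",
        "Liked_Features": "placeholder feature",
        "Disliked_Features": "",
    }


NEW_COLUMNS = ["Sentiment", "Pros", "Cons", "Liked_Features", "Disliked_Features"]

# Tiny made-up frame standing in for the first rows of the review data.
df = pd.DataFrame({"Review": ["Great car", "Too noisy"], "Rating": [5.0, 2.0]})

# One dict per review, turned into a frame that shares the original index
# so the new columns line up row by row.
records = [fake_analyze(text) for text in df["Review"]]
extracted = pd.DataFrame(records, index=df.index)

# Select columns by name so the order of keys in each dict does not matter.
df = df.join(extracted[NEW_COLUMNS])
print(df.columns.tolist())
```

Selecting the new columns by name, rather than relying on position, guards against the model returning its keys in a different order.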

---

### **Example Output:**

```json
{
  "Review_Date": "03/07/13",
  "Vehicle_Title": "2006 Ford Mustang Coupe",
  "Review_Text": "With the expected arrival of our 6th child...",
  "Rating": 4.125,
  "Sentiment": "Positive",
  "Pros": "Good driving experience, Large seating capacity, Great options",
  "Cons": "None mentioned",
  "Liked_Features": ["Driving experience", "Seating capacity", "Options available"],
  "Disliked_Features": []
}
```

### **Code:**

In [1]:
!pip install langchain_groq



In [2]:
!pip install python-dotenv



In [3]:
from langchain_core.messages import HumanMessage, SystemMessage

In [4]:
from langchain_groq import ChatGroq

In [5]:
from langchain_core.output_parsers import JsonOutputParser

In [6]:
import pandas as pd

In [9]:
#Loading the data

data = pd.read_csv('ford_car_reviews.csv',nrows=25)

In [10]:
#Checking loaded data

data.head()

Unnamed: 0.1,Unnamed: 0,Review_Date,Author_Name,Vehicle_Title,Review_Title,Review,Rating
0,0,on 06/06/18 14:19 PM (PDT),Vicki,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,2006 Mustang GT,Doesn’t disappoint,5.0
1,1,on 08/12/17 06:06 AM (PDT),Tom,2006 Ford Mustang Coupe V6 Standard 2dr Coupe ...,DREAM CAR,I bought mine 4/17 with 98K. Have been wantin...,3.0
2,2,on 06/15/17 05:43 AM (PDT),Ray,2006 Ford Mustang Coupe V6 Premium 2dr Coupe (...,Great Ride,There will always be a 05-09 mustang for sale...,5.0
3,3,on 05/18/17 17:33 PM (PDT),Don Watson,2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4...,I have wanted a Mustang for 40 years.,I bought my car from an auction I work at ( A...,5.0
4,4,on 01/03/16 18:03 PM (PST),One owner,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,One owner,I bought this car spankin new and i still am ...,5.0


In [11]:
#Dropping unwanted columns

data.drop("Unnamed: 0", axis = 1, inplace = True)

In [12]:
#Checking cleaned data frame

data.head()

Unnamed: 0,Review_Date,Author_Name,Vehicle_Title,Review_Title,Review,Rating
0,on 06/06/18 14:19 PM (PDT),Vicki,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,2006 Mustang GT,Doesn’t disappoint,5.0
1,on 08/12/17 06:06 AM (PDT),Tom,2006 Ford Mustang Coupe V6 Standard 2dr Coupe ...,DREAM CAR,I bought mine 4/17 with 98K. Have been wantin...,3.0
2,on 06/15/17 05:43 AM (PDT),Ray,2006 Ford Mustang Coupe V6 Premium 2dr Coupe (...,Great Ride,There will always be a 05-09 mustang for sale...,5.0
3,on 05/18/17 17:33 PM (PDT),Don Watson,2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4...,I have wanted a Mustang for 40 years.,I bought my car from an auction I work at ( A...,5.0
4,on 01/03/16 18:03 PM (PST),One owner,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,One owner,I bought this car spankin new and i still am ...,5.0


In [13]:
#Setting API KEY

import os, json, re, getpass
from dotenv import load_dotenv

load_dotenv(".env", override=True)
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("GROQ API Key:")

#key name : exercise1

GROQ API Key: ········


In [14]:
#Setting the model

# Other model IDs tried: llama-3.1-8b-instant, llama3-groq-8b-8192-tool-use-preview, llama3-groq-70b-8192-tool-use-preview
model_id = "llama3-8b-8192"
llm = ChatGroq(model_name=model_id, temperature=0, max_tokens=1024)

In [15]:
#Function for retrieving sentiment

def get_sentiment(query):
  messages = [
      SystemMessage(content="Given the following car review, classify the sentiment as positive, neutral, or negative. Just mention positive/neutral/negative. Nothing extra."),
      HumanMessage(content=query),
  ]

  ai_response = llm.invoke(messages)
  return(ai_response.content)
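
Chat models do not always return the label in a fixed casing or without extra words, so it can help to normalize the reply before storing it. The following is a minimal sketch: `normalize_sentiment` and `ALLOWED_LABELS` are hypothetical helpers, not part of LangChain, and the fallback label `unknown` is an assumption.

```python
ALLOWED_LABELS = ("positive", "neutral", "negative")


def normalize_sentiment(raw):
    """Map a free-form model reply onto one lowercase label, or 'unknown'."""
    text = raw.strip().lower()
    for label in ALLOWED_LABELS:
        if text.startswith(label):
            return label
    # Otherwise fall back to the label that appears earliest in the reply.
    found = [(text.find(label), label) for label in ALLOWED_LABELS if label in text]
    return min(found)[1] if found else "unknown"


print(normalize_sentiment("Neutral"))
print(normalize_sentiment(" positive.\n"))
print(normalize_sentiment("The sentiment is Negative"))
```

It could be applied as `normalize_sentiment(get_sentiment(text))` so that the `Sentiment` column holds a single consistent spelling per class.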

In [16]:
#Function to retrieve Pros, Cons, Liked_Features, Disliked_Features

def get_review_details(review):
  try:
    messages = [
        SystemMessage(content="""You are an expert in review analysis. You will receive a review from a user, and your job is to extract the following attributes as comma-separated text:
    1. Pros: what are the pros of the car?
    2. Cons: what are the cons of the car?
    3. Liked_Features: which features of the car did the customer like?
    4. Disliked_Features: which features of the car did the customer dislike?

    Return a structured JSON object with these four attributes."""),
        HumanMessage(content=review)
    ]
    response = llm.invoke(messages)

    json_response = JsonOutputParser().invoke(response)


    return json_response

  except Exception:
    # Fall back to empty values under the same keys as the new DataFrame columns
    return {'Pros': '', 'Cons': '', 'Liked_Features': '', 'Disliked_Features': ''}


In [17]:
#Quick check on a single input
print("Test Sentiment: ",get_sentiment("Doesn't disappoint"))
print("Test Review Details: ",get_review_details("Doesn't disappoint"))

Test Sentiment:  Positive
Test Review Details:  {'Pros': "doesn't disappoint", 'Cons': '', 'Liked_Features': "doesn't disappoint", 'Disliked_Features': ''}
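
The keys and value types in the model's JSON can vary between calls: the Example Output uses lists for `Liked_Features`, while the test output above returns plain strings. A small normalizing step is one way to make the result predictable before it reaches the DataFrame. The sketch below is an assumption about how that could look; `coerce_details` and `EXPECTED_KEYS` are hypothetical names.

```python
EXPECTED_KEYS = ("Pros", "Cons", "Liked_Features", "Disliked_Features")


def coerce_details(parsed):
    """Return a dict with exactly the four expected keys and string values."""
    # Match keys case-insensitively, since models do not always echo the casing.
    lookup = {}
    if isinstance(parsed, dict):
        lookup = {str(key).lower(): value for key, value in parsed.items()}

    cleaned = {}
    for key in EXPECTED_KEYS:
        value = lookup.get(key.lower(), "")
        if isinstance(value, (list, tuple)):
            # Flatten lists into the comma-separated text used for the columns.
            value = ", ".join(str(item) for item in value)
        cleaned[key] = "" if value is None else str(value).strip()
    return cleaned


print(coerce_details({"pros": ["Fun to drive", "Good power"], "Cons": None}))
```

Passing the parser result through such a helper inside `get_review_details` would give every row the same four keys, whatever the model returned.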


In [18]:
#Collecting Sentiment and storing as new column

data['Sentiment'] = data['Review'].apply(get_sentiment)

In [19]:
#Collecting Pros, Cons, Liked_Features and Disliked_Features and storing them as new columns

data[['Pros', 'Cons', 'Liked_Features', 'Disliked_Features']] = pd.json_normalize(data['Review'].apply(get_review_details))


In [20]:
#Checking Final DataFrame
data.head()

Unnamed: 0,Review_Date,Author_Name,Vehicle_Title,Review_Title,Review,Rating,Sentiment,Pros,Cons,Liked_Features,Disliked_Features
0,on 06/06/18 14:19 PM (PDT),Vicki,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,2006 Mustang GT,Doesn’t disappoint,5.0,Neutral,Doesn't disappoint,,Doesn't disappoint,
1,on 08/12/17 06:06 AM (PDT),Tom,2006 Ford Mustang Coupe V6 Standard 2dr Coupe ...,DREAM CAR,I bought mine 4/17 with 98K. Have been wantin...,3.0,neutral,"Sounds good, Great mileage, Good power","Orneriest transmission I've ever used, Harsh r...","Smokin' hot looking car, Fun to drive","Transmission is difficult to master, Ride is h..."
2,on 06/15/17 05:43 AM (PDT),Ray,2006 Ford Mustang Coupe V6 Premium 2dr Coupe (...,Great Ride,There will always be a 05-09 mustang for sale...,5.0,Positive,"great investment, reasonable price",,,
3,on 05/18/17 17:33 PM (PDT),Don Watson,2006 Ford Mustang Coupe V6 Deluxe 2dr Coupe (4...,I have wanted a Mustang for 40 years.,I bought my car from an auction I work at ( A...,5.0,Positive,"It is a v6 with an air aid cold air injector, ...",,"v6, air aid cold air injector, throttle body s...",
4,on 01/03/16 18:03 PM (PST),One owner,2006 Ford Mustang Coupe GT Premium 2dr Coupe (...,One owner,I bought this car spankin new and i still am ...,5.0,Positive,"hugs the road, does whatever you ask at a mome...","alternator needed to be fixed, tires and brake...","handling, responsiveness, reliability","alternator issue, tire and brake maintenance"


In [21]:
#Converting DataFrame to JSON-like format
result = data.to_dict(orient="records")  # Convert rows to list of dictionaries
formatted_json = json.dumps(result[0], indent=2)  # Convert the first record to JSON with indentation

#Printing Final Output in the formatted JSON
print(formatted_json)

{
  "Review_Date": " on 06/06/18 14:19 PM (PDT)",
  "Author_Name": "Vicki ",
  "Vehicle_Title": "2006 Ford Mustang Coupe GT Premium 2dr Coupe (4.6L 8cyl 5M)",
  "Review_Title": "2006 Mustang GT",
  "Review": " Doesn\u2019t disappoint",
  "Rating": 5.0,
  "Sentiment": "Neutral",
  "Pros": "Doesn't disappoint",
  "Cons": "",
  "Liked_Features": "Doesn't disappoint",
  "Disliked_Features": ""
}
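
The record above still carries the raw `Review_Date` string and padded whitespace, whereas the Example Output shows a bare date such as `03/07/13`. A small post-processing step could bridge that gap. The sketch below is an assumption about the desired cleanup; `tidy_record` and `DATE_PATTERN` are hypothetical names, and the sample dict copies three fields from the output above.

```python
import re

# Matches the NN/NN/NN date portion seen in the Review_Date values.
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{2}")


def tidy_record(record):
    """Strip stray whitespace and reduce Review_Date to its date part."""
    tidy = {
        key: value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }
    date_text = tidy.get("Review_Date")
    if isinstance(date_text, str):
        match = DATE_PATTERN.search(date_text)
        if match:
            tidy["Review_Date"] = match.group(0)
    return tidy


sample = {
    "Review_Date": " on 06/06/18 14:19 PM (PDT)",
    "Author_Name": "Vicki ",
    "Rating": 5.0,
}
print(tidy_record(sample))
```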
