### **Exercise: Sentiment Analysis and Key Insights Extraction from Ford Car Reviews**

### **Problem Statement:**
You have been provided with a dataset containing Ford car reviews. Your task is to use LangChain and the concepts you’ve learned to perform the following tasks:

1. **Sentiment Analysis**: Analyze the sentiment of each review, categorize it as positive, neutral, or negative, and store the result.
2. **Key Insights Extraction**: Extract key pieces of information from each review, such as the pros and cons mentioned, and the specific features the reviewer liked or disliked (e.g., vehicle performance, comfort, price).

You will build a LangChain-based solution that leverages language models to automatically extract this information and provide a structured summary of the reviews. 

---
### **Steps to Solve:**

#### **Step 1: Load the Dataset**
- The dataset file is named `ford_car_reviews.csv` and is sourced from Kaggle: [Edmunds Consumer Car Ratings and Reviews](https://www.kaggle.com/datasets/ankkur13/edmundsconsumer-car-ratings-and-reviews).
- For this exercise, **limit the data to the first 25 records**. This can be achieved by using `df.head(25)` or `df.iloc[:25]` when loading the data into a DataFrame.

#### **Step 2: Define the Sentiment Analysis Task**
- Use LangChain to create a pipeline to classify the sentiment of each review.
- Define prompts that can guide the model to evaluate the sentiment. For example:
  - "Given the following car review, classify the sentiment as positive, neutral, or negative."
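Model replies do not always match the three labels exactly (for example `Positive`, `negative.`, or `mixed`). A minimal sketch of a post-processing step is shown below. The helper name `normalize_sentiment` and the choice to map unrecognized answers to `neutral` are assumptions made for illustration, not part of the exercise.

```python
ALLOWED_LABELS = ("positive", "neutral", "negative")


def normalize_sentiment(raw: object) -> str:
    """Map a raw model answer onto one of the three allowed labels.

    Assumption: anything that is not exactly one of the labels
    (after trimming whitespace, quotes, and trailing punctuation)
    is treated as "neutral".
    """
    text = str(raw).strip().strip(".!\"'").lower()
    return text if text in ALLOWED_LABELS else "neutral"


print(normalize_sentiment("Positive"))
print(normalize_sentiment(" negative. "))
print(normalize_sentiment("mixed"))
```

Exact matching is used instead of substring matching so that an answer such as "not positive" is not silently read as `positive`.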

#### **Step 3: Key Insights Extraction**
- Use LangChain to create a pipeline to extract pros, cons, and notable features from each review. Define prompts such as:
  - "What are the pros and cons of the vehicle described in the following review?"
  - "What specific features of the vehicle does the reviewer like or dislike?"
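The two questions can be combined into one prompt that also shows the model the JSON shape to return. The sketch below uses plain `str.format`; LangChain's default f-string prompt format follows the same brace rules, so literal JSON braces are doubled (`{{` and `}}`) to keep them from being read as input variables. The names `EXTRACTION_PROMPT` and `build_extraction_prompt` are hypothetical.

```python
# Literal braces in the JSON example are doubled so that only
# {review_text} is treated as a placeholder.
EXTRACTION_PROMPT = """What are the pros and cons of the vehicle described in the following review?
What specific features of the vehicle does the reviewer like or dislike?

Respond with only a JSON object shaped like this:
{{"Pros": "...", "Cons": "...", "Liked_Features": ["..."], "Disliked_Features": ["..."]}}

Review: {review_text}"""


def build_extraction_prompt(review_text: str) -> str:
    """Fill the single placeholder and return the finished prompt text."""
    return EXTRACTION_PROMPT.format(review_text=review_text)


print(build_extraction_prompt("Fun to drive, but the ride is harsh."))
```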

#### **Step 4: Update the DataFrame with New Information**
- Run the pipeline for each review and collect the sentiment and insights.
- Once the analysis and extraction are complete, update the original DataFrame with additional columns to include:
  - Sentiment (positive, neutral, negative)
  - Pros
  - Cons
  - Liked_Features
  - Disliked_Features
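Because the model may return keys in different casing or omit some of them, it helps to reduce each result to exactly these five columns before touching the DataFrame. The following is a sketch only: `to_row` is a hypothetical helper, and the two entries in `results` are made-up samples standing in for real model output.

```python
COLUMNS = ("Sentiment", "Pros", "Cons", "Liked_Features", "Disliked_Features")


def to_row(result: object) -> dict:
    """Return a dict with exactly the five expected columns.

    Keys are matched case-insensitively; missing ones become None.
    """
    if not isinstance(result, dict):
        return {col: None for col in COLUMNS}
    lowered = {str(key).lower(): value for key, value in result.items()}
    return {col: lowered.get(col.lower()) for col in COLUMNS}


# Hypothetical sample outputs: one on-schema, one off-schema.
results = [
    {"Sentiment": "Positive", "Pros": "Fun to drive", "Cons": "None mentioned",
     "Liked_Features": ["Handling"], "Disliked_Features": []},
    {"sentiment": "mixed", "categories": []},
]
rows = [to_row(r) for r in results]
print(rows)
```

With pandas, `pd.DataFrame(rows, index=df.index)` builds the new columns and `df.join(...)` attaches them to the original frame, provided `rows` holds one entry per review in the same order.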

---

### **Example Output:**

```json
{
  "Review_Date": "03/07/13",
  "Vehicle_Title": "2006 Ford Mustang Coupe",
  "Review_Text": "With the expected arrival of our 6th child...",
  "Rating": 4.125,
  "Sentiment": "Positive",
  "Pros": "Good driving experience, Large seating capacity, Great options",
  "Cons": "None mentioned",
  "Liked_Features": ["Driving experience", "Seating capacity", "Options available"],
  "Disliked_Features": []
}
```

In [None]:
# Run once if the package is not installed yet (ChatGroq is a class inside langchain_groq, not a package):
# %pip install langchain_groq

In [4]:
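# Install the missing package before importing it.
# On a permission error, retry with: %pip install --user langchain_groq
%pip install langchain_groq

import getpass
import json
import os
import re

import numpy as np
import pandas as pd
from dotenv import load_dotenv
from langchain_groq import ChatGroq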


In [4]:
load_dotenv(".env", override=True)
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("GROQ API Key: ")

In [5]:
# Other options: llama-3.1-8b-instant, llama3-groq-8b-8192-tool-use-preview, llama3-groq-70b-8192-tool-use-preview
model_id = "llama3-8b-8192"
llm = ChatGroq(model_name=model_id, temperature=0)  # temperature=0 keeps answers more repeatable

#### PromptTemplate

In [28]:
from langchain_core.prompts import ChatPromptTemplate

# System message: the instructions plus the fields of one review, filled in per row.
system_message_template = """Analyze the sentiment of the review below and categorize it as positive, neutral, or negative.
Then identify the following details:
- Pros: the positive aspects of the vehicle mentioned in the review.
- Cons: the negative aspects of the vehicle mentioned in the review.
- Liked Features: specific features the reviewer liked (if mentioned).
- Disliked Features: specific features the reviewer disliked (if mentioned).
Respond with only a JSON object containing exactly these fields: Sentiment, Pros, Cons, Liked_Features, Disliked_Features.

Review_Date : {review_date}
Author : {author}
Review_Title : {review_title}
Review_Text : {review_text}
Sentiment : {sentiment}
"""
template = ChatPromptTemplate.from_messages([
    ("system", system_message_template),
    ("human", "{user_input}"),
])

In [26]:
df = pd.read_csv("E:/JIO QTR 3/LLM/Lab1/Exercise 1/ford_car_reviews.csv", encoding='utf-8', on_bad_lines='skip', engine='python')
df = df.head(25)

In [32]:
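from langchain_core.output_parsers import StrOutputParser, JsonOutputParser

# Parses the model's JSON reply into a Python dict.
parser = JsonOutputParser()

In [33]:
# Prompt -> model -> JSON parser, reusing the parser defined above.
proof_read_chain = template | llm | parser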

In [38]:
results = []
for _, row in df.iterrows():
    # One model call per review; the parser returns the reply as a dict.
    chain_output = proof_read_chain.invoke({
        "user_input": "Analyze the review above and return only the JSON object described.",
        "author": row["Author_Name"],
        "review_date": row["Review_Date"],
        "review_title": row["Review_Title"],
        "review_text": row["Review"],
        "sentiment": "",  # left blank for the model to fill in
    })
    results.append(chain_output)

print(results)

[{'Review': {'Date': '06/06/18 14:19 PM (PDT)', 'Author': 'Vicki', 'Title': '2006 Mustang GT', 'Text': 'Doesn’t disappoint', 'Sentiment': 'Positive'}}, {'sentiment': 'mixed', 'categories': [{'category': 'positive', 'reasons': ["smokin' hot looking car", 'fun to drive']}, {'category': 'negative', 'reasons': ['orneriest transmission', 'difficult to master smoothly', 'harsh ride', 'road noise']}]}, {'sentiment': 'positive', 'reason': "The reviewer mentions that there will always be a 05-09 mustang for sale and that they purchased one as a second car, calling it a 'great investment'. This indicates a positive sentiment towards the product."}, {'sentiment': 'positive', 'review': 'I have wanted a Mustang for 40 years.\nI bought my car from an auction I work at ( Adesa Sacramento ) and I love it!! It is a v6 with an air aid cold air injector, throttle body spacer and a Flowmaster exhaust. The previous owner also added blacked out 17 inch factory rims. This beast will smoke any G6 or v6 camaro