## Option 1: Sentiment Analysis and Key Insights Extraction from Ford Car Reviews

### **Problem Statement:**
You have been provided with a dataset containing Ford car reviews. Your task is to use LangChain and the concepts you’ve learned to perform the following tasks:

1. **Sentiment Analysis**: Analyze the sentiment of each review, categorize it as positive, neutral, or negative, and store the result.
2. **Key Insights Extraction**: Extract key pieces of information from each review, such as the pros and cons mentioned, and the specific features the reviewer liked or disliked (e.g., vehicle performance, comfort, price).

You will build a LangChain-based solution that leverages language models to automatically extract this information and provide a structured summary of the reviews. 

---
### **Steps to Solve:**

#### **Step 1: Load the Dataset**
- The dataset file is named `ford_car_reviews.csv` and is sourced from Kaggle: [Edmunds Consumer Car Ratings and Reviews](https://www.kaggle.com/datasets/ankkur13/edmundsconsumer-car-ratings-and-reviews).
- For this exercise, **limit the data to the first 25 records**, for example with `df.head(25)` or `df.iloc[:25]` after loading the data into a DataFrame.
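
A minimal sketch of this step, assuming pandas is installed. The helper name `load_reviews` and the `Review` column used in the stand-in data are illustrative assumptions, not part of the assignment; check the real column names in the Kaggle file.

```python
import io

import pandas as pd


def load_reviews(source, limit=25):
    """Read the reviews CSV and keep only the first `limit` rows."""
    df = pd.read_csv(source)
    # df.head(limit) and df.iloc[:limit] select the same rows here.
    return df.head(limit).copy()


# In the assignment: reviews = load_reviews("ford_car_reviews.csv")
# Below, an in-memory CSV with 40 made-up rows stands in for the file.
fake_csv = io.StringIO(
    "Review\n" + "\n".join(f"made-up review {i}" for i in range(40))
)
reviews = load_reviews(fake_csv)
print(reviews.shape)  # (25, 1)
```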

#### **Step 2: Define the Sentiment Analysis Task**
- Use LangChain to create a pipeline to classify the sentiment of each review.
- Define prompts that can guide the model to evaluate the sentiment. For example:
  - "Given the following car review, classify the sentiment as positive, neutral, or negative."

#### **Step 3: Key Insights Extraction**
- Use LangChain to create a pipeline to extract pros, cons, and notable features from each review. Define prompts such as:
  - "What are the pros and cons of the vehicle described in the following review?"
  - "What specific features of the vehicle does the reviewer like or dislike?"
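
The two prompts above can also be combined into one request that asks for JSON, which is easier to store than free text. The sketch below uses only the standard library and takes the model as a plain callable `ask` (an assumption for illustration; with LangChain this could be a thin wrapper around a chain's `invoke`). The key names and the `canned_reply` stub are hypothetical.

```python
import json
import re

FIELDS = ("pros", "cons", "liked_features", "disliked_features")

INSIGHTS_PROMPT = (
    "What are the pros and cons of the vehicle described in the following "
    "review, and what specific features does the reviewer like or dislike? "
    "Answer with a JSON object whose keys are pros, cons, liked_features, "
    "and disliked_features, each holding a list of short strings.\n\n"
    "Review: {review}"
)


def as_list(value):
    """Coerce a missing, scalar, or list value into a list of strings."""
    if value is None:
        return []
    if isinstance(value, list):
        return [str(item) for item in value]
    return [str(value)]


def parse_insights(reply: str) -> dict:
    """Pull the JSON object out of a model reply; missing keys become []."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    try:
        data = json.loads(match.group(0)) if match else {}
    except json.JSONDecodeError:
        data = {}
    return {field: as_list(data.get(field)) for field in FIELDS}


def extract_insights(review: str, ask) -> dict:
    """Send one review through the model and return structured insights."""
    return parse_insights(ask(INSIGHTS_PROMPT.format(review=review)))


def canned_reply(prompt: str) -> str:
    """Hypothetical model reply, including chatter around the JSON."""
    return ('Sure! {"pros": ["quiet cabin"], "cons": ["price"], '
            '"liked_features": ["comfort"]}')


insights = extract_insights("Quiet cabin, but pricey.", canned_reply)
print(insights)
```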

#### **Step 4: Update the DataFrame with New Information**
- Run the pipeline for each review and collect the sentiment and insights.
- Once the analysis and extraction are complete, update the original DataFrame with additional columns to include:
  - Sentiment (positive, neutral, negative)
  - Pros
  - Cons
  - Liked_Features
  - Disliked_Features
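
A sketch of the update step, assuming pandas and a text column named `Review` (an assumption; use the actual column name from the dataset). `analyze_review` is a hypothetical placeholder for the sentiment and insights pipelines, hard-coded here so the example runs without a model.

```python
import pandas as pd


def analyze_review(text: str) -> dict:
    """Hypothetical placeholder for the sentiment and insights pipelines."""
    negative = "noisy" in text.lower()
    return {
        "Sentiment": "negative" if negative else "positive",
        "Pros": [] if negative else ["comfortable seats"],
        "Cons": ["road noise"] if negative else [],
        "Liked_Features": [] if negative else ["comfort"],
        "Disliked_Features": ["cabin noise"] if negative else [],
    }


df = pd.DataFrame(
    {"Review": ["Comfortable seats on long trips.", "Very noisy cabin."]}
)

# One dict per review, in row order; reusing the index keeps rows aligned.
results = pd.DataFrame(
    [analyze_review(text) for text in df["Review"]], index=df.index
)
df = df.join(results)
print(df.columns.tolist())
```

Building all results first and joining once avoids growing the DataFrame row by row, and the original columns are left untouched.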

---


## Option 2: Job Postings Analysis - Role Categorization and Requirements Extraction (55 marks)

**Problem Statement:**

Using a provided dataset of job postings, use LangChain and an LLM to analyze each posting automatically and extract structured information. There are two tasks:

1.  **Job Category Classification**: Classify the job role into a broad category like Technology/IT, Finance, Marketing, or Healthcare.
2.  **Key Requirements Extraction**: Extract specific requirements from the job description, including:
    * **Required Skills/Technologies**: Identify specific skills, programming languages, or tools mentioned (e.g., Python, project management).
    * **Education Level**: Identify any required or preferred education levels (e.g., Bachelor's degree, MBA).
    * **Experience**: Identify mentions of required experience, such as years of experience or a specific level (e.g., "3+ years experience" or "senior-level").

The final output should augment each job posting with a category label and fields detailing the extracted requirements.

**Steps to Solve:**

* **Step 1: Load the Dataset**
    * Use the provided job postings dataset, loading it into a DataFrame.
    * Limit the data to the first 25 job postings to keep processing feasible.
    * The expected output is a loaded DataFrame.
* **Step 2: Define the Job Category Classification Task**
    * Develop a prompt to classify the job posting into a broad category.
    * A sample prompt is: "Given the following job title and description, categorize the job into one of the following domains: Technology/IT, Finance, Marketing, Healthcare, Education, Others."
    * The LLM should return a single category label.
    * The expected output is a demonstration of this working for a sample data point.
* **Step 3: Define the Requirements Extraction Task**
    * Create prompts to extract key requirements: skills, education, and experience.
    * You can use separate prompts for each sub-task or a single composite prompt.
    * The expected output is a demonstration of this working for a sample data point.
* **Step 4: Apply the LLM Chain to Each Job Posting**
    * Loop through each job posting in the DataFrame.
    * For each entry, feed the job title/description into the classification and extraction prompts.
    * Make sure to handle cases where information is not mentioned, such as outputting "Not specified" or null.
    * The expected output is a pandas DataFrame with all the required additional columns.
* **Step 5: Update the DataFrame with New Columns**
    * Add new columns to the DataFrame for the outputs, such as **Predicted Category**, **Required Skills**, **Education Required**, and **Experience Required**.
    * Fill these columns with the LLM's output for each posting.
    * The expected output is a final pandas DataFrame containing all original and new columns.
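
Steps 2 to 5 could be sketched as follows, using a single composite prompt and only the standard library. The model is passed in as a plain callable `ask` (an assumption for illustration; with LangChain this could wrap a chain's `invoke`), and `stub_ask` is a hypothetical stand-in. Falling back to `Others` when the model returns an unknown category is also an assumption.

```python
import json

CATEGORIES = ("Technology/IT", "Finance", "Marketing",
              "Healthcare", "Education", "Others")
NOT_SPECIFIED = "Not specified"

PROMPT = (
    "Given the following job title and description, categorize the job into "
    "one of the following domains: " + ", ".join(CATEGORIES) + ". Also extract "
    "the required skills, education level, and experience. Reply with a JSON "
    "object with the keys category, skills, education, experience.\n\n"
    "Title: {title}\nDescription: {description}"
)


def analyze_posting(title: str, description: str, ask) -> dict:
    """Classify one posting and extract its requirements."""
    reply = ask(PROMPT.format(title=title, description=description))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    category = data.get("category")
    skills = data.get("skills")
    has_skills = isinstance(skills, list) and len(skills) > 0
    return {
        # assumption: unknown or missing categories fall back to "Others"
        "Predicted Category": category if category in CATEGORIES else "Others",
        "Required Skills": ", ".join(map(str, skills)) if has_skills else NOT_SPECIFIED,
        "Education Required": data.get("education") or NOT_SPECIFIED,
        "Experience Required": data.get("experience") or NOT_SPECIFIED,
    }


def analyze_all(postings, ask) -> list:
    """Loop over postings given as dicts with Job_Title and Job_Description."""
    return [
        analyze_posting(p["Job_Title"], p["Job_Description"], ask)
        for p in postings
    ]


def stub_ask(prompt: str) -> str:
    """Hypothetical model: answers for nurses, refuses everything else."""
    if "nurse" in prompt.lower():
        return ('{"category": "Healthcare", '
                '"skills": ["patient care", "CPR"], "education": null}')
    return "Sorry, I cannot help with that."


postings = [
    {"Job_Title": "Registered Nurse", "Job_Description": "Provide patient care."},
    {"Job_Title": "Marketing Intern", "Job_Description": "Help with campaigns."},
]
rows = analyze_all(postings, stub_ask)
print(rows)
```

With pandas, the input list can come from `df.to_dict("records")`, and `pd.DataFrame(rows)` turns the results back into columns that can be joined onto the original DataFrame.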

### **Example Output as JSON (the final deliverable should be a merged pandas DataFrame):**

```json
{
  "Job_Title": "Senior Data Analyst",
  "Job_Description": "We are seeking a Senior Data Analyst to join our team... Responsibilities include analyzing sales data, creating dashboards, and reporting insights. **Requirements:** 5+ years experience in data analysis, proficiency in SQL and Python, familiarity with BI tools. Bachelor's degree in Finance, Statistics, or related field required. Excellent communication skills and attention to detail are a must.",
  "Predicted_Category": "Finance/Analytics",
  "Required_Skills": ["SQL", "Python", "Business Intelligence tools", "data analysis"],
  "Education_Required": "Bachelor’s degree (Finance, Statistics, or related)",
  "Experience_Required": "5+ years"
}
```
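
To get from records like the one above to the merged DataFrame, the extracted fields can be placed alongside the original columns. This is a sketch assuming pandas; the values are copied from the example and the variable names are illustrative.

```python
import pandas as pd

# Original data: one row per posting.
postings = pd.DataFrame([{
    "Job_Title": "Senior Data Analyst",
    "Job_Description": "We are seeking a Senior Data Analyst to join our team...",
}])

# Model output: one dict per posting, in the same row order.
extracted = [{
    "Predicted_Category": "Finance/Analytics",
    "Required_Skills": ["SQL", "Python", "Business Intelligence tools",
                        "data analysis"],
    "Education_Required": "Bachelor’s degree (Finance, Statistics, or related)",
    "Experience_Required": "5+ years",
}]

# Reuse the original index so the new columns line up row by row.
merged = pd.concat(
    [postings, pd.DataFrame(extracted, index=postings.index)], axis=1
)

# Lists are awkward in CSV exports, so flatten them to comma-separated text.
merged["Required_Skills"] = merged["Required_Skills"].apply(", ".join)
print(merged.columns.tolist())
```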

---

### Bonus (optional)

Complete the analysis for all rows in the dataset. A locally hosted model served through Ollama is advised over Groq here, since hosted APIs impose rate limits; choose a model that balances inference speed with accuracy.
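
If a rate-limited API is used anyway, a retry wrapper with exponential backoff is one common way to get through a long run. The sketch below is standard-library only; `call_with_retry` and `flaky_model` are hypothetical names, and catching the generic `Exception` is a simplification (in practice, catch the specific rate-limit error raised by the client library).

```python
import time


def call_with_retry(fn, arg, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(arg); after a failure wait base_delay * 2**attempt and retry."""
    for attempt in range(retries):
        try:
            return fn(arg)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a hypothetical model that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_model(text: str) -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("rate limit exceeded")
    return "Technology/IT"


delays = []  # record the waits instead of actually sleeping
result = call_with_retry(flaky_model, "some job posting", sleep=delays.append)
print(result, delays)  # Technology/IT [1.0, 2.0]
```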