# AI Engineer Technical Assessment

## Overview
Build an AI-powered solution for sentiment analysis of movie reviews that leverages the existing dataset to improve accuracy. This assessment is designed to be completed in 2-3 hours; we do NOT expect very detailed answers or long explanations.

## Notes
- AI assistance is allowed and, in fact, encouraged. Caveats:
    - Concise explanations and simple code are preferred.
    - Solutions that use information newer than LLMs' knowledge cutoff dates are valuable.
    - You must be able to explain the code you write here.

- Look up any information you need; copying and pasting code is allowed.
- Set up the environment as needed. You can use your local environment, Colab, or any other environment you prefer.
- Focus on working solutions; leave iteration and improvements for any extra time.

## Setup
The following cells will download and prepare the IMDB dataset. 

In [28]:
import pandas as pd
from datasets import load_dataset

# Load IMDB dataset
dataset = load_dataset("imdb")
train_df = pd.DataFrame(dataset['train'])
test_df = pd.DataFrame(dataset['test'])

# Sample subset for quicker development
train_df = train_df.sample(n=5000, random_state=42)
test_df = test_df.sample(n=10, random_state=42)

print(f"Training samples: {len(train_df)}")
print(f"Test samples: {len(test_df)}")

# Display sample data
print("\nSample review:")
sample = train_df.iloc[0]
print(f"Text: {sample['text'][:200]}...")
print(f"Sentiment: {'Positive' if sample['label'] == 1 else 'Negative'}")

Training samples: 5000
Test samples: 10

Sample review:
Text: Dumb is as dumb does, in this thoroughly uninteresting, supposed black comedy. Essentially what starts out as Chris Klein trying to maintain a low profile, eventually morphs into an uninspired version...
Sentiment: Negative


In [29]:
train_df.head()

| | text | label |
|---|---|---|
| 6868 | Dumb is as dumb does, in this thoroughly unint... | 0 |
| 24016 | I dug out from my garage some old musicals and... | 1 |
| 9668 | After watching this movie I was honestly disap... | 0 |
| 13640 | This movie was nominated for best picture but ... | 1 |
| 14018 | Just like Al Gore shook us up with his painful... | 1 |


## Task 1: Model Implementation
Implement a solution that analyzes sentiment in movie reviews. This part is explicitly open-ended: explore ways to leverage the example dataset to enhance predictions. You can consider a pre-trained language model that can understand and generate text, external APIs, RAG systems, etc.
Feel free to use any library or tool you are comfortable with.
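
The task lists RAG systems as one way to leverage the example dataset. Below is a minimal, dependency-free sketch of the retrieval half, assuming bag-of-words cosine similarity as a stand-in for learned embeddings. `retrieve_similar` is a hypothetical helper that returns the labelled training reviews closest to a query; these could then serve as few-shot examples for an LLM or as `similar_reviews` in an API response. A sentence-embedding model with a vector index would likely retrieve better neighbours.

```python
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_similar(query: str, examples: list[tuple[str, int]], k: int = 3) -> list[dict]:
    """Return the k labelled examples most similar to the query (hypothetical helper)."""
    query_vec = _bag_of_words(query)
    scored = [
        {"text": text, "label": label, "score": _cosine(query_vec, _bag_of_words(text))}
        for text, label in examples
    ]
    # Highest similarity first
    return sorted(scored, key=lambda item: item["score"], reverse=True)[:k]


examples = [
    ("A wonderful, moving film with great acting.", 1),
    ("Boring plot and terrible acting.", 0),
    ("The soundtrack was forgettable.", 0),
]
top = retrieve_similar("Great acting and a moving story.", examples, k=2)
print([(item["label"], round(item["score"], 2)) for item in top])
```

With the notebook's data, `examples` could be built as `list(zip(train_df["text"], train_df["label"]))`. Scanning all 5,000 reviews per query is acceptable for a prototype, but an index would be needed at larger scale.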

To address this sentiment analysis assessment, I have chosen to evaluate two approaches:

1. **Lightweight Encoder Model (ModernBERT):** An encoder-only transformer fine-tuned for sentiment classification. Specifically, I have decided to use [ModernBERT](https://arxiv.org/pdf/2412.13663), an encoder released in December 2024 that updates the original BERT architecture with, among other changes, rotary positional embeddings, alternating local/global attention and an 8,192-token context length.
2. **Sentiment Classification with a Small LLM:** The second approach uses a small Large Language Model (LLM), roughly 1B to 4B parameters (Phi-4-mini-instruct, used below, has about 3.8B), to classify sentiment through prompt engineering.

In [4]:
# Lightweight Encoder Model (ModernBERT)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class LightweightModelService:
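    def __init__(self, model_name: str = "clapAI/modernBERT-base-multilingual-sentiment") -> None:
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        # float16 halves GPU memory; half precision is poorly supported on CPU, so use float32 there
        dtype = torch.float16 if self.device == "cuda" else torch.float32

        # Load the tokenizer and model, then switch the model to evaluation mode
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = (
            AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=dtype)
            .to(self.device)
            .eval()
        )

        # Show which class index corresponds to which sentiment label
        print(f"Labels from the model's configuration: {self.model.config.id2label}")

    def predict(self, text: str) -> int:
        # Truncate reviews longer than the model's maximum input length
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True).to(self.device)

        with torch.inference_mode():
            outputs = self.model(**inputs)
            predictions = outputs.logits.argmax(dim=-1)

        # Class index; look it up in self.model.config.id2label for the label name
        return predictions.item()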
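
`predict` returns a class index whose meaning depends on the checkpoint's `id2label` mapping, which the constructor prints. If that mapping has more than two classes (for example a neutral class), the prediction must be mapped onto IMDB's binary labels (0 = negative, 1 = positive) before scoring. The sketch below assumes label names contain the words "positive" or "negative" and restricts the argmax to those classes. `to_imdb_label` is a hypothetical helper, and the three-class mapping in the example is illustrative, not read from the checkpoint.

```python
def to_imdb_label(logits: list[float], id2label: dict[int, str]) -> int:
    """Map model logits to IMDB labels (0 = negative, 1 = positive).

    Only classes whose name contains "positive" or "negative" are considered,
    so a neutral class, if present, can never win. Hypothetical helper.
    """
    best_label, best_logit = None, float("-inf")
    for index, name in id2label.items():
        name = name.lower()
        if "positive" in name:
            candidate = 1
        elif "negative" in name:
            candidate = 0
        else:
            continue  # skip neutral or unrelated classes
        if logits[index] > best_logit:
            best_label, best_logit = candidate, logits[index]
    if best_label is None:
        raise ValueError(f"No positive/negative labels found in {id2label}")
    return best_label


# Illustrative three-class mapping; check the real one printed by the model
example_id2label = {0: "negative", 1: "neutral", 2: "positive"}
print(to_imdb_label([0.1, 2.5, 1.2], example_id2label))  # neutral is ignored
```

Using this with `LightweightModelService` would require `predict` to return `outputs.logits[0].tolist()` rather than the argmax index.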

In [38]:
from typing import Any, cast

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline  # type: ignore

### Instructions:
# System-level prompt defining the assistant's behavior
def get_system_prompt() -> str:
    return """You are an expert in sentiment analysis of movie reviews. Your task is to evaluate the sentiment of the given review and classify it as "positive" or "negative". Your analysis must consider both explicit and implicit sentiment cues while maintaining a focus on the target entity. 
    
    1. **Sentiment Classification Criteria**:  
    - **Positive**: The review expresses favorable opinions about the movie, highlighting its strengths or praising specific aspects.  
    - **Negative**: The review conveys unfavorable opinions, criticizing elements of the movie or expressing disappointment.

    2. **Handling Mixed Sentiments and Implicit Sentiment**:  
    - If both positive and negative elements exist, classify based on the **dominant sentiment**, considering intensity and frequency.  
    - Detect subtle tones, including **irony, implied sentiment, and framing biases** (e.g., selective comparisons, loaded phrases).  

    3. **Output Format**:  
    - Return only one word: **"positive" or "negative"**  
    - Do not include explanations, additional text, or punctuation.
    """

# User-level prompt defining the interaction model
def get_user_prompt(movie_review: str) -> str:
    return f"""Analyze the sentiment of the provided movie review: {movie_review}"""


class Phi4LLM:
    """
    A service class for sentiment classification with the Phi-4-mini-instruct model.
    """

    def __init__(
        self, model_name: str = "microsoft/Phi-4-mini-instruct", temperature: float = 0.1, max_new_tokens: int = 500
    ) -> None:
        torch.random.manual_seed(0)
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.temperature = temperature
        self.max_new_tokens = max_new_tokens

        self.model_name = model_name

        self.initialize_model()

    def initialize_model(self) -> None:

        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_name,
            **self.get_model_kwargs(),
        ).eval()

        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)

        # Building the pipeline performs no computation, so torch.no_grad() is not needed here
        self.completion_pipeline = pipeline(
            "text-generation",
            model=self.model,
            tokenizer=self.tokenizer,
        )

    def get_completions(self, movie_review: str) -> str:
        """
        Classify the sentiment of a single movie review and return the raw generated text.
        """

        messages = [
            {"role": "system", "content": get_system_prompt()},
            {
                "role": "user",
                "content": get_user_prompt(movie_review),
            },
        ]

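        generation_args = {
            "max_new_tokens": self.max_new_tokens,
            "return_full_text": False,
            # Greedy decoding for deterministic labels; transformers ignores (and warns about)
            # temperature when do_sample=False, so it is not passed here
            "do_sample": False,
        }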

        with torch.no_grad():
            output = self.completion_pipeline(messages, **generation_args)

        return cast(str, output[0]["generated_text"])

    def get_model_kwargs(self) -> dict[str, Any]:
        """
        Return default kwargs for huggingface model loading.
        """
        model_kwargs = {
            "device_map": "auto",
            "torch_dtype": "auto",
            "trust_remote_code": True,
        }
        # if self.device.type == "cuda":
        #     model_kwargs["attn_implementation"] = "flash_attention_2"

        return model_kwargs
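
For the LLM approach to benefit from the example data, labelled reviews (for instance ones retrieved as similar to the query) can be inserted as few-shot turns ahead of the query, and the free-text reply must be mapped back to a label. The sketch below is one possible way to do both. `build_few_shot_messages` and `parse_sentiment` are hypothetical helpers; the messages use the same role/content chat format as `get_completions`, and the user text mirrors `get_user_prompt`. Passing the resulting list to `self.completion_pipeline` is assumed to work like the two-message list there.

```python
import re
from typing import Optional


def build_few_shot_messages(
    system_prompt: str, examples: list[tuple[str, int]], review: str
) -> list[dict[str, str]]:
    """Chat messages with labelled reviews as user/assistant example turns."""
    messages = [{"role": "system", "content": system_prompt}]
    for text, label in examples:
        messages.append({"role": "user", "content": f"Analyze the sentiment of the provided movie review: {text}"})
        messages.append({"role": "assistant", "content": "positive" if label == 1 else "negative"})
    # The actual review to classify goes last
    messages.append({"role": "user", "content": f"Analyze the sentiment of the provided movie review: {review}"})
    return messages


def parse_sentiment(generated_text: str) -> Optional[int]:
    """Map the model's reply to 1 (positive), 0 (negative) or None if unclear."""
    words = re.findall(r"[a-z]+", generated_text.lower())
    if "positive" in words and "negative" not in words:
        return 1
    if "negative" in words and "positive" not in words:
        return 0
    return None  # ambiguous or off-format reply; the caller decides a fallback


messages = build_few_shot_messages(
    "Classify the review as positive or negative.", [("Loved it.", 1), ("Awful.", 0)], "Great fun!"
)
print(len(messages))
print(parse_sentiment(" Positive."))
```

Returning `None` instead of guessing keeps off-format replies visible during evaluation.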


## Task 2: API Implementation
Create a simple API using FastAPI that serves your solution. The API should accept a review text and return the sentiment analysis result.

Expected format:
```python
# Request
{
    "review_text": "This movie exceeded my expectations..."
}

# Response
{
    "sentiment": "positive",
    "confidence": 0.92,
    "similar_reviews": [
        {},
        {}
    ]
}
```
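
The `confidence` field can be derived from a classifier's logits with a softmax, taking the probability of the predicted class. A minimal sketch, assuming the logits are available as a plain list (e.g. `outputs.logits[0].tolist()`) and that `similar_reviews` comes from a separate retrieval step. `build_response` is a hypothetical helper; the field names follow the expected format above.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_response(logits: list[float], id2label: dict[int, str], similar_reviews: list[dict]) -> dict:
    """Assemble the API response: predicted label, its probability and neighbours."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "sentiment": id2label[best].lower(),
        "confidence": round(probs[best], 2),
        "similar_reviews": similar_reviews,
    }


response = build_response([-1.0, 1.5], {0: "negative", 1: "positive"}, [{"text": "Loved it.", "label": 1}])
print(response)
```

For the LLM approach, which only returns text, a softmax-based confidence is not directly available; one option is the fraction of retrieved neighbours that share the predicted label.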

In [None]:
from fastapi import FastAPI
from pydantic import BaseModel

# Your API implementation here

## Task 3: Testing and Performance
Evaluate your solution's performance on the test set. Include:
1. Accuracy metrics (precision, recall, F1-score)
2. Inference speed (average time per prediction)

Compare performance with and without using the example data to demonstrate any improvements.
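
Both requested measurements can be taken in one pass over the test set. The sketch below computes accuracy plus precision, recall and F1 for the positive class, and the average latency per call, without external dependencies. `evaluate` is a hypothetical helper; `sklearn.metrics.classification_report`, imported in the cell below, gives per-class figures. Running it once with a zero-shot predictor and once with a retrieval/few-shot predictor is one way to make the requested with/without comparison.

```python
import time
from typing import Callable


def evaluate(predict: Callable[[str], int], texts: list[str], labels: list[int]) -> dict[str, float]:
    """Positive-class precision/recall/F1, accuracy and mean seconds per prediction."""
    predictions, start = [], time.perf_counter()
    for text in texts:
        predictions.append(predict(text))
    elapsed = time.perf_counter() - start

    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "avg_seconds": elapsed / len(texts),
    }


# Toy predictor: calls a review positive if it contains "good"
metrics = evaluate(lambda t: int("good" in t), ["good film", "bad film", "good grief", "not bad"], [1, 0, 0, 1])
print(metrics)
```

With only 10 test samples, these figures will be noisy; comparing approaches on a larger test sample would give more reliable differences.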

In [None]:
import time
from sklearn.metrics import classification_report

# Your testing code here

## Task 4: Deployment Strategy

1. Describe your deployment strategy considering:
   - Data storage and retrieval
   - Scalability
   - Resource requirements
   - Cost considerations

2. Create a simple Dockerfile to package your solution

In [None]:
# Write your deployment strategy here as a markdown cell
deployment_strategy = """
# Deployment Strategy

## Infrastructure
...

## Scalability Approach
...

## Model & Data Storage
...

## Resource & Cost Considerations
...
"""

print(deployment_strategy)

# Write your Dockerfile content
dockerfile_content = """
# Your Dockerfile here
...
"""

print("\nDockerfile:")
print(dockerfile_content)





## Evaluation Criteria
- Implementation that can process reviews and return sentiments
- Use of extra data to improve predictions
- Proper API design
- Reasonable deployment strategy

Good luck!