<a href="https://colab.research.google.com/github/Nobobi-Hasan/Fine_Tune_Llama/blob/main/Llama_Email_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **DataCamp Project**

Every day, professionals wade through hundreds of emails, from urgent client requests to promotional offers. It's like trying to find important messages in a digital ocean. But AI can help you stay afloat by automatically sorting emails to highlight what matters most.

You've been asked to build an intelligent email assistant using Llama to help users automatically classify their incoming emails. Your system will identify which emails need immediate attention, which are regular updates, and which are promotions that can wait or be archived.

### The Data
You'll work with a dataset of various email examples, ranging from urgent business communications to promotional offers. Here's a peek at what you'll be working with:

### email_categories_data.csv

| Column | Description |
|--------|-------------|
| email_id | A unique identifier for each email in the dataset. |
| email_content | The full email text, including subject line and body. Each email consists of a subject line followed by the message content on a new line. |
| expected_category | The correct classification of the email: `Priority`, `Updates`, or `Promotions`. This will be used to validate your model's performance. |

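Because each email stores the subject on its first line and the message body after it, the two parts can be separated with a single split. The following is a minimal sketch, assuming the first newline marks the boundary; `split_email` and the sample text are hypothetical and not part of the dataset or the project code.

```python
def split_email(email_content: str) -> tuple[str, str]:
    """Split an email into (subject, body) at the first newline."""
    subject, _, body = email_content.partition("\n")
    return subject.strip(), body.strip()


# Made-up sample for illustration only; not taken from the dataset
sample = "Team lunch on Friday\nPlease reply with your menu choice by Wednesday."
subject, body = split_email(sample)
print(subject)
print(body)
```

An email with no newline would come back with the whole text as the subject and an empty body.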

In [None]:
# Run the following cells first
# Install necessary packages, then import the model by running the cell below
!pip install llama-cpp-python==0.2.82 -q -q -q

In [None]:
SELECT *
FROM 'models.csv'
LIMIT 5

In [None]:
# Import required libraries
import pandas as pd
from llama_cpp import Llama

In [None]:
# Load the email dataset
emails_df = pd.read_csv('data/email_categories_data.csv')
# Display the first few rows of our dataset
print("Preview of our email dataset:")
emails_df.head(2)

In [None]:
# Set the model path
model_path = "/files-integrations/files/c9696c24-44f3-45f7-8ccd-4b9b046e7e53/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf"

In [None]:
# Print the character length of each email
for c in emails_df['email_content']:
    print(len(c))

In [None]:
# Initialize the Llama model
# We set n_gpu_layers=0 to run the model on the CPU, ensuring portability
llm = Llama(
    model_path=model_path,
    n_ctx=128, # Context window size in tokens (shared by the prompt and the generated output)
    n_gpu_layers=0, # Number of layers to offload to GPU (0 = CPU only)
    verbose=False
)
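
The prompt and the generated tokens share the same context window, so with a small `n_ctx` it can help to check the budget before calling the model. The sketch below is one possible check, not part of the project code: `fits_context` and `word_tokenize` are hypothetical helpers, and the whitespace tokenizer is only a stand-in that typically undercounts compared with a real model tokenizer.

```python
from typing import Callable, Sequence


def fits_context(
    prompt_text: str,
    tokenize: Callable[[str], Sequence],
    n_ctx: int = 128,
    max_tokens: int = 20,
) -> bool:
    """Return True if the prompt tokens plus the generation budget fit in n_ctx."""
    return len(tokenize(prompt_text)) + max_tokens <= n_ctx


# Stand-in tokenizer for illustration: one token per whitespace-separated word.
# A real model tokenizer usually produces more tokens than this.
def word_tokenize(text: str) -> list[str]:
    return text.split()


print(fits_context("Classify this short email", word_tokenize))
```

With llama-cpp-python, the model's own tokenizer could be passed instead, for example `lambda text: llm.tokenize(text.encode("utf-8"))`, since `Llama.tokenize` expects bytes.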

In [None]:
# Create the prompt string
prompt = """
You are an expert email classification system. Your task is to read an email and classify it into one of the following three categories: "Priority", "Updates", or "Promotions".

Only output the single category name. Do not include any other text, explanation, or punctuation.

Email: {email_content}

Category:
"""

In [None]:
# Get the content of the first two emails
email1 = emails_df.loc[0, 'email_content']
email2 = emails_df.loc[1, 'email_content']

In [None]:
# --- Test on the First Email ---
# Format the prompt with the first email content
prompt1 = prompt.format(email_content=email1)

# Generate the classification result
output1 = llm.create_completion(
    prompt=prompt1,
    max_tokens=20, # Only need a few tokens for the category name
    stop=["\n"],   # Stop generation at the first newline character
    temperature=0.0 # Set temperature to 0 for deterministic classification
)

# Extract and clean the result
result1 = output1['choices'][0]['text'].strip()
print(f"Classification for Email 1: {result1}")
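
A small model may not always return the bare category name, even when the prompt asks for it. One way to guard against that is to map the raw text onto the three known labels, as in the sketch below. `normalize_category` and the `"Unknown"` fallback are assumptions introduced for illustration, not part of the original notebook.

```python
CATEGORIES = ("Priority", "Updates", "Promotions")


def normalize_category(raw_text: str, default: str = "Unknown") -> str:
    """Return the first known category mentioned in raw_text, else default."""
    lowered = raw_text.lower()
    # Record where each category name first appears (case-insensitive)
    positions = {c: lowered.find(c.lower()) for c in CATEGORIES}
    found = {c: p for c, p in positions.items() if p != -1}
    if not found:
        return default
    # Pick the category that appears earliest in the text
    return min(found, key=found.get)


print(normalize_category(' "Promotions". '))
print(normalize_category("I am not sure"))
```

The match is a plain substring search, so a variant such as "Update" without the final "s" would fall through to the default.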

In [None]:
# --- Test on the Second Email ---
# Format the prompt with the second email content
prompt2 = prompt.format(email_content=email2)

# Generate the classification result
output2 = llm.create_completion(
    prompt=prompt2,
    max_tokens=20,
    stop=["\n"],
    temperature=0.0
)

# Extract and clean the result
result2 = output2['choices'][0]['text'].strip()
print(f"Classification for Email 2: {result2}")
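
The two test cells above repeat the same steps: format the template, call the model, strip the text. A possible refactor is sketched below with the model call injected as a function, so the example runs without a model. `classify_all`, `fake_complete`, and the shortened `PROMPT_TEMPLATE` are hypothetical stand-ins, not the notebook's actual prompt or model.

```python
from typing import Callable, Iterable

# Shortened stand-in for the notebook's prompt template
PROMPT_TEMPLATE = "Email: {email_content}\n\nCategory:"


def classify_all(
    emails: Iterable[str],
    complete: Callable[[str], str],
    template: str = PROMPT_TEMPLATE,
) -> list[str]:
    """Format the template for each email and collect the stripped completions."""
    return [complete(template.format(email_content=e)).strip() for e in emails]


def fake_complete(prompt_text: str) -> str:
    # Stand-in for the model call; returns padded text like a raw completion might
    return " Priority " if "urgent" in prompt_text.lower() else " Updates "


print(classify_all(["URGENT: server down", "Weekly report attached"], fake_complete))
```

In the notebook, `complete` could wrap the existing call, returning `llm.create_completion(...)['choices'][0]['text']` with the same `max_tokens`, `stop`, and `temperature` settings used above.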

In [None]:
print("\n--- Final Variables ---")
print(f"Prompt (Template):\n{prompt}")
print(f"Result 1 (Classification for Email 1):\n{result1}")
print(f"Result 2 (Classification for Email 2):\n{result2}")
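
Since the dataset includes an `expected_category` column for validation, predictions can be scored against it. The sketch below computes plain accuracy; the `accuracy` helper is hypothetical and the label lists are made up for illustration, so the printed number is not a result from the model.

```python
def accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of predictions that exactly match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Made-up labels for illustration only; these are not results from the model
predicted = ["Priority", "Updates", "Promotions", "Updates"]
expected = ["Priority", "Updates", "Promotions", "Priority"]
print(f"Accuracy: {accuracy(predicted, expected):.2f}")
```

In the notebook, the expected labels could come from `emails_df['expected_category'].tolist()`, paired with one prediction per row.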