![email inbox](email_inbox.jpg)

Every day, professionals wade through hundreds of emails, from urgent client requests to promotional offers. It's like trying to find important messages in a digital ocean. But AI can help you stay afloat by automatically sorting emails to highlight what matters most.

You've been asked to build an intelligent email assistant using Llama to help users automatically classify their incoming emails. Your system will identify which emails need immediate attention, which are regular updates, and which are promotions that can wait or be archived.

### The Data
You'll work with a dataset of various email examples, ranging from urgent business communications to promotional offers. Here's a peek at what you'll be working with:

### email_categories_data.csv

| Column | Description |
|--------|-------------|
| email_id | A unique identifier for each email in the dataset. |
| email_content | The full email text including subject line and body. Each email follows a format of "Subject" followed by the message content on a new line. |
| expected_category | The correct classification of the email: `Priority`, `Updates`, or `Promotions`. This will be used to validate your model's performance. |

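Since each `email_content` value holds a subject line followed by the body, it can help to split the two when inspecting the data. The helper below is a minimal sketch, not part of the original project: `split_subject_body` is a hypothetical name, and it assumes the separator is either a real newline or the two-character sequence `\n` (which is how the preview output further down happens to display it).

```python
def split_subject_body(content: str) -> tuple:
    """Split an email into (subject, body) at the first line break."""
    # Assumption: some exports store the separator as a backslash followed by "n"
    # rather than a real newline, so normalize that form first.
    normalized = content.replace("\\n", "\n")
    subject, _, body = normalized.partition("\n")
    return subject.strip(), body.strip()


# Example email text taken from the dataset preview in this notebook
sample = (
    "Urgent: Server Maintenance Required\n"
    "Our main server needs immediate maintenance due to critical errors. "
    "Please address ASAP."
)
subject, body = split_subject_body(sample)
print(subject)  # Urgent: Server Maintenance Required
```

With the dataset loaded, the same function could be applied row by row, for example with `emails_df["email_content"].apply(split_subject_body)`.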


In [13]:
# Run the following cells first
# Install necessary packages
!pip install llama-cpp-python==0.2.82 -q -q -q

In [14]:
# Download the model
!wget -q "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF/resolve/main/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf?download=true" -O model.gguf

In [15]:
# Import required libraries
import pandas as pd
from llama_cpp import Llama

In [16]:
# Load the email dataset
emails_df = pd.read_csv('data/email_categories_data.csv')
# Display the first few rows of our dataset
print("Preview of our email dataset:")
emails_df.head(2)

Preview of our email dataset:


Unnamed: 0,email_id,email_content,expected_category
0,1,Urgent: Server Maintenance Required\nOur main ...,Priority
1,2,50% Off Spring Collection!\nDon't miss our big...,Promotions


In [17]:
# Initialize the Llama model
llm = Llama("model.gguf")

llama_model_loader: loaded meta data with 20 key-value pairs and 201 tensors from model.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = py007_tinyllama-1.1b-chat-v0.3
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32             

In [18]:
# Design the prompt
prompt = """
You need to classify the emails into three categories: Priority, Updates and Promotions.

Email 1: Important - Billing Issue. Your payment failed. Please update your billing details immediately.
Category: Priority

Email 2: We’ve got some exciting news! We’re always working to bring you the best, and we’re thrilled to share our latest product.
Category: Updates

Email 3: Limited-Time Offer! 🎉 Get up to 30% OFF on our top-quality product this summer!
Category: Promotions

"""

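A few-shot prompt like the one above can also be assembled from a list of labelled examples, which makes it easier to add or swap examples and keeps every `Email N:` / `Category:` pair formatted identically. This is a sketch under assumptions: `build_prompt`, `FEW_SHOT_EXAMPLES`, and the wording of the instruction line are illustrative names and text, not part of the original notebook.

```python
# Labelled examples copied from the hand-written prompt above
FEW_SHOT_EXAMPLES = [
    ("Important - Billing Issue. Your payment failed. "
     "Please update your billing details immediately.", "Priority"),
    ("We’ve got some exciting news! We’re always working to bring you the best, "
     "and we’re thrilled to share our latest product.", "Updates"),
    ("Limited-Time Offer! 🎉 Get up to 30% OFF on our top-quality product "
     "this summer!", "Promotions"),
]


def build_prompt(examples, message):
    """Return a few-shot prompt that ends with an open 'Category:' line."""
    # Hypothetical instruction line; adjust the wording to taste.
    lines = ["Classify each email as one of: Priority, Updates, Promotions.", ""]
    for number, (text, label) in enumerate(examples, start=1):
        lines += [f"Email {number}: {text}", f"Category: {label}", ""]
    # The email to classify is numbered right after the last example
    lines += [f"Email {len(examples) + 1}: {message}", "Category:"]
    return "\n".join(lines)


print(build_prompt(FEW_SHOT_EXAMPLES, "Your invoice is overdue."))
```

Because every line is joined explicitly, no stray indentation can slip into the text the model sees.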
In [19]:
# Processing Messages
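def process_message(llm, user_prompt, message):
    """Append the new email to the few-shot prompt and return the predicted label."""
    # Build the last two lines explicitly so that "Category:" starts at
    # column 0, matching the format of the few-shot examples in the prompt.
    input_prompt = f"{user_prompt}Email 4: {message}\nCategory:"

    output = llm(
        input_prompt,
        max_tokens=5,      # a category name only needs a few tokens
        temperature=0,     # deterministic decoding
        stop=["Example", "\n"]
    )

    # Keep only the generated text, without surrounding whitespace
    return output["choices"][0]["text"].strip()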

In [20]:
# Testing the model
results = []

for i in range(2):
    message = emails_df.loc[i, "email_content"]
    
    print("Message:", message)
    print("Expected Category:", emails_df.loc[i, "expected_category"])
    
    results.append(process_message(llm, prompt, message))
    print("Predicted Category:", results[-1])

Message: Urgent: Server Maintenance Required\nOur main server needs immediate maintenance due to critical errors. Please address ASAP.
Expected Category: Priority



llama_print_timings:        load time =   10073.80 ms
llama_print_timings:      sample time =       0.49 ms /     3 runs   (    0.16 ms per token,  6147.54 tokens per second)
llama_print_timings: prompt eval time =   10073.58 ms /   176 tokens (   57.24 ms per token,    17.47 tokens per second)
llama_print_timings:        eval time =     209.45 ms /     2 runs   (  104.73 ms per token,     9.55 tokens per second)
llama_print_timings:       total time =   10285.91 ms /   178 tokens
Llama.generate: prefix-match hit


Predicted Category: Priority
Message: 50% Off Spring Collection!\nDon't miss our biggest sale of the season! All spring items half off. Limited time offer.
Expected Category: Promotions



llama_print_timings:        load time =   10073.80 ms
llama_print_timings:      sample time =       0.46 ms /     3 runs   (    0.15 ms per token,  6479.48 tokens per second)
llama_print_timings: prompt eval time =    3404.23 ms /    34 tokens (  100.12 ms per token,     9.99 tokens per second)
llama_print_timings:        eval time =     208.96 ms /     2 runs   (  104.48 ms per token,     9.57 tokens per second)
llama_print_timings:       total time =    3614.79 ms /    36 tokens


Predicted Category: Clothing


In [21]:
result1, result2 = results

result1, result2

('Priority', 'Clothing')
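
The second prediction, `Clothing`, is not one of the three allowed categories, so it helps to normalize the raw model output before comparing it with `expected_category`. The sketch below is one possible approach, not the project's reference solution: `normalize_category`, `accuracy`, and the `"Unknown"` fallback label are hypothetical names introduced for illustration.

```python
VALID_CATEGORIES = ("Priority", "Updates", "Promotions")


def normalize_category(raw, fallback="Unknown"):
    """Map raw model text to a valid category, or to the fallback label."""
    cleaned = raw.strip().lower()
    for category in VALID_CATEGORIES:
        # Prefix match tolerates trailing punctuation or extra words
        if cleaned.startswith(category.lower()):
            return category
    return fallback


def accuracy(predicted, expected):
    """Fraction of positions where the prediction equals the expected label."""
    if not expected:
        return 0.0
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Raw predictions and expected labels from the two test emails above
predicted = [normalize_category(r) for r in ["Priority", "Clothing"]]
expected = ["Priority", "Promotions"]

print(predicted)                      # ['Priority', 'Unknown']
print(accuracy(predicted, expected))  # 0.5
```

Counting `Unknown` as a miss keeps the score honest: an off-list answer is never silently mapped to a valid category.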