# Email Spam Classification using GPT2 and LangChain


In [2]:
!pip install langchain-huggingface transformers

Collecting langchain-huggingface
  Downloading langchain_huggingface-0.3.0-py3-none-any.whl.metadata (996 bytes)
Downloading langchain_huggingface-0.3.0-py3-none-any.whl (27 kB)
Installing collected packages: langchain-huggingface
Successfully installed langchain-huggingface-0.3.0


Import required libraries


In [24]:
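import pandas as pd
from langchain_huggingface import HuggingFacePipeline
# PromptTemplate lives in langchain_core, which langchain-huggingface depends on;
# the `langchain.prompts` path needs the separate `langchain` package.
from langchain_core.prompts import PromptTemplate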


Load the dataset


In [25]:
df = pd.read_csv("spam.csv")
emails = df['text'].tolist()  # Adjust 'text' if your column name differs
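Only the message text is kept here, so the predictions further down cannot be scored against anything. Below is a minimal sketch that keeps the ground-truth label alongside each message, using the standard-library `csv` module. The column names `text` and `label`, the helper `load_labeled_messages`, and the two sample rows are assumptions for illustration. Some copies of this dataset name the columns differently (for example `v1` for the label and `v2` for the text).

```python
import csv
import io

# Hypothetical two-row sample in an assumed "label,text" layout.
# With the real file, pass open("spam.csv", newline="") instead
# (add an encoding= argument if the file is not UTF-8).
SAMPLE_CSV = """label,text
ham,See you at lunch tomorrow
spam,Claim your free prize now
"""


def load_labeled_messages(handle, text_col="text", label_col="label"):
    """Return parallel lists of message texts and lower-cased labels."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row[text_col])
        labels.append(row[label_col].strip().lower())
    return texts, labels


texts, labels = load_labeled_messages(io.StringIO(SAMPLE_CSV))
print(texts)   # ['See you at lunch tomorrow', 'Claim your free prize now']
print(labels)  # ['ham', 'spam']
```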


Initialize GPT2 model pipeline with LangChain

In [26]:
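hf_gpt = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 10,
        # Return only the generated continuation, not prompt + continuation;
        # the prompt itself contains the word "spam".
        "return_full_text": False,
    },
)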


Device set to use cpu
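By default, the Hugging Face text-generation pipeline returns the prompt followed by the generated continuation unless `return_full_text=False` is set. Depending on the LangChain version, `HuggingFacePipeline` may pass that full text through unchanged. The classification prompt itself contains the word "spam", so any substring check on the full text would always find it. Here is a minimal post-processing sketch that keeps only the continuation. The helper name `completion_only` and the sample strings are hypothetical.

```python
def completion_only(prompt_text: str, generated_text: str) -> str:
    """Strip an echoed prompt so that only the model's continuation remains."""
    if generated_text.startswith(prompt_text):
        return generated_text[len(prompt_text):]
    return generated_text  # the pipeline already returned only new text


prompt_text = 'Classify as "spam" or "not spam".\nEmail: See you at lunch\nClass:'
echoed = prompt_text + " not spam"

print(repr(completion_only(prompt_text, echoed)))       # ' not spam'
print(repr(completion_only(prompt_text, " not spam")))  # ' not spam'
```

Because it matches on the prefix, the helper is safe to call whether or not the pipeline echoed the prompt.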


Create a prompt template for classification


In [27]:
template = """
Classify the following email as "spam" or "not spam". Respond only with the class name.
Email: {email}
Class:"""
prompt = PromptTemplate.from_template(template)
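It helps to look at what the model actually receives. `PromptTemplate.from_template` uses `{placeholder}` (f-string style) substitution by default, so for this simple template Python's `str.format` should produce the same text. The sketch below uses `str.format` to stay dependency-free, and the sample message is made up. Two things stand out. First, the rendered prompt contains the word "spam" no matter what the message says. Second, GPT-2 is a base language model that has not been instruction-tuned, so "Respond only with the class name" is a hint rather than a guarantee.

```python
# Same template text as in the cell above.
template = """
Classify the following email as "spam" or "not spam". Respond only with the class name.
Email: {email}
Class:"""

rendered = template.format(email="See you at lunch tomorrow")
print(rendered)
# The instruction line mentions "spam", so this is True for every message.
print("spam" in rendered.lower())  # True
```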


Define the classification function


In [28]:
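def classify_email(email_text):
    """Prompt the model and map its completion to "spam" or "not spam"."""
    input_prompt = prompt.format(email=email_text)
    output = hf_gpt.invoke(input_prompt)  # calling the LLM object directly is deprecated
    # Drop an echoed prompt, if any: the prompt itself contains "spam".
    if output.startswith(input_prompt):
        output = output[len(input_prompt):]
    lines = [line.strip().lower() for line in output.splitlines() if line.strip()]
    answer = lines[0] if lines else ""

    # Check "not spam" first, because "spam" is a substring of "not spam".
    if "not spam" in answer:
        return "not spam"
    if "spam" in answer:
        return "spam"
    return "not spam"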
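You can check the label parsing offline, without loading the model, by feeding it hand-written completions. The sketch below contrasts a bare substring test (`"spam" in text`) with a version that reads only the first non-empty line and tests "not spam" before "spam". Both helper names are hypothetical, and the sample completions are made up rather than real GPT-2 output.

```python
def naive_label(completion: str) -> str:
    """Bare substring test: any occurrence of "spam" wins."""
    return "spam" if "spam" in completion.lower() else "not spam"


def first_line_label(completion: str) -> str:
    """Read the first non-empty line and test "not spam" before "spam"."""
    lines = [ln.strip().lower() for ln in completion.splitlines() if ln.strip()]
    answer = lines[0] if lines else ""
    if "not spam" in answer:
        return "not spam"
    if "spam" in answer:
        return "spam"
    return "not spam"  # fallback when the model gives neither label


# Hypothetical completions, not real model output.
samples = [" not spam", " Spam\nEmail: not spam", "\n I don't know"]
for text in samples:
    print(repr(text), naive_label(text), "|", first_line_label(text))
```

Restricting the check to the first line matters because the model may keep generating after the label, for example by starting a new "Email:" block within its token budget.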


Classify emails and display results

In [29]:
classified_results = []
for email in emails[:10]:
    classification = classify_email(email)
    classified_results.append({'email': email, 'classification': classification})

for result in classified_results:
    print(f"Email: {result['email'][:50]}...\nClassification: {result['classification']}\n---")


Email: Go until jurong point, crazy.. Available only in b...
Classification: spam
---
Email: Ok lar... Joking wif u oni......
Classification: spam
---
Email: Free entry in 2 a wkly comp to win FA Cup final tk...
Classification: spam
---
Email: U dun say so early hor... U c already then say......
Classification: spam
---
Email: Nah I don't think he goes to usf, he lives around ...
Classification: spam
---
Email: FreeMsg Hey there darling it's been 3 week's now a...
Classification: spam
---
Email: Even my brother is not like to speak with me. They...
Classification: spam
---
Email: As per your request 'Melle Melle (Oru Minnaminungi...
Classification: spam
---
Email: WINNER!! As a valued network customer you have bee...
Classification: spam
---
Email: Had your mobile 11 months or more? U R entitled to...
Classification: spam
---
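Ten printed predictions say little about quality, and in the run above every message came back as "spam". Here is a small sketch for tallying predictions and, when ground-truth labels are available, computing accuracy. It assumes the dataset labels are the strings "ham" and "spam", which the notebook does not load, and the helper name `summarize` is hypothetical.

```python
from collections import Counter


def summarize(predictions, labels=None):
    """Count predicted classes; if labels are given, also return accuracy."""
    counts = Counter(predictions)
    if labels is None:
        return counts, None
    if not labels or len(labels) != len(predictions):
        raise ValueError("need a non-empty list with one label per prediction")
    # Map the assumed dataset labels ("ham"/"spam") onto the classifier's labels.
    truth = ["spam" if y == "spam" else "not spam" for y in labels]
    correct = sum(p == t for p, t in zip(predictions, truth))
    return counts, correct / len(truth)


preds = ["spam", "spam", "not spam", "spam"]
gold = ["ham", "spam", "ham", "spam"]
counts, accuracy = summarize(preds, gold)
print(counts["spam"], accuracy)  # 3 0.75
```

In the notebook, the predictions would be `[r['classification'] for r in classified_results]`.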


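# Summary
We used LangChain’s HuggingFacePipeline to run GPT2 as a zero-shot email spam classifier.

The prompt template asks the model for a concise class label. GPT2 is a base language model that has not been instruction-tuned, so it does not reliably follow that instruction. Its labels should be treated with caution and checked against the dataset's ground-truth labels. In the run shown above, every message was labelled "spam", including clearly personal ones such as "Ok lar... Joking wif u oni", so that output is not evidence of accuracy.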

