# Text classification

Using the dataset `dataset_emails.csv` (or another dataset of your choice), create three text classifiers:
* A rule-based approach (regex)
* Naive Bayes
* spaCy 3

Finally, compare the results and explain which approach works better and why.
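
One way to make the final comparison concrete is to score every classifier against the same labelled examples. The sketch below is a minimal illustration using only the standard library; the helper names `accuracy` and `confusion_counts` and the sample label lists are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def confusion_counts(gold, predicted):
    """Count (gold, predicted) label pairs; pairs with differing labels are errors."""
    return Counter(zip(gold, predicted))


# Hypothetical labels, for illustration only.
gold = ["send", "send", "other", "other"]
rule_based = ["send", "send", "send", "other"]
naive_bayes = ["send", "send", "other", "other"]

for name, preds in [("rule-based", rule_based), ("naive Bayes", naive_bayes)]:
    print(name, accuracy(gold, preds), dict(confusion_counts(gold, preds)))
```

Accuracy alone can hide which label a classifier gets wrong, so the pair counts are printed alongside it.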

In [3]:
import os
import sys

In [12]:
dir = os.getcwd()
sys.path.append(dir)

In [18]:
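# Read the CSV file into a single string
def get_csv_file() -> str:
    """Return the raw text of dataset_emails.csv from the working directory."""
    with open(os.path.join(dir, "dataset_emails.csv"), "r", encoding="utf-8") as csv_file:
        return csv_file.read()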

In [22]:
csv_file = get_csv_file()
print(csv_file)

prompt,label
"Can I send an email, please?",send
I'd like to compose an email.,send
I need to send an email.,send
Could you help me write an email?,send
Is it possible to send an email with you?,send
Let's write an email.,send
Time to send an email.,send
I want to email someone.,send
Open email for writing.,send
Compose a new message.,send
I have an email to send.,send
There's someone I need to email.,send
I want to get in touch with [someone] through email.,send
Could you draft an email for me?,send
I need to send an email about [topic].,send
Time to send a quick email.,send
Let's shoot someone an email.,send
I have an important email to write.,send
Is there a way to email [someone]?,send
Can you help me send an email regarding [topic]?,send
I need to drop someone a line.,send
Let's ping someone with an email.,send
Time to fire off an email.,send
Is it cool if I send an email?,send
Can you whip up an email for me?,send
I could use an email assistant right now.,send
Let's get in touch 
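
The raw string printed above still mixes prompts, labels, and quoting. A possible next step, sketched here under the assumption that the file keeps the `prompt,label` header shown in the output, is to parse it into `(prompt, label)` pairs with the standard `csv` module. The function name `parse_rows` is hypothetical; the two sample rows are copied from the output above.

```python
import csv
import io


def parse_rows(csv_text):
    """Return (prompt, label) pairs from CSV text that has a 'prompt,label' header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows missing either field are skipped rather than guessed at.
    return [
        (row["prompt"], row["label"])
        for row in reader
        if row.get("prompt") and row.get("label")
    ]


sample = (
    "prompt,label\n"
    '"Can I send an email, please?",send\n'
    "I'd like to compose an email.,send\n"
)
rows = parse_rows(sample)
print(rows)
```

Parsing with `csv` rather than splitting on commas keeps quoted prompts that contain a comma, such as the first row, in one piece.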

# Text Classification using regex

In [25]:
# Text Classification using regex (rule-based approach)

import re

send_patterns = [
    r'\bemail\b',     
    r'\bcompose\b',  
    r'\bdraft\b',     
    r'\bwrite\b',     
    r'\bmessage\b',   
    r'\bmail\b'       
]

send_regex = re.compile('|'.join(send_patterns), re.IGNORECASE)

def classify_text(text):
    """
    Classify text as 'send' if it contains any of the send_patterns, otherwise 'other'.
    """
    if send_regex.search(text):
        return "send"
    else:
        return "other"


# Note: every raw CSV line is classified, including the header and the trailing label.
for t in csv_file.split("\n"):
    print(f"Text: {t}\nClassification: {classify_text(t)}\n")

Text: prompt,label
Classification: other

Text: "Can I send an email, please?",send
Classification: send

Text: I'd like to compose an email.,send
Classification: send

Text: I need to send an email.,send
Classification: send

Text: Could you help me write an email?,send
Classification: send

Text: Is it possible to send an email with you?,send
Classification: send

Text: Let's write an email.,send
Classification: send

Text: Time to send an email.,send
Classification: send

Text: I want to email someone.,send
Classification: send

Text: Open email for writing.,send
Classification: send

Text: Compose a new message.,send
Classification: send

Text: I have an email to send.,send
Classification: send

Text: There's someone I need to email.,send
Classification: send

Text: I want to get in touch with [someone] through email.,send
Classification: send

Text: Could you draft an email for me?,send
Classification: send

Text: I need to send an email about [topic].,send
Classification: send

T
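
The loop above classifies whole CSV lines, label included, and only prints the result. As a rough sketch of how the rule-based classifier could be checked against gold labels, the block below applies the same keyword list to the prompt text only and collects the mismatches. The name `classify_prompt` and the two `other` examples are hypothetical; the real dataset may not contain such rows.

```python
import re

# Same keywords as send_patterns, combined into one alternation.
SEND_REGEX = re.compile(
    r"\b(?:email|compose|draft|write|message|mail)\b", re.IGNORECASE
)


def classify_prompt(prompt):
    """Return 'send' if any keyword occurs as a whole word, otherwise 'other'."""
    return "send" if SEND_REGEX.search(prompt) else "other"


examples = [
    ("I need to send an email.", "send"),                   # from the dataset
    ("What's the weather like today?", "other"),            # hypothetical
    ("Did you read the email I got yesterday?", "other"),   # hypothetical
]

# Keep every example where the rule disagrees with the gold label.
mistakes = [
    (prompt, gold, classify_prompt(prompt))
    for prompt, gold in examples
    if classify_prompt(prompt) != gold
]
print(mistakes)
```

The last example shows a typical weakness of keyword rules: the word "email" appears, but the sentence does not ask to send one.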

# Text Classification using Naive Bayes

In [28]:
import csv
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = []
labels = []

with open("dataset_emails.csv", "r", encoding="utf-8") as f:
    reader = csv.reader(f)
    # skip the header row if present
    next(reader, None)
    for row in reader:
        if len(row) < 2:
            continue
        prompt, label = row[0], row[1]
        texts.append(prompt)
        labels.append(label)

# Build the model: bag-of-words counts fed into multinomial Naive Bayes
model = make_pipeline(CountVectorizer(), MultinomialNB())

# Train
model.fit(texts, labels)

# Predict (on the same texts used for training)
predictions = model.predict(texts)

for text, label, prediction in zip(texts, labels, predictions):
    print(f"Text: {text}\nTrue label: {label}\nPrediction: {prediction}\n")

Text: Can I send an email, please?
True label: send
Prediction: send

Text: I'd like to compose an email.
True label: send
Prediction: send

Text: I need to send an email.
True label: send
Prediction: send

Text: Could you help me write an email?
True label: send
Prediction: send

Text: Is it possible to send an email with you?
True label: send
Prediction: send

Text: Let's write an email.
True label: send
Prediction: send

Text: Time to send an email.
True label: send
Prediction: send

Text: I want to email someone.
True label: send
Prediction: send

Text: Open email for writing.
True label: send
Prediction: send

Text: Compose a new message.
True label: send
Prediction: send

Text: I have an email to send.
True label: send
Prediction: send

Text: There's someone I need to email.
True label: send
Prediction: send

Text: I want to get in touch with [someone] through email.
True label: send
Prediction: send

Text: Could you draft an e
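
The pipeline above predicts on the same texts it was trained on, so the printed matches mostly show that the model can reproduce its training labels. To illustrate what a multinomial Naive Bayes classifier computes, and how it behaves on sentences it has not seen, here is a simplified standard-library sketch. It is not scikit-learn's implementation; the functions `tokenize`, `train`, and `predict` are hypothetical, and every example labelled `other` is invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep word tokens of two or more characters."""
    return re.findall(r"\b\w\w+\b", text.lower())


def train(texts, labels, alpha=1.0):
    """Collect per-label document and word counts for add-alpha smoothing."""
    label_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"label_counts": label_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}


def predict(model, text):
    """Return the label with the highest log posterior; unseen words are ignored."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            score += math.log((counts[t] + model["alpha"]) / denom)  # log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train_texts = [
    "I need to send an email.",            # from the dataset
    "Could you draft an email for me?",    # from the dataset
    "Compose a new message.",              # from the dataset
    "What's the weather like today?",      # hypothetical
    "Play some music for me.",             # hypothetical
    "Set an alarm for tomorrow.",          # hypothetical
]
train_labels = ["send", "send", "send", "other", "other", "other"]
model = train(train_texts, train_labels)

# Held-out sentences that were not used for training.
for sentence in ["Let's write an email.", "Is the weather nice tomorrow?"]:
    print(sentence, "->", predict(model, sentence))
```

Scoring on held-out sentences like these, rather than on the training texts, gives a fairer basis for comparing this approach with the regex rules.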