## This is a machine learning Logistic Regression model that predicts whether an email is spam or not

In [18]:
import pandas as pd

data = {
    "text": [
        "Win a brand new car! Click the link to claim your prize now.",
        "Meeting at 3 PM. Let me know if you can make it.",
        "Congratulations! You've won a free vacation to the Bahamas.",
        "Reminder: Your dentist appointment is scheduled for tomorrow.",
        "Limited time offer! Buy one, get one free on all items.",
        "Hey, are we still on for dinner tonight?",
        "Claim your free gift card now by clicking this link!",
        "Please review the attached document before our meeting.",
        "Your loan has been approved! Apply now to receive funds.",
        "Happy birthday! Hope you have a great day."
    ]
}

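# Labels (1 = spam, 0 = not spam), in the same order as the texts above.
# Caution: several labels do not match their text (e.g. "Meeting at 3 PM" is 1, the loan offer is 0).
y = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]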

df = pd.DataFrame(data)

Here's a dataset of 10 email texts, with their corresponding labels stored separately in `y` (1 = spam, 0 = not spam).
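Before training anything, it is worth printing each email next to its label. The sketch below does that and compares the notebook's labels against a second, hypothetical label list (`corrected_labels`) that is only my own reading of each email as spam or not; it is an assumption, not ground truth.

```python
import pandas as pd

texts = [
    "Win a brand new car! Click the link to claim your prize now.",
    "Meeting at 3 PM. Let me know if you can make it.",
    "Congratulations! You've won a free vacation to the Bahamas.",
    "Reminder: Your dentist appointment is scheduled for tomorrow.",
    "Limited time offer! Buy one, get one free on all items.",
    "Hey, are we still on for dinner tonight?",
    "Claim your free gift card now by clicking this link!",
    "Please review the attached document before our meeting.",
    "Your loan has been approved! Apply now to receive funds.",
    "Happy birthday! Hope you have a great day.",
]
original_labels = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
# Hypothetical corrected labels (1 = spam, 0 = not spam), based on reading each email.
corrected_labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

check = pd.DataFrame(
    {"text": texts, "original": original_labels, "corrected": corrected_labels}
)
# Keep only the rows where the two label lists disagree.
mismatches = check[check["original"] != check["corrected"]]
print(mismatches)
```

If many rows show up here, the model is being trained to reproduce noisy labels, and no choice of features will fix that.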

Before we train a model, we need to convert text into numerical features using CountVectorizer.

In [11]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()

X_counts = vectorizer.fit_transform(df["text"])

`CountVectorizer` converts the texts into a sparse matrix of word counts: one row per email, one column per word in the vocabulary, and each entry is how many times that word appears in that email.

In [19]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(X_counts, y, test_size=0.2, random_state=42)

Split the data into training and test sets (80% train, 20% test). With only 10 emails, that means 8 emails for training and 2 for testing.
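The sketch below checks those split sizes using stand-in integers instead of emails, with hypothetical balanced labels. It also shows the `stratify` parameter of `train_test_split`, which is not used in the notebook: it keeps the class ratio the same in both splits, which matters when the test set is this small.

```python
from sklearn.model_selection import train_test_split

X = list(range(10))  # stand-ins for the 10 emails
y = [1, 0] * 5       # hypothetical balanced labels: five of each class

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2

# With stratify=y, the 2-email test set is guaranteed one example of each class.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(sorted(y_test_s))  # [0, 1]
```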

### Initialize and train the Logistic Regression model

In [33]:
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)
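Under the hood, logistic regression computes a weighted sum of the features plus a bias, then passes it through the sigmoid function to get the probability of class 1. This sketch, on a tiny made-up feature matrix, reproduces `predict_proba` by hand from the fitted `coef_` and `intercept_` to show that relationship.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny hypothetical feature matrix: 6 samples, 2 features.
X = np.array([[0, 1], [1, 0], [2, 0], [0, 2], [3, 1], [1, 3]], dtype=float)
y = np.array([0, 1, 1, 0, 1, 0])

model = LogisticRegression()
model.fit(X, y)

# Linear score for each sample: w . x + b
z = X @ model.coef_.ravel() + model.intercept_[0]
# Sigmoid turns the score into P(class 1).
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = model.predict_proba(X)[:, 1]

print(np.allclose(p_manual, p_sklearn))  # True
# predict() picks class 1 exactly when that probability is above 0.5.
print(np.array_equal(model.predict(X), (p_manual > 0.5).astype(int)))  # True
```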

Evaluate accuracy

In [24]:
accuracy = accuracy_score(y_test, y_pred)

print(accuracy)

0.5
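Accuracy is just the fraction of predictions that match the true labels. Here is a hand computation next to `accuracy_score`, using hypothetical labels and predictions for a 2-email test set.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions for two test emails.
y_true = [0, 1]
y_pred = [1, 1]

# Count matches and divide by the number of test emails.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual, accuracy_score(y_true, y_pred))  # 0.5 0.5
```

With two test emails, the only possible accuracies are 0, 0.5, and 1.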


An accuracy of 0.5 means the model classified only one of the two test emails correctly, which is no better than random guessing on a two-class problem.
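A useful reference point is a majority-class baseline: a "model" that ignores the text and always predicts the most common label. This sketch uses scikit-learn's `DummyClassifier` (not part of the original notebook) with the notebook's labels and the same split settings. Because `train_test_split` shuffles based only on the number of samples and `random_state`, the test positions match the notebook's split. If the baseline scores the same as the logistic regression, the model has learned nothing beyond the class balance.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# The notebook's labels: 8 of the 10 are 1.
y = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
X = np.zeros((len(y), 1))  # DummyClassifier ignores the features entirely

# Same split settings as the notebook.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

baseline = DummyClassifier(strategy="most_frequent")  # always predicts the majority label
baseline.fit(X_train, y_train)
print(accuracy_score(y_test, baseline.predict(X_test)))
```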

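We can try a different way of converting text into numerical values: TF-IDF (term frequency-inverse document frequency) with `TfidfVectorizer` instead of `CountVectorizer`. TF-IDF still starts from word counts per email, but it down-weights words that appear in many emails, so common words contribute less than distinctive ones.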

In [27]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(df["text"])

In [42]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

In [46]:
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)

0.5


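The accuracy is still 0.5. Swapping the vectorizer cannot help much here: the test set holds only two emails, so accuracy can only be 0, 0.5, or 1, and the labels in `y` are mostly 1 and partly mismatched with the text.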
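One way to get a less noisy estimate from a small dataset is k-fold cross-validation (a technique not used in the original notebook): every email is used for testing exactly once, and the scores are averaged. This sketch runs 5-fold cross-validation on the 10 emails, using the hypothetical corrected labels from my own reading of the texts, not the notebook's `y`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = [
    "Win a brand new car! Click the link to claim your prize now.",
    "Meeting at 3 PM. Let me know if you can make it.",
    "Congratulations! You've won a free vacation to the Bahamas.",
    "Reminder: Your dentist appointment is scheduled for tomorrow.",
    "Limited time offer! Buy one, get one free on all items.",
    "Hey, are we still on for dinner tonight?",
    "Claim your free gift card now by clicking this link!",
    "Please review the attached document before our meeting.",
    "Your loan has been approved! Apply now to receive funds.",
    "Happy birthday! Hope you have a great day.",
]
# Hypothetical corrected labels (1 = spam, 0 = not spam); not the notebook's y.
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# The vectorizer lives inside the pipeline, so it is refit on each training
# fold and never sees the words of the fold it is tested on.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())

# cv=5 with a classifier uses stratified 5-fold splitting, so each test fold
# holds one spam and one non-spam email.
scores = cross_val_score(pipeline, texts, labels, cv=5)
print(scores, scores.mean())
```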

Let's try again with more data.

In [88]:
import pandas as pd

bigData = {
    "bigText": [
        "Could you please approve the meeting request for tomorrow at 10 AM?",
        "I would like to request a day off next Friday due to personal matters.",
        "Can you clarify the procedure for submitting a complaint?",
        "Please confirm if we can have the meeting at 2 PM on Wednesday.",
        "Can you approve my day off request for next Monday?",
        "I need to take a day off next Tuesday. Could you please approve?",
        "I'm requesting a sick day for tomorrow. Please let me know if that works.",
        "Please review the attached document for our upcoming discussion.",
        "Can you confirm if the meeting time works for everyone?",
        "I am requesting a half-day off on Thursday. Please confirm.",
        "I need to reschedule the meeting for next Tuesday at 3 PM.",
        "I am requesting approval for a vacation day on the 25th of this month.",
        "Please confirm if I can take a personal day next Friday.",
        "Can I take the afternoon off to attend an appointment?",
        "Please approve my request for a day off next Monday.",
        "I’m requesting time off on the 18th for family reasons.",
        "Can you approve my request to leave early today?",
        "Please let me know if the meeting can be moved to another day.",
        "Can I take a day off this coming Monday for personal reasons?",
        "Your Crypto wallet was hacked from India, call us here",
        "Requesting your approval to take Thursday off next week.",
        "Please confirm if the proposed meeting time works for you.",
        "I’d like to request a day off for personal reasons on the 12th.",
        "Can you confirm the meeting schedule for tomorrow?",
        "I am requesting a leave of absence for next Wednesday.",
        "Can you approve my sick day request for next Tuesday?",
        "Please approve my request for personal time off tomorrow.",
        "Could you confirm if we can reschedule the meeting for next Friday?",
        "Requesting approval for time off on the 7th of next month.",
        "I need to take the day off for a doctor's appointment on the 3rd.",
        "Please confirm if I can take a vacation day on the 20th.",
        "I’m requesting approval to take the afternoon off tomorrow.",
        "Sexy woman around you looking for a father for their child",
        "Could you approve my leave request for the 10th of this month?",
        "Please let me know if I can take a personal day next Wednesday.",
        "Can you confirm if we are still meeting tomorrow at 11 AM?",
        "I need to leave early today for an appointment. Please approve.",
        "Can you confirm the new schedule for our meeting on the 14th?",
        "I’d like to request a day off on the 5th. Let me know if that’s okay.",
        "Could you approve my sick leave for the 2nd of next week?",
        "Please let me know if I can take the afternoon off on the 17th.",
        "I’m requesting a day off on the 21st for personal reasons.",
        "Could you approve my leave request for this Friday?",
        "Please confirm if we can reschedule the meeting for next Monday.",
        "I’m requesting approval for time off next Thursday due to personal reasons.",
        "Please confirm if I can leave early tomorrow.",
        "Could you approve my vacation request for the 26th?",
        "I need to take a day off on the 1st. Please approve.",
        "Wow you are the great winner of 100 Billion Dollars!",
        "Can you confirm the time for our meeting tomorrow?",
        "Please approve my request for a half-day leave tomorrow.",
        "I need to leave early today to attend a family event. Please approve.",
        "I would like to request time off next Tuesday for a personal matter.",
        "Can you approve my request for a sick day on the 19th?",
        "Please let me know if I can take the day off on the 22nd.",
        "Can you approve my leave request for next Wednesday?",
        "Please confirm if I can take a personal day on the 8th.",
        "Can you confirm the meeting time for the 15th?",
        "I’m requesting approval for a leave of absence next Monday.",
        "Could you approve my request for time off next Friday?",
        "Please confirm if I can take a vacation day on the 29th."
    ]
}

df = pd.DataFrame(bigData)

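# Label each email 1 if it mentions any request/leave keyword, else 0.
# Note: this is a keyword rule, not a spam label. The three spam-like emails
# (crypto wallet, "Sexy woman", "100 Billion Dollars") all get 0, but so do
# plain meeting emails such as "Can you confirm the meeting schedule for tomorrow?".
y = df['bigText'].apply(
    lambda x: 1 if any(keyword in x.lower() for keyword in ['request', 'day off', 'approve', 'sick day', 'leave']) else 0
)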
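Because these labels come from a keyword rule, it helps to see exactly what the rule does. This sketch pulls the same logic into a helper function (`keyword_label` is a hypothetical name, not from the notebook) and applies it to four emails from the list. It also shows that the rule matches substrings: "requesting" counts as "request", but "approval" does not count as "approve".

```python
keywords = ["request", "day off", "approve", "sick day", "leave"]

def keyword_label(text):
    """Return 1 if the lowercased text contains any keyword as a substring, else 0."""
    lowered = text.lower()
    return 1 if any(k in lowered for k in keywords) else 0

samples = [
    "Can you approve my day off request for next Monday?",
    "Please confirm if we can have the meeting at 2 PM on Wednesday.",
    "Your Crypto wallet was hacked from India, call us here",
    "Wow you are the great winner of 100 Billion Dollars!",
]
for s in samples:
    print(keyword_label(s), s)  # 1, 0, 0, 0

print(keyword_label("Requesting approval for time off"))  # 1 ("requesting" contains "request")
print(keyword_label("Waiting for approval"))              # 0 ("approval" does not contain "approve")
```

So a model trained on these labels learns to spot leave and approval requests; it does not learn to separate spam from normal email.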


In [89]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(df["bigText"])

In [90]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

In [91]:
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)

0.6923076923076923


69.2% accuracy (9 of the 13 test emails classified correctly) is good enough for me right now.

This was my first machine learning project <3

Blessed to have the opportunity to build cool stuff.