
# Freelance AI Project: Text Classification
## Project Description

This project demonstrates a machine learning model for text classification.
It accepts raw text messages and predicts whether each one is spam or legitimate ("ham").
The system is built in Python with pandas and scikit-learn, using TF-IDF features and a Multinomial Naive Bayes classifier.


# Day 1: Baseline model

In [None]:
import pandas as pd

# The TSV has no header row, so the column names are supplied explicitly.
url = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv"
data = pd.read_csv(url, sep='\t', names=['label', 'text'])

data.head()


|   | label | text |
|---|-------|------|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |
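
Before training, it can help to check how the two labels are distributed, because accuracy is easy to inflate when one class dominates. The sketch below uses a small made-up DataFrame (`sample` is hypothetical, not the SMS data) with the same `label` and `text` columns; the same two lines would apply to `data`.

```python
import pandas as pd

# Hypothetical stand-in for the SMS data: same columns, made-up rows.
sample = pd.DataFrame({
    "label": ["ham", "ham", "ham", "spam", "ham", "spam", "ham", "ham"],
    "text": ["ok", "see you", "call me", "win cash",
             "lunch?", "free prize", "thanks", "on my way"],
})

# Share of each class in the dataset.
class_share = sample["label"].value_counts(normalize=True)

# A model that always predicts the majority class reaches this accuracy,
# so it is the number a real classifier has to beat.
majority_baseline = class_share.max()

print(class_share)
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

If the baseline is already high, per-class metrics are a better guide than accuracy alone.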


In [None]:
X = data['text']
y = data['label']


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

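# Hold out 20% of the messages for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# TF-IDF turns each message into a weighted term vector;
# Multinomial Naive Bayes then classifies those vectors.
model = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', MultinomialNB())
])

model.fit(X_train, y_train)

# Accuracy on the held-out test set
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

accuracy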


0.9668161434977578
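
A single accuracy figure does not show how many spam messages slip through. As a minimal sketch, assuming a tiny made-up dataset (`texts` and `labels` are hypothetical), the same pipeline can be scored per class with scikit-learn's `confusion_matrix` and `classification_report`. The split here also passes `stratify`, which the cell above does not.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data, only to make the example self-contained.
texts = [
    "Win a free prize now", "Claim your cash reward today",
    "Free entry to win tickets", "Urgent: you won a voucher",
    "Cheap loans, apply now", "Winner! Claim your free gift",
    "Are we still meeting at noon?", "Please send me the report",
    "See you at the gym later", "Can you call me tonight?",
    "Lunch tomorrow works for me", "Thanks for the update",
]
labels = ["spam"] * 6 + ["ham"] * 6

# stratify keeps the spam/ham ratio the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=4, random_state=42, stratify=labels
)

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])
pipeline.fit(X_tr, y_tr)
preds = pipeline.predict(X_te)

# Rows are true labels, columns are predicted labels, in the order given.
cm = confusion_matrix(y_te, preds, labels=["ham", "spam"])
print(cm)

# Precision, recall and F1 for each class.
print(classification_report(y_te, preds, zero_division=0))
```

For a spam filter, the recall of the `spam` row is usually the number a client cares about most.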

In [None]:
client_messages = [
    "Congratulations! You have won a free ticket",
    "Please review the attached meeting notes",
    "Urgent! Claim your reward now",
    "Can we reschedule our appointment?"
]

model.predict(client_messages)



array(['ham', 'ham', 'spam', 'ham'], dtype='<U4')
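
The first message above reads like spam but is labelled `ham`. One way to investigate such cases is to look at class probabilities instead of hard labels. This is a sketch on hypothetical training data (`train_texts`, `train_labels`); with the real model, `model.predict_proba(client_messages)` would be the equivalent call.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical training data standing in for the SMS dataset.
train_texts = [
    "Win a free prize now", "Claim your cash reward today",
    "Urgent: you won a voucher", "Free entry to win tickets",
    "Are we still meeting at noon?", "Please send me the report",
    "Can you call me tonight?", "Thanks for the update",
]
train_labels = ["spam"] * 4 + ["ham"] * 4

demo_model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])
demo_model.fit(train_texts, train_labels)

messages = [
    "Congratulations! You have won a free ticket",
    "Can we reschedule our appointment?",
]

# One row per message, one column per class (column order follows classes_).
probabilities = demo_model.predict_proba(messages)
spam_index = list(demo_model.classes_).index("spam")

for message, row in zip(messages, probabilities):
    print(f"P(spam) = {row[spam_index]:.2f}  {message}")
```

A probability close to 0.5 suggests a borderline message, which could be routed to manual review rather than filtered automatically.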

### Day 1 complete.
I built and tested a baseline spam classifier: a TF-IDF + Multinomial Naive Bayes pipeline that reaches about 96.7% accuracy on the held-out test split.
# ----- Day 1 -----
# Day 2: Refactoring into client-ready functions


In [None]:
def train_text_classifier(texts, labels):
    """
    Trains a text classification model using TF-IDF and Naive Bayes.
    Returns a trained model pipeline.
    """
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline

    model = Pipeline([
        ('tfidf', TfidfVectorizer()),
        ('clf', MultinomialNB())
    ])

    model.fit(texts, labels)
    return model


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


In [None]:
model = train_text_classifier(X_train, y_train)


In [None]:
test_predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, test_predictions)
accuracy


0.9668161434977578

In [None]:
def predict_text(model, texts):
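    """
    Predicts labels for new text inputs.
    Accepts a single string, a list of strings, or a pandas Series.
    Returns an empty list for None or empty input; otherwise returns
    the NumPy array produced by model.predict.
    """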
    if texts is None or len(texts) == 0:
        return []

    # Convert Pandas Series to list
    if isinstance(texts, pd.Series):
        texts = texts.tolist()
    elif isinstance(texts, str):
        texts = [texts]

    return model.predict(texts)
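
`predict_text` returns a plain list for empty input and a NumPy array otherwise, and missing values in a Series are passed straight to the vectorizer, which expects strings. The sketch below shows one possible variant that normalizes both. The name `predict_text_safe` and the toy `demo_model` are hypothetical, not part of the notebook.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def predict_text_safe(model, texts):
    """Hypothetical variant of predict_text that always returns a list."""
    if texts is None:
        return []
    if isinstance(texts, str):
        texts = [texts]
    # Replace missing values (None/NaN) with empty strings and coerce
    # everything else to str, so the vectorizer only ever sees strings.
    cleaned = ["" if pd.isna(t) else str(t) for t in texts]
    if not cleaned:
        return []
    return model.predict(cleaned).tolist()


# Tiny hypothetical model, only so the example runs on its own.
demo_model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])
demo_model.fit(
    ["win cash now", "free prize today", "meeting at noon", "see you later"],
    ["spam", "spam", "ham", "ham"],
)

print(predict_text_safe(demo_model, pd.Series(["win cash", None])))
print(predict_text_safe(demo_model, "see you at the meeting"))
print(predict_text_safe(demo_model, []))
```

Returning a list in every branch means callers can compare, serialize, or assign the result to a DataFrame column without checking its type first.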


In [None]:
client_messages = [
    "Urgent! Claim your free voucher now",
    "Please confirm the meeting schedule",
    "You have been selected for a prize",
    "Let's discuss the project tomorrow"
]

client_predictions = predict_text(model, client_messages)
client_predictions


array(['spam', 'ham', 'ham', 'ham'], dtype='<U4')

## Model Performance Summary

- Task: Text Classification (Spam Detection)
- Algorithm: Multinomial Naive Bayes
- Feature Engineering: TF-IDF
- Accuracy: ~96.7% on the held-out 20% test split

This model can be retrained on new datasets and used
to classify incoming text messages automatically.
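
To reuse a trained model without retraining it every session, the fitted pipeline can be saved to disk. This is a minimal sketch using `joblib` (installed alongside scikit-learn); the file name `spam_classifier.joblib` and the toy training data are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny hypothetical model standing in for the trained spam classifier.
demo_model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])
demo_model.fit(
    ["win cash now", "free prize today", "meeting at noon", "see you later"],
    ["spam", "spam", "ham", "ham"],
)

messages = ["claim your free prize", "are we meeting later?"]

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "spam_classifier.joblib")

    # Save the whole pipeline (vectorizer vocabulary + classifier weights).
    joblib.dump(demo_model, model_path)

    # Load it back, as a delivery script or a later session would.
    restored_model = joblib.load(model_path)

    original_preds = demo_model.predict(messages).tolist()
    restored_preds = restored_model.predict(messages).tolist()

print(original_preds == restored_preds)
```

Saved models should be loaded with the same scikit-learn version that created them, and only from sources you trust, since loading executes pickled code.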


## Freelance Service Description

I provide machine learning solutions for text classification
such as spam detection and message filtering.

Deliverables:
- Trained ML model
- Reusable prediction functions
- Accuracy evaluation
- Clean Python code

Tools:
Python, pandas, scikit-learn


# ----- Day 2 -----
# Day 3: Monetization and multi-service scaling


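# Freelance AI Services from One Pipeline

The same TF-IDF + Naive Bayes pipeline can back several services, but each service needs a model trained on its own labelled data. The spam model from Day 2 only outputs `spam` or `ham`.

## Service 1: Spam Detection
Classify messages as spam or legitimate.

## Service 2: Sentiment Analysis
Classify text as positive or negative.

## Service 3: Feedback Categorization
Categorize customer messages into predefined classes.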


In [None]:
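# NOTE: `model` is the spam classifier trained on Day 2, so every call below
# returns 'spam' or 'ham'. The sentiment and feedback services each need a
# model trained on their own labelled data with train_text_classifier().
# Only the last expression in a notebook cell is displayed as output.

# Client A: Spam detection
spam_input = "Win cash prizes now"
predict_text(model, spam_input)

# Client B: Sentiment analysis
sentiment_input = "I am very happy with your service"
predict_text(model, sentiment_input)

# Client C: Feedback categorization
feedback_input = "The delivery was late and support did not reply"
predict_text(model, feedback_input)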


array(['ham'], dtype='<U4')
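
A classifier can only return the labels it was trained on, so each service needs its own fitted model. The sketch below reuses the Day 2 training function to fit one model per service. The datasets (`sentiment_texts`, `feedback_texts`, and their labels) are tiny, made-up examples; a real service would use the client's labelled data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def train_text_classifier(texts, labels):
    """Same TF-IDF + Naive Bayes pipeline as the Day 2 function."""
    model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])
    model.fit(texts, labels)
    return model


# Hypothetical labelled data for the sentiment service.
sentiment_texts = [
    "I love this product", "Great service, very happy",
    "Terrible experience", "I am disappointed and unhappy",
]
sentiment_labels = ["positive", "positive", "negative", "negative"]

# Hypothetical labelled data for the feedback service.
feedback_texts = [
    "The delivery was late", "My package never arrived",
    "Support did not reply", "The agent was rude on the phone",
]
feedback_labels = ["delivery", "delivery", "support", "support"]

# One fitted model per service, looked up by name.
services = {
    "sentiment": train_text_classifier(sentiment_texts, sentiment_labels),
    "feedback": train_text_classifier(feedback_texts, feedback_labels),
}

sentiment_prediction = services["sentiment"].predict(
    ["I am very happy with your service"]
)
feedback_prediction = services["feedback"].predict(
    ["The delivery was late and support did not reply"]
)
print(sentiment_prediction, feedback_prediction)
```

The code is shared across services, while the training data and the fitted model are specific to each one.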

## Pricing Strategy (Entry Level)

- Simple text classifier (client dataset): $30–$50
- Model retraining + evaluation: $50–$100
- Automation / batch processing: $100+

Focus: small businesses, startups, solo founders


# ----- Day 3 -----
# Day 4: Automation and delivery

In [53]:
import pandas as pd                                          # handles CSV input/output
from sklearn.feature_extraction.text import TfidfVectorizer  # converts text into numeric vectors for ML
from sklearn.naive_bayes import MultinomialNB                # simple, reliable text classifier
from sklearn.pipeline import Pipeline                        # chains vectorizer + classifier for cleaner code
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from google.colab import files                               # Colab only: allows downloading the CSV directly from Colab


In [54]:
# Sample messages
data = {
    "text": [
        "Win a free prize now!",
        "Please confirm the meeting schedule.",
        "I love your product!",
        "Delivery was late and support didn’t reply.",
        "Congratulations! You won a voucher",
        "Meeting postponed to tomorrow",
        "Claim your free gift today"
    ]
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Save CSV
df.to_csv("client_messages.csv", index=False)
print("Sample CSV created: client_messages.csv")


Sample CSV created: client_messages.csv


In [55]:
client_data = pd.read_csv("client_messages.csv")
client_data.head()


|   | text |
|---|------|
| 0 | Win a free prize now! |
| 1 | Please confirm the meeting schedule. |
| 2 | I love your product! |
| 3 | Delivery was late and support didn’t reply. |
| 4 | Congratulations! You won a voucher |


In [56]:
# Training function
def train_text_classifier(texts, labels):
    model = Pipeline([
        ('tfidf', TfidfVectorizer()),
        ('clf', MultinomialNB())
    ])
    model.fit(texts, labels)
    return model

# Robust prediction function
def predict_text(model, texts):
    if texts is None or len(texts) == 0:
        return []
    if isinstance(texts, pd.Series):
        texts = texts.tolist()
    elif isinstance(texts, str):
        texts = [texts]
    return model.predict(texts)


In [57]:
# Example training dataset (toy data for demonstration only);
# replace it with a labelled, client-provided CSV in real use.
train_data = {
    "text": [
        "Win money now!",
        "Free gift, claim today",
        "Meeting at 10am",
        "Project update needed",
        "Congratulations! You have won",
        "Please review the attached report"
    ],
    "label": ["spam", "spam", "ham", "ham", "spam", "ham"]
}

train_df = pd.DataFrame(train_data)
X_train = train_df['text']
y_train = train_df['label']

# Train the model
model = train_text_classifier(X_train, y_train)
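
Six training examples are enough to show the workflow but not to estimate quality. Once a client supplies a larger labelled dataset, k-fold cross-validation gives a steadier estimate than a single split. The sketch below assumes a made-up 12-message dataset (`texts`, `labels`) and three folds; both choices are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical labelled data standing in for a client dataset.
texts = [
    "Win money now!", "Free gift, claim today",
    "Congratulations! You have won", "Claim your cash reward",
    "Urgent: free voucher inside", "You won a free prize",
    "Meeting at 10am", "Project update needed",
    "Please review the attached report", "Can we talk tomorrow?",
    "Lunch at noon works for me", "Thanks for the quick reply",
]
labels = ["spam"] * 6 + ["ham"] * 6

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])

# Each fold keeps the spam/ham ratio; the vectorizer is refit on the
# training part of every fold, so no test vocabulary leaks into training.
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, texts, labels, cv=folds, scoring="accuracy")

print(scores)
print(f"Mean accuracy: {scores.mean():.2f}")
```

Passing the whole pipeline to `cross_val_score`, rather than pre-computed TF-IDF vectors, is what keeps the evaluation honest.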


In [58]:
client_data['prediction'] = predict_text(model, client_data['text'])
client_data.head()


|   | text | prediction |
|---|------|------------|
| 0 | Win a free prize now! | spam |
| 1 | Please confirm the meeting schedule. | ham |
| 2 | I love your product! | ham |
| 3 | Delivery was late and support didn’t reply. | ham |
| 4 | Congratulations! You won a voucher | spam |


In [59]:
client_data.to_csv("client_predictions.csv", index=False)
print("Predictions saved: client_predictions.csv")


Predictions saved: client_predictions.csv
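
The read, predict, and save steps above can be wrapped in one function so that each client delivery is a single call. This is a sketch under stated assumptions: the helper name `classify_csv`, its `text_column` parameter, and the toy model and file names are hypothetical.

```python
import os
import tempfile

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def classify_csv(model, input_path, output_path, text_column="text"):
    """Hypothetical helper: read a CSV, add a 'prediction' column, save it."""
    frame = pd.read_csv(input_path)
    if text_column not in frame.columns:
        raise ValueError(
            f"Expected a '{text_column}' column, found: {list(frame.columns)}"
        )
    # Empty cells are read as NaN; turn them into empty strings first.
    texts = frame[text_column].fillna("").astype(str)
    frame["prediction"] = model.predict(texts.tolist())
    frame.to_csv(output_path, index=False)
    return frame


# Tiny hypothetical model, only so the example runs on its own.
demo_model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])
demo_model.fit(
    ["win money now", "free gift today", "meeting at 10am", "project update"],
    ["spam", "spam", "ham", "ham"],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    in_path = os.path.join(tmp_dir, "client_messages.csv")
    out_path = os.path.join(tmp_dir, "client_predictions.csv")

    # Input file with one empty text cell to exercise the NaN handling.
    pd.DataFrame(
        {"id": [1, 2, 3], "text": ["Win a free gift", None, "Meeting postponed"]}
    ).to_csv(in_path, index=False)

    result = classify_csv(demo_model, in_path, out_path)
    written = pd.read_csv(out_path)

print(result)
```

Checking for the text column up front gives the client a readable error message when a file arrives in the wrong format.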


          ┌───────────────┐
          │ Day 1: Model  │
          │ Baseline      │
          └───────────────┘
                   │
                   ▼
          ┌─────────────────┐
          │ Day 2: Functions│
          │ train & predict │
          └─────────────────┘
                   │
                   ▼
          ┌─────────────────┐
          │ Day 3: Services │
          │ Spam / Sentiment│
          │ Feedback        │
          └─────────────────┘
                   │
                   ▼
          ┌──────────────────────┐
          │ Day 4: Automation    │
          │ CSV in → Predict →   │
          │ CSV out → Deliver    │
          └──────────────────────┘
                   │
                   ▼
          ┌─────────────────────────────┐
          │ Deliverables / Money        │
          │ - client_predictions.csv    │
          │ - Notebook (.ipynb)         │
          │ - Proposal / Instructions   │
          └─────────────────────────────┘


# ----- Day 4 -----

