# Week 1: Introduction to Machine Learning & NLP

# **SECTION 1: Welcome & Objectives**

In [1]:
print("Welcome to Week 1!")
print("In this notebook, you'll:")
print("- Understand what ML is")
print("- Learn where NLP fits into ML")
print("- See your first ML model on text data")

Welcome to Week 1!
In this notebook, you'll:
- Understand what ML is
- Learn where NLP fits into ML
- See your first ML model on text data


# **SECTION 2: What is Machine Learning?**

### What is Machine Learning?
Machine Learning (ML) is the study of algorithms that learn patterns from data instead of following rules a programmer wrote by hand.

**Real-life analogy:**
If you show a program many labeled pictures of cats and dogs, and it learns to tell them apart in pictures it has never seen before, that's ML!

### Types of ML:
- **Supervised Learning**: Learn from labeled data (e.g., spam or not spam)
- **Unsupervised Learning**: Find hidden patterns in unlabeled data (e.g., grouping news articles by topic)
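
The difference between the two types can be sketched with a few made-up numbers. This is only an illustration: the toy data, the variable names, and the choice of `KMeans` as the unsupervised method are assumptions, not part of this week's spam example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy 1-D feature: two well-separated groups of points (made-up numbers).
X_toy = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])
y_toy = np.array([0, 0, 0, 1, 1, 1])  # labels are only used in the supervised case

# Supervised: the model sees the features AND the labels.
clf = LogisticRegression()
clf.fit(X_toy, y_toy)
supervised_pred = clf.predict(np.array([[0.9], [5.1]]))

# Unsupervised: the model sees the features only and assigns its own group ids.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X_toy)

print("supervised predictions:", supervised_pred)
print("cluster ids:", cluster_ids)
```

The cluster ids are arbitrary names for the groups: the clustering can recover the two groups, but it has no way of knowing which one we would call 0 or 1.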

We'll begin with supervised learning in NLP!

# **SECTION 3: How NLP fits into ML**

### NLP and ML
Natural Language Processing (NLP) is a field of AI that focuses on enabling machines to understand and process human language.

Common NLP tasks that use ML:
- Spam detection (classification)
- Sentiment analysis (classification into positive/negative)
- Text summarization (sequence generation)
- Language translation (sequence generation)

We use ML models to learn patterns from language data (text).

# **SECTION 4: The ML Workflow**

### The ML Workflow
1. **Collect data**: e.g., text messages
2. **Preprocess data**: clean and convert text into numbers
3. **Train model**: use data to learn patterns
4. **Evaluate model**: check performance
5. **Predict**: apply model to new text

Let’s see this in action with a mini NLP task.

# **SECTION 5: Your First ML Model - Spam Detection**

In [3]:
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

In [4]:
# Load dataset
!wget -q https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv

In [5]:
sms_df = pd.read_csv("sms.tsv", sep='\t', header=None, names=['label', 'message'])
sms_df.head()

|   | label | message |
|---|-------|---------|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |


In [6]:
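# Preprocessing: separate the message text (features) from the labels,
# and encode the labels as numbers (ham -> 0, spam -> 1)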
X = sms_df['message']
y = sms_df['label'].map({'ham': 0, 'spam': 1})

In [7]:
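# TF-IDF Vectorization: turn each message into a vector of word weights
# Caveat: fitting on all messages before the split lets the test set influence
# the vocabulary and IDF weights; in practice, fit on the training split only.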
vectorizer = TfidfVectorizer()
X_vec = vectorizer.fit_transform(X)

In [8]:
# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X_vec, y, test_size=0.2, random_state=42)
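
A common alternative is to split the raw text first and let the vectorizer learn its vocabulary from the training messages only, so nothing from the test set reaches the model. The sketch below shows one way to do that with `make_pipeline` on a tiny made-up corpus; the corpus and the `leak_free_model` name are assumptions, and this is not the code used to produce the results in this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus standing in for the SMS data (1 = spam, 0 = ham).
texts = [
    "Win a free prize now",
    "See you at lunch tomorrow",
    "Claim your free ticket today",
    "Are we still meeting at lunch",
    "Free entry, claim your prize",
    "Call me when you are home",
    "You won cash, claim now",
    "Thanks for the help yesterday",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Split the raw text BEFORE any vectorization happens.
texts_train, texts_test, labels_train, labels_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits TF-IDF on the training text only, then trains the classifier.
leak_free_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
leak_free_model.fit(texts_train, labels_train)

# At prediction time the pipeline reuses the training vocabulary automatically.
test_pred = leak_free_model.predict(texts_test)
print(test_pred)
```

Because the pipeline accepts raw strings, new messages can be passed to `leak_free_model.predict` directly, without a separate `transform` step.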

In [9]:
# Train Model
model = LogisticRegression()
model.fit(X_train, y_train)

In [10]:
# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.96      1.00      0.98       966
           1       1.00      0.73      0.84       149

    accuracy                           0.96      1115
   macro avg       0.98      0.87      0.91      1115
weighted avg       0.97      0.96      0.96      1115
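
In the report above, the spam class (1) has precision 1.00 but recall 0.73: almost everything the model flags as spam really is spam, yet it misses roughly a quarter of the spam messages. The sketch below shows how these numbers are computed, using a small made-up set of labels rather than the real test set.

```python
# Made-up true labels and predictions (1 = spam, 0 = ham), for illustration only.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

pairs = list(zip(true_labels, pred_labels))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam correctly flagged
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ham wrongly flagged
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that slipped through

# These divisions assume at least one predicted and one actual spam message.
precision = tp / (tp + fp)  # of everything flagged as spam, how much was spam?
recall = tp / (tp + fn)     # of all real spam, how much was flagged?
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For a spam filter this trade-off is often acceptable: a missed spam message is a nuisance, while a real message sent to the spam folder can be costly.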



In [11]:
# Test on custom messages
sample_messages = ["You won a free lottery ticket! Claim now.", "Let's meet tomorrow for coffee."]
sample_vec = vectorizer.transform(sample_messages)
print(model.predict(sample_vec))

[1 0]
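
`predict` only returns the final label. Logistic Regression can also report how confident it is through `predict_proba`. Here is a minimal sketch on a made-up toy corpus (the corpus and the `toy_` names are assumptions; the probabilities from the real SMS model will differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training messages (1 = spam, 0 = ham).
toy_texts = [
    "Win a free prize now",
    "Claim your free ticket today",
    "Free entry, claim your prize",
    "See you at lunch tomorrow",
    "Are we still meeting at lunch",
    "Call me when you are home",
]
toy_labels = [1, 1, 1, 0, 0, 0]

toy_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
toy_model.fit(toy_texts, toy_labels)

new_messages = ["Claim your free prize", "See you at lunch"]

# One row per message; columns follow toy_model.classes_, here [0, 1].
proba = toy_model.predict_proba(new_messages)
for msg, (p_ham, p_spam) in zip(new_messages, proba):
    print(f"{msg!r}: P(ham)={p_ham:.2f}, P(spam)={p_spam:.2f}")
```

With so little training data the probabilities stay fairly close to 0.5; what matters here is that the spam-like message gets the higher spam probability.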


# **SECTION 6: What’s Next?**

### What's Next?
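This week you saw how ML can classify messages using Logistic Regression.

But… we used it like a black box. How does it actually learn from the data? And what if we want to predict numbers instead of categories?

Next week: **Linear Regression** – a simple model for predicting numbers, and a good starting point for understanding how models like Logistic Regression learn!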

# **SECTION 7: Exercises**

### Exercises:
1. Try changing `test_size` in `train_test_split`. How do the scores in the classification report change?
2. Add your own messages to `sample_messages`. Does the model classify them correctly?
3. Replace `LogisticRegression` with another classifier (e.g., `MultinomialNB` from `sklearn.naive_bayes`) and compare the classification reports.

Try these and play with the model!