# Introduction to Machine Learning

## 🔗 Prerequisites

- ✅ Basic Python
- ✅ Basic NumPy/Pandas (when applicable)

---

## Official Structure Reference

This notebook supports **Course 01, Unit 2** requirements from `DETAILED_UNIT_DESCRIPTIONS.md`.

---


**Unit:** Unit 2: AI Concepts, Terminology, and Application Domains  
**Official Structure:** See `../../../DETAILED_UNIT_DESCRIPTIONS.md` for complete requirements

## 📚 Learning Objectives

By completing this notebook, you will:
- Understand what machine learning is and why it's important
- Learn the three main types of ML: supervised, unsupervised, and reinforcement learning
- Understand features, labels, and data preprocessing
- Learn to encode categorical features
- Explore the data generation process in ML

---


```python
# Imports used throughout this notebook
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

print("=== Machine Learning Types ===")
print("\n1. Supervised Learning:")
print(" - Learn from labeled data")
print(" - Examples: Classification, Regression")
print(" - Input: Features (X) and Labels (y)")
print(" - Output: Predictions for new data")

print("\n2. Unsupervised Learning:")
print(" - Learn from unlabeled data")
print(" - Examples: Clustering, Dimensionality Reduction")
print(" - Input: Features (X) only")
print(" - Output: Patterns, groups, structures")

print("\n3. Reinforcement Learning:")
print(" - Learn through interaction and rewards")
print(" - Examples: Game playing, Robotics")
print(" - Input: States, actions, and rewards from the environment")
print(" - Output: Policy (which action to take in each state)")
```

```text
=== Machine Learning Types ===

1. Supervised Learning:
 - Learn from labeled data
 - Examples: Classification, Regression
 - Input: Features (X) and Labels (y)
 - Output: Predictions for new data

2. Unsupervised Learning:
 - Learn from unlabeled data
 - Examples: Clustering, Dimensionality Reduction
 - Input: Features (X) only
 - Output: Patterns, groups, structures

3. Reinforcement Learning:
 - Learn through interaction and rewards
 - Examples: Game playing, Robotics
 - Input: States, actions, and rewards from the environment
 - Output: Policy (which action to take in each state)
```
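To make the reinforcement-learning bullets more concrete, here is a minimal sketch of an epsilon-greedy agent on a hypothetical three-action "bandit" problem. The reward means, noise level, epsilon, and step count are illustrative assumptions, not values from this course. The agent only observes rewards for the actions it tries, and the policy it ends up with is simply "take the action with the highest estimated reward".

```python
import numpy as np

# Hypothetical environment: 3 actions, each with a mean reward unknown to the agent
true_means = np.array([0.2, 0.5, 0.8])
rng = np.random.default_rng(0)

def step(action):
    """Return a noisy reward for the chosen action (assumed Gaussian noise)."""
    return true_means[action] + rng.normal(0, 0.1)

n_actions = len(true_means)
estimates = np.ones(n_actions)  # optimistic start so every action gets tried early
counts = np.zeros(n_actions)
epsilon = 0.1                   # fraction of steps spent exploring at random

for t in range(1000):
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))  # explore: random action
    else:
        action = int(np.argmax(estimates))     # exploit: best action so far
    reward = step(action)
    counts[action] += 1
    # Incremental average of the rewards observed for this action
    estimates[action] += (reward - estimates[action]) / counts[action]

policy = int(np.argmax(estimates))
print("Estimated rewards:", np.round(estimates, 2))
print("Times each action was taken:", counts.astype(int))
print("Learned policy: always take action", policy)
```

Unlike the supervised example, nobody tells the agent the "correct" action; it discovers it from rewards while balancing exploration against exploitation.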


## Supervised Learning: Features and Labels


```python
import pandas as pd

# Example: House Price Prediction
# Features (X): size, bedrooms, location, age
# Label (y): price

# Create a small sample dataset
data = {
    'size_sqft': [1200, 1500, 1800, 2000, 2200],
    'bedrooms': [2, 3, 3, 4, 4],
    'location': ['A', 'B', 'A', 'C', 'B'],
    'age_years': [5, 10, 2, 15, 8],
    'price': [150000, 200000, 250000, 280000, 320000]
}

df = pd.DataFrame(data)
print("=== Sample Dataset ===")
print(df)

# Separate features (X) and labels (y).
# 'location' is text, so it is left out of X until it is encoded numerically (next section).
X = df[['size_sqft', 'bedrooms', 'age_years']]
y = df['price']

print(f"\nFeatures (X) shape: {X.shape}")
print(f"Labels (y) shape: {y.shape}")
print(f"\nFeatures:\n{X}")
print(f"\nLabels:\n{y}")
```


```text
=== Sample Dataset ===
   size_sqft  bedrooms location  age_years   price
0       1200         2        A          5  150000
1       1500         3        B         10  200000
2       1800         3        A          2  250000
3       2000         4        C         15  280000
4       2200         4        B          8  320000

Features (X) shape: (5, 3)
Labels (y) shape: (5,)

Features:
   size_sqft  bedrooms  age_years
0       1200         2          5
1       1500         3         10
2       1800         3          2
3       2000         4         15
4       2200         4          8

Labels:
0    150000
1    200000
2    250000
3    280000
4    320000
Name: price, dtype: int64
```
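`train_test_split` is imported in the first cell but not used in this notebook. A common step after separating X and y is to hold some rows back so a model can later be checked on data it never saw. The sketch below shows only the mechanics on the same five houses; `test_size=0.4` and `random_state=0` are arbitrary choices, and new variable names are used so the notebook's `X` and `y` are left untouched.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same five houses as above (location omitted, as in X)
houses = pd.DataFrame({
    'size_sqft': [1200, 1500, 1800, 2000, 2200],
    'bedrooms': [2, 3, 3, 4, 4],
    'age_years': [5, 10, 2, 15, 8],
    'price': [150000, 200000, 250000, 280000, 320000],
})
X_demo = houses[['size_sqft', 'bedrooms', 'age_years']]
y_demo = houses['price']

# Hold back 40% of the rows (2 of 5) for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=0
)
print("Train rows:", list(X_train.index))
print("Test rows: ", list(X_test.index))
```

With only five rows any evaluation is very noisy; the split becomes meaningful on realistically sized datasets.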


## Encoding Categorical Features


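```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Method 1: Label Encoding maps each category to an integer, sorted alphabetically (A=0, B=1, C=2).
# The integers imply an order, so this only suits categories with a genuine order;
# 'location' is nominal, which is why the model below uses the one-hot columns instead.
label_encoder = LabelEncoder()
df['location_encoded'] = label_encoder.fit_transform(df['location'])

print("=== Label Encoding ===")
print(df[['location', 'location_encoded']])

# Method 2: One-Hot Encoding (for nominal data): one 0/1 column per category
onehot_encoder = OneHotEncoder(sparse_output=False)  # sparse_output needs scikit-learn >= 1.2
location_onehot = onehot_encoder.fit_transform(df[['location']])
location_df = pd.DataFrame(
    location_onehot,
    # Take column names from the one-hot encoder itself: location_A, location_B, location_C
    columns=onehot_encoder.get_feature_names_out(),
    index=df.index,
)

print("\n=== One-Hot Encoding ===")
print(location_df)

# Combine with the numeric features
X_encoded = pd.concat([X, location_df], axis=1)
print("\n=== Combined Features ===")
print(X_encoded)
```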


```text
=== Label Encoding ===
  location  location_encoded
0        A                 0
1        B                 1
2        A                 0
3        C                 2
4        B                 1

=== One-Hot Encoding ===
   location_A  location_B  location_C
0         1.0         0.0         0.0
1         0.0         1.0         0.0
2         1.0         0.0         0.0
3         0.0         0.0         1.0
4         0.0         1.0         0.0

=== Combined Features ===
   size_sqft  bedrooms  age_years  location_A  location_B  location_C
0       1200         2          5         1.0         0.0         0.0
1       1500         3         10         0.0         1.0         0.0
2       1800         3          2         1.0         0.0         0.0
3       2000         4         15         0.0         0.0         1.0
4       2200         4          8         0.0         1.0         0.0
```
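For a feature that does have a natural order, scikit-learn's `OrdinalEncoder` lets you state that order explicitly, whereas `LabelEncoder` always sorts categories alphabetically. The sketch below uses a hypothetical `condition` column (poor < fair < good) that is not part of the house dataset above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical ordinal feature: poor < fair < good
condition = pd.DataFrame({'condition': ['good', 'poor', 'fair', 'good']})

# LabelEncoder sorts alphabetically: fair=0, good=1, poor=2, so the real order is lost
alphabetical = LabelEncoder().fit_transform(condition['condition'])

# OrdinalEncoder with an explicit category list keeps the real order: poor=0, fair=1, good=2
ordinal_encoder = OrdinalEncoder(categories=[['poor', 'fair', 'good']])
ordered = ordinal_encoder.fit_transform(condition[['condition']]).ravel()

print("LabelEncoder:  ", alphabetical)
print("OrdinalEncoder:", ordered)
```

`OrdinalEncoder` also works column-wise on a 2-D feature matrix, while `LabelEncoder` is designed for a single 1-D array of target labels.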


## Simple Supervised Learning Example


```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Train a simple linear regression model on all five houses.
# Note: with only 5 houses and 6 features, the model reproduces the training prices
# exactly, so these training predictions say nothing about accuracy on new houses.
model = LinearRegression()
model.fit(X_encoded, y)

# Make predictions on the same (training) data
predictions = model.predict(X_encoded)

print("=== Model Training ===")
print(f"Actual prices: {y.values}")
print(f"Predicted prices: {predictions}")
print(f"\nModel coefficients: {model.coef_}")
print(f"Model intercept: {model.intercept_:.2f}")

# Example: predict the price of a new house.
# Columns must have the same names and order as X_encoded.
new_house = pd.DataFrame({
    'size_sqft': [1600],
    'bedrooms': [3],
    'age_years': [7],
    'location_A': [1],
    'location_B': [0],
    'location_C': [0],
})

predicted_price = model.predict(new_house)
print("\n=== Prediction for New House ===")
print(f"Features: {new_house.iloc[0].to_dict()}")
print(f"Predicted price: ${predicted_price[0]:,.2f}")
```


```text
=== Model Training ===
Actual prices: [150000 200000 250000 280000 320000]
Predicted prices: [150000. 200000. 250000. 280000. 320000.]

Model coefficients: [   189.97634789 -10978.71309715   1002.36521143   3691.10718475
   2665.08985905  -6356.1970438 ]
Model intercept: -64717.12

=== Prediction for New House ===
Features: {'size_sqft': 1600, 'bedrooms': 3, 'age_years': 7, 'location_A': 1, 'location_B': 0, 'location_C': 0}
Predicted price: $217,016.56
```
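A fitted `LinearRegression` predicts with a weighted sum: price ≈ intercept + Σ (coefficient × feature). The sketch below rebuilds the same encoded features (here with `pd.get_dummies`, a pandas alternative to `OneHotEncoder`; the names `houses`, `features`, and `reg` are illustrative) and checks that computing this sum by hand matches `predict`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

houses = pd.DataFrame({
    'size_sqft': [1200, 1500, 1800, 2000, 2200],
    'bedrooms': [2, 3, 3, 4, 4],
    'location': ['A', 'B', 'A', 'C', 'B'],
    'age_years': [5, 10, 2, 15, 8],
    'price': [150000, 200000, 250000, 280000, 320000],
})

# Numeric features plus one 0/1 column per location (location_A, location_B, location_C)
features = pd.concat(
    [houses[['size_sqft', 'bedrooms', 'age_years']],
     pd.get_dummies(houses['location'], prefix='location', dtype=float)],
    axis=1,
)
reg = LinearRegression().fit(features, houses['price'])

# Prediction = intercept + dot product of each row with the learned coefficients
manual = reg.intercept_ + features.to_numpy() @ reg.coef_
print(np.round(manual, 2))
print("Matches reg.predict:", np.allclose(manual, reg.predict(features)))
```

Each coefficient is the change in predicted price for a one-unit change in that feature with the others held fixed; with this few rows, their individual values should not be over-interpreted.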


## Simple Unsupervised Learning Example: Clustering


```python
import pandas as pd
from sklearn.cluster import KMeans

# Unsupervised learning: cluster houses by size and price.
# No labels are used; K-Means groups points by distance alone.
# Note: prices (~10^5) are far larger than sizes (~10^3), so distances here are
# dominated by price; features are usually scaled before clustering in practice.
X_cluster = df[['size_sqft', 'price']].values

# Apply K-Means clustering
kmeans = KMeans(n_clusters=2, random_state=42)
clusters = kmeans.fit_predict(X_cluster)

df['cluster'] = clusters

print("=== Unsupervised Learning: Clustering ===")
print("Grouping houses without labels:")
print(df[['size_sqft', 'price', 'cluster']])

print("\nCluster centers:")
for i, center in enumerate(kmeans.cluster_centers_):
    print(f" Cluster {i}: Size={center[0]:.0f} sqft, Price=${center[1]:,.0f}")
```


```text
=== Unsupervised Learning: Clustering ===
Grouping houses without labels:
   size_sqft   price  cluster
0       1200  150000        0
1       1500  200000        0
2       1800  250000        0
3       2000  280000        1
4       2200  320000        1

Cluster centers:
 Cluster 0: Size=1500 sqft, Price=$200,000
 Cluster 1: Size=2100 sqft, Price=$300,000
```
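Because K-Means compares raw Euclidean distances, the price column outweighs size in the clustering above. A common remedy is to standardize each feature to mean 0 and standard deviation 1 before clustering. This is a minimal sketch with `StandardScaler` on the same two columns; `n_init=10` is an explicit choice made here rather than a value from the original cell, and the resulting groups may differ from the unscaled ones.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

houses = pd.DataFrame({
    'size_sqft': [1200, 1500, 1800, 2000, 2200],
    'price': [150000, 200000, 250000, 280000, 320000],
})

# Standardize each column: (value - column mean) / column standard deviation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(houses[['size_sqft', 'price']])
print(np.round(X_scaled, 2))

# Cluster in the scaled space, where both features contribute comparably
kmeans_scaled = KMeans(n_clusters=2, n_init=10, random_state=42)
houses['cluster'] = kmeans_scaled.fit_predict(X_scaled)

# Map the centers back to original units so they are easy to interpret
centers = scaler.inverse_transform(kmeans_scaled.cluster_centers_)
for i, (size, price) in enumerate(centers):
    print(f"Cluster {i}: Size={size:.0f} sqft, Price=${price:,.0f}")
print(houses)
```

Fitting the scaler and the clusterer as separate steps keeps the transformation explicit; in a larger project they could be chained with a scikit-learn `Pipeline`.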
