# Supervised Learning

Supervised learning is a type of machine learning where the model is trained on labeled data. The goal is to learn a mapping from inputs to outputs, so the model can predict the output for new, unseen inputs.

## Key Concepts
- **Features**: Input variables (X)
- **Labels**: Output variables (y)
- **Training Data**: Labeled examples used to fit the model
- **Test Data**: Held-out labeled examples used to evaluate the model on inputs it has not seen
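
As a minimal sketch of these terms, the snippet below builds a tiny feature matrix and label vector and splits them into training and test sets. The toy values, the names `features` and `labels`, and the 25% test fraction are illustrative assumptions and are not used elsewhere in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 8 samples with 2 features each
features = np.arange(16).reshape(8, 2)        # X: shape (n_samples, n_features)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # y: one label per sample

# Hold out 25% of the samples as test data
feat_train, feat_test, lab_train, lab_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

print("Training features:", feat_train.shape)
print("Test features:", feat_test.shape)
```

The model only ever sees the training portion during fitting; the test portion is kept aside to estimate how well it generalizes.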

## Common Algorithms
- Linear Regression
- Logistic Regression
- Decision Trees
- Support Vector Machines
- k-Nearest Neighbors

In this notebook, we'll explore supervised learning with Python examples.

## Types of Supervised Learning

- **Regression**: Predicting continuous values (e.g., house prices)
- **Classification**: Predicting discrete classes (e.g., spam detection)

Let's start with a simple regression example using synthetic data.

In [None]:
# Regression Example: Linear Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generate synthetic data: y = 4 + 3x + Gaussian noise
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Fit linear regression model
model = LinearRegression()
model.fit(X, y)

y_pred = model.predict(X)

# Plot results
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, y_pred, color='red', label='Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Linear Regression Example')
plt.show()

print(f"Intercept: {model.intercept_[0]:.2f}")
print(f"Slope: {model.coef_[0][0]:.2f}")

## Classification Example

Now let's see a classification example using the famous Iris dataset and logistic regression.

In [None]:
# Classification Example: Logistic Regression on Iris Dataset
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X = iris.data
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

## Model Evaluation

A supervised learning model needs to be evaluated with metrics suited to its task, ideally on data it was not trained on. Common metrics include:
- **Regression**: Mean Squared Error (MSE), R² Score
- **Classification**: Accuracy, Precision, Recall, F1 Score
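
To make these metrics concrete, here is a small sketch that computes them by hand with NumPy. The arrays are hypothetical toy values chosen for illustration, and the variable names are assumptions that do not refer to the models trained in this notebook.

```python
import numpy as np

# Hypothetical regression targets and predictions
y_true_reg = np.array([3.0, 5.0, 7.0, 9.0])
y_hat_reg = np.array([2.0, 5.0, 8.0, 9.0])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_true_reg - y_hat_reg) ** 2)

# R²: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true_reg - y_hat_reg) ** 2)
ss_tot = np.sum((y_true_reg - y_true_reg.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Hypothetical binary labels (1 = positive class) and predictions
y_true_cls = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_hat_cls = np.array([1, 0, 1, 0, 1, 0, 1, 0])

tp = np.sum((y_true_cls == 1) & (y_hat_cls == 1))  # true positives
fp = np.sum((y_true_cls == 0) & (y_hat_cls == 1))  # false positives
fn = np.sum((y_true_cls == 1) & (y_hat_cls == 0))  # false negatives

accuracy_manual = np.mean(y_true_cls == y_hat_cls)  # fraction of correct predictions
precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"MSE: {mse_manual:.2f}, R²: {r2_manual:.2f}")
print(f"Accuracy: {accuracy_manual:.2f}, Precision: {precision:.2f}, "
      f"Recall: {recall:.2f}, F1: {f1:.2f}")
```

In practice the `sklearn.metrics` functions are the more convenient choice; the manual version is only meant to show what they measure.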

Let's calculate the MSE and R² score for our regression model and print the classification report for our logistic regression classifier.

In [None]:
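# Regression Evaluation
from sklearn.metrics import mean_squared_error, r2_score

# X, y, and y_pred now hold the Iris data, so rebuild the synthetic
# regression data (same seed) and predict with the fitted linear model
np.random.seed(0)
X_reg = 2 * np.random.rand(100, 1)
y_reg = 4 + 3 * X_reg + np.random.randn(100, 1)
y_reg_pred = model.predict(X_reg)

# These are training-set scores: the model was fitted on this same data
mse = mean_squared_error(y_reg, y_reg_pred)
r2 = r2_score(y_reg, y_reg_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R² Score: {r2:.2f}")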

# Classification Evaluation
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))

## Decision Trees

Decision Trees are versatile supervised learning models used for both regression and classification. Each internal node tests a single feature against a threshold and sends the sample down the matching branch until it reaches a leaf, which holds the prediction.
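
One way to see these node-by-node decisions is to print the rules a fitted tree has learned. The sketch below uses scikit-learn's `export_text` on a deliberately shallow tree; the depth limit of 2 and the names `iris_data` and `shallow_tree` are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris_data = load_iris()

# A depth limit of 2 is an illustrative choice that keeps the printout short
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
shallow_tree.fit(iris_data.data, iris_data.target)

# Each printed line is either a threshold test on one feature or a leaf's class
rules = export_text(shallow_tree, feature_names=iris_data.feature_names)
print(rules)
```

Reading the output from top to bottom traces the path a sample follows from the root to a leaf.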

Let's see a classification example using Decision Trees.

In [None]:
# Decision Tree Classification Example
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Use the same Iris dataset
clf_tree = DecisionTreeClassifier(random_state=42)
clf_tree.fit(X_train, y_train)
y_pred_tree = clf_tree.predict(X_test)
accuracy_tree = accuracy_score(y_test, y_pred_tree)
print(f"Decision Tree Accuracy: {accuracy_tree:.2f}")

## k-Nearest Neighbors (k-NN)

k-NN is a simple, instance-based learning algorithm. It classifies new data points based on the majority label among the k closest points in the training set.
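
The majority vote described above can be sketched in a few lines of NumPy. This is a simplified illustration that assumes Euclidean distance and uses a hypothetical helper `knn_predict` with toy data; it is not how scikit-learn implements the algorithm internally.

```python
import numpy as np
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(train_points - query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most frequent label among those neighbors
    votes = Counter(train_labels[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters
points = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
point_labels = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(points, point_labels, np.array([0.5, 0.5])))  # near the first cluster
print(knn_predict(points, point_labels, np.array([5.5, 5.5])))  # near the second cluster
```

Because the prediction depends directly on distances, features measured on very different scales usually need to be standardized before k-NN is applied.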

Let's see a k-NN classification example.

In [None]:
# k-Nearest Neighbors Classification Example
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
print(f"k-NN Accuracy: {accuracy_knn:.2f}")

## Support Vector Machines (SVM)

SVMs are classifiers that look for the hyperplane separating the classes with the largest margin in the feature space. With a linear kernel they learn a linear decision boundary, and with kernels such as RBF they can also handle classes that are not linearly separable.
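
To illustrate the difference between linear and non-linear tasks, the sketch below compares a linear kernel with an RBF kernel on two concentric rings of points, which no straight line can separate. The dataset parameters and variable names are illustrative assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical toy data: two concentric rings, not linearly separable
X_rings, y_rings = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_rings, y_rings, test_size=0.3, random_state=0
)

# Same data, two kernels: a straight-line boundary versus a curved one
linear_acc = SVC(kernel='linear').fit(Xr_train, yr_train).score(Xr_test, yr_test)
rbf_acc = SVC(kernel='rbf').fit(Xr_train, yr_train).score(Xr_test, yr_test)

print(f"Linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy: {rbf_acc:.2f}")
```

On data like this the RBF kernel should score clearly higher, because it can draw a closed boundary around the inner ring.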

Let's see an SVM classification example.

In [None]:
# Support Vector Machine Classification Example
from sklearn.svm import SVC

svm = SVC(kernel='linear')
svm.fit(X_train, y_train)
y_pred_svm = svm.predict(X_test)
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"SVM Accuracy: {accuracy_svm:.2f}")

## Summary

- Supervised learning uses labeled data to train models for regression and classification tasks.
- We explored linear regression, logistic regression, decision trees, k-NN, and SVMs with Python examples.
- Metrics such as MSE, R², accuracy, precision, recall, and F1 score quantify how well a model performs.

You can now experiment with other algorithms and datasets!