# ML Libraries Overview
In this notebook, we'll explore three popular machine learning libraries: **scikit-learn**, **TensorFlow**, and **PyTorch**.
We'll discuss when to choose each, set up the environment, and run minimal working demos.

## 1. When to choose which library
| Feature                  | Scikit-learn | TensorFlow | PyTorch |
|--------------------------|--------------|------------|---------|
| **Best for**             | Classical ML | Deep Learning | Deep Learning |
| **Model complexity**     | Simple to moderate | Complex neural nets | Complex neural nets |
| **Ease of use**          | Very easy | Medium | Easy-medium |
| **Speed on GPU**         | Limited | Excellent | Excellent |
| **Ecosystem**            | Data preprocessing, model selection | End-to-end DL, production ready | Research, experimentation |

**Rule of thumb:**
- If you’re doing **logistic regression, random forests, SVMs** → use **scikit-learn**.
- If you’re training **neural networks for production** → use **TensorFlow**.
- If you’re doing **deep learning research / custom architectures** → use **PyTorch**.

## 2. Environment Setup

In [1]:
!pip install scikit-learn tensorflow torch --quiet

In [2]:
import sklearn, tensorflow as tf, torch
print('scikit-learn:', sklearn.__version__)
print('TensorFlow:', tf.__version__)
print('PyTorch:', torch.__version__)

scikit-learn: 1.7.1
TensorFlow: 2.20.0
PyTorch: 2.8.0+cpu


## 3. Demo: Scikit-learn – Logistic Regression

In [3]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd
import seaborn as sns

In [4]:
# Load dataset
df = sns.load_dataset('iris')
df.head()

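|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|--------------|-------------|--------------|-------------|---------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |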


In [None]:
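# Run only ONE of the two blocks below: the second pair of assignments overwrites the first.

# Model 1 (logistic regression): four measurements -> encoded species label
X = df.drop('species', axis=1)
y = LabelEncoder().fit_transform(df["species"])

# Model 2 (linear regression): three remaining measurements -> petal_length
# Running the cell as written leaves X and y set for model 2.
X = df.drop(columns=["petal_length", "species"])
y = df["petal_length"]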


print(X.shape)
print(y.shape)

(150, 3)
(150,)


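X = df.drop('species', axis=1) → Returns a pandas DataFrame holding the four numeric feature columns.

y = LabelEncoder().fit_transform(df["species"]) → Returns a NumPy array of integer codes for the species column.

X → Features (numeric values describing each flower, shape (150, 4): 150 samples, 4 features).

y → Target labels (0, 1, 2 for the three flower species).

For model 2 (linear regression), X instead holds the three remaining measurements, shape (150, 3), and y is the continuous petal_length column, which matches the shapes printed above.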
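
To make the label-encoding step concrete, here is a minimal pure-Python sketch of the idea. It assumes the behaviour of `LabelEncoder` that matters here (classes are sorted, then numbered from 0); the short `species` list is illustrative, not the real dataset.

```python
# Illustrative labels only; the notebook encodes df["species"] instead.
species = ["setosa", "virginica", "versicolor", "setosa"]

# Sort the distinct class names, then number them from 0.
classes = sorted(set(species))
mapping = {name: idx for idx, name in enumerate(classes)}
encoded = [mapping[name] for name in species]

print(mapping)   # {'setosa': 0, 'versicolor': 1, 'virginica': 2}
print(encoded)   # [0, 2, 1, 0]
```

The integer codes carry no ordering; they are just identifiers for the three classes.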

In [6]:
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


X_train → Training features.

X_test → Testing features.

y_train → Training labels.

y_test → Testing labels.
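
The split sizes can be sketched by hand. The snippet below is a rough NumPy illustration of a shuffled 80/20 split over 150 row indices; it is not how `train_test_split` is implemented internally, and the seed and variable names are assumptions made for the example.

```python
import numpy as np

n_samples, test_size = 150, 0.2
rng = np.random.default_rng(42)            # seed chosen for illustration only
indices = rng.permutation(n_samples)       # shuffled row indices 0..149

n_test = int(round(n_samples * test_size)) # 30 rows held out for testing
test_idx, train_idx = indices[:n_test], indices[n_test:]

print(len(train_idx), len(test_idx))       # 120 30
```

With `test_size=0.2`, the model is trained on 120 flowers and evaluated on the 30 it never saw.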

In [7]:
# Model 1 (logistic regression)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)



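ValueError: Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.

This error appears when the cell runs while X and y still hold the model 2 (linear regression) data: LogisticRegression is a classifier and rejects the continuous petal_length target. Re-create X and y with the model 1 definitions and repeat the train/test split before fitting.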

max_iter=200 → Sets the maximum number of optimization iterations (the default of 100 can be too small for convergence, so we increase it).

.fit() → Trains the logistic regression model by finding the best weights for classification.

The model learns patterns in X_train that correspond to the correct labels y_train.

In [None]:
# Predict & evaluate (logistic regression)
y_pred = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))

Accuracy: 1.0
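
The accuracy printed above is simply the fraction of test predictions that match the true labels. A small hand computation, using made-up labels rather than the iris test set, might look like this:

```python
y_true = [0, 1, 2, 2, 1]     # illustrative true labels
y_hat = [0, 1, 2, 1, 1]      # illustrative predictions, one mistake at index 3

# Count matching positions and divide by the number of samples.
correct = sum(t == p for t, p in zip(y_true, y_hat))
accuracy = correct / len(y_true)

print(accuracy)              # 0.8
```

An accuracy of 1.0 therefore means every one of the test samples was classified correctly.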


.predict(X_test) → Uses the trained model to predict labels for unseen test data.

Produces y_pred, an array of predicted class labels.

accuracy_score(y_test, y_pred) → Calculates the fraction of correct predictions (a value between 0 and 1).

In [None]:
# Model 2 (linear regression); X and y must hold the model 2 data
model = LinearRegression()
model.fit(X_train, y_train)

Mean Squared Error (MSE) is the most common loss function used in linear regression. It measures how far the predicted values are from the true values by squaring the difference and then averaging over all data points.

Formula:
$$ \mathrm{MSE} = \frac{1}{n} \sum (y_{\text{true}} - y_{\text{pred}})^2 $$

Where

n = number of samples

y_true = actual target values

y_pred = predicted target values

Why squared?

Squaring ensures all errors are positive.

It penalizes larger errors more strongly than smaller ones.
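
A short worked example may help show how squaring weights large errors. The numbers are invented for illustration; mean absolute error (MAE) is included only as a point of comparison.

```python
y_true = [1.0, 2.0, 3.0]
y_hat = [1.5, 2.0, 1.0]      # illustrative predictions; errors are -0.5, 0.0, 2.0

errors = [t - p for t, p in zip(y_true, y_hat)]
mse = sum(e ** 2 for e in errors) / len(errors)   # (0.25 + 0 + 4) / 3
mae = sum(abs(e) for e in errors) / len(errors)   # (0.5 + 0 + 2) / 3

print(round(mse, 4), round(mae, 4))               # 1.4167 0.8333
```

The single error of 2.0 contributes 4 of the 4.25 total squared error (about 94%), but only 2 of the 2.5 total absolute error (80%).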

In [None]:
y_pred = model.predict(X_test)


print("Mean Squared Error:", mean_squared_error(y_test, y_pred))


Mean Squared Error: 0.13001626031382688
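
Because MSE is in squared units, taking its square root (RMSE) gives a number in the same units as the target. As a rough reading of the value printed above, and assuming petal_length is measured in centimetres as in the iris dataset:

```python
import math

mse = 0.13001626031382688    # value printed by the cell above
rmse = math.sqrt(mse)        # back to the units of petal_length

print(round(rmse, 2))        # 0.36
```

So the predictions are off by roughly a third of a centimetre in the root-mean-square sense.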


# PyTorch vs TensorFlow: Linear Regression with Gradients

We use a simple **linear regression** model:
$$ y \approx wX + b $$

---

## 🔹 What are Gradients?
- The **loss** is the Mean Squared Error (MSE):

$$ L = \frac{1}{N} \sum (y - (wX+b))^2 $$

- We want the **derivatives**:
  - Gradient with respect to `w`: $\frac{\partial L}{\partial w}$
  - Gradient with respect to `b`: $\frac{\partial L}{\partial b}$

These gradients tell us how much the loss changes when we tweak `w` or `b`.  
They are essential for optimization (gradient descent).
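
Differentiating the MSE by hand gives $\frac{\partial L}{\partial w} = -\frac{2}{N}\sum X\,(y - (wX+b))$ and $\frac{\partial L}{\partial b} = -\frac{2}{N}\sum (y - (wX+b))$. The NumPy sketch below evaluates these formulas and checks them against finite differences; the data, seed, and starting values of `w` and `b` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random(100)
y = 3 * X + 2 + 0.1 * rng.standard_normal(100)
w, b = 0.5, -1.0                          # arbitrary starting parameters

def mse_loss(w, b):
    """Mean squared error of the line w*X + b against y."""
    return np.mean((y - (w * X + b)) ** 2)

# Analytic gradients from the formulas above
residual = y - (w * X + b)
grad_w = -2.0 * np.mean(residual * X)
grad_b = -2.0 * np.mean(residual)

# Central finite differences as an independent check
eps = 1e-6
num_w = (mse_loss(w + eps, b) - mse_loss(w - eps, b)) / (2 * eps)
num_b = (mse_loss(w, b + eps) - mse_loss(w, b - eps)) / (2 * eps)

print(grad_w, num_w)
print(grad_b, num_b)
```

Both gradients come out negative here, meaning the loss drops if `w` and `b` are increased, which is the direction of the true values 3 and 2.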

---


In [None]:
import torch

# Random data
X = torch.rand(100, 1)
y = 3*X + 2 + 0.1*torch.randn(100, 1)

# Parameters (with gradients enabled)
w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

# Prediction
y_pred = w*X + b

# Mean Squared Error
loss = torch.mean((y - y_pred)**2)

# Backprop (compute gradients)
loss.backward()

print("Loss:", loss.item())
print("Gradient w:", w.grad.item(), "Gradient b:", b.grad.item())

Loss: 5.4809041023254395
Gradient w: -2.6412487030029297 Gradient b: -4.638677597045898


In [None]:
import tensorflow as tf

# Random data
X = tf.random.uniform((100, 1))
y = 3*X + 2 + 0.1*tf.random.normal((100, 1))

# Parameters as Variables
w = tf.Variable(tf.random.normal((1,)))
b = tf.Variable(tf.random.normal((1,)))

# GradientTape to record operations
with tf.GradientTape() as tape:
    y_pred = w*X + b
    loss = tf.reduce_mean((y - y_pred)**2)

# Compute gradients wrt w and b
grads = tape.gradient(loss, [w, b])
print("Loss:", loss.numpy())
print("Gradient w:", grads[0].numpy(), "Gradient b:", grads[1].numpy())

Loss: 8.380713
Gradient w: [-3.2505088] Gradient b: [-5.4442644]
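
Both framework snippets stop after computing the gradients once. To show what an optimizer would do with them, here is a minimal gradient-descent loop in plain NumPy. It is a sketch under assumed settings (learning rate 0.1, 5000 steps, fixed seed), not a reproduction of either framework's optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random(100)
y = 3 * X + 2 + 0.1 * rng.standard_normal(100)

w, b = 0.0, 0.0
lr = 0.1                                   # illustrative learning rate
for step in range(5000):
    residual = y - (w * X + b)
    grad_w = -2.0 * np.mean(residual * X)  # dL/dw for the MSE loss
    grad_b = -2.0 * np.mean(residual)      # dL/db for the MSE loss
    w -= lr * grad_w                       # step against the gradient
    b -= lr * grad_b

print(round(float(w), 2), round(float(b), 2))
```

The fitted values should land close to the slope 3 and intercept 2 used to generate the data, up to the added noise.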


## 4. Demo: TensorFlow – Simple Neural Network

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np

# Random data: 100 samples, 10 features
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=(100,))

np.random.rand(100, 10) → Creates a NumPy array of shape (100, 10) with random values between 0 and 1.

Simulates 100 samples, each with 10 features.

np.random.randint(0, 2, size=(100,)) → Creates an array of random integers 0 or 1, shape (100,).

This simulates binary classification labels (0 = class A, 1 = class B).

In [None]:
# Build model
model = models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(10,)),
    layers.Dense(1, activation='sigmoid')
])

models.Sequential([...]) → Creates a Sequential model, meaning layers are stacked one after the other.

layers.Dense(16, activation='relu', input_shape=(10,)) →

Fully connected layer with 16 neurons.

activation='relu' → Rectified Linear Unit activation, sets negative outputs to 0.

input_shape=(10,) → The model expects 10 features for each sample.

layers.Dense(1, activation='sigmoid') → Output layer:

1 neuron → outputs a single probability.

Sigmoid activation squashes output between 0 and 1, suitable for binary classification.
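
The two layers described above amount to two matrix multiplications with an activation after each. The NumPy sketch below mimics that forward pass; the randomly initialised `W1`, `b1`, `W2`, `b2` are stand-ins for the weights Keras would learn, and the names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 10))                        # 100 samples, 10 features

# Random weights standing in for trained parameters
W1, b1 = rng.standard_normal((10, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.standard_normal((16, 1)) * 0.1, np.zeros(1)

hidden = np.maximum(0.0, X @ W1 + b1)            # Dense(16) followed by ReLU
probs = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))  # Dense(1) followed by sigmoid

print(hidden.shape, probs.shape)                 # (100, 16) (100, 1)
```

Every hidden activation is non-negative because of ReLU, and every output lies strictly between 0 and 1 because of the sigmoid.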

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

optimizer='adam' → Optimization algorithm that adjusts weights using gradients (fast & adaptive learning rates).

loss='binary_crossentropy' → Loss function for binary classification using probabilities.

metrics=['accuracy'] → We want to track accuracy during training.
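
Binary cross-entropy can be computed by hand as $-\frac{1}{N}\sum \left[y \log p + (1-y)\log(1-p)\right]$. The sketch below applies that formula to three invented predictions; library implementations typically add numerical safeguards that are omitted here.

```python
import math

y_true = [1, 0, 1]
p_hat = [0.9, 0.1, 0.5]      # illustrative predicted probabilities of class 1

# Average negative log-likelihood of the true labels
bce = -sum(
    t * math.log(p) + (1 - t) * math.log(1 - p)
    for t, p in zip(y_true, p_hat)
) / len(y_true)

print(round(bce, 4))         # 0.3013
```

The two confident, correct predictions each cost about 0.105, while the undecided 0.5 costs about 0.693, so uncertain or wrong predictions dominate the loss.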

When model.fit(...) is called, Keras repeats the following steps:

**Forward Pass**

For each batch of data, Keras runs the model’s call() method to compute predictions (y_pred).

It applies the layers in sequence (Dense, ReLU, etc.).

**Loss Computation**

Compares predictions (y_pred) against ground truth (y) using the loss function defined in model.compile(...).

**Gradient Calculation (Backpropagation)**

Uses automatic differentiation (tf.GradientTape) to compute gradients of the loss w.r.t. the model parameters.

**Optimizer Step**

Updates weights using the optimizer (e.g., SGD, Adam) chosen in compile.


**Metrics Tracking**

Evaluates accuracy, loss, or other metrics on each batch/epoch if specified.

**Looping**

Repeats this process for each batch in each epoch.

In [None]:
# Train
model.fit(X, y, epochs=5, batch_size=8)

epochs=5 → Go through the entire dataset 5 times.

batch_size=8 → Process data in batches of 8 samples before updating weights.
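
These two settings together determine how many weight updates happen. A quick calculation for 100 samples, assuming the last, smaller batch is kept rather than dropped:

```python
import math

n_samples, batch_size, epochs = 100, 8, 5

# 12 full batches of 8 plus one final batch of 4
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)   # 13 65
```

So this short training run adjusts the weights 65 times in total.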

In [None]:
loss, accuracy = model.evaluate(X, y, verbose=0)
print("Loss:", loss)
print("Accuracy:", accuracy)

In [None]:
# Predict probabilities
y_pred_prob = model.predict(X)

# Convert probabilities to class labels by thresholding at 0.5.
# (np.argmax over a single sigmoid column would always return 0.)
import numpy as np
y_pred = (y_pred_prob > 0.5).astype(int).ravel()

print("Predicted classes:", y_pred[:10])
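
For a model with a single sigmoid output, predictions have shape (n, 1), so class labels come from comparing each probability with a threshold. The sketch below, using invented probabilities, contrasts thresholding at 0.5 with `np.argmax` along axis 1, which can only ever return 0 when there is one column.

```python
import numpy as np

# Illustrative sigmoid outputs, shape (4, 1)
probs = np.array([[0.2], [0.7], [0.5], [0.9]])

by_argmax = np.argmax(probs, axis=1)              # one column, so always index 0
by_threshold = (probs > 0.5).astype(int).ravel()  # 1 where probability exceeds 0.5

print(by_argmax)      # [0 0 0 0]
print(by_threshold)   # [0 1 0 1]
```

`np.argmax` is the right tool for multi-class softmax outputs with one column per class, not for a single-probability output.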

## 5. Demo: PyTorch – Simple Neural Network

torch – the main PyTorch package for tensors and core operations.

torch.nn – contains tools to build neural networks (layers, activations, loss functions).

torch.optim – contains optimization algorithms such as Adam, SGD, and RMSprop.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

In [None]:
# Random data
X = torch.rand(100, 10)
y = torch.randint(0, 2, (100,), dtype=torch.float32)

torch.rand(100, 10) → Creates a tensor with shape (100, 10) filled with random numbers between 0 and 1.

Simulates 100 samples, each with 10 features.

torch.randint(0, 2, (100,), dtype=torch.float32) → Creates 100 random target labels (0 or 1), stored as floating-point numbers because the loss function used later expects float targets.

This simulates binary classification labels.

In [None]:
# Model
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 16)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(16, 1)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        return self.sigmoid(self.fc2(self.relu(self.fc1(x))))
# Defines how data flows through the network:
# Input x goes to fc1 layer.
# Apply ReLU activation.
# Output goes to fc2 layer.
# Apply Sigmoid to get probability between 0 and 1.

class SimpleNN(nn.Module) → Defines our neural network architecture by extending PyTorch’s nn.Module.

super().__init__() → Initializes the parent nn.Module class.

self.fc1 = nn.Linear(10, 16) → First fully connected layer: takes 10 input features and outputs 16 neurons.

self.relu = nn.ReLU() → Activation function: Rectified Linear Unit (sets negative values to 0).

self.fc2 = nn.Linear(16, 1) → Second fully connected layer: takes 16 inputs and outputs 1 neuron (for binary classification).

self.sigmoid = nn.Sigmoid() → Activation function that outputs a value between 0 and 1 (probability).
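
It can be useful to count how many learnable parameters this architecture has. Each linear layer holds one weight per input-output pair plus one bias per output; `linear_params` below is a hypothetical helper written for this example, not a PyTorch function.

```python
def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

fc1 = linear_params(10, 16)   # first layer: 160 weights + 16 biases
fc2 = linear_params(16, 1)    # second layer: 16 weights + 1 bias

print(fc1, fc2, fc1 + fc2)    # 176 17 193
```

ReLU and Sigmoid add no parameters, so the whole network has 193 values to learn.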

In [None]:
model = SimpleNN()
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

model = SimpleNN() → Creates an instance of our network.

criterion = nn.BCELoss() → Binary Cross-Entropy loss — good for binary classification with probabilities.

optimizer = optim.Adam(model.parameters(), lr=0.01) → Adam optimizer updates the network weights during training with a learning rate of 0.01.
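
To give a feel for what Adam does with a gradient, here is a sketch of a single update for one scalar parameter, following the published Adam update rule. The learning rate comes from the cell above; the betas and epsilon are the commonly used defaults, and `theta` and `grad` are invented values.

```python
import math

lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
theta, grad = 1.0, 4.0               # illustrative parameter and gradient
m, v, t = 0.0, 0.0, 1                # moment estimates and step counter

m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of squared gradients
m_hat = m / (1 - beta1 ** t)                 # bias correction for early steps
v_hat = v / (1 - beta2 ** t)
new_theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)

print(theta - new_theta)
```

On the first step the change is almost exactly `lr`, whatever the size of the gradient, which is one reason Adam is less sensitive to gradient scale than plain SGD.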

In [None]:
# Train loop
for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(X).squeeze()
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')

optimizer.zero_grad() → Resets gradient values to zero before backpropagation (otherwise PyTorch accumulates them).

outputs = model(X).squeeze() → Passes X through the network to get predictions, and .squeeze() removes extra dimensions from (100, 1) → (100,).

loss = criterion(outputs, y) → Calculates the Binary Cross-Entropy loss between predictions and true labels.

loss.backward() → Computes the gradients of the loss w.r.t. all learnable parameters.

optimizer.step() → Updates model weights using computed gradients.
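
The `.squeeze()` call matters because a (100, 1) tensor and a (100,) tensor do not line up element by element. NumPy follows the same broadcasting rules for plain arithmetic, so it is used below as a stand-in to show what goes wrong; depending on the operation, a framework may either reject the mismatch or broadcast silently.

```python
import numpy as np

outputs = np.zeros((100, 1))     # same shape as model(X)
targets = np.zeros(100)          # same shape as y

mismatched = outputs - targets           # broadcasts to a 100 x 100 grid
matched = outputs.squeeze() - targets    # element-wise, one value per sample

print(mismatched.shape)   # (100, 100)
print(matched.shape)      # (100,)
```

Squeezing the predictions first keeps one prediction paired with one label.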

## 6. Summary
- **scikit-learn** → Best for small to medium datasets, quick models, classical algorithms.
- **TensorFlow** → Best for scalable deep learning models, production deployment.
- **PyTorch** → Best for flexible, experimental deep learning research.