
## Loading the dataset & exploring data

In [4]:
import pandas as pd

df = pd.read_csv("glass.csv")
print(df.shape)
print(df.columns)
df.head()


(214, 10)
Index(['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type'], dtype='object')


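|   | RI | Na | Mg | Al | Si | K | Ca | Ba | Fe | Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.52101 | 13.64 | 4.49 | 1.1 | 71.78 | 0.06 | 8.75 | 0.0 | 0.0 | 1 |
| 1 | 1.51761 | 13.89 | 3.6 | 1.36 | 72.73 | 0.48 | 7.83 | 0.0 | 0.0 | 1 |
| 2 | 1.51618 | 13.53 | 3.55 | 1.54 | 72.99 | 0.39 | 7.78 | 0.0 | 0.0 | 1 |
| 3 | 1.51766 | 13.21 | 3.69 | 1.29 | 72.61 | 0.57 | 8.22 | 0.0 | 0.0 | 1 |
| 4 | 1.51742 | 13.27 | 3.62 | 1.24 | 73.08 | 0.55 | 8.07 | 0.0 | 0.0 | 1 |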


## Creating binary labels

In [5]:
df["y"] = (df["Type"] == 1).astype(int)
df = df.drop(columns=["Type"])
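Because this relabelling merges every non-type-1 class into label 0, it can be worth checking how balanced the two classes end up. The following is a minimal sketch on a made-up frame (`toy` is a hypothetical stand-in, not the glass data):

```python
import pandas as pd

# Hypothetical stand-in for the glass data: only the "Type" column matters here.
toy = pd.DataFrame({"Type": [1, 1, 2, 3, 5, 7, 1, 2]})

# Same one-vs-rest labelling as above: Type 1 -> 1, everything else -> 0.
toy["y"] = (toy["Type"] == 1).astype(int)

# Fraction of samples in each class; a strong skew would make accuracy misleading.
balance = toy["y"].value_counts(normalize=True).sort_index()
print(balance)
```

Running the same `value_counts(normalize=True)` call on `df["y"]` would report the balance for the real dataset.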


## Separating X and y

In [6]:
import numpy as np

X = df.drop(columns=["y"]).values
y = df["y"].values


## Train–test split
Divides the data into training and testing sets.
Holding out a test set makes it possible to evaluate performance on unseen data and to detect overfitting.

In [7]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
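The split above is purely random, so with a small test set the share of positive samples can drift away from the share in the full data. One possible refinement, sketched here on synthetic data, is the `stratify` argument of `train_test_split`. The arrays `X_demo` and `y_demo` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the glass dataset): 100 samples, 30% positives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 9))
y_demo = np.array([1] * 30 + [0] * 70)

# stratify=y_demo asks for (approximately) the same class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("train positive fraction:", y_tr.mean())
print("test positive fraction:", y_te.mean())
```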


## Feature scaling
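Standardizes each feature to zero mean and unit variance, using statistics fitted on the training set only.
This keeps the inputs to the sigmoid in a range where it does not saturate early, and it makes gradient descent more stable and faster.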

In [8]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
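To make the scaling step concrete, here is a rough NumPy sketch of the same idea: the mean and standard deviation are computed from the training rows and then reused for the test rows. The data and the names `mu` and `sigma` are illustrative assumptions.

```python
import numpy as np

# Made-up feature matrices on very different scales (not the glass data).
rng = np.random.default_rng(0)
X_tr_demo = rng.normal(loc=[1.5, 70.0], scale=[0.003, 0.8], size=(50, 2))
X_te_demo = rng.normal(loc=[1.5, 70.0], scale=[0.003, 0.8], size=(10, 2))

# Statistics come from the training split only, so no test information leaks in.
mu = X_tr_demo.mean(axis=0)
sigma = X_tr_demo.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

X_tr_scaled = (X_tr_demo - mu) / sigma
X_te_scaled = (X_te_demo - mu) / sigma  # reuse the training mu and sigma

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
```

The scaled test set will generally not have exactly zero mean or unit variance, which is expected: it is transformed with the training statistics.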


## Sigmoid Function
Maps any real-valued score to a value between 0 and 1, so the output can be read as a probability.

In [9]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
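For very negative `z`, `np.exp(-z)` overflows and NumPy emits a runtime warning, even though the result still rounds to 0. With standardized features this rarely matters, but a commonly used stable variant can be sketched as follows (`stable_sigmoid` is a hypothetical name; `scipy.special.expit` is a library alternative):

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that only ever exponentiates non-positive numbers."""
    z = np.asarray(z, dtype=float)
    # exp(-|z|) lies in (0, 1], so it cannot overflow.
    e = np.exp(-np.abs(z))
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```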


## Forward computation
Computes weighted sum and converts it into probability.
This is the core prediction mechanism of logistic regression.

In [10]:
def predict_proba(X, w, b):
    z = X @ w + b
    p = sigmoid(z)
    return p


## Loss Function
Measures how wrong the predicted probabilities are.

Guides the learning process by penalizing confident wrong predictions more.

In [11]:
def loss(y, p):
    return -np.mean(y*np.log(p + 1e-9) + (1-y)*np.log(1-p + 1e-9))
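Written out, the function above computes the mean binary cross-entropy, where the small constant guards against `log(0)`:

```latex
\mathcal{L}(w, b) = -\frac{1}{n} \sum_{i=1}^{n}
  \Big[ y_i \log(p_i + \epsilon) + (1 - y_i) \log(1 - p_i + \epsilon) \Big],
\qquad p_i = \sigma(x_i^\top w + b), \quad \epsilon = 10^{-9}
```

As a quick worked number, a sample with y = 1 contributes about 0.105 when p = 0.9 but about 2.303 when p = 0.1, which is the "confident and wrong" penalty described above.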


## Weight Update
Adjusts weights and bias using gradient descent.

Allows the model to learn from its mistakes and reduce loss.

In [12]:
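def update_weights(X, y, w, b, lr):
    # Forward pass: predicted probabilities for the current parameters
    p = predict_proba(X, w, b)
    # Residual between prediction and label; it drives both gradients
    error = p - y

    # Gradient of the mean cross-entropy w.r.t. w is X^T (p - y) / n
    w = w - lr * (X.T @ error) / len(y)
    # Gradient w.r.t. b is the mean residual
    b = b - lr * np.mean(error)

    return w, b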
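The update relies on the gradient of the mean cross-entropy being `X.T @ (p - y) / n` for the weights and `mean(p - y)` for the bias. One way to gain confidence in that formula is a finite-difference check. The sketch below uses random illustrative data, and the helper `bce` drops the epsilon guard for a clean comparison.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(X, y, w, b):
    # Mean binary cross-entropy without the epsilon guard.
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Small random problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)
w = rng.normal(size=3)
b = 0.3

# Analytic gradient, in the same form as update_weights uses.
error = sigmoid(X @ w + b) - y
grad_w = X.T @ error / len(y)
grad_b = np.mean(error)

# Central finite differences for each weight and for the bias.
h = 1e-6
eye = np.eye(3)
num_w = np.array([
    (bce(X, y, w + h * eye[j], b) - bce(X, y, w - h * eye[j], b)) / (2 * h)
    for j in range(3)
])
num_b = (bce(X, y, w, b + h) - bce(X, y, w, b - h)) / (2 * h)

print(np.max(np.abs(grad_w - num_w)), abs(grad_b - num_b))
```

Both printed differences should be tiny compared with the gradient values themselves.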


## Training Loop

In [13]:
w = np.zeros(X_train.shape[1])
b = 0.0
lr = 0.1
epochs = 100

for i in range(epochs):
    w, b = update_weights(X_train, y_train, w, b, lr)

    if i % 10 == 0:
        p = predict_proba(X_train, w, b)
        print("Epoch", i, "Loss:", loss(y_train, p))


Epoch 0 Loss: 0.6821572814144314
Epoch 10 Loss: 0.6107403564917946
Epoch 20 Loss: 0.574788347949383
Epoch 30 Loss: 0.5528993998426993
Epoch 40 Loss: 0.5379208715556458
Epoch 50 Loss: 0.5269060274508053
Epoch 60 Loss: 0.5184063224566083
Epoch 70 Loss: 0.5116157361449429
Epoch 80 Loss: 0.5060453153729171
Epoch 90 Loss: 0.501379430005224
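The loss is still falling between epochs 80 and 90 (0.5060 to 0.5014), so 100 epochs is a somewhat arbitrary stopping point. One possible alternative, sketched here on synthetic data, is to stop once the per-epoch improvement drops below a tolerance. The function name `train_until_converged` and the parameters `tol` and `max_epochs` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(y, p):
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def train_until_converged(X, y, lr=0.1, tol=1e-6, max_epochs=10_000):
    """Gradient descent that stops once the loss improves by less than tol."""
    w, b = np.zeros(X.shape[1]), 0.0
    prev = np.inf
    for epoch in range(max_epochs):
        error = sigmoid(X @ w + b) - y
        w = w - lr * (X.T @ error) / len(y)
        b = b - lr * np.mean(error)
        current = bce(y, sigmoid(X @ w + b))
        if prev - current < tol:
            break
        prev = current
    return w, b, epoch + 1, current

# Synthetic, noisy data (not the glass dataset) so the loss has a finite minimum.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + rng.normal(scale=1.0, size=200) > 0).astype(int)

w, b, n_epochs, final_loss = train_until_converged(X, y)
print("stopped after", n_epochs, "epochs with loss", final_loss)
```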


## Thresholding
Converts probabilities into final class decisions.

The choice of threshold is a decision policy: it depends on application risk, not on the model itself.

In [14]:
def predict_label(p, threshold=0.5):
    return (p >= threshold).astype(int)


In [16]:
p_test = predict_proba(X_test, w, b)

y_pred_05 = predict_label(p_test, 0.5)
y_pred_07 = predict_label(p_test, 0.7)

print(y_pred_05)
print(y_pred_07)

[0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0
 1 0 0 0 0 1]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 0 0]
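The printed label vectors show that the 0.7 threshold predicts far fewer positives, but not how many of those predictions are correct. A minimal sketch of confusion counts with precision and recall follows. The helper `confusion_counts` and the toy arrays are made up for illustration and are not results from this notebook.

```python
import numpy as np

def confusion_counts(y_true, p, threshold=0.5):
    """Return (tp, fp, fn, tn) for labels obtained by thresholding p."""
    y_hat = (p >= threshold).astype(int)
    tp = int(np.sum((y_hat == 1) & (y_true == 1)))
    fp = int(np.sum((y_hat == 1) & (y_true == 0)))
    fn = int(np.sum((y_hat == 0) & (y_true == 1)))
    tn = int(np.sum((y_hat == 0) & (y_true == 0)))
    return tp, fp, fn, tn

# Toy labels and probabilities for illustration only.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
p = np.array([0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.55, 0.65])

for t in (0.5, 0.7):
    tp, fp, fn, tn = confusion_counts(y_true, p, t)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    print(f"threshold={t}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

Calling `confusion_counts(y_test, p_test, 0.5)` and `confusion_counts(y_test, p_test, 0.7)` would give the corresponding numbers for the model trained above.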


## Why a higher threshold is safer in glass quality control

In glass quality control, a false positive (classifying defective or wrong-type glass as acceptable) can lead to serious safety risks and financial loss. With a higher threshold, the model classifies a sample as “good/target glass” only when it is very confident, which reduces the chance of approving wrong or low-quality glass. In the run above, raising the threshold from 0.5 to 0.7 cuts the predicted positives on the test set from 9 to 1. This may increase false negatives (rejecting some good glass), but in safety-critical industries it is safer to reject uncertain cases than to accept them incorrectly.

## One paragraph answering:
* how this differs from perceptron
* why sigmoid matters
* what problem still remains unsolved


Logistic regression differs from the perceptron because it produces probabilities instead of hard class labels. The perceptron uses a step function and only indicates whether a sample belongs to a class or not, while logistic regression uses the sigmoid function, which outputs values between 0 and 1, representing confidence. This allows the use of cross-entropy loss, which measures how wrong a prediction is, rather than just whether it is wrong. Sigmoid is important because it is smooth and differentiable, enabling gradient-based optimization and uncertainty modeling. However, logistic regression still has limitations: it learns only a linear decision boundary, cannot model complex non-linear relationships on its own, and remains sensitive to outliers and feature quality.
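
The linear-boundary limitation can be illustrated on XOR, where no single straight line separates the two classes. The sketch below reuses the same update rule on four hand-written points; all names are local to the example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# XOR: the positive points sit on one diagonal, the negative points on the other.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(1000):
    error = sigmoid(X @ w + b) - y
    w = w - lr * (X.T @ error) / len(y)
    b = b - lr * np.mean(error)

p = sigmoid(X @ w + b)
xor_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print("probabilities:", p)
print("loss:", xor_loss, "vs log(2):", np.log(2))
```

Starting from zero weights, the gradient on this data is exactly zero, so training never moves. Because the cross-entropy loss of logistic regression is convex, that stationary point is the global minimum: the best the model can do here is predict 0.5 for every point.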