In [None]:
import pandas as pd

df = pd.read_csv("glass.csv")


In [None]:
# no of rows and columns
print('no of rows and columns:',df.shape)

# column names
print('column names:',df.columns)

# first few rows
print('first few rows:')
df.head()


no of rows and columns: (214, 10)
column names: Index(['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type'], dtype='object')
first few rows:


        RI     Na    Mg    Al     Si     K    Ca   Ba   Fe  Type
0  1.52101  13.64  4.49  1.10  71.78  0.06  8.75  0.0  0.0     1
1  1.51761  13.89  3.60  1.36  72.73  0.48  7.83  0.0  0.0     1
2  1.51618  13.53  3.55  1.54  72.99  0.39  7.78  0.0  0.0     1
3  1.51766  13.21  3.69  1.29  72.61  0.57  8.22  0.0  0.0     1
4  1.51742  13.27  3.62  1.24  73.08  0.55  8.07  0.0  0.0     1


In [None]:
#create binary labels (Type 1 vs rest)
df["y"] = (df["Type"] == 1).astype(int)

#remove original Type column
df = df.drop(columns=["Type"])


Feature–Label Separation

In [None]:
X = df.drop(columns=["y"]).values
y = df["y"].values

In [None]:
from sklearn.model_selection import train_test_split

#split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


**Feature Scaling**    
Standardizes features to have zero mean and unit variance.

In [None]:
from sklearn.preprocessing import StandardScaler

#scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
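
To make the scaling step concrete, the following is a minimal NumPy sketch of the same computation on synthetic data. The arrays `X_tr` and `X_te` are hypothetical stand-ins for the train and test splits, and the sketch assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses. The statistics come from the training split only and are then reused for the test split.

```python
import numpy as np

# Hypothetical data standing in for the train/test splits
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
X_te = rng.normal(loc=5.0, scale=2.0, size=(10, 3))

# Statistics are estimated on the training split only
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)  # ddof=0 (population standard deviation)

# The same mean and standard deviation are applied to both splits
X_tr_scaled = (X_tr - mu) / sigma
X_te_scaled = (X_te - mu) / sigma

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
```

The scaled training features have mean close to 0 and standard deviation close to 1 per column. The scaled test features generally do not, because they are transformed with the training statistics.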


Sigmoid Function

In [None]:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
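
For strongly negative `z`, `np.exp(-z)` can overflow and trigger a runtime warning. One possible safeguard is sketched below. The name `stable_sigmoid` is hypothetical and not part of the notebook; the sketch only ever exponentiates non-positive values.

```python
import numpy as np

def stable_sigmoid(z):
    # Hypothetical helper: exp is only applied to -|z|, so it cannot overflow
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in [0, 1]
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z))
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```

Both branches are algebraically equal to `1 / (1 + exp(-z))`, so the function can replace `sigmoid` without changing results for moderate inputs.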


**Forward Computation**     
Computes evidence score (z) and converts it into probability.   
Uses the input features to calculate how confident the model is in its prediction.

In [None]:
def predict_proba(X, w, b):
    z = X @ w + b
    p = sigmoid(z)

    return p
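
As a quick check of the forward computation, here is a small worked example. The inputs `X_demo`, `w_demo`, and `b_demo` are hypothetical values chosen so that the scores are easy to verify by hand.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_proba(X, w, b):
    return sigmoid(X @ w + b)

# Hypothetical inputs, for illustration only
X_demo = np.array([[1.0, 2.0],
                   [2.0, 0.0]])
w_demo = np.array([0.5, -0.25])
b_demo = 0.0

# Row 1: 1*0.5 + 2*(-0.25) = 0.0;  Row 2: 2*0.5 + 0*(-0.25) = 1.0
z_demo = X_demo @ w_demo + b_demo
p_demo = predict_proba(X_demo, w_demo, b_demo)
print(z_demo, p_demo)
```

A score of 0 maps to a probability of exactly 0.5, and a score of 1 maps to roughly 0.73, so larger scores mean higher confidence in class 1.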


**Loss Function**   
Measures how incorrect the predicted probabilities are using binary cross-entropy.  
Penalizes confident wrong predictions more than uncertain ones, guiding better learning.

In [None]:
def loss(y, p, eps=1e-12):
    # clip probabilities so that log(0) cannot occur
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
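
The claim that confident wrong predictions are penalized more can be checked numerically. The sketch below uses a hypothetical helper `bce`, which is the binary cross-entropy defined above with probabilities clipped to avoid `log(0)`, and compares two wrong predictions for a sample whose true label is 1.

```python
import numpy as np

def bce(y, p, eps=1e-12):
    # Binary cross-entropy; probabilities are clipped to avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = np.array([1])

loss_unsure = bce(y_true, np.array([0.4]))      # wrong, but uncertain
loss_confident = bce(y_true, np.array([0.01]))  # wrong and confident

print(loss_unsure, loss_confident)
```

The uncertain mistake costs about 0.92, while the confident mistake costs about 4.61, which is `-ln(0.4)` versus `-ln(0.01)`.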


**Weight Update**    
Adjusts weights and bias using gradient descent.   
Allows the model to reduce prediction error iteratively.

In [None]:
def update_weights(X, y, w, b, lr):
    # compute predictions
    p = predict_proba(X, w, b)

    # compute error
    error = p - y

    # update weights and bias
    w = w - lr * (X.T @ error) / len(y)
    b = b - lr * np.mean(error)

    return w, b
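
The update above relies on the gradient of the binary cross-entropy with respect to `w` being `X.T @ (p - y) / n`. A simple way to gain confidence in that expression is a finite-difference check, sketched here on synthetic data. All `*_demo` names and the helper `bce` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(X, y, w, b):
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = (rng.random(20) > 0.5).astype(int)
w_demo = rng.normal(size=3) * 0.1
b_demo = 0.0

# Analytic gradient, as used in update_weights
error = sigmoid(X_demo @ w_demo + b_demo) - y_demo
grad_w = X_demo.T @ error / len(y_demo)

# Numerical gradient via central finite differences
h = 1e-6
num_grad = np.zeros_like(w_demo)
for j in range(len(w_demo)):
    step = np.zeros_like(w_demo)
    step[j] = h
    loss_plus = bce(X_demo, y_demo, w_demo + step, b_demo)
    loss_minus = bce(X_demo, y_demo, w_demo - step, b_demo)
    num_grad[j] = (loss_plus - loss_minus) / (2 * h)

max_diff = np.max(np.abs(grad_w - num_grad))
print(max_diff)
```

If the two gradients agree to within a small tolerance, the update rule is consistent with the loss function.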


**Training Loop**  
Repeatedly updates parameters over multiple epochs.

In [None]:
w = np.zeros(X_train.shape[1])
b = 0.0

lr = 0.1
epochs = 100

for _ in range(epochs):
    w, b = update_weights(X_train, y_train, w, b, lr)
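
The loop above does not record the loss, so it is hard to tell whether training is converging. One possible variant is sketched below: it stores the loss after every epoch in a hypothetical list `history`. To stay self-contained, it redefines the helper functions and runs on synthetic data (`X_demo`, `y_demo`) rather than the glass dataset.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_proba(X, w, b):
    return sigmoid(X @ w + b)

def loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def update_weights(X, y, w, b, lr):
    error = predict_proba(X, w, b) - y
    return w - lr * (X.T @ error) / len(y), b - lr * np.mean(error)

# Synthetic, linearly separable data (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

w = np.zeros(X_demo.shape[1])
b = 0.0
history = []

for _ in range(100):
    w, b = update_weights(X_demo, y_demo, w, b, lr=0.1)
    history.append(loss(y_demo, predict_proba(X_demo, w, b)))

print(history[0], history[-1])
```

A steadily decreasing `history` suggests the learning rate is reasonable. A loss that rises or oscillates usually means `lr` is too large.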


**Decision Thresholding**  
Converts probabilities into class labels using a chosen threshold.

In [None]:
def predict_label(p, threshold=0.5):
    return (p >= threshold).astype(int)


In [None]:
p_test = predict_proba(X_test, w, b)

y_pred_05 = predict_label(p_test, threshold=0.5)
y_pred_07 = predict_label(p_test, threshold=0.7)

# a notebook cell displays only its last expression, so the output below is y_pred_07
y_pred_05
y_pred_07

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0])

**Why is a higher threshold safer in glass quality control?**  
A higher threshold requires stronger confidence before labeling a glass sample as acceptable. In safety-critical applications like glass manufacturing, false positives (incorrectly accepting faulty glass) are more dangerous than false negatives (rejecting good glass). Raising the threshold reduces the chance of wrongly approving faulty glass, at the cost of rejecting some good samples.
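
The trade-off can be illustrated by counting false positives and false negatives at both thresholds. The probabilities and labels below are hypothetical values made up for illustration; they are not the notebook's test results.

```python
import numpy as np

def predict_label(p, threshold=0.5):
    return (p >= threshold).astype(int)

# Hypothetical predicted probabilities and true labels (illustration only)
p_demo = np.array([0.95, 0.80, 0.65, 0.60, 0.55, 0.30, 0.10])
y_demo = np.array([1, 1, 0, 1, 0, 0, 0])

def count_errors(y, y_pred):
    fp = int(np.sum((y_pred == 1) & (y == 0)))  # faulty glass accepted
    fn = int(np.sum((y_pred == 0) & (y == 1)))  # good glass rejected
    return fp, fn

results = {}
for t in (0.5, 0.7):
    results[t] = count_errors(y_demo, predict_label(p_demo, threshold=t))
    print(f"threshold={t}: false positives={results[t][0]}, "
          f"false negatives={results[t][1]}")
```

On this toy data, moving the threshold from 0.5 to 0.7 removes both false positives and introduces one false negative.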


**One paragraph answering:**
* **how this differs from perceptron**
* **why sigmoid matters**
* **what problem still remains unsolved**

Logistic regression differs from a perceptron in its output: instead of a hard yes-or-no decision, it returns a probability that expresses how confident the model is in its prediction. The sigmoid function makes this possible. It maps the score z smoothly to a value between 0 and 1 and is differentiable everywhere, so the model can be trained with gradient descent, whereas the perceptron's step function has zero gradient almost everywhere. One major limitation remains: logistic regression can only learn linear decision boundaries, so it struggles with complex, non-linear patterns unless the features are engineered or a more expressive model is used.
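
The linear-boundary limitation can be seen on the classic XOR problem. The sketch below uses a hypothetical `train` helper that applies the same gradient-descent update as earlier. It fits logistic regression once on the raw XOR inputs and once with an added product feature `x1 * x2`, a simple form of feature engineering.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, y, lr=1.0, epochs=5000):
    # Plain batch gradient descent, starting from zero weights
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        error = sigmoid(X @ w + b) - y
        w = w - lr * (X.T @ error) / len(y)
        b = b - lr * np.mean(error)
    return w, b

def accuracy(X, y, w, b):
    y_pred = (sigmoid(X @ w + b) >= 0.5).astype(int)
    return float(np.mean(y_pred == y))

# XOR: no straight line separates the two classes
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0])

acc_linear = accuracy(X_xor, y_xor, *train(X_xor, y_xor))

# Adding the product x1 * x2 makes the classes linearly separable
X_eng = np.column_stack([X_xor, X_xor[:, 0] * X_xor[:, 1]])
acc_eng = accuracy(X_eng, y_xor, *train(X_eng, y_xor))

print(acc_linear, acc_eng)
```

On the raw inputs, no linear boundary can classify more than three of the four points correctly. With the engineered feature, the same model separates all four.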