### Plan of attack
* we will build a simple neural network
* train it on realworld dataset
* will mimin the pytorch workflow / training pipeline
* will have a lot of manual elements(loss fn, gradient descent)
* end result if not important

### Code flow


1.   load the dataset
2.    basic preprocessing
3.    training process
      * create the model
      * forward pass
      * loss calculation
      * backprop
      * parameters update (via gradient descent)
4. model evaluation (accuracy)



## 1. Load the Dataset

In [167]:
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder

In [168]:
df = pd.read_csv('https://raw.githubusercontent.com/gscdit/Breast-Cancer-Detection/refs/heads/master/data.csv')
df.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


In [169]:
df.shape

(569, 33)

In [170]:
# drop the faltu columns
df.drop(columns = ["id", "Unnamed: 32"], inplace=True)

In [171]:
df.head()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [172]:
# train test split

X_train, X_test, y_train, y_test = train_test_split(df.iloc[:, 1:], df.iloc[:, 0], test_size=0.2)
"""
1. df.iloc[:, 1:] This means: “take all rows (:) and all columns from index 1 onward (1:)”. So these are your features (X) — all columns except the first one.

2. df.iloc[:, 0] This means: “take all rows and only the first column (0)”. So this is your target (y)."""

"""X_train	Training features (80%)
X_test	Testing features (20%)
y_train	Training labels (80%)
y_test	Testing labels (20%)"""


print("Train shape:", X_train.shape, y_train.shape)
print("Test shape:", X_test.shape, y_test.shape)

Train shape: (455, 30) (455,)
Test shape: (114, 30) (114,)


## 2. Basic PreProcessing

In [173]:
#scaling

from sklearn.preprocessing import StandardScaler

scalar = StandardScaler()

# fit on training data (learn mean and std) and transform it
X_train = scalar.fit_transform(X_train)

# use the same mean and std from training to scale test data (no fitting again)
X_test = scalar.transform(X_test)

# we scale only X (features), not y (target)
# because y is the label — scaling it would change its meaning


In [174]:
X_test

array([[ 0.08330513,  0.1125054 ,  0.10232362, ...,  0.63699692,
        -0.30029344,  0.48742853],
       [-0.40259133,  1.04169601, -0.37541309, ...,  0.86386419,
         0.7670932 ,  0.99623235],
       [-0.31881608,  0.0166184 , -0.30126057, ..., -0.23321187,
         1.10897794,  0.3880071 ],
       ...,
       [-0.7265223 , -0.99247803, -0.7328039 , ..., -0.27077267,
        -0.38696845, -0.3334629 ],
       [ 0.59712667,  0.59422338,  0.62990053, ..., -0.14126301,
        -0.27140177, -0.25690308],
       [ 1.29804627,  0.49148732,  1.25391624, ...,  1.36267171,
         1.24220064,  0.76229956]])

In [175]:
X_train

array([[-0.57572685, -0.08611766, -0.61975171, ..., -0.94912085,
        -0.04026842, -0.18778657],
       [ 0.3067058 ,  2.58958601,  0.46376648, ...,  1.82542086,
         1.90028864,  3.0059277 ],
       [-0.71255976, -0.05415533, -0.71943214, ..., -0.71308874,
         0.20852094, -0.08092182],
       ...,
       [-0.69859722, -0.25049536, -0.63109745, ...,  0.4131345 ,
        -0.39338882,  0.4433003 ],
       [ 1.5102769 ,  3.01194538,  1.4605708 , ...,  0.72714285,
        -0.3083189 , -0.46159427],
       [ 0.28157322,  2.4046611 ,  0.19268433, ..., -0.72646039,
         0.5263293 , -1.20752087]])

In [176]:
# Label Encoding

encoder = LabelEncoder()

# fit on training labels (learn label mapping) and transform them to numbers
y_train = encoder.fit_transform(y_train)

# use the same mapping to transform test labels (no fitting again)
y_test = encoder.transform(y_test)

# we encode only y (target) because models need numeric labels, not text
# we don't encode X here because feature columns may need different encoders

In [177]:
y_train #binary classification problem

array([0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0,
       0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
       1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1,
       0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0,
       1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1,
       1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1,
       0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
       1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0,

In [178]:
y_test

array([1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1,
       1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0,
       1, 0, 1, 1])

## Numpy to Tensor

In [179]:
# Numpy arrays to PyTorch tensors
X_train_tensor = torch.from_numpy(X_train)
X_test_tensor = torch.from_numpy(X_test)
y_train_tensor = torch.from_numpy(y_train)
y_test_tensor = torch.from_numpy(y_test)


In [180]:
X_train_tensor

tensor([[-0.5757, -0.0861, -0.6198,  ..., -0.9491, -0.0403, -0.1878],
        [ 0.3067,  2.5896,  0.4638,  ...,  1.8254,  1.9003,  3.0059],
        [-0.7126, -0.0542, -0.7194,  ..., -0.7131,  0.2085, -0.0809],
        ...,
        [-0.6986, -0.2505, -0.6311,  ...,  0.4131, -0.3934,  0.4433],
        [ 1.5103,  3.0119,  1.4606,  ...,  0.7271, -0.3083, -0.4616],
        [ 0.2816,  2.4047,  0.1927,  ..., -0.7265,  0.5263, -1.2075]],
       dtype=torch.float64)

In [181]:
y_train_tensor.shape

torch.Size([455])

## 3. Training Process

### **Defining the Model**

In [182]:
# Defining a simple neural network class
class MySimpleNN():
    def __init__(self, X):
        # X is the input data (rows = samples, columns = features)
        # X.shape[1] gives number of features in the dataset

        # random weights for each feature (learned during training)
        self.weights = torch.rand(X.shape[1], 1, dtype=torch.float64, requires_grad=True)

        # bias term (a constant added to every prediction)
        self.bias = torch.zeros(1, dtype=torch.float64, requires_grad=True)


    def forward(self, X):
        # linear combination: X * weights + bias
        z = torch.matmul(X, self.weights) + self.bias

        # apply sigmoid to convert values between 0 and 1 (for binary output)
        y_pred = torch.sigmoid(z)
        return y_pred


    def binary_cross_entropy(self, y_pred, y):
        # small constant to avoid log(0) errors
        epsilon = 1e-7

        # keep predictions within safe range (between epsilon and 1 - epsilon)
        y_pred = torch.clamp(y_pred, epsilon, 1 - epsilon)

        # calculate binary cross-entropy loss (average difference between true and predicted)
        loss = -(y * torch.log(y_pred) + (1 - y) * torch.log(1 - y_pred)).mean()
        return loss


* weights: random numbers that show how important each feature is.
* bias: a small offset value that adjusts all predictions.

* Both have requires_grad=True so PyTorch can update them automatically during training.

* forward method:
    - This is how the model makes predictions.

    - It multiplies input X by weights, adds bias (X * w + b), and passes the result through sigmoid, which squashes the output between 0 and 1 (so it looks like a probability).

* binary_cross_entropy method:
    - Calculates how far predictions are from real labels (y).

    - If predictions are close to actual values, loss is small.

    - If predictions are wrong, loss is large.

    - The goal during training is to minimize this loss by updating weights and bias.

### **Important Parameter**

In [183]:
#Important Parameters

learning_rate = 0.1
epochs = 25 # how many runs to take on data

### **Training Pipeline**

In [184]:
# create model

model = MySimpleNN(X_train_tensor)
model.weights


tensor([[0.8219],
        [0.1699],
        [0.5564],
        [0.2992],
        [0.8904],
        [0.7086],
        [0.9997],
        [0.4078],
        [0.0229],
        [0.6993],
        [0.8310],
        [0.3513],
        [0.9888],
        [0.3735],
        [0.7461],
        [0.7181],
        [0.7233],
        [0.8125],
        [0.0871],
        [0.1938],
        [0.9208],
        [0.6951],
        [0.9867],
        [0.1148],
        [0.1884],
        [0.4126],
        [0.9454],
        [0.2758],
        [0.5697],
        [0.0763]], dtype=torch.float64, requires_grad=True)

In [185]:
model.bias

tensor([0.], dtype=torch.float64, requires_grad=True)

### **Train the Model**

In [186]:
# define training loop
for epoch in range(epochs):

    # forward pass: compute predictions using current weights and bias
    y_pred = model.forward(X_train_tensor)

    # calculate loss between predicted values and actual labels
    loss = model.binary_cross_entropy(y_pred, y_train_tensor)

    # backward pass: compute gradients (derivatives of loss w.r.t weights & bias)
    loss.backward()

    # update weights and bias manually using gradient descent
    with torch.no_grad():  # disable gradient tracking during update
        model.weights -= learning_rate * model.weights.grad  # update weights
        model.bias -= learning_rate * model.bias.grad        # update bias (note: small typo fixed here)

    # reset gradients to zero before next iteration (to avoid accumulation)
    model.weights.grad.zero_()
    model.bias.grad.zero_()

    # print loss for each epoch to monitor training progress
    print(f"epoch: {epoch+1}, loss: {loss.item()}")


epoch: 1, loss: 4.102417219475644
epoch: 2, loss: 3.996963774187573
epoch: 3, loss: 3.8877317441915307
epoch: 4, loss: 3.773800106089673
epoch: 5, loss: 3.655241702106996
epoch: 6, loss: 3.530442683827077
epoch: 7, loss: 3.3943395708990014
epoch: 8, loss: 3.2497127298762916
epoch: 9, loss: 3.1032498518921754
epoch: 10, loss: 2.947994960244552
epoch: 11, loss: 2.7906629857406626
epoch: 12, loss: 2.632475087548874
epoch: 13, loss: 2.467863697254416
epoch: 14, loss: 2.2967710454071435
epoch: 15, loss: 2.128450604493405
epoch: 16, loss: 1.9614394163473579
epoch: 17, loss: 1.8006936006211869
epoch: 18, loss: 1.6450167775537923
epoch: 19, loss: 1.499084432663755
epoch: 20, loss: 1.3566733350596891
epoch: 21, loss: 1.2275414677062533
epoch: 22, loss: 1.116226079226674
epoch: 23, loss: 1.0238653080209135
epoch: 24, loss: 0.9503956505378491
epoch: 25, loss: 0.8943310629037496


* This loop trains your model step by step for a given number of epochs (full passes through the dataset).

* Forward pass
    - The model predicts outputs (y_pred) using current weights and bias.

    - forward() does X * w + b and applies sigmoid.

* Loss calculation

    - Measures how far predictions are from actual labels (y_train_tensor).

    - Lower loss means better predictions.

* Backward pass

    - loss.backward() tells PyTorch to calculate gradients — how much each weight/bias contributed to the error.

    - These gradients are stored in model.weights.grad and model.bias.grad.

* Parameter update

    - Inside torch.no_grad() (so PyTorch doesn’t track this step), we adjust weights and bias in the opposite direction of the gradient to reduce the loss.

    - new_value = old_value - learning_rate * gradient

* Zeroing gradients

    - Gradients accumulate by default in PyTorch, so we must clear them after each update using .zero_().

* Print loss

    - Shows how much the model is improving after each epoch.

    - As training continues, loss should decrease.

In [187]:
model.weights # this will better we started with random weights

tensor([[ 0.2891],
        [-0.1829],
        [-0.0020],
        [-0.2218],
        [ 0.4353],
        [ 0.0079],
        [ 0.2802],
        [-0.3130],
        [-0.3921],
        [ 0.4147],
        [ 0.3113],
        [ 0.1711],
        [ 0.4680],
        [-0.0671],
        [ 0.5355],
        [ 0.1482],
        [ 0.2420],
        [ 0.1863],
        [-0.0780],
        [-0.2289],
        [ 0.3431],
        [ 0.3352],
        [ 0.3898],
        [-0.4369],
        [-0.2408],
        [-0.1955],
        [ 0.2844],
        [-0.4367],
        [ 0.2335],
        [-0.3789]], dtype=torch.float64, requires_grad=True)

In [188]:
model.bias # we started with 1

tensor([-0.0915], dtype=torch.float64, requires_grad=True)

### **Evaluation**

In [189]:
# model evaluation (testing phase)
with torch.no_grad():  # disable gradient tracking (we are not training now)

    # get model predictions for test data
    y_pred = model.forward(X_test_tensor)

    # convert probabilities to binary values (0 or 1)
    y_pred = (y_pred > 0.5).float()

    # compare predictions with actual test labels and find mean accuracy
    accuracy = (y_pred == y_test_tensor).float().mean()

    # print final accuracy value
    print(f'Accuracy: {accuracy.item()}')


Accuracy: 0.5258541107177734


* This code checks how well your trained model performs on the test data.

* with torch.no_grad():

    - Tells PyTorch not to track gradients because we’re only testing, not training.

    - This makes evaluation faster and saves memory.

* model.forward(X_test_tensor)

    - Runs the model on unseen data to get predicted probabilities (values between 0 and 1).

    - (y_pred > 0.5).float() : Converts probabilities into actual class predictions:

    - 0.5 → 1 (positive class)

    - ≤ 0.5 → 0 (negative class)

* (y_pred == y_test_tensor).float().mean()

    - Compares predicted labels to true labels.

    - Converts boolean values (True/False) into 1.0 or 0.0.

    - Takes the average — this gives accuracy (percentage of correct predictions).

* print(f'Accuracy: {accuracy.item()}')

    - Prints the final accuracy score as a number.