# 🧠 Pneumonia Detection from Chest X-Rays (From Scratch)

This notebook is part of a hands-on project to build a binary image classifier to detect pneumonia from chest X-ray images.  
The aim is to understand and implement the entire pipeline **from scratch**, without using high-level deep learning libraries.

**Objective**:  
- Load and preprocess X-ray images  
- Prepare dataset (train/test split)  
- Build a classifier (initially logistic regression) using only NumPy  
- Evaluate the results and identify future work

📌 This is a learning project inspired by the [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning) by Andrew Ng.


In [33]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import os
import copy
%matplotlib inline

## 🧼 Image Preprocessing

In this section, we:
- Load X-ray images from the dataset folders
- Resize them to a fixed shape (e.g., 128×128)
- Convert images to grayscale (if needed)
- Normalize pixel values to the range [0, 1]
- Flatten the images for input to the model


In [17]:
def preprocess_image(image_path, img_size=(128,128)): # Resize, flatten, and normalize one image.
    img = Image.open(image_path).convert("L") # Convert to grayscale so every image has a single channel
    img = img.resize(img_size) # Resize to a fixed shape
    img_array = np.array(img).flatten().astype(np.float32) / 255.0 # Flatten, then scale pixels to [0, 1]
    
    return img_array

In [18]:
train_dir = 'train'
test_dir = 'test'
labels = ['PNEUMONIA', 'NORMAL']
img_size = (128,128)

# --- Data loading function ---
def get_data(data_dir):
    """
    Loads every image in each label subfolder of data_dir, preprocesses it
    with preprocess_image(), and returns the images and their integer labels.
    """
    images = []
    image_labels = []
    for label in labels:
        path = os.path.join(data_dir, label)
        for img_file in os.listdir(path):  # Iterate over all files in the directory
            img_path = os.path.join(path, img_file)  # Full path to the image file
            try:
                img_array = preprocess_image(img_path, img_size)
                images.append(img_array)
                
                # Assign label (0 for PNEUMONIA, 1 for NORMAL)
                if label == "PNEUMONIA":
                    image_labels.append(0)
                else:
                    image_labels.append(1)
            except Exception as e:
                print(f"Error processing file {img_path}: {e}")
    
    return np.array(images), np.array(image_labels)
       

## 🧪 Train-Test Split

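The data is loaded from separate `train` and `test` folders, so no manual shuffling or splitting is done here:
- **Training set**: 5,216 images (`train/`)
- **Test set**: 624 images (`test/`)

Keeping the test folder untouched during training lets the model be evaluated fairly on unseen images.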


In [38]:
X_train, y_train = get_data(train_dir)
X_test, y_test = get_data(test_dir)

# Fix the shapes: one example per column, labels as a row vector
X_train= X_train.T
X_test = X_test.T
y_train = y_train.reshape(1,-1)
y_test = y_test.reshape(1,-1)


print("Training set:", X_train.shape, y_train.shape)
print("Test set:", X_test.shape, y_test.shape)

Training set: (16384, 5216) (1, 5216)
Test set: (16384, 624) (1, 624)
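
The folders fix the train/test partition, but tuning hyperparameters later usually calls for a separate validation set. One option is to shuffle the training columns and slice off a fraction. The sketch below assumes the (features, examples) layout printed above; `split_columns`, `val_fraction`, and `seed` are hypothetical names, not part of the notebook, and the arrays are small synthetic stand-ins.

```python
import numpy as np

def split_columns(X, Y, val_fraction=0.2, seed=0):
    """Shuffle examples (columns) and split them into train/validation parts.

    X has shape (n_features, m) and Y has shape (1, m), matching the
    layout used in this notebook. Hypothetical helper for illustration.
    """
    m = X.shape[1]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(m)               # random order of example indices
    n_val = int(round(m * val_fraction))    # number of validation examples
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    return X[:, train_idx], Y[:, train_idx], X[:, val_idx], Y[:, val_idx]

# Tiny synthetic stand-in for the real arrays (4 features, 10 examples).
X_demo = np.arange(40, dtype=np.float32).reshape(4, 10)
Y_demo = (np.arange(10) % 2).reshape(1, -1)
X_tr, Y_tr, X_val, Y_val = split_columns(X_demo, Y_demo, val_fraction=0.2)
print(X_tr.shape, Y_tr.shape, X_val.shape, Y_val.shape)
```

Indexing `X` and `Y` with the same permutation keeps each image paired with its label.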


In [39]:
def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network
    
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    
    parameters = {}
    L = len(layer_dims) # number of layers in the network

    for l in range(1, L):
    
        parameters['W' + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l],1))

    return parameters

In [40]:
#test
parameters = initialize_parameters_deep([4,3,2])

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

W1 = [[ 0.00011528  0.01581172 -0.01592322  0.00042384]
 [ 0.014572    0.00436423 -0.00257021  0.00623894]
 [ 0.01226694  0.00285889 -0.01077597  0.0034427 ]]
b1 = [[0.]
 [0.]
 [0.]]
W2 = [[ 0.0078564   0.0044265   0.00499304]
 [-0.00642493 -0.00990345 -0.00617311]]
b2 = [[0.]
 [0.]]


In [41]:
def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter 
    cache -- a python tuple containing "A", "W" and "b" ; stored for computing the backward pass efficiently
    """
    # Pre-activation: linear combination of the previous layer's activations
    Z = np.dot(W,A) + b
    # Cache the inputs; they are needed again during backward propagation
    cache = (A, W, b)
    
    return Z, cache

In [54]:
# Logistic Regression sigmoid function
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    z -- the input itself, returned as the activation cache for the backward pass
    """

    s = 1/ (1+np.exp(-z))
    return s,z

def relu(z):
    """
    Compute the relu of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- relu(z)
    z -- the input itself, returned as the activation cache for the backward pass
    """

    s = np.maximum(0, z)
    return s,z
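
In `sigmoid` above, `np.exp(-z)` can overflow when `z` is a large negative number. NumPy then emits a runtime warning, although the result still rounds to 0. A common workaround is to pick the algebraically equivalent form that only ever exponentiates a non-positive number. The sketch below is a hypothetical alternative (`stable_sigmoid` is not part of the notebook) and returns only the activation, not the `(s, z)` pair.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow in np.exp for inputs of large magnitude.

    Hypothetical alternative to sigmoid() above; expects an array-like
    with at least one dimension and returns a float64 array.
    """
    z = np.asarray(z, dtype=np.float64)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so 1 / (1 + exp(-z)) cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use exp(z) / (1 + exp(z)); exp(z) < 1 here.
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

z_demo = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
s_demo = stable_sigmoid(z_demo)
print(s_demo)
```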

In [55]:
def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value 
    cache -- a python tuple containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """
    
    if activation == "sigmoid":

        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        
    
    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        
    cache = (linear_cache, activation_cache)

    return A, cache

In [56]:
def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU] for the first (L-1) layers->LINEAR->SIGMOID computation
    
    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()
    
    Returns:
    AL -- activation value from the output (last) layer
    caches -- list of caches containing:
                every cache of linear_activation_forward() (there are L of them, indexed from 0 to L-1)
    """

    caches = []
    A = X
    L = len(parameters) // 2        # number of layers in the neural network
    
    #[LINEAR -> RELU] for the first (L-1) layers. 

    # The for loop starts at 1 because layer 0 is the input.
    for l in range(1, L):
        A_prev = A 
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation="relu")
        caches.append(cache)
        
    
    # LINEAR -> SIGMOID.
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation="sigmoid")
    caches.append(cache)
    
          
    return AL, caches

In [57]:
def compute_cost(AL, Y):
    """
    Implement the cost function.

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (ex: 0 for PNEUMONIA, 1 for NORMAL), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    
    m = Y.shape[1]

    # Compute loss from aL and y.
    cost = -(np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))) / m
    
    
    cost = np.squeeze(cost)     # To make sure your cost's shape is what we expect 
                                # (e.g. this turns [[17]] into 17).

    return cost
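
If an output saturates to exactly 0 or 1, `np.log` in `compute_cost` returns `-inf`, and the cost can become `nan`. One possible guard is to clip the probabilities into a slightly smaller interval before taking logarithms. The sketch below assumes the same `(1, m)` shapes as above; `compute_cost_clipped` and `eps` are hypothetical names, not part of the notebook.

```python
import numpy as np

def compute_cost_clipped(AL, Y, eps=1e-12):
    """Cross-entropy cost with AL clipped away from 0 and 1.

    Hypothetical variant of compute_cost(); eps is an assumed tolerance.
    """
    m = Y.shape[1]
    AL_safe = np.clip(AL, eps, 1.0 - eps)   # keep both log arguments strictly positive
    cost = -np.sum(Y * np.log(AL_safe) + (1 - Y) * np.log(1 - AL_safe)) / m
    return float(cost)

# Two of the three predictions are saturated but correct.
AL_demo = np.array([[0.5, 1.0, 0.0]])
Y_demo = np.array([[1, 1, 0]])
cost_demo = compute_cost_clipped(AL_demo, Y_demo)
print(cost_demo)
```

In this example only the 0.5 prediction contributes noticeably, so the result is close to log(2) / 3.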

In [58]:
def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    dW = (1 / m) * np.dot(dZ, A_prev.T)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True) # Sum by rows
    dA_prev = np.dot(W.T, dZ)
    
    return dA_prev, dW, db

In [59]:
# Implementing helper functions (relu_backward and sigmoid backward)

def relu_backward(dA, activation_cache):
    """
    Implement the backward propagation for a single ReLU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    activation_cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = activation_cache
    dZ = np.array(dA, copy=True)  # Copy dA so the caller's array is not modified in place
    
    # When Z <= 0, set dZ to 0 (derivative of ReLU is 0 for Z <= 0)
    dZ[Z <= 0] = 0
    
    return dZ

def sigmoid_backward(dA, activation_cache):
    """
    Implement the backward propagation for a single Sigmoid unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    activation_cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = activation_cache
    
    # Compute the Sigmoid value (A) from Z
    A = 1 / (1 + np.exp(-Z))
    
    # Compute the derivative of the Sigmoid function
    dZ = dA * A * (1 - A)
    
    return dZ    

In [66]:
def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.
    
    Arguments:
    dA -- post-activation gradient for current layer l 
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
    
    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache
    
    if activation == "relu":
        
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db= linear_backward(dZ, linear_cache)
        
    elif activation == "sigmoid":
        
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db= linear_backward(dZ, linear_cache)
        
    
    return dA_prev, dW, db

In [67]:
def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group
    
    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 for PNEUMONIA, 1 for NORMAL)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
                the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])
    
    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ... 
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ... 
    """
    grads = {}
    L = len(caches) # the number of layers
    Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL
    
    # Initializing the backpropagation
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    
    # Last layer (SIGMOID -> LINEAR) gradients.
    current_cache = caches[L-1]
    dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, activation="sigmoid")
    grads["dA" + str(L-1)] = dA_prev_temp
    grads["dW" + str(L)] = dW_temp
    grads["db" + str(L)] = db_temp
    
    
    # Loop from l=L-2 to l=0
    for l in reversed(range(L-1)):
        # lth layer: (RELU -> LINEAR) gradients.
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l+1)], current_cache, activation="relu")
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l+1)] = dW_temp
        grads["db" + str(l+1)] = db_temp     
        
    return grads
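
A quick sanity check for the output-layer gradient: with the cross-entropy cost and a sigmoid output, the product `dAL * A * (1 - A)` computed via `sigmoid_backward` simplifies algebraically to `AL - Y`. The sketch below uses random stand-in values (not the X-ray data) to confirm that the two routes agree numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(1, 6))              # stand-in pre-activations of the output layer
Y = rng.integers(0, 2, size=(1, 6))      # random 0/1 labels
AL = 1.0 / (1.0 + np.exp(-Z))            # sigmoid output

# Two-step route, mirroring L_model_backward and sigmoid_backward above.
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
dZ_two_step = dAL * AL * (1 - AL)

# Closed form obtained by simplifying the product.
dZ_closed = AL - Y

print(np.allclose(dZ_two_step, dZ_closed))
```

Using the closed form directly would also avoid dividing by `AL` and `1 - AL` when an output saturates. That is an optional change, not something the notebook does.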

In [68]:
def update_parameters(params, grads, learning_rate):
    """
    Update parameters using gradient descent
    
    Arguments:
    params -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients, output of L_model_backward
    learning_rate -- step size of the gradient descent update
    
    Returns:
    parameters -- python dictionary containing your updated parameters 
                  parameters["W" + str(l)] = ... 
                  parameters["b" + str(l)] = ...
    """
    parameters = copy.deepcopy(params)
    L = len(parameters) // 2 # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l+1)]        
        
    return parameters

In [77]:
def L_layer_model(X, Y, layers_dims, learning_rate = 0.02, num_iterations = 3000, print_cost=False):
    """
    Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.
    
    Arguments:
    X -- input data, of shape (n_x, number of examples)
    Y -- true "label" vector (containing 0 for PNEUMONIA, 1 for NORMAL), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 10 steps and on the last step
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    costs -- list of cost values recorded every 10 iterations
    """

    costs = []       # keep track of cost
    
    # Parameters initialization.
    parameters = initialize_parameters_deep(layers_dims)
    
    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: [LINEAR -> RELU] for (L-1) layers -> LINEAR -> SIGMOID.
        AL,caches = L_model_forward(X, parameters)
        
        # Compute cost.
        cost = compute_cost(AL, Y)
    
        # Backward propagation.
        grads = L_model_backward(AL, Y, caches)
 
        # Update parameters.   
        parameters = update_parameters(parameters, grads, learning_rate)
                
        # Print the cost every 10 iterations and on the last iteration (frequent output, since training runs locally)
        if print_cost and (i % 10 == 0 or i == num_iterations - 1):
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
        if i % 10 == 0:
            costs.append(cost)
    
    return parameters, costs

In [None]:
# Not the best architecture, but it is enough for a start.
layers_dims = [16384, 128, 64, 32, 1] # 4-layer model (3 hidden layers + sigmoid output)

In [79]:
# Only 2500 iterations, since this runs locally
parameters, costs = L_layer_model(X_train, y_train, layers_dims, num_iterations = 2500, print_cost = True)

Cost after iteration 0: 0.6931772509297659
Cost after iteration 10: 0.6818933399561413
Cost after iteration 20: 0.6716876903398634
Cost after iteration 30: 0.6624568387450022
Cost after iteration 40: 0.6541062116312736
Cost after iteration 50: 0.6465482228350946
Cost after iteration 60: 0.6397043669085696
Cost after iteration 70: 0.633504260579661
Cost after iteration 80: 0.6278845984313655
Cost after iteration 90: 0.6227885528585221
Cost after iteration 100: 0.6181650252008011
Cost after iteration 110: 0.6139679959262327
Cost after iteration 120: 0.6101560476200676
Cost after iteration 130: 0.6066919040435974
Cost after iteration 140: 0.6035420651726247
Cost after iteration 150: 0.6006764110678783
Cost after iteration 160: 0.598067850841683
Cost after iteration 170: 0.5956920160431304
Cost after iteration 180: 0.5935269765967358
Cost after iteration 190: 0.5915529798569173
Cost after iteration 200: 0.5897522303729082
Cost after iteration 210: 0.5881086856458838
Cost after iteration 22

## 📊 Current Status & Next Steps

- ✅ Preprocessing complete and dataset ready
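- ✅ L-layer network (forward pass, cost, backpropagation, gradient descent) implemented and trained; the logged cost falls from about 0.693 to about 0.588 over the first 210 iterations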
- ❌ Evaluation metrics (accuracy, confusion matrix) still to come

### 🚧 Future Plans
- Implement convolutional neural networks from scratch
- Tune hyperparameters
- Evaluate performance with accuracy and confusion matrix
- (Optionally) visualize predictions
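
As a starting point for the evaluation step, accuracy and a confusion matrix can be computed with NumPy alone. The sketch below assumes the probabilities come from something like `L_model_forward` and uses the notebook's label convention (0 for PNEUMONIA, 1 for NORMAL). `evaluate_binary` and `threshold` are hypothetical names, and the demo values are made up for illustration, not results from the model.

```python
import numpy as np

def evaluate_binary(AL, Y, threshold=0.5):
    """Accuracy and 2x2 confusion matrix from output probabilities.

    AL -- probabilities of shape (1, m)
    Y  -- true labels of shape (1, m), 0 for PNEUMONIA and 1 for NORMAL
    Rows of the matrix are true labels, columns are predicted labels.
    """
    preds = (AL > threshold).astype(int).ravel()
    truth = Y.astype(int).ravel()
    confusion = np.zeros((2, 2), dtype=int)
    np.add.at(confusion, (truth, preds), 1)   # count each (true, predicted) pair
    accuracy = np.trace(confusion) / truth.size
    return accuracy, confusion

# Made-up probabilities and labels, for illustration only.
AL_demo = np.array([[0.9, 0.2, 0.6, 0.4, 0.1]])
Y_demo = np.array([[1, 0, 0, 1, 0]])
acc, cm = evaluate_binary(AL_demo, Y_demo)
print(acc)
print(cm)
```

Since the classes are typically imbalanced in this kind of dataset, the confusion matrix is worth reading alongside accuracy rather than relying on accuracy alone.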

> Due to time constraints, development is paused here temporarily.  
> The project will be revisited after completing the Deep Learning Specialization with more advanced implementations.

