<a href="https://colab.research.google.com/github/Ekagra444/simpleANN/blob/main/labassiANN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

ANN classification

In [9]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load and prepare the dataset
def load_data():
    data = load_breast_cancer()
    X = data.data
    y = data.target.reshape(-1, 1)  # Reshape to (n_samples, 1)

    # Split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Standardize features
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    return X_train, X_test, y_train, y_test

In [10]:


# Activation functions and their derivatives
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return (x > 0).astype(float)
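
The `sigmoid` above evaluates `np.exp(-x)` directly, which can overflow and raise a RuntimeWarning for large negative inputs. A minimal sketch of a numerically stable variant follows; `stable_sigmoid` is a hypothetical name and is not part of this notebook.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that only ever exponentiates non-positive numbers."""
    x = np.asarray(x, dtype=float)
    e = np.exp(-np.abs(x))  # always in [0, 1], so it cannot overflow
    # For x >= 0 use 1 / (1 + e^-x); for x < 0 use e^x / (1 + e^x).
    return np.where(x >= 0, 1 / (1 + e), e / (1 + e))

values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
moderate = np.linspace(-5, 5, 11)
naive = 1 / (1 + np.exp(-moderate))  # the notebook's formula, safe in this range
```

On moderate inputs the two versions agree, so the stable form could replace the original without changing results.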



In [13]:
# Neural Network class
class SimpleNeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights
        self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2/input_size)
        # self.W1 = np.random.randn(input_size, hidden_size) * 0.01  # fixed 0.01 scale: not good with ReLU activation
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * np.sqrt(2/hidden_size)
        self.b2 = np.zeros((1, output_size))

    def forward(self, X):
        # Hidden layer
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = relu(self.z1)

        # Output layer
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = sigmoid(self.z2)

        return self.a2

    def backward(self, X, y, learning_rate):
        m = X.shape[0]  # Number of samples

        # Output layer error
        dZ2 = self.a2 - y
        dW2 = (1/m) * np.dot(self.a1.T, dZ2)
        db2 = (1/m) * np.sum(dZ2, axis=0, keepdims=True)

        # Hidden layer error
        dZ1 = np.dot(dZ2, self.W2.T) * relu_derivative(self.z1)
        dW1 = (1/m) * np.dot(X.T, dZ1)
        db1 = (1/m) * np.sum(dZ1, axis=0, keepdims=True)

        # Update weights
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            # Forward pass
            output = self.forward(X)

            # Backward pass and weight update
            self.backward(X, y, learning_rate)

            # Calculate loss (binary cross-entropy); clip to avoid log(0)
            eps = 1e-12
            clipped = np.clip(output, eps, 1 - eps)
            loss = -np.mean(y * np.log(clipped) + (1 - y) * np.log(1 - clipped))

            # Print progress
            if epoch % 5 == 0:
                predictions = (output > 0.5).astype(int)
                accuracy = accuracy_score(y, predictions)
                print(f"Epoch {epoch}, Loss: {loss:.4f}, Accuracy: {accuracy:.4f}")
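
The `backward` method relies on the fact that, for a sigmoid output with binary cross-entropy loss, the gradient with respect to `z2` simplifies to `(a2 - y) / m`. One way to check this is a finite-difference gradient check. The sketch below re-implements the same forward and backward formulas on a tiny random problem; the function names and the data are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def bce_loss(params, X, y):
    """Forward pass (ReLU hidden layer, sigmoid output) and mean BCE loss."""
    W1, b1, W2, b2 = params
    a1 = np.maximum(0, X @ W1 + b1)
    a2 = 1 / (1 + np.exp(-(a1 @ W2 + b2)))
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

def analytic_grads(params, X, y):
    """Gradients using the same formulas as SimpleNeuralNetwork.backward."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    z1 = X @ W1 + b1
    a1 = np.maximum(0, z1)
    a2 = 1 / (1 + np.exp(-(a1 @ W2 + b2)))
    dZ2 = a2 - y
    dW2 = a1.T @ dZ2 / m
    db2 = dZ2.sum(axis=0, keepdims=True) / m
    dZ1 = (dZ2 @ W2.T) * (z1 > 0)
    dW1 = X.T @ dZ1 / m
    db1 = dZ1.sum(axis=0, keepdims=True) / m
    return [dW1, db1, dW2, db2]

def numeric_grads(params, X, y, eps=1e-6):
    """Central finite differences, one parameter entry at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            plus = bce_loss(params, X, y)
            p[idx] = old - eps
            minus = bce_loss(params, X, y)
            p[idx] = old  # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))
y = rng.integers(0, 2, size=(20, 1)).astype(float)
params = [rng.standard_normal((4, 3)) * np.sqrt(2 / 4), np.zeros((1, 3)),
          rng.standard_normal((3, 1)) * np.sqrt(2 / 3), np.zeros((1, 1))]

max_diff = max(np.max(np.abs(a - n))
               for a, n in zip(analytic_grads(params, X, y),
                               numeric_grads(params, X, y)))
print(f"Largest analytic vs. numeric gradient difference: {max_diff:.2e}")
```

A very small difference suggests the backpropagation formulas are consistent with the loss; a large one would point to a sign or scaling error.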



In [39]:
# Main execution
if __name__ == "__main__":
    # Load and prepare data
    X_train, X_test, y_train, y_test = load_data()

    # Network parameters
    input_size = X_train.shape[1]
    hidden_size = 8
    output_size = 1

    # Create and train network
    nn = SimpleNeuralNetwork(input_size, hidden_size, output_size)
    nn.train(X_train, y_train, epochs=20, learning_rate=0.7)
    # Test the network
    test_output = nn.forward(X_test)
    test_predictions = (test_output > 0.5).astype(int)
    test_accuracy = accuracy_score(y_test, test_predictions)
    print(f"\nTest Accuracy: {test_accuracy:.4f}")
    print("\n")

Epoch 0, Loss: 0.6620, Accuracy: 0.4747
Epoch 5, Loss: 0.2073, Accuracy: 0.9407
Epoch 10, Loss: 0.1228, Accuracy: 0.9626
Epoch 15, Loss: 0.0947, Accuracy: 0.9736

Test Accuracy: 0.9825




Reasons for choosing the dataset:

*   Clean medical data with 30 numeric tumor features
*   Moderate class imbalance: about 63% benign vs. 37% malignant (357 vs. 212 samples)
*   Small (569 samples) but enough for meaningful training
*   Proven ANN performance (>95% accuracy achievable)
*   Direct clinical relevance for cancer diagnosis
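
The class split cited above can be checked directly from the labels. The sketch below uses a hypothetical helper, `class_balance`, and stand-in labels built from the documented class counts of `load_breast_cancer` (212 malignant, 357 benign). In the notebook itself, `data.target` would be passed instead.

```python
import numpy as np

def class_balance(y):
    """Return {label: fraction of samples} for an integer label array."""
    counts = np.bincount(np.asarray(y).ravel())
    return {label: count / counts.sum() for label, count in enumerate(counts)}

# Stand-in labels (assumption: 0 = malignant, 1 = benign, as in scikit-learn).
y_demo = np.array([0] * 212 + [1] * 357)
balance = class_balance(y_demo)
print({label: round(float(frac), 3) for label, frac in balance.items()})
```

With a split this uneven, a classifier that always predicts "benign" would already score about 63%, which is a useful floor when reading the accuracy figures.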

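The results are consistent with the importance of proper weight initialization in neural networks, although this notebook reports results only for He initialization; the 0.01-scaled alternative is noted in the code as not working well with ReLU. He initialization draws weights with standard deviation sqrt(2 / fan_in), which compensates for ReLU zeroing out roughly half of the pre-activations. This supports:

*   Effective gradient flow
*   Rapid convergence (about 97% training accuracy by epoch 15)
*   Strong generalization (98.25% test accuracy)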
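
The effect of the initialization scale on ReLU activations can be illustrated on a toy stack of layers. The sketch below is not the notebook's network: the depth, width and the helper name `forward_signal_rms` are assumptions chosen to make the effect visible. It pushes random inputs through ten ReLU layers and reports the root-mean-square activation at the end.

```python
import numpy as np

def forward_signal_rms(weight_std, n_layers=10, width=256, n_samples=512, seed=0):
    """RMS activation after n_layers ReLU layers with the given weight std."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_samples, width))
    for _ in range(n_layers):
        W = rng.standard_normal((width, width)) * weight_std(width)
        a = np.maximum(0, a @ W)
    return float(np.sqrt(np.mean(a ** 2)))

he_rms = forward_signal_rms(lambda fan_in: np.sqrt(2 / fan_in))  # He scaling
small_rms = forward_signal_rms(lambda fan_in: 0.01)              # fixed 0.01 scaling

print(f"He init RMS after 10 layers:   {he_rms:.3e}")
print(f"0.01 init RMS after 10 layers: {small_rms:.3e}")
```

With He scaling the signal keeps roughly its original magnitude, while the fixed 0.01 scale shrinks it at every layer. The same shrinkage applies to gradients flowing backward.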








ANN regression

In [80]:
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score

class RegressionNN:
    def __init__(self, input_size, hidden_size, output_size):
        # He initialization for ReLU
        self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2/input_size)
        self.b1 = np.zeros((1, hidden_size))

        # Xavier initialization for linear output
        self.W2 = np.random.randn(hidden_size, output_size) * np.sqrt(1/hidden_size)
        self.b2 = np.zeros((1, output_size))

    def relu(self, x):
        return np.maximum(0, x)

    def relu_derivative(self, x):
        return (x > 0).astype(float)

    def forward(self, X):
        # Hidden layer
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.relu(self.z1)

        # Output layer (linear activation for regression)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        return self.z2

    def backward(self, X, y, y_pred, learning_rate):
        m = X.shape[0]  # Number of samples

        # Gradient of the loss w.r.t. z2. The factor of 2 from the MSE derivative
        # is omitted (absorbed into the learning rate), so this is ∂L/∂z2 for L = MSE / 2.
        dZ2 = (y_pred - y) / m

        # Output layer gradients
        dW2 = np.dot(self.a1.T, dZ2)
        db2 = np.sum(dZ2, axis=0, keepdims=True)

        # Hidden layer gradients
        dZ1 = np.dot(dZ2, self.W2.T) * self.relu_derivative(self.z1)
        dW1 = np.dot(X.T, dZ1)
        db1 = np.sum(dZ1, axis=0, keepdims=True)

        # Update parameters
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            # Forward pass
            y_pred = self.forward(X)

            # Compute loss (MSE)
            loss = mean_squared_error(y, y_pred)

            # Backward pass
            self.backward(X, y, y_pred, learning_rate)

            # Print training progress
            if epoch % 5 == 0:
                r2 = r2_score(y, y_pred)
                print(f"Epoch {epoch}: MSE = {loss:.4f}, R2 = {r2:.4f}")

# Load and prepare California housing dataset
def load_data():
    data = fetch_california_housing()
    X = data.data
    y = data.target.reshape(-1, 1)

    # Split and scale data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    return X_train, X_test, y_train, y_test

# Main execution
if __name__ == "__main__":
    # Load data
    X_train, X_test, y_train, y_test = load_data()

    # Network parameters
    input_size = X_train.shape[1]
    hidden_size = 10
    output_size = 1

    # Create and train network
    nn = RegressionNN(input_size, hidden_size, output_size)
    nn.train(X_train, y_train, epochs=20, learning_rate=0.5)

    # Evaluate on test set
    test_pred = nn.forward(X_test)
    test_mse = mean_squared_error(y_test, test_pred)
    test_r2 = r2_score(y_test, test_pred)

    print(f"\nFinal Test Results: MSE = {test_mse:.4f}, R2 = {test_r2:.4f}")

Epoch 0: MSE = 3.1099, R2 = -1.3264
Epoch 5: MSE = 1.4028, R2 = -0.0494
Epoch 10: MSE = 0.6885, R2 = 0.4849
Epoch 15: MSE = 0.6056, R2 = 0.5470

Final Test Results: MSE = 0.5846, R2 = 0.5539
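
The negative R2 values in the first epochs mean the untrained network predicts worse than a constant guess of the target mean. A minimal sketch of the R2 definition makes this concrete; the function `r2` and the toy targets are illustrative, and the notebook itself uses `sklearn.metrics.r2_score`.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
r2_mean = r2(y_true, np.full_like(y_true, y_true.mean()))  # always predict the mean
r2_zero = r2(y_true, np.zeros_like(y_true))                 # worse than the mean
r2_perfect = r2(y_true, y_true)                             # exact predictions
print(r2_mean, r2_zero, r2_perfect)
```

Predicting the mean gives exactly 0, anything worse goes negative, and a perfect fit gives 1.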


Reasons for choosing the dataset:

*   Classic 8-feature housing price prediction problem
*   20,640 samples, large enough for regression
*   Mixed-scale features (tests the scaling implementation)
*   Continuous output tests MSE/L2 loss handling
*   Real-world economic significance
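
Because the output layer is linear, the prediction equals `z2`, and the MSE gradient used in `backward` can be written out explicitly. The derivation below uses `m` for the number of samples, as in the code, and shows where the omitted factor of 2 goes.

```latex
L_{\mathrm{MSE}} = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2,
\qquad
\frac{\partial L_{\mathrm{MSE}}}{\partial \hat{y}_i} = \frac{2}{m}\left(\hat{y}_i - y_i\right)

\text{code: } \mathrm{dZ2}_i = \frac{1}{m}\left(\hat{y}_i - y_i\right)
= \frac{\partial}{\partial \hat{y}_i}\left(\tfrac{1}{2} L_{\mathrm{MSE}}\right)
```

The code therefore descends on half the MSE, so a learning rate of 0.5 in the code acts like a step size of 0.25 on the true MSE gradient.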

I chose a fairly high learning rate (0.5) because the network has only one hidden layer, and used He initialization for the hidden-layer weights because that layer uses the ReLU activation function.
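
How much the learning rate matters over only 20 full-batch epochs can be seen on a toy problem. The sketch below is a plain linear model on synthetic standardized features, not the notebook's network or dataset. The helper name `final_mse`, the data and the learning rates tried are all assumptions.

```python
import numpy as np

def final_mse(learning_rate, epochs=20, seed=0):
    """Full-batch gradient descent on a synthetic linear regression problem."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((500, 8))  # features already zero-mean, unit-variance
    true_w = rng.standard_normal((8, 1))
    y = X @ true_w
    w = np.zeros((8, 1))
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(X)  # same 1/m convention as the notebook
        w -= learning_rate * grad
    return float(np.mean((X @ w - y) ** 2))

results = {lr: final_mse(lr) for lr in (0.01, 0.1, 0.5)}
for lr, mse in results.items():
    print(f"lr={lr}: MSE after 20 epochs = {mse:.6f}")
```

On standardized inputs the larger step converges much further within the same 20 epochs. A much larger step would eventually diverge, so this toy result does not generalize to arbitrary settings.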

Comparison of ANN vs. KNN output

For regression:

*   KNN's local averaging works well for this dataset's patterns.
*   The ANN is too simple (underfitting) or needs tuning: it reaches a test R2 of only 0.5539.
*   With 20,640 samples the dataset is not small, so the more likely limits are the single hidden layer (10 units) and the short 20-epoch run, during which the loss was still falling. These let KNN outperform the ANN.

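For classification:

KNN:

*   Accuracy: 1.00 (perfect classification), apparently reported on the Iris dataset
*   No training needed

ANN:

*   Test accuracy: 98.25% on the breast cancer dataset
*   Trained for 20 epochs (loss fell from 0.66 at epoch 0 to 0.09 at epoch 15, the last logged epoch)

Key insight

KNN scores higher on Iris, helped by its simplicity and by how cleanly the Iris classes separate. The ANN achieves near-perfect results but requires training and scores slightly lower. Because the two figures come from different datasets (Iris for KNN, breast cancer for the ANN), they are not a like-for-like comparison.

Conclusion: For small, simple datasets like Iris, KNN is more efficient. For more complex data, an ANN can be expected to generalize better, though that is not tested here.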
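
A like-for-like comparison would evaluate KNN on the same breast cancer split and scaling that the ANN used. A minimal sketch follows, assuming k = 5, since the value used in the original KNN experiment is not given here. It reuses the `test_size=0.2, random_state=42` split from `load_data`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, as in the notebook.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5 is an assumed setting
knn.fit(X_train, y_train)
knn_accuracy = accuracy_score(y_test, knn.predict(X_test))
print(f"KNN test accuracy: {knn_accuracy:.4f}")
```

The printed accuracy can then be set directly against the ANN's 98.25% on the same test samples.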