## Aim

## Theory

# pip install tensorflow

## Code


import numpy as np
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load and preprocess CIFAR-10 dataset
def load_cifar10():
    (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    # Normalize pixel values to [0, 1]
    X_train = X_train.astype('float32') / 255.0
    X_test = X_test.astype('float32') / 255.0
    # Flatten images (32x32x3 = 3072)
    X_train = X_train.reshape(X_train.shape[0], -1)
    X_test = X_test.reshape(X_test.shape[0], -1)
    # One-hot encode labels
    y_train = to_categorical(y_train, 10)
    y_test = to_categorical(y_test, 10)
    return X_train, y_train, X_test, y_test

# Activation functions
def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return np.where(x > 0, 1, 0)

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))  # Subtract max for numerical stability
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# Neural Network class
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights and biases
        self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2.0 / input_size)  # He initialization
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * np.sqrt(2.0 / hidden_size)
        self.b2 = np.zeros((1, output_size))
        
    def forward(self, X):
        # Forward propagation
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = relu(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = softmax(self.z2)
        return self.a2
    
    def compute_loss(self, y, output):
        # Cross-entropy loss
        m = y.shape[0]
        log_likelihood = -np.log(output[range(m), np.argmax(y, axis=1)] + 1e-10)
        loss = np.sum(log_likelihood) / m
        return loss
    
    def backward(self, X, y, output, learning_rate):
        # Backward propagation
        m = X.shape[0]
        
        # Output layer gradients
        dz2 = output - y  # Gradient of loss w.r.t. z2
        dW2 = np.dot(self.a1.T, dz2) / m
        db2 = np.sum(dz2, axis=0, keepdims=True) / m
        
        # Hidden layer gradients
        da1 = np.dot(dz2, self.W2.T)
        dz1 = da1 * relu_derivative(self.z1)
        dW1 = np.dot(X.T, dz1) / m
        db1 = np.sum(dz1, axis=0, keepdims=True) / m
        
        # Update weights and biases
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
    
    def train(self, X, y, X_test, y_test, epochs, batch_size, learning_rate):
        m = X.shape[0]
        for epoch in range(epochs):
            # Shuffle training data
            indices = np.random.permutation(m)
            X_shuffled = X[indices]
            y_shuffled = y[indices]
            
            # Mini-batch gradient descent
            for i in range(0, m, batch_size):
                X_batch = X_shuffled[i:i+batch_size]
                y_batch = y_shuffled[i:i+batch_size]
                
                # Forward pass
                output = self.forward(X_batch)
                
                # Backward pass
                self.backward(X_batch, y_batch, output, learning_rate)
            
            # Compute training loss and accuracy
            train_output = self.forward(X)
            train_loss = self.compute_loss(y, train_output)
            train_predictions = np.argmax(train_output, axis=1)
            train_labels = np.argmax(y, axis=1)
            train_accuracy = np.mean(train_predictions == train_labels)
            
            # Compute test accuracy
            test_output = self.forward(X_test)
            test_predictions = np.argmax(test_output, axis=1)
            test_labels = np.argmax(y_test, axis=1)
            test_accuracy = np.mean(test_predictions == test_labels)
            
            print(f'Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.4f}, '
                  f'Train Accuracy: {train_accuracy:.4f}, Test Accuracy: {test_accuracy:.4f}')

# Main execution
if __name__ == "__main__":
    # Load data
    X_train, y_train, X_test, y_test = load_cifar10()
    
    # Network parameters
    input_size = 3072  # 32x32x3
    hidden_size = 128  # Number of hidden units
    output_size = 10   # Number of classes
    epochs = 50
    batch_size = 128
    learning_rate = 0.001
    
    # Initialize and train network
    np.random.seed(42)  # For reproducibility
    nn = NeuralNetwork(input_size, hidden_size, output_size)
    nn.train(X_train, y_train, X_test, y_test, epochs, batch_size, learning_rate)



Epoch 1/50, Train Loss: 2.1114, Train Accuracy: 0.2405, Test Accuracy: 0.2315
Epoch 2/50, Train Loss: 2.0272, Train Accuracy: 0.2833, Test Accuracy: 0.2786
Epoch 3/50, Train Loss: 1.9767, Train Accuracy: 0.3045, Test Accuracy: 0.2984
Epoch 4/50, Train Loss: 1.9436, Train Accuracy: 0.3202, Test Accuracy: 0.3161
Epoch 5/50, Train Loss: 1.9190, Train Accuracy: 0.3333, Test Accuracy: 0.3280
Epoch 6/50, Train Loss: 1.8995, Train Accuracy: 0.3426, Test Accuracy: 0.3371
Epoch 7/50, Train Loss: 1.8837, Train Accuracy: 0.3447, Test Accuracy: 0.3401
Epoch 8/50, Train Loss: 1.8704, Train Accuracy: 0.3506, Test Accuracy: 0.3464
Epoch 9/50, Train Loss: 1.8599, Train Accuracy: 0.3544, Test Accuracy: 0.3514
Epoch 10/50, Train Loss: 1.8473, Train Accuracy: 0.3603, Test Accuracy: 0.3520
Epoch 11/50, Train Loss: 1.8391, Train Accuracy: 0.3574, Test Accuracy: 0.3538
Epoch 12/50, Train Loss: 1.8310, Train Accuracy: 0.3686, Test Accuracy: 0.3643
Epoch 13/50, Train Loss: 1.8212, Train Accuracy: 0.3701, Test


### Explanation

1. **Data Preprocessing** (`load_cifar10`):
   - **Loading**: Uses Keras to load CIFAR-10 (50,000 training images, 10,000 test images).
   - **Normalization**: Scales pixel values from [0, 255] to [0, 1].
   - **Flattening**: Reshapes each 32x32x3 image to a 3072-dimensional vector.
   - **One-hot Encoding**: Converts labels to one-hot vectors (e.g., class 3 → [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]).

2. **Activation Functions**:
   - **ReLU**: `max(0, x)` for the hidden layer, with derivative 1 for x > 0, else 0.
   - **Softmax**: Converts output logits to probabilities, ensuring they sum to 1. Subtracts the maximum value for numerical stability.

3. **Neural Network Architecture** (`NeuralNetwork` class):
   - **Initialization**:
     - Input layer: 3072 units (flattened image).
     - Hidden layer: 128 units with ReLU activation.
     - Output layer: 10 units with softmax activation.
     - Weights are initialized using He initialization, i.e. scaled by `sqrt(2 / fan_in)` (`sqrt(2/input_size)` for `W1`, `sqrt(2/hidden_size)` for `W2`), for better training stability with ReLU.
     - Biases are initialized to zeros.
   - **Forward Propagation**:
     - Computes `z1 = X @ W1 + b1` (matrix product), then `a1 = ReLU(z1)`.
     - Computes `z2 = a1 @ W2 + b2`, then `a2 = softmax(z2)`.
   - **Loss Computation**:
     - Uses cross-entropy loss: `-sum(y * log(output))`, averaged over the batch.
     - Adds a small constant (`1e-10`) to avoid log(0).
   - **Backward Propagation**:
     - Output layer: Computes gradient `dz2 = output - y` (softmax + cross-entropy derivative).
     - Hidden layer: Backpropagates error using `dz1 = (dz2 @ W2.T) * ReLU'(z1)`, where `@` is a matrix product and `*` is element-wise.
     - Computes gradients for weights (`dW1`, `dW2`) and biases (`db1`, `db2`).
     - Updates parameters using gradient descent: `W = W - learning_rate * dW`.
   - **Training**:
     - Implements mini-batch gradient descent with batch size 128.
     - Shuffles training data each epoch to improve generalization.
     - Runs for 50 epochs with a learning rate of 0.001.
     - Reports training loss, training accuracy, and test accuracy per epoch.

4. **Training and Evaluation**:
   - Trains on the full training set, evaluates on both training and test sets.
   - Accuracy is computed by comparing predicted class (argmax of output) to true class.
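
One way to sanity-check the backpropagation formulas described above is to compare them against finite differences. The sketch below is not part of the original notebook: it rebuilds the same forward pass and gradient expressions on a tiny random problem and compares the analytic weight gradients with central-difference estimates. The helper names (`forward_loss`, `analytic_grads`, `numeric_grad`), the layer sizes, and the weight scale are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

def forward_loss(params, X, y):
    """Mean cross-entropy of the two-layer ReLU/softmax network (hypothetical helper)."""
    W1, b1, W2, b2 = params
    a1 = np.maximum(0, X @ W1 + b1)
    probs = softmax(a1 @ W2 + b2)
    return -np.mean(np.sum(y * np.log(probs + 1e-10), axis=1))

def analytic_grads(params, X, y):
    """Weight gradients using the same expressions as the backward pass."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    z1 = X @ W1 + b1
    a1 = np.maximum(0, z1)
    probs = softmax(a1 @ W2 + b2)
    dz2 = probs - y                      # softmax + cross-entropy derivative
    dW2 = a1.T @ dz2 / m
    dz1 = (dz2 @ W2.T) * (z1 > 0)        # element-wise ReLU derivative
    dW1 = X.T @ dz1 / m
    return dW1, dW2

def numeric_grad(params, idx, X, y, eps=1e-6):
    """Central-difference gradient of the loss w.r.t. params[idx]."""
    W = params[idx]
    grad = np.zeros_like(W)
    for i in np.ndindex(W.shape):
        old = W[i]
        W[i] = old + eps
        loss_plus = forward_loss(params, X, y)
        W[i] = old - eps
        loss_minus = forward_loss(params, X, y)
        W[i] = old                       # restore the original value
        grad[i] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Tiny illustrative problem: 5 samples, 4 inputs, 6 hidden units, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
y = np.eye(3)[rng.integers(0, 3, size=5)]
params = [0.3 * rng.normal(size=(4, 6)), np.zeros((1, 6)),
          0.3 * rng.normal(size=(6, 3)), np.zeros((1, 3))]

dW1, dW2 = analytic_grads(params, X, y)
max_err_W1 = np.max(np.abs(numeric_grad(params, 0, X, y) - dW1))
max_err_W2 = np.max(np.abs(numeric_grad(params, 2, X, y) - dW2))
print(max_err_W1, max_err_W2)
```

If both printed differences are many orders of magnitude smaller than the gradients themselves, the analytic expressions agree with the numerical estimate on this toy problem.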

### Notes
- **Performance**: This simple ANN typically achieves 40-50% test accuracy on CIFAR-10 due to its basic architecture; the truncated log above shows about 36% test accuracy after 12 of 50 epochs. For better performance, consider:
  - Adding more hidden layers or units.
  - Using convolutional neural networks (CNNs), which are more suited for image data.
  - Implementing regularization (e.g., dropout, L2).
  - Using advanced optimizers (e.g., Adam).
- **Hyperparameters**:
  - Hidden size (128), batch size (128), learning rate (0.001), and epochs (50) are chosen for simplicity. Tune these for better results.
- **Dependencies**: Requires NumPy and TensorFlow (for CIFAR-10 loading only).
- **Runtime**: Training may take a few minutes on a CPU due to the dataset size and matrix operations.
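
As an illustration of two of the suggestions above, the following sketch shows how an L2 penalty and an Adam update could replace the plain gradient-descent step for a single weight matrix. It is not part of the original notebook: the function name `adam_step`, the `state` dictionary, the `weight_decay` argument, and the default hyperparameters (the commonly used `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are assumptions made for illustration.

```python
import numpy as np

def adam_step(W, grad, state, lr=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    """Return updated weights after one Adam step (hypothetical helper).

    state holds the running moments 'm', 'v' and the step count 't'.
    weight_decay adds the gradient of an L2 penalty (weight_decay / 2) * ||W||^2.
    """
    grad = grad + weight_decay * W
    state['t'] += 1
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad ** 2
    # Bias-corrected moment estimates
    m_hat = state['m'] / (1 - beta1 ** state['t'])
    v_hat = state['v'] / (1 - beta2 ** state['t'])
    return W - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy check: minimise f(W) = 0.5 * ||W||^2, whose gradient is W itself.
W = np.array([[1.0, -2.0], [3.0, 0.5]])
state = {'m': np.zeros_like(W), 'v': np.zeros_like(W), 't': 0}
start_norm = np.linalg.norm(W)
for _ in range(500):
    W = adam_step(W, grad=W, state=state, lr=0.01)
end_norm = np.linalg.norm(W)
print(start_norm, end_norm)
```

In the network above, one such state dictionary would be needed per parameter (`W1`, `b1`, `W2`, `b2`), and the call would take the place of the `W -= learning_rate * dW` update.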

This implementation is a minimal, from-scratch ANN to demonstrate core concepts. For production use, frameworks like TensorFlow or PyTorch are recommended for efficiency and scalability.

## Results

## Conclusion

## Further experiments
- Test on different datasets.
- Test with different activation functions.
- Test on different GPUs and measure training time.
- Use different regularization methods.
- Use advanced optimizers.
- Try different network depths.
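
For the activation-function experiment, a minimal sketch of two possible drop-in alternatives to `relu` / `relu_derivative` is given below. The choice of leaky ReLU and tanh, the `alpha` default, and the `ACTIVATIONS` lookup table are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Passes positive values through, scales negative values by alpha
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)

def tanh(x):
    return np.tanh(x)

def tanh_derivative(x):
    return 1.0 - np.tanh(x) ** 2

# Hypothetical lookup table: name -> (activation, derivative)
ACTIVATIONS = {
    'leaky_relu': (leaky_relu, leaky_relu_derivative),
    'tanh': (tanh, tanh_derivative),
}

x = np.array([-2.0, 0.0, 3.0])
for name, (f, df) in ACTIVATIONS.items():
    print(name, f(x), df(x))
```

Each pair takes the pre-activation `z1` as input, matching how `relu_derivative(self.z1)` is used in the backward pass. With tanh, a Xavier/Glorot-style initialization may suit better than He initialization.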