# Class structure:

src
├── model
│   ├── Loss functions:
│   │   ├── **Mean squared error (MSE)**: Used for regression tasks to minimize the squared difference between predicted and actual values.
│   │   └── **Cross entropy**: Used for classification tasks to measure the difference between predicted class probabilities and the true class labels.
│   ├── NN (Neural Network):
│   │   ├── **Layers**: 
│   │   │   ├── Input layer: Takes the shape of the input data.
│   │   │   ├── Hidden layers: Configurable number of layers and neurons per layer (e.g., `self.hidden_layers`, `self.hidden_sizes`).
│   │   │   └── Output layer: Configurable based on the task (regression or classification).
│   │   ├── **Activation functions**: 
│   │   │   ├── ReLU, Sigmoid, Tanh: Common hidden layer activations.
│   │   │   └── Optional activation for output layer (e.g., softmax for classification).
│   │   ├── **Weight and bias storage**: Weights and biases are stored in `self.params` as a dictionary (or other structure).
│   │   └── **Forward propagation**: Implements forward pass through the layers, applying activations.
│   └── Regression:
│       ├── **Design matrix generation**: Creates design matrix `X` based on data shape and polynomial fit degree for tasks like linear or polynomial regression.
│       └── **Parameter storage**: Stores regression coefficients in `self.params` (e.g., after using OLS or Ridge).
├── optimizer
│   ├── **Base optimizer**:
│   │   ├── Common attributes (e.g., learning rate).
│   │   └── **Step function**: Interface that each optimizer must implement to update weights.
│   └── **Subclasses**:
│       ├── **Gradient Descent (GD)**: Basic gradient descent algorithm.
│       ├── **Stochastic Gradient Descent (SGD)**: Uses mini-batches for updating weights.
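│       ├── **AdaGrad (with GD and SGD)**: Scales each parameter's learning rate by its accumulated squared gradients; works with full-batch or mini-batch gradients.
│       ├── **RMSprop**: Scales the learning rate by an exponentially decaying average of squared gradients; typically used with mini-batches.
│       ├── **Adam**: Combines momentum (first-moment estimate) with RMSprop-style second-moment scaling and bias correction.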
│       └── Each subclass stores its own state (e.g., momentum terms and accumulated squared gradients).
├── train
│   └── **Train function**:
│       ├── **Training loop**: 
│       │   ├── Loops through the dataset over a number of epochs.
│       │   ├── At each epoch, it calls `optimizer.step()` to update the weights.
│       │   └── Evaluates the model on validation data during training.
│       ├── **Loss tracking**: Stores loss over epochs to monitor convergence.
│       └── **Early stopping**: Optional feature to stop training when performance plateaus.
└── utils
    ├── Evaluation functions:
    │   ├── **Mean Squared Error (MSE)**: Measures the average squared difference between predicted and actual values (for regression).
    │   └── **R2 score**: Measures the proportion of variance explained by the model (for regression).
    └── Plotting:
        ├── **Accuracy plots**: Plots accuracy or performance as a function of hyperparameters (learning rate, batch size, etc.).
        ├── **Loss over epochs**: Plots loss during training to visualize convergence.
        └── **Classification coefficients**: Visualizes the learned coefficients of linear models.
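
The `Regression` entry in the tree above mentions building a design matrix `X` from a polynomial degree and storing the fitted coefficients. A minimal sketch of that idea for 1-D input is shown here. The function names `design_matrix` and `fit_ridge` are hypothetical and not taken from the project, and this version of Ridge penalizes the intercept column as well, which a real implementation might prefer to avoid.

```python
import numpy as np


def design_matrix(x, degree):
    """Return the polynomial design matrix [1, x, x^2, ..., x^degree] for 1-D input."""
    x = np.asarray(x, dtype=float).ravel()
    # increasing=True puts the constant column first.
    return np.vander(x, N=degree + 1, increasing=True)


def fit_ridge(X, y, lam=0.0):
    """Solve (X^T X + lam * I) beta = X^T y; lam = 0 reduces to OLS."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Usage: fit a noise-free quadratic and recover its coefficients.
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2
X = design_matrix(x, degree=2)
params = fit_ridge(X, y)  # OLS, since lam defaults to 0
```

With `lam=0` the solve is ordinary least squares via the normal equations; for ill-conditioned design matrices (high polynomial degree), `np.linalg.lstsq` or a pseudo-inverse is usually the safer choice.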

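The `optimizer` entry in the tree above describes a base class with common attributes and a `step` interface that each subclass implements while keeping its own state. One way this could look is sketched below. The class names, the `step(params, grads)` signature, and the dictionary layout of `params` are assumptions made for illustration, not the project's actual API.

```python
import numpy as np


class Optimizer:
    """Hypothetical base class: holds the learning rate and defines the step interface."""

    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def step(self, params, grads):
        raise NotImplementedError


class GradientDescent(Optimizer):
    """Plain gradient descent: no internal state beyond the learning rate."""

    def step(self, params, grads):
        return {k: params[k] - self.learning_rate * grads[k] for k in params}


class Adam(Optimizer):
    """Adam: keeps first- and second-moment estimates per parameter as state."""

    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        super().__init__(learning_rate)
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m, self.v, self.t = {}, {}, 0

    def step(self, params, grads):
        self.t += 1
        new_params = {}
        for k in params:
            g = grads[k]
            self.m[k] = self.beta1 * self.m.get(k, 0.0) + (1 - self.beta1) * g
            self.v[k] = self.beta2 * self.v.get(k, 0.0) + (1 - self.beta2) * g**2
            # Bias correction compensates for the zero initialization of m and v.
            m_hat = self.m[k] / (1 - self.beta1**self.t)
            v_hat = self.v[k] / (1 - self.beta2**self.t)
            new_params[k] = params[k] - self.learning_rate * m_hat / (np.sqrt(v_hat) + self.eps)
        return new_params


# Usage: minimize f(w) = ||w||^2, whose gradient is 2w.
params = {"w": np.array([1.0, -2.0])}
opt = GradientDescent(learning_rate=0.1)
for _ in range(100):
    params = opt.step(params, {"w": 2.0 * params["w"]})
```

Returning a new dictionary from `step` rather than mutating the input keeps earlier parameter snapshots intact, which is convenient if the training loop wants to remember the best parameters seen so far.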

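The `train` entry in the tree above describes an epoch loop that calls `optimizer.step()`, tracks the loss, evaluates on validation data, and optionally stops early. A minimal sketch of such a loop for a linear model with MSE loss follows. The names `PlainGD`, `mse_and_grad`, `train`, `patience`, and `min_delta` are hypothetical stand-ins chosen for this example, and the loop uses full-batch updates for brevity.

```python
import numpy as np


class PlainGD:
    """Stand-in optimizer exposing step(params, grads) (assumed interface)."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate

    def step(self, params, grads):
        return {k: params[k] - self.learning_rate * grads[k] for k in params}


def mse_and_grad(params, X, y):
    """Return the MSE of the linear model X @ beta and its gradient w.r.t. beta."""
    residual = X @ params["beta"] - y
    loss = float(np.mean(residual**2))
    grads = {"beta": 2.0 * X.T @ residual / len(y)}
    return loss, grads


def train(params, optimizer, X_train, y_train, X_val, y_val,
          epochs=1000, patience=10, min_delta=1e-8):
    """Full-batch training loop with loss tracking and early stopping."""
    history = {"train": [], "val": []}
    best_val, best_params, stale = np.inf, params, 0
    for _ in range(epochs):
        train_loss, grads = mse_and_grad(params, X_train, y_train)
        params = optimizer.step(params, grads)
        val_loss, _ = mse_and_grad(params, X_val, y_val)
        history["train"].append(train_loss)
        history["val"].append(val_loss)
        if val_loss < best_val - min_delta:
            # Validation loss improved: remember these parameters.
            best_val, best_params, stale = val_loss, params, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_params, history


# Usage: fit y = 1 + 2x on synthetic, noise-free data.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(-1.0, 1.0, 100)])
y = X @ np.array([1.0, 2.0])
params, history = train({"beta": np.zeros(2)}, PlainGD(0.1),
                        X[:80], y[:80], X[80:], y[80:])
```

The returned `history` dictionary is the kind of record the loss-over-epochs plot would consume, and returning the best parameters rather than the last ones is one common way to pair early stopping with a validation set.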