# Introduction

The `mingrad` package is a minimal implementation of automatic differentiation and computational graphs in Python, inspired by Andrej Karpathy's `micrograd` library. It serves as an educational resource to understand the fundamental concepts behind modern deep learning frameworks, such as PyTorch and TensorFlow.

Karpathy's `micrograd` is a minimalist autograd engine written in Python, designed to illustrate the core principles of backpropagation and automatic differentiation. The `mingrad` package follows a similar philosophy, providing a lightweight and easy-to-understand codebase that demystifies the inner workings of these essential concepts.

# What is Automatic Differentiation?

**Automatic differentiation** (AD) is a technique for evaluating the derivatives of functions expressed as programs, efficiently and accurately. Symbolic differentiation can produce unwieldy expressions, and numerical differentiation is sensitive to the choice of step size and to round-off error. AD instead applies the chain rule to the basic arithmetic operations and elementary functions that a program executes, so the resulting derivatives are exact up to floating-point precision. This makes it a practical tool in fields such as machine learning, optimization, and scientific computing.
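
To make the contrast with numerical differentiation concrete, the sketch below compares a forward finite-difference estimate with the derivative obtained by applying the chain rule by hand. The function `f`, the point `x0`, and the step sizes are illustrative choices and are not part of `mingrad`.

```python
import math

def f(x):
    # Example function: f(x) = sin(x^2)
    return math.sin(x * x)

def df_exact(x):
    # Chain rule by hand: d/dx sin(x^2) = cos(x^2) * 2x
    return math.cos(x * x) * 2.0 * x

def df_numeric(x, h):
    # Forward finite difference; its accuracy depends on the step size h
    return (f(x + h) - f(x)) / h

x0 = 1.5
exact = df_exact(x0)
for h in (1e-2, 1e-5, 1e-8, 1e-12):
    approx = df_numeric(x0, h)
    print(f"h={h:.0e}  estimate={approx:.10f}  abs error={abs(approx - exact):.2e}")
```

A large step introduces truncation error, while a very small step amplifies floating-point round-off. AD avoids this trade-off because it never approximates a derivative with a difference quotient.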

AD comes in two primary modes, **forward mode** and **reverse mode**, both based on the chain rule.
1. In **forward mode**, derivatives propagate from the inputs to the outputs alongside the values: each intermediate variable carries its derivative with respect to one chosen input. This mode is efficient for functions with a small number of inputs and a larger number of outputs. For a function $f(x_1, x_2, \ldots, x_n)$, one forward pass yields the derivative with respect to a single $x_i$, so computing the full gradient takes $n$ passes.

2. In **reverse mode**, the function is first evaluated normally, and gradients are then propagated from the output back through each intermediate step to the inputs: each intermediate variable receives the derivative of the output with respect to it. This mode is efficient for functions with a large number of inputs and a single output. For a function $f(x_1, x_2, \ldots, x_n)$, a single backward pass yields the derivatives with respect to all $n$ inputs.

In short, forward mode suits functions with few inputs, while reverse mode suits functions with many inputs and a single output, such as the scalar loss minimized when training a neural network.
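
Forward mode can be sketched with dual numbers, where every value carries its derivative with respect to one chosen input. The `Dual` class, the `sin` helper, and the function `f` below are hypothetical names used only for illustration; `mingrad` itself is demonstrated later through its `backward()` method.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(float(other))
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(float(other))
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def sin(d):
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(d.val), math.cos(d.val) * d.dot)

def f(x1, x2):
    return x1 * x2 + sin(x1)

# One forward pass per input: seed that input's derivative with 1.0
df_dx1 = f(Dual(2.0, 1.0), Dual(3.0, 0.0)).dot  # equals x2 + cos(x1)
df_dx2 = f(Dual(2.0, 0.0), Dual(3.0, 1.0)).dot  # equals x1
print(df_dx1, df_dx2)
```

Two inputs require two passes here, which is why the cost of forward mode grows with the number of inputs.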

# Important Terms

- **Computational Graphs**: A computational graph represents a mathematical expression as a set of interconnected nodes. Each node represents a specific operation, and the edges between nodes represent the flow of data. This structure allows for efficient computation and, more importantly, enables automatic differentiation.

- **Gradients**: In the context of neural networks, the gradient is the collection of partial derivatives of the loss function with respect to the network's weights and biases. It points in the direction in which the loss increases fastest, and its magnitude gives the rate of change, which makes it the key quantity for updating the network's parameters during training.

- **Forward and Backward Propagation**: During the training process, inputs are passed through the neural network in the forward propagation phase to obtain predictions or outputs. These outputs are then compared to the true labels, and the errors are propagated back through the network in the backward propagation phase, computing gradients of the loss function with respect to the network's weights using the chain rule of calculus.

- **Gradient Descent**: The gradients computed during backward propagation are used by the gradient descent optimization algorithm to iteratively adjust the network's weights in the direction opposite to the gradient, which reduces the loss function. This process continues until the loss reaches an acceptable level or stops improving.
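
The four terms above fit together in a few dozen lines. The following is a minimal sketch of a reverse mode scalar engine, written for illustration only: the `Node` class and its attributes are hypothetical and are not `mingrad`'s actual `Variable` implementation.

```python
class Node:
    """A scalar in a computational graph (illustrative, not mingrad's Variable)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self.parents = parents         # edges to the nodes this one was computed from
        self._backward = lambda: None  # local chain-rule step, set by each operation

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        out = Node(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        out = Node(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from output to inputs
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# Gradient descent on loss(w) = (w * x - target)^2 with fixed x and target
w = Node(0.0)
x, target = 2.0, 6.0
learning_rate = 0.05
for step in range(50):
    err = w * x + (-target)  # forward propagation builds the graph
    loss = err * err
    w.grad = 0.0             # reset the gradient before each backward pass
    loss.backward()          # backward propagation fills w.grad
    w.data -= learning_rate * w.grad
print(w.data)  # close to 3.0, the minimizer of the loss
```

Gradients are accumulated with `+=` so that a node used more than once (such as `err` in `err * err`) receives the sum of all its contributions.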

# Installation

To install the `mingrad` package, you can follow one of the two methods below:

### Installation via pip

If the `mingrad` package is available on the Python Package Index (PyPI), you can install it using the pip package installer. Open your terminal or command prompt and run the following command:

```bash
pip install mingrad
```

This will download and install the latest stable version of the `mingrad` package and its dependencies.

### Installation from Source

Alternatively, if you want to install the package from source, you can clone the `mingrad` repository from GitHub. Follow these steps:

```bash
# Clone the repository
git clone https://github.com/yourusername/mingrad.git

# Navigate to the project directory
cd mingrad

# Install the package in editable mode
pip install -e .
```

The `-e` flag installs the package in editable mode, allowing you to make changes to the source code and have them immediately reflected in your Python environment.

# Importing

After following either of these installation methods, you should be able to import and use the `mingrad` package in your Python scripts or interactive sessions.

```python
import mingrad

# Use the mingrad package
...
```

Make sure to check the repository's documentation for any additional installation requirements or instructions specific to your use case.
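
If you are unsure whether the installation succeeded, a quick check along the following lines may help. It relies only on the standard library; the helper name `is_installed` is a hypothetical name chosen for this sketch.

```python
import importlib.util

def is_installed(package_name):
    """Return True if the named top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None

if is_installed("mingrad"):
    print("mingrad is importable")
else:
    print("mingrad was not found; re-run one of the installation steps above")
```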

# Basic Usage

```python
# Step 1: Import the necessary classes
from mingrad.engine import Variable

# Step 2: Create a simple computational graph
x = Variable(2.0)  # create a Variable with value 2.0
y = Variable(3.0)  # create a Variable with value 3.0

# Perform operations to create a computational graph
z = x**2 + y  # z = x^2 + y
```

```python
# Step 3: Forward propagation
print("Forward Propagation:")
print(f"x: {x}")
print(f"y: {y}")
print(f"z: {z}")

# Step 4: Backward propagation
z.backward()

print("\nBackward Propagation:")
print(f"x.grad: {x.grad}")  # gradient of z with respect to x (2 * x = 4.0)
print(f"y.grad: {y.grad}")  # gradient of z with respect to y (1.0)
```
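
For reference, the expected gradients can be worked out by hand, assuming `Variable` implements `**` and `+` with the usual derivative rules:

$$
z = x^2 + y, \qquad \frac{\partial z}{\partial x} = 2x, \qquad \frac{\partial z}{\partial y} = 1
$$

At $x = 2$ and $y = 3$ this gives $z = 7$, $\partial z / \partial x = 4$, and $\partial z / \partial y = 1$.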

# Neural Networks

```python
from mingrad.nn import MLP

# Create a toy dataset
X = [[2.0, 3.0], [-1.0, -2.0], [1.5, 2.0], [-1.5, -2.5]]
y = [1.0, -1.0, 1.0, -1.0]

# Initialize the network
mlp = MLP(2, [3, 1])
```

```python
# Train the network on the toy dataset
learning_rate = 0.01
for epoch in range(100):  # number of epochs
    for xi, yi in zip(X, y):
        # Forward pass
        y_pred = mlp(xi)

        # Squared error for this sample
        loss = (y_pred - yi) ** 2

        # Backward pass: reset stale gradients, then backpropagate
        mlp.zero_grad()
        loss.backward()

        # Update parameters with one gradient descent step
        for p in mlp.parameters():
            p.data -= learning_rate * p.grad

    # Note: this is the loss of the last sample in the epoch, not an average
    print(f"Epoch {epoch + 1}, Loss: {loss.data}")
```

```python
# Print the final predictions
print("\nFinal Predictions:")
for xi in X:
    y_pred = mlp(xi)
    print(f"Input: {xi}, Predicted: {y_pred.data}")
```

# Summary

This guide introduced the `mingrad` package, demonstrated its core functionality on a small computational graph, and walked through creating and training a simple neural network on a toy dataset.