
## **Implementing PyTorch's Automatic Differentiation System**


5/4/2024

In this notebook, we implement a simplified version of PyTorch's automatic differentiation system (autograd) and apply it to several basic mathematical operations.

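Autograd is an automatic differentiation system that records the operations performed on variables and, for each operation, the immediate (local) derivative of its output with respect to its input. It then combines these immediate derivatives according to the chain rule to obtain the derivative of the final output with respect to every variable it depends on.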
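In this notebook every object has a single parent, so each computation is a chain $v_0 = x$, $v_k = f_k(v_{k-1})$ for $k = 1, \dots, n$. For such a chain, the chain rule reduces to a product of immediate derivatives: the derivative of the final output $v_n$ with respect to any $v_k$ is the product of the immediate derivatives of all the functions applied after $v_k$ (the empty product for $k = n$ equals 1):

$$
\frac{d v_n}{d v_k} = \prod_{j=k+1}^{n} f_j'(v_{j-1}), \qquad k = 0, 1, \dots, n
$$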

We will use three components in our implementation:

1. **A class called `MyScalar`**. Each object of this class has the following attributes:
    - **Scalar Value**: A numeric value.
    - **Immediate Derivative**: The derivative of this object with respect to its parent, computed when the object is created (also a numeric value).
    - **Parent**: The input to the function that produced this object.

2. **A library of mathematical functions** that accept an object of the `MyScalar` class as input and return another object of the same class. The result of the function's calculation is stored in the `value` attribute, and its derivative with respect to the input is stored in the `derivative` attribute. The library includes the following functions: `e^a, ln(a), sin(a), cos(a), a^n, n * a, and a + n`, where `a` is a `MyScalar` object and `n` is a constant.

3. **A function named `get_gradient`** that returns a dictionary containing the derivative of a `MyScalar` object with respect to each of the variables it depends on.



In [1]:
import torch

First, let us define the `MyScalar` class:

In [58]:
class MyScalar:
    """
    A class to represent a scalar value with automatic differentiation capabilities.
    Each instance of this class has a unique identifier (its id()), a scalar value,
    an immediate derivative, and a reference to its parent (input) object.
    """

    def __init__(self, value, derivative=1.0, parent=None):
        """
        Initialize a MyScalar object.

        Args:
            value (float or torch.Tensor): The scalar value.
            derivative (float or torch.Tensor, optional): The immediate derivative of this
                object with respect to its parent. Defaults to 1.0.
            parent (MyScalar, optional): The MyScalar object that was the input to the function
                that produced this object. Defaults to None (a leaf variable).
        """
        # Store value and derivative as float32 tensors. This also converts integer
        # tensors such as torch.tensor([2]), so operations like ln(a) work on them.
        self.value = torch.as_tensor(value, dtype=torch.float32)
        self.derivative = torch.as_tensor(derivative, dtype=torch.float32)
        self.parent = parent

        id_parent = None if parent is None else id(parent)

        print(f"variable_id = {id(self)}, value = {self.value.item():.2f}, "
              f"immediate_derivative = {self.derivative.item():.2f} , parent_id = {id_parent}")

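Next, we implement a library of basic functions. They are plain functions rather than methods of `MyScalar`: each one takes a `MyScalar` object `a` (and, for some functions, a constant `n`) and returns a new `MyScalar` that holds the result, the immediate derivative with respect to `a`, and `a` as its parent.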

In [48]:
def add(a, n):
    """a + n; the derivative with respect to a is 1."""
    return MyScalar(value=a.value + n, derivative=1.0, parent=a)

def mult(a, n):
    """n * a; the derivative with respect to a is n."""
    return MyScalar(value=a.value * n, derivative=n, parent=a)

def power(a, n):
    """a^n; the derivative with respect to a is n * a^(n-1)."""
    return MyScalar(value=a.value ** n, derivative=n * a.value ** (n - 1), parent=a)

def cos(a):
    """cos(a); the derivative with respect to a is -sin(a)."""
    return MyScalar(value=torch.cos(a.value), derivative=-torch.sin(a.value), parent=a)

def sin(a):
    """sin(a); the derivative with respect to a is cos(a)."""
    return MyScalar(value=torch.sin(a.value), derivative=torch.cos(a.value), parent=a)

def ln(a):
    """ln(a); the derivative with respect to a is 1/a."""
    return MyScalar(value=torch.log(a.value), derivative=1.0 / a.value, parent=a)

def exp(a):
    """e^a; the derivative with respect to a is e^a."""
    return MyScalar(value=torch.exp(a.value), derivative=torch.exp(a.value), parent=a)
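As a sanity check on the derivative rules in this library, the cell below compares each hand-written derivative with the one PyTorch's autograd computes through `torch.autograd.grad`. It is a self-contained sketch: it restates the rules on plain tensors rather than calling the `MyScalar` functions, and the constant `n = 3` and the test points are arbitrary choices for illustration.

```python
import torch

# Each entry: (name, function f, hand-written derivative f', test point).
# The derivative rules mirror the library above; n = 3.0 is an arbitrary constant.
n = 3.0
rules = [
    ("a + n", lambda a: a + n,  lambda a: torch.ones_like(a),   2.0),
    ("n * a", lambda a: n * a,  lambda a: torch.full_like(a, n), 2.0),
    ("a^n",   lambda a: a ** n, lambda a: n * a ** (n - 1),      2.0),
    ("cos",   torch.cos,        lambda a: -torch.sin(a),         0.7),
    ("sin",   torch.sin,        torch.cos,                       0.7),
    ("ln",    torch.log,        lambda a: 1.0 / a,               1.5),
    ("exp",   torch.exp,        torch.exp,                       0.5),
]

results = {}
for name, f, df, x in rules:
    a = torch.tensor(x, requires_grad=True)
    (autograd_grad,) = torch.autograd.grad(f(a), a)  # derivative computed by PyTorch
    manual_grad = df(a.detach())                      # hand-written derivative rule
    results[name] = (autograd_grad.item(), manual_grad.item())
    print(f"{name:6s} autograd = {autograd_grad.item():.6f}, manual = {manual_grad.item():.6f}")
```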


Finally, we perform the gradient calculation for `MyScalar` objects using the chain rule.

This code defines:

*  **`get_gradient`**, which calculates the gradient of a `MyScalar` object with respect to every object in its chain of parents (including itself).
*  **`_get_gradient`**, a recursive helper that walks up the chain of parents, multiplies the immediate derivatives along the way, and stores each result in a dictionary.

The gradients are returned as **a dictionary** whose keys are the unique identifiers (`id()`) of the `MyScalar` objects and whose values are the corresponding gradients.

In [49]:
def get_gradient(output):
    """
    Calculate the gradient of the given MyScalar object with respect to every
    object in its chain of parents, including the object itself.

    Args:
        output (MyScalar): The output MyScalar object for which to calculate the gradient.

    Returns:
        dict: Maps the id() of each MyScalar object in the chain to d(output)/d(object).
    """

    def _get_gradient(node, gradients, gr):
        """
        Recursive helper function to calculate gradients.

        Args:
            node (MyScalar): The current MyScalar object.
            gradients (dict): A dictionary to store the gradients.
            gr (torch.Tensor): The accumulated gradient d(output)/d(node).

        Returns:
            dict: Updated dictionary containing the gradients.
        """
        # Chain rule: d(output)/d(parent) = d(output)/d(node) * d(node)/d(parent),
        # where d(node)/d(parent) is the node's immediate derivative
        if node.parent is not None:
            _get_gradient(node.parent, gradients, gr * node.derivative)

        # Store after the recursive call so that the leaf variable comes first
        gradients[id(node)] = gr
        return gradients

    # Start from the output object, whose derivative with respect to itself is 1
    return _get_gradient(output, {}, torch.tensor(1.0))
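Because every `MyScalar` has a single parent, the graph is a chain, so the recursion in `_get_gradient` can also be written as a plain loop, which avoids Python's recursion limit on very long chains. The sketch below is a hypothetical, torch-free variant for illustration: `Node`, `power_node`, `exp_node` and `get_gradient_iterative` are illustrative names, not part of this notebook.

```python
import math

class Node:
    """Hypothetical minimal stand-in for MyScalar: value, immediate derivative, parent."""
    def __init__(self, value, derivative=1.0, parent=None):
        self.value = value
        self.derivative = derivative
        self.parent = parent

def power_node(a, n):
    # a^n with immediate derivative n * a^(n-1)
    return Node(a.value ** n, n * a.value ** (n - 1), a)

def exp_node(a):
    # e^a with immediate derivative e^a
    return Node(math.exp(a.value), math.exp(a.value), a)

def get_gradient_iterative(output):
    """Walk from the output up to the leaf with a loop instead of recursion."""
    chain = []
    node, gr = output, 1.0
    while node is not None:
        chain.append((id(node), gr))  # gr = d(output)/d(node)
        gr *= node.derivative         # chain rule step towards the parent
        node = node.parent
    # Reverse so the leaf variable comes first, matching get_gradient's order
    return dict(reversed(chain))

a = Node(2.0)
b = power_node(a, 2)  # b = a^2
c = exp_node(b)       # c = e^b
grads = get_gradient_iterative(c)
print(grads[id(a)], grads[id(b)], grads[id(c)])
```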

The following function prints the gradients returned by `get_gradient`:

In [64]:
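def print_gradient(gradients):
  # Keys are the id() values of the MyScalar objects (the printed id is the
  # variable's own id); values are the corresponding gradients
  for var_id, value in gradients.items():
    print(f"variable gradient w.r.t parent_id = {var_id}, gradient value = {float(value):.4f}")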


### Usage Examples
In the following examples, we create variables from the `MyScalar` class and run our differentiation function `get_gradient` on them. For comparison, we define the same computations with tensors and run PyTorch's automatic differentiation system. By default, PyTorch keeps `.grad` only for leaf tensors, so we call [`retain_grad`](https://pytorch.org/docs/stable/generated/torch.Tensor.retain_grad.html#torch.Tensor.retain_grad) on the intermediate tensors to keep their gradients after `backward()`.
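The short sketch below makes the `retain_grad` behavior concrete on a small computation: after `backward()`, the leaf tensor `a` has its gradient, the intermediate tensor `b` has one because `retain_grad()` was called on it, and the intermediate tensor `c` does not, because `retain_grad()` was deliberately skipped for it.

```python
import warnings
import torch

a = torch.tensor([2.], requires_grad=True)  # leaf tensor (created by the user)
b = a ** 2                                   # intermediate (non-leaf) tensor
c = torch.exp(b)                             # intermediate (non-leaf) tensor

b.retain_grad()  # keep b.grad after backward(); c.retain_grad() is NOT called
c.backward()

with warnings.catch_warnings():
    # PyTorch warns when .grad of a non-leaf tensor is read without retain_grad()
    warnings.simplefilter("ignore")
    c_grad = c.grad

print("a.grad =", a.grad)  # kept by default for leaf tensors
print("b.grad =", b.grad)  # kept because retain_grad() was called
print("c.grad =", c_grad)  # None
```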

#### Example 1

We create a scalar `a` with value 2 and apply two operations to it: `b = a^2` and `c = e^b`.
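Working the chain rule by hand for $b = a^2$ and $c = e^b$ at $a = 2$ (so $b = 4$) gives the values both systems should reproduce:

$$
\frac{dc}{dc} = 1, \qquad
\frac{dc}{db} = e^{b} = e^{4} \approx 54.598, \qquad
\frac{dc}{da} = \frac{dc}{db}\cdot\frac{db}{da} = e^{b}\cdot 2a = 4e^{4} \approx 218.393
$$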

In [75]:
# Print the gradients using the MyScalar system:
a = MyScalar(torch.tensor([2]))
b = power(a,2)  # b = a^2
c = exp(b)      # c = e^b

print("\nThe gradients using the get_gradient function are:")
grad = get_gradient(c)
print_gradient(grad)

# Print the gradients using the autograd system:
a = torch.tensor([2.], requires_grad=True)
b = a ** 2
c = torch.exp(b)

b.retain_grad()
c.retain_grad()

c.backward()

print("\nThe gradients using autograd are:")
print("dc/da =", a.grad.item())  # Gradient of 'c' w.r.t 'a'
print("dc/db =", b.grad.item())  # Gradient of 'c' w.r.t 'b'
print("dc/dc =", c.grad.item())  # Gradient of 'c' w.r.t 'c'

variable_id = 138670482557008, value = 2.00, immediate_derivative = 1.00 , parent_id = None
variable_id = 138670482569344, value = 4.00, immediate_derivative = 4.00 , parent_id = 138670482557008
variable_id = 138670482567424, value = 54.60, immediate_derivative = 54.60 , parent_id = 138670482569344

The gradients using the get_gradient function are:
variable gradient w.r.t parent_id = 138670482557008, gradient value = 218.3926
variable gradient w.r.t parent_id = 138670482569344, gradient value = 54.5981
variable gradient w.r.t parent_id = 138670482567424, gradient value = 1.0000

The gradients using autograd are:
dc/da = 218.39259338378906
dc/db = 54.598148345947266
dc/dc = 1.0


#### Example 2

We create a scalar `a` with value 2 and apply a sequence of linear operations to it: `b = 3a`, `c = 7b`, `d = c + 10`, and `e = 6d`.
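Since every operation here is linear, each immediate derivative is a constant (3, 7, 1 and 6), and the chain rule simply multiplies them from the output backwards:

$$
\frac{de}{de} = 1, \quad
\frac{de}{dd} = 6, \quad
\frac{de}{dc} = 6 \cdot 1 = 6, \quad
\frac{de}{db} = 6 \cdot 1 \cdot 7 = 42, \quad
\frac{de}{da} = 6 \cdot 1 \cdot 7 \cdot 3 = 126
$$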

In [74]:
# Print the gradients using the MyScalar system:
a = MyScalar(2)
b = mult(a,3)   # b = 3a
c = mult(b,7)   # c = 7b
d = add(c,10)   # d = c + 10
e = mult(d,6)   # e = 6d

print("\nThe gradients using the get_gradient function are:")
grad = get_gradient(e)
print_gradient(grad)

# Print the gradients using the autograd system:
a = torch.tensor([2.], requires_grad=True)
b = a * 3
c = b * 7
d = c + 10
e = d * 6

b.retain_grad()
c.retain_grad()
d.retain_grad()
e.retain_grad()

e.backward()

print("\nThe gradients using autograd are:")
print("de/da =", a.grad.item())  # Gradient of 'e' w.r.t 'a'
print("de/db =", b.grad.item())  # Gradient of 'e' w.r.t 'b'
print("de/dc =", c.grad.item())  # Gradient of 'e' w.r.t 'c'
print("de/dd =", d.grad.item())  # Gradient of 'e' w.r.t 'd'
print("de/de =", e.grad.item())  # Gradient of 'e' w.r.t 'e'

variable_id = 138670482562720, value = 2.00, immediate_derivative = 1.00 , parent_id = None
variable_id = 138670482564064, value = 6.00, immediate_derivative = 3.00 , parent_id = 138670482562720
variable_id = 138670482558736, value = 42.00, immediate_derivative = 7.00 , parent_id = 138670482564064
variable_id = 138670482559120, value = 52.00, immediate_derivative = 1.00 , parent_id = 138670482558736
variable_id = 138670482570736, value = 312.00, immediate_derivative = 6.00 , parent_id = 138670482559120

The gradients using the get_gradient function are:
variable gradient w.r.t parent_id = 138670482562720, gradient value = 126.0000
variable gradient w.r.t parent_id = 138670482564064, gradient value = 42.0000
variable gradient w.r.t parent_id = 138670482558736, gradient value = 6.0000
variable gradient w.r.t parent_id = 138670482559120, gradient value = 6.0000
variable gradient w.r.t parent_id = 138670482570736, gradient value = 1.0000

The gradients using autograd are:
de/da = 126.0
de/db =

#### Example 3

We create a scalar `a` with value 2 and apply a longer chain of nonlinear operations to it: `b = a^2`, `c = e^b`, `d = sin(c)`, `e = cos(d)`, and `f = ln(e)`.
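For this chain, the derivative of $f$ with respect to $a$ is the product of the five immediate derivatives. Here $e$ denotes the variable $e = \cos(d)$, not Euler's number, so the exponential is written as $\exp$. The intermediate gradients ($df/db$, $df/dc$, ...) are the partial products taken from the left:

$$
\frac{df}{da} =
\underbrace{\frac{1}{e}}_{df/de}\cdot
\underbrace{(-\sin d)}_{de/dd}\cdot
\underbrace{\cos c}_{dd/dc}\cdot
\underbrace{\exp(b)}_{dc/db}\cdot
\underbrace{2a}_{db/da}
$$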

In [72]:
# Print the gradients using the MyScalar system:
a = MyScalar(torch.tensor([2]))
b = power(a,2)  # b = a^2
c = exp(b)      # c = e^b
d = sin(c)
e = cos(d)
f = ln(e)

print("\nThe gradients using the get_gradient function are:")
grad = get_gradient(f)
print_gradient(grad)

# Print the gradients using the autograd system:
a = torch.tensor([2.], requires_grad=True)
b = a ** 2  # b = a^2
c = torch.exp(b)
d = torch.sin(c)
e = torch.cos(d)
f = torch.log(e)

b.retain_grad()
c.retain_grad()
d.retain_grad()
e.retain_grad()
f.retain_grad()

f.backward()

print("\nThe gradients using autograd are:")
print("df/da =", a.grad.item())  # Gradient of 'f' w.r.t 'a'
print("df/db =", b.grad.item())  # Gradient of 'f' w.r.t 'b'
print("df/dc =", c.grad.item())  # Gradient of 'f' w.r.t 'c'
print("df/dd =", d.grad.item())  # Gradient of 'f' w.r.t 'd'
print("df/de =", e.grad.item())  # Gradient of 'f' w.r.t 'e'
print("df/df =", f.grad.item())  # Gradient of 'f' w.r.t 'f'

variable_id = 138670481915280, value = 2.00, immediate_derivative = 1.00 , parent_id = None
variable_id = 138670482565264, value = 4.00, immediate_derivative = 4.00 , parent_id = 138670481915280
variable_id = 138670482558400, value = 54.60, immediate_derivative = 54.60 , parent_id = 138670482565264
variable_id = 138670482570736, value = -0.93, immediate_derivative = -0.37 , parent_id = 138670482558400
variable_id = 138670482566032, value = 0.60, immediate_derivative = 0.80 , parent_id = 138670482570736
variable_id = 138670482564640, value = -0.51, immediate_derivative = 1.67 , parent_id = 138670482566032

The gradients using the get_gradient function are:
variable gradient w.r.t parent_id = 138670481915280, gradient value = -108.2652
variable gradient w.r.t parent_id = 138670482565264, gradient value = -27.0663
variable gradient w.r.t parent_id = 138670482558400, gradient value = -0.4957
variable gradient w.r.t parent_id = 138670482570736, gradient value = 1.3374
variable gradient w.r.t par