# Zvi Badash 214553034
### Question 2 - Demo Autograd Implementation

As instructed, I recorded a video of myself explaining the exercise and the code I wrote for it.
The video unfortunately has a watermark, but I hope it won't get in the way of understanding the code.

[Exercise 12 - My Scalar](https://youtu.be/VHfUC44Y9ek)


## Imports and type aliases

In [1]:
import torch

In [2]:
Size = torch.Size
Tensor = torch.Tensor

## The MyScalar class

In [3]:
class MyScalar:
    """
    A scalar class that supports autograd.
    """

    def __init__(self, value: float, imm_grad=None, parent=None):
        """
        :param value: The value of the scalar
        :param imm_grad: The immediate gradient of the scalar w.r.t its parent
        :param parent: The parent of the scalar (The scalar that this scalar is a function of)
        """
        self.value: float = value
        self.imm_grad: float | None = imm_grad
        self.parent: MyScalar | None = parent

    def _get_gradient(self, grad_dict: dict[int, float]):
        """
        Recursive helper that walks up the parent chain and fills in the gradient of the output scalar
        (the one `get_gradient` was called on) w.r.t each of its ancestors.
        The calculation is done using the chain rule for single-variable functions.
        :param grad_dict: The current dictionary of gradients, keyed by object id
        :return: The same dictionary, extended with the gradients w.r.t all ancestors.
        """
        if self.parent is None:
            return grad_dict
        else:
            grad_dict[id(self.parent)] = grad_dict[id(self)] * self.imm_grad
            return self.parent._get_gradient(grad_dict)


    def get_gradient(self):
        """
        This method returns a dictionary of the gradients of this scalar w.r.t itself and all of its ancestors.
        The keys of the dictionary are the ids of those scalars, and the values are the gradients.
        I chose to use ids instead of the scalars themselves as keys because I believe it's less memory intensive.
        This method is merely a wrapper for the recursive method _get_gradient.
        :return: A dictionary mapping the id of each scalar in the chain to the gradient of this scalar w.r.t it
        """
        return self._get_gradient({id(self): 1.})

    def log(self):
        """
        :return: A new MyScalar object with the value of the log of this scalar
        """
        return MyScalar(
            torch.log(torch.tensor(self.value)).item(),
            1. / self.value,
            self
        )

    def exp(self):
        """
        :return: A new MyScalar object with the value of the exponent of this scalar
        """
        return MyScalar(
            torch.exp(torch.tensor(self.value)).item(),
            torch.exp(torch.tensor(self.value)).item(),
            self
        )

    def sin(self):
        """
        :return: A new MyScalar object with the value of the sine of this scalar
        """
        return MyScalar(
            torch.sin(torch.tensor(self.value)).item(),
            torch.cos(torch.tensor(self.value)).item(),
            self
        )

    def cos(self):
        """
        :return: A new MyScalar object with the value of the cosine of this scalar
        """
        return MyScalar(
            torch.cos(torch.tensor(self.value)).item(),
            -torch.sin(torch.tensor(self.value)).item(),
            self
        )

    def __pow__(self, power, modulo=None):
        """
        :param power: The power to raise this scalar to
        :param modulo: The modulus for three-argument pow (not used)
        :return: A new MyScalar object with the value of this scalar raised to the power of `power`
        """
        assert isinstance(power, (int, float))
        return MyScalar(
            torch.pow(torch.tensor(self.value), power).item(),
            power * torch.pow(torch.tensor(self.value), power - 1).item(),
            self
        )

    def __add__(self, other):
        """
        :param other: An int or float to add to this scalar
        :return: A new MyScalar object with the value of this scalar plus `other`
        """
        assert isinstance(other, (int, float))
        return MyScalar(
            self.value + other,
            1.,
            self
        )

    def __radd__(self, other):
        """
        The reverse of __add__
        """
        return self.__add__(other)

    def __mul__(self, other):
        """
        :param other: An int or float to multiply this scalar by
        :return: A new MyScalar object with the value of this scalar multiplied by `other`
        """
        assert isinstance(other, (int, float))
        return MyScalar(
            self.value * other,
            other,
            self
        )

    def __rmul__(self, other):
        """
        The reverse of __mul__
        """
        return self.__mul__(other)

### Test the new autograd system against PyTorch's autograd

Let's define a function $f(x) = \log(\sin(3x+4))$ and compute its gradient w.r.t $x$ at $x=2$.
Just to verify the results, let's compute it by hand:
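$\frac{\mathrm{d}f}{\mathrm{d}x} = \frac{3\cos(3x+4)}{\sin(3x+4)} = 3\cot(3x+4).$ Thus, $\frac{\mathrm{d}f}{\mathrm{d}x}\Bigg\vert_{x=2} = 3\cot(10) \approx 4.6271$.
Note that $\sin(10) \approx -0.544 < 0$, so $f(2)$ itself is not a real number (the log evaluates to NaN). The comparison below still works because both systems only apply the derivative rule $\frac{\mathrm{d}}{\mathrm{d}u}\log u = \frac{1}{u}$, which is finite at $u = \sin(10)$.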
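
The chain-rule product behind this value can be spelled out numerically. The sketch below is only an illustration under the assumption that plain `math` functions are acceptable stand-ins for the torch calls used in the notebook. It multiplies the local derivative of each step ($3x$, $+4$, $\sin$, $\log$), in the same way `_get_gradient` multiplies the `imm_grad` values.

```python
import math

x = 2.0
u = 3 * x + 4                  # argument of the sine, u = 10

# Local derivative of each step in the chain x -> 3x -> 3x+4 -> sin -> log.
local_derivatives = [
    3.0,                       # d(3x)/dx
    1.0,                       # d(b + 4)/db
    math.cos(u),               # d(sin(c))/dc
    1.0 / math.sin(u),         # d(log(d))/dd
]

gradient = math.prod(local_derivatives)
closed_form = 3.0 / math.tan(u)   # 3 * cot(10)

print(gradient, closed_form)
```

Both numbers should agree to floating-point precision, since they are the same expression grouped differently.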

In [4]:
# Calculate the gradient using the new autograd system
a = MyScalar(2.)
b = 3 * a
c = b + 4
d = MyScalar.sin(c)
e = MyScalar.log(d)
e.get_gradient()[id(a)]

4.62705288392008

In [5]:
# Compare with PyTorch's autograd system
A = torch.tensor(2., requires_grad=True)
B = 3 * A
C = B + 4
D = torch.sin(C)
E = torch.log(D)
E.backward()
A.grad.item()

4.6270527839660645

Now let's try a more complicated function, $f(x) = \exp(\sin^{3/2}(2x^4 + 5))$, and compute its gradient w.r.t $x$ at $x=1$.

Just to verify the results, let's compute it by hand:
$\frac{\mathrm{d}f}{\mathrm{d}x} = 12 x^3 \exp(\sin^{3/2}(2x^4 + 5)) \sqrt{\sin(2x^4 + 5)} \cos(2x^4 + 5).$ Thus, $\frac{\mathrm{d}f}{\mathrm{d}x}\Bigg\vert_{x=1} = 12 e^{\sin^{3/2}(7)} \sqrt{\sin(7)} \cos(7) \approx 12.4895$
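
As an independent sanity check on the hand derivation, the closed form can be compared with a central finite difference. This is a sketch using only the standard library. The helper names `f`, `analytic_derivative` and `central_difference`, and the step size `h = 1e-5`, are assumptions chosen for illustration.

```python
import math


def f(x: float) -> float:
    """f(x) = exp(sin(2x^4 + 5)^(3/2)); real-valued while sin(2x^4 + 5) >= 0."""
    u = 2 * x ** 4 + 5
    return math.exp(math.sin(u) ** 1.5)


def analytic_derivative(x: float) -> float:
    """The hand-derived formula 12 x^3 exp(sin^(3/2)(u)) sqrt(sin(u)) cos(u)."""
    u = 2 * x ** 4 + 5
    s = math.sin(u)
    return 12 * x ** 3 * math.exp(s ** 1.5) * math.sqrt(s) * math.cos(u)


def central_difference(func, x: float, h: float = 1e-5) -> float:
    """Second-order accurate numerical derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


numeric = central_difference(f, 1.0)
closed_form = analytic_derivative(1.0)
print(numeric, closed_form)
```

The two values should match to several decimal places; a large gap would point to a mistake in the derivation.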

In [6]:
# Calculate the gradient using the new autograd system
a = MyScalar(1.)
b = 2 * a ** 4
c = b + 5
d = MyScalar.sin(c) ** (3/2)
e = MyScalar.exp(d)
e.get_gradient()[id(a)]

12.489481868896883

In [7]:
# Compare with PyTorch's autograd system
A = torch.tensor(1., requires_grad=True)
B = 2 * A ** 4
C = B + 5
D = torch.sin(C) ** (3/2)
E = torch.exp(D)
E.backward()
A.grad.item()

12.489480972290039
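
In both tests the two systems agree to roughly seven significant digits. A plausible explanation is that all torch operations here run in PyTorch's default `float32` dtype, whose machine epsilon is $2^{-23} \approx 1.19 \times 10^{-7}$. The sketch below checks the four outputs printed above against that bound; the helper name `relative_difference` is hypothetical.

```python
FLOAT32_EPS = 2.0 ** -23   # machine epsilon of IEEE 754 single precision


def relative_difference(a: float, b: float) -> float:
    """Absolute difference scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


# (MyScalar result, PyTorch result), copied from the cell outputs above.
results = {
    "log(sin(3x+4)) at x=2": (4.62705288392008, 4.6270527839660645),
    "exp(sin^{3/2}(2x^4+5)) at x=1": (12.489481868896883, 12.489480972290039),
}

within_float32 = {
    name: relative_difference(mine, reference) < FLOAT32_EPS
    for name, (mine, reference) in results.items()
}

for name, ok in within_float32.items():
    print(name, "agrees to float32 precision:", ok)
```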