---
title: "A Deep Dive into PyTorch's Autograd Engine"
author: "Ayush Raj Dahal"
date: "2023-10-11"
layout: post
hide: false
search_exclude: false
toc: true
categories: [Neural Networks, Machine Learning]
image: "image.jpg"
---

In an attempt to understand the underlying mechanism of PyTorch's autograd engine, I built a mini version of it that does the same job on a much smaller scale. This blog post explains the core ideas behind that project. It is mostly a way to test my own understanding yet again, and possibly learn more in the process. Hopefully, by the end of this post, you will also have a good understanding of autograd and of how to implement a simple version of it from scratch. This post assumes basic familiarity with Python and a tiny bit of calculus. Even if you don't have that, I will try my best to explain everything as clearly as possible. So, let's get started!

## Contextualizing the Problem

To understand the need for an autograd engine, we first need to understand what makes it relevant in the first place. Although it is commonly associated with Neural Networks in Deep Learning (which is pretty much the only case we will be concerned with as we move forward in our AI journey), automatic differentiation is actually an independent concept that does not depend on Neural Networks, and can be used in applications as broad as weather forecasting and [xyz]. It is the idea of computing the derivatives of some output value with respect to all the variables that were used to produce that value.
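To make "derivatives of an output with respect to its inputs" concrete, here is a minimal sketch in plain Python. The function `f`, the helper `numerical_grad`, and the input values are all made up for illustration. It compares derivatives worked out by hand against a finite-difference approximation, which is not how an autograd engine computes them, but it is a handy way to see what the engine is expected to produce.

```python
def f(a, b):
    # A toy computation: the output depends on both a and b.
    return a * b + a


def numerical_grad(func, args, index, h=1e-6):
    """Approximate d(func)/d(args[index]) with a central finite difference."""
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)


a, b = 2.0, 3.0

# Worked out by hand: df/da = b + 1 and df/db = a.
analytic = (b + 1, a)

# Approximated numerically by nudging one input at a time.
numeric = (
    numerical_grad(f, (a, b), 0),
    numerical_grad(f, (a, b), 1),
)

print("analytic:", analytic)
print("numeric: ", numeric)
```

The two results should agree up to floating-point error. An autograd engine aims to give the analytic answer automatically, without anyone deriving the formulas by hand.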

If terms like "Neural Networks" and "Deep Learning" don't make any sense to you yet, don't worry about it. As you'll see in this post as well as in some of my upcoming ones, those concepts are actually a lot, lot simpler than they sound. We will be breaking everything down to first principles and creating everything we need from scratch. So, stick with me. You're in for a fun ride!

### NumPy vs. PyTorch

Things that are similar between the two:

- Both provide a way to create n-dimensional arrays
- Both provide a way to perform mathematical operations on those arrays
- Both are fast and optimized, often by orders of magnitude compared to equivalent pure-Python loops (since their core operations run in precompiled, vectorized code instead of going through the Python interpreter one element at a time)
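As a rough illustration of the similarities listed above, the sketch below builds the same 2x3 data as a NumPy array and as a PyTorch tensor and runs the same arithmetic on both. The data and variable names are arbitrary, and it assumes both `numpy` and `torch` are installed.

```python
import numpy as np
import torch

# The same 2x3 data as a NumPy array and as a PyTorch tensor.
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
arr = np.array(data)
ten = torch.tensor(data)

# Elementwise arithmetic and reductions look almost identical in both.
arr_result = (arr * 2 + 1).sum()
ten_result = (ten * 2 + 1).sum()

# The plain-Python equivalent needs an explicit loop over every element.
loop_result = sum(x * 2 + 1 for row in data for x in row)

print(arr.shape, tuple(ten.shape))
print(float(arr_result), ten_result.item(), loop_result)
```

All three approaches compute the same number. The difference is that NumPy and PyTorch do the looping inside optimized compiled code.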

Things that separate the two:

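- Parallel execution and GPU usage (PyTorch tensors can be moved to a GPU, while NumPy arrays live on the CPU)
- Data type restriction
- What a tensor is, and how a tensor differs from an array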

The main one that we're concerned with:

- Automatic Differentiation Engine
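Here is a small sketch of what that engine looks like from the user's side. It uses PyTorch's public `requires_grad`, `backward()`, and `.grad` API; the specific function and values are just an illustrative choice.

```python
import torch

# Mark the inputs so that PyTorch records the operations applied to them.
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

# An ordinary-looking computation; PyTorch tracks it behind the scenes.
c = a * b + a

# Ask autograd for the derivatives of c with respect to a and b.
c.backward()

grad_a = a.grad.item()  # dc/da = b + 1
grad_b = b.grad.item()  # dc/db = a

print(c.item(), grad_a, grad_b)
```

NumPy has no built-in counterpart to `backward()`: with a plain NumPy array, these derivatives would have to be derived and coded by hand.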
