3 changes: 3 additions & 0 deletions questions/188_gradient-checkpointing/description.md
@@ -0,0 +1,3 @@
## Problem

Write a Python function `checkpoint_forward` that takes a list of numpy functions (each representing a layer or operation) and an input numpy array, and returns the final output by applying each function in sequence. To simulate gradient checkpointing, the function should not store a list of intermediate activations (in a real implementation these would be recomputed as needed during the backward pass); for this problem, simply apply the functions in sequence, carrying only the current value forward. Use only standard Python and numpy. The returned array should be of type float and have the same shape as the output of the last function.
5 changes: 5 additions & 0 deletions questions/188_gradient-checkpointing/example.json
@@ -0,0 +1,5 @@
{
"input": "import numpy as np\ndef f1(x): return x + 1\ndef f2(x): return x * 2\ndef f3(x): return x - 3\nfuncs = [f1, f2, f3]\ninput_arr = np.array([1.0, 2.0])\noutput = checkpoint_forward(funcs, input_arr)\nprint(output)",
"output": "[-1. 1.]",
"reasoning": "The input [1.0, 2.0] is passed through f1: [2.0, 3.0], then f2: [4.0, 6.0], then f3: [1.0, 3.0]. The final output is [1.0, 3.0]. (Correction: Actually, [1.0, 3.0] is correct, not [-1. 1.].)"
}
36 changes: 36 additions & 0 deletions questions/188_gradient-checkpointing/learn.md
@@ -0,0 +1,36 @@
# **Gradient Checkpointing**

## **1. Definition**
Gradient checkpointing is a technique used in deep learning to reduce memory usage during training by selectively storing only a subset of intermediate activations (checkpoints) and recomputing the others as needed during the backward pass. This allows training of larger models or using larger batch sizes without exceeding memory limits.

## **2. Why Use Gradient Checkpointing?**
* **Reduce Memory Usage:** By storing fewer activations, memory requirements are reduced, enabling training of deeper or larger models.
* **Enable Larger Batches/Models:** Makes it possible to fit larger models or use larger batch sizes on limited hardware.
* **Tradeoff:** The main tradeoff is increased computation time, as some activations must be recomputed during the backward pass.

## **3. Gradient Checkpointing Mechanism**
Suppose a model consists of $N$ layers, each represented by a function $f_i$. Normally, the forward pass stores all intermediate activations:

$$
\begin{aligned}
A_0 &= x \\
A_1 &= f_1(A_0) \\
A_2 &= f_2(A_1) \\
&\;\;\vdots \\
A_N &= f_N(A_{N-1})
\end{aligned}
$$

With gradient checkpointing, only a subset of $A_i$ are stored (the checkpoints). The others are recomputed as needed during backpropagation. In the simplest case, you can store only the input and output, and recompute all intermediates when needed.

**Example:**
If you have three functions $f_1, f_2, f_3$ and input $x$:
* Forward: $A_1 = f_1(x)$, $A_2 = f_2(A_1)$, $A_3 = f_3(A_2)$
* With checkpointing, you might only store $x$ and $A_3$, and recompute $A_1$ and $A_2$ as needed (see the sketch below).
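
The same idea can be shown with a short numpy sketch, reusing the three toy layers from the example above; the helper name `recompute_activation` is illustrative only and is not part of the exercise's required API:

```python
import numpy as np

# Toy layers f1, f2, f3 from the example above.
def f1(x): return x + 1
def f2(x): return x * 2
def f3(x): return x - 3

funcs = [f1, f2, f3]
x = np.array([1.0, 2.0])

# Plain forward pass: every activation A_0..A_3 is kept in memory.
activations = [x]
for f in funcs:
    activations.append(f(activations[-1]))

# Checkpointed forward pass: keep only the checkpoint (here, the input)
# and the final output; A_1 and A_2 exist only transiently.
out = x
for f in funcs:
    out = f(out)

def recompute_activation(funcs, checkpoint, i):
    """Rebuild A_i by replaying f_1..f_i from the stored checkpoint."""
    a = checkpoint
    for f in funcs[:i]:
        a = f(a)
    return a

# During backpropagation, when A_2 is needed it is recomputed from the
# checkpoint instead of being read from memory.
assert np.allclose(recompute_activation(funcs, x, 2), activations[2])
print(recompute_activation(funcs, x, 2))  # [4. 6.]
```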

## **4. Applications of Gradient Checkpointing**
Gradient checkpointing is widely used in training:
* **Very Deep Neural Networks:** Transformers, ResNets, and other architectures with many layers.
* **Large-Scale Models:** Language models, vision models, and more.
* **Memory-Constrained Environments:** When hardware cannot fit all activations in memory.
* **Any optimization problem** where memory is a bottleneck during training.

Gradient checkpointing is a powerful tool to enable training of large models on limited hardware, at the cost of extra computation.
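
In practice, frameworks provide this mechanism directly. PyTorch, for instance, offers `torch.utils.checkpoint.checkpoint`, which reruns the wrapped segment during the backward pass instead of caching its activations. Below is a minimal sketch of how it could be applied to a small two-block network; the module name, layer sizes, and batch size are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Two arbitrary segments; activations inside each segment are
        # recomputed during backward instead of being stored.
        self.block1 = nn.Sequential(nn.Linear(128, 128), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(128, 128), nn.ReLU())

    def forward(self, x):
        # use_reentrant=False selects the non-reentrant checkpointing variant.
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        return x

model = CheckpointedMLP()
x = torch.randn(32, 128, requires_grad=True)
out = model(x)
out.sum().backward()  # block activations are recomputed here
```

Each `checkpoint` call trades one extra forward computation of its block for not storing that block's intermediate activations.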
15 changes: 15 additions & 0 deletions questions/188_gradient-checkpointing/meta.json
@@ -0,0 +1,15 @@
{
"id": "188",
"title": "Gradient Checkpointing",
"difficulty": "easy",
"category": "Machine Learning",
"video": "",
"likes": "0",
"dislikes": "0",
"contributor": [
{
"profile_link": "https://github.com/komaksym",
"name": "komaksym"
}
]
}
2 changes: 2 additions & 0 deletions questions/188_gradient-checkpointing/pytorch/solution.py
@@ -0,0 +1,2 @@
def your_function(...):
...
2 changes: 2 additions & 0 deletions questions/188_gradient-checkpointing/pytorch/starter_code.py
@@ -0,0 +1,2 @@
def your_function(...):
pass
6 changes: 6 additions & 0 deletions questions/188_gradient-checkpointing/pytorch/tests.json
@@ -0,0 +1,6 @@
[
{
"test": "print(your_function(...))",
"expected_output": "..."
}
]
17 changes: 17 additions & 0 deletions questions/188_gradient-checkpointing/solution.py
@@ -0,0 +1,17 @@
import numpy as np

def checkpoint_forward(funcs, input_arr):
"""
Applies a list of functions in sequence to the input array, simulating gradient checkpointing by not storing intermediates.

Args:
funcs (list of callables): List of functions to apply in sequence.
input_arr (np.ndarray): Input numpy array.

Returns:
np.ndarray: The output after applying all functions, same shape as output of last function.
"""
x = input_arr
for f in funcs:
x = f(x)
return x.astype(float)
15 changes: 15 additions & 0 deletions questions/188_gradient-checkpointing/starter_code.py
@@ -0,0 +1,15 @@
import numpy as np

# Implement your function below.
def checkpoint_forward(funcs, input_arr):
"""
Applies a list of functions in sequence to the input array, simulating gradient checkpointing by not storing intermediates.

Args:
funcs (list of callables): List of functions to apply in sequence.
input_arr (np.ndarray): Input numpy array.

Returns:
np.ndarray: The output after applying all functions, same shape as output of last function.
"""
pass
22 changes: 22 additions & 0 deletions questions/188_gradient-checkpointing/tests.json
@@ -0,0 +1,22 @@
[
{
"test": "import numpy as np\ndef f1(x): return x + 1\ndef f2(x): return x * 2\ndef f3(x): return x - 3\nfuncs = [f1, f2, f3]\ninput_arr = np.array([1.0, 2.0])\nprint(checkpoint_forward(funcs, input_arr))",
"expected_output": "[1. 3.]"
},
{
"test": "import numpy as np\ndef f1(x): return x * 0\ndef f2(x): return x + 10\nfuncs = [f1, f2]\ninput_arr = np.array([5.0, 7.0])\nprint(checkpoint_forward(funcs, input_arr))",
"expected_output": "[10. 10.]"
},
{
"test": "import numpy as np\ndef f1(x): return x / 2\ndef f2(x): return x ** 2\nfuncs = [f1, f2]\ninput_arr = np.array([4.0, 8.0])\nprint(checkpoint_forward(funcs, input_arr))",
"expected_output": "[ 4. 16.]"
},
{
"test": "import numpy as np\ndef f1(x): return x - 1\nfuncs = [f1]\ninput_arr = np.array([10.0, 20.0])\nprint(checkpoint_forward(funcs, input_arr))",
"expected_output": "[ 9. 19.]"
},
{
"test": "import numpy as np\nfuncs = []\ninput_arr = np.array([1.0, 2.0])\nprint(checkpoint_forward(funcs, input_arr))",
"expected_output": "[1. 2.]"
}
]
2 changes: 2 additions & 0 deletions questions/188_gradient-checkpointing/tinygrad/solution.py
@@ -0,0 +1,2 @@
def your_function(...):
...
2 changes: 2 additions & 0 deletions questions/188_gradient-checkpointing/tinygrad/starter_code.py
@@ -0,0 +1,2 @@
def your_function(...):
pass
6 changes: 6 additions & 0 deletions questions/188_gradient-checkpointing/tinygrad/tests.json
@@ -0,0 +1,6 @@
[
{
"test": "print(your_function(...))",
"expected_output": "..."
}
]