3 changes: 3 additions & 0 deletions questions/180_gradient_clipping/description.md
@@ -0,0 +1,3 @@
## Problem

Write a Python function `clip_gradients` that takes a numpy array of gradients and a float `max_norm`, and returns a new numpy array where the gradients are clipped so that their L2 norm does not exceed `max_norm`. If the L2 norm of the input gradients is less than or equal to `max_norm`, return the gradients unchanged. If it exceeds `max_norm`, scale all gradients so that their L2 norm equals `max_norm`. Only use standard Python and numpy. The returned array should be of type float and have the same shape as the input.
5 changes: 5 additions & 0 deletions questions/180_gradient_clipping/example.json
@@ -0,0 +1,5 @@
{
"input": "import numpy as np\ngradients = np.array([3.0, 4.0])\nmax_norm = 5.0\nclipped = clip_gradients(gradients, max_norm)\nprint(clipped)",
"output": "[3. 4.]",
"reasoning": "The L2 norm of [3.0, 4.0] is 5.0, which is equal to max_norm, so the gradients are returned unchanged."
}
41 changes: 41 additions & 0 deletions questions/180_gradient_clipping/learn.md
@@ -0,0 +1,41 @@
# **Gradient Clipping**

## **1. Definition**
Gradient clipping is a technique used in machine learning to prevent the gradients from becoming too large during training, which can destabilize the learning process. It is especially important in training deep neural networks, where gradients can sometimes explode to very large values (the "exploding gradients" problem).

**Gradient clipping** works by scaling the gradients if their norm exceeds a specified threshold (max_norm). The most common form is L2-norm clipping, where the entire gradient vector is rescaled so that its L2 norm is at most `max_norm`.

## **2. Why Use Gradient Clipping?**
* **Stabilizes Training:** Prevents the optimizer from making excessively large updates, which can cause the loss to diverge or become NaN.
* **Enables Deeper Networks:** Makes it feasible to train deeper or recurrent neural networks, where exploding gradients are more likely.
* **Improves Convergence:** Helps the model converge more reliably by keeping updates within a reasonable range.

## **3. Gradient Clipping Mechanism**
Given a gradient vector $g$ and a maximum norm $M$ (max_norm), the clipped gradient $g'$ is computed as:

$$
g' =
\begin{cases}
g & \text{if } \|g\|_2 \leq M \\[4pt]
\dfrac{M}{\|g\|_2}\, g & \text{otherwise}
\end{cases}
$$

Where:
* $g$: The original gradient vector (numpy array)
* $M$: The maximum allowed L2 norm (max_norm)
* $\|g\|_2$: The L2 norm of $g$
* $g'$: The clipped gradient vector

**Example:**
If $g = [6, 8]$ and $M = 5$:
* $\|g\|_2 = \sqrt{6^2 + 8^2} = 10$
* Since $10 > 5$, we scale $g$ by $5/10 = 0.5$, so $g' = [3, 4]$
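The same rule as a minimal NumPy sketch (the function name `clip_by_l2_norm` is illustrative, not part of the exercise):

```python
import numpy as np

def clip_by_l2_norm(g, max_norm):
    norm = np.linalg.norm(g)          # ||g||_2
    if norm <= max_norm:
        return g                      # already within the budget
    return g * (max_norm / norm)      # rescale so the new norm equals max_norm

print(clip_by_l2_norm(np.array([6.0, 8.0]), 5.0))  # [3. 4.], matching the example above
```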

## **4. Applications of Gradient Clipping**
Gradient clipping is widely used in training:
* **Recurrent Neural Networks (RNNs):** To prevent exploding gradients in long sequences.
* **Deep Neural Networks:** For stable training of very deep architectures.
* **Reinforcement Learning:** Where gradients can be highly variable.
* **Any optimization problem** where gradient explosion is a risk.

Gradient clipping is a simple yet powerful tool to ensure stable and effective training in modern machine learning workflows.
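To see where clipping sits in practice, here is a toy gradient-descent step with L2-norm clipping applied before the parameter update (the function name and values are illustrative only):

```python
import numpy as np

def sgd_step_with_clipping(params, grads, lr=0.1, max_norm=1.0):
    # Clip first, then take the usual SGD step with the (possibly) rescaled gradient.
    norm = np.linalg.norm(grads)
    if norm > max_norm:
        grads = grads * (max_norm / norm)
    return params - lr * grads

params = np.array([0.5, -0.5])
grads = np.array([30.0, 40.0])                # "exploding" gradient, norm = 50
print(sgd_step_with_clipping(params, grads))  # update uses the clipped [0.6, 0.8]
```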
15 changes: 15 additions & 0 deletions questions/180_gradient_clipping/meta.json
@@ -0,0 +1,15 @@
{
"id": "180",
"title": "Gradient Clipping (L2 Norm)",
"difficulty": "easy",
"category": "Machine Learning",
"video": "",
"likes": "0",
"dislikes": "0",
"contributor": [
{
"profile_link": "https://github.com/komaksym",
"name": "komaksym"
}
]
}
2 changes: 2 additions & 0 deletions questions/180_gradient_clipping/pytorch/solution.py
@@ -0,0 +1,2 @@
def your_function(...):
...
2 changes: 2 additions & 0 deletions questions/180_gradient_clipping/pytorch/starter_code.py
@@ -0,0 +1,2 @@
def your_function(...):
pass
6 changes: 6 additions & 0 deletions questions/180_gradient_clipping/pytorch/tests.json
@@ -0,0 +1,6 @@
[
{
"test": "print(your_function(...))",
"expected_output": "..."
}
]
21 changes: 21 additions & 0 deletions questions/180_gradient_clipping/solution.py
@@ -0,0 +1,21 @@
import numpy as np


def clip_gradients(gradients, max_norm):
    """
    Clips the gradients so that their L2 norm does not exceed max_norm.
    If the L2 norm is less than or equal to max_norm, returns the gradients unchanged.
    Otherwise, scales the gradients so that their L2 norm equals max_norm.

    Args:
        gradients (np.ndarray): The input gradients (any shape).
        max_norm (float): The maximum allowed L2 norm.

    Returns:
        np.ndarray: The clipped gradients, same shape as input.
    """
    # L2 norm over all elements; the array is treated as one flattened vector.
    norm = np.linalg.norm(gradients)
    if norm <= max_norm or norm == 0:
        # Within budget (the norm == 0 check also avoids dividing by zero below).
        return gradients.astype(float)
    else:
        # Scale every component by the same factor so the new L2 norm equals max_norm.
        return (gradients * (max_norm / norm)).astype(float)
18 changes: 18 additions & 0 deletions questions/180_gradient_clipping/starter_code.py
@@ -0,0 +1,18 @@
import numpy as np


# Implement your function below.
def clip_gradients(gradients, max_norm):
    """
    Clips the gradients so that their L2 norm does not exceed max_norm.
    If the L2 norm is less than or equal to max_norm, returns the gradients unchanged.
    Otherwise, scales the gradients so that their L2 norm equals max_norm.

    Args:
        gradients (np.ndarray): The input gradients (any shape).
        max_norm (float): The maximum allowed L2 norm.

    Returns:
        np.ndarray: The clipped gradients, same shape as input.
    """
    pass
30 changes: 30 additions & 0 deletions questions/180_gradient_clipping/tests.json
@@ -0,0 +1,30 @@
[
{
"test": "import numpy as np\ngradients = np.array([3.0, 4.0])\nmax_norm = 5.0\nprint(clip_gradients(gradients, max_norm))",
"expected_output": "[3. 4.]"
},
{
"test": "import numpy as np\ngradients = np.array([6.0, 8.0])\nmax_norm = 5.0\nprint(clip_gradients(gradients, max_norm))",
"expected_output": "[3. 4.]"
},
{
"test": "import numpy as np\ngradients = np.array([0.0, 0.0])\nmax_norm = 1.0\nprint(clip_gradients(gradients, max_norm))",
"expected_output": "[0. 0.]"
},
{
"test": "import numpy as np\ngradients = np.array([1.0, 2.0, 2.0])\nmax_norm = 3.0\nprint(clip_gradients(gradients, max_norm))",
"expected_output": "[1. 2. 2.]"
},
{
"test": "import numpy as np\ngradients = np.array([10.0, 0.0])\nmax_norm = 5.0\nprint(clip_gradients(gradients, max_norm))",
"expected_output": "[5. 0.]"
},
{
"test": "import numpy as np\ngradients = np.array([-3.0, -4.0])\nmax_norm = 5.0\nprint(clip_gradients(gradients, max_norm))",
"expected_output": "[-3. -4.]"
},
{
"test": "import numpy as np\ngradients = np.array([-6.0, -8.0])\nmax_norm = 5.0\nprint(clip_gradients(gradients, max_norm))",
"expected_output": "[-3. -4.]"
}
]
2 changes: 2 additions & 0 deletions questions/180_gradient_clipping/tinygrad/solution.py
@@ -0,0 +1,2 @@
def your_function(...):
...
2 changes: 2 additions & 0 deletions questions/180_gradient_clipping/tinygrad/starter_code.py
@@ -0,0 +1,2 @@
def your_function(...):
pass
6 changes: 6 additions & 0 deletions questions/180_gradient_clipping/tinygrad/tests.json
@@ -0,0 +1,6 @@
[
{
"test": "print(your_function(...))",
"expected_output": "..."
}
]