[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HumbertoDiego/3dgs-reformulated/blob/main/Torch_Backpropagation_Tests.ipynb)

#### Time everything

In [None]:
!pip install -q ipython-autotime
%load_ext autotime

The autotime extension is already loaded. To reload it, use:
  %reload_ext autotime
time: 750 ms (started: 2025-06-10 17:14:30 -03:00)


## Check CUDA and environment

In [299]:
!nvidia-smi
!nvcc --version
!gcc --version

Tue Jun 10 18:22:49 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.78                 Driver Version: 551.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3050 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   36C    P8              6W /   35W |       0MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

# Example 1

$y = x^2$

$\nabla(x) = \frac{dy}{dx}=2x$

For $x=3 \rightarrow \nabla(x)=6$


PyTorch implementation:

In [288]:
import torch

x = torch.tensor([3.0], requires_grad=True)

# Forward pass
y = x **2

# Backward pass
y.backward()
print(x.grad)

tensor([6.])
time: 0 ns (started: 2025-06-10 17:14:31 -03:00)
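As an independent sanity check on the autograd result above, the derivative can also be approximated numerically. The following is a minimal sketch using only the standard library; the helper name `central_difference` and the step size `h` are illustrative assumptions, not part of the notebook.

```python
def central_difference(f, x, h=1e-5):
    """Approximate df/dx at x with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def square(x):
    return x ** 2


numeric_grad = central_difference(square, 3.0)
analytic_grad = 2.0 * 3.0  # dy/dx = 2x evaluated at x = 3
print(numeric_grad, analytic_grad)
```

The two numbers should agree up to floating-point rounding, which is the same kind of agreement the later examples look for between SymPy and PyTorch.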


# Example 2

$y = x  w + 1$

$\nabla(x) = \frac{\partial y}{\partial x}=w$

$\nabla(w) = \frac{\partial y}{\partial w}=x$

For $x= 2$ and $w=3  \rightarrow$

$\nabla(x)=3$

$\nabla(w)=2$


PyTorch implementation:

In [289]:
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

# Forward pass
y = x * w + 1

# Backward pass
y.backward()
print(x.grad)
print(w.grad)

tensor(3.)
tensor(2.)
time: 0 ns (started: 2025-06-10 17:14:31 -03:00)
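To make explicit what `y.backward()` computes here, the backward pass can be written out by hand. This is only a sketch of the chain rule for this one expression; the intermediate name `z` and the `grad_*` variables are illustrative and do not correspond to PyTorch internals.

```python
# Forward pass, keeping the intermediate value
x, w = 2.0, 3.0
z = x * w        # z = 6.0
y = z + 1.0      # y = 7.0

# Backward pass: start from dy/dy = 1 and apply the chain rule
grad_y = 1.0
grad_z = grad_y * 1.0   # d(z + 1)/dz = 1
grad_x = grad_z * w     # d(x * w)/dx = w
grad_w = grad_z * x     # d(x * w)/dw = x
print(grad_x, grad_w)   # 3.0 2.0
```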


# Example 3

$y = e^{(-x^2/2)}$

where $x$ is a parameter to be updated by gradient descent.

$\nabla(x) = \frac{\partial y}{\partial x}= -xe^{(-x^2/2)}$

For $x=2 \rightarrow$

$\nabla(x) = -0.270670566473225$

SymPy implementation:

In [290]:
import sympy as sp

x= sp.symbols('x')
y = sp.exp(-x**2 / 2)

y_derivative = sp.diff(y, x)
print("Function:", y)
print("Derivative:", y_derivative)
print("Derivative evaluated at x=2:",  y_derivative.evalf(subs={x: 2}))

Function: exp(-x**2/2)
Derivative: -x*exp(-x**2/2)
Derivative evaluated at x=2: -0.270670566473225
time: 0 ns (started: 2025-06-10 17:14:31 -03:00)


PyTorch implementation:

In [291]:
import torch

x = torch.tensor(2.0, requires_grad=True)

# Forward pass
y = torch.exp(-x**2 / 2)

# Backward pass
y.backward()

# Gradients
print(x.grad)

tensor(-0.2707)
time: 0 ns (started: 2025-06-10 17:14:31 -03:00)
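Since $x$ is described as a parameter updated by gradient descent, it may help to see what a few update steps do with this gradient. The sketch below uses the analytic derivative from this example; the learning rate and the number of steps are arbitrary assumptions chosen for illustration.

```python
import math


def y(x):
    return math.exp(-x ** 2 / 2)


def dy_dx(x):
    return -x * math.exp(-x ** 2 / 2)


learning_rate = 0.1  # assumed value, not taken from the notebook
x = 2.0
history = [(x, y(x))]
for _ in range(5):
    # Gradient descent step: move against the gradient to decrease y
    x = x - learning_rate * dy_dx(x)
    history.append((x, y(x)))

for step, (x_value, y_value) in enumerate(history):
    print(step, x_value, y_value)
```

Because the gradient is negative for $x > 0$, each step increases $x$ and pushes $y$ further down the tail of the bell curve.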


# Example 4

$y = \frac{1}{\sigma\sqrt{2\pi}} e^{-0.5\frac{(x-\mu)^2}{\sigma^2}}$

where $x$ is a parameter to be updated by gradient descent.

$\nabla(x) = \frac{\partial y}{\partial x}= - \frac{(x - \mu)}{\sigma^{3}\sqrt{2\pi}} e^{-0.5\frac{(x-\mu)^2}{\sigma^2}}$

This is the expression SymPy prints below, with its constant factors simplified.

For $[x, \mu, \sigma]=[2,0,1] \rightarrow$

$\nabla(x) = -0.107981933026376$

SymPy implementation:

In [292]:
import sympy as sp

x= sp.symbols('x')
mu = sp.symbols('μ')
sigma = sp.symbols('σ')

y = 1/(sigma * sp.sqrt(2 * sp.pi)) * sp.exp(-1/2 * (x - mu)**2 / sigma**2)

dy_dx = sp.diff(y, x)
print("Function:", y)
print("Derivative w.r.t x:", dy_dx)
print("Derivative w.r.t x evaluated at x=2, μ=0, σ=1:",  dy_dx.evalf(subs={x: 2, mu: 0, sigma: 1}))

Function: sqrt(2)*exp(-0.5*(x - μ)**2/σ**2)/(2*sqrt(pi)*σ)
Derivative w.r.t x: -0.25*sqrt(2)*(2*x - 2*μ)*exp(-0.5*(x - μ)**2/σ**2)/(sqrt(pi)*σ**3)
Derivative w.r.t x evaluated at x=2, μ=0, σ=1: -0.107981933026376
time: 0 ns (started: 2025-06-10 17:14:31 -03:00)


PyTorch implementation:

In [293]:
import torch

x = torch.tensor(2.0, requires_grad=True)
mu = torch.tensor(0.0, requires_grad=True)
sigma = torch.tensor(1.0, requires_grad=True)
# Forward pass
y = 1/(sigma * torch.sqrt(torch.tensor(2.0 * torch.pi))) * torch.exp(-1/2 * (x - mu)**2 / sigma**2)

# Backward pass
y.backward()

# Gradients
print(x.grad)

tensor(-0.1080)
time: 0 ns (started: 2025-06-10 17:14:31 -03:00)


# Example 5

$y = \frac{1}{\sigma\sqrt{2\pi}} e^{-0.5\frac{(x-\mu)^2}{\sigma^2}}$

where $\mu$ and $\sigma$ are the parameters to be updated by gradient descent.

$\nabla(\mu) = \frac{\partial y}{\partial \mu}= \frac{(x - \mu)}{\sigma^{3}\sqrt{2\pi}} e^{-0.5\frac{(x-\mu)^2}{\sigma^2}}$

$\nabla(\sigma) = \frac{\partial y}{\partial \sigma}= \left[\frac{(x - \mu)^{2}}{\sigma^{4}} - \frac{1}{\sigma^{2}}\right] \frac{1}{\sqrt{2\pi}} e^{-0.5\frac{(x-\mu)^2}{\sigma^2}}$

These are the expressions SymPy prints below, with their constant factors simplified.

For $[x, \mu, \sigma]=[2,0,1] \rightarrow$

$\nabla(\mu) = 0.107981933026376$

$\nabla(\sigma) = 0.161972899539564$

SymPy implementation:

In [294]:
import sympy as sp

x= sp.symbols('x')
mu = sp.symbols('μ')
sigma = sp.symbols('σ')

y = 1/(sigma * sp.sqrt(2 * sp.pi)) * sp.exp(-1/2 * (x - mu)**2 / sigma**2)

dy_dmu = sp.diff(y, mu)
dy_dsigma = sp.diff(y, sigma)

print("Function:", y)
print("Derivative w.r.t μ:", dy_dmu)
print("Derivative w.r.t σ:", dy_dsigma)
print("Derivative w.r.t μ evaluated at x=2, μ=0, σ=1:",  dy_dmu.evalf(subs={x: 2, mu: 0, sigma: 1}))
print("Derivative w.r.t σ evaluated at x=2, μ=0, σ=1:",  dy_dsigma.evalf(subs={x: 2, mu: 0, sigma: 1}))

Function: sqrt(2)*exp(-0.5*(x - μ)**2/σ**2)/(2*sqrt(pi)*σ)
Derivative w.r.t μ: -0.25*sqrt(2)*(-2*x + 2*μ)*exp(-0.5*(x - μ)**2/σ**2)/(sqrt(pi)*σ**3)
Derivative w.r.t σ: -sqrt(2)*exp(-0.5*(x - μ)**2/σ**2)/(2*sqrt(pi)*σ**2) + 0.5*sqrt(2)*(x - μ)**2*exp(-0.5*(x - μ)**2/σ**2)/(sqrt(pi)*σ**4)
Derivative w.r.t μ evaluated at x=2, μ=0, σ=1: 0.107981933026376
Derivative w.r.t σ evaluated at x=2, μ=0, σ=1: 0.161972899539564
time: 16 ms (started: 2025-06-10 17:14:31 -03:00)


PyTorch implementation:

In [295]:
import torch

x = torch.tensor(2.0, requires_grad=True)
mu = torch.tensor(0.0, requires_grad=True)
sigma = torch.tensor(1.0, requires_grad=True)
# Forward pass
y = 1/(sigma * torch.sqrt(torch.tensor(2.0 * torch.pi))) * torch.exp(-1/2 * (x - mu)**2 / sigma**2)

# Backward pass
y.backward()

# Gradients
print(mu.grad)
print(sigma.grad)

tensor(0.1080)
tensor(0.1620)
time: 0 ns (started: 2025-06-10 17:14:31 -03:00)
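The three partial derivatives of the Gaussian density used in Examples 4 and 5 can all be written as a simple factor times $y$ itself. The sketch below evaluates these closed forms with the standard library only; the function names are illustrative and not part of the notebook.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Univariate normal density."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * (x - mu) ** 2 / sigma ** 2) / norm


def gaussian_pdf_grads(x, mu, sigma):
    """Closed-form partial derivatives of the density w.r.t. x, mu and sigma."""
    y = gaussian_pdf(x, mu, sigma)
    dy_dx = -(x - mu) / sigma ** 2 * y
    dy_dmu = (x - mu) / sigma ** 2 * y
    dy_dsigma = ((x - mu) ** 2 / sigma ** 3 - 1.0 / sigma) * y
    return dy_dx, dy_dmu, dy_dsigma


dy_dx, dy_dmu, dy_dsigma = gaussian_pdf_grads(2.0, 0.0, 1.0)
print(dy_dx, dy_dmu, dy_dsigma)
```

Note that $\partial y / \partial x = -\partial y / \partial \mu$, which is why Examples 4 and 5 report the same magnitude with opposite signs.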


# Example 6

$G(\mathbf{x}) = e^{-0.5 (\mathbf{x}-\mathbf{\mu})^T \Sigma^{-1} (\mathbf{x}-\mathbf{\mu}) }$

where:
- $\mathbf{\mu},\mathbf{x} \in \mathbb{R}^3$;
- $\Sigma \in \mathbb{R}^{3\times 3}$;
- $\mathbf{\mu}$ is the parameter to be updated by gradient descent.

$\nabla(\mu) = \frac{\partial G}{\partial \mu}= \left[\frac{\partial G}{\partial \mu_1}, \frac{\partial G}{\partial \mu_2}, \frac{\partial G}{\partial \mu_3}\right] = \frac{1}{2}\, G(\mathbf{x}) \left[\left(\Sigma^{-1} + \Sigma^{-T}\right)(\mathbf{x}-\mathbf{\mu})\right]^T$

For a symmetric $\Sigma$ (as any covariance matrix is) this reduces to $\nabla(\mu) = G(\mathbf{x}) \left[\Sigma^{-1}(\mathbf{x}-\mathbf{\mu})\right]^T$.

For $\mathbf{x}=[2,2,3],\  \mathbf{\mu}=[1,1,1]$ and $\Sigma = I \rightarrow$

$\nabla(\mu) = [0.0497870683678639, 0.0497870683678639, 0.0995741367357279] $

SymPy implementation:

In [296]:
import sympy as sp

x1, x2, x3= sp.symbols('x_1 x_2 x_3')
mu1, mu2, mu3 = sp.symbols('μ_1 μ_2 μ_3')
x = sp.Matrix([x1, x2, x3])
mu = sp.Matrix([mu1, mu2, mu3])
Sigma = sp.Matrix([[1,0,0],[0,1,0],[0,0,1]])

g = sp.exp(-0.5 * (x - mu).T * Sigma.inv() * (x - mu)) 

dg_dmu = sp.diff(g, mu)

print("Function:", g)
print("Derivative w.r.t μ:", dg_dmu)
dg_dmu_num = dg_dmu.subs({x1:2, x2:2, x3:3, mu1:1, mu2:1, mu3:1}).tolist()
print("Derivative w.r.t μ evaluated at x=[2,2,3], μ=[1,1,1]:", dg_dmu_num )

Function: Matrix([[1.0*exp(-0.5*x_1**2 + 1.0*x_1*μ_1 - 0.5*x_2**2 + 1.0*x_2*μ_2 - 0.5*x_3**2 + 1.0*x_3*μ_3 - 0.5*μ_1**2 - 0.5*μ_2**2 - 0.5*μ_3**2)]])
Derivative w.r.t μ: [[[[1.0*(1.0*x_1 - 1.0*μ_1)*exp(-0.5*x_1**2 + 1.0*x_1*μ_1 - 0.5*x_2**2 + 1.0*x_2*μ_2 - 0.5*x_3**2 + 1.0*x_3*μ_3 - 0.5*μ_1**2 - 0.5*μ_2**2 - 0.5*μ_3**2)]]], [[[1.0*(1.0*x_2 - 1.0*μ_2)*exp(-0.5*x_1**2 + 1.0*x_1*μ_1 - 0.5*x_2**2 + 1.0*x_2*μ_2 - 0.5*x_3**2 + 1.0*x_3*μ_3 - 0.5*μ_1**2 - 0.5*μ_2**2 - 0.5*μ_3**2)]]], [[[1.0*(1.0*x_3 - 1.0*μ_3)*exp(-0.5*x_1**2 + 1.0*x_1*μ_1 - 0.5*x_2**2 + 1.0*x_2*μ_2 - 0.5*x_3**2 + 1.0*x_3*μ_3 - 0.5*μ_1**2 - 0.5*μ_2**2 - 0.5*μ_3**2)]]]]
Derivative w.r.t μ evaluated at x=[2,2,3], μ=[1,1,1]: [[[[0.0497870683678639]]], [[[0.0497870683678639]]], [[[0.0995741367357279]]]]
time: 63 ms (started: 2025-06-10 17:14:31 -03:00)


PyTorch implementation:

In [297]:
import torch

x = torch.tensor([2.0, 2.0, 3.0], requires_grad=True)
mu = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)
Sigma = torch.eye(3)

# Forward pass
dx = x - mu
g = torch.exp(-0.5 * dx.T @ torch.inverse(Sigma) @ dx)

# Backward pass
g.backward()

# Gradients
print(mu.grad)

tensor([0.0498, 0.0498, 0.0996])
time: 0 ns (started: 2025-06-10 17:14:31 -03:00)


Comparison:

In [298]:
dg_dmu = torch.tensor(dg_dmu_num, dtype=torch.float32).reshape(mu.grad.shape)

# Gradients
print(mu.grad,"\n", dg_dmu)

tensor([0.0498, 0.0498, 0.0996]) 
 tensor([0.0498, 0.0498, 0.0996])
time: 0 ns (started: 2025-06-10 17:14:32 -03:00)
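The gradient of this Gaussian kernel has a compact matrix form, $\frac{1}{2} G \,(\Sigma^{-1} + \Sigma^{-T})(\mathbf{x}-\mathbf{\mu})$, which can be checked without any symbolic machinery. The sketch below assumes the precision matrix $P = \Sigma^{-1}$ is passed in directly (to avoid writing a matrix inverse by hand); the function name is illustrative.

```python
import math


def gaussian_kernel_grad_mu(x, mu, precision):
    """Return G and dG/dmu for G = exp(-0.5 * d^T P d), d = x - mu, P = Sigma^{-1}."""
    n = len(x)
    d = [x[i] - mu[i] for i in range(n)]
    # P d and P^T d written with plain loops
    Pd = [sum(precision[i][j] * d[j] for j in range(n)) for i in range(n)]
    PTd = [sum(precision[j][i] * d[j] for j in range(n)) for i in range(n)]
    G = math.exp(-0.5 * sum(d[i] * Pd[i] for i in range(n)))
    grad = [0.5 * G * (Pd[i] + PTd[i]) for i in range(n)]
    return G, grad


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
G, grad_mu = gaussian_kernel_grad_mu([2.0, 2.0, 3.0], [1.0, 1.0, 1.0], identity)
print(G, grad_mu)
```

With $\Sigma = I$ the gradient is simply $G \cdot (\mathbf{x}-\mathbf{\mu})$, so its third component is twice the first two.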


# Example 7

$C(\mathbf{p}) = \sum_{i=1}^{N} \left[ c_i \ G_i(\mathbf{p})  \prod_{j=1}^{i-1} (1- G_j(\mathbf{p})) \right]$

where:
- $G_i(\mathbf{p}) = \exp(-0.5 (\mathbf{p}-\mathbf{\mu_i})^T \Sigma_i^{'-1} (\mathbf{p}-\mathbf{\mu_i}))$;
- $c_i \in \mathbb{R}$;
- $\mathbf{\mu_i},\mathbf{p} \in \mathbb{R}^2$;
- $\Sigma_i' \in \mathbb{R}^{2\times 2}$;
- $\mathbf{\mu_i}$ are the parameters to be updated by gradient descent.

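$\nabla(\mu_k) = \frac{\partial C}{\partial \mu_k} = \frac{\partial G_k}{\partial \mu_k} \left[ c_k \prod_{j=1}^{k-1} (1- G_j) - \sum_{i=k+1}^{N} c_i \ G_i \prod_{j=1,\ j\neq k}^{i-1} (1- G_j) \right]$, with $\frac{\partial G_k}{\partial \mu_k} = G_k \ \Sigma_k^{'-1} (\mathbf{p}-\mathbf{\mu_k})$ for symmetric $\Sigma_k'$.

Each $\mathbf{\mu_k}$ enters $C$ only through $G_k$: once directly in term $i=k$, and once through the factor $(1-G_k)$ of every later term $i>k$, hence the product rule above.

For $N=3$, $\mathbf{p}=[2,3]$, $c_i=1/i$, $\mathbf{\mu_i}=[i,i]$ with $i=1,\dots,N$ (the code below is 0-indexed and writes these as `1/(i+1)` and `[1+i, 1+i]`) and $\Sigma_i^{'-1} = I \rightarrow$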

$\nabla(\mu) = \begin{bmatrix} 0.0507 &  0.1013  \\ 0.0000 &  0.1658 \\ -0.0730 &  0.0000 \end{bmatrix}$

SymPy implementation:

In [279]:
import sympy as sp

N = 3  # Number of components
Sigma_inv_i = sp.Matrix([[1.0, 0.0], [0.0, 1.0]]) # assumed known, for simplicity
Sigma_inv_j = sp.Matrix([[1.0, 0.0], [0.0, 1.0]])

# These vary per component (indexed 0, ..., N-1 in the loops below)
c = sp.IndexedBase('c')
mux = sp.IndexedBase('μ_x')
muy = sp.IndexedBase('μ_y')

# Query point
px, py= sp.symbols('p_x p_y')
p = sp.Matrix([px, py])

# Generate values to apply to the derivative
values = {px:2, py:3}
for i in range(N):
    values = values | {mux[i]:1+i, muy[i]:1+i, c[i]:1/(i+1)}

# Expression for C(p)
C = 0 
for i in range(0,N):
    d_i = p - sp.Matrix([mux[i], muy[i]])
    G_i = sp.exp(-0.5 * (d_i.T * Sigma_inv_i * d_i)[0, 0])
    prod = 1
    for j in range(0, i):
        d_j = p - sp.Matrix([mux[j], muy[j]])
        G_j = sp.exp(-0.5 * (d_j.T * Sigma_inv_j * d_j)[0, 0])
        prod *= (1 - G_j)
    C += c[i] * G_i * prod

print("C(p)=")
display(C)
print(f"C(p) evaluated at {values}:", C.subs(values).evalf())

dC_dmu =[]
for i in range(N):
    dC_dmu_i = sp.diff(C, sp.Matrix([mux[i], muy[i]]))
    dC_dmu_num = dC_dmu_i.subs(values).evalf().tolist()
    print(f"Derivative w.r.t μ_{i} evaluated at c_{i}={1/(i+1):.1f}, p=[2,3], μ_{i}=[{1+i},{1+i}]:", dC_dmu_num )
    dC_dmu.append(dC_dmu_num)

C(p)=


(1 - exp(-0.5*(p_x - μ_x[0])*(1.0*p_x - 1.0*μ_x[0]) - 0.5*(p_y - μ_y[0])*(1.0*p_y - 1.0*μ_y[0])))*(1 - exp(-0.5*(p_x - μ_x[1])*(1.0*p_x - 1.0*μ_x[1]) - 0.5*(p_y - μ_y[1])*(1.0*p_y - 1.0*μ_y[1])))*exp(-0.5*(p_x - μ_x[2])*(1.0*p_x - 1.0*μ_x[2]) - 0.5*(p_y - μ_y[2])*(1.0*p_y - 1.0*μ_y[2]))*c[2] + (1 - exp(-0.5*(p_x - μ_x[0])*(1.0*p_x - 1.0*μ_x[0]) - 0.5*(p_y - μ_y[0])*(1.0*p_y - 1.0*μ_y[0])))*exp(-0.5*(p_x - μ_x[1])*(1.0*p_x - 1.0*μ_x[1]) - 0.5*(p_y - μ_y[1])*(1.0*p_y - 1.0*μ_y[1]))*c[1] + exp(-0.5*(p_x - μ_x[0])*(1.0*p_x - 1.0*μ_x[0]) - 0.5*(p_y - μ_y[0])*(1.0*p_y - 1.0*μ_y[0]))*c[0]

C(p) evaluated at {p_x: 2, p_y: 3, μ_x[0]: 1, μ_y[0]: 1, c[0]: 1.0, μ_x[1]: 2, μ_y[1]: 2, c[1]: 0.5, μ_x[2]: 3, μ_y[2]: 3, c[2]: 0.3333333333333333}: 0.433477305494832
Derivative w.r.t μ_0 evaluated at c_0=1.0, p=[2,3], μ_0=[1,1]: [[0.0506615694581183], [0.101323138916237]]
Derivative w.r.t μ_1 evaluated at c_1=0.5, p=[2,3], μ_1=[2,2]: [[0], [0.165811109756010]]
Derivative w.r.t μ_2 evaluated at c_2=0.3, p=[2,3], μ_2=[3,3]: [[-0.0730205111985485], [0]]
time: 63 ms (started: 2025-06-10 17:13:55 -03:00)


PyTorch implementation:

In [280]:
import torch

N = 3  # Number of components
mu = torch.tensor([[1.0+i, 1.0+i] for i in range(N)], requires_grad=True) 

p = torch.tensor([2.0, 3.0], requires_grad=False)
c = torch.tensor([1.0/(i+1) for i in range(N)], requires_grad=False)
Sigma_inv_i = torch.tensor([[1.0, 0.0],[0.0, 1.0]], requires_grad=False)
Sigma_inv_j = torch.tensor([[1.0, 0.0],[0.0, 1.0]], requires_grad=False)

# Forward pass
C = torch.tensor(0.0) 
for i in range(0,N):
    d_i = p - mu[i]
    G_i = torch.exp(-0.5 * (d_i.T @ Sigma_inv_i @ d_i))
    prod = 1
    for j in range(0, i):
        d_j = p - mu[j]
        G_j = torch.exp(-0.5 * (d_j.T @ Sigma_inv_j @ d_j))
        prod *= (1 - G_j)
    C += c[i] * G_i * prod
print("C(p)=", C)
# Backward pass
C.backward()

# Gradients
print(mu.grad)

C(p)= tensor(0.4335, grad_fn=<AddBackward0>)
tensor([[ 0.0507,  0.1013],
        [ 0.0000,  0.1658],
        [-0.0730,  0.0000]])
time: 0 ns (started: 2025-06-10 17:13:55 -03:00)


Comparison:

In [281]:
dC_dmu = torch.tensor(dC_dmu, dtype=torch.float32).reshape(mu.grad.shape)

# Gradients
print(mu.grad, "\n", dC_dmu)

tensor([[ 0.0507,  0.1013],
        [ 0.0000,  0.1658],
        [-0.0730,  0.0000]]) 
 tensor([[ 0.0507,  0.1013],
        [ 0.0000,  0.1658],
        [-0.0730,  0.0000]])
time: 0 ns (started: 2025-06-10 17:13:55 -03:00)
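The gradient of the compositing sum can also be written by hand with the product rule: each $\mathbf{\mu_k}$ affects $C$ directly through its own term and indirectly through the factor $(1-G_k)$ of every later term. The following sketch implements that for the $\Sigma_i^{'-1} = I$ case used in this example; the function names and the running `transmittance` variable are illustrative assumptions, not part of the notebook.

```python
import math


def gaussian_2d(p, mu):
    """G = exp(-0.5 * ||p - mu||^2), i.e. the Sigma'^{-1} = I case."""
    return math.exp(-0.5 * ((p[0] - mu[0]) ** 2 + (p[1] - mu[1]) ** 2))


def composite(p, mus, colors):
    """C(p) = sum_i c_i G_i prod_{j<i} (1 - G_j)."""
    total, transmittance = 0.0, 1.0
    for mu, c in zip(mus, colors):
        G = gaussian_2d(p, mu)
        total += c * G * transmittance
        transmittance *= 1.0 - G
    return total


def composite_grad_mu(p, mus, colors):
    """Product-rule gradient of C(p) with respect to every mu_k."""
    Gs = [gaussian_2d(p, mu) for mu in mus]
    grads = []
    for k, mu_k in enumerate(mus):
        # dC/dG_k: direct term minus the occlusion of every later component
        dC_dG = colors[k] * math.prod(1.0 - Gs[j] for j in range(k))
        for i in range(k + 1, len(mus)):
            others = math.prod(1.0 - Gs[j] for j in range(i) if j != k)
            dC_dG -= colors[i] * Gs[i] * others
        # dG_k/dmu_k = G_k * (p - mu_k) when Sigma'^{-1} = I
        grads.append([dC_dG * Gs[k] * (p[0] - mu_k[0]),
                      dC_dG * Gs[k] * (p[1] - mu_k[1])])
    return grads


p = [2.0, 3.0]
mus = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
colors = [1.0, 0.5, 1.0 / 3.0]
C_value = composite(p, mus, colors)
grad_mu = composite_grad_mu(p, mus, colors)
print(C_value, grad_mu)
```

The zero entries arise wherever $\mathbf{p}$ and $\mathbf{\mu_k}$ share a coordinate, since $\partial G_k / \partial \mathbf{\mu_k}$ is proportional to $\mathbf{p}-\mathbf{\mu_k}$.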


# Example 8

$C(\mathbf{p}) = \sum_{i=1}^{N} \left[ c_i \ G_i(\mathbf{p})  \prod_{j=1}^{i-1} (1- G_j(\mathbf{p})) \right]$

where:
- $G_i(\mathbf{p}) = \exp(-0.5 (\mathbf{p}-\mathbf{\mu_i})^T \Sigma_i^{'-1} (\mathbf{p}-\mathbf{\mu_i}))$;
- $\Sigma_i' = JWR_iS_iS_i^TR_i^TW^TJ^T$;
- $S_i = diag(\mathbf{s_i})$;
- $J \in \mathbb{R}^{2 \times 3}$ and $W,R_i,S_i \in \mathbb{R}^{3 \times 3}$;
- $c_i \in \mathbb{R}$, $\mathbf{\mu_i},\mathbf{p} \in \mathbb{R}^2$ and $\mathbf{s_i} \in \mathbb{R}^3$;
- $\mathbf{\mu_i}$ and $\mathbf{s_i}$ will be updated by gradient descent.

Let $A_k = \frac{\partial C}{\partial G_k} = c_k \prod_{j=1}^{k-1} (1- G_j) - \sum_{i=k+1}^{N} c_i \ G_i \prod_{j=1,\ j\neq k}^{i-1} (1- G_j)$. Then, by the chain rule:

$\nabla(\mu_k) = \frac{\partial C}{\partial \mu_k} = A_k \ G_k \ \Sigma_k^{'-1} (\mathbf{p}-\mathbf{\mu_k})$

$\nabla(\mathbf{s_k}) = \frac{\partial C}{\partial \mathbf{s_k}} = A_k \ \frac{\partial G_k}{\partial \mathbf{s_k}}$, with $\frac{\partial G_k}{\partial s_{k,m}} = -0.5 \ G_k \ (\mathbf{p}-\mathbf{\mu_k})^T \frac{\partial \Sigma_k^{'-1}}{\partial s_{k,m}} (\mathbf{p}-\mathbf{\mu_k})$ for $m \in \{x, y, z\}$.

For $N=3$, $\mathbf{p}=[2,3]$, $c_i=1/i$, $\mathbf{\mu_i}=[i,i]$, $\mathbf{s_i} = [0.5i,0.5i,0.5i]$ with $i=1,\dots,N$ (the code below is 0-indexed and uses `i+1` in place of $i$), $J = [I_2 \mid \mathbf{0}]$, $W = I$, $R_i=I \rightarrow$

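$\nabla(\mu) = \begin{bmatrix} 1.0745 \times 10^{-4} & 2.1491 \times 10^{-4} \\ 0.0000 & 0.1414 \\ -0.0467 & 0.0000 \end{bmatrix}$

$\nabla(\mathbf{s}) = \begin{bmatrix} 0.0002 & 0.0009 & 0.0000 \\ 0.0000 & 0.1414 & 0.0000 \\ 0.0311 & 0.0000 & 0.0000 \end{bmatrix}$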

SymPy implementation:

In [282]:
import sympy as sp

N = 3  # Number of components
R_i = sp.Matrix([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],[0.0, 0.0, 1.0]]) # assumed known, for simplicity
W = sp.Matrix([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],[0.0, 0.0, 1.0]])
J = sp.Matrix([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# These vary per component (indexed 0, ..., N-1 in the loops below)
c = sp.IndexedBase('c')
mux = sp.IndexedBase('μ_x')
muy = sp.IndexedBase('μ_y')
sx = sp.IndexedBase('s_x')
sy = sp.IndexedBase('s_y')
sz = sp.IndexedBase('s_z')

# Query point
px, py= sp.symbols('p_x p_y')
p = sp.Matrix([px, py])

# Generate values to apply to the derivative
values = {px:2, py:3}
for i in range(N):
    values = values | {mux[i]:1+i, muy[i]:1+i, c[i]:1/(i+1), sx[i]:0.5*(i+1), sy[i]:0.5*(i+1), sz[i]:0.5*(i+1)}

# Expression for C(p)
C = 0 
for i in range(0,N):
    d_i = p - sp.Matrix([mux[i], muy[i]])
    S_i = sp.Matrix([[sx[i], 0, 0], [0, sy[i], 0], [0, 0, sz[i]]])
    Sigma_2D_i = J * W * R_i * S_i * S_i.T * R_i.T * W.T * J.T
    Sigma_2D_inv_i = Sigma_2D_i.inv()
    G_i = sp.exp(-0.5 * (d_i.T * Sigma_2D_inv_i * d_i)[0, 0])
    prod = 1
    for j in range(0, i):
        d_j = p - sp.Matrix([mux[j], muy[j]])
        S_j = sp.Matrix([[sx[j], 0, 0], [0, sy[j], 0], [0, 0, sz[j]]])
        Sigma_2D_j = J * W * R_i * S_j * S_j.T * R_i.T * W.T * J.T
        Sigma_2D_inv_j = Sigma_2D_j.inv()
        G_j = sp.exp(-0.5 * (d_j.T * Sigma_2D_inv_j * d_j)[0, 0])
        prod *= (1 - G_j)
    C += c[i] * G_i * prod

print("C(p)=")
display(C)

print(f"Evaluation data: {values}:")
print(f"C(p) evaluated:", C.subs(values).evalf())

dC_dmu = []
for i in range(N):
    dC_dmu_i = sp.diff(C, sp.Matrix([mux[i], muy[i]]))
    dC_dmu_num = dC_dmu_i.subs(values).evalf().tolist()
    print(f"Derivative w.r.t μ_{i} evaluated:", dC_dmu_num )
    dC_dmu.append(dC_dmu_num)

dC_ds = []
for i in range(N):
    dC_ds_i = sp.diff(C, sp.Matrix([sx[i], sy[i], sz[i]]))
    dC_ds_num = dC_ds_i.subs(values).evalf().tolist()
    print(f"Derivative w.r.t s_{i} evaluated:", dC_ds_num )
    dC_ds.append(dC_ds_num)

C(p)=


(1 - exp(-0.5*(p_x - μ_x[0])**2/s_x[0]**2 - 0.5*(p_y - μ_y[0])**2/s_y[0]**2))*(1 - exp(-0.5*(p_x - μ_x[1])**2/s_x[1]**2 - 0.5*(p_y - μ_y[1])**2/s_y[1]**2))*exp(-0.5*(p_x - μ_x[2])**2/s_x[2]**2 - 0.5*(p_y - μ_y[2])**2/s_y[2]**2)*c[2] + (1 - exp(-0.5*(p_x - μ_x[0])**2/s_x[0]**2 - 0.5*(p_y - μ_y[0])**2/s_y[0]**2))*exp(-0.5*(p_x - μ_x[1])**2/s_x[1]**2 - 0.5*(p_y - μ_y[1])**2/s_y[1]**2)*c[1] + exp(-0.5*(p_x - μ_x[0])**2/s_x[0]**2 - 0.5*(p_y - μ_y[0])**2/s_y[0]**2)*c[0]

Evaluation data: {p_x: 2, p_y: 3, μ_x[0]: 1, μ_y[0]: 1, c[0]: 1.0, s_x[0]: 0.5, s_y[0]: 0.5, s_z[0]: 0.5, μ_x[1]: 2, μ_y[1]: 2, c[1]: 0.5, s_x[1]: 1.0, s_y[1]: 1.0, s_z[1]: 1.0, μ_x[2]: 3, μ_y[2]: 3, c[2]: 0.3333333333333333, s_x[2]: 1.5, s_y[2]: 1.5, s_z[2]: 1.5}:
C(p) evaluated: 0.408314066132132
Derivative w.r.t μ_0 evaluated: [[0.000107454877800111], [0.000214909755600221]]
Derivative w.r.t μ_1 evaluated: [[0], [0.141368316370717]]
Derivative w.r.t μ_2 evaluated: [[-0.0466742686981012], [0]]
Derivative w.r.t s_0 evaluated: [[0.000214909755600221], [0.000859639022400885], [0]]
Derivative w.r.t s_1 evaluated: [[0], [0.141368316370717], [0]]
Derivative w.r.t s_2 evaluated: [[0.0311161791320675], [0], [0]]
time: 781 ms (started: 2025-06-10 17:13:55 -03:00)


PyTorch implementation:

In [283]:
import torch

N = 3  # Number of components
mu = torch.tensor([[1.0+i, 1.0+i] for i in range(N)], requires_grad=True)
s = torch.tensor([[0.5*(i+1), 0.5*(i+1), 0.5*(i+1)] for i in range(N)], requires_grad=True) 

p = torch.tensor([2.0, 3.0], requires_grad=False)
c = torch.tensor([1.0/(i+1) for i in range(N)], requires_grad=False)
R_i = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],[0.0, 0.0, 1.0]], requires_grad=False)
W = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],[0.0, 0.0, 1.0]], requires_grad=False)
J = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], requires_grad=False)

# Forward pass
C = torch.tensor(0.0) 
for i in range(N):
    d_i = p - mu[i]
    S_i = torch.diag(s[i])
    Sigma_2D_i = J @ W @ R_i @ S_i @ S_i.T @ R_i.T @ W.T @ J.T
    Sigma_2D_inv_i = torch.linalg.inv(Sigma_2D_i)
    G_i = torch.exp(-0.5 * (d_i.T @ Sigma_2D_inv_i @ d_i))
    prod = 1
    for j in range(i):
        d_j = p - mu[j]
        S_j = torch.diag(s[j])
        Sigma_2D_j = J @ W @ R_i @ S_j @ S_j.T @ R_i.T @ W.T @ J.T
        Sigma_2D_inv_j = torch.linalg.inv(Sigma_2D_j)
        G_j = torch.exp(-0.5 * (d_j.T @ Sigma_2D_inv_j @ d_j))
        prod *= (1 - G_j)
    C += c[i] * G_i * prod
print("C(p) =", C)

# Backward pass
C.backward()

# Gradients
print(mu.grad)
print(s.grad)

C(p) = tensor(0.4083, grad_fn=<AddBackward0>)
tensor([[ 1.0745e-04,  2.1491e-04],
        [ 0.0000e+00,  1.4137e-01],
        [-4.6674e-02,  0.0000e+00]])
tensor([[0.0002, 0.0009, 0.0000],
        [0.0000, 0.1414, 0.0000],
        [0.0311, 0.0000, 0.0000]])
time: 16 ms (started: 2025-06-10 17:13:56 -03:00)


Comparison:

In [284]:
dC_dmu = torch.tensor(dC_dmu, dtype=torch.float32).reshape(mu.grad.shape)
dC_ds = torch.tensor(dC_ds, dtype=torch.float32).reshape(s.grad.shape)

# Gradients
print(mu.grad, "\n", dC_dmu)
print(s.grad, "\n", dC_ds )

tensor([[ 1.0745e-04,  2.1491e-04],
        [ 0.0000e+00,  1.4137e-01],
        [-4.6674e-02,  0.0000e+00]]) 
 tensor([[ 1.0745e-04,  2.1491e-04],
        [ 0.0000e+00,  1.4137e-01],
        [-4.6674e-02,  0.0000e+00]])
tensor([[0.0002, 0.0009, 0.0000],
        [0.0000, 0.1414, 0.0000],
        [0.0311, 0.0000, 0.0000]]) 
 tensor([[0.0002, 0.0009, 0.0000],
        [0.0000, 0.1414, 0.0000],
        [0.0311, 0.0000, 0.0000]])
time: 0 ns (started: 2025-06-10 17:13:56 -03:00)
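A third, independent check on the scale gradients is a central finite difference on $C$ itself. The sketch below assumes the configuration of this example ($J$ keeping the first two axes, $W = R_i = I$), under which $\Sigma_i' = diag(s_x^2, s_y^2)$ and $s_z$ drops out of the projection; the function names and the step size `h` are illustrative.

```python
import math


def splat(p, mu, s):
    """G for Sigma' = diag(s_x^2, s_y^2); s[2] (s_z) is unused after projection."""
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    return math.exp(-0.5 * (dx ** 2 / s[0] ** 2 + dy ** 2 / s[1] ** 2))


def composite(p, mus, scales, colors):
    """C(p) = sum_i c_i G_i prod_{j<i} (1 - G_j)."""
    total, transmittance = 0.0, 1.0
    for mu, s, c in zip(mus, scales, colors):
        G = splat(p, mu, s)
        total += c * G * transmittance
        transmittance *= 1.0 - G
    return total


def numeric_grad_scales(p, mus, scales, colors, h=1e-6):
    """Central finite difference of C with respect to every scale entry."""
    grads = []
    for k in range(len(scales)):
        row = []
        for axis in range(3):
            plus = [list(s) for s in scales]
            minus = [list(s) for s in scales]
            plus[k][axis] += h
            minus[k][axis] -= h
            diff = composite(p, mus, plus, colors) - composite(p, mus, minus, colors)
            row.append(diff / (2.0 * h))
        grads.append(row)
    return grads


p = [2.0, 3.0]
mus = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
scales = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [1.5, 1.5, 1.5]]
colors = [1.0, 0.5, 1.0 / 3.0]
C_value = composite(p, mus, scales, colors)
grad_s = numeric_grad_scales(p, mus, scales, colors)
print(C_value, grad_s)
```

The last column is exactly zero because `splat` never reads `s[2]`, which matches the zero $s_z$ gradients in the SymPy and PyTorch outputs.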


# Example 9

$\mathcal{L} = \sum_{k=1}^{256}|| C(\mathbf{p_k}) - \hat{C}(\mathbf{p_k}) ||^2$

where:
- $\hat{C}(\mathbf{p_k})$ are the observations;
- $C(\mathbf{p_k}) = \sum_{i=1}^{N} \left[ c_i o_i \ G_i(\mathbf{p_k})  \prod_{j=1}^{i-1} (1- o_j G_j(\mathbf{p_k})) \right]$;
- $c_i = l_0  sh^0_i -  l_1 sh^1_i (cp_y - \mu^{3D}_{iy}) + l_1 sh^2_i (cp_z - \mu^{3D}_{iz}) - l_1 sh^3_i (cp_x - \mu^{3D}_{ix})$;
- $G_i(\mathbf{p_k}) = \exp(-0.5 (\mathbf{p_k}-\mathbf{\mu}^{2D}_i)^T \Sigma_i^{'-1} (\mathbf{p_k}-\mathbf{\mu}^{2D}_i))$;
- $\Sigma_i' = JWR_iS_iS_i^TR_i^TW^TJ^T$;
- $S_i = diag(\mathbf{s_i})$;
- $\mu^{2D}_{i} = [\mu^{3D}_{ix}/\mu^{3D}_{iz} , \mu^{3D}_{iy}/\mu^{3D}_{iz} ]$;
- $J \in \mathbb{R}^{2 \times 3}$ and $W,R_i,S_i \in \mathbb{R}^{3 \times 3}$;
- $c_i, o_i, sh_i, l_0, l_1, cp_x, cp_y, cp_z \in \mathbb{R}$, $\mathbf{\mu}^{2D}_i,\mathbf{p_k} \in \mathbb{R}^2$ and $\mathbf{s_i}, \mathbf{\mu}^{3D}_i\in \mathbb{R}^3$;
- $\mathbf{\mu_i}, sh_i, o_i$ and $\mathbf{s_i}$ will be updated by gradient descent.

$\nabla(\theta) = \frac{\partial \mathcal{L}}{\partial \theta} = \sum_{k=1}^{256} 2 \left( C(\mathbf{p_k}) - \hat{C}(\mathbf{p_k}) \right) \frac{\partial C(\mathbf{p_k})}{\partial \theta}$ for each trainable parameter $\theta \in \{\mathbf{\mu_i}, sh_i, o_i, \mathbf{s_i}\}$.

For $N=3$, $\mathbf{p}=[2,3]$, $c_i=1/i$, $\mathbf{\mu_i}=[i,i]$, $\mathbf{s_i} = [0.5i,0.5i,0.5i]$ with $i=1,\dots,N$, $J = [I_2 \mid \mathbf{0}]$, $W = I$, $R_i=I \rightarrow$

$\nabla(\mu) = ?$

$\nabla(\mathbf{s}) = ?$
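The gradient of a sum-of-squares loss follows from the chain rule: each pixel contributes twice its residual times the gradient of the rendered value. The sketch below illustrates only that step, on a deliberately simplified stand-in: `render` is a hypothetical single 1-D Gaussian rather than the full $C(\mathbf{p_k})$ of this example, and the pixel positions and targets are synthetic.

```python
import math


def render(p, mu):
    """Hypothetical 1-D stand-in for C(p_k): one unit-variance Gaussian."""
    return math.exp(-0.5 * (p - mu) ** 2)


def render_grad_mu(p, mu):
    return (p - mu) * render(p, mu)


def loss(pixels, targets, mu):
    return sum((render(p, mu) - t) ** 2 for p, t in zip(pixels, targets))


def loss_grad_mu(pixels, targets, mu):
    # Chain rule: dL/dmu = sum_k 2 * (C_k - C_hat_k) * dC_k/dmu
    return sum(2.0 * (render(p, mu) - t) * render_grad_mu(p, mu)
               for p, t in zip(pixels, targets))


pixels = [0.0, 1.0, 2.0, 3.0]               # synthetic pixel positions
targets = [render(p, 1.5) for p in pixels]  # synthetic observations from mu = 1.5
mu = 1.0
h = 1e-6
numeric = (loss(pixels, targets, mu + h) - loss(pixels, targets, mu - h)) / (2.0 * h)
analytic = loss_grad_mu(pixels, targets, mu)
print(numeric, analytic)
```

The gradient is negative at `mu = 1.0`, so a descent step moves `mu` toward the value that generated the targets, where the residuals and hence the gradient vanish.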

# Example 10

$m = 2w_1 + 3w_2 + b$

where $w_i$ and $b$ are the parameters to be updated by gradient descent.

$\nabla(w_1) = \frac{\partial m}{\partial w_1}=2$

$\nabla(w_2) = \frac{\partial m}{\partial w_2}=3$

$\nabla(b) = \frac{\partial m}{\partial b}=1$

For any $[w_1, w_2]$ and $b \rightarrow$

$\nabla(w_1) = 2$

$\nabla(w_2) = 3$

$\nabla(b) = 1$


In [4]:
import torch
import torch.nn as nn

# Define a simple model
m = nn.Linear(in_features=2, out_features=1)  # Single layer: m = xA^T + b

# Input
x = torch.tensor([[2.0, 3.0]], requires_grad=True)

# Forward pass
pred = m(x)

# Backward pass
pred.backward()

# Gradients
print("Weight gradient:", m.weight.grad)
print("Bias gradient:", m.bias.grad)

Weight gradient: tensor([[2., 3.]])
Bias gradient: tensor([1.])


# Example 11

$m = x_1w_1 + x_2w_2 + x_3w_3 + b$

where $w_i$ and $b$ are the parameters to be updated by gradient descent.

$\nabla(w_1) = \frac{\partial m}{\partial w_1}=x_1$

$\nabla(w_2) = \frac{\partial m}{\partial w_2}=x_2$

$\nabla(w_3) = \frac{\partial m}{\partial w_3}=x_3$

$\nabla(b) = \frac{\partial m}{\partial b}=1$

For $[x_1, x_2, x_3]=[2, 3, 4]$ and any $[w_1, w_2, w_3,b] \rightarrow$

$\nabla(w_1) = 2$

$\nabla(w_2) = 3$

$\nabla(w_3) = 4$

$\nabla(b) = 1$


In [7]:
import torch
import torch.nn as nn

# Define a simple model
m = nn.Linear(in_features=3, out_features=1)  # Single layer: m = xA^T + b

# Input
x = torch.tensor([[2.0, 3.0, 4.0]], requires_grad=True)

# Forward pass
pred = m(x)

# Backward pass
pred.backward()

# Gradients
print("Weight gradient:", m.weight.grad)
print("Bias gradient:", m.bias.grad)

Weight gradient: tensor([[2., 3., 4.]])
Bias gradient: tensor([1.])
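The claim that these gradients hold for any weights and bias can be checked numerically with arbitrary values. In the sketch below, the random `w` and `b` merely stand in for whatever values the layer happens to be initialized with; the helper names and the seed are illustrative assumptions.

```python
import random


def linear(x, w, b):
    """m = x_1 w_1 + ... + x_n w_n + b."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def numeric_grads(x, w, b, h=1e-6):
    """Central finite differences of m with respect to each w_i and to b."""
    grad_w = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad_w.append((linear(x, w_plus, b) - linear(x, w_minus, b)) / (2.0 * h))
    grad_b = (linear(x, w, b + h) - linear(x, w, b - h)) / (2.0 * h)
    return grad_w, grad_b


random.seed(0)
x = [2.0, 3.0, 4.0]
w = [random.uniform(-1.0, 1.0) for _ in x]  # arbitrary weights
b = random.uniform(-1.0, 1.0)               # arbitrary bias
grad_w, grad_b = numeric_grads(x, w, b)
print(grad_w, grad_b)
```

Because $m$ is linear in $w$ and $b$, the weight gradient reproduces the input $x$ and the bias gradient is 1, whatever seed is used.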
