## A couple of examples of optimizing a custom function with PyTorch

In [0]:
import torch
import torch.optim as optim

### First, run one optimization step manually. We are going to find the minimum of the function $y = x^2$

In [0]:
initial_value = 100.
lr = 0.01
x_0 = torch.tensor([initial_value], requires_grad=True)

In [0]:
# Define optimizer: 
optimizer = optim.SGD([x_0], lr=lr, momentum=0.1)

In [0]:
# Don't forget to reset grads:
optimizer.zero_grad()

In [149]:
out = x_0 ** 2
out.backward()
out

tensor([10000.], grad_fn=<PowBackward0>)

Print the derivative:

In [150]:
x_0.grad

tensor([200.])
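
As a quick sanity check on the value above, the derivative can be worked out by hand:

$$\frac{dy}{dx} = 2x, \qquad \left.\frac{dy}{dx}\right|_{x=100} = 2 \cdot 100 = 200$$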

Make one step of Gradient Descent:

In [0]:
optimizer.step()

Print the new value of x_0:

In [152]:
x_0

tensor([98.], requires_grad=True)

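Just check it manually (on the first step the momentum buffer is simply the gradient, so this is a plain gradient descent update):

In [153]:
manual_new_x_0 = initial_value - lr * x_0.grad.item()  # initial value 100., lr 0.01, gradient 200.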
manual_new_x_0

98.0

In [0]:
assert manual_new_x_0 == x_0.item()  # compare Python floats rather than a float with a tensor
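
The manual check above ignores momentum, which is fine for the very first step but not for later ones. Here is a rough sketch in plain Python, assuming PyTorch's documented SGD rule with the default `dampening=0` and `nesterov=False`: the buffer is `b = momentum * b + grad`, then `x = x - lr * b`, and the buffer starts out equal to the first gradient. The helper name `sgd_momentum_steps` is made up for illustration and is not part of PyTorch.

```python
def sgd_momentum_steps(x, grad_fn, lr, momentum, n_steps):
    """Replay SGD-with-momentum updates on a single scalar (illustrative helper)."""
    buf = None  # momentum buffer; assumed to start as the first gradient
    history = []
    for _ in range(n_steps):
        g = grad_fn(x)
        buf = g if buf is None else momentum * buf + g
        x = x - lr * buf
        history.append(x)
    return history


# y = x ** 2, so dy/dx = 2 * x; same hyperparameters as in the cells above
xs = sgd_momentum_steps(100.0, lambda x: 2 * x, lr=0.01, momentum=0.1, n_steps=3)
for step, value in enumerate(xs, start=1):
    print('step', step, '| x_0 =', round(value, 4))
```

The first value matches the plain gradient descent result of 98.0; from the second step on, the momentum term makes each update slightly larger than `lr * grad`.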

Now let's make several iterations:

In [156]:
for i in range(20):
  print('Iteration:', i + 1)
  optimizer.zero_grad()
  
  output = x_0 ** 2
  # loss = criterion(output, torch.tensor([0.]))
  
  output.backward()
  print('x_0 = ', round(x_0.item(), 3), '|', 'x_0_grad = ', round(x_0.grad.item(), 3))
  optimizer.step()
  
  print('out = ', round(output.item(), 3), '|', 'x_0 (after step) = ', round(x_0.item(), 3))
  print('------------------------------')
  

Iteration: 1
x_0 =  62.466 | x_0_grad =  124.933
out =  3902.045 | x_0 (after step) =  61.075
------------------------------
Iteration: 2
x_0 =  61.075 | x_0_grad =  122.149
out =  3730.117 | x_0 (after step) =  59.714
------------------------------
Iteration: 3
x_0 =  59.714 | x_0_grad =  119.428
out =  3565.765 | x_0 (after step) =  58.384
------------------------------
Iteration: 4
x_0 =  58.384 | x_0_grad =  116.767
out =  3408.654 | x_0 (after step) =  57.083
------------------------------
Iteration: 5
x_0 =  57.083 | x_0_grad =  114.166
out =  3258.465 | x_0 (after step) =  55.811
------------------------------
Iteration: 6
x_0 =  55.811 | x_0_grad =  111.622
out =  3114.895 | x_0 (after step) =  54.568
------------------------------
Iteration: 7
x_0 =  54.568 | x_0_grad =  109.136
out =  2977.65 | x_0 (after step) =  53.352
------------------------------
Iteration: 8
x_0 =  53.352 | x_0_grad =  106.704
out =  2846.451 | x_0 (after step) =  52.164
------------------------------
I

## Another example: now we will optimize this function:
$f(x) = 2x^5-5x^4+x^3+3x^2-x$

Here is the graph:
https://www.desmos.com/calculator/i3rnu6mrsy

This function has two local minima, at approximately
$x_0=0.167$ and $x_0=1.604$, and also $\lim\limits_{x \to -\infty} f(x) = -\infty$


You can try several starting points and different learning rates. In most cases you will end up at one of those three outcomes, i.e. $x_0=0.167$, $x_0=1.604$ or $x_0 \to -\infty$
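
The two minima quoted above can be double-checked without PyTorch. This is a minimal sketch that looks for roots of the hand-derived $f'(x) = 10x^4 - 20x^3 + 3x^2 + 6x - 1$ by bisection and then checks the sign of $f''(x)$. The function names and the two search brackets are assumptions chosen for this illustration (the brackets were picked by looking at the graph).

```python
def f_prime(x):
    # First derivative of f(x) = 2x^5 - 5x^4 + x^3 + 3x^2 - x
    return 10 * x**4 - 20 * x**3 + 3 * x**2 + 6 * x - 1


def f_second(x):
    # Second derivative; positive at a local minimum
    return 40 * x**3 - 60 * x**2 + 6 * x + 6


def bisect_root(g, lo, hi, tol=1e-10):
    """Find a root of g in [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    assert g(lo) * g(hi) < 0, 'bracket must contain a sign change'
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Hand-picked brackets around each minimum
minima = [bisect_root(f_prime, 0.0, 0.5), bisect_root(f_prime, 1.2, 2.0)]
for x in minima:
    print('x =', round(x, 3), '| is a minimum:', f_second(x) > 0)
```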

In [0]:
initial_value = 4.
lr = 0.001

x_0 = torch.tensor([initial_value], requires_grad=True)
# Define loss function (not used below: we minimize f(x) directly):
criterion = torch.nn.MSELoss()
# Define optimizer: 
optimizer = optim.SGD([x_0], lr=lr, momentum=0.0) 

In [199]:
for i in range(200):
  print('Iteration:', i + 1)
  optimizer.zero_grad()
  
  output = 2 * x_0 ** 5 - 5 * x_0 ** 4 + x_0 ** 3 + 3 * x_0 ** 2 - x_0
  # loss = criterion(output, torch.tensor([0.]))
  
  output.backward()
  print('x_0 = ', round(x_0.item(), 3), '|', 'x_0_grad = ', round(x_0.grad.item(), 3))
  optimizer.step()
  
  print('out = ', round(output.item(), 3), '|', 'x_0 (after step) = ', round(x_0.item(), 3))
  print('------------------------------')

Iteration: 1
x_0 =  4.0 | x_0_grad =  1351.0
out =  876.0 | x_0 (after step) =  2.649
------------------------------
Iteration: 2
x_0 =  2.649 | x_0_grad =  156.585
out =  51.665 | x_0 (after step) =  2.492
------------------------------
Iteration: 3
x_0 =  2.492 | x_0_grad =  108.832
out =  31.042 | x_0 (after step) =  2.384
------------------------------
Iteration: 4
x_0 =  2.384 | x_0_grad =  82.291
out =  20.687 | x_0 (after step) =  2.301
------------------------------
Iteration: 5
x_0 =  2.301 | x_0_grad =  65.415
out =  14.628 | x_0 (after step) =  2.236
------------------------------
Iteration: 6
x_0 =  2.236 | x_0_grad =  53.778
out =  10.737 | x_0 (after step) =  2.182
------------------------------
Iteration: 7
x_0 =  2.182 | x_0_grad =  45.298
out =  8.078 | x_0 (after step) =  2.137
------------------------------
Iteration: 8
x_0 =  2.137 | x_0_grad =  38.865
out =  6.174 | x_0 (after step) =  2.098
------------------------------
Iteration: 9
x_0 =  2.098 | x_0_grad =  33.

Notice that you will not always reach the closest minimum.
For example, take initial_value = 4., lr = 0.01

The closest minimum is obviously $x_0=1.604$
However, the first iteration gives:
$x_{new} = x_0 - lr \cdot f'(x_0) = 4 - 0.01 \cdot 1351.0 = -9.51$
After that, the iterates run off to $-\infty$
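
To see both behaviours side by side, here is a small plain-Python sketch of gradient descent without momentum on the same function, using the hand-derived gradient. The helper `descend` and its `blowup` threshold are hypothetical names introduced only for this illustration. The loop stops early once `|x|` grows past the threshold, so the diverging run does not overflow.

```python
def grad_f(x):
    # Derivative of f(x) = 2x^5 - 5x^4 + x^3 + 3x^2 - x
    return 10 * x**4 - 20 * x**3 + 3 * x**2 + 6 * x - 1


def descend(x, lr, max_steps=2000, blowup=1e6):
    """Plain gradient descent; returns every visited point, including the start."""
    path = [x]
    for _ in range(max_steps):
        x = x - lr * grad_f(x)
        path.append(x)
        if abs(x) > blowup:  # treat this as divergence and stop
            break
    return path


small_lr_path = descend(4.0, lr=0.001)
large_lr_path = descend(4.0, lr=0.01)

print('lr=0.001 ends near', round(small_lr_path[-1], 3))
print('lr=0.01 first step:', round(large_lr_path[1], 2))
print('lr=0.01 diverged after', len(large_lr_path) - 1, 'steps')
```

With the smaller learning rate the iterates settle near the minimum at about 1.604. With the larger one, the very first step overshoots to about -9.51, where the gradient is large and positive, so each later step pushes x further towards $-\infty$.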