## Loss functions and their gradients

### Continuous functions

#### MSE (mean squared error)

In [25]:
import torch
from torch.nn import functional as F

In [26]:
x = torch.ones(1)
x

tensor([1.])

In [27]:
w = torch.full([1],2.0,requires_grad = True)
w

tensor([2.], requires_grad=True)

In [28]:
mse = F.mse_loss(x*w,torch.ones(1))
mse

tensor(1., grad_fn=<MseLossBackward0>)
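Because `F.mse_loss` uses `reduction='mean'` by default, the loss and its gradient with respect to $w$ can be checked by hand. Here $y$ stands for the target `torch.ones(1)` and $N$ for the number of elements:

$$
\mathcal{L}(w) = \frac{1}{N}\sum_{i=1}^{N}(x_i w - y_i)^2,
\qquad
\frac{\partial \mathcal{L}}{\partial w} = \frac{2}{N}\sum_{i=1}^{N}(x_i w - y_i)\,x_i
$$

$$
N = 1,\ x = 1,\ w = 2,\ y = 1:\qquad
\mathcal{L} = (2 - 1)^2 = 1,\qquad
\frac{\partial \mathcal{L}}{\partial w} = 2\,(2 - 1)\cdot 1 = 2
$$

The value 1 matches the `mse` printed above, and 2 is the gradient autograd returns in the cells that follow.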

### Autograd, method 1: `torch.autograd.grad`

In [30]:
torch.autograd.grad(mse,[w])

(tensor([2.]),)
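With a single element the `1/N` factor of the mean reduction is invisible. A minimal sketch with a hypothetical 3-element input (values chosen only for illustration) compares `torch.autograd.grad` against the closed-form gradient $\frac{2}{N}\sum_i (x_i w - y_i)\,x_i$:

```python
import torch
from torch.nn import functional as F

# Hypothetical 3-element batch; the values are illustrative only
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.ones(3)
w = torch.full([1], 2.0, requires_grad=True)

# F.mse_loss defaults to reduction='mean': (1/N) * sum((x*w - y)^2)
mse = F.mse_loss(x * w, y)
(grad_autograd,) = torch.autograd.grad(mse, [w])

# Closed-form gradient: (2/N) * sum((x*w - y) * x), computed without tracking
with torch.no_grad():
    grad_manual = 2.0 / x.numel() * ((x * w - y) * x).sum()

print(mse.item())     # approximately 35/3
print(grad_autograd)  # approximately 44/3, shape [1] like w
print(grad_manual)    # same value as a 0-dim tensor
```

The gradient has shape `[1]` because PyTorch reduces it back to the shape of `w` after broadcasting `x * w`.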

In [31]:
w.requires_grad_()  # mark w as requiring gradients (a no-op here: w was already created with requires_grad=True)

tensor([2.], requires_grad=True)
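`requires_grad_()` only matters when a tensor was created without `requires_grad=True`. A minimal sketch of that case, assuming the same values as above but with the flag dropped from `torch.full`: the first `torch.autograd.grad` call raises a `RuntimeError` because nothing in the graph tracks gradients, and it succeeds once the flag is set and the loss is recomputed.

```python
import torch
from torch.nn import functional as F

x = torch.ones(1)
w = torch.full([1], 2.0)  # assumption for illustration: created WITHOUT requires_grad

mse = F.mse_loss(x * w, torch.ones(1))
try:
    torch.autograd.grad(mse, [w])
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)

w.requires_grad_()                      # turn on gradient tracking in place
mse = F.mse_loss(x * w, torch.ones(1))  # rebuild the loss so the graph records w
(grad_w,) = torch.autograd.grad(mse, [w])
print(grad_w)  # tensor([2.])
```

Setting the flag alone is not enough: the old `mse` was computed before tracking was enabled, so it must be recomputed.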

In [32]:
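# Alternative: w = w.detach().clone().requires_grad_(True)
# (torch.tensor(w, requires_grad=True) also works but warns about copy-constructing from a tensor)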

In [33]:
mse = F.mse_loss(x*w,torch.ones(1))  # rebuild: the previous autograd.grad call freed the old graph
mse

tensor(1., grad_fn=<MseLossBackward0>)

In [34]:
torch.autograd.grad(mse,[w])

(tensor([2.]),)
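Recomputing `mse` before calling `torch.autograd.grad` again is required, not just tidy: by default (`retain_graph=False`) the call frees the intermediate values saved in the graph, so differentiating the same `mse` object a second time fails. A minimal sketch of the failure and of the `retain_graph=True` workaround:

```python
import torch
from torch.nn import functional as F

x = torch.ones(1)
w = torch.full([1], 2.0, requires_grad=True)

mse = F.mse_loss(x * w, torch.ones(1))
torch.autograd.grad(mse, [w])  # frees the graph's saved tensors by default
try:
    torch.autograd.grad(mse, [w])  # second pass over the same, already freed graph
    second_ok = True
except RuntimeError:
    second_ok = False

# Keep the graph alive explicitly when it must be differentiated more than once
mse = F.mse_loss(x * w, torch.ones(1))
g1 = torch.autograd.grad(mse, [w], retain_graph=True)
g2 = torch.autograd.grad(mse, [w])  # last use, so the graph may be freed now
print(second_ok, g1, g2)
```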

### Autograd, method 2: `backward()`

In [43]:
x = torch.ones(1)
x

tensor([1.])

In [44]:
w = torch.full([1],2.0,requires_grad = True)
w

tensor([2.], requires_grad=True)

In [45]:
w.grad  # None: no backward pass has run yet, so nothing is displayed

In [46]:
mse = F.mse_loss(x*w,torch.ones(1))
mse

tensor(1., grad_fn=<MseLossBackward0>)

In [47]:
mse.backward()

In [49]:
w.grad  # now populated by mse.backward()

tensor([2.])
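Unlike `torch.autograd.grad`, which returns the gradient, `backward()` adds it into `w.grad`. Repeated backward passes therefore accumulate, which is why training loops reset gradients (typically with `optimizer.zero_grad()`). A minimal sketch with the same `x` and `w`:

```python
import torch
from torch.nn import functional as F

x = torch.ones(1)
w = torch.full([1], 2.0, requires_grad=True)

for step in range(2):
    mse = F.mse_loss(x * w, torch.ones(1))  # fresh graph each iteration
    mse.backward()
    print(step, w.grad)  # 0 tensor([2.]), then 1 tensor([4.]): gradients add up

after_two = w.grad.clone()
w.grad.zero_()  # reset in place before the next backward pass
print(w.grad)   # tensor([0.])
```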

## Softmax function

In [55]:
a = torch.rand(3,requires_grad = True)
a

tensor([0.1043, 0.4946, 0.3141], requires_grad=True)

In [56]:
p = F.softmax(a,dim=0)
p

tensor([0.2695, 0.3981, 0.3324], grad_fn=<SoftmaxBackward0>)
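For reference, the softmax and its partial derivatives, with indices starting at 0 to match `p[0]`, `p[1]`, `p[2]` in the code:

$$
p_i = \frac{e^{a_i}}{\sum_{k} e^{a_k}},
\qquad
\frac{\partial p_i}{\partial a_j} = p_i\,(\delta_{ij} - p_j) =
\begin{cases}
p_i\,(1 - p_i), & i = j \\
-\,p_i\, p_j, & i \neq j
\end{cases}
$$

Plugging in the printed values: $\partial p_1/\partial a_1 = 0.3981 \times (1 - 0.3981) \approx 0.2396$ and $\partial p_1/\partial a_0 = -0.3981 \times 0.2695 \approx -0.1073$. The diagonal terms are positive and the off-diagonal terms negative.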

In [58]:
p.backward()  # fails: p is a 3-element vector, not a scalar

RuntimeError: grad can be implicitly created only for scalar outputs
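`backward()` without arguments only works for a scalar output. For a vector output like `p`, two common workarounds are sketched below (with an arbitrary seed, so the numbers differ from those printed here): pass a `gradient` tensor `v` of the same shape as `p`, which makes PyTorch compute the vector-Jacobian product $v^\top J$, or call `backward()` on one scalar element.

```python
import torch
from torch.nn import functional as F

torch.manual_seed(0)  # arbitrary seed; values differ from the cells above
a = torch.rand(3, requires_grad=True)
p = F.softmax(a, dim=0)

# Option 1: pass the upstream gradient v explicitly; backward() then
# accumulates the vector-Jacobian product v^T J into a.grad.
v = torch.tensor([0.0, 1.0, 0.0])  # one-hot vector: selects the row for p[1]
p.backward(v, retain_graph=True)
row_from_vjp = a.grad.clone()

# Option 2: backpropagate from a single scalar element instead.
a.grad = None  # clear first, otherwise the two results would be summed
p[1].backward()
row_from_scalar = a.grad.clone()

# Both should equal p[1] * (one_hot(1) - p), the closed form for dp_1/da
expected = (p[1] * (v - p)).detach()
print(row_from_vjp, row_from_scalar, expected)
```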

In [65]:
p = F.softmax(a,dim=0)
p

tensor([0.2695, 0.3981, 0.3324], grad_fn=<SoftmaxBackward0>)

In [66]:
torch.autograd.grad(p[1],a,retain_graph=True)

(tensor([-0.1073,  0.2396, -0.1323]),)

In [67]:
torch.autograd.grad(p[2],a,retain_graph=True)

(tensor([-0.0896, -0.1323,  0.2219]),)

In [68]:
torch.autograd.grad(p[0],a,retain_graph=True)

(tensor([ 0.1969, -0.1073, -0.0896]),)
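Each of the three calls above returns one row of the softmax Jacobian, and the outputs show it is symmetric (for example, $-0.1073$ appears both as $\partial p_1/\partial a_0$ and as $\partial p_0/\partial a_1$). A minimal sketch that stacks the rows the same way and compares them with the closed form $J = \operatorname{diag}(p) - p\,p^\top$, using an arbitrary seed:

```python
import torch
from torch.nn import functional as F

torch.manual_seed(0)  # arbitrary seed, for reproducibility only
a = torch.rand(3, requires_grad=True)
p = F.softmax(a, dim=0)

# Row i of the Jacobian is dp[i]/da, exactly as in the cells above.
rows = [torch.autograd.grad(p[i], a, retain_graph=True)[0] for i in range(p.numel())]
jac_autograd = torch.stack(rows)  # shape (3, 3), entry [i, j] = dp_i/da_j

# Closed form: J = diag(p) - p p^T
p_d = p.detach()
jac_closed = torch.diag(p_d) - torch.outer(p_d, p_d)

print(jac_autograd)
print(torch.allclose(jac_autograd, jac_closed, atol=1e-6))  # True
```

Each column of $J$ also sums to zero, since $\sum_i p_i = 1$ no matter what $a$ is.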

In [69]:
p.backward(retain_graph=True)  # still fails: retain_graph keeps the graph alive but does not make p a scalar

RuntimeError: grad can be implicitly created only for scalar outputs
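Two related points, sketched under an arbitrary seed. Collapsing `p` to a scalar with `sum()` makes `backward()` run, but the result is useless because the softmax outputs always sum to 1, so the gradient is (numerically) zero. To get the whole matrix that the row-by-row `torch.autograd.grad` calls build, `torch.autograd.functional.jacobian` computes it in one call:

```python
import torch
from torch.nn import functional as F
from torch.autograd.functional import jacobian

torch.manual_seed(0)  # arbitrary seed, for reproducibility only
a = torch.rand(3, requires_grad=True)

# Pitfall: sum(softmax(a)) == 1 for every a, so its gradient is ~0
F.softmax(a, dim=0).sum().backward()
print(a.grad)  # entries at or very near 0

# Full Jacobian in one call; entry [i, j] = d softmax(a)_i / d a_j
jac = jacobian(lambda t: F.softmax(t, dim=0), a)
print(jac.shape)  # torch.Size([3, 3])
```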