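torch.nn offers two kinds of interface: modules and their functional counterparts. As a rule of thumb, use modules for operations that carry many parameters, such as layers, and use the functional versions for operations with few or no parameters, such as activation functions and pooling. PyTorch lets us extend torch.nn and torch.autograd, and write custom C extensions on top of its C libraries. Building a **Linear** operation takes two parts: a Module and a Function.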
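The difference between the two interfaces can be sketched as follows. This is a minimal illustration using the built-in nn.Linear, F.linear, nn.ReLU and F.relu; the tensor shapes are arbitrary choices made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 4)

# Module interface: the layer object owns its parameters (weight and bias).
layer = nn.Linear(4, 3)
param_names = [name for name, _ in layer.named_parameters()]

# Functional interface: stateless, so the caller passes the tensors explicitly.
out_module = layer(x)
out_functional = F.linear(x, layer.weight, layer.bias)

# Parameter-free operations behave the same in both forms.
relu_module = nn.ReLU()(out_module)
relu_functional = F.relu(out_module)

print(param_names)
print(torch.equal(out_module, out_functional))
print(torch.equal(relu_module, relu_functional))
```

The module form is convenient when there is state to track; the functional form avoids creating an object when there is nothing to store.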

In [9]:
import torch.nn as nn
import torch

In [4]:
class Linear(nn.Module):
    def __init__(self, input_features, output_features,bias=True):
        super(Linear, self).__init__()
        self.input_features = input_features
        self.output_features = output_features
        # nn.Parameter is a special kind of Tensor that is automatically registered as a parameter of the
        # Module once it is assigned as an attribute. Parameters and buffers need to be registered, or they
        # won't be converted when e.g. .cuda() is called. You can use .register_buffer() to register buffers.
        # nn.Parameters require gradients by default.
        
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(output_features))
        else:
            # You should always register all possible parameters, but the optional ones can be None if you want.
            self.register_parameter('bias',None)
        
        self.weight.data.uniform_(-0.1,0.1)
        # `bias` is a bool here, so test the registered parameter instead
        if self.bias is not None:
            self.bias.data.uniform_(-0.1,0.1)
    
    def forward(self,input):
        # See the autograd section for an explanation of what happens here.
        return LinearFunction.apply(input, self.weight,self.bias)
    
    def extra_repr(self):
        # (optional) Set the extra information about this module. You can test it by printing an object of this class
        return 'input_features={}, output_features={}, bias={}'.format(
            self.input_features, self.output_features, self.bias is not None
        )
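
Why registration matters can be seen in a small sketch. The Scale module below is hypothetical and exists only for illustration; .double() stands in for .cuda() so that the example runs on a CPU-only machine.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):  # hypothetical module, for illustration only
    def __init__(self, features, bias=True):
        super().__init__()
        # Assigning an nn.Parameter as an attribute registers it automatically.
        self.weight = nn.Parameter(torch.ones(features))
        if bias:
            self.bias = nn.Parameter(torch.zeros(features))
        else:
            # Registered but empty: the attribute exists and is None.
            self.register_parameter('bias', None)
        # A buffer is module state that is not a trainable parameter.
        self.register_buffer('running_mean', torch.zeros(features))
        # A plain tensor attribute is NOT registered with the module.
        self.plain = torch.zeros(features)

    def forward(self, x):
        out = (x - self.running_mean) * self.weight
        return out if self.bias is None else out + self.bias


m = Scale(3, bias=False)
param_names = [name for name, _ in m.named_parameters()]
state_keys = list(m.state_dict().keys())

# Conversions such as .double() or .cuda() only reach registered tensors.
m.double()
print(param_names, state_keys)
print(m.weight.dtype, m.running_mean.dtype, m.plain.dtype)
```

The unregistered attribute keeps its original dtype after the conversion, which is the failure mode the comment in the Linear class warns about.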
        

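The module definition above registers its parameters (buffers would be registered in the same way), and defines the forward pass and an optional extra_repr method. To add the new operation to autograd so that it can be differentiated automatically, we also need to write the corresponding **Function**. autograd uses Functions to compute results and gradients and to record the operation history. Every new Function must implement at least two methods: forward(), which performs the computation, and backward(), which computes the gradients.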

In [None]:
class LinearFunction(torch.autograd.Function): # Inherit from Function
    # note that both forward and backward are @staticmethods
    @staticmethod
    def forward(ctx, input,weight,bias=None):
        ctx.save_for_backward(input,weight,bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output
    
    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx,grad_output):
        # This is a very convenient pattern: at the top of backward, unpack
        # saved_tensors and initialize all gradients to None.
        # Thanks to the fact that additional trailing Nones are ignored, the return statement is simple
        # even when the function has optional inputs.
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None
        
        # These needs_input_grad checks are optional and there only to improve efficiency
        # if you want to make your code simpler, you can skip them.
        # Returning gradients for inputs that don't require them is not an error
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)
        
        return grad_input, grad_weight, grad_bias

To use a custom Function, call its apply() method: linear = LinearFunction.apply
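As a rough end-to-end sketch of calling a custom Function through apply(), the block below defines a compact stand-in (SketchLinear, a hypothetical name, with a mandatory bias for brevity) and compares its gradients with those of the built-in F.linear. It is not the LinearFunction above, only a shortened variant so that the example is self-contained.

```python
import torch
import torch.nn.functional as F


class SketchLinear(torch.autograd.Function):  # hypothetical, shortened variant
    @staticmethod
    def forward(ctx, input, weight, bias):
        ctx.save_for_backward(input, weight)
        return input.mm(weight.t()) + bias

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        # One gradient per forward argument: input, weight, bias.
        return grad_output.mm(weight), grad_output.t().mm(input), grad_output.sum(0)


linear = SketchLinear.apply

x = torch.randn(5, 4, dtype=torch.double, requires_grad=True)
w = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
b = torch.randn(3, dtype=torch.double, requires_grad=True)

# Gradients from the custom Function.
out = linear(x, w, b)
out.sum().backward()
custom_grads = [x.grad.clone(), w.grad.clone(), b.grad.clone()]

# Reference gradients from the built-in functional linear.
x.grad = w.grad = b.grad = None
F.linear(x, w, b).sum().backward()
reference_grads = [x.grad, w.grad, b.grad]

matches = all(torch.allclose(c, r) for c, r in zip(custom_grads, reference_grads))
print(matches)
```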

Here is another example, this time a Function with no learnable parameters:

In [12]:
class MulConstant(torch.autograd.Function):
    @staticmethod
    def forward(ctx,tensor,constant):
        # ctx is a context object that can be used to stash information for backward computation
        ctx.constant = constant
        return tensor * constant

    @staticmethod
    def backward(ctx, grad_output):
        # We return as many input gradients as there were arguments.
        # Gradients of Non-Tensor arguments to forward must be None
        return grad_output * ctx.constant, None

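The tensors passed to backward() can also track history, so if backward() is built from differentiable operations, higher-order derivatives are available as well.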
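A small sketch of this, assuming a hypothetical Cube function whose backward() uses only ordinary tensor operations. Passing create_graph=True asks autograd to record the backward pass itself, so the first derivative can be differentiated again.

```python
import torch


class Cube(torch.autograd.Function):  # hypothetical example function
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Built from differentiable tensor ops, so autograd can differentiate it again.
        return grad_output * 3 * x ** 2


x = torch.tensor([1.0, 2.0], dtype=torch.double, requires_grad=True)
y = Cube.apply(x).sum()

# First derivative, with the backward pass recorded in the graph.
(first,) = torch.autograd.grad(y, x, create_graph=True)
# Second derivative, obtained by differentiating the first one.
(second,) = torch.autograd.grad(first.sum(), x)

print(first)   # equals 3 * x**2
print(second)  # equals 6 * x
```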

You can check that your backward() is correct by comparing the gradients it returns against numerical approximations:

In [None]:
from torch.autograd import gradcheck
# gradcheck takes a tuple of tensors as input, checks whether the gradients evaluated with these tensors
# are close enough to numerical approximations, and returns True if they all satisfy this condition
linear = LinearFunction.apply
input = (torch.randn(20,20,dtype=torch.double,requires_grad=True),torch.randn(30,20,dtype=torch.double,requires_grad=True))
test = gradcheck(linear,input,eps=1e-6, atol=1e-4)
print(test)
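
To see what gradcheck protects against, the sketch below runs it on two hypothetical Functions: one with a correct backward() and one with a deliberately wrong one. It assumes the raise_exception=False option, which makes gradcheck return False instead of raising on a mismatch.

```python
import torch
from torch.autograd import gradcheck


class GoodDouble(torch.autograd.Function):  # hypothetical: correct gradient
    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 2


class BadDouble(torch.autograd.Function):  # hypothetical: wrong gradient on purpose
    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 3  # should be 2


# Double precision keeps the finite-difference estimate accurate.
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)

good_ok = gradcheck(GoodDouble.apply, (x,), eps=1e-6, atol=1e-4)
bad_ok = gradcheck(BadDouble.apply, (x,), eps=1e-6, atol=1e-4, raise_exception=False)
print(good_ok, bad_ok)
```

Left at its default, raise_exception=True reports the mismatching Jacobians in the error message, which is usually more helpful while debugging.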