Create a cfg folder and download the configuration file into it:
```
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
```

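Next, write a parse_cfg function to parse the cfg we just downloaded. This cfg is the YOLOv3 author's own configuration format (darknet is written in C). It is not one of the config formats commonly used in Python, so we have to write a somewhat unusual parser by hand.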
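To see concretely why a standard parser does not fit, here is a small sketch that feeds Python's built-in `configparser` a made-up excerpt written in the same style as yolov3.cfg (the excerpt is illustrative, not copied from the real file). The key problem is that the same section name, such as `[convolutional]`, repeats once per layer, and the order of those repeats is the network structure.

```python
import configparser

# A made-up excerpt in the same style as yolov3.cfg: the
# [convolutional] section name repeats, once per layer.
CFG_TEXT = """
[net]
width=608
height=608

[convolutional]
filters=32
size=3

[convolutional]
filters=64
size=3
"""


def load(text, strict):
    """Parse text with configparser; return the parser, or the error name on failure."""
    parser = configparser.ConfigParser(strict=strict)
    try:
        parser.read_string(text)
    except configparser.DuplicateSectionError as err:
        return type(err).__name__
    return parser


# strict=True (the default) rejects the repeated section outright
print(load(CFG_TEXT, strict=True))          # DuplicateSectionError

# strict=False merges both [convolutional] layers into a single section
merged = load(CFG_TEXT, strict=False)
print(merged.sections())                    # ['net', 'convolutional']
print(merged['convolutional']['filters'])   # 64 (the first layer's value is lost)
```

Either way the per-layer sequence is gone, which is why parse_cfg below keeps every block as its own dictionary in a list.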

Create a darknet.py file to hold the network-construction code.

```python
from __future__ import division

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
```

```python
def parse_cfg(cfgfile):
    """
    Takes a configuration file.

    Returns a list of blocks. Each block describes one block of the neural
    network to be built and is represented as a dictionary in the list.
    """
    with open(cfgfile, 'r') as file:
        lines = file.read().split('\n')            # store the lines in a list
    lines = [x.strip() for x in lines]             # strip fringe whitespace first
    lines = [x for x in lines if len(x) > 0]       # get rid of empty lines
    lines = [x for x in lines if x[0] != '#']      # get rid of comments

    block = {}
    blocks = []

    for line in lines:
        if line[0] == '[':                  # this marks the start of a new block
            if len(block) != 0:             # a non-empty block holds the previous block's values
                blocks.append(block)        # add it to the blocks list
                block = {}                  # re-init the block
            block['type'] = line[1:-1].strip()
        else:
            key, value = line.split('=', 1)  # split on the first '=' only
            block[key.rstrip()] = value.lstrip()
    if block:
        blocks.append(block)

    return blocks
```

Take a look at the parsed result:

```python
parse_cfg('./src/cfg/yolov3.cfg')
```

```
[{'type': 'net',
  'batch': '64',
  'subdivisions': '16',
  'width': '608',
  'height': '608',
  'channels': '3',
  'momentum': '0.9',
  'decay': '0.0005',
  'angle': '0',
  'saturation': '1.5',
  'exposure': '1.5',
  'hue': '.1',
  'learning_rate': '0.001',
  'burn_in': '1000',
  'max_batches': '500200',
  'policy': 'steps',
  'steps': '400000,450000',
  'scales': '.1,.1'},
 {'type': 'convolutional',
  'batch_normalize': '1',
  'filters': '32',
  'size': '3',
  'stride': '1',
  'pad': '1',
  'activation': 'leaky'},
 {'type': 'convolutional',
  'batch_normalize': '1',
  'filters': '64',
  'size': '3',
  'stride': '2',
  'pad': '1',
  'activation': 'leaky'},
 {'type': 'convolutional',
  'batch_normalize': '1',
  'filters': '32',
  'size': '1',
  'stride': '1',
  'pad': '1',
  'activation': 'leaky'},
 {'type': 'convolutional',
  'batch_normalize': '1',
  'filters': '64',
  'size': '3',
  'stride': '1',
  'pad': '1',
  'activation': 'leaky'},
 {'type': 'shortcut', 'from': '-3', 'activatio
```

Compare this with the network architecture diagram from the paper to confirm the structure matches:
![](https://pic1.zhimg.com/80/v2-770e443d1ad592a70bdf31868036a3fc_1440w.jpg)
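Comparing by eye is error-prone, so one option is to tally the parsed blocks by type first. A minimal sketch, where `count_block_types` is a hypothetical helper (not part of the original code) demonstrated on a hand-made list shaped like parse_cfg's output. On the real file you would call `count_block_types(parse_cfg('./src/cfg/yolov3.cfg'))`; for yolov3.cfg the tally should come out as 75 convolutional, 23 shortcut, 4 route, 2 upsample and 3 yolo blocks.

```python
from collections import Counter


def count_block_types(blocks):
    """Tally parsed blocks by type, skipping the leading [net] block."""
    return Counter(block['type'] for block in blocks[1:])


# A hand-made list in the shape parse_cfg returns (values trimmed for brevity)
sample = [
    {'type': 'net', 'width': '608'},
    {'type': 'convolutional', 'filters': '32'},
    {'type': 'convolutional', 'filters': '64'},
    {'type': 'convolutional', 'filters': '32'},
    {'type': 'convolutional', 'filters': '64'},
    {'type': 'shortcut', 'from': '-3'},
]
print(count_block_types(sample))  # Counter({'convolutional': 4, 'shortcut': 1})
```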

Define two layers: an EmptyLayer, used as a placeholder for route and shortcut, and a DetectionLayer, used to predict the detection bounding boxes (bbox).

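```python
class EmptyLayer(nn.Module):
    # placeholder for route and shortcut; the actual work happens in the network's forward
    def __init__(self):
        super(EmptyLayer, self).__init__()


class DetectionLayer(nn.Module):
    # stores the anchors a [yolo] block uses to decode bounding boxes
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors
```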
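Each [yolo] block lists all nine anchors but uses only the three picked out by its `mask`, and only those three end up in its DetectionLayer. A small sketch of that selection step, with a hypothetical `select_anchors` helper that mirrors the anchor-handling lines inside create_modules; the anchor string below is the nine default YOLOv3 anchors in the format yolov3.cfg stores them.

```python
def select_anchors(anchors_str, mask_str):
    """Pair the flat anchor list into (w, h) tuples and keep the masked ones."""
    values = [int(x) for x in anchors_str.split(',')]   # int() tolerates the extra spaces
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    mask = [int(x) for x in mask_str.split(',')]
    return [pairs[i] for i in mask]


ANCHORS = '10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326'
print(select_anchors(ANCHORS, '6,7,8'))  # [(116, 90), (156, 198), (373, 326)]
print(select_anchors(ANCHORS, '0,1,2'))  # [(10, 13), (16, 30), (33, 23)]
```

The largest anchors go to the coarsest feature map, which has the largest receptive field, and the smallest anchors go to the finest one.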

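Both layers are trivial. A route layer either passes through the feature map of one earlier layer or concatenates two earlier feature maps along the channel dimension, and a shortcut layer adds two feature maps element-wise. Both operations are simple enough to implement directly in the main network's forward, so for now these modules only hold a place in the module list.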
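As a preview of what that forward code has to do, here is a minimal sketch assuming the forward pass caches every layer's output in a dict keyed by layer index. The `outputs` cache and the `route`/`shortcut` helper names are hypothetical, not taken from the original code.

```python
import torch


def route(outputs, i, layers):
    """Concatenate earlier feature maps along the channel dimension (dim=1).

    outputs: dict mapping layer index -> that layer's output tensor
    layers:  indices from the cfg; negative = relative to i, otherwise absolute
    """
    maps = [outputs[i + l] if l < 0 else outputs[l] for l in layers]
    return torch.cat(maps, dim=1)   # with a single layer this just returns that map


def shortcut(outputs, i, frm):
    """Add the previous layer's output to the output of layer i + frm."""
    return outputs[i - 1] + outputs[i + frm]


# Fake outputs of layers 0-2 in NCHW layout; layer 3 is the one being computed
outputs = {
    0: torch.randn(1, 64, 52, 52),
    1: torch.randn(1, 32, 52, 52),
    2: torch.randn(1, 64, 52, 52),
}
print(route(outputs, 3, [-1, 1]).shape)  # torch.Size([1, 96, 52, 52])
print(shortcut(outputs, 3, -3).shape)    # torch.Size([1, 64, 52, 52])
```

Neither operation has learnable parameters, which is why an empty placeholder module is enough at construction time.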

Next, we use the parsed cfg parameters to build the network modules. For this we define a create_modules() function.

```python
def create_modules(blocks):
    """Build an nn.ModuleList from the blocks returned by parse_cfg.

    Returns (net_info, module_list); net_info is the [net] block, which holds
    information about the input and pre-processing rather than a layer.
    """
    net_info = blocks[0]
    module_list = nn.ModuleList()
    prev_filters = 3        # the input is an RGB image, so it has 3 channels
    output_filters = []     # output channel count of every layer built so far

    for idx, each_block in enumerate(blocks[1:]):
        # one nn.Sequential per block, appended to module_list
        module = nn.Sequential()
        block_type = each_block['type']

        if block_type == 'convolutional':
            # batch_normalize is absent on the convs that feed a [yolo] block;
            # those have no BN and therefore need a bias term
            bn = int(each_block.get('batch_normalize', 0))
            bias = not bn
            filters = int(each_block['filters'])
            size = int(each_block['size'])
            stride = int(each_block['stride'])
            # pad=1 in the cfg means padding of (size - 1) // 2
            pad = (size - 1) // 2 if int(each_block['pad']) else 0
            activation = each_block['activation']

            conv = nn.Conv2d(prev_filters, filters, size, stride, pad, bias=bias)
            module.add_module('conv_{}'.format(idx), conv)

            if bn:
                module.add_module('bn_{}'.format(idx), nn.BatchNorm2d(filters))

            # activation is either 'leaky' or 'linear' (linear = no activation module)
            if activation == 'leaky':
                module.add_module('leaky_{}'.format(idx), nn.LeakyReLU(0.1, inplace=True))

        elif block_type == 'upsample':
            # darknet's upsample layer repeats pixels, i.e. nearest-neighbour
            stride = int(each_block['stride'])
            upsample = nn.Upsample(scale_factor=stride, mode='nearest')
            module.add_module('upsample_{}'.format(idx), upsample)

        elif block_type == 'route':
            # 'layers' holds one or two indices: negative values are relative to
            # the current layer, non-negative values are absolute. Convert them
            # all to relative offsets so that idx + offset indexes output_filters.
            layers = [int(x) for x in each_block['layers'].split(',')]
            layers = [l - idx if l >= 0 else l for l in layers]
            module.add_module('route_{}'.format(idx), EmptyLayer())
            # concatenation along channels sums the channel counts
            filters = sum(output_filters[idx + l] for l in layers)

        elif block_type == 'shortcut':
            # element-wise addition keeps the previous layer's channel count
            module.add_module('shortcut_{}'.format(idx), EmptyLayer())

        elif block_type == 'yolo':
            mask = [int(x) for x in each_block['mask'].split(',')]
            anchors = [int(x) for x in each_block['anchors'].split(',')]
            anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
            anchors = [anchors[i] for i in mask]   # keep only this scale's anchors
            module.add_module('detection_{}'.format(idx), DetectionLayer(anchors))

        module_list.append(module)
        # upsample, shortcut and yolo don't change the channel count, so
        # `filters` still holds the previous layer's value for them
        prev_filters = filters
        output_filters.append(filters)

    return (net_info, module_list)
```
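The padding rule `(size - 1) // 2` is chosen so that, for odd kernel sizes, a stride-1 convolution keeps the spatial size and a stride-2 convolution halves it. This follows from the standard output-size formula $\lfloor (H + 2p - k)/s \rfloor + 1$. A quick check with plain `nn.Conv2d` layers built the same way as the first three layers in the printout further down; `same_pad` is a hypothetical helper and the 416×416 input is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn


def same_pad(size):
    # the padding create_modules uses when the cfg says pad=1
    return (size - 1) // 2


x = torch.randn(1, 3, 416, 416)
conv_s1 = nn.Conv2d(3, 32, 3, 1, same_pad(3), bias=False)    # 3x3, stride 1
conv_s2 = nn.Conv2d(32, 64, 3, 2, same_pad(3), bias=False)   # 3x3, stride 2
conv_1x1 = nn.Conv2d(64, 32, 1, 1, same_pad(1), bias=False)  # 1x1 -> padding 0

y1 = conv_s1(x)
y2 = conv_s2(y1)
y3 = conv_1x1(y2)
print(y1.shape)  # torch.Size([1, 32, 416, 416])
print(y2.shape)  # torch.Size([1, 64, 208, 208])
print(y3.shape)  # torch.Size([1, 32, 208, 208])
```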

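A few simple tricks keep the code concise and readable, for example converting absolute route indices into offsets relative to the current layer. A couple of read-throughs should make them clear.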
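The route-index trick is easiest to see in isolation. A minimal sketch, using a hypothetical `route_filters` helper and a made-up channel history, of how a [route] block's output channel count is derived from `output_filters`:

```python
def route_filters(layers_str, idx, output_filters):
    """Channel count produced by a [route] block at position idx.

    layers_str:     the cfg value, e.g. '-1, 61'
    output_filters: channel counts of layers 0 .. idx-1
    """
    layers = [int(x) for x in layers_str.split(',')]
    # absolute (non-negative) indices -> offsets relative to idx
    offsets = [l - idx if l >= 0 else l for l in layers]
    return sum(output_filters[idx + off] for off in offsets)


# Made-up history: layers 0-4 produced these channel counts
history = [32, 64, 128, 256, 128]
print(route_filters('-4', 5, history))     # 64   (layer 1 only)
print(route_filters('-1, 1', 5, history))  # 192  (layer 4 + layer 1)
```

Once every index is an offset, `idx + offset` always lands on an earlier layer, so the same expression handles both the relative and the absolute forms found in the cfg.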

Finally, a short snippet to check that the network was built correctly:

```python
blocks = parse_cfg("src/cfg/yolov3.cfg")
print(create_modules(blocks))
```

({'type': 'net', 'batch': '64', 'subdivisions': '16', 'width': '608', 'height': '608', 'channels': '3', 'momentum': '0.9', 'decay': '0.0005', 'angle': '0', 'saturation': '1.5', 'exposure': '1.5', 'hue': '.1', 'learning_rate': '0.001', 'burn_in': '1000', 'max_batches': '500200', 'policy': 'steps', 'steps': '400000,450000', 'scales': '.1,.1'}, ModuleList(
  (0): Sequential(
    (conv_0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn_0): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (leaky_0): LeakyReLU(negative_slope=0.1, inplace=True)
  )
  (1): Sequential(
    (conv_1): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (leaky_1): LeakyReLU(negative_slope=0.1, inplace=True)
  )
  (2): Sequential(
    (conv_2): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn_2): BatchN