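Basic data structure:</br>
struct Matrix </br>
{</br>
    (int) channelSize : number of channels in the feature matrix</br>
    (int) rowSize : number of rows in the feature matrix</br>
    (int) columnSize : number of columns in the feature matrix</br>
    (float ***) feature : values stored in the feature matrix</br>
}</br>
Definitions: throughout this document, num is the batch size; i is the number of channels of the feature matrix; j is its number of rows; k is its number of columns; x is the number of convolution kernels;
    y is the number of kernel rows; z is the number of kernel columns.

<b>The following sections analyze the functions used for basic feature-matrix computations</br>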

In [3]:
import random
class Matrix:
    channelSize = 0
    rowSize = 0
    columnSize = 0
    feature =[[[]]]

    def __init__(self,func,channelSize, rowSize , columnSize):
        self.channelSize = channelSize
        self.rowSize = rowSize
        self.columnSize = columnSize
        if(func == "Zero"):
            self.feature = Matrix.Zero(channelSize, rowSize , columnSize)
        if(func == "Random"):
            self.feature = Matrix.Random(channelSize, rowSize , columnSize)

    def Zero(channel:int ,row:int , column:int):
        output = [[[0 for k in range(column)] for j in range(row)] for i in range(channel)]
        return output

    def Random(channel:int ,row:int , column:int):
        output = [[[random.random() for k in range(column)] for j in range(row)] for i in range(channel)]
        return output


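struct Matrix *Matrix_Init(struct Matrix *(*func)(int ,int,int),int channel ,int row,int column);

Purpose: initializes a feature matrix. *func (Zero or Random) selects the initialization mode; the function allocates space for a Matrix structure and returns a pointer to it. Zero sets every value to 0, while Random fills the matrix with floating-point values between 0 and 1.

Time complexity O(N): 2 * i * j * k

Example call: struct Matrix *feature = Matrix_Init(Zero,3,6,4)
This initializes a feature matrix of size (3 X 6 X 4) and stores its pointer in the variable feature.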

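void Matrix_Print(struct Matrix *input)<br/>
<br/>
Purpose: prints the feature matrix passed in.<br/>
The header format is:<br/>
channel size = %d , size = %d * %d<br/> 
followed by the two-dimensional matrix of each channel in turn.<br/>

Time complexity O(N): i * j * k<br/>

Example call: Matrix_Print(feature)<br/>
This prints the size of the feature matrix and then every value stored in feature.<br/>

Algorithm demo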

In [4]:
def Matrix_Print(input:Matrix):
    print("channel size =",input.channelSize,"size =", input.rowSize,"* columnSize" , input.columnSize)
    for i in range(input.channelSize):
        print("channel =",i)
        for j in range(input.rowSize):
            for k in range(input.columnSize):
                print("{:.6f}".format(input.feature[i][j][k]),end = " ")
            print()

In [5]:
feature = Matrix("Random",2,4,4)
Matrix_Print(feature)

channel size = 2 size = 4 * columnSize 4
channel = 0
0.463846 0.055004 0.396108 0.269861 
0.957129 0.925554 0.287401 0.154724 
0.495547 0.501303 0.220264 0.879485 
0.763517 0.488933 0.313242 0.296595 
channel = 1
0.981330 0.191742 0.749812 0.960439 
0.534702 0.090567 0.222822 0.280778 
0.069963 0.583073 0.444843 0.235924 
0.061166 0.516728 0.363048 0.215814 


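void Matrix_Convolution(struct Matrix *input,struct Matrix **kernel,struct Matrix *output ,int paddingAmt,int stride)
<br/>
Parameters: (Matrix *)input = feature matrix to be convolved<br/>
        (Matrix **)kernel = convolution kernels (there is usually more than one, so an array of kernels is passed in)<br/>
        (Matrix *)output = feature matrix after convolution<br/>
        (int) paddingAmt = padding size (padding means surrounding input with all-zero features)<br/>
        (int) stride = stride<br/>
Purpose: computes the convolution of input with kernel and stores the result in output; paddingAmt is the padding size and stride is the stride.<br/>
The size of output is:<br/>
number of kernels X (floor((input rows + 2 * paddingAmt - kernel rows) / stride) + 1) X (floor((input columns + 2 * paddingAmt - kernel columns) / stride) + 1)<br/>

Time complexity O(N): i * j * k * x * y * z, where j and k here are the numbers of output rows and columns<br/>

Example call: Matrix_Convolution(feature,kernel,ans,1,2)<br/>
This convolves feature with kernel using a padding of 1 and a stride of 2, and stores the result in ans.<br/>

Algorithm demo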

In [6]:
def Matrix_Convolution(input:Matrix,kernel: list[Matrix], output:Matrix , paddingAmt:int , stride:int):
    channel = input.channelSize
    outputSize = output.rowSize
    kernelSize = kernel[0].rowSize
    kernelAmt = output.channelSize
    for x in range (kernelAmt):
        for j in range (outputSize):
            for k in range (outputSize):
                temp = 0
                for i in range (channel):
                    for y in range (kernelSize):
                        for z in range (kernelSize):
                            rowLoc = j * stride + y - paddingAmt # row index inside input
                            columnLoc = k * stride + z - paddingAmt # column index inside input
                            # positions that fall in the padding region contribute nothing, so skip them
                            if ((rowLoc < 0 or rowLoc >= input.rowSize ) or (columnLoc < 0 or columnLoc >= input.columnSize )):
                                continue
                            # otherwise multiply element-wise and accumulate into the temporary variable
                            temp = temp + input.feature[i][rowLoc][columnLoc] * kernel[x].feature[i][y][z]
                output.feature[x][j][k] = temp # store the accumulated value in the result

In [7]:
input = Matrix("Random",3,5,5)
kernel = [Matrix("Random",3,3,3) for kernelAmt in range (12)]
output = Matrix("Random",12,3,3)
paddingAmt = 1
stride = 2
Matrix_Convolution(input ,kernel, output , paddingAmt , stride)
Matrix_Print(output)

channel size = 12 size = 3 * columnSize 3
channel = 0
2.861483 5.484993 3.656394 
4.415974 8.690598 5.129236 
2.826769 5.639196 3.637814 
channel = 1
3.691501 6.227852 4.485535 
4.842864 8.970283 5.591837 
3.404345 6.639315 3.907929 
channel = 2
3.735665 6.442918 3.365600 
5.387335 7.883470 4.222595 
2.708169 5.333133 2.654628 
channel = 3
3.221355 5.016346 2.650469 
4.689790 8.076442 3.074036 
3.268157 5.565744 2.594296 
channel = 4
3.730161 6.556230 3.147093 
4.424768 9.722346 4.706911 
3.257290 6.638313 3.617179 
channel = 5
2.032599 5.177394 3.587141 
3.252410 7.495872 4.062761 
3.174226 5.053252 2.668802 
channel = 6
3.372055 6.106624 4.086923 
4.384296 9.456740 5.234183 
3.328302 6.694340 4.280468 
channel = 7
2.790497 4.003550 3.275084 
4.312906 7.639253 4.060403 
2.813390 4.683752 3.054389 
channel = 8
3.083744 5.498510 2.109720 
4.160560 9.005725 4.122452 
3.075448 5.891340 3.246418 
channel = 9
3.528448 5.040360 3.766196 
3.742920 7.944966 3.699381 
2.816266 5.198250 2.808086
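
The output-size rule for Matrix_Convolution can be checked with a small helper. This is a minimal sketch: the name conv_output_size is hypothetical and not part of the original library, and it handles one axis at a time.

```python
def conv_output_size(input_size: int, kernel_size: int, padding: int, stride: int) -> int:
    """Output length along one axis: floor((input + 2*padding - kernel) / stride) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Same settings as the demo: 5x5 input, 3x3 kernels, padding 1, stride 2.
print(conv_output_size(5, 3, 1, 2))  # 3
```

With these settings each output channel is 3 X 3, which matches the size allocated for output in the demo.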

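void Matrix_Sum(struct Matrix *input,struct Matrix *addTerm,struct Matrix *output);
<br/>
Parameters: (Matrix *)input = first feature matrix to add<br/>
        (Matrix *)addTerm = feature matrix added to input<br/> 
        (Matrix *)output = feature matrix holding the sum<br/>
Purpose: computes the element-wise sum of input and addTerm and stores it in output. Because all arguments are pointers, passing input as output writes the result back into input.<br/>

Time complexity O(N): i * j * k <br/>

Example call: Matrix_Sum(feature,addTerm,ans)<br/>
This adds feature and addTerm and stores the result in ans.<br/>

Algorithm demo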


In [8]:
def Matrix_Sum(input : Matrix ,addTerm : Matrix ,output : Matrix ):
    channel = input.channelSize
    row = input.rowSize
    column = input.columnSize
    for i in range (channel):
        for j in range (row):
            for k in range (column):
                # plain element-wise addition over the whole feature matrix
                output.feature[i][j][k] = input.feature[i][j][k] + addTerm.feature[i][j][k]

In [9]:
input = Matrix("Zero",2,3,3)
addTerm = Matrix("Random",2,3,3)
Matrix_Sum(input,addTerm,input)
Matrix_Print(input)

channel size = 2 size = 3 * columnSize 3
channel = 0
0.527471 0.007150 0.684503 
0.342191 0.798709 0.917567 
0.810716 0.386779 0.229733 
channel = 1
0.010114 0.449184 0.753212 
0.296020 0.841618 0.994437 
0.809964 0.189660 0.155757 


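void Matrix_Multiply(struct Matrix *input,struct Matrix *mulTerm,struct Matrix *output);
<br/>
Parameters: (Matrix *)input = left-hand feature matrix<br/>
        (Matrix *)mulTerm = right-hand feature matrix<br/> 
        (Matrix *)output = feature matrix holding the product<br/>
Purpose: computes the matrix product of input and mulTerm channel by channel and stores it in output.<br/>

Time complexity O(N): i * j * k * z, where z here is the number of columns of mulTerm<br/>

Example call: Matrix_Multiply(feature,mulTerm,ans)<br/>
This multiplies feature by mulTerm and stores the result in ans.<br/>

Algorithm demo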

In [10]:
def Matrix_Multiply(input : Matrix ,mulTerm : Matrix ,output : Matrix ):
    channel = input.channelSize
    row = input.rowSize
    column = input.columnSize
    mulSize = mulTerm.columnSize
    for i in range (channel):
        for z in range (mulSize):
            for j in range (row):
                temp = 0 # accumulates the row-by-column dot product
                for k in range (column):
                    temp = temp + input.feature[i][j][k] * mulTerm.feature[i][k][z]
                output.feature[i][j][z] = temp

In [11]:
input = Matrix("Random",2,2,3)
mulTerm = Matrix("Random",2,3,2)
ans = Matrix("Zero",2,2,2)
Matrix_Multiply(input,mulTerm,ans)
Matrix_Print(ans)

channel size = 2 size = 2 * columnSize 2
channel = 0
0.958077 1.160086 
0.577696 0.585037 
channel = 1
0.357706 1.168849 
0.446968 1.405853 
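
Because the demo uses random values, the product is hard to verify by eye. The sketch below repeats the same row-by-column rule on one channel with fixed integers; matmul_2d is a hypothetical helper working on plain nested lists, not a function of the original library.

```python
def matmul_2d(a, b):
    """Row-by-column product of a (rows x inner) and b (inner x cols) nested list."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[r][t] * b[t][c] for t in range(inner)) for c in range(cols)]
            for r in range(rows)]


a = [[1, 2, 3], [4, 5, 6]]      # 2 x 3, like one channel of input
b = [[1, 0], [0, 1], [1, 1]]    # 3 x 2, like one channel of mulTerm
print(matmul_2d(a, b))  # [[4, 5], [10, 11]]
```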


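void Matrix_ToZero(struct Matrix *input)<br/>
<br/>
Purpose: sets every value of the feature matrix to 0.<br/>

Time complexity O(N): i * j * k<br/>

Example call: Matrix_ToZero(feature)<br/>
This sets every value stored in the feature matrix feature to 0.<br/>

Algorithm demo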

In [12]:
def Matrix_ToZero(input:Matrix):
    for i in range(input.channelSize):
        for j in range(input.rowSize):
            for k in range(input.columnSize):
                input.feature[i][j][k] = 0

In [13]:
feature = Matrix("Random",2,2,2)
print("before to zero ...")
Matrix_Print(feature)
Matrix_ToZero(feature)
print("after to zero ...")
Matrix_Print(feature)

before to zero ...
channel size = 2 size = 2 * columnSize 2
channel = 0
0.833939 0.873763 
0.824797 0.466952 
channel = 1
0.864338 0.784714 
0.622560 0.446398 
after to zero ...
channel size = 2 size = 2 * columnSize 2
channel = 0
0.000000 0.000000 
0.000000 0.000000 
channel = 1
0.000000 0.000000 
0.000000 0.000000 


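<b>The following sections analyze the batch normalization functions</br>

Basic data structure</br>
struct BN</br>
{</br>
    int channelSize = number of channels in this layer </br>
</br>
    float *beta = beta constant of each channel in this layer</br>
    float *gamma = gamma constant of each channel in this layer</br>
    float *mean = mean of each channel in this layer</br>
    float *variance  = variance of each channel in this layer</br>
};</br>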



In [14]:
class BN:
    channelSize = 0

    beta = []
    gamma = []
    mean =[]
    variance = []
    runMean = []
    runVar = []

    def __init__(self,channelSize):
        self.channelSize = channelSize
        self.beta = [0 for i in range (channelSize)]
        self.gamma = [1 for i in range (channelSize)]
        self.mean = [0 for i in range (channelSize)]
        self.variance = [0 for i in range (channelSize)]
        self.runMean = [0 for i in range (channelSize)]
        self.runVar = [1 for i in range (channelSize)]

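struct BN *BN_Init(int channel);

Purpose: initializes batch normalization by allocating space for a BN structure and returning a pointer to it. In the Python class above, every beta starts at 0 and every gamma at 1.

Time complexity O(N): i

Example call: struct BN *coeff = BN_Init(6)
This initializes a batch normalization structure with 6 channels and stores its pointer in the variable coeff.

void *BN_GetCoeff(struct Matrix **input , struct BN *coeff)<br/>
</br>
Purpose: computes the per-channel mean and variance used by batch normalization for this layer. The Python demo also takes a momentum argument and updates the running mean and running variance.<br/>

Time complexity O(N): 2 * (num * i * j * k)<br/>

Example call: BN_GetCoeff(feature,coeff)<br/>
This takes the whole batch of feature matrices in feature and computes the mean and variance of the layer.<br/>

Algorithm demo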

In [22]:
def BN_GetCoeff(input:list[Matrix],coeff:BN,momentum:float):
    BATCH = 5 # batch size; a macro is recommended in the C version
    channel = coeff.channelSize
    row = input[0].rowSize
    column = input[0].columnSize
    amount = BATCH * row * column
    for i in range(channel):
        temp = 0 # running sum of the channel's values
        for num in range(BATCH):
            for j in range (row):
                for k in range (column):
                    temp = temp + input[num].feature[i][j][k]
        coeff.mean[i] = temp/amount
        coeff.runMean[i] = (1-momentum)*coeff.runMean[i] + momentum * coeff.mean[i]

    for i in range(channel):
        temp = 0 # running sum of squared deviations
        for num in range(BATCH):
            for j in range (row):
                for k in range (column):
                    # squared deviation of sample num in channel i
                    temp = temp + (input[num].feature[i][j][k] - coeff.mean[i])**2
        coeff.variance[i] = temp/amount
        coeff.runVar[i] = (1-momentum)*coeff.runVar[i] + momentum*temp/(amount-1)


In [23]:
BATCH = 5
feature = [Matrix("Random",3,3,3) for featureAmt in range (BATCH)]
coeff = BN(3)
BN_GetCoeff(feature,coeff,0.1)
print("after get coeff",coeff.mean)
print("after get coeff",coeff.variance)

after get coeff [0.5565143744309039, 0.4912936417996365, 0.4296305643353902]
after get coeff [0.08590840663990257, 0.06688436501182544, 0.04546311377800545]
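
The statistics gathered by BN_GetCoeff can be followed on a single channel with fixed numbers. The sketch below assumes the channel's values have already been flattened into one list; channel_stats is a hypothetical name, and the update rules are copied from the demo (batch variance divided by n, running variance fed with the sum of squares divided by n - 1).

```python
def channel_stats(values, run_mean, run_var, momentum):
    """Return (mean, variance, new running mean, new running variance) for one channel."""
    n = len(values)
    mean = sum(values) / n
    squared = sum((v - mean) ** 2 for v in values)
    variance = squared / n                      # biased estimate used for normalization
    new_run_mean = (1 - momentum) * run_mean + momentum * mean
    new_run_var = (1 - momentum) * run_var + momentum * squared / (n - 1)
    return mean, variance, new_run_mean, new_run_var


mean, variance, run_mean, run_var = channel_stats([1.0, 2.0, 3.0, 4.0], 0.0, 1.0, 0.1)
print(mean, variance, run_mean, run_var)
```

For the values 1, 2, 3, 4 the mean is 2.5 and the batch variance is 1.25; with momentum 0.1 the running mean moves from 0 to 0.25.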


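void BN_BatchNorm(struct Matrix *input , struct Matrix *output , struct BN *coeff )<br/>
</br>
Purpose: applies batch normalization to this layer's feature matrix.<br/>
The formulas are:</br>
x_hat = (x - mean) / (variance + epsilon)^0.5  (epsilon keeps the denominator from being zero)</br>
y = x_hat*gamma + beta</br>

Time complexity O(N): i * j * k<br/>

Example call: BN_BatchNorm(feature,ans,coeff)<br/>
This normalizes the feature matrix feature with the coefficients in coeff and writes the normalized values to ans.<br/>

Algorithm demo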

In [16]:
def BN_BatchNorm(input:Matrix , output:Matrix , coeff:BN):
    EPSILON = 0.1 # a macro is recommended in the C version
    channel = coeff.channelSize
    row = input.rowSize
    column = input.columnSize
    for i in range(channel):
        for j in range (row):
            for k in range (column):
                output.feature[i][j][k] = coeff.gamma[i]*(input.feature[i][j][k] -coeff.mean[i]) / (coeff.variance[i] + EPSILON)**0.5 + coeff.beta[i] 

In [17]:
BATCH = 5
featureExample = [Matrix("Random",3,3,3) for featureAmt in range (BATCH)]
feature = featureExample[0]
coeff = BN(3)
BN_GetCoeff(featureExample,coeff,0.1)
print("before batch normalize...")
Matrix_Print(feature)
BN_BatchNorm(feature,feature,coeff)
print("after batch normalize...")
Matrix_Print(feature)

before batch normalize...
channel size = 3 size = 3 * columnSize 3
channel = 0
0.323364 0.212317 0.778605 
0.596198 0.312863 0.176205 
0.426114 0.429056 0.786290 
channel = 1
0.054830 0.992671 0.922197 
0.125630 0.188861 0.281913 
0.682305 0.057214 0.218566 
channel = 2
0.596378 0.186927 0.498418 
0.689648 0.517896 0.230297 
0.972134 0.488334 0.151764 
after batch normalize...
channel size = 3 size = 3 * columnSize 3
channel = 0
-0.183019 -0.473806 1.009072 
0.531423 -0.210516 -0.568369 
0.086043 0.093745 1.029198 
channel = 1
-0.958501 1.426956 1.247703 
-0.778418 -0.617585 -0.380901 
0.637522 -0.952438 -0.542028 
channel = 2
0.403633 -0.555184 0.174238 
0.622045 0.219849 -0.453625 
1.283545 0.150623 -0.637526 
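
The normalization formula can also be traced for a single value. This is a sketch only: batch_norm_value is a hypothetical helper, and its default epsilon of 0.1 simply mirrors the EPSILON constant of the demo.

```python
def batch_norm_value(x, mean, variance, gamma=1.0, beta=0.0, epsilon=0.1):
    """Normalize one value: x_hat = (x - mean) / sqrt(variance + epsilon); y = gamma * x_hat + beta."""
    x_hat = (x - mean) / (variance + epsilon) ** 0.5
    return gamma * x_hat + beta


# (3 - 1) / sqrt(3.9 + 0.1) is about 1; gamma = 2 and beta = 0.5 then give about 2.5.
print(batch_norm_value(3.0, 1.0, 3.9, gamma=2.0, beta=0.5))
```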


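void **BNBatch_BatchNorm(struct Matrix **input ,struct Matrix **output, struct BN *coeff )<br/>
</br>
Purpose: applies batch normalization to the whole batch of feature matrices of this layer.<br/>

Time complexity O(N): num * i * j * k<br/>

Example call: BNBatch_BatchNorm(feature,ans,coeff)<br/>
This normalizes the whole batch in feature with coeff and writes the normalized batch to ans.<br/>

Algorithm demo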

In [18]:
def BNBatch_BatchNorm(input:list[Matrix] , output:list[Matrix] , coeff:BN):
    BATCH = 5 # batch size; a macro is recommended in the C version
    for num in range (BATCH):
        BN_BatchNorm(input[num],output[num],coeff)
    

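<b>The following sections analyze the functions used in forward propagation</br>

void Front_ReLU (struct Matrix *input,struct Matrix *output);</br>

Parameters: (Matrix *)input = feature matrix to activate<br/>
        (Matrix *)output = feature matrix after activation<br/>
Purpose: ReLU is an activation function that sets negative values to 0 and leaves all other values unchanged.<br/>

Time complexity O(N): i * j * k <br/>

Example call: Front_ReLU(feature,ans)<br/>
This activates feature and stores the result in ans.<br/>

Algorithm demo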

In [19]:
def Front_ReLU(input:Matrix,output:Matrix):
    for i in range(input.channelSize):
        for j in range(input.rowSize):
            for k in range(input.columnSize):
                if(input.feature[i][j][k] < 0):
                    output.feature[i][j][k] = 0 
                else:
                    output.feature[i][j][k] = input.feature[i][j][k]

In [20]:
feature = Matrix("Random",2,2,2)
feature.feature = [[[1, -1], [3, -4]], [[-6, 7], [9, -2]]]
ans = Matrix("Zero",2,2,2)
print("before ReLU ...")
Matrix_Print(feature)
Front_ReLU(feature,ans)
print("after ReLU ...")
Matrix_Print(ans)

before ReLU ...
channel size = 2 size = 2 * columnSize 2
channel = 0
1.000000 -1.000000 
3.000000 -4.000000 
channel = 1
-6.000000 7.000000 
9.000000 -2.000000 
after ReLU ...
channel size = 2 size = 2 * columnSize 2
channel = 0
1.000000 0.000000 
3.000000 0.000000 
channel = 1
0.000000 7.000000 
9.000000 0.000000 


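void FrontBatch_ReLU (struct Matrix **input,struct Matrix **output);</br>

Parameters: (Matrix **)input = batch of feature matrices to activate<br/>
        (Matrix **)output = batch of feature matrices after activation<br/>
Purpose: forward propagation usually processes a whole batch of data at once; this function applies the activation to the whole batch.<br/>

Time complexity O(N): num * i * j * k <br/>

Example call: FrontBatch_ReLU(feature,ans)<br/>
This activates the whole batch in feature and stores the results, batch-wise, in ans.<br/>

Algorithm demo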

In [21]:
def FrontBatch_ReLU(input:list[Matrix],output:list[Matrix]):
    BATCH = 5 # batch size; a macro is recommended in the C version
    for num in range(BATCH):
        Front_ReLU(input[num],output[num])  
    

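void Front_MaxPooLing(struct Matrix *input,struct Matrix *output,int poolingSize,int paddingAmt,int stride)
<br/>
Parameters: (Matrix *)input = feature matrix to be pooled<br/>
        (Matrix *)output = feature matrix after pooling<br/>
        (int) poolingSize = size of the pooling window <br/>
        (int) paddingAmt = padding size (padding means surrounding input with all-zero features)<br/>
        (int) stride = stride<br/>
Purpose: takes the maximum of each pooling window of input and stores it in output; paddingAmt is the padding size and stride is the stride.<br/>
The size of output is:<br/>
number of input channels X (floor((input rows + 2 * paddingAmt - poolingSize) / stride) + 1) X (floor((input columns + 2 * paddingAmt - poolingSize) / stride) + 1)<br/>

Time complexity O(N): i * j * k * y * z, where j and k here are the numbers of output rows and columns and y and z are the pooling window dimensions<br/>

Example call: Front_MaxPooling(feature,ans,3,1,2)<br/>
This applies max pooling with a 3 X 3 window, a padding of 1 and a stride of 2 to feature, and stores the result in ans.<br/>

Algorithm demo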

In [22]:
def Front_MaxPooling(input:Matrix, output:Matrix , poolingSize:int,  paddingAmt:int , stride:int):
    channel = input.channelSize
    outputSize = output.rowSize
    for i in range (channel):
        for j in range (outputSize):
            for k in range (outputSize):
                maximun = -9999 # start from a very small number so any value in the pooling window exceeds it
                for y in range (poolingSize):
                    for z in range (poolingSize):
                        rowLoc = j * stride + y - paddingAmt # row index inside input
                        columnLoc = k * stride + z - paddingAmt # column index inside input
                        # positions in the padding region hold 0 and need no lookup
                        if ((rowLoc < 0 or rowLoc >= input.rowSize ) or (columnLoc < 0 or columnLoc >= input.columnSize )):
                            if maximun < 0:
                                maximun = 0 # value of the padding region
                            continue
                        # otherwise compare the value at this window position with the current maximum
                        if(input.feature[i][rowLoc][columnLoc] > maximun):
                            maximun = input.feature[i][rowLoc][columnLoc]
                output.feature[i][j][k] = maximun # store the maximum in the result

In [23]:
feature = Matrix("Random",1,5,5)
ans = Matrix("Zero",1,3,3)
print("before maxPooling ...")
Matrix_Print(feature)
Front_MaxPooling(feature,ans,3,1,2)
print("after maxPooling ...")
Matrix_Print(ans)


before maxPooling ...
channel size = 1 size = 5 * columnSize 5
channel = 0
0.574301 0.836090 0.571234 0.702342 0.956653 
0.174107 0.110991 0.494246 0.808413 0.186453 
0.882715 0.401943 0.700299 0.239343 0.275724 
0.884415 0.136321 0.090593 0.097804 0.257168 
0.025628 0.913057 0.491138 0.302521 0.388079 
after maxPooling ...
channel size = 1 size = 3 * columnSize 3
channel = 0
0.836090 0.836090 0.956653 
0.884415 0.808413 0.808413 
0.913057 0.913057 0.388079 


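void FrontBatch_MaxPooLing(struct Matrix **input,struct Matrix **output,int poolingSize,int paddingAmt,int stride)
<br/>
Parameters: (Matrix **)input = batch of feature matrices to be pooled<br/>
        (Matrix **)output = batch of feature matrices after pooling<br/>
        (int) poolingSize = size of the pooling window <br/>
        (int) paddingAmt = padding size (padding means surrounding input with all-zero features)<br/>
        (int) stride = stride<br/>
Purpose: forward propagation usually processes a whole batch of data at once; this function applies max pooling to the whole batch.<br/>

Time complexity O(N): num * i * j * k * y * z<br/>

Example call: FrontBatch_MaxPooling(feature,ans,3,1,2)<br/>
This applies max pooling with a padding of 1 and a stride of 2 to the whole batch in feature, and stores the results in the batch ans.<br/>

Algorithm demo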

In [24]:
def FrontBatch_MaxPooling(input:list[Matrix],output:list[Matrix],poolingSize:int , paddingAmt:int , stride:int):
    BATCH = 5 # batch size; a macro is recommended in the C version
    for num in range(BATCH):
        Front_MaxPooling(input[num],output[num],poolingSize,paddingAmt,stride)  

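void FrontBatch_Convolution(struct Matrix **input,struct Matrix **kernel,struct Matrix **output ,int paddingAmt,int stride)
<br/>
Parameters: (Matrix **)input = batch of feature matrices to be convolved<br/>
        (Matrix **)kernel = convolution kernels<br/>
        (Matrix **)output = batch of feature matrices after convolution<br/>
        (int) paddingAmt = padding size (padding means surrounding input with all-zero features)<br/>
        (int) stride = stride<br/>
Purpose: forward propagation usually processes a whole batch of data at once; this function convolves every feature matrix of the batch with the same set of kernels.<br/>

Time complexity O(N): num * i * j * k * x * y * z<br/>

Example call: FrontBatch_Convolution(feature,kernel,ans,1,2)<br/>
This convolves the whole batch in feature with the set of kernels in kernel, using a padding of 1 and a stride of 2, and stores the results in the batch ans.<br/>

Algorithm demo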

In [25]:
def FrontBatch_Convolution(input:list[Matrix],kernel:list[Matrix],output:list[Matrix], paddingAmt:int , stride:int):
    BATCH = 5 # batch size; a macro is recommended in the C version
    for num in range(BATCH):
        Matrix_Convolution(input[num],kernel,output[num],paddingAmt,stride)  

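void Front_GlobalAverage(struct Matrix *input,struct Matrix *output)
<br/>
Parameters: (Matrix *)input = feature matrix to be global-average pooled<br/>
        (Matrix *)output = feature matrix after global average pooling<br/>
Purpose: sums the values of each channel, divides by the number of values in the channel, and stores the mean of channel i in row i of output.<br/>

Time complexity O(N): i * j * k <br/>

Example call: Front_GlobalAverage(feature,ans)<br/>
This applies global average pooling to feature and stores the result in ans.<br/>

Algorithm demo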

In [26]:
def Front_GlobalAverage(input:Matrix,output:Matrix):
    featureSize = input.rowSize * input.columnSize
    for i in range(input.channelSize):
        temp = 0
        for j in range(input.rowSize):
            for k in range(input.columnSize):
                temp = temp + input.feature[i][j][k]
        output.feature[0][i][0] = temp/featureSize
                    

In [27]:
feature = Matrix("Random",3,2,2)
ans = Matrix("Zero",1,3,1)
print("before GlobalAverage ...")
Matrix_Print(feature)
Front_GlobalAverage(feature,ans)
print("after GlobalAverageg ...")
Matrix_Print(ans)

before GlobalAverage ...
channel size = 3 size = 2 * columnSize 2
channel = 0
0.847114 0.003265 
0.988987 0.152371 
channel = 1
0.279427 0.075017 
0.727833 0.619991 
channel = 2
0.881412 0.432683 
0.274481 0.931530 
after GlobalAverageg ...
channel size = 1 size = 3 * columnSize 1
channel = 0
0.497934 
0.425567 
0.630027 


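void FrontBatch_GlobalAverage(struct Matrix **input,struct Matrix **output);</br>

Parameters: (Matrix **)input = batch of feature matrices to be global-average pooled<br/>
        (Matrix **)output = batch of feature matrices after global average pooling<br/>
Purpose: forward propagation usually processes a whole batch of data at once; this function applies global average pooling to the whole batch.<br/>

Time complexity O(N): num * i * j * k <br/>

Example call: FrontBatch_GlobalAverage(feature,ans)<br/>
This applies global average pooling to the whole batch in feature and stores the results, batch-wise, in ans.<br/>

Algorithm demo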

In [28]:
def FrontBatch_GlobalAverage(input:list[Matrix],output:list[Matrix]):
    BATCH = 5 # batch size; a macro is recommended in the C version
    for num in range(BATCH):
        Front_GlobalAverage(input[num],output[num])  

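void Front_FullConnect(struct Matrix *input,struct Matrix *weight,struct Matrix *output,struct Matrix *bias);</br>

Parameters: (Matrix *)input = feature matrix to be multiplied<br/>
        (Matrix *)weight = weight matrix<br/>
        (Matrix *)output = feature matrix after the fully connected layer<br/>
        (Matrix *)bias = bias of each output row<br/>
Purpose: a fully connected layer is simply the matrix product of the weight matrix and the feature matrix, plus the bias.<br/>

Time complexity O(N): i * z * j * k + i * j * k <br/>

Example call: Front_FullConnect(feature,weight,ans,bias)<br/>
This multiplies weight by feature, adds the bias, and stores the result in ans.<br/>

Algorithm demo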

In [29]:
def Front_FullConnect(input:Matrix , weight:Matrix , output:Matrix , bias : Matrix):
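    # input size (1 X channels of the previous layer X 1)
    # weight size (1 X number of classes to predict X channels of the previous layer)
    # output size (1 X number of classes to predict X 1)
    # bias size (1 X number of classes to predict X 1)
    Matrix_Multiply(weight,input,output) # weight times input gives (classes X 1), stored in output
    Matrix_Sum(output,bias,output) # then add the bias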

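void FrontBatch_FUllConnct(struct Matrix **input,struct Matrix *weight,struct Matrix **output,struct Matrix *bias)
</br>

Parameters: (Matrix **)input = batch of feature matrices to be multiplied<br/>
        (Matrix *)weight = weight matrix<br/>
        (Matrix **)output = batch of feature matrices after the fully connected layer<br/>
        (Matrix *)bias = bias of each output row<br/>
Purpose: forward propagation usually processes a whole batch of data at once; this function applies the fully connected layer to every feature matrix of the batch with the same weight and bias.<br/>

Time complexity O(N): num * (i * z * j * k + i * j * k) <br/>

Example call: FrontBatch_FUllConnct(feature,weight,ans,bias)<br/>
This multiplies weight by every feature matrix in the batch feature, adds the bias, and stores the results in the batch ans.<br/>

Algorithm demo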

In [30]:
def FrontBatch_FUllConnct(input:list[Matrix] , weight:Matrix , output:list[Matrix] , bias : Matrix):
    BATCH = 5 # batch size; a macro is recommended in the C version
    for num in range(BATCH):
        Front_FullConnect(input[num],weight,output[num],bias) 

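void Front_Softmax(struct Matrix *input,struct Matrix *output)
<br/>
Parameters: (Matrix *)input = feature matrix to activate<br/>
        (Matrix *)output = feature matrix after activation<br/>
Purpose: raises e to the power of each row's value and divides the result by the sum of these exponentials, so each row of output holds that row's share of the total. This makes the predicted class easy to read off.<br/>

Time complexity O(N): 2 * j  <br/>

Example call: Front_Softmax(feature,ans)<br/>
This applies the softmax activation to feature and stores the result in ans.<br/>

Algorithm demo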

In [31]:
import math

def Front_Softmax(input:Matrix , output:Matrix ):
    size = input.rowSize
    summation = 0 # sum of all exponentials, used to normalize the activated values
    for j in range (size):
        temp = math.exp(input.feature[0][j][0])
        summation = summation + temp 
        output.feature[0][j][0] = temp # still to be divided by the sum
    for j in range (size):
        output.feature[0][j][0] = output.feature[0][j][0]/summation # convert to a share of the total

In [32]:
feature = Matrix("Random",1,5,1)
feature.feature = [[[1], [2], [3] , [4] , [5]]]
ans = Matrix("Zero",1,5,1)
print("before Softmax ...")
Matrix_Print(feature)
Front_Softmax(feature,ans)
print("after Softmaxg ...")
Matrix_Print(ans) # the values must sum to 1

before Softmax ...
channel size = 1 size = 5 * columnSize 1
channel = 0
1.000000 
2.000000 
3.000000 
4.000000 
5.000000 
after Softmaxg ...
channel size = 1 size = 5 * columnSize 1
channel = 0
0.011656 
0.031685 
0.086129 
0.234122 
0.636409 
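
math.exp overflows for large inputs, so a common variant subtracts the maximum before exponentiating. The sketch below shows that variant on a plain list; it is a swapped-in technique rather than what the demo does, and stable_softmax is a hypothetical name.

```python
import math


def stable_softmax(values):
    """Softmax that subtracts the maximum first; the result is mathematically unchanged."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = stable_softmax([1, 2, 3, 4, 5])
print(" ".join("{:.6f}".format(p) for p in probs))
print(stable_softmax([1000.0, 1001.0]))  # math.exp(1000.0) alone would raise OverflowError
```

On the inputs 1 to 5 this reproduces the demo values to six decimal places, because shifting every input by the same constant cancels out in the ratio.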


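void FrontBatch_Softmax(struct Matrix **input,struct Matrix **output)
</br>

Parameters: (Matrix **)input = batch of feature matrices to activate<br/>
        (Matrix **)output = batch of feature matrices after activation<br/>
Purpose: forward propagation usually processes a whole batch of data at once; this function applies the softmax activation to the whole batch.<br/>

Time complexity O(N): num * 2 * j <br/>

Example call: FrontBatch_Softmax(feature,ans)<br/>
This activates the whole batch in feature and stores the results in the batch ans.<br/>

Algorithm demo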

In [33]:
def FrontBatch_Softmax(input:list[Matrix] , output:list[Matrix] ):
    BATCH = 5 # batch size; a macro is recommended in the C version
    for num in range(BATCH):
        Front_Softmax(input[num],output[num]) 

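int Front_Predict(struct Matrix *input)
<br/>
Parameters: (Matrix *)input = feature matrix after softmax<br/>
Purpose: finds the largest value in the feature matrix; the index of its row is the predicted class.<br/>

Time complexity O(N): j  <br/>

Example call: Front_Predict(feature)<br/>
This finds the maximum of feature and returns its row index.<br/>

Algorithm demo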

In [34]:
def Front_Predict(input:Matrix):
    maximun = 0 
    index = 0 
    for j in range(input.rowSize):
        if(input.feature[0][j][0] > maximun):
            maximun = input.feature[0][j][0]
            index = j # index of the maximum
    return index

In [35]:
feature = ans
print("before predict ...")
Matrix_Print(feature)
print("predict value")
Front_Predict(feature)

before predict ...
channel size = 1 size = 5 * columnSize 1
channel = 0
0.011656 
0.031685 
0.086129 
0.234122 
0.636409 
predict value


4

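void FrontBatch_Predict(struct Matrix **input, int *output)
<br/>
Parameters: (Matrix **)input = batch of feature matrices after softmax<br/>
        (int *)output = predictions for the whole batch<br/>
Purpose: predicts each sample in turn and stores each prediction in output.<br/>

Time complexity O(N): num * j  <br/>

Example call: FrontBatch_Predict(feature,ans)<br/>
This finds the maximum of every feature matrix in the batch feature and stores the corresponding indices in ans.<br/>

Algorithm demo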

In [36]:
def FrontBatch_Predict(input:list[Matrix],output:list[int]):
    BATCH = 5 # batch size; a macro is recommended in the C version
    for num in range(BATCH):
        output[num] = Front_Predict(input[num]) 

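float FrontBatch_Accurancy(int *input,int *testCaseLabel,int index)</br>

<br/>
Parameters: (int *)input = predictions for the whole batch<br/>
       (int *)testCaseLabel = actual labels<br/>
       (int ) index = index of the batch within the labels<br/>
Purpose: compares predictions with labels one by one and returns the accuracy.<br/>

Time complexity O(N): num  <br/>

Example call: FrontBatch_Accurancy(predict,testCaseAns,index)<br/>
This compares the whole batch predict with the batch of labels in testCaseAns selected by index, and returns the fraction of correct predictions.<br/>

Algorithm demo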

In [37]:
def FrontBatch_Accurancy(input:list[int],testCaseLabel:list[int],index:int):
    BATCH = 5 # batch size; a macro is recommended in the C version
    correct = 0 # counts the correct predictions
    start = index*BATCH # position of the first label of this batch
    for num in range(BATCH):
        if(input[num] == testCaseLabel[start+num]):
            correct = correct + 1
    return correct/BATCH
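
The accuracy computation described above, the fraction of predictions that equal their labels, can be tried on fixed data. This is a self-contained sketch; batch_accuracy and its batch parameter are hypothetical stand-ins for the function and its BATCH constant.

```python
def batch_accuracy(predictions, labels, index, batch=5):
    """Fraction of predictions equal to the labels of batch number `index`."""
    start = index * batch  # position of the first label of this batch
    correct = sum(1 for n in range(batch) if predictions[n] == labels[start + n])
    return correct / batch


labels = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]  # two batches of five labels
print(batch_accuracy([4, 3, 0, 1, 0], labels, 1))  # 0.8
```

Batch 1 covers the labels 4, 3, 2, 1, 0, so four of the five predictions match.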

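<b>The following sections analyze the functions used in back propagation</br>

void Back_ToZero(float *gradient,int size)</br>

<br/>
Parameters: (float *)gradient = array of gradient values to reset<br/>
       (int)size = length of the array<br/>
Purpose: sets every element of the array to zero.<br/>

Time complexity O(N): size  <br/>

Example call: Back_ToZero(gradient,size)<br/>
This resets the array gradient of length size to zero.<br/>

Algorithm demo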

In [38]:
def Back_ToZero(gradient:list[float],size:int):
    for i in range(size):
        gradient[i] = 0

In [39]:
gradient = [0.1,0.2,0.3,0.4,-0.1]
size = 5
print("gradient before to zero ...")
print(gradient)
Back_ToZero(gradient,size)
print("gradient after to zero ...")
print(gradient)

gradient before to zero ...
[0.1, 0.2, 0.3, 0.4, -0.1]
gradient after to zero ...
[0, 0, 0, 0, 0]


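void Back_Descent(struct Matrix *kernel,struct Matrix *gradient)</br>

<br/>
Parameters: (struct Matrix*)kernel = convolution kernel to update by gradient descent<br/>
       (struct Matrix*)gradient = gradient of the kernel<br/>
Purpose: applies one gradient descent step to the kernel, that is, updates the kernel.<br/>

Time complexity O(N): i * x * y  <br/>

Example call: Back_Descent(kernel,gradient)<br/>
This updates every weight of kernel with the corresponding gradient stored in gradient.<br/>

Algorithm demo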

In [40]:
def Back_Descent(kernel:Matrix ,gradient:Matrix ):
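    learningRate = 0.001 # learning rate; a macro is recommended in the C version
    decayWeight = 0.1 # weight decay; a macro is recommended in the C version
    BATCH = 5  # batch size; a macro is recommended in the C version
    decay = (1-learningRate*decayWeight/BATCH) # decay factor, computed once to save time
    learning = (learningRate/BATCH)  # step size, computed once to save time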
    channel = kernel.channelSize
    row = kernel.rowSize
    column = kernel.columnSize
    for i in range(channel):
        for x in range(row):
            for y in range(column):
                kernel.feature[i][x][y] = kernel.feature[i][x][y]*decay - learning*gradient.feature[i][x][y]

In [41]:
kernel = Matrix("Random",2,2,2)
kernel.feature = [[[1, -1], [3, -4]], [[-6, 7], [9, -2]]]
gradient = Matrix("Random",2,2,2)
print("before descent...")
Matrix_Print(kernel)
Back_Descent(kernel,gradient)
print("after descent...")
Matrix_Print(kernel)

before descent...
channel size = 2 size = 2 * columnSize 2
channel = 0
1.000000 -1.000000 
3.000000 -4.000000 
channel = 1
-6.000000 7.000000 
9.000000 -2.000000 
after descent...
channel size = 2 size = 2 * columnSize 2
channel = 0
0.999901 -0.999980 
2.999833 -4.000009 
channel = 1
-5.999983 6.999709 
8.999814 -2.000032 


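void Back_BatchNorm_Descent(struct BN *weight,float *gradientBeta,float *gradientGamma)</br>

<br/>
Parameters: (struct BN*)weight = batch normalization parameters to update by gradient descent<br/>
       (float *)gradientBeta = beta gradient of each channel<br/>
       (float *)gradientGamma = gamma gradient of each channel<br/>
Purpose: applies one gradient descent step to the batch normalization parameters, that is, updates them.<br/>

Time complexity O(N): i <br/>

Example call: Back_BatchNorm_Descent(coeff,gradientBeta,gradientGamma)<br/>
This updates coeff channel by channel with the gradients in gradientBeta and gradientGamma.<br/>

Algorithm demo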

In [42]:
def Back_BatchNorm_Descent( weight:BN , gradientBeta:list[float],gradientGamma:list[float]):
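    learningRate = 0.001 # learning rate; a macro is recommended in the C version
    decayWeight = 0.1 # weight decay; a macro is recommended in the C version
    BATCH = 5  # batch size; a macro is recommended in the C version
    decay = (1-learningRate*decayWeight/BATCH) # decay factor, computed once to save time
    learning = (learningRate/BATCH)  # step size, computed once to save time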
    channel= weight.channelSize
    for i in range(channel):
        weight.beta[i] = weight.beta[i]*decay - learning*gradientBeta[i]
        weight.gamma[i] = weight.gamma[i]*decay - learning*gradientGamma[i]


In [43]:
weight = BN(5)
weight.beta = [1,2,3,4,5]
weight.gamma = [9,7,6.8,7.2,3.6]
gradientBeta = [0.1,0.2,0.3,-0.4,-0.5]
gradientGamma = [0.1,0.2,0.3,-0.4,-0.5]
print("before descent...")
print("beta:",weight.beta)
print("gamma:",weight.gamma)
Back_BatchNorm_Descent(weight,gradientBeta,gradientGamma)
print("after descent...")
print("beta:",weight.beta)
print("gamma:",weight.gamma)

before descent...
beta: [1, 2, 3, 4, 5]
gamma: [9, 7, 6.8, 7.2, 3.6]
after descent...
beta: [0.99996, 1.99992, 2.99988, 4.0, 5.0]
gamma: [8.9998, 6.99982, 6.799803999999999, 7.199935999999999, 3.600028]
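
The update rule shared by Back_Descent and Back_BatchNorm_Descent can be checked on one scalar. The sketch below uses the same constants as the demos (learning rate 0.001, weight decay 0.1, batch size 5); decayed_step is a hypothetical helper name.

```python
def decayed_step(weight, gradient, learning_rate=0.001, decay_weight=0.1, batch=5):
    """One update: weight * (1 - lr * decay / batch) - (lr / batch) * gradient."""
    decay = 1 - learning_rate * decay_weight / batch   # 0.99998 with the defaults
    learning = learning_rate / batch                   # 0.0002 with the defaults
    return weight * decay - learning * gradient


print(decayed_step(1, 0.1))    # first beta of the demo
print(decayed_step(4, -0.4))   # fourth beta of the demo
```

The first beta becomes 1 * 0.99998 - 0.0002 * 0.1, about 0.99996. For the fourth beta the decay (-0.00008) and the gradient step (+0.00008) cancel, which is why it stays at 4.0 in the demo output.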


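void Back_CostFunction(struct Matrix *input,int testCaseLabel,struct Matrix *output)</br>

<br/>
Parameters: (struct Matrix*)input  = feature matrix after the softmax activation<br/>
       (int)testCaseLabel = actual label of the image<br/>
       (struct Matrix*)output  = gradient of the fully connected layer<br/>
Purpose: the gradient of this layer equals input, except that 1 is subtracted from the row of the actual label; see the gradient derivation for details.<br/>

Time complexity O(N): j <br/>

Example call: Back_CostFunction(feature,2,gradient)<br/>
This copies the values of feature into gradient and subtracts 1 from the row with index 2.<br/>

Algorithm demo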

In [44]:
def Back_CostFunction(input:Matrix , testCaseLabel:int , output:Matrix):
    # actual size of input: (1 X predictSize X 1)
    # actual size of output: (1 X predictSize X 1)
    row = input.rowSize
    for j in range(row):
        output.feature[0][j][0] = input.feature[0][j][0]
    output.feature[0][testCaseLabel][0] = output.feature[0][testCaseLabel][0] - 1

In [45]:
feature = Matrix("Random",1,5,1)
ans = Matrix("Zero",1,5,1)
testCaseLabel = 3
print("the feature value...")
Matrix_Print(feature)
Back_CostFunction(feature,testCaseLabel,ans)
print("the gradient value...")
Matrix_Print(ans)

the feature value...
channel size = 1 size = 5 * columnSize 1
channel = 0
0.337729 
0.819477 
0.616194 
0.097791 
0.940982 
the gradient value...
channel size = 1 size = 5 * columnSize 1
channel = 0
0.337729 
0.819477 
0.616194 
-0.902209 
0.940982 
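
The rule "softmax output minus 1 at the label row" can be compared with a numerical gradient. The sketch below assumes the cost is the cross-entropy loss -log(p[label]) taken on the softmax output, which the text does not state explicitly; all function names are hypothetical.

```python
import math


def softmax(z):
    shift = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def loss(z, label):
    """Assumed cost: cross-entropy of the softmax output for the true label."""
    return -math.log(softmax(z)[label])


def analytic_gradient(z, label):
    grad = softmax(z)
    grad[label] -= 1.0  # same rule as Back_CostFunction
    return grad


def numeric_gradient(z, label, h=1e-6):
    grad = []
    for idx in range(len(z)):
        up, down = list(z), list(z)
        up[idx] += h
        down[idx] -= h
        grad.append((loss(up, label) - loss(down, label)) / (2 * h))  # central difference
    return grad


z = [1.0, 2.0, 3.0, 4.0, 5.0]
print(analytic_gradient(z, 3))
print(numeric_gradient(z, 3))
```

Under that assumption the two gradients agree up to the error of the finite difference.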


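void Gradient_CostFunction(struct Matrix **input,int *testCaseLabel , int index , struct Matrix **output)</br>

<br/>
Parameters: (struct Matrix**)input  = batch of feature matrices after the softmax activation<br/>
       (int *)testCaseLabel = actual labels<br/>
       (int ) index = index of the batch within the labels<br/>
       (struct Matrix**)output  = batch of fully connected layer gradients<br/>
Purpose: computes the fully connected layer gradient for the whole batch.<br/>

Time complexity O(N): num * j <br/>

Example call: Gradient_CostFunction(feature,testCaseLabel,2,gradient)<br/>
This computes the gradient of every feature matrix in the batch feature and stores the results in gradient.<br/>

Algorithm demo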

In [46]:
def Gradient_CostFunction(input:list[Matrix] , testCaseLabel:list[int] ,index:int, output:list[Matrix]):
    BATCH = 5  # batch size; a macro is recommended in the C version
    start = index*BATCH # position of the first label of this batch
    for num in range(BATCH):
        Back_CostFunction(input[num],testCaseLabel[start],output[num])
        start = start + 1

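void Back_FullConnect_Bias(struct Matrix *lastTermGradient,struct Matrix *gradient)<br/>
<br/>
Parameters: (struct Matrix *)lastTermGradient  = gradient from the previous layer<br/>
       (struct Matrix *)gradient  = gradient of the bias<br/>
Purpose: in a fully connected layer the bias gradient equals the gradient from the previous layer; see the gradient derivation document for details<br/>

Time complexity O(N): j <br/>

Example call: Back_FullConnect_Bias(lastTermGradient,gradient)<br/>
As shown above, the gradient from the previous layer is accumulated into gradient<br/>

Algorithm demonstration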

In [47]:
def Back_FullConnect_Bias(lastTermGradient:Matrix,gradient:Matrix):
    Matrix_Sum(gradient,lastTermGradient,gradient)

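void Gradient_FullConnect_Bias(struct Matrix **lastTermGradient,struct Matrix *bias,struct Matrix *gradient)<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix *)bias  = bias matrix<br/>
       (struct Matrix *)gradient  = gradient of the bias<br/>
Purpose: first accumulates the bias gradient over the whole batch, then applies gradient descent to the bias<br/>

Time complexity O(N): 2 * num * j <br/>

Example call: Gradient_FullConnect_Bias(lastTermGradient,bias,gradient)<br/>
As shown above, the previous layer's gradients for the whole batch are accumulated into gradient, and bias is then updated by gradient descent<br/>

Algorithm demonstration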

In [48]:
def Gradient_FullConnect_Bias(lastTermGradient:list[Matrix],bias:Matrix,gradient:Matrix):
    BATCH = 5  # batch size (recommended as a macro definition)
    Matrix_ToZero(gradient)
    for num in range(BATCH):
        Back_FullConnect_Bias(lastTermGradient[num],gradient)
    Back_Descent(bias,gradient)
        

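void Back_FullConnect_Weight(struct Matrix *lastTermGradient ,struct Matrix *variable, struct Matrix *gradient)
<br/>
<br/>
Parameters: (struct Matrix *)lastTermGradient  = gradient from the previous layer<br/>
       (struct Matrix *)variable  = input feature matrix of the fully connected layer<br/>
       (struct Matrix *)gradient  = gradient of the fully connected layer weights<br/>
Purpose: the weight gradient of a fully connected layer is the matrix product of the previous layer's gradient and the transpose of the layer's feature matrix (an outer product); see the gradient derivation document for details<br/>

Time complexity O(N): j * k<br/>

Example call: Back_FullConnect_Weight(lastTermGradient,featureFC,gradient)<br/>
As shown above, the previous layer's gradient is multiplied by the transpose of featureFC and the result is accumulated into gradient<br/>

Algorithm demonstration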

In [49]:
def Back_FullConnect_Weight(lastTermGradient:Matrix,variable:Matrix,gradient:Matrix):
    #lastTermGradient = (1 X predictSize X 1)
    #variable =  (1 X rowSize X 1)
    #gradient = (1 X predictSize X rowSize)
    row = lastTermGradient.rowSize
    column = variable.rowSize
    for j in range(row):
        for k in range(column):
            gradient.feature[0][j][k] += (lastTermGradient.feature[0][j][0]*variable.feature[0][k][0])
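
As a small worked instance of the outer product described above, the sketch below uses plain lists in place of the Matrix class; `fc_weight_gradient` is a hypothetical helper name, and the numbers are made up for illustration.

```python
def fc_weight_gradient(delta, x):
    # delta: gradient from the previous layer, length predictSize
    # x: input of the fully connected layer, length rowSize
    # Entry [j][k] of the result is delta[j] * x[k].
    return [[d * v for v in x] for d in delta]

delta = [0.5, -1.0]
x = [1.0, 2.0, 3.0]
grad_w = fc_weight_gradient(delta, x)
print(grad_w)  # [[0.5, 1.0, 1.5], [-1.0, -2.0, -3.0]]
```

The result has one row per output unit and one column per input unit, matching the (1 X predictSize X rowSize) shape noted in the cell above.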


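void Gradient_FullConnect_Weight(struct Matrix **lastTermGradient ,struct Matrix **variable,struct Matrix *weight, struct Matrix *gradient)<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix **)variable  = input feature matrices of the fully connected layer for the whole batch<br/>
       (struct Matrix *)weight  = weights of the fully connected layer<br/>
       (struct Matrix *)gradient  = gradient of the fully connected layer weights<br/>
Purpose: first accumulates the weight gradient over the whole batch, then applies gradient descent to the weights<br/>

Time complexity O(N): 2 * num * j * k <br/>

Example call: Gradient_FullConnect_Weight(lastTermGradient,variable,weight,gradient)<br/>
As shown above, each previous-layer gradient is multiplied by the transpose of the matching feature matrix and added to gradient; weight is then updated by gradient descent<br/>

Algorithm demonstration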

In [50]:
def Gradient_FullConnect_Weight(lastTermGradient:list[Matrix],variable:list[Matrix],weight:Matrix,gradient:Matrix):
    BATCH = 5  # batch size (recommended as a macro definition)
    Matrix_ToZero(gradient)
    for num in range(BATCH):
        Back_FullConnect_Weight(lastTermGradient[num],variable[num],gradient)
    Back_Descent(weight,gradient)

    

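void Back_FullConnect_Variable(struct Matrix *lastTermGradient ,struct Matrix *weight, struct Matrix *gradient)
<br/>
<br/>
Parameters: (struct Matrix *)lastTermGradient  = gradient from the previous layer<br/>
       (struct Matrix *)weight  = weights of the fully connected layer<br/>
       (struct Matrix *)gradient  = gradient of the fully connected layer<br/>
Purpose: the gradient of the fully connected layer is the matrix product of the transposed weights and the previous layer's gradient; see the gradient derivation document for details<br/>

Time complexity O(N): j * k<br/>

Example call: Back_FullConnect_Variable(lastTermGradient,weightFC,gradient)<br/>
As shown above, the transpose of weightFC is multiplied by the previous layer's gradient and the result is accumulated into gradient<br/>

Algorithm demonstration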

In [51]:
def Back_FullConnect_Variable(lastTermGradient:Matrix,weight:Matrix,gradient:Matrix): 
    #lastTermGradient = (1 X predictSize X 1)
    #gradient =  (1 X rowSize X 1)
    #weight = (1 X predictSize X rowSize)
    row = lastTermGradient.rowSize   #predictsize
    column = weight.columnSize   # rowsize
    for j in range(row):
        for k in range(column):
            gradient.feature[0][k][0] += (lastTermGradient.feature[0][j][0] * weight.feature[0][j][k])
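
The transposed product can be checked by hand on a tiny case. The sketch below is illustrative only: it uses plain lists, and `fc_input_gradient` is a hypothetical name rather than part of the library described here.

```python
def fc_input_gradient(delta, weight):
    # weight: predictSize x rowSize, delta: length predictSize
    # Entry k of the result is sum over j of delta[j] * weight[j][k], i.e. (W^T delta)[k].
    rows = len(weight)
    columns = len(weight[0])
    return [sum(delta[j] * weight[j][k] for j in range(rows)) for k in range(columns)]

weight = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
delta = [1.0, -1.0]
grad_x = fc_input_gradient(delta, weight)
print(grad_x)  # [-3.0, -3.0, -3.0]
```

Each entry is a column of the weight matrix weighted by the incoming gradient, which is why the loop in the cell above indexes `weight.feature[0][j][k]` while writing to row `k`.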


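void Gradient_FullConnect_Variable(struct Matrix **lastTermGradient ,struct Matrix *weight, struct Matrix **gradient)<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix *)weight  = weights of the fully connected layer<br/>
       (struct Matrix **)gradient  = fully connected layer gradients for the whole batch<br/>
Purpose: computes the fully connected layer gradient for the whole batch<br/>

Time complexity O(N): num * j * k <br/>

Example call: Gradient_FullConnect_Variable(lastTermGradient,weight,gradient)<br/>
As shown above, the transpose of weight is multiplied by each previous-layer gradient in the batch and the results are stored in gradient<br/>

Algorithm demonstration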

In [52]:
def Gradient_FullConnect_Variable(lastTermGradient:list[Matrix],weight:Matrix,gradient:list[Matrix]):
    BATCH = 5  # batch size (recommended as a macro definition)
    for num in range(BATCH):
        Matrix_ToZero(gradient[num])  # Back_FullConnect_Variable accumulates, so clear first
        Back_FullConnect_Variable(lastTermGradient[num],weight,gradient[num])
            

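void Gradient_FullConnect(struct Matrix **lastTermGradient ,struct Matrix *bias ,struct Matrix *gradientBias ,struct Matrix *weight , struct Matrix **variable , struct Matrix *gradientWeight
,struct Matrix **gradient) <br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix *)bias  = bias matrix<br/>
       (struct Matrix *)gradientBias  = gradient of the bias<br/>
       (struct Matrix *)weight  = weights of the fully connected layer<br/>
       (struct Matrix **)variable  = input feature matrices of the fully connected layer for the whole batch<br/>
       (struct Matrix *)gradientWeight  = gradient of the fully connected layer weights<br/>
       (struct Matrix **)gradient  = fully connected layer gradients for the whole batch<br/>
Purpose: computes the fully connected layer gradient for the whole batch and applies gradient descent to the bias matrix and the weight matrix<br/>
Time complexity O(N): 3 * num * j * k + 2 * num * j <br/>


Example call: Gradient_FullConnect(lastTermGradient,bias,gradientBias,weight,variable,gradientWeight,gradient)<br/>
As shown above, gradient is computed, and gradientBias and gradientWeight are used to update bias and weight by gradient descent<br/>

Algorithm demonstration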


In [53]:
def Gradient_FullConnect(lastTermGradient:list[Matrix],bias:Matrix,gradientBias:Matrix,weight:Matrix,variable:list[Matrix],gradientWeight:Matrix,gradient:list[Matrix]):
    Gradient_FullConnect_Variable(lastTermGradient,weight,gradient)
    # Compute the gradient for the next layer first: updating the weights would change that gradient, but not the other way round
    Gradient_FullConnect_Bias(lastTermGradient,bias,gradientBias)
    Gradient_FullConnect_Weight(lastTermGradient,variable,weight,gradientWeight)


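void Back_GlobalAverage(struct Matrix *lastTermGradient ,struct Matrix *variable, struct Matrix *gradient)<br/>
<br/>
Parameters: (struct Matrix *)lastTermGradient  = gradient from the previous layer<br/>
       (struct Matrix *)variable  = feature matrix before global average pooling<br/>
       (struct Matrix *)gradient  = gradient of the global average pooling layer<br/>
Purpose: the gradient of the global average pooling layer is the previous layer's gradient for each channel divided by the size (rows * columns) of the matrix before pooling; see the gradient derivation document for details<br/>

Time complexity O(N): i * j * k <br/>

Example call: Back_GlobalAverage(lastTermGradient,variable,gradient)<br/>
As shown above, the gradient is computed from the previous layer's gradient and stored in gradient<br/>

Algorithm demonstration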

In [54]:
def Back_GlobalAverage(lastTermGradient:Matrix,variable:Matrix,gradient:Matrix):
    channel = lastTermGradient.rowSize
    row = variable.rowSize
    column = variable.columnSize
    size = 1.0/(row*column)  # each input contributes 1/(row*column) to the channel mean
    for i in range(channel):
        for j in range(row):
            for k in range(column):
                gradient.feature[i][j][k] = lastTermGradient.feature[0][i][0] * size
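
To make the scaling concrete, here is a minimal sketch on plain lists (the helper name `global_average_backward` is hypothetical). Every position in a channel receives the same share of that channel's incoming gradient.

```python
def global_average_backward(upstream, rows, columns):
    # upstream[i] is the gradient arriving at the pooled value of channel i.
    # Each of the rows*columns inputs contributed 1/(rows*columns) to that mean.
    scale = 1.0 / (rows * columns)   # without the parentheses this would be columns/rows
    return [[[g * scale for _ in range(columns)] for _ in range(rows)] for g in upstream]

grad = global_average_backward([8.0, -4.0], rows=2, columns=4)
print(grad[0])  # [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
print(grad[1])  # [[-0.5, -0.5, -0.5, -0.5], [-0.5, -0.5, -0.5, -0.5]]
```

Summing a channel of the result gives back the incoming gradient for that channel, which is a quick sanity check for any implementation of this layer.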

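void Gradient_GlobalAverage(struct Matrix **lastTermGradient ,struct Matrix **variable, struct Matrix **gradient)<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix **)variable  = feature matrices before global average pooling for the whole batch<br/>
       (struct Matrix **)gradient  = gradients of the global average pooling layer for the whole batch<br/>
Purpose: computes the gradient of the global average pooling layer for the whole batch<br/>

Time complexity O(N): num * i * j * k <br/>

Example call: Gradient_GlobalAverage(lastTermGradient,variable,gradient)<br/>
As shown above, the gradients are computed from the previous layer's gradients for the whole batch and stored in gradient<br/>

Algorithm demonstration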

In [55]:
def Gradient_GlobalAverage(lastTermGradient:list[Matrix],variable:list[Matrix],gradient:list[Matrix]):
    BATCH = 5  # batch size (recommended as a macro definition)
    for num in range(BATCH):
        Back_GlobalAverage(lastTermGradient[num],variable[num],gradient[num])

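void Back_MaxPooling(struct Matrix *lastTermGradient ,struct Matrix *variable ,struct Matrix *output,int poolingSize,int paddingAmt,int stride,struct Matrix *gradient)<br/>
<br/>
Parameters: (struct Matrix *)lastTermGradient  = gradient from the previous layer<br/>
       (struct Matrix *)variable  = feature matrix before max pooling<br/>
       (struct Matrix *)output  = feature matrix after max pooling<br/>
       (int) poolingSize = pooling window size <br/>
       (int) paddingAmt = padding size (padding adds all-zero features around input)<br/>
       (int) stride = stride<br/>
       (struct Matrix *)gradient  = gradient of the max pooling layer<br/>
Purpose: in the gradient of the max pooling layer, the cell that held the maximum of each window receives the previous layer's gradient and every other cell is 0; see the gradient derivation document for details<br/>

Time complexity O(N): i * j * k * y * z<br/>

Example call: Back_MaxPooling(lastTermGradient ,variable ,output,poolingSize,paddingAmt, stride,gradient)<br/>
As shown above, the gradient is computed from the previous layer's gradient and stored in gradient<br/>

Algorithm demonstration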

In [56]:
def Back_MaxPooling(lastTermGradient:Matrix ,variable:Matrix ,output:Matrix,poolingSize:int,paddingAmt:int, stride:int,gradient:Matrix):
    channel = variable.channelSize
    size = lastTermGradient.rowSize
    for i in range(channel):
        for j in range(size):
            for k in range(size):
                maximum =  output.feature[i][j][k]
                for y in range(poolingSize):
                    for z in range(poolingSize):
                        rowLoc = j*stride+y-paddingAmt
                        columnLoc = k*stride+z-paddingAmt
                        if ((rowLoc < 0 or rowLoc >= variable.rowSize ) or (columnLoc < 0 or columnLoc >= variable.columnSize)):
                            continue  # skip the padding region
                        if(variable.feature[i][rowLoc][columnLoc] == maximum):
                            gradient.feature[i][rowLoc][columnLoc] = lastTermGradient.feature[i][j][k]  # position of the maximum
                        else:
                            gradient.feature[i][rowLoc][columnLoc] = 0  # not the maximum
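
A small single-channel case shows where the gradient ends up. This sketch is an assumption-laden variant rather than the routine above: it works on plain 2-D lists, ignores padding, and accumulates into a zeroed buffer so that overlapping windows would not overwrite one another. The name `max_pool_backward` is hypothetical.

```python
def max_pool_backward(x, pooled, upstream, pool, stride):
    # x: 2-D input, pooled: forward output, upstream: gradient w.r.t. pooled.
    grad = [[0.0 for _ in row] for row in x]
    for j in range(len(pooled)):
        for k in range(len(pooled[0])):
            for y in range(pool):
                for z in range(pool):
                    r, c = j * stride + y, k * stride + z
                    if x[r][c] == pooled[j][k]:       # the cell that produced the maximum
                        grad[r][c] += upstream[j][k]
    return grad

x = [[1.0, 3.0, 2.0, 0.0],
     [4.0, 2.0, 1.0, 5.0],
     [0.0, 1.0, 7.0, 2.0],
     [6.0, 2.0, 3.0, 1.0]]
pooled = [[4.0, 5.0],
          [6.0, 7.0]]          # 2x2 max pooling of x with stride 2
upstream = [[0.1, 0.2],
            [0.3, 0.4]]
grad = max_pool_backward(x, pooled, upstream, pool=2, stride=2)
for row in grad:
    print(row)
```

Only the four cells holding 4, 5, 6 and 7 receive a nonzero value; if a window contains ties, every tied cell receives the gradient under this rule.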

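void Gradient_MaxPooling(struct Matrix **lastTermGradient ,struct Matrix **variable ,struct Matrix **output,int poolingSize,int paddingAmt,int stride,struct Matrix **gradient);<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix **)variable  = feature matrices before max pooling for the whole batch<br/>
       (struct Matrix **)output  = feature matrices after max pooling for the whole batch<br/>
       (int) poolingSize = pooling window size <br/>
       (int) paddingAmt = padding size (padding adds all-zero features around input)<br/>
       (int) stride = stride<br/>
       (struct Matrix **)gradient  = gradients of the max pooling layer for the whole batch<br/>
Purpose: computes the gradient of the max pooling layer for the whole batch<br/>

Time complexity O(N): num * i * j * k * y * z<br/>

Example call: Gradient_MaxPooling(lastTermGradient ,variable ,output,poolingSize,paddingAmt, stride,gradient)<br/>
As shown above, the gradients are computed from the previous layer's gradients for the whole batch and stored in gradient<br/>

Algorithm demonstration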

In [57]:
def Gradient_MaxPooling(lastTermGradient:list[Matrix] ,variable:list[Matrix] ,output:list[Matrix],poolingSize:int,paddingAmt:int, stride:int,gradient:list[Matrix]):
    BATCH = 5  # batch size (recommended as a macro definition)
    for num in range(BATCH):
        Back_MaxPooling(lastTermGradient[num],variable[num],output[num],poolingSize,paddingAmt,stride,gradient[num])

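void Back_ReLU(struct Matrix *lastTermGradient , struct Matrix *variable, struct Matrix *gradient)<br/>
<br/>
Parameters: (struct Matrix *)lastTermGradient  = gradient from the previous layer<br/>
       (struct Matrix *)variable  = feature matrix before activation<br/>
       (struct Matrix *)gradient  = gradient of the activation layer<br/>
Purpose: the ReLU derivative is 1 where variable is greater than 0 and 0 elsewhere, so the previous layer's gradient passes through only at positions where variable > 0; see the gradient derivation document for details<br/>

Time complexity O(N): i * j * k <br/>

Example call: Back_ReLU(lastTermGradient ,variable ,gradient)<br/>
As shown above, the gradient is computed from the previous layer's gradient and stored in gradient<br/>

Algorithm demonstration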

In [58]:
def Back_ReLU(lastTermGradient:Matrix ,variable:Matrix ,gradient:Matrix):
    for i in range(variable.channelSize):
        for j in range(variable.rowSize):
            for k in range(variable.columnSize):
                if(variable.feature[i][j][k] > 0):
                    gradient.feature[i][j][k] = lastTermGradient.feature[i][j][k] 
                else:
                    gradient.feature[i][j][k] = 0

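void Gradient_ReLU(struct Matrix **lastTermGradient , struct Matrix **variable, struct Matrix **gradient)<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix **)variable  = feature matrices before activation for the whole batch<br/>
       (struct Matrix **)gradient  = gradients of the activation layer for the whole batch<br/>
Purpose: computes the gradient of the ReLU activation layer for the whole batch<br/>

Time complexity O(N): num * i * j * k <br/>

Example call: Gradient_ReLU(lastTermGradient ,variable ,gradient)<br/>
As shown above, the gradients are computed from the previous layer's gradients for the whole batch and stored in gradient<br/>

Algorithm demonstration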

In [59]:
def Gradient_ReLU(lastTermGradient:list[Matrix] ,variable:list[Matrix] ,gradient:list[Matrix]):
    BATCH = 5  # batch size (recommended as a macro definition)
    for num in range(BATCH):
        Back_ReLU(lastTermGradient[num],variable[num],gradient[num])

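void Back_BatchNorm_Variable(struct Matrix *lastTermGradient , struct Matrix *variable,struct BN *coeff,struct Matrix *gradient)<br/>
<br/>
Parameters: (struct Matrix *)lastTermGradient  = gradient from the previous layer<br/>
       (struct Matrix *)variable  = feature matrix before batch normalization<br/>
       (struct BN *)coeff         = batch normalization parameters of this layer<br/>
       (struct Matrix *)gradient  = gradient of the batch normalization layer<br/>
Purpose: computes the gradient of the batch normalization layer; see the gradient derivation document for details<br/>

Time complexity O(N): 2 * i * j * k <br/>

Example call: Back_BatchNorm_Variable(lastTermGradient ,variable, coeff ,gradient)<br/>
As shown above, the gradient is computed from the previous layer's gradient and stored in gradient<br/>

Algorithm demonstration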

In [60]:
def Back_BatchNorm_Variable(lastTermGradient:Matrix ,variable:Matrix,coeff:BN ,gradient:Matrix):
    EPSILON = 0.1  # recommended as a macro definition
    #size of gradient = size of last term gradient
    #variable = the value before batchNorm
    channel = lastTermGradient.channelSize
    row = lastTermGradient.rowSize
    column = lastTermGradient.columnSize
    size = 1/(row*column) # 1/m
    for i in range(channel):
        gradient_Mean  = 0  #dl/dmu = -summation(dl/dxhat) * para1
        gradient_Variance = 0  #dl/dsigma^2 = summation((dl/dxhat)*(mean - variable)) * para2
        gamma = coeff.gamma[i]
        mean = coeff.mean[i]
        para1 = 1/((coeff.variance[i] + EPSILON)**0.5) #1/(sigma^2+epsilon)^0.5
        para2 = (para1**3)/2 #1/(2*(sigma^2+epsilon)^1.5)
        for j in range (row):
            for k in range (column):
                temp = lastTermGradient.feature[i][j][k]*gamma  #dl/dxhat
                gradient.feature[i][j][k] = temp
                gradient_Mean -= temp
                gradient_Variance += (temp*(mean - variable.feature[i][j][k]))
        gradient_Mean = gradient_Mean*para1
        gradient_Variance = gradient_Variance*para2
        for j in range (row):
            for k in range (column):
                #dl/dx = dl/dmu / m + dl/dsigma^2 * 2 * (x - mean) / m + dl/dxhat * para1
                gradient.feature[i][j][k] = gradient_Mean * size + gradient_Variance * 2 * size * (variable.feature[i][j][k] - mean) + gradient.feature[i][j][k] * para1
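
The three terms of the batch normalization gradient are easy to get wrong, so a numerical cross-check is useful. The sketch below treats one channel as a flat list and assumes the mean and variance are computed over exactly those values; all function names are hypothetical and the loss is an arbitrary weighted sum chosen so that the incoming gradient is known.

```python
def bn_forward(x, gamma, beta, eps):
    m = len(x)
    mean = sum(x) / m
    var = sum((v - mean) ** 2 for v in x) / m
    inv_std = 1.0 / (var + eps) ** 0.5
    return [gamma * (v - mean) * inv_std + beta for v in x]

def bn_backward(x, upstream, gamma, eps):
    m = len(x)
    mean = sum(x) / m
    var = sum((v - mean) ** 2 for v in x) / m
    inv_std = 1.0 / (var + eps) ** 0.5
    d_xhat = [g * gamma for g in upstream]                       # dl/dxhat
    d_var = sum(d * (mean - v) for d, v in zip(d_xhat, x)) * 0.5 * inv_std ** 3
    d_mean = -sum(d_xhat) * inv_std      # the d_var part vanishes since sum(x - mean) == 0
    return [d * inv_std + d_var * 2.0 * (v - mean) / m + d_mean / m
            for d, v in zip(d_xhat, x)]

def numeric_gradient(x, upstream, gamma, beta, eps, h=1e-6):
    # Loss = sum(upstream[i] * y[i]), differentiated by central differences.
    grad = []
    for idx in range(len(x)):
        up = list(x)
        up[idx] += h
        down = list(x)
        down[idx] -= h
        loss_up = sum(w * y for w, y in zip(upstream, bn_forward(up, gamma, beta, eps)))
        loss_down = sum(w * y for w, y in zip(upstream, bn_forward(down, gamma, beta, eps)))
        grad.append((loss_up - loss_down) / (2 * h))
    return grad

x = [0.2, -1.0, 0.7, 1.5]               # hypothetical pre-normalization values
upstream = [0.3, -0.1, 0.4, 0.2]        # hypothetical incoming gradient
analytic = bn_backward(x, upstream, gamma=1.5, eps=0.1)
numeric = numeric_gradient(x, upstream, gamma=1.5, beta=0.5, eps=0.1)
print(analytic)
print(numeric)
```

The analytic gradient also sums to approximately zero, because shifting every input by the same constant leaves the normalized output unchanged.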


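void Gradient_BatchNorm_Variable(struct Matrix **lastTermGradient , struct Matrix **variable,struct BN *coeff,struct Matrix **gradient)<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix **)variable  = feature matrices before batch normalization for the whole batch<br/>
       (struct BN *)coeff          = batch normalization parameters of this layer<br/>
       (struct Matrix **)gradient  = gradients of the batch normalization layer for the whole batch<br/>
Purpose: computes the gradient of the batch normalization layer for the whole batch; see the gradient derivation document for details<br/>

Time complexity O(N): num * 2 * i * j * k <br/>

Example call: Gradient_BatchNorm_Variable(lastTermGradient ,variable, coeff ,gradient)<br/>
As shown above, the gradients are computed from the previous layer's gradients for the whole batch and stored in gradient<br/>

Algorithm demonstration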
In [61]:
def Gradient_BatchNorm_Variable(lastTermGradient:list[Matrix] ,variable:list[Matrix],coeff:BN ,gradient:list[Matrix]):
    BATCH = 5  # batch size (recommended as a macro definition)
    for num in range(BATCH):
        Back_BatchNorm_Variable(lastTermGradient[num],variable[num],coeff,gradient[num])

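void Back_BatchNorm_Weight(struct Matrix *lastTermGradient ,struct BN *coeff, struct Matrix *variable,float *gradientBeta,float *gradientGamma)<br/>
<br/>
Parameters: (struct Matrix *)lastTermGradient  = gradient from the previous layer<br/>
       (struct BN *)coeff         = batch normalization parameters of this layer<br/>
       (struct Matrix *)variable  = feature matrix after batch normalization<br/>
       (float *)gradientBeta  = gradient of Beta in the batch normalization layer<br/>
       (float *)gradientGamma  = gradient of Gamma in the batch normalization layer<br/>
Purpose: computes the gradients of the batch normalization weights Beta and Gamma; see the gradient derivation document for details<br/>

Time complexity O(N): i * j * k <br/>

Example call: Back_BatchNorm_Weight(lastTermGradient , coeff , variable ,gradientBeta ,gradientGamma)<br/>
As shown above, the weight gradients are computed from the previous layer's gradient and accumulated into gradientBeta and gradientGamma<br/>

Algorithm demonstration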

In [62]:
def Back_BatchNorm_Weight(lastTermGradient:Matrix , coeff:BN , variable:Matrix ,gradientBeta:list[float] ,gradientGamma:list[float]):
    channel = lastTermGradient.channelSize
    row = lastTermGradient.rowSize
    column = lastTermGradient.columnSize
    for i in range(channel):
        beta = coeff.beta[i]
        gamma = coeff.gamma[i]
        for j in range (row):
            for k in range (column):
                gradientBeta[i] += lastTermGradient.feature[i][j][k]
                #variable[i][j][k] = Gamma[i] * variableHat[i][j][k] + Beta[i], so variableHat is recovered below
                variable_Hat = (variable.feature[i][j][k] - beta)/gamma 
                gradientGamma[i] += (lastTermGradient.feature[i][j][k] * variable_Hat)
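
For one channel, the accumulation above reduces to two sums. The sketch below is a hedged illustration on plain lists (`bn_param_gradients` is a hypothetical name); it recovers the normalized value from the layer output in the same way and therefore assumes gamma is nonzero.

```python
def bn_param_gradients(upstream, y, gamma, beta):
    # y is the batch normalization output, so x_hat = (y - beta) / gamma.
    d_beta = sum(upstream)
    d_gamma = sum(g * (v - beta) / gamma for g, v in zip(upstream, y))
    return d_beta, d_gamma

x_hat = [-1.0, 0.0, 1.0, 2.0]            # hypothetical normalized values
gamma, beta = 2.0, 0.5
y = [gamma * v + beta for v in x_hat]    # forward output: [-1.5, 0.5, 2.5, 4.5]
upstream = [1.0, 2.0, 3.0, 4.0]
d_beta, d_gamma = bn_param_gradients(upstream, y, gamma, beta)
print(d_beta, d_gamma)  # 10.0 10.0
```

Here d_gamma is 1*(-1) + 2*0 + 3*1 + 4*2 = 10, and d_beta is simply the sum of the incoming gradient.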

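void Gradient_BatchNorm_Weight(struct Matrix **lastTermGradient ,struct BN **coeff, struct Matrix **variable,float **gradientBeta,float **gradientGamma)<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct BN *)coeff         = batch normalization parameters of this layer<br/>
       (struct Matrix **)variable  = feature matrices after batch normalization for the whole batch<br/>
       (float *)gradientBeta  = gradient of Beta in the batch normalization layer<br/>
       (float *)gradientGamma  = gradient of Gamma in the batch normalization layer<br/>
Purpose: accumulates the gradients of the batch normalization weights Beta and Gamma over the batch, then applies gradient descent to beta and gamma<br/>

Time complexity O(N): num * i * j * k <br/>

Example call: Gradient_BatchNorm_Weight(lastTermGradient , coeff , variable ,gradientBeta ,gradientGamma)<br/>
As shown above, the weight gradients are computed from the previous layer's gradients, accumulated into gradientBeta and gradientGamma, and then used for gradient descent<br/>

Algorithm demonstration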
In [63]:
def Gradient_BatchNorm_Weight(lastTermGradient:list[Matrix] , coeff:BN , variable:list[Matrix] ,gradientBeta:list[float] ,gradientGamma:list[float]):
    size = coeff.channelSize
    Back_ToZero(gradientBeta,size)
    Back_ToZero(gradientGamma,size)
    BATCH = 5  # batch size (recommended as a macro definition)
    for num in range(BATCH):
        Back_BatchNorm_Weight(lastTermGradient[num],coeff,variable[num],gradientBeta,gradientGamma)
    Back_BatchNorm_Descent(coeff,gradientBeta,gradientGamma)

void Gradient_BatchNorm(struct Matrix **lastTermGradient , struct Matrix **inputVariable,struct BN *coeff , struct Matrix **gradient ,struct Matrix **outputVariable,float *gradientBeta,float *gradientGamma )<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix **)inputVariable  = feature matrices before batch normalization for the whole batch<br/>
       (struct BN *)coeff         = batch normalization parameters of this layer<br/>
       (struct Matrix **) gradient  = gradients of the batch normalization layer for the whole batch<br/>
       (struct Matrix **) outputVariable  = feature matrices after batch normalization for the whole batch<br/>
       (float *)gradientBeta  = gradient of Beta in the batch normalization layer<br/>
       (float *)gradientGamma  = gradient of Gamma in the batch normalization layer<br/>
Purpose: performs every operation of Gradient_BatchNorm_Variable and Gradient_BatchNorm_Weight<br/>

Time complexity O(N): 3 * num * i * j * k <br/>

Example call: Gradient_BatchNorm(lastTermGradient , inputVariable ,  coeff , gradient, outputVariable ,gradientBeta ,gradientGamma)<br/>
The call computes the layer gradient first and then updates beta and gamma
<br/>

Algorithm demonstration

In [64]:
def Gradient_BatchNorm(lastTermGradient:list[Matrix] , inputVariable:list[Matrix] ,  coeff:BN , gradient:list[Matrix], outputVariable:list[Matrix] ,gradientBeta:list[float] ,gradientGamma:list[float]):
    Gradient_BatchNorm_Variable(lastTermGradient,inputVariable,coeff,gradient)
    Gradient_BatchNorm_Weight(lastTermGradient,coeff,outputVariable,gradientBeta,gradientGamma)

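void Back_Convolution_Variable(struct Matrix *lastTermGradient ,struct Matrix **kernel ,struct Matrix *gradient,int stride, int paddingAmt)
<br/>
<br/>
Parameters: (struct Matrix *)lastTermGradient  = gradient from the previous layer<br/>
       (struct Matrix **)kernel  = convolution kernels of this layer<br/>
       (struct Matrix *)gradient  = gradient of this layer<br/>
       (int) paddingAmt = padding size (padding adds all-zero features around input)<br/>
       (int) stride = stride<br/>
Purpose: computes the gradient of the convolution layer with respect to its input feature matrix; see the gradient analysis document for details<br/>

Time complexity O(N):  j * k * x * i * y * z<br/>

Example call: Back_Convolution_Variable(lastTermGradient ,kernel ,gradient ,paddingAmt, stride)<br/>
The gradient of this layer is computed from the previous layer's gradient and the convolution kernels and accumulated into gradient<br/>

Algorithm demonstration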

In [65]:
def Back_Convolution_Variable(lastTermGradient:Matrix ,kernel:list[Matrix] ,gradient:Matrix ,paddingAmt:int, stride:int):
    channel = gradient.channelSize
    size = lastTermGradient.rowSize
    kernelSize = kernel[0].rowSize
    kernelAmt = lastTermGradient.channelSize
    for x in range (kernelAmt):
        for j in range (size):
            for k in range (size):
                for i in range (channel):
                    for y in range (kernelSize):
                        for z in range (kernelSize):
                            rowLoc = j * stride + y - paddingAmt  # row index inside the layer input
                            columnLoc = k * stride + z - paddingAmt  # column index inside the layer input
                            # positions that fall in the padding region need no computation
                            if ((rowLoc < 0 or rowLoc >= gradient.rowSize ) or (columnLoc < 0 or columnLoc >= gradient.columnSize )):
                                continue
                            # otherwise multiply and accumulate into the input gradient (same size as the input)
                            #dL/dX = dL/dy * dy/dx (only the kernel which sliding and fix the input will affect the gradient )
                            gradient.feature[i][rowLoc][columnLoc] += (lastTermGradient.feature[x][j][k] * kernel[x].feature[i][y][z])
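
The scatter pattern above can be verified on a reduced case. The sketch below assumes one square channel and one square kernel on plain 2-D lists; `conv2d`, `conv2d_input_gradient` and `loss` are hypothetical helpers. Because the chosen loss is linear in the input, the gradient at each position equals the loss of a unit impulse placed there, which gives an exact check.

```python
def conv2d(x, kernel, stride, pad):
    # Forward pass with zero padding.
    n, ks = len(x), len(kernel)
    out = (n + 2 * pad - ks) // stride + 1
    y = [[0.0] * out for _ in range(out)]
    for j in range(out):
        for k in range(out):
            for a in range(ks):
                for b in range(ks):
                    r, c = j * stride + a - pad, k * stride + b - pad
                    if 0 <= r < n and 0 <= c < n:      # skip the padding region
                        y[j][k] += x[r][c] * kernel[a][b]
    return y

def conv2d_input_gradient(upstream, kernel, n, stride, pad):
    # Scatter each upstream value back through the kernel weights it was multiplied by.
    ks = len(kernel)
    grad = [[0.0] * n for _ in range(n)]
    for j in range(len(upstream)):
        for k in range(len(upstream[0])):
            for a in range(ks):
                for b in range(ks):
                    r, c = j * stride + a - pad, k * stride + b - pad
                    if 0 <= r < n and 0 <= c < n:
                        grad[r][c] += upstream[j][k] * kernel[a][b]
    return grad

def loss(x, kernel, upstream, stride, pad):
    # A loss that is linear in the output, so dL/dy equals upstream exactly.
    y = conv2d(x, kernel, stride, pad)
    return sum(u * v for ur, yr in zip(upstream, y) for u, v in zip(ur, yr))

n, stride, pad = 3, 1, 1
kernel = [[1.0, 2.0], [3.0, 4.0]]
upstream = [[float(4 * j + k) for k in range(4)] for j in range(4)]   # output is 4x4 here
grad = conv2d_input_gradient(upstream, kernel, n, stride, pad)

check = []
for r in range(n):
    row = []
    for c in range(n):
        impulse = [[1.0 if (p, q) == (r, c) else 0.0 for q in range(n)] for p in range(n)]
        row.append(loss(impulse, kernel, upstream, stride, pad))
    check.append(row)
print(grad == check)
```

The bounds test uses the input size, which is also the size of the gradient buffer; that is why the padded positions can be skipped without affecting the result.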

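void Gradient_Convolution_Variable(struct Matrix **lastTermGradient ,struct Matrix **kernel ,struct Matrix **gradient,int stride, int paddingAmt)
<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix **)kernel  = convolution kernels of this layer<br/>
       (struct Matrix **)gradient  = gradients of this layer for the whole batch<br/>
       (int) paddingAmt = padding size (padding adds all-zero features around input)<br/>
       (int) stride = stride<br/>
Purpose: computes the gradient of the convolution layer with respect to its input feature matrices for the whole batch; see the gradient analysis document for details<br/>

Time complexity O(N): num * j * k * x * i * y * z<br/>

Example call: Gradient_Convolution_Variable(lastTermGradient ,kernel ,gradient ,paddingAmt, stride)<br/>
The gradients of this layer are computed from the previous layer's gradients for the whole batch and the convolution kernels<br/>

Algorithm demonstration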

In [66]:
def Gradient_Convolution_Variable(lastTermGradient:list[Matrix] ,kernel:list[Matrix] ,gradient:list[Matrix] ,paddingAmt:int, stride:int):
    BATCH = 5  # batch size (recommended as a macro definition)
    for num in range(BATCH):
        Matrix_ToZero(gradient[num])
        Back_Convolution_Variable(lastTermGradient[num] ,kernel ,gradient[num] ,paddingAmt, stride)

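void Back_Convolution_Kernel(struct Matrix *lastTermGradient ,struct Matrix **gradient,struct Matrix *variable ,int stride, int paddingAmt)<br/>
<br/>
Parameters: (struct Matrix *)lastTermGradient  = gradient from the previous layer<br/>
       (struct Matrix **)gradient  = gradients of this layer's convolution kernels<br/>
       (struct Matrix *)variable  = feature matrix before convolution<br/>
       (int) paddingAmt = padding size (padding adds all-zero features around input)<br/>
       (int) stride = stride<br/>
Purpose: computes the gradient of the convolution kernels; see the gradient analysis document for details<br/>

Time complexity O(N):   j * k * x * i * y * z<br/>

Example call: Back_Convolution_Kernel(lastTermGradient ,gradient ,variable,paddingAmt, stride)<br/>
The kernel gradient is computed from the previous layer's gradient and this feature matrix and accumulated into gradient<br/>

Algorithm demonstration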

In [67]:
def Back_Convolution_Kernel(lastTermGradient:Matrix ,gradient:list[Matrix],variable:Matrix ,paddingAmt:int, stride:int):
    channel = variable.channelSize
    size = lastTermGradient.rowSize
    kernelSize = gradient[0].rowSize
    kernelAmt = lastTermGradient.channelSize
    for x in range (kernelAmt):
        for j in range (size):
            for k in range (size):
                for i in range (channel):
                    for y in range (kernelSize):
                        for z in range (kernelSize):
                            rowLoc = j * stride + y - paddingAmt  # row index inside the layer input
                            columnLoc = k * stride + z - paddingAmt  # column index inside the layer input
                            # positions that fall in the padding region need no computation
                            if ((rowLoc < 0 or rowLoc >= variable.rowSize ) or (columnLoc < 0 or columnLoc >= variable.columnSize )):
                                continue
                            # otherwise multiply and accumulate into the kernel gradient
                            #dL/dTheta = dL/dy * dy/dTheta (only the kernel which sliding and fix the input will affect the gradient )
                            gradient[x].feature[i][y][z] += (lastTermGradient.feature[x][j][k] * 
                                                                variable.feature[i][rowLoc][columnLoc])
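
A reduced example of the kernel gradient, again on plain 2-D lists with one channel and one kernel. The helper name `conv2d_kernel_gradient` and the numbers are hypothetical; the point is only to show which input values each kernel weight collects.

```python
def conv2d_kernel_gradient(upstream, x, ks, stride, pad):
    # Each kernel weight collects upstream[j][k] times the input value it touched.
    n = len(x)
    grad = [[0.0] * ks for _ in range(ks)]
    for j in range(len(upstream)):
        for k in range(len(upstream[0])):
            for a in range(ks):
                for b in range(ks):
                    r, c = j * stride + a - pad, k * stride + b - pad
                    if 0 <= r < n and 0 <= c < n:      # padded positions contribute nothing
                        grad[a][b] += upstream[j][k] * x[r][c]
    return grad

x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
upstream = [[1.0, 0.0],
            [0.0, -1.0]]   # output shape for a 2x2 kernel, stride 1, no padding
grad = conv2d_kernel_gradient(upstream, x, ks=2, stride=1, pad=0)
print(grad)  # [[-4.0, -4.0], [-4.0, -4.0]]
```

Each weight sees the input at offset (0, 0) with factor +1 and at offset (1, 1) with factor -1, and neighbouring diagonal entries of x differ by 4, hence every entry is -4.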

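void Gradient_Convolution_Kernel(struct Matrix **lastTermGradient ,struct Matrix **gradient,struct Matrix **variable ,int stride, int paddingAmt)<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix **)gradient  = gradients of this layer's convolution kernels<br/>
       (struct Matrix **)variable  = feature matrices before convolution for the whole batch<br/>
       (int) paddingAmt = padding size (padding adds all-zero features around input)<br/>
       (int) stride = stride<br/>
Purpose: computes the gradient of the convolution kernels over the whole batch; see the gradient analysis document for details<br/>

Time complexity O(N): num * j * k * x * i * y * z<br/>

Example call: Gradient_Convolution_Kernel(lastTermGradient ,gradient ,variable,paddingAmt, stride)<br/>
The kernel gradient is computed from the previous layer's gradients and the feature matrices of the whole batch<br/>

Algorithm demonstration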

In [68]:
def Gradient_Convolution_Kernel(lastTermGradient:list[Matrix] ,gradient:list[Matrix],variable:list[Matrix] ,paddingAmt:int, stride:int):
    BATCH = 5  # batch size (recommended as a macro definition)
    for num in range(BATCH):
        # keyword arguments keep stride and paddingAmt from being swapped
        Back_Convolution_Kernel(lastTermGradient[num],gradient,variable[num],paddingAmt=paddingAmt,stride=stride)

void Gradient_Convolution(struct Matrix **lastTermGradient ,struct Matrix **kernel ,struct Matrix **gradient, struct Matrix **variable ,struct Matrix **gradientKernel, int stride, int paddingAmt)<br/>
<br/>
Parameters: (struct Matrix **)lastTermGradient  = gradients from the previous layer for the whole batch<br/>
       (struct Matrix **)kernel  = convolution kernels of this layer<br/>
       (struct Matrix **)gradient  = gradients of this layer for the whole batch<br/>
       (struct Matrix **)variable  = feature matrices before convolution for the whole batch<br/>
       (struct Matrix **)gradientKernel  = gradients of this layer's convolution kernels<br/>
       (int) stride = stride<br/>
       (int) paddingAmt = padding size (padding adds all-zero features around input)<br/>
Purpose: computes the gradient of the convolution layer and the gradient of the kernels, then applies gradient descent to the kernels<br/>

Time complexity O(N): 2 * num * j * k * x * i * y * z<br/>

Example call: Gradient_Convolution(lastTermGradient ,kernel,gradient ,variable, gradientKernel, stride , paddingAmt)<br/>
The layer gradient and the kernel gradient are computed, and the kernels are then updated by gradient descent<br/>

Algorithm demonstration

In [69]:
def Gradient_Convolution(lastTermGradient:list[Matrix] ,kernel:list[Matrix],gradient:list[Matrix] ,variable:list[Matrix], gradientKernel:list[Matrix], stride:int, paddingAmt:int):
    kernelAmt = lastTermGradient[0].channelSize
    # the layer gradient is computed before the kernels are updated
    Gradient_Convolution_Variable(lastTermGradient,kernel,gradient,paddingAmt=paddingAmt,stride=stride)
    for x in range(kernelAmt):
        Matrix_ToZero(gradientKernel[x])
    Gradient_Convolution_Kernel(lastTermGradient,gradientKernel,variable,paddingAmt=paddingAmt,stride=stride)
    for x in range(kernelAmt):
        Back_Descent(kernel[x],gradientKernel[x])

