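## Self-Attention
Proposed by Google in "Attention Is All You Need". The model combines the strengths of RNNs (taking the whole sequence into account) and CNNs (parallel processing).  
Goal: learn how to use the Transformer.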
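
To make the "whole sequence, in parallel" idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention. The function name `self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the toy sizes are assumptions chosen for illustration; they are not part of the original notebook.

```python
import math
import torch


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (length, d_in); w_q, w_k, w_v: (d_in, d_k)
    """
    q = x @ w_q  # queries: (length, d_k)
    k = x @ w_k  # keys:    (length, d_k)
    v = x @ w_v  # values:  (length, d_k)
    # Every position scores every other position in one matrix product,
    # so the full sequence is considered without a sequential loop.
    scores = q @ k.T / math.sqrt(k.shape[-1])  # (length, length)
    weights = torch.softmax(scores, dim=-1)    # each row sums to 1
    return weights @ v, weights


torch.manual_seed(0)
x = torch.randn(5, 8)  # toy sequence: 5 frames, 8 features each
w_q, w_k, w_v = (torch.randn(8, 4) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([5, 4])
```

Multi-head attention runs several such heads in parallel and concatenates their outputs.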
## Task Description
- Classify speakers based on the given features.
## Model Description
- TransformerEncoderLayer:
  - Base transformer encoder layer in [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
  - Parameters:
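    - d_model: the number of expected features in the input (required).
    - nhead: the number of heads in the multi-head attention model (required).
    - dim_feedforward: the dimension of the feedforward network model (default=2048).
    - dropout: the dropout value (default=0.1).
    - activation: the activation function of the intermediate layer, relu or gelu (default=relu).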

- TransformerEncoder:
  - TransformerEncoder is a stack of N transformer encoder layers
  - Parameters:
    - encoder_layer: an instance of the TransformerEncoderLayer() class (required).
    - num_layers: the number of sub-encoder layers in the encoder (required).
    - norm: the layer normalization component (optional).
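
The parameters listed above can be seen together in a short sketch. The concrete values (`d_model=80`, `nhead=2`, `num_layers=2`, the tensor sizes) are assumptions picked for illustration; note that `d_model` must be divisible by `nhead`.

```python
import torch
import torch.nn as nn

# One encoder layer: self-attention followed by a feedforward network.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=80, nhead=2, dim_feedforward=256, dropout=0.1, activation="relu"
)

# A stack of 2 such layers; the final LayerNorm is optional.
encoder = nn.TransformerEncoder(
    encoder_layer, num_layers=2, norm=nn.LayerNorm(80)
)

# By default the layers expect input shaped (length, batch size, d_model).
x = torch.randn(100, 4, 80)
y = encoder(x)
print(y.shape)  # torch.Size([100, 4, 80])
```

The encoder keeps the input shape unchanged, so layers can be stacked freely.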

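```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Classifier(nn.Module):
  def __init__(self, d_model=80, n_spks=600, dropout=0.1):
    super().__init__()
    # Project the 40-dim input features to d_model dimensions.
    self.prenet = nn.Linear(40, d_model)
    self.encoder_layer = nn.TransformerEncoderLayer(
      d_model=d_model, dim_feedforward=256, nhead=2, dropout=dropout
    )
    # Map the pooled features to one logit per speaker.
    self.pred_layer = nn.Sequential(
      nn.Linear(d_model, d_model),
      nn.ReLU(),
      nn.Linear(d_model, n_spks),
    )

  def forward(self, mels):
    """
    args:
      mels: (batch size, length, 40)
    return:
      out: (batch size, n_spks)
    """
    # out: (batch size, length, d_model)
    out = self.prenet(mels)
    # out: (length, batch size, d_model)
    out = out.permute(1, 0, 2)
    # The encoder layer expects input shaped (length, batch size, d_model).
    out = self.encoder_layer(out)
    # out: (batch size, length, d_model)
    out = out.transpose(0, 1)
    # Mean pooling over the time dimension.
    stats = out.mean(dim=1)
    # out: (batch size, n_spks)
    out = self.pred_layer(stats)
    return out
```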
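
The classifier above uses a single encoder layer. As a possible variation, the sketch below stacks layers with `nn.TransformerEncoder` and passes `batch_first=True` so the permute/transpose steps are not needed. The class name `StackedClassifier` and the `num_layers` argument are hypothetical, and `batch_first` assumes PyTorch 1.9 or newer.

```python
import torch
import torch.nn as nn


class StackedClassifier(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, d_model=80, n_spks=600, dropout=0.1, num_layers=2):
        super().__init__()
        self.prenet = nn.Linear(40, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=2, dim_feedforward=256,
            dropout=dropout, batch_first=True,  # assumes PyTorch >= 1.9
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.pred_layer = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, n_spks),
        )

    def forward(self, mels):
        out = self.prenet(mels)        # (batch size, length, d_model)
        out = self.encoder(out)        # shape unchanged
        stats = out.mean(dim=1)        # mean pooling over time
        return self.pred_layer(stats)  # (batch size, n_spks)


model = StackedClassifier()
mels = torch.randn(4, 128, 40)  # random stand-in for a batch of features
logits = model(mels)
print(logits.shape)  # torch.Size([4, 600])
```

Whether extra layers help depends on the data and training budget; this only shows how the pieces fit together.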
