```python
from transformers import AutoTokenizer, AutoModel, AutoConfig

model = AutoModel.from_pretrained("bert-base-uncased")
config = AutoConfig.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```

```text
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```


```python
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
type(outputs)
```

```text
transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions
```

# 0. Model Overview

## 0.1 BertModel

> Components

- self.embeddings = BertEmbeddings
- self.encoder   = BertEncoder
- self.pooler    = BertPooler
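A minimal sketch to confirm the three components, assuming `torch` and `transformers` are installed. To avoid a download it builds a tiny, randomly initialised `BertModel` from a `BertConfig`; the sizes (`vocab_size=100`, `hidden_size=32`, 2 layers) are arbitrary illustration values, not those of bert-base-uncased.

```python
import torch
from transformers import BertConfig, BertModel

torch.manual_seed(0)
# Hypothetical tiny config: random weights, nothing is downloaded
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config)

# The three top-level sub-modules listed above
for name in ("embeddings", "encoder", "pooler"):
    print(name, "->", type(getattr(model, name)).__name__)
# embeddings -> BertEmbeddings
# encoder -> BertEncoder
# pooler -> BertPooler
```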

> Flow

*Step1*

input_ids ==> ***self.embeddings*** = embedding_output 

*Step2*

embedding_output ==> ***self.encoder*** = encoder_outputs 

*Step3*

encoder_outputs[0] (last_hidden_state) ==> ***self.pooler*** = pooled_output
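The three steps can be run one at a time by calling the sub-modules directly and compared with a normal forward pass. This is a sketch on a hypothetical tiny, randomly initialised model (no download); it assumes an unpadded batch, so no attention mask is passed to `model.encoder`, and it uses `eval()` so dropout does not make the two paths differ.

```python
import torch
from transformers import BertConfig, BertModel

torch.manual_seed(0)
# Hypothetical tiny config: random weights, nothing is downloaded
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config).eval()  # eval() turns dropout off

input_ids = torch.randint(0, config.vocab_size, (2, 6))  # [B=2, S=6]

with torch.no_grad():
    # Step1: input_ids -> embedding_output
    embedding_output = model.embeddings(input_ids=input_ids)
    # Step2: embedding_output -> encoder_outputs (no padding, so no mask)
    encoder_outputs = model.encoder(embedding_output)
    sequence_output = encoder_outputs[0]  # last_hidden_state
    # Step3: last_hidden_state -> pooled_output
    pooled_output = model.pooler(sequence_output)

    reference = model(input_ids=input_ids)  # the usual one-call forward pass

print(torch.allclose(sequence_output, reference.last_hidden_state, atol=1e-5))
print(torch.allclose(pooled_output, reference.pooler_output, atol=1e-5))
```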


> Summary

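1. last_hidden_state: output of the model's last layer, shape [B, S, H] (batch size, sequence length, hidden size)

2. pooler_output: the last-layer feature of the [CLS] token, passed through self.pooler (a dense layer followed by tanh), shape [B, H]

3. hidden_states: returned only when output_hidden_states=True. A tuple holding the output of every layer, each of shape [B, S, H] (num_hidden_layers + 1 entries).
   The first entry is the output of the embedding layer, which combines word_embeddings, position_embeddings, and so on.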
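A quick check of the three outputs and their shapes, again on a hypothetical tiny random model (H=32, 2 layers) rather than bert-base-uncased, where H would be 768 and there would be 13 hidden states. It also confirms that `hidden_states[0]` is the embedding-layer output and `hidden_states[-1]` equals `last_hidden_state`.

```python
import torch
from transformers import BertConfig, BertModel

torch.manual_seed(0)
# Hypothetical tiny config: random weights, nothing is downloaded
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config).eval()

input_ids = torch.randint(0, config.vocab_size, (2, 6))  # B=2, S=6
with torch.no_grad():
    out = model(input_ids=input_ids, output_hidden_states=True)
    embedding_output = model.embeddings(input_ids=input_ids)

print(out.last_hidden_state.shape)  # torch.Size([2, 6, 32]) -> [B, S, H]
print(out.pooler_output.shape)      # torch.Size([2, 32])    -> [B, H]
print(len(out.hidden_states))       # 3 = embedding layer + 2 encoder layers
# First entry: embedding layer; last entry: same as last_hidden_state
print(torch.allclose(out.hidden_states[0], embedding_output, atol=1e-6))
print(torch.allclose(out.hidden_states[-1], out.last_hidden_state))
```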



### 0.1.1 BertEmbeddings

> Components

- word_embeddings
- position_embeddings
- token_type_embeddings
- LayerNorm and dropout (applied after the embeddings are summed)
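The three embedding tables are ordinary `nn.Embedding` lookups whose sizes come from the config. The sketch below inspects them on a hypothetical tiny config; with the library defaults `max_position_embeddings=512` and `type_vocab_size=2`. For bert-base-uncased the same tables are 30522×768, 512×768 and 2×768.

```python
from transformers import BertConfig, BertModel

# Hypothetical tiny config; max_position_embeddings and type_vocab_size keep their defaults
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
emb = BertModel(config).embeddings

print(emb.word_embeddings.weight.shape)        # torch.Size([100, 32]): vocab_size x H
print(emb.position_embeddings.weight.shape)    # torch.Size([512, 32]): max_position_embeddings x H
print(emb.token_type_embeddings.weight.shape)  # torch.Size([2, 32]):   type_vocab_size x H
```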

> Flow: input_ids

**Input**: input_ids

*Step1*

input_ids ==> ***self.word_embeddings*** = inputs_embeds

*Step2*

embeddings = inputs_embeds + token_type_embeddings

*Step3*

embeddings += position_embeddings, then LayerNorm (LN) and dropout are applied

Finally:

**Output**: embeddings
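The flow above can be reproduced by hand from the sub-modules of BertEmbeddings and compared with the module's own output. This sketch assumes absolute position embeddings (the BertConfig default), a single-segment input (all token_type_ids zero) and positions 0..S-1; the tiny config is hypothetical, and `eval()` makes dropout the identity.

```python
import torch
from transformers import BertConfig, BertModel

torch.manual_seed(0)
# Hypothetical tiny config: random weights, nothing is downloaded
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
emb = BertModel(config).embeddings.eval()  # eval(): dropout becomes the identity

input_ids = torch.randint(0, config.vocab_size, (2, 6))     # [B, S]
token_type_ids = torch.zeros_like(input_ids)                 # single segment
position_ids = torch.arange(input_ids.size(1)).unsqueeze(0)  # [1, S], broadcast over B

with torch.no_grad():
    inputs_embeds = emb.word_embeddings(input_ids)                          # Step1
    embeddings = inputs_embeds + emb.token_type_embeddings(token_type_ids)  # Step2
    embeddings = embeddings + emb.position_embeddings(position_ids)         # Step3
    embeddings = emb.dropout(emb.LayerNorm(embeddings))                     # LN + dropout
    reference = emb(input_ids=input_ids)

print(torch.allclose(embeddings, reference, atol=1e-6))
```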


> Flow: inputs_embeds

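When inputs_embeds is passed instead of input_ids, the model treats it as the output of word_embeddings, so the word-embedding lookup (Step1) is skipped.

It still goes through token_type_embeddings, position_embeddings, LayerNorm, and dropout to produce the final output of the embedding layer.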
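To see that inputs_embeds simply replaces Step1, this sketch performs the word-embedding lookup manually and passes the result as `inputs_embeds`; the output matches the `input_ids` path. The tiny random model is a hypothetical stand-in for bert-base-uncased.

```python
import torch
from transformers import BertConfig, BertModel

torch.manual_seed(0)
# Hypothetical tiny config: random weights, nothing is downloaded
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config).eval()
input_ids = torch.randint(0, config.vocab_size, (2, 6))

with torch.no_grad():
    from_ids = model(input_ids=input_ids).last_hidden_state
    # Do Step1 ourselves, then hand the result to the model as inputs_embeds
    word_vectors = model.embeddings.word_embeddings(input_ids)  # [B, S, H]
    from_embeds = model(inputs_embeds=word_vectors).last_hidden_state

print(torch.allclose(from_ids, from_embeds, atol=1e-6))
```

In practice this is how one feeds custom or perturbed word vectors (for example, soft prompts) while keeping BERT's own position and segment embeddings.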

### 0.1.2 BertEncoder

> Components

self.layer = a stack of BertLayer modules (an nn.ModuleList with config.num_hidden_layers entries)
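A short inspection of `encoder.layer` on a hypothetical 2-layer config; bert-base-uncased has `num_hidden_layers=12`, so its list holds 12 BertLayer modules.

```python
from transformers import BertConfig, BertModel

# Hypothetical tiny config with 2 encoder layers
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
encoder = BertModel(config).encoder

print(type(encoder.layer).__name__)     # ModuleList
print(len(encoder.layer))               # 2 == config.num_hidden_layers
print(type(encoder.layer[0]).__name__)  # BertLayer
```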

> Flow

- **Input**

hidden_states (embedding_output) ==> **self.layer** = all_hidden_states

- **Output**

last_hidden_state

hidden_states (only returned when output_hidden_states=True)
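The encoder loop can be sketched by hand: start from embedding_output, run each BertLayer in order, and collect every intermediate result, so entry 0 is the embedding layer. This is an illustration on a hypothetical tiny random model with an unpadded batch (no attention mask), not the library's exact code; it checks the result against `model(..., output_hidden_states=True)`.

```python
import torch
from transformers import BertConfig, BertModel

torch.manual_seed(0)
# Hypothetical tiny config: random weights, nothing is downloaded
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config).eval()
input_ids = torch.randint(0, config.vocab_size, (2, 6))

with torch.no_grad():
    hidden_states = model.embeddings(input_ids=input_ids)  # embedding_output
    all_hidden_states = (hidden_states,)                   # entry 0: embedding layer
    for layer in model.encoder.layer:
        layer_out = layer(hidden_states)  # no padding, so no attention mask is passed
        # BertLayer has returned a tuple (hidden_state, ...) in many versions;
        # accept a bare tensor too in case a release changes that
        hidden_states = layer_out[0] if isinstance(layer_out, tuple) else layer_out
        all_hidden_states += (hidden_states,)

    reference = model(input_ids=input_ids, output_hidden_states=True)

print(len(all_hidden_states))  # 3 = embedding layer + 2 BertLayers
print(torch.allclose(hidden_states, reference.last_hidden_state, atol=1e-5))
print(all(torch.allclose(mine, ref, atol=1e-5)
          for mine, ref in zip(all_hidden_states, reference.hidden_states)))
```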
