In [20]:
from transformers import RobertaTokenizer, RobertaModel

# Basic Concepts

## from_pretrained
> [Official documentation](https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained)

Instantiates a model from pretrained weights.
1. The model is in eval mode by default. To train it, you must call `model.train()` explicitly.
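
A minimal sketch of this eval-by-default behaviour, assuming `torch` and `transformers` are installed. To avoid a download, it saves a tiny randomly initialized RoBERTa to a temporary directory and reloads it with `from_pretrained`. The config sizes are arbitrary illustration values, not those of any released checkpoint.

```python
import tempfile

from transformers import RobertaConfig, RobertaModel

# Tiny config with hypothetical sizes, chosen only so the example runs quickly.
tiny_config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=64,
    max_position_embeddings=64,
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a local checkpoint so from_pretrained has something to load.
    RobertaModel(tiny_config).save_pretrained(tmp_dir)
    # from_pretrained puts the loaded model into eval mode.
    reloaded = RobertaModel.from_pretrained(tmp_dir)

mode_after_load = reloaded.training    # False: dropout is disabled
reloaded.train()                       # enable dropout before fine-tuning
mode_after_train = reloaded.training   # True
reloaded.eval()                        # switch back for inference
mode_after_eval = reloaded.training    # False
print(mode_after_load, mode_after_train, mode_after_eval)
```

The `training` flag is the standard `torch.nn.Module` attribute that dropout layers consult, so checking it is a quick way to confirm which mode a model is in.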

# Roberta-Base

> [RoBERTa official documentation](https://huggingface.co/transformers/model_doc/roberta.html)

>[Tokenizer](https://huggingface.co/transformers/main_classes/tokenizer.html?highlight=batch_encode_plus)


>[Preprocessing data](https://huggingface.co/transformers/preprocessing.html)

>[Sizes of all pretrained models](https://huggingface.co/transformers/pretrained_models.html)

In [21]:
# Load the model and tokenizer
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained('roberta-base')


Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
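
The warning lists only `lm_head.*` weights: the checkpoint carries a masked-language-modeling head, while `RobertaModel` is the bare encoder and has nowhere to load those weights. A sketch that compares parameter names on two tiny randomly initialized models (hypothetical sizes, no download), assuming `torch` and `transformers` are installed:

```python
from transformers import RobertaConfig, RobertaForMaskedLM, RobertaModel

# Hypothetical tiny sizes; only the parameter names matter here.
tiny_config = RobertaConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                            num_attention_heads=4, intermediate_size=64,
                            max_position_embeddings=64)

encoder_only = RobertaModel(tiny_config)
with_lm_head = RobertaForMaskedLM(tiny_config)

encoder_keys = list(encoder_only.state_dict().keys())
mlm_keys = list(with_lm_head.state_dict().keys())

# Parameters that exist only because of the language-modeling head.
lm_head_keys = [k for k in mlm_keys if k.startswith("lm_head.")]
print(lm_head_keys)
```

If the LM head is needed (for example, to predict masked tokens), loading the checkpoint with `RobertaForMaskedLM` instead of `RobertaModel` would use these weights.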


In [22]:
batch_sentences = ["Hello I'm a single sentence",
                    "And another sentence",
                    "And the very very last one"]

In [4]:
str_sent = 'I love China'
inputs = tokenizer(str_sent, return_tensors='pt')
inputs

{'input_ids': tensor([[  0, 100, 657, 436,   2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1]])}

In [5]:
outputs = model(**inputs, output_attentions=True)

In [12]:
model_large = RobertaModel.from_pretrained('roberta-large')

Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [13]:
output_large = model_large(**inputs, output_attentions=True)

In [15]:
attns_large = output_large.attentions
attns_large[0].shape


torch.Size([1, 16, 5, 5])

In [16]:
attns_base = outputs.attentions
attns_base[0].shape


torch.Size([1, 12, 5, 5])
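
Each element of `attentions` has shape `(batch_size, num_heads, seq_len, seq_len)`, with one tensor per layer. The sketch below recomputes the two shapes above from the published configuration values of the two checkpoints (12 layers, 12 heads, hidden size 768 for `roberta-base`; 24 layers, 16 heads, hidden size 1024 for `roberta-large`). `EncoderSpec` and `attention_shape` are hypothetical helpers, not part of `transformers`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSpec:
    """Hypothetical container for the config values that matter here."""
    num_layers: int
    num_heads: int
    hidden_size: int

    @property
    def head_dim(self) -> int:
        # The hidden size is split evenly across the attention heads.
        return self.hidden_size // self.num_heads


ROBERTA_BASE = EncoderSpec(num_layers=12, num_heads=12, hidden_size=768)
ROBERTA_LARGE = EncoderSpec(num_layers=24, num_heads=16, hidden_size=1024)


def attention_shape(spec, batch_size, seq_len):
    """Shape of one per-layer attention tensor."""
    return (batch_size, spec.num_heads, seq_len, seq_len)


# 'I love China' is 5 tokens once <s> and </s> are added.
base_shape = attention_shape(ROBERTA_BASE, batch_size=1, seq_len=5)
large_shape = attention_shape(ROBERTA_LARGE, batch_size=1, seq_len=5)
print(base_shape, large_shape)
```

Only the head count differs between the two shapes; the per-head dimension works out to 64 for both models.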

In [28]:
batch_inputs = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="pt")
batch_inputs

{'input_ids': tensor([[    0, 31414,    38,   437,    10,   881,  3645,     2],
        [    0,  2409,   277,  3645,     2,     1,     1,     1],
        [    0,  2409,     5,   182,   182,    94,    65,     2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1]])}
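
With `padding=True`, shorter sequences are right-padded to the longest one in the batch, and `attention_mask` marks real tokens with 1 and padding with 0. A pure-Python sketch that reproduces the output above from the unpadded ids; `pad_batch` is a hypothetical helper, and the pad id 1 is RoBERTa's `<pad>` token:

```python
def pad_batch(sequences, pad_id=1):
    """Right-pad every sequence to the longest one and build the mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Unpadded ids taken from the tokenizer output above (0 = <s>, 2 = </s>).
unpadded = [
    [0, 31414, 38, 437, 10, 881, 3645, 2],
    [0, 2409, 277, 3645, 2],
    [0, 2409, 5, 182, 182, 94, 65, 2],
]
padded = pad_batch(unpadded)
print(padded["input_ids"][1])       # [0, 2409, 277, 3645, 2, 1, 1, 1]
print(padded["attention_mask"][1])  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The mask matters because the model should not attend to `<pad>` positions; passing `**batch_inputs` forwards it along with `input_ids`.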

In [35]:
output_base_batch = model(**batch_inputs, output_hidden_states=True)

In [39]:
output_large_batch = model_large(**batch_inputs, output_hidden_states=True)

In [40]:
output_large_batch.last_hidden_state.shape

torch.Size([3, 8, 1024])

In [36]:
output_base_batch.last_hidden_state.shape

torch.Size([3, 8, 768])

In [38]:
output_base_batch.hidden_states[0].shape

torch.Size([3, 8, 768])
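
`hidden_states` is a tuple with one entry for the embedding output plus one per encoder layer, so `hidden_states[0]` is the embedding output and the last entry is `last_hidden_state`. A sketch on a tiny randomly initialized model (hypothetical sizes, no download), assuming `torch` and `transformers` are installed:

```python
import torch
from transformers import RobertaConfig, RobertaModel

# Hypothetical tiny sizes so the forward pass is fast.
tiny_config = RobertaConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                            num_attention_heads=4, intermediate_size=64,
                            max_position_embeddings=64)
tiny_model = RobertaModel(tiny_config).eval()

# Arbitrary token ids below vocab_size: a batch of 3 sequences, 8 tokens each.
dummy_ids = torch.randint(low=3, high=100, size=(3, 8))

with torch.no_grad():
    out = tiny_model(input_ids=dummy_ids, output_hidden_states=True)

n_states = len(out.hidden_states)              # num_hidden_layers + 1
first_shape = tuple(out.hidden_states[0].shape)
same_as_last = torch.equal(out.hidden_states[-1], out.last_hidden_state)
attentions = out.attentions                    # None: output_attentions was not requested
print(n_states, first_shape, same_as_last, attentions)
```

By the same counting, `roberta-base` should return 13 hidden states and `roberta-large` 25.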

In [34]:
# attentions is None unless the forward call passes output_attentions=True
print(output_base_batch.attentions)

None


In [41]:
model.embeddings.word_embeddings.embedding_dim

768

In [8]:
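from transformers import BertModel, BertConfig
# Default BERT configuration. BertModel(configuration) has randomly initialized
# weights; nothing is loaded from a checkpoint. Note that this rebinds `model`,
# which previously held the RoBERTa model.
configuration = BertConfig()
model = BertModel(configuration)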

In [9]:
model.encoder

BertEncoder(
  (layer): ModuleList(
    (0): BertLayer(
      (attention): BertAttention(
        (self): BertSelfAttention(
          (query): Linear(in_features=768, out_features=768, bias=True)
          (key): Linear(in_features=768, out_features=768, bias=True)
          (value): Linear(in_features=768, out_features=768, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (output): BertSelfOutput(
          (dense): Linear(in_features=768, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (adapters): ModuleDict()
          (adapter_fusion_layer): ModuleDict()
        )
      )
      (intermediate): BertIntermediate(
        (dense): Linear(in_features=768, out_features=3072, bias=True)
      )
      (output): BertOutput(
        (dense): Linear(in_features=3072, out_features=768, bias=True)
        (LayerNorm): LayerNorm((768,), eps=1e-12

In [10]:
type(model.encoder)

transformers.models.bert.modeling_bert.BertEncoder
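
`BertConfig()` with no arguments uses the library defaults, which match the layer sizes printed above. A short sketch that reads those defaults back, assuming `transformers` is installed:

```python
from transformers import BertConfig

default_config = BertConfig()

hidden_size = default_config.hidden_size              # width of the query/key/value/dense layers
intermediate_size = default_config.intermediate_size  # width of the feed-forward layer
num_layers = default_config.num_hidden_layers         # number of BertLayer blocks
num_heads = default_config.num_attention_heads
print(hidden_size, intermediate_size, num_layers, num_heads)
```

To get trained weights instead of a random initialization, the usual route is `BertModel.from_pretrained(...)` with a checkpoint name, as was done for RoBERTa earlier.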