# Trying regression with BERT

## What is BertForSequenceClassification?

## Reading the documentation

The [BertForSequenceClassification](https://huggingface.co/transformers/model_doc/bert.html#bertforsequenceclassification) documentation says the following.

```
Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

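Summary:
A BERT Transformer model with a classification/regression head on top (a single linear layer on top of the pooled output).
It inherits from PreTrainedModel, so see that class's documentation for the generic methods.
It is also a subclass of PyTorch's nn.Module, so the PyTorch documentation applies to general usage as well.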
```

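So this looks like the class to try.

## What is BertConfig?

It holds the settings used to build a BERT model. If the defaults are fine, there seems to be no need to set any of these parameters: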
```
vocab_size
hidden_size
num_hidden_layers
num_attention_heads
intermediate_size
hidden_act
hidden_dropout_prob
attention_probs_dropout_prob
max_position_embeddings
type_vocab_size
initializer_range
layer_norm_eps
gradient_checkpointing
position_embedding_type
use_cache
```
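As a rough check of what "the defaults" means here, the sketch below builds a `BertConfig` with no arguments and prints a few of the fields listed above. The selection of fields is my own choice for illustration. As far as I know, these default values correspond to the `bert-base-uncased` architecture.

```python
from transformers import BertConfig

# A BertConfig built with no arguments falls back to the library defaults.
config = BertConfig()

# Print a handful of the fields from the list above (an arbitrary selection).
for name in (
    "vocab_size",
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "intermediate_size",
    "max_position_embeddings",
):
    print(name, getattr(config, name))
```

Any of these can be overridden by passing the field as a keyword argument, for example `BertConfig(num_hidden_layers=6)`.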

## Running the sample

In [23]:
from transformers import BertTokenizer, BertForSequenceClassification, BertConfig
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Loaded for inspection only; the model below is created without passing this config.
config = BertConfig.from_pretrained('bert-base-uncased', problem_type="regression")

In [33]:
# problem_type="regression" selects MSE loss; num_labels is left at its default of 2.
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', problem_type="regression")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
labels = torch.tensor([1.0]).unsqueeze(0)  # Batch size 1; float target for the MSE loss
outputs = model(**inputs, labels=labels)
loss = outputs.loss
logits = outputs.logits

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [37]:
outputs, loss, logits

(SequenceClassifierOutput(loss=tensor(0.5012, grad_fn=<MseLossBackward>), logits=tensor([[0.2533, 0.3331]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None),
 tensor(0.5012, grad_fn=<MseLossBackward>),
 tensor([[0.2533, 0.3331]], grad_fn=<AddmmBackward>))

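This shows that the model computes a regression loss: the loss carries `MseLossBackward`, and 0.5012 is the mean squared error of the two logits against the label 1. There are two logits because `num_labels` was left at its default of 2; predicting a single value needs `num_labels=1`. So a separate Config does not seem to be needed (yet).
All that remains is to prepare data paired with labels.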
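The two logits in the output above come from the default `num_labels=2`. For a single-valued regression target, a sketch with `num_labels=1` and float labels might look like the following. To keep it runnable without downloading anything, it uses a tiny, randomly initialised `BertConfig`. The sizes and the token ids are made up for illustration and are not pretrained weights or real tokenizer output. With real data, one would presumably pass `num_labels=1` to `from_pretrained` instead.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny, randomly initialised model so the sketch runs without a download.
# All sizes here are illustrative assumptions, not the bert-base values.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=1,               # one output value per input sequence
    problem_type="regression",  # compute MSE loss when labels are given
)
model = BertForSequenceClassification(config)
model.eval()  # disable dropout so the forward pass is deterministic

# Two made-up sequences of token ids (batch size 2, length 5).
input_ids = torch.tensor([[1, 5, 6, 7, 2], [1, 8, 9, 10, 2]])
labels = torch.tensor([[0.5], [1.5]])  # float targets, shape (batch, 1)

outputs = model(input_ids=input_ids, labels=labels)
print(outputs.logits.shape)  # one regression value per sequence
print(outputs.loss)          # scalar MSE between logits and labels
```

With `num_labels=1` the logits have shape `(batch, 1)`, so each input gets exactly one predicted value and the loss compares it with exactly one target.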

In [38]:
inputs

{'input_ids': tensor([[  101,  7592,  1010,  2026,  3899,  2003, 10140,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}

## Trying it on ordinary regression data first

## m2 → parallel

https://github.com/kanekomasahiro/bert-gec/blob/master/scripts/convert_m2_to_parallel.py looks worth reading. I want to get the data into a convenient shape (but what form should the data actually take?)
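One plausible answer to the data-shape question is a list of (source, corrected) sentence pairs. The sketch below is my own minimal converter, not the linked script. It assumes the usual M2 layout (an `S ` line with the tokenized source, followed by `A start end|||type|||correction|||...|||annotator` lines, with blocks separated by a blank line) and assumes the edits within a block are sorted by position. The function name `m2_to_parallel` and the sample text are hypothetical.

```python
def m2_to_parallel(m2_text, annotator=0):
    """Return (source, corrected) sentence pairs from M2-formatted text."""
    pairs = []
    for block in m2_text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        if not lines[0].startswith("S "):
            continue  # not a sentence block
        source = lines[0][2:].split()  # drop the leading "S "
        corrected = list(source)
        offset = 0  # length change introduced by the edits applied so far
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            # Skip "no edit" markers and edits from other annotators.
            if fields[1] == "noop" or int(fields[-1]) != annotator:
                continue
            replacement = fields[2].split()  # empty list means deletion
            corrected[start + offset:end + offset] = replacement
            offset += len(replacement) - (end - start)
        pairs.append((" ".join(source), " ".join(corrected)))
    return pairs


# Made-up sample in M2 format.
sample = """S This are a sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0

S I like apples .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0
"""

for src, tgt in m2_to_parallel(sample):
    print(src, "->", tgt)
```

These pairs are only text. For the regression setup above, each input would still need a numeric label, and how to derive one from the parallel data is left open here.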