
Pre-training input format? #1

@bolu61

Hi, I was testing the pre-trained model, but I can't seem to get it to behave correctly on one of the pre-training tasks:

from transformers import AutoTokenizer, BartForConditionalGeneration

model_path = "/path/to/pretrained/PreLog"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = BartForConditionalGeneration.from_pretrained(model_path)

# Text-infilling input: a single <mask> token in place of the masked span
x = tokenizer("authentication <mask>; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=218.188.2.4", return_tensors="pt")
y = model.generate(**x, max_length=50)

print(tokenizer.decode(y[0]))

I get: </s><s>authentication; logname=0 uid=0 euid=0 tty=NODEVssh ruser=0 rhost=218.188.2.4</s>, which is unexpected. I expected it to fill in the <mask> token with "failure". This log line is taken directly from the Linux dataset in LogHub, which should be part of the pre-training data. I found the pre-trained weights here: https://figshare.com/s/5a08ef8b02b94f6726c2.

Is the input format I used wrong?
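As a sanity check on my side (assuming the tokenizer follows BART's convention of a single dedicated mask token), one way to verify the input format is to confirm that the literal <mask> string actually encodes to tokenizer.mask_token_id:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/path/to/pretrained/PreLog")

# The mask token and its id as registered in the tokenizer
print(tokenizer.mask_token, tokenizer.mask_token_id)

# Confirm that the literal "<mask>" in the input maps to a single mask token
ids = tokenizer("authentication <mask>;")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))

If <mask> were split into several subword tokens instead of one mask token, that could explain the model ignoring it, but I haven't confirmed which behavior applies here.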
