Hi, I was testing the pre-trained model, but I can't get it to behave correctly on one of the pre-training tasks:
```python
from transformers import AutoTokenizer, BartForConditionalGeneration

model_path = "/path/to/pretrained/PreLog"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = BartForConditionalGeneration.from_pretrained(model_path)

x = tokenizer("authentication <mask>; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=218.188.2.4", return_tensors="pt")
y = model.generate(**x, max_length=50)
print(tokenizer.decode(y[0]))
```
I get: `</s><s>authentication; logname=0 uid=0 euid=0 tty=NODEVssh ruser=0 rhost=218.188.2.4</s>`, which is unexpected: I expected the model to fill in the `<mask>` token with `failure`. This log line is taken directly from the Linux dataset on LogHub, which should be part of the pre-training data. I found the pre-trained weights here: https://figshare.com/s/5a08ef8b02b94f6726c2.
Is the input format I used wrong?
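One thing I considered ruling out is a special-token mismatch: if the tokenizer's configured mask token were not literally `<mask>`, the string above would be tokenized as ordinary text and the model would have nothing to infill. A minimal, self-contained sketch of that check; `build_masked_input` is a hypothetical helper (not part of PreLog), and with the real tokenizer loaded the second argument would be `tokenizer.mask_token`:

```python
def build_masked_input(template: str, mask_token: str, placeholder: str = "<mask>") -> str:
    """Swap a literal placeholder for the tokenizer's actual mask token string.

    If the tokenizer's mask token were, say, "[MASK]" rather than "<mask>",
    passing the literal "<mask>" string would make the model treat it as
    ordinary text instead of a mask to fill in.
    """
    if placeholder not in template:
        raise ValueError(f"no {placeholder!r} found in template")
    return template.replace(placeholder, mask_token)


log_line = "authentication <mask>; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=218.188.2.4"
# With the real tokenizer: build_masked_input(log_line, tokenizer.mask_token)
print(build_masked_input(log_line, "<mask>"))
```

In my case `tokenizer.mask_token` does appear to be `<mask>` (the BART default), so the placeholder check alone may not explain the output.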