I tried to test the XNLI results using the latest commit and found that the inputs contain a prefix token. For `_loglikelihood_tokens`, the context is `''` and `context_enc` is `[1]`; the prefix is appended in this function from `lm_eval/api/model.py`:
```python
def loglikelihood(
    self, requests, disable_tqdm: bool = False
) -> List[Tuple[float, bool]]:
    new_reqs = []
    for context, continuation in [req.args for req in requests]:
        if context == "":
            # BOS or EOS as context
            context_enc, continuation_enc = (
                [self.prefix_token_id],
                self.tok_encode(continuation),
            )
        else:
            context_enc, continuation_enc = self._encode_pair(context, continuation)
        new_reqs.append(((context, continuation), context_enc, continuation_enc))
    return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
```
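For illustration, here is a minimal, self-contained sketch of the branching above. The tokenizer and prefix token id are stand-ins, not the harness's real implementations:

```python
from typing import List, Tuple

PREFIX_TOKEN_ID = 1  # stand-in for self.prefix_token_id (often the BOS id)


def tok_encode(text: str) -> List[int]:
    # Toy tokenizer: one "token" per character code. A real model
    # would use its own subword tokenizer here.
    return [ord(c) for c in text]


def encode_pair(context: str, continuation: str) -> Tuple[List[int], List[int]]:
    # Simplified stand-in for _encode_pair: encode the concatenation,
    # then split the token ids at the context boundary.
    whole = tok_encode(context + continuation)
    n_ctx = len(tok_encode(context))
    return whole[:n_ctx], whole[n_ctx:]


def build_request(context: str, continuation: str) -> Tuple[List[int], List[int]]:
    if context == "":
        # Empty context: fall back to the prefix token so the model
        # still conditions on something before the continuation.
        return [PREFIX_TOKEN_ID], tok_encode(continuation)
    return encode_pair(context, continuation)


ctx_enc, cont_enc = build_request("", "Yes")
print(ctx_enc)  # [1] — matches the context_enc observed in the issue
```

With an empty context, `context_enc` collapses to the single prefix token, which is exactly the `[1]` seen in the request dump.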
It merges both premise and hypothesis and has no context, so the code adds `prefix_token_id` to the input for any model. This input format seems odd for anything but a base model, since most LLMs do not use a BOS or EOS token at the start of their inputs (except perhaps Gemma).
@SefaZeng this is intentional. It's inspired by how XNLI was evaluated in the XGLM paper.
`doc_to_choice` has 3 options, and the decoder model is simply required to pick the one it scores as most likely.
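Concretely, a loglikelihood-based multiple-choice evaluation scores each candidate continuation and takes the argmax. A toy sketch, with made-up per-choice log-likelihoods rather than real model outputs (the XNLI label wording is illustrative):

```python
import math
from typing import Dict


def pick_choice(loglikelihoods: Dict[str, float]) -> str:
    # Pick the choice whose continuation the model assigned the highest
    # total log-probability (summed over continuation tokens).
    return max(loglikelihoods, key=loglikelihoods.get)


# Fabricated log-likelihoods for the three NLI labels.
fake_scores = {
    "entailment": math.log(0.6),
    "neutral": math.log(0.1),
    "contradiction": math.log(0.3),
}
print(pick_choice(fake_scores))  # entailment
```

No generation is involved: the model never has to emit a label token, only to assign probabilities to the three pre-written continuations, which is why this setup works for base models.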
As XNLI's config (`xnli_zh.yaml`) merges both premise and hypothesis into the continuation with an empty context, every request takes the prefix-token branch above.