BERT simple true / false comparison scoring seems wrong #177

Open
james-deee opened this issue Feb 1, 2024 · 0 comments
I am making a very basic call comparing simple "true" / "false" response variations, and I am getting a very odd result that I don't quite understand.

Here's the code snippet that makes the BERTScore call:

    from typing import Tuple

    from bert_score import score
    from torch import Tensor

    # score() returns a (precision, recall, F1) tuple of tensors.
    bert_score: Tuple[Tensor, Tensor, Tensor] = score(
        ['false.'],
        ['false'],
        model_type='microsoft/deberta-xlarge-mnli',
        lang="en",
    )
    print(f'Score: {bert_score}')

The result for this call is:

Score: (tensor([0.7015]), tensor([0.7298]), tensor([0.7153]))

But here's the odd part: if I make the call with a candidate of ['True'] instead, it actually scores higher:

    # Same call, but comparing the candidate 'True' against the reference 'false'.
    bert_score: Tuple[Tensor, Tensor, Tensor] = score(
        ['True'],
        ['false'],
        model_type='microsoft/deberta-xlarge-mnli',
        lang="en",
    )
    print(f'Score: {bert_score}')

The result for this call is:

Score: (tensor([0.7599]), tensor([0.7599]), tensor([0.7599]))

This just seems flat-out wrong to me, and I'm wondering if someone can give me insight into what is happening. Thanks.
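
For anyone trying to reproduce this, here is a minimal self-contained sketch (my own script, assuming only the bert_score package; it batches both candidates against the same reference so the F1 values can be compared side by side):

    from bert_score import score

    # Score both candidates against the same reference in one batched call.
    candidates = ['false.', 'True']
    references = ['false', 'false']

    P, R, F1 = score(
        candidates,
        references,
        model_type='microsoft/deberta-xlarge-mnli',
        lang="en",
    )

    for cand, f1 in zip(candidates, F1.tolist()):
        print(f'candidate={cand!r}  F1={f1:.4f}')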
