
BertForMultipleChoice works without captum, breaks with it? #1273

Open
rbelew opened this issue Apr 15, 2024 · 0 comments

Comments


rbelew commented Apr 15, 2024

Hi, I'm trying to run first experiments with captum, using a pretrained BertForMultipleChoice model on this published Caseholder dataset and following this BERT SQuAD tutorial. I've posted my notebook on Colab if you're kind enough to look!

My first issue is working out what "ground truth" means here. Since this is a multiple-choice task, I assumed the correct answer would be the ground truth, but this fails when trying to get the indices of the ground-truth tokens: ground_truth_end_ind = indices.index(ground_truth_tokens[-1])

ValueError: 7607 is not in list
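For what it's worth, that ValueError is just how Python's list.index behaves when the element is absent; a minimal sketch with made-up token ids:

```python
# list.index raises ValueError when the element is not present --
# here the ground-truth token id (7607 in the issue) is simply not
# among the input token ids. All ids below are made up for illustration.
indices = [101, 2023, 2003, 102]   # hypothetical input token ids
ground_truth_tokens = [7607]       # hypothetical ground-truth token id

try:
    ground_truth_end_ind = indices.index(ground_truth_tokens[-1])
except ValueError as err:
    print(err)  # 7607 is not in list

# Guarding with a membership test avoids the exception entirely:
tok = ground_truth_tokens[-1]
ground_truth_end_ind = indices.index(tok) if tok in indices else None
```

So the question is less about the exception and more about which tokens are supposed to appear in the input in the first place.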

Question #1: What should the ground truth be for this dataset?

To get a bit further, I just reused the same text as the ground truth. But this fails when it tries to make a prediction:

start_scores, end_scores = predict(input_ids,
                                   token_type_ids=token_type_ids,
                                   position_ids=position_ids,
                                   attention_mask=attention_mask)

I traced this error to modeling_bert.BertForMultipleChoice.forward() (cf. lines 1662-1690).

In previous experiments without captum, input_ids comes in with shape [16, 5, 128], input_ids.size(-1) = 128, and num_choices = 5, so input_ids is reshaped to [80, 128]. Then pooled_output.shape = [80, 128], logits.shape = [80, 1], and reshaped_logits is computed correctly, with shape [16, 5].
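That working path can be sketched with plain tensors (shapes taken from the issue; the classifier output is a stand-in for the model internals):

```python
import torch

# Shapes reported for the no-captum case.
batch, num_choices, seq_len = 16, 5, 128
input_ids = torch.zeros(batch, num_choices, seq_len, dtype=torch.long)

# forward() infers num_choices from dim 1 and folds choices into the batch.
inferred_choices = input_ids.shape[1]                     # 5
flat_input_ids = input_ids.view(-1, input_ids.size(-1))   # [80, 128]

# The classifier emits one logit per flattened row (stand-in values here).
logits = torch.zeros(flat_input_ids.size(0), 1)           # [80, 1]
reshaped_logits = logits.view(-1, inferred_choices)       # [16, 5]
print(reshaped_logits.shape)  # torch.Size([16, 5])
```

The reshape only works because the number of logits (80) is an exact multiple of the inferred num_choices (5).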

But using captum, the input_ids shape is left unchanged and num_choices = 260. Then pooled_output.shape = [1, 768], logits.shape = [1, 1], and the attempt to compute reshaped_logits fails with:

RuntimeError: shape '[-1, 260]' is invalid for input of size 1
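The reshape failure itself is easy to reproduce in isolation: if num_choices is inferred as 260 but the classifier has produced a single logit, view(-1, 260) cannot work (a sketch, assuming forward() reaches reshaped_logits = logits.view(-1, num_choices) with these sizes):

```python
import torch

logits = torch.zeros(1, 1)   # one logit, as in the captum run
num_choices = 260            # inferred from the unexpanded input_ids

try:
    reshaped_logits = logits.view(-1, num_choices)
except RuntimeError as err:
    print(err)  # shape '[-1, 260]' is invalid for input of size 1
```

So the error is downstream of the real problem: captum is handing forward() a 2-D input_ids, and forward() then misreads the sequence length as the number of choices.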

Question #2: Why does BertForMultipleChoice.forward() behave differently under captum?

Thanks for any help. captum looks hugely useful; I can't wait to make use of it!

❓ Questions and Help

(I posted this 5 days ago on the Discussion Forum.)
