def convert_single_mathqa_example(example, is_training, tokenizer, max_seq_length,
                                  max_program_length, op_list, op_list_size,
                                  const_list, const_list_size,
                                  cls_token, sep_token):
    """Converts a single MathQAExample into an InputFeature."""
    features = []
    question_tokens = example.question_tokens
    if len(question_tokens) > max_seq_length - 2:
        print("too long")
        question_tokens = question_tokens[:max_seq_length - 2]
    tokens = [cls_token] + question_tokens + [sep_token]  # 1. This line adds [cls_token] at the beginning.
    segment_ids = [0] * len(tokens)
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    input_mask = [1] * len(input_ids)
    for ind, offset in enumerate(example.number_indices):  # 2. Why aren't number_indices offset by 1?
        if offset < len(input_mask):
            input_mask[offset] = 2
        else:
            if is_training == True:
                # invalid example, drop for training
                return features
            # assert is_training == False
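To see what the snippet computes in isolation, here is a minimal sketch that re-creates only its truncation and input_mask steps. The function name build_input_mask, the literal "[CLS]"/"[SEP]" strings, and returning None for a dropped training example are assumptions for illustration. The tokenizer is skipped because convert_tokens_to_ids yields one id per token, so the mask length depends only on len(tokens). Since the snippet is cut off after the training branch, the sketch simply skips an out-of-range index at evaluation time.

```python
from typing import List, Optional, Tuple


def build_input_mask(
    question_tokens: List[str],
    number_indices: List[int],
    max_seq_length: int,
    is_training: bool,
    cls_token: str = "[CLS]",
    sep_token: str = "[SEP]",
) -> Optional[Tuple[List[str], List[int]]]:
    """Re-create the snippet's truncation and input_mask steps (indices used as-is)."""
    # Truncate so that [CLS] + question + [SEP] fits in max_seq_length.
    if len(question_tokens) > max_seq_length - 2:
        question_tokens = question_tokens[:max_seq_length - 2]
    tokens = [cls_token] + question_tokens + [sep_token]
    # One id per token, so the mask has len(tokens) entries.
    input_mask = [1] * len(tokens)
    for offset in number_indices:
        if offset < len(input_mask):
            input_mask[offset] = 2  # same as the snippet: no +1 for the leading [CLS]
        elif is_training:
            return None  # the snippet returns an empty feature list here
        # the evaluation branch is cut off in the snippet; this sketch just skips the index
    return tokens, input_mask


tokens, mask = build_input_mask(["a", "b", "1", "c", "d"], [2], max_seq_length=16, is_training=True)
print(mask)  # [1, 1, 2, 1, 1, 1, 1]
print([tok for tok, m in zip(tokens, mask) if m == 2])  # ['b']
```

With question tokens a, b, 1, c, d and number_indices = [2], the marked position is b rather than the number 1.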
Hello, thanks for the great work! However, I am confused by this code. At comment 1, you add [cls_token] to the front of tokens, which shifts the index of every question token in tokens to the right by 1. At comment 2, you use example.number_indices directly to set the numbers' positions in input_mask to 2. This is confusing, since input_mask is built from tokens, which starts with [cls].

For example, take tokens = [[cls], a, b, 1, c, d] (ignoring the trailing [sep]). Here example.number_indices will be [2], because it is computed before [cls] is prepended, so 2 is the index of the number "1". The initial input_mask is [1, 1, 1, 1, 1, 1]. After assigning 2 at the positions in example.number_indices, input_mask becomes [1, 1, 2, 1, 1, 1], but index 2 in tokens is "b", not "1". Could you please explain this? Thanks!
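If number_indices are indeed 0-based positions in question_tokens, as the question assumes, aligning them with tokens would mean adding 1 for the leading [cls_token]. The sketch below shows that hypothetical adjustment; build_input_mask_shifted, the literal "[CLS]"/"[SEP]" strings, and its bounds check are illustrative assumptions, not the repository's code.

```python
from typing import List, Optional, Tuple


def build_input_mask_shifted(
    question_tokens: List[str],
    number_indices: List[int],
    max_seq_length: int,
    is_training: bool,
    cls_token: str = "[CLS]",
    sep_token: str = "[SEP]",
) -> Optional[Tuple[List[str], List[int]]]:
    """Hypothetical variant: shift number_indices past the leading [CLS] token."""
    # Truncate so that [CLS] + question + [SEP] fits in max_seq_length.
    if len(question_tokens) > max_seq_length - 2:
        question_tokens = question_tokens[:max_seq_length - 2]
    tokens = [cls_token] + question_tokens + [sep_token]
    input_mask = [1] * len(tokens)
    shift = 1  # exactly one token ([CLS]) precedes the question
    for offset in number_indices:
        if 0 <= offset < len(question_tokens):
            input_mask[offset + shift] = 2  # position of the number inside tokens
        elif is_training:
            return None  # number was truncated away: drop the training example
        # at evaluation time, an out-of-range index is simply skipped
    return tokens, input_mask


tokens, mask = build_input_mask_shifted(["a", "b", "1", "c", "d"], [2], max_seq_length=16, is_training=True)
print(mask)  # [1, 1, 1, 2, 1, 1, 1]
print([tok for tok, m in zip(tokens, mask) if m == 2])  # ['1']
```

Checking offset against len(question_tokens) rather than len(input_mask) is a deliberate choice in this sketch: with the +1 shift, a bound of len(input_mask) would let a number that was truncated away land on [SEP].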