Several possible bugs #12

lijierui · 2021-11-12T01:02:40Z

I've been using this codebase to handle some new datasets and it has helped me a lot, but I found a few places where there might be bugs or unclear descriptions.

For the pretrained models, the target length isn't cut to max_output_len. If it exceeds max_len, then in

MWPToolkit/mwptoolkit/model/PreTrain/robertagen.py, line 173 (or bertgen.py, line 173)
decoder_inputs = self.pos_embedder(self.out_embedder(target))
the sequence length would exceed pos_embedder's maximum length.
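
One possible workaround is to clip every target sequence before it reaches the embedders. The sketch below is only an illustration on plain Python lists of token ids; `truncate_targets` and `eos_id` are hypothetical names, not part of MWPToolkit. On a batched tensor the same clipping would be a slice such as `target[:, :max_output_len]`.

```python
def truncate_targets(targets, max_output_len, eos_id=None):
    """Clip each target sequence to at most max_output_len tokens.

    targets: list of token-id lists (hypothetical stand-in for the batch).
    eos_id:  optional end-of-sequence id; if given, it overwrites the last
             token of any sequence that had to be clipped.
    """
    clipped = []
    for seq in targets:
        out = list(seq[:max_output_len])
        if eos_id is not None and len(seq) > max_output_len and out:
            out[-1] = eos_id  # keep an explicit end marker after clipping
        clipped.append(out)
    return clipped


batch = [[1, 2, 3, 4, 5], [1, 2]]
print(truncate_targets(batch, 3))            # [[1, 2, 3], [1, 2]]
print(truncate_targets(batch, 3, eos_id=9))  # [[1, 2, 9], [1, 2]]
```

Whether a clipped target should still end with an end-of-sequence token depends on how the decoder is trained, so that part is left optional here.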

For GTS, the code doesn't generalize to datasets with constants other than 1 and 3.14, which causes a tensor size mismatch:

(mwptoolkit/model/Seq2Tree/gts.py) ~line 904
if mask_flag: num_score2[i][:2] = -1e10 # for the first iterations, do not generate 1 and 3.14
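
A rough sketch of how the hard-coded `[:2]` could be replaced by a count taken from the dataset. It assumes the constants occupy the first positions of each score row, which is what the original slice suggests but which I haven't verified against the rest of gts.py. `mask_constant_scores` and `num_constants` are hypothetical names, and a plain list stands in for the tensor row.

```python
def mask_constant_scores(num_scores, num_constants, neg_inf=-1e10):
    """Return a copy of one score row with the constant slots masked out.

    num_scores:    scores for [constants..., problem numbers...] (assumed layout).
    num_constants: how many constants the dataset defines (hypothetical parameter,
                   e.g. the length of the dataset's constant list).
    """
    masked = list(num_scores)
    for k in range(min(num_constants, len(masked))):
        masked[k] = neg_inf  # same sentinel value as the original line
    return masked


# A dataset with three constants instead of the two assumed by [:2]
print(mask_constant_scores([0.1, 0.2, 0.3, 0.4], 3))
```

With a tensor the equivalent would be a slice assignment over the first `num_constants` entries instead of the first two.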

There might be bugs in from_prefix_to_infix and from_infix_to_prefix in the preprocessing tools.
If you convert this equation to prefix and then back to infix:
1500/(((100+12)-(100-12))/100)
it yields the following, where the relation between 100+12 and 100-12 is no longer correct:
1500/(100+12-100-12)/100
Parentheses are dropped for * and / as well:
1/(1-(1/(2*2))) is mapped to 1/(1-1/2*2)
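
For comparison, here is a minimal sketch of a prefix-to-infix conversion that keeps the parentheses the two examples need. It is not the toolkit's implementation; `prefix_to_infix` and `PRECEDENCE` are hypothetical names, and tokens are assumed to be a list of strings. The idea is to parenthesize a child when its operator binds more weakly than the parent, and also a right child of equal precedence under `-` or `/`, since those are not associative.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}


def prefix_to_infix(tokens):
    """Convert a prefix token list to an infix string with only the needed parentheses."""

    def build(pos):
        # Returns (infix text, root operator or None, index of the next unread token).
        if pos >= len(tokens):
            raise ValueError("incomplete prefix expression")
        tok = tokens[pos]
        if tok not in PRECEDENCE:
            return tok, None, pos + 1  # operand
        left, left_op, pos = build(pos + 1)
        right, right_op, pos = build(pos)
        prec = PRECEDENCE[tok]
        # Left child: weaker operator, or a power under a power ((a^b)^c).
        if left_op is not None and (
            PRECEDENCE[left_op] < prec or (left_op == "^" and tok == "^")
        ):
            left = "(" + left + ")"
        # Right child: weaker operator, or equal precedence under - or /.
        if right_op is not None and (
            PRECEDENCE[right_op] < prec
            or (PRECEDENCE[right_op] == prec and tok in ("-", "/"))
        ):
            right = "(" + right + ")"
        return left + tok + right, tok, pos

    text, _, end = build(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return text


# Prefix forms of the two equations above
eq1 = ["/", "1500", "/", "-", "+", "100", "12", "-", "100", "12", "100"]
eq2 = ["/", "1", "-", "1", "/", "1", "*", "2", "2"]
print(prefix_to_infix(eq1))  # 1500/((100+12-(100-12))/100)
print(prefix_to_infix(eq2))  # 1/(1-1/(2*2))
```

Both outputs evaluate to the same value as the original equations, while the redundant parentheses around 100+12 are dropped.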
Another small problem: every time a batch is fed to the model, the data is preprocessed again. This adds a lot of redundant computation when training for many epochs.
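
One way to avoid the repeated work is to preprocess each batch once and reuse the result in later epochs. The sketch below only illustrates the idea; `CachedBatchLoader` and the `preprocess` callable are hypothetical and do not correspond to the toolkit's dataloader classes. It also assumes the batches stay the same between epochs, so if the data is reshuffled into new batches, caching per example rather than per batch would be the safer variant.

```python
class CachedBatchLoader:
    """Run preprocess on every raw batch once, then serve the cached results."""

    def __init__(self, raw_batches, preprocess):
        self._raw_batches = list(raw_batches)
        self._preprocess = preprocess  # hypothetical per-batch preprocessing function
        self._cache = None

    def __iter__(self):
        if self._cache is None:
            # First epoch: do the expensive work and remember it.
            self._cache = [self._preprocess(b) for b in self._raw_batches]
        return iter(self._cache)


calls = []

def toy_preprocess(batch):
    calls.append(batch)            # record how often preprocessing runs
    return [x * 2 for x in batch]  # stand-in for tokenizing, padding, etc.

loader = CachedBatchLoader([[1, 2], [3]], toy_preprocess)
for epoch in range(3):
    for batch in loader:
        pass
print(len(calls))  # 2: one call per batch, regardless of the number of epochs
```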

Thanks again for this tool!

LYH-YF · 2021-11-12T01:38:51Z

We appreciate your suggestions for the toolkit!
