WIP: single embeddings matrix design #140
Conversation
Looks good so far. Some thoughts:
- I will file a bug describing why we're doing this for you to link to...give me a little time though. This will also describe our discussion of embedding too, though that's a secondary issue.
- I propose we shouldn't store the separate vocabularies in any class object. They should be computed separately, logged in the pretty-print fashion, then thrown out. This requires moving those logging statements, but so be it.
`vocab_map` would be better just called `index`, and we should throw out the `SymbolMap` class.
Will work on this in a day or two.
I created issue #142 to document the plan here.
I just rebased a ton of commits and want to make sure I didn't break anything. EDIT: Hmm, I did not realize that all rebased commits would appear as changes in this PR. I have not resolved complicated git merges in a long time and thought rebasing was the right move, but I don't like how this looks... Should I scrap this and merge master into my branch instead? EDIT 2: I have done so; I think merging is better here.
I'm not a merge-vs.-rebase purist, but the way the PR looks when I review it is plenty clear as is.
Yeah sounds good.
Does this mean we would call it like …
You could make it so …
Ok, I was working on replacing the indexing with one method as you suggested, but there is an issue here: our output space should be the target vocabulary. If we set it to be a subset of the vocab (…
I think I can make this work with some tricky offsets to map target indices drawn from {0, ..., target_size} to the embedding matrix indices of {0, ..., source+target+features size}. However, this will require providing this info to the model so we can offset predictions when decoding. This seems a bit awkward to me.
Why not just make it the case that the target vocab is the first or last N elements of the vocabulary, and then keep track of what N is? Then, initialize the output space to N.
Alternatively, you can keep track of the target vocab separately from the omnibus/shared vocab.
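The "first N elements" idea can be sketched roughly as follows. This is a minimal illustration, not yoyodyne's actual code; the names `SPECIALS` and `build_index` are hypothetical.

```python
# Hypothetical sketch of the "target vocab first" layout suggested above.

SPECIALS = ["<pad>", "<unk>", "<s>", "</s>"]


def build_index(target_symbols, source_symbols, feature_symbols):
    """Builds one shared symbol-to-index map.

    Target symbols (after the specials) come first, so the decoder's
    output space is just the first N rows of the embedding matrix and
    predictions need no offset when decoding.
    """
    index = {}
    for symbol in SPECIALS + sorted(target_symbols):
        index.setdefault(symbol, len(index))
    target_size = len(index)  # This is N, the size of the output layer.
    for symbol in sorted(source_symbols) + sorted(feature_symbols):
        index.setdefault(symbol, len(index))
    return index, target_size


index, n = build_index({"a", "b"}, {"a", "c"}, {"PL"})
# "a" occurs in both source and target but is stored only once, so the
# encoder and decoder share its embedding row; "c" is source-only and
# lands outside the first n indices.
```

Under this layout, any index below `n` is a legal prediction, and any symbol shared between vocabularies gets exactly one embedding row.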
Still probably some clunkiness here, and I need to fix formatting. But I wanted to put this up in case you want to take a look: I have tested it for the attentive LSTM and pointer-generator LSTM, and it solves the indexing issue Michael has. When I come back to this tomorrow I also want to set …
Look what we did for …
Nevermind, that's not relevant. I was looking for a flag that defaults to …
I have cleaned this up and tested it. Everything works except the transducer right now, which seems to be because of the target_vocab_size manipulation that happens in the model here: https://github.com/CUNY-CL/yoyodyne/blob/master/yoyodyne/models/transducer.py#L43. I think it's because …
And, as usual, black is failing the test but passing locally. Sorry, I can never remember what the culprit is :(
@Adamits Looking through your changes, it seems you never extend the transducer's vocab size with the additional target vocab size. The target vocab can't be acquired through the vocab map, since it consists of edit actions from the expert. (Every potential target symbol has a copy or insert action, so it's roughly 2x the observed vocabulary.)
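To illustrate the point about the transducer's output space: it ranges over edit actions rather than symbols, so its size cannot be read directly off the shared vocab map. The function below is a hypothetical back-of-the-envelope sketch, not maxwell's actual API, and the exact action inventory is an assumption.

```python
# Rough illustration: the transducer's output space is edit actions, so
# it scales with, but exceeds, the observed target vocabulary.


def action_vocab_size(target_vocab_size, num_structural_actions=2):
    # Assume roughly one insert action and one copy/substitute action
    # per target symbol, plus a few structural actions (e.g. delete,
    # end). The exact inventory depends on the expert's action set.
    return 2 * target_vocab_size + num_structural_actions


# A 100-symbol target vocabulary yields on the order of twice as many
# actions, which is why the embedding matrix must be extended beyond
# the shared symbol vocabulary for this model.
```

This is why initializing the embeddings matrix before the transducer's vocab-size manipulation breaks that model: the extra action rows are never allocated.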
Right, is there a reason why the action vocabulary loops through the dataset instead of just getting a handle on the vocabulary? I think doing that would just fix this.
Sorry, nevermind (though this may also be an issue): I am pretty sure my initial assessment is right. I now initialize the embeddings matrix before the transducer modifications. Will see if I can fix this elegantly. EDIT: Basically, what you said is right; we just use the arg …
Most recent commit seems to fix the transducer issue I created. I will try to test more tomorrow night to be sure all of the changes in this most recent commit are needed.
Nope, no reason. I just do it because it has to pass an iterable off to the maxwell dependency, so it can add on the fly. Now that you mention it, it would be more consistent to just pass the dataloader vocabulary to the expert. (Or move the expert around so it's calling maxwell while building the dataloader. But it's a cheap operation, so not a major need.)
Just commenting to annoy myself to review the transducer changes.
Does this solve #156?
Oh yeah, I think it does.
This PR is getting very, very stale. From what I recall, it was ready to go. Shall I run another round of tests on each arch for both train and predict, and if it all works, we merge?
@Adamits yes, please merge this soon. I also thought it was basically ready to go.
Ok, I ran these, and everything runs and looks good, except the transducer. No errors, but I trained it for 5 epochs on English SIGMORPHON 2017 medium data, and the validation accuracy looks messed up. See the metrics.csv table in markdown below. We also get an error when running predict.py. However, when I run the same code from master, (i) I get 0 accuracy after 5 epochs and the train loss converges absurdly slowly, and (ii) predict.py still errors.
Update: ok, the predict.py issue was minor and I fixed it. The transducer seems to be learning to just copy the lemma exactly. Not sure what to make of that yet...
I have also seen this pattern with validation accuracy, namely it being, implausibly, a constant. I don't know what the origin of it is. Re: the copying of the lemma, in the printout of the parameters, do you see anything that would be doing feature encoding?
You might also want to try doing the manual merge against head to see if anything changed there helps...
After I merge master back in, I get the same thing but with a lower accuracy:
Prediction is also now broken:
Sounds good. Let me just train the transducer longer as a sanity check. I also feel unsure why the (static) accuracies are so much lower once I merge master. I would plan on committing the version with master merged, since I resolved a couple conflicts, sound good? I am also pretty sure you have to merge, since this is your org---I don't think I have access to merge.
Ok, I got the same result from the transducer. I also wonder if the particular dataset is somehow causing it to learn this copy heuristic. It seems like we should do more explicit testing of that model, and being able to predict would help to debug---would you be ok merging this, and I can open a separate ticket?
I'm fine with that. I'll deal with the conflicts now.
Oh, sorry, I blanked on committing those. They are minor changes, so I guess I will let you do that to get a second pair of eyes on it?
I just dealt with the conflicts. Flake8 noticed that …
Throwing this WIP up to store all vocabs in a single embeddings matrix shared between source, target, and features. This will fix the current pointer-generator issues when we have disjoint vocabularies. I will get back to implementing it later this week.
A single `vocab_map` replaces the separate `target_vocabulary` and `source_vocabulary`, which are retained only for logging. Closes #156.
TODO
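The design this PR describes can be sketched minimally in PyTorch: one embedding matrix shared by source, target, and features, with target symbols occupying the first rows so the output layer covers only them. All sizes and names below are illustrative assumptions, not yoyodyne's actual code.

```python
# Minimal sketch of a single shared embeddings matrix with a
# target-first index layout. Sizes are arbitrary for illustration.
import torch
import torch.nn as nn

vocab_size = 120      # source + target + features symbols, deduplicated.
target_size = 40      # target symbols sit at indices 0..39.
embedding_size = 8

embeddings = nn.Embedding(vocab_size, embedding_size)
# The output layer scores only the target vocabulary, so predicted
# indices are directly valid embedding indices with no offset.
output_layer = nn.Linear(embedding_size, target_size)

# A symbol shared by source and target indexes the same embedding row
# from both the encoder and the decoder, which is what fixes the
# pointer-generator's disjoint-vocabulary problem.
batch = torch.tensor([[5, 41, 42]])      # mixed target/source indices.
hidden = embeddings(batch).mean(dim=1)   # stand-in for a real encoder.
logits = output_layer(hidden)            # scores over target vocab only.
```

For the pointer-generator specifically, this means copied source indices and generated target indices live in one index space, so copy attention and generation distributions can be summed without remapping.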