
WIP: single embeddings matrix design #140

Merged (35 commits), Jul 6, 2024

Conversation

@Adamits
Collaborator

Adamits commented Oct 2, 2023

Throwing this WIP up to store all vocabs in a single embeddings matrix shared between source, target, and features. This will fix the current pointer-generator issues when we have disjoint vocabularies. I will get back to implementing it later this week.

  • Computes a single vocab_map
  • Retains target_vocabulary and source_vocabulary for logging.
  • Moves embedding initialization onto the model
  • Passes a single embedding layer to each module to share (see the sketch after the TODO list below).

Closes #156.

TODO

  • I did not do anything with features. I still need to merge these into the single vocab, being careful not to break anything that requires separate features.
  • This does not handle the case that someone may want to have unmerged embeddings, such that the same unicode codepoint on the source and target side should be considered different. We will need to update the index to mark those characters with special symbols to implement this.
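
For concreteness, here is a minimal sketch of the shared-embedding design described above (the module names are hypothetical stand-ins, not the actual yoyodyne classes):

```python
import torch
from torch import nn


class Encoder(nn.Module):
    def __init__(self, embeddings: nn.Embedding):
        super().__init__()
        self.embeddings = embeddings  # shared layer, not a copy

    def forward(self, symbols: torch.Tensor) -> torch.Tensor:
        return self.embeddings(symbols)


class Decoder(nn.Module):
    def __init__(self, embeddings: nn.Embedding):
        super().__init__()
        self.embeddings = embeddings  # same object as the encoder's

    def forward(self, symbols: torch.Tensor) -> torch.Tensor:
        return self.embeddings(symbols)


class Seq2SeqModel(nn.Module):
    """Sketch: one embedding matrix covering source, target, and features."""

    def __init__(self, vocab_size: int, embedding_size: int):
        super().__init__()
        # Embedding initialization lives on the model...
        self.embeddings = nn.Embedding(vocab_size, embedding_size)
        # ...and the single layer is passed to each module to share.
        self.encoder = Encoder(self.embeddings)
        self.decoder = Decoder(self.embeddings)
```

The point of sharing the layer object is that a source symbol and the identical target symbol resolve to the same row of the matrix, which is what the pointer-generator needs when the observed source and target vocabularies are disjoint.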

Contributor

@kylebgorman left a comment

Looks good so far. Some thoughts:

  • I will file a bug describing why we're doing this for you to link to...give me a little time though. This will also describe our discussion of embedding too, though that's a secondary issue.
  • I propose we shouldn't store the separate vocabularies in any class object. They should be computed separately, logged in the pretty-print fashion, then thrown out. This requires moving those logging statements, but so be it.
  • vocab_map would be better just called index and we should throw out the SymbolMap class.

Will work on this in a day or two.

@kylebgorman
Contributor

I created issue #142 to document the plan here.

@Adamits
Collaborator Author

Adamits commented Apr 9, 2024

I just rebased a ton of commits and want to make sure I didn't break anything.

EDIT: Hmmm, I did not realize that all rebased commits would appear as changes in this PR. I have not resolved complicated git merges in a long time and thought rebasing was the right move, but I don't like how this looks...

Should I scrap this and merge master into my branch instead?

Edit edit: I have done so; I think merging is better here.

@kylebgorman
Contributor

I'm not dogmatic about merge vs. rebase, but the way the PR looks when I review it is plenty clear as is.

@Adamits
Collaborator Author

Adamits commented Apr 9, 2024

Yeah sounds good.

> vocab_map would be better just called index and we should throw out the SymbolMap class.

Does this mean we would call it like index.index (consider that in dataset.py we currently have e.g. self.index.vocab_map)?

@kylebgorman
Contributor

> Does this mean we would call it like index.index (consider that in dataset.py we currently have e.g. self.index.vocab_map)?

You could make it so self.index could be called that way, i.e., by percolating that ability upward... if I'm understanding correctly.
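
For illustration, one way of percolating the lookup upward so callers never touch a nested map (a rough sketch; these names are not the actual yoyodyne API):

```python
class Index:
    """Sketch: direct symbol lookup, replacing index.vocab_map / SymbolMap."""

    def __init__(self, vocabulary: list[str]):
        self._symbol2idx = {symbol: i for i, symbol in enumerate(vocabulary)}
        self._idx2symbol = list(vocabulary)

    def __call__(self, symbol: str) -> int:
        # dataset.py can then call self.index(symbol) directly
        # instead of going through self.index.vocab_map.
        return self._symbol2idx[symbol]

    def symbol(self, idx: int) -> str:
        return self._idx2symbol[idx]

    def __len__(self) -> int:
        return len(self._idx2symbol)
```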

@Adamits
Collaborator Author

Adamits commented Apr 9, 2024

Ok, I was working on replacing the indexing with one method as you suggested, but there is an issue here:

Our output space should be target_vocab_size. We could set this to the full vocab_size (source + target + features), but this seems theoretically odd: in many cases most of the output space is invalid, yet we would still spill probability mass into it.

If we set it to a subset of the vocab (target_vocab_size), then we need to somehow map these output indices back to our vocabulary. Right now I was thinking about how this impacts turning ints back into symbols, but I think this also impacts decoder modules, which need to look up the target in the shared embedding matrix.

@Adamits
Collaborator Author

Adamits commented Apr 9, 2024

I think I can make this work with some tricky offsets to map target indices drawn from {0, ..., target_size} to the embedding matrix indices of {0, ..., source+target+features size}. However, this will require providing this info to the model so we can offset predictions when decoding. This seems a bit awkward to me.
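
To make the offset bookkeeping concrete, a minimal sketch (the layout of the merged vocabulary here is an assumption, not something this PR fixes):

```python
import torch

# Assume the merged vocabulary is laid out [specials | source | features | target],
# so decoder output index t corresponds to shared index t + offset, where
# offset = num_specials + source_size + features_size.


def target_to_shared(target_indices: torch.Tensor, offset: int) -> torch.Tensor:
    """Decoder output index -> index into the shared embedding matrix."""
    return target_indices + offset


def shared_to_target(shared_indices: torch.Tensor, offset: int) -> torch.Tensor:
    """Shared-vocabulary index -> decoder output index."""
    return shared_indices - offset
```

The model would need to know the offset so it can shift its argmax before the next embedding lookup, which is the awkward part mentioned above.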

@kylebgorman
Contributor

kylebgorman commented Apr 9, 2024 via email

@Adamits
Collaborator Author

Adamits commented Apr 9, 2024

Still probably some clunkiness here, and I need to fix formatting. But I wanted to put this up in case you want to take a look:

I have tested the attentive LSTM and pointer-generator LSTM, and it solves the indexing issue Michael has. When I come back to this tomorrow I also want to make --tie_embeddings the default (i.e., defaulting to true). Though I guess adding a flag like --no_tie_embeddings is a bit awkward?

@kylebgorman
Contributor

> I have tested the attentive LSTM and pointer-generator LSTM, and it solves the indexing issue Michael has. When I come back to this tomorrow I also want to make --tie_embeddings the default (i.e., defaulting to true). Though I guess adding a flag like --no_tie_embeddings is a bit awkward?

Look at what we did for --log_wandb and --no_log_wandb here.

@kylebgorman
Contributor

> Look at what we did for --log_wandb and --no_log_wandb here.

Nevermind, that's not relevant. I was looking for a flag that defaults to True. Here's a better example.
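
For reference, a generic argparse pattern for a flag that defaults to True with a paired --no_ negation (a sketch of the idea, not necessarily how yoyodyne wires up its flags):

```python
import argparse

parser = argparse.ArgumentParser()
# Both flags write to the same destination; the default is True.
parser.add_argument(
    "--tie_embeddings",
    action="store_true",
    default=True,
    help="Share one embedding matrix across source, target, and features "
    "(default: %(default)s).",
)
parser.add_argument(
    "--no_tie_embeddings",
    action="store_false",
    dest="tie_embeddings",
    help="Use separate embedding matrices.",
)
args = parser.parse_args()
# args.tie_embeddings is True unless --no_tie_embeddings was passed.
```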

@Adamits
Collaborator Author

Adamits commented Apr 11, 2024

I have cleaned this up and tested it. Everything works except the Transducer right now, which seems to be because of the target_vocab_size manipulation that happens in the model here: https://github.com/CUNY-CL/yoyodyne/blob/master/yoyodyne/models/transducer.py#L43.

I think it's because target_vocab_size gets updated AFTER we initialize the single embedding matrix. This can cause index out of bounds errors. I can't work on this until tonight or tomorrow. @bonham79, in the meantime, if you have any intuition about a good fix for this, please take a look!
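
To illustrate the failure mode (hypothetical sizes, not the actual yoyodyne code): if the matrix is built before the transducer enlarges the target vocabulary, any of the newly added indices overflows it.

```python
import torch
from torch import nn

embedding = nn.Embedding(num_embeddings=100, embedding_dim=8)  # built first
# ... the transducer later grows the vocabulary to, say, 120 action symbols ...
new_action_index = torch.tensor([110])
embedding(new_action_index)  # raises IndexError: index out of range in self
```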

@Adamits
Collaborator Author

Adamits commented Apr 11, 2024

And, as usual, black is failing the test but passing locally. Sorry I can never remember what the culprit is :(

@kylebgorman
Contributor

> And, as usual, black is failing the test but passing locally. Sorry I can never remember what the culprit is :(

pip install -r requirements.txt ought to set you to the exact version to use. (They have been making changes to the generated format...and I recently upgraded it because there was a security issue in a dependency.)

@bonham79
Collaborator

bonham79 commented Jun 3, 2024

@Adamits Looking through your changes, it seems you never extend the transducer's vocab size with the additional target vocab size. The target vocab can't be acquired through the vocab map since it's edit actions from the expert. (It's like, every potential target symbol has a copy or insert action, so it's 2x the observed vocabulary.)

@Adamits
Collaborator Author

Adamits commented Jun 4, 2024

> @Adamits Looking through your changes, it seems you never extend the transducer's vocab size with the additional target vocab size. The target vocab can't be acquired through the vocab map since it's edit actions from the expert. (It's like, every potential target symbol has a copy or insert action, so it's 2x the observed vocabulary.)

Right, is there a reason why the action vocabulary loops through the dataset instead of just getting a handle on the vocabulary? I think doing that would just fix this.

@Adamits
Collaborator Author

Adamits commented Jun 4, 2024

> @Adamits Looking through your changes, it seems you never extend the transducer's vocab size with the additional target vocab size. The target vocab can't be acquired through the vocab map since it's edit actions from the expert. (It's like, every potential target symbol has a copy or insert action, so it's 2x the observed vocabulary.)

> Right, is there a reason why the action vocabulary loops through the dataset instead of just getting a handle on the vocabulary? I think doing that would just fix this.

Sorry, never mind (though this may also be an issue); I am pretty sure my initial assessment is right: I now initialize the embeddings matrix before the transducer modifications. Will see if I can fix this elegantly.

EDIT: Basically what you said is right, we just use the arg vocab now instead of target_vocab.

@Adamits
Collaborator Author

Adamits commented Jun 4, 2024

The most recent commit seems to fix the Transducer issue I created. I will try to test more tomorrow night to be sure all of its changes are needed.

@bonham79
Collaborator

bonham79 commented Jun 4, 2024

> @Adamits Looking through your changes, it seems you never extend the transducer's vocab size with the additional target vocab size. The target vocab can't be acquired through the vocab map since it's edit actions from the expert. (It's like, every potential target symbol has a copy or insert action, so it's 2x the observed vocabulary.)

> Right, is there a reason why the action vocabulary loops through the dataset instead of just getting a handle on the vocabulary? I think doing that would just fix this.

Nope, no reason. I just do it that way because it has to pass an iterable off to the maxwell dependency, so it can add actions on the fly.

Now that you mention it, it would be more consistent to just pass the dataloader vocabulary to the expert. (Or move the expert around so it calls maxwell while building the dataloader. But it's a cheap operation, so not a major need.)
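
A rough sketch of building the action vocabulary straight from the target vocabulary rather than by iterating over the dataset (the action classes here are stand-ins, not the actual maxwell API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    symbol: str


@dataclass(frozen=True)
class Insert:
    symbol: str


def action_vocabulary(target_vocabulary: list[str]) -> list:
    """Each target symbol contributes a copy and an insert action, so the
    action space is roughly 2x the observed target vocabulary (deletion and
    end actions omitted here)."""
    actions: list = []
    for symbol in target_vocabulary:
        actions.append(Copy(symbol))
        actions.append(Insert(symbol))
    return actions
```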

Collaborator

@bonham79 left a comment

Just commenting to nag myself into reviewing the transducer changes.

@kylebgorman
Contributor

Does this solve #156?

@Adamits
Collaborator Author

Adamits commented Jun 13, 2024

> Does this solve #156?

Oh yeah I think it does.

@Adamits
Collaborator Author

Adamits commented Jul 4, 2024

This PR is getting very very stale. From what I recall it was ready to go. Shall I run another round of tests on each arch for both train and predict, and if it all works, we merge?

@kylebgorman
Contributor

> This PR is getting very very stale. From what I recall it was ready to go. Shall I run another round of tests on each arch for both train and predict, and if it all works, we merge?

@Adamits yes please merge this soon. I also thought it was basically ready to go.

@Adamits
Collaborator Author

Adamits commented Jul 6, 2024

Ok, I ran these, and everything runs and looks good except the Transducer. No errors, but I trained it for 5 epochs on English SIGMORPHON 2017 medium data, and the validation accuracy looks messed up. See the metrics.csv table in markdown below. We also get an error when running predict.py.

However, when I run the same code from master, (i) I get 0 accuracy after 5 epochs and the train loss is converging absurdly slowly, and (ii) predict.py still errors.

| lr-Adam | step | val_loss | val_accuracy | epoch | train_loss |
| --- | --- | --- | --- | --- | --- |
| 0.001 | 0 | | | | |
| | 15 | 0.9778230786323547 | 0.1899999976158142 | 0 | |
| | 15 | | | 0 | 3.108109951019287 |
| 0.001 | 16 | | | | |
| | 31 | 0.7275158166885376 | 0.1899999976158142 | 1 | |
| | 31 | | | 1 | 0.9277768731117249 |
| 0.001 | 32 | | | | |
| | 47 | 0.581281840801239 | 0.1899999976158142 | 2 | |
| | 47 | | | 2 | 0.7454320788383484 |
| 0.001 | 48 | | | | |
| | 63 | 0.49098914861679077 | 0.1899999976158142 | 3 | |
| | 63 | | | 3 | 0.6462973356246948 |
| 0.001 | 64 | | | | |
| | 79 | 0.4305487275123596 | 0.1899999976158142 | 4 | |
| | 79 | | | 4 | 0.6198569536209106 |

@Adamits
Collaborator Author

Adamits commented Jul 6, 2024

Update: ok, the predict.py issue was minor and I fixed it. The transducer seems to be learning to just copy the lemma exactly. Not sure what to make of that yet...

@kylebgorman
Contributor

I have also seen this pattern with validation accuracy, namely it being, implausibly, a constant. I don't know what the origin of it is.

Re: the copying of lemma, in the printout of the parameters do you see anything that would be doing feature encoding?

@kylebgorman
Contributor

You might also want to try doing the manual merge against head to see if anything that changed there helps...

@Adamits
Collaborator Author

Adamits commented Jul 6, 2024

After I merge master back in, I get the same thing but with a lower accuracy:

| lr-Adam | step | val_accuracy | val_loss | epoch | train_loss |
| --- | --- | --- | --- | --- | --- |
| 0.001 | 0 | | | | |
| | 15 | 0.002 | 0.9778230786323547 | 0 | |
| | 15 | | | 0 | 3.108109951019287 |
| 0.001 | 16 | | | | |
| | 31 | 0.002 | 0.7275158166885376 | 1 | |
| | 31 | | | 1 | 0.9277768731117249 |
| 0.001 | 32 | | | | |
| | 47 | 0.002 | 0.581281840801239 | 2 | |
| | 47 | | | 2 | 0.7454320788383484 |
| 0.001 | 48 | | | | |
| | 63 | 0.002 | 0.49098914861679077 | 3 | |
| | 63 | | | 3 | 0.6462973356246948 |
| 0.001 | 64 | | | | |
| | 79 | 0.002 | 0.4305487275123596 | 4 | |
| | 79 | | | 4 | 0.6198569536209106 |

Prediction is also now broken:

```
Traceback (most recent call last):
  File "/Users/adamwiemerslage/python-envs/yoyodyne/bin/yoyodyne-predict", line 8, in <module>
    sys.exit(main())
  File "/Users/adamwiemerslage/nlp-projects/morphology/yoyodyne/yoyodyne/predict.py", line 158, in main
    model = get_model_from_argparse_args(args)
  File "/Users/adamwiemerslage/nlp-projects/morphology/yoyodyne/yoyodyne/predict.py", line 70, in get_model_from_argparse_args
    return model_cls.load_from_checkpoint(args.checkpoint)
  File "/Users/adamwiemerslage/python-envs/yoyodyne/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 139, in load_from_checkpoint
    return _load_from_checkpoint(
  File "/Users/adamwiemerslage/python-envs/yoyodyne/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 188, in _load_from_checkpoint
    return _load_state(cls, checkpoint, strict=strict, **kwargs)
  File "/Users/adamwiemerslage/python-envs/yoyodyne/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 234, in _load_state
    obj = cls(**_cls_kwargs)
TypeError: __init__() missing 1 required positional argument: 'expert'
```

@kylebgorman
Contributor

The prediction bug is #197. Fine to submit over the top of that if everything else is working, IMO, since @bonham79 is on that one.

@Adamits
Collaborator Author

Adamits commented Jul 6, 2024

> The prediction bug is #197. Fine to submit over the top of that if everything else is working, IMO, since @bonham79 is on that one.

Sounds good. Let me just train the Transducer longer as a sanity check. I am also unsure why the (static) accuracies are so much lower once I merge master. I plan on committing the version with master merged, since I resolved a couple of conflicts; sound good?

I am also pretty sure you have to merge since this is your org; I don't think I have access to merge.

@Adamits
Collaborator Author

Adamits commented Jul 6, 2024

Ok, I got the same result from the transducer. I also wonder if the particular dataset is somehow causing it to learn this copy heuristic. It seems like we should do more explicit testing of that model, and being able to predict would help with debugging. Would you be ok merging this, and I can open a separate ticket?

@kylebgorman
Contributor

I'm fine with that. I'll deal with the conflicts now.

@Adamits
Collaborator Author

Adamits commented Jul 6, 2024

> I'm fine with that. I'll deal with the conflicts now.

Oh, sorry, I blanked on committing those. They are minor changes, so I guess I will let you do that to get a second pair of eyes on it?

@kylebgorman
Contributor

I just dealt with the conflicts. Flake8 noticed that source_vocab_size in train.py was unused so I removed it. I also fixed the pretty-printing to avoid nested [ and ]. I tested locally and things are fine.

@kylebgorman merged commit a72a60f into CUNY-CL:master on Jul 6, 2024
5 checks passed