
SaShiMi generation script errors out with own models #32

Closed
stefan-baumann opened this issue May 19, 2022 · 16 comments

@stefan-baumann

Hey, first of all, great work on the repository; I don't think I've ever worked with a paper repository that's this extensive and well-structured.

I'm currently trying to train the SaShiMi model on my own dataset (following your guide here: #23), and I run into some issues when trying to generate samples with the trained model.
In case it's relevant: I'm running inference from the checkpoint files, and I changed the number of layers (model.n_layers) to 4 to accommodate the memory limitations of my GPU. Apart from that, I have made no changes to the training or model code except for switching the dataset to my own.
When I call the generation.py script, I run into a series of errors:

  • The config overrides cause some errors: the hurwitz parameter no longer exists, and the setup_step methods don't correctly accept the mode argument (or rather don't pass it downstream). I "fixed" this by removing the hurwitz argument override, adding a mode argument to all module.setup_step() methods, and passing it downstream as required.
  • Additionally, setting model.layer.postact=null causes the state_dict to fail to load, giving me the following error:
Missing key(s) in state_dict: "model.c_layers.0.layer.output_linear.weight", "model.c_layers.0.layer.output_linear.bias", "model.c_layers.2.layer.output_linear.weight", "model.c_layers.2.layer.output_linear.bias", "model.c_layers.4.layer.output_linear.weight", "model.c_layers.4.layer.output_linear.bias", "model.c_layers.6.layer.output_linear.weight", "model.c_layers.6.layer.output_linear.bias", "model.u_layers.0.1.layer.output_linear.weight", "model.u_layers.0.1.layer.output_linear.bias", "model.u_layers.0.3.layer.output_linear.weight", "model.u_layers.0.3.layer.output_linear.bias", "model.u_layers.0.5.layer.output_linear.weight", "model.u_layers.0.5.layer.output_linear.bias", "model.u_layers.0.7.layer.output_linear.weight", "model.u_layers.0.7.layer.output_linear.bias", "model.u_layers.1.1.layer.output_linear.weight", "model.u_layers.1.1.layer.output_linear.bias", "model.u_layers.1.3.layer.output_linear.weight", "model.u_layers.1.3.layer.output_linear.bias", "model.u_layers.1.5.layer.output_linear.weight", "model.u_layers.1.5.layer.output_linear.bias", "model.u_layers.1.7.layer.output_linear.weight", "model.u_layers.1.7.layer.output_linear.bias". 
Unexpected key(s) in state_dict: "model.c_layers.0.layer.output_linear.0.weight", "model.c_layers.0.layer.output_linear.0.bias", "model.c_layers.2.layer.output_linear.0.weight", "model.c_layers.2.layer.output_linear.0.bias", "model.c_layers.4.layer.output_linear.0.weight", "model.c_layers.4.layer.output_linear.0.bias", "model.c_layers.6.layer.output_linear.0.weight", "model.c_layers.6.layer.output_linear.0.bias", "model.u_layers.0.1.layer.output_linear.0.weight", "model.u_layers.0.1.layer.output_linear.0.bias", "model.u_layers.0.3.layer.output_linear.0.weight", "model.u_layers.0.3.layer.output_linear.0.bias", "model.u_layers.0.5.layer.output_linear.0.weight", "model.u_layers.0.5.layer.output_linear.0.bias", "model.u_layers.0.7.layer.output_linear.0.weight", "model.u_layers.0.7.layer.output_linear.0.bias", "model.u_layers.1.1.layer.output_linear.0.weight", "model.u_layers.1.1.layer.output_linear.0.bias", "model.u_layers.1.3.layer.output_linear.0.weight", "model.u_layers.1.3.layer.output_linear.0.bias", "model.u_layers.1.5.layer.output_linear.0.weight", "model.u_layers.1.5.layer.output_linear.0.bias", "model.u_layers.1.7.layer.output_linear.0.weight", "model.u_layers.1.7.layer.output_linear.0.bias".

Does this mean that I should rename those keys manually (there's a fairly clear correspondence) to make it work after changing the activation?
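For what it's worth, the missing and unexpected keys above differ only by a ".0" index: with a post-activation, output_linear was apparently wrapped in an nn.Sequential, so its parameters were saved under "output_linear.0.*". A minimal sketch of the manual renaming, assuming the checkpoint layout shown in the error (the helper name is hypothetical, not part of the repo):

```python
def strip_sequential_index(state_dict):
    """Rename 'output_linear.0.weight' -> 'output_linear.weight' (and bias),
    undoing the extra index that nn.Sequential adds to parameter names."""
    return {
        key.replace("output_linear.0.", "output_linear."): value
        for key, value in state_dict.items()
    }

# Hypothetical usage on a Lightning-style checkpoint file:
# ckpt = torch.load("last.ckpt", map_location="cpu")
# ckpt["state_dict"] = strip_sequential_index(ckpt["state_dict"])
# model.load_state_dict(ckpt["state_dict"])
```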

  • Finally, even after passing the mode parameter through in module.setup_step(), I still get this error:
Traceback (most recent call last):
  File "/home/debaumas/state-spaces/sashimi/generation.py", line 192, in main
    module.setup_step(mode='dense')
  File "/home/debaumas/state-spaces/src/models/sequence/ss/kernel.py", line 1038, in setup_step
    self.kernel.setup_step(mode=mode)
  File "/home/debaumas/state-spaces/src/models/sequence/ss/kernel.py", line 515, in setup_step
    dC = torch.linalg.solve(
torch._C._LinAlgError: linalg.solve: (Batch element 0): The diagonal element 1 is zero, the solve could not be completed because the input matrix is singular.

Do you have any idea what might be causing this and maybe an idea about how to fix/circumvent this?

It'd be awesome if you could help point me in the right direction with this.

Best,
Stefan

@albertfgu
Contributor

Hi Stefan,

The model changed recently, and we are planning to revisit it next week to make sure the SaShiMi code is working.
If you go to the changelog and check out the commit where SaShiMi was first released (V2), the code should work.

@stefan-baumann
Author

Hi Albert,

Okay perfect, thank you very much for the insanely quick response! I'll try it out with the state from the v2 tag and report back if I encounter any other issues.

Other than that, one more quick question for now: does it make sense for me to get it working with v2, or would you suggest I wait a week or two until you're mostly done with your current iterations?

@albertfgu
Contributor

It probably depends on what you want to use it for. I expect the generation code (generation.py) to change minimally, so if you're writing code that uses the model as a black box, you shouldn't need to change it much between versions. However, if you're training large-scale models and want to save concrete checkpoints, the models will change between versions, making them harder to load. Realistically, it will probably be about 2 weeks before we can finalize the updated model.

@bacor

bacor commented May 19, 2022

Hi there, I quickly wanted to share that I ran into similar issues when simply trying to generate samples following the instructions, without any changes:

python -m sashimi.generation --model sashimi --dataset youtubemix --n_samples 32 --sample_len 16000

throws

hydra.errors.ConfigCompositionException: Could not override 'model.layer.hurwitz'.
To append to your config use +model.layer.hurwitz=false

I haven't yet tried the steps @stefan-baumann suggested to fix this error, though.
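For context, Hydra's override grammar distinguishes the cases the error message alludes to (a sketch based on Hydra's documented command-line syntax, not specific to this repo's configs):

```
model.layer.hurwitz=false    # override: the key must already exist in the config
+model.layer.hurwitz=false   # append: add a key that is not in the config
~model.layer.hurwitz         # delete: remove the key from the config
```

Since the parameter was removed from the configs, a plain override fails; either appending it or dropping the override entirely avoids the ConfigCompositionException.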

@davidmrau

@albertfgu any updates on this? I trained a model on the most recent code on GitHub and still get the same error when trying to generate.

@davidmrau

I would also be happy to implement the fix myself if you can give me some concrete pointers on what changed to cause this error.

@albertfgu
Contributor

The model should work with v2 of the codebase, which is the official Sashimi release. Can you describe your setup and paste the command you ran and the error it gave?

@stefan-baumann
Author

Probably still the same issues I described in my initial post; there have been no changes to the relevant parts of the code, AFAIK. I can confirm that I got v2 to work, though.

@davidmrau

@stefan-baumann Good to hear that you could make it work! Can you tell me the hash of the commit that works for you?

@davidmrau

davidmrau commented Jun 24, 2022

I trained a model from scratch on commit 6cbc09a on my own dataset using:

python3.8 -m train wandb=null experiment=sashimi-youtubemix dataset=youtubemix trainer.gpus=4 model.n_layers=4 loader.num_workers=2

For generation later I use:

python3.8 -m sashimi.generation --model sashimi --dataset youtubemix --n_samples 2 --sample_len 16000 --checkpoint_path $MODEL

@stefan-baumann
Author

stefan-baumann commented Jun 24, 2022

Can you tell me the hash of the commit that works for you?

I took the v2 tag (74d2706) and backported some of the later commits (especially the kernel changes), @davidmrau.
IIRC, only 83a9f13 was actually needed.

@albertfgu
Contributor

v2 should still work - it sounds like Stefan is confirming that it works modulo PyTorch versions.
v2.1 currently doesn't work because some flags were changed and removed by default.
As part of our upcoming v3 release, we are currently retraining models and will release updated checkpoints.

@davidmrau

I am also able to load the model and generate using 83a9f13. I still have trouble loading the model trained with the current codebase (6cbc09a); I guess it's easiest to retrain using 83a9f13.

@albertfgu
Contributor

You should be able to load models trained with the current codebase by modifying the generation script with the appropriate flags, for example just removing the hurwitz flag.
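A hedged sketch of what that modification might look like, assuming the generation script assembles a list of Hydra override strings before composing the config (the names here are illustrative, not the script's actual variables):

```python
# Flags that were removed from the configs between versions; overriding
# them now raises "Could not override 'model.layer.hurwitz'".
REMOVED_FLAGS = {"model.layer.hurwitz"}

def drop_removed_overrides(overrides):
    """Filter out override strings whose config key no longer exists."""
    return [o for o in overrides if o.split("=", 1)[0] not in REMOVED_FLAGS]
```

The same idea applies to any other flag that a newer config no longer defines: strip it from the overrides rather than patching the config itself.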

@davidmrau

I'll give it a try.

@albertfgu
Contributor

Sorry for how long it took to get this all out. The current version of the codebase should have

  • working configs for training all the Sashimi models
  • updated checkpoints with the latest models
  • improved generation script that works with the released checkpoints, as well as any new experiments you run

I tested as many things as I could, but please file a new issue for any problems that arise.
