
Sparseml Integration #196

Closed
mathemusician opened this issue Sep 16, 2021 · 5 comments · Fixed by #197

Comments

@mathemusician
Contributor

🚀 Feature

Add option to use sparseml. Example implementation found here: Google Colab link

Motivation

There is currently no out-of-the-box option to use pruning techniques from sparseml.

Pitch

I will make a pull request to add this option to the hydra config. I've already forked a version of the lightning-transformers library. link

Here's how it will be added on the hydra CLI:

trainer=sparseml

Passing this to the trainer means it will automatically use DDP. How convenient! Sparseml also uses a special conversion to log weights and such, which I've also implemented:

+trainer/logger=sparsewandb

It is also available as a callback for those who want to train on CPU (see the sketch after the override below).

+trainer/callback=sparseml
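For clarity, here's a rough sketch of what the callback route boils down to in plain PyTorch Lightning. The import path is hypothetical (the callback currently lives in my fork), and the callback reads RECIPE_PATH from the environment as described below:

```python
import pytorch_lightning as pl

# Hypothetical import path -- the callback lives in my fork, not upstream.
# It picks up RECIPE_PATH from the environment (see below).
from lightning_transformers.callbacks import SparseMLCallback

# Roughly what `+trainer/callback=sparseml` wires up. With no GPUs
# requested, the Trainer runs on CPU.
trainer = pl.Trainer(callbacks=[SparseMLCallback()])
```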

Sparseml barely supports transformers at the moment, so I've had to make a workaround for their exporter. BERT and other BERT-like models return a ModelOutput, which tells the exporter there will be two outputs. But sometimes there's only one. For now, I've just forced the exporter to treat it all as one output. I may open a pull request at sparseml to handle transformer outputs.
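To illustrate the workaround (a simplified sketch, not the exact code in the PR -- the wrapper class and names here are mine): wrap the model so the ONNX exporter only ever sees a single tensor output:

```python
import torch
from transformers import AutoModelForSequenceClassification

class SingleOutputWrapper(torch.nn.Module):
    """Collapse a ModelOutput into a single tensor so the exporter
    doesn't assume multiple outputs."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, attention_mask):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        return outputs[0]  # keep only the first field (e.g. the logits)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
wrapped = SingleOutputWrapper(model).eval()

dummy = torch.ones(1, 128, dtype=torch.long)
torch.onnx.export(
    wrapped,
    (dummy, dummy),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
)
```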

The RECIPE_PATH and MODELS_PATH, paths to the recipe yaml and the models folder, are passed in as environment variables. I wasn't able to find a way around this, since hydra overwrites added configs after starting the training loop. Maybe there's a better way of doing this.
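Concretely, launching looks something like this (the entry-point script and the paths are placeholders):

```python
import os
import subprocess

# Placeholder paths; train.py stands in for whatever entry point you use.
env = {
    **os.environ,
    "RECIPE_PATH": "/path/to/recipe.yaml",
    "MODELS_PATH": "/path/to/models",
}
subprocess.run(["python", "train.py", "trainer=sparseml"], env=env, check=True)
```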

Alternatives

I haven't thought much about this, but I'll add in a few good alternatives once I find some.

Additional context

This is my first time making a pull request to a rather large library, so don't be afraid to critique; I need the feedback. I may also need help understanding how the "fit" stage works differently from the "train" stage. I'm running training.run_test_after_fit=False because the fit stage doesn't work. Training works just fine, however.

@mathemusician added the enhancement and help wanted labels on Sep 16, 2021
@SeanNaren
Contributor

This is really cool! Looking forward to the PR :)

Regarding the env variables, it should be possible to pass these as arguments to the callback upon instantiation. I can have a look at making this possible once your PR is up!

We've also recently contributed a sparseml callback to the lightning-bolts package, which might also be useful for this: https://lightning-bolts.readthedocs.io/en/latest/callbacks/sparseml.html
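For reference, the bolts callback takes the recipe path at instantiation, which is the kind of argument-passing I mean. A minimal sketch (the path is a placeholder):

```python
import pytorch_lightning as pl
from pl_bolts.callbacks import SparseMLCallback

# The recipe path is a constructor argument rather than an env variable.
trainer = pl.Trainer(callbacks=[SparseMLCallback(recipe_path="recipe.yaml")])
```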

@mathemusician
Contributor Author

mathemusician commented Sep 16, 2021

@SeanNaren I'm actually using a variant of your sparseml callback! It's what inspired this. I'll submit a PR after I get more feedback from the neuralmagic community. They're usually pretty quick at responding.

@SeanNaren
Contributor

epic!! keep me updated :) more than happy to collab on this

@mathemusician
Contributor Author

Made the PR. The standard operating procedure is to close the issue after the PR is made, right?

@SeanNaren
Contributor

Just linked it to the PR, so when the PR is merged, this will close :)
