🚀 Feature
Add option to use SparseML. Example implementation found here: Google Colab link
Motivation
There is currently no out-of-the-box option to use pruning techniques from SparseML.
Pitch
I will make a pull request to add this option to the Hydra config. I've already forked a version of the lightning-transformers library: link
Here's how it will be added on the Hydra CLI:
trainer=sparseml
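For context, a complete invocation might look something like the line below; the task and dataset values are illustrative placeholders, not part of this proposal:
python train.py task=nlp/text_classification dataset=nlp/text_classification/emotion trainer=sparseml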
Passing this into the trainer means it will automatically use DDP. How convenient! SparseML also uses a special conversion to log weights and other metrics, and I've implemented this as well:
+trainer/logger=sparsewandb
It is also available as a callback for those who want to train on CPU.
+trainer/callback=sparseml
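For those curious what the callback does under the hood, here's a rough sketch assuming PyTorch Lightning's Callback API and SparseML's documented ScheduledModifierManager entry points; the class name and wiring details are illustrative, not the exact PR implementation:

```python
import pytorch_lightning as pl
from sparseml.pytorch.optim import ScheduledModifierManager


class SparseMLCallback(pl.Callback):
    """Illustrative sketch: wires a SparseML recipe into the optimizers."""

    def __init__(self, recipe_path: str):
        self.recipe_path = recipe_path
        self.manager = None

    def on_fit_start(self, trainer, pl_module):
        # Load the pruning recipe and wrap each optimizer so the recipe's
        # modifiers fire on the correct training steps.
        self.manager = ScheduledModifierManager.from_yaml(self.recipe_path)
        trainer.optimizers = [
            self.manager.modify(
                pl_module, optimizer, steps_per_epoch=trainer.num_training_batches
            )
            for optimizer in trainer.optimizers
        ]

    def on_fit_end(self, trainer, pl_module):
        # Release the hooks the recipe attached to the model.
        self.manager.finalize(pl_module)
```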
SparseML barely supports transformers at the moment, so I've had to make a workaround for their exporter. BERT and other BERT-like models return a ModelOutput, which tells the exporter there will be two outputs, but sometimes there's only one. For now, I've simply forced the exporter to treat it all as one output. I may open a pull request at SparseML to handle transformer outputs properly.
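A minimal sketch of that kind of workaround, assuming a wrapper module and a flatten-to-first-field strategy (the actual PR may do this differently):

```python
import torch


class SingleOutputWrapper(torch.nn.Module):
    """Illustrative: flattens a Hugging Face ModelOutput to one tensor."""

    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def forward(self, *args, **kwargs):
        output = self.model(*args, **kwargs)
        # ModelOutput is tuple-like; keeping only the first field means the
        # ONNX exporter always sees exactly one output.
        return output[0]
```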
RECIPE_PATH and MODELS_PATH, the paths to the recipe YAML and the models folder, are passed in as environment variables. I wasn't able to find a way around this, since Hydra overwrites added configs after starting the training loop. Maybe there's a better way of doing this.
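Concretely, the workaround amounts to something like this (the variable names match the ones above; where exactly they're read inside the callback is illustrative):

```python
import os

# Set before launching, e.g.:
#   RECIPE_PATH=recipes/pruning.yaml MODELS_PATH=models/ python train.py trainer=sparseml
recipe_path = os.environ["RECIPE_PATH"]  # path to the SparseML recipe YAML
models_path = os.environ["MODELS_PATH"]  # folder where exported models are written
```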
Alternatives
I haven't thought much about this, but I'll add in a few good alternatives once I find some.
Additional context
This is my first time making a pull request to a rather large library, so don't be afraid to critique; I need the feedback. I may also need help understanding how the "fit" stage works differently from the "train" stage. I'm running training.run_test_after_fit=False because the fit stage doesn't work. Training works just fine, however.
Regarding the env variables, it could be possible to pass these as arguments to the callback upon instantiation. I can have a look at making this a possibility once your PR is up!
@SeanNaren I'm actually using a variant of your sparseml callback! It's what inspired this. I'll submit a PR after I get more feedback from the neuralmagic community. They're usually pretty quick at responding.