
[ENH] Speed up evotuning and improve evotuning ergonomics #57

Merged: 30 commits from numbajit into master on Jul 18, 2020

Conversation

@ericmjl (Collaborator) commented on Jun 28, 2020

PR Description

Before you read on, please ignore the branch name. I thought originally that I could use numba to speed up things, but it turned out that once again, with some careful profiling, I found we didn't have to.

This PR does a few things:

  1. Adds a pre-commit configuration.
  2. Adds an installation script that makes it easy to install JAX on GPU.
  3. Adds backend specification of the device (GPU/CPU); a usage sketch follows below.
  4. Switches the preparation of sequences as input-output pairs to run exclusively on CPU, for speed.
  5. Adds ergonomic UI features (progress bars!) that improve the user experience.
  6. Adds docs on the recommended batch size and its relationship to GPU RAM consumption.
  7. Switches from an exact calculation of the train/holdout loss to an estimated one.

In any case, this PR closes #56.
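For orientation, here is a minimal usage sketch of how these pieces fit together. The exact signature of fit is not reproduced in this thread, so the keyword names below (n_epochs, batch_size, backend, holdout_seqs) are assumptions drawn from the discussion rather than the definitive API:

```python
# Hypothetical usage sketch only; keyword names are assumptions based on this
# PR's discussion (backend kwarg, batch size vs. GPU RAM, holdout_seqs=None).
from jax_unirep.evotuning import fit

sequences = ["MKVLAAGV", "MKLVAAGI"]  # toy training sequences

evotuned_params = fit(
    sequences,
    n_epochs=25,        # e.g. the 25-epoch run reported below
    batch_size=100,     # this PR documents batch size vs. GPU RAM consumption
    backend="cpu",      # sane default; switch to "gpu" when one is available
    holdout_seqs=None,  # holdout sequences default back to None in this PR
)
```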

Checklist

General

  1. I have made the PR off a new branch from my fork
    (<your_username>:<feature-branch_name>), not
    <your_username>:master.
  2. I have added my changes to the CHANGELOG.md file at the top.
  3. I have made any necessary changes to the documentation in the README.

Code checks

  1. If there are new features implemented, add suitable tests
    in the tests directory.
  2. If any new dependencies are introduced through the new features,
    add the packages, pinned to a version, to environment.yml.
  3. Run make test in a console in the top level directory
    to make sure all the tests pass.
  4. Run make format in a console in the top level directory
    to make the code comply with the formatting standards.

@ericmjl changed the title from "Numbajit" to "[ENH] Speed up evotuning and improve evotuning ergonomics" on Jun 28, 2020
@codecov-commenter commented on Jun 28, 2020

Codecov Report

Merging #57 into master will increase coverage by 4.06%.
The diff coverage is 96.29%.


@@            Coverage Diff             @@
##           master      #57      +/-   ##
==========================================
+ Coverage   89.27%   93.33%   +4.06%     
==========================================
  Files          11       11              
  Lines         522      540      +18     
==========================================
+ Hits          466      504      +38     
+ Misses         56       36      -20     
Impacted Files            Coverage Δ
jax_unirep/params.py      100.00% <ø> (+71.42%) ⬆️
jax_unirep/evotuning.py   96.95% <96.00%> (+3.03%) ⬆️
jax_unirep/utils.py       91.59% <100.00%> (+0.14%) ⬆️
jax_unirep/sampler.py     92.75% <0.00%> (+1.44%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 661a31b...b43b92e.

@ericmjl (Collaborator, Author) commented on Jun 28, 2020

Hmmm, I'm a little confused as to how the code coverage was impacted. I think I need a second opinion on whether stuff could be refactored a bit better. @ElArkk?

@ericmjl requested a review from @ElArkk on June 28, 2020 22:10
@ElArkk (Owner) commented on Jun 29, 2020

Wow, great work @ericmjl! Very clever to optionally move the average loss computation off of the GPU, while still using the GPU (if available) for the work-intensive weight updates!

As for the coverage, one thing I think could be responsible for the decrease on evotuning.py is that we do not supply holdout seqs in the execution test of evotuning or fit?

tests/test_params.py (review thread resolved)
tests/test_params.py (review thread resolved, outdated)
global evotune_loss # this is necessary for JIT to reference evotune_loss
evotune_loss_jit = jit(evotune_loss, backend=backend)

def batch_iter(xs: np.ndarray, ys: np.ndarray, batch_size: int = 25):
@ElArkk (Owner) commented on the diff:

Does it make sense to set a default value for batch_size here, when this function is only used inside avg_loss, and you then supply another value (100) further below?

@ericmjl (Collaborator, Author) replied:

Yeah, good catch! We should make it uniform for simplicity; that way there is much less unnecessary complexity.
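For reference, a minimal sketch of a random batch iterator along the lines being discussed; the signature mirrors the quoted diff, but the body is an illustration rather than the PR's actual implementation:

```python
import numpy as np

def batch_iter(xs: np.ndarray, ys: np.ndarray, batch_size: int = 25):
    """Yield (x, y) mini-batches in a shuffled order.

    Illustrative sketch only; the real function lives in jax_unirep/evotuning.py.
    """
    indices = np.random.permutation(len(xs))  # one shuffle per pass
    for start in range(0, len(xs), batch_size):
        batch_idx = indices[start : start + batch_size]
        yield xs[batch_idx], ys[batch_idx]
```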

jax_unirep/evotuning.py (review thread resolved)
jax_unirep/evotuning.py (review thread resolved)
jax_unirep/evotuning.py (review thread resolved, outdated)
jax_unirep/evotuning.py (review thread resolved, outdated)
ericmjl and others added 8 commits on June 29, 2020
@ericmjl (Collaborator, Author) commented on Jun 30, 2020

@ivanjayapurna I wanted to get a second pair of eyes on the code before we merge. Can you and @ElArkk independently test the numbajit branch code in, say, Colab on the GPU runtime? I've been working off my home GPU tower to speed up development, but I want to make sure what I've done is "generally useful".

UPDATE 30 June 2020, 9:20 AM: I just tried it out on your notebook, @ivanjayapurna, and everything runs smoothly and fast. By default, I have set it to dump parameters on every epoch. Storage is cheap; human time is not.

@ElArkk (Owner) commented on Jul 1, 2020

I was just thinking: now that we only use one random batch of sequences to calculate the average loss on the dataset, does it still make sense to have an argument for which backend to use? The actual training always uses the GPU if available, and it needs to be able to handle the same or even larger batch sizes than the average loss calculation does. So if GPU memory were a problem in the average loss calculation, it would also be a problem in training anyway? Maybe I'm missing something here, @ericmjl?

@ericmjl (Collaborator, Author) commented on Jul 2, 2020

Sometimes, setting the backend explicitly can help with debugging. For example, while debugging the memory allocation issue, I found it handy to be able to switch freely between the CPU and GPU backends. Hence, I think keeping the backend kwarg while setting a sane default (CPU) makes a lot of sense, as it gives both convenience and flexibility.
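As a concrete illustration of that switch, jax.jit accepts a backend argument (as in the diff quoted earlier). The toy_loss function below is a stand-in for the real evotune_loss, purely for demonstration:

```python
import jax.numpy as jnp
from jax import jit

def toy_loss(params, x):
    # Stand-in for evotune_loss; the real loss lives in jax_unirep/evotuning.py.
    return jnp.mean((x - params) ** 2)

# Compile the same function against different backends; handy when chasing
# down memory-allocation problems.
loss_cpu = jit(toy_loss, backend="cpu")
loss_gpu = jit(toy_loss, backend="gpu")  # only usable when a GPU backend is present

print(loss_cpu(0.5, jnp.arange(4.0)))
```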

@ivanjayapurna (Contributor) commented:

Just wanted to add on the testing side - trained on the TEM-1 sequences for 25 epochs on AWS with no memory issues

@ivanjayapurna (Contributor) commented:

Results from 25-epoch training:
[image: loss plots from the 25-epoch run]

The same results but with "epoch 0" plotted:
[image: the same loss plots, including "epoch 0"]

The plots on the left are from before random batching, and the plots on the right are the results from this PR. It is clear that this change made a big difference in the model's ability to learn. Two things:

1. If I'm understanding it correctly, epoch 1 is equivalent to un-evotuned UniRep. Epoch 0 here, I believe, is the final set of weights saved without a "step" input to dump_params, which is a little confusing. Perhaps we should consider changing this? It's a simple change: if no step is given, save the weights without an index in the name, or with "final" appended, or something similar (a naming sketch follows below). Perhaps epochs should be 0-indexed as well, to reflect that epoch 0 is before any training has begun.

2. There still seems to be minimal learning after the 1st epoch; perhaps the learning rate we used was too high.
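To make the suggestion concrete, here is a hedged sketch of the naming behaviour proposed above. dump_params exists in the codebase, but its real signature is not shown in this thread, so this wrapper and its file names are hypothetical:

```python
import pickle
from pathlib import Path
from typing import Optional

def dump_params_sketch(params, out_dir: str, step: Optional[int] = None) -> Path:
    """Hypothetical naming scheme: 'epoch_<step>.pkl' when a step is given,
    'final.pkl' otherwise, so the unindexed dump is no longer ambiguous."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    filename = "final.pkl" if step is None else f"epoch_{step}.pkl"
    path = out / filename
    with open(path, "wb") as f:
        pickle.dump(params, f)
    return path
```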

@ElArkk (Owner) commented on Jul 7, 2020

@ivanjayapurna thank you for the thorough testing! What learning rate and batch size did you use here?

As for epoch 0, I'm wondering where it stems from, since the fit function never seems to call dump_params without the step argument. But maybe I missed it?

It could be a good idea to change epoch indexing back to 0, since right now the loss calculations at epoch 1 actually correspond to 'before any training has happened'. What do you think, @ericmjl?

@ericmjl (Collaborator, Author) commented on Jul 7, 2020

> It could be a good idea to change epoch indexing back to 0, since right now the loss calculations at epoch 1 actually correspond to 'before any training has happened'. What do you think @ericmjl?

Yes, let's do that. I think I messed up the epoch calculation when I put this PR together. @ElArkk, do you have a spare cycle to handle it? (If not, no worries; I can get to this later in the week.)

@ElArkk (Owner) commented on Jul 7, 2020

@ericmjl @ivanjayapurna I did a quick rework of the epoch calculation; let me know if you think it makes sense this way.

@ericmjl (Collaborator, Author) commented on Jul 16, 2020

It works for me. Anything else blocking this PR?

@ElArkk (Owner) commented on Jul 17, 2020

No, I guess this is ready :)

I just have one concern still: let's say someone wants to evotune on 50k-100k sequences. With a batch size of ~100, the loss after each epoch would be calculated on just 0.2% or 0.1% of the whole dataset, respectively. Do you think this is enough to estimate the overall loss, @ericmjl?

@ericmjl (Collaborator, Author) commented on Jul 17, 2020

Possibly not for only a few epochs, but in the limit of many epochs it should not be too much of an issue. The key showstopper that I think we should not compromise on is the interactive feel.
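For intuition, the estimate under discussion is just the loss over one random batch per epoch instead of over the full dataset. A sketch under that assumption, with loss_fn standing in for the library's per-batch loss:

```python
import numpy as np

def estimated_epoch_loss(xs, ys, loss_fn, batch_size: int = 100, rng=None):
    """Estimate the dataset loss from a single random batch.

    With 50k-100k sequences and batch_size=100, each estimate covers only
    ~0.1-0.2% of the data, but the noise averages out over many epochs.
    `loss_fn` is a placeholder for the library's per-batch loss function.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(xs), size=min(batch_size, len(xs)), replace=False)
    return loss_fn(xs[idx], ys[idx])
```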

@ElArkk (Owner) commented on Jul 17, 2020

By interactive feel, do you mean not having to wait too long between epochs for the loss calculation?

In any case, we shouldn't delay merging the sped-up and more stable evotuning any longer! If we see any problems with the average loss calculation, we can always come back to it. @ericmjl, what do you think?

@ericmjl (Collaborator, Author) commented on Jul 18, 2020

Agreed, hit that button when it’s done! (And go to bed soon, it’s awfully late there for you to be responding! 😸)

@ElArkk (Owner) commented on Jul 18, 2020

Hitting that button after a good night's sleep 😄

@ElArkk merged commit 54ab8e6 into master on Jul 18, 2020
@ElArkk deleted the numbajit branch on July 18, 2020 09:23
@ericmjl (Collaborator, Author) commented on Jul 19, 2020

NOICE! (Dude, you have no idea - I was knocked out for 5 hours this afternoon. I’m lacking sleep myself haha.)

ElArkk added a commit that referenced this pull request Oct 14, 2021
* Adding pre-commit

* Fixed up GPU memory allocation, and added docstrings.

* Adding a bash script that makes it easy to install JAX on GPU.

- The script builds a conda environment first.
- Then it clobbers over with the GPU-based installation
based on instructions given by JAX's developers.

* Update fit docstring

* Set backend to "cpu" by default

* Removed parallel kwarg

* Switched back to non-Numba-compatible dictionary definition

* Add pyproject TOML config file

Primarily to add black config

* Applied black

* Add flake8 to pre-commit hooks

* Remove flake8 from pre-commit

* Attempting to increase coverage without doing any actual work ^_^

* Add tests for params

- One unit test
- One lazy man's execution test

* Update changelog

* Fix test

* Add validate_mLSTM1900_params

This can be used as part of the test suite. I should have remembered!

Co-authored-by: Arkadij Kummer <43340666+ElArkk@users.noreply.github.com>

* Used validate_mLSTM1900_params as part of test

h/t @ElArkk

Co-authored-by: Arkadij Kummer <43340666+ElArkk@users.noreply.github.com>

* Fix batch_size in avg_loss function

* Flat is better than nested

At least I tried.

* Fix test

* Make format

* Change holdout_seqs to default back to None

* Set sane defaults for mLSTM1900 layer

* Changed to dumping every epoch by default.

* add backend explanation

* add backend to fit example

* change default batching method of fit function to random

* fix epoch calculations

* fix seq length choice for holdout seqs

Co-authored-by: Arkadij Kummer <43340666+ElArkk@users.noreply.github.com>
Co-authored-by: ElArkk <arkadij.kummer@gmail.com>
Development

Successfully merging this pull request may close these issues: Evotuning pairs can be sped up by switching to original NumPy

4 participants