Closes #910. Adds per-epoch, per-dataset sub-sampling, enabled by specifying a `fraction` or `size` argument in the configuration file.

As such, I’ve removed the references that truncate the initial data loading after a certain number of samples. This means all the data is processed upfront (which could be slow). See below for an issue to potentially resolve this.
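To illustrate the idea, here is a minimal sketch of how a `fraction` or `size` setting could be turned into a fresh subset of indices each epoch. It is not the code from this PR: the helper name `subsample_indices`, its parameters, and the seeding scheme (`seed + epoch`) are assumptions made for illustration only.

```python
import random
from typing import List, Optional


def subsample_indices(
    dataset_len: int,
    fraction: Optional[float] = None,
    size: Optional[int] = None,
    epoch: int = 0,
    seed: int = 0,
) -> List[int]:
    """Hypothetical helper: choose which indices of one dataset to use in one epoch."""
    if fraction is not None and size is not None:
        raise ValueError("Specify either `fraction` or `size`, not both.")

    if fraction is not None:
        if not 0.0 < fraction <= 1.0:
            raise ValueError("`fraction` must be in (0, 1].")
        # Keep at least one sample for a non-empty dataset.
        n = min(dataset_len, max(1, int(dataset_len * fraction)))
    elif size is not None:
        if size <= 0:
            raise ValueError("`size` must be a positive integer.")
        # Never ask for more samples than the dataset holds.
        n = min(size, dataset_len)
    else:
        # No sub-sampling configured: use the whole dataset.
        n = dataset_len

    # Assumption: seeding with seed + epoch makes each epoch's draw reproducible.
    rng = random.Random(seed + epoch)
    return sorted(rng.sample(range(dataset_len), n))


# Usage: draw half of a 100-item dataset for epoch 0, and 10 items for epoch 1.
half = subsample_indices(100, fraction=0.5, epoch=0)
ten = subsample_indices(100, size=10, epoch=1)
```

The returned indices could then be wrapped in something like `torch.utils.data.Subset` at the start of each epoch. Because sampling happens on indices, the full dataset still has to be loaded upfront, which matches the trade-off described above.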
Other small fixes along the way
- Removed `mpi4py` as a dependency, as I couldn’t install it and it’s not used anywhere in the whole OA project.
- Added `wandb` to the `requirements.txt`.
Potential other issues to raise
I didn’t want to let this PR sprawl into a bunch of separate fixes. As such, depending on what others think, it might be worth raising the following new issues (I’m happy to work on them almost immediately; some are small fixes):
- The `prompt_dialogue` dataset has changed URL. I think it was coming from this repo, which has now changed.
- `pytest` isn’t currently working (in fact, I think the directory structure is incorrect for this to work). I didn’t want to start moving files around in this PR to make it work, but would like to open a separate PR to reorganise the tests (and write a bunch more, including for the data sampling).

Checks performed
Per above, as `pytest` isn’t currently working, I did a bunch of one-off checks:
- g4dn.xlarge Nvidia T4 ✅
- g4dn.12xlarge Nvidia T4 ✅