RNG seed [formerly Reproducibility enhancements] #31

Closed
ejhumphrey opened this issue May 3, 2016 · 7 comments · Fixed by #70

@ejhumphrey

At least two ideas jump out at me re: reproducibility:

  1. RandomDoAThing deformers could optionally take a seed parameter, but always use one internally (and serialize it accordingly).
  2. It'd be great if we could reconstruct a deformation pipeline exactly from the "history" object. That really means either (a) the serialization object should encompass state, which isn't the case for RandomDoAThing deformers, or (b) there should be a higher-level object that holds the state and the pipeline as separate objects. The difference here is small (and maybe semantic), but it's the difference between a class and an instance: the pipeline is the class, the state is the instance. This might have interesting repercussions for the design of the Pipeline, which might more aptly be called a PipelineFactory.
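
A minimal sketch of idea 1, in Python since muda is a Python library. `RandomGain`, `states()`, and `get_params()` are hypothetical names chosen for illustration, not muda's actual API. The point is that the deformer always holds a concrete seed, drawing one itself when the caller passes none, and exposes it with its other parameters so a serialized copy reproduces the same states.

```python
import random
import secrets


class RandomGain:
    """Hypothetical randomized deformer: samples a gain per output state.

    If no seed is given, one is drawn once at construction time, so the
    deformer always runs from a known seed and can serialize it.
    """

    def __init__(self, n_samples=3, lower=0.5, upper=2.0, seed=None):
        self.n_samples = n_samples
        self.lower = lower
        self.upper = upper
        # Always hold a concrete seed, even when the caller did not pass one.
        self.seed = seed if seed is not None else secrets.randbits(32)

    def states(self):
        # A fresh RNG per call, so iterating twice yields the same states.
        rng = random.Random(self.seed)
        for _ in range(self.n_samples):
            yield {"gain": rng.uniform(self.lower, self.upper)}

    def get_params(self):
        # The seed is serialized alongside the ordinary parameters.
        return {"n_samples": self.n_samples, "lower": self.lower,
                "upper": self.upper, "seed": self.seed}


a = RandomGain()                       # seed drawn internally
b = RandomGain(**a.get_params())       # rebuilt from its serialized params
assert list(a.states()) == list(b.states())
```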

Please yell if any of this is unclear; I'm kind of working through the idea stream-of-consciousness.

@bmcfee
Owner

bmcfee commented May 3, 2016

Yeah, agree 💯. I recently implemented this kind of thing over in entrofy, so it wouldn't be hard to do.

@ejhumphrey
Author

So, thoughts on the PipelineFactory and Pipeline objects? The PipelineFactory would be the iterator that yields a Pipeline, which can then be passed a data object to deform. Or do you see a simpler approach?
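
A rough sketch of the proposed split, assuming a single gain deformation for brevity. The `Pipeline` and `PipelineFactory` names come from the comment above; their constructor arguments and the `transform` method are hypothetical. The factory plays the "class" role and yields concrete, fixed-parameter pipelines (the "instances") that can each be applied to any number of inputs.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """A concrete deformation: fixed parameters, reusable on any input."""
    gain: float

    def transform(self, samples):
        return [self.gain * x for x in samples]


class PipelineFactory:
    """Iterator that yields concrete Pipeline instances (the 'class' side)."""

    def __init__(self, n_samples, seed):
        self.n_samples = n_samples
        self.seed = seed

    def __iter__(self):
        # Seeded, so iterating the factory again yields identical pipelines.
        rng = random.Random(self.seed)
        for _ in range(self.n_samples):
            yield Pipeline(gain=rng.uniform(0.5, 2.0))


factory = PipelineFactory(n_samples=2, seed=0)
pipelines = list(factory)

# The same concrete pipeline is reused on two different inputs.
p = pipelines[0]
out_a = p.transform([1.0, 2.0])
out_b = p.transform([3.0])
```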

@bmcfee
Owner

bmcfee commented May 3, 2016

Well, if you're reconstructing a deformation pipeline from a muda output, it only has to generate a single example. Parameterizing each element of the pipeline according to its seed (and state number) ought to suffice, so we shouldn't need to generate multiple pipeline objects.
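
One way to read "parameterize each element according to its seed and state number" is to derive a separate RNG from the pair (seed, index), so a single output can be rebuilt without regenerating the ones before it. This is a sketch under that assumption; `state_for` is a hypothetical helper and muda may derive its states differently.

```python
import random


def state_for(seed, index, lower=0.5, upper=2.0):
    """Reproduce the parameters of output number `index` directly.

    A separate RNG is derived from (seed, index), so reconstructing one
    example does not require generating the ones before it.
    """
    # String seeds are hashed deterministically by random.Random,
    # unlike hash(), which is salted per process for str.
    rng = random.Random(f"{seed}:{index}")
    return {"gain": rng.uniform(lower, upper)}


# Generate a sweep of five outputs...
sweep = [state_for(seed=42, index=k) for k in range(5)]
# ...then rebuild only output 3 from its recorded (seed, index).
assert state_for(seed=42, index=3) == sweep[3]
```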

@ejhumphrey
Author

I'm thinking of the scenario where I generate one pipeline and want to apply it to different audio-jams objects. To do this currently, I have to keep making new Pipelines with n_samples=1. Intentionally having singleton iterators seems like a design smell, no?

@bmcfee
Owner

bmcfee commented May 3, 2016

> I'm thinking of the scenario where I generate one pipeline and want to apply it to different audio-jams objects

Different meaning totally different content? If that's the case, why would you care about porting over random parameters?

If you want to reinstantiate a pipeline, random seeds and all, that can be done with the current serialization code (properly extended to include seeds).
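
A sketch of what "serialization extended to include seeds" buys: if each randomized deformer's parameters include its seed, a JSON round trip rebuilds a pipeline that regenerates exactly the same states. muda's own serialization code is not reproduced here; `RandomGain`, `REGISTRY`, `serialize`, and `deserialize` are hypothetical stand-ins.

```python
import json
import random


class RandomGain:
    """Hypothetical randomized deformer that carries its own seed."""

    def __init__(self, n_samples, seed):
        self.n_samples = n_samples
        self.seed = seed

    def get_params(self):
        return {"n_samples": self.n_samples, "seed": self.seed}

    def states(self):
        rng = random.Random(self.seed)
        return [{"gain": rng.uniform(0.5, 2.0)} for _ in range(self.n_samples)]


# Hypothetical registry mapping serialized class names back to classes.
REGISTRY = {"RandomGain": RandomGain}


def serialize(steps):
    """Encode a list of (name, deformer) steps, seeds included."""
    return json.dumps([
        {"name": name, "class": type(d).__name__, "params": d.get_params()}
        for name, d in steps
    ])


def deserialize(text):
    """Rebuild the steps from JSON, reinstantiating each deformer."""
    return [
        (s["name"], REGISTRY[s["class"]](**s["params"]))
        for s in json.loads(text)
    ]


original = [("gain", RandomGain(n_samples=3, seed=1234))]
restored = deserialize(serialize(original))
assert original[0][1].states() == restored[0][1].states()
```

Only the integer parameters pass through JSON; the float states are regenerated from the seed, so no precision is lost in the round trip.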

@bmcfee
Owner

bmcfee commented Aug 30, 2017

Coming off of the discussion in #62, it seems like the more useful version of this idea is to reconstruct a specific deformation sequence from a previous run of muda. This is useful when you have the original audio and the deformed jams, and want to rebuild the corresponding deformed audio.

I'm having a hard time thinking of any other reproducibility use cases that can/should be powered by the deformation history of individual outputs.

I specifically don't see the utility in reconstructing a muda pipeline from an output's deformation history. Given the interactions between union, bypass, and pipeline, I'm not sure this is even possible: you'll only get the deformers that actually executed to form this output, not the actual deformation stack. I think encouraging folks to try to abstract up from an instance to the pipeline is an anti-pattern; instead, we should encourage folks to save their pipeline objects alongside the outputs if they want to run further deformations on new data.

So I suggest this issue be consolidated into two enhancements:

  1. Implement the audio re-deformer, as described in #62 (How to apply deformations from annotated jams file?). This is a minor-revision change.
  2. Add rng seeds to all randomized deformer objects so that serialized pipelines can reproduce exactly. This is a major-revision change.

These two enhancements are independent. Because the deformation history never records randomized objects (only their deterministic parent class), and all state is preserved in the history, you can get reproducibility of randomized deformations for free even without storing the seed. (This, of course, is just for audio re-deformation, not for re-running a deformation sweep on a dataset.)
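
A minimal sketch of the audio re-deformer, showing why no seed is needed: each history entry already records the deterministic deformer and the exact state it used, so replay is just re-applying those states in order. The entry layout (`transformer`, `state`) and the `DEFORMERS` mapping are assumptions for illustration, not muda's actual JAMS schema or API, and audio is reduced to a plain list of samples.

```python
def apply_gain(samples, state):
    return [state["gain"] * x for x in samples]


def apply_offset(samples, state):
    return [x + state["offset"] for x in samples]


# Hypothetical mapping from the deterministic class name recorded in the
# history to the function that applies one recorded state.
DEFORMERS = {"Gain": apply_gain, "Offset": apply_offset}


def replay(samples, history):
    """Re-apply a recorded deformation history to the original audio.

    Each history entry stores the deterministic deformer name and the exact
    state it used, so no RNG (and no seed) is involved.
    """
    for entry in history:
        samples = DEFORMERS[entry["transformer"]](samples, entry["state"])
    return samples


history = [
    {"transformer": "Gain", "state": {"gain": 0.5}},
    {"transformer": "Offset", "state": {"offset": 1.0}},
]
print(replay([2.0, 4.0], history))  # [2.0, 3.0]
```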

@ejhumphrey @justinsalamon what do y'all think?

@justinsalamon

+💯 for the re-deformer. Indeed, it appears #62 surfaced precisely because I shared MUDA jams files (https://github.com/justinsalamon/UrbanSound8K-JAMS) for reproducibility, to avoid having to distribute the augmented version of US8K we used in our paper.

Happy to put together a PR, but no cycles on the near horizon :'(

@bmcfee added this to the 0.3.0 milestone Aug 21, 2019
@bmcfee changed the title from "Reproducibility enhancements" to "RNG seed [formerly Reproducibility enhancements]" Aug 21, 2019
@bmcfee added a commit that referenced this issue Aug 21, 2019
@bmcfee mentioned this issue Aug 21, 2019

3 participants