RNG seed [formerly Reproducibility enhancements] #31

ejhumphrey · 2016-05-03T23:12:37Z

At least two ideas jump out at me re: reproducibility:

1. RandomDoAThing deformers could optionally take seed params, but always use one internally (and serialize accordingly).
2. It'd be great if we could reconstruct a deformation pipeline exactly from the "history" object ... which really means either (a) the serialization object should encompass state, which isn't the case for RandomDoAThing deformers, or (b) there's a higher-level object that combines state and pipeline as different objects. The difference here is small (and maybe semantic), but it's a difference between a class and an instance (the pipeline is the class, the state is the instance). This might have interesting repercussions for the design of the Pipeline, which is perhaps more aptly called a PipelineFactory.
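
A minimal sketch of the first idea: a randomized deformer that accepts an optional seed but always holds one internally and includes it in its serialized parameters. The class name, the `get_params` method and the Gaussian pitch-shift states are hypothetical stand-ins, not muda's actual API.

```python
import random


class RandomPitchShiftSketch:
    """Hypothetical randomized deformer (not muda's API).

    The seed is optional for the caller, but the instance always holds one,
    so its parameters fully determine the states it will generate.
    """

    def __init__(self, n_samples=3, sigma=1.0, seed=None):
        if seed is None:
            # No seed given: draw one and keep it, so the object is still reproducible.
            seed = random.SystemRandom().randrange(2**32)
        self.n_samples = n_samples
        self.sigma = sigma
        self.seed = seed

    def states(self):
        # A fresh RNG per call, so every iteration replays the same sequence.
        rng = random.Random(self.seed)
        for _ in range(self.n_samples):
            yield {"n_semitones": rng.gauss(0.0, self.sigma)}

    def get_params(self):
        # The seed is serialized alongside the ordinary parameters.
        return {"n_samples": self.n_samples, "sigma": self.sigma, "seed": self.seed}


d = RandomPitchShiftSketch(n_samples=2)
clone = RandomPitchShiftSketch(**d.get_params())
assert list(d.states()) == list(clone.states())
```

Because the RNG is rebuilt from the stored seed on every call, two objects constructed from the same parameters generate identical states.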

Please yell if any of this is unclear; I'm kind of working through the idea stream-of-consciousness.


bmcfee · 2016-05-03T23:18:22Z

Yeah, agree 💯. I recently implemented this kind of thing over in entrofy, so it wouldn't be hard to do.

ejhumphrey · 2016-05-03T23:24:09Z

So, thoughts on the PipelineFactory and Pipeline objects? PipelineFactory is the iterator that yields a Pipeline, which can then be passed a data object to deform. Or do you see a simpler approach?
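
The proposed class/instance split might look roughly like this minimal sketch. Pipeline and PipelineFactory here are hypothetical classes (not muda's current API), and a toy gain "deformer" on lists of floats stands in for audio and JAMS objects.

```python
import random


class Pipeline:
    """Hypothetical 'instance': a frozen list of sampled states."""

    def __init__(self, states):
        self.states = states

    def transform(self, samples):
        # Toy deterministic deformation (one gain per step) standing in for real audio transforms.
        for state in self.states:
            samples = [x * state["gain"] for x in samples]
        return samples


class PipelineFactory:
    """Hypothetical 'class': an iterator that samples states and yields Pipelines."""

    def __init__(self, n_samples, n_steps=2, seed=0):
        self.n_samples = n_samples
        self.n_steps = n_steps
        self.seed = seed

    def __iter__(self):
        rng = random.Random(self.seed)
        for _ in range(self.n_samples):
            yield Pipeline([{"gain": rng.uniform(0.5, 2.0)} for _ in range(self.n_steps)])


pipelines = list(PipelineFactory(n_samples=3, seed=42))
p = pipelines[0]
# The same frozen pipeline is applied to two unrelated inputs; no n_samples=1 rebuild needed.
y1 = p.transform([1.0, 2.0])
y2 = p.transform([3.0])
```

In this design the factory owns the randomness, while each yielded Pipeline is purely deterministic and can be reused on any number of inputs.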

bmcfee · 2016-05-03T23:31:00Z

Well, if you're reconstructing a deformation pipeline from a muda output, it only has to generate a single example. Parameterizing each element of the pipeline according to its seed (and state number) ought to suffice, so we shouldn't need to generate multiple pipeline objects.
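
The "seed plus state number" alternative can be sketched without any extra objects. If a deformer draws its states one after another from an RNG seeded with its seed, then the k-th state is determined by (seed, k) alone. This assumes every state consumes the RNG identically; `nth_state` is a hypothetical helper, not part of muda.

```python
import random


def nth_state(seed, k, sigma=1.0):
    """Hypothetical helper: regenerate the k-th random state of a seeded deformer.

    Replays the seeded RNG and discards the first k draws, so only a
    single state (not a whole sweep of pipelines) is materialized.
    """
    rng = random.Random(seed)
    for _ in range(k):
        rng.gauss(0.0, sigma)  # skip the earlier states
    return {"n_semitones": rng.gauss(0.0, sigma)}


# All states from a seeded sweep ...
rng = random.Random(7)
sweep = [{"n_semitones": rng.gauss(0.0, 1.0)} for _ in range(5)]
# ... and state number 3 rebuilt from just (seed, state number).
assert nth_state(7, 3) == sweep[3]
```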

ejhumphrey · 2016-05-03T23:36:37Z

I'm thinking of the scenario where I generate one pipeline and want to apply it to different audio-jams objects ... to do this currently, I have to keep making new Pipelines with n_samples=1. Intentionally having singleton iterators seems like a design smell, no?

bmcfee · 2016-05-03T23:50:16Z

> I'm thinking of the scenario where I generate one pipeline and want to apply it to different audio-jams objects

Different meaning totally different content? If that's the case, why would you care about porting over random parameters?

If you want to reinstantiate a pipeline, random seeds and all, that can be done with the current serialization code (properly extended to include seeds).
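
A minimal sketch of serialization "properly extended to include seeds", using plain JSON as a stand-in for muda's serialization code. SeededDeformer, serialize and deserialize are hypothetical names; the point is only that once the seed is a parameter, a round trip reproduces the random states exactly.

```python
import json
import random


class SeededDeformer:
    """Hypothetical randomized deformer; its params (seed included) are plain JSON types."""

    def __init__(self, name, low, high, n_samples, seed):
        self.name = name
        self.low = low
        self.high = high
        self.n_samples = n_samples
        self.seed = seed

    def states(self):
        # States are a pure function of the stored parameters.
        rng = random.Random(self.seed)
        return [rng.uniform(self.low, self.high) for _ in range(self.n_samples)]

    def to_dict(self):
        return dict(vars(self))


def serialize(pipeline):
    return json.dumps([d.to_dict() for d in pipeline])


def deserialize(text):
    return [SeededDeformer(**params) for params in json.loads(text)]


pipeline = [SeededDeformer("gain", 0.5, 2.0, n_samples=3, seed=1),
            SeededDeformer("stretch", 0.8, 1.2, n_samples=2, seed=2)]
restored = deserialize(serialize(pipeline))
# The restored pipeline regenerates exactly the same random states.
assert [d.states() for d in restored] == [d.states() for d in pipeline]
```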

bmcfee · 2017-08-30T15:03:06Z

Coming off of the discussion in #62, it seems like the more useful version of this idea is to reconstruct a specific deformation sequence from a previous run of muda. This is useful when you have the original audio, deformed jams, and want to rebuild the corresponding deformed audio.

I'm having a hard time thinking of any other reproducibility use cases that can/should be powered by the deformation history of individual outputs.

I specifically don't see the utility in reconstructing a muda pipeline from an output's deformation history. Given the interactions between union, bypass, and pipeline, I'm not sure this is even possible: you'll only get the deformers that actually executed to form this output, not the actual deformation stack. I think encouraging folks to try to abstract up from an instance to the pipeline is an anti-pattern; instead, we should encourage folks to save their pipeline objects alongside the outputs if they want to run further deformations on new data.
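
The union/bypass point can be illustrated with a toy model. union_outputs and bypass_outputs below are simplified, hypothetical stand-ins (not muda's Union and Bypass classes) that pair each output with the history of deformers that actually ran on it.

```python
def gain(samples, g):
    return [x * g for x in samples]


def union_outputs(samples, branches):
    """Hypothetical Union: each output is produced by exactly one branch."""
    for name, fn in branches:
        # The history records only the branch that actually ran.
        yield fn(samples), [name]


def bypass_outputs(samples, name, fn):
    """Hypothetical Bypass: yields the untouched input, then the deformed one."""
    yield list(samples), []
    yield fn(samples), [name]


branches = [("double", lambda x: gain(x, 2.0)), ("halve", lambda x: gain(x, 0.5))]
histories = [h for _, h in union_outputs([1.0], branches)]
print(histories)  # → [['double'], ['halve']]
```

No single history mentions both branches of the union, and the bypassed output's history is empty, so an individual output cannot reveal the structure of the stack that produced it.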

So I suggest this issue be consolidated into two enhancements:

1. Implement the audio re-deformer, as described in #62 (How to apply deformations from annotated jams file?). This is a minor-revision change.
2. Add RNG seeds to all randomized deformer objects so that serialized pipelines can be reproduced exactly. This is a major-revision change.

These two enhancements are independent. Because the deformation history never records randomized objects (only their deterministic parent class), and all state is preserved in the history, you can get reproducibility of randomized deformations for free even without storing the seed. (This, of course, is just for audio re-deformation, not for re-running a deformation sweep on a dataset.)
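
A re-deformer in this spirit could be a loop that replays each recorded deterministic step with its stored state. The history format and the DEFORMERS registry below are assumptions for illustration and may not match the schema muda actually records; toy list transforms stand in for real audio deformations.

```python
# Hypothetical registry of deterministic deformers, keyed by the name stored in the history.
DEFORMERS = {
    "gain": lambda samples, state: [x * state["gain"] for x in samples],
    "offset": lambda samples, state: [x + state["offset"] for x in samples],
}


def redeform(samples, history):
    """Hypothetical re-deformer: replay recorded (name, state) steps in order.

    No seed is needed: each history entry already stores the concrete
    state the randomized deformer sampled, so replay is deterministic.
    """
    for step in history:
        samples = DEFORMERS[step["name"]](samples, step["state"])
    return samples


history = [{"name": "gain", "state": {"gain": 2.0}},
           {"name": "offset", "state": {"offset": -1.0}}]
print(redeform([0.0, 0.5, 1.0], history))  # → [-1.0, 0.0, 1.0]
```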

@ejhumphrey @justinsalamon what do y'all think?

justinsalamon · 2018-05-15T19:38:59Z

+💯 for the re-deformer. Indeed, it appears #62 surfaced precisely because I shared MUDA jams files (https://github.com/justinsalamon/UrbanSound8K-JAMS) for reproducibility, to avoid having to distribute the augmented version of US8K we used in our paper.

Happy to put together a PR, but no cycles in the near horizon :'(

bmcfee added the functionality label May 3, 2016

bmcfee mentioned this issue Aug 28, 2017 in "How to apply deformations from annotated jams file?" #62 (Closed)

bmcfee added this to the 0.3.0 milestone Aug 21, 2019

bmcfee changed the title from "Reproducibility enhancements" to "RNG seed [formerly Reproducibility enhancements]" Aug 21, 2019

bmcfee added a commit that referenced this issue Aug 21, 2019: "fixed #31, seed randomized objects" (2370b7b)

bmcfee mentioned this issue Aug 21, 2019 in "Random seeds" #70 (Merged)

bmcfee closed this as completed in #70 Aug 21, 2019
