External dependency I/O overhead for out-of-core pipelines #65

Marko-Stamenovic-Bose · 2018-05-11T15:40:29Z

MUDA relies heavily on external command-line tools such as rubberband and sox (lightly wrapped in pyrubberband and pysox) for core deformations such as time-stretch, pitch-shift, and DRC. These wrappers work by writing the transformed signal to disk and then reading it back from disk into memory (presumably to feed an ML algorithm).
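To make the round trip concrete, here is a minimal sketch of the pattern described above. It is not the actual pyrubberband or pysox code: `STAND_IN_TOOL`, `write_wav`, `read_wav`, and `deform_via_disk` are hypothetical names, and the "tool" is just a child Python process that copies the file, standing in for a real rubberband or sox call.

```python
import os
import subprocess
import sys
import tempfile
import wave

# Hypothetical stand-in for an external CLI such as rubberband or sox:
# a child process that copies the input file to the output file unchanged.
STAND_IN_TOOL = [
    sys.executable,
    "-c",
    "import shutil, sys; shutil.copyfile(sys.argv[1], sys.argv[2])",
]


def write_wav(path, frames, sr=22050):
    """Write 16-bit mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sr)
        w.writeframes(frames)


def read_wav(path):
    """Read all PCM bytes back from a WAV file."""
    with wave.open(path, "rb") as w:
        return w.readframes(w.getnframes())


def deform_via_disk(frames, sr=22050):
    """Round-trip a signal through temp files and an external process."""
    with tempfile.TemporaryDirectory() as tmp:
        infile = os.path.join(tmp, "in.wav")
        outfile = os.path.join(tmp, "out.wav")
        write_wav(infile, frames, sr)  # 1. memory -> disk
        subprocess.run(STAND_IN_TOOL + [infile, outfile], check=True)  # 2. spawn process
        return read_wav(outfile)  # 3. disk -> memory
```

Every call pays for two file writes, one file read, and a process spawn, which is the per-example cost that adds up in a highly parallel pipeline.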

The external system call, and particularly the additional read-write step, introduces a large overhead in highly distributed/multithreaded out-of-core data pipelines. Would it not make sense to either a) allow an option to do an analogous deformation using an in-memory Python library (for example librosa) or b) replace the external system call altogether with an in-memory transformation?
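As a rough illustration of option (a), the sketch below keeps everything in memory using librosa's effects module. The wrapper names `time_stretch_in_memory` and `pitch_shift_in_memory` are hypothetical and not part of MUDA, and librosa is assumed to be installed. Note that librosa uses a phase vocoder here, so the output is not expected to match rubberband's quality.

```python
def time_stretch_in_memory(y, rate):
    """Time-stretch a float audio array by `rate` without touching disk."""
    import librosa  # assumed installed; imported lazily to keep it optional

    # rate > 1 speeds the signal up (shorter output), rate < 1 slows it down
    return librosa.effects.time_stretch(y, rate=rate)


def pitch_shift_in_memory(y, sr, n_steps):
    """Pitch-shift a float audio array by `n_steps` semitones in memory."""
    import librosa  # assumed installed; imported lazily to keep it optional

    # The output has the same number of samples as the input
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
```

Both functions take and return NumPy arrays, so they could be called directly inside a data-loading worker with no temporary files or subprocesses.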

bmcfee · 2018-05-11T16:32:25Z

I would be really happy if we could replace the command-line call-outs with proper libraries. Unfortunately, there aren't any replacements that match for quality or functionality, and I'd prefer to not have variable backends for everything.

That said, pyrubberband might get a direct cython implementation soon, which would cut down on most of the issues here. Sox is a different story though.

Marko-Stamenovic-Bose · 2018-05-11T17:39:41Z

OK that's fair. Cython pyrubberband sounds pretty exciting! For drc, what is a good way to objectively evaluate the quality of the transformation?

bmcfee · 2018-05-11T17:55:02Z

> For drc, what is a good way to objectively evaluate the quality of the transformation?

I think this would depend on your eventual application. In most muda applications, the measure of "quality" that we care about is the hold-out accuracy of a model trained on the augmentation outputs, and that's pretty heavily abstracted from the drc process.

bmcfee · 2018-05-13T22:37:43Z

It just occurred to me that audiotk might be a good drop-in replacement for Sox. It's got a heavier dependency chain, and I haven't actually used it, but it seems plausible. Anyone feel like taking a crack at reimplementing the DRC class to see if it's worth pursuing?

Marko-Stamenovic-Bose · 2018-05-23T15:21:30Z

Sure I'll take a look. I did have some headaches getting audiotk up and running, which is not a promising development, but I'll try to take another crack when I have a chance.
