Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore promotion mechanism #30

Open
ssfrr opened this issue Apr 4, 2018 · 7 comments
Open

Explore promotion mechanism #30

ssfrr opened this issue Apr 4, 2018 · 7 comments

Comments

@ssfrr
Copy link
Collaborator

ssfrr commented Apr 4, 2018

I'm splitting out the various discussions from #29 so they're easier to keep track of separately. Sorry some of the formatting isn't as pretty, I opted for clarify in who said what. Please don't hesitate to let me know if I missed something important.

In an earlier version of SampledSignals I put the sample rate and channel count as type parameters, but after working with it for a while it seemed like more trouble than it was worth. In general with my Julia code I've found that type parameters are great when you need them for type stability, but trying to move too much logic into the type system gets messy and regular run-time branches on field values are simpler.

It's been a while since I made the switch, but some issues I remember were:

  • Worrying about the type of your type parameter, e.g. Foo{48000} is not the same as Foo{48000.0} so either you normalize to a single type, allow multiple types to be equivalent in your comparisons, or have a type check in your inner constructor that throws an error for the wrong type.
  • I ended up with a ton of type parameters, particularly in SampledSignals where I have SampleBufs, SampleSinks, and SampleSources (which are abstract), and then a ecosystem of concrete source and sink types. Then whenever I wanted to make a change to how something worked it rippled through wherever those types were used, and created a lot of tight coupling.

That said, there could definitely be a way to improve how I'm doing things right now with doing all my conversion in the source/sink domain. In particular precompile times are pretty long with SampledSignals, which I suspect might have to do with my recursive conversion machinery. Having another set of eyeballs and other use-cases would be helpful in iterating the design.

From @haberdashPI:

My current approach is very convenient for my purposes, but I haven't thought much about whether it would scale to your use cases. Maybe it would help if I explained my reasoning; then I'll address your specific points.

For my purposes the problem I'm trying to solve is to make it easy and straightforward to do the following operations.

  • sound mixing (i.e. addition)
  • envelope application (i.e. multiplication)
  • concatenation

I want those to work even if there are a different number of channels, bit rates, sample rates or lengths. For different lengths, I use the additive and multiplicative identity, respectively, to automatically pad the sound (ala broadcast).

The first obvious, simple solution for the rest of these formatting differences (other than length), to me, was to encode most of the information in the type of a Sound object (other than length) and use type promotion. I imagine the same goals could be accomplished using a parameter of a struct, but it didn't seem any easier or harder to implement in my case and the type parameters might lead to faster runtimes: however, I haven't benchmarked anything because I normally generate sounds offline and it seemed fast enough for my purposes.

The functionality of these three operations (and a few others) was the other big barrier for my usage of SampledSignals; I would have had to pirate types or even overwrite some methods, for example, overwriting Base.vcat for SampleBuf. I was worried this would break in not so obvious ways if the methods defined for SampleBuf in SampledSignals changed or if I didn't load my package last.

Thinking about the specific problems with putting the sample rate in the type vs. as a parameter of the struct:

Regarding the type of the sample rate itself — Yes, my inner constructor ensures sample rate is an integer. (I could always use a Float64 instead, though, are there applications where there's a fractional sampling rate???).

Concerning the proliferation of type parameters - With the caveat that I'm not completely sold on using type parameters — I don't love how unreadable errors can get — I wonder if the solution for a larger ecosystem of sampled object is to have one type that carries all the formatting information, possibly with a type hierarchy if formatting information differs in some cases. Then all sampled signals can have a single type parameter indicating its format. Seems like that might possibly avoid the proliferation of parameters, and it has the potential advantage of type parameters that type promotion can be readily leveraged, and it might lead to faster runtimes.

followup from @haberdashPI

Bottom line - I'd like to try implementing a type-parameter verison of SampledBuf and friends and share the results with you and discuss.

How to do this seems like an 'empircal' quesiton. I really like the way using
type parameters worked out for Sounds; I'd be willing to take a shot at taking
this approach for the more general SampledSignals design, based on the idea I
mentioned in my last post about storing a type parameter that is a formatting
object as a type parameters that's shared across different signal types.

Alternatively, it should also be possible to change my design for Sounds to
use a parallel promotion mechanim (promote_sounds) that did basically the same
thing my code does now, but using fields of the structs (or a simple interface
defined over all kinds of sampled signals). This could probably leverage the
"lower-level" methods in SampledSignals for reformatting and resampling via
sinks. That said, I really like how simple the promotion approach was to
implement. But maybe it won't scale???

Either way, I could try to start making changes and see which one seems better/easier to do, and then we can discuss the results.

@ssfrr
Copy link
Collaborator Author

ssfrr commented Apr 4, 2018

I think our goals are the same, in the sense that buffers (and streams in my case) should be easy to mix-and-match. As you mention, they might vary in channel count, element type, sample rate or length. I think the main addition for me is that I want it to be pretty easy for other packages to implement their own source and sink types.

For different lengths, I use the additive and multiplicative identity, respectively, to automatically pad the sound (ala broadcast)

One issue with this behavior is if you're multiplying by an envelope that fades out at the ends - with this scheme you end up with a discontinuous jump to 1 if the envelope is extended, right? It might make more sense in this case to pad out with the first/last value, or make the padding strategy configurable.

Regarding the type of the sample rate itself — Yes, my inner constructor ensures sample rate is an integer. (I could always use a Float64 instead, though, are there applications where there's a fractional sampling rate???).

I think it's useful to support fractional sampling rates. Float64 has enough bits to handle typical audio sample rates (e.g. 44100, 48000) exactly, and this way the system can handle more esoteric formats or super-low sampling rates like 0.1Hz. I could also see non-integer sample rates coming up when manipulating the samplerate of buffers to shift the pitch.

I wonder if the solution for a larger ecosystem of sampled object is to have one type that carries all the formatting information, possibly with a type hierarchy if formatting information differs in some cases. Then all sampled signals can have a single type parameter indicating its format.

I think this is a really nice idea, and could get the benefits of moving format info into the type without creating a burden for extensibility. Maybe something like:

struct SampleFormat{T, SR, CH}
    function SampleFormat{T, SR, CH}() where {T, SR, CH}
        T isa Type{<:Number} || throw(ArgumentError("T must be a numerical type"))
        SR isa Float64 || throw(ArgumentError("SR must be a Float64"))
        CH isa Int || throw(ArgumentError("CH must be an Int"))
        new{T, SR, CH}()
    end
end

function SampleFormat(eltype::Type{<:Number}, samplerate::Real, nchannels::Real)
    SampleFormat{eltype, convert(Float64, samplerate), convert(Int, nchannels)}()
end

Definitely worth investigating. Another issue to worry about here is how to represent 1-channel buffers. Currently I match the AbstractArray interface in that you can have 1D SampleBufs (size is (N,)) or 2D SampleBufs that only have 1 channel (size is (N,1)). Having these two different ways to encode 1-channel buffers is sort of a PITA, but this way the indexing behavior matches AbstractArray: if you have a 3-channel SampleBuf that you index with x[1:1000, 1] you get a 1D SampleBuf just like you would when indexing an array. (x[1:1000, 1:1] would get you the 2D version).

I don't think it's inevitable that all the conversion goes through streams as it does right now in SampledSignals. For channel count and type conversion, there's no state to the conversion so it could be implemented in terms of buffers and then the stream machinery could use those conversions. Where it gets tricky is samplerate conversion, where in the streaming case you need to maintain state from one chunk to the next, so that sort of needs to be implemented with streams in mind. That doesn't necessarily mean that the buffer samplerate conversion needs to invoke the stream machinery though, it could just have a parallel resampling that uses DSP.resample directly.

@haberdashPI
Copy link
Contributor

haberdashPI commented Apr 5, 2018

One issue with this behavior is if you're multiplying by an envelope that fades out at the ends - with this scheme you end up with a discontinuous jump to 1 if the envelope is extended, right? It might make more sense in this case to pad out with the first/last value, or make the padding strategy configurable.

Ah, yes. Good point. I like the idea of having it pad out the last value, with a keyword option to change the padding strategy.

I think it's useful to support fractional sampling rates.

Fair enough! Seems reasonable. Those examples help. Thanks.

I like your proposed SampleFormat object, that seems good. One point I had mentioned was that perhaps for some kind of signals automatic mixing down and up of channels will make less sense. So maybe there should be an additional mixing_strategy arugment which defaults to automatic up and down mixing.

struct SampleFormat{T, SR, CH, MX}
    function SampleFormat{T, SR, CH,MX}() where {T, SR, CH,MX}
        T isa Type{<:Number} || throw(ArgumentError("T must be a numerical type"))
        SR isa Float64 || throw(ArgumentError("SR must be a Float64"))
        CH isa Int || throw(ArgumentError("CH must be an Int"))
        MX isa Symbol || thorw(ArugumentError("MX must be a Symbol")
        new{T, SR, CH}()
    end
end

function SampleFormat(eltype::Type{<:Number}, samplerate::Real, nchannels::Real, mixing_strategy::Symbol=:automatic)
    SampleFormat{eltype, convert(Float64, samplerate), convert(Int, nchannels),mixing_strategy}()
end

Another issue to worry about here is how to represent 1-channel buffers.

Yes. I ran into this same exact issue, and ended up with the same strategy, specifying the number of channels (C) and the number of dimensions (N) as seperate type parameters (where C must be 1 if N == 1). I see how this might be a bit more tricky to handle if C is part of SampleFormat and N is a part of SampleBuf, but it seems worth experimenting with.

One very finicky case is the situation where you want a single sample of multiple channels. The default array interface would return a Cx1 array (where C is the number of channels). As I'm handling things now, I always assume the first dimension is time, so I return a 1xC array.

I don't think it's inevitable that all the conversion goes through streams as it does right now in SampledSignals.

Indeed. It's a bit more work for the streams, but it seems like one should be able to call, e.g. promote(x,y), and in the case of streams it will return some lazy object that converts on the fly, and in the case of sounds it does the conversion right away.

@haberdashPI
Copy link
Contributor

Just thinking about where to start, and it seems like a good first step would be working something out for this, and then implementing mix, amplify (formerly envelope) and vcat to use this promotion mechanism.

@ssfrr
Copy link
Collaborator Author

ssfrr commented Apr 6, 2018

Sounds good. I'm not 100% sold that the mixing strategy needs to be part of the type. It seems like extra complexity and I'm not sure there's a big benefit. I think that a default behavior of:

  • N-to-1 mixes the channels
  • 1-to-N copies the channel
  • general M-to-N mixing throws an error
    is pretty reasonable, and if an application wants something different they can manually munge their channels to match up.

I agree indexing a single frame (buf[4, :]) is tricky, and SampledSignals currently does the same thing as you suggest (returns a 1xC SampleBuf). No one has complained, and it still supports linear indexing, so I think we should stick with that approach.

@ssfrr
Copy link
Collaborator Author

ssfrr commented Apr 6, 2018

hmm, there's also an argument that the samplerate is pretty meaningless with a single frame of audio, so there might be an argument for duplicating Array and having buf[4, :] just return a plain length-C Vector (not a SampleBuf). That has the nice property that code that's written generically for AbstractArray would still get what they expect.

needs a little more thought.

@haberdashPI
Copy link
Contributor

haberdashPI commented Apr 6, 2018

Sure! That seems like a reasonable behavior.

I'm thinking there well be some well defined interface to the format type (e.g. nchannels(::SampleFormat{T,SR,CH}) where {T,SR,CH} = CH) so it would be easy to change later on, or drop in a different formatting object for some unusual sort of signal with different formatting parameters.

I'll start fleshing this out and make a pull request.

@ssfrr
Copy link
Collaborator Author

ssfrr commented Apr 6, 2018

👍 for a function interface that hides away the type parameter specifics.

Looking forward to seeing what you come up with!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants