24 / 32 bit audio - Support for more SampleFormat variants and Sample types #414

Open
julientregoat opened this issue May 23, 2020 · 7 comments

Comments

@julientregoat

julientregoat commented May 23, 2020

hey there! I was interested in using cpal for 24 + 32 bit audio. is this currently supported? from the code and SampleFormat, it looks like 32-bit integer samples aren't directly supported. I could scale them to the -1.0 to 1.0 range but I'm not sure if that's the correct way of doing it or if that would be lossy (edit: yeah seems lossy)

let me know, happy to contribute to help make this happen
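
To illustrate the lossiness mentioned above: an f32 has a 24-bit significand, so a 24-bit sample survives scaling to the -1.0 to 1.0 range and back, while a full-range i32 sample generally does not. The helper names below are hypothetical and are not part of cpal; this is only a sketch of one possible scaling convention.

```rust
/// Hypothetical helper: scale an i32 sample into the -1.0..=1.0 range.
/// The division by 2^31 is done in f64, then rounded to f32.
fn i32_to_f32(sample: i32) -> f32 {
    (sample as f64 / 2147483648.0) as f32
}

/// Hypothetical inverse of `i32_to_f32`.
/// The float-to-int `as` cast saturates at the i32 limits.
fn f32_to_i32(sample: f32) -> i32 {
    (sample as f64 * 2147483648.0) as i32
}
```

With these helpers, a 24-bit value stored in the upper bytes of an i32 (for example 0x12345600) round-trips unchanged, whereas (1 << 30) + 1 comes back as 1 << 30 because the lowest bit does not fit in the f32 significand.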

@mitchmindtree mitchmindtree changed the title 24 / 32 bit audio 24 / 32 bit audio - Support for more SampleFormat variants and Sample types May 25, 2020
@mitchmindtree
Member

Thanks for the issue @julientregoat !

Yeah CPAL's current set of supported SampleFormats is quite minimal, I think it's a good idea to start thinking about what other variants we want to add to the SampleFormat type.

I32 is another major one that we should add - the ASIO backend often only supports this format and we currently have a very hacky way of working around this.

W.r.t. 24-bit streams, what layout specifically does the data you have in mind use? I imagine it might be a bit trickier to support formats that don't directly correlate with Rust primitive types, but should be doable if there's a good enough use-case. We have to start thinking about packed, unpacked and the justification of the sample data within unpacked formats, and how we might expose this sort of API to the user. There's a nice description of the various possible layouts for 24-bit audio data here.

@julientregoat
Author

julientregoat commented May 25, 2020

Awesome - I actually forked the other day and started adding I32 to SampleFormat, so I guess I'm on the same page 😁 . I actually wanted to ask - did you have any specific requirements or opinions on casting an i32 to i16 or similar conversions? currently I've been working out scaling the number to meet the selected types range.
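
As a point of reference for the i32 to i16 question, here is a minimal sketch of one common convention: keep the most significant bits when narrowing, and shift back up when widening. This is an assumption for illustration only (plain truncation, no dithering or rounding), not what cpal does, and the function names are hypothetical.

```rust
/// Hypothetical helper: narrow an i32 sample to i16 by keeping the top 16 bits.
/// The arithmetic right shift preserves the sign.
fn i32_to_i16(sample: i32) -> i16 {
    (sample >> 16) as i16
}

/// Hypothetical helper: widen an i16 sample to i32 by filling the low 16 bits
/// with zeros. Note that i16::MAX maps to 0x7FFF_0000, not to i32::MAX.
fn i16_to_i32(sample: i16) -> i32 {
    (sample as i32) << 16
}
```

Widening and then narrowing is lossless under this convention; narrowing discards the low 16 bits.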

re: 24 bit, right now I've been working on an AIFF decoder (here's the spec I've been following). They pad the data to fill bytes, justifying it to the left. So 6 bit would look like 1111_1100. I'm still pretty new with understanding what people are expecting as far as layout, and have been trying to figure out packed vs unpacked myself as far as how I should expose the samples in the codec API; seems like CoreAudio supports both.
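
The left-justified padding described above can be sketched for the 6-bit case as follows. The helper names are hypothetical, and the sketch assumes two's complement samples with zero padding bits in a single-byte container.

```rust
/// Hypothetical helper: place a `bits`-wide sample in the high bits of a byte,
/// leaving the low padding bits zero. `bits` must be in 1..=8.
fn left_justify_u8(raw: u8, bits: u32) -> u8 {
    assert!((1..=8).contains(&bits));
    raw << (8 - bits)
}

/// Hypothetical helper: recover the signed sample value from a left-justified
/// byte. The arithmetic shift on i8 sign-extends the result.
fn right_justify_i8(byte: u8, bits: u32) -> i8 {
    assert!((1..=8).contains(&bits));
    (byte as i8) >> (8 - bits)
}
```

For example, the 6-bit pattern 11_1111 becomes 1111_1100 in its container, and reading it back as a signed value gives -1.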

A couple of ideas come to mind for handling bit depths that don't fit neatly into Rust primitives... one is to accept a standardized format that seems used by the largest audience, with 1/2/3/4 byte implementations. This is easier, but maybe makes more work than necessary for us and for consumers of the API if the system accepts the audio data as it's read. The alternative could be to support 1/2/4 byte buffers in a struct with flags like left_justified etc. But I'm not sure if this is overkill and we don't need to worry past the container being able to fit the data - how much does cpal need to worry about the layout, beyond whether the system supports the format? I haven't seen how this works inside cpal yet. Maybe something that maps to CoreAudio FormatFlags or the ASIO equivalent when querying a Device's SupportedFormats?

Again, pretty new to this stuff (both working with cpal and working with system audio in general) so forgive me if none of this makes sense. Devving on macos these days, can test on Linux and hopefully Windows in the next couple of months.

@Ralith
Contributor

Ralith commented May 25, 2020

[u8; 3] has a size of 3 and an align of 1, so it could be used to define a Packed24 type that would get laid out correctly in slices and elsewhere. Other distinct types could be used for the unpadded forms, with conversions defined that do the right thing.
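
A minimal sketch of such a type, assuming little-endian byte order and two's complement values. Only the name Packed24 comes from the comment above; the methods and the byte order are assumptions for illustration.

```rust
/// A 24-bit sample packed into exactly three bytes (little-endian here,
/// which is an assumption made for this sketch).
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Packed24([u8; 3]);

// Compile-time check of the layout claim: size 3, alignment 1.
const _: () = assert!(
    std::mem::size_of::<Packed24>() == 3 && std::mem::align_of::<Packed24>() == 1
);

impl Packed24 {
    /// Keep the low 24 bits of `value`; anything outside the 24-bit range
    /// is silently truncated.
    pub fn from_i32(value: i32) -> Self {
        let b = value.to_le_bytes();
        Packed24([b[0], b[1], b[2]])
    }

    /// Sign-extend back to i32: load the three bytes into the upper part of
    /// an i32, then shift right arithmetically by 8.
    pub fn to_i32(self) -> i32 {
        let [b0, b1, b2] = self.0;
        i32::from_le_bytes([0, b0, b1, b2]) >> 8
    }
}
```

Because the alignment is 1, a slice of four such samples occupies exactly 12 bytes with no padding between elements.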

@julientregoat
Author

julientregoat commented May 25, 2020

> [u8; 3] has a size of 3 and an align of 1, so it could be used to define a Packed24 type that would get laid out correctly in slices and elsewhere. Other distinct types could be used for the unpadded forms, with conversions defined that do the right thing.

whoops, this is what I meant re: standardized format but had 1/2/4 stuck in my mind from packed constraints.

@kawogi
Contributor

kawogi commented Sep 13, 2022

I just added a proposal that tries to solve this in a generic manner (see draft #690).

My goal is to implement all (byte-aligned) sample types used by ASIO, ALSA, etc. Those are (to my current knowledge)

  • i8, i16, i32
  • i18, i20, i24 (each with 3 and 4 bytes)
  • u8, u16, u32
  • u18, u20, u24 (each with 3 and 4 bytes)
  • f32, f64
  • (not sure about i/u48 - it could be supported but I'm not sure if any back-end would use it)

The approach involves the following changes:

Migration from cpal::Sample to dasp_sample::Sample

This allows working with more sample types as if they were Rust primitives. This also covers conversions between the types. A possible downside: dasp_sample has no or limited support for primitives with rare widths (18, 20). This would have to be expanded if we want all of the above types to be included. @mitchmindtree is this only a matter of manpower or are there any restrictions on what types could be supported?

Introducing Transcoders

I separated Sample primitives from their in-memory representation. This allows having multiple representations for the same primitive value. This is why SampleFormat now includes endianness and the number of bytes used. E.g. I24B3(Endianness::Big) means:

  • the sample will be read/written as I24 (public view)
  • it will occupy 3 bytes in a memory buffer ([u8; 3])
  • byte order in a memory buffer will be big endian
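
The three points above can be sketched as a run-time format description. Only the names SampleFormat, I24B3 and Endianness::Big come from the comment; the I16B2 variant, the method names and the use of i32 as the public view of a 24-bit sample are assumptions for illustration and are not taken from the draft.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

/// Run-time description of sample type, byte count and byte order.
#[non_exhaustive] // adding variants later is then not a breaking change
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SampleFormat {
    I16B2(Endianness), // hypothetical variant, included for contrast
    I24B3(Endianness),
}

impl SampleFormat {
    /// Number of bytes one sample occupies in a memory buffer.
    pub fn bytes_per_sample(self) -> usize {
        match self {
            SampleFormat::I16B2(_) => 2,
            SampleFormat::I24B3(_) => 3,
        }
    }

    /// Decode one sample; returns None if `bytes` has the wrong length.
    pub fn decode(self, bytes: &[u8]) -> Option<i32> {
        if bytes.len() != self.bytes_per_sample() {
            return None;
        }
        Some(match self {
            SampleFormat::I16B2(Endianness::Little) => {
                i16::from_le_bytes([bytes[0], bytes[1]]) as i32
            }
            SampleFormat::I16B2(Endianness::Big) => {
                i16::from_be_bytes([bytes[0], bytes[1]]) as i32
            }
            // Load into the upper three bytes, then shift to sign-extend.
            SampleFormat::I24B3(Endianness::Little) => {
                i32::from_le_bytes([0, bytes[0], bytes[1], bytes[2]]) >> 8
            }
            SampleFormat::I24B3(Endianness::Big) => {
                i32::from_be_bytes([bytes[0], bytes[1], bytes[2], 0]) >> 8
            }
        })
    }
}
```

For instance, the bytes FF FF FE decode to -2 as I24B3(Endianness::Big), and the same value is stored as FE FF FF in the little-endian variant.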

While SampleFormats allow reasoning about the characteristics at run-time, Transcoders are compile-time structures holding the information to make most accesses zero-cost where possible.

Introducing SampleBuffer/Mut

With non-power-of-two types (i24, …) being around it is no longer possible to simply transmute a byte slice into a slice of samples. SampleBuffers wrap a raw byte slice and provide access to the samples therein by making use of an associated Transcoder. Those SampleBuffers are now being handed out to the callback handlers instead of the sample slices.
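
A rough sketch of how a byte-slice wrapper with a compile-time Transcoder might look. The names SampleBuffer and Transcoder come from the comment; the trait shape, the I24B3Le marker type and all method names are assumptions and may differ from the actual draft.

```rust
use std::marker::PhantomData;

/// Compile-time description of how one sample is laid out in memory.
pub trait Transcoder {
    type Sample;
    const BYTES: usize;
    /// `bytes` must be exactly `BYTES` long.
    fn decode(bytes: &[u8]) -> Self::Sample;
}

/// Hypothetical transcoder: 24-bit signed, 3 bytes, little-endian.
pub struct I24B3Le;

impl Transcoder for I24B3Le {
    type Sample = i32;
    const BYTES: usize = 3;
    fn decode(b: &[u8]) -> i32 {
        // Load into the upper three bytes, then shift to sign-extend.
        i32::from_le_bytes([0, b[0], b[1], b[2]]) >> 8
    }
}

/// Read-only view over raw bytes; samples are decoded on access.
pub struct SampleBuffer<'a, T: Transcoder> {
    bytes: &'a [u8],
    _transcoder: PhantomData<T>,
}

impl<'a, T: Transcoder> SampleBuffer<'a, T> {
    /// Returns None if the slice does not hold a whole number of samples.
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() % T::BYTES != 0 {
            return None;
        }
        Some(Self { bytes, _transcoder: PhantomData })
    }

    /// Number of samples in the buffer.
    pub fn len(&self) -> usize {
        self.bytes.len() / T::BYTES
    }

    /// Decode the sample at `index`, or None if out of range.
    pub fn get(&self, index: usize) -> Option<T::Sample> {
        let start = index.checked_mul(T::BYTES)?;
        let end = start.checked_add(T::BYTES)?;
        self.bytes.get(start..end).map(T::decode)
    }
}
```

Since the transcoder is a type parameter, the layout is fixed at compile time and no per-sample format dispatch is needed in this sketch.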

Also to consider

As this is a breaking change (public facing API changes) we might want to talk about channel management (#367) and a duplex API (#349).

Channel layout could be abstracted away by upgrading the SampleFormat to a FrameFormat and making the SampleBuffer a FrameBuffer. This would IMO greatly improve ergonomics.

I'm not sure if this should be integrated as well (complicating and deferring everything). We can address these at a later stage but that would likely break the API again.

Current Status

So far I implemented i/u(8, 16, 32, 64) and f(32, 64) in both endiannesses. Tests under Linux/ALSA look good and beep is still beeping. I will add i/u24 shortly if there is a consensus that this approach is promising.

Ergonomics for writing to a SampleBufferMut are still not as good as I hoped for but I think that can be solved later.

The new variants will add a lot more match branches for the SampleFormat. I thought about grouping SampleBuffers with diverging Transcoders into enums by their resulting Sample primitive. This would introduce a (private) enum dispatch in the callback function to select the proper Transcoder for each SampleBuffer. (I'll call this "generalization level 1")

Given that all sample types could be converted into all other types, one could also combine all SampleBuffer types into a single big enum. This would maximize ergonomics as you can read/write whatever sample type you need/have. The downside is that you don't necessarily see the runtime costs of converting the values. (I'll call this "generalization level 2")

I think it would be a good idea to allow all three variants:

  • highly specialized callback function over sample type, byte count, endianness
  • specialized over sample type
  • 100 % generalized

Oh, and I made SampleFormat non_exhaustive, so adding more types later won't be a breaking change.

Any feedback is welcome.

@Ralith
Contributor

Ralith commented Sep 13, 2022

Thanks for working on this! One quick note: in the interest of avoiding scope creep and surprising behavior, I don't think cpal should be in the business of automatically converting between sample formats during I/O. I'd like to be able to trust that I'm passing data directly to/getting data directly from the underlying API without any hidden magic.

@kawogi
Contributor

kawogi commented Sep 13, 2022

I totally agree! This could only happen with "generalization level 2" which should be an opt-in. Even then there will be a way to figure out which sample type can be read/written losslessly.
The remaining code is free of any value conversions.
