Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Audio API #1974

Merged
merged 3 commits into from Aug 30, 2022
Merged

Add Audio API #1974

merged 3 commits into from Aug 30, 2022

Conversation

xyang16
Copy link
Contributor

@xyang16 xyang16 commented Aug 29, 2022

Description

Brief description of what this PR is about

Add Audio API

  • If this change is a backward incompatible change, why must this change be made?
  • Interesting edge cases to note here

* @param bigEndian whether the data is stored in big-endian or little-endian order.
* @return a float array.
*/
public static float[] fromByteArray(byte[] bytes, boolean bigEndian) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static float[] fromByteArray(byte[] bytes, boolean bigEndian) {
public static float[] fromByteArray(byte[] bytes, ByteOrder endian) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


private static final Logger logger = LoggerFactory.getLogger(AudioFactory.class);

@SuppressWarnings({"PMD.DoubleBraceInitialization", "PMD.UseConcurrentHashMap"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better avoid create a new anonymous class, you can use a static initializer instead.

This can be just a String[], no need use a map here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

* @return {@link AudioFactory}
*/
public static AudioFactory newInstance(Configuration conf) {
boolean android = "http://www.android.com/".equals(System.getProperty("java.vendor.url"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just a simple try and fail approach for both Android and PC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

* @param conf The configuration for {@link AudioFactory} to use.
* @return {@link AudioFactory}
*/
public static AudioFactory newInstance(Configuration conf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need newInstace() overload

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* {@code Audio} is a container of an audio in DJL. The raw data of the audio is wrapped in a float
* array.
*/
public interface Audio {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Audio is just a data holder now. I'm not sure if we need a interface now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* @param type type of the audio, such as "wav"
* @throws IOException audio cannot be saved through output stream
*/
void save(OutputStream os, String type) throws IOException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let remove this since we didn't have any implementation now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


/** {@inheritDoc} */
@Override
public void save(OutputStream os, String type) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move save into Factory in needed in future

* @return the bytes read from the audio input stream.
* @throws IOException if an input or output error occurs
*/
public static byte[] read(AudioInputStream ais) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this implementation in SampleAudioFactory?

int i = 0;
for (Float f : list) {
floatArray[i++] = (f != null ? f : Float.NaN);
for (int i = 0; i < list.size(); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think old one is better:

  1. doesn't need to call list.size() every thime
  2. it works better on non-randomaccess lists (Like LinkedList)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* @return the float array read from the audio.
* @throws FFmpegFrameGrabber.Exception if error occurs
*/
public static float[] grab(FFmpegFrameGrabber grabber) throws FFmpegFrameGrabber.Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in FFmpegAudioFactory

@codecov-commenter
Copy link

codecov-commenter commented Aug 29, 2022

Codecov Report

Merging #1974 (4d0dea2) into master (bb5073f) will decrease coverage by 1.93%.
The diff coverage is 68.64%.

@@             Coverage Diff              @@
##             master    #1974      +/-   ##
============================================
- Coverage     72.08%   70.14%   -1.94%     
- Complexity     5126     5859     +733     
============================================
  Files           473      578     +105     
  Lines         21970    25946    +3976     
  Branches       2351     2795     +444     
============================================
+ Hits          15838    18201    +2363     
- Misses         4925     6373    +1448     
- Partials       1207     1372     +165     
Impacted Files Coverage Δ
api/src/main/java/ai/djl/modality/cv/Image.java 69.23% <ø> (-4.11%) ⬇️
...rc/main/java/ai/djl/modality/cv/MultiBoxPrior.java 76.00% <ø> (ø)
...rc/main/java/ai/djl/modality/cv/output/Joints.java 71.42% <ø> (ø)
.../main/java/ai/djl/modality/cv/output/Landmark.java 100.00% <ø> (ø)
...main/java/ai/djl/modality/cv/output/Rectangle.java 72.41% <0.00%> (ø)
...i/djl/modality/cv/translator/BigGANTranslator.java 21.42% <0.00%> (-5.24%) ⬇️
...odality/cv/translator/BigGANTranslatorFactory.java 33.33% <0.00%> (+8.33%) ⬆️
...nslator/InstanceSegmentationTranslatorFactory.java 14.28% <0.00%> (-3.90%) ⬇️
.../cv/translator/SemanticSegmentationTranslator.java 0.00% <0.00%> (ø)
.../cv/translator/StyleTransferTranslatorFactory.java 40.00% <ø> (ø)
... and 480 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

}

/**
* Create new instance of audio factory from the provided factory implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Create new instance of audio factory from the provided factory implementation.
* Creates new instance of audio factory from the provided factory implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

/**
* Gets {@link Audio} from file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Gets {@link Audio} from file.
* Returns {@link Audio} from file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

/**
* Construct a new {@code Configuration} instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Construct a new {@code Configuration} instance.
* Constructs a new {@code Configuration} instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

public static final class Configuration {

/** Default sample rate. */
public static final int DEFAULT_SAMPLE_RATE = 16000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to private.

}

/**
* Get the number of channels of an audio file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Get the number of channels of an audio file.
* Returns the number of channels of an audio file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

/**
* Set the number of channels for {@link AudioFactory} to use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Set the number of channels for {@link AudioFactory} to use.
* Sets the number of channels for {@link AudioFactory} to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* @param endian whether the data is stored in big-endian or little-endian order.
* @return a float array.
*/
public static float[] fromByteArray(byte[] bytes, boolean endian) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static float[] fromByteArray(byte[] bytes, boolean endian) {
public static float[] fromByteArray(byte[] bytes, ByteOrder endian) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There actually 3 endian type:
BIG, LITTLE and NATIVE

Copy link
Contributor Author

@xyang16 xyang16 Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html#order-java.nio.ByteOrder-

bo - The new byte order, either BIG_ENDIAN or LITTLE_ENDIAN.

Copy link
Contributor Author

@xyang16 xyang16 Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NATIVE is actually BIG_ENDIAN or LITTLE_ENDIAN

Also I got whether it's big-endian or little-endian from AudioFormat, so there's no native involved.

https://github.com/xyang16/djl/blob/audio/api/src/main/java/ai/djl/modality/audio/SampledAudioFactory.java#L52

float[] floats = Float16Utils.fromByteArray(bytes, format.isBigEndian());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can choose to only pass in BIG/LITTLE to this API in AudioFactory, but the API should be able to accept any endian. The source of the input to this API may not only come from AudioSystem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@xyang16 xyang16 force-pushed the audio branch 4 times, most recently from 05d30f5 to cf6e419 Compare August 29, 2022 20:50
@frankfliu
Copy link
Contributor

Can you add some unit test?

@xyang16 xyang16 force-pushed the audio branch 7 times, most recently from 509b309 to 63c681e Compare August 29, 2022 23:28
@xyang16
Copy link
Contributor Author

xyang16 commented Aug 29, 2022

Can you add some unit test?

Done

* @param order the byte order to use.
* @return a float array.
*/
public static float[] fromByteArray(byte[] bytes, ByteOrder order) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure if this really belongs to here,

return the value between -1.0f - 1.0f seems not general enough, it might only work for specific audio data processing use case.

Maybe we should keep this in AudioUtils.java

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move it to SampledAudioFactory


public class AudioFactoryTest {

private static final String PATH = "src/test/resources/speech.wav";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we re-use test-01.wav?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is test-01.wav?


@BeforeTest
public void setUp() {
factory = AudioFactory.getInstance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't really need a setup()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


@Test
public void testFromInputStream() throws IOException {
try (InputStream is = new BufferedInputStream(Files.newInputStream(Paths.get(PATH)))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does BufferedInputStream required? If true, we need add this in our API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes BufferedInputStream is needed. I have added it to API.

@xyang16 xyang16 force-pushed the audio branch 4 times, most recently from 5c60497 to 8f110cf Compare August 30, 2022 04:10
xyang16 and others added 2 commits August 29, 2022 22:33
Change-Id: Ibcd61d71c17881afc46b01abcf4f14c7b5341d15
Change-Id: Id4750f56f937cbee7e80287be8f79396aec710cf
@xyang16 xyang16 merged commit 6e7cafe into deepjavalibrary:master Aug 30, 2022
@xyang16 xyang16 deleted the audio branch May 2, 2023 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants