Cannot record sound with loopback if silence at start #166

Closed
tez3998 opened this issue Feb 18, 2023 · 10 comments

tez3998 commented Feb 18, 2023

First of all, thank you for the amazing library.
It helps my projects a lot.

Behavior I encountered

I wrote the following program which just records speaker output for 5 seconds with loopback and saves it.

import soundcard as sc
import soundfile as sf

OUTPUT_FILE_NAME = "out.wav"    # output file name.
SAMPLE_RATE = 48000              # [Hz]. sampling rate.
RECORD_SEC = 5                  # [sec]. recording duration.

with sc.get_microphone(id=str(sc.default_speaker().name), include_loopback=True).recorder(samplerate=SAMPLE_RATE) as mic:
    # record audio with loopback from default speaker.
    data = mic.record(numframes=SAMPLE_RATE*RECORD_SEC)
    sf.write(file=OUTPUT_FILE_NAME, data=data[:, 0], samplerate=SAMPLE_RATE)

This program works if the speaker is producing sound when the program starts.
However, it doesn't work if there is silence at the start.

The behavior when there is silence at the start is as follows.

  1. Run the program
  2. The program finishes immediately without recording speaker output for 5 seconds

My environment

  • OS: Windows 11 (x64)
  • Python version: 3.10.6
  • SoundCard version: 0.4.2

Error

There is no error.

Owner

bastibe commented Feb 20, 2023

Depending on the sound card, silence is either reported as no-data, or as silence. However, support for this in soundcard has not been published yet, as I didn't have a good test case yet.

Could you try running your code against the current Git master of soundcard? I believe your issue should be fixed on there. And if it is, I will publish it as a new version as soon as you confirm that it's working as intended.

Author

tez3998 commented Feb 21, 2023

@bastibe
I appreciate your quick response during your busy time.

Result

I cloned the current master of soundcard and ran the code written above on three output devices.
The results are as shown in the following table.

| Output device | Was there sound at the start of the code? | Result |
| --- | --- | --- |
| AMD High Definition Audio Device | No | Ended immediately and recorded silence. |
| AMD High Definition Audio Device | Yes | Successfully recorded. |
| Realtek(R) Audio | No | Successfully recorded. |
| Realtek(R) Audio | Yes | Successfully recorded. |
| Pixel Buds A-Series | No | Ended immediately and recorded silence. |
| Pixel Buds A-Series | Yes | Successfully recorded. |

I also encountered the following warning at random times on all output devices, but the code still behaved as described above. (The timing was random, but the warning tended to occur when the output device was switched before running the code.)

C:\Users\user\workspace\clone\bastibe\SoundCard\soundcard\mediafoundation.py:750: SoundcardRuntimeWarning: data discontinuity in recording
  warnings.warn("data discontinuity in recording", SoundcardRuntimeWarning)

Owner

bastibe commented Feb 21, 2023

Oh, the endless vagaries of sound drivers on Windows.

Regrettably, I can't debug this issue on my machine, as my sound card behaves like your Realtek. Could you check how this fails in _record_chunk for the affected sound cards?

I could imagine that GetNextPacketSize in _capture_available_frames returns AUDCLNT_E_DEVICE_INVALIDATED.

Alternatively, you could try extending the empty-watcher to more than 10ms. I have seen Windows sound cards taking up to 4s to wake up in extreme cases, if that's the problem. Perhaps we need to wait until AUDCLNT_E_SERVICE_NOT_RUNNING clears?

However, if so, I still don't know how to proceed in soundcard, as the API does not give an indication of how much silence there was. Soundcard operates on the assumption that you can get a fixed number of samples per second. WASAPI just refusing to return anything breaks that assumption. If you have a reasonable idea of how to deal with that, I'm all ears!

Author

tez3998 commented Feb 23, 2023

@bastibe
I checked the behavior of SoundCard when the output device was Pixel Buds A-Series and there was no sound at the start of the code.

The results of testing your suggestions

The value returned from GetNextPacketSize in _capture_available_frames()

Contrary to your expectation, GetNextPacketSize always returned 0.

Extending the empty-watcher to more than 4s

I extended the empty-watcher to 5s, and the code ended about 5s after it started.

Waiting until AUDCLNT_E_SERVICE_NOT_RUNNING clears

I don't know how to do this because I lack knowledge about audio. Sorry about that.

What I noticed

time.sleep() cannot sleep for 1ms

I noticed that time.sleep(0.001) actually sleeps for about 5-15ms, not 1ms. This answer on Stack Overflow says the smallest interval you can sleep for is about 10-13ms. If so, we need to use another method.
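A minimal sketch for checking this on a given machine. The helper name measure_sleep is only illustrative, and the numbers it prints depend on the OS and its timer resolution, so no particular output is claimed here.

```python
import time


def measure_sleep(requested_s=0.001, repeats=20):
    """Return how long each time.sleep(requested_s) call really took, in ms."""
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        time.sleep(requested_s)
        durations_ms.append((time.perf_counter() - start) * 1_000)
    return durations_ms


if __name__ == "__main__":
    samples = measure_sleep()
    print(f"requested 1.00 ms, got min {min(samples):.2f} ms, max {max(samples):.2f} ms")
```

If the reported minimum is well above 1 ms, any polling loop built on time.sleep(0.001) runs slower than intended.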

The reason the code ends immediately and records silence when there is no sound at the start of the code on Pixel Buds A-Series

The behavior of SoundCard in this case is as follows.

  1. If there is no sound at the start of the code, _record_chunk() returns a zero-sized array.
  2. The check if len(chunk) == 0 in record() is therefore True.
  3. At this point, required_frames is 480000 and recorded_frames is 0, so chunk is replaced with an all-zero array of required_frames elements.
  4. Now the condition while recorded_frames < required_frames: in record() is False, so the code exits the while loop and record() ends.
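The steps above can be reproduced with a simplified model of the loop. This is not the library's actual code: plain lists stand in for numpy arrays, and record, silent_device, and calls are illustrative names. It assumes 2 channels, which is what makes required_frames equal to 480000 for a 5 s recording at 48 kHz.

```python
def record(numframes, channels, record_chunk):
    """Simplified model of the recording loop described in the steps above."""
    required_frames = numframes * channels
    recorded_frames = 0
    recorded_data = []
    while recorded_frames < required_frames:
        chunk = record_chunk()
        if len(chunk) == 0:
            # No data available: the whole remainder is filled with zeros at once.
            chunk = [0.0] * (required_frames - recorded_frames)
        recorded_data.extend(chunk)
        recorded_frames += len(chunk)
    return recorded_data


calls = []


def silent_device():
    """Stand-in for _record_chunk() on a device that reports silence as no data."""
    calls.append(1)
    return []


data = record(numframes=48_000 * 5, channels=2, record_chunk=silent_device)
print(len(data), len(calls))  # → 480000 1
```

A single empty chunk is enough to satisfy the loop condition, so the call returns almost immediately with nothing but zeros.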

Owner

bastibe commented Feb 24, 2023

That's very interesting, thank you!

If I understand this correctly, it means that (some variants of) the Windows audio API just return no data when none is available. Which is not in itself a problem, but breaks the assumption of soundcard, which would rather return zeros than no data. We can fudge that by just making up some zeros if no data is available.

However, the question then becomes: How many zeros should we return? Because the length of the output is how soundcard expresses how much time has passed. In this case, it is probably acceptable if the number of zeros is off by some margin of error. Ideally, we'd ask the audio driver for a current "time", but as far as I can tell, no such API is available.

As a workaround, change _record_chunk like this:

    def _record_chunk(self):
        # skip docstring for this example...
        start_time = 0 # in the real implementation, make this self.start_time so we don't skip processing time
        while not self._capture_available_frames():
            if start_time == 0:
                start_time = time.perf_counter_ns()
            now = time.perf_counter_ns()

            # no data for 50 ms: give up and return zeros.
            if now - start_time > 50_000_000:
                ppMixFormat = _ffi.new('WAVEFORMATEXTENSIBLE**')
                hr = self._ptr[0][0].lpVtbl.GetMixFormat(self._ptr[0], ppMixFormat)
                _com.check_error(hr)
                samplerate = ppMixFormat[0][0].nSamplesPerSec # in the real implementation, cache samplerate in self.
                num_samples = samplerate * (now - start_time) / 1_000_000
                return numpy.zeros([len(set(self.channelmap)) * num_samples], dtype='float32')
            time.sleep(0.001)
        # continue with the rest of the function below the while loop...

This should give you a reasonable estimate of the correct number of zeros. If this solves your problem, I'll code up a proper implementation.

Author

tez3998 commented Feb 24, 2023

@bastibe
Thanks for the great info.
I was able to write code that works correctly on three output devices.

The result of testing your code on my machine

Debugging your code

I changed the following sections because they contained errors.

# your original code
samplerate = ppMixFormat[0][0].nSamplesPerSec

# modified code
samplerate = ppMixFormat[0][0].Format.nSamplesPerSec

# your original code
num_samples = samplerate * (now - start_time) / 1_000_000

# modified code
num_samples = int(samplerate * (now - start_time) / 1_000_000)

Result

Your code ended immediately and recorded silence because numpy.zeros() returned an array large enough for the recording to finish.
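A worked example of why the array is that large, assuming a gap of exactly 50 ms, a 48 kHz mix format, and 2 channels. The helper zeros_for_gap and its ns_per_unit parameter are illustrative, not part of soundcard. The timestamps come from time.perf_counter_ns(), so they are in nanoseconds, and dividing by 1_000_000 yields milliseconds rather than seconds.

```python
def zeros_for_gap(elapsed_ns, samplerate, channels, ns_per_unit):
    """Number of zero samples generated for a gap of elapsed_ns nanoseconds."""
    frames = int(samplerate * elapsed_ns / ns_per_unit)
    return frames * channels


elapsed_ns = 50_000_000  # 50 ms, the give-up threshold
required = 48_000 * 5 * 2  # samples needed for 5 s of 2-channel audio

as_written = zeros_for_gap(elapsed_ns, 48_000, 2, ns_per_unit=1_000_000)
per_second = zeros_for_gap(elapsed_ns, 48_000, 2, ns_per_unit=1_000_000_000)

print(as_written, per_second, required)  # → 4800000 4800 480000
```

With the divisor as written, one 50 ms gap produces ten times the samples needed for the whole 5 s recording. Dividing by the number of nanoseconds per second gives 4800 samples, which is 50 ms of 2-channel audio.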

The code which worked correctly

Code

The while loop of _record_chunk() in mediafoundation.py

start_time = 0 # in the real implementation, make this self.start_time so we don't skip processing time
while not self._capture_available_frames():
    if start_time == 0:
        start_time = time.perf_counter_ns()
    now = time.perf_counter_ns()

    # no data for 50 ms: give up and return zeros.
    if now - start_time > 50_000_000:
        ppMixFormat = _ffi.new('WAVEFORMATEXTENSIBLE**')
        hr = self._ptr[0][0].lpVtbl.GetMixFormat(self._ptr[0], ppMixFormat)
        _com.check_error(hr)
        samplerate = ppMixFormat[0][0].Format.nSamplesPerSec # in the real implementation, cache samplerate in self.
        num_samples_per_ms = samplerate / 1_000
        num_channels = len(set(self.channelmap))
        giveup_ms = 50
        return numpy.zeros(int(num_samples_per_ms * giveup_ms * num_channels), dtype='float32')

    # rewrote time.sleep(0.001), because time.sleep(0.001) cannot sleep for 1ms.
    remaining_time = 1
    sleep_ms = 1
    _start = time.perf_counter()
    while remaining_time > 0:
        elapsed_time = (time.perf_counter() - _start) * 1_000
        remaining_time = sleep_ms - elapsed_time
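The inline busy-wait above could also be written as a small helper. This is only a sketch, and busy_sleep is a hypothetical name rather than anything in soundcard.

```python
import time


def busy_sleep(duration_s):
    """Spin until duration_s seconds have passed; return the time actually spent."""
    start = time.perf_counter()
    while True:
        elapsed = time.perf_counter() - start
        if elapsed >= duration_s:
            return elapsed


if __name__ == "__main__":
    print(f"waited {busy_sleep(0.001) * 1_000:.3f} ms")
```

The trade-off is that a busy-wait keeps one CPU core fully occupied for the duration, which is acceptable for 1 ms polls but not for long waits.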

Test code

I added some code that prints info.

import soundcard as sc
import soundfile as sf
import time

OUTPUT_FILE_NAME = "out.wav"    # output file name.
SAMPLE_RATE = 48_000              # [Hz]. sampling rate.
RECORD_SEC = 5                  # [sec]. recording duration.

print(f"output device: {str(sc.default_speaker().name)}")

with sc.get_microphone(id=str(sc.default_speaker().name), include_loopback=True).recorder(samplerate=SAMPLE_RATE) as mic:
    _start_time: float = time.perf_counter()
    
    # record audio with loopback from default speaker.
    data = mic.record(numframes=SAMPLE_RATE*RECORD_SEC)

    # output info
    print("\n-- info --")
    print(f"len of data: {len(data)}")
    print(f"elapsed time: {time.perf_counter() - _start_time}s")
    print("-- -- -- --\n")

    sf.write(file=OUTPUT_FILE_NAME, data=data[:, 0], samplerate=SAMPLE_RATE)

Result

Initially, the code recorded silence and then recorded sound from YouTube.
In this demo, the code ended in 5.076047300011851s.

soundcard_bug.mp4

Owner

bastibe commented Feb 24, 2023

Please check out #167 for an implementation of this workaround, and thank you again for your analysis and examples!

If #167 works for you, I will try to publish it in a new version of soundcard next week.

Author

tez3998 commented Feb 24, 2023

It worked fine on my three output devices!

Owner

bastibe commented Feb 27, 2023

Perfect! Thank you for your feedback!

@bastibe bastibe closed this as completed Feb 27, 2023
Author

tez3998 commented Feb 27, 2023

@bastibe
Thank you too for your help during your busy time!
