Maximise audio quality - conversion workflow #155

daT4v1s · 2019-06-28T05:46:28Z

I'm not that deep into the code, but it seems to do some WAV/FLAC conversion by breaking the audio into pieces and then uploading them.
I'm not sure about the order or the details. Can someone explain it? (A flowchart would be fine.)

It seems to down-sample the audio.
I'm not sure which bit depths and sample rates are allowed, but the output is e.g. 16 kHz, 16-bit integer.

If the input is MP3, prefer decoding with
-c:a mp3float, otherwise it gets converted to 16-bit integer (a quality difference; something to do with the frequency-domain encoding).
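
A minimal sketch of how that option could be passed, written in Python so the argument order is explicit. Putting `-c:a mp3float` before `-i` makes it an input option, i.e. it picks the decoder. The function name, the file names, and the choice of a 32-bit float WAV as output are assumptions for illustration, not something autosub does.

```python
def mp3_float_decode_cmd(src, dst):
    """Build an ffmpeg command that decodes MP3 with the float decoder."""
    return [
        "ffmpeg", "-hide_banner", "-y",
        "-c:a", "mp3float",   # input option: floating-point MP3 decoder
        "-i", src,
        "-c:a", "pcm_f32le",  # output option: keep 32-bit float samples (assumed target)
        dst,
    ]


# Print the command instead of running it, so it can be inspected first.
print(" ".join(mp3_float_decode_cmd("talk.mp3", "talk_f32.wav")))
```

The list can be handed to `subprocess.run(cmd, check=True)` once ffmpeg is on the PATH.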

Maybe maximise the use of the limited 16-bit dynamic range and optimise the headroom,

via the following (if the audio is not already in the "native upload format"):

oversample (maybe excessive?) to 192 kHz or 384 kHz using maximum-quality settings
apply some dynamic normalization
downsample + dither

-filter:a aresample=384000:resampler=soxr:precision=33:cutoff=0.98:osf=dbl,dynaudnorm=g=63:b=1:c=1,aresample=44100:resampler=soxr:precision=33:cutoff=0.91:osf=flt
FFmpeg Resampler Documentation - soxr is better than ffmpeg's default
Dynamic Audio Normalizer
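
The three stages in that filter string can be assembled programmatically, which makes it easier to try other rates. This is only a sketch: the helper names and default arguments are hypothetical, and the option values are simply the ones from the filter string above.

```python
def soxr_resample(rate, cutoff, out_fmt):
    """One aresample stage using the soxr resampler at maximum precision."""
    return (f"aresample={rate}:resampler=soxr:precision=33"
            f":cutoff={cutoff}:osf={out_fmt}")


def build_filter_chain(up_rate=384000, out_rate=44100):
    """Oversample, normalise dynamics, then downsample, joined as one -filter:a value."""
    stages = [
        soxr_resample(up_rate, 0.98, "dbl"),   # oversample into 64-bit float
        "dynaudnorm=g=63:b=1:c=1",             # dynamic audio normalizer
        soxr_resample(out_rate, 0.91, "flt"),  # back down, 32-bit float output
    ]
    return ",".join(stages)


print(build_filter_chain())
```

Note that the chain as written does not set any dither option; whether one is needed depends on the final integer conversion.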

The reason for all this: the audio quality might slightly change the word output (accuracy issues?).

I haven't studied how sensitive it is.

If the pipeline is audio → FLAC → WAV → upload,
and the down-sampling to 16(?)-bit, 16(?) kHz happens at the FLAC-to-WAV stage,
then when the source audio is MP3 (via float decode), AAC, or Opus in Ogg,
it is decoded as 32-bit float,
so save the FLAC as 24-bit
to preserve dynamic range.
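
To put a rough number on the 16-bit vs 24-bit difference, the usual rule of thumb for an ideal quantiser with a full-scale sine input is SNR ≈ 6.02·N + 1.76 dB for N bits. A quick check (the function name is just for illustration):

```python
def dynamic_range_db(bits):
    """Theoretical SNR in dB of an ideal N-bit quantiser, full-scale sine input."""
    return 6.02 * bits + 1.76


for bits in (16, 24):
    print(f"{bits}-bit: about {dynamic_range_db(bits):.0f} dB")
# 16-bit: about 98 dB
# 24-bit: about 146 dB
```

So a 24-bit intermediate file leaves roughly 48 dB more room than 16-bit before quantisation noise becomes the limit.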


BingLingGroup · 2019-07-08T03:54:47Z

A flowchart may be too complicated, but let me explain anyway.
That said, asking for higher audio quality is not really necessary, because the API itself may not need higher-quality audio clips. If you are not familiar with the speech-to-text API used by this software, see #111.
Of course, what you describe may well influence the audio quality; I hadn't realized it before. You are welcome to refactor the code into a better audio processing workflow and then open a pull request.

BingLingGroup · 2019-07-13T04:49:35Z

I've fixed this problem (partially) in my repo. The conversion is now split in two: .wav (48kHz/16bit/mono) for finding regions and .flac (44.1kHz/24bit/mono) for the speech API. Details are in CHANGELOG.md. @daT4v1s
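
For readers who want to reproduce those two formats by hand, here is a rough sketch of ffmpeg commands that target them. These are not the commands autosub actually runs (see CHANGELOG.md for those); the function names are made up, the FLAC rate is assumed to be 44.1 kHz, and asking the FLAC encoder for `s32` samples is assumed to give 24-bit output, which is worth verifying with ffprobe on your build.

```python
def wav_for_regions_cmd(src, dst):
    """48 kHz / 16-bit / mono WAV, intended for region detection."""
    return ["ffmpeg", "-hide_banner", "-y", "-i", src, "-vn",
            "-ac", "1", "-ar", "48000", "-c:a", "pcm_s16le", dst]


def flac_for_api_cmd(src, dst):
    """44.1 kHz / mono FLAC with s32 samples (assumed to be stored as 24-bit)."""
    return ["ffmpeg", "-hide_banner", "-y", "-i", src, "-vn",
            "-ac", "1", "-ar", "44100", "-c:a", "flac",
            "-sample_fmt", "s32", dst]


print(" ".join(wav_for_regions_cmd("input.mp4", "regions.wav")))
print(" ".join(flac_for_api_cmd("input.mp4", "speech.flac")))
```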

BingLingGroup · 2019-07-20T14:25:46Z

I just committed a feature for pre-processing audio with this workflow, controlled by autosub itself. See issue #40.

The default pre-processing commands need ffmpeg-normalize. Of course, you can write your own with the -apc input option. But remember to set the pre-processing output format to 44.1kHz/24bit/mono FLAC. I haven't yet written the logic to check the output format, so the output is used directly by the speech-to-text method. When that method cuts the clips it uses the copy argument, so it is very risky if your format isn't right.
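
Until such a check exists in autosub, the output of a custom pre-processing command can be verified by hand. The sketch below is an assumption about how one might do it, not autosub code: it builds an ffprobe command that prints the first audio stream as JSON, and a helper that inspects that JSON for mono FLAC at 44100 Hz (reading the "44.kHz" above as 44.1 kHz) with 24 bits per sample.

```python
import json


def probe_cmd(path):
    """ffprobe command that prints the first audio stream as JSON."""
    return ["ffprobe", "-v", "error", "-select_streams", "a:0",
            "-show_streams", "-of", "json", path]


def is_api_ready(probe_json):
    """True if the probed stream is 44.1 kHz / 24-bit / mono FLAC."""
    streams = json.loads(probe_json).get("streams", [])
    if not streams:
        return False
    s = streams[0]
    return (s.get("codec_name") == "flac"
            and int(s.get("sample_rate", 0)) == 44100  # ffprobe reports this as a string
            and s.get("channels") == 1
            and str(s.get("bits_per_raw_sample")) == "24")


sample = ('{"streams": [{"codec_name": "flac", "sample_rate": "44100", '
          '"channels": 1, "bits_per_raw_sample": "24"}]}')
print(is_api_ready(sample))
```

In practice the JSON would come from `subprocess.run(probe_cmd(path), capture_output=True, text=True).stdout`.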

My repo
You can install it from pip, or wait for my release. I've written quite a few features by now, and I think I will release in a few more days.

BingLingGroup · 2019-07-30T12:12:41Z

I've already released the standalone version. Click here to download it.

BingLingGroup · 2019-08-06T04:35:24Z

Also, if you are not satisfied with the current conversion command, you can replace it manually with -acc/--audio-conversion-cmd.

Apart from that, you can also do the conversion outside autosub and manually pass -ap n to override the conversion.

More info is in my repo's readme.

BingLingGroup mentioned this issue Jul 12, 2019

Fix audio processing and add audio preprocessing BingLingGroup/autosub#7

Closed
