Maximise audio quality - conversion workflow #155

Open
daT4v1s opened this issue Jun 28, 2019 · 5 comments


daT4v1s commented Jun 28, 2019

I'm not that familiar with the code, but it seems to do some WAV/FLAC conversion, break the audio into pieces, and then upload them.
I'm not sure about the order or the details. Can someone explain it? (A flowchart would be fine.)

It also seems to downsample the audio. I'm not sure which bit depths and sample rates are allowed, but the output is, for example, 16 kHz, 16-bit integer.

If the input is MP3, prefer decoding with
-c:a mp3float, otherwise it is decoded to 16-bit integer (a quality difference, related to MP3's frequency-domain encoding).
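As a rough illustration of the decoder suggestion above, here is a minimal Python sketch that only builds the FFmpeg argument list. The function name `build_decode_cmd` and the choice of a 32-bit float WAV as the intermediate output are assumptions made for this example, not something autosub does.

```python
def build_decode_cmd(src, dst, decoder=None):
    """Return an ffmpeg argument list that decodes src to a 32-bit float WAV.

    Hypothetical helper for illustration; not part of autosub.
    """
    cmd = ["ffmpeg", "-hide_banner", "-y"]
    if decoder:
        # Options placed before -i apply to the input file, so -c:a here
        # selects the decoder rather than the encoder.
        cmd += ["-c:a", decoder]
    # After -i, -c:a selects the encoder; pcm_f32le keeps float samples.
    cmd += ["-i", src, "-vn", "-c:a", "pcm_f32le", dst]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_decode_cmd("talk.mp3", "talk.wav", decoder="mp3float")))
```

The list can be passed to `subprocess.run` once FFmpeg is on the PATH. The position of `-c:a` relative to `-i` is what decides whether it names a decoder or an encoder.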

It may be worth making the most of the limited 16-bit dynamic range and optimizing headroom, via the following (if the audio is not already in the "native upload format"):

  1. oversample, e.g. to 192 kHz or 384 kHz, using maximum-quality settings (maybe excessive?)
  2. apply some dynamic normalization
  3. downsample + dither

-filter:a aresample=384000:resampler=soxr:precision=33:osf=dbl:cutoff=0.98,dynaudnorm=g=63:b=1:c=1,aresample=44100:resampler=soxr:precision=33:cutoff=0.91:osf=flt
FFmpeg Resampler Documentation - soxr is better than ffmpeg's default
Dynamic Audio Normalizer
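The three steps map onto the three comma-separated filters in the command above. A minimal sketch, assuming one wants to assemble the same chain from Python (the function name and parameters are hypothetical; the option values are copied from the command as posted):

```python
def build_filter_chain(up_rate=384000, out_rate=44100):
    """Assemble the oversample -> normalize -> downsample filter string.

    Hypothetical helper; option values are taken from the posted command.
    """
    # Step 1: oversample with the soxr resampler, double-precision output.
    upsample = (f"aresample={up_rate}:resampler=soxr:precision=33"
                ":osf=dbl:cutoff=0.98")
    # Step 2: dynamic normalization.
    normalize = "dynaudnorm=g=63:b=1:c=1"
    # Step 3: downsample to the target rate, float output.
    downsample = (f"aresample={out_rate}:resampler=soxr:precision=33"
                  ":cutoff=0.91:osf=flt")
    # FFmpeg chains filters within one -filter:a value using commas.
    return ",".join([upsample, normalize, downsample])


if __name__ == "__main__":
    print("-filter:a", build_filter_chain())
```

Note that the command as posted sets no dither option and ends in float samples, so any dithering would have to happen at a later integer conversion.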

The reason for suggesting this: it might slightly change the recognized words (accuracy), though I haven't studied how sensitive the recognition is to it.

If the pipeline is audio → FLAC → WAV → upload, and the downsampling to 16(?)-bit / 16(?) kHz happens at the FLAC-to-WAV stage, then when the source is MP3 (via float decode), AAC, or Opus/Ogg, it is decoded as 32-bit float, so the FLAC should be saved as 24-bit to preserve dynamic range.
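To put a number on the 16-bit versus 24-bit point, here is a small worked calculation. It uses the standard textbook figure for an ideal N-bit quantizer driven by a full-scale sine wave (roughly 6.02 dB per bit plus 1.76 dB); the function name is made up for this example.

```python
import math


def dynamic_range_db(bits):
    """Theoretical signal-to-quantization-noise ratio of an ideal N-bit
    quantizer for a full-scale sine wave, in dB."""
    # 20*log10(2**N) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB.
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

This gives about 98 dB for 16-bit and about 146 dB for 24-bit, which is why a 24-bit intermediate leaves more room before the final reduction to 16-bit.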

@BingLingGroup

A flowchart may be too complicated, but let me explain anyway.
Asking for higher audio quality may not be that necessary, because the API itself may not need audio clips of that quality. If you are not familiar with the speech-to-text API used by this software, see #111.
That said, what you describe could well influence the audio quality. I hadn't considered it before. You are welcome to refactor the code into a better audio processing workflow and then open a pull request.

@BingLingGroup

I fixed this problem (partially) in my repo. The conversion is now split in two: .wav (48kHz/16bit/mono) for region detection and .flac (44.1kHz/24bit/mono) for the speech API. Details are in CHANGELOG.md. @daT4v1s
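For readers who want to reproduce the two target formats by hand, the following is a minimal sketch, assuming the FLAC rate is 44.1 kHz (44100 Hz). These are not the commands autosub actually runs; the function name is hypothetical, and using `-sample_fmt s32` to get 24-bit FLAC is an assumption to check against your FFmpeg build.

```python
def build_conversion_cmds(src, wav_out, flac_out):
    """Return two ffmpeg argument lists: one for the region-detection WAV
    and one for the speech-API FLAC. Illustrative only."""
    # Shared part: read the input, drop video, mix down to mono.
    common = ["ffmpeg", "-hide_banner", "-y", "-i", src, "-vn", "-ac", "1"]
    # 48 kHz, 16-bit signed PCM in a WAV container.
    wav_cmd = common + ["-ar", "48000", "-c:a", "pcm_s16le", wav_out]
    # 44.1 kHz FLAC; s32 input is assumed to be stored as 24-bit.
    flac_cmd = common + ["-ar", "44100", "-c:a", "flac",
                         "-sample_fmt", "s32", flac_out]
    return wav_cmd, flac_cmd


if __name__ == "__main__":
    for cmd in build_conversion_cmds("talk.mp4", "regions.wav", "speech.flac"):
        print(" ".join(cmd))
```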

@BingLingGroup

I just committed a feature that pre-processes the audio using this workflow, controlled by autosub itself. See issue #40.

The default pre-processing commands need ffmpeg-normalize. Of course you can write your own using the -apc input option, but remember to set the pre-processing output format to 44.1kHz/24bit/mono FLAC. Currently there is no logic to check the output format: the pre-processed file is used directly by the speech-to-text method, and when that method cuts the clips it uses the copy argument, so it is very risky if your format is not the proper one.
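Since the output format is not checked, one option is to verify it yourself before handing the file over. The sketch below is an assumption-laden illustration, not autosub code: it builds an `ffprobe` command, checks the JSON it would return against the expected FLAC format, and shows what a stream-copy cut looks like. All function names are hypothetical, and the exact fields ffprobe reports can vary by build.

```python
import json

# Expected properties, compared as strings for simplicity.
EXPECTED = {"codec_name": "flac", "sample_rate": "44100",
            "channels": "1", "bits_per_raw_sample": "24"}


def build_probe_cmd(path):
    """ffprobe argument list that prints the first audio stream as JSON."""
    entries = "stream=" + ",".join(EXPECTED)
    return ["ffprobe", "-v", "error", "-select_streams", "a:0",
            "-show_entries", entries, "-of", "json", path]


def format_problems(probe_json):
    """Return a list of mismatches between ffprobe JSON and EXPECTED."""
    stream = json.loads(probe_json)["streams"][0]
    return [f"{key}: expected {want}, got {stream.get(key)}"
            for key, want in EXPECTED.items()
            if str(stream.get(key)) != want]


def build_cut_cmd(src, start, end, dst):
    """Cut [start, end) seconds without re-encoding (-c copy), which is
    why the source format must already be the one the API expects."""
    return ["ffmpeg", "-y", "-ss", str(start), "-t", str(end - start),
            "-i", src, "-c", "copy", dst]
```

Running `build_probe_cmd` through `subprocess.run(..., capture_output=True, text=True)` and passing its stdout to `format_problems` gives an empty list when the file matches.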

My repo
You can install it from pip, or wait for me to release it. I have written quite a few features by now, and I think I will release in a few more days.

@BingLingGroup

BingLingGroup commented Jul 30, 2019

I've already released the standalone version. Click here and download.

@BingLingGroup

Also, if you are not satisfied with the current conversion command, you can replace it manually with -acc/--audio-conversion-cmd.

Apart from that, you can also do the conversion outside autosub. In that case, pass -ap n to skip autosub's own conversion.

More info in my repo's readme.
