Maximise audio quality - conversion workflow #155
Comments
Using a flowchart may be too complicated. Anyway, let me explain this.
I fixed this problem (partially) in my repo. The conversion is now split in two: a .wav (48 kHz/16-bit/mono) for finding regions and a .flac (44.1 kHz/24-bit/mono) for the speech API. Details are in CHANGELOG.md. @daT4v1s
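To make the split concrete, here is a minimal sketch of how the two conversions could be expressed as ffmpeg argument lists. It is not the code from the repo: the function names are hypothetical, the FLAC rate is assumed to be 44100 Hz, and requesting `-sample_fmt s32` from ffmpeg's FLAC encoder is assumed to produce a 24-bit file on your build.

```python
from typing import List


def region_wav_cmd(src: str, dst: str) -> List[str]:
    """Arguments for the region-detection copy: 48 kHz, 16-bit, mono WAV."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vn",                 # drop any video stream
        "-ac", "1",            # mono
        "-ar", "48000",        # 48 kHz
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]


def api_flac_cmd(src: str, dst: str) -> List[str]:
    """Arguments for the speech-API copy: 44.1 kHz, mono FLAC.

    Assumption: s32 input to the FLAC encoder is written as 24-bit FLAC.
    """
    return [
        "ffmpeg", "-y", "-i", src,
        "-vn",
        "-ac", "1",
        "-ar", "44100",
        "-c:a", "flac",
        "-sample_fmt", "s32",
        dst,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass a list to
    # subprocess.run(cmd, check=True) to actually convert.
    print(" ".join(region_wav_cmd("input.mp4", "regions.wav")))
    print(" ".join(api_flac_cmd("input.mp4", "speech.flac")))
```

Keeping the two targets separate means the region detector and the speech API can each get the format that suits them, at the cost of decoding the source twice.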
I just committed a feature for pre-processing audio using this workflow, but controlled by autosub itself (issue #40). The default pre-process commands need ffmpeg-normalize. Of course, you can write it yourself by using the My repo
I've already released the standalone version. Click here and download.
Also, if you are not satisfied with the current conversion command, you can manually replace it by using Apart from that, you can also do the conversion outside autosub. You can manually input More info in my repo's readme.
I'm not that deep into the code, but it seems to do some WAV/FLAC conversion work by breaking the audio into pieces and then uploading them.
I'm not sure in what order this happens or what the details are. Can someone explain it? (A flowchart would be fine.)
It seems to down-sample the audio. I'm not sure what bit depths and sample rates are allowed, but the output is, for example, 16 kHz, 16-bit integer.
If the input is MP3, prefer decoding with
-c:a mp3float
Otherwise it is converted to 16-bit integer, which makes a quality difference (something to do with the frequency encoding). Maybe also maximise the use of the limited 16-bit dynamic range and optimise headroom, if the audio is not already in the "native upload format", via
-filter:a aresample=384000:resampler=soxr:precision=33:osf=dbl:cutoff=0.98,dynaudnorm=g=63:b=1:c=1,aresample=44100:resampler=soxr:precision=33:cutoff=0.91:osf=flt
FFmpeg Resampler Documentation - soxr is better than ffmpeg's default
Dynamic Audio Normalizer
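As a rough illustration of how the suggestion above could be wired together, the sketch below assembles the filter chain piece by piece and places the optional `-c:a mp3float` decoder choice before `-i`, where ffmpeg treats it as an input (decoder) option. The helper names are hypothetical, this is not what autosub does internally, and the `soxr` resampler is only available when ffmpeg was built with libsoxr.

```python
from typing import List


def aresample(rate: int, **opts) -> str:
    """Format one aresample filter, e.g. aresample=44100:resampler=soxr."""
    return ":".join([f"aresample={rate}"] + [f"{k}={v}" for k, v in opts.items()])


def filter_chain() -> str:
    """Upsample, normalise, then resample to 44.1 kHz (values from the post)."""
    up = aresample(384000, resampler="soxr", precision=33, osf="dbl", cutoff=0.98)
    # g: Gaussian window size, b: alternative boundary mode, c: DC correction
    norm = "dynaudnorm=g=63:b=1:c=1"
    down = aresample(44100, resampler="soxr", precision=33, cutoff=0.91, osf="flt")
    return ",".join([up, norm, down])


def preprocess_cmd(src: str, dst: str, mp3_float_decode: bool = False) -> List[str]:
    """Build the ffmpeg argument list; nothing is executed here."""
    cmd = ["ffmpeg", "-y"]
    if mp3_float_decode:
        # Decoder selection must precede the input it applies to.
        cmd += ["-c:a", "mp3float"]
    cmd += ["-i", src, "-filter:a", filter_chain(), dst]
    return cmd


if __name__ == "__main__":
    print(" ".join(preprocess_cmd("talk.mp3", "talk.flac", mp3_float_decode=True)))
```

Building the chain from parts makes it easier to experiment with a single stage, for example dropping the normaliser to check whether it changes the recognised text.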
The reason for raising this: it might slightly change the recognised words or the accuracy.
I haven't studied how sensitive the output is to it.
If the workflow is audio → flac → wav → upload, and the down-sampling to 16(?)-bit, 16(?) kHz occurs at the flac-to-wav stage:
when the audio is MP3 (via float decode), AAC or Opus in Ogg, it is decoded as 32-bit float, so save the intermediate FLAC as 24-bit to preserve dynamic range.
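The arithmetic behind the 24-bit suggestion can be sketched quickly. The theoretical dynamic range of N-bit integer PCM is 20·log10(2^N), roughly 6.02 dB per bit, and a 32-bit float sample carries a 24-bit significand (23 stored bits plus the implicit leading one), so a 24-bit integer container keeps far more of a float decode than a 16-bit one does. The function name below is just for illustration.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of N-bit integer PCM in decibels."""
    return 20 * math.log10(2 ** bits)


# IEEE 754 single precision: 23 stored fraction bits + 1 implicit bit
FLOAT32_SIGNIFICAND_BITS = 24

for bits in (16, 24):
    print(f"{bits}-bit integer: {dynamic_range_db(bits):.1f} dB")
# 16-bit integer: 96.3 dB
# 24-bit integer: 144.5 dB
```

Whether the extra range changes the speech API's output is a separate question that this calculation does not answer.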