Auditok-related options have no effect #70

Closed
libukai opened this issue Dec 21, 2019 · 1 comment

libukai commented Dec 21, 2019

Command used: autosub -i 1.mp4 -S en-US -mxrs 3.0 -hsa

The resulting timeline segments are still all 6 seconds long. Setting the minimum region size on other files has no effect either.

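Translation destination language not provided. Only performing speech recognition.
Speech language is the same as the destination language. Only performing speech recognition.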

Converting the source audio to "/var/folders/wx/11bfg6v90fj6ddtpmgz8_q1m0000gn/T/tmpyxg_sm_f.wav" to get the total length of the source audio for speech region detection.
ffmpeg -hide_banner -y -i "1.mp4" -ac 1 -ar 48000 "/var/folders/wx/11bfg6v90fj6ddtpmgz8_q1m0000gn/T/tmpyxg_sm_f.wav"
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isommp41mp42
    creation_time   : 2019-12-21T14:52:57.000000Z
  Duration: 00:01:00.05, start: 0.000000, bitrate: 302 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, smpte170m/bt470bg/smpte170m), 640x640, 230 kb/s, 29.94 fps, 30 tbr, 15360 tbn, 60 tbc (default)
    Metadata:
      creation_time   : 2019-12-21T14:52:57.000000Z
      handler_name    : Core Media Video
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 64 kb/s (default)
    Metadata:
      creation_time   : 2019-12-21T14:52:57.000000Z
      handler_name    : Core Media Audio
Stream mapping:
  Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to '/var/folders/wx/11bfg6v90fj6ddtpmgz8_q1m0000gn/T/tmpyxg_sm_f.wav':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isommp41mp42
    ISFT            : Lavf57.71.100
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s (default)
    Metadata:
      creation_time   : 2019-12-21T14:52:57.000000Z
      handler_name    : Core Media Audio
      encoder         : Lavc57.89.100 pcm_s16le
size=    5630kB time=00:01:00.05 bitrate= 768.0kbits/s speed= 563x
video:0kB audio:5630kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001353%
ffprobe /var/folders/wx/11bfg6v90fj6ddtpmgz8_q1m0000gn/T/tmpyxg_sm_f.wav -show_format -pretty -loglevel quiet

"/var/folders/wx/11bfg6v90fj6ddtpmgz8_q1m0000gn/T/tmpyxg_sm_f.wav" has been deleted.

Converting to "/var/folders/wx/11bfg6v90fj6ddtpmgz8_q1m0000gn/T/tmpfcqs9x6n.flac" for the API.
ffmpeg -hide_banner -y -i "1.mp4" -ac 1 -ar 44100 "/var/folders/wx/11bfg6v90fj6ddtpmgz8_q1m0000gn/T/tmpfcqs9x6n.flac"
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isommp41mp42
    creation_time   : 2019-12-21T14:52:57.000000Z
  Duration: 00:01:00.05, start: 0.000000, bitrate: 302 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, smpte170m/bt470bg/smpte170m), 640x640, 230 kb/s, 29.94 fps, 30 tbr, 15360 tbn, 60 tbc (default)
    Metadata:
      creation_time   : 2019-12-21T14:52:57.000000Z
      handler_name    : Core Media Video
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 64 kb/s (default)
    Metadata:
      creation_time   : 2019-12-21T14:52:57.000000Z
      handler_name    : Core Media Audio
Stream mapping:
  Stream #0:1 -> #0:0 (aac (native) -> flac (native))
Press [q] to stop, [?] for help
[flac @ 0x7f9754003e00] encoding as 24 bits-per-sample
Output #0, flac, to '/var/folders/wx/11bfg6v90fj6ddtpmgz8_q1m0000gn/T/tmpfcqs9x6n.flac':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isommp41mp42
    encoder         : Lavf57.71.100
    Stream #0:0(und): Audio: flac, 44100 Hz, mono, s32 (24 bit), 128 kb/s (default)
    Metadata:
      creation_time   : 2019-12-21T14:52:57.000000Z
      handler_name    : Core Media Audio
      encoder         : Lavc57.89.100 flac
size=    4473kB time=00:01:00.05 bitrate= 610.2kbits/s speed= 341x
video:0kB audio:4465kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.182859%
ffprobe /var/folders/wx/11bfg6v90fj6ddtpmgz8_q1m0000gn/T/tmpfcqs9x6n.flac -show_format -pretty -loglevel quiet

Converting the audio into multiple short speech segments according to the speech regions.
Converting: 100% |####################################################| Time:  0:00:00

"/var/folders/wx/11bfg6v90fj6ddtpmgz8_q1m0000gn/T/tmpfcqs9x6n.flac" has been deleted.

Sending the short speech segments to the API and getting the recognition results.
Speech-to-Text: 100% |#################################################| Time:  0:00:03
Speech subtitle file created at "1.en-us.srt".

All done.

The resulting subtitles are as follows:

1
00:00:00,140 --> 00:00:06,130
In a vinyasa yoga class we step forward quite often and if you're not sure how to do it you don't have the strength unit

2
00:00:06,140 --> 00:00:12,130
Flexibility can look like this your in comfortably there's nothing wrong

3
00:00:12,140 --> 00:00:18,130
Wrong with this but if you're looking to work on it in your practice this is how you can begin to work on it so

4
00:00:18,140 --> 00:00:24,130
Firstly bringing your knee forward your shoulders of your resting your knee for do you really want to work on the hamstrings

5
00:00:24,140 --> 00:00:30,130
Working right support the heel that bent like in towards your group then you're going to work on your packs and your

6
00:00:30,140 --> 00:00:36,130
Can your front body work even your hip flexors by pulling your thigh up towards your chest then a company

7
00:00:36,140 --> 00:00:42,130
Opening all of that with the strength of protraction so your protractor in the shoulder blade getting the train

8
00:00:42,140 --> 00:00:48,130
Is highway from the ground as possible finally coming as tired as you can on the tips of the standing light

9
00:00:48,140 --> 00:00:54,130
The hit list and from there once you're not tiny package flex your foot and then stop

10
00:00:54,140 --> 00:00:59,920
The foot forward full body strength
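
The symptom can be checked by computing each cue's duration from the timestamps above. The following is a minimal sketch and not part of autosub: the helper name `region_durations` is hypothetical, and the 3.0-second limit is simply the value passed to `-mxrs` in the command.

```python
import re
from datetime import timedelta

# Timestamp lines copied from the subtitle output above.
SRT_TIMINGS = """\
00:00:00,140 --> 00:00:06,130
00:00:06,140 --> 00:00:12,130
00:00:12,140 --> 00:00:18,130
00:00:18,140 --> 00:00:24,130
00:00:24,140 --> 00:00:30,130
00:00:30,140 --> 00:00:36,130
00:00:36,140 --> 00:00:42,130
00:00:42,140 --> 00:00:48,130
00:00:48,140 --> 00:00:54,130
00:00:54,140 --> 00:00:59,920
"""

# Matches one SRT timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING_RE = re.compile(
    r"(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)"
)


def region_durations(srt_text):
    """Return the duration in seconds of every cue found in srt_text."""
    durations = []
    for match in TIMING_RE.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
        end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
        durations.append((end - start).total_seconds())
    return durations


REQUESTED_MAX = 3.0  # value given to -mxrs in the command above

durations = region_durations(SRT_TIMINGS)
over_limit = [d for d in durations if d > REQUESTED_MAX]

for index, seconds in enumerate(durations, start=1):
    print(f"cue {index}: {seconds:.2f} s")
print(f"{len(over_limit)} of {len(durations)} cues exceed {REQUESTED_MAX} s")
```

Every cue comes out longer than the requested 3.0 seconds, which matches the report that `-mxrs 3.0` was ignored.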

Environment (please provide the complete information below):

  • OS: Mac 10.15.2
  • Python version: Python 3.6.1
  • Autosub version: 0.5.1



BingLingGroup commented Dec 22, 2019

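https://github.com/BingLingGroup/autosub/blob/dev/autosub/cmdline_utils.py#L834-L835: I forgot to change these lines to use args.max_region_size and args.min_region_size.
While I was at it, I also updated the other auditok parameters. Commit bb93549 should work now. I probably forgot to change this when I originally added the feature.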
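
For readers unfamiliar with this kind of bug, here is a rough sketch of the pattern described above. It is not the actual autosub code: the function names, the `-mnrs` flag, and the default values are assumptions made for illustration (6.0 was picked only because it matches the 6-second segments in the report). Only `-mxrs`, `args.max_region_size`, and `args.min_region_size` come from this thread.

```python
import argparse

# Hypothetical defaults, for illustration only.
DEFAULT_MIN_REGION_SIZE = 0.8
DEFAULT_MAX_REGION_SIZE = 6.0


def build_parser():
    """Build a tiny parser with the two region-size options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-mnrs", "--min-region-size", type=float,
                        default=DEFAULT_MIN_REGION_SIZE)
    parser.add_argument("-mxrs", "--max-region-size", type=float,
                        default=DEFAULT_MAX_REGION_SIZE)
    return parser


def region_kwargs_buggy(args):
    """Bug pattern: the parsed values are never read, so defaults always win."""
    return {
        "min_region_size": DEFAULT_MIN_REGION_SIZE,
        "max_region_size": DEFAULT_MAX_REGION_SIZE,
    }


def region_kwargs_fixed(args):
    """Fix pattern: forward what the user actually passed on the command line."""
    return {
        "min_region_size": args.min_region_size,
        "max_region_size": args.max_region_size,
    }


args = build_parser().parse_args(["-mxrs", "3.0"])
print("buggy:", region_kwargs_buggy(args))
print("fixed:", region_kwargs_fixed(args))
```

In the buggy variant the option parses without error, so nothing on the command line hints that the value is being dropped. The only visible sign is the segment length in the output.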
