GPU memory keeps growing during audio-to-text transcription until it runs out of memory #2881

Closed
sunzhaoyang opened this issue Feb 6, 2023 · 2 comments

sunzhaoyang commented Feb 6, 2023

General Question

The audio files are fairly large, so I split them into segments of at most 20 s and run recognition on each segment.

import auditok
from paddlespeech.cli.text.infer import TextExecutor
from paddlespeech.cli.asr.infer import ASRExecutor
import sys
from tempfile import NamedTemporaryFile
import os
from pydub import AudioSegment

def dot(txt):
    text_punc = TextExecutor()
    result = text_punc(txt)
    return result


# split returns a generator of AudioRegion objects

for root, dirs, files in os.walk(".", topdown=False):
    for name in files:
        if name.endswith('mp3'):
            full_path = os.path.join(root, name)
            print(full_path)
            wav_file = full_path.replace('.mp3', '.wav')
            txt_file = full_path.replace('.mp3', '.txt')
            # convert to wav
            sound = AudioSegment.from_mp3(full_path)
            sound.export(wav_file, format="wav")

            audio_regions = auditok.split(
                wav_file,
                min_dur=0.2,     # minimum duration of a valid audio event in seconds
                max_dur=20,       # maximum duration of an event
                max_silence=10,  # maximum duration of tolerated continuous silence within an event
                energy_threshold=55  # threshold of detection
            )

            with open(txt_file, 'w') as t:

                for i, r in enumerate(audio_regions):
                    with NamedTemporaryFile(suffix='.wav') as f:
                        r.save(f.name)
                        asr = ASRExecutor()
                        raw_result = asr(audio_file=f.name, force_yes=True)
                        t.write(dot(raw_result))

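As the audio segments are processed one after another, GPU memory usage climbs steadily, from a few hundred MB up to 8 GB, and the run finally fails with out of memory.

I tried setting FLAGS_use_cuda_managed_memory to both true and false; neither helped.

CUDA version: 11.2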

W0206 08:26:41.720482 13625 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.2, Runtime API Version: 11.2
W0206 08:26:41.722322 13625 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.

paddlepaddle-gpu         2.4.1.post112

yt605155624 (Collaborator) commented Feb 6, 2023

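asr = ASRExecutor() only needs to be initialized once; try moving it outside the for loop. As written, the code creates a new object on every iteration, and repeated inference also gets none of the speed-up it would get from reusing one executor. See #1256 (comment).

TextExecutor() also only needs to be initialized once. You can try creating a single global ASRExecutor() and a single global TextExecutor() at the outermost level of the program.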
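The suggestion above can be sketched as follows. This is a minimal illustration, not the maintainers' code: the helper names `transcribe_regions` and `main` are hypothetical, the executors are passed in as plain callables so the loop never constructs them, and a temporary directory replaces the per-segment `NamedTemporaryFile` from the original script. The `ASRExecutor` and `TextExecutor` calls use only the signatures that appear in the issue.

```python
import os
from tempfile import TemporaryDirectory


def transcribe_regions(regions, asr, punc, out):
    """Transcribe every region with executors the caller created once.

    `asr` and `punc` are callables (for example an ASRExecutor and a
    TextExecutor instance); nothing is constructed inside the loop.
    Returns the number of regions processed.
    """
    count = 0
    with TemporaryDirectory() as tmp:
        for i, region in enumerate(regions):
            chunk_path = os.path.join(tmp, f"chunk_{i}.wav")
            region.save(chunk_path)  # write the segment to disk
            raw_result = asr(audio_file=chunk_path, force_yes=True)
            out.write(punc(raw_result))  # restore punctuation, then write
            os.remove(chunk_path)  # keep disk usage flat across segments
            count += 1
    return count


def main(wav_file, txt_file):
    # Heavy imports are deferred so the helper above can be used without them.
    import auditok
    from paddlespeech.cli.asr.infer import ASRExecutor
    from paddlespeech.cli.text.infer import TextExecutor

    # Created exactly once for the whole program, as the reply recommends.
    asr = ASRExecutor()
    punc = TextExecutor()

    # Same split parameters as in the original script.
    regions = auditok.split(
        wav_file,
        min_dur=0.2,
        max_dur=20,
        max_silence=10,
        energy_threshold=55,
    )
    with open(txt_file, "w") as t:
        transcribe_regions(regions, asr, punc, t)
```

Calling `main("audio.wav", "audio.txt")` (file names are placeholders) would then reuse the same two executor objects for every segment. Whether this alone removes all of the memory growth is not confirmed in the thread.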

@coderLinJ5945

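About initializing asr = ASRExecutor() only once: if this runs behind an API, with ASRExecutor initialized once when the microservice starts, will concurrent calls to the endpoint run into concurrency problems with ASRExecutor?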
