
fix: Fix Edge TTS audio format not being recognised #237

Merged
WSXYT merged 3 commits into master from seer/fix/edge-tts-audio-format
May 15, 2026

Conversation

@seer-by-sentry (Contributor)

Problem description

When playing speech through Edge TTS, users hit LibsndfileError: Error opening '...wav': Format not recognised. As a result, the audio cannot be played.

Root cause

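When saving audio generated by Edge TTS, edge_tts.Communicate.save() writes MP3 data even if the file name ends in .wav; the extension does not change the format that is written. libsndfile (the library soundfile is built on) cannot recognise this MP3 file disguised as WAV and raises the format-not-recognised error.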
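Because the real format lives in the file's first bytes rather than in its name, the mismatch is easy to see by inspecting the header. The following is a minimal sketch, assuming that only a few well-known signatures need to be recognised: "RIFF"...."WAVE" for WAV, and an ID3v2 tag or an MPEG frame sync for MP3. sniff_audio_container and sniff_file are hypothetical helpers and are not part of this PR.

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess a container from a file's first bytes (hypothetical helper).

    Only a few well-known signatures are checked; the file name is ignored.
    """
    # RIFF/WAVE: b"RIFF" + 4-byte little-endian size + b"WAVE"
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # MP3 that starts with an ID3v2 tag
    if header[:3] == b"ID3":
        return "mp3"
    # MPEG audio frame sync (11 set bits); ADTS AAC shares it, so this is only a hint
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just the first 12 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_audio_container(f.read(12))
```

A cached ..._xxx.wav file that sniffs as "mp3" is exactly the kind of stale entry that the cache check described below is meant to remove.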

Solution

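  1. Force Edge TTS to output standard WAV: when calling edge_tts.Communicate, explicitly pass audio_format="riff-24khz-16bit-mono-pcm". This makes the Edge TTS service return RIFF/WAV audio (24 kHz, 16-bit, mono PCM) that libsndfile can parse correctly.
  2. Stricter cache-file validation: add a helper method _is_valid_audio used by VoiceCacheManager.get_voice. Before a cached file is loaded from disk, its format is checked with soundfile.info(). If the format is invalid (for example, an old MP3-in-WAV file), the file is deleted and regeneration is triggered, so broken cache entries are cleaned up and repaired automatically.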
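The validate, delete and regenerate flow in item 2 can be sketched without any third-party dependency. This sketch swaps in the stdlib wave module for soundfile.info() as the validity check; wave only parses PCM RIFF/WAVE, which is the kind of output item 1 requests. is_valid_wav, get_voice_cached and the `generate` callback are hypothetical names, not the PR's VoiceCacheManager.get_voice.

```python
import os
import wave
from typing import Callable


def is_valid_wav(path: str) -> bool:
    """Return True if `path` parses as a RIFF/WAVE file (stdlib stand-in for sf.info)."""
    try:
        with wave.open(path, "rb"):
            return True
    except (wave.Error, EOFError):
        # wave.Error: e.g. "file does not start with RIFF id" (MP3 data)
        # EOFError: truncated header
        return False


def get_voice_cached(path: str, generate: Callable[[str], None]) -> str:
    """Hypothetical cache lookup: reuse `path` if valid, else delete it and regenerate."""
    if os.path.exists(path):
        if is_valid_wav(path):
            return path  # cache hit
        # e.g. an old MP3-in-.wav file: drop it so it can be rebuilt
        try:
            os.remove(path)
        except OSError:
            pass
    generate(path)  # caller is expected to write a fresh WAV to `path`
    return path
```

For the requested format, a generator would write 24,000 frames/s x 2 bytes x 1 channel = 48,000 bytes of PCM per second of speech.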

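This change makes the audio files generated by Edge TTS compatible with the playback system and resolves the playback errors caused by the format mismatch.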

Fixes SECRANDOM-5E

WSXYT marked this pull request as ready for review May 15, 2026 13:23
Copilot AI review requested due to automatic review settings May 15, 2026 13:23
Comment thread app/common/voice/voice.py Outdated
 while retry_count < max_retries:
     try:
-        communicate = edge_tts.Communicate(text, voice)
+        communicate = edge_tts.Communicate(text, voice, audio_format="riff-24khz-16bit-mono-pcm")

This comment was marked as outdated.

Comment thread app/common/voice/voice.py Outdated
         filename = f"{voice}_{safe_text}.wav"
         return os.path.join(self.audio_dir, filename)
 
+    def _is_valid_audio(self, file_path: str) -> bool:

This comment was marked as outdated.

Copilot AI (Contributor) left a comment


Pull request overview

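This PR aims to fix playback failures where audio files generated by Edge TTS have a WAV extension but actually contain MP3 data, which makes soundfile/libsndfile report Format not recognised. It also adds cache validation that automatically cleans up old, broken cache files.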

Changes:

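  • Explicitly request RIFF/WAV PCM output (riff-24khz-16bit-mono-pcm) when generating Edge TTS speech.
  • Add an audio-validity check (soundfile.info()) before a disk cache hit is used; invalid files are deleted and regenerated.
  • Touches the line-ending character of the last line of the file (risk of mixed line endings).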
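The third bullet concerns line endings rather than audio. A quick way to check whether a file now mixes styles is to count which endings appear in its raw bytes. The following is a minimal sketch; line_ending_styles and has_mixed_line_endings are hypothetical helpers and are not part of the PR.

```python
import re

# Try CRLF first so "\r\n" is not counted as a lone CR followed by a lone LF
_EOL = re.compile(rb"\r\n|\r|\n")
_NAMES = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def line_ending_styles(data: bytes) -> set:
    """Return the set of line-ending styles ("CRLF", "CR", "LF") found in `data`."""
    return {_NAMES[m] for m in _EOL.findall(data)}


def has_mixed_line_endings(path: str) -> bool:
    """True if the file at `path` uses more than one line-ending style."""
    with open(path, "rb") as f:
        return len(line_ending_styles(f.read())) > 1
```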


Comment thread app/common/voice/voice.py Outdated
         return os.path.join(self.audio_dir, filename)
 
+    def _is_valid_audio(self, file_path: str) -> bool:
+        """Check whether the audio file is in a valid format."""
Comment thread app/common/voice/voice.py Outdated
Comment on lines +476 to +481
+        except Exception:
+            logger.warning(f"缓存文件格式无效,将重新生成: {file_path}")  # "Invalid cache file format, regenerating: ..."
+            try:
+                os.remove(file_path)
+            except OSError:
+                pass
Comment thread app/common/voice/voice.py Outdated
             self.voice_engine.stop()
         except Exception as e:
-            logger.warning(f"停止系统TTS引擎失败: {e}")  # "Failed to stop system TTS engine: ..."
+            logger.warning(f"停止系统TTS引擎失败: {e}")  # "Failed to stop system TTS engine: ..."
WSXYT (Collaborator) commented May 15, 2026

@copilot Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: app/common/voice/voice.py#L406

Potential issue: The _generate_voice function calls edge_tts.Communicate() with an
audio_format keyword argument. This parameter is not supported by the installed
version of the edge-tts library, causing a TypeError on every invocation. This
exception is caught, retried, and eventually raises a RuntimeError, which is then
silently handled in the calling function _prepare_and_play. This sequence of events
effectively and silently breaks all Edge TTS voice generation.
and
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: app/common/voice/voice.py#L471

Potential issue: In _is_valid_audio, sf.info() is called without first checking if
the soundfile library (aliased as sf) was successfully imported. A module-level
try/except block sets sf to None if the import fails. The unguarded call to
sf.info() then raises an AttributeError. This is caught by a broad except Exception block, which incorrectly treats the audio file as invalid, deletes it, and
returns False. This leads to unnecessary deletion and regeneration of all cached audio
files when soundfile is not available.
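The first report describes a deterministic TypeError that is retried, wrapped in a RuntimeError, and then swallowed. As a general illustration of keeping that class of failure visible, the sketch below retries only runtime failures and re-raises TypeError immediately. call_with_retries is a hypothetical wrapper: it is not the project's code and not what the PR does, since the PR checks the constructor's signature up front instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_retries: int = 3, delay: float = 0.0) -> T:
    """Retry `fn` on runtime failures, but let TypeError surface immediately.

    A TypeError such as an unsupported keyword argument fails identically on
    every attempt, so retrying it only hides the cause behind a generic error.
    """
    last_exc: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            return fn()
        except TypeError:
            raise  # deterministic: do not retry, do not wrap
        except Exception as exc:  # e.g. network errors talking to the TTS service
            last_exc = exc
            if attempt + 1 < max_retries:
                time.sleep(delay)
    raise RuntimeError(f"failed after {max_retries} attempts") from last_exc
```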

Copilot AI (Contributor) commented May 15, 2026

@copilot Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
...

Confirmed: both issues are real and have been fixed (commit e1cb6d1; the core fix starts in 9dd72f5).

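  1. edge_tts.Communicate(..., audio_format=...) is not supported by the currently pinned edge-tts==7.2.7 and does raise a TypeError. The code now first checks whether the parameter is supported: audio_format is passed only when it is; otherwise a compatible call without it is made, so generation no longer fails on every attempt.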

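  2. _is_valid_audio now handles the sf is None case: when soundfile is unavailable, the cache is no longer misjudged as invalid and deleted. Exception handling has also been narrowed to sf.LibsndfileError: the cache is deleted only when the error code explicitly means "format not recognised"; for other I/O-type errors the cache is kept and the error is logged.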

Copilot finished work on behalf of WSXYT May 15, 2026 13:39
Copilot AI requested a review from WSXYT May 15, 2026 13:39
WSXYT merged commit 2b2666c into master May 15, 2026
1 check passed
WSXYT deleted the seer/fix/edge-tts-audio-format branch May 15, 2026 13:51

Labels

None yet

Projects

None yet


3 participants