@bghira bghira commented Feb 1, 2026

This pull request adds a safeguard to the text embedding processing logic for audio datasets that source their data from video. Text embedding processing is now skipped for these datasets, because they inherit captions from their parent video dataset during training.

Text embedding processing logic:

  • In simpletuner/helpers/data_backend/factory.py, the _process_text_embeddings function now checks whether the audio dataset is configured with source_from_video=True. If it is, the function logs an informational message and skips text embedding processing. This avoids encoding the same captions twice, since the audio dataset inherits them from the parent video dataset.
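
The check described above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the helper name should_skip_text_embeddings, the flat dict layout, and the dataset_type and id keys are assumptions made for the example. Only the source_from_video flag and the informational log message come from the description.

```python
import logging

logger = logging.getLogger("data_backend_sketch")


def should_skip_text_embeddings(backend: dict) -> bool:
    """Return True when an audio dataset inherits its captions from a video dataset.

    Hypothetical helper: the real _process_text_embeddings signature and
    config layout are not shown in this PR description.
    """
    # Assumed key: the dataset declares its type as "audio".
    is_audio = backend.get("dataset_type") == "audio"
    # Flag named in the PR description; its location in the config is assumed.
    from_video = backend.get("source_from_video") is True

    if is_audio and from_video:
        logger.info(
            "(id=%s) Skipping text embedding processing; captions are "
            "inherited from the parent video dataset.",
            backend.get("id", "unknown"),
        )
        return True
    return False
```

A caller would return early when the helper reports True and otherwise continue with the normal text embedding path, so datasets that do not set the flag behave as before.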

@bghira bghira merged commit 3722deb into main Feb 1, 2026
2 checks passed
@bghira bghira deleted the bugfix/double-encode-captions branch February 1, 2026 03:40
