Issue generating subtitles #19

Closed
pordeciralgo opened this issue Jun 4, 2024 · 5 comments

Comments

@pordeciralgo

Steps to reproduce

  1. Tried to generate subtitles for this classic movie: https://1drv.ms/v/s!AvxL3H5dkUh1h4k-5Z8uG14x1YrLfQ?e=nAnSTF
  2. Settings:
    - Language: English (also tested with Spanish, same result)
    - Transcribe from File
    - Transcribe using WhisperX
    - Translate to English: NOT CHECKED
    - Generate subtitles: CHECKED
    - Highlight words: NOT CHECKED
    - Max. line count: 2
    - Max. line width: 42
    - Model size: large-v2
    - Compute type: float32
    - Batch size: 8
    - Use CPU: NOT CHECKED
    - Running Audiotext 2.2.1 on Windows 10 x64
  3. Generate transcription works OK.

Expected behaviour

Save transcription generates .txt, .srt and .vtt files.

It does behave as expected with other audio files.

Actual behaviour

Save transcription gets stuck after creating an empty 0-byte .txt file. Neither the .srt nor the .vtt file is generated.

System information

  • Windows 10 x64
  • Spanish system language (es-ES)
  • Audiotext 2.2.1
@pordeciralgo
Author

Please let me know when you've downloaded the file, so I can unshare it. Thank you in advance.

@HenestrosaDev
Owner

I've already downloaded the file. I'll take a look to see what's going on.

@HenestrosaDev
Owner

The problem seems to be caused by the ♪ symbols in the transcription: writing them to the .txt file raises a UnicodeEncodeError in the save_transcription method of main_controller.

This is the error from the stack trace, for reference:

UnicodeEncodeError: 'charmap' codec can't encode character '\u266a' in position 0: character maps to <undefined>

I've already fixed this by writing the file with UTF-8 encoding. I'll close the issue as soon as I release version 2.2.2, which should be today or tomorrow.
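
For anyone hitting the same error, here is a minimal sketch of the failure and the fix. It is not the actual Audiotext code: `save_text` and the sample text are hypothetical, and `cp1252` is only assumed to be the default encoding Python picks on a Spanish Windows system (it is passed explicitly here so the behaviour is reproducible on any platform).

```python
import tempfile
from pathlib import Path

# Hypothetical sample text; the real transcription comes from WhisperX.
TRANSCRIPTION = "\u266a Opening theme \u266a"


def save_text(path: Path, text: str, encoding: str) -> bool:
    """Write text to path; return False if the codec cannot encode it."""
    try:
        with open(path, "w", encoding=encoding) as file:
            file.write(text)
    except UnicodeEncodeError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "transcription.txt"
    # cp1252 stands in for the Windows default encoding (assumption).
    # It has no mapping for U+266A, so this write fails.
    legacy_ok = save_text(target, TRANSCRIPTION, "cp1252")
    # UTF-8 can encode every Unicode character, so this write succeeds.
    utf8_ok = save_text(target, TRANSCRIPTION, "utf-8")
    restored = target.read_text(encoding="utf-8")
```

Opening the file in write mode creates it before anything is encoded, which is consistent with the empty 0-byte .txt described in the report. Passing `encoding="utf-8"` explicitly avoids depending on the system locale.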

HenestrosaDev added a commit that referenced this issue Jun 4, 2024
@pordeciralgo
Author

That was fast! Thank you very much for your help :)

@HenestrosaDev
Owner

I've just released the new version, so I'm closing this issue. Please create a new one if you encounter any problems.
