wrong encoding crashes the app when trying to print the __header__ #106

Closed
FernandoCamporredondo opened this issue Jan 21, 2024 · 3 comments


FernandoCamporredondo commented Jan 21, 2024

I found out that if you run subsai from a Node.js app while your Windows machine's encoding is set to Japanese (cp932), printing the header fails with the error message "'cp932' codec can't encode character '\u2588'". This happens when the library tries to print the header at line 118 of cli.py.

I don't know Python, but from what I have found on Stack Overflow, adding `.encode('utf-8')` to the header seems to fix it on my machine.

__header__ = f"""
███████╗██╗   ██╗██████╗ ███████╗     █████╗ ██╗
██╔════╝██║   ██║██╔══██╗██╔════╝    ██╔══██╗██║
███████╗██║   ██║██████╔╝███████╗    ███████║██║
╚════██║██║   ██║██╔══██╗╚════██║    ██╔══██║██║
███████║╚██████╔╝██████╔╝███████║    ██║  ██║██║
╚══════╝ ╚═════╝ ╚═════╝ ╚══════╝    ╚═╝  ╚═╝╚═╝
                                            
Subs AI: Subtitles generation tool powered by OpenAI's Whisper and its variants.
Version: {__version__}               
===================================
""".encode('utf-8')
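One caveat with calling `.encode('utf-8')` on the string itself is that `print()` then receives a bytes object and shows its `b'...'` representation rather than the banner. An alternative, sketched below, is to keep the header as a string and replace only the characters that the output stream's encoding cannot represent. The helper name `safe_print` and the shortened banner are hypothetical and not taken from subsai's code.

```python
import io
import sys

# Shortened stand-in for the real banner (assumption: any text containing U+2588).
HEADER = "\u2588\u2588 Subs AI"


def safe_print(text, stream=None):
    """Write text to stream, replacing characters its encoding cannot represent."""
    stream = stream if stream is not None else sys.stdout
    # Fall back to UTF-8 when the stream does not advertise an encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Round-trip through the target encoding; unencodable characters become "?".
    printable = text.encode(encoding, errors="replace").decode(encoding)
    stream.write(printable + "\n")


if __name__ == "__main__":
    # Simulate a console that uses cp932, as in the report above.
    fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp932")
    safe_print(HEADER, fake_console)  # does not raise UnicodeEncodeError
    fake_console.flush()
```

Depending on the setup, it may also be enough to set the `PYTHONIOENCODING=utf-8` environment variable in the parent process, or to call `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) at startup, so that no characters are lost.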
abdeladim-s added a commit that referenced this issue Jan 21, 2024
@abdeladim-s
Owner

Yes, I forgot to take encoding into consideration. I have applied the changes accordingly.
Thanks a lot @FernandoCamporredondo for pointing that out.


FernandoCamporredondo commented Feb 10, 2024

@abdeladim-s

I recently found out that if a file name contains certain characters, such as "–" (en dash, U+2013), it also crashes.

I made some changes that worked for me. Here is the pull request with them: #112

Without those changes, the error message that I was getting was:

Traceback (most recent call last):
  File "C:\Users\Usuario\.pyenv\pyenv-win\versions\3.10.0\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Usuario\.pyenv\pyenv-win\versions\3.10.0\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Usuario\.pyenv\pyenv-win\versions\3.10.0\Scripts\subsai.exe\__main__.py", line 7, in <module>
  File "C:\Users\Usuario\.pyenv\pyenv-win\versions\3.10.0\lib\site-packages\subsai\cli.py", line 149, in main
    run(media_file_arg=args.media_file,
  File "C:\Users\Usuario\.pyenv\pyenv-win\versions\3.10.0\lib\site-packages\subsai\cli.py", line 84, in run
    print(f"[+] Processing file: {file}")
UnicodeEncodeError: 'cp932' codec can't encode character '\u2013' in position 172: illegal multibyte sequence
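For anyone hitting the same error, here is a minimal sketch of one way to make a log line like the one in the traceback safe to print. It is not the content of the pull request mentioned above. The helper name `describe_file` is hypothetical. The idea is to escape unencodable characters with `backslashreplace`, so the file name stays identifiable instead of raising.

```python
def describe_file(path, encoding):
    """Return a log line for path that can always be encoded in `encoding`."""
    message = f"[+] Processing file: {path}"
    # Characters the target encoding lacks are written as \uXXXX escapes.
    return message.encode(encoding, errors="backslashreplace").decode(encoding)


if __name__ == "__main__":
    # An en dash (U+2013) in the name, with cp932 as the console encoding.
    print(describe_file("episode \u2013 01.mp4", "cp932"))
```

In a real script the encoding would typically come from `sys.stdout.encoding` rather than being passed in by hand.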

@abdeladim-s
Owner

Yeah, another encoding issue.
The PR seems perfect and has been merged.
Thanks @FernandoCamporredondo for the contribution.
