
Fix UnicodeDecodeError in some VBA file handling #2380

Open: LeonKohli wants to merge 2 commits into main

LeonKohli

Summary

This pull request addresses the issue where xlwings encounters a UnicodeDecodeError when handling VBA files with encodings other than UTF-8. The proposed changes add a multi-encoding read strategy to the export_vba_modules, vba_edit, and vba_import functions in cli.py, allowing xlwings to handle files with different encodings such as 'ISO-8859-1' and 'cp1252', alongside the default 'utf-8'.

Changes

  • Modified export_vba_modules, vba_edit, and vba_import functions to try reading files with 'utf-8', 'ISO-8859-1', and 'cp1252' encodings.
  • Added error handling to raise UnicodeDecodeError if all encodings fail, providing clear feedback.
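A minimal sketch of the fallback read described in the list above, not the actual diff from this pull request. The helper name `read_lines_with_fallback` and its signature are hypothetical; only the three encoding names come from the description.

```python
# Encodings listed in the PR description, tried in this order.
DEFAULT_ENCODINGS = ("utf-8", "ISO-8859-1", "cp1252")


def read_lines_with_fallback(path, encodings=DEFAULT_ENCODINGS):
    """Return (lines, encoding) for the first encoding that decodes the file.

    Hypothetical helper for illustration; not part of xlwings.
    """
    last_error = None
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as f:
                return f.readlines(), encoding
        except UnicodeDecodeError as exc:
            # Remember the failure and move on to the next candidate.
            last_error = exc
    if last_error is None:
        raise ValueError("at least one encoding is required")
    # Every candidate failed: surface the last decoding error.
    raise last_error
```

One property of this ordering is worth noting: ISO-8859-1 assigns a character to every possible byte, so decoding with it never fails. In the order shown, the cp1252 entry is therefore never reached, and bytes in the 0x80 to 0x9F range come back as control characters rather than cp1252 punctuation.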

Justification

These changes enhance the robustness of xlwings in handling VBA files with various encodings, which is a common scenario in diverse environments. It ensures smoother functionality and reduces the likelihood of runtime errors due to encoding issues.

Testing

The modifications have been tested in various scenarios to ensure compatibility and functionality across different file encodings.

@LeonKohli changed the title from '"Fix UnicodeDecodeError in some VBA file handling' to 'Fix UnicodeDecodeError in some VBA file handling' on Jan 15, 2024
@fzumstein
Member

Thanks! Can you also provide instructions on how to reproduce this issue? See also #2335

@LeonKohli
Author

Hey @fzumstein

Thank you for following up on the pull request. Regarding the UnicodeDecodeError issue with xlwings, I encountered it while working on a large VBA project in my professional environment. Initially, I focused on resolving the issue directly as it was impacting our workflow, rather than setting up a detailed reproduction scenario.

However, I believe the problem arises when handling VBA files that contain non-ASCII characters not encoded in UTF-8, especially in large projects where various encoding standards might have been used over time.

@fzumstein
Member

Were you using files that were exported outside of xlwings?

@LeonKohli
Author

To clarify, the files I encountered the issue with were indeed part of a larger VBA project, and some of these files may have been exported or edited outside of xlwings before being reintegrated into the project. This mixed handling could be a contributing factor to the encoding discrepancies leading to the UnicodeDecodeError.

@fzumstein
Member

Right, in this case I think it's fair to expect that the user has to use xlwings to do the initial export though. The way you have it now is just covering your specific use case, it wouldn't work for this case: #2335
I'd rather allow users to specify a non-utf-8 encoding via command line switch, something like:
xlwings vba edit --encoding=cp932
I think it could be ok to loop through utf-8 and locale.getpreferredencoding() by default, as this is still generic.
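The default proposed above could look roughly like the sketch below. The function name `candidate_encodings` is hypothetical, and the `--encoding` switch is only a suggestion in this thread, not an existing xlwings option; `user_encoding` stands in for whatever value such a switch would pass along.

```python
import codecs
import locale


def candidate_encodings(user_encoding=None):
    """Return the encodings to try, in order (hypothetical helper).

    An explicit choice wins; otherwise fall back to utf-8 followed by
    the locale's preferred encoding.
    """
    if user_encoding:
        # Fail early with LookupError if Python does not know the name.
        codecs.lookup(user_encoding)
        return [user_encoding]

    seen = set()
    result = []
    for name in ("utf-8", locale.getpreferredencoding(False)):
        try:
            canonical = codecs.lookup(name).name
        except LookupError:
            continue  # skip a locale encoding Python cannot resolve
        if canonical not in seen:
            # Avoid trying utf-8 twice when the locale is already UTF-8.
            seen.add(canonical)
            result.append(name)
    return result
```

With no argument the list always starts with `utf-8`; the second entry, if present, depends on the machine's locale.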

@LeonKohli
Author

A command line switch for specifying the encoding offers greater flexibility for individual files.
However, I'd like to raise a concern regarding projects where multiple contributors have worked on different modules, potentially using varying encodings. In such scenarios, a single encoding specified via the command line might not be sufficient to handle all modules correctly.

In my project, some modules use utf8, while others use cp932 or LATIN-1.

I understand this introduces additional complexity but it might be necessary to ensure robust handling of diverse and collaborative VBA projects.
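To illustrate the complexity mentioned here, the sketch below checks each module's bytes against several candidate encodings and lists every one that decodes without error. The helper `decodable_encodings`, the module names, and the sample contents are all made up for illustration.

```python
def decodable_encodings(raw, candidates=("utf-8", "cp932", "latin-1")):
    """Return every candidate encoding that decodes raw without error."""
    matches = []
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        matches.append(encoding)
    return matches


# Hypothetical modules, each saved with a different encoding.
modules = {
    "Utf8Module.bas": "' Größe".encode("utf-8"),
    "JapaneseModule.bas": "' 日本語".encode("cp932"),
    "LatinModule.bas": "' Ä".encode("latin-1"),
}

report = {name: decodable_encodings(raw) for name, raw in modules.items()}
for name, matches in report.items():
    print(name, matches)
```

The Latin-1 module is also accepted by cp932, because the byte for "Ä" (0xC4) is a valid half-width katakana character there. A successful decode therefore does not prove that the guessed encoding is the one the file was written in, which is one argument for letting the user state the encoding explicitly.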

@zwackelfuss

Hello,
I am new to xlwings and just installed the latest version (0.31.0). When I open a file as described in #2335 (comment) (terminal: vba edit --file), the xlsm opens, but this error occurs:

  File "C:\Users\Heiko\AppData\Local\Programs\Python\Python39\lib\site-packages\xlwings\cli.py", line 757, in export_vba_modules
    exported_code = f.readlines()
  File "C:\Users\Heiko\AppData\Local\Programs\Python\Python39\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 354: invalid start byte
PS C:\Users\Heiko\Documents\testxl> 

Is there anything I can do to resolve this problem?
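For context, the byte 0xfc in the traceback is how cp1252 and ISO-8859-1 encode "ü", which would be consistent with a module saved in a Western European code page. The sketch below reproduces the same kind of error with a made-up sample string; it is not taken from the reporter's workbook.

```python
# "Müller" is a hypothetical sample; in cp1252 the "ü" becomes byte 0xfc.
raw = "Müller".encode("cp1252")

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding the bytes were written in works fine.
print(raw.decode("cp1252"))
```

A lone 0xfc byte can never start a valid UTF-8 sequence, which is why the message says "invalid start byte".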

@fzumstein
Member

@zwackelfuss if you can attach a sample workbook to replicate the issue, that would help. Also, if you can report your machine's locale:

image

@zwackelfuss

zwackelfuss commented Mar 28, 2024

Thank you Felix. See my language settings (German).
image
Odd: I opened the xlsm, deleted some simple cell values (names of persons), saved the file, and opened it in xlwings without error. So it seems some of the names cause the error. As the file contains names, I'd prefer to send it to you by mail instead of posting it here. I hope this is ok for you.

EDIT: it gets even odder: the content is not important. Opening the file for the first time causes the error. After closing it and opening it again, the error is gone and all is fine.

@PedroWitzel

Hello, first time here 👋
I had a similar issue with a Brazilian Portuguese workbook, and grabbing this modification resolved the issue.

I'm running xlwings vba edit to do so.
I can do further testing if you go further with this, or if you require a 'broken' workbook.

@ZwilleSmutje

I was able to get the export working by changing the file read in vba_edit (cli.py, line 867) to:

with open(path, "r", encoding="ISO 8859-1") as f:
