Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patching export for unicode issue #3537

Closed

Conversation

CasperWA
Copy link
Contributor

Fixes #3492

This PR attemps to accommodate the issue described in #3492 by adding encoding info when opening the json files for writing during export and further adding some encoding info in our own ZipFolder.

@AntimoMarrazzo, could you possibly try and export using this branch and see if it fixes your usecase?

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CasperWA : few questions. Not sure about the solution, mostly also because I don't think we (me at least) fully understand the cause of the failure. Need to reproduce it first and add a test

aiida/tools/importexport/dbexport/zip.py Outdated Show resolved Hide resolved
aiida/tools/importexport/dbexport/zip.py Outdated Show resolved Hide resolved
aiida/tools/importexport/dbexport/zip.py Outdated Show resolved Hide resolved
@@ -412,7 +412,7 @@ def export_tree(
}

# N.B. We're really calling zipfolder.open (if exporting a zipfile)
with folder.open('data.json', mode='w') as fhandle:
with folder.open('data.json', mode='w', encoding='utf-8') as fhandle:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main question is under what conditions did @AntimoMarrazzo case fail and why was this not caught by the tests. I would first want to fully understand the failure and then add a test.

@CasperWA CasperWA moved this from To do to In progress in Coding Week 2019 Dec 3, 2019
@CasperWA CasperWA force-pushed the fix_3492_export_unicode_issue branch 2 times, most recently from dd65904 to 38b3ac9 Compare December 4, 2019 19:56
@sphuber sphuber removed this from In progress in Coding Week 2019 Jan 27, 2020
@ramirezfranciscof
Copy link
Member

@CasperWA what is the status of this? are you waiting for @AntimoMarrazzo to test it? did you discuss @sphuber comment in person? (in which case shouldn't we mark them as resolved?)

@sphuber
Copy link
Contributor

sphuber commented May 7, 2020

Since this has been open for very long and needs more work, is it ok to close this @CasperWA ?

@CasperWA
Copy link
Contributor Author

CasperWA commented May 7, 2020

Since this has been open for very long and needs more work, is it ok to close this @CasperWA ?

Please wait a bit longer. I will get around to it soonish - at least to fully review what needs to be done and at the very least update the original issue / update the PR.

@CasperWA CasperWA force-pushed the fix_3492_export_unicode_issue branch from 38b3ac9 to 74a211d Compare May 14, 2020 10:07
@CasperWA CasperWA force-pushed the fix_3492_export_unicode_issue branch from 74a211d to dca2107 Compare May 14, 2020 10:08
@CasperWA
Copy link
Contributor Author

CasperWA commented May 14, 2020

The original issue pertains to discrepancies between Py2 and Py3. A good explanation can be found in this StackOverflow accepted answer.
In short, in Py2 str was stored as a bytes object, while unicode was stored as a "text" object. In Py3 unicode was changed to str, i.e., str is now a "text" object and a new class bytes replaced the old style str class.
So the issue is the str class.

In the current version of AiiDA we do not support Py2, so this is not an issue.
However, to be more clear, I have aligned the custom ZipFolder class with the custom Folder class, so that open and write works in the same way, i.e., with a BytesIO stream buffer when writing.
Also, I have added a check for the type of data when writing, which will raise TypeError if it's incorrect, e.g., if you try to write a str object when having opened it in simple 'w' mode, and vice versa when opening in 'wb' mode.

The original issue is thus not truly fixed, since it pertains to an older AiiDA version (one that supports Py2).

@CasperWA CasperWA requested a review from sphuber May 14, 2020 10:15
@CasperWA
Copy link
Contributor Author

Thinking further about this, the issue is actually that we're using the custom MyWritingZipFile class, since zipfile.ZipFile.writestr works fine with either bytes or str - it will simply convert a str to a UTF-8 encoded bytes object behind the scenes ... So if we don't go through the buffer in MyWritingZipFile...

@codecov
Copy link

codecov bot commented May 14, 2020

Codecov Report

Merging #3537 into develop will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #3537      +/-   ##
===========================================
+ Coverage    78.77%   78.77%   +0.01%     
===========================================
  Files          463      463              
  Lines        34399    34408       +9     
===========================================
+ Hits         27094    27103       +9     
  Misses        7305     7305              
Flag Coverage Δ
#django 70.68% <100.00%> (+0.01%) ⬆️
#sqlalchemy 71.56% <100.00%> (+0.01%) ⬆️
Impacted Files Coverage Δ
aiida/tools/importexport/dbexport/__init__.py 97.72% <100.00%> (ø)
aiida/tools/importexport/dbexport/zip.py 91.58% <100.00%> (+0.89%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d918e0...9ddd6a9. Read the comment docs.

@CasperWA
Copy link
Contributor Author

I think perhaps this PR should be closed, and issue #3492 should be directed towards the mixed Py2/Py3 versions.

I will create another PR for the changes I have introduced in this PR, but they have nothing to do with the original issue, but are simply a streamlining of the aiida.tools.importexport.dbexport.zip module instead (I have further local changes than what has been shown here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error exporting some files (another Unicode issue)
3 participants