Writing doesn't work for umlauts (likely UTF-8 formatting problem) #403

Open
2 tasks
claell opened this issue Sep 20, 2023 · 4 comments · May be fixed by #405

claell (Contributor) commented Sep 20, 2023

Describe the bug
When I create a library with unicode characters (tested with umlauts), the export will not show them properly.

Reproducing

Version: Most current from PyPI

Code:

import bibtexparser

# Build a library with one entry whose author field contains an umlaut.
bib_library = bibtexparser.Library()
fields = [bibtexparser.model.Field("author", "ö")]
entry = bibtexparser.model.Entry("ARTICLE", "test", fields)
bib_library.add(entry)

print(bib_library.entries_dict)

# Written without an explicit encoding; the umlaut comes out garbled in the file.
bibtexparser.write_file("my_new_file.bib", bib_library)

Bibtex:

@ARTICLE{test,
	author = {�}
}
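
One plausible explanation for the replacement character, offered as a sketch and not a confirmed diagnosis: if the file is written with a legacy single-byte default encoding (cp1252 is assumed here purely for illustration) and later read as UTF-8, the umlaut byte is invalid and gets displayed as U+FFFD.

```python
# Illustrative only: cp1252 is an assumed platform default, not confirmed by the report.
text = "author = {ö}"

# Encode the way open() might when no encoding is passed and the
# platform default is a legacy single-byte code page.
raw = text.encode("cp1252")

# A UTF-8 reader cannot decode the lone 0xF6 byte and substitutes U+FFFD.
shown = raw.decode("utf-8", errors="replace")

print(raw)    # b'author = {\xf6}'
print(shown)  # author = {�}
```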

Workaround
None confirmed. Generating a string first and then writing it to a file manually might work (similar to the workaround for #394).
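
A minimal sketch of that workaround, assuming the BibTeX text has already been serialized to a Python string (for example with the library's string writer). The helper name `write_bibtex_utf8` and the literal stand-in string are hypothetical; only the explicit `encoding="utf-8"` matters.

```python
import tempfile
from pathlib import Path


def write_bibtex_utf8(path, bibtex_string):
    """Hypothetical helper: write an already-serialized BibTeX string as UTF-8."""
    Path(path).write_text(bibtex_string, encoding="utf-8")


# Stand-in for the string the library would produce for the entry above.
bibtex_string = "@ARTICLE{test,\n\tauthor = {ö}\n}\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "my_new_file.bib"
    write_bibtex_utf8(target, bibtex_string)
    # Read back with the same explicit encoding to confirm the umlaut survives.
    round_trip = target.read_text(encoding="utf-8")

print("ö" in round_trip)
```

Passing the encoding explicitly avoids depending on the platform default, which is the part that appears to differ between systems.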

Remaining Questions (Optional)
Please tick all that apply:

  • I would be willing to contribute a PR to fix this issue.
  • This issue is a blocker, I'd be grateful for an early fix.
claell (Contributor, Author) commented Sep 20, 2023

The proposed workaround seems to work for now. But it's a bit concerning to see that the library doesn't handle such simple cases of encoding well.

MiWeiss (Collaborator) commented Sep 20, 2023

I have not reproduced this manually, but it would be reasonable to implement a change for the writer similar to the one in #395 (which fixed #394, the encoding issue in parsing). In short, the user should be able to pass in an encoding.

Would you be willing to contribute a fix?

Note: BibTeX itself does not support non-ASCII characters, and using the LatexEncodingMiddleware may fix large parts of this problem. However, newer replacements for BibTeX support UTF-8, so the issue remains valid and should be implemented.
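
To illustrate the idea behind LaTeX-encoding the values, here is a deliberately tiny stand-in. It is not the library's LatexEncodingMiddleware; the mapping `LATEX_UMLAUTS` and the function `to_latex_ascii` are hypothetical and cover only German umlauts and ß.

```python
# Hypothetical, minimal mapping from umlauts to ASCII-only LaTeX escapes.
LATEX_UMLAUTS = {
    "ä": '{\\"a}', "ö": '{\\"o}', "ü": '{\\"u}',
    "Ä": '{\\"A}', "Ö": '{\\"O}', "Ü": '{\\"U}',
    "ß": "{\\ss}",
}


def to_latex_ascii(value):
    """Replace each mapped character with its LaTeX escape; leave the rest untouched."""
    return "".join(LATEX_UMLAUTS.get(ch, ch) for ch in value)


encoded = to_latex_ascii("Jörg Müller")
print(encoded)  # J{\"o}rg M{\"u}ller
```

Because the result is pure ASCII, it is written identically under any ASCII-compatible file encoding, which is why this approach sidesteps the problem for classic BibTeX.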

claell added a commit to claell/python-bibtexparser that referenced this issue Sep 21, 2023
claell linked a pull request Sep 21, 2023 that will close this issue
claell (Contributor, Author) commented Sep 21, 2023

I started a PR in #405.

MiWeiss (Collaborator) commented Nov 2, 2023

PR #405 is looking for someone to take over. Volunteers, come forward 🚀 ;-)
