Output filename with exotic characters is corrupted #3724

Closed
isaomatsunami opened this issue Mar 15, 2021 · 2 comments · Fixed by #3736
Labels: encoding, export, Type: Bug

Contributor

isaomatsunami commented Mar 15, 2021

East Asian characters in the project name are corrupted in the filenames of exported files.

To Reproduce

  1. First, import the attached xls file, whose filename contains German, Japanese, French, and Chinese characters (the cells contain such characters too). The project name is not corrupted.
    äöüßいろはにàçéùè§繁體字.xls.zip
  2. Then, export the data as TSV, CSV, or Excel.
  3. Finally, the output filename is "äöüß????àçéùè§???-xls.csv". The German and French characters are preserved, but the Japanese and Chinese ones are not.
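The pattern in step 3 matches what a round trip through ISO-8859-1 (Latin-1) produces: every surviving character (ä, ö, ü, ß, à, ç, é, ù, è, §) exists in Latin-1, and each Japanese or Chinese character turns into exactly one `?`. The Java sketch below reproduces the reported name under that assumption. It is a hypothesis about the cause, not code from OpenRefine, and `throughLatin1` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Demo {
    // Hypothetical helper: encodes s as ISO-8859-1 and decodes it back.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's default replacement byte, which for ISO-8859-1 is '?'.
    static String throughLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Latin-1 characters survive; each CJK character becomes a single '?'.
        System.out.println(throughLatin1("äöüßいろはにàçéùè§繁體字-xls.csv"));
    }
}
```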

Versions

  • Operating System: Mac OS
  • Browser Version: Firefox 61, Safari
  • JRE or JDK Version: java version "11.0.1" 2018-10-16 LTS/Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)/Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
  • OpenRefine: 3.5-SNAPSHOT

Additional context

The same kind of problem when importing files with exotic characters has been fully resolved. Huge thanks to the dev team!

isaomatsunami added the Type: Bug and Status: Pending Review labels Mar 15, 2021
@gitonthescene
Contributor

Possibly related to #3277?

wetneb added the encoding and export labels Mar 16, 2021
Contributor

gitonthescene commented Mar 16, 2021

Actually, it looks like the filename is set here and passed to the browser in the Content-Disposition header.

This doc seems to suggest that the Unicode filename needs to be percent-encoded (URL-encoded), with the byte encoding specified. I think OpenRefine already bundles PercentEscaper from Guava.

In short, this looks fixable.
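A minimal sketch of that approach, assuming the RFC 6266 / RFC 5987 `filename*` form (`UTF-8''` followed by the percent-encoded UTF-8 bytes) together with a plain-ASCII `filename` fallback for older clients. The class and method names are hypothetical; this is not the code merged in #3736, and it uses hand-rolled escaping instead of Guava's `PercentEscaper` so that it has no dependencies.

```java
import java.nio.charset.StandardCharsets;

public class ContentDispositionSketch {
    // RFC 5987 attr-char punctuation; ALPHA and DIGIT are checked separately.
    private static final String ATTR_CHAR_PUNCT = "!#$&+-.^_`|~";

    /** Percent-encodes the UTF-8 bytes of s, leaving RFC 5987 attr-chars as-is. */
    static String encodeRfc5987(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || ATTR_CHAR_PUNCT.indexOf(c) >= 0;
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    /** Builds a header value with an ASCII fallback and a UTF-8 filename* parameter. */
    static String contentDisposition(String filename) {
        // Fallback for old clients: replace non-printable-ASCII, quote and backslash with '_'.
        StringBuilder fallback = new StringBuilder();
        for (char ch : filename.toCharArray()) {
            fallback.append(ch < 0x20 || ch > 0x7E || ch == '"' || ch == '\\' ? '_' : ch);
        }
        return "attachment; filename=\"" + fallback + "\"; filename*=UTF-8''"
                + encodeRfc5987(filename);
    }

    public static void main(String[] args) {
        System.out.println(contentDisposition("äöüßいろはに.csv"));
    }
}
```

In a servlet-based exporter, the result would be passed to something like `response.setHeader("Content-Disposition", ...)`; browsers that understand `filename*` prefer it over the ASCII `filename`.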

wetneb removed the Status: Pending Review label Mar 16, 2021
gitonthescene self-assigned this Mar 16, 2021
gitonthescene pushed a commit to gitonthescene/OpenRefine that referenced this issue Mar 16, 2021
wetneb pushed a commit that referenced this issue Mar 17, 2021
Co-authored-by: Douglas Mennella <douglas.mennella@gmail.com>
wetneb added this to the 3.5 milestone Mar 17, 2021
3 participants