Make Unicode path for projects work again #3229

SuslikV · 2020-02-15T17:32:48Z

export and updates also uses json.dumps(), but it is slightly different thing. It needs more time to investigate if changes are required. In my tests it doesn't affect this issue itself.

Let's see if this fix is woks for you too.

SuslikV · 2020-02-15T17:40:22Z

non-English, what a mess... I should update commit messages. If this fix works, of course.

ferdnyc

The only thing I really don't like here is the change to the thumbnail requests.Response handling, I left a code comment explaining.

The rest, well, all those ensure_ascii=False arguments to json.dumps() make me concerned about the possible consequences, but maybe it is the right way to go. I'd have to see the effects of the change — and get an idea of the issues that the current code is causing — first. (Which I realize I can get at #3227, I just haven't gotten around to reading it yet.)

ferdnyc · 2020-02-15T18:38:51Z

src/windows/models/files_model.py

-                    thumb_path = r.text
+                    thumb_path = r.content.decode("utf-8", "ignore")


I can't say I like this change, I think it'll bite us.

requests.Response.text is documented as "Content of the response, in unicode". Which is exactly what we want.

If there's an issue with the encoding of the Unicode, we have two choices:

We can have the server end explicitly specify UTF-8 in a content-type header (which the client will receive and respect)

Or, on the client end, we can (as the Response documentation suggests) "set r.encoding appropriately before accessing this property."

Either option is much safer than accessing the raw bytes of the request and blindly force-decoding it.

You want to modify API again...

I think that the p1. is here, (something like charset=utf-8 can be specified):

openshot-qt/src/classes/thumbnail.py

Line 156 in e247a9b

self.send_header('Content-type', 'text/html')

As for the p2. - I see no much difference in this two:

response.encoding = 'utf-8' response.text

and the decode() code - it surely receives utf-8:

openshot-qt/src/classes/thumbnail.py

Line 184 in e247a9b

self.wfile.write(bytes(thumb_path, "utf-8"))

no need to see the header's charset.

And it can't be safer. It cannot use other technique. I just take example from the https://docs.python.org/3/howto/unicode.html and know the result, but I don't know how the response.text works when ...the input string can’t be converted according to the encoding’s rules... . Edit: Of course, I cannot imaging how it can happen if it already was send in utf-8... Broken message?

Fixes an issue when application fails to load files from the saved project with non-English characters in the path to the clips. Also removing json.loads strict=False key as obsolete in 3.x Python.

Fixes an issue that list view miniatures of the imported clips not rendered if path contains non-English letters. By defult the ".text" field in response uses ISO-8859-1 charset (RFC 2616 specs) while server sends utf-8 encoded path string just few strings below in the code.

SuslikV · 2020-02-16T11:19:42Z

Updated. API fix was implemented (instead of programming solution).

SuslikV · 2020-02-25T12:23:00Z

It seems that fix from the main developer is available now: #3247

jonoomph · 2020-02-26T20:35:11Z

@SuslikV I just retested this PR, and it works for my unicode examples now. I suggest we merge this PR, and close my related PR (#3247).

ferdnyc · 2020-02-26T23:10:44Z

Any fix that works! 👍

Now the only question is, what about all of the mis-encoded project files our 2.5.0 users are sitting with? Will they be readable with OpenShot after this change? If (assuming) they aren't, is it worth putting together a sort of "project file fix-up" tool to correct their formatting? (...Hopefully it's feasible to do that programmatically?)

I'd be willing to slap together a standalone Python command-line tool to correct the contents of mis-encoded files, something we could distribute to users (as well as include in the 2.5.1 release that's presumably imminent) — it's not as convenient as having OpenShot directly support reading the corrupted files, but it's simpler and probably safer to make it a separate, explicit thing.

This week has me under uncharacteristically high real-world time pressures, unfortunately, so my availability is limited, but assuming the logic is relatively straightforward and well-understood, the few hours I have free tonight should be enough time to get that together... or at least get it into a good enough state where I can submit a branch and hand it off for completion. (Because if it's not done in the next 5-6 hours, then it'll be another 12-18 before I can return to it.)

jonoomph · 2020-02-26T23:37:32Z

@ferdnyc Based on what I know, this only affects project files with certain unicode characters, and only version 2.5.0. I'm not sure how many people are really affected at this point, with broken projects. It could be dozens, or hundreds, but I don't imagine it's a huge amount.

Now, with that being said, I would be happy to create a page on openshot.org, which could take an *.osp upload file, and pops out a "fixed" encoding version. If anyone wants to create that "Python", I can adopt it into the official website, and make a tool for people. 👍

jonoomph · 2020-02-26T23:39:06Z

Or... if want to integrate some "charset checking" code into openshot-qt, which could detetct a JSON parse failure, and then try and fix the JSON encoding at runtime. That would also be a good solution... especially if a large # of people are affected.

ferdnyc · 2020-02-27T00:18:40Z

Oh, no, it's gonna affect anyone with any non-ASCII characters in their pathnames — I just created this file hierarchy:

/var/tmp/Jöerç
├── Charsetᛀᛊᛇᛅ_assets
│   ├── blender
│   ├── thumbnail
│   │   ├── 0U3BZFTQMS.png
│   │   └── K6QJISME3F.png
│   └── title
├── Charsetᛀᛊᛇᛅ.osp
├── ForagerNåkəd.mp4
└── ForagerNøtNækd.mp4

Both of the MP4 files were imported into the project. Here's what came out:

$ egrep '\/u[[:xdigit:]]{4}' /var/tmp/Jöerç/Charsetᛀᛊᛇᛅ.osp 
    "path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
   "image": "../J/u00f6er/u00e7/Charset/u16c0/u16ca/u16c7/u16c5_assets/thumbnail/0U3BZFTQMS.png"
    "path": "../J/u00f6er/u00e7/ForagerN/u00f8tN/u00e6kd.mp4",
   "image": "../J/u00f6er/u00e7/Charset/u16c0/u16ca/u16c7/u16c5_assets/thumbnail/K6QJISME3F.png"
   "path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
   "image": "../J/u00f6er/u00e7/Charset/u16c0/u16ca/u16c7/u16c5_assets/thumbnail/0U3BZFTQMS.png"
   "path": "../J/u00f6er/u00e7/ForagerN/u00f8tN/u00e6kd.mp4",
   "image": "../J/u00f6er/u00e7/Charset/u16c0/u16ca/u16c7/u16c5_assets/thumbnail/K6QJISME3F.png"
     "path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
     "path": "../J/u00f6er/u00e7/ForagerN/u00f8tN/u00e6kd.mp4",
      "path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
      "path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
      "path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
      "path": "../J/u00f6er/u00e7/ForagerN/u00f8tN/u00e6kd.mp4",
       "path": "../J/u00f6er/u00e7/ForagerN/u00e5k/u0259d.mp4",
      "path": "../J/u00f6er/u00e7/ForagerN/u00f8tN/u00e6kd.mp4",

...However, as my egrep demonstrates, the damage is very easy to detect, and correct. Basically, any forward-slash followed by a u and four hex digits is a corrupted Unicode escape that should be encoded as a backslash followed by a u and four hex digits. Any other forward slashes are path separators and should be left alone. Simple enough. And then the result needs to be read in with json.loads(), to unsubstitute the Unicode escapes.

ferdnyc · 2020-02-27T05:35:53Z

(Hrm. And I just noticed, from the above, that we appear to be storing paths relative to the parent directory of the project file location, for some reason.)

The "../J\u00f6er\u00e7/filename" paths should really just be "filename".
The "../J\u00f6er\u00e7/subdir/filename" paths should just be "subdir/filename".

Edit: Correction: Relative to the project file location, but rooted in its parent dir.

SuslikV · 2020-02-27T08:29:19Z

I can only add that some users has usernames that can cause problems too. Because the settings file of OpenShot uses same routines to write/read the openshot.settings where the recent files list is stored (some of these paths can lead to the user's folder, so even if user don't uses other characters but English in the current project, the OpenShot will find them...)

The other way the issue may appear is the default names templates for localization (mentioned here: #3252 )

ferdnyc reviewed Feb 15, 2020

View reviewed changes

SuslikV force-pushed the patch-11 branch from 8fbe76d to 7deb3f7 Compare February 16, 2020 10:03

SuslikV added 2 commits February 16, 2020 12:43

Get json in raw format

bac52ca

Fixes an issue when application fails to load files from the saved project with non-English characters in the path to the clips. Also removing json.loads strict=False key as obsolete in 3.x Python.

jonoomph mentioned this pull request Feb 26, 2020

Cannot open properly just saved project #3227

Closed

jonoomph merged commit ee43136 into OpenShot:develop Feb 26, 2020

jonoomph mentioned this pull request Feb 26, 2020

Improve UTF-8 file path support for OpenShot 2.5 #3247

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Unicode path for projects work again #3229

Make Unicode path for projects work again #3229

SuslikV commented Feb 15, 2020 •

edited

Loading

SuslikV commented Feb 15, 2020

ferdnyc left a comment

ferdnyc Feb 15, 2020

SuslikV Feb 15, 2020 •

edited

Loading

SuslikV commented Feb 16, 2020

SuslikV commented Feb 25, 2020

jonoomph commented Feb 26, 2020

ferdnyc commented Feb 26, 2020 •

edited

Loading

jonoomph commented Feb 26, 2020

jonoomph commented Feb 26, 2020

ferdnyc commented Feb 27, 2020 •

edited

Loading

ferdnyc commented Feb 27, 2020 •

edited

Loading

SuslikV commented Feb 27, 2020

		thumb_path = r.text
		thumb_path = r.content.decode("utf-8", "ignore")

Make Unicode path for projects work again #3229

Make Unicode path for projects work again #3229

Conversation

SuslikV commented Feb 15, 2020 • edited Loading

SuslikV commented Feb 15, 2020

ferdnyc left a comment

Choose a reason for hiding this comment

ferdnyc Feb 15, 2020

Choose a reason for hiding this comment

SuslikV Feb 15, 2020 • edited Loading

Choose a reason for hiding this comment

SuslikV commented Feb 16, 2020

SuslikV commented Feb 25, 2020

jonoomph commented Feb 26, 2020

ferdnyc commented Feb 26, 2020 • edited Loading

jonoomph commented Feb 26, 2020

jonoomph commented Feb 26, 2020

ferdnyc commented Feb 27, 2020 • edited Loading

ferdnyc commented Feb 27, 2020 • edited Loading

SuslikV commented Feb 27, 2020

SuslikV commented Feb 15, 2020 •

edited

Loading

SuslikV Feb 15, 2020 •

edited

Loading

ferdnyc commented Feb 26, 2020 •

edited

Loading

ferdnyc commented Feb 27, 2020 •

edited

Loading

ferdnyc commented Feb 27, 2020 •

edited

Loading