Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pybind: fix unicode handling in CephFSVolumeClient::purge #8452

Merged
merged 1 commit into from May 16, 2016

Conversation

jcsp
Copy link
Contributor

@jcsp jcsp commented Apr 5, 2016

os.path.join is sensitive to string encoding, but
just doing a straight substitution should not be.

Signed-off-by: John Spray john.spray@redhat.com

@jcsp
Copy link
Contributor Author

jcsp commented Apr 5, 2016

Tested by ceph/ceph-qa-suite#934

@jcsp
Copy link
Contributor Author

jcsp commented May 2, 2016

Passing test run here (http://pulpito.ceph.com/jspray-2016-04-30_17:56:02-fs-wip-jcsp-testing---basic-mira/) minus the valgrind failure from #8747

@jdurgin
Copy link
Member

jdurgin commented May 4, 2016

Looks like that run didn't include test_purge (not using the wip-15266 suite branch).

Maybe I'm missing something, but I'm not seeing how using format() handles unicode better than os.path.join - what are the types of args that fail with the latter and succeed with the former?

@gregsfortytwo gregsfortytwo assigned jcsp and unassigned gregsfortytwo May 5, 2016
@onyb
Copy link
Contributor

onyb commented May 5, 2016

Python 3 documentation suggests the following for os.path module:

Applications are encouraged to represent file names as (Unicode) character strings.

In this particular case, one possible fix in PY2 could be: d_full = os.path.join(root_path, unicode(d.d_name)). Note that d_full in the above expression would also be a Unicode literal.

I hope things will be simpler after we have a compatibility layer (for PY2 and PY3 dual support), during Google Summer of Code.

os.path.join is sensitive to string encoding, but
just doing a straight substitution should not be.

Signed-off-by: John Spray <john.spray@redhat.com>
@jcsp
Copy link
Contributor Author

jcsp commented May 11, 2016

@jdurgin yeah, this was a weird one. I think Manila is giving us share IDs (which are very much ascii) as unicode python objects.

>>> os.path.join(u"/foo", chr(0xc3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
>>> "{0}/{1}".format(u"/", chr(0xc3))
'//\xc3'

My reproducer PR wasn't actually using unicode properly so I've updated that too (can see it failing if you back out the join vs. format change)

@jcsp jcsp assigned gregsfortytwo and unassigned jcsp May 11, 2016
@jdurgin
Copy link
Member

jdurgin commented May 13, 2016

Ah, ran into similar passing of unicode from glance, cinder, and nova - since we didn't have a high-level library like this we ended up forcing everything to utf-8-encoded string types before passing to librbd. Of course, nothing actually used non-ascii chars for e.g. uuids, so it was just a nuisance, but allowed it to work with existing versions of the python bindings.

@jcsp
Copy link
Contributor Author

jcsp commented May 13, 2016

@jdurgin
Copy link
Member

jdurgin commented May 13, 2016

lgtm fwiw

@jcsp jcsp merged commit 7d5ca75 into ceph:master May 16, 2016
@jcsp jcsp deleted the wip-15266 branch May 16, 2016 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants