Recursive upload with RSA with UTF8 files completes but download fails #29

Closed
flexray opened this Issue Apr 4, 2017 · 6 comments

flexray commented Apr 4, 2017

Upload completes flawlessly:
blobxfer stopablob container1 /etc --upload --rsapublickey ../public.pem --storageaccountkey blabla

Download fails:
blobxfer stopablob container1 etc1 --remoteresource . --download --rsaprivatekey ../private.pem --storageaccountkey blabla --rsakeypassphrase blabla

remote blob: apparmor.d/tunables/home.d/ubuntu length: 352 bytes, md5: KoiBH3t2PaqWwgsgJpKUpA==
Traceback (most recent call last):
  File "/usr/local/bin/blobxfer", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/blobxfer.py", line 2566, in main
    localfile, blob, False, blobdict[blob])
  File "/usr/local/lib/python2.7/dist-packages/blobxfer.py", line 1827, in generate_xferspec_download
    remoteresource, contentlength, contentmd5))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdc' in position 11: ordinal not in range(128)
=====================================
 azure blobxfer parameters [v0.12.1]
=====================================
             platform: Linux-4.4.0-71-generic-x86_64-with-Ubuntu-16.04-xenial
   python interpreter: CPython 2.7.12
     package versions: az.common=1.1.4 az.sml=0.20.5 az.stor=0.33.0 crypt=1.8.1 req=2.12.3
      subscription id: None
      management cert: None
   transfer direction: Azure->local
       local resource: etc1
      include pattern: None
      remote resource: .
   max num of workers: 3
              timeout: None
      storage account: stopablob
              use SAS: False
  upload as page blob: False
  auto vhd->page blob: False
 upload to file share: False
 container/share name: container1
  container/share URI: https://stopablob.blob.core.windows.net/container1
    compute block MD5: False
     compute file MD5: True
    skip on MD5 match: True
   chunk size (bytes): 4194304
     create container: True
  keep mismatched MD5: False
     recursive if dir: True
component strip on up: 1
        remote delete: False
           collate to: disabled
      local overwrite: True
      encryption mode: file dependent
         RSA key file: ../private.pem
         RSA key type: private
=======================================

flexray commented Apr 4, 2017

I added the lines below and that did the trick. I don't know whether it breaks other things.

import sys
reload(sys)
sys.setdefaultencoding('utf8')

Collaborator

alfpark commented Apr 4, 2017

@flexray What is your locale set to? Most likely your locale is not Unicode-aware, which causes the error in the print function when it encounters a non-ASCII character. Manually adding those lines is not a recommended solution. Instead, you should either:

  1. properly set your locale
  2. set the PYTHONIOENCODING=utf-8 environment variable in your shell when invoking blobxfer
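
The failure mode described above can be reproduced without changing the locale. The following is a minimal sketch for Python 3 that pushes the character from the traceback through a text stream with an explicit encoding. `encode_via_stream` is a hypothetical helper, not part of blobxfer, and it only approximates what `print` does when stdout ends up with an ASCII codec.

```python
import io


def encode_via_stream(text, encoding):
    """Write text through a text stream with the given encoding; return the bytes produced."""
    raw = io.BytesIO()
    # newline="" disables newline translation so the bytes are the same on every platform.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# A UTF-8 stream accepts the character from the traceback (u'\xdc').
print(repr(encode_via_stream("\u00dc", "utf-8")))

# An ASCII stream rejects it with the same exception type seen in the report.
try:
    encode_via_stream("\u00dc", "ascii")
except UnicodeEncodeError as exc:
    print("ascii stream failed:", exc.reason)
```

If the second call fails while the first succeeds, the stream encoding rather than the data is the likely culprit, which is what both suggestions above try to correct.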

flexray commented Apr 5, 2017

It is 100% reproducible. I regenerated the blob and used a default Azure VM with Ubuntu 16.04 in the North Europe region.
Setting declare -x PYTHONIOENCODING="utf-8" does nothing. The error is the same.

Locale:
flexray@stopeczka:~$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Traceback (most recent call last):
  File "/usr/local/bin/blobxfer", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/blobxfer.py", line 2566, in main
    localfile, blob, False, blobdict[blob])
  File "/usr/local/lib/python2.7/dist-packages/blobxfer.py", line 1827, in generate_xferspec_download
    remoteresource, contentlength, contentmd5))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdc' in position 11: ordinal not in range(128)

Collaborator

alfpark commented Apr 5, 2017

I'm not sure why print isn't honoring your locale or PYTHONIOENCODING in your case. The following is on Ubuntu 16.04:

$ python --version
Python 2.7.12
$ export LC_ALL=POSIX LC_CTYPE=POSIX LANG=POSIX
$ python -c 'print u"\xdc"'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdc' in position 0: ordinal not in range(128)

Setting PYTHONIOENCODING fixes the issue with no locale:

$ PYTHONIOENCODING=utf-8 python -c 'print u"\xdc"'
Ü

Setting the locale correctly and not specifying PYTHONIOENCODING prints correctly:

$ export LC_ALL=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 LANG=en_US.UTF-8
$ python -c 'print u"\xdc"'
Ü
$ PYTHONIOENCODING=utf-8 python -c 'print u"\xdc"'
Ü

Since you are on 16.04, you could try installing blobxfer for Python 3 instead, which shouldn't have those print issues with Unicode.
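
To check which encodings the interpreter actually picked up in a given shell, a small diagnostic along these lines may help. It is only a sketch: `encoding_report` is a hypothetical helper name, and it reads nothing beyond standard library values, so it cannot say how blobxfer itself builds its output.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings the interpreter is currently using."""
    return {
        # Encoding attached to stdout; may be None when output is piped on Python 2.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale settings (LANG, LC_ALL, LC_CTYPE).
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used for implicit str/unicode conversions.
        "default_encoding": sys.getdefaultencoding(),
        # Explicit override from the environment, if any.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


for name, value in sorted(encoding_report().items()):
    print("{0}: {1}".format(name, value))
```

Running it with the same interpreter and in the same shell session as blobxfer shows whether the locale and `PYTHONIOENCODING` settings are reaching Python at all.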

flexray commented Apr 6, 2017

I tried Python 3 and now I cannot pass the private key passphrase on the command line, which worked before.

Traceback (most recent call last):
  File "/usr/local/bin/blobxfer", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/dist-packages/blobxfer.py", line 2247, in main
    backend=cryptography.hazmat.backends.default_backend())
  File "/usr/local/lib/python3.5/dist-packages/cryptography/hazmat/primitives/serialization.py", line 20, in load_pem_private_key
    return backend.load_pem_private_key(data, password)
  File "/usr/local/lib/python3.5/dist-packages/cryptography/hazmat/backends/multibackend.py", line 305, in load_pem_private_key
    return b.load_pem_private_key(data, password)
  File "/usr/local/lib/python3.5/dist-packages/cryptography/hazmat/backends/openssl/backend.py", line 974, in load_pem_private_key
    password,
  File "/usr/local/lib/python3.5/dist-packages/cryptography/hazmat/backends/openssl/backend.py", line 1128, in _load_key
    raise TypeError("Password must be bytes")
TypeError: Password must be bytes

Collaborator

alfpark commented Apr 14, 2017

This looks like a defect in the current code (the password needs to be encoded). Please revert to your Python 2 workaround for now. This will be properly addressed in the rewrite.
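
For illustration only, the encoding step could look roughly like the sketch below. `passphrase_to_bytes` is a hypothetical helper and not the actual blobxfer code, and UTF-8 is an assumed encoding for the passphrase. The returned value is what would be handed to `load_pem_private_key` as its `password` argument.

```python
def passphrase_to_bytes(passphrase, encoding="utf-8"):
    """Return the passphrase as bytes, or None when no passphrase was supplied."""
    if passphrase is None:
        # An unencrypted key is loaded with password=None.
        return None
    if isinstance(passphrase, bytes):
        # Already bytes (for example a Python 2 str): pass through unchanged.
        return passphrase
    # Text from the command line on Python 3: encode it explicitly.
    return passphrase.encode(encoding)


print(repr(passphrase_to_bytes("blabla")))
```

Passing bytes through unchanged keeps the helper usable on both Python 2 and Python 3, where command-line arguments arrive as different types.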

alfpark closed this Apr 14, 2017
