UnicodeEncodeError: 'ascii' codec can't encode characters #17

Open
Zakusov opened this Issue Jul 24, 2012 · 12 comments

Zakusov commented Jul 24, 2012

OS: Windows 7

git version 1.7.11.msysgit.0
cleartool version 7.1.2.0
Python version 2.7.1

config:

[core]
    repositoryformatversion = 0
    filemode = false
    bare = false
    logallrefupdates = true
    symlinks = false
    ignorecase = true
    hideDotFiles = dotGitOnly
    autocrlf = false
    debug = true
    cache = false

When I commit to Git with Cyrillic characters in the commit message, gitcc checkin fails:

sergey@ZAKUSOV /C/git-cc-test-repo (master)
$ gitcc checkin
> cleartool update .
> git log -z --reverse --pretty=format:%H%x01%s%n%b --first-parent master_ci..
> git diff --name-status -M -z --ignore-submodules 27492ee1c17c6be57c8b4112885573c71ffe9a8b^..27492ee1c17c6be57c8b4112885573c71ffe9a8b
> git ls-tree -z 27492ee1c17c6be57c8b4112885573c71ffe9a8b -- module1/test1.txt
> git merge-base master_ci HEAD
> cleartool co -reserved -nc module1/test1.txt
> git hash-object Z:project1\module1/test1.txt
> git ls-tree -z 779bb19196a306a58b5859c3ff8f3f7658649ab6 module1/test1.txt
> git ls-tree -z 27492ee1c17c6be57c8b4112885573c71ffe9a8b module1/test1.txt
> git cat-file blob 06f0ae4f6ea9ddfdfd960b54c34f0c85d2422ebd
> cleartool ci -identical -c "¦¦¦-¦-¦-¦¬TВ ¦- ¦-¦¦TВ¦¦TГ feature1" module1/test1.txt
Traceback (most recent call last):
  File "c:/git-cc/gitcc", line 48, in <module>
    main()
  File "c:/git-cc/gitcc", line 14, in main
    return invoke(cmd, args)
  File "c:/git-cc/gitcc", line 38, in invoke
    cmd.main(*args)
  File "c:\git-cc\checkin.py", line 39, in main
    checkout(statuses, comment.strip(), initial)
  File "c:\git-cc\checkin.py", line 90, in checkout
    transaction.commit(comment);
  File "c:\git-cc\checkin.py", line 113, in commit
    cc_exec(['ci', '-identical', '-c', comment, file])
  File "c:\git-cc\common.py", line 50, in cc_exec
    return popen('cleartool', cmd, CC_DIR, **args)
  File "c:\git-cc\common.py", line 57, in popen
    pipe = Popen(cmd, cwd=cwd, stdout=PIPE, stderr=PIPE, env=env)
  File "c:\Python27\lib\subprocess.py", line 679, in __init__
    errread, errwrite)
  File "c:\Python27\lib\subprocess.py", line 896, in _execute_child
    startupinfo)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 28-39: ordinal not in range(128)

The ENCODING variable is set to cp866.

charleso commented Jul 24, 2012

Hi Zakusov,

Can you try changing the following in checkin.py:

cc_exec(['ci', '-identical', '-c', comment, file])

to

cc_exec(['ci', '-identical', '-c', comment.encode(ENCODING), file])

Charles

Zakusov commented Jul 24, 2012

$ gitcc checkin
> git branch
> cleartool update .
> git log -z --reverse --pretty=format:%H%x01%s%n%b --first-parent master_ci..
> git diff --name-status -M -z --ignore-submodules a25e62357ada37171bb53781ee94fd10bd67c9e9^..a25e62357ada37171bb53781ee94fd10bd67c9e9
> git ls-tree -z a25e62357ada37171bb53781ee94fd10bd67c9e9 -- module1/test2.txt
> git merge-base master_ci HEAD
> cleartool co -reserved -nc module1
> git ls-tree -z a25e62357ada37171bb53781ee94fd10bd67c9e9 module1/test2.txt
> git cat-file blob fe9879ae5b80feb5b8a87ef92cc298b04bb958e1
> cleartool mkelem -nc module1/test2.txt
Traceback (most recent call last):
  File "c:/git-cc/gitcc", line 48, in <module>
    main()
  File "c:/git-cc/gitcc", line 14, in main
    return invoke(cmd, args)
  File "c:/git-cc/gitcc", line 38, in invoke
    cmd.main(*args)
  File "c:\git-cc\checkin.py", line 39, in main
    checkout(statuses, comment.strip(), initial)
  File "c:\git-cc\checkin.py", line 90, in checkout
    transaction.commit(comment);
  File "c:\git-cc\checkin.py", line 115, in commit
    cc_exec(['ci', '-identical', '-c', comment.decode(ENCODING), file])
  File "C:\Python27\lib\encodings\cp866.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-15: ordinal not in range(128)

charleso commented Jul 24, 2012

Encode or decode? Looks like you've used decode there.
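
For reference, the distinction in question: encode turns text into bytes, and decode turns bytes back into text. Below is a minimal sketch in Python 3 syntax. The thread itself uses Python 2.7, where calling decode on a unicode object first triggers an implicit ASCII encode, which is consistent with the UnicodeEncodeError raised from cp866.py in the traceback above. The sample message is made up for illustration.

```python
# Hypothetical commit message; any non-ASCII text behaves the same way.
message = "Добавлена feature1"

# encode: text -> bytes in a chosen encoding (cp866 is the value reported in this thread).
raw = message.encode("cp866")

# decode: bytes -> text, using the same encoding.
round_trip = raw.decode("cp866")

# Encoding Cyrillic text to ASCII fails, which is the error in the issue title.
try:
    message.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = str(exc)
```

So the argument passed to cleartool would need encode if the comment is text, and would need nothing (or a prior decode) if it is already a byte string.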

charleso commented Jul 24, 2012

Hi Zakusov,

cc_exec(['ci', '-identical', '-c', comment.decode(ENCODING), file])

Still says decode in your error. Did you change it? I'm not sure if encode will actually work, but I'm afraid I can't suggest anything else. Sorry I can't help any further.

Charles

Zakusov commented Jul 24, 2012

$ gitcc checkin
> git branch
> cleartool update .
> git log -z --reverse --pretty=format:%H%x01%s%n%b --first-parent master_ci..
> git diff --name-status -M -z --ignore-submodules a25e62357ada37171bb53781ee94fd10bd67c9e9^..a25e62357ada37171bb53781ee94fd10bd67c9e9
> git ls-tree -z a25e62357ada37171bb53781ee94fd10bd67c9e9 -- module1/test2.txt
> git merge-base master_ci HEAD
> cleartool co -reserved -nc module1
> git ls-tree -z a25e62357ada37171bb53781ee94fd10bd67c9e9 module1/test2.txt
> git cat-file blob fe9879ae5b80feb5b8a87ef92cc298b04bb958e1
> cleartool mkelem -nc module1/test2.txt
Traceback (most recent call last):
  File "c:/git-cc/gitcc", line 48, in <module>
    main()
  File "c:/git-cc/gitcc", line 14, in main
    return invoke(cmd, args)
  File "c:/git-cc/gitcc", line 38, in invoke
    cmd.main(*args)
  File "c:\git-cc\checkin.py", line 39, in main
    checkout(statuses, comment.strip(), initial)
  File "c:\git-cc\checkin.py", line 90, in checkout
    transaction.commit(comment);
  File "c:\git-cc\checkin.py", line 113, in commit
    cc_exec(['ci', '-identical', '-c', comment.encode(ENCODING), file])
  File "c:\git-cc\common.py", line 50, in cc_exec
    return popen('cleartool', cmd, CC_DIR, **args)
  File "c:\git-cc\common.py", line 56, in popen
    debug('> ' + ' '.join(map(f, cmd)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 1: ordinal not in range(128)

charleso commented Jul 24, 2012

Looks like it's failing on the debug method. You can try commenting that out for now. Again, not sure if that's actually going to help.

charleso commented Jul 24, 2012

Also, how are you setting the ENCODING variable? It should use the system setting, but you can also try setting it manually in common.py just to be sure.

Zakusov commented Jul 25, 2012

With the debug call commented out:

$ gitcc checkin
Traceback (most recent call last):
  File "c:/git-cc/gitcc", line 48, in <module>
    main()
  File "c:/git-cc/gitcc", line 14, in main
    return invoke(cmd, args)
  File "c:/git-cc/gitcc", line 38, in invoke
    cmd.main(*args)
  File "c:\git-cc\checkin.py", line 39, in main
    checkout(statuses, comment.strip(), initial)
  File "c:\git-cc\checkin.py", line 90, in checkout
    transaction.commit(comment);
  File "c:\git-cc\checkin.py", line 113, in commit
    cc_exec(['ci', '-identical', '-c', comment.encode(ENCODING), file])
  File "c:\git-cc\common.py", line 50, in cc_exec
    return popen('cleartool', cmd, CC_DIR, **args)
  File "c:\git-cc\common.py", line 57, in popen
    pipe = Popen(cmd, cwd=cwd, stdout=PIPE, stderr=PIPE, env=env)
  File "C:\Python27\lib\subprocess.py", line 679, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 855, in _execute_child
    args = list2cmdline(args)
  File "C:\Python27\lib\subprocess.py", line 615, in list2cmdline
    return ''.join(result)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

The ENCODING variable is initialized by

ENCODING = sys.stdin.encoding

How can I determine the commit message encoding from Python?

Note that when I commit to ClearCase with Cyrillic in the commit message, synchronization (ClearCase to Git) succeeds.
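
One common way to approach the question above is to try a short list of candidate encodings in order, strictest first. This is a rough sketch in Python 3 syntax; decode_with_fallback and its candidate list are hypothetical and not part of git-cc. Git itself records a non-default commit encoding through the i18n.commitEncoding setting, and git log --encoding=<encoding> asks Git to re-encode its output, so those may be a more reliable source than guessing.

```python
def decode_with_fallback(raw, candidates=("utf-8", "cp866")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: Latin-1 maps every byte, so it never fails (but may be wrong).
    return raw.decode("latin-1"), "latin-1"

# Hypothetical commit message encoded two different ways.
utf8_bytes = "Добавлена feature1".encode("utf-8")
cp866_bytes = "Добавлена feature1".encode("cp866")

print(decode_with_fallback(utf8_bytes)[1])
print(decode_with_fallback(cp866_bytes)[1])
```

UTF-8 goes first because it rejects most byte sequences that were written in another encoding, whereas a single-byte code page such as cp866 accepts almost anything.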

charleso commented Jul 25, 2012

Hi Zakusov,

Try reproducing it in a simple test Python program and do something like:

' '.join([commitMessage])

Where commitMessage is a string that contains Cyrillic. This is definitely git-cc and not ClearCase (and therefore my fault). I'm afraid I don't really know Python all that well. :(

Charles
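
A sketch of the kind of reproduction suggested above, written for Python 3. In Python 2.7, joining unicode strings with a non-ASCII byte string triggers an implicit ASCII decode and raises UnicodeDecodeError, as in the list2cmdline traceback. Python 3 has no implicit conversion, so the same mix-up shows up as a TypeError instead. The commit_message value is made up.

```python
commit_message = "Добавлена feature1"  # hypothetical Cyrillic commit message

# All-text arguments join without trouble.
joined = " ".join(["ci", "-identical", "-c", commit_message])

# Mixing text with encoded bytes is the problematic case.
mixed = ["ci", "-identical", "-c", commit_message.encode("utf-8")]
try:
    " ".join(mixed)
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Either way, the point is that every element of the argument list should be of the same string type before it is joined.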

dolanor commented Oct 25, 2012

OS: Red Hat Linux 5.3
git : git version 1.7.11-rc3
clearcase : 7.0.1.4
Python : 2.4.3
gitcc : e81f9e9
$LANG : en_US.utf-8
$LC_ALL : C

gitcc rebase
> git ls-files --modified
> git log -n 1 --pretty=format:%ai master_cc
> cleartool ls -recurse -short .
> cleartool lsh -fmt %o%m|%Nd|%u|%En|%Vn|%Nc\n -recurse .
Traceback (most recent call last):
 File "/infinity_tmpfs/ut1p40/charleso-git-cc-e81f9e9/gitcc", line 51, in ?
   main()
 File "/infinity_tmpfs/ut1p40/charleso-git-cc-e81f9e9/gitcc", line 14, in main
   return invoke(cmd, args)
 File "/infinity_tmpfs/ut1p40/charleso-git-cc-e81f9e9/gitcc", line 41, in invoke
   cmd.main(*args)
 File "/infinity_tmpfs/ut1p40/charleso-git-cc-e81f9e9/rebase.py", line 40, in main
   history = getHistory(since)
 File "/infinity_tmpfs/ut1p40/charleso-git-cc-e81f9e9/rebase.py", line 87, in getHistory
   return cc_exec(lsh)
 File "/infinity_tmpfs/ut1p40/charleso-git-cc-e81f9e9/common.py", line 50, in cc_exec
  return popen('cleartool', cmd, CC_DIR, **args)
 File "/infinity_tmpfs/ut1p40/charleso-git-cc-e81f9e9/common.py", line 70, in popen
   return stdout.decode(ENCODING)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 10374337: ordinal not in range(128)

This is what I got. The ClearCase commit messages may contain accented French characters.

$ file test.py && cat test.py
test.py: UTF-8 Unicode text

mavar = ''.join('€uro cest fun français')
print mavar

output :

$ python test.py
sys:1: DeprecationWarning: Non-ASCII character '\xe2' in file test.py on line 2, but no encoding declared; see  http://www.python.org/peps/pep-0263.html for details
€uro cest fun français

charleso commented Oct 25, 2012

Hi Dolanor,

I'm afraid I probably can't help with this one (not that I'm much help anyway). I'm not sure that test is the same issue. Did you read the link from the error message? You need to declare the encoding of the Python file.

I wrote a mini test program locally just cat'ing a file with your test string and it worked. I'm on Python 2.7.1 though.

from subprocess import Popen, PIPE
import sys

# Work out which encoding to use for subprocess output:
# stdin's encoding first, then the locale's, then a fixed fallback.
ENCODING = None
if hasattr(sys.stdin, 'encoding'):
    ENCODING = sys.stdin.encoding
if ENCODING is None:
    import locale
    locale_name, ENCODING = locale.getdefaultlocale()
if ENCODING is None:
    ENCODING = "ISO8859-1"

def popen(exe, cmd, cwd, env=None, decode=True, errors=True):
    # Build a new argument list rather than mutating the caller's.
    cmd = [exe] + cmd
    pipe = Popen(cmd, cwd=cwd, stdout=PIPE, stderr=PIPE, env=env)
    (stdout, stderr) = pipe.communicate()
    if errors and pipe.returncode > 0:
        raise Exception((stderr + stdout).decode(ENCODING))
    return stdout if not decode else stdout.decode(ENCODING)

print(popen('cat', ['french.txt'], '/home/user/blah'))

Charles
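
To illustrate the encoding declaration mentioned above: PEP 263 lets a source file state its encoding in a comment on the first or second line. Python 2 assumes ASCII without it, while Python 3 assumes UTF-8. Here is a small sketch in Python 3 that compiles Latin-1 source bytes carrying such a declaration; the in-memory source and the "<test.py>" file name are only for illustration.

```python
# Source bytes saved as Latin-1, with a PEP 263 declaration on the first line.
source = "# -*- coding: latin-1 -*-\nmavar = 'français'\n".encode("latin-1")

namespace = {}
# compile() reads the declaration and decodes the bytes accordingly.
exec(compile(source, "<test.py>", "exec"), namespace)
mavar = namespace["mavar"]
```

For a file saved as UTF-8, the equivalent first line would be `# -*- coding: utf-8 -*-`.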

dolanor commented Jan 23, 2013

I ended up running cleartool lshistory with the format myself and dumping the output to a file.
Because of the bad configuration of our Red Hat machines, we cannot type UTF-8 or ISO-8859-15 characters, even though we are a French company with keyboards that let you type characters such as é.

In the log, these characters show up as the odd diamond/square glyphs you get from a UTF-8 to ASCII misinterpretation.
In the end I fixed them by hand in the log, along with some other ^[A ^[[O escape sequences that I thought could block the decoding. Now I'm having a different issue, which I will submit as a separate GitHub issue. Thanks.
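
The manual cleanup described above could in principle be scripted. This is a sketch in Python 3, assuming the log is available as raw bytes; clean_log and the sample bytes are hypothetical, and the regular expression is a commonly used pattern for ANSI escape sequences rather than anything taken from git-cc. Decoding with errors="replace" substitutes U+FFFD (usually drawn as a diamond with a question mark) for undecodable bytes instead of raising.

```python
import re

# Matches two-character escapes (ESC A) and CSI sequences (ESC [ ... O).
ANSI_ESCAPE = re.compile(r"\x1b(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])")

def clean_log(raw, encoding="utf-8"):
    """Decode raw log bytes leniently, then drop terminal escape sequences."""
    text = raw.decode(encoding, errors="replace")
    return ANSI_ESCAPE.sub("", text)

# Hypothetical log fragment: a Latin-1 c-cedilla byte plus two escape sequences.
sample = b"fran\xe7ais\x1bA ok\x1b[O"
cleaned = clean_log(sample)
```

Lenient decoding loses the original character, so it suits getting past a crash more than preserving the comment text.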
