Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: non-latin symbols in USERPROFILE or PATH break conan #2163

Open
anatoly-spb opened this issue Dec 16, 2017 · 14 comments

Comments

Projects
None yet
5 participants
@anatoly-spb
Copy link

commented Dec 16, 2017

The following script in 866 code page does nothing, but must install cmake files to generate the project:

call git clone https://github.com/memsharded/example-poco-timer.git 
cd example-poco-timer
set CONAN_USER_HOME=
set USERNAME=Профиль
set USERPROFILE=F:\Профиль
mkdir build 
cd build
call conan install ..

Double check the output, as you see conan says and does nothing:

F:\Work\conan>call git clone https://github.com/memsharded/example-poco-timer.git
Cloning into 'example-poco-timer'...
remote: Counting objects: 40, done.
remote: Total 40 (delta 0), reused 0 (delta 0), pack-reused 40
Unpacking objects: 100% (40/40), done.

F:\Work\conan>cd example-poco-timer

F:\Work\conan\example-poco-timer>set CONAN_USER_HOME=

F:\Work\conan\example-poco-timer>set USERNAME=Профиль

F:\Work\conan\example-poco-timer>set USERPROFILE=F:\Профиль

F:\Work\conan\example-poco-timer>mkdir build

F:\Work\conan\example-poco-timer>cd build

F:\Work\conan\example-poco-timer\build>call conan install ..

F:\Work\conan\example-poco-timer\build>dir
 Том в устройстве F имеет метку WORK
 Серийный номер тома: 4E18-2D78

 Содержимое папки F:\Work\conan\example-poco-timer\build

16.12.2017  18:02    <DIR>          .
16.12.2017  18:02    <DIR>          ..
               0 файлов              0 байт
               2 папок  29 097 922 560 байт свободно

If I assign to the USERPROFILE environment variable a latin string or set CONAN_USER_HOME to latin path everything is fine.

Also I observe a strange error when PATH contains non-latine symbols:

git clone https://github.com/anatoly-spb/conan-libqrencode
cd conan-libqrencode
set PATH=F:\Профиль;%PATH%
conan create bincrafters/stable

failed with

libqrencode/4.0.0@bincrafters/stable: Package '9b1a62505a16cd151a66fc2504680042cb2cf4f2' built
libqrencode/4.0.0@bincrafters/stable: Build folder F:\Conan\build\.conan\data\libqrencode\4.0.0\bincrafters\stable\build\9b1a62505a16cd151a66fc2504680042cb2
cf4f2
libqrencode/4.0.0@bincrafters/stable: Generated conaninfo.txt
libqrencode/4.0.0@bincrafters/stable: Generated conanbuildinfo.txt
libqrencode/4.0.0@bincrafters/stable: Generating the package
libqrencode/4.0.0@bincrafters/stable: Package folder F:\Conan\build\.conan\data\libqrencode\4.0.0\bincrafters\stable\package\9b1a62505a16cd151a66fc250468004
2cb2cf4f2
libqrencode/4.0.0@bincrafters/stable: Calling package()
libqrencode/4.0.0@bincrafters/stable package(): Copied 1 '.h' files: qrencode.h
libqrencode/4.0.0@bincrafters/stable package(): Copied 1 '.lib' files: qrencode.lib
libqrencode/4.0.0@bincrafters/stable: Package '9b1a62505a16cd151a66fc2504680042cb2cf4f2' created
libqrencode/4.0.0@bincrafters/stable test package: Generator cmake created conanbuildinfo.cmake
libqrencode/4.0.0@bincrafters/stable test package: Generator txt created conanbuildinfo.txt
libqrencode/4.0.0@bincrafters/stable test package: Generated conaninfo.txt
ERROR: Unable to build it successfully
  File "conan\conans\client\tools\env.py", line 34, in environment_append
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd2 in position 1792: ordinal not in range(128)

@anatoly-spb anatoly-spb changed the title Windows: non-latin symbols in USERPROFILE breaks conan Windows: non-latin symbols in USERPROFILE or PATH breaks conan Dec 16, 2017

@anatoly-spb anatoly-spb changed the title Windows: non-latin symbols in USERPROFILE or PATH breaks conan Windows: non-latin symbols in USERPROFILE or PATH break conan Dec 16, 2017

@memsharded

This comment has been minimized.

Copy link
Contributor

commented Dec 16, 2017

Yes, that is totally possible, we have tried to workaround that in the past, but seems really challenging. Specially for OS related things, like paths. Some of the limitations comes because of keeping compatibility with python 2.7, because python 3 support for Unicode and encodings in general is much better. Are you running conan with python 2 or 3? Because I would recommend you the later. If you haven't yet, please try and tell me. Thanks!

@memsharded

This comment has been minimized.

Copy link
Contributor

commented Dec 16, 2017

I add the label "fix wanted", because it is also a bit difficult to test and reproduce, it requires installing non-latin OS languages, etc., so it is difficult to test also in CI. Any help to address these issues is very welcome.

@anatoly-spb

This comment has been minimized.

Copy link
Author

commented Dec 16, 2017

@memsharded I have tried python 3.6 wihtout success:

set PATH=c:\Python36;%PATH%
call python.exe -V
call git clone https://github.com/memsharded/example-poco-timer.git 
cd example-poco-timer
set CONAN_USER_HOME=
set USERNAME=Профиль
set USERPROFILE=F:\Профиль
mkdir build 
cd build
call conan install ..
dir 

outputs

F:\Work\conan\example-poco-timer>call python.exe -V
Python 3.6.3

F:\Work\conan\example-poco-timer>set CONAN_USER_HOME=
F:\Work\conan\example-poco-timer>set USERNAME=Профиль
F:\Work\conan\example-poco-timer>set USERPROFILE=F:\Профиль
F:\Work\conan\example-poco-timer>mkdir build
F:\Work\conan\example-poco-timer>cd build
F:\Work\conan\example-poco-timer\build>call conan install ..
F:\Work\conan\example-poco-timer\build>dir
 Том в устройстве F имеет метку WORK
 Серийный номер тома: 4E18-2D78
 Содержимое папки F:\Work\conan\example-poco-timer\build
17.12.2017  00:24    <DIR>          .
17.12.2017  00:24    <DIR>          ..
               0 файлов              0 байт
               2 папок  29 097 041 920 байт свободно

the same with PATH:

libqrencode/4.0.0@bincrafters/stable: Package '9b1a62505a16cd151a66fc2504680042cb2cf4f2' built
libqrencode/4.0.0@bincrafters/stable: Build folder F:\Conan\build\.conan\data\libqrencode\4.0.0\bincrafters\stable\build\9b1a62505a16cd151a66fc2504680042cb2cf4f2
libqrencode/4.0.0@bincrafters/stable: Generated conaninfo.txt
libqrencode/4.0.0@bincrafters/stable: Generated conanbuildinfo.txt
libqrencode/4.0.0@bincrafters/stable: Generating the package
libqrencode/4.0.0@bincrafters/stable: Package folder F:\Conan\build\.conan\data\libqrencode\4.0.0\bincrafters\stable\package\9b1a62505a16cd151a66fc2504680042cb2cf4f2
libqrencode/4.0.0@bincrafters/stable: Calling package()
libqrencode/4.0.0@bincrafters/stable package(): Copied 1 '.h' files: qrencode.h
libqrencode/4.0.0@bincrafters/stable package(): Copied 1 '.lib' files: qrencode.lib
libqrencode/4.0.0@bincrafters/stable: Package '9b1a62505a16cd151a66fc2504680042cb2cf4f2' created
libqrencode/4.0.0@bincrafters/stable test package: Generator cmake created conanbuildinfo.cmake
libqrencode/4.0.0@bincrafters/stable test package: Generator txt created conanbuildinfo.txt
libqrencode/4.0.0@bincrafters/stable test package: Generated conaninfo.txt
ERROR: Unable to build it successfully
  File "conan\conans\client\tools\env.py", line 34, in environment_append
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa6 in position 4: ordinal not in range(128)

Anyway there is a workaround with setting CONAN_USER_HOME to a latin path and remove from PATH non-latin symbols, so as for me this is not big problem.

@mpdelbuono

This comment has been minimized.

Copy link
Contributor

commented Dec 17, 2017

@memsharded I have a bunch of paths using Japanese characters so I can probably take a look at this and see if I can write up a fix, but I have a question based on the earlier comments. The goal is to make this work in both Python 2 and 3, right? There isn't something you're aware of that would prevent it from working in one or the other? Or are you only looking to support it in Python 3?

@memsharded

This comment has been minimized.

Copy link
Contributor

commented Dec 17, 2017

Yes, you are right. I would like to go Py3 only, but the user base still using 2.7 is very large, so I think we still need to support 2.7. Which is probably the main inconvenience for robust handling of this issues, most of the times I tried to address these things there were stoppers in py2.7 (I can't recall exactly what :(, but one of the things I recall is failing under OS messages, that can't be easily replicated in testing, and are also depending on the CP of the output terminal)

@mpdelbuono

This comment has been minimized.

Copy link
Contributor

commented Dec 19, 2017

So as a status update: I've started to break some ground on Python 2.7's handling of unicode paths. I'm approaching design-wise as "all paths are always unicode" and if that's true, then everything else should fall into place (Python 2.6+'s IO classes support unicode paths, as long as you're using the unicode type, and Python 3's string types are always unicode).

The problem behind this approach, however, is that Python 2 and 3 differ wildly in how they handle strings. Python 3 strings are always unicode, but Python 2 requires some special calls. The problem here is that those calls either don't exist in Python 3 or are deprecated, resulting in the need to special case. On top of this, there are some APIs that simply don't accept unicode strings in Python 2, so a different approach must be taken.

For now I'm trying to abstract this away by putting the switching logic in the tools module, making the rest of the code common, but it's a little bit unfortunate that this is the way it goes. For example, the following code is in that compatibility module:

import imp
from io import open

# Alternate implementation of imp.load_source which
# uses open() and imp.new_module() instead. This is to work around the fact that
# Python 2's imp.load_source() does not support Unicode. See http://bugs.python.org/issue9425
def load_source(name, pathname):
    try:
        return imp.load_source(name, pathname)
    except UnicodeEncodeError:
        # fallback to open()/new_module
        with open(pathname, encoding='utf8') as f:
            module = imp.new_module(name)
            module.__file__ = pathname
            exec(f.read(), module.__dict__)
            return module

Anyway... so far I've gotten conan download to work properly when the data directory is configured in conan.conf with a unicode path name. This is only the first step, but I believe if I follow the same paradigm everywhere, it should continue to work. My question is: do you think this approach is sane?

mpdelbuono added a commit to mpdelbuono/conan that referenced this issue Dec 19, 2017

Add support for a unicode conan data directory. Tested so far with `c…
…onan download` on Py2.7. A unicode compatibility module has been started to contain elements that need to differ between Py2 and Py3. (conan-io#2163)
@memsharded

This comment has been minimized.

Copy link
Contributor

commented Dec 19, 2017

I'm approaching design-wise as "all paths are always unicode" and if that's true

One of the limitations that I have found in the past with Py2.7 is that sometimes it is not under user control. You can manage a lot, but at some point you have to rely on some python library feature, that it will always fail, it won't accept unicode, or attempt to decode it to something that fails, etc.

I don't want to be pessimistic, but wanted to share my previous pains trying this. I was able to do some progress, but always got stuck somewhere :(

What you are proposing is just a workaround for one of those issues. It reads a bit like cluttering the codebase, so I am not sure about extending it everywhere.

In any case, many thanks, you are doing a fantastic job.

I am thinking, maybe the sanest thing would be to aim for better unicode support just in Python 3. Users wanting it should work in Python 3 (which is also a good reason to move some userbase to Py3, hopefully we are able to deprecate Py2 sooner), and while it will require some work, will be much easier than achieving it in Py2&3. What do you think?

@mpdelbuono

This comment has been minimized.

Copy link
Contributor

commented Dec 31, 2017

I've been poking various places and I've come to the same conclusion that you have, that it's probably better to just say "if you want Unicode support, use Py3". The sheer number of places where I would have to add an abstraction layer is probably more effort than it is worth.

That being said, Py3 doesn't work on its own, yet, but I'm working on fixing that. At least this way all strings can support unicode natively and thus most of the OS elements handle it automatically. There's just a few "gotcha" places, for which I should have a patch soon.

(I'm debating whether or not c112a21 should stay in, as it was made to support Py2. It seems unfortunate to pull out already-done work, but it's also potentially unclean.)

@memsharded

This comment has been minimized.

Copy link
Contributor

commented Jan 1, 2018

Ok, I am happy to know that it is not just I am too incompentent :) :)

Regarding that code, I am not sure either. Does it improve some case for py2? If not, and the codebase would be cleaner, then yes, it might make sense to remove it (if it won't break any users).

Thanks, great job!

mpdelbuono added a commit to mpdelbuono/conan that referenced this issue Jan 10, 2018

Add support for a unicode conan data directory. Tested so far with `c…
…onan download` on Py2.7. A unicode compatibility module has been started to contain elements that need to differ between Py2 and Py3. (conan-io#2163)

mpdelbuono added a commit to mpdelbuono/conan that referenced this issue Jan 11, 2018

Add support for a unicode conan data directory. Supports including un…
…icode characters within the conan.conf file. A unicode compatibility module has been started to contain elements that need to differ between Py2 and Py3, however,that compatibility module only gets Py2 partially supported. Py3 works well, however, CMake has trouble with generation. This does not appear to be a conan issue, but rather a cmake issue, as the command lines generated by the conan CMake helper are correct. (conan-io#2163)
@mpdelbuono

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2018

@anatoly-spb Could you please try putting chcp 65001 at the top of your batch script first? It appears that the SET commands weren't running correctly, resulting in a very broken environment variable to begin with.

Once that's done, can you please try running the conan on my fork's branch feature/unicode-path-support and see if you have trouble with it? There were some problems with configuration files that needed to be addressed, but I think they have been all resolved in that fork and I'm no longer able to reproduce your problem (assuming the chcp 65001 is added at the top of the batch script).

Note: this only works with Python 3 (tested using 3.6.4)

@ForNeVeR

This comment has been minimized.

Copy link

commented Jan 11, 2018

@mpdelbuono just a side note: @anatoly-spb mentioned that the script was saved in CP866 (it's the OEM code page for Windows in Russian locale), so I think that SET should've worked correctly in that case.

@mpdelbuono

This comment has been minimized.

Copy link
Contributor

commented Jan 12, 2018

Thanks @ForNeVeR , I totally missed that line and didn't realize that was the OEM code page. Unfortunately while I have a lot of languages installed, I still run with the en-US as my primary locale so I'm not entirely sure of what all the default settings are.

Switching into CP 866, it seems like everything is still working minus one small bug that I've found (regardless of which CP is used), so I think the fork above should (mostly) work. I'll have that one other item patched up soon and then submit a pull request. At a minimum, it's better than it was.

@ForNeVeR

This comment has been minimized.

Copy link

commented Jan 12, 2018

@mpdelbuono I think that setting chcp 866 and loading the batch file written in CP866 before running Conan should be enough to run a representative experiment. So, if it worked in your environment, then it should work for everybody. Congratulations.

@jgsogo

This comment has been minimized.

Copy link
Member

commented Sep 12, 2018

The path to deprecate python2 has been defined (https://blog.conan.io/2018/08/13/Its-Time-To-Deprecate-Python-2.html) and many efforts towards handling unicode in a compatible way for py2 and py3 have been done without a clear success (thanks to all!! 🙌), so I agree with the "if you want unicode support, use Python 3", don't you?

Said that, this issue should be retargeted to an up-to-date version of Conan running in python 3, is it still an issue, @anatoly-spb?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.