[Flake] error: 'file:///tmp/测试' is not a valid URL #5759

Open
Vonfry opened this issue Dec 10, 2021 · 17 comments

@Vonfry
Member

Vonfry commented Dec 10, 2021

Describe the bug
If a flake is in a directory whose name contains non-ASCII characters, running nix flake lock fails with an "is not a valid URL" error.

Steps To Reproduce

cd /tmp
mkdir 测试
cd 测试
git init
echo '{ inputs.nixpkgs.url = "nixpkgs";  outputs = { ... }: {  }; }' > flake.nix
nix flake lock

or

cd /tmp
mkdir 测试
echo '{ inputs.nixpkgs.url = "nixpkgs";  outputs = { ... }: {  }; }' > 测试/flake.nix
nix flake lock ./测试 # by contrast, nix flake lock ./test works

Expected behavior
nix flake lock should succeed.

nix-env --version output
nix-env (Nix) 2.4 (nixpkgs: c6fc79a2ac611b58ff93574cf787260ed2101d06)

Additional context
The problem is probably caused by flakeUrl and parseFlakeRefWithFragment.

cc: @NickCao

Vonfry added the bug label Dec 10, 2021
@thufschmitt
Member

The error (/tmp/测试 is not a valid URL) is actually thrown in parseURL. I tried changing the corresponding regex to replace the a-zA-Z0-9 character ranges by [:alnum:] thinking that it would probably be utf-8 aware (as long as the locale is), but it didn’t seem to change anything.

@kidonng

kidonng commented Dec 10, 2021

I tried changing the corresponding regex to replace the a-zA-Z0-9 character ranges by [:alnum:] thinking that it would probably be utf-8 aware (as long as the locale is), but it didn’t seem to change anything.

I did a rudimentary test, and it seems to work:

Run the following code online: https://coliru.stacked-crooked.com/a/7966647daeafabab

Input:

#include <iostream>
#include <locale>

int main() {
    const wchar_t c = L'\u86e4'; // Ideograph clam CJK
 
    std::locale loc1("C");
    std::cout << "isalnum('蛤', C locale) returned "
              << std::boolalpha << std::isalnum(c, loc1) << '\n';
 
    std::locale loc2("en_US.UTF-8");
    std::cout << "isalnum('蛤', Unicode locale) returned "
              << std::boolalpha << std::isalnum(c, loc2) << '\n';
}

Outputs:

g++ -std=c++20 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
isalnum('蛤', C locale) returned false
isalnum('蛤', Unicode locale) returned true

@thufschmitt
Member

@kidonng Ah, indeed: we need to use a wstring (and a wregex).
Do you know whether there’s a nice (and locale-aware) way to convert a string to a wstring? It looks like wstring_convert used to be the way to go, but since it’s deprecated, I assume there’s a better way of doing that.

@edolstra
Member

Please don't use wstring, we're not on Windows. The tacit assumption in Nix is that we use UTF-8-encoded strings everywhere.

@thufschmitt
Member

The tacit assumption in Nix is that we use UTF-8-encoded strings everywhere.

Yup, but the regex engine is apparently not aware of that. Maybe I’m just not doing things properly (I must confess that I have more or less no idea what I’m doing), but I can’t get a regex that matches any UTF-8 character on plain strings. Any idea how to do that?
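
For illustration only, one way to sidestep a regex engine that sees bytes rather than characters is to classify the bytes by hand. The sketch below assumes that any byte with the high bit set belongs to a multi-byte UTF-8 sequence and should be accepted. `isAllowedPathByte` and `isAllowedPath` are hypothetical names, not functions from Nix, and the set of accepted ASCII punctuation is an arbitrary choice for the example. It does not check that the input is well-formed UTF-8.

```cpp
#include <string>

// Hypothetical helper (not Nix code): may byte `b` appear unescaped in a path
// under this sketch's rules?
inline bool isAllowedPathByte(unsigned char b) {
    if (b >= 0x80) return true;              // lead or continuation byte of UTF-8
    if (b >= 'a' && b <= 'z') return true;
    if (b >= 'A' && b <= 'Z') return true;
    if (b >= '0' && b <= '9') return true;
    // Assumed punctuation set for the example; a real parser may differ.
    static const std::string punctuation = "/-._~";
    return punctuation.find(static_cast<char>(b)) != std::string::npos;
}

// Hypothetical helper: true if every byte of `path` passes isAllowedPathByte.
inline bool isAllowedPath(const std::string & path) {
    for (char c : path) {
        // Convert to unsigned char first so bytes >= 0x80 are not negative.
        if (!isAllowedPathByte(static_cast<unsigned char>(c))) return false;
    }
    return true;
}
```

With these rules, `isAllowedPath("/tmp/测试")` is true when the literal is UTF-8 encoded, while `isAllowedPath("/tmp/A Directory")` is false because the space is not in the accepted set.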

@stale

stale bot commented Jun 20, 2022

I marked this as stale due to inactivity. → More info

RasmusRendal added a commit to RasmusRendal/nix that referenced this issue Jul 12, 2022
Also change the flakeref parser to just check if the directory exists.
While using `c & 0xFF` doesn't look pretty, it's the only thing that
works. I suspect it has something to do with the intersection between
the previous conversion to unsigned, and the templating used in boost.

This closes NixOS#5759 and NixOS#4563.

TODO: Just deleting the regex like I did does not work. You still need
to be able to parse paths that aren't there, if they aren't used lazily.
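
As a hedged aside on the `c & 0xFF` mentioned in the commit message: the commit author only suspects the cause, and the sketch below does not confirm it. It shows the general C++ behaviour that makes such a mask useful when turning bytes of a UTF-8 string into hex digits. `byteValue` and `hexByte` are hypothetical names for illustration.

```cpp
#include <string>

// Hypothetical helper: numeric value of one byte, always in the range 0..255.
inline int byteValue(char c) {
    // `c` is promoted to int before the AND. Where plain `char` is signed,
    // a byte such as 0xE6 promotes to a negative int; the mask keeps only
    // the low 8 bits, so the result is the same on every platform.
    return c & 0xFF;
}

// Hypothetical helper: two uppercase hex digits for one byte.
inline std::string hexByte(char c) {
    static const char digits[] = "0123456789ABCDEF";
    const int v = byteValue(c);
    return std::string{digits[v >> 4], digits[v & 0x0F]};
}
```

For the first byte of 测 (0xE6), `hexByte` yields `E6`. Without the mask, indexing the digit table with a negative value would be out of bounds on platforms where `char` is signed.
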
@balsoft
Member

balsoft commented Dec 7, 2022

Still a problem

@balsoft
Member

balsoft commented Dec 7, 2022

What's the reason to have this regex in the first place?

@tdeebswihart

tdeebswihart commented Feb 28, 2023

I just had this happen with a space in my filesystem path:

$ nix --version
nix (Nix) 2.11.1
$ mkdir '/tmp/A Directory'
$ cd '/tmp/A Directory' 
$ echo '{ inputs.nixpkgs.url = "nixpkgs";  outputs = { ... }: {  }; }' > flake.nix
$ git init
Initialized empty Git repository in /private/tmp/A Directory/.git/
$ nix flake lock
error: 'file:///private/tmp/A Directory' is not a valid URL

@thufschmitt
Member

What's the reason to have this regex in the first place?

It's used to parse the input since it's not just a path, but an arbitrary flake input.

@chronoslynx I think #6614 would fix that, but it's abandoned. Do you feel like you could take it over (essentially rebase it since it's mostly good apart from the merge conflicts)?

@thufschmitt
Member

Regarding the original issue,

  • cd 测试 && nix flake lock now seems to work,
  • The ugly nix flake lock file:$PWD/%E6%B5%8B%E8%AF%95 should probably be easy to get working, but it fails at the moment, probably because percent-decoding isn't done everywhere it should be,
  • The also-but-marginally-less-ugly nix flake lock ./%E6%B5%8B%E8%AF%95 fails, probably both because of the above and because the path regex doesn't seem to take percent-encoding into account properly,
  • The logical nix flake lock ./测试 also fails, and I don't see how we could solve that without having a unicode-aware parsing method
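
To make the percent-decoding step in the list above concrete, here is a minimal sketch of what decoding `%E6%B5%8B%E8%AF%95` back into raw UTF-8 bytes could look like. `hexValue` and `percentDecode` are hypothetical names, not Nix's own functions, and throwing `std::invalid_argument` on malformed input is an assumption made for the example.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if `c` is not a hex digit.
inline int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces each "%XX" in `in` with the byte it encodes.
inline std::string percentDecode(const std::string & in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '%') {
            out += in[i];
            continue;
        }
        // A '%' must be followed by exactly two hex digits.
        if (i + 2 >= in.size())
            throw std::invalid_argument("truncated percent escape");
        const int hi = hexValue(in[i + 1]);
        const int lo = hexValue(in[i + 2]);
        if (hi < 0 || lo < 0)
            throw std::invalid_argument("invalid percent escape");
        out += static_cast<char>(hi * 16 + lo);
        i += 2;  // skip the two digits just consumed
    }
    return out;
}
```

Under these assumptions, `percentDecode("./%E6%B5%8B%E8%AF%95")` gives the bytes of `./测试`, and `percentDecode("A%20Directory")` gives `A Directory`.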

@tdeebswihart

I'll take a swing at it. I'm ill this weekend, so it won't be quick, unfortunately.

@tdeebswihart

tdeebswihart commented Mar 4, 2023

$ ./bootstrap.sh
$ ./configure $configureFlags --prefix=$(pwd)/outputs/out
$ make -j $NIX_BUILD_CORES
  GEN    Makefile.config
make: *** No rule to make target '/Users/timods/Documents/Projects/nix/nix/outputs/out/bin/nix', needed by 'doc/manual/nix.json'.  Stop.

That's a promising start after I rebase. I'll take a deeper look tomorrow

@tdeebswihart

tdeebswihart commented Mar 4, 2023

So the following actually works on a release build of nix. We don't seem to bother with parsing the file path as a URL unless the directory is a git repo:

$ mkdir 测试 && cd 测试
$ nix flake lock           # OK
$ rm flake.lock
$ git init 
$ nix flake lock
error: getting status of '/nix/store/0ccnxa25whszw7mgbgyzdm4nqc0zwnm8-source/flake.nix': No such file or directory

That PR also does not fix the problem when spaces are present in a URL. I added some prints to the top of parseURL and it looks like the path with a space isn't being encoded prior to the call to parseURL:

$ cd 测试
$ ~/dev/nix/result/bin/nix flake lock
parseURL(daemon)
parseURL(file:///private/tmp/%E6%B5%8B%E8%AF%95)
parseURL(file:///private/tmp/%E6%B5%8B%E8%AF%95)
error: getting status of '/nix/store/0ccnxa25whszw7mgbgyzdm4nqc0zwnm8-source/flake.nix': No such file or directory

$ cd ../A\ Directory
$ ~/dev/nix/result/bin/nix flake lock
parseURL(file:///private/tmp/A Directory)
error: 'file:///private/tmp/A Directory' is not a valid URL

On your notes above: the path regex actually looks fine. I dumped the full URL regex and put it into https://regex101.com to test it out, and it handles percent-encoding without issue. It looks more like we aren't always percent-encoding the string in the first place.
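
As a rough sketch of the missing step, percent-encoding a filesystem path before it is handed to a URL parser could look like the following. `percentEncodePath` is a hypothetical name, not the function Nix uses. The choice to leave only the RFC 3986 unreserved characters and `/` unescaped is an assumption for the example.

```cpp
#include <string>

// Hypothetical helper (not Nix code): percent-encodes every byte of `path`
// except RFC 3986 unreserved characters and the '/' separator.
inline std::string percentEncodePath(const std::string & path) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(path.size());
    for (char ch : path) {
        // Work on the unsigned value so bytes >= 0x80 index the table safely.
        const unsigned char b = static_cast<unsigned char>(ch);
        const bool unreserved =
            (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
            (b >= '0' && b <= '9') ||
            b == '-' || b == '.' || b == '_' || b == '~';
        if (unreserved || b == '/') {
            out += ch;
        } else {
            out += '%';
            out += digits[b >> 4];
            out += digits[b & 0x0F];
        }
    }
    return out;
}
```

With this sketch, `/private/tmp/A Directory` becomes `/private/tmp/A%20Directory`, and `/private/tmp/测试` (as UTF-8) becomes `/private/tmp/%E6%B5%8B%E8%AF%95`, matching the form seen in the `parseURL` trace above.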

thufschmitt pushed a commit to RasmusRendal/nix that referenced this issue Apr 20, 2023
To support using nix flakes in paths with spaces, this introduces the convention
that the path part of the URL should be percent-encoded when dealing with paths.

It follows the convention of Firefox, which uses percent encoding when encoding
a local path to a URL.

Eventually, this might also allow paths with arbitrary unicode characters, if
the percent encoding and decoding methods are improved such that they can handle
them.

This closes NixOS#6394.

Make url encoding work for arbitrary unicode characters

Also change the flakeref parser to just check if the directory exists.
While using `c & 0xFF` doesn't look pretty, it's the only thing that
works. I suspect it has something to do with the intersection between
the previous conversion to unsigned, and the templating used in boost.

This closes NixOS#5759 and NixOS#4563.

TODO: Just deleting the regex like I did does not work. You still need
to be able to parse paths that aren't there, if they aren't used lazily.
@stereomato

I just hit this too, also with a space in the path, ha

@thufschmitt
Member

@pearsche I think #6614 (merged, but not yet released) should fix this.
Can you try with a recent-enough Nix version (nix shell nix/9a78d87bc0576b87f33d6ee591d7480f0206f300 should give you that for instance)?

@mi-skam

mi-skam commented Jul 31, 2024

I have the same problem with a path that includes an @ sign: /home/user/@/config#hostname
