Add an option to bytecode compile during installation #2086

Merged
merged 10 commits into main from konsti/compile-option on Mar 5, 2024

Conversation

@konstin (Member) commented Feb 29, 2024

Add a --compile option to pip install and pip sync.

I chose to implement this as a separate pass over the entire venv. If we wanted to compile during installation, we'd have to make sure that writing is exclusive, to avoid concurrent processes writing broken .pyc files. Additionally, this ensures that the entire site-packages directory is bytecode-compiled, even if it contains packages that aren't from this uv invocation. The disadvantage is that we do not update RECORD and rely on this comment from PEP 491:

Uninstallers should be smart enough to remove .pyc even if it is not mentioned in RECORD.

If this is a problem we can change it to run during installation and write RECORD entries.

Internally, this is implemented as an async work-stealing subprocess worker pool. The producer is a directory traversal over site-packages, sending each .py file to a bounded async FIFO queue/channel. Each worker has a long-running python process: it pops the queue to get a single path (or exits if the channel is closed), sends it to stdin, waits until it's informed via a line on stdout that the compilation is done, and repeats. This is fast; e.g., after installing jupyter plotly on Python 3.12, it processes 15,876 files in 319 ms with 32 threads (vs. 3.8 s with a single core). The python processes internally call compileall.compile_file, the same as pip.
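
For illustration, a minimal sketch of that shape (not the actual uv code; the crate choices and the helper script name are assumptions):

use std::path::{Path, PathBuf};
use std::process::Stdio;

use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::process::Command;

async fn worker(receiver: async_channel::Receiver<PathBuf>) -> anyhow::Result<()> {
    // One long-running interpreter per worker; the (hypothetical) helper script
    // reads one path per line from stdin, calls compileall.compile_file, and
    // prints a line when it's done.
    let mut child = Command::new("python")
        .arg("compile_helper.py")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    let mut stdin = child.stdin.take().unwrap();
    let mut stdout = BufReader::new(child.stdout.take().unwrap()).lines();

    // Pop a path, send it, wait for the completion line, repeat; exit once the
    // channel is closed and drained.
    while let Ok(path) = receiver.recv().await {
        stdin
            .write_all(format!("{}\n", path.display()).as_bytes())
            .await?;
        stdout.next_line().await?;
    }
    Ok(())
}

async fn compile_site_packages(site_packages: &Path, workers: usize) -> anyhow::Result<()> {
    // Bounded FIFO channel between the directory traversal and the workers.
    let (sender, receiver) = async_channel::bounded::<PathBuf>(workers * 10);
    let handles: Vec<_> = (0..workers)
        .map(|_| tokio::spawn(worker(receiver.clone())))
        .collect();

    // Producer: walk site-packages and enqueue every .py file.
    for entry in walkdir::WalkDir::new(site_packages) {
        let entry = entry?;
        if entry.path().extension().is_some_and(|ext| ext == "py")
            && sender.send(entry.path().to_path_buf()).await.is_err()
        {
            break; // all workers exited early
        }
    }
    sender.close(); // workers exit after draining the queue
    for handle in handles {
        handle.await??;
    }
    Ok(())
}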

Like pip, we ignore and silence all compilation errors (#1559). There is a 10s timeout to handle the case where the workers get stuck. For the reviewers: please check whether I missed any spots where we could deadlock; this is the hardest part of this PR.

I've added uv-dev compile <dir> and uv-dev clear-compile <dir> commands, mainly for my own benchmarking. I don't want to expose them in uv; they're almost certainly not the correct workflow and we don't want to support them.

Fixes #1788
Closes #1559
Closes #1928

@konstin added the enhancement (New feature or request) and compatibility (Compatibility with a specification or another tool) labels on Feb 29, 2024
@notatallshaw (Contributor) commented Feb 29, 2024

The disadvantage is that we do not update RECORD and rely on this comment from PEP 491:

Uninstallers should be smart enough to remove .pyc even if it is not mentioned in RECORD.

If this is a problem we can change it to run during installation and write RECORD entries.

I would validate that pip successfully uninstalls the pyc files, because unless I am misunderstanding this issue it probably doesn't: pypa/pip#11835

@konstin (Member Author) commented Mar 1, 2024

I ran

cargo run venv --seed; cargo run pip install --compile tqdm
.\.venv\Scripts\pip uninstall -y tqdm

and the tqdm folder was fully removed.

@BurntSushi (Member) left a comment

Nice, this is awesome! I gave some suggestions on the concurrency structure as I think it should be possible to simplify things.

Resolved review threads: crates/uv/src/commands/pip_sync.rs, crates/uv/tests/pip_sync.rs, crates/uv/src/main.rs, crates/uv-installer/src/compile.rs

let workers =
std::thread::available_parallelism().map_err(CompileError::AvailableParallelism)?;
// 10 is an arbitrary number.
let (sender, receiver) = async_channel::bounded::<PathBuf>(workers.get() * 10);
Member

I wouldn't even bother with * 10.

You're probably fine with a rendezvous channel (a capacity of 0). Rendezvous channels are especially nice because they manifest logic errors in the structure of your concurrency very quickly. (Although I do think there are occasions where using a capacity of 1 can lead to simplifications that you couldn't otherwise do with a capacity of 0.)

It's like how most times, we just use vec![] because you usually don't benefit from starting with some non-zero initial capacity.

Member Author

A larger queue is significantly faster. I tried size 1, the number of workers, the number of workers times 10, and size 1000.

hyperfine --warmup 1 --prepare "target/profiling/uv-dev clear-compile .venv" "target/profiling/uv-dev1 compile .venv" "target/profiling/uv-devworkers compile .venv" "target/profiling/uv-devworkers10 compile .venv" "target/profiling/uv-dev1000 compile .venv"
Benchmark 1: target/profiling/uv-dev1 compile .venv
  Time (mean ± σ):     359.8 ms ±  11.5 ms    [User: 6757.8 ms, System: 1849.4 ms]
  Range (min … max):   346.8 ms … 380.1 ms    10 runs
 
Benchmark 2: target/profiling/uv-devworkers compile .venv
  Time (mean ± σ):     352.6 ms ±   8.1 ms    [User: 6856.0 ms, System: 1709.1 ms]
  Range (min … max):   337.5 ms … 368.1 ms    10 runs
 
Benchmark 3: target/profiling/uv-devworkers10 compile .venv
  Time (mean ± σ):     331.4 ms ±   5.9 ms    [User: 6723.6 ms, System: 1761.8 ms]
  Range (min … max):   323.9 ms … 340.1 ms    10 runs
 
Benchmark 4: target/profiling/uv-dev1000 compile .venv
  Time (mean ± σ):     331.0 ms ±   6.0 ms    [User: 6735.2 ms, System: 1738.5 ms]
  Range (min … max):   323.7 ms … 339.6 ms    10 runs
 
Summary
  target/profiling/uv-dev1000 compile .venv ran
    1.00 ± 0.03 times faster than target/profiling/uv-devworkers10 compile .venv
    1.07 ± 0.03 times faster than target/profiling/uv-devworkers compile .venv
    1.09 ± 0.04 times faster than target/profiling/uv-dev1 compile .venv

Member

That is quite interesting. I stand corrected. It might be good to add a comment about it here (no need to include the full benchmark results though).

To me, this implies that the directory traversal isn't feeding file paths into the workers fast enough, right? And I think that would be exacerbated with a high core count. It means there are likely idle workers in the low-capacity case, whereas those workers get work more quickly in the high-capacity case. Very interesting.
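
For illustration, the kind of comment being suggested might read something like this (a sketch only, not the comment that actually landed; the figures come from the benchmark above):

// A generous buffer (workers * 10) keeps workers from going idle while the
// directory traversal catches up: in benchmarks this was ~9% faster than a
// capacity of 1, and a capacity of 1000 brought no further gain.
let (sender, receiver) = async_channel::bounded::<PathBuf>(workers.get() * 10);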

.arg(&pip_compileall_py)
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::inherit())
Member

I think this means that anything written to stderr by the interpreter will get fed through to uv's stderr, right? If so, was that intentional? I don't think it's what I would expect; the messages could be quite confusing to users.

What I'd suggest is reading stderr in a separate Tokio task. That's, I think, the standard solution for getting around the "so many things went wrong that the stderr pipe filled up, and now the subprocess is blocked on writing to stderr and can't make progress" failure mode.
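
A minimal sketch of that pattern, assuming a tokio::process::Child spawned with .stderr(Stdio::piped()); this is illustrative, not the code that landed:

use tokio::io::AsyncReadExt;
use tokio::process::Child;
use tokio::task::JoinHandle;

/// Drain the child's stderr in its own task so the subprocess can never block
/// on a full stderr pipe; the handle yields whatever was written once the pipe
/// closes.
fn drain_stderr(child: &mut Child) -> JoinHandle<std::io::Result<String>> {
    let mut stderr = child.stderr.take().expect("stderr must be piped");
    tokio::spawn(async move {
        let mut buffer = String::new();
        stderr.read_to_string(&mut buffer).await?;
        Ok(buffer)
    })
}

The worker can await the handle once its compile loop finishes and decide whether non-empty output should be logged as a warning or turned into an error.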

Member Author

Anything written to stderr is a failure mode. Is there a good way to say: when anything has been written to stderr at this checkpoint, exit the worker with stderr as the error message?

Member

I would do that in a task dedicated to reading stderr. If that task reads anything, then you should be able to devise a mechanism to communicate that back to the main worker thread, have it kill the sub-process and return whatever was on stderr.

I'd only do this though if you're sure that anything written to stderr should cause the entire enterprise to fail. e.g., What if stderr just contains a warning or something? (But maybe that can't happen here. IDK.)

Member Author

I've added stderr handling that will

  • do nothing if there was no stderr output,
  • debug-print it if there was stderr output but the exit status was success,
  • show the stderr output in the error chain if there was stderr output and an error.

Member

Nice, LGTM!

Resolved review threads on crates/uv-installer/src/compile.rs

}
}
Ok(())
}
Member

So looking at this code holistically, I feel like we might be able to simplify the timeout/cancellation logic here.

The way I would go about this is to think about the workers themselves controlling how long they're willing to wait for the subprocess to complete its task. You pick a timeout at that granularity. Then, right before you start communicating with the subprocess, spawn a task that will kill the subprocess if it hasn't completed after N seconds. You should be able to implement this with tokio::select!.

If you do that, then I think you should be able to get rid of the timeouts almost everywhere else. It absolves you of needing to worry about writing to stdin taking too long, because the worker will kill the subprocess if it's truly blocked. When the subprocess gets killed, your code that is blocked on stdin should become unblocked immediately and everything should gracefully terminate.
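
A sketch of that shape, here using tokio::time::timeout instead of an explicit tokio::select! (equivalent for this purpose); the handle types match the earlier sketch and the 60-second value is illustrative:

use std::time::Duration;
use tokio::io::{AsyncWriteExt, BufReader, Lines};
use tokio::process::{Child, ChildStdin, ChildStdout};

/// One send/receive round-trip with the interpreter, bounded by a timeout.
/// If the interpreter doesn't answer in time, kill it so any I/O blocked on
/// its pipes unblocks and the worker can terminate gracefully.
async fn compile_one(
    path: &std::path::Path,
    stdin: &mut ChildStdin,
    stdout: &mut Lines<BufReader<ChildStdout>>,
    child: &mut Child,
) -> anyhow::Result<()> {
    let round_trip = async {
        stdin
            .write_all(format!("{}\n", path.display()).as_bytes())
            .await?;
        stdout.next_line().await?;
        Ok::<_, std::io::Error>(())
    };
    match tokio::time::timeout(Duration::from_secs(60), round_trip).await {
        Ok(result) => Ok(result?),
        Err(_) => {
            // The interpreter is stuck; kill it so blocked I/O unblocks.
            child.kill().await?;
            Err(anyhow::anyhow!(
                "timed out compiling {} to bytecode",
                path.display()
            ))
        }
    }
}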

Member Author

I've simplified the timeouts in a different way: only the caller has a timeout for workers (or the channel) getting stuck, one when sending (due to backpressure) and one when waiting for the workers to join. This way it doesn't matter whether the python process is broken, there's a bug in the worker implementation or the tokio thread pool, or the channel broke due to a panic; each case trips that timeout.

Placing it at the send might be an unusual position, but I think it's the best place to catch stuck workers.
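
A sketch of the send-side timeout described above (the 10 s value matches the PR description; the function name and error messages are illustrative):

use std::path::PathBuf;
use std::time::Duration;

/// Enqueue a path for the workers, but give up if backpressure stalls the
/// producer for too long; that only happens when the workers (or the channel)
/// are stuck.
async fn send_with_timeout(
    sender: &async_channel::Sender<PathBuf>,
    path: PathBuf,
) -> anyhow::Result<()> {
    tokio::time::timeout(Duration::from_secs(10), sender.send(path))
        .await
        .map_err(|_| anyhow::anyhow!("workers appear to be stuck"))? // timeout elapsed
        .map_err(|_| anyhow::anyhow!("channel closed unexpectedly"))?; // all workers exited
    Ok(())
}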

Member Author

Note that either case is almost certainly a bug: either a real bug in uv, or a quirk of the python interpreter that we need to special-case.

@charliermarsh (Member) left a comment

I'm cool with the overall approach, but I defer to Andrew to review and approve the pipeline in compile.rs.

@charliermarsh (Member)

On second thought, I’m wondering if this should just be a stand-alone command. Since it runs over the whole env, there’s no reason it needs to be coupled to install / sync.

@konstin (Member Author) commented Mar 4, 2024

We can expose it as uv compileall (we can copy the dev command over); my only counterargument is not extending our external interface further. I'd like to keep the --compile flag either way, to keep the install operation a single command.

FWIW there's also python -m compileall; it does the same thing, just slower:

$ hyperfine --prepare "uv venv && uv pip install jupyter plotly" "python -m compileall .venv/lib/python3.12/site-packages/"
Benchmark 1: python -m compileall .venv/lib/python3.12/site-packages/
  Time (mean ± σ):      5.407 s ±  1.026 s    [User: 4.354 s, System: 0.427 s]
  Range (min … max):    4.668 s …  7.699 s    10 runs

Comment on lines +162 to +163
// Otherwise stdout is buffered and we'll wait forever for a response
.env("PYTHONUNBUFFERED", "1")
Member

You could also just flush on the Python end when you call print?

Member Author

Yes, that would also work

@BurntSushi (Member) left a comment

I'm fine with this landing as-is, but I'm still overall skeptical of how the channel timeouts are being used here.

My understanding is that if compilation takes longer than the channel timeout overall (~10s), then the channel timeout could trip and thus cause the entire enterprise to fail even if it would have otherwise succeeded. Whether or not that's a real problem in practice is hard to say, but it feels non-ideal to be placing bounds on things that we don't really control. That is, the channel timeout is conflating the possibility of bugs (in both our code and perhaps in how the Python interpreter behaves) with the amount of time it takes for the user's resources to bytecode-compile a set of Python files.

I do think the better approach here is to deal with problems with byte-code compiling at the level of the subprocess. And specifically, only use timeouts when something you know could legitimately hang (like, say, killing a subprocess).

With that said, without user reports, it's hard to know whether the timeouts will be an issue in practice.

@konstin (Member Author) commented Mar 4, 2024

I do think the better approach here is to deal with problems with byte-code compiling at the level of the subprocess. And specifically, only use timeouts when something you know could legitimately hang (like, say, killing a subprocess).

The problem I'm mainly thinking about is when the python process is in some regard broken, e.g. imagine python is replaced by a bash script that does > /dev/null (this isn't the case here, but I've seen pythons that were not pythons or behaved strangely for some other reason). I'm less worried about OS operations such as killing a subprocess.

The second problem I'm thinking about is e.g. a panic in one of the threads that makes tokio behave oddly, or a concurrency bug on our end, even though I don't have good insight into the failure modes here.

I feel this is a trade-off we have to make: we can put no timeout and expect the user to kill the process (or the CI to time out), vs. we specify an unreasonably high limit after which we abort (similar to the timeout we put on HTTP requests). If we want to put this in the control of the user, I would remove the timeouts.

@BurntSushi (Member)

The problem I'm mainly thinking about is when the python process is in some regard broken, e.g. imagine python is replaced by a bash script that does > /dev/null (this isn't the case here, but I've seen pythons that were not pythons or behaved strangely for some other reason). I'm less worried about OS operations such as killing a subprocess.

I think for cases like this, I'd be happier with a large timeout at a more granular level. e.g., a ~60 second timeout within the worker. So after 60 seconds of no output, the worker would kill the subprocess and report an error to the user. The 60 second wait isn't great, but the user will at least eventually get some kind of error message instead of a truly indefinite hang.

If this were a common failure mode then this wouldn't be great. But if a Python process is as badly broken as > /dev/null, I think I'm more-or-less okay with having to wait longer before saying something to the user.

And since the timeout is applied more granularly (at the level of a single subprocess), you could feasibly decrease it.

The second problem I'm thinking about is e.g. a panic in one of the threads that makes tokio behave oddly, or a concurrency bug on our end, even though I don't have good insight into the failure modes here.

I don't have as much experience with panics in Tokio, but I think it will catch them and the task will quit and you'll get a JoinError. But I defer to you on this one. You probably have more practical hands-on experience than I do here.

I feel this is a trade-off we have to make: we can put no timeout and expect the user to kill the process (or the CI to time out), vs. we specify an unreasonably high limit after which we abort (similar to the timeout we put on HTTP requests). If we want to put this in the control of the user, I would remove the timeouts.

Yeah to be clear, I think it's fine to have timeouts somewhere. I'm mostly more uneasy with having them at a higher level. But it could all be fine. I just don't know.

@charliermarsh (Member)

This LGTM, though I'm also curious if we can move the timeout down a level?

@charliermarsh (Member)

I moved the temporary directory into the cache just to be safe. I'll merge this to keep it moving.

@charliermarsh enabled auto-merge (squash) March 5, 2024 03:32
@charliermarsh merged commit 2a53e78 into main Mar 5, 2024 · 7 checks passed
@charliermarsh deleted the konsti/compile-option branch March 5, 2024 03:35
@konstin (Member Author) commented Mar 5, 2024

I moved the temporary directory into the cache just to be safe.

What's the problem we're avoiding this way?

konstin added a commit that referenced this pull request Mar 5, 2024
Follow-up to #2086: Don't use timeouts for the entire workers, but only for the section that's about communicating with the (potentially broken) `python` subprocess. I've also raised the timeout to 60s.