Add support for YML files in conda create --file (WIP) #14113

Draft: jaimergp wants to merge 40 commits into main from jaimergp:conda-create-yml

jaimergp
Contributor

@jaimergp jaimergp commented Aug 1, 2024

Description

Looking into parts of #11633 (conda env create -> conda create).

This is the minimum effort required to have YML support in conda create. It has a few problems UX-wise, though:

  • There's no yes/no prompt
  • The reported information is different (terser)
  • We can't get the --name from the environment.yml file
  • It doesn't support the pseudo-plugin system

This is just here to see what breaks and inform of the challenges in conda.cli.install to make a case for a bigger refactor.

What I would like to have instead

I envision a new Environment class that CLIs fill in with details such as packages, channels or solver settings, and that then delegates to the appropriate install backend (solver, explicit...). This starts to build up a homogenized Environment schema that could look like a potential conda.toml or environment.yml v2. That's a big change, so I'll draft a roadmap to get there if there is interest. My idea is to have a single file that represents the input state of the environment; operating on the environment would then mean editing that file (on disk or virtually) and applying the changes to disk. Again, this borrows a few concepts from the Pixi model.

Edit: I went ahead and implemented parts of the details dropdown. See this comment: #14113 (comment)

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@jaimergp jaimergp requested a review from a team as a code owner August 1, 2024 15:38
@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Aug 1, 2024
@jaimergp jaimergp marked this pull request as draft August 1, 2024 15:39

codspeed-hq bot commented Aug 1, 2024

CodSpeed Performance Report

Merging #14113 will degrade performances by 14.78%

Comparing jaimergp:conda-create-yml (eee2ea7) with main (7c4941c)

Summary

❌ 3 regressions
✅ 18 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark | main | jaimergp:conda-create-yml | Change
test_install[classic] | 220.5 ms | 258.7 ms | -14.78%
test_update[classic-update] | 275.1 ms | 309.5 ms | -11.12%
test_update[classic-upgrade] | 270.7 ms | 309 ms | -12.41%

@jaimergp
Contributor Author

jaimergp commented Aug 6, 2024

I expanded a little bit on that details tag above.


conda environment modification UX

There are four main commands:

  • conda create: Creates a new environment (identified by name or full path). Accepts a list of MatchSpec strings and/or a list of TXT files containing MatchSpecs. Paths and URLs are also accepted.
  • conda install: Same as create, but it expects an existing environment.
  • conda remove: Removes package(s) from the environment, and their dependents. Can also remove the full environment if --all is passed.
  • conda update: Installs the latest version of a package already installed, as long as it's solvable.

All of these can be tackled as the same type of input if expressed as a single input file. Let's assume that file is environment.yml for practical purposes.

  • conda create -n env -c conda-forge python=3.10:
name: env
channels:
- conda-forge
dependencies:
- python=3.10
  • conda install -n env numpy just means adding numpy to the implicit file:
name: env
channels:
- conda-forge
dependencies:
- python=3.10
- numpy=*
  • conda update -n env python means unpinning python, solving, and then repinning to the resolved version. Note that this command only accepts package names.
name: env
channels:
- conda-forge
dependencies:
- python=*  # would become python=3.12 
- numpy=*
  • conda remove numpy is as obvious as it sounds; remove its entry and resolve:
name: env
channels:
- conda-forge
dependencies:
- python=*  # would become python=3.12

However, conda remove can also remove dependencies that are not explicitly in the input file but are transitive dependencies belonging to one of the variants. For example, take mkl as the BLAS for numpy: we can conda remove mkl and obtain openblas instead, while still keeping numpy. We would need an extra input field to prevent a package from being installed, similar to run_constrained in meta.yaml. This can operate as a series of pins too!

name: env
channels:
- conda-forge
dependencies:
- python=*  # would become python=3.12
constraints:
- mkl <0 # removes or prevents installation of mkl
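The mapping from commands to edits of the implicit file can be sketched in a few lines of Python. This is only an illustration of the idea above, not conda code: the function names (`install`, `update`, `remove`) and the plain-dict representation are hypothetical, and spec parsing is reduced to simple `name`, `name=version` or `name <version` strings.

```python
from copy import deepcopy


def _name(spec: str) -> str:
    """Return the package name of a simple spec such as 'numpy=*' or 'mkl <0'."""
    for sep in ("=", "<", ">", "!", " "):
        spec = spec.split(sep, 1)[0]
    return spec.strip()


def install(env: dict, *specs: str) -> dict:
    """conda install: add specs, replacing entries that share a package name."""
    new = deepcopy(env)
    names = {_name(s) for s in specs}
    new["dependencies"] = [d for d in new["dependencies"] if _name(d) not in names]
    new["dependencies"].extend(specs)
    return new


def update(env: dict, *names: str) -> dict:
    """conda update: unpin the named packages; the solver would repin them later."""
    new = deepcopy(env)
    new["dependencies"] = [
        f"{_name(d)}=*" if _name(d) in names else d for d in new["dependencies"]
    ]
    return new


def remove(env: dict, *names: str) -> dict:
    """conda remove: drop explicit entries; forbid anything else via a constraint."""
    new = deepcopy(env)
    explicit = {_name(d) for d in new["dependencies"]}
    new["dependencies"] = [d for d in new["dependencies"] if _name(d) not in names]
    for name in names:
        if name not in explicit:
            # Not requested directly, so it can only be excluded with a constraint.
            new.setdefault("constraints", []).append(f"{name} <0")
    return new
```

Starting from `{"name": "env", "channels": ["conda-forge"], "dependencies": ["python=3.10"]}`, chaining `install(env, "numpy=*")`, `update(env, "python")`, `remove(env, "numpy")` and `remove(env, "mkl")` walks through the same sequence of files shown above. The solve and the repinning step are deliberately left out.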

--file flags

The --file flag (which can be specified multiple times) accepts TXT files. These TXT files have newline-separated MatchSpecs or URLs. #-leading lines are ignored. If @EXPLICIT is in the file, it's considered an explicit install with no solver invocation: the URLs are just fetched and linked in order of appearance.

Since they only specify specs, they can be concatenated easily. They are meant to complement the CLI, not replace it!
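To make the TXT rules concrete, here is a minimal sketch of a parser that follows the description above (newline-separated entries, `#` comments ignored, `@EXPLICIT` marker). The function names are hypothetical and this is not the parser conda ships. Two behaviours are assumptions of this sketch rather than anything stated in the thread: concatenation refuses explicit files, and repeated specs are de-duplicated.

```python
from __future__ import annotations


def parse_spec_file(text: str) -> tuple[list[str], bool]:
    """Return (entries, explicit) for the contents of one --file TXT input."""
    entries: list[str] = []
    explicit = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no information
        if line == "@EXPLICIT":
            explicit = True  # URLs are fetched and linked in order, no solve
            continue
        entries.append(line)
    return entries, explicit


def concatenate(*texts: str) -> list[str]:
    """Merge several non-explicit spec files, keeping first-seen order."""
    merged: list[str] = []
    for text in texts:
        entries, explicit = parse_spec_file(text)
        if explicit:
            # Assumption of this sketch: explicit files are not mixed with specs.
            raise ValueError("cannot concatenate an @EXPLICIT file with spec files")
        for entry in entries:
            if entry not in merged:
                merged.append(entry)
    return merged
```

Because each file contributes only a flat list of specs, merging needs no conflict resolution, which is exactly what makes the TXT case simpler than environment.yml.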

environment.yml files are not supported by this option (yet). This format carries more details whose concatenation is not as obvious: what to do with name or channels? Who wins? First? Last? These questions would need to be solved before the format is added as a new input file format. As a result, we would start by accepting only one of these at a time.

Solving-affecting flags

There are also a number of solver flags that can affect the result of the solve:

  • Postprocess dependencies of passed specs after the solve. These can only be implemented under the "everything is a file" model by post-processing the hypothetical lockfile that gets created. Note this leaves the environment in an inconsistent state.

    • --no-deps: only the packages corresponding to the spec are installed. This can be replicated by adding the URL to the artifact directly as part of the spec.
    • --only-deps: everything but the passed packages is installed. The user would need to craft the list of dependencies required by the package. I feel this is not used much in practice, and it mostly supports development environments that are better served by specialized tooling.
  • Adjust how dependencies are dealt with. These flags are useful to solve complications of dealing with the "installed packages should not change unless necessary" behaviour that conda prefers. In my opinion, these could be implemented as a series of constraints for freeze-installed, or as a series of spec=* in dependencies:

    • --specs-satisfied-skip-solve: if the installed stuff satisfies the constraints, do not update even if there are newer versions compatible.
    • --freeze-installed: constrain everything else while solving the new specs.
    • --update-deps: force the update of the dependencies of the specs we passed (consider it a partial update-all).
    • --update-specs: this is the default behaviour when not freezing.
    • --update-all: this is in principle the default behaviour of a fresh input file. So it's a matter of resolving it again with the new repodata.
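As a rough sketch of how two of these flags could be expressed as data instead of solver switches, the helpers below turn --freeze-installed into a list of constraints and --update-all into a series of spec=* entries. The function names are hypothetical, the installed packages are modelled as a plain name-to-version mapping, and specs are assumed to be simple `name` or `name=version` strings.

```python
from __future__ import annotations


def spec_name(spec: str) -> str:
    """Package name of a simple 'name' or 'name=version' spec."""
    return spec.split("=", 1)[0].strip()


def freeze_installed(installed: dict[str, str], requested: list[str]) -> list[str]:
    """Constraints that pin every installed package the user did not ask to change."""
    requested_names = {spec_name(s) for s in requested}
    return [
        f"{name}=={version}"
        for name, version in sorted(installed.items())
        if name not in requested_names
    ]


def update_all(dependencies: list[str]) -> list[str]:
    """Unpin every dependency so a fresh solve can pick the newest versions."""
    return [f"{spec_name(d)}=*" for d in dependencies]
```

For example, with `python` and `numpy` installed and a request for `numpy=2` plus `scipy`, `freeze_installed` would only emit a pin for `python`, leaving the solver free to move `numpy`.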

There are also some extra CLI flags that concern how the channels are fetched and can have an effect on the solution:

  • --repodata-fn: which repodata file(s) are fetched from the remote channels(s). This can affect the solution.
  • --no/strict-channel-priority: how to deal with several channels at once.
  • use_only_tar_bz2 (from condarc). This could be deprecated.

And of course, the --solver flag.

All other flags

Everything else in the CLI should be considered a runtime option that does not affect the solution of the environment, and hence could be just kept around when necessary. For example, --copy can be used without issues in the CLI and its presence won't affect which packages are installed.

The proposed schema

This is the proposed schema for a more explicit input file that can potentially replace the state stored in conda-meta/history, conda-meta/state and conda-meta/pinned.

name: str
description: str
last_modified: datetime
channels: list of str  # these should be ideally URLs for fully resolved channels
channel_options:
  repodata_fn: list of str
  # maybe authentication stuff
platforms: list of str
solver_options: dict
  solver: str
  channel_priority: flexible or strict
  use_only_tar_bz2: bool
  aggressive_update_packages: list of str
dependencies: list of str or dict of (str, list(str))
constraints: list of str  # conda-meta/pinned
variables: dict of (str, str)  # conda-meta/state
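For discussion purposes, the schema can be mirrored as Python dataclasses. This is a sketch under assumptions, not the class implemented in this PR: the class names are hypothetical, and every default value (including "libmamba" for the solver) is illustrative only.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ChannelOptions:
    repodata_fn: list[str] = field(default_factory=list)


@dataclass
class SolverOptions:
    solver: str = "libmamba"  # illustrative default, not part of the proposal
    channel_priority: str = "flexible"
    use_only_tar_bz2: bool = False
    aggressive_update_packages: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The schema only allows these two values.
        if self.channel_priority not in ("flexible", "strict"):
            raise ValueError(f"invalid channel_priority: {self.channel_priority!r}")


@dataclass
class EnvironmentSpec:
    name: str
    description: str = ""
    last_modified: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    channels: list[str] = field(default_factory=list)
    channel_options: ChannelOptions = field(default_factory=ChannelOptions)
    platforms: list[str] = field(default_factory=list)
    solver_options: SolverOptions = field(default_factory=SolverOptions)
    # Either a flat list of specs or a mapping of group name to specs.
    dependencies: list[str] | dict[str, list[str]] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)  # conda-meta/pinned
    variables: dict[str, str] = field(default_factory=dict)  # conda-meta/state
```

Calling `asdict(EnvironmentSpec(name="env", channels=["conda-forge"], dependencies=["python=3.10"]))` yields a nested dict that could be dumped to YAML or TOML with any serializer.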

@travishathaway
Contributor

@jaimergp,

I'm curious why you see the need to eventually support multiple --file options. Wouldn't this just needlessly overcomplicate our implementation and the CLI interface? Under what circumstances would this be useful?

Another thing we could do to simplify the CLI further would be to accept either a --file option or a list of MatchSpecs on the command line, but not both. This would also help simplify our implementation.

@jezdez
Member

jezdez commented Aug 6, 2024

To support @travishathaway (I've mentioned this to @jaimergp in person already): I also think multiple --file invocations with .yml files are a nice-to-have, since they would require settling on a merging strategy first. We can take a look at how micromamba handles it, of course, but then it's no longer just a refactor; it's a larger feature addition.

Could we achieve the end goals of this ticket basically in multiple steps, to reduce the code churn? Deprecation of existing flags and code paths need to be accounted for as well.

@jaimergp
Contributor Author

jaimergp commented Aug 6, 2024

> I'm curious why you see the need to eventually support multiple --file options. Wouldn't this just needlessly overcomplicate our implementation and the CLI interface? Under what circumstances would this be useful?

Because it's already supported with .txt files. We can choose to only allow multiple ones with different formats, but it could be useful in setups like conda create -n dev --file base-deps.yml --file os-specific-deps.yml.

Eventually it doesn't matter as long as we are able to construct "The Source Of Truth File" from all those input files, and dump it in the conda-meta.

> Could we achieve the end goals of this ticket basically in multiple steps, to reduce the code churn? Deprecation of existing flags and code paths need to be accounted for as well.

Absolutely. I don't intend this PR to be merged. It's mostly a conversation driver so we can discuss code challenges with good technical context (e.g. what the diff looks like). When we have a decision, we can create an epic/meta with the smaller items and work on them one by one.

Maybe the first step is a quick prototype of the dream CLI plus a draft implementation of the new explicit-state environment file, which I may drop here in this PR.

@jaimergp
Contributor Author

Hello @conda/conda-core! This PR is still in draft but we have reached a milestone here. conda create --file passes the conda env create -f tests :)

Let me recap what I've done here:

  • Created a conda.cli.install2 module that reimplements parts of conda.cli.install.
  • Added a new Environment class that is able to accumulate much of the input data necessary to operate on an environment. It has a weird scope overlap with PrefixData but it's more CLI-ish, if that makes sense. Still unsure if this would be an implementation detail or a first-class API citizen (maybe encapsulating the logic of a potential environment.yml v2 file format).
    • The merge classmethod allows us to combine Environment objects, regardless of the source (CLI, txt, yml...). This is so we can support multiple TXT files, but it also handles multiple YMLs if you want. The CLI data is massaged into an Environment object too, so this makes it super easy to combine all the possible sources.
    • Fun fact: This is mostly so we can deal with the annoying feature of environment.yml being able to provide an env name / path, and then have it overridden in the CLI (sometimes). I don't like this but we need it for a smooth transition I guess.
  • I've also split the main install function into smaller ones, and rewrote some of them so we can get Transaction objects out of them. This gives us the same UX across input sources.
    • Note that conda env create never asked for confirmation or reported the summary of the transaction. Same with explicit files. So, technically, the correct translation for conda env create -f some.yml is conda create --file some.yml --yes.
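To illustrate the merging idea, here is a minimal sketch of what a merge classmethod could look like. It is not the code in this PR: the fields are reduced to three, and the strategy (a later source overrides the name, lists are concatenated without duplicates) is an assumption chosen so that CLI input passed last can override the name from a YML file.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Environment:
    name: str | None = None
    channels: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)

    @classmethod
    def merge(cls, *envs: Environment) -> Environment:
        """Combine environments; later names win, lists keep first-seen order."""
        name = None
        channels: list[str] = []
        dependencies: list[str] = []
        for env in envs:
            if env.name is not None:
                name = env.name  # assumption: the last source that sets a name wins
            for channel in env.channels:
                if channel not in channels:
                    channels.append(channel)
            for spec in env.dependencies:
                if spec not in dependencies:
                    dependencies.append(spec)
        return cls(name=name, channels=channels, dependencies=dependencies)
```

With this shape, `Environment.merge(from_yml, from_txt, from_cli)` treats every input source the same way, and a name given on the command line overrides the one in the file simply because the CLI object is passed last.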

There's still some more work to do (like a smarter file format detector that doesn't have to deal with the now deprecated Anaconda.org environments), but you get the idea.

@jaimergp
Contributor Author

pre-commit.ci autofix

@kcpevey

kcpevey commented Sep 13, 2024

Excited to see this! I need to specify my optional test/docs dependencies in a separate file from my core dependencies. It would be great to be able to use the yml format instead of the txt format for specifying multiple files 💜

@beeankha beeankha mentioned this pull request Sep 23, 2024