
Lockfile implementation of categories instead of category #278


Comments


maresb commented Nov 11, 2022

In order to make it possible to, for instance, reliably install packages in both dev and main, I proposed in mamba-org/mamba#1209 changing the category attribute from a string to a list of strings called categories.

There hasn't been any movement yet (that I'm aware of) toward an implementation. I have an idea that might offer an easy way forward, purely in conda-lock at first...

What would happen if we duplicated the entries in the lockfile, so that the lockfile looks roughly like this?

- name: common-package
  category: dev
- name: common-package
  category: main

If we are lucky, then perhaps the conda-lock and micromamba implementations will follow the expected behavior. (To be tested!) This way we wouldn't need a new lockfile format version.

If this works, this is still rather redundant, but not catastrophically so. We should aim to do better in lockfile format v2. In the meantime, since the micromamba folks believe that extra attributes will be ignored, we could start generating interoperable lockfiles which look like

- name: common-package
  category: dev
  categories:
  - main
  - dev
- name: common-package
  category: main
  categories:
  - main
  - dev

so that this can be read as v1 by ignoring categories, or alternatively read as v2 by ignoring category (effectively containing duplicate entries). This way, we could first develop everything in conda-lock, and then only once we're ready, work on implementing v2 in micromamba.
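To make the dual-format idea concrete, here's a small sketch with plain Python dicts standing in for the parsed YAML entries (none of this is conda-lock's or micromamba's actual API; it just demonstrates that the two reading strategies are compatible):

```python
# The dual-format lockfile entries from above, as parsed dicts.
entries = [
    {"name": "common-package", "category": "dev",
     "categories": ["main", "dev"]},
    {"name": "common-package", "category": "main",
     "categories": ["main", "dev"]},
]

def read_v1(entries):
    """A v1 reader ignores `categories` and sees one entry per category."""
    return [(e["name"], e["category"]) for e in entries]

def read_v2(entries):
    """A v2 reader ignores `category` and deduplicates by package name."""
    packages = {}
    for e in entries:
        packages[e["name"]] = set(e["categories"])
    return packages

assert read_v1(entries) == [("common-package", "dev"),
                            ("common-package", "main")]
assert read_v2(entries) == {"common-package": {"main", "dev"}}
```

Both readers recover a sensible view from the same file, which is the interoperability property we'd be relying on.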


maresb commented Dec 4, 2022

Preliminary experiment with micromamba indicates success. 🚀


srilman commented Dec 12, 2022

Is there an existing branch or implementation to test?


maresb commented Dec 12, 2022

Sorry, not yet. I just did by hand the experiment with micromamba described above. Would you like to try an implementation?


srilman commented Dec 13, 2022

Sure! One question: how should duplicate entries in separate input files be handled? Say, for example, we are running conda-lock -f main.yml -f test.yml on the following files:

# main.yml
category: main
dependencies:
  - python 3.10
  - pandas 1.4

# test.yml
category: test
dependencies:
  - pandas 1.5

Currently, conda-lock will solve for ['python 3.10.*', 'pandas 1.5'] and produce a lockfile where pandas's category is test. Should it be in both main and test, or just test? And what if pandas 1.5 were specified in main.yml; should it be in both categories then?

I would argue that it should be only in the test category in the first case, but both categories in the second case.


maresb commented Dec 13, 2022

As written, that would be a dependency conflict.

The idea is to merge the dependencies into a single simultaneous solve. So what would happen logically is:

dependencies:
- python =3.10
- pandas =1.4
- pandas =1.5

and this goes to the solver, which in this case would fail because there's no pandas package that is simultaneously version 1.4 and version 1.5.

But let's suppose the versions were compatible, for example if main had pandas =1.5.1. Then the simultaneous solve would receive

dependencies:
- python =3.10
- pandas =1.5.1
- pandas =1.5

Then the solver would find the solution python 3.10.8, pandas 1.5.1, and all their corresponding dependencies.

To compute which things should go in the main category, look at the list of the packages in main.yml: python and pandas. These two packages, plus all their dependencies, should be in main.

Similarly, to compute which things should go in test, it is pandas and all its dependencies. Since python happens to be a dependency of pandas, they both go in test.

Therefore, in this case, all packages go into both main and test.
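The logic above can be sketched as a transitive closure over the solved dependency graph (the graph and helper below are illustrative stand-ins, not conda-lock's actual data structures):

```python
# Everything requested in a source file, plus its transitive
# dependencies, goes into that file's category.
dependency_graph = {          # solved package -> its direct dependencies
    "python": [],
    "pandas": ["python"],
}
requested = {"main": ["python", "pandas"], "test": ["pandas"]}

def closure(packages, graph):
    """Return the packages plus all of their transitive dependencies."""
    result, stack = set(), list(packages)
    while stack:
        pkg = stack.pop()
        if pkg not in result:
            result.add(pkg)
            stack.extend(graph[pkg])
    return result

categories = {cat: closure(pkgs, dependency_graph)
              for cat, pkgs in requested.items()}
# python is a dependency of pandas, so both packages land in both
# categories, exactly as argued above.
assert categories == {"main": {"python", "pandas"},
                      "test": {"python", "pandas"}}
```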

Does this logic make sense?


srilman commented Dec 13, 2022

Yes, that makes sense. Concatenating them together seems like the least error-prone solution.

Just to clarify though, do we expect conda-lock to currently fail, or is this the wanted future behavior? Because when I ran the previous example on the current main branch, it did not fail.


maresb commented Dec 13, 2022

It is a bug that solving succeeds. I'm guessing that there is some failed merging behavior. I haven't looked at the code, but I expect in pseudocode (class and parameter names are probably incorrect!!!) that the Pandas dependencies should look something like:

From main.yml: Dependency(name="pandas", specification="=1.4", categories=["main"])
From test.yml: Dependency(name="pandas", specification="=1.5", categories=["test"])

They should merge into: Dependency(name="pandas", specification="=1.4,=1.5", categories=["main", "test"]). I strongly suspect that, rather than being merged, the dependency from test is simply overwriting the other.
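A minimal sketch of that intended merge (again, Dependency here is a hypothetical stand-in for whatever class conda-lock actually uses):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Dependency:
    name: str
    specification: str
    categories: List[str] = field(default_factory=list)

def merge(a: Dependency, b: Dependency) -> Dependency:
    """Combine two occurrences of the same dependency instead of
    letting the later one silently overwrite the earlier."""
    assert a.name == b.name
    return Dependency(
        name=a.name,
        # A comma joins conda version specs as a logical AND.
        specification=f"{a.specification},{b.specification}",
        categories=sorted(set(a.categories) | set(b.categories)),
    )

merged = merge(
    Dependency("pandas", "=1.4", ["main"]),
    Dependency("pandas", "=1.5", ["test"]),
)
# merged.specification is "=1.4,=1.5", which is unsatisfiable, so the
# solver would correctly fail on this pair rather than pick one.
```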


srilman commented Dec 18, 2022

So after some research and planning, I decided to split the implementation into 2 pieces.

First, as you said, the current behavior when seeing repeated dependencies in multiple input files is to overwrite the version constraint rather than merge it. Since this is unwanted behavior, I wrote a mini PR to merge the version strings instead.

I have a second branch with an implementation for supporting multiple categories in lockfiles (built on top of the merging versions branch) located here: https://github.com/srilman/conda-lock/tree/multiple-categories


maresb commented Dec 18, 2022

This sounds excellent!!! Thanks so much for digging into this! I will try to review this soon.


maresb commented Jan 4, 2023

@g-rutter astutely points out in #306 that when we implement categories in the lockfiles, we need to ensure that categories propagate to all subdependencies, and provides a very nice test case. Thanks!

I haven't yet found a chance to verify whether a4d58b6 achieves this.


maresb commented Jan 4, 2023

I made a first pass at thinking through the details. (I might have some misconceptions about the implementation, so please correct any nonsense I write!) I get the impression that we may want to refactor a few things in order to implement this correctly...

First we need to understand the context of aggregate_lock_specs. In make_lock_spec we call parse_source_files to read the list of files into a corresponding list of LockSpecifications. Then we call aggregate_lock_specs to merge these into a single LockSpecification. Moreover, for each file processed within parse_source_files, we parse each platform individually and then call aggregate_lock_specs to merge the platform-specific results into a single LockSpecification. Later in the pipeline, create_lockfile_from_spec splits this fully-merged spec into individual platforms for solving.

It seems to me like we don't need to, and shouldn't, merge platforms in the LockSpecification. Indeed, my mental model (not the current implementation) of what a LockSpecification should be is a dict with platforms as keys, and platform-specific specifications as values.

One minor complication is that we don't know the list of platforms before running parse_source_files. If no platform_overrides is given via --platform, then we need to search the individual files for a platform specification, i.e. the top-level platforms: key. (In case none is found, we fall back to the default platforms.) For this reason, perhaps we want to compute the list of platforms in a first pass.

Perhaps one way to proceed is to refactor source-file-specific functionality into a SourceFile class. Then we can do something like

platforms = platform_overrides or union(sf.platforms for sf in source_files) or DEFAULT_PLATFORMS
spec = {platform: aggregate_lock_specs([sf.spec(platform) for sf in source_files]) for platform in platforms}

This way aggregate_lock_specs only needs to deal with specs for a single platform.
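The two pseudocode lines above can be fleshed out into a runnable toy (SourceFile, aggregate_lock_specs, and build_spec are illustrative stand-ins, not conda-lock's actual API):

```python
DEFAULT_PLATFORMS = ["linux-64", "osx-64"]

class SourceFile:
    """Hypothetical wrapper around one parsed source file."""
    def __init__(self, platforms, deps_by_platform):
        self.platforms = platforms          # platforms the file declares
        self._deps = deps_by_platform       # platform -> list of specs

    def spec(self, platform):
        return self._deps.get(platform, [])

def aggregate_lock_specs(specs):
    """Merge the per-file specs for a SINGLE platform into one list."""
    merged = []
    for spec in specs:
        merged.extend(spec)
    return merged

def build_spec(source_files, platform_overrides=None):
    # First pass: determine the platform list, with fallbacks.
    platforms = (
        platform_overrides
        or sorted(set().union(*(sf.platforms for sf in source_files)))
        or DEFAULT_PLATFORMS
    )
    # Second pass: a dict with platforms as keys, merged specs as values.
    return {
        p: aggregate_lock_specs([sf.spec(p) for sf in source_files])
        for p in platforms
    }

main = SourceFile(["linux-64"],
                  {"linux-64": ["python =3.10", "pandas =1.4"]})
test = SourceFile(["linux-64"], {"linux-64": ["pandas =1.5"]})
assert build_spec([main, test]) == {
    "linux-64": ["python =3.10", "pandas =1.4", "pandas =1.5"]
}
```

The key design point is that the platform loop lives outside aggregate_lock_specs, so merging never has to reason about cross-platform differences.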

Does it make sense what I'm writing?


srilman commented Jan 14, 2023

Sorry for the delay @maresb! Overall, I agree with the approach you suggested, especially:

  1. Treating LockSpecifications as a dict of platform to dependencies.
  2. Having aggregate_lock_specs deal with specs for a single platform.

But to clarify, how will we parse source files in this approach? Right now, I believe we parse a source file given a platform, in order to handle things like OS preprocessing selectors. In this new approach, would we parse the source file in a platform-independent fashion, and then apply a platform to it? I would much prefer that method, since it would make it easier to add more selector-related features in the future, such as

  1. and, or, and not operations
  2. Selectors for other parameters, such as the Python version.

Either way, I would be happy to take a first pass implementing this. Any ideas on how to split this into smaller tasks in order to reduce the size of the overall PR?


maresb commented Jan 15, 2023

No worries, great to hear back from you @srilman! I am not sure I understand your question:

In this new approach, would we parse the source file in a platform-independent fashion, and then apply a platform to it?

I think that multiple parsing passes are required because we don't know at the beginning what the final list of platforms is. So I think we need to parse unfiltered to extract the platforms key, and then reparse for each platform, applying the corresponding filter. Is this consistent with what you have in mind?
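To make the two-pass idea concrete, here's a toy sketch using hand-rolled line parsing instead of a real YAML parser, with the conda-build-style `# [win]` selector syntax handled very naively (all names and the parsing strategy are illustrative only):

```python
import re

SOURCE = """\
platforms:
  - linux-64
  - osx-64
dependencies:
  - python =3.10
  - pandas =1.5
  - pywin32  # [win]
"""

def extract_platforms(text):
    """Pass 1: read the top-level `platforms:` list, unfiltered."""
    platforms, collecting = [], False
    for line in text.splitlines():
        if line.startswith("platforms:"):
            collecting = True
        elif collecting and line.startswith("  - "):
            platforms.append(line[4:].strip())
        else:
            collecting = False
    return platforms

def parse_for_platform(text, platform):
    """Pass 2: re-parse, keeping deps whose selector matches `platform`."""
    deps, collecting = [], False
    for line in text.splitlines():
        if line.startswith("dependencies:"):
            collecting = True
            continue
        if not (collecting and line.startswith("  - ")):
            collecting = False
            continue
        m = re.match(r"  - (.*?)(?:\s*#\s*\[(\w+)\])?\s*$", line)
        dep, selector = m.group(1), m.group(2)
        # A selector like [win] matches platforms starting with "win".
        if selector is None or platform.startswith(selector):
            deps.append(dep)
    return deps

platforms = extract_platforms(SOURCE)
specs = {p: parse_for_platform(SOURCE, p) for p in platforms}
```

Here the first pass yields ["linux-64", "osx-64"], and the second pass drops pywin32 for both of those platforms while a hypothetical win-64 pass would keep it.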


maresb commented Jan 15, 2023

Feel free to take a stab at it. I think you are thinking about this more deeply than I am.
