Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

which_package to yield unique collection of package records #5108

Merged
merged 10 commits into from
Dec 14, 2023

Conversation

kenodegard
Copy link
Contributor

@kenodegard kenodegard commented Dec 12, 2023

Description

The refactor of which_package accidentally resulted in the same package record being returned multiple times (once for each file collision). This results in misleading build warnings.

Instead we only want to return a unique set of package records.

Resolves #5106

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Dec 12, 2023
@kenodegard kenodegard self-assigned this Dec 12, 2023
@kenodegard kenodegard changed the base branch from main to 3.28.x December 12, 2023 21:50
jezdez
jezdez previously approved these changes Dec 13, 2023
tests/test_inspect_pkg.py Outdated Show resolved Hide resolved
Copy link
Member

@mbargull mbargull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gh-5041 actually had two related regressions:

  1. The one fixed here.
  2. conda_build.post._lookup_in_prefix_packages is now printing a List[PrefixRecord] which means it prints repr(PrefixRecord) that have all PathData etc. expanded (leading to very verbose logs with sometimes >1M characters per line).

_lookup_in_prefix_packages needs to be changed to print(..., [str(record) for record in precs_in_reqs], ...) or the like.

tests/test_inspect_pkg.py Outdated Show resolved Hide resolved
tests/test_inspect_pkg.py Outdated Show resolved Hide resolved
jezdez
jezdez previously approved these changes Dec 14, 2023
for file in prec["files"]:
if samefile(prefix / file, path):
yield prec
if any(samefile(prefix / file, path) for file in prec["files"]):
Copy link
Contributor

@dholth dholth Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(n * m) ? I see this version skips work if true.

Too bad to iterate over all installed files to do the search

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worst case yes, same as before

any returns on the first True: https://docs.python.org/3/library/functions.html#any

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what avoids the duplicated records to begin with, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a subsequent PR, we could see if implementing a cached path -> artifact map would speed this up. It can be a veeery slow part of the post processing stage in conda-build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what avoids the duplicated records to begin with, right?

correct, this is essentially what was happening pre #5041:

for dist in fn(prefix):
# dfiles = set(dist.get('files', []))
dfiles = dist_files(prefix, dist)
# TODO :: This is completely wrong when the env is on a case-sensitive FS!
if any(norm_ipp == normcase(w) for w in dfiles):
yield dist

cached path -> artifact map

definitely something to consider

"{}: {} found in {}{}".format(
msg_prelude, n_dso_p, [prec.name for prec in precs], and_also
),
f"{msg_prelude}: {n_dso_p} found in {[str(prec) for prec in precs]}{and_also}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This upgrades from name to the full str representation. i.e. from python to defaults::python-3.12.0-h123abc_0, I think? Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is now more detailed (but still less verbose than str(precs)) than before

the change was inspired by the fix below on line 1214-1215, where precs_in_reqs: List[PackageRecord] was cast to string resulting in a very verbose output, see #5108 (review)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good that this is now more consistent with the full spec which was already output at https://github.com/conda/conda-build/pull/5108/files#diff-5073339a2e88f80714f82a001af334869aa5df9627b2f00ffd18d7a2035f537eR1190 .

@jaimergp
Copy link
Contributor

Do we need to de-dup in line 1180 in post.py or are we confident there will be no dups?

    precs = list(which_package(in_prefix_dso, run_prefix))

@kenodegard
Copy link
Contributor Author

kenodegard commented Dec 14, 2023

@jaimergp AFAIK no deduping is needed once we merge this PR, with the modifications in which_package it is only possible to return a specific PackageRecord once

tests/test_inspect_pkg.py Outdated Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
" (likely) or a missing dependency (less likely)".format(
msg_prelude, [prec.name for prec in precs]
),
f"{msg_prelude}: .. but {[str(prec) for prec in precs]} not in reqs/run, "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I find the change to use str(prec) in the len(precs_in_reqs) == 0 and len(precs) > 0 case above a welcome change, one could argue that we might only want to output the name here so users get a more succinct "you might've missed package in run requirements" message.

But I don't have a strong opinion on this one.

mbargull
mbargull previously approved these changes Dec 14, 2023
Copy link
Member

@mbargull mbargull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some minor nit comments, but overall this LGTM, thanks!

Co-authored-by: Marcel Bargull <mbargull@users.noreply.github.com>
@kenodegard kenodegard mentioned this pull request Dec 14, 2023
67 tasks
@kenodegard kenodegard merged commit a22d6ef into conda:3.28.x Dec 14, 2023
24 checks passed
@kenodegard kenodegard deleted the reduce-which_package branch December 14, 2023 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed [bot] added once the contributor has signed the CLA
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

How to interpet linking WARNINGs from conda-build 3.28.x?
7 participants