
added RIOS #444
Merged 6 commits into conda-forge:master on Apr 27, 2016

Conversation

@gillins (Contributor) commented Apr 21, 2016

@conda-forge-linter commented:

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/rios) and found them to be in excellent condition.

@@ -0,0 +1,5 @@
#!/bin/bash

python setup.py install --prefix=$PREFIX
Member commented:

Is the prefix needed?

Member commented:

If not, I think we can remove both the build.sh and the bld.bat and replace with a build/script section.
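For illustration, a recipe whose build step is nothing more than the install command could replace both scripts with a build/script entry in meta.yaml along these lines (a sketch, not the actual RIOS recipe):

```yaml
# Sketch only: assumes the recipe needs nothing beyond the install command,
# so the same line works on Linux, OS X, and Windows.
build:
  script: python setup.py install
```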

@pelson pelson changed the title added RIOS WIP: added RIOS Apr 21, 2016
@gillins (Contributor, Author) commented Apr 22, 2016

This build won't succeed until the builds from conda-forge/gdal-feedstock#42 appear. Is this something that needs to be done manually?

I had to add this same hack for testing: conda-forge/gdal-feedstock#35. Do we agree this needs to be fixed in Conda?

@gillins (Contributor, Author) commented Apr 26, 2016

These tests should pass now that conda-forge/gdal-feedstock#45 has merged. What is the best way to restart the checks?

@ocefpaf (Member) commented Apr 26, 2016

@gillins I re-started the CIs.

@gillins (Contributor, Author) commented Apr 26, 2016

Thanks! Any ideas why Travis is picking up the older GDAL (build 3) on OSX?

@gillins (Contributor, Author) commented Apr 26, 2016

It is actually picking up GDAL 2.0.0 rather than 2.0.2. This sometimes happens to me also. Very strange...

@ocefpaf (Member) commented Apr 26, 2016

It is actually picking up GDAL 2.0.0 rather than 2.0.2. This sometimes happens to me also. Very strange...

I see gdal 2.0.0 from the default channel at test time, while build time seems to pick the conda-forge version, though not the latest build number 4... Argh... That is very frustrating, but I don't think it is a problem on our side. Maybe you should pin to gdal 2.0.2 to ensure that the latest version is picked up.

Here is what is installed at test time:

    proj4-4.9.1                |                0         292 KB  defaults
    hdf5-1.8.16                |                2         2.9 MB  conda-forge
    kealib-1.4.5               |                2         169 KB  defaults
    libnetcdf-4.4.0            |                1         980 KB  defaults
    libgdal-2.0.0              |                4        16.7 MB  defaults
    gdal-2.0.0                 |      np111py27_3         446 KB  defaults
    rios-1.4.2                 |           py27_0         131 KB  file:///Users/travis/minico

Since you depend only on numpy and gdal one would think that gdal 2.0.2 would be installed, but instead it is picking up gdal 2.0.0 and the latest hdf5!
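The pinning suggested above would amount to something like this in the recipe's requirements (a sketch using the version numbers from this discussion, not the actual RIOS meta.yaml):

```yaml
# Sketch: exact pin so the test environment cannot fall back to gdal 2.0.0
requirements:
  build:
    - python
    - numpy
    - gdal 2.0.2
  run:
    - python
    - numpy
    - gdal 2.0.2
```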

@ocefpaf (Member) commented Apr 27, 2016

@gillins don't squash this PR because I want to use it as an example of the problem, but you can rebase against master to fix the AppVeyor failure.

@pelson and @msarahan RIOS build and run dependencies are:

- python
- numpy
- gdal

Even though the build and run dependencies are the same, the conda-build solver correctly picks gdal 2.0.2 at build time but wrongly installs gdal 2.0.0 from the default channel at test time.

It is important to note that this problem did not happen on Linux. (Maybe because we are still using conda-build 1.18.2 there.)

Any ideas how we can fix that without pinning gdal to the exact 2.0.2 version?

@jakirkham (Member) commented:

Just FYI, rebasing shouldn't be necessary as long as we restart the CI. It will always use the latest master.

@ocefpaf ocefpaf merged commit 1736b1a into conda-forge:master Apr 27, 2016
@ocefpaf (Member) commented Apr 27, 2016

@gillins pinning to the exact version did the trick, but that is not desirable. We must find what is wrong in conda-build and fix it.

@gillins (Contributor, Author) commented Apr 27, 2016

Yes I can see there is going to be a problem when we upgrade to GDAL 2.1.0 (to be released soon) - we are going to have to re-pin to this instead.

@jakirkham (Member) commented Apr 27, 2016

We could just automate pinning throughout our stack. I was briefly chatting with @ocefpaf earlier about this. Not sure how seriously we want to follow this and how it would be automated, but after handling some dependency issues recently it might just make our lives easier.

@ocefpaf (Member) commented Apr 27, 2016

We could just automate pinning throughout our stack. I was briefly chatting with @ocefpaf earlier about this.

Indeed we must do that, but in this particular case just getting the latest version should suffice. However, that failed with the latest conda-build on OS X and worked fine with conda-build 1.18.2 on Linux. There are other variables to investigate but I suspect something is broken in the solver.

@jakirkham (Member) commented:

There are other variables to investigate but I suspect something is broken in the solver.

Hmm...hard to say. Based on the test dependencies installed, it seems like it tries to get the latest version of everything, but must pick between hdf5 and gdal being the latest as both can't be. I'm not sure how the solver handles tiebreakers like this, but it has to pick one somehow.

@ocefpaf (Member) commented Apr 27, 2016

Hmm...hard to say. Based on the test dependencies installed, it seems like it tries to get the latest version of everything, but must pick between hdf5 and gdal being the latest as both can't be. I'm not sure how the solver handles tiebreakers like this, but it has to pick one somehow.

I thought about that, but since gdal is specified in the recipe, gdal should be the tiebreaker.

@jakirkham (Member) commented:

@mcg1969, could you please help us understand how tiebreakers are resolved in the solver? We are a little unclear.

In particular, we are thinking about the case where you have 2 dependencies, but only 1 can have the latest version and the other must have an older version. How does the solver pick which dependency gets the latest version?

@mcg1969 (Contributor) commented Apr 27, 2016

A true tie in conda is broken in an undefined/undetermined manner. It's not necessarily deterministic from run-to-run, either, because of natural sources of randomness in the ordering of the SAT constraints, such as Python's string hashing.

But in my experience true ties are quite rare. Conda's approach is to solve an entire sequence of optimization problems, ratcheting down on the ambiguity. Ignoring features and removals for the moment, the optimization passes minimize the following metrics, in order. Once a metric has been minimized, it is locked in, and conda moves on to the next.

  1. worst-case version downgrade across requested packages (i.e., on the command line). Any version bump has a score of 1, whether it's a major/minor/sub-minor version change.
  2. sum of the version downgrades across requested packages
  3. worst-case build downgrade across requested packages
  4. sum of the build downgrades across requested packages
  5. worst-case version downgrade across all other packages
  6. sum of the build downgrades across all other packages
  7. the total number of packages installed.

Strictly speaking I'd prefer to do just 2, 4, 6, and 7, but #1 in particular proved necessary in practice, because scenarios with very large version downgrade requirements would slow the solver intolerably. (I suppose I could get away with 1, 2, 4, 6, 7, but honestly nobody would notice the performance difference, I don't think, and keeping them simplifies the code, because each peak/sum pair of problems is solved by the same routine.) Minimizing version first, then build, makes a certain amount of practical sense, but it's also a performance win.

Note that the optimization takes place on a pruned set of packages. If we eliminated the pruning pass, the solution might change. For instance, say pandas has a new version but it is eliminated by the pruning pass by dependency conflicts. Then its version downgrade metric would be zero. I suppose we could make the case it shouldn't be that way, but sending the entire package set through the SAT solver would slow conda down intolerably.

Is it possible for there to be a tie when all is said and done? Possibly---and certainly in a conda-forge context where you're trying to game the version and build numbers. But I just don't see it that often.

More importantly, I think that trying to understand tiebreakers is not what anyone concerned with package building should be doing. What you need to do is figure out how to unambiguously specify the exact packages you want. It's quite possible that will require additional capabilities in conda, such as the channel conflict improvements in the pipeline.
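The ratcheting scheme described above can be approximated as a lexicographic comparison of score tuples. The sketch below is illustrative only: it is not conda's actual code, it ignores the build-number metrics (3 and 4) and the pruning pass, and it models versions as integer indices into each package's sorted version list.

```python
def score(solution, requested, latest):
    """Score tuple for one candidate solution; lexicographically smaller is better.

    solution:  {package: chosen version index}
    requested: set of packages named on the command line
    latest:    {package: newest available version index}
    """
    req = [latest[p] - solution[p] for p in requested]
    other = [latest[p] - v for p, v in solution.items() if p not in requested]
    return (
        max(req, default=0),    # 1. worst-case version downgrade, requested
        sum(req),               # 2. total version downgrades, requested
        max(other, default=0),  # 5. worst-case version downgrade, others
        sum(other),             # 6. total version downgrades, others
        len(solution),          # 7. number of packages installed
    )

# Toy example loosely modeled on this thread: two self-consistent candidate
# solutions, one keeping gdal newest, one keeping hdf5 newest.
latest = {"gdal": 2, "hdf5": 1, "kealib": 1}
keep_gdal = {"gdal": 2, "hdf5": 0, "kealib": 0}
keep_hdf5 = {"gdal": 0, "hdf5": 1, "kealib": 0}

# With gdal on the "command line" it is scored before everything else, so
# keep_gdal, scoring (0, 0, 1, 2, 3), beats keep_hdf5 at (2, 2, 1, 1, 3).
best = min([keep_gdal, keep_hdf5], key=lambda s: score(s, {"gdal"}, latest))
assert best is keep_gdal
```

Minimizing each metric and locking it in before moving to the next is exactly what comparing these tuples lexicographically does.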

@mcg1969 (Contributor) commented Apr 27, 2016

One more general comment. I see someone above talking about the possibility that the solver is broken. Is that possible? Oh, my yes. It might be broken in the sense that we haven't implemented properly what I intended to implement; and it might be broken in the sense that the design is problematic. Anyone who lived through the 3.19-4.05 upgrades knows very well that the solver can be broken :-)

So please don't misunderstand what I'm about to say: at this point, most thoughts that the solver is broken are likely to be wrong. The reasons are threefold.

First of all, the dependency relationships between packages are opaque and complex, and are buried inside metadata that nobody ever sees unless they make a very deliberate effort to do so. It's quite difficult to reason about dependency relationships by hand. It's a lousy job for a human, an ideal one for a computer.

Second, we haven't done the best job of helping people understand why conda makes the choices it does sometimes. That's something we're struggling to fix. I've got a PR that is producing much better hints now for unsatisfiable specifications, but that's actually the easier case. What's hard is producing "hints" that explain why conda chose a set of packages that looks "suboptimal" in some sense. That is a longer-term fix.

And finally, optimization algorithms in general---particularly combinatorial ones like this---very often produce non-intuitive solutions. I've been doing mathematical optimization for a long time, and I can't tell you how many times I've looked at the solution to a problem and thought, "that can't be right". And then I take the time to mechanically verify it, and yes, it is... Sometimes I finally figure out why my intuition was wrong, sometimes I don't.

Again, that is not meant to explain away conda's shortcomings or bugs. But it does help reinforce why I think that what we need to do is make it easier for package builders to be as precise as possible in specifying how they want their packages to be paired with dependencies. Version pinning, explicit channels, features (ugh), and dependency metapackages (a new idea I've been promoting) are all things we can do to help. Because relying solely on the underlying mathematics to get it right is going to produce some truly baffling results sometimes.

@jakirkham jakirkham changed the title WIP: added RIOS added RIOS Apr 27, 2016
@ocefpaf (Member) commented Apr 27, 2016

A true tie in conda is broken in an undefined/undetermined manner. It's not necessarily deterministic from run-to-run, either, because of natural sources of randomness in the ordering of the SAT constraints, such as Python's string hashing.

As a scientist (who almost flunked Quantum Physics 😜) I am worried about that randomness statement.

But it does help reinforce why I think that what we need to do is make it easier for package builders to be as precise as possible in specifying how they want their packages to be paired with dependencies.

I disagree. In that particular case it would be nice if conda could use the stated dependencies python, numpy, and gdal, and get the latest versions of those.

The first pass got it right. The second did not. Conda decided on:

  • gdal-2.0.0 np111py27_3 defaults instead of gdal-2.0.2 np111py27_4 conda-forge
  • proj4-4.9.1 0 defaults instead of proj.4 4.9.1 2 (granted that the name proj4 was pulled from the outdated gdal instead of the name proj.4. What is sad is that we renamed our proj4 to proj.4 to match the default channel a long time ago.)
  • hdf5-1.8.16 conda-forge instead of hdf5 1.8.15.1 (OK, that is higher, but shouldn't gdal weigh in, since gdal is specified in the recipe and not hdf5? Note that, if gdal's higher version were respected, we would get hdf5 1.8.15.1, because gdal 2.0.2 is pinned against it to ensure that the right compatible version gets chosen. That pinning went down the drain here.)
  • kealib-1.4.5 2 defaults instead of kealib-1.4.6 conda-forge

I see 3 downgrades (gdal, proj.4, kealib), or 2 if you do not count proj4, in favor of 1 upgrade (hdf5), which should not happen!

I am not sure that pinning to an exact version is the way to go, as the expected behavior* does not seem unreasonable.

* The expected behavior being: Try to get the most up-to-date version of the specified dependencies.

@ocefpaf (Member) commented Apr 27, 2016

Another puzzling fact. I wanted to check if that

libnetcdf-4.4.0 | 1 980 KB defaults

being picked up above was compiled against hdf5 1.8.16, but I cannot seem to find it:

> conda search libnetcdf                                                            ⏎
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ......
libnetcdf                    4.2.1.1                       0  defaults        
                             4.2.1.1                       1  defaults        
                             4.3.2                         0  defaults        
                             4.3.2                         1  defaults        
                             4.3.3.1                       0  defaults        
                             4.3.3.1                       1  defaults        
                          .  4.3.3.1                       3  defaults

I also think that the default channel does not package hdf5 1.8.16 yet, which makes the choices above even more dangerous, as gdal would be broken without a libnetcdf built against the same hdf5 that is installed. That is another pinning in our recipe that went down the drain due to the version solution above.

@mcg1969 (Contributor) commented Apr 27, 2016

@ocefpaf, I think you may misunderstand me. If you present a set of specifications to conda that it reasonably can interpret multiple different ways---a true "tie"---then that's a problem. Conda can't be expected to magically determine which choice you prefer among those ties.

But if conda is not finding a solution that it should prefer, according to its own formulas, that's a bug.

The problem is that it's not always clear that's the case. I've seen this many times: people think conda should be selecting this package or that one, but it can't because some other package's dependency chains forbid it. conda can't be in the business of overruling package specifications to get the result you want, but it should indeed do a better job of pointing out those problems.

@mcg1969 (Contributor) commented Apr 27, 2016

@ocefpaf: if you can give me a conda create -n recipe (with a full channel list if it includes channels other than defaults) that exhibits the weird behavior you're seeing, I'll try and figure out what's going on.

@ocefpaf (Member) commented Apr 27, 2016

Conda can't be expected to magically determine which choice you prefer among those ties.

@mcg1969 I am not asking for magic, just for a way to ensure that the specified dependencies are kept more up-to-date than the secondary dependencies that are downloaded with them.

@ocefpaf: if you can give me a conda create -n recipe (with a full channel list if it includes channels other than defaults) that exhibits the weird behavior you're seeing, I'll try and figure out what's going on.

I guess that the case in this PR is it, no?

@mcg1969 (Contributor) commented Apr 27, 2016

I'm not seeing a conda create -n command anywhere, no.

@mcg1969 (Contributor) commented Apr 27, 2016

(Please understand my tunnel vision here---I'm not a package builder, and I don't do a lot of work on conda-build. If I'm going to reproduce a bug in dependency resolution I pretty much need a sequence of one or two conda create/conda install commands to isolate the problem.)

@ocefpaf (Member) commented Apr 27, 2016

(Please understand my tunnel vision here---I'm not a package builder, and I don't do a lot of work on conda-build. If I'm going to reproduce a bug in dependency resolution I pretty much need a sequence of one or two conda create/conda install commands to isolate the problem.)

I understand that. Note that the behavior we are seeing here is hard to reproduce. However, I am not sure what conda create -n command would correspond to this context. The problem is that we get two different solutions from conda-build at build and test (run) time, even though both build and run have the same dependencies specified.

@mcg1969 (Contributor) commented Apr 27, 2016

Perhaps someone listening can help construct a good test case for me, isolated from conda-build?

@mcg1969 (Contributor) commented Apr 27, 2016

For the build install I assume the command is equivalent to something like this:

conda create -n test -c conda-forge python numpy gdal --dry-run

This assumes that defaults is the only channel in my .condarc; otherwise I could do

conda create -n test --override-channels -c defaults -c conda-forge python numpy gdal --dry-run

This picks up the right gdal package. The question, I suppose, is what is the equivalent command for the test? My assumption is that it's adding rios?

@mcg1969 (Contributor) commented Apr 27, 2016

Ah, wait, I see something in your text above:

hdf5-1.8.16 conda-forge instead of hdf5 1.8.15.1 (OK, that is higher, but shouldn't gdal weigh in, since gdal is specified in the recipe and not hdf5? Note that, if gdal's higher version were respected, we would get hdf5 1.8.15.1, because gdal 2.0.2 is pinned against it to ensure that the right compatible version gets chosen. That pinning went down the drain here.)

This may indeed be a clue, but it's hard to be sure since other things are going on. No, gdal is not weighted more than hdf5 just because it is in the recipe. The only differentiation conda makes is between packages that are explicitly requested on the command line, and all other packages. It doesn't matter how many levels deep in the dependency chain the packages are.

I tried an experiment. I hacked my local metadata cache so that rios depended on gdal instead of gdal 2.0.2. Then I ran this:

conda --debug create -n test -c conda-forge python numpy gdal rios --dry-run --use-index-cache

I still get gdal 2.0.2 out of that. I'm observing that the total number of version downgrades conda has to do to get a compatible set of packages is 2, with a "peak" of 1---that means it's downgrading 2 packages 1 version each.

@mcg1969 (Contributor) commented Apr 27, 2016

ACTUALLY, I guess it depends on exactly how conda-build is constructing the test environment. Is it doing this?

conda create -n test -c conda-forge python numpy gdal rios --dry-run --use-index-cache

or is it doing this?

conda create -n test -c conda-forge rios --dry-run --use-index-cache

The difference here of course is that python, numpy, and gdal are no longer on the command line, so they get weighted differently. In the longer version, the gdal version is given higher priority than, say, hdf5.

EDIT: To be clear, I got the same result both ways, I just want to know exactly what conda create command conda-build is using, and exactly what packages it is seeing.

@mcg1969 (Contributor) commented Apr 27, 2016

Can someone give me a link to the Travis CI output?

@mcg1969 (Contributor) commented Apr 27, 2016

Could it be something as simple as the test environment is not picking up the conda-forge channel?

@ocefpaf (Member) commented Apr 27, 2016

Could it be something as simple as the test environment is not picking up the conda-forge channel?

No. The hdf5, wrongly chosen there, is from conda-forge.

Can someone give me a link to the Travis CI output?

https://travis-ci.org/conda-forge/staged-recipes/jobs/124909151

@mcg1969 (Contributor) commented Apr 28, 2016

Grr. Well, I'm convinced the solver is getting bad input. I'm trying to reproduce this every which way I can think of, and I'm getting gdal 2.0.2. Any chance we can re-run this CI job with the --debug flag passed to conda-build? The output will be super-long, but I can wade through it.

@mcg1969 (Contributor) commented Apr 28, 2016

EDIT: Retracting that claim. It does look like a scoring issue.

@mcg1969 (Contributor) commented Apr 28, 2016

It looks like it is indeed due to subtleties in the scoring algorithm. The mathematics is doing the right thing.

What conda-build is using for the test spec is akin to the shorter spec above; that is,

conda create -n test -c conda-forge rios=1.4.2=py27_0 python=2.7* --dry-run

So this puts gdal at the same level as hdf5 and every other dependency. The downgrade scores are 2 versions (kealib & gdal) and 0 builds.

But I think we have a very simple answer here. We need to change the spec that conda build is using in the test environment to be identical to the build spec, except for the addition of the newly built package. I'll work on this. [EDIT: CLARIFICATION BELOW. Of course we can't do this exactly.]

@mcg1969 (Contributor) commented Apr 28, 2016

Actually, that's obviously wrong, because the test and build dependencies are separate. But if they are the same, the specs passed to the solver should behave as described above. So this just means adding the run requirements list explicitly to the spec.

@mcg1969 (Contributor) commented Apr 28, 2016

Well, that was simpler than I thought. One line.

@mcg1969 (Contributor) commented Apr 28, 2016

As I say over in the PR:

This sometimes caused conda to select different versions of the packages between build and test environments. Technically, if this causes functional differences, those differences should be investigated, and tighter requirements for both build and run should be considered.

I'm not convinced you should relax your gdal 2.0.2 spec (or perhaps it should become a >= spec). The TravisCI output suggests that rios isn't working with gdal 2.0.0. You need to prevent people from installing this package alongside 2.0.0, then.
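If the lower-bound route were taken, the run requirement might read like this (again a sketch, with >=2.0.2 as the floor suggested above, not the actual RIOS recipe):

```yaml
# Sketch: lower bound instead of an exact pin, so future gdal releases
# still satisfy the spec while gdal 2.0.0 is excluded
requirements:
  run:
    - python
    - numpy
    - gdal >=2.0.2
```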

@ocefpaf (Member) commented Apr 28, 2016

Thanks for looking into this @mcg1969!

@pelson (Member) commented Apr 28, 2016

Great job @mcg1969.

@ocefpaf ocefpaf mentioned this pull request Apr 28, 2016
@gillins (Contributor, Author) commented Apr 28, 2016

Agreed - well done!
