Datascience doesn't declare its dependencies #335

Closed
matthew-brett opened this issue Oct 1, 2018 · 11 comments · Fixed by #403

@matthew-brett

I think that Datascience needs the following packages to import correctly:

  • matplotlib
  • pandas
  • scipy
  • ipython

These aren't declared in setup.py or similar, I think. At least, they do not get installed with pip install datascience. Worth adding?
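One way to confirm the report is to check, in a fresh environment, which of the listed packages can be found. The sketch below is only an illustration: `missing_packages` is a hypothetical helper, not part of datascience, and it assumes the import names `matplotlib`, `pandas`, `scipy`, and `IPython`.

```python
import importlib.util

# Import names for the packages listed above; note that the "ipython"
# distribution is imported as "IPython".
REQUIRED_IMPORTS = ["matplotlib", "pandas", "scipy", "IPython"]


def missing_packages(import_names=REQUIRED_IMPORTS):
    """Return the import names that cannot be found in this environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing:", missing_packages())
```

An empty list after pip install datascience in a clean virtual environment would suggest the dependencies are being pulled in; a non-empty one would match the behaviour described here.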

davidwagner added a commit that referenced this issue Oct 3, 2018
@davidwagner
Member

Thanks for the suggestion. A separate comment suggests we should not list all dependencies in requirements.txt: see #301 (comment). So, I think this is a won't-fix.

@matthew-brett
Author

I think the comment you linked to is out of date: as of pip release 10 (2018-04-14), pip no longer does recursive upgrades by default. I had assumed that the requirements.txt file would mostly be used by pip users (like me), and not by Anaconda users. Pip users will be lucky not to get a broken installation from using requirements.txt, because the other libraries will not be installed by default. But perhaps I'm missing something.

@davidwagner
Member

OK -- I'm not up to date on pip, so I'm pretty clueless about how pip works and not the right person to make this decision. @papajohn, can you comment on this? Can/should we add these dependencies to requirements.txt now, given the changes to pip mentioned above?

@davidwagner davidwagner reopened this Jun 10, 2019
@papajohn
Contributor

papajohn commented Jun 12, 2019 via email

@adnanhemani
Member

I don't have extensive experience with dependencies but can look into this if no one else has experience. Perhaps @SamLau95 might?

@matthew-brett
Author

I do have a lot of experience, I'm happy to advise.

You have the problem you've pointed out: for pip versions older than 10, and therefore for some installations older than April 2018, pip install -U datascience will also upgrade all of the packages it depends on.

You used to have the problem that Matplotlib, Pandas, etc. were difficult to install; I think wheels have taken care of this problem for well over 99% of installations. This has been so for Numpy, Scipy, Matplotlib and Pandas for several years now.

Worst case, your user has an old pip, runs pip install -U datascience, and pip upgrades Numpy, Scipy, Pandas, Matplotlib, and these don't have wheels available. I think you're now talking about a tiny fraction of your users (much less than 1%), and a somewhat unusual case, even for those users (upgrading datascience).

Second worst case, your user has an old pip, is using an old Anaconda (with an old pip), and does pip install -U datascience. Then the Anaconda versions of Pandas etc get upgraded to pip versions, and I suppose this could be fragile. I don't use Anaconda myself, so I don't have any experience of that problem. You could solve that problem by packaging Datascience with conda, so they can also do conda install datascience - I believe that's pretty easy, but again, I've never done it.

On balance, if it were me, I'd declare all dependencies in setup.py, including the big ones, and assume that, even if you release now, there will soon be so few users with pip < 10, that this won't be a problem.

I would declare everything in requirements.txt, because that is a pip-only file, that needs to be used explicitly on the pip command line.
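As a rough sketch of what declaring everything in setup.py could look like: the four dependency names come from this issue, while the other fields and the `SETUP_KWARGS` name are placeholders, not the project's actual setup.py.

```python
# Hypothetical metadata; the real setup.py will differ. Only the four
# dependency names below are taken from this issue, and no version
# pins are assumed.
INSTALL_REQUIRES = ["matplotlib", "pandas", "scipy", "ipython"]

SETUP_KWARGS = dict(
    name="datascience",
    packages=["datascience"],
    # pip installs anything listed here alongside the package itself.
    install_requires=INSTALL_REQUIRES,
)

# In setup.py this would be passed on as:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Any dependencies the project already declares would stay in the same list.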

@stefanv - any thoughts?

@adnanhemani
Member

I agree with everything @matthew-brett has said above - thank you for the in-depth thoughts!

Adding the WIP label to this issue. Would definitely like to hear @papajohn's thoughts before I make any movement on this.

@papajohn
Contributor

papajohn commented Jun 23, 2019 via email

@stefanv
Contributor

stefanv commented Jun 23, 2019

@matthew-brett Nothing to add. We always specify requirements.txt - I cannot think of any reason not to. To avoid duplication, we also often read those files in setup.py for the installation dependencies, so we don't have to specify versions twice.
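A minimal sketch of that pattern, assuming a plain requirements.txt with one specifier per line; `read_requirements` is a hypothetical helper name, and the parsing is deliberately simplified:

```python
from pathlib import Path


def read_requirements(path="requirements.txt"):
    """Return requirement specifiers from a pip requirements file.

    Blank lines, comments, and pip options (lines starting with "-",
    such as "-r other.txt") are skipped; this is a simplification.
    """
    requirements = []
    for line in Path(path).read_text().splitlines():
        line = line.split(" #", 1)[0].strip()  # drop trailing comments
        if line and not line.startswith(("#", "-")):
            requirements.append(line)
    return requirements
```

In setup.py the result could then be passed as install_requires=read_requirements(), so versions are specified in one place only.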

Conda seems to play better with pip now, but I haven't checked the very latest status. I wonder if @hmaarrfk knows?

@hmaarrfk

For conda, minimum version requirements don't seem to change much from version to version. Conda package maintainers try to manually verify that pinnings are correct.

Conda-forge is quite adamant about people updating. Anaconda will try to support old stuff, to a point. Again, they will likely really want users to update conda and pip before supporting them.
