Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling pandas dependency #34

Closed
BFedder opened this issue Jun 20, 2022 · 9 comments
Closed

Handling pandas dependency #34

BFedder opened this issue Jun 20, 2022 · 9 comments

Comments

@BFedder
Copy link
Collaborator

BFedder commented Jun 20, 2022

Opening this here as a place to focus the discussion.

Essentially, MDAnalysis needs a version of panedr that does not require pandas. However, a version of panedr that works as previously, out of the box and with pandas support, is needed as well. Because of this, my current solution of making pandas an optional requirement (as seen in the current state of #33) will not work.

In my current solution, pip install -e . installs panedr, but not pandas. pip install -e .[pandas] installs both and thus provides the full set of features of panedr.

Quoting @jbarnoud here:

Instead, I would create 2 packages: the default one that depends on pandas and a "lite" one for downstream integrators who want to minimise dependencies.

I am not sure how best to do this, but I looked at the documentation for setuptools a bit. I think one way to meet the above requirements while avoiding code duplication would be the following:

  • create new Python package "panedr-lite" or similar
  • "panedr-lite" is almost identical to panedr as it is now, but is missing the edr_to_df() function.
  • pandas is an optional dependency of "panedr-lite" and can be installed with the package with pip install panedr-lite[pandas]
  • panedr now will only hold the user facing functions
  • panedr has a dependency of panedr-lite[pandas] (based on this example)
  • installing panedr would then automatically install panedr-lite and pandas and provide the user with the same functions as now
  • There would be no code duplication between panedr-lite and panedr

I am not very well-versed in Python package management, so please let me know if this approach makes sense and/or if there are better ways to do this.

@jbarnoud
Copy link
Collaborator

It is indeed basically what I have in mind. Just a thing : "panedr now will only hold the user facing functions" if panedr is an empty package that depends on the lite package and pandas, then it does not have to hold any code. It will just make sure you can do "import panedr" and have pandas.

@orbeckst
Copy link
Member

"panedr-lite" is almost identical to panedr as it is now, but is missing the edr_to_df() function.

I think panedr-lite should also contain edr_to_df() so that panedr can be a package that only adds the additional dependency on pandas. You can define the function depending on the presence of pandas

try:
   import pandas as pd
except ImportError:
   pass
else:
   def edr_to_df(...):
      ...
      return pd.DataFrame(...)

@hmacdope
Copy link
Member

Do we need to set up the repo for panedr-lite? as part of MDA org @orbeckst? Are we waiting for #33? Just trying to figure out what our next steps are.

@orbeckst
Copy link
Member

I don't think that we need a separate repository. From the discussion I assumed that the full and the lite package ship the same code and just the setup.py that sets the dependencies (or whatever mechanism is used for that) will differ. But I am not 100% sure how to do all of this elegantly.

@jbarnoud
Copy link
Collaborator

Indeed, the idea is to have only one code base in one single repository. The setup.py file at the root would be for the lite package; the non-lite package setup.py would go in a separate directory in the same repository. The non-lite, panedr, has no code attached: it is an empty package with only dependencies. A metapackage.

I never did that with pip. I did it with conda, though: https://gitlab.com/intangiblerealities/narupa-protocol/-/tree/master/python-libraries/narupa-server. The architecture of that example repo is odd, so the idea is not to mimick it, just to demonstrate the general idea of a metapackage.

@BFedder
Copy link
Collaborator Author

BFedder commented Jun 25, 2022

I have searched around for a way to have panedr be a subpackage of panedr-lite and came across setuptools-build-subpackage. I was able to build a wheel of the subpackage, but something is not quite right still as I can't import the package after pip installing the wheel. I'll try to get this to work, but could you please tell me if this seems like a good way to do this in the first place?

Alternatively, could we treat panedr and panedr-lite like MDAnalysis and MDAnalysisTests, and have them be neighbouring packages rather than sub-/superpackages?

@jbarnoud
Copy link
Collaborator

I think you are trying to do something too complex. You are not trying to do a subpackage, you need to create an empty package in a directory. See https://stackoverflow.com/questions/59726256/how-can-i-create-an-empty-python-package for instance.

@BFedder
Copy link
Collaborator Author

BFedder commented Jun 26, 2022

Thanks!

@BFedder
Copy link
Collaborator Author

BFedder commented Jul 6, 2022

With the merging of #42, this is now done.

@BFedder BFedder closed this as completed Jul 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants