Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relaxed Dependencies #13

Open
shinybrar opened this issue Mar 8, 2021 · 14 comments
Open

Relaxed Dependencies #13

shinybrar opened this issue Mar 8, 2021 · 14 comments
Assignees
Labels
enhancement New feature or request question Further information is requested

Comments

@shinybrar
Copy link

Issues

With ch_util supporting a wide myriad of utilities under the hood, downstream packages are currently required to build and install complicated dependencies by default even though they might not be required. An example of this could be mpi4py which is seldom used in developer environment. This proposal recommends to split the dependencies of ch_util by purpose under extras_require in order to considerably relax base requirements.

Additionally, dependency resolution systems enabled through PEP 517 such as flit, poetry and pipx try and evaluate requirements for ch_util,it nominally includes packages under extra_installs. This process errors out, since chimedb_config package is closed sourced. As a result, currently its not possible to use ch_util with these modern build systems.

Resolutions for:

Private Extra Installs

  1. Since chimedb_config will never be made open source, we can either remove it from the setup.py and raise an import error, if a function which requires it is called.
  2. We can give selective access to the CHIME/FRB personnel as a workaround.

Dependency Isolation

Example Scenario

If a developer wants access to the ch_util.tools.Antenna object, they are required to install bitshuffle, which requires a hard linked installation of hdf5. This additional step adds ~3-5 minutes of compile time.

Possible Split

Extra Dependencies
chime_io [bitshuffle, h5py]
chime_db [chimedb @ git+https://github.com/chime-experiment/chimedb.git, chimedb.data_index @ git+https://github.com/chime-experiment/chimedb_di.git, chimedb.dataflag @ git+https://github.com/chime-experiment/chimedb_dataflag.git]
chime_mpi [mpi4py]
all [*]

Additionally, I am happy to work on these issues, if the project team is interested in implementing them.

@shinybrar shinybrar self-assigned this Mar 8, 2021
@shinybrar shinybrar added enhancement New feature or request question Further information is requested labels Mar 8, 2021
@shinybrar shinybrar changed the title Complex Dependencies Relaxed Dependencies Mar 8, 2021
@anjakefala
Copy link
Contributor

anjakefala commented Mar 8, 2021

This proposal recommends to split the dependencies of ch_util by purpose under extras_require in order to considerably relax base requirements.

I do want to bring up, when considering this issue, the last time I checked extras_require does not play well with the new pip resolver.

Edit: This was potentially fixed here: pypa/pip#8291

The previous occurrence of a related conversation: #8

@shinybrar
Copy link
Author

I was not aware of the discussion in #8 , but if relaxing the dependencies is not ameneable, would it be okay to fork and change ch_utils under the chimefrb project?

@jrs65
Copy link
Contributor

jrs65 commented Mar 9, 2021

@shinybrar it seems like there are two (and maybe only two?) issues in here:

  1. bitshuffle is hard to install because it requires an h5py source build (although I think @nritsche thinks if you don't actually care about it working you don't need an h5py source build, a wheel is fine)
  2. chimedb_config is an issue as many FRB folks don't have access to it

I'd much rather solve these directly than split up ch_util into sets of optional dependencies. As I discussed in #8, we use a successful installation of ch_util as somewhat of a guarantee that you have everything you need to interact with CHIME data, and I'd rather not make that conditional.

There are some good avenues for fixing this:

  • For bitshuffle there exists a patch which should mean it doesn't require an h5py source build to use as it loads the HDF5 libraries dynamically (Loading libhdf5 dynamically from bitshuffle.h5 kiyo-masui/bitshuffle#81). This should allow us to build binary wheels of bitshuffle (and push them to PyPI) so installation should be pretty instant. We can put some more effort into that side as we're supposed to be taking on more maintenance of bitshuffle.
  • For chimedb_config there is no reason that FRB folks can't have access. We have an "frb" team in chime-experiment exactly for this purpose, all we need is for people to send their Github IDs over to us to they can be added. We can probably also make someone on the FRB side an admin for the org such that we can even be skipped in that process. I'd also be happy if we could figure out a way of turning off the chimedb_config dependency at install time, i.e. installed by default, by that can be changed with a command line switch.

I'm tagging @danielemichilli and @leungcalvin in here as they may have opinions on this one.

Also, I'd really rather avoid a fork if we can.

@shinybrar
Copy link
Author

I think the suggestion would be to change, the current installation of ch_util with an additional [all] moniker to maintain the exact same functionality offered today.

>>> pip install "ch_util[all] @ git+https://github.com/chime-experiment/ch_util.git#master"

>>> pip install ch_util[all]

Currently, bitshuffle is an issue, but additionally installing ch_util on other scientific clusters, e.g. scinet has also been problematic due to dependencies, and access to private repositories. chimedb_config is the core of that issue, and I am not sure how to proceed here, because even if we provide widespread access, someone will misuse or mistakenly not follow the correct security protocols, which could also lead to a new problems.

Currently, there exists a fork of ch_util on chimefrb/ch_util -- If it works for the chime-experiment team, we can maintain that flavour with optional dependencies and equal to the the current master in terms of commits.

@leungcalvin
Copy link

leungcalvin commented Mar 10, 2021

I have been tinkering on SciNet and I can confirm that installing ch_util on SciNet for the purpose of FRB + Outriggers is a singular pain. I have also installed ch_util on Cedar, on DRAO site, and various server nodes where things have been pretty darn smooth. There's one caveat though: I haven't been using chimedb_config; on all these other sites I use an interface to get_correlator_inputs that Shiny and Chitrang have implemented for CHIME/FRB + Outriggers. It works really well on those other sites, and in a world where everybody on Outriggers has their own preferred compute environment, I think it might be less hassle for ch_util to support accessing correlator inputs via two independent interfaces (chimedb_config and the new API) than to designate a contact person to manage access to chimedb_config. I don't know how else chimedb_config is used on the CHIME/Cosmology side, but for FRB + Outriggers I believe supporting the API would eliminate the dependency on chimedb_config for those who need to deploy at all these random sites (for FRB + Outriggers), eliminating the need for Richard's proposed dependency switch/contact person, and reduces the surface area for security vulnerabilities.

I'm gonna punt on bitshuffle, but I don't remember having problems with it on sites besides scinet.

Re: forking...I don't really care either way and I don't think the Outriggers science will be really affected, as long as we use a branch like gbo which is refactored for non-CHIME telescopes!

@jrs65
Copy link
Contributor

jrs65 commented Mar 10, 2021

I have been tinkering on SciNet and I can confirm that installing ch_util on SciNet for the purpose of FRB + Outriggers is a singular pain

Can you elaborate on that? Is it just bitshuffle that's the problem there? Details on this would be useful as I'd really rather fix the actual problems than put in these workarounds.

@mondana
Copy link
Contributor

mondana commented Mar 10, 2021

I wonder how get_correlator_inputs works. Is it just because it was introduced after everything was built and stable? It will be a while to capture the connectivity in outrigger sites in the database and correct all the cable swaps etc, and if get_correlator_inputs is not working with the database, then it would be a pain. unless I am wrongly interpreting what get_... does.

@jrs65
Copy link
Contributor

jrs65 commented Mar 10, 2021

Generally my attitude is this...

I really would rather avoid a fork if we can. Even the soft fork you're proposing will eventually turn into a hard fork as people start committing changes to it rather than upstream, and then at somepoint we'll change something which breaks your interfacing, and you won't want to fix it, etc. So probably the priority would be to avoid this.

I'd also like to avoid changing the dependencies to optional ones. I appreciate it sounds like a simple change to just say now you just install ch_util[all] instead, but it will require identifying a large number of places this installation is done (all CI builds, pipeline scripts, personal installations, ansible plays) and figuring out what they need and fixing them all; and it's also going to generate a bunch of support headache in the future when people install ch_util without the set of dependencies that they need and have obscure failures that they need help with.

Because of that I'd really like to explore other options. There's two I can think of:

  1. Identify what the actual pain points are, and make them easier. So far no one has mentioned anything beyond bitshuffle and chimedb_config installation. I believe both of these are soluble. @shinybrar @leungcalvin is there a good reason why fixing these wouldn't be sufficient?

  2. As suggested by @nritsche, split up ch_util into separate packages (i.e. io related code, db related code etc), and have ch_util depend on them and assemble them into something like the current API. That would mean we would continue to require an installation of ch_util, but you would be able to just pull in the parts you need.

As indicated above, I think I have a preference for 1, but it needs to be a path that everyone thinks would work.

@shinybrar
Copy link
Author

I think the current pains currently are, but not limited to:

  1. Extremely large build times, for bitshuffle which, as stated by @jrs65 can be alleviated via the work down by @nritsche
  2. Large number of dependencies required by ch_util. when most of the non-cosmology collaborators are using one maybe two functions from repository. This makes the dependency resolution process extremely lengthy.
  3. chimedb_config being a private dependency. This breaks a lot of installations and setup configurations, especially in non-chime, non-sudo environments.

I think the proposed solution 1 works, if:

  1. We hide chimedb_config under a ImportError exception rather than proliferating access. I fear, if we give chimedb_config to everyone, it will cause issues, and even then it does not work off-site.
  2. Merge the current work done by @nritsche in bitshuffle such that we no longer need a lengthy installation process.

I understand and relate to @jrs65 preference to limit hard forks.

@mondana
get_correlator_inputs is a separate backend which actually runs the latest and greatest version ofch_util. It essentially calls ch_util functions and makes them available over HTTP API. So any changes made to ch_util for outriggers are will not break it.

@nritsche
Copy link
Contributor

nritsche commented Mar 10, 2021

Merge the current work done by @nritsche in bitshuffle such that we no longer need a lengthy installation process.

I want to make clear that this is not my work but a PR from someone from ESRF, Grenoble (kiyo-masui/bitshuffle#81).

@jrs65
Copy link
Contributor

jrs65 commented Mar 10, 2021

@shinybrar okay great.

I think we can try and thin down the dependencies a bit too. Looking at the requirements.txt I think we can remove a few things:

  • mpi4py can be removed. It's not an actual dependency of ch_util.
  • matplotlib doesn't need to be a dependency (although you probably have it installed anyway so it's moot)
  • A bunch of things (numpy, scipy, h5py, skyfield) are already dependencies of caput, so you'll be forced to install those anyway, as I think you also need caput for things.
  • Hidden in the chimedb stuff is mysqlclient which I think can be a pain to install. There is a pure Python MySQL client we could switch to I think provided we tweak the peewee interfacing and so that would become easier.
  • The other things in there like networkx and peewee are generally very easy to install.

@jrs65
Copy link
Contributor

jrs65 commented Mar 10, 2021

I want to make clear that this is not my work but a PR from someone from ESRF, Grenoble (kiyo-masui/bitshuffle#81).

@nritsche can I ask you to take a look at that PR and test it and see if it does what we want (i.e. allow using bitshuffle without install h5py from source)?

@shinybrar
Copy link
Author

@jrs65 @nritsche Has there been any time devoted on implementing the changes on this issue, or is it on the back burner for now?

@jrs65
Copy link
Contributor

jrs65 commented Jun 23, 2021

@shinybrar this is making progress now. We're just about to get the code merged into bitshuffle (kiyo-masui/bitshuffle#81) which means that we won't need to build h5py from source to use bitshuffle. Next stop is to build a binary wheel of bitshuffle, so installing that part of the chain should be more or less instantaneous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

6 participants