Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge requirements footprint #31

Closed
petiepooo opened this issue Oct 18, 2017 · 10 comments
Closed

Huge requirements footprint #31

petiepooo opened this issue Oct 18, 2017 · 10 comments
Assignees

Comments

@petiepooo
Copy link

If I don't have Cython, scipy, or spark installed, trying to install bat will attempt to install those as well. All I want to do is pull a brolog into a dictionary. Can we reduce the requirements footprint such that most dependencies are optional, and attempting to use or import a feature that requires one of them causes an exception explaining the need for it?

@petiepooo
Copy link
Author

Also, I don't believe bat should require bat. That's recursive. If I bumped the version on a fork, some versions of pip would try to install your version as a requirement of mine until I took that out.

@brifordwylie
Copy link
Member

@petiepooo Thanks for the feedback and bug finding. :)

  • Having bat in the setup.py is just a silly bug, thanks for finding that.
  • I agree the large set of requirements is suboptimal, in general I wanted folks to be to just install bat and then run all the stuff, but as you suggested perhaps we can minimize the libraries that get installed for the 'base' install 'pip install bat' and then push most of them to 'pip install bat[all]'.

I'll fix both of these issues right away and put up another PyPI release, thanks again.

@petiepooo
Copy link
Author

You rock! I've forked for now so I can get going on my setup..

@brifordwylie
Copy link
Member

brifordwylie commented Oct 18, 2017

@petiepooo so after some experimenting with this. I'm unsure I want to remove all the dependencies. The point of the project is to provide 'bridges' to Pandas, Scikit-Learn, Spark, etc. Without those dependencies basically nothing works and for most folks that will be a terrible 'out of box' experience.

I investigated a 'pip install bat[min]' option but there's no way to remove dependencies like that.. you can only add.

I've already removed the errant 'bat' dependency, and I've made sure that the wheel is pushed to pypi. So.. shrug.. you're probably savvy enough to just pull the few files you need and set your PYTHONPATH to those files... sorry to not get a great solution here.

@brifordwylie brifordwylie removed the bug label Oct 18, 2017
@petiepooo
Copy link
Author

I'd urge you to reconsider. People that want a bridge to scikit know to install scikit, or would the moment they get an import error trying to load the bat module that uses it. Same with spark or cython. By making a bridge to all those advanced components, then including them in the install dependencies, you're telling users they must have everything installed even if they only need bat for one small thing. That's not what install dependencies were meant for.

@brifordwylie
Copy link
Member

The point is well taken, I'll definitely give it further consideration. Thanks for the suggestion, I'll keep this issue open. Perhaps others might chime in...

@swedishmike
Copy link
Contributor

swedishmike commented Oct 19, 2017

Just to add my £0.02...

I agree with @brifordwylie in this question. If I am to run Bat I sort of expect to get these dependencies installed. My understanding is that is was made for heavier data wrangling and ML from the outset.

However, the idea of a BtD (Bro to Dictionary) utility is quite interesting for a more lightweight approach.

As I see things I think that should be separate though. As I see it 'pip install bat' or 'pip install btd'[1] could be the two options whether or not you want to go full hog.

Cheers, Mike

[1] Just an example - have not checked if that already exists etc.

@brifordwylie
Copy link
Member

brifordwylie commented Oct 19, 2017

@swedishmike yeah I was thinking the same thing.. make a no dependency version that just has the log reader/tailer... and that's about it. I made this as an experiment https://pypi.python.org/pypi/bat_min... so at a minimum (see what I did there?) I'll finish this up and then if folks just want the core log reader they can just do

$ pip install bat-min

Actually, like Mike suggested, perhaps a totally different name like 'bro2p' or something...

@petiepooo
Copy link
Author

It looks like I can also use "pip install --no-deps bat", but then have to manually install the dependencies I'd actually use, like watchdog, requests, etc. On the plus side, when I am missing a module, the ImportError exception makes it very clear what is missing.
Your monkey; your circus; your decision. I will leave this discussion with a link to the python.org's recommendation for how to handle dependencies:
https://packaging.python.org/discussions/install-requires-vs-requirements/
This seems to indicate that the install_requires keyword should be a minimal set of packages, and the full set of modules as described by @swedishmike might be better placed in a requirements.txt file.
Thanks for writing and releasing this package! It's quite useful.

@brifordwylie
Copy link
Member

brifordwylie commented Nov 6, 2017

@petiepooo sounds like you have a good solution of just running pip with the --no-deps option. As you mentioned people should be able to install BAT without all the dependencies, so we should support basically two use cases.

  1. install no packages, people can run Bro log reader/tailer and then install additional dependencies when/if they need them.
  2. install a bunch of packages and all the scripts/tests/example/notebooks just work

There are two approaches to supporting both of these use cases.
pip install --no-deps bat (to support 1) and pip install bat (to support 2)
OR
We rip out all the packages in setup.py and put them into requirements.txt (as you suggest)
pip install bat (to support 1) and pip install -r bat (to support 2)

Using the --no-deps option seems good to me, it's explicit. Also if we rip out all the package most 'non expert' python folks they will simply do a 'pip install bat' and then wonder why the tests don't run, and none of the examples run, and they will be like '..this project is broken...' and quit. Where as a 'python expert' (like yourself) will say hey I don't want all the dependencies.. I'll just run the --no-deps option.

So by keeping things as they are, you (the expert) can easily figure out how to do what you want and then non-expert just does the 'pip install bat' and everything works as expected.

I'll make a new section in the docs explaining how to install without any other packages because I believe that you are right there will be a set of people who want nothing else installed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants