Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make BigARTM great again! #847

Open
3 of 8 tasks
MelLain opened this issue Oct 21, 2017 · 12 comments
Open
3 of 8 tasks

Make BigARTM great again! #847

MelLain opened this issue Oct 21, 2017 · 12 comments

Comments

@MelLain
Copy link
Contributor

MelLain commented Oct 21, 2017

After some discussions we decided to formulate several things we should do before releasing the 1.0 version of the library. Here I'll place a roadmap of changes in BigARTM deploying and distribution (in importance decreasing order):

  • test wheel package created by @ofrei and fix any troubles in it's local installation and usage.

  • create the same package for all popular *nix-based platforms including newes Ubuntu distribution and OS X.

  • place these packages in some our repo and write an installation instruction.

  • put our packages in some official repo and tune autodeploy into it (as we deploy windows packages using appveyor). Test it with different platforms.

  • deal with conda in some way, maybe by using channels (custom repo paths) for installation. Test and create instruction.

  • update our docker

  • remove all third-party dependencies from library (protobuf and glog) (e.g. Building against system libraries #811, Use system protobuf library by default #420).

  • try to include BigARTM into kaggle kernel and anaconda (what if?:) )

@ofrei
Copy link
Contributor

ofrei commented Oct 21, 2017

@MelLain Great! Excellent plan!

Perhaps add "improve documentation" to this list?

create the same package for all popular *nix-based platforms

only one manylinux image https://github.com/pypa/manylinux should cover most popular *nix platforms. (actually, you still need few - one for each 3.X version of python, and two for each 2.X version,
the cp2Xmu and cp2Xm -see below; but at least this is not one-package-per-each-operating-system).

python_manylinux_demo-1.0-cp26-cp26m-manylinux1_x86_64.whl
python_manylinux_demo-1.0-cp26-cp26mu-manylinux1_x86_64.whl
python_manylinux_demo-1.0-cp27-cp27m-manylinux1_x86_64.whl
python_manylinux_demo-1.0-cp27-cp27mu-manylinux1_x86_64.whl
python_manylinux_demo-1.0-cp33-cp33m-manylinux1_x86_64.whl
python_manylinux_demo-1.0-cp34-cp34m-manylinux1_x86_64.whl
python_manylinux_demo-1.0-cp35-cp35m-manylinux1_x86_64.whl
python_manylinux_demo-1.0-cp36-cp36m-manylinux1_x86_64.whl

Perhaps OK to limit ourselves to x64 systems (we droped support for 32 bit).
Perhaps OK to only support 27, 35, 36 pythons.

Ongoing work / issues for manylinux wheel:
#841
#840
https://github.com/bigartm/bigartm-docker/tree/manylinux

In addition to manylinux one needs one wheel for windows and for osx.

The following things have, in my opinion, very low priority if bigarmt can be installed via a wheel:

try to include BigARTM into kaggle kernel and anaconda (what if?:) )
update our docker
deal with conda in some way

@ofrei
Copy link
Contributor

ofrei commented Oct 21, 2017

remove all third-party dependencies from library (protobuf and glog)

I don't see why we should avoid these dependencies. For native code protobuf and glog are linked statically into libartm.so - once we release python wheel people won't even notice we depend on protobuf and glog. For python code there is only protobuf, which is officially released to pipy. So dependency on protobuf is as OK as dependency on numpy.

@JeanPaulShapo
Copy link
Contributor

remove all third-party dependencies from library (protobuf and glog)

I don't see why we should avoid these dependencies

@JeanPaulShapo
Copy link
Contributor

Moreover, there is already no need in patched protobuf code, because they raise message limit to 2GB, starting from 3.2.0 version.

@ofrei
Copy link
Contributor

ofrei commented Oct 21, 2017

@JeanPaulShapo

There is some mess when users have system-wide installed glog and protobuf

Yes but it won't affect as many uses as today after we stop ask everyone to build bigartm. Most users will get pre-build binary linked statically with glog & protobuf.

Generally it will be nice to have a build feature that asks bigartm to use system-wide installed. But only for linux (perhaps also mac). And for windows it is much easier to have this 3rdparties together with bigartm code.

@kuraga
Copy link

kuraga commented Nov 19, 2017

Generally it will be nice to have a build feature that asks bigartm to use system-wide installed. But only for linux (perhaps also mac). And for windows it is much easier to have this 3rdparties together with bigartm code.

👍

@ofrei And what about to move these dependencies in their own repositories (nevertheless under https://github.com/bigartm organization), because: it's great to update dependencies (well, hm, their code)? Ultimately we'll store them as submodules. (And it won't affect users 😄 )

UPD: And about Windows:

And for windows it is much easier to have this 3rdparties together with bigartm code.

Ok, but which differences between Boost and Protobuf/Glog are here? We don't store Boost inside...

@ankane
Copy link

ankane commented Feb 21, 2020

Was looking to add bigartm to Homebrew to make installation quicker and easier on Mac, but it'll need to use system libraries to be accepted. It'd be great if cmake would check before compiling the ones in the 3rdparty directory. Here's an example of how to do that: https://github.com/Xtra-Computing/thundersvm/blob/8ac8ec344893219f4ccb1307219c00cc94837eae/CMakeLists.txt#L41-L49

Formula: Homebrew/homebrew-core@master...ankane:bigartm

@bt2901
Copy link
Contributor

bt2901 commented Aug 12, 2020

Thanks for your interest, @ankane!

The way forward seems to distribute BigARTM with pip instead (building the library in a cross-platform way and bundling all external dependencies in .whl file). There's some progress on this using cibuildwheel and Travis (see #1008 ).

Currently, we have a provisional MacOSX distribution: pip install -i https://test.pypi.org/simple/ bigartm (however, there could be problems related to mkl and conda: #1022 ). Would you mind testing it?

@ankane
Copy link

ankane commented Aug 12, 2020

Hey @bt2901, I think that makes sense for the Python library, but it's probably better to use a system package manager (like apt or Homebrew) or a C++ package manager (like vcpkg or Conan) for the C++ library.

@bt2901
Copy link
Contributor

bt2901 commented Aug 12, 2020

Is there a place where I could see pros and cons? As far as I know, the current setup has the following problems:

  1. pip does not install CLI utility (which is needed for building co-occurrences dictionary). This could be fixed in some other ways, however

  2. It does not solve the development hurdles. If I want to change something in the core C++ code, then I need to compile from source, and this remains difficult.

  3. We miss potential performance gains related to compilation targeting a specific platform/architecture/system, but I have no idea how significant they could be (@ofrei @MelLain @JeanPaulShapo -- do you know anything about this?).

  4. There could be some issues with dependencies (replacing/shadowing/conflicting with system protobuf/boost/glog), but I do not know for sure.

Does using a package manager address something from this list? Maybe some other advantages (better anaconda support maybe?)

@ankane
Copy link

ankane commented Aug 13, 2020

I'm not aware of any write-ups on the topic, but it's a common way to distribute C++ libraries and CLI programs.

You could use it for 1. 2 isn't a distribution issue. For 3, it depends on the package manager. Package managers that precompile libraries like Homebrew optimize for convenience/install speed/portability rather than performance. For 4, you'd want the library use dependencies from the package manager.

However, I'm only suggesting this for the standalone C++ library.

@ankane
Copy link

ankane commented Sep 18, 2020

fwiw, I was able to get things working with Homebrew, which would allow Mac users to install the CLI and shared library with:

brew install bigartm

However, Homebrew won't accept versions with a pre-release tag on GitHub. If there's any interest in this and someone can remove the pre-release tag from the latest release, I can go ahead and submit it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants