Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: add pyserini #14474

Open
wants to merge 38 commits into
base: main
Choose a base branch
from
Open

WIP: add pyserini #14474

wants to merge 38 commits into from

Conversation

h-vetinari
Copy link
Member

@h-vetinari h-vetinari commented Apr 6, 2021

Checklist

  • Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml".
  • License file is packaged (see here for an example).
  • Source is from official source.
  • Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged).
  • If static libraries are linked in, the license of the static library is packaged.
  • Build number is 0.
  • A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there.
  • When in trouble, please check our knowledge base documentation before pinging a team.

I'm maintaining the feedstock for faiss, and saw a link on the feedstock tracker to castorini/pyserini#426, where there was interest in packaging pyserini.

This will probably need some more work for packaging for building anserini in conda-forge as well, rather than depending on the embedded jar.

Not sure if there are agreed-upon best practices for java-dependencies; the knowledge base search doesn't return something meaningful for "java" or "jdk".

CC @lintool @MXueguang for your info; I'd be more than happy to have you join as recipe maintainers. :)

PS. The upstream repo does not have a license file, but there is a badge pointing to Apache-2.0. on the readme-page. Fixed

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/pyserini) and found some lint.

Here's what I've got...

For recipes/pyserini:

  • license_file entry is missing, but is required.

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/pyserini) and found it was in an excellent condition.

@h-vetinari
Copy link
Member Author

Dependency/recipe updates from a very quick reaction upstream.

Also, maintainer additions:
@lintool: confirmation
@MXueguang: confirmation

License was also solved extremely quickly: castorini/pyserini#462

run:
- python
- cython >=0.29.21
- faiss >=1.6.5
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MXueguang
Note, this is intentionally not faiss-cpu; conda-forge has a different naming scheme than the pytorch channel, which makes it possible for this recipe to depend on faiss independently of the CPU or GPU version (faiss-cpu & faiss-gpu are still provided for compatibility...).

Comment on lines 31 to 32
# blocked on https://github.com/conda-forge/tensorflow-feedstock/pull/110
# - tensorflow >=2.3
Copy link
Member Author

@h-vetinari h-vetinari Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That PR for tensorflow has progressed very far now, and hopefully we'll have tensorflow builds on conda-forge in 1-2 weeks. Long-term, there are still some CI infra improvements necessary to consistently pytorch/tensorflow in conda-forge without too much manual intervention (currently running into the 6h azure timeout, not to mention not having GPUs in CI).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h-vetinari I don't think we need TensorFlow for pyserini. Let's just remove the dependency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's in requirements.txt upstream, so I wouldn't know if that dependency is somehow optional... 😅

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not required. I'll double-check and remove it soon.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, cool.

By the same token, does pytorch need to be added to requirements.txt? At least you mentioned it in your recipe suggestion.

Note, it's also possible to build several different outputs (e.g. pyserini-core vs. pyserini-tensorflow vs. pyserini-pytorch vs. pyserini-all), in case that's interesting to you. Generally, packages should have the minimum amount of required dependencies, but if there's a good argument (usually optional code-path in the package) for certain dependencies, it can make sense to add more variants (pip/PyPI also support this with the optional package[extras] syntax).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we dont have to add pytorch to requirements. pytorch is usually installed by following guidance on official website I think?
But for conda forge we can try to contain everything

@h-vetinari
Copy link
Member Author

@conda-forge/staged-recipes, could someone please advise how to deal with the packaged anserini-jar? Is that OK, or should it be built it in conda-forge?

@xhochy
Copy link
Member

xhochy commented Apr 9, 2021

You will need to make sure to include the licenses of all dependencies in the conda package. That is something the build system (probably maven?) can do easily for you. Otherwise, I would see it as important to build the JAR inside of conda-forge if you would have compiled code (not Java bytecode) in it. If there is no compiled code, I would not see great benefits of rebuilding it here personally.

@h-vetinari
Copy link
Member Author

Thanks for the response @xhochy!

@lintool @MXueguang, could you tell us what's in the anserini-jar, i.e. what is (or isn't) compiled into it?

@lintool
Copy link

lintool commented Apr 9, 2021

I'll let @MXueguang chime in here, but the Anserini fat jar can be downloaded from Maven Central: https://search.maven.org/artifact/io.anserini/anserini

There, we can see it's clearly Apache 2 licensed also.

@MXueguang
Copy link

@h-vetinari
I think there is some compiled code.
I am not sure, but I think for the first version of Pyserini on conda-forge, we can try to keep things simple and just deal with Pyserini first? and see how it works?

@h-vetinari
Copy link
Member Author

h-vetinari commented Nov 20, 2021

Hey, sorry for the long silence. I originally thought we'd have to compile the java, but now I agree that it's fine to just move on for the time being with the jar that you have already.

In the meantime, the requirements have moved on quite a bit, and I'll have to fix a few things (try to package torchaudio, update nmslib). Of course, we could first create the feedstock for 0.11.0.0 and then work our way forward slowly.

@h-vetinari
Copy link
Member Author

@h-vetinari: I originally thought we'd have to compile the java, but now I agree that it's fine to just move on for the time being with the jar that you have already.

Well, never mind. I burrowed down into the stack a bit
pyserini -> anserini -> anserini-tools -> trec_eval, and now the following becomes relevant...

@xhochy: Otherwise, I would see it as important to build the JAR inside of conda-forge if you would have compiled code (not Java bytecode) in it.

because trec_eval definitely contains compiled (C) code.

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/anserini, recipes/pyserini, recipes/trec_eval) and found some lint.

Here's what I've got...

For recipes/anserini:

  • noarch: python recipes are required to have a lower bound on the python version. Typically this means putting python >=3.6 in both host and run but you should check upstream for the package's Python compatibility.

For recipes/trec_eval:

  • The recipe license cannot be unknown.

For recipes/trec_eval:

  • License is not an SPDX identifier (or a custom LicenseRef) nor an SPDX license expression.

Documentation on acceptable licenses can be found here.

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/anserini, recipes/pyserini) and found some lint.

Here's what I've got...

For recipes/anserini:

  • noarch: python recipes are required to have a lower bound on the python version. Typically this means putting python >=3.6 in both host and run but you should check upstream for the package's Python compatibility.

@h-vetinari
Copy link
Member Author

It's been so long, I'm not even sure anymore why I thought that we needed torchaudio. It's installed in the setup upstream, but it's not in the requirements anymore. Let's see how it goes without it.

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/anserini, recipes/pyserini) and found it was in an excellent condition.

@h-vetinari
Copy link
Member Author

h-vetinari commented Aug 4, 2023

OK, need to move anserini.jar to $PREFIX/share/anserini or something (compare pyjnius), and then patch pyserini to look for it there instead of $SP_DIR/pyserini/resources/jars.

Or perhaps nicer, just symlink it (but that brings all sorts of complications in rights-restricted environments, see how much was necessary to get this right in arrow).

@conda-forge conda-forge deleted a comment from conda-forge-webservices bot Aug 4, 2023
@conda-forge conda-forge deleted a comment from conda-forge-webservices bot Aug 4, 2023
@conda-forge conda-forge deleted a comment from conda-forge-webservices bot Aug 4, 2023
Copy link

stale bot commented Jan 1, 2024

Hi friend!

We really, really, really appreciate that you have taken the time to make a PR on conda-forge/staged-recipes! conda-forge only exists because people like you donate their time to build and maintain conda recipes for use by the community.

In an effort to maintain this repository and increase the signal-to-noise for open PRs, the maintainers of staged-recipes close excessively old PRs after six months. This PR will remain open for another month, and then will be closed.

If you'd like to keep it open, please comment/push and we will be happy to oblige! Note that very old PRs will likely need to be rebased on main so that they can be rebuilt with the most recent CI scripts. If you have any trouble, or we missed reviewing this PR in the first place (sorry!), feel free to ping the team using a special command in a comment on the PR to get the attention of the staged-recipes team.

Cheers and thank you for contributing to this community effort!

@stale stale bot added the stale will be closed in 30 days label Jan 1, 2024
@h-vetinari
Copy link
Member Author

Not stale

@stale stale bot removed the stale will be closed in 30 days label Jan 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants