WIP: add pyserini #14474
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR ( Here's what I've got... For recipes/pyserini:
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (
Dependency/recipe updates from a very quick reaction upstream. Also, maintainer additions: License was also solved extremely quickly: castorini/pyserini#462
recipes/pyserini/meta.yaml
Outdated
run:
  - python
  - cython >=0.29.21
  - faiss >=1.6.5
@MXueguang Note, this is intentionally not `faiss-cpu`; conda-forge has a different naming scheme than the pytorch channel, which makes it possible for this recipe to depend on faiss independently of the CPU or GPU version (`faiss-cpu` & `faiss-gpu` are still provided for compatibility...).
recipes/pyserini/meta.yaml
Outdated
# blocked on https://github.com/conda-forge/tensorflow-feedstock/pull/110
# - tensorflow >=2.3
That PR for tensorflow has progressed very far now, and hopefully we'll have tensorflow builds on conda-forge in 1-2 weeks. Long-term, some CI infra improvements are still necessary to consistently build pytorch/tensorflow in conda-forge without too much manual intervention (currently running into the 6h Azure timeout, not to mention not having GPUs in CI).
@h-vetinari I don't think we need TensorFlow for pyserini. Let's just remove the dependency?
Well, it's in `requirements.txt` upstream, so I wouldn't know if that dependency is somehow optional... 😅
It is not required. I'll double-check and remove it soon.
removed
OK, cool.
By the same token, does `pytorch` need to be added to `requirements.txt`? At least you mentioned it in your recipe suggestion.
Note, it's also possible to build several different outputs (e.g. `pyserini-core` vs. `pyserini-tensorflow` vs. `pyserini-pytorch` vs. `pyserini-all`), in case that's interesting to you. Generally, packages should have the minimum amount of required dependencies, but if there's a good argument (usually an optional code path in the package) for certain dependencies, it can make sense to add more variants (pip/PyPI also support this with the optional `package[extras]` syntax).
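To make the idea of several outputs concrete, here is a minimal sketch of how the dependency sets of such variants could relate to each other. The core list reuses the `run` requirements quoted earlier in this review; the split into optional sets, the bare `pytorch` entry, and the helper `run_requirements` are assumptions for illustration only, not anything pyserini or conda-build defines.

```python
from typing import Dict, List

# Core run requirements, taken from the recipe lines quoted in this review.
CORE_DEPS: List[str] = ["python", "cython >=0.29.21", "faiss >=1.6.5"]

# Hypothetical optional dependency sets, one per variant named above.
OPTIONAL_DEPS: Dict[str, List[str]] = {
    "pyserini-tensorflow": ["tensorflow >=2.3"],
    "pyserini-pytorch": ["pytorch"],
}


def run_requirements(output: str) -> List[str]:
    """Return the run requirements for one hypothetical output."""
    if output == "pyserini-core":
        return list(CORE_DEPS)
    if output == "pyserini-all":
        # "all" is the core set plus every optional set.
        extra = [dep for deps in OPTIONAL_DEPS.values() for dep in deps]
        return CORE_DEPS + extra
    if output in OPTIONAL_DEPS:
        return CORE_DEPS + OPTIONAL_DEPS[output]
    raise ValueError(f"unknown output: {output}")
```

In a real recipe, each of these lists would end up under the `requirements: run:` section of the corresponding entry in `outputs`; on the pip side, the same split would map to `install_requires` plus `extras_require`.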
I feel we don't have to add pytorch to the requirements. I think pytorch is usually installed by following the guidance on the official website?
But for conda-forge we can try to include everything.
@conda-forge/staged-recipes, could someone please advise how to deal with the packaged anserini-jar? Is that OK, or should it be built in conda-forge?
You will need to make sure to include the licenses of all dependencies in the conda package. That is something the build system (probably maven?) can do easily for you. Otherwise, I would see it as important to build the JAR inside of conda-forge if it contained compiled code (not Java bytecode). If there is no compiled code, I personally would not see great benefits in rebuilding it here.
Thanks for the response @xhochy! @lintool @MXueguang, could you tell us what's in the anserini-jar, i.e. what is (or isn't) compiled into it?
I'll let @MXueguang chime in here, but the Anserini fat jar can be downloaded from Maven Central: https://search.maven.org/artifact/io.anserini/anserini There, we can see it's clearly Apache 2 licensed also.
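One rough way to check whether a jar carries compiled (non-bytecode) code is to list its archive members and look for native-library file names. The sketch below is only a heuristic under stated assumptions: the suffix list and the helper name `native_entries` are illustrative choices, and versioned names such as `libfoo.so.1` would not be caught.

```python
import zipfile
from typing import List

# File suffixes that usually indicate native (compiled, non-bytecode) libraries.
# This list is an assumption for illustration, not an exhaustive definition.
NATIVE_SUFFIXES = (".so", ".dll", ".dylib", ".jnilib")


def native_entries(jar_path: str) -> List[str]:
    """List archive members of a jar that look like native libraries."""
    # A jar is a zip archive, so the standard zipfile module can read it.
    with zipfile.ZipFile(jar_path) as jar:
        return [
            name
            for name in jar.namelist()
            if name.lower().endswith(NATIVE_SUFFIXES)
        ]
```

An empty result suggests (but does not prove) that the jar only contains bytecode and resources; a non-empty one points at the files worth a closer look.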
@h-vetinari
Hey, sorry for the long silence. I originally thought we'd have to compile the Java, but now I agree that it's fine to just move on for the time being with the jar that you have already. In the meantime, the requirements have moved on quite a bit, and I'll have to fix a few things (try to package torchaudio, update nmslib). Of course, we could first create the feedstock for 0.11.0.0 and then work our way forward slowly.
Well, never mind. I burrowed down into the stack a bit because trec_eval definitely contains compiled (C) code.
Squashed from conda-forge#17034
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR ( Here's what I've got... For recipes/anserini:
For recipes/trec_eval:
For recipes/trec_eval:
Documentation on acceptable licenses can be found here.
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR ( Here's what I've got... For recipes/anserini:
It's been so long, I'm not even sure anymore why I thought that we needed torchaudio. It's installed in the setup upstream, but it's not in the requirements anymore. Let's see how it goes without it.
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (
OK, need to move anserini.jar to Or perhaps nicer, just symlink it (but that brings all sorts of complications in rights-restricted environments, see how much was necessary to get this right in arrow).
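The trade-off between symlinking and moving or copying can be sketched as a "try the link, fall back to a copy" helper. This is a minimal sketch, not what the recipe does: `link_or_copy` is a hypothetical name, and the source and destination paths are left to the caller because the target location is not spelled out above.

```python
import os
import shutil


def link_or_copy(src: str, dst: str) -> str:
    """Symlink dst -> src if possible; otherwise fall back to copying.

    Returns "symlink" or "copy" to report which path was taken.
    """
    try:
        # May fail where creating symlinks needs extra privileges
        # or the filesystem does not support them.
        os.symlink(src, dst)
        return "symlink"
    except (OSError, NotImplementedError):
        # Fallback: a plain copy works without special rights,
        # at the cost of duplicating the file.
        shutil.copy2(src, dst)
        return "copy"
```

Note that the fallback also triggers when `dst` already exists, in which case the existing file is overwritten by the copy; a real build script would probably want to handle that case explicitly.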
Hi friend! We really, really, really appreciate that you have taken the time to make a PR on In an effort to maintain this repository and increase the signal-to-noise for open PRs, the maintainers of If you'd like to keep it open, please comment/push and we will be happy to oblige! Note that very old PRs will likely need to be rebased on Cheers and thank you for contributing to this community effort!
Not stale
Checklist
- A tarball (`url`) rather than a repo (e.g. `git_url`) is used in your recipe (see here for more details).

I'm maintaining the feedstock for faiss, and saw a link on the feedstock tracker to castorini/pyserini#426, where there was interest in packaging pyserini.
This will probably need some more packaging work to also build anserini in conda-forge, rather than depending on the embedded jar.
Not sure if there are agreed-upon best practices for java-dependencies; the knowledge base search doesn't return something meaningful for "java" or "jdk".
CC @lintool @MXueguang for your info; I'd be more than happy to have you join as recipe maintainers. :)
PS. The upstream repo does not have a license file, but there is a badge pointing to Apache-2.0 on the readme page. Fixed