RankLLM Revamp - VLLM support, APEER prompt, fixed caching, improved documentation #119
Conversation
src/rank_llm/scripts/run_rank_llm.py (Outdated)

@@ -153,5 +155,10 @@ def main(args):
        default="You are RankLLM, an intelligent assistant that can rank passages based on their relevancy to the query.",
        help="the system message used in prompts",
    )
    parser.add_argument(
        "--batched",
This batch flag can be confused with rerank_batch: the latter reranks multiple queries regardless of whether they are processed one at a time or all at once, while the former specifically means reranking multiple queries at once. From the default param values it looks like we want to support the rerank_batch function with batched=False? If so, I think the two different meanings of "batch" could be confusing; maybe we should rename one of the two use cases? WDYT @ronakice, @lintool
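To make the distinction concrete, here is a hypothetical sketch of the two senses of "batch" under discussion (signatures are illustrative only, not the actual RankLLM API):

    # Hypothetical signatures illustrating the two "batch" senses.
    def rerank(request):
        # Rerank candidates for a single query.
        ...

    def rerank_batch(requests, batched=False):
        # Rerank candidates for many queries. Even with batched=False,
        # each query may still go through inference one at a time;
        # batched=True would process multiple queries at once.
        ...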
Hmm, I agree this could be a source of confusion for users. Batching usually means multiple queries being processed at once, correct? In that case, does rerank_batch need renaming?
In my understanding, batch processing is async, and yes, it normally means multiple queries at once, but it doesn't restrict the batch size to be strictly greater than 1. IIRC, @lintool chose rerank and rerank_batch to align with the Pyserini function names (retrieve and retrieve_batch).
@ronakice WDYT about changing --batched to --vllm-batched?
I also propose having an --inference-method enum (or some similar name) with FastChat as the default value. The user can set it to vllm if they want vLLM rather than FastChat inference.
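A minimal argparse sketch of what these two proposals might look like (the flag names follow the suggestions above, but the exact choices and defaults are assumptions, not RankLLM's actual CLI):

    import argparse

    parser = argparse.ArgumentParser()
    # Proposed rename of --batched: scope the flag to vLLM explicitly.
    parser.add_argument(
        "--vllm-batched",
        action="store_true",
        help="rerank multiple queries at once using vLLM inference",
    )
    # Proposed enum; FastChat remains the default (values are hypothetical).
    parser.add_argument(
        "--inference-method",
        type=str,
        choices=["fastchat", "vllm"],
        default="fastchat",
        help="backend used for LLM inference",
    )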
I can add these two, yes 👍🏼 I agree that for the time being this is clearer.
@sahel-sh would we even need to mention --inference-method? --vllm-batched is the only vLLM setting; otherwise it defaults to FastChat.
Less than ideal, since "batch" in rerank_batch has a different implication than "batch" in vllm-batched, but it is OK for now.
import json
import os


def convert_json_to_jsonl(input_file, output_file):
Why is this needed?
It converts from the unsupported JSON request format to the newer JSONL request format.
Got it, thanks. In the high-level comment above, please mention which data class the new format is coming from.
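For context, a minimal sketch of such a conversion, assuming the legacy file holds a single JSON array of request objects (the actual field layout comes from RankLLM's request data class):

    import json

    def convert_json_to_jsonl(input_file, output_file):
        # Assumption: the legacy file contains one JSON array of
        # request objects. Write one object per line (JSONL).
        with open(input_file, "r") as fin:
            requests = json.load(fin)
        with open(output_file, "w") as fout:
            for request in requests:
                fout.write(json.dumps(request) + "\n")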
Pull Request Checklist
Reference Issue
Please provide a reference to the issue this PR addresses (# followed by the issue number). If there is no associated issue, write "N/A".
ref:
Checklist Items
Before submitting your pull request, please review these items:
PR Type
What kind of change does this PR introduce?