Add a gRPC server that can handle eval requests from a client using SQLGen#1
Merged
tommyang merged 1 commit into GoogleCloudPlatform:main on Aug 20, 2024
- `eval_server.py` is the alternative `main()` of EvalBench. The CLI mode `evalbench.py` is untouched.
- `eval_service.py` implements the gRPC service logic. Currently it converts streaming RPC requests into a list of `EvalInput` and calls `Evaluator.evaluate()`.
- Adds a Containerfile definition for deployment, using a multi-stage build and a distroless base image.

Known limitations/future work:

- Configs are currently loaded at `eval_service` init, so the service cannot handle different databases yet. This is the main reason the GetDataset RPC is not implemented yet.
- The EvalResponse proto is under-defined; we likely need to report scoring results back to the client eventually.
- Because of the way `Evaluator.evaluate()`, `SQLPromptGenWork`, and `SQLGenWork` currently work (`nl_prompt` -> `generated_prompt` -> `generated_sql`), there is currently a hack in `eval_service.py` where client-side SQLGen-generated SQL is passed in as `EvalInput.nl_prompt`, so that it becomes `generated_sql` after going through the passthrough prompt and model generators. `Evaluator.evaluate()` needs to be refactored to get rid of this hack (e.g. the ability to skip `SQLPromptGenWork` and `SQLGenWork` entirely, not just set them to passthrough); otherwise the `generated_sql` value is still overridden by `SQLGenWork`. This would be a relatively invasive change, since `Evaluator.evaluate()` is shared by the service and the CLI mode of EvalBench.
- There is high OOM potential, since all streamed requests are buffered in memory before evaluation.
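The multi-stage distroless Containerfile mentioned above might look roughly like this. This is a hedged sketch, not the file from the PR: the Python version, dependency layout, and `eval_server.py` entrypoint path are all assumptions.

```dockerfile
# Illustrative multi-stage build; stage names and paths are assumptions.
FROM python:3.11-slim AS build
WORKDIR /app
COPY requirements.txt .
# Install dependencies into a directory we can copy into the final image.
RUN pip install --no-cache-dir --target=/app/deps -r requirements.txt
COPY . .

# Distroless final image: no shell or package manager, smaller attack surface.
FROM gcr.io/distroless/python3-debian12
WORKDIR /app
COPY --from=build /app /app
ENV PYTHONPATH=/app/deps
ENTRYPOINT ["python", "eval_server.py"]
```

The build stage carries pip and build tooling; only the installed application and its dependencies reach the distroless runtime image.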
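The conversion and the `nl_prompt` hack described above can be sketched roughly as follows. This is a minimal illustration, not the actual service code: `EvalInput` is stubbed as a dataclass here, the request field names are assumptions, and `requests_to_inputs` is a hypothetical helper standing in for the logic inside `eval_service.py`.

```python
# Hypothetical sketch of the streaming-to-batch conversion in eval_service.py.
# EvalInput is stubbed here; the real definition lives in EvalBench's protos.
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class EvalInput:
    nl_prompt: str
    generated_sql: str = ""


def requests_to_inputs(requests: Iterable[dict]) -> List[EvalInput]:
    """Drain a streaming RPC into a list for one Evaluator.evaluate() call.

    Note the hack described above: the client's already-generated SQL is
    stored in nl_prompt, so that the passthrough prompt and model
    generators copy it into generated_sql downstream.
    """
    inputs = []
    for req in requests:
        # "generated_sql" as a request field name is an assumption.
        inputs.append(EvalInput(nl_prompt=req["generated_sql"]))
    return inputs
```

Buffering the whole stream into a list before evaluating is what creates the OOM risk noted above; a refactored `Evaluator.evaluate()` that accepts an iterator would avoid it.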
IsmailMehdi approved these changes on Aug 20, 2024.