Add GenerationMixin class #29

Merged
merged 12 commits into main on Jul 18, 2022
Conversation

artek0chumak
Collaborator

Add a generation abstraction that uses inference_session.
Added modes:

  • Greedy and top-k/top-p sampling
  • Multibatch generation
  • Constraint abstraction

In the future, I'll add prefix-tuned generation, beam search, and more HF-like features.
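
A minimal sketch of what such a decoding loop looks like, not the code from this PR: `step_fn` is a hypothetical stand-in for one call to the real inference_session, and `sample_next_token`/`generate` are illustrative helpers showing how greedy and top-k/top-p sampling plug into the same loop (multibatch generation simply works row-wise over the batch dimension).

```python
# Sketch only. Assumptions: `step_fn(tokens)` stands in for one inference_session
# step and returns next-token logits of shape [batch, vocab]; helper names are hypothetical.
import torch


def sample_next_token(logits: torch.Tensor, do_sample: bool = False,
                      top_k: int = 0, top_p: float = 1.0,
                      temperature: float = 1.0) -> torch.Tensor:
    """Pick one next token per sequence from [batch, vocab] logits."""
    if not do_sample:
        return logits.argmax(dim=-1, keepdim=True)  # greedy decoding
    logits = logits / temperature
    if top_k > 0:  # keep only the k most likely tokens
        kth_best = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))
    if top_p < 1.0:  # nucleus sampling: drop tokens once cumulative prob reaches top_p
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        sorted_probs = torch.softmax(sorted_logits, dim=-1)
        remove_sorted = sorted_probs.cumsum(dim=-1) - sorted_probs >= top_p
        remove = remove_sorted.scatter(-1, sorted_idx, remove_sorted)
        logits = logits.masked_fill(remove, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)


@torch.no_grad()
def generate(step_fn, input_ids: torch.Tensor, max_new_tokens: int = 20,
             **sampling_kwargs) -> torch.Tensor:
    """Autoregressive loop: one inference step per new token, works for any batch size."""
    tokens = input_ids  # [batch, prefix_len]
    for _ in range(max_new_tokens):
        logits = step_fn(tokens)  # next-token logits for every sequence in the batch
        next_token = sample_next_token(logits, **sampling_kwargs)
        tokens = torch.cat([tokens, next_token], dim=-1)
    return tokens


# Toy usage with a random "model" (2 sequences generated in one batch):
# vocab = 128
# step_fn = lambda tokens: torch.randn(tokens.shape[0], vocab)
# out = generate(step_fn, torch.zeros(2, 1, dtype=torch.long), max_new_tokens=5,
#                do_sample=True, top_k=40, top_p=0.9)
```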

```diff
@@ -23,7 +23,7 @@ def __init__(self, *args, memory_cache: MemoryCache, **kwargs):
         for name, buf in self.module.named_buffers():
             assert not buf.requires_grad, f"Bloom layer parameters must not accumulate gradients, but {name} does"

-        self.inference_pool = TaskPool(self.inference_step, max_batch_size=1, name=f"{self.name}_inference")
+        self.inference_pool = TaskPool(self.inference_step, max_batch_size=4096, name=f"{self.name}_inference")
```
Collaborator
This can have the adverse effect of grouping concurrent inference requests into one PyTorch call. The current inference code will break in that case.

Collaborator

@justheuristic left a comment

Setting a max batch size other than 1 in TaskPool for inference will cause it to merge requests from different users, which is not supported on the backend (both queries will fail).

There are several options to work around that:

  • implement multi-source inference on the backend side
  • make a custom task pool for inference
  • set batch size 1 for now, fix in a subsequent PR
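
A toy illustration of the failure mode, not hivemind's actual TaskPool: the `Request`/`form_batches`/`run_inference` helpers below are made up for this sketch. A generic pool with max_batch_size > 1 can group requests that belong to different users' inference sessions (each with its own attention cache) into one call, while max_batch_size=1 keeps every call within a single session.

```python
# Sketch only: a simplified stand-in for a batching task pool, to show why merging
# requests from different inference sessions into one call is unsafe.
from dataclasses import dataclass
from typing import Callable, List
import torch


@dataclass
class Request:
    session_id: int              # each user's inference session has its own cache
    hidden_states: torch.Tensor  # [1, seq_len, hidden_size]


def form_batches(queue: List[Request], max_batch_size: int) -> List[List[Request]]:
    """Greedy batching, as a generic pool would do it: fill each batch up to max_batch_size."""
    batches, current = [], []
    for req in queue:
        current.append(req)
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


def run_inference(batch: List[Request], step: Callable[[int, torch.Tensor], torch.Tensor]) -> None:
    sessions = {req.session_id for req in batch}
    if len(sessions) > 1:
        # The failure mode from the review: requests with different session state
        # were merged into a single call, which the backend cannot serve.
        raise RuntimeError(f"cannot merge inference requests from sessions {sessions}")
    merged = torch.cat([req.hidden_states for req in batch], dim=0)
    step(next(iter(sessions)), merged)


queue = [Request(0, torch.randn(1, 1, 16)), Request(1, torch.randn(1, 1, 16))]
dummy_step = lambda session_id, hidden: hidden  # stand-in for the real inference step

for batch in form_batches(queue, max_batch_size=1):   # safe: one request per call
    run_inference(batch, dummy_step)
# form_batches(queue, max_batch_size=4096) would group both users into one batch,
# and run_inference would raise.
```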

@artek0chumak
Collaborator Author

I fixed tests/black/isort and removed the max_batch_size change in server/handler.py.

@justheuristic merged commit 6ee942e into main on Jul 18, 2022