Document inference pipeline #1473
I've actually been working on this independently, just as a way to understand how the inference works. Would be happy to get a PR up with what I understand thus far (can do mermaid markdown), and we can iterate. Where should the documentation live? Inside /inference? @yk
Hey @alando46, sorry just seeing this now :) I think inside /docs.
@alando46 I'm happy to help too with adding anything into /docs. Check out https://github.com/LAION-AI/Open-Assistant/blob/main/docs/README.md#contributing, but feel free to ping me if you need any help making a PR.
Thanks @andrewm4894 for the offer to help. Things are going well on the documentation: I've been able to make my way through the bulk of the Open-Assistant inference code (can expand to HF's text-generation-inference next); I just want to run some tests with print statements to confirm my notes on the control flow. I've tried following inference development variants 2 and 3 (variant 1, the docker-compose workflow, seemed to be missing some required services), and with both I get an error when invoking the inference worker.
I should also ask: in order to get the tokenizer, do I need to authenticate with Hugging Face? Their documentation seems to suggest (unless I'm missing something) that this workflow should be allowed: https://huggingface.co/docs/hub/models-downloading#integrated-libraries. Any suggestions? Thanks in advance.
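(For reference, an unauthenticated download of a public checkpoint generally looks like the minimal sketch below, using the transformers library. The model name is just an illustrative public checkpoint, not necessarily the one the worker loads.)

```python
from transformers import AutoTokenizer

# Public models on the Hub should download without any Hugging Face
# credentials; an auth token is only required for gated or private repos.
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")  # illustrative public model
print(tokenizer("hello world"))
```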
That doesn't really make sense; it should just let you download it. Maybe there's another problem, like a network issue (e.g. you get a 404), a cert issue, or HF being temporarily down... it could be any number of things.
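(A rough way to tell these failure modes apart, sketched with the requests library; the URL is just a public Hub API endpoint used as a reachability probe.)

```python
import requests

# Probe the Hugging Face Hub to distinguish 404s, cert issues, and outages.
url = "https://huggingface.co/api/models/distilgpt2"  # any public model id works
try:
    resp = requests.get(url, timeout=10)
    print(resp.status_code)  # 200 = reachable, 404 = bad repo id
except requests.exceptions.SSLError as err:
    print("certificate problem:", err)
except requests.exceptions.ConnectionError as err:
    print("network problem:", err)
```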
OK, thanks for the clarification @yk. I was able to download the tokenizer on a different machine, so it was indeed some weird connection issue with Gitpod. Working on the PR now...
Proper documentation of the inference pipeline is very important. @alando46, is there any progress? Are you still working on it, or should we look for someone else? Do you already have intermediate results?
@andreaskoepf Yup, I've made a good amount of progress. Thanks for checking in; let me finalize what I have and I can get something up soon.
I have made a PR here to add the inference server FastAPI docs to the docs site: #3059
@andreaskoepf here is the WIP: #3119. I need to wrap up that final section; I am mostly complete, but the codebase has been updated, so I need to review and verify things are correct. Let me know what you think.
This is a mostly done (although not totally complete) PR with a technical overview of the inference architecture. I'm looking forward to high-level feedback (general layout, flow of documentation) or specific suggestions (I'm sure I made some errors or missed some details). I will try to wrap up the final section soon. See related discussion on the issue: #1473 (comment)

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
Co-authored-by: Oliver Stanley <olivergestanley@gmail.com>
Currently, the inference system lives under `/inference`. After installing the various dependencies, plus tmux, you can run it using the `full-dev-setup.sh` script (or use the docker compose inference profile). The inference system consists of multiple parts, including a central server and one or more inference workers.
The goal of this issue is to document the inference pipeline: both how the individual parts are built (e.g. how the central server manages users, workers, etc.) and how they connect. This can be done using text, diagrams, or any other suitable form; if a diagram, then preferably a mermaid diagram directly in markdown.
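(As one illustration of what such a diagram might look like: a minimal mermaid sketch based only on the components named in this thread, i.e. the central server, the inference workers, and HF's text-generation-inference backend. The edge labels are assumptions, not verified protocol details.)

```mermaid
flowchart LR
    %% Minimal sketch; edge labels are assumptions, not verified details.
    client[Text client] -->|request| server[Inference server]
    server -->|work assignment| worker[Inference worker]
    worker -->|generation call| tgi[text-generation-inference]
    tgi -->|tokens| worker
    worker -->|token stream| server
    server -->|response| client
```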