Document inference pipeline #1473

Closed
yk opened this issue Feb 11, 2023 · 10 comments · Fixed by #3119

Comments

yk (Collaborator) commented Feb 11, 2023

Currently, the inference system lives under /inference. After installing the various dependencies, plus tmux, you can run it using the full-dev-setup.sh script (or use the docker compose inference profile).

The inference system consists of multiple parts:

  • one central inference server that connects to a Postgres and a Redis database
  • many external workers that perform the actual inference
    • each worker again consists of two parts: a Python connector and a text-generation backend
    • the text-generation backend exposes an HTTP API
    • the Python connector connects to that HTTP API and also to the central server via websocket (see the sketch after this list)
  • optionally, the text client; this one is mainly for testing
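
For illustration, here is a minimal, hypothetical sketch of a worker's two connections. This is not the actual worker code; the URLs, endpoint names, and message fields are assumptions made up for the example.

```python
# Hypothetical sketch of one worker: an HTTP client for the text-generation
# backend plus a websocket connection to the central inference server.
# All URLs, endpoints, and message fields below are illustrative only.
import asyncio
import json

import requests
import websockets  # pip install websockets

TEXT_GENERATION_URL = "http://localhost:8080"    # text-generation backend (assumed)
CENTRAL_SERVER_WS = "ws://localhost:8000/work"   # central inference server (assumed)


def generate(prompt: str) -> str:
    # The text-generation backend exposes an HTTP API; the /generate endpoint
    # and payload shape used here are assumptions.
    resp = requests.post(f"{TEXT_GENERATION_URL}/generate", json={"inputs": prompt})
    resp.raise_for_status()
    return resp.json().get("generated_text", "")


async def work_loop() -> None:
    # The Python connector keeps a websocket open to the central server,
    # receives work requests, runs them against the backend (a blocking call
    # here, which is fine for a sketch), and sends the completions back.
    async with websockets.connect(CENTRAL_SERVER_WS) as ws:
        async for message in ws:
            request = json.loads(message)
            completion = generate(request["prompt"])
            await ws.send(json.dumps({"id": request["id"], "text": completion}))


if __name__ == "__main__":
    asyncio.run(work_loop())
```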

The goal of this issue is to document the inference pipeline: both how the individual parts are built (e.g. how the central server manages users, workers, etc.) and how they connect. This can be done using text, diagrams, or whatever works best; if a diagram, then preferably a mermaid diagram directly in Markdown.

alando46 (Contributor) commented Feb 17, 2023

I've actually been working on this independently, just as a way to understand how the inference works. I'd be happy to get a PR up with what I understand thus far (I can do mermaid markdown) and we can iterate. Where should the documentation live? Inside /inference? @yk

yk (Collaborator, Author) commented Feb 24, 2023

Hey @alando46, sorry, just seeing this now :) I think inside docs/ would be cool, as that gets deployed to https://projects.laion.ai/Open-Assistant/docs/intro

andrewm4894 (Collaborator) commented:

@alando46 I'm happy to help too with adding anything into /docs.

Check out this: https://github.com/LAION-AI/Open-Assistant/blob/main/docs/README.md#contributing

But feel free to ping me if you need any help making a PR.

alando46 (Contributor) commented Mar 7, 2023

Thanks @andrewm4894 for the offer to help. Things are going well on the documentation: I've been able to make my way through the bulk of the Open-Assistant inference code (and can expand to HF's text-generation-inference next); I just want to run some tests with print statements to confirm my notes on the control flow.

I've tried following inference development variants 2 and 3 (variant 1, the docker-compose workflow, seemed to be missing some required services), and with both I get the following error when invoking the inference worker:

(inf) gitpod /workspace/Open-Assistant/inference/worker (feature/inference_documentation) $ API_KEY=0000 python __main__.py
2023-03-07 02:07:55.853 | INFO     | __main__:main:16 - Inference protocol version: 1
Traceback (most recent call last):
  File "/workspace/Open-Assistant/inference/worker/__main__.py", line 149, in <module>
    main()
  File "/workspace/Open-Assistant/inference/worker/__main__.py", line 18, in main
    tokenizer = Tokenizer.from_pretrained(settings.model_id)
Exception: Model "distilgpt2" on the Hub doesn't have a tokenizer

I should note that (inf) is a virtualenv I created with all required dependencies.

In order to get the tokenizer, do I need to authenticate with Hugging Face? Their documentation seems to suggest (unless I'm missing something) that this workflow should be allowed: https://huggingface.co/docs/hub/models-downloading#integrated-libraries

Any suggestions? Thanks in advance.

yk (Collaborator, Author) commented Mar 7, 2023

That doesn't really make sense; it should just let you download it. Maybe there's another problem, like a network issue (i.e. you get a 404), a cert issue, or HF being temporarily down... it could be any number of things.
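
One way to narrow it down is to run the same tokenizers call in isolation, outside the worker; distilgpt2 is a public model, so no Hugging Face authentication should be needed. A minimal check:

```python
# Minimal check that the tokenizer for a public model can be fetched from the
# Hub, independent of the worker code. This mirrors the Tokenizer.from_pretrained
# call in the worker's __main__.py; if it fails with the same error, the problem
# is the network/environment rather than the worker.
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("distilgpt2")
print(tokenizer.encode("Hello world").tokens)
```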

alando46 (Contributor) commented:

OK, thanks for the clarification @yk. I was able to download the tokenizer on a different machine, so it was indeed some weird connection issue with Gitpod. Working on the PR now...

andreaskoepf (Collaborator) commented:

Proper documentation of the inference pipeline is very important. @alando46 Is there any progress? Are you still working on it, or should we look for someone else? Do you already have intermediate results?

alando46 (Contributor) commented May 5, 2023

@andreaskoepf Yup, I've made a good amount of progress. Thanks for checking in; let me finalize what I have and I can get something up soon.

andrewm4894 (Collaborator) commented May 6, 2023

I have made a PR here to add the inference server FastAPI docs to the docs site: #3059
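
For context, FastAPI applications expose a machine-readable OpenAPI schema by default, which is the kind of output a docs-site page can be built from. A minimal sketch of the mechanism (the real inference server's app, routes, and metadata differ):

```python
# Minimal sketch of how a FastAPI app exposes schema/docs endpoints by default.
# The route and title below are illustrative, not the actual inference server.
from fastapi import FastAPI

app = FastAPI(title="Example Inference Server")


@app.get("/health")
def health() -> dict:
    """Illustrative route; the actual server's endpoints differ."""
    return {"status": "ok"}

# With the app running (e.g. `uvicorn app_module:app`), FastAPI serves the
# OpenAPI schema at /openapi.json and interactive docs at /docs; that JSON
# is what can be rendered on a docs site.
```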

alando46 (Contributor) commented May 10, 2023

@andreaskoepf here is the WIP: #3119

I need to wrap up that final section; it is mostly complete, but the codebase has been updated, so I need to review and verify things are correct.

Let me know what you think.

olliestanley linked a pull request Jun 12, 2023 that will close this issue
olliestanley added a commit that referenced this issue Jun 12, 2023
This is a mostly done (although not totally complete) PR with a
technical overview of the inference architecture. I'm looking forward to
high-level feedback (general layout, flow of documentation) or specific
suggestions (I'm sure I made some errors or missed some details). I will
try to wrap up the final section soon.

See related discussion on the issue:
#1473 (comment)

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
Co-authored-by: Oliver Stanley <olivergestanley@gmail.com>