Embed docs.openfn.org into a vector database #72

Open
josephjclark opened this issue Jun 17, 2024 · 0 comments · May be fixed by #86
josephjclark commented Jun 17, 2024

See #71 for a spec on adding a vector database to Apollo.

Once we have a vector database ready to go, we need to work out how to encode the docs site into it. Services like chat and the job generator will then use it to add tightly focused context to their prompts.

I think the process is something like this:

  • Clone the docs repo and build it
    • This is quite a computationally expensive step, but we do need it to get a clean markdown representation of all our docs. Would it be easier to scrape the HTML site at docs.openfn.org instead? I don't think so, but I'm not certain.
  • Read all the parsed .md files into strings
  • Break each .md file into chunks by section. I think a section is bounded by one ## heading and either the next ## heading or the end of the document
  • I'm not sure whether we need to encode any context into each section, such as its file path
  • Embed each section into the database.
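The chunking steps above could look roughly like the sketch below. It assumes the built docs are plain `.md` files under one root directory (Docusaurus sources may also include `.mdx`, which this does not pick up), and it prefixes each chunk with its file path and heading as one possible answer to the "do we need context?" question. The names `Chunk`, `split_sections`, `chunk_docs` and `to_embedding_input` are hypothetical, not existing Apollo code.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Chunk:
    path: str     # source file, relative to the docs root
    heading: str  # text of the "## " heading ("" for any preamble)
    text: str     # section body, including its heading line


# A level-2 heading at the start of a line. "### " does not match because
# its third character is "#", not a space. Note: this naive regex would also
# split on "## " lines inside fenced code blocks.
H2 = re.compile(r"^## ", re.MULTILINE)


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split markdown into (heading, body) pairs, one per '## ' section.

    A section runs from one '## ' heading to the next, or to the end of the
    document. Text before the first heading becomes a preamble section.
    """
    starts = [m.start() for m in H2.finditer(markdown)]
    bounds = [0] + starts + [len(markdown)]
    sections = []
    for begin, end in zip(bounds, bounds[1:]):
        body = markdown[begin:end].strip()
        if not body:
            continue  # skip empty spans, e.g. when the file opens with "## "
        first_line = body.splitlines()[0]
        heading = first_line[3:].strip() if first_line.startswith("## ") else ""
        sections.append((heading, body))
    return sections


def chunk_docs(root: Path) -> list[Chunk]:
    """Read every .md file under root and split each into section chunks."""
    chunks = []
    for md_file in sorted(root.rglob("*.md")):
        rel = md_file.relative_to(root).as_posix()
        for heading, body in split_sections(md_file.read_text(encoding="utf-8")):
            chunks.append(Chunk(path=rel, heading=heading, text=body))
    return chunks


def to_embedding_input(chunk: Chunk) -> str:
    """Prefix the section with its location so the vector carries that context."""
    location = f"{chunk.path} > {chunk.heading}" if chunk.heading else chunk.path
    return f"{location}\n\n{chunk.text}"
```

Whether the path prefix actually improves retrieval is an open question; it is cheap to try with and without it and compare results.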

This is likely to be several distinct commands: build the docs site, extract the content chunks, and embed the content chunks.
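One way to split the work into separate commands is a small CLI with one subcommand per stage, passing data between stages through a JSON Lines file. This is only a sketch: the `embed_docs` program name, the subcommand names and flags are hypothetical, the docs build command is not specified in this issue so it is passed through from the command line, and the embed stage is a placeholder because the vector database client from #71 is not defined yet.

```python
import argparse
import json
import subprocess
from pathlib import Path


def cmd_build(args: argparse.Namespace) -> None:
    # Stage 1: build the docs site by running the supplied command in the repo.
    subprocess.run(args.command, cwd=args.repo, check=True)


def cmd_extract(args: argparse.Namespace) -> None:
    # Stage 2: write one JSON record per markdown file. Section-level
    # chunking would be applied here before writing.
    docs = Path(args.docs)
    with open(args.out, "w", encoding="utf-8") as out:
        for md in sorted(docs.rglob("*.md")):
            record = {"path": md.relative_to(docs).as_posix(),
                      "text": md.read_text(encoding="utf-8")}
            out.write(json.dumps(record) + "\n")


def cmd_embed(args: argparse.Namespace) -> None:
    # Stage 3: read the chunks back. Placeholder: no vector DB client is
    # assumed here, so this only reports how many chunks it would embed.
    with open(args.chunks, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    print(f"would embed {len(records)} chunks")


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="embed_docs")
    sub = parser.add_subparsers(dest="stage", required=True)

    build = sub.add_parser("build", help="build the docs site")
    build.add_argument("--repo", required=True, help="path to the docs repo")
    build.add_argument("command", nargs="+", help="build command to run")
    build.set_defaults(func=cmd_build)

    extract = sub.add_parser("extract", help="extract content chunks")
    extract.add_argument("--docs", required=True, help="built markdown root")
    extract.add_argument("--out", required=True, help="output .jsonl file")
    extract.set_defaults(func=cmd_extract)

    embed = sub.add_parser("embed", help="embed chunks into the database")
    embed.add_argument("--chunks", required=True, help="input .jsonl file")
    embed.set_defaults(func=cmd_embed)
    return parser


if __name__ == "__main__":
    parsed = make_parser().parse_args()
    parsed.func(parsed)
```

Keeping the stages separate means the expensive build only has to rerun when the docs change, and the intermediate file makes it easy to inspect exactly what gets embedded.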

This whole process needs to run at build time, when the Docker image is assembled, so that the database is already seeded when the image is deployed.
