[Concept] : Weaviate Migration #788

Open
h0lybyte opened this issue Sep 2, 2023 · 5 comments
Labels
2 enhancement New feature or request

Comments

@h0lybyte
Member

h0lybyte commented Sep 2, 2023

Core Concept/Theory
A clear and concise description of what the concept is. Ex. It would be cool if [...]

This ticket covers the Weaviate migration and cluster testing.


Alternative Ideas
Is there any other way this concept could be used?

We are migrating away from ChromaDB as our vector store and moving to Weaviate.


Alternative Examples/Sources
Are there any other references that you can provide?

No major examples or sources yet, but we could put up notes on Weaviate.


Additional information
Add any other context or examples of this concept here.

I will add additional information here regarding any sources or material.

@h0lybyte
Member Author

h0lybyte commented Sep 2, 2023

We currently have three instances of Weaviate:

1 - @ZachHandley Cluster through Peter from Weaviate - This will be our main / production cluster.
2 - Local Swarm Cluster through Docker/Portainer Swarm - This is currently paused because performance is very weak; the storage is nested too deeply in emulation.
3 - Weaviate Test Instance under Appwrite's Test User - This is the one that @ernivani will experiment with.

This way we do not break anything while testing the Weaviate integration during the dev cycle.

h0lybyte added 2 and removed 0 labels Sep 2, 2023
@h0lybyte
Member Author

h0lybyte commented Sep 4, 2023

We can use Unstructured -> https://github.com/Unstructured-IO to handle the _raw_data (raw) from HTML, PDFs, etc.

Blog Post from Weaviate -> https://weaviate.io/blog/ingesting-pdfs-into-weaviate
Repo for Weaviate + Unstructured -> https://github.com/weaviate/how-to-ingest-pdfs-with-unstructured

I am thinking these could be isolates that run outside of any main application, with a simple queue system handling the data refinement.

_raw_data -> [ refinement node ] -> stores _data into the vector database ?

I am open to any other ways of handling it.
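A rough sketch of the `_raw_data -> [ refinement node ] -> _data` flow, using only the Python standard library. Everything named here is an assumption for illustration: `InMemoryVectorStore` is a hypothetical stand-in for the vector database, and `refine` is a placeholder for whatever parser (Unstructured, for example) ends up doing the real work.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for the vector database (e.g. Weaviate)."""
    records: list = field(default_factory=list)

    def store(self, record: dict) -> None:
        self.records.append(record)


def refine(raw: str) -> dict:
    """Placeholder refinement step: collapse runs of whitespace.

    A real refinement node would call a document parser here instead.
    """
    text = " ".join(raw.split())
    return {"text": text, "length": len(text)}


def refinement_worker(jobs: queue.Queue, store: InMemoryVectorStore) -> None:
    """Pull raw items off the queue until a None sentinel arrives."""
    while True:
        raw = jobs.get()
        try:
            if raw is None:
                return
            store.store(refine(raw))
        finally:
            # Mark the item as handled so jobs.join() can return.
            jobs.task_done()


def run_pipeline(raw_items: list[str]) -> InMemoryVectorStore:
    """Feed raw items through a single worker running outside the caller."""
    jobs: queue.Queue = queue.Queue()
    store = InMemoryVectorStore()
    worker = threading.Thread(
        target=refinement_worker, args=(jobs, store), daemon=True
    )
    worker.start()
    for item in raw_items:
        jobs.put(item)
    jobs.put(None)  # sentinel: tell the worker to stop
    jobs.join()
    worker.join()
    return store
```

With a single worker the records land in submission order. In a real deployment the in-process queue would likely be swapped for an external one so the refinement node can run as a separate isolate.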

@ernivani

ernivani commented Sep 4, 2023

Got Nodepy working with the KBVE API; now setting up Weaviate for testing.

@h0lybyte
Member Author

h0lybyte commented Sep 4, 2023

Additional Reference Material:

Quick Tour -> https://colab.research.google.com/drive/1U8VCjY2-x8c6y5TYMbSFtQGlQVFHCVIW
SEC Extraction Repo -> https://github.com/Unstructured-IO/pipeline-sec-filings

Known issue with dual-column format -> Unstructured-IO/unstructured#356

h0lybyte changed the title from "[Concept] : RentEarth.com - Weaviate Migration" to "[Concept] : Weaviate Migration" on Sep 5, 2023
@h0lybyte
Member Author

The plan is to have Weaviate operate as its own Appwrite function, which will reference either a cloud instance of Weaviate or a swarm instance of Weaviate.
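One way the function could pick between the two instances is through environment variables. This is only a sketch: the variable names `WEAVIATE_TARGET`, `WEAVIATE_CLOUD_URL`, and `WEAVIATE_SWARM_URL` are hypothetical, and no Appwrite or Weaviate client calls are shown.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names; nothing in this thread fixes the real ones.
TARGET_VAR = "WEAVIATE_TARGET"  # expected values: "cloud" or "swarm"
URL_VARS = {
    "cloud": "WEAVIATE_CLOUD_URL",
    "swarm": "WEAVIATE_SWARM_URL",
}


def resolve_weaviate_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the base URL of the Weaviate instance the function should use.

    Defaults to the cloud instance when no target is set.
    """
    env = os.environ if env is None else env
    target = env.get(TARGET_VAR, "cloud").strip().lower()
    if target not in URL_VARS:
        raise ValueError(f"unknown Weaviate target: {target!r}")
    url = env.get(URL_VARS[target], "").strip()
    if not url:
        raise ValueError(f"{URL_VARS[target]} is not set")
    # Normalise so callers can append paths without doubling slashes.
    return url.rstrip("/")
```

Passing the environment in as a mapping keeps the lookup easy to test without touching the real process environment.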

I suppose we could split this into two issue tickets a bit later, one ticket for the actual Weaviate Appwrite function and another ticket for the Weaviate Swarm instance.

I am still thinking this issue through right now.

Projects
Status: In Progress
Development

No branches or pull requests

4 participants