Add vector database to the cloud #193

Closed
Tracked by #170
1jamesthompson1 opened this issue Jun 24, 2024 · 5 comments · Fixed by #216

Comments

@1jamesthompson1
Owner

1jamesthompson1 commented Jun 24, 2024

Previously the viewer worked by simply keeping the data in a data folder.
The viewer currently works only locally, by giving the viewer app a local address.

However, it would be better to have a database that is in the cloud and separate from the web app.

There is currently a problem with lancedb: it doesn't support full-text search (FTS) in the cloud:
https://lancedb.github.io/lancedb/python/saas-python/#lancedb.remote.table.RemoteTable.search

Therefore I might have a look at qdrant or another vector search package.

@1jamesthompson1 1jamesthompson1 changed the title from "Upload the lancedb to a cloud provider so that it can work with the webapp still hosted on heroku" to "Add vector database to the cloud" Jun 24, 2024
@1jamesthompson1
Owner Author

The problem with not being able to do full-text search can be fixed by just not using full-text search!

I will try to implement it with lancedb and get a cloud version up and running. It will require an update of the Searcher class to handle async connections.
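One way the Searcher update could look, as a minimal sketch: keep a synchronous `search` method for the web app and run the async call inside it. Everything here is hypothetical (the constructor argument, `fake_async_search`, the shape of the results); the stub only stands in for whatever async search the real client ends up exposing.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical type: an async function taking a query vector and a result limit.
AsyncSearchFn = Callable[[List[float], int], Awaitable[List[dict]]]


class Searcher:
    """Sync facade over an async search backend (illustrative only)."""

    def __init__(self, async_search: AsyncSearchFn) -> None:
        self._async_search = async_search

    def search(self, vector: List[float], k: int = 5) -> List[dict]:
        # asyncio.run creates a fresh event loop for each call. That suits a
        # sync web handler, but it must not be called from a running loop.
        return asyncio.run(self._async_search(vector, k))


async def fake_async_search(vector: List[float], k: int) -> List[dict]:
    # Stand-in for a remote vector search; returns dummy rows.
    await asyncio.sleep(0)
    return [{"id": i, "score": 1.0 / (i + 1)} for i in range(k)]


searcher = Searcher(fake_async_search)
results = searcher.search([0.1, 0.2, 0.3], k=3)
print(results)
```

Injecting the async function keeps the web app's call site unchanged while the backend is swapped out.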

@1jamesthompson1
Owner Author

The async error is addressed: https://github.com/lancedb/lancedb/pull/1102/files

I had missed that I need to have adlfs installed. I will do that now and see if that works.

@1jamesthompson1
Owner Author

I believe it has been fixed in lancedb 0.9. I haven't found out which change fixed it.

Anyway, the sync seems to be working. The new problem is that the latency is too high, i.e. it is taking about 15 seconds per query.

It might be worth looking at an SMB file share, a managed disk, or other searching techniques to reduce latency.
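To compare those options it helps to measure latency the same way each time. A minimal sketch using only the standard library; `dummy_query` is a placeholder for the real remote search call, and `time_query` is a hypothetical helper name, not something from the project.

```python
import statistics
import time
from typing import Callable, List


def time_query(run_query: Callable[[], object], repeats: int = 5) -> dict:
    """Call run_query several times and summarise wall-clock latency in seconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return {
        "first": samples[0],  # first call, which may include connection setup
        "median": statistics.median(samples),
        "max": max(samples),
    }


def dummy_query() -> None:
    # Placeholder for the real remote search call.
    time.sleep(0.01)


stats = time_query(dummy_query, repeats=3)
print(stats)
```

Reporting the first call separately from the median shows whether the cost is one-off setup or paid on every query.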

@1jamesthompson1
Owner Author

I am trying to deploy it with Heroku.

I am getting an error that is to do with pyarrow:

2024-07-05T01:42:57.584452+00:00 app[web.1]: File "pyarrow/_fs.pyx", line 471, in pyarrow._fs.FileSystem.from_uri
2024-07-05T01:42:57.584453+00:00 app[web.1]: File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
2024-07-05T01:42:57.584453+00:00 app[web.1]: File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
2024-07-05T01:42:57.584454+00:00 app[web.1]: pyarrow.lib.ArrowInvalid: Unrecognized filesystem type in URI: az://vectordb/testing/all_document_types.lance
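The traceback shows pyarrow rejecting the `az://` scheme. A small pre-flight check like the sketch below could be logged at app start-up to compare the Heroku environment with the local one. It only splits the URI and reports whether `adlfs` (mentioned earlier in this thread) is importable. Whether a missing `adlfs` is the actual cause here is an assumption, and `describe_store_uri` is a hypothetical helper.

```python
import importlib.util
from urllib.parse import urlparse


def describe_store_uri(uri: str) -> dict:
    """Split an object-store URI and report whether adlfs is importable."""
    parsed = urlparse(uri)
    return {
        "scheme": parsed.scheme,
        "container": parsed.netloc,
        "path": parsed.path.lstrip("/"),
        # find_spec returns None when the package is not installed
        "adlfs_installed": importlib.util.find_spec("adlfs") is not None,
    }


info = describe_store_uri("az://vectordb/testing/all_document_types.lance")
print(info)
```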

@1jamesthompson1
Owner Author

This now works, and the only remaining error is that request times are longer than 30 seconds.

@1jamesthompson1 1jamesthompson1 linked a pull request Jul 12, 2024 that will close this issue