
Fix/upload #609

Merged
merged 4 commits into main from fix/upload
Jul 12, 2023

Conversation

mattzcarey
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Fixes: #607

Checklist before requesting a review

Please delete options that are not relevant.

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented hard-to-understand areas
  • I have ideally added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged

Screenshots (if appropriate):

@mattzcarey mattzcarey requested a review from gozineb July 12, 2023 10:00
@vercel

vercel bot commented Jul 12, 2023

The latest updates on your projects.

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jul 12, 2023 10:37am |
| quivrapp | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jul 12, 2023 10:37am |

@github-actions
Contributor

github-actions bot commented Jul 12, 2023

LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/models/brains.py

The code is generally good, but there are areas for potential improvement.

  1. The get_user_brains, get_brain_for_user, get_brain_details, delete_brain, create_brain, create_brain_user, create_brain_vector, get_vector_ids_from_file_sha1, update_brain_fields, update_brain_with_file, get_unique_brain_files, and delete_file_from_brain methods all interact with the database directly. It would be better to move them into a separate data access layer so the model layer stays clean and focused on business logic.

  2. The get_unique_brain_files method is currently returning an empty list if there are no vector_ids. It would be better to return a more informative message to the user.

  3. The delete_brain method could be improved by adding error handling for the case where the deletion operations fail.

Example changes:

```python
class BrainDataAccess:
    def __init__(self, commons):
        self.commons = commons

    def get_user_brains(self, user_id):
        response = (
            self.commons["supabase"]
            .from_("brains_users")
            .select("id:brain_id, brains (id: brain_id, name)")
            .filter("user_id", "eq", user_id)
            .execute()
        )
        return [item["brains"] for item in response.data]

    # ... other methods ...

class Brain(BaseModel):
    # ... properties ...

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.data_access = BrainDataAccess(self.commons)

    def get_user_brains(self, user_id):
        return self.data_access.get_user_brains(user_id)

    # ... other methods ...
```
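The example above covers point 1 but not point 3. A minimal sketch of delete_brain with error handling, mirroring the commons dict and Supabase call chain from the example; the table names, column names, and logger setup are assumptions for illustration, not taken from the PR:

```python
# Hedged sketch of point 3: delete_brain reports failure instead of
# letting a database exception propagate unannotated. The call chain
# mirrors the example above; table/column names are assumptions.
import logging

logger = logging.getLogger(__name__)

class BrainDataAccess:
    def __init__(self, commons):
        self.commons = commons

    def delete_brain(self, brain_id):
        """Delete a brain's user links and the brain row, returning
        False (and logging) if either deletion fails."""
        try:
            self.commons["supabase"].from_("brains_users").delete().filter(
                "brain_id", "eq", brain_id
            ).execute()
            self.commons["supabase"].from_("brains").delete().filter(
                "brain_id", "eq", brain_id
            ).execute()
            return True
        except Exception as e:
            logger.error(f"Error deleting brain {brain_id}: {e}")
            return False
```

Callers can then decide whether a failed deletion should become an HTTP error or a retry, rather than receiving an unhandled exception.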

LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/routes/explore_routes.py

The code is generally good, but there are areas for potential improvement.

  1. The explore_endpoint, delete_endpoint, and download_endpoint methods could be improved by adding error handling for the case where the operations fail.

  2. The download_endpoint method is currently returning an empty list if there are no documents. It would be better to return a more informative message to the user.

Example changes:

```python
@explore_router.get("/explore/", dependencies=[Depends(AuthBearer())], tags=["Explore"])
async def explore_endpoint(
    brain_id: UUID = Query(..., description="The ID of the brain"),
):
    """
    Retrieve and explore unique user data vectors.
    """
    try:
        brain = Brain(id=brain_id)
        unique_data = brain.get_unique_brain_files()

        if not unique_data:
            return {"message": "No unique data found for this brain."}

        unique_data.sort(key=lambda x: int(x["size"]), reverse=True)
        return {"documents": unique_data}
    except Exception as e:
        return {"error": str(e)}
```

LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/utils/vectors.py

The code is generally good, but there are areas for potential improvement.

  1. The create_vector, create_embedding, similarity_search, create_summary, process_batch, and get_unique_files_from_vector_ids methods all interact with the database directly. It would be better to move them into a separate data access layer so the utility layer stays focused on business logic.

  2. The create_vector method could be improved by adding error handling for the case where the vector creation fails.

Example changes:

```python
class VectorDataAccess:
    def __init__(self, commons):
        self.commons = commons

    def create_vector(self, doc, user_openai_api_key=None):
        logger.info("Creating vector for document")
        logger.info(f"Document: {doc}")
        if user_openai_api_key:
            self.commons["documents_vector_store"]._embedding = OpenAIEmbeddings(
                openai_api_key=user_openai_api_key
            )  # pyright: ignore reportPrivateUsage=none
        try:
            sids = self.commons["documents_vector_store"].add_documents([doc])
            if sids and len(sids) > 0:
                return sids
        except Exception as e:
            logger.error(f"Error creating vector for document {e}")
        return None

    # ... other methods ...

class Neurons(BaseModel):
    # ... properties ...

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.data_access = VectorDataAccess(self.commons)

    def create_vector(self, doc, user_openai_api_key=None):
        return self.data_access.create_vector(doc, user_openai_api_key)

    # ... other methods ...
```
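The point of the delegation above is that call sites keep using Neurons.create_vector unchanged while persistence moves behind VectorDataAccess. A runnable illustration of that pattern with a stubbed store standing in for Supabase and pydantic (the stub and its ids are hypothetical, not from the PR):

```python
# Hypothetical stub in place of the real documents_vector_store, so the
# delegation pattern can run standalone.
class StubVectorStore:
    def add_documents(self, docs):
        # Pretend each document gets one new vector id.
        return [f"id-{i}" for i, _ in enumerate(docs)]

class VectorDataAccess:
    def __init__(self, commons):
        self.commons = commons

    def create_vector(self, doc, user_openai_api_key=None):
        try:
            sids = self.commons["documents_vector_store"].add_documents([doc])
            return sids if sids else None
        except Exception:
            return None

class Neurons:
    def __init__(self, commons):
        self.data_access = VectorDataAccess(commons)

    def create_vector(self, doc, user_openai_api_key=None):
        # Call sites keep calling Neurons.create_vector unchanged; only
        # the storage details moved into the data access layer.
        return self.data_access.create_vector(doc, user_openai_api_key)

neurons = Neurons({"documents_vector_store": StubVectorStore()})
print(neurons.create_vector("some document"))  # ['id-0']
```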

🗂️🔍🔧


Powered by Code Review GPT

```diff
@@ -67,38 +67,39 @@ def error_callback(exception):
     print("An exception occurred:", exception)


-def process_batch(batch_ids):
+def process_batch(batch_ids: List[str]):
```
Contributor

Sure about the return type?

Thanks for typing ☺️

Contributor Author

yep it's a ULID now. We are handling them as strings I think
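A small illustration of the typing discussed in this thread: a ULID is a 26-character Crockford base32 string, so the IDs flow through process_batch as List[str]. The body below is a hypothetical stand-in; the real process_batch in vectors.py does database work:

```python
# Hypothetical stand-in showing only the ULID-as-string typing; the
# real process_batch body queries the database.
from typing import List

def process_batch(batch_ids: List[str]) -> List[str]:
    # A ULID like "01H55Y2ZJ4V7X8Q9R0S1T2W3V4" is handled as a plain
    # string, so no dedicated ID type is needed here.
    return [batch_id for batch_id in batch_ids if batch_id]
```

Keeping the IDs as plain strings means the batching code stays agnostic of the ID scheme.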

mamadoudicko
mamadoudicko previously approved these changes Jul 12, 2023
Contributor

make it idempotent

```diff
@@ -177,4 +175,4 @@ INSERT INTO migrations (name)
 SELECT '202307111517030_add_subscription_invitations_table'
```
Contributor

Modify last migration reference to yours

Contributor Author

Done

Contributor

@gozineb gozineb left a comment

tables.sql to update

@mamadoudicko
Contributor

Agreed. We'll work on finding a way to automate this.

@mattzcarey mattzcarey merged commit cef45ea into main Jul 12, 2023
@mattzcarey mattzcarey deleted the fix/upload branch July 12, 2023 10:44
StanGirard pushed a commit that referenced this pull request Sep 12, 2023
* fix: document upload

* feat: explore fix to use uuid id

* chore: remove prints

* fix: tables.sql
Successfully merging this pull request may close these issues.

[Issue] An error occurred while processing [FileName]