
Fix/upload #609

Merged
merged 4 commits into main from fix/upload
Jul 12, 2023

Conversation

mattzcarey
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Fixes: #607

Checklist before requesting a review

Please delete options that are not relevant.

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented hard-to-understand areas
  • I have ideally added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged

Screenshots (if appropriate):

@mattzcarey mattzcarey requested a review from gozineb July 12, 2023 10:00
@vercel

vercel bot commented Jul 12, 2023

The latest updates on your projects.

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jul 12, 2023 10:37am |
| quivrapp | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jul 12, 2023 10:37am |

@github-actions
Contributor

github-actions bot commented Jul 12, 2023

LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/models/brains.py

The code is generally good, but there are areas for potential improvement.

  1. The get_user_brains, get_brain_for_user, get_brain_details, delete_brain, create_brain, create_brain_user, create_brain_vector, get_vector_ids_from_file_sha1, update_brain_fields, update_brain_with_file, get_unique_brain_files, and delete_file_from_brain methods all interact with the database directly. It would be better to move them into a separate data access layer so the model layer stays clean and focused on business logic.

  2. The get_unique_brain_files method is currently returning an empty list if there are no vector_ids. It would be better to return a more informative message to the user.

  3. The delete_brain method could be improved by adding error handling for the case where the deletion operations fail.

Example changes:

```python
class BrainDataAccess:
    def __init__(self, commons):
        self.commons = commons

    def get_user_brains(self, user_id):
        response = (
            self.commons["supabase"]
            .from_("brains_users")
            .select("id:brain_id, brains (id: brain_id, name)")
            .filter("user_id", "eq", user_id)
            .execute()
        )
        return [item["brains"] for item in response.data]

    # ... other methods ...

class Brain(BaseModel):
    # ... properties ...

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.data_access = BrainDataAccess(self.commons)

    def get_user_brains(self, user_id):
        return self.data_access.get_user_brains(user_id)

    # ... other methods ...
```
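The example above covers point 1 but not point 3. A minimal sketch of delete_brain with error handling, mirroring the commons dict and Supabase call chain from the example; the table names, column names, and logger setup are assumptions for illustration, not taken from the PR:

```python
# Hedged sketch of point 3: delete_brain reports failure instead of
# letting a database exception propagate unannotated. The call chain
# mirrors the example above; table/column names are assumptions.
import logging

logger = logging.getLogger(__name__)

class BrainDataAccess:
    def __init__(self, commons):
        self.commons = commons

    def delete_brain(self, brain_id):
        """Delete a brain's user links and the brain row, returning
        False (and logging) if either deletion fails."""
        try:
            self.commons["supabase"].from_("brains_users").delete().filter(
                "brain_id", "eq", brain_id
            ).execute()
            self.commons["supabase"].from_("brains").delete().filter(
                "brain_id", "eq", brain_id
            ).execute()
            return True
        except Exception as e:
            logger.error(f"Error deleting brain {brain_id}: {e}")
            return False
```

Callers can then decide whether a failed deletion should become an HTTP error or a retry, rather than receiving an unhandled exception.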

LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/routes/explore_routes.py

The code is generally good, but there are areas for potential improvement.

  1. The explore_endpoint, delete_endpoint, and download_endpoint methods could be improved by adding error handling for the case where the operations fail.

  2. The download_endpoint method is currently returning an empty list if there are no documents. It would be better to return a more informative message to the user.

Example changes:

```python
@explore_router.get("/explore/", dependencies=[Depends(AuthBearer())], tags=["Explore"])
async def explore_endpoint(
    brain_id: UUID = Query(..., description="The ID of the brain"),
):
    """
    Retrieve and explore unique user data vectors.
    """
    try:
        brain = Brain(id=brain_id)
        unique_data = brain.get_unique_brain_files()

        if not unique_data:
            return {"message": "No unique data found for this brain."}

        unique_data.sort(key=lambda x: int(x["size"]), reverse=True)
        return {"documents": unique_data}
    except Exception as e:
        return {"error": str(e)}
```

LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/utils/vectors.py

The code is generally good, but there are areas for potential improvement.

  1. The create_vector, create_embedding, similarity_search, create_summary, process_batch, and get_unique_files_from_vector_ids methods all interact with the database directly. It would be better to move them into a separate data access layer so the utility layer stays focused on business logic.

  2. The create_vector method could be improved by adding error handling for the case where the vector creation fails.

Example changes:

```python
class VectorDataAccess:
    def __init__(self, commons):
        self.commons = commons

    def create_vector(self, doc, user_openai_api_key=None):
        logger.info("Creating vector for document")
        logger.info(f"Document: {doc}")
        if user_openai_api_key:
            self.commons["documents_vector_store"]._embedding = OpenAIEmbeddings(
                openai_api_key=user_openai_api_key
            )  # pyright: ignore reportPrivateUsage=none
        try:
            sids = self.commons["documents_vector_store"].add_documents([doc])
            if sids and len(sids) > 0:
                return sids
        except Exception as e:
            logger.error(f"Error creating vector for document {e}")
        return None

    # ... other methods ...

class Neurons(BaseModel):
    # ... properties ...

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.data_access = VectorDataAccess(self.commons)

    def create_vector(self, doc, user_openai_api_key=None):
        return self.data_access.create_vector(doc, user_openai_api_key)

    # ... other methods ...
```
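The point of the delegation above is that call sites keep using Neurons.create_vector unchanged while persistence moves behind VectorDataAccess. A runnable illustration of that pattern with a stubbed store standing in for Supabase and pydantic (the stub and its ids are hypothetical, not from the PR):

```python
# Hypothetical stub in place of the real documents_vector_store, so the
# delegation pattern can run standalone.
class StubVectorStore:
    def add_documents(self, docs):
        # Pretend each document gets one new vector id.
        return [f"id-{i}" for i, _ in enumerate(docs)]

class VectorDataAccess:
    def __init__(self, commons):
        self.commons = commons

    def create_vector(self, doc, user_openai_api_key=None):
        try:
            sids = self.commons["documents_vector_store"].add_documents([doc])
            return sids if sids else None
        except Exception:
            return None

class Neurons:
    def __init__(self, commons):
        self.data_access = VectorDataAccess(commons)

    def create_vector(self, doc, user_openai_api_key=None):
        # Call sites keep calling Neurons.create_vector unchanged; only
        # the storage details moved into the data access layer.
        return self.data_access.create_vector(doc, user_openai_api_key)

neurons = Neurons({"documents_vector_store": StubVectorStore()})
print(neurons.create_vector("some document"))  # ['id-0']
```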

🗂️🔍🔧


Powered by Code Review GPT

```diff
@@ -67,38 +67,39 @@ def error_callback(exception):
     print("An exception occurred:", exception)


-def process_batch(batch_ids):
+def process_batch(batch_ids: List[str]):
```
Contributor

Sure about the return type?

Thanks for typing ☺️

Contributor Author

yep it's a ULID now. We are handling them as strings I think
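A small illustration of the typing discussed in this thread: a ULID is a 26-character Crockford base32 string, so the IDs flow through process_batch as List[str]. The body below is a hypothetical stand-in; the real process_batch in vectors.py does database work:

```python
# Hypothetical stand-in showing only the ULID-as-string typing; the
# real process_batch body queries the database.
from typing import List

def process_batch(batch_ids: List[str]) -> List[str]:
    # A ULID like "01H55Y2ZJ4V7X8Q9R0S1T2W3V4" is handled as a plain
    # string, so no dedicated ID type is needed here.
    return [batch_id for batch_id in batch_ids if batch_id]
```

Keeping the IDs as plain strings means the batching code stays agnostic of the ID scheme.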

mamadoudicko
mamadoudicko previously approved these changes Jul 12, 2023
Contributor

make it idempotent

```diff
@@ -177,4 +175,4 @@ INSERT INTO migrations (name)
 SELECT '202307111517030_add_subscription_invitations_table'
```
Contributor

Modify last migration reference to yours

Contributor Author

Done

Contributor

@gozineb gozineb left a comment

tables.sql to update

@mamadoudicko
Contributor

Agreed. We'll work on finding a way to automate this.

@mattzcarey mattzcarey merged commit cef45ea into main Jul 12, 2023
@mattzcarey mattzcarey deleted the fix/upload branch July 12, 2023 10:44
StanGirard pushed a commit that referenced this pull request Sep 12, 2023
* fix: document upload

* feat: explore fix to use uuid id

* chore: remove prints

* fix: tables.sql
Successfully merging this pull request may close these issues.

[Issue] An error occurred while processing [FileName]