Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/backend core #656

Merged
merged 12 commits into from
Jul 17, 2023
Merged

Feat/backend core #656

merged 12 commits into from
Jul 17, 2023

Conversation

mattzcarey
Copy link
Contributor

@mattzcarey mattzcarey commented Jul 14, 2023

Description

Please merge #635 before this one :)

Feat:

  • move backend into backend/core
  • create backend/private

Closes #638

Checklist before requesting a review

Please delete options that are not relevant.

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented hard-to-understand areas
  • I have ideally added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged

Screenshots (if appropriate):

@vercel
Copy link

vercel bot commented Jul 14, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 16, 2023 11:08pm
quivrapp ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 16, 2023 11:08pm

@github-actions
Copy link
Contributor

github-actions bot commented Jul 14, 2023

/home/runner/work/quivr/quivr/backend/core/vectorstore/supabase.py - LOGAF Level 3

The code is generally good, but there are a few areas for potential improvement.

  1. The similarity_search method is doing too much. It's embedding documents, making a request to the client, and then processing the response. Consider breaking this down into smaller, more manageable methods.

  2. The similarity_search method could benefit from error handling. What happens if the rpc call fails? What if res.data is not in the expected format?

  3. The similarity_search method assumes that res.data is iterable. This might not always be the case. Consider adding a check to ensure that res.data is iterable before proceeding with the list comprehension.

/home/runner/work/quivr/quivr/backend/private/logger.py - LOGAF Level 4

The code is of high quality and only requires minor tweaks.

/home/runner/work/quivr/quivr/backend/private/main.py - LOGAF Level 3

The code is generally good, but there are a few areas for potential improvement.

  1. The SENTRY_DSN environment variable is being accessed directly. Consider using a configuration class or function to handle environment variables. This would make it easier to handle missing or incorrect environment variables.

  2. The http_exception_handler function could be improved. Currently, it only returns the status code and the detail of the exception. Consider including more information in the response, such as the type of the exception.

/home/runner/work/quivr/quivr/backend/private/models/messages.py - LOGAF Level 4

The code is of high quality and only requires minor tweaks.

/home/runner/work/quivr/quivr/backend/private/routes/completions_routes.py - LOGAF Level 3

The code is generally good, but there are a few areas for potential improvement.

  1. The post_chat_completions function is not doing much. It's logging a message and then returning a string. Consider adding more functionality to this function.

  2. The post_chat_completions function could benefit from error handling. What happens if the model or messages parameters are not in the expected format?

/home/runner/work/quivr/quivr/backend/private/routes/embeddings_routes.py - LOGAF Level 3

The code is generally good, but there are a few areas for potential improvement.

  1. The post_embeddings function is not doing much. It's logging a message and then returning a string. Consider adding more functionality to this function.

  2. The post_embeddings function could benefit from error handling. What happens if the model or input parameters are not in the expected format?

/home/runner/work/quivr/quivr/install_helper.sh - LOGAF Level 2

The code functions, but has significant issues needing attention.

  1. The script assumes that the user is running it in a Unix-like environment. This might not always be the case. Consider adding checks to ensure that the script is being run in a compatible environment.

  2. The script is asking the user for environment variables and then updating the .env files. This is a security risk. Consider using a more secure method to handle environment variables, such as using a secrets manager.

  3. The script is running a migration script and then launching the app. This might not always be desirable. Consider adding flags to allow the user to choose whether to run the migration script and whether to launch the app.


/home/runner/work/quivr/quivr/backend/core/tests/test_brains.py: Level 4

This is a well-written set of tests for the 'brains' functionality. The code is clear and easy to understand, with good use of comments to explain what each test is doing. The assertions are appropriate and cover the expected functionality.

/home/runner/work/quivr/quivr/backend/core/tests/test_chats.py: Level 3

The tests are generally well-written, but there are a few areas for improvement. For example, the print statements in the test_create_chat_and_talk and test_create_chat_and_talk_with_no_brain functions should be removed or replaced with proper logging. Also, the assert response.status_code == 200 checks could be more specific, checking for the exact expected response instead of just a successful status code.

/home/runner/work/quivr/quivr/backend/core/tests/test_explore.py: Level 4

This test is well-written and clear. It tests the expected functionality and includes appropriate assertions.

/home/runner/work/quivr/quivr/backend/core/tests/test_upload.py: Level 3

The tests are generally well-written, but there are a few areas for improvement. For example, the assert upload_response.status_code == 200 checks could be more specific, checking for the exact expected response instead of just a successful status code. Also, the assert "message" in upload_response_data checks could be more specific, checking for the exact expected message.

/home/runner/work/quivr/quivr/backend/core/tests/test_user.py: Level 4

This test is well-written and clear. It tests the expected functionality and includes appropriate assertions.

/home/runner/work/quivr/quivr/backend/core/utils/chats.py: Level 4

This code is clear and easy to understand. It performs its intended functionality well.

/home/runner/work/quivr/quivr/backend/core/utils/constants.py: Level 5

/home/runner/work/quivr/quivr/backend/core/utils/file.py: Level 4

This code is well-written and clear. It performs its intended functionality well. The convert_bytes function is a nice touch, making the file size human-readable.

/home/runner/work/quivr/quivr/backend/core/utils/processors.py: Level 3

The code is generally well-written, but there are a few areas for improvement. For example, the create_response function could be moved to a separate utility module, as it seems to be a general-purpose function not specific to file processing. Also, the filter_file function is quite long and does a lot of different things, so it could potentially be broken down into smaller, more manageable functions.

/home/runner/work/quivr/quivr/backend/core/utils/users.py: Level 4

This code is clear and easy to understand. It performs its intended functionality well.

/home/runner/work/quivr/quivr/backend/core/utils/vectors.py: Level 3

The code is generally well-written, but there are a few areas for improvement. For example, the create_vector function could be more robust by handling the case where sids is None or an empty list. Also, the create_embedding function could potentially be moved to a separate utility module, as it seems to be a general-purpose function not specific to vectors.

/home/runner/work/quivr/quivr/backend/core/vectorstore/init.py: Level 5

/home/runner/work/quivr/quivr/backend/core/repository/chat/create_chat.py: Level 3
The code is generally good, but there are areas for potential improvement. The CreateChatProperties class seems redundant as it only wraps a single string. You could simplify the code by passing the chat name directly to the create_chat function.

/home/runner/work/quivr/quivr/backend/core/repository/chat/format_chat_history.py: Level 5

/home/runner/work/quivr/quivr/backend/core/repository/chat/get_chat_by_id.py: Level 4

/home/runner/work/quivr/quivr/backend/core/repository/chat/get_chat_history.py: Level 3
The code is generally good, but there are areas for potential improvement. The if history is None check could be simplified by using the or operator to return an empty list when history is None.

history: List[ChatHistory] = (
    commons["supabase"]
    .from_("chat_history")
    .select("*")
    .filter("chat_id", "eq", chat_id)
    .order("message_time", desc=False)  # Add the ORDER BY clause
    .execute()
).data or []

/home/runner/work/quivr/quivr/backend/core/repository/chat/get_user_chats.py: Level 4

/home/runner/work/quivr/quivr/backend/core/repository/chat/update_chat.py: Level 3
The code is generally good, but there are areas for potential improvement. The ChatUpdatableProperties class seems redundant as it only wraps a single optional string. You could simplify the code by passing the chat name directly to the update_chat function.

/home/runner/work/quivr/quivr/backend/core/repository/chat/update_chat_history.py: Level 3
The code is generally good, but there are areas for potential improvement. The if len(response) == 0 check could be simplified by using the not operator to check if response is empty.

if not response:
    raise HTTPException(
        status_code=500, detail="An exception occurred while updating chat history."
    )

/home/runner/work/quivr/quivr/backend/core/repository/chat/update_message_by_id.py: Level 3
The code is generally good, but there are areas for potential improvement. The if not message_id check could be moved to the top of the function to fail fast when message_id is not provided.

/home/runner/work/quivr/quivr/backend/core/routes/init.py: Level 5

/home/runner/work/quivr/quivr/backend/core/routes/api_key_routes.py: Level 2
The code functions, but has significant issues needing attention. The create_api_key function has a while loop that could potentially run indefinitely if a UniqueViolationError keeps occurring. Consider adding a limit to the number of attempts to generate a unique API key.

/home/runner/work/quivr/quivr/backend/core/routes/authorizations/init.py: Level 5

/home/runner/work/quivr/quivr/backend/core/routes/authorizations/brain_authorization.py: Level 3
The code is generally good, but there are areas for potential improvement. The validate_brain_authorization function could be simplified by removing the if required_role is None check. The required_role parameter has a default value of "Owner", so it will never be None.

/home/runner/work/quivr/quivr/backend/core/routes/brain_routes.py: Level 2
The code functions, but has significant issues needing attention. The create_brain_endpoint function creates a new brain and assigns it as the default brain for the user if they don't already have one. This could lead to unexpected behavior if the user expects their existing default brain to remain the default. Consider adding a separate endpoint for setting the default brain.

/home/runner/work/quivr/quivr/backend/core/llm/private_gpt4all.py: Level 3

This code is generally good, but there are areas for potential improvement.

  1. The model_path is hardcoded. It would be better to move this to a configuration file or environment variable to make it easier to change without modifying the code.
  2. The __init__ method is quite long and does a lot of things. Consider breaking it down into smaller, more manageable methods.
  3. The embeddings property currently returns an instance of OpenAIEmbeddings. This is tightly coupled to the OpenAIEmbeddings class. Consider using dependency injection or an interface to make this more flexible and easier to test.
  4. The _create_llm method logs the model and streaming status. This could potentially be moved to a separate method to improve readability.

/home/runner/work/quivr/quivr/backend/core/llm/prompts/CONDENSE_PROMPT.py: Level 5

This code is excellent and needs no changes. It's clear, concise, and does exactly what it's supposed to do.

/home/runner/work/quivr/quivr/backend/core/llm/prompts/LANGUAGE_PROMPT.py: Level 5

This code is excellent and needs no changes. It's clear, concise, and does exactly what it's supposed to do.

/home/runner/work/quivr/quivr/backend/core/llm/qa_base.py: Level 3

This code is generally good, but there are areas for potential improvement.

  1. The generate_answer method is quite long and does a lot of things. Consider breaking it down into smaller, more manageable methods.
  2. The generate_stream method is also quite long and complex. Consider breaking it down into smaller, more manageable methods.
  3. The generate_stream method uses a lot of low-level asyncio code. Consider using higher-level abstractions to make the code easier to understand and maintain.
  4. The generate_stream method has a lot of nested code. Consider flattening the code to improve readability.

/home/runner/work/quivr/quivr/backend/core/llm/utils/summarization.py: Level 2

This code functions, but has significant issues needing attention.

  1. The openai_api_key is being fetched directly from the environment variables. This should be moved to a configuration file or passed in as a parameter to improve flexibility and testability.
  2. The llm_summerize and llm_evaluate_summaries methods are quite long and complex. Consider breaking them down into smaller, more manageable methods.
  3. The llm_summerize and llm_evaluate_summaries methods are tightly coupled to the guidance and openai modules. Consider using dependency injection or an interface to make this more flexible and easier to test.
  4. The llm_summerize and llm_evaluate_summaries methods are using string formatting to create the guidance templates. Consider using a template engine to make this more robust and easier to maintain.

/home/runner/work/quivr/quivr/backend/core/logger.py: Level 5

This code is excellent and needs no changes. It's clear, concise, and does exactly what it's supposed to do.

/home/runner/work/quivr/quivr/backend/core/main.py: Level 4

This code is high-quality and requires only minor tweaks. Consider moving the route imports to the top of the file to follow the standard Python import order.

/home/runner/work/quivr/quivr/backend/core/middlewares/cors.py: Level 4

This code is high-quality and requires only minor tweaks. Consider moving the origins list to a configuration file or environment variable to make it easier to change without modifying the code.

/home/runner/work/quivr/quivr/backend/core/models/brains.py: Level 3

This code is generally good, but there are areas for potential improvement.

  1. The Brain class is quite large and does a lot of things. Consider breaking it down into smaller, more manageable classes.
  2. The Brain class is tightly coupled to the supabase module. Consider using dependency injection or an interface to make this more flexible and easier to test.
  3. The Brain class has a lot of methods that directly interact with the database. Consider moving these to a separate data access layer to improve separation of concerns.
  4. The Brain class has a lot of methods that return raw database results. Consider mapping these to domain objects to improve encapsulation and make the code easier to work with.

/home/runner/work/quivr/quivr/backend/core/routes/chat_routes.py: Level 3

This code is generally good, but there are areas for potential improvement.

  1. The delete_chat_from_db function is catching all exceptions and simply printing them. This could make debugging difficult in the future. It would be better to log the exceptions with a logging module, which would provide more context about when and where the exceptions occurred.

  2. The check_user_limit function has a pass statement in the else clause. If there is no code to execute in this clause, it would be cleaner to remove it.

  3. The create_question_handler and create_stream_question_handler functions have a lot of repeated code. This could be refactored into a separate function to improve readability and maintainability.

Here is an example of how you could refactor the delete_chat_from_db function:

import logging

def delete_chat_from_db(commons, chat_id):
    try:
        commons["supabase"].table("chat_history").delete().match(
            {"chat_id": chat_id}
        ).execute()
    except Exception as e:
        logging.error(f"Error deleting chat history for chat_id {chat_id}: {e}")
    try:
        commons["supabase"].table("chats").delete().match(
            {"chat_id": chat_id}
        ).execute()
    except Exception as e:
        logging.error(f"Error deleting chat for chat_id {chat_id}: {e}")

/home/runner/work/quivr/quivr/backend/core/routes/crawl_routes.py: Level 3

This code is generally good, but there are areas for potential improvement.

  1. The crawl_endpoint function is quite long and does a lot of different things. It would be better to break this function up into smaller, more manageable functions.

  2. The crawl_endpoint function is also catching all exceptions and simply printing them. This could make debugging difficult in the future. It would be better to log the exceptions with a logging module, which would provide more context about when and where the exceptions occurred.

Here is an example of how you could refactor the crawl_endpoint function:

async def process_crawl_website(crawl_website, request):
    if not crawl_website.checkGithub():
        (
            file_path,
            file_name,
        ) = crawl_website.process()  # pyright: ignore reportPrivateUsage=none
        # Create a SpooledTemporaryFile from the file_path
        spooled_file = SpooledTemporaryFile()
        with open(file_path, "rb") as f:
            shutil.copyfileobj(f, spooled_file)

        # Pass the SpooledTemporaryFile to UploadFile
        uploadFile = UploadFile(
            file=spooled_file,  # pyright: ignore reportPrivateUsage=none
            filename=file_name,
        )
        return uploadFile
    else:
        #  check remaining free space here !!
        message = await process_github(
            commons,
            crawl_website.url,
            "false",
            brain_id,
            user_openai_api_key=request.headers.get("Openai-Api-Key", None),
        )
        return message

async def crawl_endpoint(
    request: Request,
    crawl_website: CrawlWebsite,
    brain_id: UUID = Query(..., description="The ID of the brain"),
    enable_summarization: bool = False,
    current_user: User = Depends(get_current_user),
):
    """
    Crawl a website and process the crawled data.
    """

    # [TODO] check if the user is the owner/editor of the brain
    brain = Brain(id=brain_id)

    commons = common_dependencies()

    if request.headers.get("Openai-Api-Key"):
        brain.max_brain_size = os.getenv(
            "MAX_BRAIN_SIZE_WITH_KEY", 209715200
        )  # pyright: ignore reportPrivateUsage=none

    file_size = 1000000
    remaining_free_space = brain.remaining_brain_size

    if remaining_free_space - file_size < 0:
        message = {
            "message": f"❌ User's brain will exceed maximum capacity with this upload. Maximum file allowed is : {convert_bytes(remaining_free_space)}",
            "type": "error",
        }
    else:
        uploadFile = await process_crawl_website(crawl_website, request)
        file = File(file=uploadFile)
        #  check remaining free space here !!
        message = await filter_file(
            commons,
            file,
            enable_summarization,
            brain.id,
            openai_api_key=request.headers.get("Openai-Api-Key", None),
        )
    return message

/home/runner/work/quivr/quivr/backend/core/routes/explore_routes.py: Level 4

This code is high-quality and requires only minor tweaks.

/home/runner/work/quivr/quivr/backend/core/routes/misc_routes.py: Level 5

This code is excellent and needs no changes.

/home/runner/work/quivr/quivr/backend/core/routes/subscription_routes.py: Level 3

This code is generally good, but there are areas for potential improvement.

  1. The invite_user_to_brain function is catching all exceptions and simply printing them. This could make debugging difficult in the future. It would be better to log the exceptions with a logging module, which would provide more context about when and where the exceptions occurred.

  2. The remove_user_subscription function has a lot of nested if-else statements. This could be refactored to improve readability and maintainability.

Here is an example of how you could refactor the invite_user_to_brain function:

import logging

async def invite_user_to_brain(
    brain_id: UUID, users: List[dict], current_user: User = Depends(get_current_user)
):
    # TODO: Ensure the current user has permissions to invite users to this brain

    for user in users:
        subscription = BrainSubscription(
            brain_id=brain_id,
            email=user["email"],
            rights=user["rights"],
            inviter_email=current_user.email or "Quivr",
        )

        try:
            subscription.create_or_update_subscription_invitation()
            subscription.resend_invitation_email()
        except Exception as e:
            logging.error(f"Error inviting user: {e}")
            raise HTTPException(status_code=400, detail=f"Error inviting user: {e}")

    return {"message": "Invitations sent successfully"}

/home/runner/work/quivr/quivr/backend/core/routes/upload_routes.py: Level 3

This code is generally good, but there are areas for potential improvement.

  1. The upload_file function is quite long and does a lot of different things. It would be better to break this function up into smaller, more manageable functions.

  2. The upload_file function is also catching all exceptions and simply printing them. This could make debugging difficult in the future. It would be better to log the exceptions with a logging module, which would provide more context about when and where the exceptions occurred.

Here is an example of how you could refactor the upload_file function:

async def process_upload_file(uploadFile
---
**/home/runner/work/quivr/quivr/backend/core/models/brains_subscription_invitations.py: LOGAF Level 3**

This code is generally good, but there are a few areas for potential improvement. 

1. The `create_subscription_invitation`, `update_subscription_invitation`, and `create_or_update_subscription_invitation` methods are directly interacting with the database. It would be better to separate the database operations into a separate repository class to follow the Single Responsibility Principle.

2. The `resend_invitation_email` method is directly interacting with the `resend` API. This could be abstracted into a separate service class to improve modularity and testability.

3. The `resend_invitation_email` method is also directly reading from the environment variables. It would be better to read these configuration values once during application startup and pass them into this class.

4. The `resend_invitation_email` method is using a print statement for logging. It would be better to use a dedicated logging library for this.

**/home/runner/work/quivr/quivr/backend/core/models/chat.py: LOGAF Level 4**

This code is of high quality and only requires minor tweaks. 

**/home/runner/work/quivr/quivr/backend/core/models/chats.py: LOGAF Level 4**

This code is of high quality and only requires minor tweaks. 

**/home/runner/work/quivr/quivr/backend/core/models/files.py: LOGAF Level 2**

This code functions, but has significant issues needing attention.

1. The `compute_file_sha1`, `compute_documents`, `set_file_vectors_ids`, `file_already_exists`, `file_already_exists_in_brain`, `file_is_empty`, and `link_file_to_brain` methods are directly interacting with the database and file system. It would be better to separate these operations into separate repository and service classes to follow the Single Responsibility Principle.

2. The `compute_file_sha1` and `compute_documents` methods are creating temporary files but not securely deleting them. It would be better to use a context manager to ensure that the temporary files are deleted even if an error occurs.

3. The `compute_file_sha1` and `compute_documents` methods are directly reading and writing to the file system. It would be better to abstract these operations into a separate service class to improve modularity and testability.

4. The `compute_file_sha1` and `compute_documents` methods are also directly reading from the environment variables. It would be better to read these configuration values once during application startup and pass them into this class.

5. The `compute_file_sha1` and `compute_documents` methods are using print statements for logging. It would be better to use a dedicated logging library for this.

**/home/runner/work/quivr/quivr/backend/core/models/settings.py: LOGAF Level 4**

This code is of high quality and only requires minor tweaks. 

**/home/runner/work/quivr/quivr/backend/core/models/users.py: LOGAF Level 3**

This code is generally good, but there are a few areas for potential improvement. 

1. The `create_user`, `get_user_request_stats`, `fetch_user_requests_count`, and `increment_user_request_count` methods are directly interacting with the database. It would be better to separate these operations into a separate repository class to follow the Single Responsibility Principle.

2. The `create_user` and `increment_user_request_count` methods are directly logging to the console. It would be better to use a dedicated logging library for this.

**/home/runner/work/quivr/quivr/backend/core/parsers/__init__.py: LOGAF Level 5**

This code is excellent and needs no changes.

**/home/runner/work/quivr/quivr/backend/core/parsers/audio.py: LOGAF Level 2**

This code functions, but has significant issues needing attention.

1. The `process_audio` method is directly interacting with the OpenAI API, database, and file system. It would be better to separate these operations into separate service and repository classes to follow the Single Responsibility Principle.

2. The `process_audio` method is directly reading from the environment variables. It would be better to read these configuration values once during application startup and pass them into this class.

3. The `process_audio` method is using print statements for logging. It would be better to use a dedicated logging library for this.

**/home/runner/work/quivr/quivr/backend/core/parsers/common.py: LOGAF Level 3**

This code is generally good, but there are a few areas for potential improvement. 

1. The `process_file` method is directly interacting with the database and file system. It would be better to separate these operations into separate repository and service classes to follow the Single Responsibility Principle.

2. The `process_file` method is directly reading from the environment variables. It would be better to read these configuration values once during application startup and pass them into this class.

3. The `process_file` method is using print statements for logging. It would be better to use a dedicated logging library for this.

**/home/runner/work/quivr/quivr/backend/core/parsers/csv.py: LOGAF Level 4**

This code is of high quality and only requires minor tweaks. 

**/home/runner/work/quivr/quivr/backend/core/parsers/docx.py: LOGAF Level 4**

This code is of high quality and only requires minor tweaks. 

**/home/runner/work/quivr/quivr/backend/core/parsers/epub.py: LOGAF Level 4**

This code is of high quality and only requires minor tweaks. 

**/home/runner/work/quivr/quivr/backend/core/parsers/files.py: LOGAF Level 4**

This code is of high quality and only requires minor tweaks. 

**/home/runner/work/quivr/quivr/backend/core/parsers/github.py: LOGAF Level 2**

This code functions, but has significant issues needing attention.

1. The `process_github` method is directly interacting with the GitHub API, database, and file system. It would be better to separate these operations into separate service and repository classes to follow the Single Responsibility Principle.

2. The `process_github` method is directly reading from the environment variables. It would be better to read these configuration values once during application startup and pass them into this class.

3. The `process_github` method is using print statements for logging. It would be better to use a dedicated logging library for this.

**/home/runner/work/quivr/quivr/backend/core/parsers/html.py: LOGAF Level 3**

This code is generally good, but there are a few areas for potential improvement. 

1. The `get_html` method is directly interacting with the network. It would be better to separate this operation into a separate service class to follow the Single Responsibility Principle.

2. The `get_html` method is not handling network errors. It would be better to add error handling to this method.

3. The `slugify` method is directly manipulating strings. It would be better to separate this operation into a separate utility class to improve modularity and testability.

**/home/runner/work/quivr/quivr/backend/core/parsers/markdown.py: LOGAF Level 4**

This code is of high quality and only requires minor tweaks. 

**/home/runner/work/quivr/quivr/backend/core/parsers/notebook.py: LOGAF Level 4**

This code is of high quality and only requires minor tweaks. 

**/home/runner/work/quivr/quivr
---
**LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/core/auth/api_key_handler.py**

This code is generally good, but there are areas for potential improvement. 

1. The `verify_api_key` function is quite long and does a lot of things. It would be better to split it into smaller functions, each with a single responsibility. For example, you could have a separate function to fetch the API key data from the database, another to parse the creation date, and another to check if the API key is active.

2. The `get_user_from_api_key` function also does multiple things. It would be better to split it into smaller functions, each with a single responsibility. For example, you could have a separate function to fetch the user ID from the database, another to fetch the user email, and another to create the User object.

3. The `get_user_from_api_key` function raises an HTTPException if the user ID data is not found. This is not ideal because it couples your business logic with your HTTP layer. It would be better to raise a custom exception (e.g., `ApiKeyNotFoundError`) and handle it at a higher level, where you can decide how to respond to the HTTP request.

**LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/core/auth/auth_bearer.py**

This code is generally good, but there are areas for potential improvement.

1. The `check_scheme` function raises an HTTPException if the credentials are not present or if the scheme is not "Bearer". This is not ideal because it couples your business logic with your HTTP layer. It would be better to raise a custom exception (e.g., `InvalidCredentialsError`) and handle it at a higher level, where you can decide how to respond to the HTTP request.

2. The `authenticate` function does multiple things. It would be better to split it into smaller functions, each with a single responsibility. For example, you could have a separate function to check if authentication is disabled, another to verify the token, and another to verify the API key.

**LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/core/auth/jwt_token_handler.py**

This code is generally good, but there are areas for potential improvement.

1. The `create_access_token` function modifies the input data by adding an "exp" field. This is not ideal because it can lead to unexpected side effects if the caller does not expect the input data to be modified. It would be better to create a copy of the data before modifying it.

2. The `decode_access_token` function returns None if an error occurs. This can make it difficult to debug issues because you lose information about what went wrong. It would be better to raise a custom exception (e.g., `TokenDecodingError`) with the original error message as the detail.

3. The `verify_token` function returns a boolean to indicate whether the token is valid. This is not ideal because it does not provide any information about why the token is invalid. It would be better to raise a custom exception (e.g., `InvalidTokenError`) with a detailed error message.

**LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/core/crawl/crawler.py**

This code is generally good, but there are areas for potential improvement.

1. The `_crawl` function returns None if the HTTP status code is not 200. This can make it difficult to debug issues because you lose information about what went wrong. It would be better to raise a custom exception (e.g., `CrawlError`) with the HTTP status code and the response body as the detail.

2. The `process` function writes the crawled content to a temporary file. This is not ideal because it can lead to disk space issues if the content is large or if many files are created. It would be better to process the content in memory, or to stream it to the file to reduce memory usage.

3. The `checkGithub` function returns a boolean to indicate whether the URL contains "github.com". This is not ideal because it does not provide any information about why the URL is not a GitHub URL. It would be better to raise a custom exception (e.g., `InvalidGithubUrlError`) with a detailed error message.

**LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/core/llm/base.py**

This code is generally good, but there are areas for potential improvement.

1. The `BaseBrainPicking` class has many abstract methods that are not implemented. This can make it difficult to understand how to use the class. It would be better to provide default implementations for these methods, or to document how they should be implemented.

2. The `BaseBrainPicking` class has many class attributes that are not used. This can make the class difficult to understand and maintain. It would be better to remove these unused attributes.

3. The `BaseBrainPicking` class has a `Config` inner class with a `arbitrary_types_allowed` attribute set to True. This can make it difficult to validate the data because any type is allowed. It would be better to specify the allowed types for each attribute.

**LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/core/llm/openai.py**

This code is generally good, but there are areas for potential improvement.

1. The `OpenAIBrainPicking` class has a `model` attribute with a default value of "gpt-3.5-turbo". This can make it difficult to use the class with other models. It would be better to require the model name as a parameter in the constructor.

2. The `OpenAIBrainPicking` class has a `__init__` method that calls `super().__init__`. This is not ideal because it can lead to unexpected behavior if the superclass's `__init__` method changes. It would be better to explicitly initialize all necessary attributes in the `__init__` method.

3. The `OpenAIBrainPicking` class has a `embeddings` property that creates a new `OpenAIEmbeddings` object each time it is accessed. This can lead to performance issues if the property is accessed many times. It would be better to create the `OpenAIEmbeddings` object once in the `__init__` method and store it as an attribute.

**LOGAF Level 3 - /home/runner/work/quivr/quivr/backend/core/llm/openai_functions.py**

This code is generally good, but there are areas for potential improvement.

1. The `OpenAIFunctionsBrainPicking` class has a `model` attribute with a default value of "gpt-3.5-turbo-0613". This can make it difficult to use the class with other models. It would be better to require the model name as a parameter in the constructor.

2. The `OpenAIFunctionsBrainPicking` class has a `__init__` method that calls `super().__init__`. This is not ideal because it can lead to unexpected behavior if the superclass's `__init__` method changes. It would be better to explicitly initialize all necessary attributes in the `__init__` method.

3. The `OpenAIFunctionsBrainPicking` class has a `openai_client` property that creates a new `ChatOpenAI` object each time it is accessed. This can lead to performance issues if the property is accessed many times. It would be better to create the `ChatOpenAI` object once in the `__init__` method and store it as an attribute.

---

👍🔧📚

---

#### Powered by [Code Review GPT](https://github.com/mattzcarey/code-review-gpt)

@StanGirard
Copy link
Collaborator

Tests are now failing ;) don't forget to move the tests folder too ;)

@mattzcarey mattzcarey temporarily deployed to preview July 16, 2023 22:46 — with GitHub Actions Inactive
@mattzcarey mattzcarey merged commit e61f437 into main Jul 17, 2023
5 checks passed
@gozineb gozineb mentioned this pull request Jul 18, 2023
6 tasks
StanGirard pushed a commit that referenced this pull request Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Private LLMs] AABEDev I have a 'private' container which runs a fastAPI server
2 participants