Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LanceDB Integration #548

Merged
merged 4 commits into from
Dec 25, 2023
Merged

LanceDB Integration #548

merged 4 commits into from
Dec 25, 2023

Conversation

PrashantDixit0
Copy link
Contributor

@PrashantDixit0 PrashantDixit0 commented Dec 24, 2023

Type

Enhancement


Description

This PR introduces LanceDB integration into the pr_agent tool. The most significant changes include:

  • The pr_agent/tools/pr_similar_issue.py file has been updated to support both Pinecone and LanceDB as vector databases. This is controlled by the pr_similar_issue.vectordb setting.
  • New methods _update_table_with_issues for indexing issues in LanceDB have been added.
  • The run method has been updated to support querying from both Pinecone and LanceDB.
  • The documentation file docs/SIMILAR_ISSUE.md has been updated to include information about the new LanceDB integration.
  • The configuration file pr_agent/settings/configuration.toml has been updated to include a new setting vectordb under pr_similar_issue and a new section for lancedb settings.
  • The requirements.txt file has been updated to include the lancedb package.

PR changes walkthrough

Relevant files                                                                                                                                 
Enhancement
1 files
pr_similar_issue.py                                                                                 
    pr_agent/tools/pr_similar_issue.py

    **The file pr_agent/tools/pr_similar_issue.py has been
    significantly updated to support LanceDB integration. The
    changes include:

    • Added import lancedb at the top of the
      file.
    • Modified the PRSimilarIssue class to support both
      Pinecone and LanceDB as vector databases. This is controlled
      by the pr_similar_issue.vectordb setting.
    • Added new
      methods _update_table_with_issues for indexing issues in
      LanceDB.
    • Updated the run method to support querying from
      both Pinecone and LanceDB.**
+271/-86
Documentation
1 files
SIMILAR_ISSUE.md                                                                                       
    docs/SIMILAR_ISSUE.md

    The documentation file docs/SIMILAR_ISSUE.md has been
    updated to include information about the new LanceDB
    integration.

+9/-1
Configuration changes
1 files
configuration.toml                                                                                   
    pr_agent/settings/configuration.toml

    The configuration file
    pr_agent/settings/configuration.toml has been updated to
    include a new setting vectordb under pr_similar_issue
    and a new section for lancedb settings.

+3/-0
Dependencies
1 files
requirements.txt                                                                                       
    requirements.txt

    The requirements.txt file has been updated to include the
    lancedb package.

+1/-0

Copy link

PR Description updated to latest commit (1d435ef)

Copy link

PR Analysis

  • 🎯 Main theme: Integration of LanceDB into the pr_agent tool
  • 📝 PR summary: This PR introduces the integration of LanceDB into the pr_agent tool. It includes changes to support both Pinecone and LanceDB as vector databases, new methods for indexing issues in LanceDB, and updates to the run method to support querying from both databases. The documentation and configuration files have also been updated to reflect these changes.
  • 📌 Type of PR: Enhancement
  • 🧪 Relevant tests added: No
  • ⏱️ Estimated effort to review [1-5]: 4, because the PR introduces a significant amount of new code, including the integration of a new database, which requires a thorough understanding of the existing codebase and the new database's functionality.
  • 🔒 Security concerns: No security concerns found

PR Feedback

  • 💡 General suggestions: The PR is well-structured and the changes are logically grouped. However, it would be beneficial to include tests to validate the new functionality and ensure that it integrates correctly with the existing codebase. Additionally, error handling could be improved in some areas to provide more informative error messages.

  • 🤖 Code feedback:
    relevant filepr_agent/tools/pr_similar_issue.py
    suggestion      

    Consider adding more specific error messages when exceptions are raised. This will help in debugging and understanding the exact issue when an error occurs. For example, in the try-except block where you are trying to get the Pinecone API key and environment, the error message could include the name of the missing key. [medium]

    relevant lineraise Exception("Please set pinecone api key and environment in secrets file")

    relevant filepr_agent/tools/pr_similar_issue.py
    suggestion      

    It seems like the run_from_scratch variable is used for debugging purposes. It would be better to control this via a configuration setting or a command-line argument rather than changing the code. This would make it easier to switch between modes without modifying the code. [medium]

    relevant linerun_from_scratch = False

    relevant filepr_agent/tools/pr_similar_issue.py
    suggestion      

    The run method has become quite long and complex with the addition of LanceDB support. Consider refactoring this method to improve readability and maintainability. You could extract the Pinecone and LanceDB specific code into separate methods. [important]

    relevant lineasync def run(self):

    relevant filepr_agent/tools/pr_similar_issue.py
    suggestion      

    The get_settings().pr_similar_issue.vectordb is checked multiple times to decide between Pinecone and LanceDB. This could be improved by using a strategy pattern or a factory pattern to create a Vectordb interface and then have PineconeVectordb and LanceDBVectordb classes that implement this interface. This would make the code more modular and easier to extend with other databases in the future. [important]

    relevant lineif get_settings().pr_similar_issue.vectordb == "pinecone":

    How to use

    Instructions

    To invoke the PR-Agent, add a comment using one of the following commands:
    /review: Request a review of your Pull Request.
    /describe: Update the PR title and description based on the contents of the PR.
    /improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
    /ask <QUESTION>: Ask a question about the PR.
    /update_changelog: Update the changelog based on the PR's contents.
    /add_docs: Generate docstring for new components introduced in the PR.
    /generate_labels: Generate labels for the PR based on the PR's contents.
    /analyze: Automatically analyzes the PR, and presents changes walkthrough for each component.
    see the tools guide for more details.

    To edit any configuration parameter from the configuration.toml, add --config_path=new_value.
    For example: /review --pr_reviewer.extra_instructions="focus on the file: ..."
    To list the possible configuration parameters, add a /config comment.

@mrT23
Copy link
Collaborator

mrT23 commented Dec 25, 2023

Hi @PrashantDixit0
Thanks for the contribution

  1. i started reviewing the PR. One thing I noticed is that you must set the baste folder to the root folder.
    When I tried running it from ./pr_agent/, I got an error. i hope this won't create problems during deployment

  2. I tested both tools on the following issue:
    Issue with GitHub Action and GPT Model Selection #538

Both pinecone and lancedb returned the same 5 op results 👍
image

@mrT23
Copy link
Collaborator

mrT23 commented Dec 25, 2023

  1. when there is no new issues, lancdb uses the old index 👍

  2. I opened a new issue, to see how the db is updated. this is what I got:

image is this the expected output ? why there is a line:` Table codium-ai-pr-agent-issues doesn't exists!` ?
  1. when I re-run the issue, I still get the same response. i think its a problem. the DB should be updated after a call with the new issues

@PrashantDixit0
Copy link
Contributor Author

LanceDB saves everything locally

@PrashantDixit0
Copy link
Contributor Author

I'll check to update new issues in Table

@mrT23
Copy link
Collaborator

mrT23 commented Dec 25, 2023

locally is ok for CLI, but when working the github app or github action, a local database is not possible.
Please state in the readme that lanceDB is a local database, so more suited for CLI.
due to that, i want the default to remain pinecone, since it is more suitable for an application.

but other than that, the PR looks very good. its very nice to work with lanceDB, without needing any key or stuff like that.
the readme can also state that for local CLI testing, lancedb is easier

@PrashantDixit0
Copy link
Contributor Author

Thank you @mrT23 , I'll state about LanceDB in README. LanceDB is working on bringing a high-speed, scalable, and serverless cloud-based system, we can integrate it once it is released.

Btw, when can we merge this PR for local testing

@mrT23
Copy link
Collaborator

mrT23 commented Dec 25, 2023

@PrashantDixit0 change back the default to pinecone (so the app would still work), and I think we can merge.
If you want to continue to improve it after that, you are welcome

@PrashantDixit0
Copy link
Contributor Author

Sure @mrT23, I change default to pinecone and I am really interested in continue improving the amazing PR-agent

@PrashantDixit0
Copy link
Contributor Author

Done

@mrT23 mrT23 merged commit ba3a8b2 into Codium-ai:main Dec 25, 2023
2 checks passed
yochail pushed a commit to yochail/pr-agent that referenced this pull request Feb 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants