Skip to content

Dev to main sync#156

Merged
lfnothias merged 38 commits intomainfrom
dev
Apr 30, 2025
Merged

Dev to main sync#156
lfnothias merged 38 commits intomainfrom
dev

Conversation

@lfnothias
Copy link
Contributor

@lfnothias lfnothias commented Apr 30, 2025

PR Type

enhancement, documentation


Description

  • Enhanced data retrieval logic in tools_database.py.

  • Improved data fetching in postgres_tool_database.py.

  • Added question instructions dialog in streamlit_app.py.

  • Updated license information in README.md.


Changes walkthrough 📝

Relevant files
Enhancement
tools_database.py
Improve data retrieval logic and error handling                   

app/core/memory/tools_database.py

  • Improved data retrieval logic for tools.
  • Added error handling for database operations.
  • Enhanced logging for data retrieval process.
  • +58/-10 
    postgres_tool_database.py
    Enhance data fetching logic and error handling                     

    streamlit_webapp/postgres_tool_database.py

  • Enhanced data fetching logic for tools.
  • Added error handling for database operations.
  • Improved logging for data retrieval process.
  • +65/-9   
    streamlit_app.py
    Add question instructions dialog                                                 

    streamlit_webapp/streamlit_app.py

  • Added dialog for question instructions.
  • Integrated new button for instructions.
  • +14/-0   
    Documentation
    README.md
    Update license information                                                             

    README.md

    • Updated license from MIT to Apache 2.0.
    +1/-1     

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • @qodo-code-review
    Copy link

    Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here.

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Docstring Format

    The new docstring in the get method should follow the Google DocString format, including explicit sections for Args, Returns, and Raises where applicable.

    """
    Select the data for the specified tool (filename).
    It first checks the most recent interaction. If the data for the tool
    is NULL in that interaction, it retrieves the data from the
    immediately preceding interaction.
    Returns the data value (str/bytes) or None.
    """
    Missing Docstring Details

    The new dialog function question_instructions_dialog lacks a Google-style docstring. Consider adding a structured docstring to describe the parameters and function behavior.

    @st.dialog("Question Instructions")
    def question_instructions_dialog():
        st.write(question_instructions)
        if st.button("Close"):
            st.rerun()

    Copy link

    Copilot AI left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Pull Request Overview

    This PR synchronizes changes from the development branch to main while introducing minor feature enhancements and documentation updates. Key changes include:

    • Adding a new dialog for displaying question instructions in the Streamlit web app.
    • Updating logic to fetch recent tool data in both Postgres and SQLite implementations.
    • Modifying cost information and usage documentation in text files, along with a license change in the README.

    Reviewed Changes

    Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

    Show a summary per file
    File Description
    streamlit_webapp/streamlit_app.py Added reading of question instructions and a new dialog function.
    streamlit_webapp/postgres_tool_database.py Updated query logic and error handling to fetch fallback data.
    streamlit_webapp/misc/splash_text.txt Revised cost details and added visualization guidelines.
    streamlit_webapp/misc/question_instructions.txt Newly added guidelines to help users frame effective questions.
    app/core/memory/tools_database.py Updated query logic with additional exception handling.
    README.md Changed license information from MIT to Apache License 2.0.
    Comments suppressed due to low confidence (2)

    app/core/memory/tools_database.py:110

    • [nitpick] The parameter name 'filename' is ambiguous since it represents a tool name. Consider renaming it to 'tool_name' for improved clarity and consistency.
    def get(self, filename):
    

    streamlit_webapp/misc/splash_text.txt:11

    • [nitpick] Please confirm that the updated cost range accurately reflects the current billing model, as it differs from the previous values. Verifying this will help ensure consistency in the documentation.
    **Cost**: When asking a question to the MetaboT, the AI-system performs API calls to OpenAI models that are billed on the account associated with the key. The cost varies depending on the question and is usually in the range of 0.04-0.1 usd.
    

    # Grab the two most-recent interactions for this session/thread
    cursor.execute(
    f"""
    SELECT interaction, {tool_name}_data
    Copy link

    Copilot AI Apr 30, 2025

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Dynamically inserting tool_name into the SQL query can be risky if the value is not properly validated. Consider restricting tool_name to a predefined set of values or sanitizing the input to prevent SQL injection risks.

    Copilot uses AI. Check for mistakes.
    @qodo-code-review
    Copy link

    qodo-code-review bot commented Apr 30, 2025

    Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here.

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Validate dynamic column names

    Ensure that the dynamic column name derived from filename is validated against a
    whitelist of allowed values before being embedded into the SQL query. This
    prevents potential SQL injection vulnerabilities.

    app/core/memory/tools_database.py [124-127]

    -SELECT interaction, {filename}_data
    +# Validate filename against allowed column names before using it
    +allowed_columns = ["tool1", "tool2", "tool3"]  # example whitelist
    +if filename not in allowed_columns:
    +    logger.error(f"Invalid tool name: {filename}")
    +    return None
    +cursor.execute(f"""
    +                SELECT interaction, {filename}_data
                     FROM metadata
                     ORDER BY interaction DESC
                     LIMIT 2
    +                """)

    [To ensure code accuracy, apply this suggestion manually]

    Suggestion importance[1-10]: 9

    __

    Why: The suggestion correctly identifies a potential SQL injection vulnerability by using an f-string to insert the filename directly into the SQL query. Validating filename against a whitelist is crucial for security.

    High
    Whitelist dynamic SQL columns

    Similar to the sqlite approach, ensure that the dynamic column name derived from
    tool_name is validated or whitelisted prior to constructing the SQL query. This
    helps to mitigate any SQL injection risks arising from untrusted input.

    streamlit_webapp/postgres_tool_database.py [81-85]

    -SELECT interaction, {tool_name}_data
    +# Validate tool_name against allowed column names before using it
    +allowed_columns = ["toolA", "toolB", "toolC"]  # example whitelist
    +if tool_name not in allowed_columns:
    +    logger.error(f"Invalid tool name: {tool_name}")
    +    return None
    +cursor.execute(
    +    f"""
    +                SELECT interaction, {tool_name}_data
                     FROM tool_data
                     WHERE session_id = %s AND thread_id = %s
                     ORDER BY interaction DESC
                     LIMIT 2;
    +                """,
    +    (session_id, thread_id),
    +)

    [To ensure code accuracy, apply this suggestion manually]

    Suggestion importance[1-10]: 9

    __

    Why: Similar to the previous suggestion, this correctly points out a potential SQL injection risk due to inserting tool_name directly into the SQL query via an f-string. Whitelisting tool_name is essential for preventing security vulnerabilities.

    High
    General
    Add file read error handling

    Wrap the file reading block in a try-except structure to gracefully handle
    scenarios where the question_instructions.txt file might be missing or
    unreadable. This ensures the app continues functioning even when external
    resource issues occur.

    streamlit_webapp/streamlit_app.py [65-67]

     question_instructions_path = Path(__file__).parent / "misc" / "question_instructions.txt"
    -with question_instructions_path.open("r") as file:
    -    question_instructions = file.read()
    +try:
    +    with question_instructions_path.open("r") as file:
    +        question_instructions = file.read()
    +except Exception as e:
    +    logger.error(f"Failed to read question instructions: {e}")
    +    question_instructions = "Instructions not available."
    • Apply this suggestion
    Suggestion importance[1-10]: 5

    __

    Why: The suggestion proposes adding a try-except block for reading the question_instructions.txt file. This improves robustness by handling potential FileNotFoundError or permission issues, although the impact might be moderate if the file is considered essential deployment artifact.

    Low

    No more code suggestions

    @github-actions
    Copy link
    Contributor

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    DocString Format

    The added docstring for the get method should follow the Google DocString style, including clear sections for Args, Returns, and Raises.

    Select the data for the specified tool (filename).
    It first checks the most recent interaction. If the data for the tool
    is NULL in that interaction, it retrieves the data from the
    immediately preceding interaction.
    Returns the data value (str/bytes) or None.
    """
    DocString Format

    The docstring for the get method is written in a descriptive bullet list but should be updated to conform to Google DocString standards for consistency.

    """
    Fetch the most recent value for <tool_name>_data.
    • If the latest interaction has a non-NULL value, return it.
    • Otherwise, fall back to the previous interaction (if any).
    • Return None when no data is found.
    """

    @github-actions
    Copy link
    Contributor

    PR Code Suggestions ✨

    No code suggestions found for the PR.

    @lfnothias lfnothias merged commit ae673d4 into main Apr 30, 2025
    1 check passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    4 participants