Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(agent): Implement local temporary dir #7115

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

kcze
Copy link
Contributor

@kcze kcze commented Apr 30, 2024

User description

Background

Python file execution doesn't work when non-local FileStorage is used.

Changes 🏗️

Implement mounting of local temporary directory synchronized with storage, where python file is put and can be executed even when remote file storage such as S3 or GCS is used.

PR Quality Scorecard ✨

  • Have you used the PR description template?   +2 pts
  • Is your pull request atomic, focusing on a single change?   +5 pts
  • Have you linked the GitHub issue(s) that this PR addresses?   +5 pts
  • Have you documented your changes clearly and comprehensively?   +5 pts
  • Have you changed or added a feature?   -4 pts
    • Have you added/updated corresponding documentation?   +4 pts
    • Have you added/updated corresponding integration tests?   +5 pts
  • Have you changed the behavior of AutoGPT?   -5 pts
    • Have you also run agbenchmark to verify that these changes do not regress performance?   +10 pts

Type

enhancement


Description

  • Enhanced Python code execution by making execute_python_code asynchronous and improving file handling.
  • Implemented file synchronization with storage using FileSyncHandler and context manager for mounting directories.
  • Added local storage mounting capabilities to handle file operations transparently.
  • Updated project dependencies to include watchdog for monitoring file system events.

Changes walkthrough

Relevant files
Enhancement
execute_code.py
Enhance Python code execution and file handling                   

autogpts/autogpt/autogpt/commands/execute_code.py

  • Converted execute_python_code to an asynchronous function.
  • Implemented a new method for handling temporary Python files using a
    workspace.
  • Enhanced execute_python_file to support execution in a Docker
    container with mounted local paths.
  • Added a helper method _generate_random_string to create random
    filenames.
  • +102/-84
    base.py
    Implement file synchronization and temporary mounting       

    autogpts/autogpt/autogpt/file_storage/base.py

  • Added a FileSyncHandler class to synchronize file operations with the
    storage.
  • Implemented a context manager mount to handle temporary local
    directories.
  • +61/-1   
    local.py
    Add local storage mounting capability                                       

    autogpts/autogpt/autogpt/file_storage/local.py

  • Added a mount method as a context manager for local file storage.
  • +11/-1   
    Dependencies
    pyproject.toml
    Update project dependencies                                                           

    autogpts/autogpt/pyproject.toml

    • Added watchdog dependency to monitor file system events.
    +1/-0     

    PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    Allow to use synchronized local temp dir
    Copy link

    netlify bot commented Apr 30, 2024

    Deploy Preview for auto-gpt-docs ready!

    Name Link
    🔨 Latest commit fab0d5a
    🔍 Latest deploy log https://app.netlify.com/sites/auto-gpt-docs/deploys/664089e9620b660008d2fc37
    😎 Deploy Preview https://deploy-preview-7115--auto-gpt-docs.netlify.app
    📱 Preview on mobile
    Toggle QR Code...

    QR Code

    Use your smartphone camera to open QR code link.

    To edit notification comments on pull requests, go to your Netlify site configuration.

    Copy link

    codecov bot commented Apr 30, 2024

    Codecov Report

    Attention: Patch coverage is 21.36752% with 92 lines in your changes are missing coverage. Please review.

    Project coverage is 42.64%. Comparing base (b0cbf71) to head (fab0d5a).
    Report is 6 commits behind head on master.

    Files Patch % Lines
    autogpts/autogpt/autogpt/commands/execute_code.py 11.11% 56 Missing ⚠️
    autogpts/autogpt/autogpt/file_storage/base.py 28.57% 35 Missing ⚠️
    autogpts/autogpt/autogpt/file_storage/local.py 80.00% 1 Missing ⚠️
    Additional details and impacted files
    @@            Coverage Diff             @@
    ##           master    #7115      +/-   ##
    ==========================================
    - Coverage   44.68%   42.64%   -2.05%     
    ==========================================
      Files         133      130       -3     
      Lines        6315     6355      +40     
      Branches      823      825       +2     
    ==========================================
    - Hits         2822     2710     -112     
    - Misses       3382     3540     +158     
    + Partials      111      105       -6     
    Flag Coverage Δ
    Linux ?
    Windows 42.64% <21.36%> (-0.13%) ⬇️
    autogpt-agent 42.64% <21.36%> (-2.02%) ⬇️
    macOS ?

    Flags with carried forward coverage won't be shown. Click here to find out more.

    ☔ View full report in Codecov by Sentry.
    📢 Have feedback on the report? Share it here.

    Make temp file manually
    @kcze kcze marked this pull request as ready for review April 30, 2024 18:03
    @kcze kcze requested a review from a team as a code owner April 30, 2024 18:03
    @codiumai-pr-agent-pro codiumai-pr-agent-pro bot added the enhancement New feature or request label Apr 30, 2024
    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Description updated to latest commit (86c5dc7)

    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Review

    ⏱️ Estimated effort to review [1-5]

    4, due to the complexity of the changes involving asynchronous operations, file handling, and Docker integration. The PR modifies critical functionality and integrates new libraries, requiring careful consideration of thread safety, error handling, and system interactions.

    🧪 Relevant tests

    No

    🔍 Possible issues

    Possible Bug: The method _generate_random_string uses random.choices which is not cryptographically secure. This could be an issue if predictability of the file names affects security or functionality.

    Error Handling Concern: In the execute_python_file method, the Docker container execution does not handle potential exceptions from container.exec_run other than checking the exit code. This might miss other types of errors that could occur during command execution.

    🔒 Security concerns

    No

    Code feedback:
    relevant fileautogpts/autogpt/autogpt/commands/execute_code.py
    suggestion      

    Consider using secrets.token_urlsafe() instead of random.choices for generating secure random strings. This method provides a URL-safe text string, which is suitable for cryptographic use and ensures that the file names are not predictable. [important]

    relevant linerandom_string = ''.join(random.choices(characters, k=length))

    relevant fileautogpts/autogpt/autogpt/commands/execute_code.py
    suggestion      

    Add error handling for Docker command execution to capture and log any exceptions that might occur during container.exec_run, not just the exit code. This can help in diagnosing issues that occur during the execution of Python files in Docker containers. [important]

    relevant lineif exec_result.exit_code != 0:

    relevant fileautogpts/autogpt/autogpt/commands/execute_code.py
    suggestion      

    Implement logging for the entry and exit points of the execute_python_file method to improve traceability and debugging. This can help in understanding the flow of file execution, especially when errors occur. [medium]

    relevant linelogger.info(f"Executing python file '{filename}'")

    Comment on lines 117 to 122
    temp_path = Path("")
    while True:
    temp_path = f"temp{self._generate_random_string()}.py"
    if not self.workspace.exists(temp_path):
    break
    await self.workspace.write_file(temp_path, code)

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Replace the manual loop for checking if a file exists with a more robust temporary file creation using the tempfile module. This avoids potential race conditions and simplifies the code. [enhancement]

    Suggested change
    temp_path = Path("")
    while True:
    temp_path = f"temp{self._generate_random_string()}.py"
    if not self.workspace.exists(temp_path):
    break
    await self.workspace.write_file(temp_path, code)
    with NamedTemporaryFile(suffix=".py", dir=self.workspace.root, delete=False) as tmp_file:
    temp_path = tmp_file.name
    await self.workspace.write_file(temp_path, code)

    Comment on lines 167 to 171
    file_path = self.workspace.get_path(filename)
    if not self.workspace.exists(file_path):
    # Mimic the response that you get from the command line to make it
    # intuitively understandable for the LLM
    raise FileNotFoundError(

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Use a context manager to handle file operations to ensure that files are always closed properly, even if an error occurs. [best practice]

    Suggested change
    file_path = self.workspace.get_path(filename)
    if not self.workspace.exists(file_path):
    # Mimic the response that you get from the command line to make it
    # intuitively understandable for the LLM
    raise FileNotFoundError(
    with self.workspace.get_path(filename) as file_path:
    if not self.workspace.exists(file_path):
    raise FileNotFoundError(
    "AutoGPT is running in a Docker container; "
    f"executing {file_path} directly..."
    )

    Comment on lines 161 to 162
    if isinstance(args, str):
    args = args.split() # Convert space-separated string to a list

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Instead of manually checking if args is a string and then splitting it, use a more Pythonic approach by ensuring args is always a list at the start of the function. [best practice]

    Suggested change
    if isinstance(args, str):
    args = args.split() # Convert space-separated string to a list
    args = shlex.split(args) if isinstance(args, str) else args

    Copy link
    Member

    @Pwuts Pwuts May 1, 2024

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggested change does not match suggestion description. 👎🏼

    BUT we do argument validation so this block can be removed entirely.

    Comment on lines 34 to 37
    with open(file_path, "rb") as f:
    content = f.read()
    assert isinstance(content, bytes)
    await self.storage.write_file(file_path, content)

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Use a more efficient method for reading file content into memory by leveraging read() directly on the file object opened in binary mode, which is more memory efficient for large files. [performance]

    Suggested change
    with open(file_path, "rb") as f:
    content = f.read()
    assert isinstance(content, bytes)
    await self.storage.write_file(file_path, content)
    with open(file_path, "rb") as f:
    await self.storage.write_file(file_path, f.read())

    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I think the proposed change has zero effect on memory consumption, as the argument f.read() will still be evaluated and executed before write_file is called.

    Comment on lines 237 to 251
    container: DockerContainer = client.containers.run(
    image_name,
    ["sleep", "60"], # Max 60 seconds to prevent permanent hangs
    volumes={
    str(Path(local_path).absolute()): {
    "bind": "/workspace",
    "mode": "rw",
    }
    },
    working_dir="/workspace",
    stderr=True,
    stdout=True,
    detach=True,
    name=container_name,
    ) # type: ignore

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Simplify the creation and handling of Docker containers by extracting the container setup and execution logic into separate helper methods. This improves code readability and maintainability. [maintainability]

    Suggested change
    container: DockerContainer = client.containers.run(
    image_name,
    ["sleep", "60"], # Max 60 seconds to prevent permanent hangs
    volumes={
    str(Path(local_path).absolute()): {
    "bind": "/workspace",
    "mode": "rw",
    }
    },
    working_dir="/workspace",
    stderr=True,
    stdout=True,
    detach=True,
    name=container_name,
    ) # type: ignore
    def setup_container(image_name, local_path, container_name):
    return client.containers.run(
    image_name,
    ["sleep", "60"],
    volumes={
    str(Path(local_path).absolute()): {
    "bind": "/workspace",
    "mode": "rw",
    }
    },
    working_dir="/workspace",
    stderr=True,
    stdout=True,
    detach=True,
    name=container_name,
    )
    container = setup_container(image_name, local_path, container_name)
    container_is_fresh = True

    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Idea is good, proposed diff is bad because it adds a nested function rather than a helper method.

    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    Changelog updates:

    2024-04-30

    Added

    • Implemented local temporary directory synchronization with storage for executing Python files in environments using non-local FileStorage.

    Changed

    • Enhanced execute_python_code to be asynchronous and improved file handling.
    • Updated project dependencies to include watchdog for monitoring file changes.

    Fixed

    • Resolved issues with Python file execution in Docker environments by ensuring proper file path handling and synchronization.

    to commit the new content to the CHANGELOG.md file, please type:
    '/update_changelog --pr_update_changelog.push_changelog_changes=true'

    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Analysis

    • This screen contains a list of code components that were changed in this PR.
    • You can initiate specific actions for each component, by checking the relevant boxes.
    • After you check a box, the action will be performed automatically by PR-Agent.
    • Results will appear as a comment on the PR, typically after 30-60 seconds.
    fileChanged components
    execute_code.py
    • Test
    • Docs
    • Improve
    • Similar
     
    execute_python_code
    (function)
     
    +9/-8
     
    • Test
    • Docs
    • Improve
    • Similar
     
    execute_python_file
    (function)
     
    +82/-74
     
    • Test
    • Docs
    • Improve
    • Similar
     
    _generate_random_string
    (method of CodeExecutorComponent)
     
    +6/-0
     
    base.py
    • Test
    • Docs
    • Improve
    • Similar
     
    FileSyncHandler
    (class)
     
    +35/-0
     
    • Test
    • Docs
    • Improve
    • Similar
     
    mount
    (method of FileStorage)
     
    +15/-0
     
    local.py
    • Test
    • Docs
    • Improve
    • Similar
     
    mount
    (method of LocalFileStorage)
     
    +7/-0
     

    ✨ Usage guide:

    Using static code analysis capabilities, the analyze tool scans the PR code changes and find the code components (methods, functions, classes) that changed in the PR.
    The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR:

    /analyze
    

    Language that are currently supported: Python, Java, C++, JavaScript, TypeScript.
    See more information about the tool in the docs.

    Copy link
    Member

    @Pwuts Pwuts left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    A few comments, but overall I like the setup!

    with self.workspace.mount() as local_path:
    # Ensure the directory exists
    directory = (local_path / filename).parent
    os.makedirs(directory, exist_ok=True)
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    If the existence of the file has already been checked (lines 167-168), that asserts that its containing directory also exists, so this statement should be unnecessary. Right?

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Lines 167, 168 check if file exists in the FileStorage here we make sure that the directory exists at the temporary local_path. This wouldn't be necessary if the whole workspace was copied.
    Relevant: #7115 (comment)

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    removed

    Comment on lines 210 to 257
    try:
    client.images.get(image_name)
    logger.debug(f"Image '{image_name}' found locally")
    except ImageNotFound:
    logger.info(
    f"Image '{image_name}' not found locally,"
    " pulling from Docker Hub..."
    )
    # Use the low-level API to stream the pull response
    low_level_client = docker.APIClient()
    for line in low_level_client.pull(
    image_name, stream=True, decode=True
    ):
    # Print the status and progress, if available
    status = line.get("status")
    progress = line.get("progress")
    if status and progress:
    logger.info(f"{status}: {progress}")
    elif status:
    logger.info(status)

    logger.debug(f"Creating new {image_name} container...")
    container: DockerContainer = client.containers.run(
    image_name,
    ["sleep", "60"], # Max 60 seconds to prevent permanent hangs
    volumes={
    str(self.workspace.root): {
    "bind": "/workspace",
    "mode": "rw",
    }
    },
    working_dir="/workspace",
    container: DockerContainer = client.containers.get(
    container_name
    ) # type: ignore
    except NotFound:
    try:
    client.images.get(image_name)
    logger.debug(f"Image '{image_name}' found locally")
    except ImageNotFound:
    logger.info(
    f"Image '{image_name}' not found locally,"
    " pulling from Docker Hub..."
    )
    # Use the low-level API to stream the pull response
    low_level_client = docker.APIClient()
    for line in low_level_client.pull(
    image_name, stream=True, decode=True
    ):
    # Print the status and progress, if available
    status = line.get("status")
    progress = line.get("progress")
    if status and progress:
    logger.info(f"{status}: {progress}")
    elif status:
    logger.info(status)

    logger.debug(f"Creating new {image_name} container...")
    container: DockerContainer = client.containers.run(
    image_name,
    ["sleep", "60"], # Max 60 seconds to prevent permanent hangs
    volumes={
    str(Path(local_path).absolute()): {
    "bind": "/workspace",
    "mode": "rw",
    }
    },
    working_dir="/workspace",
    stderr=True,
    stdout=True,
    detach=True,
    name=container_name,
    ) # type: ignore
    container_is_fresh = True

    if not container.status == "running":
    container.start()
    elif not container_is_fresh:
    container.restart()
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Maybe abstract this into a helper method

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    done

    autogpts/autogpt/autogpt/file_storage/base.py Outdated Show resolved Hide resolved
    autogpts/autogpt/autogpt/file_storage/base.py Show resolved Hide resolved
    Comment on lines 199 to 207
    local_path = tempfile.mkdtemp()
    observer = Observer()
    try:
    # Sync changes
    event_handler = FileSyncHandler(self)
    observer.schedule(event_handler, path, recursive=True)
    observer.start()

    yield Path(local_path)
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    How about the rest of the workspace content? A significant use case for execute_python_code is to manipulate files and data. Can't do that if the files are not in the mount.

    Copy link
    Contributor Author

    @kcze kcze May 1, 2024

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I think this requires to copy files from the workspace (which I removed) and also keep them in sync both ways.

    it can just return the workspace path locally :)
    no need to copy

    I might be misunderstanding something because I don't see how can we make files available for python without copying them (when using anything other than local storage) 🤔
    Relevant: #7115 (comment)

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Unless we allow python code to use FileStorage somehow?

    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I think this requires to copy files from the workspace (which I removed) and also keep them in sync both ways.

    Yes, that's what I had in mind with mount().

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    done but there's no storage -> temp dir sync after initial copying (only temp dir -> storage) as this would require significant change to the storage, for example events for every method in storage. Should I do this in this PR?

    @Pwuts Pwuts changed the title feat(autogpt): Implement local temporary dir feat(agent): Implement local temporary dir May 1, 2024
    @kcze kcze requested a review from Pwuts May 1, 2024 13:18
    Copy link

    github-actions bot commented May 4, 2024

    This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

    @github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label May 4, 2024
    @github-actions github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label May 6, 2024
    Copy link

    github-actions bot commented May 6, 2024

    Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

    @Pwuts Pwuts self-assigned this May 9, 2024
    file_path.parent.mkdir(parents=True, exist_ok=True)
    with open(file_path, "wb") as f:
    content = self.read_file(file, binary=True)
    assert isinstance(content, bytes)
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    why is this necessary?

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    It's simplified as you suggested in #7115 (comment)
    mkdir is there because write_bytes won't create file directories. E.g. there's one file in the workspace workspace/path/to/file.txt, and we have local temp directory /temp/. If we try to write to file /temp/path/to/file.txt it will fail because dirs on the way to file don't exist.

    Comment on lines 171 to 174
    with open(file_path, "wb") as f:
    content = self.read_file(file, binary=True)
    assert isinstance(content, bytes)
    f.write(content)
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    If you remove the assert, this could be file_path.write_bytes(self.read_file(file, binary=True))

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    done, removed with open as well

    file_path = event.src_path
    with open(file_path, "rb") as f:
    content = f.read()
    assert isinstance(content, bytes)
    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    ??

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Python skill issue, changed+fixed both occurrences to:

    file_path = Path(event.src_path).relative_to(self.path)
    content = file_path.read_bytes()
    await self.storage.write_file(file_path, content)

    @kcze kcze requested a review from Pwuts May 12, 2024 09:21
    @github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label May 14, 2024
    Copy link

    This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    AutoGPT Agent conflicts Automatically applied to PRs with merge conflicts enhancement New feature or request Review effort [1-5]: 4 size/l
    Projects
    Status: 🚧 Needs work
    Development

    Successfully merging this pull request may close these issues.

    None yet

    2 participants