New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(agent): Implement local temporary dir #7115
base: master
Are you sure you want to change the base?
feat(agent): Implement local temporary dir #7115
Conversation
Allow to use synchronized local temp dir
✅ Deploy Preview for auto-gpt-docs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7115 +/- ##
==========================================
- Coverage 44.68% 42.64% -2.05%
==========================================
Files 133 130 -3
Lines 6315 6355 +40
Branches 823 825 +2
==========================================
- Hits 2822 2710 -112
- Misses 3382 3540 +158
+ Partials 111 105 -6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Make temp file manually
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. PR Description updated to latest commit (86c5dc7)
|
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. PR Review
Code feedback:
|
temp_path = Path("") | ||
while True: | ||
temp_path = f"temp{self._generate_random_string()}.py" | ||
if not self.workspace.exists(temp_path): | ||
break | ||
await self.workspace.write_file(temp_path, code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Replace the manual loop for checking if a file exists with a more robust temporary file creation using the tempfile
module. This avoids potential race conditions and simplifies the code. [enhancement]
temp_path = Path("") | |
while True: | |
temp_path = f"temp{self._generate_random_string()}.py" | |
if not self.workspace.exists(temp_path): | |
break | |
await self.workspace.write_file(temp_path, code) | |
with NamedTemporaryFile(suffix=".py", dir=self.workspace.root, delete=False) as tmp_file: | |
temp_path = tmp_file.name | |
await self.workspace.write_file(temp_path, code) |
file_path = self.workspace.get_path(filename) | ||
if not self.workspace.exists(file_path): | ||
# Mimic the response that you get from the command line to make it | ||
# intuitively understandable for the LLM | ||
raise FileNotFoundError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Use a context manager to handle file operations to ensure that files are always closed properly, even if an error occurs. [best practice]
file_path = self.workspace.get_path(filename) | |
if not self.workspace.exists(file_path): | |
# Mimic the response that you get from the command line to make it | |
# intuitively understandable for the LLM | |
raise FileNotFoundError( | |
with self.workspace.get_path(filename) as file_path: | |
if not self.workspace.exists(file_path): | |
raise FileNotFoundError( | |
"AutoGPT is running in a Docker container; " | |
f"executing {file_path} directly..." | |
) |
if isinstance(args, str): | ||
args = args.split() # Convert space-separated string to a list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Instead of manually checking if args
is a string and then splitting it, use a more Pythonic approach by ensuring args
is always a list at the start of the function. [best practice]
if isinstance(args, str): | |
args = args.split() # Convert space-separated string to a list | |
args = shlex.split(args) if isinstance(args, str) else args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested change does not match suggestion description. 👎🏼
BUT we do argument validation so this block can be removed entirely.
with open(file_path, "rb") as f: | ||
content = f.read() | ||
assert isinstance(content, bytes) | ||
await self.storage.write_file(file_path, content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Use a more efficient method for reading file content into memory by leveraging read()
directly on the file object opened in binary mode, which is more memory efficient for large files. [performance]
with open(file_path, "rb") as f: | |
content = f.read() | |
assert isinstance(content, bytes) | |
await self.storage.write_file(file_path, content) | |
with open(file_path, "rb") as f: | |
await self.storage.write_file(file_path, f.read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proposed change has zero effect on memory consumption, as the argument f.read()
will still be evaluated and executed before write_file
is called.
container: DockerContainer = client.containers.run( | ||
image_name, | ||
["sleep", "60"], # Max 60 seconds to prevent permanent hangs | ||
volumes={ | ||
str(Path(local_path).absolute()): { | ||
"bind": "/workspace", | ||
"mode": "rw", | ||
} | ||
}, | ||
working_dir="/workspace", | ||
stderr=True, | ||
stdout=True, | ||
detach=True, | ||
name=container_name, | ||
) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Simplify the creation and handling of Docker containers by extracting the container setup and execution logic into separate helper methods. This improves code readability and maintainability. [maintainability]
container: DockerContainer = client.containers.run( | |
image_name, | |
["sleep", "60"], # Max 60 seconds to prevent permanent hangs | |
volumes={ | |
str(Path(local_path).absolute()): { | |
"bind": "/workspace", | |
"mode": "rw", | |
} | |
}, | |
working_dir="/workspace", | |
stderr=True, | |
stdout=True, | |
detach=True, | |
name=container_name, | |
) # type: ignore | |
def setup_container(image_name, local_path, container_name): | |
return client.containers.run( | |
image_name, | |
["sleep", "60"], | |
volumes={ | |
str(Path(local_path).absolute()): { | |
"bind": "/workspace", | |
"mode": "rw", | |
} | |
}, | |
working_dir="/workspace", | |
stderr=True, | |
stdout=True, | |
detach=True, | |
name=container_name, | |
) | |
container = setup_container(image_name, local_path, container_name) | |
container_is_fresh = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea is good, proposed diff is bad because it adds a nested function rather than a helper method.
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. Changelog updates: 2024-04-30Added
Changed
Fixed
|
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. PR Analysis
✨ Usage guide:Using static code analysis capabilities, the
Language that are currently supported: Python, Java, C++, JavaScript, TypeScript. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but overall I like the setup!
with self.workspace.mount() as local_path: | ||
# Ensure the directory exists | ||
directory = (local_path / filename).parent | ||
os.makedirs(directory, exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the existence of the file has already been checked (lines 167-168), that asserts that its containing directory also exists, so this statement should be unnecessary. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 167, 168 check if file exists in the FileStorage
here we make sure that the directory exists at the temporary local_path
. This wouldn't be necessary if the whole workspace was copied.
Relevant: #7115 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
try: | ||
client.images.get(image_name) | ||
logger.debug(f"Image '{image_name}' found locally") | ||
except ImageNotFound: | ||
logger.info( | ||
f"Image '{image_name}' not found locally," | ||
" pulling from Docker Hub..." | ||
) | ||
# Use the low-level API to stream the pull response | ||
low_level_client = docker.APIClient() | ||
for line in low_level_client.pull( | ||
image_name, stream=True, decode=True | ||
): | ||
# Print the status and progress, if available | ||
status = line.get("status") | ||
progress = line.get("progress") | ||
if status and progress: | ||
logger.info(f"{status}: {progress}") | ||
elif status: | ||
logger.info(status) | ||
|
||
logger.debug(f"Creating new {image_name} container...") | ||
container: DockerContainer = client.containers.run( | ||
image_name, | ||
["sleep", "60"], # Max 60 seconds to prevent permanent hangs | ||
volumes={ | ||
str(self.workspace.root): { | ||
"bind": "/workspace", | ||
"mode": "rw", | ||
} | ||
}, | ||
working_dir="/workspace", | ||
container: DockerContainer = client.containers.get( | ||
container_name | ||
) # type: ignore | ||
except NotFound: | ||
try: | ||
client.images.get(image_name) | ||
logger.debug(f"Image '{image_name}' found locally") | ||
except ImageNotFound: | ||
logger.info( | ||
f"Image '{image_name}' not found locally," | ||
" pulling from Docker Hub..." | ||
) | ||
# Use the low-level API to stream the pull response | ||
low_level_client = docker.APIClient() | ||
for line in low_level_client.pull( | ||
image_name, stream=True, decode=True | ||
): | ||
# Print the status and progress, if available | ||
status = line.get("status") | ||
progress = line.get("progress") | ||
if status and progress: | ||
logger.info(f"{status}: {progress}") | ||
elif status: | ||
logger.info(status) | ||
|
||
logger.debug(f"Creating new {image_name} container...") | ||
container: DockerContainer = client.containers.run( | ||
image_name, | ||
["sleep", "60"], # Max 60 seconds to prevent permanent hangs | ||
volumes={ | ||
str(Path(local_path).absolute()): { | ||
"bind": "/workspace", | ||
"mode": "rw", | ||
} | ||
}, | ||
working_dir="/workspace", | ||
stderr=True, | ||
stdout=True, | ||
detach=True, | ||
name=container_name, | ||
) # type: ignore | ||
container_is_fresh = True | ||
|
||
if not container.status == "running": | ||
container.start() | ||
elif not container_is_fresh: | ||
container.restart() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe abstract this into a helper method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
local_path = tempfile.mkdtemp() | ||
observer = Observer() | ||
try: | ||
# Sync changes | ||
event_handler = FileSyncHandler(self) | ||
observer.schedule(event_handler, path, recursive=True) | ||
observer.start() | ||
|
||
yield Path(local_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the rest of the workspace content? A significant use case for execute_python_code
is to manipulate files and data. Can't do that if the files are not in the mount.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this requires to copy files from the workspace (which I removed) and also keep them in sync both ways.
it can just return the workspace path locally :)
no need to copy
I might be misunderstanding something because I don't see how can we make files available for python without copying them (when using anything other than local storage) 🤔
Relevant: #7115 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we allow python code to use FileStorage
somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this requires to copy files from the workspace (which I removed) and also keep them in sync both ways.
Yes, that's what I had in mind with mount()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done but there's no storage -> temp dir sync after initial copying (only temp dir -> storage) as this would require significant change to the storage, for example events for every method in storage. Should I do this in this PR?
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request. |
…broken-on-non-local-file-storage
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly. |
file_path.parent.mkdir(parents=True, exist_ok=True) | ||
with open(file_path, "wb") as f: | ||
content = self.read_file(file, binary=True) | ||
assert isinstance(content, bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's simplified as you suggested in #7115 (comment)
mkdir
is there because write_bytes
won't create file directories. E.g. there's one file in the workspace workspace/path/to/file.txt
, and we have local temp directory /temp/
. If we try to write to file /temp/path/to/file.txt
it will fail because dirs on the way to file don't exist.
with open(file_path, "wb") as f: | ||
content = self.read_file(file, binary=True) | ||
assert isinstance(content, bytes) | ||
f.write(content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you remove the assert
, this could be file_path.write_bytes(self.read_file(file, binary=True))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, removed with open
as well
file_path = event.src_path | ||
with open(file_path, "rb") as f: | ||
content = f.read() | ||
assert isinstance(content, bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python skill issue, changed+fixed both occurrences to:
file_path = Path(event.src_path).relative_to(self.path)
content = file_path.read_bytes()
await self.storage.write_file(file_path, content)
…broken-on-non-local-file-storage
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request. |
User description
Background
Python file execution doesn't work when non-local
FileStorage
is used.Changes 🏗️
Implement mounting of local temporary directory synchronized with storage, where python file is put and can be executed even when remote file storage such as S3 or GCS is used.
PR Quality Scorecard ✨
+2 pts
+5 pts
+5 pts
+5 pts
-4 pts
+4 pts
+5 pts
-5 pts
agbenchmark
to verify that these changes do not regress performance?+10 pts
Type
enhancement
Description
execute_python_code
asynchronous and improving file handling.FileSyncHandler
and context manager for mounting directories.watchdog
for monitoring file system events.Changes walkthrough
execute_code.py
Enhance Python code execution and file handling
autogpts/autogpt/autogpt/commands/execute_code.py
execute_python_code
to an asynchronous function.workspace.
execute_python_file
to support execution in a Dockercontainer with mounted local paths.
_generate_random_string
to create randomfilenames.
base.py
Implement file synchronization and temporary mounting
autogpts/autogpt/autogpt/file_storage/base.py
FileSyncHandler
class to synchronize file operations with thestorage.
mount
to handle temporary localdirectories.
local.py
Add local storage mounting capability
autogpts/autogpt/autogpt/file_storage/local.py
mount
method as a context manager for local file storage.pyproject.toml
Update project dependencies
autogpts/autogpt/pyproject.toml
watchdog
dependency to monitor file system events.