Do not overwrite prompt artifacts #53
Signed-off-by: Mynhardt Burger <mynhardt@gmail.com>
Thanks for digging into this one Mynhardt! My main question with how it's currently written is what advantage the .swp/rename offers over simply not copying if the named file itself already exists. Depending on the filesystem implementation, this could offer the advantage that the named file itself comes into existence atomically during the rename, but other filesystems (most notably s3fs) don't guarantee an atomic rename, so that wouldn't always be the case.
@@ -57,7 +60,7 @@ class TGISConnection:
     # Paths to client key/cert pair when TGIS requires mTLS
     client_tls: Optional[TLSFilePair] = None
     # TLS HN override
-    tls_hostname_override: str = None
+    tls_hostname_override: Optional[str] = None
Thanks for fixing this!
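For readers skimming the thread, here is a minimal sketch of why the annotation change matters. It assumes the config class behaves like a dataclass; `ConnectionSketch` is a hypothetical stand-in, not the real `TGISConnection`.

```python
from dataclasses import dataclass
from typing import Optional, get_type_hints


@dataclass
class ConnectionSketch:
    """Hypothetical stand-in for a connection config class."""

    hostname: str
    # A None default with a plain `str` annotation is accepted at runtime
    # but flagged by strict type checkers; Optional[str] states the intent.
    tls_hostname_override: Optional[str] = None


conn = ConnectionSketch(hostname="tgis.example")
hints = get_type_hints(ConnectionSketch)
```

Python does not enforce either annotation at runtime, so the change mainly helps type checkers and readers.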
@@ -221,13 +224,29 @@ def load_prompt_artifacts(self, prompt_id: str, *artifact_paths: List[str]):
             str,
             artifact_paths=artifact_paths,
         )
-        target_dir = os.path.join(self.prompt_dir, prompt_id)
+
+        target_dir = Path(self.prompt_dir) / prompt_id
One day I'll actually learn pathlib!
I really like the path building using / :)
        for artifact_path in artifact_paths:

            # Don't copy files which are already in the target_dir
            existing_artifact_names = {f.name for f in target_dir.iterdir()}
I'm curious, do you happen to know if iterdir lazily evaluates the contents of the dir, or if it takes a proactive os.listdir and then iterates that (i.e. if the list changes during iteration, what happens)? Not particularly important in this instance, just curious.
iterdir() is a wrapper for os.listdir, so it should be functionally the same:
    def iterdir(self):
        """Iterate over the files in this directory.  Does not yield any
        result for the special paths '.' and '..'.
        """
        for name in os.listdir(self):
            yield self._make_child_relpath(name)
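To illustrate the answer above, here is a small sketch of what happens when the directory changes mid-iteration. It assumes a CPython version where iterdir() takes the whole listing up front, as in the implementation quoted above; the file names and the `names_seen_during_iteration` helper are made up for the demo.

```python
import tempfile
from pathlib import Path


def names_seen_during_iteration(directory: Path) -> set:
    """Iterate a directory and create a new file partway through."""
    seen = set()
    for index, entry in enumerate(directory.iterdir()):
        seen.add(entry.name)
        if index == 0:
            # Created after the listing was taken, so this pass should miss it.
            (directory / "late.pt").touch()
    return seen


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.pt").touch()
    (root / "b.pt").touch()
    seen = names_seen_during_iteration(root)
    # A fresh call to iterdir() takes a new listing and picks up the late file.
    after = {p.name for p in root.iterdir()}
```

Under that assumption the first pass behaves like a snapshot: changes made during iteration only show up on the next call.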
Atomic rename guarantees that we don't see multiple states for the same file at the same time. But in our case it doesn't matter much: on file systems that have atomic rename there is no downside. If two pods happen to start a download for the same
Great! This makes a lot of sense
f00a8c0 to 9a77a8f
LGTM! Thanks for the contribution!
Issue: #51
Updates load_prompt_artifacts() to only copy new artifacts. New artifacts are those for which neither the file itself (e.g. foo.pt) nor its swap/in-progress variant (e.g. foo.pt.swp) already exists in the prompt_dir location.
Copying is done in two stages:
1. Copy the file with a .swp extension appended to indicate a copy is in progress
2. Rename the file to remove the .swp extension after the copy is completed
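
The two-stage copy described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code merged in this PR: `copy_new_artifacts` and its signature are hypothetical, and it only uses standard-library calls (shutil.copyfile, Path.rename).

```python
import shutil
import tempfile
from pathlib import Path


def copy_new_artifacts(target_dir: Path, *artifact_paths: str) -> list:
    """Copy artifacts that are not yet present in target_dir.

    Hypothetical helper; returns the names of the files actually copied.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    existing = {f.name for f in target_dir.iterdir()}
    copied = []
    for artifact_path in artifact_paths:
        source = Path(artifact_path)
        final_path = target_dir / source.name
        swap_path = target_dir / (source.name + ".swp")
        # Skip if the file or its in-progress variant is already there.
        if final_path.name in existing or swap_path.name in existing:
            continue
        # Stage 1: copy under the .swp name to mark a copy in progress.
        shutil.copyfile(source, swap_path)
        # Stage 2: drop the .swp suffix once the copy has completed.
        swap_path.rename(final_path)
        copied.append(source.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "foo.pt").write_bytes(b"prompt")
    target = Path(tmp) / "prompts" / "my-prompt"
    first = copy_new_artifacts(target, str(src / "foo.pt"))
    second = copy_new_artifacts(target, str(src / "foo.pt"))
    leftovers = sorted(p.name for p in target.iterdir())
```

As noted in the review, whether the rename is atomic depends on the filesystem, so the .swp marker is best read as a hint that a copy is in progress rather than a lock.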