Slow performance for small files #110

@jayqi


Currently, _refresh_cache is called by many methods to ensure the local cache is up to date.

Unfortunately, _refresh_cache makes several network requests to fetch metadata from the cloud storage service. It looks like a run through _refresh_cache that doesn't download a new copy still makes 4 of these requests, while a run that does download makes a total of 6. On my current internet connection, each of these requests takes 350-400 ms. This means any cloud path method that hits _refresh_cache for a file that exists in cloud storage may take a minimum of 1-2 sec, no matter how small the file is.
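To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch. It assumes the requests run one after another and that each takes the 350-400 ms quoted above; the helper name `estimated_overhead_ms` is made up for illustration and is not part of the library.

```python
def estimated_overhead_ms(num_requests, latency_ms):
    """Total wait if every request runs sequentially at a fixed latency (assumption)."""
    return num_requests * latency_ms


# Request counts are the ones reported in this issue; latencies are the
# 350-400 ms range measured on the reporter's connection.
for label, num_requests in (("no download", 4), ("download", 6)):
    low = estimated_overhead_ms(num_requests, 350)
    high = estimated_overhead_ms(num_requests, 400)
    print(f"{label}: {num_requests} requests -> {low}-{high} ms")
```

Under these assumptions the overhead is independent of file size, since it comes entirely from round trips.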

def _refresh_cache(self, force_overwrite_from_cloud=False):
    # nothing to cache if the file does not exist; happens when creating
    # new files that will be uploaded
    if not self.exists():
        return

    if self.is_dir():
        raise ValueError("Only individual files can be cached")

    # if not exist or cloud newer
    if (
        not self._local.exists()
        or (self._local.stat().st_mtime < self.stat().st_mtime)
        or force_overwrite_from_cloud
    ):
        # ensure there is a home for the file
        self._local.parent.mkdir(parents=True, exist_ok=True)
        self.download_to(self._local)

        # force cache time to match cloud times
        os.utime(self._local, times=(self.stat().st_mtime, self.stat().st_mtime))

    if self._dirty:
        raise OverwriteDirtyFile(
            f"Local file ({self._local}) for cloud path ({self}) has been changed by your code, but "
            f"is being requested for download from cloud. Either (1) push your changes to the cloud, "
            f"(2) remove the local file, or (3) pass `force_overwrite_from_cloud=True` to "
            f"overwrite."
        )

    # if local newer but not dirty, it was updated
    # by a separate process; do not overwrite unless forced to
    if self._local.stat().st_mtime > self.stat().st_mtime:
        raise OverwriteNewerLocal(
            f"Local file ({self._local}) for cloud path ({self}) is newer on disk, but "
            f"is being requested for download from cloud. Either (1) push your changes to the cloud, "
            f"(2) remove the local file, or (3) pass `force_overwrite_from_cloud=True` to "
            f"overwrite."
        )
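One way to see where the requests might come from is to replay the control flow above against a stand-in object that counts calls. This is only a sketch: `FakeCloudPath` and its methods are hypothetical, the dirty-file check is omitted, and it assumes that each `exists()`, `is_dir()`, `stat()` and download call costs exactly one network request, which may not match what a real client does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeCloudPath:
    """Hypothetical stand-in that counts simulated network requests."""

    cloud_mtime: float
    local_mtime: Optional[float] = None  # None means no cached copy yet
    requests: int = 0

    def exists(self):
        self.requests += 1  # assumed: one metadata request
        return True

    def is_dir(self):
        self.requests += 1  # assumed: one metadata request
        return False

    def stat_mtime(self):
        self.requests += 1  # assumed: one metadata request per stat()
        return self.cloud_mtime

    def download(self):
        self.requests += 1  # assumed: one request for the file body

    def refresh_cache(self, force=False):
        # Same branch structure as _refresh_cache, minus the dirty check.
        if not self.exists():
            return
        if self.is_dir():
            raise ValueError("Only individual files can be cached")
        if self.local_mtime is None or self.local_mtime < self.stat_mtime() or force:
            self.download()
            # The original calls self.stat() twice inside os.utime(...).
            atime, mtime = self.stat_mtime(), self.stat_mtime()
            self.local_mtime = mtime
        if self.local_mtime > self.stat_mtime():
            raise RuntimeError("local copy is newer than cloud copy")


up_to_date = FakeCloudPath(cloud_mtime=100.0, local_mtime=100.0)
up_to_date.refresh_cache()
print("cache up to date:", up_to_date.requests, "requests")

missing = FakeCloudPath(cloud_mtime=100.0)
missing.refresh_cache()
print("no cached copy:", missing.requests, "requests")
```

With these assumptions the tallies come out to 4 requests when the cache is already current and 6 when there is no cached copy, in line with the numbers above. A stale cached copy would count one extra lookup in this model, so treat the exact figures as illustrative. The repeated `stat()` calls stand out as lookups that return the same value each time.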
