Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short-circuit 'easy' rglob/glob patterns #276

Merged
merged 1 commit into from Nov 2, 2022

Conversation

Gilthans
Copy link
Contributor

Resolves #274

@Gilthans
Copy link
Contributor Author

@pjbull friendly ping :)

@pjbull
Copy link
Member

pjbull commented Oct 20, 2022

Thanks @Gilthans, while I think this fix looks directionally reasonable, but I'd like to do a bit more investigation on why performance is bad in #274.

I assumed that it was because there was a network call somewhere in the glob path, but we actually only call the list dir once here:

def _glob(self, selector):
root = _CloudPathSelectable(
PurePosixPath(self._no_prefix_no_drive),
{
PurePosixPath(c._no_prefix_no_drive): is_dir
for c, is_dir in self.client._list_dir(self, recursive=True)
},
is_dir=True,
exists=True,
)

If you get a chance, could you do some profiling and report in #274? I'm hoping there may be a better more general fix than the short circuit if this isn't networking related (or maybe it is and I just missed where the call is in my scan of the code).

@Gilthans
Copy link
Contributor Author

I assumed that it was because there was a network call somewhere in the glob path, but we actually only call the list dir once here:

I actually came to the same conclusion before opening the issue. I got that far into the code and couldn't figure out why it was slow either.
I'd love to do this profiling (one of my hobbies actually :) ), but in the meantime, maybe we could push forward with this? It seems to me like a net-positive regardless

@pjbull pjbull changed the base branch from master to 276-live-tests November 2, 2022 01:05
@pjbull pjbull merged commit bd5445e into drivendataorg:276-live-tests Nov 2, 2022
pjbull added a commit that referenced this pull request Nov 2, 2022
* Short-circuit 'easy' rglob/glob patterns (#276)

* Add change description

Co-authored-by: Daniel Oriyan <gilthans@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

glob and rglob can be slow for large directories or lots of files
2 participants