-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the progress bar smoother #38
Make the progress bar smoother #38
Conversation
Use the "Submit and use as completed" pattern as described by Jason Brownlee in "ThreadPoolExecutor in Python: The Complete Guide". Fixes connelldave#37
botocove/cove_runner.py
Outdated
with futures.ThreadPoolExecutor(max_workers=self.thread_workers) as executor: | ||
with ThreadPoolExecutor(max_workers=self.thread_workers) as executor: | ||
jobs = [executor.submit(self.cove_thread, s) for s in self.sessions] | ||
outputs = (f.result() for f in as_completed(jobs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a huge nitpick comment, but I stared at this for ages trying to work out what didn't sit well with me: it's a super clean solution - I feel like outputs
should be something that makes clear that the variable stored is a generator.
Passing the generator directly to line 43 doesn't make it any clearer, but line 39 means you need to know exactly how this works mechanically or it's just baffling :D
My suggestion is completed_futures_generator
. It's not exactly a clever name :D A comment with # https://superfastpython.com/threadpoolexecutor-in-python/#Submit_and_Use_as_Completed
feels worthwhile too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a nitpick at all! In fact the difference between this working "smoothly" and being totally broken is exactly two characters.
If you replace the generator syntax (...)
with list comprehension syntax [...]
the progress bar is invisible during the whole operation until it appear at the end already 100% complete. That's because tqdm doesn't even see the input until all the futures have completed.
So I think making it more explicit is the right thing to do.
Names don't have to be clever, but they should be clear. outputs
is about as vague as they come :-) completed_futures_generator
does a better job of telling what it is.
I wonder if maybe a new helper function called something like iter_each_result_as_completed
that uses the yield
syntax would be even clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My latest commit aims to make more explicit what's going on to implement the pattern. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice change - much more parsable imo. I've been burned enough times by (x)
vs (x,)
to treat braces with suspicion where possible :)
botocove/cove_runner.py
Outdated
with ThreadPoolExecutor(max_workers=self.thread_workers) as executor: | ||
jobs = [executor.submit(self.cove_thread, s) for s in self.sessions] | ||
outputs = (f.result() for f in as_completed(jobs)) | ||
def submit_decorated_func_with_all_sessions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this can just be inline - a type hint on the list comp has the same impact without needing a seperate func
futures: List["Future[CoveSessionInformation]"] = [executor.submit(self.cove_thread, s) for s in self.sessions]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer functions with descriptive names over inline comments, so I had replaced your inline comment with a function named to similar effect. In this case the inline function doesn't do much for the readability of the code, so I'm okay with inlining it.
The original comment is still gone because I think the new comment about the pattern does a better job of explaining the code.
Mypy knows the type without a type hint, so I'd prefer to omit the hint. Do you have another reason for including it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not particularly worried about that comment - the original that's removed isn't super useful and should probably be a docstring if anything.
My reply was intended for the inner function being better served by just resolving the list comprehension in the outer func, or in this case, didn't feel it needed a func at all since the value it was adding in my eyes was the return annotation :)
Mypy knows the type without a type hint, so I'd prefer to omit the hint. Do you have another reason for including it?
The reason I like that approach is for two reasons - annotations can be more useful for humans than just mypy inference - in this case it's mainly just an annotation saying "here's what this magic results in", and declaring the var type means that if mypy infers different it'll fail which is a useful signal for the occasional corner case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I've added the type hint for clarify. How does it look now?
botocove/cove_runner.py
Outdated
) -> FutureCoveResults: | ||
return [executor.submit(self.cove_thread, s) for s in self.sessions] | ||
|
||
def iterate_results_in_order_of_completion( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea - much more digestable. Only suggestion is not having it inside closure of run_cove_function, no functional benefit - can be outside the class itself as a generic helper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of using an inner function was to make it clearer that it's just a helper for the enclosing function. It's impossible to reference it from outside.
But I think the generic helper is actually more readable, so I'll go with that.
As a bonus, VS Code will warn about unused functions whose names start with an underscore, so it achieves the same thing as what I was trying with the inner function.
botocove/cove_types.py
Outdated
@@ -21,6 +22,7 @@ class CoveSessionInformation(TypedDict): | |||
|
|||
|
|||
CoveResults = List[CoveSessionInformation] | |||
FutureCoveResults = List["Future[CoveSessionInformation]"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not worth defining for one shot, but again - nitpick :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was modelled after CoveResults
, but I wasn't totally convinced either. I'll inline the type hint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out CoveResults is the same one-shot type alias, so I added another commit to remove it as well.
I think "list of thing" is more readable than an alias to the same. The CoveResults
alias had me thinking the type was something more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed - I think it was a bit more useful once upon a time :)
Looks good! Happy to merge when you are - wasn't sure from a previous merge strategy conversation if I'd been jumping the gun sometimes? |
I seem to have merge rights in the repo. Are you happy for me to press the button? In #36 I suggested that we replace the default squash merge message with something that "smooths out the wrinkles" of corrective commits. You can use
I think here we can discard all the messages except the first one as they are just corrective commits that don't add to the headline feature. So my squashed commit message would look like this:
|
I think you can merge because I approved it, but I did make you a collaborator so CI ran so maybe you can just merge whenever anyway :D Personally I just rebase -i and squash out the noisy commits into one that way before force pushing over my work in progress commits - "squash and merge" is a lazier version of that but I'm not worried about enforcing anything stylistically here. I think your suggested approach is similar to that but writing the prettified commit message in Github instead of local git, which is probably more user friendly? If you want to bump the patch version in pyproject.toml and update the changelog though that'd be helpful before merging, then I can just hit the release button. |
I think the CI requires a manual invocation only for a first-time contributor even if they are not a collaborator. As a collaborator, I have a lot of permissions to make changes to your repo and the supporting resources. All the collaborator access permissions are documented. Apparently I even have permissions to manage releases. Is that what you were expecting? Would you like me to handle the release or is that something you want to retain control of?
I'm happy to bump the patch version and update the changelog, but it feels dirty to mix in those changes with this PR. I like how so far the commits that prepare a release are separate from any commits for features and bug fixes. Since I have collaborator permissions, I could merge this now as-is and then push a new commit to master with the changes to pyproject.toml and the changelog. But before I make any changes I want to check that the setup is as you were expecting :-) |
The master branch is protected and requires a PR review - I pushed a codeowners file today while I remembered, but I wasn't worried about consequences of the collab permissions, both from a trust point of view and that I'd need to review :) I'm relatively sure that just giving "run CI" is the biggie, since you can could pull the pypi key out of the CI anyway? Appreciate you checking though - I definitely haven't given any of this a great deal of scrutiny past my initial thought :) I'll keep responsibility for releases for now (and add a changelog/release commit). Feel free to merge an approved PR when you're ready though. |
Done! Thanks :-) |
Use the "Submit and use as completed" pattern as described by Jason Brownlee in
"ThreadPoolExecutor in Python: The Complete Guide".
Fixes #37