
fix: UnboundLocalError wait in dispatcher #4135

Closed

Conversation

KimSoungRyoul
Contributor

What does this PR address?

Fixes #4128

This issue is so weird.

This fix won't help us figure out why wait is undefined below the line if batch_size > 1: ...
I monkey-patched BentoML internal code because of this issue.
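For context, the failure pattern behind #4128 can be sketched with a minimal, self-contained example (hypothetical names and numbers, not the actual BentoML dispatcher code):

```python
# Minimal reproduction of the UnboundLocalError pattern: `wait` is only
# bound inside the conditional, so any later use raises whenever the
# branch is skipped.
def train_optimizer(batch_size: int) -> float:
    if batch_size > 1:
        wait = batch_size * 0.01  # only assigned on this branch
    return wait  # UnboundLocalError when batch_size <= 1


try:
    train_optimizer(1)
except UnboundLocalError as exc:
    print(f"reproduced: {exc}")
```

Because Python decides at compile time that `wait` is a local of the function, the skipped branch leaves it unbound rather than falling back to any outer name.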

@KimSoungRyoul KimSoungRyoul requested a review from a team as a code owner August 20, 2023 07:05
@KimSoungRyoul KimSoungRyoul requested review from ssheng and removed request for a team August 20, 2023 07:05
@@ -217,6 +211,10 @@ async def train_optimizer(
if batch_size > 1: # only wait if batch_size
a = self.optimizer.o_a
b = self.optimizer.o_b
wait = min(
    self.max_latency * 0.95,
    (batch_size * 2 + 1) * (self.a + self.optimizer.o_b),
)
Contributor

Is self.a a typo?

@aarnphm aarnphm changed the title [#4128] UnboundLocalError in the dispatcher fix: UnboundLocalError wait in dispatcher Aug 20, 2023
Contributor

@aarnphm aarnphm left a comment

I don't think this is the correct fix for this problem.
Also, assigning wait here adds bytecode inside the while loop, and this is performance-sensitive code.
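The per-iteration cost the reviewer mentions can be illustrated with a small sketch (illustrative constants, not the dispatcher's real formula): a loop-invariant value computed inside the loop re-executes its bytecode on every iteration, whereas hoisting it runs that bytecode once.

```python
import dis


def recompute_in_loop(n: int) -> float:
    total = 0.0
    for _ in range(n):
        # loop-invariant value recomputed on every iteration
        wait = min(0.95, (4 * 2 + 1) * 0.003)
        total += wait
    return total


def hoisted(n: int) -> float:
    # same value computed once, before the loop
    wait = min(0.95, (4 * 2 + 1) * 0.003)
    total = 0.0
    for _ in range(n):
        total += wait
    return total


# Both return the same result, but disassembly shows the min() call and
# the arithmetic sitting inside the first function's loop body.
print(len(list(dis.get_instructions(recompute_in_loop))),
      len(list(dis.get_instructions(hoisted))))
```

CPython does not hoist such expressions automatically, so the placement in the source directly determines how often they execute.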

Comment on lines +214 to +217
wait = min(
self.max_latency * 0.95,
(batch_size * 2 + 1) * (a + b),
)
Contributor

I don't think moving wait here solves the issue.

wait should already be defined before this point.

Contributor

It does, since wait is only used at L219 below. But it seems to change the behavior: a, b, and self.max_latency all change during the loop, whereas wait previously did not.
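The behavioral difference can be sketched as follows (hypothetical coefficients; the formula mirrors the diff, but this is not the real dispatcher class). Moving the assignment inside the loop means wait is recomputed from the current a and b each iteration instead of staying frozen at its initial value:

```python
# Sketch of the behavior change: wait tracks a/b once it is recomputed
# inside the loop, instead of keeping its pre-loop value.
MAX_LATENCY = 1.0


def compute_wait(a: float, b: float, batch_size: int) -> float:
    return min(MAX_LATENCY * 0.95, (batch_size * 2 + 1) * (a + b))


# Before: wait computed once and fixed across iterations.
wait_frozen = compute_wait(a=0.01, b=0.02, batch_size=4)

# After: a drifts during the loop, and the recomputed wait follows it.
wait_tracking = compute_wait(a=0.05, b=0.02, batch_size=4)

print(wait_frozen, wait_tracking)  # the two values diverge once a changes
```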

Contributor Author

@KimSoungRyoul KimSoungRyoul Aug 21, 2023

I agree this fix is not a good solution.

It is a temporary pull request intended as a hotfix.

Currently we cannot use the workers_per_resource option in BentoML v1.1.0–v1.1.1.

Contributor

@sauyon does this look OK to you?

@KimSoungRyoul
Contributor Author

This hotfix feels rushed and sloppy.
Even if there is a bug right now, it might be better to analyze the exact cause and then fix it.

Thanks, I'll close this PR myself.
