Usage tracking with certain missing network causes 20s startup delay. #220

Closed
benclifford opened this Issue Apr 23, 2018 · 5 comments

@benclifford
Contributor

benclifford commented Apr 23, 2018

Usage tracking appears to happen synchronously when something like this is called:

dfk = parsl.DataFlowKernel(executors=[workers])

When a network problem causes long delays (for example, slow or timing-out name resolution in a container with an unusual network configuration), startup of the DataFlowKernel is delayed before the program continues successfully.

In my case, I'm running in a Docker container where /etc/resolv.conf points the resolver at an unresponsive IP address, and the startup delay is 20 seconds.

No log message appears to be output describing the cause of the delay.

This occurs with several versions, including 48d46ba

@yadudoc

Member

yadudoc commented Apr 23, 2018

@benclifford What should the correct behavior be?

  1. We can try to send a message asynchronously and abort if it doesn't work
  2. We can inform the user of the delay if we wait for sending to complete.
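
For illustration, option 1 could look roughly like the sketch below. It is not Parsl's code: the function name `send_usage_message` and its parameters are hypothetical. The send runs in a daemon thread, so the caller waits at most `wait` seconds and any network error is dropped silently.

```python
import socket
import threading


def send_usage_message(message, host, port, wait=0.0):
    """Send a UDP datagram from a daemon thread (hypothetical helper).

    The caller blocks for at most `wait` seconds. Errors are swallowed,
    which corresponds to "abort if it doesn't work".
    """
    def _send():
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                # If `host` is a name, it is resolved inside this thread,
                # so a slow resolver delays only the worker.
                sock.sendto(message, (host, port))
        except (OSError, UnicodeError):
            pass  # give up quietly; usage tracking is best effort

    worker = threading.Thread(target=_send, daemon=True)
    worker.start()
    worker.join(wait)  # bounded wait; returns even if the send is stuck
    return worker
```

Because the thread is a daemon, a send that is still stuck when the program finishes does not keep the interpreter alive. A thread cannot be forcibly stopped, though, which is one reason a separate process may be preferred.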
@benclifford

Contributor Author

benclifford commented Apr 23, 2018

I think usage tracking should be async and should never significantly delay workflow start, or cause any other workflow problems, when something goes wrong.

@yadudoc yadudoc self-assigned this Apr 23, 2018

@yadudoc yadudoc added the bug label Apr 23, 2018

@yadudoc yadudoc modified the milestones: Parsl-0.6.0, Parsl-0.5.1 Apr 23, 2018

yadudoc added a commit that referenced this issue May 4, 2018

yadudoc added a commit that referenced this issue May 4, 2018

@yadudoc

Member

yadudoc commented May 4, 2018

@benclifford I'm running tests on this, and when there's no network, I see the failure being caught and reported in the logs. Is it possible that the network is misconfigured? I've pushed some code that puts a time limit on the send. It is not fully async.

@benclifford

Contributor Author

benclifford commented May 4, 2018

It specifically happens when system hostname resolution hangs. You should be able to recreate that by changing a Linux machine so that /etc/resolv.conf contains only the line:

nameserver 2.2.2.2

That's a relatively common network problem that I encounter in real life as the cause of "mysterious hangs" in loads of software.
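
To check whether a machine is in this state, a small timing helper can measure how long the system resolver takes. This is a minimal sketch using only the standard library; the helper name `time_resolution` is made up for illustration and is not part of Parsl.

```python
import socket
import time


def time_resolution(hostname, port=80):
    """Return (elapsed_seconds, resolved) for one blocking lookup of `hostname`."""
    start = time.monotonic()
    try:
        # getaddrinfo uses the system resolver, so it is subject to the
        # same /etc/resolv.conf settings as the rest of the program.
        socket.getaddrinfo(hostname, port)
        resolved = True
    except (OSError, UnicodeError):
        resolved = False
    return time.monotonic() - start, resolved
```

With the resolv.conf change above, calling this for a name that is not listed in /etc/hosts should take many seconds before returning, while a healthy resolver normally answers much faster.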

yadudoc added a commit that referenced this issue May 11, 2018

yadudoc added a commit that referenced this issue May 11, 2018

Moving UDP messaging for usage_tracking to separate process. Fixed #220
Added decorator to launch fn on a new process.
Cleaner isolated udp_messenger code.
Added cleanup code triggered from dfk at cleanup
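
The approach described in this commit message can be sketched as follows. This is an illustration, not the code from the commit: the decorator name `run_in_process`, the `stop` helper, and the signature given to `udp_messenger` are assumptions, and the sketch assumes a Unix system where the "fork" start method is available.

```python
import multiprocessing
import socket
from functools import wraps


def run_in_process(fn):
    """Decorator (hypothetical name): call `fn` in a child process.

    The decorated function returns the Process handle immediately.
    """
    # Assumption: "fork" is available (Unix). It avoids pickling `fn`,
    # which would fail here because the decorator rebinds the name.
    ctx = multiprocessing.get_context("fork")

    @wraps(fn)
    def launcher(*args, **kwargs):
        proc = ctx.Process(target=fn, args=args, kwargs=kwargs, daemon=True)
        proc.start()
        return proc

    return launcher


@run_in_process
def udp_messenger(host, port, message):
    """Best-effort UDP send; runs entirely in the child process."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # Name resolution for `host` happens here, in the child, so a
            # hung resolver cannot block the parent.
            sock.sendto(message.encode("utf-8"), (host, port))
    except (OSError, UnicodeError):
        pass  # usage tracking must never break the workflow


def stop(proc, grace=1.0):
    """Cleanup step: wait briefly, then terminate a child that is still stuck."""
    proc.join(grace)
    if proc.is_alive():
        proc.terminate()
        proc.join()
```

Unlike a thread, a child process can be terminated during cleanup, so a send that is stuck in name resolution does not outlive the workflow.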

yadudoc added a commit that referenced this issue May 11, 2018

yadudoc added a commit that referenced this issue May 11, 2018

yadudoc added a commit that referenced this issue May 11, 2018

@benclifford

Contributor Author

benclifford commented May 11, 2018

I've just tried dfccb8a, and I no longer experience this delay (after first confirming that I do get the delay against master).

@yadudoc yadudoc closed this in dfccb8a May 11, 2018

yadudoc added a commit that referenced this issue May 11, 2018

Merge pull request #270 from Parsl/async_usage_track_#220
Async usage track #220 and checkpoint testing