Monitoring 2.0 #812

Merged
merged 55 commits into from Mar 14, 2019

@yadudoc yadudoc commented Mar 11, 2019

Extensive updates to the monitoring subsystem:

  • Monitoring now uses a separate MonitoringHub that launches on the Parsl side
  • Cleaner DB integration with SQLAlchemy
  • Default logging to a local SQLite database
  • ZMQ channels for priority messages to ensure critical info is not dropped
  • UDP packets for lower-priority resource monitoring information
  • Updated configs with a cleaner class-based model
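
To illustrate the lower-priority channel described above, here is a minimal sketch of fire-and-forget resource messages over UDP, using only the Python standard library. It is not the code from this PR: the function names, the JSON encoding, and the payload fields are all hypothetical, chosen only to show why a dropped datagram is tolerable for this kind of traffic.

```python
import json
import socket


def open_hub_socket(host="127.0.0.1", port=0, timeout=1.0):
    """Bind a UDP socket that plays the role of the hub's receiving end (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))   # port=0 lets the OS pick a free port
    sock.settimeout(timeout)  # never block forever waiting for a datagram that was dropped
    return sock


def send_resource_message(hub_address, hub_port, payload):
    """Send one resource message; UDP gives no delivery guarantee and no acknowledgement."""
    data = json.dumps(payload).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(data, (hub_address, hub_port))


def receive_resource_message(sock, bufsize=4096):
    """Return one decoded message, or None if nothing arrived before the timeout."""
    try:
        data, _sender = sock.recvfrom(bufsize)
    except socket.timeout:
        return None
    return json.loads(data.decode("utf-8"))
```

A worker would call `send_resource_message` once per monitoring interval, while the hub loops on `receive_resource_message`. A lost datagram costs one sample, and the next interval replaces it.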

Here's a sample config:

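# Import paths are assumed from Parsl's package layout; adjust to your installed version.
import logging

from parsl.addresses import address_by_hostname
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.monitoring.monitoring import MonitoringHub

config = Config(
    executors=[
        HighThroughputExecutor(
            label="local_htex",
            cores_per_worker=1,
            address=address_by_hostname(),
        )
    ],
    # The hub runs on the Parsl side and receives the monitoring messages.
    monitoring=MonitoringHub(
        hub_address=address_by_hostname(),
        hub_port=55055,
        logging_level=logging.INFO,
        resource_monitoring_interval=10,
    ),
    strategy=None
)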

Please note that this has not been tested with remote interchange configurations in which worker nodes cannot reach the Parsl client side.

kylechard and others added 30 commits December 13, 2018 07:09
…tion priority.

This system uses ZMQ for higher-priority information from the DFK and UDP
for resource monitoring information from workers. We estimate worker_count/poll_period
resource monitoring messages per second, a volume at which a higher rate of dropped messages is acceptable.
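
As a quick worked example of the estimate above, the helper below computes worker_count/poll_period. The function name and the sample numbers are illustrative assumptions, not values taken from this PR.

```python
def estimated_messages_per_second(worker_count, poll_period):
    """Estimate the UDP resource message rate as worker_count / poll_period.

    poll_period is in seconds: each worker is assumed to send one message per period.
    """
    if poll_period <= 0:
        raise ValueError("poll_period must be positive")
    return worker_count / poll_period


# Hypothetical deployment: 1000 workers, each reporting every 10 seconds.
rate = estimated_messages_per_second(1000, 10)
print(rate)  # 100.0
```

At that rate, losing an occasional datagram removes only a single sample from one worker's resource trace.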
2. Add a completion time column to the workflow table
3. Insert a "running" state into the status table when the first resource monitoring message is received
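
The behavior in item 3 could look roughly like the sketch below, written against the standard-library `sqlite3` module rather than the SQLAlchemy layer this PR uses. The table layout and the column names (`task_id`, `task_status_name`, `timestamp`) are hypothetical, picked only to show the "insert once, on the first message" logic.

```python
import sqlite3
import time


def create_status_table(conn):
    """Create a hypothetical status table; the real schema may differ."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS status (
               task_id INTEGER NOT NULL,
               task_status_name TEXT NOT NULL,
               timestamp REAL NOT NULL
           )"""
    )


def record_running_on_first_message(conn, task_id, timestamp=None):
    """Insert a 'running' row only if the task has none yet. Return True if a row was inserted."""
    cur = conn.execute(
        "SELECT 1 FROM status WHERE task_id = ? AND task_status_name = 'running' LIMIT 1",
        (task_id,),
    )
    if cur.fetchone() is not None:
        return False  # a later resource message: the state was already recorded
    conn.execute(
        "INSERT INTO status (task_id, task_status_name, timestamp) VALUES (?, 'running', ?)",
        (task_id, time.time() if timestamp is None else timestamp),
    )
    conn.commit()
    return True
```

Calling `record_running_on_first_message` for every incoming resource message keeps a single "running" row per task, no matter how many messages follow.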
@yadudoc yadudoc merged commit c15d6b6 into master Mar 14, 2019
@yadudoc yadudoc deleted the udp_monitoring branch March 14, 2019 18:15