Add basic monitoring #60
How will the production DAG post the status of every run?
@task(trigger_rule=TriggerRule.ALL_DONE)
def check_weaviate_status():
What are we trying to check here?
- Whether the Weaviate class exists; if it does not, the task fails.
- The number of records in the Weaviate class, which the task prints.
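The two checks listed above could be sketched roughly as follows. This is only an illustration, not the code in the PR: `summarize_weaviate_class` is a hypothetical helper, and it assumes the schema and aggregate results have already been fetched as plain dicts shaped like `{"classes": [{"class": ...}]}` and `{"data": {"Aggregate": {<class>: [{"meta": {"count": ...}}]}}}`.

```python
def summarize_weaviate_class(schema: dict, aggregate_response: dict, class_name: str) -> int:
    """Fail if the class is missing, otherwise print and return its record count.

    Hypothetical helper: the dict shapes below are assumptions about what the
    Weaviate client returned, not something confirmed by the PR.
    """
    # Collect the names of every class present in the schema.
    existing = {entry.get("class") for entry in schema.get("classes", [])}
    if class_name not in existing:
        # Raising inside an Airflow task marks that task as failed.
        raise RuntimeError(f"Weaviate class {class_name!r} does not exist")

    # Assumed shape of a meta-count aggregate result.
    count = aggregate_response["data"]["Aggregate"][class_name][0]["meta"]["count"]
    print(f"Weaviate class {class_name!r} holds {count} records")
    return count
```

Keeping the logic in a plain function like this would let it be unit tested without a running Weaviate instance; the `@task`-decorated function would only fetch the two dicts and call it.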
We should just add what we are monitoring in Weaviate as a docstring.
@pankajastro we need the following changes in the DAGs:
- Post the status to Slack on a regular basis.
- Run the DAGs every 10 minutes or so and check whether anything is down; if it is, post a status to Slack (and maybe email folks).
Something like this incident can happen at any time.
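A rough sketch of the Slack posting described above, under stated assumptions: `build_status_message` and `post_to_slack` are hypothetical names, the `statuses` mapping of service name to health flag is invented for illustration, and the message is sent to a Slack incoming webhook URL (which accepts a JSON body with a `text` field). The PR itself may post to Slack differently.

```python
import json
import urllib.request


def build_status_message(statuses: dict) -> str:
    """Turn {service_name: is_up} into a short status text (hypothetical format)."""
    header = "All services healthy" if all(statuses.values()) else "Service check failed"
    # One line per service, sorted so the message is stable between runs.
    lines = [f"{name}: {'up' if is_up else 'DOWN'}" for name, is_up in sorted(statuses.items())]
    return "\n".join([header, *lines])


def post_to_slack(webhook_url: str, text: str) -> None:
    """POST the text to a Slack incoming webhook (the URL is supplied by the caller)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

A final task in the monitoring DAG could call `post_to_slack(url, build_status_message(statuses))` on every run, so that both healthy and failing states are reported.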
As per the current state of the PR, it would post on Slack, and we can add an env variable for schedule intervals, for example
As discussed in call:
@sunank200 I have tested this; you can check the sample message in the PR description.
@jedcunningham requesting your feedback on this!
Overall LGTM. @pankajastro, I added a few comments.
closes: #39
Example posts for successful and failed service status: