Observability tools #1563

Merged
andrewm4894 merged 10 commits into LAION-AI:main from observability-tools on Feb 15, 2023

andrewm4894
Collaborator

@andrewm4894 andrewm4894 commented Feb 14, 2023

  • Add Prometheus and Grafana for custom metrics and visualization (/metrics endpoints and anything else we might want to add).
  • Add Netdata for infrastructure monitoring and alerts (Redis, Postgres, containers, and Prometheus metrics too).
  • Configure Netdata to collect Postgres, Redis, and container metrics.
  • Configure Prometheus to scrape itself, backend, and inference-server.
  • Optional env var NETDATA_CLAIM_TOKEN to claim the node to Netdata Cloud, which makes it easier to work with infra and send alerts to Discord etc. I work there, so I'm pretty sure I can get us a free sponsored space that might be useful. Not trying to sell anything here, just that it's a potentially useful overlap given I work there :)
  • Add an initial, fairly dummy FastAPI custom dashboard in docker/grafana/dashboards. The idea is that we can save dashboards as code in there (NOTE: needs much more work; anyone can add or improve dashboards in follow-on PRs, my PromQL skills are not great).
  • Add the observability tools to an observability Docker Compose profile (NOTE: not sure what the best approach is here; would need some input from others more familiar with the Docker setup).
  • Put Grafana on port 2000 instead of 3000, since the app itself is on 3000.
  • Add README.md files under each /docker folder.
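
For anyone wiring this up locally, here is a minimal sketch of how the compose profile, the Grafana port remap, and the optional claim token from the list above could fit together. The service names, images, and the Prometheus port mapping are assumptions for illustration and are not copied from the files in this PR:

```yaml
# Hypothetical docker-compose fragment; service names and images are assumptions.
services:
  prometheus:
    image: prom/prometheus
    profiles: ["observability"]   # only started when the profile is enabled
    ports:
      - "9090:9090"               # Prometheus default port

  grafana:
    image: grafana/grafana
    profiles: ["observability"]
    ports:
      - "2000:3000"               # host 2000 -> container 3000, since the app uses host port 3000

  netdata:
    image: netdata/netdata
    profiles: ["observability"]
    environment:
      # Optional: stays empty unless NETDATA_CLAIM_TOKEN is set in the shell or .env
      - NETDATA_CLAIM_TOKEN=${NETDATA_CLAIM_TOKEN:-}
```

With a layout like this, `docker compose --profile observability up` would start these three services in addition to the services that have no profile, and Grafana would be reachable on host port 2000.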

@github-actions

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

@andrewm4894 andrewm4894 marked this pull request as ready for review February 15, 2023 12:13
Collaborator

@yk yk left a comment

very cool, thank you so much. didn't know about netdata yet, interested to see how it does. eventually we'll need to get this stuff into the playbooks

@andrewm4894
Collaborator Author

> very cool, thank you so much. didn't know about netdata yet, interested to see how it does. eventually we'll need to get this stuff into the playbooks

Yep, was wondering about how to actually get it deployed as such. I see some stuff in /ansible but not 100% sure on that side of things yet in terms of how it hangs together.

@andrewm4894 andrewm4894 merged commit 34d400f into LAION-AI:main Feb 15, 2023
@andrewm4894
Collaborator Author

Merged anyway (so people can add better dashboards etc. in Grafana than mine); maybe we can add an additional PR to get this into the playbooks for deployment.

@andrewm4894 andrewm4894 deleted the observability-tools branch February 15, 2023 23:03