[wip] New diagnosis system #534

Open
wants to merge 17 commits into
base: stretch-unstable
from

2 participants
@alexAubin
Member

alexAubin commented Sep 1, 2018

The problem

Setting up and maintaining a server is hard. Even though we have documentation on port forwarding, DNS and other topics, it is not easy to check that everything is working smoothly or at its best.

Also, some issues can go unnoticed, such as running low on disk space, not having swap, or a critical service going down for some reason. These are things you might only notice when something actually breaks (e.g. an app failing because of lack of space).

Solution

This PR proposes a new, high-level diagnosis system, meant to be run regularly to check for common issues, such as the DNS configuration not matching the expected conf, having a reasonable amount of disk space remaining, all critical services being up, and so on.

The diagnosis system should be run periodically by a cron job that alerts the admin when problems are found.

Features would include:

  • "high-level" tests/reports
  • modularity (one diagnoser per file)
  • extensibility (apps could add custom diagnosers)
  • cached values (e.g. don't rediagnose the DNS every 30 seconds; instead keep the result in cache for something like an hour or a day, and be able to force a rediagnosis if needed)
  • be able to ignore warnings/errors (sometimes it's expected that the DNS does not match the recommended conf, for instance)
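
To make the caching and ignore features above more concrete, here is a minimal sketch of how a diagnoser could be structured. This is not the code from this PR: the `Diagnoser` class, `cache_duration`, `diagnose(force=...)`, `filter_ignored` and the report item fields are all hypothetical names chosen for illustration, and the cache is kept in memory only, whereas a real implementation would presumably persist it between runs.

```python
import time


class Diagnoser:
    """Hypothetical base class: one subclass per check, one subclass per file."""

    name = "base"
    cache_duration = 3600  # seconds a report stays valid (assumed default)

    def __init__(self, clock=time.time):
        # The clock is injectable so the cache logic can be exercised in tests.
        self._clock = clock
        self._cached_report = None
        self._cached_at = None

    def run(self):
        """Perform the actual check and return an iterable of report items."""
        raise NotImplementedError

    def diagnose(self, force=False):
        """Return the cached report if still fresh, otherwise rerun the check."""
        now = self._clock()
        fresh = (
            self._cached_at is not None
            and now - self._cached_at < self.cache_duration
        )
        if force or not fresh:
            self._cached_report = list(self.run())
            self._cached_at = now
        return self._cached_report


def filter_ignored(report, ignored):
    """Drop report items the admin chose to ignore.

    `ignored` is a set of (diagnoser name, item id) pairs.
    """
    return [
        item for item in report
        if (item["diagnoser"], item["id"]) not in ignored
    ]


class ExampleDnsDiagnoser(Diagnoser):
    """Illustrative only: always reports the same made-up warning."""

    name = "dns"
    cache_duration = 24 * 3600  # keep DNS results for a day

    def run(self):
        yield {"diagnoser": self.name, "id": "record_mismatch", "status": "WARNING"}
```

With this shape, `ExampleDnsDiagnoser().diagnose()` reruns the check at most once a day unless called with `force=True`, and passing the result through `filter_ignored(report, {("dns", "record_mismatch")})` would hide a mismatch the admin considers expected.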

A more complete list of example diagnosers:

  • Internet connectivity / what the IPv4 and IPv6 addresses are
  • Ports forwarded / reachable
  • DNS configuration
  • Having some swap available
  • Having some RAM available
  • Having some disk space available (check the different partitions, e.g. /tmp and /var)
  • Check certificates are valid
  • Check mail score is reasonable
  • Check all critical services are running
  • Check the last upgrade of the system and apps happened not so long ago
  • Check the last backup of the system and apps happened not so long ago
  • Check spectre/meltdown vulnerability
  • Misc security checks
  • ???
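
As an illustration of what one of these checks could look like, here is a rough sketch of the disk space item using only the Python standard library. The function names, the status strings and the 10% / 5% thresholds are assumptions made for this example and do not come from the PR.

```python
import shutil


def classify_free_space(free_bytes, total_bytes, warn_ratio=0.10, error_ratio=0.05):
    """Map the free/total ratio to a status string (thresholds are assumed)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    ratio = free_bytes / total_bytes
    if ratio < error_ratio:
        return "ERROR"
    if ratio < warn_ratio:
        return "WARNING"
    return "SUCCESS"


def check_disk_space(paths=("/",)):
    """Report free space for each given path, one report item per path."""
    report = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        report.append({
            "path": path,
            "free_bytes": usage.free,
            "status": classify_free_space(usage.free, usage.total),
        })
    return report
```

Calling `check_disk_space(("/", "/tmp", "/var"))` would cover the partitions mentioned in the list; paths on the same filesystem simply report the same numbers.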

PR Status

Requires #535
Work in progress

How to test

...

Validation

  • Principle agreement 0/2 :
  • Quick review 0/1 :
  • Simple test 0/1 :
  • Deep review 0/1 :
[mod] misc, better error message
I'm using repr to be able to detect if it's a string or a number since it's an error I'm expecting
@Psycojoker

Member

Psycojoker commented Sep 1, 2018

We are going to have a confusing CLI API with "yunohost diagnosis" and "yunohost tools diagnosis" 😓
