The infrastructure checker is a proactive and configurable command line tool that enables the user to diagnose potential problems within the infrastructure.
Installation is straightforward: download the binary from the GitHub releases page, copy it to the node you want to test, download the configuration file template, and configure the necessary checks.
On Linux:
Make the binary executable by running the following command:
chmod +x infra-checker_linux_amd64
Make sure your user has access to the orchestrator CLI (kubectl, docker, podman, etc.).
If your user doesn't have access to the orchestrator, run the tool as the root user.
After downloading the configuration template from GitHub, place it next to the binary: the application looks for its configuration file relative to its own path.
Supported checks as of version v1.5.0:
Name | Description |
---|---|
container_exists_check | Used to check if a given container exists on the system |
container_running_check | Used to check if a given container's status is running |
http_response_check | Used to check if a given HTTP host responds with an expected HTTP status code |
mqtt_connection_check | Used to check if the node is able to connect to an MQTT broker |
redis_connection_check | Used to check if the node is able to connect to a Redis server |
database_connection_check | Used to check if the node is able to connect to an RDBMS |
file_exists_check | Used to check if a given file exists on a storage system (local file system or MinIO) |
execution_engine_heartbeat_check | Used to check if a given Execution Engine instance is alive (sends heartbeat messages) |
terminus_check | Used to check the status of a NestJS application via the Terminus module |
local_disk_space_check | Used to check if local disk usage is below a given threshold |
local_memory_space_check | Used to check if memory usage is below a given threshold |
Anatomy of the configuration file
{
"engine": "podman",
"redis": {
"host": "localhost",
"password": "redis_password",
"port": 6379
},
"database": {
"engine": "postgresql",
"host": "localhost",
"port": 5432
},
"mqtt": {
"host": "localhost",
"port": 1883,
"username": "mqtt_user",
"password": "mqtt_password"
},
"minio": {
"host": "localhost",
"accessKey": "",
"secretKey": "",
"port": 9001,
"useSSL": false
},
"checks": [
{
"label": "Dummy check",
"type": "dummy_check"
}
]
}
“engine”: the orchestration engine used on the node (“podman” in the template above). At the moment the tool supports the Docker, Podman and Kubernetes (k8s) orchestration engines.
“redis”: the parameters needed to connect to a Redis daemon.
Since the tool usually runs on the same node as Redis, the “host” parameter is “localhost”; change this value if Redis runs on a different node.
The “password” field is optional; if it is empty, authentication is skipped.
“database”: the parameters needed to connect to a database.
The “engine” parameter can be either “postgresql” or “mysql”; the infrastructure currently uses only “postgresql”.
The “host” parameter is usually “localhost”; change it to point to a different RDBMS. The database name, username and password are set per check (see database_connection_check).
“mqtt”: the parameters needed to connect to an MQTT broker.
“minio”: the parameters needed to connect to a MinIO instance.
“checks”: the list of checks the tool runs, in the order they are listed.
Locate the “checks” list in the config.json file. By default, the configuration file comes with a “dummy check” that does nothing; you can remove it and start adding your own checks. Separate the checks with a comma “,”.
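Editing the list by hand is the usual way to add checks. As an alternative, the Python sketch below loads config.json, removes the template's dummy check and appends a new one; serializing with the json module writes the separating commas for you. The helper name add_check is hypothetical and not part of the tool.

```python
import json
from pathlib import Path


def add_check(config_path: Path, check: dict) -> dict:
    """Append a check to "checks", dropping the template's placeholder check."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Keep every existing check except the "dummy_check" shipped with the template.
    checks = [c for c in config.get("checks", []) if c.get("type") != "dummy_check"]
    checks.append(check)
    config["checks"] = checks
    # json.dumps places the commas between list items, so none can be forgotten.
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, add_check(Path("config.json"), {"type": "redis_connection_check", "label": "Redis connection check"}) replaces the dummy check with a Redis check. Rewriting the file this way also normalizes its indentation.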
Check if a container exists on the system
{
"type": "container_exists_check",
"label": "Container exists (haproxy)",
"container_name": "haproxy"
}
Understanding the above check:
- “type”: the type of check the tool performs on the system
- “label”: an arbitrary string defined by the user to differentiate this check from others
- “container_name”: the container name as it appears in the output of commands such as “docker ps” or “podman ps”
- If you use the Kubernetes orchestration engine, you must also add the “namespace” key
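The exact commands the tool runs are not documented here. As a rough sketch, assuming the check behaves like “docker container inspect NAME” or “podman container inspect NAME” (both exit with a non-zero status when the container is unknown), an equivalent Python check could look like this. Kubernetes and the “namespace” key are left out, and the function names are hypothetical.

```python
import subprocess

# Engines handled by this sketch; Kubernetes (with its "namespace" key) is left out.
CLI_ENGINES = {"docker", "podman"}


def inspect_command(engine: str, container_name: str) -> list[str]:
    """Build the CLI call whose exit status tells whether the container exists."""
    if engine not in CLI_ENGINES:
        raise ValueError(f"engine not covered by this sketch: {engine}")
    return [engine, "container", "inspect", container_name]


def container_exists(engine: str, container_name: str) -> tuple[bool, str]:
    try:
        result = subprocess.run(
            inspect_command(engine, container_name),
            capture_output=True,
            text=True,
            timeout=30,
        )
    except FileNotFoundError:
        # The engine's CLI is not installed: the 'executable file not found in $PATH' case.
        return False, f'"{engine}" executable not found in $PATH'
    except subprocess.TimeoutExpired:
        return False, "engine CLI timed out"
    # A non-zero exit status means the container is unknown to the engine.
    return result.returncode == 0, result.stderr.strip()
```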
Check if a container status is “running”
{
"type": "container_running_check",
"label": "Container running (broker)",
"container_name": "broker"
}
Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- “container_name”: the container name as it appears in the output of commands such as “docker ps” or “podman ps”, same as in the previous check
- If you use the Kubernetes orchestration engine, you must also add the “namespace” key
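One way to read a container's state from the command line is the Go template “{{.State.Running}}”, which both “docker container inspect” and “podman container inspect” accept through “--format” and which prints “true” or “false”. The Python sketch below assumes that approach; the tool may query the engine differently, and the function names are hypothetical.

```python
import subprocess


def running_command(engine: str, container_name: str) -> list[str]:
    """Ask docker/podman to print only the State.Running field of the container."""
    return [engine, "container", "inspect", "--format", "{{.State.Running}}", container_name]


def parse_running(output: str) -> bool:
    # The template prints "true" or "false" followed by a newline.
    return output.strip().lower() == "true"


def container_running(engine: str, container_name: str) -> tuple[bool, str]:
    try:
        result = subprocess.run(
            running_command(engine, container_name),
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as err:
        return False, str(err)
    if result.returncode != 0:
        # Unknown container or engine failure: report the CLI's own message.
        return False, result.stderr.strip()
    running = parse_running(result.stdout)
    return running, "running" if running else "not running"
```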
Check if an HTTP daemon responds with a given status code
{
"type": "http_response_check",
"label": "Equipment health check",
"url": "https://google.com",
"code": 200
}
Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- “url”: the URL (Uniform Resource Locator, i.e. the address) to request
- “code”: the expected HTTP status code of the response (the HTTP specification lists all response status codes)
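A minimal Python sketch of the same idea, using only the standard library (urllib). It treats every response, including 4xx and 5xx ones, as a status code to compare against “code”; note that urllib follows redirects, so a 301 is reported as the status of the final page. This is an illustration, not the tool's implementation.

```python
import urllib.error
import urllib.request


def http_response_check(url: str, expected_code: int, timeout: float = 10.0) -> tuple[bool, str]:
    """Request url and compare the final HTTP status code with expected_code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is still what we compare.
        status = err.code
    except urllib.error.URLError as err:
        # DNS failures ("no such host") and refused connections end up here.
        return False, f"request failed: {err.reason}"
    if status != expected_code:
        return False, f"expected status {expected_code}, got {status}"
    return True, f"got expected status {status}"
```

For example, http_response_check("https://google.com", 200) corresponds to the configuration above.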
Check if a node can connect to an MQTT broker
{
"type": "mqtt_connection_check",
"label": "MQTT connection check"
}
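Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- the broker connection details (host, port, username, password) are taken from the “mqtt” key of the configuration file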
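To see what “able to connect” means at the protocol level, here is a hypothetical Python sketch that sends an MQTT 3.1.1 CONNECT packet and reads the broker's 4-byte CONNACK reply. The client id “infra-checker-sketch” and the function names are made up for the example, and the tool itself may use a different MQTT library or protocol version.

```python
import socket
import struct

# CONNACK return codes defined by MQTT 3.1.1.
CONNACK_CODES = {
    0: "connection accepted",
    1: "unacceptable protocol version",
    2: "identifier rejected",
    3: "server unavailable",
    4: "bad user name or password",
    5: "not authorized",
}


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit means 'more bytes follow'."""
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n:
            byte |= 0x80
        out.append(byte)
        if not n:
            return bytes(out)


def mqtt_string(value: str) -> bytes:
    """Length-prefixed UTF-8 string (2-byte big-endian length)."""
    data = value.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def build_connect_packet(client_id: str, username: str = "", password: str = "", keep_alive: int = 60) -> bytes:
    flags = 0x02  # clean session
    payload = mqtt_string(client_id)
    if username:
        flags |= 0x80
        payload += mqtt_string(username)
        if password:  # MQTT 3.1.1 only allows a password together with a user name
            flags |= 0x40
            payload += mqtt_string(password)
    variable_header = mqtt_string("MQTT") + bytes([0x04, flags]) + struct.pack("!H", keep_alive)
    body = variable_header + payload
    return b"\x10" + encode_remaining_length(len(body)) + body


def parse_connack(reply: bytes) -> tuple[bool, str]:
    if len(reply) != 4 or reply[0] != 0x20 or reply[1] != 0x02:
        return False, f"unexpected reply: {reply!r}"
    code = reply[3]
    return code == 0, CONNACK_CODES.get(code, f"unknown return code {code}")


def mqtt_connection_check(host: str, port: int, username: str = "", password: str = "", timeout: float = 5.0) -> tuple[bool, str]:
    packet = build_connect_packet("infra-checker-sketch", username, password)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(packet)
            reply = b""
            while len(reply) < 4:  # CONNACK is exactly 4 bytes
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError as err:  # e.g. "connection refused" or a timeout
        return False, f"network error: {err}"
    return parse_connack(reply)
```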
Check if a node can connect to a Redis daemon
{
"type": "redis_connection_check",
"label": "Redis connection check"
}
Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- the connection details (host, port, password) are taken from the “redis” key of the configuration file
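A hypothetical Python sketch of an equivalent Redis check speaks the RESP protocol directly: it sends AUTH only when a password is configured and then expects PING to return PONG. This mirrors the “password” behaviour described for the “redis” key; it is not the tool's code.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def read_reply(reader) -> tuple[bool, str]:
    line = reader.readline()
    if not line:
        return False, "connection closed by server"
    text = line.decode("utf-8", "replace").rstrip("\r\n")
    if text.startswith("+"):
        return True, text[1:]  # simple string, e.g. "+OK" or "+PONG"
    if text.startswith("-"):
        return False, text[1:]  # error reply, e.g. "-ERR AUTH called without ..."
    return False, f"unexpected reply: {text}"


def redis_handshake(sock: socket.socket, password: str = "") -> tuple[bool, str]:
    reader = sock.makefile("rb")
    try:
        if password:  # an empty password skips AUTH entirely
            sock.sendall(encode_command("AUTH", password))
            ok, message = read_reply(reader)
            if not ok:
                return False, message
        sock.sendall(encode_command("PING"))
        ok, message = read_reply(reader)
        return ok and message == "PONG", message
    finally:
        reader.close()


def redis_connection_check(host: str, port: int, password: str = "", timeout: float = 5.0) -> tuple[bool, str]:
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return redis_handshake(sock, password)
    except OSError as err:  # refused connection, timeout, unresolvable host
        return False, f"network error: {err}"
```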
Check if a node can connect to an RDBMS
{
"type": "database_connection_check",
"label": "Equipment database connection check",
"database": "equipment",
"username": "equipment",
"password": "password_here"
}
Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- “database”: the name of the database to connect to; this database must already exist
- “username”: the database username
- “password”: the database password
Note: an RDBMS can host multiple databases, each with its own username and password.
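The check combines the top-level “database” key (engine, host, port) with the per-check credentials. The sketch below assumes these are assembled into a PostgreSQL connection URI, the form accepted by libpq clients such as psql; the function name postgres_uri is hypothetical and MySQL is not covered.

```python
from urllib.parse import quote


def postgres_uri(database_cfg: dict, check: dict) -> str:
    """Combine the top-level "database" block with a check's credentials."""
    if database_cfg.get("engine") != "postgresql":
        raise ValueError("this sketch only builds PostgreSQL URIs")
    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    user = quote(check["username"], safe="")
    password = quote(check["password"], safe="")
    dbname = quote(check["database"], safe="")
    return f"postgresql://{user}:{password}@{database_cfg['host']}:{database_cfg['port']}/{dbname}"
```

Before adding the check, you can try the resulting URI by hand with psql "URI" -c "SELECT 1" to confirm that the database exists and the credentials are accepted.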
Check if an object exists on a MinIO instance
{
"type": "file_exists_check",
"label": "Minio object exists",
"engine": "minio",
"path": "test/test.txt"
}
Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- “engine”: the storage engine (also referred to as file system); in this case the value must be “minio”
- “path”: the path to the object, in the form bucket_name/object
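Since “path” packs the bucket and the object key into one string, the first path segment is the bucket and everything after it is the object key. A small Python sketch of that split; the helper name is hypothetical, and ignoring leading slashes is an assumption of the sketch.

```python
def split_minio_path(path: str) -> tuple[str, str]:
    """Split "bucket_name/object" into (bucket, object key)."""
    bucket, separator, key = path.lstrip("/").partition("/")
    if not separator or not bucket or not key:
        raise ValueError(f'expected "bucket_name/object", got {path!r}')
    return bucket, key
```

With the configuration above, “test/test.txt” refers to the object “test.txt” in the bucket “test”; an object stored under a prefix, such as bucket_name/dir/object, keeps “dir/object” as its key.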
Check if a file exists on the local file system
{
"type": "file_exists_check",
"label": "Local file exists",
"engine": "local",
"path": "test.txt"
}
Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- “engine”: the storage engine (also referred to as file system); in this case the value must be “local”
- “path”: the path to a file on the file system; it can be relative to the tool or absolute, e.g. “test.txt” is relative, while “/home/user/test.txt” is absolute
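A Python sketch of how such a path could be resolved, assuming “relative to the tool” means relative to the directory containing the binary (for a running program, typically Path(sys.argv[0]).resolve().parent). The helper names are hypothetical.

```python
from pathlib import Path


def resolve_check_path(path: str, tool_dir: Path) -> Path:
    """Absolute paths are used as-is; relative ones are joined to the tool's directory."""
    candidate = Path(path)
    return candidate if candidate.is_absolute() else tool_dir / candidate


def local_file_exists(path: str, tool_dir: Path) -> bool:
    # is_file() is False for missing paths and for directories.
    return resolve_check_path(path, tool_dir).is_file()
```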
Check if a given Execution Engine instance exists
{
"type": "execution_engine_heartbeat_check",
"label": "Execution engine heartbeat check",
"box_name": "edge-dev"
}
Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- “box_name”: a string representing the name or host id of an Execution Engine instance, e.g. gts-staging or gts-ws
Check if the Terminus module reports an error
{
"type": "terminus_check",
"label": "Terminus check",
"url": "http://localhost:3000/health"
}
Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- “url”: the URL of the health check endpoint exposed by the Terminus module of the NestJS application
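Terminus reports its result as JSON with a top-level “status” (“ok” when every indicator is up) and an “error” object listing the failing indicators, and it answers with HTTP 503 when something is down. A hypothetical Python sketch that reads that report; how the tool itself interprets the response is not documented here.

```python
import json
import urllib.error
import urllib.request


def parse_terminus(body: str) -> tuple[bool, str]:
    """Interpret a Terminus body: {"status": ..., "info": ..., "error": ..., "details": ...}."""
    report = json.loads(body)
    if report.get("status") == "ok":
        return True, "all indicators up"
    failing = sorted((report.get("error") or {}).keys())
    return False, f"status {report.get('status')!r}, failing: {', '.join(failing) or 'unknown'}"


def terminus_check(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Terminus answers 503 when an indicator is down; the JSON body is still useful.
        body = err.read().decode("utf-8")
    except urllib.error.URLError as err:
        return False, f"request failed: {err.reason}"
    try:
        return parse_terminus(body)
    except json.JSONDecodeError:
        return False, "response is not JSON"
```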
Check if there is enough free space on the disk
{
"type": "local_disk_space_check",
"label": "Local disk space check",
"path": "/",
"threshold": 80.0
}
Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- “path”: the disk partition to check; usually this is the root partition “/”
- “threshold”: the maximum accepted disk usage, as a percentage; in the above example that is 80%
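A Python sketch of the same threshold test, using shutil.disk_usage from the standard library. The tool's exact formula is not documented; this sketch computes used / total, which can differ slightly from the Use% column of df because df leaves out blocks reserved for root.

```python
import shutil


def used_percent(total: int, used: int) -> float:
    return used / total * 100.0


def local_disk_space_check(path: str = "/", threshold: float = 80.0) -> tuple[bool, str]:
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    percent = used_percent(usage.total, usage.used)
    # The check passes while occupied space stays at or below the threshold.
    return percent <= threshold, f"{percent:.1f}% of {path} used (threshold {threshold}%)"
```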
Check if there is enough free memory
{
"type": "local_memory_space_check",
"label": "Local memory space check",
"threshold": 80.0
}
Understanding the above check:
- “type”: the type of check the tool performs
- “label”: an arbitrary string defined by the user to differentiate this check from others
- “threshold”: the maximum accepted memory usage, as a percentage; in the above example that is 80%
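On Linux, a comparable figure can be derived from /proc/meminfo. The Python sketch below counts memory as occupied when it is not reported as MemAvailable; whether the tool uses the same definition is an assumption.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Map /proc/meminfo field names to their numeric values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo: dict[str, int]) -> float:
    # MemAvailable estimates memory usable without swapping (Linux 3.14 and later).
    return (meminfo["MemTotal"] - meminfo["MemAvailable"]) / meminfo["MemTotal"] * 100.0


def local_memory_space_check(threshold: float = 80.0) -> tuple[bool, str]:
    meminfo = parse_meminfo(Path("/proc/meminfo").read_text())
    percent = memory_used_percent(meminfo)
    return percent <= threshold, f"{percent:.1f}% of memory used (threshold {threshold}%)"
```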
The following example shows a configuration file with two checks, separated by a comma:
{
"engine": "podman",
"redis": {
"host": "localhost",
"password": "redis_password",
"port": 6379
},
"database": {
"engine": "postgresql",
"host": "localhost",
"port": 5432
},
"mqtt": {
"host": "localhost",
"port": 1883,
"username": "mqtt_user",
"password": "mqtt_password"
},
"minio": {
"host": "localhost",
"accessKey": "",
"secretKey": "",
"port": 9001,
"useSSL": false
},
"checks": [
{
"label": "Dummy check",
"type": "dummy_check"
},
{
"label": "Dummy check",
"type": "dummy_check"
}
]
}
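Because a missing or extra comma between checks is an easy mistake, it can help to validate the file before running the tool. The Python sketch below parses the configuration, reports JSON syntax errors with their position, and flags checks with a missing “type” or “label” or an unknown type (taken from the supported checks table). It is a hypothetical helper, not part of the tool.

```python
import json

# Check types from the supported checks table, plus the template's placeholder.
KNOWN_TYPES = {
    "container_exists_check", "container_running_check", "http_response_check",
    "mqtt_connection_check", "redis_connection_check", "database_connection_check",
    "file_exists_check", "execution_engine_heartbeat_check", "terminus_check",
    "local_disk_space_check", "local_memory_space_check", "dummy_check",
}


def validate_checks(config_text: str) -> list[str]:
    """Return a list of problems found in the configuration (empty means none)."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as err:
        # A missing comma between two checks is reported here.
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    problems = []
    for index, check in enumerate(config.get("checks", []), start=1):
        for key in ("type", "label"):
            if key not in check:
                problems.append(f"check #{index}: missing {key!r}")
        if "type" in check and check["type"] not in KNOWN_TYPES:
            problems.append(f"check #{index}: unknown type {check['type']!r}")
    return problems
```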
ERROR Container exists (haproxy) failed with error: exec: "podman": executable file not found in $PATH
This error means that the podman executable was not found in $PATH. Check that Podman is installed on the node, or that the “engine” key matches the orchestrator actually installed.
ERROR HTTP smoke test failed with error: Get "http://cucu.tech": dial tcp: lookup cucu.tech: no such host
This error means that the host name “cucu.tech” could not be resolved (the domain does not exist or DNS is unavailable), so the URL “http://cucu.tech” cannot be reached.
If the URL is expected to return a 403 Forbidden status code but returns 200 OK instead, the check fails: the request itself succeeded, but the status code does not match the expected “code”.
ERROR MQTT connection check failed with error: network Error : dial tcp [::1]:1883: connect: connection refused
This error means that the MQTT broker refused the connection because it either does not exist or is not running on the configured host and port.
ERROR Redis connection check failed with error: ERR AUTH called without any password configured for the default user. Are you sure your configuration is correct?
This error means that the Redis daemon is not configured to require a password. Possible solution: leave the “password” field of the “redis” key empty so that authentication is skipped.
A Redis connection refused error means that the Redis daemon refused the connection because it either does not exist or is not running.
A database connection error means that the database daemon refused the connection.
Possible solutions:
- check the database connection details
- check whether the database server is down
A MinIO connection error means that the MinIO daemon refused the connection.
Possible solutions:
- check the MinIO connection details
A failed file check means that the file doesn't exist on the file system.
Possible solutions:
- check that the file exists and that the “path” parameter points to it (relative paths are resolved relative to the tool)
ERROR Execution engine heartbeat check failed with error: network Error : dial tcp [::1]:1883: connect: connection refused
This error means that the MQTT broker refused the connection; the heartbeat check relies on the MQTT connection.
Possible solutions:
- check that the MQTT broker is running
- check the MQTT connection details
If the connection succeeds but the check still fails, the Execution Engine instance didn't send a heartbeat message.
Possible solutions:
- check the “box_name” parameter
- check that the physical machine is up and running