Operations and Monitoring
This page covers IPC services, logging, analytics, SNMP, Grafana/Prometheus, and email notification support.
The command queue (services/command_queue.py, CommandQueueRelay) provides file-based command delivery as a reliable fallback mechanism.
How It Works:
- Commands are written to a text file at temp/command_queues/{server_name}_commands.txt.
- Each line has the format timestamp:command_text.
- A dedicated polling thread reads the file every 100 milliseconds.
- When new commands are detected, they are delivered to the server process via a registered callback function.
- Processed commands are tracked by unique ID to prevent double-delivery.
- The command file is automatically truncated after 100 processed commands to prevent unbounded growth.
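The file format and de-duplication described above can be sketched as follows. This is a minimal illustration and not the CommandQueueRelay implementation: the function names are hypothetical, and using the whole line as the unique ID is an assumption.

```python
import time
from pathlib import Path


def queue_path(base_dir: Path, server_name: str) -> Path:
    """Build the per-server queue file path described above."""
    return base_dir / "temp" / "command_queues" / f"{server_name}_commands.txt"


def enqueue_command(path: Path, command: str) -> None:
    """Append one 'timestamp:command_text' line to the queue file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"{time.time()}:{command}\n")


def poll_once(path: Path, seen: set, deliver) -> int:
    """Deliver every line not seen before; return how many were delivered."""
    if not path.exists():
        return 0
    delivered = 0
    for line in path.read_text(encoding="utf-8").splitlines():
        # Split on the first colon only, so commands may contain colons.
        _timestamp, sep, command = line.partition(":")
        if not sep or line in seen:
            continue  # malformed line, or already delivered
        seen.add(line)  # assumption: the full line acts as the unique ID
        deliver(command)
        delivered += 1
    return delivered
```

A polling thread would call poll_once() in a loop with a 0.1 second sleep between iterations, passing the registered callback as deliver.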
Thread Safety: The command queue maintains a global registry of active relays, ensuring one relay per server. All file operations are wrapped in appropriate error handling for concurrent access.
The stdin relay (services/stdin_relay.py) uses Windows Named Pipes for efficient cross-process command delivery.
Pipe Creation:
- A named pipe is created at \\.\pipe\ServerManager_stdin_{server_name}.
- The pipe uses a null DACL security descriptor for broad access across different user contexts.
- A non-daemon listener thread waits for connections on the pipe.
Command Flow:
- A client (dashboard, web API, automation system) connects to the named pipe.
- The client writes the command string to the pipe.
- The relay thread reads the command and writes it to the server process's stdin.
- A JSON acknowledgment is sent back through the pipe confirming delivery.
Client Function: send_command_via_relay(server_name, command) handles the client side: connecting to the pipe, writing the command, and reading the response.
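A client for this flow might look like the sketch below. It is not the project's send_command_via_relay(): the function names are hypothetical, the 4096-byte read size is an assumption, and the fields of the JSON acknowledgment are not specified here, so the reply is returned as a parsed dictionary. Opening the pipe like a file only works on Windows.

```python
import json


def pipe_path(server_name: str) -> str:
    """Return the pipe name used by the relay for one server."""
    return rf"\\.\pipe\ServerManager_stdin_{server_name}"


def send_via_pipe(server_name: str, command: str) -> dict:
    """Connect, write the command, and read the JSON acknowledgment.

    Windows only: a named pipe client can open the pipe path as a file.
    """
    with open(pipe_path(server_name), "r+b", buffering=0) as pipe:
        pipe.write(command.encode("utf-8"))
        raw = pipe.read(4096)  # assumption: the acknowledgment fits in 4 KiB
    return json.loads(raw.decode("utf-8"))
```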
The persistent stdin pipe (services/persistent_stdin.py, PersistentStdinPipe) creates a named pipe that is used as the subprocess stdin handle at creation time.
How It Differs from stdin_relay:
- The persistent stdin pipe is created before the server process is spawned and passed as the stdin parameter to subprocess.Popen().
- This ensures the server process always has a writable stdin, even if it does not normally accept input.
- The pipe handle is created as inheritable using win32security and converted to a C file descriptor via msvcrt.open_osfhandle() for compatibility with Python's subprocess module.
The dashboard tracker (services/dashboard_tracker.py, DashboardTracker) monitors the state of dashboards and servers.
Functions:
- scan_dashboards(): Reads PID files from the temp/ directory and verifies each process is still running using psutil.pid_exists(). Returns a list of active dashboard/component processes.
- scan_servers(): Loads server configurations from the database and checks whether each server's recorded PID is still running.
- start_auto_refresh(): Starts a background daemon thread that refreshes the dashboard and server status every 10 seconds.
The dashboard tracker is used by the web server to provide real-time status information to the web interface.
Command Delivery Wrapper:
The send_command_to_server() function in Modules/core/common.py provides a unified high-level interface for sending commands to server processes. It is used by ServerAutomationManager (MOTD, warnings) and ServerUpdateManager (pre-restart warnings). The function:
- Attempts delivery via the persistent stdin pipe first.
- Falls back to the file-based command queue if the pipe is not available.
- Returns a boolean indicating success or failure.
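The pipe-first, queue-second order can be expressed as a small fallback loop. The sketch below is illustrative only: send_with_fallback and the Transport type are hypothetical names, and treating OSError as "transport unavailable" is an assumption.

```python
from typing import Callable, Iterable

# A transport takes (server_name, command) and reports whether it delivered.
Transport = Callable[[str, str], bool]


def send_with_fallback(
    server_name: str, command: str, transports: Iterable[Transport]
) -> bool:
    """Try each transport in order and stop at the first success."""
    for transport in transports:
        try:
            if transport(server_name, command):
                return True
        except OSError:
            # Assumption: an unavailable pipe or file surfaces as OSError.
            continue
    return False
```

Passing the persistent stdin pipe sender first and the command queue writer second reproduces the order listed above.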
The logging system is centralised in Modules/core/server_logging.py through the LogManager singleton class. All application modules use this system rather than configuring their own logging handlers.
Three Log Formatters:
- Default: %(asctime)s - %(name)s - %(levelname)s - %(message)s
- Detailed: %(asctime)s - %(name)s - %(levelname)s - %(filename)s:%(lineno)d - %(funcName)s() - %(message)s (includes source file, line number, and function name)
- JSON: {"timestamp": "%(asctime)s", "logger": "%(name)s", "level": "%(levelname)s", "message": "%(message)s", "module": "%(filename)s", "line": %(lineno)d} (machine-readable JSON format for log aggregation tools)
File Handler Configuration:
- Handler Type: RotatingFileHandler from Python's logging.handlers module.
- Max File Size: 10 MB per log file (configurable).
- Backup Count: 3 rotated files kept (configurable).
- Date Format: %Y-%m-%d %H:%M:%S
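For reference, a handler with these settings can be built from the standard library alone. This is a sketch of the configuration listed above, not the LogManager code; build_file_handler is a hypothetical helper name.

```python
import logging
from logging.handlers import RotatingFileHandler

DEFAULT_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_file_handler(log_path, max_bytes=10 * 1024 * 1024, backup_count=3):
    """Create a rotating handler: 10 MB per file, 3 backups, default format."""
    handler = RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter(DEFAULT_FORMAT, datefmt=DATE_FORMAT))
    return handler
```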
Log Consolidation: To reduce the number of log files, 30+ component loggers are mapped to approximately 15 shared log files. For example:
- Dashboard, DashboardFunctions, DashboardUI → Dashboard.log
- ServerManager, ServerOperations, ServerUpdates → ServerManager.log
- SteamDatabase, MinecraftDatabase, DatabaseUtils → Database.log
- NetworkManager, ClusterManager, AgentManager → Network.log
- WebServer, WebSecurity → WebServer.log
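One straightforward way to express such a mapping is a lookup table keyed by component name. The sketch below covers only the examples listed above; the fallback of one file per unmapped component is an assumption, and log_file_for is a hypothetical name.

```python
# Component logger name -> shared log file (only the examples listed above).
LOG_FILE_FOR_COMPONENT = {
    "Dashboard": "Dashboard.log",
    "DashboardFunctions": "Dashboard.log",
    "DashboardUI": "Dashboard.log",
    "ServerManager": "ServerManager.log",
    "ServerOperations": "ServerManager.log",
    "ServerUpdates": "ServerManager.log",
    "SteamDatabase": "Database.log",
    "MinecraftDatabase": "Database.log",
    "DatabaseUtils": "Database.log",
    "NetworkManager": "Network.log",
    "ClusterManager": "Network.log",
    "AgentManager": "Network.log",
    "WebServer": "WebServer.log",
    "WebSecurity": "WebServer.log",
}


def log_file_for(component: str) -> str:
    """Return the shared log file for a component.

    Assumption: unmapped components fall back to their own file.
    """
    return LOG_FILE_FOR_COMPONENT.get(component, f"{component}.log")
```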
Early Crash Logging: The early_crash_log() function provides emergency logging before the LogManager is fully initialised. It writes directly to the component log file using basic file I/O during the earliest stages of module loading.
All log files are stored under the logs/ directory:
| Directory | Contents |
|---|---|
| logs/components/ | Per-component log files (Dashboard.log, ServerManager.log, etc.) |
| logs/debug/ | Debug and diagnostic log files |
| logs/services/ | Service-layer log files (CommandQueue, StdinRelay, etc.) |
The LogManager includes automated log maintenance:
Log Compression:
- A background daemon thread periodically scans for log files older than 7 days.
- Old log files are compressed using gzip (.gz extension) to save disk space.
Log Deletion:
- Log files (both compressed and uncompressed) older than 30 days are automatically deleted.
- This prevents unbounded disk usage from log accumulation.
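A single maintenance pass with these thresholds could look like the sketch below. It is an illustration rather than the LogManager code: maintain_logs is a hypothetical name, and judging age by file modification time (and carrying that time over to the .gz file) is an assumption.

```python
import gzip
import os
import shutil
import time
from pathlib import Path

DAY = 86400  # seconds


def maintain_logs(log_dir: Path, compress_after_days=7, delete_after_days=30, now=None):
    """Compress logs older than 7 days and delete logs older than 30 days."""
    now = time.time() if now is None else now
    compressed, deleted = [], []
    # sorted() takes a snapshot, so files created below are not revisited.
    for path in sorted(log_dir.rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        age_days = (now - stat.st_mtime) / DAY
        if age_days > delete_after_days:
            path.unlink()
            deleted.append(path.name)
        elif age_days > compress_after_days and path.suffix != ".gz":
            target = path.with_name(path.name + ".gz")
            with path.open("rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the original timestamps so the 30-day rule still applies.
            os.utime(target, (stat.st_atime, stat.st_mtime))
            path.unlink()
            compressed.append(path.name)
    return compressed, deleted
```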
Log Statistics: The LogManager tracks error and warning counts since the last reset. These statistics are accessible through the analytics system and can be included in diagnostic reports.
The analytics module (Modules/ui/analytics.py, AnalyticsCollector) collects real-time metrics and provides health scoring:
Data Collection:
- Metrics are stored in memory using collections.deque with a maximum of 1440 entries (representing 24 hours of data at 1-minute intervals).
- Each metric series is kept in a collections.defaultdict of deques, and access to these structures is thread-safe.
- Collects CPU usage, memory usage, disk usage, server counts, and per-server metrics.
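The bounded in-memory storage can be sketched with a defaultdict of deques guarded by a lock. MetricStore is a hypothetical class, not AnalyticsCollector, and the explicit lock is an assumption about how thread safety is achieved.

```python
import threading
from collections import defaultdict, deque

MAX_POINTS = 1440  # 24 hours at one sample per minute


class MetricStore:
    """Keep the most recent samples per metric name."""

    def __init__(self, max_points: int = MAX_POINTS):
        self._lock = threading.Lock()
        # Each new metric name gets its own bounded deque on first use.
        self._series = defaultdict(lambda: deque(maxlen=max_points))

    def record(self, name: str, timestamp: float, value: float) -> None:
        with self._lock:
            # When the deque is full, the oldest sample is dropped.
            self._series[name].append((timestamp, value))

    def series(self, name: str) -> list:
        with self._lock:
            # .get() avoids creating an empty series on read.
            return list(self._series.get(name, ()))
```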
Health Scoring:
- The analytics system calculates a health score on a 0-100 scale.
- Factors include CPU usage, memory availability, disk space, number of error-state servers, and logging error rates.
- Health scores are categorised: 90-100 = Healthy, 70-89 = Warning, Below 70 = Critical.
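The category bands translate directly into a small function. How the score itself is weighted is not described here, so this sketch covers only the categorisation; health_category is a hypothetical name, and fractional scores between 89 and 90 are assumed to count as Warning.

```python
def health_category(score: float) -> str:
    """Map a 0-100 health score to its category."""
    if not 0 <= score <= 100:
        raise ValueError("health score must be between 0 and 100")
    if score >= 90:
        return "Healthy"
    if score >= 70:
        return "Warning"
    return "Critical"
```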
Data Export:
- get_analytics_summary(): Returns current values and 24-hour trends.
- get_time_series_data(): Returns historical time-series data for charts.
- export_to_json(): Exports all analytics data as JSON for external processing.
The SNMP manager (Modules/SMNP/snmp_manager.py, SNMPManager) provides SNMP monitoring data:
Enterprise OID Base: 1.3.6.1.4.1.12345
OID Mappings:
| OID Suffix | Metric | Description |
|---|---|---|
| .1.1 | health_score | Overall system health (0-100) |
| .1.2 | cpu_percent | Current CPU usage |
| .1.3 | memory_percent | Current memory usage |
| .1.4 | disk_percent | Current disk usage |
| .1.5 | uptime | System uptime in seconds |
| .2.1 | servers_total | Total managed servers |
| .2.2 | servers_running | Currently running servers |
| .2.3 | servers_offline | Offline/stopped servers |
| .2.4 | servers_error | Servers in error state |
| .3.1 | webserver_cpu | Web server CPU usage |
| .3.2 | webserver_memory | Web server memory usage |
| .3.3 | webserver_connections | Active web connections |
| .3.4 | dashboards_count | Active dashboards |
Methods:
- get_snmp_metrics(): Returns all SNMP metrics as a dictionary.
- get_snmp_walk_data(): Returns data formatted for SNMP walk operations.
- get_metric_by_oid(oid): Returns a single metric by its OID.
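The OID table lends itself to a simple lookup. The sketch below shows how a full OID might be resolved to a metric value and how walk output could be ordered. It is not the SNMPManager implementation: metric_by_oid and walk are hypothetical names, and the return shapes are assumptions.

```python
ENTERPRISE_OID = "1.3.6.1.4.1.12345"

OID_SUFFIX_TO_METRIC = {
    ".1.1": "health_score", ".1.2": "cpu_percent", ".1.3": "memory_percent",
    ".1.4": "disk_percent", ".1.5": "uptime",
    ".2.1": "servers_total", ".2.2": "servers_running",
    ".2.3": "servers_offline", ".2.4": "servers_error",
    ".3.1": "webserver_cpu", ".3.2": "webserver_memory",
    ".3.3": "webserver_connections", ".3.4": "dashboards_count",
}


def metric_by_oid(oid: str, metrics: dict):
    """Resolve a full OID to a metric value, or None if it is unknown."""
    oid = oid.lstrip(".")  # tolerate a leading dot
    if not oid.startswith(ENTERPRISE_OID + "."):
        return None
    name = OID_SUFFIX_TO_METRIC.get(oid[len(ENTERPRISE_OID):])
    return None if name is None else metrics.get(name)


def walk(metrics: dict) -> list:
    """Return (oid, value) pairs in numeric OID order for available metrics."""
    def numeric(suffix: str) -> tuple:
        return tuple(int(part) for part in suffix.strip(".").split("."))

    return [
        (ENTERPRISE_OID + suffix, metrics[OID_SUFFIX_TO_METRIC[suffix]])
        for suffix in sorted(OID_SUFFIX_TO_METRIC, key=numeric)
        if OID_SUFFIX_TO_METRIC[suffix] in metrics
    ]
```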
The Grafana manager (Modules/SMNP/graphana.py, GrafanaManager) provides monitoring system integration:
Prometheus Metrics Endpoint:
The web server exposes a /metrics endpoint that returns metrics in Prometheus text exposition format:
# HELP server_manager_health_score Overall system health score
# TYPE server_manager_health_score gauge
server_manager_health_score 95.0
# HELP server_manager_cpu_usage Current CPU usage percentage
# TYPE server_manager_cpu_usage gauge
server_manager_cpu_usage 23.5
# HELP server_manager_servers_total Total managed servers
# TYPE server_manager_servers_total gauge
server_manager_servers_total 5
...
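Output in this format can be produced with a few lines of string formatting. The following sketch renders gauge metrics only; render_gauges is a hypothetical helper, and the real endpoint may emit additional metrics or labels.

```python
def render_gauges(gauges: dict) -> str:
    """Render {name: (help_text, value)} as Prometheus text exposition."""
    lines = []
    for name, (help_text, value) in gauges.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The exposition format expects a trailing newline.
    return "\n".join(lines) + "\n"
```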
Grafana JSON Metrics: A JSON format endpoint provides structured metrics data with three sections: system metrics, server metrics, and application metrics.
Time-Series Data:
The get_time_series_data() method provides time-stamped metric data suitable for Grafana graph panels.
Pre-Built Dashboard:
The get_dashboard_config() method returns a complete Grafana dashboard JSON definition with three panels:
- Health Score — A stat panel showing the current health score.
- Server Status — A pie chart showing the distribution of server states (running, stopped, error).
- System Resources — A time-series graph showing CPU, memory, and disk usage over time.
This JSON can be imported directly into Grafana to create a monitoring dashboard without manual configuration.
The mail server module (Modules/SMTP/mailserver.py, MailServer) supports multiple email providers and protocols:
Provider Presets:
| Provider | SMTP Server | Port | Security |
|---|---|---|---|
| Gmail | smtp.gmail.com | 587 | STARTTLS |
| Outlook | smtp-mail.outlook.com | 587 | STARTTLS |
| Office365 | smtp.office365.com | 587 | STARTTLS |
| Yahoo | smtp.mail.yahoo.com | 587 | STARTTLS |
| Custom | User-defined | User-defined | TLS/SSL/None |
Configuration Storage: SMTP settings are stored in the Windows Registry under:
HKEY_LOCAL_MACHINE\Software\SkywereIndustries\Servermanager\MailServer
Capabilities:
- Send plain text and HTML emails.
- Attach files (MIME multipart with Base64 encoding).
- Send to multiple recipients.
- Connection testing (test_connection()) to verify SMTP settings.
- Automatic provider detection from email domain.
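These capabilities map closely onto Python's standard email and smtplib modules. The sketch below is illustrative and not the MailServer class: the function names are hypothetical, and the domain-to-provider table is an assumption about how detection might work.

```python
import smtplib
from email.message import EmailMessage

# Assumption: which sender domains map to which preset.
DOMAIN_TO_PROVIDER = {
    "gmail.com": "Gmail",
    "outlook.com": "Outlook",
    "hotmail.com": "Outlook",
    "yahoo.com": "Yahoo",
}


def detect_provider(address: str) -> str:
    """Guess the provider preset from the address domain."""
    domain = address.rsplit("@", 1)[-1].lower()
    return DOMAIN_TO_PROVIDER.get(domain, "Custom")


def build_message(sender, recipients, subject, text, html=None, attachment=None):
    """Build a message with a text body, optional HTML, optional attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(text)
    if html is not None:
        msg.add_alternative(html, subtype="html")
    if attachment is not None:
        filename, data = attachment  # data is bytes, encoded as Base64
        msg.add_attachment(
            data, maintype="application", subtype="octet-stream", filename=filename
        )
    return msg


def send_starttls(msg, host, port, username, password):
    """Send over SMTP with STARTTLS, as used by the port 587 presets."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)
```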
For organisations using Microsoft 365 with modern authentication, Server Manager supports OAuth 2.0 via MSAL (Microsoft Authentication Library):
Setup Process:
- Register an application in Azure Active Directory.
- Configure the required API permissions (Mail.Send for Microsoft Graph).
- Enter the Application (client) ID and Tenant ID in Server Manager.
- On first use, an interactive browser window opens for user consent.
- After consent, the refresh token is stored securely for silent authentication.
Token Management:
- Tokens are refreshed silently (without user interaction) using the stored refresh token.
- Tokens are refreshed 5 minutes before expiration to prevent authentication failures.
- If silent refresh fails, the interactive browser flow is triggered again.
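The refresh policy above reduces to a time comparison plus an ordered fallback. In the sketch below, the cache layout and the two callables are hypothetical stand-ins; with MSAL they would typically wrap acquire_token_silent() and acquire_token_interactive().

```python
REFRESH_MARGIN_SECONDS = 5 * 60  # refresh 5 minutes before expiry


def needs_refresh(expires_at: float, now: float) -> bool:
    """True once the token is within the refresh margin of its expiry."""
    return now >= expires_at - REFRESH_MARGIN_SECONDS


def get_token(cache: dict, now: float, refresh_silently, login_interactively) -> str:
    """Return a usable access token, refreshing it when needed.

    Both callables return a dict with 'access_token' and 'expires_at';
    refresh_silently returns None when silent refresh fails.
    """
    token = cache.get("access_token")
    if token and not needs_refresh(cache["expires_at"], now):
        return token
    # Silent refresh first; fall back to the interactive browser flow.
    result = refresh_silently() or login_interactively()
    cache.update(result)
    return cache["access_token"]
```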
Email Sending:
- OAuth-authenticated emails are sent via the Microsoft Graph API (/me/sendMail) rather than traditional SMTP.
- This bypasses the need for app passwords or enabling "less secure apps".
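For orientation, a sendMail request consists of a JSON message object sent with a bearer token. The sketch below only builds the request and does not send it; the helper names are hypothetical, and Server Manager's actual request code may differ.

```python
import json
import urllib.request

GRAPH_SEND_MAIL_URL = "https://graph.microsoft.com/v1.0/me/sendMail"


def build_send_mail_payload(subject, html_body, recipients, save_to_sent=True):
    """Build the JSON body expected by the Graph sendMail action."""
    return {
        "message": {
            "subject": subject,
            "body": {"contentType": "HTML", "content": html_body},
            "toRecipients": [
                {"emailAddress": {"address": address}} for address in recipients
            ],
        },
        "saveToSentItems": save_to_sent,
    }


def build_send_mail_request(access_token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the authenticated POST request."""
    return urllib.request.Request(
        GRAPH_SEND_MAIL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen() would perform the send.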
The notification system (Modules/SMTP/notifications.py, NotificationManager) provides templated email notifications:
Available Templates:
| Template | Trigger | Description |
|---|---|---|
| welcome | User account creation | Welcome message with login instructions |
| password_reset | Password reset request | Password reset instructions with temporary credentials |
| account_locked | Account lockout | Notification that the account has been locked due to failed login attempts |
| server_alert | Server issues | Alert about server problems (crash, high resource usage, errors) |
| maintenance | Scheduled maintenance | Advance notice of planned maintenance windows |
| custom | Manual send | Custom message from the admin panel |
Template Structure:
Each template consists of three files in the Modules/SMTP/Mail-Templates/ directory:
- {template_name}_html.html: HTML version of the email body
- {template_name}_text.txt: Plain text fallback
- {template_name}_subject.txt: Email subject line
Templates use placeholder replacement (e.g., {username}, {server_name}, {timestamp}) to personalise each notification.
Template Variables:
- {username}: Recipient's username
- {display_name}: Recipient's display name
- {server_name}: Name of the affected server
- {timestamp}: Current date and time
- {base_url}: Application base URL
- {message}: Custom message content
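Loading the three template files and substituting the variables can be sketched as follows. This is not the NotificationManager code: render_template is a hypothetical name, and plain string replacement is assumed here because it leaves braces in embedded CSS untouched, which str.format() would not.

```python
from pathlib import Path


def fill_placeholders(text: str, variables: dict) -> str:
    """Replace each {name} placeholder that has a value; leave the rest."""
    for key, value in variables.items():
        text = text.replace("{" + key + "}", str(value))
    return text


def render_template(template_dir: Path, name: str, variables: dict) -> dict:
    """Load the subject, text, and HTML files for one template and fill them."""
    files = {
        "subject": f"{name}_subject.txt",
        "text": f"{name}_text.txt",
        "html": f"{name}_html.html",
    }
    rendered = {}
    for part, filename in files.items():
        raw = (template_dir / filename).read_text(encoding="utf-8")
        rendered[part] = fill_placeholders(raw, variables)
    rendered["subject"] = rendered["subject"].strip()  # drop trailing newline
    return rendered
```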
CSS Styling: All HTML templates reference mail-template.css for consistent styling. The CSS is inlined into the HTML before sending for maximum email client compatibility.
Notification Toggles:
Each notification type can be individually enabled or disabled through the admin dashboard. There is also an admin_only_alerts option that restricts server alerts and maintenance notifications to admin users only.