Operations and Monitoring

Sparks Skywere edited this page May 14, 2026 · 1 revision

This page covers IPC services, logging, analytics, SNMP, Grafana/Prometheus, and email notification support.

16. Services and Inter-Process Communication

16.1 Command Queue

The command queue (services/command_queue.py, CommandQueueRelay) provides file-based command delivery as a reliable fallback mechanism.

How It Works:

  • Commands are written to a text file at temp/command_queues/{server_name}_commands.txt.
  • Each line has the format: timestamp:command_text.
  • A dedicated polling thread reads the file every 100 milliseconds.
  • When new commands are detected, they are delivered to the server process via a registered callback function.
  • Processed commands are tracked by unique ID to prevent double-delivery.
  • The command file is automatically truncated after 100 processed commands to prevent unbounded growth.
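The polling behaviour described above can be sketched as follows. This is a minimal illustration, not the code in services/command_queue.py: the class name FileCommandQueue, the use of the whole line as the unique ID, and the assumption that the timestamp is a numeric epoch value (so it contains no colon) are all hypothetical.

```python
import threading


class FileCommandQueue:
    """Poll a file of `timestamp:command_text` lines and deliver new ones."""

    def __init__(self, path, callback, poll_interval=0.1, truncate_after=100):
        self.path = path
        self.callback = callback            # called with each command string
        self.poll_interval = poll_interval  # 100 ms, as described above
        self.truncate_after = truncate_after
        self._seen = set()                  # IDs of lines already delivered
        self._processed = 0
        self._stop = threading.Event()

    def poll_once(self):
        """Read the file once; deliver and count lines not seen before."""
        try:
            with open(self.path, "r", encoding="utf-8") as fh:
                lines = [ln.rstrip("\n") for ln in fh if ln.strip()]
        except FileNotFoundError:
            return 0
        delivered = 0
        for line in lines:
            _timestamp, sep, command = line.partition(":")
            if not sep or line in self._seen:
                continue  # malformed line, or already delivered
            self._seen.add(line)  # assumption: the full line is the unique ID
            self.callback(command)
            delivered += 1
        self._processed += delivered
        if self._processed >= self.truncate_after:
            # Truncate to stop unbounded growth. A writer appending at this
            # exact moment could lose a command; a real relay needs locking.
            open(self.path, "w").close()
            self._seen.clear()
            self._processed = 0
        return delivered

    def run(self):
        """Body of the polling thread."""
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self.poll_interval)

    def stop(self):
        self._stop.set()
```

A caller would typically start `run` in a `threading.Thread` and call `stop()` on shutdown.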

Thread Safety: The command queue maintains a global registry of active relays, ensuring one relay per server. All file operations are wrapped in appropriate error handling for concurrent access.

16.2 Stdin Relay (Named Pipes)

The stdin relay (services/stdin_relay.py) uses Windows Named Pipes for efficient cross-process command delivery.

Pipe Creation:

  • A named pipe is created at \\.\pipe\ServerManager_stdin_{server_name}.
  • The pipe uses a null DACL security descriptor for broad access across different user contexts.
  • A non-daemon listener thread waits for connections on the pipe.

Command Flow:

  1. A client (dashboard, web API, automation system) connects to the named pipe.
  2. The client writes the command string to the pipe.
  3. The relay thread reads the command and writes it to the server process's stdin.
  4. A JSON acknowledgment is sent back through the pipe confirming delivery.

Client Function: send_command_via_relay(server_name, command) handles the client side: connecting to the pipe, writing the command, and reading the response.
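As a rough sketch of the client side, the steps above might look like the following. The function names here are hypothetical, and the acknowledgment field `success` is an assumption, since the page does not specify the JSON schema. The sketch opens the pipe with Python's built-in `open()`, which works for client connections on Windows; the real send_command_via_relay may use pywin32 instead.

```python
import json
import sys

PIPE_TEMPLATE = r"\\.\pipe\ServerManager_stdin_{server_name}"


def pipe_path(server_name):
    """Build the pipe name documented above for a given server."""
    return PIPE_TEMPLATE.format(server_name=server_name)


def parse_ack(raw):
    """Decode the JSON acknowledgment; the 'success' key is an assumption."""
    try:
        ack = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and bad JSON
        return False
    return isinstance(ack, dict) and bool(ack.get("success"))


def send_command(server_name, command):
    """Connect, write the command, and read the acknowledgment."""
    if sys.platform != "win32":
        raise OSError("this pipe namespace exists only on Windows")
    with open(pipe_path(server_name), "r+b", buffering=0) as pipe:
        pipe.write(command.encode("utf-8"))
        return parse_ack(pipe.read(4096))
```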

16.3 Persistent Stdin Pipe

The persistent stdin pipe (services/persistent_stdin.py, PersistentStdinPipe) creates a named pipe that is used as the subprocess stdin handle at creation time.

How It Differs from stdin_relay:

  • The persistent stdin pipe is created before the server process is spawned and passed as the stdin parameter to subprocess.Popen().
  • This ensures the server process always has a writable stdin, even if it does not normally accept input.
  • The pipe handle is created as inheritable using win32security and converted to a C file descriptor via msvcrt.open_osfhandle() for compatibility with Python's subprocess module.

16.4 Dashboard Tracker

The dashboard tracker (services/dashboard_tracker.py, DashboardTracker) monitors the state of dashboards and servers.

Functions:

  • scan_dashboards() — Reads PID files from the temp/ directory and verifies each process is still running using psutil.pid_exists(). Returns a list of active dashboard/component processes.
  • scan_servers() — Loads server configurations from the database and checks whether each server's recorded PID is still running.
  • start_auto_refresh() — Starts a background daemon thread that refreshes the dashboard and server status every 10 seconds.
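The PID-file scan can be illustrated with a short sketch. The `*.pid` naming pattern and the one-integer-per-file layout are assumptions; only the use of psutil.pid_exists() comes from the description above. The liveness check is injectable so the function can be exercised without psutil installed.

```python
from pathlib import Path


def _default_pid_exists(pid):
    # psutil is third-party; imported lazily so the module loads without it.
    import psutil
    return psutil.pid_exists(pid)


def scan_pid_files(temp_dir, pid_exists=_default_pid_exists, pattern="*.pid"):
    """Return {component_name: pid} for PID files whose process is alive."""
    active = {}
    for pid_file in sorted(Path(temp_dir).glob(pattern)):
        try:
            pid = int(pid_file.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed PID file
        if pid_exists(pid):
            active[pid_file.stem] = pid
    return active
```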

The dashboard tracker is used by the web server to provide real-time status information to the web interface.

Command Delivery Wrapper: The send_command_to_server() function in Modules/core/common.py provides a unified high-level interface for sending commands to server processes. It is used by ServerAutomationManager (MOTD, warnings) and ServerUpdateManager (pre-restart warnings). The function:

  1. Attempts delivery via the persistent stdin pipe first.
  2. Falls back to the file-based command queue if the pipe is not available.
  3. Returns a boolean indicating success or failure.
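The fallback order can be expressed compactly. The sketch below is not the body of send_command_to_server(); it takes the two transports as hypothetical callables so the ordering logic stands on its own.

```python
def send_with_fallback(server_name, command, pipe_sender, queue_sender):
    """Try the pipe first, then the file queue; return True on first success.

    Both senders are hypothetical callables taking (server_name, command)
    and returning a truthy value when delivery succeeded.
    """
    for sender in (pipe_sender, queue_sender):
        try:
            if sender(server_name, command):
                return True
        except OSError:
            continue  # transport unavailable; try the next one
    return False
```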

17. Logging System

17.1 Log Manager

The logging system is centralised in Modules/core/server_logging.py through the LogManager singleton class. All application modules use this system rather than configuring their own logging handlers.

Three Log Formatters:

  1. Default: %(asctime)s - %(name)s - %(levelname)s - %(message)s
  2. Detailed: %(asctime)s - %(name)s - %(levelname)s - %(filename)s:%(lineno)d - %(funcName)s() - %(message)s — Includes source file, line number, and function name.
  3. JSON: {"timestamp": "%(asctime)s", "logger": "%(name)s", "level": "%(levelname)s", "message": "%(message)s", "module": "%(filename)s", "line": %(lineno)d} — Machine-readable JSON format for log aggregation tools.
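The first two formats map directly onto logging.Formatter. One caveat with a literal JSON format string is that a message containing a double quote or backslash produces invalid JSON. The sketch below shows a swapped-in alternative that builds the same fields with json.dumps; JsonFormatter is a hypothetical name, not the class used in server_logging.py.

```python
import json
import logging

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
DEFAULT_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DETAILED_FORMAT = (
    "%(asctime)s - %(name)s - %(levelname)s - "
    "%(filename)s:%(lineno)d - %(funcName)s() - %(message)s"
)


class JsonFormatter(logging.Formatter):
    """Emit the documented JSON fields, with proper escaping."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, self.datefmt),
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.filename,
            "line": record.lineno,
        }
        return json.dumps(payload)


default_formatter = logging.Formatter(DEFAULT_FORMAT, DATE_FORMAT)
detailed_formatter = logging.Formatter(DETAILED_FORMAT, DATE_FORMAT)
json_formatter = JsonFormatter(datefmt=DATE_FORMAT)
```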

File Handler Configuration:

  • Handler Type: RotatingFileHandler from Python's logging.handlers module.
  • Max File Size: 10 MB per log file (configurable).
  • Backup Count: 3 rotated files kept (configurable).
  • Date Format: %Y-%m-%d %H:%M:%S
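A handler with these settings can be built with the standard library alone. The helper name build_file_handler is hypothetical; the defaults mirror the values listed above.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_file_handler(path, max_bytes=10 * 1024 * 1024, backup_count=3):
    """Create a rotating handler: 10 MB per file, 3 backups by default."""
    handler = RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
            "%Y-%m-%d %H:%M:%S",
        )
    )
    return handler
```

With backupCount=3, a full Dashboard.log is renamed to Dashboard.log.1, older backups shift up to .2 and .3, and anything beyond that is discarded.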

Log Consolidation: To reduce the number of log files, 30+ component loggers are mapped to approximately 15 shared log files. For example:

  • Dashboard, DashboardFunctions, DashboardUI → Dashboard.log
  • ServerManager, ServerOperations, ServerUpdates → ServerManager.log
  • SteamDatabase, MinecraftDatabase, DatabaseUtils → Database.log
  • NetworkManager, ClusterManager, AgentManager → Network.log
  • WebServer, WebSecurity → WebServer.log

Early Crash Logging: The early_crash_log() function provides emergency logging before the LogManager is fully initialised. It writes directly to the component log file using basic file I/O during the earliest stages of module loading.

17.2 Log File Locations

All log files are stored under the logs/ directory:

| Directory | Contents |
| --- | --- |
| logs/components/ | Per-component log files (Dashboard.log, ServerManager.log, etc.) |
| logs/debug/ | Debug and diagnostic log files |
| logs/services/ | Service-layer log files (CommandQueue, StdinRelay, etc.) |

17.3 Log Rotation and Maintenance

The LogManager includes automated log maintenance:

Log Compression:

  • A background daemon thread periodically scans for log files older than 7 days.
  • Old log files are compressed using gzip (.gz extension) to save disk space.

Log Deletion:

  • Log files (both compressed and uncompressed) older than 30 days are automatically deleted.
  • This prevents unbounded disk usage from log accumulation.
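The two maintenance passes can be sketched as one function. This is an illustration under stated assumptions: file age is taken from the modification time, any file with ".log" in its name counts as a log file, and the function name maintain_logs is hypothetical.

```python
import gzip
import os
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def maintain_logs(log_dir, compress_after_days=7, delete_after_days=30, now=None):
    """Delete logs older than 30 days; gzip uncompressed logs older than 7."""
    now = time.time() if now is None else now
    compressed, deleted = [], []
    # sorted() materialises the listing, so new .gz files are not revisited.
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file() or ".log" not in path.name:
            continue
        stat = path.stat()
        age_days = (now - stat.st_mtime) / SECONDS_PER_DAY
        if age_days > delete_after_days:
            path.unlink()
            deleted.append(path.name)
        elif age_days > compress_after_days and path.suffix != ".gz":
            target = path.with_name(path.name + ".gz")
            with open(path, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the original timestamps so the 30-day rule still applies.
            os.utime(target, (stat.st_atime, stat.st_mtime))
            path.unlink()
            compressed.append(target.name)
    return compressed, deleted
```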

Log Statistics: The LogManager tracks error and warning counts since the last reset. These statistics are accessible through the analytics system and can be included in diagnostic reports.


18. Monitoring and Analytics

18.1 Analytics Collector

The analytics module (Modules/ui/analytics.py, AnalyticsCollector) collects real-time metrics and provides health scoring:

Data Collection:

  • Metrics are stored in memory using collections.deque with a maximum of 1440 entries (representing 24 hours of data at 1-minute intervals).
  • Each metric's series is held in a collections.defaultdict of deques, with access handled in a thread-safe manner.
  • Collects CPU usage, memory usage, disk usage, server counts, and per-server metrics.
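The bounded in-memory storage can be illustrated as follows. MetricStore is a hypothetical name, and guarding access with a threading.Lock is an assumption about how thread safety is achieved; the deque size of 1440 comes from the description above.

```python
import threading
from collections import defaultdict, deque

MAX_POINTS = 1440  # 24 hours at one sample per minute


class MetricStore:
    """Keep the most recent MAX_POINTS samples per metric name."""

    def __init__(self, max_points=MAX_POINTS):
        self._series = defaultdict(lambda: deque(maxlen=max_points))
        self._lock = threading.Lock()  # assumption: a single coarse lock

    def record(self, name, value, timestamp):
        with self._lock:
            # When the deque is full, the oldest sample is dropped.
            self._series[name].append((timestamp, value))

    def series(self, name):
        with self._lock:
            # .get() avoids creating an empty deque for unknown names.
            return list(self._series.get(name, ()))
```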

Health Scoring:

  • The analytics system calculates a health score on a 0-100 scale.
  • Factors include CPU usage, memory availability, disk space, number of error-state servers, and logging error rates.
  • Health scores are categorised: 90-100 = Healthy, 70-89 = Warning, Below 70 = Critical.
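The category bands translate into a small lookup. The sketch below covers only the banding; how the individual factors are weighted into the score is not documented here, so it is left out. Treating fractional scores between 89 and 90 as Warning is an assumption.

```python
def health_category(score):
    """Map a 0-100 health score onto the documented bands."""
    if not 0 <= score <= 100:
        raise ValueError("health score must be between 0 and 100")
    if score >= 90:
        return "Healthy"
    if score >= 70:
        return "Warning"
    return "Critical"
```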

Data Export:

  • get_analytics_summary() — Returns current values and 24-hour trends.
  • get_time_series_data() — Returns historical time-series data for charts.
  • export_to_json() — Exports all analytics data as JSON for external processing.

18.2 SNMP Integration

The SNMP manager (Modules/SMNP/snmp_manager.py, SNMPManager) provides SNMP monitoring data:

Enterprise OID Base: 1.3.6.1.4.1.12345

OID Mappings:

| OID Suffix | Metric | Description |
| --- | --- | --- |
| .1.1 | health_score | Overall system health (0-100) |
| .1.2 | cpu_percent | Current CPU usage |
| .1.3 | memory_percent | Current memory usage |
| .1.4 | disk_percent | Current disk usage |
| .1.5 | uptime | System uptime in seconds |
| .2.1 | servers_total | Total managed servers |
| .2.2 | servers_running | Currently running servers |
| .2.3 | servers_offline | Offline/stopped servers |
| .2.4 | servers_error | Servers in error state |
| .3.1 | webserver_cpu | Web server CPU usage |
| .3.2 | webserver_memory | Web server memory usage |
| .3.3 | webserver_connections | Active web connections |
| .3.4 | dashboards_count | Active dashboards |

Methods:

  • get_snmp_metrics() — Returns all SNMP metrics as a dictionary.
  • get_snmp_walk_data() — Returns data formatted for SNMP walk operations.
  • get_metric_by_oid(oid) — Returns a single metric by its OID.
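To show how the OID suffixes relate to metric names, here is a minimal lookup sketch built from the mappings above. The function names and the choice to return None for unknown OIDs are assumptions, not the SNMPManager API.

```python
ENTERPRISE_OID = "1.3.6.1.4.1.12345"

OID_SUFFIXES = {
    ".1.1": "health_score", ".1.2": "cpu_percent", ".1.3": "memory_percent",
    ".1.4": "disk_percent", ".1.5": "uptime",
    ".2.1": "servers_total", ".2.2": "servers_running",
    ".2.3": "servers_offline", ".2.4": "servers_error",
    ".3.1": "webserver_cpu", ".3.2": "webserver_memory",
    ".3.3": "webserver_connections", ".3.4": "dashboards_count",
}


def metric_by_oid(oid, metrics):
    """Resolve a full OID to its value in `metrics`, or None if unknown."""
    oid = oid.lstrip(".")
    if not oid.startswith(ENTERPRISE_OID + "."):
        return None
    name = OID_SUFFIXES.get(oid[len(ENTERPRISE_OID):])
    return None if name is None else metrics.get(name)


def walk(metrics):
    """Return (oid, value) pairs in numeric OID order, as a walk would."""
    rows = [
        (ENTERPRISE_OID + suffix, metrics[name])
        for suffix, name in OID_SUFFIXES.items()
        if name in metrics
    ]
    return sorted(rows, key=lambda row: tuple(int(p) for p in row[0].split(".")))
```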

18.3 Grafana and Prometheus Integration

The Grafana manager (Modules/SMNP/graphana.py, GrafanaManager) provides monitoring system integration:

Prometheus Metrics Endpoint: The web server exposes a /metrics endpoint that returns metrics in Prometheus text exposition format:

# HELP server_manager_health_score Overall system health score
# TYPE server_manager_health_score gauge
server_manager_health_score 95.0

# HELP server_manager_cpu_usage Current CPU usage percentage
# TYPE server_manager_cpu_usage gauge
server_manager_cpu_usage 23.5

# HELP server_manager_servers_total Total managed servers
# TYPE server_manager_servers_total gauge
server_manager_servers_total 5
...
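Output in this shape can be produced with a few lines of Python. The sketch below is not the GrafanaManager implementation; render_prometheus is a hypothetical helper that treats every metric as a gauge, as in the sample above.

```python
def render_prometheus(metrics):
    """Render {name: (help_text, value)} as Prometheus text exposition."""
    blocks = []
    for name, (help_text, value) in metrics.items():
        # HELP text must escape backslashes and newlines.
        safe_help = help_text.replace("\\", "\\\\").replace("\n", "\\n")
        blocks.append(
            f"# HELP {name} {safe_help}\n"
            f"# TYPE {name} gauge\n"
            f"{name} {value}"
        )
    # Blank lines between metric families are ignored by the parser.
    return "\n\n".join(blocks) + "\n"
```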

Grafana JSON Metrics: A JSON endpoint provides structured metrics in three sections: system metrics, server metrics, and application metrics.

Time-Series Data: The get_time_series_data() method provides time-stamped metric data suitable for Grafana graph panels.

Pre-Built Dashboard: The get_dashboard_config() method returns a complete Grafana dashboard JSON definition with three panels:

  1. Health Score — A stat panel showing the current health score.
  2. Server Status — A pie chart showing the distribution of server states (running, stopped, error).
  3. System Resources — A time-series graph showing CPU, memory, and disk usage over time.

This JSON can be imported directly into Grafana to create a monitoring dashboard without manual configuration.


19. Email Notifications (SMTP)

19.1 Mail Server Configuration

The mail server module (Modules/SMTP/mailserver.py, MailServer) supports multiple email providers and protocols:

Provider Presets:

| Provider | SMTP Server | Port | Security |
| --- | --- | --- | --- |
| Gmail | smtp.gmail.com | 587 | STARTTLS |
| Outlook | smtp-mail.outlook.com | 587 | STARTTLS |
| Office365 | smtp.office365.com | 587 | STARTTLS |
| Yahoo | smtp.mail.yahoo.com | 587 | STARTTLS |
| Custom | User-defined | User-defined | TLS/SSL/None |

Configuration Storage: SMTP settings are stored in the Windows Registry under:

HKEY_LOCAL_MACHINE\Software\SkywereIndustries\Servermanager\MailServer

Capabilities:

  • Send plain text and HTML emails.
  • Attach files (MIME multipart with Base64 encoding).
  • Send to multiple recipients.
  • Connection testing (test_connection()) to verify SMTP settings.
  • Automatic provider detection from email domain.
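The capabilities above map onto Python's standard email and smtplib modules. The following is a sketch rather than the MailServer class: the function names are hypothetical, and the domain-to-preset table is an assumption (the page does not say which domains trigger which preset).

```python
import smtplib
import ssl
from email.message import EmailMessage

# Assumption: detection keys on the address domain; hosts are the presets.
DOMAIN_PRESETS = {
    "gmail.com": ("smtp.gmail.com", 587),
    "outlook.com": ("smtp-mail.outlook.com", 587),
    "yahoo.com": ("smtp.mail.yahoo.com", 587),
}


def detect_provider(address):
    """Return (host, port) for a known domain, else None."""
    domain = address.rsplit("@", 1)[-1].lower()
    return DOMAIN_PRESETS.get(domain)


def build_message(sender, recipients, subject, text_body,
                  html_body=None, attachments=()):
    """Build a message with optional HTML part and (filename, bytes) files."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(text_body)
    if html_body is not None:
        msg.add_alternative(html_body, subtype="html")
    for filename, data in attachments:
        msg.add_attachment(data, maintype="application",
                           subtype="octet-stream", filename=filename)
    return msg


def send_starttls(msg, host, port, username, password):
    """Send over a STARTTLS-upgraded connection, as in the presets above."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls(context=ssl.create_default_context())
        smtp.login(username, password)
        smtp.send_message(msg)
```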

19.2 OAuth 2.0 for Microsoft Exchange

For organisations using Microsoft 365 with modern authentication, Server Manager supports OAuth 2.0 via MSAL (Microsoft Authentication Library):

Setup Process:

  1. Register an application in Microsoft Entra ID (formerly Azure Active Directory).
  2. Configure the required API permissions (Mail.Send for Microsoft Graph).
  3. Enter the Application (client) ID and Tenant ID in Server Manager.
  4. On first use, an interactive browser window opens for user consent.
  5. After consent, the refresh token is stored securely for silent authentication.

Token Management:

  • Tokens are refreshed silently (without user interaction) using the stored refresh token.
  • Tokens are refreshed 5 minutes before expiration to prevent authentication failures.
  • If silent refresh fails, the interactive browser flow is triggered again.
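The silent-then-interactive flow can be sketched against MSAL's PublicClientApplication, whose acquire_token_silent() and acquire_token_interactive() methods return a dict containing "access_token" on success. The helper names below are hypothetical, and `app` is passed in so the logic can be read (and tested) without MSAL installed.

```python
REFRESH_MARGIN_SECONDS = 5 * 60  # refresh 5 minutes before expiry


def needs_refresh(expires_at, now, margin=REFRESH_MARGIN_SECONDS):
    """True once the token is within `margin` seconds of expiring."""
    return now >= expires_at - margin


def acquire_token(app, scopes, account=None):
    """Try the cached/refresh-token path first, then the browser flow.

    `app` is expected to behave like msal.PublicClientApplication.
    """
    result = app.acquire_token_silent(scopes, account=account)
    if not result or "access_token" not in result:
        # Silent acquisition failed: fall back to interactive consent.
        result = app.acquire_token_interactive(scopes)
    if "access_token" not in result:
        raise RuntimeError(
            result.get("error_description", "token acquisition failed")
        )
    return result["access_token"]
```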

Email Sending:

  • OAuth-authenticated emails are sent via the Microsoft Graph API (/me/sendMail) rather than traditional SMTP.
  • This bypasses the need for app passwords or enabling "less secure apps".
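For illustration, a Graph sendMail request could be assembled as below. The request body follows the Microsoft Graph sendMail schema; the function name is hypothetical, and the sketch only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

GRAPH_SEND_MAIL_URL = "https://graph.microsoft.com/v1.0/me/sendMail"


def build_send_mail_request(access_token, subject, html_body, recipients):
    """Build a POST request for /me/sendMail with an HTML body."""
    payload = {
        "message": {
            "subject": subject,
            "body": {"contentType": "HTML", "content": html_body},
            "toRecipients": [
                {"emailAddress": {"address": address}}
                for address in recipients
            ],
        },
        "saveToSentItems": True,
    }
    return urllib.request.Request(
        GRAPH_SEND_MAIL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen()` would send it; Graph answers a successful sendMail call with 202 Accepted and an empty body.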

19.3 Notification Templates

The notification system (Modules/SMTP/notifications.py, NotificationManager) provides templated email notifications:

Available Templates:

| Template | Trigger | Description |
| --- | --- | --- |
| welcome | User account creation | Welcome message with login instructions |
| password_reset | Password reset request | Password reset instructions with temporary credentials |
| account_locked | Account lockout | Notification that the account has been locked due to failed login attempts |
| server_alert | Server issues | Alert about server problems (crash, high resource usage, errors) |
| maintenance | Scheduled maintenance | Advance notice of planned maintenance windows |
| custom | Manual send | Custom message from the admin panel |

Template Structure: Each template consists of three files in the Modules/SMTP/Mail-Templates/ directory:

  • {template_name}_html.html — HTML version of the email body
  • {template_name}_text.txt — Plain text fallback
  • {template_name}_subject.txt — Email subject line

Templates use placeholder replacement (e.g., {username}, {server_name}, {timestamp}) to personalise each notification.

Template Variables:

  • {username} — Recipient's username
  • {display_name} — Recipient's display name
  • {server_name} — Name of the affected server
  • {timestamp} — Current date and time
  • {base_url} — Application base URL
  • {message} — Custom message content
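Loading the three files and filling in the placeholders might look like the sketch below. It is not the NotificationManager code: render_template is a hypothetical name, and using a regular expression instead of str.format() is a deliberate choice here so that literal braces in inlined CSS do not raise errors. Leaving unknown placeholders untouched is also an assumption.

```python
import re
from pathlib import Path

PLACEHOLDER = re.compile(r"\{(\w+)\}")  # matches {username}, not "{ color: red }"


def fill(text, variables):
    """Replace {name} placeholders; unknown names are left as they are."""
    return PLACEHOLDER.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), text
    )


def render_template(template_dir, name, variables):
    """Load the subject, text, and HTML files for `name` and fill them in."""
    base = Path(template_dir)
    parts = {}
    for key, suffix in (("subject", "_subject.txt"),
                        ("text", "_text.txt"),
                        ("html", "_html.html")):
        raw = (base / f"{name}{suffix}").read_text(encoding="utf-8")
        parts[key] = fill(raw, variables)
    parts["subject"] = parts["subject"].strip()  # subject must be one line
    return parts
```

Values are inserted verbatim, so anything user-supplied that ends up in the HTML part should be escaped (for example with html.escape) before it is passed in.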

CSS Styling: All HTML templates reference mail-template.css for consistent styling. Before sending, the CSS is inlined into the HTML, because many email clients ignore external stylesheets.

Notification Toggles: Each notification type can be individually enabled or disabled through the admin dashboard. There is also an admin_only_alerts option that restricts server alerts and maintenance notifications to admin users only.

