Commit efedc0f

timhauke committed
feat: Introduce comprehensive observability and scaling features
This commit introduces a suite of new features focused on observability, performance, and scalability, alongside general improvements and bug fixes. Key changes include:

* **Observability**: Add metrics, command analytics, and queue telemetry services with corresponding events and documentation.
* **Scaling**: Implement shard supervision and scaling commands/services to manage bot instances more effectively.
* **DJ Permissions**: Introduce a dedicated service for managing DJ-specific permissions.
* **Lyrics Service**: Add functionality to fetch and display song lyrics in now-playing embeds.
* **Chaos Engineering**: Integrate chaos commands and service for testing resilience.
* **Configuration**: Update `.env.example` and `config.yml` schema to support new features and refine existing settings.
* **Development Tools**: Add `docker-compose.local.yml` and `.env.local.example` for easier local development and sandbox environments.
* **Documentation**: Expand documentation with new sections on analytics, command reference, development, Grafana, local sandbox, performance, and queue telemetry.
* **Dependencies**: Update `requirements.txt` to include new libraries for metrics, retries, throttling, and async task management.
* **Code Refinements**: General code cleanup, improved embed utility, and minor bug fixes across various modules.
1 parent 9b824c3 commit efedc0f

51 files changed: +3878 -141 lines changed

.env.example

Lines changed: 54 additions & 18 deletions
@@ -1,32 +1,68 @@
-# Copy this file to `.env` and populate the secrets before running the bot
+# Core bot settings
+DISCORD_TOKEN=
+CONFIG_PATH=config.yml
 
-# Discord bot token (https://discord.com/developers/applications)
-DISCORD_TOKEN=YOUR_DISCORD_BOT_TOKEN
-
-# Lavalink connection overrides (optional if values already exist in config.yml)
-LAVALINK_HOST=lavalink.example.com
+# Lavalink configuration
+LAVALINK_HOST=localhost
 LAVALINK_PORT=2333
-LAVALINK_PASSWORD=supersecret
+LAVALINK_PASSWORD=localpass
 LAVALINK_HTTPS=false
 LAVALINK_NAME=main
 LAVALINK_REGION=eu
 
-# Redis settings for playlist persistence
-# Redis persistence (optional but required for /playlist features)
-REDIS_HOST=45.84.196.19
-REDIS_PORT=32768
+# Redis configuration
+REDIS_HOST=localhost
+REDIS_PORT=6379
 REDIS_PASSWORD=
 REDIS_DB=0
 
 # Autoplay tuning
 AUTOPLAY_DISCOVERY_LIMIT=10
 AUTOPLAY_RANDOM_PICK=true
 
-# Crossfade controls
-CROSSFADE_ENABLED=true
-CROSSFADE_DURATION_MS=2500
-CROSSFADE_STEPS=12
-CROSSFADE_FLOOR_VOLUME=20
+# Crossfade settings
+CROSSFADE_ENABLED=false
+CROSSFADE_DURATION_MS=2000
+CROSSFADE_STEPS=10
+CROSSFADE_FLOOR_VOLUME=15
+
+# Metrics exporter
+METRICS_ENABLED=false
+METRICS_HOST=0.0.0.0
+METRICS_PORT=9333
+METRICS_INTERVAL=15
+
+# Chaos testing
+CHAOS_ENABLED=false
+CHAOS_INTERVAL_MINUTES=360
+CHAOS_SCENARIOS=disconnect_voice,disconnect_node,inject_error
+CHAOS_GUILD_ALLOWLIST=
+
+# Auto scaling
+SCALING_ENABLED=false
+SCALING_ENDPOINT=
+SCALING_PROVIDER=nomad
+SCALING_AUTH_TOKEN=
+SCALING_INTERVAL_SECONDS=60
+SCALING_COOLDOWN_SECONDS=300
+SCALING_TARGETS=1200,150
+
+# Command analytics
+ANALYTICS_ENABLED=false
+ANALYTICS_ENDPOINT=
+ANALYTICS_API_KEY=
+ANALYTICS_FLUSH_INTERVAL=30
+ANALYTICS_BATCH_SIZE=50
+ANALYTICS_STORAGE_PATH=data/command_analytics.log
+ANALYTICS_HASH_SALT=vectobeat
+
+# Queue telemetry
+QUEUE_TELEMETRY_ENABLED=false
+QUEUE_TELEMETRY_ENDPOINT=
+QUEUE_TELEMETRY_API_KEY=
+QUEUE_TELEMETRY_INCLUDE_GUILD=true
 
-# Optional: specify an alternate configuration file
-# CONFIG_PATH=config.yml
+# Cache tuning
+CACHE_SEARCH_ENABLED=true
+CACHE_SEARCH_TTL_SECONDS=60
+CACHE_SEARCH_MAX_ENTRIES=200
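
Reviewer note: the diff does not show how the new variables are parsed. The sketch below is one plausible way to read the boolean toggles and the comma-separated `SCALING_TARGETS` value. It assumes the two numbers map to guilds per shard and players per node (matching `target_guilds_per_shard: 1200` and `target_players_per_node: 150` in `config.yml`); the names `env_bool`, `parse_scaling_targets`, and `ScalingTargets` are hypothetical and not taken from the commit.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingTargets:
    """Hypothetical container; field meanings are assumed from config.yml."""
    guilds_per_shard: int
    players_per_node: int


def env_bool(name: str, default: bool = False, env=os.environ) -> bool:
    """Interpret common truthy spellings; unset or empty falls back to default."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in {"1", "true", "yes", "on"}


def parse_scaling_targets(raw: str) -> ScalingTargets:
    """Parse a 'guilds,players' pair such as '1200,150'."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected two comma-separated integers, got {raw!r}")
    guilds, players = (int(p) for p in parts)
    if guilds <= 0 or players <= 0:
        raise ValueError("scaling targets must be positive")
    return ScalingTargets(guilds, players)
```

Failing fast on a malformed pair keeps a typo in `.env` from silently turning into a zero or missing target.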

.env.local.example

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+DISCORD_TOKEN=
+# Override endpoints to point at the local sandbox
+LAVALINK_HOST=lavalink
+LAVALINK_PORT=2333
+LAVALINK_PASSWORD=localpass
+LAVALINK_HTTPS=false
+REDIS_HOST=redis
+REDIS_PORT=6379
+REDIS_PASSWORD=
+REDIS_DB=0
+QUEUE_TELEMETRY_ENABLED=false
+SCALING_ENABLED=false
+CHAOS_ENABLED=false

.github/instructions/codacy.instructions.md

Lines changed: 0 additions & 7 deletions
@@ -6,13 +6,6 @@
 # Codacy Rules
 Configuration for AI behavior when interacting with Codacy's MCP Server
 
-## using any tool that accepts the arguments: `provider`, `organization`, or `repository`
-- ALWAYS use:
-- provider: gh
-- organization: VectoDE
-- repository: VectoBeat
-- Avoid calling `git remote -v` unless really necessary
-
 ## CRITICAL: After ANY successful `edit_file` or `reapply` operation
 - YOU MUST IMMEDIATELY run the `codacy_cli_analyze` tool from Codacy's MCP Server for each file that was edited, with:
 - `rootPath`: set to the workspace path

FURTHER_DEVELOPMENT.md

Lines changed: 17 additions & 17 deletions
@@ -37,13 +37,13 @@
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Lyrics Integration</strong> &mdash; Surface synchronised lyrics (Genius, Musixmatch) within now-playing embeds.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>DJ Permissions Layer</strong> &mdash; Add role-based queue control with an auditable history.
 </label>
 </li>
@@ -55,31 +55,31 @@
 <ul style="list-style: none; padding-left: 0;">
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Multi-Node Lavalink Failover</strong> &mdash; Provision redundant nodes with automatic reconnect and player migration.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Self-Healing Supervisors</strong> &mdash; Watch shard heartbeats and restart or reconnect when anomalies occur.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Structured Alerting</strong> &mdash; Export metrics to Prometheus/Alertmanager for proactive paging.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Chaos Testing Playbook</strong> &mdash; Run scheduled disconnect, latency, and error-injection drills.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Auto Scaling Strategy</strong> &mdash; Integrate with Kubernetes/Nomad to scale shards and nodes based on concurrency dashboards.
 </label>
 </li>
@@ -91,25 +91,25 @@
 <ul style="list-style: none; padding-left: 0;">
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Grafana Dashboards</strong> &mdash; Publish live dashboards for shard health, node stats, and slash command throughput.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Command Analytics Pipeline</strong> &mdash; Stream anonymised command events into a data warehouse for trend analysis.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Real-Time Queue Telemetry</strong> &mdash; Emit queue lifecycle events (play, skip, finish) to webhooks or WebSocket consumers.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Enhanced Slash Feedback</strong> &mdash; Display progress bars and follow-ups for long-running operations.
 </label>
 </li>
@@ -127,19 +127,19 @@
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Scenario Test Harness</strong> &mdash; Mock Lavalink responses for integration-style queue and playback validation.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Command Reference Generator</strong> &mdash; Auto-build slash command documentation from source annotations.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Local Sandbox Stack</strong> &mdash; Ship docker-compose with Lavalink, Redis, and Postgres for contributors.
 </label>
 </li>
@@ -151,19 +151,19 @@
 <ul style="list-style: none; padding-left: 0%;">
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Event Loop Profiling</strong> &mdash; Benchmark coroutine hotspots with `pyinstrument` and asyncio debug facilities.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Adaptive Caching</strong> &mdash; Cache heavy search queries with TTLs to curb upstream traffic.
 </label>
 </li>
 <li>
 <label>
-<input type="checkbox" disabled />
+<input type="checkbox" checked disabled />
 <strong>Dynamic Search Limits</strong> &mdash; Tune track search result counts based on latency and guild load.
 </label>
 </li>

README.md

Lines changed: 12 additions & 2 deletions
@@ -310,6 +310,10 @@ AUTOPLAY_RANDOM_PICK=true
 <ul>
 <li>Capture <code>stdout</code> for VectoBeat; enable log shipping in production (ELK, CloudWatch, etc.).</li>
 <li>Monitor Lavalink metrics: player count, CPU, memory, frame deficit.</li>
+<li>Import the bundled Grafana dashboards in <code>docs/grafana</code> for shard latency, node health, and slash command throughput visualisations.</li>
+<li>Enable the command analytics pipeline (<code>docs/analytics.md</code>) to push anonymised slash usage into your data warehouse.</li>
+<li>Wire the queue telemetry webhook (<code>docs/queue_telemetry.md</code>) into your status site for real-time “now playing” indicators.</li>
+<li>Long-running slash commands (e.g. playlist loading) now show live progress embeds so users know what’s happening.</li>
 <li>Regularly patch yt-dlp for source compatibility.</li>
 <li>Monitor Redis availability (<code>INFO</code>/<code>PING</code>) if playlist persistence is enabled.</li>
 </ul>
@@ -321,8 +325,13 @@ AUTOPLAY_RANDOM_PICK=true
 
 <ol>
 <li>Fork the repository and clone your fork.</li>
-<li>Create a feature branch <code>git checkout -b feature/amazing-improvement</code>.</li>
-<li>Run <code>python3 -m compileall src</code> before committing.</li>
+<li>Create a feature branch <code>git checkout -b feature/amazing-improvement</code>.</li>
+<li>Run <code>python3 -m compileall src</code> and <code>scripts/typecheck.sh</code> before committing.</li>
+<li>(Optional) Exercise queue scenarios via <code>scripts/run_scenarios.py tests/scenarios/basic_queue.yaml</code> when touching playback logic.</li>
+<li>Regenerate the slash-command reference via <code>python scripts/generate_command_reference.py</code> before publishing docs changes.</li>
+<li>Spin up the local sandbox stack via <code>docker compose -f docker-compose.local.yml up</code> (see <code>docs/local_sandbox.md</code>) when you need Lavalink/Redis/Postgres locally.</li>
+<li>Tune search caching via the <code>cache</code> section in <code>config.yml</code> to reduce repeated Lavalink lookups.</li>
+<li>Adjust dynamic search limits via <code>search_limits</code> to balance latency vs. search breadth.</li>
 <li>Submit a pull request describing the motivation, approach, and testing performed.</li>
 </ol>
 
@@ -338,3 +347,4 @@ AUTOPLAY_RANDOM_PICK=true
 <p align="center" style="margin-top: 2rem;">
 <em>VectoBeat — engineered for premium audio experiences and operational clarity.</em>
 </p>
+- Profile the event loop with `python scripts/profile_event_loop.py` (requires pyinstrument) and inspect `profiles/event-loop-profile.html` for hotspots.
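
Reviewer note: the README changes refer to search caching driven by a TTL and a maximum entry count, but the cache implementation is not part of the visible diff. As a rough illustration of that behaviour, here is a minimal TTL cache with least-recently-used eviction. The class name `SearchCache` and its methods are hypothetical; the defaults mirror `CACHE_SEARCH_TTL_SECONDS=60` and `CACHE_SEARCH_MAX_ENTRIES=200`, and the eviction policy is an assumption.

```python
import time
from collections import OrderedDict


class SearchCache:
    """Hypothetical TTL cache with LRU eviction for search results."""

    def __init__(self, ttl_seconds=60.0, max_entries=200, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock  # injectable so tests can control time
        self._entries = OrderedDict()  # query -> (expires_at, results)

    def get(self, query):
        item = self._entries.get(query)
        if item is None:
            return None
        expires_at, results = item
        if self._clock() >= expires_at:
            # Entry outlived its TTL: drop it and report a miss.
            del self._entries[query]
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return results

    def put(self, query, results):
        self._entries[query] = (self._clock() + self._ttl, results)
        self._entries.move_to_end(query)
        # Evict least recently used entries once the size cap is exceeded.
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)
```

A caller would check `cache.get(query)` before querying Lavalink and call `cache.put(query, results)` after a successful lookup.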

config.yml

Lines changed: 58 additions & 0 deletions
@@ -30,6 +30,15 @@ lavalink:
   https: true
   name: "main"
   region: "eu"
+# Deprecated single-node fallback. Prefer configuring multiple nodes via ``lavalink_nodes`` below.
+
+lavalink_nodes:
+  - host: "lava-v4.ajieblogs.eu.org"
+    port: 443
+    password: https://dsc.gg/ajidevserver
+    https: true
+    name: "main"
+    region: "eu"
 
 spotify: # Optional: for Spotify support
   client_id: "" # Set your Spotify client ID
@@ -59,3 +68,52 @@ redis:
   port: 32768
   password: ""
   db: 0
+
+metrics:
+  enabled: true
+  host: "0.0.0.0"
+  port: 9333
+  collection_interval: 15
+
+chaos:
+  enabled: false
+  interval_minutes: 360
+  scenarios:
+    - disconnect_voice
+    - disconnect_node
+    - inject_error
+  guild_allowlist: []
+
+scaling:
+  enabled: false
+  provider: "nomad"
+  endpoint: "https://nomad.example.com/v1/job/vectobeat/scale"
+  auth_token: ""
+  interval_seconds: 60
+  cooldown_seconds: 300
+  target_guilds_per_shard: 1200
+  target_players_per_node: 150
+  min_shards: 1
+  max_shards: 10
+  min_lavalink_nodes: 1
+  max_lavalink_nodes: 5
+
+analytics:
+  enabled: false
+  endpoint: ""
+  api_key: ""
+  flush_interval_seconds: 30
+  batch_size: 50
+  storage_path: "data/command_analytics.log"
+  hash_salt: "vectobeat"
+
+queue_telemetry:
+  enabled: false
+  endpoint: ""
+  api_key: ""
+  include_guild_metadata: true
+
+cache:
+  search_enabled: true
+  search_ttl_seconds: 60
+  search_max_entries: 200
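
Reviewer note: the `scaling` block pairs per-unit targets with minimum and maximum bounds, but the scaling service itself is not in the visible diff. One plausible reading is that the desired shard or node count is the load divided by the target, rounded up and clamped to the configured range. The sketch below illustrates that arithmetic only; `desired_count` is a hypothetical helper, and the commit's actual policy (including how `cooldown_seconds` is applied) may differ.

```python
def desired_count(load: int, target_per_unit: int, minimum: int, maximum: int) -> int:
    """Return ceil(load / target_per_unit), clamped to [minimum, maximum]."""
    if target_per_unit <= 0:
        raise ValueError("target_per_unit must be positive")
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    # Integer ceiling division avoids floating-point rounding on large counts.
    needed = -(-max(load, 0) // target_per_unit)
    return max(minimum, min(maximum, needed))


# Example with the values from config.yml:
# 3000 guilds at 1200 per shard, bounded to 1..10 shards.
shards = desired_count(3000, 1200, minimum=1, maximum=10)
# 300 active players at 150 per node, bounded to 1..5 Lavalink nodes.
nodes = desired_count(300, 150, minimum=1, maximum=5)
```

Clamping to `min_shards`/`max_shards` keeps an idle bot from scaling to zero and caps the request sent to the provider endpoint.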
