feat: add github actions and fixes to speaker-recognition#150
feat: add github actions and fixes to speaker-recognition#150AnkushMalaker merged 26 commits intoSimpleOpenSoftware:mainfrom
Conversation
… use 'docker images'
…r extras services
…b Actions workflow
…selection and adding fallback to default runner
…ndling in GitHub Actions workflow
…flow by using a structured array for Docker Compose services
…itHub Actions workflow
- Changed speaker service URL from `http://host.docker.internal:8085` to `http://127.0.0.1:8085` in `wizard.py` and updated related documentation. - Added validation for `HF_TOKEN` in the speaker-recognition setup, prompting the user if it's missing or invalid. - Introduced a new `ParakeetStreamConsumer` for handling audio streams with Parakeet, including graceful shutdown handling. - Updated `docker-compose.yml` and related files to ensure proper environment variable usage and service health checks. - Enhanced error handling in audio stream workers for better logging and user feedback.
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis PR introduces Parakeet ASR as a dynamic transcription provider alongside Deepgram, adds conditional worker startup logic, updates Speaker Recognition networking to use localhost, and enhances the setup wizard with HF_TOKEN validation. Multiple backend controllers, models, and workers are updated to support dynamic provider selection and graceful fallback handling. Changes
Sequence Diagram(s)sequenceDiagram
participant WebSocket as WebSocket<br/>Controller
participant Env as Environment<br/>Variables
participant Producer as Audio Stream<br/>Producer
participant Consumer as Stream<br/>Consumer
WebSocket->>Env: Check TRANSCRIPTION_PROVIDER
alt Provider is "offline" or "parakeet"
WebSocket->>Producer: Use "parakeet"
else Provider is "deepgram"
WebSocket->>Producer: Use "deepgram"
else Auto-detect
WebSocket->>Env: Check PARAKEET_ASR_URL /<br/>OFFLINE_ASR_TCP_URI
alt Parakeet configured
WebSocket->>Producer: Use "parakeet"
else Check DEEPGRAM_API_KEY
alt Deepgram configured
WebSocket->>Producer: Use "deepgram"
else No provider
WebSocket->>WebSocket: Raise ValueError
end
end
end
Producer->>Consumer: init_session(provider)
sequenceDiagram
participant Script as start-workers.sh
participant Env as Environment
participant Deep as Deepgram<br/>Worker
participant Para as Parakeet<br/>Worker
participant Monitor as Process<br/>Monitor
Script->>Env: Check DEEPGRAM_API_KEY
alt API key present
Script->>Deep: Start deepgram worker
Note over Script: AUDIO_STREAM_WORKER_PID set
else Missing
Note over Script: Skip deepgram<br/>AUDIO_STREAM_WORKER_PID empty
end
Script->>Env: Check PARAKEET_ASR_URL /<br/>OFFLINE_ASR_TCP_URI
alt URL present
Script->>Para: Start parakeet worker
Note over Script: PARAKEET_STREAM_WORKER_PID set
else Missing
Note over Script: Skip parakeet<br/>PARAKEET_STREAM_WORKER_PID empty
end
Script->>Monitor: Wait for child processes
Monitor-->>Script: Any worker exits
Script->>Deep: Kill if PID set
Script->>Para: Kill if PID set
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Areas requiring extra attention:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (7)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
162-173: Consider migration script for heavy data transformations.The logic correctly populates missing legacy fields from the active transcript version. However, running this in
clean_legacy_datameans it executes on every model load, which could impact performance for large datasets.For better performance, consider:
- A one-time migration script to backfill these fields across existing documents
- Using this validator only as a fallback for unmigrated data
backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py (1)
245-245: Remove unnecessary f-string prefix.The string at line 245 has no placeholders and doesn't need the
fprefix.- logger.info(f"📊 Created single segment from transcript text (no segments returned by provider)") + logger.info("📊 Created single segment from transcript text (no segments returned by provider)")backends/advanced/src/advanced_omi_backend/services/transcription/parakeet_stream_consumer.py (1)
27-40: Consider explicit Optional type annotation.The
service_urlparameter should use explicitOptional[str]orstr | Noneto comply with PEP 484, improving type safety and IDE support.- def __init__(self, redis_client, service_url: str = None, buffer_chunks: int = 30): + def __init__(self, redis_client, service_url: Optional[str] = None, buffer_chunks: int = 30):Don't forget to import
Optionalfromtypingat the top of the file:from typing import Optionalbackends/advanced/docker-compose.yml (1)
35-35: Review hardcoded IPs in CORS origins.The CORS_ORIGINS includes a hardcoded IP
192.168.1.153, which appears to be a development/testing IP. Consider:
- Moving this to an
.envfile variable for environment-specific configuration- Using a pattern like
CORS_ORIGINS=${CORS_ORIGINS:-http://localhost:3010,...}to allow override- Documenting why specific IPs are needed
- - CORS_ORIGINS=http://localhost:3010,http://localhost:8000,http://192.168.1.153:3010,http://192.168.1.153:8000,https://localhost:3010,https://localhost:8000,https://100.105.225.45,https://localhost + - CORS_ORIGINS=${CORS_ORIGINS:-http://localhost:3010,http://localhost:8000,https://localhost:3010,https://localhost:8000,https://localhost}backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py (1)
316-339: Dynamic provider resolution looks good with minor suggestion.The provider selection logic properly handles:
- Explicit provider configuration via
TRANSCRIPTION_PROVIDER- Auto-detection based on available credentials
- Clear error when no provider is configured
One minor improvement: the error message on line 331 is inline with the exception, which static analysis flags as TRY003. Consider defining it as a constant if this pattern is used elsewhere.
+# Error messages +NO_PROVIDER_ERROR = "No transcription provider configured (DEEPGRAM_API_KEY or PARAKEET_ASR_URL required)" + # Determine transcription provider from environment transcription_provider = os.getenv("TRANSCRIPTION_PROVIDER", "").lower() if transcription_provider in ["offline", "parakeet"]: provider = "parakeet" elif transcription_provider == "deepgram": provider = "deepgram" else: # Auto-detect: prefer Parakeet if URL is set, otherwise Deepgram parakeet_url = os.getenv("PARAKEET_ASR_URL") or os.getenv("OFFLINE_ASR_TCP_URI") deepgram_key = os.getenv("DEEPGRAM_API_KEY") if parakeet_url: provider = "parakeet" elif deepgram_key: provider = "deepgram" else: - raise ValueError("No transcription provider configured (DEEPGRAM_API_KEY or PARAKEET_ASR_URL required)") + raise ValueError(NO_PROVIDER_ERROR)extras/speaker-recognition/README.md (1)
71-73: Add language identifier to fenced code block.The error message example should specify it's a shell/text output for better rendering.
-``` +```text cannot load certificate "/etc/nginx/ssl/server.crt": BIO_new_file() failed</blockquote></details> <details> <summary>backends/advanced/src/advanced_omi_backend/workers/audio_stream_parakeet_worker.py (1)</summary><blockquote> `1-1`: **Make file executable to match shebang.** The file has a shebang line but isn't marked as executable. Either make it executable or remove the shebang if it's only run via `python -m`. ```shell # If file should be directly executable: chmod +x backends/advanced/src/advanced_omi_backend/workers/audio_stream_parakeet_worker.pyOr remove the shebang if it's not needed (since it's run via
uv run python -min start-workers.sh).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
.github/workflows/advanced-docker-compose-build.yml(1 hunks)Docs/init-system.md(2 hunks)backends/advanced/docker-compose.yml(4 hunks)backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py(3 hunks)backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py(1 hunks)backends/advanced/src/advanced_omi_backend/models/conversation.py(1 hunks)backends/advanced/src/advanced_omi_backend/services/transcription/parakeet.py(1 hunks)backends/advanced/src/advanced_omi_backend/services/transcription/parakeet_stream_consumer.py(1 hunks)backends/advanced/src/advanced_omi_backend/workers/audio_stream_deepgram_worker.py(1 hunks)backends/advanced/src/advanced_omi_backend/workers/audio_stream_parakeet_worker.py(1 hunks)backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py(1 hunks)backends/advanced/start-workers.sh(4 hunks)extras/speaker-recognition/.env.template(1 hunks)extras/speaker-recognition/README.md(4 hunks)extras/speaker-recognition/docker-compose.yml(2 hunks)extras/speaker-recognition/nginx.conf.template(2 hunks)wizard.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Format Python code with Black using a 100-character line length
Use isort to organize Python imports
Place all imports at the top of the Python file after the docstring; never import modules in the middle of functions or files
Use lazy imports only when absolutely necessary to resolve circular import issues
Group Python imports by: standard library, third-party, then local imports
Always raise errors rather than silently ignoring failures; use explicit exceptions
Understand data structures instead of adding defensive hasattr checks; prefer correct models/parsing over ad-hoc guards
Files:
backends/advanced/src/advanced_omi_backend/services/transcription/parakeet.pybackends/advanced/src/advanced_omi_backend/controllers/websocket_controller.pywizard.pybackends/advanced/src/advanced_omi_backend/workers/audio_stream_deepgram_worker.pybackends/advanced/src/advanced_omi_backend/models/conversation.pybackends/advanced/src/advanced_omi_backend/controllers/conversation_controller.pybackends/advanced/src/advanced_omi_backend/workers/audio_stream_parakeet_worker.pybackends/advanced/src/advanced_omi_backend/workers/transcription_jobs.pybackends/advanced/src/advanced_omi_backend/services/transcription/parakeet_stream_consumer.py
{**/*.sh,**/Dockerfile}
📄 CodeRabbit inference engine (CLAUDE.md)
Use uv to run Python: replace direct python/python3 invocations with 'uv run python3 ...'
Files:
backends/advanced/start-workers.sh
**/*.sh
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.sh: Use 'docker compose build' without --no-cache by default; only add --no-cache when explicitly needed
Do not use 'docker compose restart' to pick up code changes; rebuild images when src is not volume-mounted
Files:
backends/advanced/start-workers.sh
🪛 actionlint (1.7.8)
.github/workflows/advanced-docker-compose-build.yml
257-257: "steps" section is missing in job "build-default"
(syntax-check)
268-268: "steps" section must be sequence node but got alias node with "" tag
(syntax-check)
🪛 markdownlint-cli2 (0.18.1)
extras/speaker-recognition/README.md
71-71: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.3)
backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py
331-331: Avoid specifying long messages outside the exception class
(TRY003)
backends/advanced/src/advanced_omi_backend/workers/audio_stream_parakeet_worker.py
1-1: Shebang is present but file is not executable
(EXE001)
57-57: Unused function argument: frame
(ARG001)
59-59: Store a reference to the return value of asyncio.create_task
(RUF006)
backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py
245-245: f-string without any placeholders
Remove extraneous f prefix
(F541)
backends/advanced/src/advanced_omi_backend/services/transcription/parakeet_stream_consumer.py
27-27: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
40-40: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (21)
backends/advanced/src/advanced_omi_backend/services/transcription/parakeet.py (1)
10-10: LGTM!The
osimport addition correctly supports the temporary file cleanup logic at lines 89-90.backends/advanced/src/advanced_omi_backend/workers/audio_stream_deepgram_worker.py (1)
32-35: LGTM! Graceful startup behavior.The change from error+exit to warning+return aligns well with the conditional worker startup pattern seen in
start-workers.sh. This allows the system to continue operating with alternative transcription providers when Deepgram is not configured.backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py (2)
140-146: LGTM! Ensures transcript data availability.Calling
_update_legacy_transcript_fields()before formatting ensures the API response includes transcript and segments from the active version. The explicit inclusion of segments in the output (while excluding heavy nested fields) aligns with the PR's goal to display transcription text on the conversation page.
355-355: LGTM! Clean import.The import cleanup removes unused
redis_connfrom the import path, keeping the code clean and aligned with the decorator-based injection pattern noted at line 406.backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py (1)
200-244: LGTM! Good fallback for segment-less transcription.The conditional logic properly handles providers like Parakeet that return transcript text without pre-segmented speaker data. The duration calculation is robust:
- Prefers word-level timestamps when available
- Falls back to reasonable estimation (0.4s per word)
- Defensive check ensures end > start (line 239)
This ensures consistent downstream processing regardless of provider capabilities.
backends/advanced/src/advanced_omi_backend/services/transcription/parakeet_stream_consumer.py (1)
59-75: LGTM! Robust confidence calculation.The confidence averaging logic provides a sensible fallback (0.9) when word-level confidence is unavailable, and properly computes the mean when confidence scores are present. This ensures consistent transcription result formatting for downstream consumers.
.github/workflows/advanced-docker-compose-build.yml (1)
140-186: LGTM! Well-structured sequential build process.The base services build loop properly:
- Handles different compose files and project directories
- Resolves built image IDs to avoid name mismatches
- Tags with both version and latest
- Includes cleanup of local tags
The error handling with
continueon missing images is appropriate.extras/speaker-recognition/.env.template (1)
31-32: LGTM! Localhost URL aligns with Docker networking.The change to
127.0.0.1:8085(localhost) instead of the hostname is appropriate for the Docker setup, and the added comment clearly explains the reasoning for different service variants.extras/speaker-recognition/docker-compose.yml (2)
32-32: LGTM - Correct healthcheck endpoint.The change from
http://speaker-service:8085/healthtohttp://localhost:8085/healthis correct. Healthcheck commands run inside the container, so they should check the service via localhost rather than the service name.
99-100: LGTM - Proper service dependency.Adding
depends_onwithservice_healthycondition ensures nginx only starts after the web-ui is healthy and ready to serve requests. This prevents potential connection errors during startup.extras/speaker-recognition/nginx.conf.template (1)
54-55: LGTM - Modern nginx HTTP/2 syntax.The change from
listen 443 ssl http2;to separatelisten 443 ssl;andhttp2 on;directives follows the modern nginx configuration syntax introduced in newer versions.backends/advanced/docker-compose.yml (2)
20-20: LGTM - Parakeet ASR URL configuration.Adding
PARAKEET_ASR_URLto the backend environment enables dynamic transcription provider selection, aligning with the changes in websocket_controller.py.
68-68: The file already has correct executable permissions; no changes needed.The
start-workers.shfile is already executable (permissions:rwxr-xr-x). Docker will successfully execute it without permission issues.extras/speaker-recognition/README.md (2)
53-74: Excellent documentation of SSL requirements.The new section clearly explains:
- SSL certificates are required for nginx
- How to generate them
- What files are created
- Expected errors when missing
This will help users avoid a common setup issue.
418-426: LGTM - Helpful troubleshooting guidance.The new troubleshooting section provides clear steps to diagnose and fix SSL certificate issues, including verification commands and restart instructions.
backends/advanced/src/advanced_omi_backend/workers/audio_stream_parakeet_worker.py (1)
26-76: LGTM - Clean worker implementation.The worker properly:
- Checks for required environment variables before starting
- Provides clear logging about why it's not starting if config is missing
- Handles graceful shutdown
- Cleans up Redis connection
This aligns well with the conditional startup logic in start-workers.sh.
backends/advanced/start-workers.sh (3)
80-87: LGTM - Conditional Deepgram worker startup.The conditional logic properly:
- Checks for
DEEPGRAM_API_KEYbefore starting the worker- Provides clear logging when skipping
- Sets
AUDIO_STREAM_WORKER_PIDto empty string when not startedThis prevents unnecessary processes and aligns with the dynamic provider resolution in websocket_controller.py.
91-99: LGTM - Conditional Parakeet worker startup.The conditional logic properly:
- Checks for either
PARAKEET_ASR_URLorOFFLINE_ASR_TCP_URI- Uses variable fallback pattern to handle both environment variables
- Provides clear logging when skipping
- Sets
PARAKEET_STREAM_WORKER_PIDto empty string when not started
44-45: LGTM - Safe PID cleanup with guards.The shutdown logic properly guards all kill commands with non-empty checks
[ -n "$PID" ], preventing errors when optional workers weren't started. This is a clean approach to handling conditional processes.Also applies to: 109-114, 128-128
wizard.py (2)
161-161: Loopback URL aligns with the updated container topology.Pointing the advanced backend at
http://127.0.0.1:8085keeps it consistent with the refreshed speaker-recognition compose/env defaults, so the services stay reachable without relying on host.docker.internal indirection. Nicely done.
177-204: HF token validation flow is solid.Love that we now hard-stop on placeholders, prompt once, and propagate the validated token via
--hf-token; this prevents silent setup failures and keeps the init script happy in one go.
backends/advanced/src/advanced_omi_backend/workers/audio_stream_parakeet_worker.py
Outdated
Show resolved
Hide resolved
|
I am pretty sure this one would fix the issues, if you build your offline infrastructure from the start. |
- Introduced a new `status.py` script for checking the health status of services, including container and HTTP health checks. - Added a `status.sh` script for easier execution of the health checker. - Updated `CLAUDE.md` to include instructions for setting up the test environment and running the health status checker. - Enhanced `setup-requirements.txt` by adding `requests` as a dependency. - Modified `.dockerignore` to include `Caddyfile` for better Docker management. - Updated service URL for speaker recognition in `wizard.py` to use Docker service name.
AnkushMalaker
left a comment
There was a problem hiding this comment.
Thank you for this :)
Looks great
feat: add github actions and fixes to speaker-recognition
Summary by CodeRabbit
New Features
Documentation
Bug Fixes