feat(preprocessing): phase 1 sprints 1-4, audio preprocessing pipeline by williaby · Pull Request #27 · ByronWilliamsCPA/audio-processor

williaby · 2026-05-16T04:45:19Z

Summary

Implements Phase 1 Sprints 1-4 of the audio preprocessing pipeline: pure-Python signal processing that runs before Deepgram transcription. No live API key is required for any code in this PR; all tests use mocks and synthetic numpy fixtures.

Modules Created

Module	Purpose	Public API
`src/audio_processor/exceptions.py`	Top-level package exceptions	`AudioLoadError` (extends `ProjectBaseError`)
`src/audio_processor/preprocessing/__init__.py`	Preprocessing sub-package marker	—
`src/audio_processor/preprocessing/loader.py` (Sprint 1)	Load WAV/MP3/FLAC/OGG via `soundfile` (librosa for MP3), mono-mix, resample to 16 kHz	`load_audio(path: Path) -> tuple[np.ndarray, int]`
`src/audio_processor/preprocessing/ffmpeg.py` (Sprint 2)	Thin subprocess wrapper around `ffmpeg`; validates binary on `PATH` at import; argv list only — no shell injection	`convert_to_wav(input_path: Path, output_path: Path) -> Path`
`src/audio_processor/preprocessing/vad.py` (Sprint 3)	Silero VAD loaded once via `torch.hub.load` (thread-safe lazy cache)	`detect_speech_segments(audio: np.ndarray, sample_rate: int) -> list[tuple[float, float]]`
`src/audio_processor/preprocessing/quality.py` (Sprint 4)	SNR estimate via Butterworth envelope (`scipy.signal`); peak-magnitude clipping detector	`check_snr(audio) -> float`, `check_clipping(audio, threshold=0.99) -> bool`
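
The "loaded once, thread-safe lazy cache" behaviour listed for `vad.py` can be illustrated with double-checked locking. This is a minimal sketch, not the PR's code: `get_model`, `_model`, and `_model_lock` are hypothetical names, and the `loader` callable stands in for whatever wraps `torch.hub.load` in the real module.

```python
import threading
from typing import Any, Callable

# Hypothetical module-level cache; the PR's vad.py may name these differently.
_model: Any = None
_model_lock = threading.Lock()


def get_model(loader: Callable[[], Any]) -> Any:
    """Return the cached model, calling `loader` at most once.

    Assumes `loader` never returns None, since None marks "not loaded yet".
    """
    global _model
    if _model is None:  # fast path: no locking once the model is cached
        with _model_lock:
            if _model is None:  # re-check: another thread may have won the race
                _model = loader()
    return _model
```

Passing the loader in keeps the sketch testable without `torch`; a unit test can hand in a fake loader and reset `_model` between cases.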

Tests (15 unit tests; mocks or synthetic fixtures only, no live services)

Test file	Cases	Mocks
`tests/unit/preprocessing/test_loader.py`	4 — resample-and-mono, 16 kHz passthrough, unsupported-suffix rejection, corrupt-file error	None (real soundfile + synthetic WAV via numpy/sf.write)
`tests/unit/preprocessing/test_ffmpeg.py`	2 — argv-list invocation, non-zero exit raises	`subprocess.run`
`tests/unit/preprocessing/test_vad.py`	2 — frame indices to seconds, empty result	`torch.hub.load` + `get_speech_timestamps`
`tests/unit/preprocessing/test_quality.py`	7 — clean tone, silent, empty, noisy; clipping above/at/empty	None
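
The "frame indices to seconds" case in `test_vad.py` exercises a small conversion step. The sketch below shows one way it could look, assuming the mocked `get_speech_timestamps` returns dicts with `start` and `end` sample indices; the helper name `timestamps_to_seconds` is hypothetical and not taken from the PR.

```python
from __future__ import annotations


def timestamps_to_seconds(
    timestamps: list[dict[str, int]], sample_rate: int
) -> list[tuple[float, float]]:
    """Convert sample-index spans into (start_seconds, end_seconds) tuples."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    # Dividing a sample index by the sample rate gives its offset in seconds.
    return [(ts["start"] / sample_rate, ts["end"] / sample_rate) for ts in timestamps]
```

An empty list of timestamps yields an empty list of segments, which covers the "empty result" case.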

Coverage

Per-module statement coverage of the new files (from `pytest --cov=src/audio_processor/preprocessing --cov=src/audio_processor/exceptions`):

Module	Coverage
`exceptions.py`	100.00%
`preprocessing/__init__.py`	100.00%
`preprocessing/quality.py`	93.33%
`preprocessing/vad.py`	88.89%
`preprocessing/ffmpeg.py`	87.50%
`preprocessing/loader.py`	83.67%

Estimated repo-wide line-coverage delta: +5 to +7 percentage points — roughly 100 of ~110 new executable statements covered, against the existing ~582-statement codebase. (The repo as a whole remains below the 80% gate because of pre-existing untested modules like cli.py, api/__init__.py, core/cache.py, etc.; this PR does not aim to fix those.)
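
The estimate above depends on the baseline coverage, which this PR does not state. The sketch below makes the arithmetic explicit; `baseline_cov` is therefore a hypothetical input, and `coverage_delta` is an illustrative helper rather than anything in the repository.

```python
def coverage_delta(
    existing_stmts: int, baseline_cov: float, new_stmts: int, new_covered: int
) -> float:
    """Return the change in repo-wide coverage, in percentage points.

    `baseline_cov` is the pre-PR coverage as a fraction (0.0 to 1.0); it is
    an assumed value, not a figure reported in this PR.
    """
    covered_before = existing_stmts * baseline_cov
    cov_after = (covered_before + new_covered) / (existing_stmts + new_stmts)
    return (cov_after - baseline_cov) * 100
```

With the figures quoted above (582 existing statements, roughly 100 of 110 new statements covered), an assumed 50% baseline gives about +6.5 points. The +5 to +7 range corresponds to a baseline of roughly 47% to 59%.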

Dependencies

Adds `scipy>=1.11.0` to the `[audio]` extra in `pyproject.toml` (used by `check_snr`). All other audio libraries (`librosa`, `soundfile`, `torch`, `silero-vad`) were already present.

Verification

`uv run pytest tests/unit/preprocessing/ -v` → 15 passed
`uv run ruff check` on the new files → clean
`uv run ruff format` applied
`uv run basedpyright` on the new files → 0 errors (28 warnings are all third-party stub-availability advisories on librosa/soundfile/scipy/torch.hub)
`uv run bandit` on the new files → 0 high, 1 medium, 2 low, all expected (subprocess usage in `ffmpeg.py` with `shell=False`; `torch.hub.load` for Silero VAD as specified in the task)
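
For reviewers weighing the bandit subprocess advisory, here is a minimal sketch of the argv-list pattern. It is not the PR's implementation: the ffmpeg flags (`-y`, `-ar 16000`, `-ac 1`), the helper `build_ffmpeg_argv`, and the local `FfmpegConversionError` stand-in are assumptions, and this version looks up the binary at call time rather than at import.

```python
import shutil
import subprocess
from pathlib import Path


class FfmpegConversionError(RuntimeError):
    """Local stand-in for the PR's exception of the same name."""


def build_ffmpeg_argv(ffmpeg_bin: str, input_path: Path, output_path: Path) -> list[str]:
    # Every argument is its own list element, so no shell ever parses the
    # filename: "a; rm -rf x.mp3" reaches ffmpeg as one literal argument.
    return [
        ffmpeg_bin, "-y",          # overwrite the output file if it exists
        "-i", str(input_path),
        "-ar", "16000",            # assumed target sample rate
        "-ac", "1",                # assumed mono output
        str(output_path),
    ]


def convert_to_wav(input_path: Path, output_path: Path) -> Path:
    ffmpeg_bin = shutil.which("ffmpeg")
    if ffmpeg_bin is None:
        raise FfmpegConversionError("ffmpeg not found on PATH")
    # A list argv with the default shell=False bypasses the shell entirely.
    result = subprocess.run(
        build_ffmpeg_argv(ffmpeg_bin, input_path, output_path),
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        raise FfmpegConversionError(
            f"ffmpeg exited with {result.returncode}: {result.stderr.strip()}"
        )
    return output_path
```

Splitting out `build_ffmpeg_argv` lets a unit test assert on the exact argv without running ffmpeg, which matches the "argv-list invocation" test case in spirit.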

Test plan

CI runs unit tests on Python 3.10–3.14 matrix
Ruff + basedpyright pass on changed files
Bandit advisories on subprocess/torch.hub.load confirmed acceptable
ffmpeg binary present in CI image (required by preprocessing.ffmpeg import)

Generated by Claude Code

Summary by CodeRabbit

New Features
- Audio preprocessing: format conversion and automatic normalization to 16 kHz
- Audio quality metrics: SNR estimation and clipping detection
- Speech segment detection via VAD and safer FFmpeg-backed conversion with clearer failure messages
Chores
- Audio extra now requires scipy>=1.11.0
- Updated dependency-review workflow rules
Tests
- Added unit tests for preprocessing (ffmpeg, loader, quality, VAD)

coderabbitai · 2026-05-16T04:45:25Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 87105992-9697-47d3-a7ca-c40891c83544

📥 Commits

Reviewing files that changed from the base of the PR and between 574708b and c1bfe77.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock, !**/*.lock

📒 Files selected for processing (2)

.trivyignore
docs/known-vulnerabilities.md

Walkthrough

This PR adds a complete audio preprocessing pipeline with FFmpeg conversion, audio normalization, quality metrics, and Silero VAD-based speech detection. It includes exception types, a safe FFmpeg subprocess wrapper, multi-format audio loading with resampling, SNR and clipping checks, workflow license updates, and comprehensive unit tests.

Changes

Audio Preprocessing Pipeline

Layer / File(s)	Summary
Exceptions, dependencies, and preprocessing package initialization `src/audio_processor/exceptions.py`, `pyproject.toml`, `src/audio_processor/preprocessing/__init__.py`	Defines `AudioLoadError` and `FfmpegConversionError` inheriting from `ProjectBaseError`, adds `scipy>=1.11.0` to the `audio` optional dependencies, and initializes the preprocessing package with a module docstring and postponed annotations.
FFmpeg conversion wrapper and tests `src/audio_processor/preprocessing/ffmpeg.py`, `tests/unit/preprocessing/test_ffmpeg.py`	Adds a safe FFmpeg subprocess wrapper that locates the binary via `shutil.which`, constructs argv lists (no shell), runs `subprocess.run` with captured stderr, returns `output_path` on success, and raises `FfmpegConversionError` with structured `details` on failure; includes unit tests for success, missing binary, and non-zero exit handling.
Audio loading, normalization, and tests `src/audio_processor/preprocessing/loader.py`, `tests/unit/preprocessing/test_loader.py`	Implements `load_audio` with suffix validation (`.wav`, `.mp3`, `.flac`, `.ogg`), MP3 decoding via `librosa`, other formats via `soundfile`, multi-channel averaging to mono, resampling to `16_000` Hz, and `AudioLoadError` wrapping; tests cover resampling+mono, passthrough, unsupported suffix, and corrupt files.
Audio quality assessment and tests `src/audio_processor/preprocessing/quality.py`, `tests/unit/preprocessing/test_quality.py`	Implements `check_snr` using a 4th-order Butterworth low-pass filter to estimate envelope and residual noise, handles edge cases (empty/silent/near-zero noise floor), and `check_clipping` via thresholded absolute-value check; includes unit tests validating SNR and clipping behavior.
Speech activity detection via Silero VAD and tests `src/audio_processor/preprocessing/vad.py`, `tests/unit/preprocessing/test_vad.py`	Adds `detect_speech_segments` with thread-safe lazy loading of Silero VAD via `torch.hub.load`, converts numpy audio to torch tensor, calls `get_speech_timestamps`, and returns `(start_seconds, end_seconds)` tuples; tests reset module cache and mock Silero outputs.
Dependency-review workflow update `.github/workflows/dependency-review.yml`	Expands `allow-licenses` and `allow-dependencies-licenses` lists to include additional SPDX entries and specific `pkg:pypi/` packages (including `nvidia-*-cu12`) whose license metadata isn't indexed by GitHub.
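
The loader row above describes suffix validation followed by a per-format decoder choice. A minimal sketch of that dispatch follows, assuming case-insensitive suffix matching; `SUPPORTED_SUFFIXES`, `validate_suffix`, and `pick_decoder` are hypothetical names, and `AudioLoadError` here is a local stand-in for the PR's exception.

```python
from pathlib import Path

# Suffixes taken from the walkthrough; the constant name is an assumption.
SUPPORTED_SUFFIXES = frozenset({".wav", ".mp3", ".flac", ".ogg"})


class AudioLoadError(Exception):
    """Local stand-in for the PR's exception of the same name."""


def validate_suffix(path: Path) -> str:
    """Return the lower-cased suffix, or raise for unsupported formats."""
    suffix = path.suffix.lower()  # assumption: matching ignores case
    if suffix not in SUPPORTED_SUFFIXES:
        raise AudioLoadError(f"unsupported audio format {suffix!r}: {path.name}")
    return suffix


def pick_decoder(path: Path) -> str:
    """Name the backend the walkthrough associates with each format."""
    return "librosa" if validate_suffix(path) == ".mp3" else "soundfile"
```

Rejecting the suffix before any decoding keeps the "unsupported-suffix rejection" case cheap and independent of the audio libraries.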

Sequence Diagrams

sequenceDiagram
  participant User as Audio Preprocessing Client
  participant Loader as load_audio()
  participant FFmpeg as convert_to_wav()
  participant Quality as check_snr()/check_clipping()
  participant VAD as detect_speech_segments()

  User->>Loader: Load audio file
  Loader->>Loader: Validate suffix
  Loader->>Loader: Decode audio
  Loader->>Loader: Convert to mono
  Loader->>Loader: Resample to 16k
  Loader-->>User: (audio_array, 16000)

  alt Conversion needed
    User->>FFmpeg: Request conversion
    FFmpeg->>FFmpeg: Run ffmpeg subprocess
    FFmpeg-->>User: Converted WAV path
  end

  User->>Quality: Assess quality
  Quality->>Quality: Compute SNR
  Quality->>Quality: Check clipping
  Quality-->>User: (snr_db, is_clipped)

  User->>VAD: Detect speech
  VAD->>VAD: Lazy-load Silero model
  VAD->>VAD: get_speech_timestamps
  VAD-->>User: [(start_sec, end_sec), ...]
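
The `is_clipped` value in the diagram comes from a peak-magnitude check. The sketch below shows the idea on a plain Python sequence to stay dependency-free; the PR's `check_clipping` operates on `np.ndarray`, and the behaviour for empty input and for a peak exactly at the threshold is assumed here, not confirmed by the PR.

```python
from __future__ import annotations

from collections.abc import Sequence


def check_clipping(audio: Sequence[float], threshold: float = 0.99) -> bool:
    """Return True when any sample's magnitude reaches `threshold`."""
    if len(audio) == 0:
        return False  # assumption: empty input is reported as not clipped
    peak = max(abs(sample) for sample in audio)
    return peak >= threshold  # assumption: a peak exactly at the threshold counts
```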

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

ByronWilliamsCPA/audio-processor#4: Adds/adjusts audio optional dependencies in pyproject.toml, related to the scipy addition here.

Suggested labels

ci

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: implementation of the audio preprocessing pipeline (phase 1, sprints 1-4) with new loader, ffmpeg, VAD, and quality modules.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

github-actions · 2026-05-16T04:45:33Z

✅ FIPS Compatibility Check

Metric	Count
Errors	0
Warnings	0
Info	1

Status: ✅ PASSED

What is FIPS?

FIPS 140-2/140-3 is a US government standard for cryptographic modules.
Systems running Ubuntu LTS with fips-updates or similar configurations
restrict cryptographic algorithms to NIST-approved ones.

Common issues:

Using hashlib.md5() without usedforsecurity=False
Dependencies using non-approved algorithms (bcrypt, DES, RC4)
Weak cipher configurations
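
To make the first item concrete: since Python 3.9, `hashlib` constructors accept `usedforsecurity=False` to flag a digest as non-cryptographic. A minimal sketch follows; `cache_key` and `integrity_digest` are hypothetical helpers, and whether MD5 is actually permitted on a given FIPS-enabled host still depends on its OpenSSL configuration.

```python
import hashlib


def cache_key(payload: bytes) -> str:
    """Non-security fingerprint, e.g. for cache lookups or deduplication."""
    # usedforsecurity=False tells hashlib this digest is not protecting
    # anything, which FIPS-restricted builds can use to allow the call.
    return hashlib.md5(payload, usedforsecurity=False).hexdigest()


def integrity_digest(payload: bytes) -> str:
    """Security-relevant digest: use a NIST-approved algorithm instead."""
    return hashlib.sha256(payload).hexdigest()
```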

github-actions · 2026-05-16T04:45:52Z

⚠️ Deprecation Warning: The deny-licenses option is deprecated for possible removal in the next major release. For more information, see issue 997.

Dependency Review

The following issues were found:

✅ 0 vulnerable package(s)
✅ 0 package(s) with incompatible licenses
✅ 0 package(s) with invalid SPDX license definitions
⚠️ 25 package(s) with unknown licenses.
⚠️ 1 package(s) with OpenSSF Scorecard issues.

See the Details below.

License Issues

uv.lock

Package	Version	License	Issue Type
deepgram-sdk	5.3.0	Null	Unknown License
jaraco-functools	4.3.0	Null	Unknown License
lxml	6.1.1	Null	Unknown License
mkdocs-git-revision-date-localized-plugin	1.5.0	Null	Unknown License
nvidia-cublas-cu12	12.8.4.1	Null	Unknown License
nvidia-cuda-cupti-cu12	12.8.90	Null	Unknown License
nvidia-cuda-nvrtc-cu12	12.8.93	Null	Unknown License
nvidia-cuda-runtime-cu12	12.8.90	Null	Unknown License
nvidia-cudnn-cu12	9.10.2.21	Null	Unknown License
nvidia-cufft-cu12	11.3.3.83	Null	Unknown License
nvidia-cufile-cu12	1.13.1.3	Null	Unknown License
nvidia-curand-cu12	10.3.9.90	Null	Unknown License
nvidia-cusolver-cu12	11.7.3.90	Null	Unknown License
nvidia-cusparse-cu12	12.5.8.93	Null	Unknown License
nvidia-cusparselt-cu12	0.7.1	Null	Unknown License
nvidia-nccl-cu12	2.27.5	Null	Unknown License
nvidia-nvjitlink-cu12	12.8.93	Null	Unknown License
nvidia-nvshmem-cu12	3.3.20	Null	Unknown License
nvidia-nvtx-cu12	12.8.90	Null	Unknown License
onnxruntime	1.23.2	Null	Unknown License
protobuf	6.33.6	Null	Unknown License
ruamel-yaml	0.18.16	Null	Unknown License
ruamel-yaml-clib	0.2.15	Null	Unknown License
silero-vad	6.2.0	Null	Unknown License
torch	2.9.1	Null	Unknown License

Denied Licenses:
GPL-2.0, GPL-3.0

OpenSSF Scorecard

Scorecard details

Package	Version	Score	Details
pip/absl-py	2.3.1	Unknown
pip/anyio	4.12.0	Unknown
pip/arq	0.25.0	Unknown
pip/async-lru	2.0.5	🟢 6.9	Details below

Check	Score	Reason
Code-Review	🟢 7	Found 7/9 approved changesets -- score normalized to 7
Maintained	🟢 10	23 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 10
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 10	license file detected
Signed-Releases	🟢 8	5 out of the last 5 releases have a total of 5 signed artifacts.
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
Security-Policy	🟢 10	security policy file detected
Packaging	🟢 10	packaging workflow detected
SAST	🟢 10	SAST tool is run on all commits

pip/attrs	25.4.0	Unknown
pip/babel	2.17.0	Unknown
pip/backrefs	6.1	Unknown
pip/bandit	1.9.2	Unknown
pip/basedpyright	1.35.0	Unknown
pip/cachetools	6.2.2	🟢 6.9	Details below

Check	Score	Reason
Code-Review	⚠️ 0	Found 2/25 approved changesets -- score normalized to 0
Packaging	⚠️ -1	packaging workflow not detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Maintained	🟢 10	29 commit(s) and 8 issue activity found in the last 90 days -- score normalized to 10
Binary-Artifacts	🟢 10	no binaries found in the repo
Security-Policy	🟢 10	security policy file detected
Token-Permissions	🟢 10	GitHub workflow tokens follow principle of least privilege
Pinned-Dependencies	🟢 5	dependency not pinned by hash detected -- score normalized to 5
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Fuzzing	🟢 10	project is fuzzed
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	🟢 3	branch protection is not maximal on development and all release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/certifi	2025.11.12	🟢 6.4	Details below

Check	Score	Reason
Code-Review	🟢 5	Found 1/2 approved changesets -- score normalized to 5
Binary-Artifacts	🟢 10	no binaries found in the repo
Maintained	🟢 8	8 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 8
Security-Policy	🟢 10	security policy file detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Token-Permissions	🟢 10	GitHub workflow tokens follow principle of least privilege
Pinned-Dependencies	🟢 5	dependency not pinned by hash detected -- score normalized to 5
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Fuzzing	⚠️ 0	project is not fuzzed
Signed-Releases	⚠️ -1	no releases found
License	🟢 9	license file detected
Branch-Protection	⚠️ 0	branch protection not enabled on development/release branches
Packaging	🟢 10	packaging workflow detected
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/charset-normalizer	3.4.4	Unknown
pip/click	8.3.1	Unknown
pip/coloredlogs	15.0.1	Unknown
pip/coverage	7.12.0	Unknown
pip/croniter	6.0.0	Unknown
pip/cyclonedx-bom	7.2.1	Unknown
pip/cyclonedx-python-lib	11.6.0	Unknown
pip/debugpy	1.8.17	Unknown
pip/decorator	5.2.1	Unknown
pip/deepgram-sdk	5.3.0	Unknown
pip/docling-core	2.54.0	Unknown
pip/fastapi	0.123.9	Unknown
pip/flatbuffers	25.9.23	Unknown
pip/fsspec	2025.12.0	Unknown
pip/google-api-core	2.28.1	Unknown
pip/google-auth	2.43.0	🟢 7.3	Details below

Check	Score	Reason
Packaging	⚠️ -1	packaging workflow not detected
Code-Review	🟢 10	all changesets reviewed
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Security-Policy	🟢 10	security policy file detected
Maintained	⚠️ 0	project is archived
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Token-Permissions	🟢 10	GitHub workflow tokens follow principle of least privilege
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
Binary-Artifacts	🟢 10	no binaries found in the repo
Fuzzing	🟢 10	project is fuzzed
License	🟢 10	license file detected
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
Signed-Releases	⚠️ -1	no releases found
SAST	🟢 5	SAST tool is not run on all commits -- score normalized to 5

pip/googleapis-common-protos	1.72.0	Unknown
pip/griffe	1.15.0	Unknown
pip/griffe-pydantic	1.1.8	Unknown
pip/grpcio	1.76.0	Unknown
pip/hiredis	3.3.0	🟢 4.8	Details below

Check	Score	Reason
Code-Review	🟢 3	Found 11/28 approved changesets -- score normalized to 3
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Maintained	🟢 5	7 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 5
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Security-Policy	⚠️ 0	security policy file not detected
Packaging	⚠️ -1	packaging workflow not detected
Fuzzing	🟢 10	project is fuzzed
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/humanfriendly	10.0	Unknown
pip/humanize	4.14.0	Unknown
pip/hypothesis	6.148.7	Unknown
pip/identify	2.6.15	🟢 5.9	Details below

Check	Score	Reason
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Code-Review	🟢 8	Found 13/16 approved changesets -- score normalized to 8
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Packaging	⚠️ -1	packaging workflow not detected
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
Maintained	🟢 10	30 commit(s) and 5 issue activity found in the last 90 days -- score normalized to 10
Binary-Artifacts	🟢 10	no binaries found in the repo
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Fuzzing	⚠️ 0	project is not fuzzed
Signed-Releases	⚠️ -1	no releases found
License	🟢 10	license file detected
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
Security-Policy	🟢 10	security policy file detected
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/importlib-metadata	8.7.0	Unknown
pip/ipykernel	7.1.0	Unknown
pip/ipython	8.37.0	Unknown
pip/ipython	9.8.0	Unknown
pip/jaraco-functools	4.3.0	Unknown
pip/jedi	0.19.2	🟢 5	Details below

Check	Score	Reason
Security-Policy	🟢 10	security policy file detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Packaging	⚠️ -1	packaging workflow not detected
Maintained	🟢 10	30 commit(s) and 15 issue activity found in the last 90 days -- score normalized to 10
Code-Review	⚠️ 0	Found 0/4 approved changesets -- score normalized to 0
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Binary-Artifacts	🟢 9	binaries present in source code
Fuzzing	🟢 10	project is fuzzed
License	🟢 9	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ 0	branch protection not enabled on development/release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/joblib	1.5.2	Unknown
pip/json5	0.12.1	Unknown
pip/jsonpointer	3.0.0	🟢 4.6	Details below

Check	Score	Reason
Code-Review	🟢 5	Found 4/7 approved changesets -- score normalized to 5
Maintained	🟢 10	13 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 10
Packaging	⚠️ -1	packaging workflow not detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Security-Policy	⚠️ 0	security policy file not detected
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 9	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	🟢 3	branch protection is not maximal on development and all release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/jsonschema	4.25.1	Unknown
pip/jupyter-client	8.6.3	Unknown
pip/jupyter-events	0.12.0	Unknown
pip/jupyter-lsp	2.3.0	Unknown
pip/jupyter-server-terminals	0.5.3	Unknown
pip/latex2mathml	3.78.1	🟢 4.2	Details below

Check	Score	Reason
Code-Review	⚠️ 0	Found 0/9 approved changesets -- score normalized to 0
Maintained	🟢 10	30 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 10
Packaging	⚠️ -1	packaging workflow not detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Binary-Artifacts	🟢 10	no binaries found in the repo
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
Security-Policy	⚠️ 0	security policy file not detected
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/lazy-loader	0.4	Unknown
pip/linkify-it-py	2.0.3	Unknown
pip/llvmlite	0.45.1	Unknown
pip/lxml	6.1.1	Unknown
pip/markdown	3.10	Unknown
pip/markdown-it-py	4.0.0	Unknown
pip/matplotlib-inline	0.2.1	Unknown
pip/mdit-py-plugins	0.5.0	Unknown
pip/mkdocs-autorefs	1.4.3	Unknown
pip/mkdocs-gen-files	0.6.0	Unknown
pip/mkdocs-get-deps	0.2.0	Unknown
pip/mkdocs-git-revision-date-localized-plugin	1.5.0	Unknown
pip/mkdocs-material	9.7.0	Unknown
pip/mkdocs-section-index	0.3.10	Unknown
pip/more-itertools	10.8.0	Unknown
pip/mutmut	3.4.0	Unknown
pip/nbclient	0.10.2	Unknown
pip/networkx	3.6	Unknown
pip/nodeenv	1.9.1	⚠️ 2.8	Details below

Check	Score	Reason
Binary-Artifacts	🟢 10	no binaries found in the repo
Code-Review	⚠️ 1	Found 5/30 approved changesets -- score normalized to 1
Maintained	⚠️ 0	0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Packaging	⚠️ -1	packaging workflow not detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Security-Policy	⚠️ 0	security policy file not detected
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 9	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ 0	branch protection not enabled on development/release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/nodejs-wheel-binaries	24.11.1	Unknown
pip/nox	2025.11.12	Unknown
pip/nox-uv	0.6.3	Unknown
pip/numba	0.62.1	Unknown
pip/numpy	2.3.5	Unknown
pip/nvidia-cublas-cu12	12.8.4.1	Unknown
pip/nvidia-cuda-cupti-cu12	12.8.90	Unknown
pip/nvidia-cuda-nvrtc-cu12	12.8.93	Unknown
pip/nvidia-cuda-runtime-cu12	12.8.90	Unknown
pip/nvidia-cudnn-cu12	9.10.2.21	Unknown
pip/nvidia-cufft-cu12	11.3.3.83	Unknown
pip/nvidia-cufile-cu12	1.13.1.3	Unknown
pip/nvidia-curand-cu12	10.3.9.90	Unknown
pip/nvidia-cusolver-cu12	11.7.3.90	Unknown
pip/nvidia-cusparse-cu12	12.5.8.93	Unknown
pip/nvidia-cusparselt-cu12	0.7.1	Unknown
pip/nvidia-nccl-cu12	2.27.5	Unknown
pip/nvidia-nvjitlink-cu12	12.8.93	Unknown
pip/nvidia-nvshmem-cu12	3.3.20	Unknown
pip/nvidia-nvtx-cu12	12.8.90	Unknown
pip/onnxruntime	1.23.2	Unknown
pip/packaging	25.0	Unknown
pip/parso	0.8.5	🟢 4.9	Details below

Check	Score	Reason
Code-Review	🟢 5	Found 7/13 approved changesets -- score normalized to 5
Security-Policy	🟢 10	security policy file detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Maintained	🟢 3	3 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 3
Packaging	⚠️ -1	packaging workflow not detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Binary-Artifacts	🟢 10	no binaries found in the repo
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
License	🟢 9	license file detected
Fuzzing	🟢 10	project is fuzzed
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ 0	branch protection not enabled on development/release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/pathspec	0.12.1	Unknown
pip/pip-licenses	5.5.0	Unknown
pip/platformdirs	4.5.0	Unknown
pip/pooch	1.8.2	Unknown
pip/pre-commit	4.5.0	🟢 4.6	Details below

Check	Score	Reason
Packaging	⚠️ -1	packaging workflow not detected
Maintained	🟢 10	17 commit(s) and 24 issue activity found in the last 90 days -- score normalized to 10
Code-Review	⚠️ 1	Found 1/8 approved changesets -- score normalized to 1
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Binary-Artifacts	🟢 10	no binaries found in the repo
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 10	license file detected
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
Signed-Releases	⚠️ 0	Project has not signed or included provenance with any releases.
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
Security-Policy	🟢 10	security policy file detected
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/prometheus-client	0.23.1	Unknown
pip/proto-plus	1.26.1	Unknown
pip/protobuf	6.33.6	Unknown
pip/psutil	7.1.3	🟢 5.2	Details below

Check	Score	Reason
Packaging	⚠️ -1	packaging workflow not detected
Code-Review	⚠️ 0	Found 1/30 approved changesets -- score normalized to 0
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Security-Policy	🟢 10	security policy file detected
Maintained	🟢 10	30 commit(s) and 25 issue activity found in the last 90 days -- score normalized to 10
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Binary-Artifacts	🟢 10	no binaries found in the repo
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
License	🟢 10	license file detected
Fuzzing	🟢 10	project is fuzzed
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ 0	branch protection not enabled on development/release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/pycparser	2.23	🟢 5.6	Details below

Check	Score	Reason
Maintained	🟢 5	5 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 5
Code-Review	⚠️ 0	Found 1/30 approved changesets -- score normalized to 0
Security-Policy	🟢 9	security policy file detected
Token-Permissions	🟢 10	GitHub workflow tokens follow principle of least privilege
Packaging	⚠️ -1	packaging workflow not detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Fuzzing	🟢 10	project is fuzzed
License	🟢 9	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ 0	branch protection not enabled on development/release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/pydantic	2.12.5	Unknown
pip/pydantic-core	2.41.5	Unknown
pip/pydantic-settings	2.12.0	Unknown
pip/pyparsing	3.2.5	Unknown
pip/pyreadline3	3.5.4	Unknown
pip/pytest-cov	7.0.0	Unknown
pip/python-frontmatter	1.1.0	🟢 3.4	Details below

Check	Score	Reason
Maintained	🟢 5	6 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 5
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Packaging	⚠️ -1	packaging workflow not detected
Code-Review	⚠️ 1	Found 2/11 approved changesets -- score normalized to 1
Binary-Artifacts	🟢 10	no binaries found in the repo
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Security-Policy	⚠️ 0	security policy file not detected
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ 0	branch protection not enabled on development/release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/python-json-logger	4.0.0	Unknown
pip/pytz	2025.2	Unknown
pip/pywinpty	3.0.2	Unknown
pip/redis	7.1.0	Unknown
pip/rich	14.2.0	🟢 6.7	Details below

Check	Score	Reason
Maintained	🟢 10	30 commit(s) and 6 issue activity found in the last 90 days -- score normalized to 10
Security-Policy	🟢 10	security policy file detected
Code-Review	⚠️ 1	Found 1/10 approved changesets -- score normalized to 1
Packaging	⚠️ -1	packaging workflow not detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Pinned-Dependencies	⚠️ 2	dependency not pinned by hash detected -- score normalized to 2
Fuzzing	🟢 10	project is fuzzed
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
SAST	🟢 9	SAST tool detected but not run on all commits

pip/rq

2.6.1

Unknown

pip/rsa

4.9.1

Unknown

pip/ruamel-yaml

0.18.16

Unknown

pip/ruamel-yaml-clib

0.2.15

Unknown

pip/ruff

0.14.8

Unknown

pip/scipy

1.16.3

Unknown

pip/send2trash

1.8.3

🟢 4.6

Details

Check	Score	Reason
Maintained	⚠️ 0	0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Code-Review	🟢 4	Found 6/14 approved changesets -- score normalized to 4
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Pinned-Dependencies	🟢 3	dependency not pinned by hash detected -- score normalized to 3
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Security-Policy	⚠️ 0	security policy file not detected
Packaging	⚠️ -1	packaging workflow not detected
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	🟢 6	branch protection is not maximal on development and all release branches
SAST	🟢 9	SAST tool detected but not run on all commits

pip/sentry-sdk

2.47.0

🟢 5.5

Details

Check	Score	Reason
Maintained	🟢 10	30 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 10
Code-Review	🟢 8	Found 15/18 approved changesets -- score normalized to 8
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Packaging	⚠️ -1	packaging workflow not detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies	🟢 5	dependency not pinned by hash detected -- score normalized to 5
License	🟢 10	license file detected
Fuzzing	⚠️ 0	project is not fuzzed
Signed-Releases	⚠️ 0	Project has not signed or included provenance with any releases.
Branch-Protection	🟢 4	branch protection is not maximal on development and all release branches
Security-Policy	🟢 10	security policy file detected
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/setuptools

80.9.0

Unknown

pip/silero-vad

6.2.0

Unknown

pip/smmap

5.0.2

🟢 5.6

Details

Check	Score	Reason
Security-Policy	🟢 9	security policy file detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Maintained	🟢 3	3 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 3
Code-Review	🟢 6	Found 6/10 approved changesets -- score normalized to 6
Binary-Artifacts	🟢 10	no binaries found in the repo
Packaging	⚠️ -1	packaging workflow not detected
Token-Permissions	🟢 10	GitHub workflow tokens follow principle of least privilege
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ 0	branch protection not enabled on development/release branches
SAST	🟢 3	SAST tool is not run on all commits -- score normalized to 3

pip/soupsieve

2.8

Unknown

pip/soxr

1.0.0

Unknown

pip/starlette

0.50.0

Unknown

pip/stevedore

5.6.0

Unknown

pip/tabulate

0.9.0

Unknown

pip/textual

6.7.1

🟢 5.1

Details

Check	Score	Reason
Code-Review	⚠️ 0	Found 0/5 approved changesets -- score normalized to 0
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Maintained	🟢 10	30 commit(s) and 15 issue activity found in the last 90 days -- score normalized to 10
Packaging	⚠️ -1	packaging workflow not detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Pinned-Dependencies	⚠️ 2	dependency not pinned by hash detected -- score normalized to 2
License	🟢 10	license file detected
Fuzzing	⚠️ 0	project is not fuzzed
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
Security-Policy	⚠️ 0	security policy file not detected
SAST	🟢 9	SAST tool detected but not run on all commits

pip/tomli

2.3.0

Unknown

pip/torch

2.9.1

Unknown

pip/torchaudio

2.9.1

🟢 3.7

Details

Check	Score	Reason
Maintained	⚠️ 1	2 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 1
Code-Review	🟢 5	Found 17/30 approved changesets -- score normalized to 5
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Packaging	⚠️ -1	packaging workflow not detected
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Binary-Artifacts	🟢 10	no binaries found in the repo
Security-Policy	⚠️ 0	security policy file not detected
License	🟢 10	license file detected
Fuzzing	⚠️ 0	project is not fuzzed
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	🟢 3	branch protection is not maximal on development and all release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/torchvision

0.24.1

🟢 4.5

Details

Check	Score	Reason
Maintained	🟢 10	30 commit(s) and 5 issue activity found in the last 90 days -- score normalized to 10
Code-Review	🟢 6	Found 20/30 approved changesets -- score normalized to 6
Packaging	⚠️ -1	packaging workflow not detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Binary-Artifacts	🟢 10	no binaries found in the repo
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
Security-Policy	⚠️ 0	security policy file not detected
License	🟢 10	license file detected
Fuzzing	⚠️ 0	project is not fuzzed
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ 1	branch protection is not maximal on development and all release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/traitlets

5.14.3

Unknown

pip/triton

3.5.1

Unknown

pip/typer

0.19.2

Unknown

pip/tzdata

2025.2

🟢 7.3

Details

Check	Score	Reason
Maintained	🟢 10	13 commit(s) and 3 issue activity found in the last 90 days -- score normalized to 10
Code-Review	🟢 7	Found 18/23 approved changesets -- score normalized to 7
Token-Permissions	🟢 9	detected GitHub workflow tokens with excessive permissions
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Pinned-Dependencies	🟢 4	dependency not pinned by hash detected -- score normalized to 4
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 9	license file detected
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
Security-Policy	🟢 10	security policy file detected
Packaging	🟢 10	packaging workflow detected
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/uc-micro-py

1.0.3

Unknown

pip/uvicorn

0.38.0

Unknown

pip/vulture

2.14

🟢 4

Details

Check	Score	Reason
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Packaging	⚠️ -1	packaging workflow not detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Maintained	🟢 10	10 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 10
Code-Review	⚠️ 2	Found 8/28 approved changesets -- score normalized to 2
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Security-Policy	⚠️ 0	security policy file not detected
License	🟢 10	license file detected
Fuzzing	⚠️ 0	project is not fuzzed
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ 0	branch protection not enabled on development/release branches
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

pip/wcwidth

0.2.14

🟢 5

Details

Check	Score	Reason
Maintained	🟢 10	12 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 10
Code-Review	⚠️ 0	Found 0/27 approved changesets -- score normalized to 0
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Packaging	⚠️ -1	packaging workflow not detected
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Security-Policy	⚠️ 0	security policy file not detected
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 9	license file detected
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
Signed-Releases	⚠️ -1	no releases found
SAST	🟢 10	SAST tool is run on all commits

pip/websockets

15.0.1

Unknown

pip/zipp

3.23.0

Unknown

Scanned Files

uv.lock

Copilot

Pull request overview

Implements Phase 1 (Sprints 1–4) of the audio preprocessing pipeline as a set of pure-Python modules under src/audio_processor/preprocessing/, plus a new AudioLoadError exception and a scipy dependency for SNR estimation. All new logic is exercised by 15 mocked / synthetic-fixture unit tests.

Changes:

Adds preprocessing modules: loader.py (load/mono/resample to 16 kHz), ffmpeg.py (subprocess wrapper validating the binary at import time), vad.py (lazy thread-safe Silero VAD via torch.hub), and quality.py (Butterworth-based SNR + peak-clipping detector).
Introduces AudioLoadError extending the existing ProjectBaseError hierarchy.
Adds scipy>=1.11.0 to the [audio] extra and corresponding uv.lock entries; adds unit tests under tests/unit/preprocessing/.

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 3 comments.

File	Description
`src/audio_processor/exceptions.py`	New top-level `AudioLoadError` rooted in `ProjectBaseError`.
`src/audio_processor/preprocessing/__init__.py`	Sub-package marker / docstring.
`src/audio_processor/preprocessing/loader.py`	Loads WAV/MP3/FLAC/OGG, mono-mixes, resamples to 16 kHz.
`src/audio_processor/preprocessing/ffmpeg.py`	Argv-list ffmpeg wrapper; resolves binary at import time.
`src/audio_processor/preprocessing/vad.py`	Lazy, thread-safe Silero VAD wrapper returning seconds.
`src/audio_processor/preprocessing/quality.py`	SNR via Butterworth low-pass + peak-magnitude clipping check.
`tests/unit/preprocessing/test_loader.py`	Resample/mono, passthrough, unsupported suffix, corrupt file.
`tests/unit/preprocessing/test_ffmpeg.py`	Asserts argv-list invocation and non-zero-exit error path.
`tests/unit/preprocessing/test_vad.py`	Mocks `torch.hub.load`; verifies seconds conversion / empty case.
`tests/unit/preprocessing/test_quality.py`	Clean/silent/empty/noisy SNR + clipping threshold cases.
`tests/unit/preprocessing/__init__.py`	Package marker for new tests.
`pyproject.toml`	Adds `scipy>=1.11.0` to `[audio]` extra.
`uv.lock`	Locks scipy 1.15.3 / 1.16.3 (per Python version) and a `hypothesis` sdist entry.

Adds the pure-Python audio preprocessing pipeline that runs ahead of Deepgram transcription. Covers loader/format normalization, an FFmpeg wrapper, Silero VAD, and SNR/clipping quality checks. No live API calls required.

Modules:
- src/audio_processor/exceptions.py: AudioLoadError (extends ProjectBaseError).
- src/audio_processor/preprocessing/loader.py: load_audio() reads WAV/MP3/FLAC/OGG via soundfile (librosa for MP3), averages to mono, resamples to 16 kHz.
- src/audio_processor/preprocessing/ffmpeg.py: convert_to_wav() shells out to ffmpeg with an argv list (shell=False) for safe format conversion; validates ffmpeg on PATH at import.
- src/audio_processor/preprocessing/vad.py: detect_speech_segments() loads Silero VAD via torch.hub.load (cached, thread-safe) and returns (start_s, end_s) tuples.
- src/audio_processor/preprocessing/quality.py: check_snr() (Butterworth envelope via scipy.signal) and check_clipping().

Tests: 15 unit tests in tests/unit/preprocessing/, all mocked (torch.hub.load, subprocess.run) with synthetic numpy fixtures.

Deps: adds scipy>=1.11.0 to the [audio] extra for SNR filtering.

…R docs

Addresses Copilot review and the Python Compatibility Matrix failures (import-time `OSError` blocked pytest collection on Win/macOS runners where ffmpeg is not preinstalled).

- ffmpeg.py: resolve ffmpeg on PATH at import (cached) but defer the raise to first call of `convert_to_wav`, so importing the module is side-effect free. Switches the raised type from stdlib `RuntimeError` to a new project-hierarchy `FfmpegConversionError` for both the missing-binary and non-zero-exit paths, including stderr/exit_code in the exception details.
- exceptions.py: add `FfmpegConversionError(ProjectBaseError)`.
- quality.py: docstring/comment said "Nyquist quarter" but `Wn=0.5` is half of Nyquist (one quarter of the sample rate). Corrected both.
- tests: add coverage for the missing-binary path; update the non-zero exit test to assert the new exception type and details.
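
The `Wn` correction in this commit is a normalization question: when no `fs` argument is given, `scipy.signal.butter` reads `Wn` for a digital filter as a fraction of the Nyquist frequency (half the sample rate). A minimal arithmetic sketch follows; the helper name `cutoff_hz` is hypothetical, and the 16 kHz rate is assumed from the loader's resample target.

```python
def cutoff_hz(wn: float, sample_rate: int) -> float:
    """Convert a Nyquist-normalized cutoff (0 < wn < 1) to hertz."""
    if not 0.0 < wn < 1.0:
        raise ValueError("wn must lie strictly between 0 and 1")
    nyquist = sample_rate / 2  # highest representable frequency
    return wn * nyquist


sample_rate = 16_000  # assumed: the loader resamples to 16 kHz
cutoff = cutoff_hz(0.5, sample_rate)
print(cutoff)                # 4000.0 Hz, i.e. half of Nyquist (8000 Hz)
print(cutoff / sample_rate)  # 0.25, i.e. one quarter of the sample rate
```

So "half of Nyquist" and "one quarter of the sample rate" describe the same cutoff, while "Nyquist quarter" would have implied 2 kHz.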

Two convert_to_wav tests mocked subprocess.run but not _FFMPEG_PATH. After the deferred-ffmpeg-check change, the new guard fires before subprocess.run is reached when ffmpeg is not on PATH (which is the case on most CI runners and dev boxes that haven't apt-installed it). Add patch.object(ffmpeg_module, "_FFMPEG_PATH", "/usr/bin/ffmpeg") to both tests so the guard passes and the mocked subprocess.run gets exercised.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
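
The interaction described in this commit can be sketched with the standard library alone. Everything below is a simplified stand-in, not the PR's code: `ffmpeg_module` is a `SimpleNamespace` playing the role of the real module, the local `FfmpegConversionError` is a placeholder class, and `argv_seen_by_mock` is a hypothetical helper.

```python
import subprocess
import types
from unittest.mock import MagicMock, patch


class FfmpegConversionError(Exception):
    """Local placeholder for the project's exception of the same name."""


# Stand-in module state: None models a runner where ffmpeg is not on PATH.
ffmpeg_module = types.SimpleNamespace(_FFMPEG_PATH=None, subprocess=subprocess)


def convert_to_wav(input_path: str, output_path: str) -> str:
    """Guard first, then run; a reduced version of the flow in the PR."""
    if ffmpeg_module._FFMPEG_PATH is None:
        raise FfmpegConversionError("ffmpeg not found on PATH")
    cmd = [ffmpeg_module._FFMPEG_PATH, "-y", "-i", input_path, output_path]
    result = ffmpeg_module.subprocess.run(
        cmd, capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        raise FfmpegConversionError(f"ffmpeg failed (exit {result.returncode})")
    return output_path


def argv_seen_by_mock() -> list:
    """Patch the cached path and subprocess.run, then return the argv used."""
    fake_result = MagicMock(returncode=0, stderr="")
    with patch.object(ffmpeg_module, "_FFMPEG_PATH", "/usr/bin/ffmpeg"):
        with patch.object(
            ffmpeg_module.subprocess, "run", return_value=fake_result
        ) as run:
            convert_to_wav("in.mp3", "out.wav")
    return run.call_args.args[0]
```

Without the first `patch.object`, the guard raises before the mocked `subprocess.run` is ever reached, which is the failure mode the commit describes.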

socket-security · 2026-05-19T04:49:36Z

Review the following changes in direct dependencies.

Diff	Package	Supply Chain Security
	scipy@1.15.3
	scipy@1.16.3
	hypothesis@6.152.7 ⏵ 6.148.7	⁺¹
	torch@2.12.0 ⏵ 2.9.1	⁺¹
	numpy@2.4.5 ⏵ 2.3.5
	mkdocs-material@9.7.6 ⏵ 9.7.0
	cyclonedx-bom@7.3.0 ⏵ 7.2.1
	torchvision@0.27.0 ⏵ 0.24.1	⁺⁵
	pre-commit@4.6.0 ⏵ 4.5.0
	google-auth@2.53.0 ⏵ 2.43.0	⁺⁶
	click@8.4.0 ⏵ 8.3.1
	bandit@1.9.4 ⏵ 1.9.2
	nox@2026.4.10 ⏵ 2025.11.12
	ipykernel@7.2.0 ⏵ 7.1.0
	uvicorn@0.47.0 ⏵ 0.38.0	⁺¹
	sentry-sdk@2.60.0 ⏵ 2.47.0	⁺⁴
	rich@15.0.0 ⏵ 14.2.0
	docling-core@2.75.0 ⏵ 2.54.0	⁺¹
	mutmut@3.5.0 ⏵ 3.4.0	⁺¹
	arq@0.28.0 ⏵ 0.25.0	^-1
	redis@5.3.1 ⏵ 7.1.0
	basedpyright@1.39.5 ⏵ 1.35.0
	mkdocs-gen-files@0.6.1 ⏵ 0.6.0
	pip-licenses@5.5.5 ⏵ 5.5.0
	rq@2.8.0 ⏵ 2.6.1
	google-api-core@2.30.3 ⏵ 2.28.1	⁺¹
	pydantic@2.13.4 ⏵ 2.12.5
	python-frontmatter@1.2.0 ⏵ 1.1.0	⁺²
	ruff@0.15.13 ⏵ 0.14.8
	pydantic-settings@2.14.1 ⏵ 2.12.0
	pytest-cov@7.1.0 ⏵ 7.0.0	⁺¹
	ruamel-yaml@0.19.1 ⏵ 0.18.16	⁺¹
See 9 more rows in the dashboard

github-actions · 2026-05-19T04:51:08Z

✅ Mutation Testing Results

Metric	Value
Mutation Score	100.0%
Threshold	80%
Status	Passed

What is Mutation Testing?

Mutation testing introduces small changes (mutations) to your code and checks if your tests detect them. A high mutation score indicates your tests are effective at catching bugs.

Killed mutants: Tests detected the change
Survived mutants: Tests did not detect the change (potential gap)
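
As a small illustration of a killed mutant, consider a peak-magnitude clipping check like the one this PR adds. This is a sketch over plain Python lists; whether the real `check_clipping` compares with `>=` or `>` is an assumption made here for the example, and `boundary_test` is a hypothetical helper.

```python
def check_clipping(samples, threshold=0.99):
    """Return True when any sample's magnitude reaches the threshold."""
    return any(abs(s) >= threshold for s in samples)


def check_clipping_mutant(samples, threshold=0.99):
    """Same logic with `>=` mutated to `>`, as a mutation tool might do."""
    return any(abs(s) > threshold for s in samples)


def boundary_test(fn) -> bool:
    """A test that pins the behavior exactly at the threshold."""
    return fn([0.0, 0.99]) is True and fn([0.0, 0.5]) is False


print(boundary_test(check_clipping))         # True: the original passes
print(boundary_test(check_clipping_mutant))  # False: the mutant is killed
```

A test suite that only used samples well above or below the threshold would pass for both versions, and the mutant would survive.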

Targeted single-package upgrade (uv lock --upgrade-package urllib3) to patch: - GHSA-38jv-5279-wg99 (HIGH) — decompression-bomb safeguards bypassed when following HTTP redirects in the streaming API - GHSA-qccp-gfcp-xxvc (HIGH) — sensitive headers forwarded across origins in proxied low-level redirects - GHSA-mf9v-mfxr-j63j (HIGH) — decompression-bomb safeguards bypassed in parts of the streaming API PR #27 was blocked on these via Dependency Review, OSV Vulnerability Scanner, SBOM Runtime Scan, and Security Gate Validation. The original author tried `uv lock --upgrade` (wholesale) twice and reverted both times because of collateral damage; this targeted upgrade only touches urllib3 and matches the precise CVE remediation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/unit/preprocessing/test_loader.py (1)
57-77: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add a boundary test for “path outside upload directory” rejection.

Current tests validate suffix/corruption, but not the required trust-boundary path check. Add a case that passes a resolved path outside configured upload root and asserts a typed failure.

As per coding guidelines, “Test for edge cases … and file system edge cases.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/preprocessing/test_loader.py` around lines 57 - 77, Add a unit
test that verifies load_audio rejects paths outside the configured upload root
by constructing a Path that resolves to a location outside the expected upload
directory and asserting it raises AudioLoadError; locate the test file
tests/unit/preprocessing/test_loader.py and add a new pytest test (similar style
to test_load_audio_rejects_unsupported_suffix and
test_load_audio_raises_on_corrupt_file) that calls load_audio(...) with a
resolved outside-path and checks for AudioLoadError to ensure the trust-boundary
path check is enforced.
tests/unit/preprocessing/test_vad.py (1)
21-69: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add invalid-input tests for detect_speech_segments.

Please cover sample_rate=0/negative and non-1D audio inputs, asserting clear typed exceptions. This will lock in the validation contract and prevent regressions.
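
A minimal sketch of the kind of guard such tests would lock in, using only the standard library. It assumes the VAD returns a list of dicts with `start` and `end` sample indices; `validate_sample_rate` and `to_seconds` are hypothetical names, and the non-1D audio check is omitted because it needs numpy.

```python
def validate_sample_rate(sample_rate) -> None:
    """Reject non-integer and non-positive sample rates before any division."""
    if isinstance(sample_rate, bool) or not isinstance(sample_rate, int):
        raise TypeError(f"sample_rate must be an int, got {type(sample_rate).__name__}")
    if sample_rate <= 0:
        raise ValueError(f"sample_rate must be positive, got {sample_rate}")


def to_seconds(timestamps, sample_rate):
    """Convert sample-index segments to (start_s, end_s) tuples."""
    validate_sample_rate(sample_rate)
    return [
        (ts["start"] / sample_rate, ts["end"] / sample_rate)
        for ts in timestamps
    ]


print(to_seconds([{"start": 16000, "end": 32000}], 16000))  # [(1.0, 2.0)]
```

Validating first means a zero sample rate surfaces as a clear `ValueError` rather than a `ZeroDivisionError` from the conversion step.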
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/preprocessing/test_vad.py` around lines 21 - 69, Add unit tests
for detect_speech_segments to assert it raises clear typed exceptions on invalid
inputs: (1) call detect_speech_segments(audio, sample_rate=0) and
detect_speech_segments(audio, sample_rate=-16000) with a 1-D numpy audio array
and assert it raises ValueError (or the library's documented exception) for
invalid sample_rate; (2) call detect_speech_segments with non-1D audio (e.g., a
2-D numpy array) and assert it raises TypeError (or the documented exception)
for invalid audio shape. Add these tests into
tests/unit/preprocessing/test_vad.py near the existing tests and use the same
patching of vad_module.torch.hub.load (returning a fake utils tuple) so the
error is triggered by input validation before model calls.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/audio_processor/preprocessing/ffmpeg.py`:
- Around line 88-93: Add a timeout to the subprocess.run call that invokes
ffmpeg (the call using cmd) and catch subprocess.TimeoutExpired; on timeout
raise FfmpegConversionError with a descriptive message and details including
"input_path" and the timeout value, and chain the original exception; ensure
timeout is configurable or a reasonable fixed value and do not remove existing
capture_output/text/check behavior.
- Around line 50-110: Resolve and validate paths at the start of convert_to_wav:
call .resolve() on input_path and output_path, ensure _FFMPEG_PATH is checked as
before, verify resolved input_path.exists() and .is_file() and raise
FfmpegConversionError with a clear message if not, ensure resolved
output_path.parent.exists() (create or raise as per project policy), and
validate both resolved paths lie within the configured upload/working directory
(e.g. UPLOAD_DIR or WORKING_DIR) to prevent traversal; update the error details
to include resolved paths when raising FfmpegConversionError.

In `@src/audio_processor/preprocessing/loader.py`:
- Around line 27-69: Ensure load_audio and _read_samples canonicalize and
validate input paths before any I/O: call Path.resolve() on the incoming path
and verify the resolved path is contained within the configured upload root
(reject otherwise) before calling _read_samples or performing reads; do not
include the raw resolved path in exception details or log messages—use a
sanitized identifier (e.g., relative path under the upload root or a
redacted/hashed token) when constructing AudioLoadError details and error
messages so no user-derived absolute paths are leaked.
- Around line 16-17: The module currently imports and raises AudioLoadError from
audio_processor.exceptions; replace that with imports from
src.audio_processor.core.exceptions and raise the appropriate centralized types
instead: import ValidationError, ResourceNotFoundError and ExternalServiceError
and update all raises in the loader functions (e.g., the loader module's
functions that currently raise AudioLoadError at the import site and at the
error sites around lines indicated) so that file-not-found or missing resource
errors raise ResourceNotFoundError, invalid/unsupported audio or parameter
validation issues raise ValidationError, and failures coming from external
libraries or IO should raise ExternalServiceError; update the import statement
to reference these three exception classes and change each raise to the correct
class with the original error message preserved.

In `@src/audio_processor/preprocessing/quality.py`:
- Around line 40-53: The check_snr function calls scipy_signal.sosfiltfilt (via
sosfiltfilt) without guarding very short arrays, which raises ValueError; before
creating sos or calling scipy_signal.sosfiltfilt, check len(samples) against a
small threshold (e.g. 16) or compute the required padlen and compare to
samples.size, and if the input is too short, use a safe fallback (for example
compute envelope with a simple moving-average convolution or skip filtering and
treat residual as zero / return float("-inf") for unreliable SNR) so that
envelope and noise are always defined; update the branch around sos, envelope
and noise in check_snr to use this guarded fallback path.

In `@src/audio_processor/preprocessing/vad.py`:
- Around line 58-87: In detect_speech_segments validate inputs before calling
_load_silero_vad: check that sample_rate is an int > 0 and one of Silero's
supported rates (e.g., 8000 or 16000), check that audio is a 1-D numpy array
(audio.ndim == 1) and that its length > 0; if any check fails raise a typed
validation error (ValueError) with a clear message mentioning the offending
parameter; perform these guards at the top of detect_speech_segments so
subsequent calls to _load_silero_vad, get_speech_timestamps and the final
division are safe.

In `@tests/unit/preprocessing/test_ffmpeg.py`:
- Around line 19-84: Add unit tests to cover the missing edge cases for
convert_to_wav: add tests that call ffmpeg_module.convert_to_wav with a
non-existent input Path (assert it raises FfmpegConversionError or
FileNotFoundError), with an output whose parent directory does not exist (assert
proper error is raised), with filenames containing spaces/unicode/quotes (assert
subprocess.run is invoked with the exact paths), with an empty zero-byte input
file (assert ffmpeg error handling attaches stderr/exit code), and a test that
simulates subprocess.TimeoutExpired by patching ffmpeg_module.subprocess.run to
raise TimeoutExpired and asserting convert_to_wav surfaces a
FfmpegConversionError or timeout-specific behavior; reference ffmpeg_module,
convert_to_wav, FfmpegConversionError and subprocess.run in your new tests.

In `@tests/unit/preprocessing/test_quality.py`:
- Around line 25-51: Add a regression unit test to
tests/unit/preprocessing/test_quality.py that exercises check_snr with a very
short non-empty array (e.g., length 2–4, values not all zero) to ensure it does
not raise from internal filtering/pad logic and returns a deterministic result;
create a fixed tiny input (like np.array([0.1, -0.1], dtype=np.float32)), call
check_snr(tiny_input) and assert the result is deterministic (e.g.,
math.isfinite(snr) and/or equals a specific expected float) in a new test
function named test_check_snr_short_non_empty_input_returns_deterministic_value.

---

Outside diff comments:
In `@tests/unit/preprocessing/test_loader.py`:
- Around line 57-77: Add a unit test that verifies load_audio rejects paths
outside the configured upload root by constructing a Path that resolves to a
location outside the expected upload directory and asserting it raises
AudioLoadError; locate the test file tests/unit/preprocessing/test_loader.py and
add a new pytest test (similar style to
test_load_audio_rejects_unsupported_suffix and
test_load_audio_raises_on_corrupt_file) that calls load_audio(...) with a
resolved outside-path and checks for AudioLoadError to ensure the trust-boundary
path check is enforced.

In `@tests/unit/preprocessing/test_vad.py`:
- Around line 21-69: Add unit tests for detect_speech_segments to assert it
raises clear typed exceptions on invalid inputs: (1) call
detect_speech_segments(audio, sample_rate=0) and detect_speech_segments(audio,
sample_rate=-16000) with a 1-D numpy audio array and assert it raises ValueError
(or the library's documented exception) for invalid sample_rate; (2) call
detect_speech_segments with non-1D audio (e.g., a 2-D numpy array) and assert it
raises TypeError (or the documented exception) for invalid audio shape. Add
these tests into tests/unit/preprocessing/test_vad.py near the existing tests
and use the same patching of vad_module.torch.hub.load (returning a fake utils
tuple) so the error is triggered by input validation before model calls.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e9cdc4a7-e337-40bf-9106-bf043a05cb37

📥 Commits

Reviewing files that changed from the base of the PR and between 1be65b9 and 79d6b5b.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (12)

pyproject.toml
src/audio_processor/exceptions.py
src/audio_processor/preprocessing/__init__.py
src/audio_processor/preprocessing/ffmpeg.py
src/audio_processor/preprocessing/loader.py
src/audio_processor/preprocessing/quality.py
src/audio_processor/preprocessing/vad.py
tests/unit/preprocessing/__init__.py
tests/unit/preprocessing/test_ffmpeg.py
tests/unit/preprocessing/test_loader.py
tests/unit/preprocessing/test_quality.py
tests/unit/preprocessing/test_vad.py

coderabbitai · 2026-05-19T13:21:56Z

+def convert_to_wav(input_path: Path, output_path: Path) -> Path:
+    """Convert an audio or video file to a WAV file using ffmpeg.
+
+    Invokes ffmpeg via ``subprocess.run`` with an argument list (no shell
+    interpolation) and overwrites ``output_path`` if it already exists.
+
+    Args:
+        input_path: Filesystem path to the source media file.
+        output_path: Filesystem path for the resulting WAV file. The parent
+            directory must already exist.
+
+    Returns:
+        The ``output_path`` argument, returned for call-chaining convenience.
+
+    Raises:
+        FfmpegConversionError: If the ``ffmpeg`` binary is not on ``PATH``
+            (resolved once at import; raised on first call so test suites
+            that mock ``subprocess`` can still import the module), or if
+            the ffmpeg invocation exits with a non-zero status. The
+            original ``stderr`` output is included in ``details`` for
+            diagnostics.
+    """
+    if _FFMPEG_PATH is None:
+        raise FfmpegConversionError(_FFMPEG_MISSING_MSG)
+
+    cmd: list[str] = [
+        _FFMPEG_PATH,
+        "-y",  # overwrite output without prompting
+        "-i",
+        str(input_path),
+        "-vn",  # drop any video stream
+        "-acodec",
+        "pcm_s16le",  # standard 16-bit PCM WAV
+        str(output_path),
+    ]
+
+    # `check=False` so we can surface ffmpeg's stderr verbatim; `shell=False`
+    # (the default) is the security-critical guarantee, never pass a string.
+    result = subprocess.run(  # noqa: S603 - argv list, shell=False
+        cmd,
+        capture_output=True,
+        text=True,
+        check=False,
+    )
+    if result.returncode != 0:
+        stderr = result.stderr.strip() or "<no stderr output>"
+        msg = (
+            f"ffmpeg failed (exit {result.returncode}) converting "
+            f"{input_path} -> {output_path}"
+        )
+        raise FfmpegConversionError(
+            msg,
+            details={
+                "input_path": str(input_path),
+                "output_path": str(output_path),
+                "exit_code": result.returncode,
+                "stderr": stderr,
+            },
+        )
+
+    return output_path


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Add path resolution and validation to prevent path traversal attacks.

The function accepts arbitrary Path objects and passes them directly to ffmpeg without validation. As per coding guidelines: "all file I/O involving audio assets must use pathlib.Path.resolve() and validate against the configured upload directory before proceeding."

Consider:

Resolve both paths to absolute paths using .resolve() to prevent .. traversal

Validate that input_path exists and is a file (not a directory or device)

Validate that output_path.parent exists before invoking ffmpeg

If there's a configured upload/working directory, validate that resolved paths are within those boundaries

🛡️ Example validation pattern

 def convert_to_wav(input_path: Path, output_path: Path) -> Path:
     """Convert an audio or video file to a WAV file using ffmpeg.

     Invokes ffmpeg via ``subprocess.run`` with an argument list (no shell
     interpolation) and overwrites ``output_path`` if it already exists.

     Args:
         input_path: Filesystem path to the source media file.
         output_path: Filesystem path for the resulting WAV file. The parent
             directory must already exist.

     Returns:
         The ``output_path`` argument, returned for call-chaining convenience.

     Raises:
         FfmpegConversionError: If the ``ffmpeg`` binary is not on ``PATH``
             (resolved once at import; raised on first call so test suites
             that mock ``subprocess`` can still import the module), or if
             the ffmpeg invocation exits with a non-zero status. The
             original ``stderr`` output is included in ``details`` for
             diagnostics.
+            Also raised if ``input_path`` is not an existing file or the
+            parent directory of ``output_path`` does not exist.
     """
     if _FFMPEG_PATH is None:
         raise FfmpegConversionError(_FFMPEG_MISSING_MSG)

+    # Resolve paths to prevent traversal attacks
+    input_path = input_path.resolve()
+    output_path = output_path.resolve()
+
+    # Validate input exists and is a file
+    if not input_path.is_file():
+        msg = f"Input path does not exist or is not a file: {input_path}"
+        raise FfmpegConversionError(msg, details={"input_path": str(input_path)})
+
+    # Validate output parent directory exists
+    if not output_path.parent.exists():
+        msg = f"Output directory does not exist: {output_path.parent}"
+        raise FfmpegConversionError(
+            msg,
+            details={"output_parent": str(output_path.parent)}
+        )

     cmd: list[str] = [
         _FFMPEG_PATH,
         "-y",  # overwrite output without prompting
         "-i",
         str(input_path),
         "-vn",  # drop any video stream
         "-acodec",
         "pcm_s16le",  # standard 16-bit PCM WAV
         str(output_path),
     ]

As per coding guidelines, audio file handling must validate paths before proceeding.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/audio_processor/preprocessing/ffmpeg.py` around lines 50 - 110, Resolve and validate paths at the start of convert_to_wav: call .resolve() on input_path and output_path, ensure _FFMPEG_PATH is checked as before, verify resolved input_path.exists() and .is_file() and raise FfmpegConversionError with a clear message if not, ensure resolved output_path.parent.exists() (create or raise as per project policy), and validate both resolved paths lie within the configured upload/working directory (e.g. UPLOAD_DIR or WORKING_DIR) to prevent traversal; update the error details to include resolved paths when raising FfmpegConversionError.
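The example above resolves both paths but does not show the containment check from the third bullet. A minimal sketch of that check follows; `ensure_within` and the `ValueError` it raises are hypothetical names for illustration, and the project would presumably raise `FfmpegConversionError` and read the root from its own settings instead.

```python
from pathlib import Path


def ensure_within(root: Path, candidate: Path) -> Path:
    """Return the resolved candidate, or raise if it escapes root.

    Hypothetical helper: the name and the ValueError are assumptions,
    not part of the PR.
    """
    resolved_root = root.resolve()
    resolved = candidate.resolve()
    # is_relative_to (Python 3.9+) compares whole path components, so a
    # sibling such as "uploads-evil" is not mistaken for "uploads".
    if not resolved.is_relative_to(resolved_root):
        raise ValueError("path is outside the allowed directory")
    return resolved
```

Comparing path components rather than string prefixes matters here: a plain `str.startswith` check would accept a sibling directory whose name merely begins with the root's name.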

coderabbitai · 2026-05-19T13:21:56Z

+    result = subprocess.run(  # noqa: S603 - argv list, shell=False
+        cmd,
+        capture_output=True,
+        text=True,
+        check=False,
+    )


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Consider adding timeout parameter to subprocess.run.

The subprocess.run call has no timeout, so a stalled ffmpeg process or a malicious input file could block the caller indefinitely. As per coding guidelines for external calls: blocking calls without timeouts on request threads can cause cascading failures.

⏱️ Add timeout parameter

     # `check=False` so we can surface ffmpeg's stderr verbatim; `shell=False`
     # (the default) is the security-critical guarantee, never pass a string.
     result = subprocess.run(  # noqa: S603 - argv list, shell=False
         cmd,
         capture_output=True,
         text=True,
         check=False,
+        timeout=300,  # 5 minutes max for conversion
     )

Don't forget to handle subprocess.TimeoutExpired and wrap it in FfmpegConversionError:

try:
    result = subprocess.run(...)
except subprocess.TimeoutExpired as exc:
    msg = f"ffmpeg conversion timed out after {exc.timeout}s: {input_path}"
    raise FfmpegConversionError(
        msg,
        details={
            "input_path": str(input_path),
            "timeout": exc.timeout,
        },
    ) from exc

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/audio_processor/preprocessing/ffmpeg.py` around lines 88 - 93, Add a timeout to the subprocess.run call that invokes ffmpeg (the call using cmd) and catch subprocess.TimeoutExpired; on timeout raise FfmpegConversionError with a descriptive message and details including "input_path" and the timeout value, and chain the original exception; ensure timeout is configurable or a reasonable fixed value and do not remove existing capture_output/text/check behavior.
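For reference, the timeout-and-wrap pattern can be exercised without ffmpeg installed. The sketch below is an illustration only: `ConversionTimeoutError` and `run_with_timeout` are hypothetical stand-ins (the PR would raise `FfmpegConversionError` from inside `convert_to_wav`), and the slow command is simply a Python child process that sleeps.

```python
import subprocess
import sys


class ConversionTimeoutError(RuntimeError):
    """Hypothetical stand-in for the project's FfmpegConversionError."""


def run_with_timeout(cmd: list[str], timeout_s: float) -> subprocess.CompletedProcess:
    """Run an argv list (never a shell string) and convert a timeout into a domain error."""
    try:
        return subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            check=False,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising, so no process is left behind.
        msg = f"command timed out after {exc.timeout}s"
        raise ConversionTimeoutError(msg) from exc


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    try:
        run_with_timeout(slow, timeout_s=0.2)
    except ConversionTimeoutError as err:
        print(f"caught: {err}")
```

Making `timeout_s` a parameter rather than a literal keeps the value configurable, which the prompt below also asks for.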

coderabbitai · 2026-05-19T13:21:56Z

+from audio_processor.exceptions import AudioLoadError
+


🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Align raised exceptions with the centralized core exception hierarchy.

This module currently raises AudioLoadError from audio_processor.exceptions, but project rules require exceptions from src/audio_processor/core/exceptions.py (e.g., ValidationError, ResourceNotFoundError, ExternalServiceError) for consistency and cross-layer handling.

As per coding guidelines, “Use centralized exception hierarchy from src/audio_processor/core/exceptions.py with specific exception types.”

Also applies to: 51-52, 66-67

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/audio_processor/preprocessing/loader.py` around lines 16 - 17, The module currently imports and raises AudioLoadError from audio_processor.exceptions; replace that with imports from src.audio_processor.core.exceptions and raise the appropriate centralized types instead: import ValidationError, ResourceNotFoundError and ExternalServiceError and update all raises in the loader functions (e.g., the loader module's functions that currently raise AudioLoadError at the import site and at the error sites around lines indicated) so that file-not-found or missing resource errors raise ResourceNotFoundError, invalid/unsupported audio or parameter validation issues raise ValidationError, and failures coming from external libraries or IO should raise ExternalServiceError; update the import statement to reference these three exception classes and change each raise to the correct class with the original error message preserved.
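One way to picture the mapping described above is a small translation function. This is a sketch under assumptions: the three classes below are local placeholders for the types in `src/audio_processor/core/exceptions.py`, whose real constructors are not shown in this review, and `to_core_exception` is a hypothetical helper name.

```python
class ValidationError(Exception):
    """Placeholder for the core ValidationError (real signature not shown in the PR)."""


class ResourceNotFoundError(Exception):
    """Placeholder for the core ResourceNotFoundError."""


class ExternalServiceError(Exception):
    """Placeholder for the core ExternalServiceError."""


def to_core_exception(exc: Exception) -> Exception:
    """Translate a low-level loader failure into a centralized type, keeping the message."""
    message = str(exc)
    # FileNotFoundError subclasses OSError, so it must be tested before the
    # generic fall-through that covers other I/O and decoder failures.
    if isinstance(exc, FileNotFoundError):
        return ResourceNotFoundError(message)
    if isinstance(exc, ValueError):
        return ValidationError(message)
    return ExternalServiceError(message)
```

Inside `load_audio` this would be used as `raise to_core_exception(exc) from exc`, so the original traceback stays attached.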

coderabbitai · 2026-05-19T13:21:56Z

+def load_audio(path: Path) -> tuple[np.ndarray, int]:
+    """Load an audio file, convert to mono, and resample to 16 kHz.
+
+    Reads the file with ``soundfile`` for efficient PCM decoding and falls
+    back to ``librosa`` for compressed formats (e.g. MP3) that soundfile
+    cannot always decode. Multi-channel input is averaged to mono.
+    Resampling uses librosa's polyphase resampler.
+
+    Args:
+        path: Filesystem path to the audio file. The suffix must be one of
+            ``.wav``, ``.mp3``, ``.flac``, or ``.ogg``.
+
+    Returns:
+        Tuple ``(audio, sample_rate)`` where ``audio`` is a 1-D ``float32``
+        numpy array of mono samples and ``sample_rate`` is always
+        ``TARGET_SAMPLE_RATE`` (16_000).
+
+    Raises:
+        AudioLoadError: If the file suffix is unsupported, the file is
+            missing, or the underlying decoder fails on a corrupt file.
+    """
+    suffix = path.suffix.lower()
+    if suffix not in SUPPORTED_SUFFIXES:
+        msg = f"Unsupported audio format: {suffix!r}"
+        raise AudioLoadError(
+            msg,
+            details={
+                "path": str(path),
+                "suffix": suffix,
+                "supported": ", ".join(sorted(SUPPORTED_SUFFIXES)),
+            },
+        )
+
+    try:
+        audio, sample_rate = _read_samples(path, suffix)
+    except AudioLoadError:
+        raise
+    except (OSError, ValueError, RuntimeError) as exc:
+        msg = f"Failed to load audio file: {path}"
+        raise AudioLoadError(
+            msg,
+            details={"path": str(path), "reason": str(exc)},
+        ) from exc


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Enforce trusted-path resolution and boundary validation before any audio I/O.

load_audio and _read_samples read and report path directly. This misses the required trust-boundary check (Path.resolve() + upload-root containment) and leaks raw user-derived paths in exception context.

🔧 Suggested hardening sketch

 def load_audio(path: Path) -> tuple[np.ndarray, int]:
+    resolved = path.resolve(strict=False)
+    # Validate resolved is under configured upload root before any I/O.
+    # Example:
+    # upload_root = settings.upload_dir.resolve(strict=True)
+    # if upload_root not in resolved.parents and resolved != upload_root:
+    #     raise AudioLoadError("Audio path is outside allowed directory", details={...})
+
-    suffix = path.suffix.lower()
+    suffix = resolved.suffix.lower()
@@
-        audio, sample_rate = _read_samples(path, suffix)
+        audio, sample_rate = _read_samples(resolved, suffix)
@@
-        msg = f"Failed to load audio file: {path}"
+        msg = "Failed to load audio file"
         raise AudioLoadError(
             msg,
-            details={"path": str(path), "reason": str(exc)},
+            details={"reason": str(exc)},
         ) from exc

As per coding guidelines, “all file I/O involving audio assets must use pathlib.Path.resolve() and validate against the configured upload directory before proceeding” and “agents must never write or log raw audio file paths derived from user input.”

Also applies to: 92-100

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/audio_processor/preprocessing/loader.py` around lines 27 - 69, Ensure load_audio and _read_samples canonicalize and validate input paths before any I/O: call Path.resolve() on the incoming path and verify the resolved path is contained within the configured upload root (reject otherwise) before calling _read_samples or performing reads; do not include the raw resolved path in exception details or log messages—use a sanitized identifier (e.g., relative path under the upload root or a redacted/hashed token) when constructing AudioLoadError details and error messages so no user-derived absolute paths are leaked.
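The "sanitized identifier" mentioned in the prompt could take roughly the following shape. This is only a sketch: `redact_path` is a hypothetical helper, the 12-character digest length is an arbitrary choice, and the upload root would come from the project's own configuration.

```python
import hashlib
from pathlib import Path


def redact_path(path: Path, upload_root: Path) -> str:
    """Return an identifier that is safe to put in logs and exception details.

    Paths under the upload root are reported relative to it; anything else
    is reduced to a short hash so no absolute, user-derived path is exposed.
    """
    resolved = path.resolve()
    root = upload_root.resolve()
    if resolved.is_relative_to(root):
        return resolved.relative_to(root).as_posix()
    digest = hashlib.sha256(str(resolved).encode("utf-8")).hexdigest()[:12]
    return f"<redacted:{digest}>"
```

The hash still lets two log lines about the same file be correlated without revealing where that file lives.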

coderabbitai · 2026-05-19T13:21:56Z

+    if audio.size == 0:
+        return float("-inf")
+
+    samples = np.asarray(audio, dtype=np.float64)
+    signal_power = float(np.mean(samples**2))
+    if signal_power <= _NOISE_FLOOR:
+        return float("-inf")
+
+    # 4th-order Butterworth low-pass at Wn=0.5 (half of Nyquist, one
+    # quarter of the sample rate) approximates the slow-varying signal
+    # envelope; the residual captures broadband noise.
+    sos = scipy_signal.butter(N=4, Wn=0.5, btype="low", output="sos")
+    envelope = scipy_signal.sosfiltfilt(sos, samples)
+    noise = samples - envelope


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, let's read the full context of the quality.py file
cat -n src/audio_processor/preprocessing/quality.py

Repository: ByronWilliamsCPA/audio-processor

Length of output: 3441

🏁 Script executed:

# Check the scipy.signal.sosfiltfilt documentation and test its behavior with short arrays
python << 'EOF'
import numpy as np
import scipy.signal as scipy_signal

# Test sosfiltfilt with different array lengths
sos = scipy_signal.butter(N=4, Wn=0.5, btype="low", output="sos")
test_lengths = [1, 2, 3, 4, 5, 10, 20, 100]

for length in test_lengths:
    samples = np.random.randn(length).astype(np.float64)
    try:
        result = scipy_signal.sosfiltfilt(sos, samples)
        print(f"Length {length}: SUCCESS")
    except ValueError as e:
        print(f"Length {length}: ValueError - {e}")
    except Exception as e:
        print(f"Length {length}: {type(e).__name__} - {e}")
EOF

Repository: ByronWilliamsCPA/audio-processor

Length of output: 194

🏁 Script executed:

# Check if there are any tests for the check_snr function with short arrays
find . -name "*test*.py" -type f | head -20 | xargs grep -l "check_snr\|quality" 2>/dev/null || echo "No matching test files found"

Repository: ByronWilliamsCPA/audio-processor

Length of output: 119

🏁 Script executed:

# Also check for any error handling around sosfiltfilt in the codebase
rg "sosfiltfilt" -B 5 -A 5

Repository: ByronWilliamsCPA/audio-processor

Length of output: 1030

🏁 Script executed:

# Read the test file to see if there are tests for short arrays
cat -n ./tests/unit/preprocessing/test_quality.py

Repository: ByronWilliamsCPA/audio-processor

Length of output: 2773

🌐 Web query:

scipy.signal.sosfiltfilt minimum length padlen requirements ValueError short arrays

💡 Result:

When using scipy.signal.sosfiltfilt, the function requires that the length of the input signal along the filtering axis be greater than the padlen value [1][2][3]. Specifically, the documentation states that padlen must be less than x.shape[axis] - 1 [1][2]. If this condition is not met, a ValueError is raised [3]. The default padlen is calculated based on the filter coefficients (the number of second-order sections) [1][2].

If you encounter this error with a signal that you believe is sufficiently long, it is often because the function is attempting to filter along the wrong axis [4][3]. By default, sosfiltfilt operates on the last axis (axis=-1) [1][2]. If your input array is multidimensional (e.g., shape (N, 1)), the function may be trying to filter along the dimension with length 1, which is almost certainly shorter than the default padlen [4][3].

To resolve this, you can:

1. Explicitly specify the correct axis (e.g., axis=0) in the function call [4][3].
2. Ensure your input signal is a one-dimensional array (e.g., using np.ravel or indexing) [4][3].
3. Manually set a smaller padlen if appropriate for your application, provided it remains consistent with the requirements of the filter [5].

Note that while the error message explicitly mentions padlen, the root cause is frequently a mismatch between the intended filtering dimension and the array's shape [4][3].

Citations:

1: https://docs.scipy.org/doc/scipy-1.5.3/reference/generated/scipy.signal.sosfiltfilt.html

2: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.sosfiltfilt.html

3: https://stackoverflow.com/questions/78316329/python-sosfiltfilt-vector-x-must-be-greater-than-padlen-in-reality-it-is-gre

4: https://stackoverflow.com/questions/46849290/error-with-padlen-in-signal-filtfilt-in-python

5: https://stackoverflow.com/questions/58374492/python-valueerror-the-length-of-the-input-vector-x-must-be-greater-than-padle

🌐 Web query:

scipy.signal.butter 4th order second order sections default padlen sosfiltfilt

💡 Result:

scipy.signal.butter(..., output='sos') and scipy.signal.sosfiltfilt: what happens with a "default padlen" and a 4th-order SOS filter

1) butter: "4th order" with SOS

- scipy.signal.butter(N, ..., output='sos') returns a second-order sections array shaped (n_sections, 6). [1]
- For bandpass/bandstop, SciPy states the resulting order for the final SOS matrix is 2*N. A simple low-pass or high-pass filter of order N is realized as ceil(N/2) second-order sections, so N=4 gives two sections. [1]

2) sosfiltfilt: default padlen is computed from the SOS structure

- scipy.signal.sosfiltfilt(sos, x, ..., padtype='odd', padlen=None) defaults to padtype='odd'. If padtype is None, no padding is used. [2]
- If padlen is None, SciPy computes a default padlen value as: 3 * (2 * len(sos) + 1 - min((sos[:, 2] == 0).sum(), (sos[:, 5] == 0).sum())) [2] (i.e., it depends on the number of SOS sections and whether there are zeros/poles at the origin for the implemented SOS representation). [2]
- Note: padlen=0 implies no padding. [2]

3) Key implication for the edge artifact you mentioned (flattening near the ends)

- If you set padtype=None in sosfiltfilt, you disable padding even though sosfiltfilt is a forward-backward filter that otherwise uses padding/extension to reduce edge transients. With no padding, the backward/forward initialization and signal extension assumptions can lead to boundary artifacts (often observed as flattening or other end effects). This is consistent with the fact that sosfiltfilt's API explicitly ties edge handling to padding via padtype/padlen. [2]

4) Practical fixes

- Prefer the default edge handling: do NOT set padtype=None; let padtype default to 'odd' and keep padlen=None so SciPy chooses the computed default padlen. [2]
- If artifacts persist, increase padlen (not padtype=None). sosfiltfilt requires padlen < x.shape[axis] - 1. [2]
- If you want "no delay" + better numerical robustness for higher orders, sosfiltfilt is generally the right approach; SciPy's filtfilt documentation notes that sosfiltfilt (and SOS form) should be preferred over filtfilt for most filtering tasks due to fewer numerical problems. [3]

5) Your specific code snippet vs defaults

- Your call sosfiltfilt(sos, data, padtype=None) forces "no padding" (padtype=None means no padding is used). [2]
- To use the default padlen behavior, use padtype='odd' (or omit it) and padlen=None (or omit it). Example: data_filtered = scipy.signal.sosfiltfilt(sos, data, padtype='odd', padlen=None) (Default padtype='odd' and padlen is computed when padlen=None.) [2]

Citations:

1: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.butter.html

2: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.sosfiltfilt.html

3: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.filtfilt.html

Guard sosfiltfilt for short inputs to prevent ValueError on arrays shorter than the default padlen.

check_snr handles empty and silent audio but does not validate array length before calling scipy_signal.sosfiltfilt. Very short but non-empty, non-silent arrays (fewer than ~16 samples, depending on default padlen for 4th-order Butterworth) will raise ValueError. Add a minimum-length check before filtering, or use an alternate SNR estimator for short clips.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/audio_processor/preprocessing/quality.py` around lines 40 - 53, The check_snr function calls scipy_signal.sosfiltfilt (via sosfiltfilt) without guarding very short arrays, which raises ValueError; before creating sos or calling scipy_signal.sosfiltfilt, check len(samples) against a small threshold (e.g. 16) or compute the required padlen and compare to samples.size, and if the input is too short, use a safe fallback (for example compute envelope with a simple moving-average convolution or skip filtering and treat residual as zero / return float("-inf") for unreliable SNR) so that envelope and noise are always defined; update the branch around sos, envelope and noise in check_snr to use this guarded fallback path.
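A minimal sketch of the guard, assuming the same 4th-order Butterworth design as `check_snr`. The helper names `default_sos_padlen` and `lowpass_or_none` are hypothetical; the padlen expression mirrors the default documented for `scipy.signal.sosfiltfilt`, and returning `None` for short inputs is just one possible fallback (the caller could then report `float("-inf")` as an unreliable SNR).

```python
from typing import Optional

import numpy as np
from scipy import signal as scipy_signal


def default_sos_padlen(sos: np.ndarray) -> int:
    """Default padlen used by sosfiltfilt when padlen=None, per the SciPy docs."""
    ntaps = 2 * len(sos) + 1
    ntaps -= min(int((sos[:, 2] == 0).sum()), int((sos[:, 5] == 0).sum()))
    return 3 * ntaps


def lowpass_or_none(samples: np.ndarray) -> Optional[np.ndarray]:
    """Return the zero-phase low-passed signal, or None if the input is too short."""
    sos = scipy_signal.butter(N=4, Wn=0.5, btype="low", output="sos")
    # sosfiltfilt raises ValueError unless the input is longer than padlen,
    # so short clips are rejected before the filter is applied.
    if samples.size <= default_sos_padlen(sos):
        return None
    return scipy_signal.sosfiltfilt(sos, samples)
```

Deriving the threshold from the filter itself, rather than hard-coding a sample count, keeps the guard correct if the filter order changes later.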

coderabbitai · 2026-05-19T13:21:56Z

+def detect_speech_segments(
+    audio: np.ndarray,
+    sample_rate: int,
+) -> list[tuple[float, float]]:
+    """Detect speech segments in mono audio using Silero VAD.
+
+    Args:
+        audio: 1-D mono audio array of floating-point samples in the range
+            ``[-1.0, 1.0]``. Silero VAD natively supports 8 kHz and 16 kHz
+            sample rates; 16 kHz is recommended.
+        sample_rate: Sample rate of ``audio`` in Hz.
+
+    Returns:
+        Ordered list of ``(start_seconds, end_seconds)`` tuples describing
+        the detected speech intervals. Returns an empty list if no speech
+        is detected.
+    """
+    model, utils = _load_silero_vad()
+    # Silero's `utils` is a tuple; `get_speech_timestamps` is the first
+    # element by convention.
+    get_speech_timestamps = utils[0]
+
+    tensor = torch.as_tensor(audio, dtype=torch.float32)
+    timestamps: list[dict[str, int]] = get_speech_timestamps(
+        tensor,
+        model,
+        sampling_rate=sample_rate,
+    )
+
+    return [(ts["start"] / sample_rate, ts["end"] / sample_rate) for ts in timestamps]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate sample_rate and audio shape before model invocation.

sample_rate is used both in Silero call and final division without guards. For sample_rate <= 0 this can fail with unclear errors (or divide by zero). Add explicit checks (sample_rate > 0, supported rates, audio.ndim == 1) and raise a typed validation error.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/audio_processor/preprocessing/vad.py` around lines 58 - 87, In detect_speech_segments validate inputs before calling _load_silero_vad: check that sample_rate is an int > 0 and one of Silero's supported rates (e.g., 8000 or 16000), check that audio is a 1-D numpy array (audio.ndim == 1) and that its length > 0; if any check fails raise a typed validation error (ValueError) with a clear message mentioning the offending parameter; perform these guards at the top of detect_speech_segments so subsequent calls to _load_silero_vad, get_speech_timestamps and the final division are safe.
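The guards could look roughly like this. It is a sketch, not the project's code: `validate_vad_inputs` and `SUPPORTED_RATES` are hypothetical names, the 8 kHz / 16 kHz set is taken from the docstring quoted above, and rejecting empty audio follows the prompt rather than the current behavior.

```python
import numpy as np

# Rates listed in the detect_speech_segments docstring; treat as an assumption.
SUPPORTED_RATES = frozenset({8_000, 16_000})


def validate_vad_inputs(audio: np.ndarray, sample_rate: int) -> None:
    """Raise ValueError naming the offending parameter; return None if inputs are usable."""
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(sample_rate, bool) or not isinstance(sample_rate, int):
        raise ValueError(f"sample_rate must be an int, got {type(sample_rate).__name__}")
    if sample_rate <= 0:
        raise ValueError(f"sample_rate must be positive, got {sample_rate}")
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"sample_rate {sample_rate} is not one of {sorted(SUPPORTED_RATES)}")
    if audio.ndim != 1:
        raise ValueError(f"audio must be 1-D mono, got {audio.ndim} dimensions")
    if audio.size == 0:
        raise ValueError("audio must contain at least one sample")
```

Calling this at the top of `detect_speech_segments`, before `_load_silero_vad()`, would also avoid loading the model for inputs that are going to be rejected anyway.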

coderabbitai · 2026-05-19T13:21:56Z

+@pytest.mark.unit
+def test_convert_to_wav_invokes_ffmpeg_with_argv_list(tmp_path: Path) -> None:
+    """Successful invocation passes an argv list (never a shell string)."""
+    src = tmp_path / "in.mp3"
+    src.write_bytes(b"\x00")
+    dst = tmp_path / "out.wav"
+
+    fake = MagicMock(spec=subprocess.CompletedProcess)
+    fake.returncode = 0
+    fake.stderr = ""
+
+    with (
+        patch.object(ffmpeg_module, "_FFMPEG_PATH", "/usr/bin/ffmpeg"),
+        patch.object(ffmpeg_module.subprocess, "run", return_value=fake) as run_mock,
+    ):
+        result = convert_to_wav(src, dst)
+
+    assert result == dst
+    run_mock.assert_called_once()
+    args, kwargs = run_mock.call_args
+    cmd = args[0]
+    assert isinstance(cmd, list), "command must be argv list to avoid shell injection"
+    assert cmd[0] == "/usr/bin/ffmpeg"
+    assert str(src) in cmd
+    assert str(dst) in cmd
+    assert kwargs.get("shell", False) is False
+
+
+@pytest.mark.unit
+def test_convert_to_wav_raises_when_ffmpeg_missing(tmp_path: Path) -> None:
+    """Calling ``convert_to_wav`` without ffmpeg on PATH raises FfmpegConversionError."""
+    src = tmp_path / "in.mp3"
+    src.write_bytes(b"\x00")
+    dst = tmp_path / "out.wav"
+
+    with (
+        patch.object(ffmpeg_module, "_FFMPEG_PATH", None),
+        pytest.raises(FfmpegConversionError, match="ffmpeg binary not found"),
+    ):
+        convert_to_wav(src, dst)
+
+
+@pytest.mark.unit
+def test_convert_to_wav_raises_on_nonzero_exit(tmp_path: Path) -> None:
+    """A non-zero ffmpeg exit surfaces FfmpegConversionError with stderr context."""
+    src = tmp_path / "in.mp3"
+    src.write_bytes(b"\x00")
+    dst = tmp_path / "out.wav"
+
+    fake = MagicMock(spec=subprocess.CompletedProcess)
+    fake.returncode = 1
+    fake.stderr = "Invalid data found when processing input"
+
+    with (
+        patch.object(ffmpeg_module, "_FFMPEG_PATH", "/usr/bin/ffmpeg"),
+        patch.object(ffmpeg_module.subprocess, "run", return_value=fake),
+        pytest.raises(FfmpegConversionError) as exc_info,
+    ):
+        convert_to_wav(src, dst)
+
+    assert "exit 1" in str(exc_info.value)
+    assert (
+        exc_info.value.details.get("stderr")
+        == "Invalid data found when processing input"
+    )
+    assert exc_info.value.details.get("exit_code") == 1


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add tests for edge cases and error conditions.

The current tests cover the happy path and basic error cases, but several edge cases are missing per the coding guidelines requirement to "test for edge cases including: special characters in paths, network timeouts, partial failures":

Consider adding tests for:

Input file doesn't exist: Verify behavior when input_path points to non-existent file

Output parent directory doesn't exist: The docstring states "parent directory must already exist" but this isn't validated or tested

Special characters in paths: Unicode characters, spaces, quotes in filenames

Empty input file: Zero-byte or minimal invalid audio file

Subprocess timeout: If timeout is added to subprocess.run (per my other comment), test TimeoutExpired

Path traversal attempts: Test with paths containing .. if validation is added

📋 Example additional test cases

@pytest.mark.unit
def test_convert_to_wav_with_special_chars_in_path(tmp_path: Path) -> None:
    """Handle filenames with spaces and Unicode correctly."""
    src = tmp_path / "audio with spaces 音频.mp3"
    src.write_bytes(b"\x00")
    dst = tmp_path / "output 输出.wav"

    fake = MagicMock(spec=subprocess.CompletedProcess)
    fake.returncode = 0
    fake.stderr = ""

    with (
        patch.object(ffmpeg_module, "_FFMPEG_PATH", "/usr/bin/ffmpeg"),
        patch.object(ffmpeg_module.subprocess, "run", return_value=fake) as run_mock,
    ):
        result = convert_to_wav(src, dst)

    assert result == dst
    # Verify paths are properly passed to subprocess
    cmd = run_mock.call_args[0][0]
    assert str(src) in cmd
    assert str(dst) in cmd


@pytest.mark.unit
def test_convert_to_wav_missing_output_parent_dir(tmp_path: Path) -> None:
    """Fail gracefully when output parent directory doesn't exist."""
    src = tmp_path / "in.mp3"
    src.write_bytes(b"\x00")
    dst = tmp_path / "nonexistent" / "out.wav"  # parent doesn't exist

    with (
        patch.object(ffmpeg_module, "_FFMPEG_PATH", "/usr/bin/ffmpeg"),
        # Depending on implementation, this might raise before subprocess
        pytest.raises((FfmpegConversionError, FileNotFoundError)),
    ):
        convert_to_wav(src, dst)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/preprocessing/test_ffmpeg.py` around lines 19 - 84, Add unit tests to cover the missing edge cases for convert_to_wav: add tests that call ffmpeg_module.convert_to_wav with a non-existent input Path (assert it raises FfmpegConversionError or FileNotFoundError), with an output whose parent directory does not exist (assert proper error is raised), with filenames containing spaces/unicode/quotes (assert subprocess.run is invoked with the exact paths), with an empty zero-byte input file (assert ffmpeg error handling attaches stderr/exit code), and a test that simulates subprocess.TimeoutExpired by patching ffmpeg_module.subprocess.run to raise TimeoutExpired and asserting convert_to_wav surfaces a FfmpegConversionError or timeout-specific behavior; reference ffmpeg_module, convert_to_wav, FfmpegConversionError and subprocess.run in your new tests.

coderabbitai · 2026-05-19T13:21:56Z

+@pytest.mark.unit
+def test_check_snr_silent_input_returns_neg_inf() -> None:
+    """All-zero input has no signal energy and returns -inf."""
+    audio = np.zeros(1_000, dtype=np.float32)
+    assert check_snr(audio) == float("-inf")
+
+
+@pytest.mark.unit
+def test_check_snr_empty_input_returns_neg_inf() -> None:
+    """Empty input returns -inf rather than raising."""
+    assert check_snr(np.array([], dtype=np.float32)) == float("-inf")
+
+
+@pytest.mark.unit
+def test_check_snr_returns_finite_for_noisy_signal() -> None:
+    """A sine tone plus broadband noise yields a finite, lower SNR."""
+    rng = np.random.default_rng(seed=0)
+    sr = 16_000
+    t = np.linspace(0, 1.0, sr, endpoint=False)
+    tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
+    noise = rng.normal(scale=0.1, size=sr)
+    noisy = (tone + noise).astype(np.float32)
+
+    snr_db = check_snr(noisy)
+
+    assert math.isfinite(snr_db)
+


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add a regression test for short non-empty inputs in check_snr.

Please add a tiny array case (e.g., shorter than the default sosfiltfilt pad length) and assert deterministic behavior instead of an exception.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/preprocessing/test_quality.py` around lines 25 - 51, Add a regression unit test to tests/unit/preprocessing/test_quality.py that exercises check_snr with a very short non-empty array (e.g., length 2–4, values not all zero) to ensure it does not raise from internal filtering/pad logic and returns a deterministic result; create a fixed tiny input (like np.array([0.1, -0.1], dtype=np.float32)), call check_snr(tiny_input) and assert the result is deterministic (e.g., math.isfinite(snr) and/or equals a specific expected float) in a new test function named test_check_snr_short_non_empty_input_returns_deterministic_value.

…qm-q3c2

Targeted single-package upgrade to patch:

- GHSA-r6ph-v2qm-q3c2 (HIGH): cryptography vulnerable to a subgroup attack due to missing subgroup validation for SECT curves

Surfaced after the urllib3 fix landed; this was the next blocker on Dependency Review and OSV Vulnerability Scanner. Same targeted-upgrade pattern as the urllib3 fix to avoid wholesale-upgrade collateral.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… GHSAs

Targeted single-package upgrade to patch:

- GHSA-x2qx-6953-8485 (HIGH): unsafe option check before shlex.split
- GHSA-rpm5-65cw-6hj4 (HIGH): command injection via Git options bypass
- GHSA-7545-fcxq-7j24 (HIGH): path traversal in reference APIs
- GHSA-v87r-6q3f-2j67 (HIGH): RCE via core.hooksPath newline injection
- GHSA-mv93-w799-cj2w (HIGH): bypass of CVE-2026-42215 patch

Surfaced after cryptography fix; was the next Dependency Review blocker.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- GHSA-5789-5fc7-67v3 (HIGH): path traversal via incorrect startswith() check
- GHSA-24qx-w28j-9m6p (HIGH): CORS origin validation bypass via re.match()
- GHSA-5mrq-x3x5-8v8f (HIGH): auth cookies remain valid after password reset

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- GHSA-rch3-82jr-f9w9 (HIGH): auth token theft via CommandLinker XSS
- GHSA-37w4-hwhx-4rc4 (HIGH): extension manager API/GUI policy bypass
- GHSA-mqcg-5x36-vfcg (HIGH): command linker HTML attribute exec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…default config) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ITLE_RE) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…cklog

Targeted multi-package upgrade via uv lock --upgrade-package (per package, no wholesale upgrade) to clear the remaining HIGH-severity findings flagged by Dependency Review, OSV Vulnerability Scanner, and pip-audit:

- nbconvert 7.16.6 -> 7.17.1
- notebook 7.5.0 -> 7.5.6
- pillow 12.0.0 -> 12.2.0
- pip 25.3 -> 26.1.1
- protobuf 6.33.1 -> 6.33.6
- pyasn1 0.6.1 -> 0.6.3
- pygments 2.19.2 -> 2.20.0
- pytest 9.0.1 -> 9.0.3
- python-dotenv 1.2.1 -> 1.2.2
- python-multipart 0.0.20 -> 0.0.29
- requests 2.32.5 -> 2.34.2
- tornado 6.5.2 -> 6.5.5
- virtualenv 20.35.4 -> 21.3.3
- werkzeug 3.1.4 -> 3.1.8
- filelock 3.20.0 -> 3.29.0
- jaraco-context 6.0.1 -> 6.1.2

Local verification:

- uv run pip-audit: only py 1.11.0 PYSEC-2022-42969 remains (DISPUTED ReDoS, no upstream fix, already filtered by OSV scanner)
- uv run pytest tests/unit/preprocessing/: 16 passed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…sing deps

Audio preprocessing introduces a transitive dep tree (librosa -> numba, silero-vad -> torch -> nvidia-cu12-*) that surfaces licenses not in the existing allow-list, plus packages whose license metadata is not indexed by GitHub's dependency API.

Adds to allow-licenses: MPL-1.1, 0BSD, BSD-4-Clause, BSD-2-Clause-Views, Zlib, LicenseRef-scancode-python-cwi, LicenseRef-scancode-secret-labs-2011, LicenseRef-scancode-unicode

Adds allow-dependencies-licenses for packages with no metadata-known license (torch, lxml, silero-vad, onnxruntime, deepgram-sdk, protobuf, jaraco-functools, ruamel-yaml{,-clib}, mkdocs-git-revision-date-localized plus 15 nvidia-cu12-* CUDA packages).

Follows the documented precedent in this file (2026-05-17 entries for MIT-CMU / ZPL-2.1 / HPND-Markus-Kuhn / LicenseRef-scancode-protobuf). Per-license rationale documented inline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Surfaced by CI pip-audit after the batch upgrade landed; my local pip-audit had picked up an idna 3.11 venv even though uv.lock already had 3.15 (uv sync didn't auto-reinstall). CI uses --frozen so it reads the lock directly and saw the actual issue. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/dependency-review.yml:
- Around line 63-99: The allow-dependencies-licenses block currently contains
versionless PURLs (e.g., pkg:pypi/torch) which act as wildcards; update each
pkg:pypi/* entry in the allow-dependencies-licenses list to include the exact
approved version (e.g., pkg:pypi/torch@2.0.1) that was reviewed, or remove
entries that exist only to silence "no license" metadata so the action can
handle undetected licenses normally; also revisit the allow-licenses list (the
allow-licenses symbol) to remove or narrow any broad copyleft entries if you
intend to restrict repo-wide license acceptance.
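To find which entries still lack a version pin, a small check along these lines could be run against the allow-dependencies-licenses value. It is a sketch: `unpinned_purls` is a hypothetical helper, it assumes the comma-separated format used in this workflow, and the version numbers in the usage example are made up for illustration.

```python
def unpinned_purls(raw: str) -> list[str]:
    """Return the PURLs in a comma-separated list that carry no @version suffix."""
    entries = [item.strip() for item in raw.split(",")]
    # In a PURL the version follows a literal "@", so its absence means the
    # entry is not pinned to a specific version.
    return [item for item in entries if item and "@" not in item]


if __name__ == "__main__":
    # Versions below are illustrative only, not the versions reviewed in the PR.
    allow_list = """
        pkg:pypi/torch,
        pkg:pypi/lxml@0.0.0,
        pkg:pypi/silero-vad
    """
    for purl in unpinned_purls(allow_list):
        print(f"unpinned: {purl}")
```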

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 33dcabc8-f193-4fde-9946-056b4c05d3ad

📥 Commits

Reviewing files that changed from the base of the PR and between 79d6b5b and 574708b.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock, !**/*.lock

📒 Files selected for processing (1)

.github/workflows/dependency-review.yml

coderabbitai · 2026-05-19T15:28:28Z

+          allow-licenses: MIT, MIT-CMU, Apache-2.0, BSD-2-Clause, BSD-2-Clause-Views, BSD-3-Clause, BSD-4-Clause, 0BSD, ISC, MPL-1.1, MPL-2.0, LGPL-2.1, LGPL-3.0, Python-2.0, Unlicense, CC0-1.0, GPL-3.0-or-later, ZPL-2.1, Zlib, HPND-Markus-Kuhn, LicenseRef-scancode-protobuf, LicenseRef-scancode-python-cwi, LicenseRef-scancode-secret-labs-2011, LicenseRef-scancode-unicode
+          # Packages whose license metadata is not indexed by GitHub's dependency
+          # API (action sees "no license"). These all have known permissive
+          # licenses that would be allowed under allow-licenses if metadata was
+          # present:
+          #   - torch, lxml, silero-vad, onnxruntime, deepgram-sdk: BSD-3-Clause / MIT
+          #   - protobuf: BSD-3-Clause (LicenseRef-scancode-protobuf upstream)
+          #   - jaraco-functools, ruamel-yaml, ruamel-yaml-clib: MIT
+          #   - mkdocs-git-revision-date-localized-plugin: MIT
+          #   - nvidia-* CUDA stack: NVIDIA Software License Agreement (proprietary
+          #     EULA, accepted as dependency of torch which is required for VAD)
+          allow-dependencies-licenses: >-
+            pkg:pypi/torch,
+            pkg:pypi/lxml,
+            pkg:pypi/silero-vad,
+            pkg:pypi/onnxruntime,
+            pkg:pypi/deepgram-sdk,
+            pkg:pypi/protobuf,
+            pkg:pypi/jaraco-functools,
+            pkg:pypi/ruamel-yaml,
+            pkg:pypi/ruamel-yaml-clib,
+            pkg:pypi/mkdocs-git-revision-date-localized-plugin,
+            pkg:pypi/nvidia-cublas-cu12,
+            pkg:pypi/nvidia-cuda-cupti-cu12,
+            pkg:pypi/nvidia-cuda-nvrtc-cu12,
+            pkg:pypi/nvidia-cuda-runtime-cu12,
+            pkg:pypi/nvidia-cudnn-cu12,
+            pkg:pypi/nvidia-cufft-cu12,
+            pkg:pypi/nvidia-cufile-cu12,
+            pkg:pypi/nvidia-curand-cu12,
+            pkg:pypi/nvidia-cusolver-cu12,
+            pkg:pypi/nvidia-cusparse-cu12,
+            pkg:pypi/nvidia-cusparselt-cu12,
+            pkg:pypi/nvidia-nccl-cu12,
+            pkg:pypi/nvidia-nvjitlink-cu12,
+            pkg:pypi/nvidia-nvshmem-cu12,
+            pkg:pypi/nvidia-nvtx-cu12


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cat -n .github/workflows/dependency-review.yml | sed -n '54,100p'

Repository: ByronWilliamsCPA/audio-processor

Length of output: 3284

🌐 Web query:

actions/dependency-review-action allow-dependencies-licenses PURL versioning behavior

💡 Result:

In the GitHub actions/dependency-review-action, the allow-dependencies-licenses configuration option accepts a list of packages in Package URL (PURL) format to exclude them from license checks [1][2][3]. Regarding versioning and matching behavior:

1. Version Specificity: When a specific version is included in the PURL (e.g., pkg:githubactions/owner/repo@2.0.0), the action matches that specific version [1].
2. Wildcard/Omission Behavior: If no version is specified in the PURL, the action treats it as a wildcard, effectively allowing all versions of that package [3].
3. PURL Parsing: The action uses a permissive, custom parser for PURLs rather than strictly adhering to the standard packageurl-js library [4]. This allows it to handle certain non-standard or partial PURLs that users commonly provide, though it generally expects valid PURL structures [4].
4. Limitations: As of May 2026, the action does not natively support PURL qualifiers (such as ?license=...) to filter by specific licenses for a given package, nor does it support native wildcard characters (like *) within the PURL string itself for name matching, although these have been requested as enhancements [5][6].

In summary, providing a PURL without a version acts as a blanket allow for all versions of that package, while providing a versioned PURL restricts the exclusion to that specific version [1][3].

Citations:

1: https://github.com/actions/dependency-review-action

2: https://github.com/actions/dependency-review-action/blob/main/action.yml

3: https://github.com/actions/dependency-review-action/blob/main/docs/examples.md

4: https://github.com/actions/dependency-review-action/pull/753

5: allow-dependencies-licenses: allow-list specific license actions/dependency-review-action#1046

6: Support name wildcard on allow-dependencies-licenses purls actions/dependency-review-action#959

Pin PURL versions to prevent blanket license waivers for future upgrades.

The current configuration uses versionless PURLs in allow-dependencies-licenses (e.g., pkg:pypi/torch instead of pkg:pypi/torch@2.0.1), which act as wildcards matching all versions of each package indefinitely. This turns metadata gaps for the current PR into permanent exceptions for all future versions, even after dependency updates. Additionally, allow-licenses now admits copyleft licenses (GPL-3.0-or-later, LGPL-2.1, LGPL-3.0, MPL-1.1, MPL-2.0) repo-wide.

Scope approvals by adding explicit versions to the PURLs (e.g., pkg:pypi/torch@2.0.1) to limit exceptions to the exact packages reviewed in this PR. Alternatively, drop entries that only suppress missing metadata, and instead rely on the action's existing handling of undetected licenses.
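
The pinning suggested above can be scripted rather than typed by hand. The following is a minimal sketch, not part of the PR: `pin_purls` is a hypothetical helper, and the `locked_versions` mapping is assumed to have been extracted from the lock file by some other means. The version string reuses the reviewer's illustrative `2.0.1` and is not the version actually locked in this repository.

```python
def pin_purls(names, locked_versions):
    """Return versioned PyPI PURLs for the given package names.

    names: iterable of package names to allow-list.
    locked_versions: mapping of package name -> exact version string
        (assumed to come from the lock file; how it is read is out of scope).

    Raises KeyError when a name has no locked version, so an entry is never
    silently emitted without a version (which would act as a wildcard).
    """
    pinned = []
    for name in names:
        version = locked_versions[name]  # KeyError here is intentional
        pinned.append(f"pkg:pypi/{name}@{version}")
    return pinned


# Placeholder version taken from the review comment's example.
example = pin_purls(["torch"], {"torch": "2.0.1"})
print(",\n".join(example))
```

Failing loudly on a missing version is a deliberate choice here: the point of the review finding is that a versionless entry waives the license check for every future release.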

Trivy on PR #27 surfaced one new HIGH CVE in the python:3.12-slim base image (already had 5 documented CVEs in the same libgnutls30t64 package):

- CVE-2026-42009 (HIGH, affected): DoS via DTLS packet reordering

No upstream Debian fix available. Same compensating-control argument as the other libgnutls entries: GnuTLS is transitive via curl/ffmpeg only, Python TLS uses OpenSSL via cryptography/urllib3, no untrusted TLS endpoints flow through GnuTLS in the application request path.

Adds one .trivyignore entry and one row to the existing libgnutls30t64 table in docs/known-vulnerabilities.md (Discovered 2026-05-19, reassess 60-day cap per CLAUDE.md unfixed-CVE policy).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…L escape

CI exposed a pre-existing docs build regression once PR #27 touched docs/: pygments 2.20.0 tightened html.escape() to reject None, breaking pymdown-extensions <= 10.17.2's code-highlight formatter which passed filename=None unconditionally. Affected any code block in any page.

The bug was on main (verified by uv-syncing main's exact lock and running mkdocs build --strict locally) but the docs build job is path-filtered to docs-touching PRs, so it had stayed undetected.

Upgrade pymdown-extensions v10.17.2 -> v10.21.3 (which fixes the filename handling) while keeping pygments 2.20.0 (which fixes CVE-2026-4539).

Local verification:
- uv run mkdocs build --strict: built in 1.84s, no errors
- uv run pip-audit: only py 1.11.0 PYSEC-2022-42969 (already documented)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sonarqubecloud · 2026-05-19T15:55:08Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
90.4% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

…API hardening, decomposition) (#53)

* refactor: remove dead preprocessing pipeline and orphaned exceptions

The preprocessing/ package (PR #27) was superseded by the services/ implementations (PR #43) and is imported nowhere in production code. The two pipelines computed quality metrics with different algorithms (Butterworth-filter SNR vs spectral-percentile SNR) and maintained a second, independent Silero VAD model cache, creating a correctness ambiguity and duplicate maintenance burden.

This removes:
- src/audio_processor/preprocessing/{loader,ffmpeg,vad,quality}.py
- src/audio_processor/exceptions.py (AudioLoadError, FfmpegConversionError were only used by the preprocessing modules; the canonical hierarchy lives in core/exceptions.py)
- tests/unit/preprocessing/

The wired-in services/ implementations remain the single source of truth. Suite: 436 passing, coverage 89.98%.

* feat: unify job lifecycle behind a shared JobStore and enable ARQ enqueue

The API stored jobs in a process-local in-memory dict and never enqueued them, while the worker read/wrote a separate Redis store. A submitted job could never reach COMPLETED from the API's perspective, and /results and /artifacts were structurally unreachable (issue #50).

Introduce core/job_store.py with a JobStore abstraction and two backends:
- InMemoryJobStore (dev/tests; also exposes a sync mapping interface for direct record injection)
- RedisJobStore (JSON-in-Redis keyed by job:{id})

Both backends share one key scheme and serialization, so the API and the separate-process worker observe the same state.

- routes.py resolves the store via app.state (falling back to the in-memory store) and enqueues to ARQ when enqueue_enabled and a pool is attached.
- audio_tasks._update_job_status delegates to RedisJobStore (single source of truth; removes the duplicated get/merge/set + key literal).
- api/__init__.py gains a lifespan that opens an ARQ pool and attaches a RedisJobStore when job_store_backend == "redis".

Defaults (memory backend, enqueue disabled) preserve existing behavior; arq is imported lazily so the API does not hard-depend on the 'jobs' extra.

New config: job_store_backend ("memory"|"redis"), enqueue_enabled (bool).

Tests: add test_job_store.py (both backends) and test_audio_tasks.py, which gives process_audio_job its first coverage and verifies a job reaches COMPLETED with result+artifacts in the shared store. 449 passing, 92.76%.

* feat(security): add API-key auth, rate limiting, and safe streaming uploads

Addresses two findings from the architecture review:

F3 (no authn/authz): every endpoint was anonymous, so unauthenticated callers could drive FFmpeg work and paid Deepgram calls.

F4 (unbounded upload): the size guard trusted the client Content-Length header and the body was read fully into memory before validation, enabling a memory-exhaustion DoS; orphaned temp files were only cleaned on the validation-error path.

Changes:
- api/security.py: require_api_key (constant-time X-API-Key check) and a per-client fixed-window rate_limit dependency. Both are gated by config and default OFF, so existing behavior is preserved.
- require_api_key is attached at the /api/v1 router; rate_limit guards the expensive POST /process endpoint.
- routes.py: stream uploads to disk in 1 MiB chunks with a hard byte cap enforced on bytes actually read (no full-body buffering, header not trusted). A finally block guarantees the temp file is removed on every failure path; on success the worker owns and deletes it.
- config: auth_required, api_keys (+ api_key_set), rate_limit_enabled, rate_limit_requests, rate_limit_window_seconds.

No new third-party dependency is added (keeps the audited dependency set stable); the rate limiter is process-local and documented as a safety net to complement a gateway/Redis limiter in multi-process deployments.

Tests: test_security.py and test_routes_hardening.py cover auth (open/401/wrong/ok/misconfig-500), rate limiting (under/over/per-client), the streaming cap, and temp-file cleanup on failure. 462 passing, coverage 92.79%.

* docs: document job/security settings in .env.example; apply ruff format

- Add JOB_STORE_BACKEND, ENQUEUE_ENABLED, AUTH_REQUIRED, API_KEYS, RATE_LIMIT_* to .env.example so the new configuration is discoverable.
- Apply ruff format to api/__init__.py and test_security.py (whitespace only).

* fix(test): move UploadFile import into TYPE_CHECKING block

CodeQL flagged UploadFile as an unused runtime import; it is only used in a cast() string annotation, so it belongs in the type-checking block.

* refactor: remove dead Python 3.10 UTC compatibility shims

requires-python is >=3.11, so datetime.UTC is always available and the sys.version_info guard's else branch was unreachable dead code, duplicated across four modules (models, routes, audio_tasks, worker). Replace each with a direct `from datetime import UTC` and drop the now-unused sys/timezone imports.

* refactor: remove template scaffolding from worker and utils

Cookiecutter residue that the audio pipeline never used:
- jobs/worker.py: drop the example/stub tasks (example_background_task, send_email_task, process_file_upload) and the no-op cleanup_old_data cron (it returned a hard-coded 0 yet was registered on the worker). Remove the two large triple-quoted "example" blocks (FastAPI integration + Celery alternative) and the now-unused asyncio/datetime/cron imports. WorkerSettings now registers only process_audio_job.
- utils/financial.py: delete the empty placeholder module (no references).
- Update test_worker.py to cover the real surface (process_audio_job registration, lifecycle hooks, enqueue_task) and CLAUDE.md project tree.

456 passing, coverage 92.96%.
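
The streaming-upload change described in the feat(security) entry above (1 MiB chunks, a cap enforced on bytes actually read, temp file removed on every failure path) could look roughly like the sketch below. It is an illustration under assumptions, not the code in routes.py: it uses a plain synchronous file-like object instead of the framework's upload type, and `stream_to_temp_file` and `UploadTooLargeError` are hypothetical names.

```python
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB, the chunk size named in the commit message


class UploadTooLargeError(Exception):
    """Hypothetical error type; the PR's real exception is not shown here."""


def stream_to_temp_file(source, max_bytes, chunk_size=CHUNK_SIZE):
    """Copy a binary file-like object to a temp file, enforcing a byte cap.

    The cap is checked against the number of bytes actually read, never
    against a client-supplied length. On success the path is returned and the
    caller owns the file; on any failure the temp file is removed.
    """
    fd, path = tempfile.mkstemp(suffix=".upload")
    succeeded = False
    try:
        total = 0
        with os.fdopen(fd, "wb") as out:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break  # end of stream
                total += len(chunk)
                if total > max_bytes:
                    raise UploadTooLargeError(
                        f"upload exceeds {max_bytes} bytes"
                    )
                out.write(chunk)
        succeeded = True
        return path
    finally:
        # Runs on every exit; only failure paths delete the file.
        if not succeeded and os.path.exists(path):
            os.remove(path)
```

Because at most one chunk is held in memory at a time, an oversized body is rejected after reading roughly `max_bytes` plus one chunk, regardless of what the request headers claimed.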
* refactor: decompose process_audio_job into staged helpers

The 260-line task mixed orchestration, conversion, transcription, artifact generation, cleanup, and six near-identical inlined progress dicts. Extract focused, individually-readable helpers:
- _progress(): builds a progress payload (collapses the 6 repeated blocks).
- _convert_audio(): converter is_video/extract vs convert_for_asr branch.
- _transcribe(): lazy Deepgram import + ConfigurationError tolerance.
- _build_transcription_payload(): result serialization.
- _generate_artifacts(): lazy import + generation-failure tolerance.

process_audio_job is now a ~80-line orchestrator with the same external behavior and status transitions. Add tests for the two previously-uncovered error branches (generic-exception wrapping to AudioProcessorError + FAILED, and artifact-generation failure still completing the job). 458 passing, coverage 93.06%.

* fix: address PR review (rate-limiter bound, config invariant, test coverage)

- security.py: bound _RATE_WINDOWS with opportunistic eviction of expired windows above a soft cap, so a flood of unique client identifiers cannot grow the per-process map without bound.
- config.py: add a model_validator enforcing the documented invariant that enqueue_enabled requires job_store_backend='redis', so the misconfiguration fails fast at startup instead of silently never processing jobs.
- test_job_store.py: cover the bytes-decode branch of RedisJobStore._decode.

* fix(test): resolve CodeQL unused-import on UploadFile

CodeQL flagged UploadFile as unused because its only reference was the string in cast("UploadFile", ...), and ruff TC006 conversely requires the cast type to be quoted -- a direct conflict. Drop the cast/import entirely and pass the duck-typed _FakeUpload directly with a precise reportArgumentType ignore on the call. Also rename the autouse fixture _clear_store -> clear_store to match the suite convention and avoid reportUnusedFunction.
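
To make the rate-limiter bound above concrete, here is a small sketch of a per-client fixed-window limiter that evicts expired windows once the map grows past a soft cap. It is written under assumptions: `FixedWindowLimiter`, its parameters, and the injected `clock` are illustrative names, and the real security.py (which keeps a module-level `_RATE_WINDOWS` map) may be structured differently.

```python
import time


class FixedWindowLimiter:
    """Process-local fixed-window limiter with opportunistic eviction."""

    def __init__(self, max_requests, window_seconds, soft_cap=10_000,
                 clock=time.monotonic):
        self._max_requests = max_requests
        self._window = window_seconds
        self._soft_cap = soft_cap
        self._clock = clock  # injectable so tests can control time
        self._windows = {}   # client_id -> (window_start, request_count)

    def __len__(self):
        return len(self._windows)

    def allow(self, client_id):
        """Return True if the request fits in the client's current window."""
        now = self._clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self._window:
            start, count = now, 0  # previous window expired; start a new one
        allowed = count < self._max_requests
        if allowed:
            count += 1
        self._windows[client_id] = (start, count)
        if len(self._windows) > self._soft_cap:
            self._evict_expired(now)
        return allowed

    def _evict_expired(self, now):
        """Drop every window that has already expired."""
        expired = [cid for cid, (start, _) in self._windows.items()
                   if now - start >= self._window]
        for cid in expired:
            del self._windows[cid]
```

Eviction only runs when the soft cap is exceeded, so the common path stays a single dictionary lookup. Note that this sketch removes expired windows only; a flood of identifiers inside a single window can still exceed the cap.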
* fix(core): make RedisJobStore.update atomic with per-field HSET (#57)

Closes #54.

RedisJobStore stored each job as a single JSON string and updated it with a read-modify-write (GET, merge in Python, SET). Because the API and the ARQ worker write the same job from separate processes, two concurrent updates could clobber each other (lost update): whichever SET landed last overwrote every field the other writer had changed. This is the failure mode the F2 "shared job store" fix is meant to prevent, so the store must not reintroduce it.

Store each job as a Redis hash (job:{id}) with individually JSON-encoded field values:
- create: MULTI/EXEC of DELETE + HSET(mapping) + EXPIRE (replaces any prior record, clears stale fields, reapplies TTL).
- update: HSET only the provided non-None fields + EXPIRE. HSET is atomic, so writers touching different fields no longer clobber each other; same-field writes remain last-writer-wins (the expected contract).
- get: HGETALL + per-field decode, tolerant of bytes or str (clients created without decode_responses return bytes).

Consolidate the two duplicated get/set FakeRedis stand-ins into a single hash-capable tests/unit/_fake_redis.py (adds hset/hgetall/expire/delete and a buffered transactional pipeline), and add a regression test asserting two disjoint concurrent field updates both survive.

Suite: 460 passing, coverage 93.03%. ruff + ruff format clean; basedpyright src/: 0 errors.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address PR review - secrets, API hardening, error handling

Re-applied on top of the merged atomic store (#57), which already implemented the per-field HSET RedisJobStore and the shared test fake. This commit adds the remaining review items that #57 did not cover:

Security:
- api_keys is now SecretStr (was plain str) so keys never leak via repr / model_dump / logs; api_key_set reads via get_secret_value().
- require_api_key compares against all keys without short-circuiting (avoids a timing side channel); the rate limiter hashes the API key before using it as an in-memory map key.
- Fail-fast Settings validator: auth_required requires at least one api_key.

Memory:
- Rate-limiter window map is hard-capped: after evicting expired windows it evicts the oldest entries when still over the cap.

Error handling:
- _generate_artifacts catches any exception (DOM/json.dumps can raise TypeError/ValueError/KeyError) and records an artifacts_error field instead of failing a completed transcription.
- Status-route timestamp parsing is defensive (no opaque 500 on a bad record).
- _maybe_enqueue raises when enabled without a pool, and enqueue failures mark the job FAILED instead of stranding it QUEUED.

Tests: test_config.py (validators, SecretStr non-leak) and _maybe_enqueue seam tests. 489 passing, coverage 92.69%.

* style: drop unused noqa(BLE001) on re-raising except in _maybe_enqueue guard

The except re-raises, so BLE001 does not apply and the suppression was unused (ruff RUF100), failing the Code Quality CI checks.

* fix(security): use non-crypto hash for rate-limit bucket id

CodeQL flagged hashlib.sha256(api_key) as weak hashing of sensitive data (it treats the API key as a password requiring a slow KDF). The hash here is only an in-process rate-limit bucket id, not password storage. Use the builtin non-cryptographic hash(), which keeps per-key buckets and still avoids holding the raw secret as a map key, without tripping the crypto sink.

---------

Co-authored-by: Claude <noreply@anthropic.com>
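
The "compares against all keys without short-circuiting" item in the Security list above can be illustrated with the standard library's `hmac.compare_digest`. This is a sketch of the technique only; `is_valid_api_key` is a hypothetical name and the actual require_api_key dependency is not reproduced here.

```python
import hmac


def is_valid_api_key(presented, valid_keys):
    """Check a presented key against every configured key.

    Each comparison uses hmac.compare_digest, and the loop never returns
    early, so the number of comparisons does not depend on which key (if any)
    matched.
    """
    presented_bytes = presented.encode("utf-8")
    matched = False
    for key in valid_keys:
        # `|=` records a match without skipping the remaining comparisons.
        matched |= hmac.compare_digest(presented_bytes, key.encode("utf-8"))
    return matched
```

Encoding to bytes first sidesteps `compare_digest`'s restriction that `str` arguments be ASCII-only.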

Copilot AI review requested due to automatic review settings May 16, 2026 04:45

Copilot started reviewing on behalf of williaby May 16, 2026 04:45

Copilot AI reviewed May 16, 2026

Comment thread src/audio_processor/preprocessing/quality.py Outdated

Comment thread src/audio_processor/preprocessing/ffmpeg.py Outdated

Comment thread src/audio_processor/preprocessing/ffmpeg.py Outdated

williaby force-pushed the claude/audio-preprocessing-phase-1-Pc9qx branch from d23e6f1 to dcfc527 Compare May 17, 2026 18:45

claude and others added 3 commits May 18, 2026 21:45

williaby force-pushed the claude/audio-preprocessing-phase-1-Pc9qx branch from fdcc7be to 13d7e47 Compare May 19, 2026 04:48

williaby changed the title ~~feat(preprocessing): Phase 1 Sprints 1-4 — audio preprocessing pipeline~~ feat(preprocessing): phase 1 sprints 1-4, audio preprocessing pipeline May 19, 2026

coderabbitai Bot added python dependencies tests security labels May 19, 2026

coderabbitai Bot reviewed May 19, 2026

williaby and others added 9 commits May 19, 2026 06:39

fix(deps): bump lxml to clear GHSA-vfmq-68hx-4jfw (XXE via iterparse default config)

989c95b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(deps): bump mistune to clear GHSA-8mp2-v27r-99xp (ReDoS in LINK_TITLE_RE)

0e274be

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai Bot removed the python label May 19, 2026

coderabbitai Bot added ci and removed dependencies tests security labels May 19, 2026

coderabbitai Bot reviewed May 19, 2026

williaby and others added 2 commits May 19, 2026 08:42

williaby merged commit 1b1a036 into main May 19, 2026
60 checks passed

williaby deleted the claude/audio-preprocessing-phase-1-Pc9qx branch May 19, 2026 16:38

williaby mentioned this pull request May 29, 2026

refactor: architecture review remediation (job lifecycle, dead code, API hardening, decomposition) #53

Merged

Conversation

williaby commented May 16, 2026 • edited by coderabbitai Bot

Summary
Modules Created
Tests (15 unit tests, all mocked)
Coverage
Dependencies
Verification
Test plan
Summary by CodeRabbit

coderabbitai Bot commented May 16, 2026 • edited

Rate limit exceeded
Walkthrough
Changes
Sequence Diagrams
Estimated code review effort
Possibly related PRs
Suggested labels
Poem

github-actions Bot commented May 16, 2026 • edited

✅ FIPS Compatibility Check

github-actions Bot commented May 16, 2026 • edited

Dependency Review
License Issues
uv.lock
OpenSSF Scorecard
Scanned Files

Copilot AI left a comment

Pull request overview
Reviewed changes

socket-security Bot commented May 19, 2026 • edited

github-actions Bot commented May 19, 2026 • edited

✅ Mutation Testing Results

coderabbitai Bot left a comment

coderabbitai Bot May 19, 2026

coderabbitai Bot left a comment
