diff --git a/docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path.md b/docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path.md new file mode 100644 index 0000000..09d8642 --- /dev/null +++ b/docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path.md @@ -0,0 +1,89 @@ +# Sidecar Not-Found Fast Path + +**Feature ID:** 001 +**Change ID:** 001-003 +**Status:** Shipped +**Date Completed:** 2025-11-20 +**Code Branch / PR:** TODO + +## Summary +This change introduces a dedicated, sidecar-aware fast path for reading `.txt` SD-caption sidecar files so that missing sidecars no longer trigger multi-second Dropbox retry sequences. Instead of using the generic `download_image` API for sidecars, the web layer now calls a new helper that treats "not found" as an expected cache miss, significantly improving the responsiveness of `/api/images/random` and `/api/images/{filename}/analyze` when many images do not yet have sidecars. + +## Goals +- Make checking for the presence of a `.txt` sidecar file fast and non-disruptive, especially when the file does not exist. +- Preserve robust retry behavior for genuine transient Dropbox errors while treating “not found” as an expected, non-retriable outcome. +- Keep sidecar semantics, formats, and archive behavior from Feature 001 unchanged, while improving end-to-end latency for web and future callers. + +## Non-Goals +- Changing the on-disk format, contents, or naming of sidecar files defined by Feature 001. +- Modifying SD caption generation prompts, models, or sidecar-writing behavior. +- Introducing new storage systems or altering archival semantics for images or sidecars. +- Redesigning the existing sidecar-as-cache behavior (CR 001-001) beyond how sidecars are fetched. + +## User Value +For operators using the web interface, images without sidecars now load much faster because the system no longer spends several seconds retrying a download that will never succeed. Maintainers benefit from clearer separation between primary image downloads and optional sidecar reads, reducing Dropbox load and improving latency while keeping Feature 001’s sidecar semantics fully intact. This better aligns real-world performance with the sidecar-as-cache intent, particularly in workflows that browse or analyze large sets of images where many are newly added. + +## Technical Overview +- **Scope of the change:** + - Storage layer: `DropboxStorage` gains a sidecar-specific read helper. + - Web layer: `WebImageService` switches to the sidecar helper in random-image and analyze flows. + - Tests: Dropbox sidecar tests and web service tests extended to cover the new behavior. +- **Core flow delta:** + - Before: Sidecar reads used `download_image` with tenacity retries, so missing sidecars incurred multiple attempts with exponential backoff. + - After: Sidecar reads use `download_sidecar_if_exists`, which quickly returns `None` for “not found” conditions without retrying, while still retrying genuine transient errors as appropriate. +- **Key components touched:** + - `publisher_v2.services.storage.DropboxStorage`: new `_is_sidecar_not_found_error` and `download_sidecar_if_exists`. + - `publisher_v2.web.service.WebImageService`: `get_random_image` and `analyze_and_caption` updated to call the new helper. + - Tests: `publisher_v2/tests/test_dropbox_sidecar.py`, `publisher_v2/tests/web/test_web_service.py`. +- **Flags / config:** + - No new config flags were introduced; behavior is always-on and backward-compatible. +- **Data/state/sidecar updates:** + - No changes to sidecar format, naming (`image.jpg` → `image.txt`), or archival behavior; only read-path performance and error handling were refined. + +## Implementation Details +- **Key functions/classes added or modified:** + - `DropboxStorage._is_sidecar_not_found_error(exc: ApiError) -> bool` detects Dropbox “file not found” path errors for sidecars. + - `async DropboxStorage.download_sidecar_if_exists(folder: str, filename: str) -> bytes | None` computes the `.txt` name, downloads the sidecar, and returns `None` on not-found without treating it as an error. + - `WebImageService.get_random_image` now uses `download_sidecar_if_exists` in its `asyncio.gather` call instead of `download_image` for `image.txt`. + - `WebImageService.analyze_and_caption` now calls `download_sidecar_if_exists` in its sidecar-first cache path instead of manually constructing the `.txt` filename and calling `download_image` with a broad `try/except`. +- **Error handling behavior:** + - When Dropbox reports a path-based “not found” error for a sidecar, `_is_sidecar_not_found_error` returns `True` and `download_sidecar_if_exists` returns `None`, enabling callers to treat the case as an ordinary cache miss. + - Non-not-found `ApiError` instances remain subject to tenacity retries in the helper and ultimately surface as `StorageError` if they persist; in the web flows, these errors are either ignored for sidecar parsing (in `get_random_image`) or treated as “no sidecar” for UX parity with previous behavior. +- **Performance / reliability considerations:** + - Missing sidecars no longer trigger multi-attempt retry backoff, cutting several seconds from common web flows that touch unprocessed images. + - Transient Dropbox issues continue to be retried and surfaced, preserving robustness for primary image and sidecar reads when they should succeed. +- **Security / privacy notes:** + - No new secrets or access scopes were introduced; all Dropbox interactions continue to use the existing `DropboxConfig`. + - Logging of sidecar behavior remains high-level; no raw sidecar contents are logged. + +## Testing +- **Unit tests:** + - `publisher_v2/tests/test_dropbox_sidecar.py` now verifies: + - `write_sidecar_text` still writes `image.txt` correctly. + - `download_sidecar_if_exists` returns bytes when a sidecar is present. + - `download_sidecar_if_exists` returns `None` when a simulated Dropbox “not found” error occurs. +- **Integration / web service tests:** + - `publisher_v2/tests/web/test_web_service.py` was updated so the `_DummyStorage` exposes `download_sidecar_if_exists`, and existing tests validate that: + - `get_random_image` continues to populate sidecar-derived fields when a sidecar is present. + - `get_random_image` behaves correctly when sidecars are missing (no errors, no regressions in response shape). + - `analyze_and_caption` still writes sidecars and returns expected captions/SD captions. +- **E2E / manual checks:** + - Existing web endpoint tests (`web_integration` suite) continue to pass unchanged, confirming that HTTP contracts and behavior are preserved while latency improves in missing-sidecar scenarios. + +## Rollout Notes +- **Feature/change flags:** + - No new flags; change is always enabled and safe as a performance improvement. +- **Monitoring / logs:** + - Existing web telemetry (`web_random_image_ms`, `web_analyze_ms`) can be used to observe latency improvements; no new log events were added specifically for this change. +- **Backout strategy:** + - If needed, revert the helper and web-service call sites to the prior `download_image` behavior via a code rollback; no migrations or config changes are involved. + +## Artifacts +- Change Request: docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path.md +- Change Design: docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path_design.md +- Change Plan: docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path_plan.yaml +- Parent Feature Design: docs_v2/08_Features/08_02_Feature_Design/001_captionfile_design.md +- PR: TODO + +## Final Notes +This change keeps the Stable Diffusion caption file feature’s sidecar semantics fully intact while aligning real-world performance with the sidecar-as-cache intent in the web interface. Future follow-ups could add lightweight metrics or log events specific to sidecar read outcomes, or introduce a metadata-only existence check if additional latency reductions are needed for very large folders. The new `download_sidecar_if_exists` helper should be the preferred pattern for sidecar reads in any future CLI preview or batch features that consume sidecars. \ No newline at end of file diff --git a/docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path_design.md b/docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path_design.md new file mode 100644 index 0000000..77fdfd3 --- /dev/null +++ b/docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path_design.md @@ -0,0 +1,190 @@ + + +# Sidecar Not-Found Fast Path — Change Design + +**Feature ID:** 001 +**Change ID:** 001-003 +**Parent Feature:** Stable Diffusion Caption File +**Design Version:** 1.0 +**Date:** 2025-11-20 +**Status:** Design Review +**Author:** Architecture Team +**Linked Change Request:** docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path.md +**Parent Feature Design:** docs_v2/08_Features/08_02_Feature_Design/001_captionfile_design.md + +## 1. Summary + +- **Problem & context:** The Stable Diffusion caption file feature (001) and its sidecar-as-cache changes (001-001, 001-002) assume sidecars are cheap to read, but the current implementation uses the generic `DropboxStorage.download_image` path with tenacity retries. When a sidecar `.txt` is missing, Dropbox returns a "not found" error that is retried multiple times, adding seconds to key web endpoints (`/api/images/random`, `/api/images/{filename}/analyze`) even though the "no sidecar yet" state is normal. +- **Goal of this change:** Introduce a dedicated, sidecar-aware read helper that treats "not found" as an expected, fast-path outcome while preserving robust behavior for genuine transient errors, and wire the web service to use it. +- **Non-goals:** Do not alter sidecar creation, format, or archival semantics defined by Feature 001; do not change AI behavior, prompts, or SD caption generation logic. + +## 2. Context & Assumptions + +- **Current behavior (affected parts):** + - Feature 001 design defines `DropboxStorage.write_sidecar_text` and describes moving `.txt` sidecars alongside images during archive; reading sidecars is handled by callers (e.g., `WebImageService`) using the generic `download_image` API. + - `WebImageService.get_random_image` currently uses `asyncio.gather` to call `DropboxStorage.get_temporary_link`, `DropboxStorage.download_image(folder, sidecar_name)`, and `DropboxStorage.download_image(folder, selected)` in parallel, with `return_exceptions=True`. Sidecar parsing is conditional on `sidecar_result` not being an exception. + - `WebImageService.analyze_and_caption` ensures the image exists via `get_temporary_link`, then, when not forcing refresh, attempts to read an existing sidecar by calling `download_image` for `image.txt` wrapped in a broad `try/except Exception`, treating any exception as "no sidecar". + - `DropboxStorage.download_image` is decorated with tenacity `@retry(stop_after_attempt(3), wait=wait_exponential(min=1,max=8))`, designed for transient network/API issues when downloading primary image bytes. +- **Constraints from parent feature & sidecar-cache CRs:** + - Sidecars remain optional add-ons; absence of a `.txt` file must never be treated as an error condition for callers. + - Sidecar contents and movement semantics (write beside the image, move with archive) must not change. + - No new persistent stores; Dropbox remains the source of truth. + - Web flows must remain backward compatible in terms of HTTP contracts and overall behavior, only improving latency. +- **Dependencies:** + - `publisher_v2.services.storage.DropboxStorage` (Dropbox SDK, tenacity). + - `publisher_v2.web.service.WebImageService` (web flows that read sidecars). + - `publisher_v2.web.sidecar_parser` (unchanged; continues to parse sidecar text). + +## 3. Requirements + +### 3.1 Functional Requirements + +- **CR1:** Provide a dedicated, sidecar-aware read helper in `DropboxStorage` (e.g., `async def download_sidecar_if_exists(folder: str, filename: str) -> bytes | None`) that: + - Derives the `.txt` sidecar path from the given image filename. + - Returns the raw bytes of the sidecar when present. + - Returns `None` when Dropbox reports "file not found" for the sidecar, without multiple retry attempts. +- **CR2:** Ensure that transient Dropbox errors (network, throttling, etc.) for sidecar reads continue to be retried according to a tenacity policy, and that persistent non-404 failures surface as exceptions to callers (which may choose to treat them as "no sidecar" for UX). +- **CR3:** Update `WebImageService.get_random_image` and `WebImageService.analyze_and_caption` to use the new sidecar helper for `.txt` reads, preserving all existing caching, parsing, and logging semantics while removing multi-second delays when sidecars are absent. +- **CR4:** Maintain current behavior for image downloads (`download_image`) and sidecar writing/archival (`write_sidecar_text`, `archive_image`); only sidecar-reading behavior is adjusted. + +### 3.2 Non-Functional Requirements + +- **Performance:** For images without sidecars, sidecar-check latency must be reduced from multi-second retry sequences to a single Dropbox round-trip, making `/api/images/random` and `/api/images/{filename}/analyze` noticeably faster when sidecars are missing. +- **Reliability:** Robust error handling for genuine Dropbox issues must be preserved; the new helper must not mask systemic failures as cache misses without emitting logs where appropriate. +- **Security & Privacy:** No new secrets, scopes, or sensitive data flows; sidecar contents and logging behavior remain governed by Feature 001 and its cache CRs. +- **Observability:** Existing logs (e.g., `web_analyze_sidecar_cache_hit`) remain valid; additional logging for sidecar read failures, if added, must use structured `log_json` and avoid leaking sensitive information. + +## 4. Architecture & Design (Delta) + +### 4.1 Current vs. Proposed + +- **Current:** + - Sidecar reads in web flows use `DropboxStorage.download_image` directly for `image.txt`. This path is optimized for primary image bytes and includes tenacity retries for any `ApiError`, including 404-style "not found", causing multiple attempts and backoff delays when sidecars do not exist. + - Web code compensates by catching broad exceptions and treating them as "no sidecar", but cannot avoid the time spent in retries. +- **Proposed:** + - Introduce `DropboxStorage.download_sidecar_if_exists` as a sidecar-specific helper that: + - Still uses the Dropbox SDK under the hood. + - Recognizes Dropbox "not found" errors and treats them as an expected, non-retriable outcome, returning `None` quickly. + - Optionally leverages tenacity `retry` with a custom predicate so that only retryable errors are re-attempted. + - Refactor `WebImageService` to call `download_sidecar_if_exists` from both `get_random_image` and `analyze_and_caption`, while leaving other behavior (e.g., `asyncio.gather`, `return_exceptions=True`, sidecar parsing, and cache-hit logging) intact. + +### 4.2 Components & Responsibilities + +- **`publisher_v2.services.storage.DropboxStorage`** + - **New helper:** `async def download_sidecar_if_exists(self, folder: str, filename: str) -> bytes | None` + - Responsibility: Fast, sidecar-aware read that normalizes "not found" to `None` and allows callers to distinguish between "no sidecar" and other failures. + - **New internal helper:** `_is_sidecar_not_found_error(exc: ApiError) -> bool` + - Responsibility: Inspect Dropbox `ApiError` to identify "file not found" conditions for sidecar paths. +- **`publisher_v2.web.service.WebImageService`** + - **`get_random_image`** + - Responsibility change: Use `download_sidecar_if_exists` instead of `download_image` for `.txt` reads in the `asyncio.gather` call; treat `None` as "no sidecar" and continue to prefer sidecar-derived caption/metadata when present. + - **`analyze_and_caption`** + - Responsibility change: Replace the `try/except` + `download_image` pattern with a single call to `download_sidecar_if_exists`, falling back to AI analysis when the helper returns `None`. +- **Tests** + - **`publisher_v2/tests/test_dropbox_sidecar.py`** + - Extended to cover `download_sidecar_if_exists` behavior for existing vs. missing sidecars. + - **`publisher_v2/tests/web/test_web_service.py`** + - Extended to assert that missing sidecars result in no exceptions and that sidecar-backed paths still work as before. + +### 4.3 Data & Contracts + +- **Data:** + - No changes to sidecar file naming (`image.jpg` → `image.txt`), contents, or archive behavior. + - No new fields added to sidecar metadata or in-memory models. +- **Contracts:** + - `DropboxStorage` gains a new, internal-facing API (`download_sidecar_if_exists`) used by the web service; this is additive and does not alter existing public contracts. + - Web API contracts (`/api/images/random`, `/api/images/{filename}/analyze`) remain unchanged: response shapes, status codes, and semantics are preserved. + +### 4.4 Error Handling & Edge Cases + +- **Dropbox "file not found" for sidecar:** + - Detected by `_is_sidecar_not_found_error` in `DropboxStorage`. + - `download_sidecar_if_exists` returns `None` without raising, allowing callers to treat this as a normal cache miss. +- **Other Dropbox errors (auth, throttling, network):** + - For the sidecar helper, these are still subject to tenacity retry (if configured) and ultimately surface as exceptions. + - In `get_random_image`, such exceptions will appear in the `asyncio.gather` result and be ignored for sidecar parsing, maintaining current "best effort" behavior. + - In `analyze_and_caption`, they may either be caught and treated as "no sidecar" (maintaining today’s semantics) or allowed to bubble as 5xx; we prefer to preserve current practice of not failing the endpoint due solely to sidecar-read issues. +- **Invalid sidecar contents:** + - Parsing remains handled by `rehydrate_sidecar_view`; invalid or partial sidecars fall back to AI analysis exactly as defined in CR 001-001. + +### 4.5 Security, Privacy, Compliance + +- No changes to auth, access rules, or credentials. +- Sidecar contents remain governed by Feature 001 (PG-13 SD captions, no PII, no new sensitive fields). +- Logging must not include raw sidecar contents; any new logs should only report high-level outcomes (e.g., "sidecar_not_found", "sidecar_read_error") with sanitized error messages. + +## 5. Detailed Flow + +### 5.1 `download_sidecar_if_exists` Flow + +1. Caller passes `folder` and image `filename` (e.g., `"image.jpg"`). +2. Helper computes `sidecar_name = f"{stem}.txt"` using `os.path.splitext`. +3. It attempts a Dropbox `files_download` for the sidecar path in a thread via `asyncio.to_thread`. +4. If the call succeeds, the sidecar bytes are returned. +5. If Dropbox raises an `ApiError`: + - If `_is_sidecar_not_found_error(exc)` is `True`, the helper returns `None` immediately. + - Otherwise, the error is either retried (tenacity) or ultimately raised as a `StorageError`, depending on the configured retry policy. + +### 5.2 `get_random_image` Flow (Delta) + +1. `_get_cached_images` returns the cached or freshly listed candidate filenames. +2. A random image `selected` is chosen. +3. `asyncio.gather` is invoked with: + - `get_temporary_link(folder, selected)` + - `download_sidecar_if_exists(folder, selected)` + - `download_image(folder, selected)` for the image bytes +4. After gather: + - If the temp link call failed, an exception is raised as before. + - If `sidecar_result` is a non-exception and not `None`, it is decoded and parsed via `rehydrate_sidecar_view` to populate `sd_caption`, `caption`, `metadata`, and `has_sidecar`. + - If `sidecar_result` is `None` or an exception, the image is treated as having no sidecar. + - SHA256 is optionally computed from `image_result` as today. + +### 5.3 `analyze_and_caption` Flow (Delta) + +1. `get_temporary_link` validates that the image exists. +2. If `force_refresh` is `False`: + - `download_sidecar_if_exists(folder, filename)` is called. + - If bytes are returned and sidecar parsing yields a usable caption and/or `sd_caption`, the method returns an `AnalysisResponse` constructed from the sidecar view and logs a `web_analyze_sidecar_cache_hit`. + - If the helper returns `None` or any error is treated as "no sidecar", the flow falls through to AI analysis. +3. If `force_refresh` is `True` or no usable sidecar is found: + - AI analysis + caption/SD-caption generation proceeds as defined in CR 001-001 and 001-002. + - On success, a sidecar is (re)written via `write_sidecar_text`, unchanged. + +## 6. Testing Strategy (for this Change) + +- **Unit tests:** + - Extend `test_dropbox_sidecar.py` to cover: + - Successful sidecar download via `download_sidecar_if_exists`. + - "File not found" behavior resulting in `None` and no exception. + - Optional: behavior when an `ApiError` that is not "not found" is raised (e.g., ensure it bubbles or is wrapped appropriately). +- **Web service tests:** + - Extend `tests/web/test_web_service.py` to ensure: + - `get_random_image` continues to populate sidecar-derived fields correctly when a sidecar exists, using the new helper. + - `get_random_image` behaves correctly when sidecars are missing, with no unexpected exceptions. + - `analyze_and_caption` still prefers sidecar cache when available, and falls back to AI when sidecars are missing. +- **Integration/E2E (optional for this change):** + - Leverage existing web endpoint tests to validate that behavior and responses remain unchanged; focus on ensuring no new errors occur when sidecars are absent. + +## 7. Risks & Alternatives + +- **Risks:** + - **Incorrect "not found" detection:** If `_is_sidecar_not_found_error` misclassifies errors, the helper might treat real issues as cache misses. Mitigation: Implement targeted tests with simulated Dropbox error objects and keep classification logic narrow. + - **Divergent semantics between sidecar and image downloads:** Introducing a specialized helper could confuse future contributors. Mitigation: Clearly document the helper’s purpose and ensure all sidecar reads converge on it. +- **Alternatives Considered:** + - **Adjusting tenacity on `download_image` globally:** Rejected; would affect all image downloads, not just sidecars, and risks under-retrying genuine image download failures. + - **Using only metadata (`files_get_metadata`) to check for sidecar existence:** Considered as a future optimization; for this change, a single, non-retriable download attempt for missing sidecars is sufficient and simpler to reason about. + +## 8. Work Plan (Scoped) + +- **Task 1:** Implement `DropboxStorage.download_sidecar_if_exists` and `_is_sidecar_not_found_error`, including any tenacity retry predicate if used. +- **Task 2:** Refactor `WebImageService.get_random_image` to use the new helper in its `asyncio.gather` call for sidecar reads. +- **Task 3:** Refactor `WebImageService.analyze_and_caption` to use the new helper instead of the generic `download_image` + `try/except` pattern for cache-first logic. +- **Task 4:** Extend `test_dropbox_sidecar.py` with unit tests for the new helper and its error classification behavior. +- **Task 5:** Extend `tests/web/test_web_service.py` to validate sidecar-present and sidecar-absent behavior under the new helper, ensuring no regressions in existing web flows. + +## 9. Open Questions + +- Should `download_sidecar_if_exists` be the only supported API for sidecar reads going forward (e.g., should other components be refactored to use it), or is it acceptable to scope its usage to the web layer for now? — Proposed answer: Scope initially to web flows, then refactor other sidecar consumers to use it in follow-up work if needed. +- Do we need additional structured logging around sidecar read failures (e.g., `sidecar_not_found`, `sidecar_read_error`) for observability, or is the existing web telemetry sufficient? — Proposed answer: Start without new log events; add them only if operational debugging shows a need. + + diff --git a/docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path_plan.yaml b/docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path_plan.yaml new file mode 100644 index 0000000..38b0055 --- /dev/null +++ b/docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path_plan.yaml @@ -0,0 +1,97 @@ + +version: 1 +feature_id: stable-diffusion-caption-file-change-001-003-sidecar-not-found-fast-path +summary: "Introduce a sidecar-aware fast path for missing .txt files to avoid multi-second Dropbox retries in web flows." +design_refs: + - docs_v2/08_Features/08_04_ChangeRequests/001/003_sidecar-not-found-fast-path_design.md + - docs_v2/08_Features/08_02_Feature_Design/001_captionfile_design.md +repo_constraints: + allowed_paths: + - "publisher_v2/src/publisher_v2/**" + - "publisher_v2/tests/**" + - "docs_v2/**" + excluded_paths: + - "code_v1/**" + - "docs_v1/**" + - "htmlcov/**" + - "coverage.xml" + - "coverage-current.xml" +acceptance_criteria: + - "Given an image without a .txt sidecar in Dropbox, when /api/images/random or /api/images/{filename}/analyze is called, the system determines that the sidecar is absent in a single Dropbox round-trip without multi-second retry delays." + - "Given an image with a valid .txt sidecar, when the new sidecar-specific storage method is used, the sidecar contents are returned as before and callers behave identically to the existing cache semantics." + - "Given a Dropbox 'file not found' error for a sidecar read, when the new helper is invoked, it surfaces a fast, non-exceptional 'no sidecar' outcome that callers treat as a normal cache miss." +non_goals: + - "Changing sidecar file format, naming, or archival behavior defined by Feature 001." + - "Altering AI prompts, SD caption generation, or sidecar-writing logic." + - "Introducing new persistent stores or modifying CLI workflows." +tasks: + - id: storage_story001_003.add_sidecar_fast_path_helper + type: code + touches: + - "publisher_v2/src/publisher_v2/services/storage.py" + files_out: [] + contracts: + request: {} + response: {} + signatures: + - "class DropboxStorage" + - "async def download_sidecar_if_exists(self, folder: str, filename: str) -> bytes | None" + depends_on: [] + - id: web_story001_003.use_sidecar_helper_in_service + type: code + touches: + - "publisher_v2/src/publisher_v2/web/service.py" + files_out: [] + contracts: + request: {} + response: {} + signatures: + - "class WebImageService" + - "async def get_random_image(self) -> ImageResponse" + - "async def analyze_and_caption(self, filename: str, correlation_id: Optional[str] = None, force_refresh: bool = False) -> AnalysisResponse" + depends_on: + - "storage_story001_003.add_sidecar_fast_path_helper" + - id: test_story001_003.extend_dropbox_sidecar_tests + type: test + touches: + - "publisher_v2/tests/test_dropbox_sidecar.py" + files_out: [] + contracts: + request: {} + response: {} + signatures: [] + depends_on: + - "storage_story001_003.add_sidecar_fast_path_helper" + - id: test_story001_003.extend_web_service_sidecar_tests + type: test + touches: + - "publisher_v2/tests/web/test_web_service.py" + files_out: [] + contracts: + request: {} + response: {} + signatures: [] + depends_on: + - "web_story001_003.use_sidecar_helper_in_service" +quality_gates: + tests_required: + - "test_story001_003.extend_dropbox_sidecar_tests" + - "test_story001_003.extend_web_service_sidecar_tests" + coverage_delta_min: +1 + perf_budget: + web_random_image_p95_ms: 5000 + web_analyze_p95_ms: 20000 + security_checks: + - "dep-audit" + - "no-secrets" + - "lint" +release: + strategy: "gradual" + flags: + - "features.captionfile_sidecar_not_found_fast_path" + metrics: + - "web_random_image_ms" + - "web_analyze_ms" + - "dropbox_sidecar_read_errors" + + diff --git a/publisher_v2/src/publisher_v2/services/storage.py b/publisher_v2/src/publisher_v2/services/storage.py index 58cb54e..e63b957 100644 --- a/publisher_v2/src/publisher_v2/services/storage.py +++ b/publisher_v2/src/publisher_v2/services/storage.py @@ -3,11 +3,11 @@ import asyncio import os from pathlib import Path -from typing import List, Tuple +from typing import List, Tuple, Optional import dropbox from dropbox.exceptions import ApiError -from tenacity import retry, stop_after_attempt, wait_exponential +from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception from publisher_v2.config.schema import DropboxConfig from publisher_v2.core.exceptions import StorageError @@ -48,6 +48,54 @@ def _upload() -> None: except ApiError as exc: raise StorageError(f"Failed to upload sidecar for {filename}: {exc}") from exc + @staticmethod + def _is_sidecar_not_found_error(exc: ApiError) -> bool: + """ + Return True when the given ApiError represents a "file not found" condition + for a path-based operation (e.g., sidecar download). + """ + + error = getattr(exc, "error", None) + if error is None: + return False + # Dropbox SDK models path errors with is_path()/get_path(), and the + # nested object exposes is_not_found() when the file is missing. + if hasattr(error, "is_path") and error.is_path(): + path_error = error.get_path() + if hasattr(path_error, "is_not_found") and path_error.is_not_found(): + return True + return False + + @retry( + reraise=True, + stop=stop_after_attempt(3), + wait=wait_exponential(multiplier=1, min=1, max=8), + retry=retry_if_exception(lambda exc: isinstance(exc, ApiError) and not DropboxStorage._is_sidecar_not_found_error(exc)), # type: ignore[arg-type] + ) + async def download_sidecar_if_exists(self, folder: str, filename: str) -> Optional[bytes]: + """ + Download the .txt sidecar for the given image if it exists. + + Returns the sidecar bytes on success, or None when Dropbox reports that + the sidecar file does not exist. Transient errors remain subject to + tenacity retries and ultimately surface as ApiError/StorageError. + """ + + try: + def _download() -> bytes: + stem = os.path.splitext(filename)[0] + sidecar_name = f"{stem}.txt" + path = os.path.join(folder, sidecar_name) + _, response = self.client.files_download(path) + return response.content + + return await asyncio.to_thread(_download) + except ApiError as exc: + if self._is_sidecar_not_found_error(exc): + # Fast-path for "not found" – treat as normal cache miss instead of error. + return None + raise StorageError(f"Failed to download sidecar for {filename}: {exc}") from exc + @retry( reraise=True, stop=stop_after_attempt(3), diff --git a/publisher_v2/src/publisher_v2/web/service.py b/publisher_v2/src/publisher_v2/web/service.py index c3e00db..5958fac 100644 --- a/publisher_v2/src/publisher_v2/web/service.py +++ b/publisher_v2/src/publisher_v2/web/service.py @@ -92,12 +92,13 @@ async def get_random_image(self) -> ImageResponse: selected = images[0] folder = self.config.dropbox.image_folder - sidecar_name = os.path.splitext(selected)[0] + ".txt" # Fetch temporary link, sidecar (if any), and image bytes for hash in parallel. + # Sidecar reads use a dedicated helper that treats "not found" as a fast, + # non-error outcome to avoid multi-second retries for missing sidecars. temp_link_result, sidecar_result, image_result = await asyncio.gather( self.storage.get_temporary_link(folder, selected), - self.storage.download_image(folder, sidecar_name), + self.storage.download_sidecar_if_exists(folder, selected), self.storage.download_image(folder, selected), return_exceptions=True, ) @@ -146,11 +147,9 @@ async def analyze_and_caption( # Sidecar-first cache path when not forcing refresh. if not force_refresh: - sidecar_name = os.path.splitext(filename)[0] + ".txt" - try: - blob = await self.storage.download_image(self.config.dropbox.image_folder, sidecar_name) - except Exception: - blob = b"" + blob = await self.storage.download_sidecar_if_exists( + self.config.dropbox.image_folder, filename + ) if blob: text = blob.decode("utf-8", errors="ignore") view = rehydrate_sidecar_view(text) diff --git a/publisher_v2/tests/test_dropbox_sidecar.py b/publisher_v2/tests/test_dropbox_sidecar.py index e7793ac..d449262 100644 --- a/publisher_v2/tests/test_dropbox_sidecar.py +++ b/publisher_v2/tests/test_dropbox_sidecar.py @@ -1,9 +1,10 @@ from __future__ import annotations from types import SimpleNamespace -from typing import Any, List +from typing import Any, List, Optional import pytest +from dropbox.exceptions import ApiError from publisher_v2.config.schema import DropboxConfig from publisher_v2.services.storage import DropboxStorage @@ -12,10 +13,31 @@ class DummyClient: def __init__(self) -> None: self.uploads: List[tuple[str, bytes, Any]] = [] + self.sidecar_bytes: Optional[bytes] = None + self.sidecar_exists: bool = True def files_upload(self, data: bytes, path: str, mode: Any, mute: bool = False, strict_conflict: bool = False) -> None: self.uploads.append((path, data, mode)) + def files_download(self, path: str) -> tuple[None, SimpleNamespace]: + # Simple emulation: only support sidecar path, and respect sidecar_exists flag. + if not self.sidecar_exists: + # Build a minimal ApiError with a .error exposing is_path()/get_path().is_not_found() + class _PathError: + def is_not_found(self) -> bool: + return True + + class _Error: + def is_path(self) -> bool: + return True + + def get_path(self) -> _PathError: + return _PathError() + + raise ApiError("request-id", _Error(), "not_found", "en-US") + content = self.sidecar_bytes or b"sidecar-bytes" + return None, SimpleNamespace(content=content) + @pytest.mark.asyncio async def test_sidecar_upload_overwrite(monkeypatch: pytest.MonkeyPatch) -> None: @@ -37,3 +59,40 @@ async def test_sidecar_upload_overwrite(monkeypatch: pytest.MonkeyPatch) -> None assert path == "/ImagesToday/image.txt" assert data.decode("utf-8") == "sd caption" + +@pytest.mark.asyncio +async def test_download_sidecar_if_exists_returns_bytes(monkeypatch: pytest.MonkeyPatch) -> None: + cfg = DropboxConfig( + app_key="key", + app_secret="secret", + refresh_token="refresh", + image_folder="/ImagesToday", + archive_folder="archive", + ) + storage = DropboxStorage(cfg) + dummy = DummyClient() + dummy.sidecar_bytes = b"hello-sidecar" + dummy.sidecar_exists = True + monkeypatch.setattr(storage, "client", dummy) + + blob = await storage.download_sidecar_if_exists("/ImagesToday", "image.jpg") + assert blob == b"hello-sidecar" + + +@pytest.mark.asyncio +async def test_download_sidecar_if_exists_returns_none_on_not_found(monkeypatch: pytest.MonkeyPatch) -> None: + cfg = DropboxConfig( + app_key="key", + app_secret="secret", + refresh_token="refresh", + image_folder="/ImagesToday", + archive_folder="archive", + ) + storage = DropboxStorage(cfg) + dummy = DummyClient() + dummy.sidecar_exists = False + monkeypatch.setattr(storage, "client", dummy) + + blob = await storage.download_sidecar_if_exists("/ImagesToday", "image.jpg") + assert blob is None + diff --git a/publisher_v2/tests/web/test_web_service.py b/publisher_v2/tests/web/test_web_service.py index 8f75e00..cf9ae5e 100644 --- a/publisher_v2/tests/web/test_web_service.py +++ b/publisher_v2/tests/web/test_web_service.py @@ -29,14 +29,16 @@ async def get_temporary_link(self, folder: str, filename: str) -> str: return f"https://example.com/{filename}" async def download_image(self, folder: str, filename: str) -> bytes: - # crude branching: return sidecar for .txt when present; image bytes otherwise - if filename.endswith(".txt"): - if self.sidecar_content is not None: - return self.sidecar_content - # Simulate missing sidecar - raise FileNotFoundError(filename) + # crude branching: return image bytes only; sidecars are handled via + # download_sidecar_if_exists for this change. return b"image-bytes" + async def download_sidecar_if_exists(self, folder: str, filename: str) -> bytes | None: + # Derive sidecar from image name and return content when present. + if self.sidecar_content is not None: + return self.sidecar_content + return None + async def get_file_metadata(self, folder: str, filename: str) -> Dict[str, str]: return {"id": "file-id", "rev": "file-rev"}