
feat: add StorageAdapter for Phase 4 Dual-Write #5

Merged
nitaibezerra merged 1 commit into main from feat/storage-adapter
Dec 25, 2025

@nitaibezerra
Contributor

Summary

Implements StorageAdapter as part of Phase 4 (Dual-Write) of the HuggingFace (HF) → PostgreSQL migration.

Features

  • Unified interface supporting multiple storage backends:
    • HUGGINGFACE: Write to HuggingFace only (legacy mode)
    • POSTGRES: Write to PostgreSQL only (target mode)
    • DUAL_WRITE: Write to both backends (migration phase)
  • Lazy loading of backend managers
  • Automatic agency/theme resolution using PostgresManager cache
  • Environment variable configuration (STORAGE_BACKEND, STORAGE_READ_FROM)
  • Partial failure handling in dual-write mode
  • Compatible with the DatasetManager interface for seamless scraper integration
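The write routing, lazy loading, and partial-failure handling listed above can be sketched roughly as follows. This is a hypothetical illustration, not the code in storage_adapter.py: the `StorageAdapterSketch` class, the `insert()` method name, the injected factory callables, and the failure policy (in dual-write mode, log a failing backend and continue, but raise if every backend fails) are all assumptions.

```python
from __future__ import annotations

import logging
from enum import Enum
from typing import Any, Callable, Protocol

logger = logging.getLogger(__name__)


class StorageBackend(str, Enum):
    """Write modes named in the PR description."""
    HUGGINGFACE = "huggingface"
    POSTGRES = "postgres"
    DUAL_WRITE = "dual_write"


class Writer(Protocol):
    # Hypothetical minimal write interface; the real managers' methods may differ.
    def insert(self, records: list[dict[str, Any]]) -> None: ...


class StorageAdapterSketch:
    def __init__(
        self,
        backend: StorageBackend,
        hf_factory: Callable[[], Writer],
        pg_factory: Callable[[], Writer],
    ) -> None:
        self.backend = backend
        self._factories = {"huggingface": hf_factory, "postgres": pg_factory}
        self._managers: dict[str, Writer] = {}  # filled lazily on first use

    def _manager(self, name: str) -> Writer:
        # Lazy loading: a backend manager is only constructed when first needed.
        if name not in self._managers:
            self._managers[name] = self._factories[name]()
        return self._managers[name]

    def _targets(self) -> list[str]:
        if self.backend is StorageBackend.DUAL_WRITE:
            return ["huggingface", "postgres"]
        return [self.backend.value]

    def insert(self, records: list[dict[str, Any]]) -> dict[str, bool]:
        """Write records to every configured backend; return per-backend success."""
        results: dict[str, bool] = {}
        for name in self._targets():
            try:
                self._manager(name).insert(records)
                results[name] = True
            except Exception:
                if self.backend is not StorageBackend.DUAL_WRITE:
                    raise  # single-backend mode: surface the error unchanged
                logger.exception("dual-write: %s write failed", name)
                results[name] = False
        if not any(results.values()):
            raise RuntimeError("dual-write failed on every backend")
        return results
```

Building managers through factories means the HuggingFace side is never touched when only PostgreSQL is configured, and the returned dict tells the caller which half of a dual write failed.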

Files

  • src/data_platform/managers/storage_adapter.py - Main adapter implementation
  • src/data_platform/managers/__init__.py - Updated exports
  • tests/unit/test_storage_adapter.py - 15 unit tests

Environment Variables

Variable           Description        Values
STORAGE_BACKEND    Write destination  huggingface, postgres, dual_write
STORAGE_READ_FROM  Read source        huggingface, postgres
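A minimal sketch of reading and validating these two variables. The `load_storage_config` function, the `StorageConfig` dataclass, and the `huggingface` defaults are assumptions for illustration; the PR does not say what happens when a variable is unset. Note that `dual_write` is only valid for writes, since reads come from a single backend.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

# Allowed values, taken from the table above.
WRITE_BACKENDS = frozenset({"huggingface", "postgres", "dual_write"})
READ_SOURCES = frozenset({"huggingface", "postgres"})


@dataclass(frozen=True)
class StorageConfig:
    backend: str
    read_from: str


def load_storage_config(
    env: Mapping[str, str] | None = None,
    default_backend: str = "huggingface",    # assumed default (legacy mode)
    default_read_from: str = "huggingface",  # assumed default
) -> StorageConfig:
    """Read STORAGE_BACKEND / STORAGE_READ_FROM and reject unknown values early."""
    env = os.environ if env is None else env
    backend = env.get("STORAGE_BACKEND", default_backend).strip().lower()
    read_from = env.get("STORAGE_READ_FROM", default_read_from).strip().lower()
    if backend not in WRITE_BACKENDS:
        raise ValueError(
            f"STORAGE_BACKEND must be one of {sorted(WRITE_BACKENDS)}, got {backend!r}"
        )
    if read_from not in READ_SOURCES:
        raise ValueError(
            f"STORAGE_READ_FROM must be one of {sorted(READ_SOURCES)}, got {read_from!r}"
        )
    return StorageConfig(backend=backend, read_from=read_from)
```

Accepting an optional mapping instead of always reading `os.environ` keeps the function easy to unit-test.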

Test plan

  • All 15 unit tests pass
  • Integration test with scraper workflow (next step)

Related

Part of Phase 4 of the migration plan. Next steps:

  • Update scraper workflow to use StorageAdapter
  • Run dual-write mode for 5+ days
  • Validate consistency between backends
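One way to approach the consistency check is to compare the sets of record identifiers each backend holds. A minimal sketch, assuming each record carries a stable string ID (hypothetical here) and that the IDs have already been fetched from both stores; `ConsistencyReport` and `compare_record_ids` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ConsistencyReport:
    only_in_huggingface: frozenset[str]
    only_in_postgres: frozenset[str]

    @property
    def consistent(self) -> bool:
        return not self.only_in_huggingface and not self.only_in_postgres


def compare_record_ids(hf_ids: Iterable[str], pg_ids: Iterable[str]) -> ConsistencyReport:
    """Report IDs present in one backend but missing from the other."""
    hf, pg = frozenset(hf_ids), frozenset(pg_ids)
    return ConsistencyReport(only_in_huggingface=hf - pg, only_in_postgres=pg - hf)
```

Comparing IDs only catches missing or extra rows; detecting field-level differences would need a per-record comparison on top of this.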

Implements StorageAdapter supporting multiple storage backends:
- HUGGINGFACE: Write to HuggingFace only (legacy)
- POSTGRES: Write to PostgreSQL only (target)
- DUAL_WRITE: Write to both backends (migration phase)

Features:
- Unified interface compatible with DatasetManager
- Lazy loading of backend managers
- Automatic agency/theme resolution using PostgresManager cache
- Environment variable configuration (STORAGE_BACKEND, STORAGE_READ_FROM)
- Partial failure handling in dual-write mode

Also includes comprehensive unit tests with 15 test cases.
@nitaibezerra nitaibezerra merged commit 2de92d0 into main Dec 25, 2025
@nitaibezerra nitaibezerra deleted the feat/storage-adapter branch December 25, 2025 14:36
mauriciomendonca pushed a commit that referenced this pull request Mar 20, 2026
Fixes based on the code review:

- #4 (HIGH): Remove the explicit BEGIN/COMMIT from rollback SQL 005,
  which broke the runner's atomic commit (a premature COMMIT before
  the record was written to migration_history)

- #1 (MEDIUM): Remove the unnecessary server-side cursor in
  006_migrate_unique_ids.py; fetchall() loads everything into memory
  anyway, so name= only adds semantic confusion

- #9 (MEDIUM): Sanitize target_version with the regex ^[0-9]{3}$ in
  the workflow and replace eval $CMD with direct execution via a bash
  array to prevent command injection

- #5 (LOW): Extract _execute_with_history() in migrate.py to
  eliminate ~60% of the duplication between execute_migration() and
  execute_rollback(), reducing the risk of future divergence
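Items #4 and #5 both concern keeping a migration and its migration_history record in one transaction. The sketch below illustrates that pattern with Python's built-in sqlite3 module standing in for PostgreSQL; the signatures, the list-of-statements input, and the `migration_history (version, action)` columns are assumptions, not the actual migrate.py code.

```python
import sqlite3
from typing import Sequence


def _execute_with_history(
    conn: sqlite3.Connection, version: str, action: str, statements: Sequence[str]
) -> None:
    """Run migration statements and record them in migration_history atomically.

    conn must be opened with isolation_level=None so the explicit BEGIN/COMMIT
    here are the only transaction boundaries. A COMMIT inside `statements`
    would end the transaction before the history insert (the #4 problem).
    """
    conn.execute("BEGIN")
    try:
        for stmt in statements:
            conn.execute(stmt)
        conn.execute(
            "INSERT INTO migration_history (version, action) VALUES (?, ?)",
            (version, action),
        )
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:  # SQLite may already have rolled back on some errors
            conn.execute("ROLLBACK")
        raise


# Both public entry points share one code path, so they cannot drift apart.
def execute_migration(conn: sqlite3.Connection, version: str, statements: Sequence[str]) -> None:
    _execute_with_history(conn, version, "apply", statements)


def execute_rollback(conn: sqlite3.Connection, version: str, statements: Sequence[str]) -> None:
    _execute_with_history(conn, version, "rollback", statements)


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE migration_history (version TEXT, action TEXT)")
execute_migration(conn, "005", ["CREATE TABLE news (id INTEGER PRIMARY KEY)"])
```

If any statement fails, both the schema change and the history row are rolled back together, so the history table never claims a migration that did not fully apply.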

All 44 tests still pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
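The #9 fix lives in a shell workflow, but its two ideas (a whole-string allow-list check on the version, and passing arguments as a list rather than a string that gets re-parsed) carry over directly to Python. The sketch below is a Python analogue, not the workflow code; `build_rollback_argv` and the command line it returns (`migrate.py rollback --target`) are hypothetical.

```python
from __future__ import annotations

import re

# Exactly three ASCII digits, e.g. "005". fullmatch() is used because in Python
# re.match(r"^[0-9]{3}$", "005\n") also succeeds: "$" matches before a final newline.
_VERSION_RE = re.compile(r"[0-9]{3}")


def build_rollback_argv(target_version: str) -> list[str]:
    """Validate target_version and return an argv list (hypothetical command)."""
    if not _VERSION_RE.fullmatch(target_version):
        raise ValueError(f"invalid target_version: {target_version!r}")
    # Each list element reaches the program as exactly one argument; nothing is
    # re-parsed by a shell, unlike `eval $CMD`.
    return ["python", "migrate.py", "rollback", "--target", target_version]
```

The list can then be passed to `subprocess.run(argv, check=True)` without `shell=True`, which is the Python counterpart of running a bash array as `"${CMD[@]}"`.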