cleanup: remove Cogfy enrichment from pipeline #92

Merged
nitaibezerra merged 2 commits into main from cleanup/remove-cogfy-enrichment
Feb 27, 2026

@nitaibezerra
Contributor

Summary

Removes the Cogfy integration for news enrichment. LLM enrichment is now performed by the enrich_news_llm DAG in the data-science repo, using AWS Bedrock (Claude 3 Haiku) via Cloud Composer.

Removed:

  • upload-to-cogfy and enrich-themes jobs from the main workflow (including the 20-minute wait)
  • src/data_platform/cogfy/ module (~2,100 lines): cogfy_manager, upload_manager, enrichment_manager
  • upload-cogfy and enrich CLI commands
  • algoliasearch dependency
  • Reference to COGFY_API_KEY in the integration tests workflow

Simplified pipeline:

setup-dates → typesense-sync → pipeline-summary

New enrichment flow (Airflow):

enrich_news_llm DAG (every 10 min)
  → Reads news without a theme from PostgreSQL
  → Classifies via Claude 3 Haiku (Bedrock)
  → Writes themes + summary to PostgreSQL
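
For reviewers unfamiliar with the DAG, the read → classify → write loop can be sketched roughly as below. This is only an illustration, not the code in data-science#15: `NewsRow`, `enrich_pending`, and `fake_classify` are hypothetical names, the rows live in memory instead of PostgreSQL, and the Bedrock call is replaced by a local stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class NewsRow:
    # Hypothetical row shape; the real PostgreSQL schema is not shown in this PR.
    id: int
    text: str
    theme: Optional[str] = None
    summary: Optional[str] = None


def enrich_pending(
    rows: List[NewsRow],
    classify: Callable[[str], Tuple[str, str]],
) -> int:
    """Classify every row that has no theme yet; return how many were updated."""
    updated = 0
    for row in rows:
        if row.theme is not None:
            continue  # already enriched, nothing to do
        row.theme, row.summary = classify(row.text)
        updated += 1
    return updated


def fake_classify(text: str) -> Tuple[str, str]:
    # Stand-in for the Claude 3 Haiku call on Bedrock; no network access here.
    theme = "health" if "hospital" in text.lower() else "other"
    return theme, text[:40]
```

Because rows that already have a theme are skipped, running the loop every 10 minutes should only touch newly ingested news.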

Context

  • Issue: data-platform#56
  • Infra PR: destaquesgovbr/infra#89 (merged)
  • DAG deploy: data-science#15 (merged)

Test plan

  • Main workflow runs without errors (typesense-sync only)
  • No remaining references to cogfy in the source code
  • poetry install succeeds without algoliasearch
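
One possible way to check the second item locally is a small scan like the one below. It is a sketch, not part of this PR: `find_references` is a hypothetical helper, and the list of file suffixes is an assumption about which files count as source code.

```python
from pathlib import Path


def find_references(
    root,
    needle="cogfy",
    suffixes=(".py", ".yml", ".yaml", ".toml", ".md", ".sql"),
):
    """Return (path, line number) pairs for lines containing `needle`, case-insensitively."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue  # skip directories and file types outside the assumed list
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((str(path), lineno))
    return hits
```

Calling `find_references("src")` from the repo root should return an empty list once the cleanup is complete.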

LLM enrichment is now handled by the enrich_news_llm Airflow DAG
in the data-science repo (AWS Bedrock, Claude 3 Haiku).

- Remove upload-to-cogfy and enrich-themes jobs from main workflow
- Remove cogfy/ module (cogfy_manager, upload_manager, enrichment_manager)
- Remove upload-cogfy and enrich CLI commands
- Remove algoliasearch dependency
- Remove COGFY_API_KEY from integration tests workflow
- Simplify pipeline: setup-dates → typesense-sync → summary
- Delete obsolete test_full_pipeline.py (was already skip'd)
- Update CLAUDE.md: remove cogfy/ from tree, remove COGFY env vars
- Update docs: Cogfy → AWS Bedrock/Airflow in architecture + README
- Update SQL schema comments: Cogfy → LLM enrichment
@nitaibezerra nitaibezerra merged commit 64ff113 into main Feb 27, 2026
1 check failed
@nitaibezerra nitaibezerra deleted the cleanup/remove-cogfy-enrichment branch February 27, 2026 23:43

2 participants