Skip to content

Conversation

@mart-r
Copy link
Collaborator

@mart-r mart-r commented Jun 25, 2025

First did some prerequisite work.

This involves loading the config off disk when appropriate.
It also involves making an assumption regarding the type of base tokenizer (spacy) with a massive warning.

Also added a few small tests.

@mart-r mart-r merged commit 01b9c85 into main Jun 26, 2025
10 checks passed
@mart-r mart-r deleted the CU-8699hhrcy-add-meta-cat-load-method-to-cat branch June 26, 2025 08:46
parsa-hemmati added a commit to parsa-hemmati/cogstack-nlp that referenced this pull request Nov 20, 2025
[Agent-generated code - Debugging session]

Changes:
- Added psycopg2-binary==2.9.10 to requirements.txt (alembic needs sync driver)
- Modified alembic/env.py to convert asyncpg URLs to psycopg2
- Created app/db/base_class.py - settings-free Base class for migrations
- Modified app/db/base.py to lazy-load settings (avoid import during migrations)
- Fixed Base imports in 5 model files (user, audit_log, document, extracted_entity, patient)

Rationale:
- Root Cause CogStack#1: Alembic requires psycopg2 (sync driver), but FastAPI uses asyncpg (async)
- Root Cause CogStack#2: env.py was using asyncpg URL directly without conversion
- Root Cause CogStack#3: Settings imported at module level caused CORS_ORIGINS parsing error during migrations
- Root Cause CogStack#4: Models importing from app.db.base triggered settings initialization
- Root Cause CogStack#5: Some models imported from non-existent app.core.database module

Tests:
- Alembic can now load env.py without errors
- psycopg2 can connect to database successfully
- Models can import Base without triggering settings parsing
- Migrations still not applying (requires further investigation)

CONTEXT.md Updates:
- Updated "Alembic Debugging" section with 5 root causes and fixes
- Documented files modified for alembic compatibility
- Noted remaining issue: migrations not executing despite fixes
- Technical debt: may need manual schema initialization or alternative migration strategy

Technical Debt:
- Migrations silently not executing (needs transaction handling investigation)
- May need to verify alembic context configuration
- Consider alternative: manual schema creation script if alembic remains problematic

AI Context:
- Extensive debugging session (2+ hours) identifying 5 distinct alembic issues
- Fixed import chain: env.py → base_class.py → models (no settings dependency)
- Verified psycopg2 connectivity works, URL conversion works, imports work
- Session: 2025-11-18
parsa-hemmati added a commit to parsa-hemmati/cogstack-nlp that referenced this pull request Nov 20, 2025
Changes:
- NHS number masking: Normalize to digits-only before masking (handles spaces, dashes, any format)
- Pydantic validation: Added enums for filters (NegationFilter, TemporalityFilter, ExperiencerFilter, CertaintyFilter) and sort_by (SortByOption)
- Audit logging: Wrapped in try/except to prevent search failures when audit logging fails
- Performance: Added Certainty to composite index (migration 007) for 4-field filtering

Bugs Fixed:
1. NHS masking failed on "123 456 7890" format (UK standard with spaces)
2. No validation on filter values - any string accepted
3. No validation on sort_by - unknown values silently ignored
4. Audit logging failure aborted search requests (500 error)
5. Certainty filtering not covered by composite index (performance degradation)

Rationale:
- Bug CogStack#1 (NHS masking): Privacy risk - malformed inputs could leak more digits
- Bug CogStack#2/CogStack#3 (validation): Security - unvalidated input, though safe from SQL injection
- Bug CogStack#4 (audit logging): Reliability - HIPAA logging must not break core functionality
- Bug CogStack#5 (index): Performance - Certainty filtering would trigger full table scan

Tests:
- Backend health: ✅ PASSING
- Migration 007: ✅ APPLIED (alembic version 007)
- Index verified: ✅ ix_extracted_entities_cui_meta_anns_with_certainty exists
- Enum validation: ✅ Pydantic will reject invalid values

CONTEXT.md Updates:
- Added "Bug Fixes: Patient Search Security & Performance" entry in Recent Changes
- Documented all 5 bugs fixed with before/after examples
- Included impact assessment (security, validation, reliability, performance)
- Noted bugs were user-reported from security review
- Status: All bugs fixed with no technical debt

Security Impact:
- NHS masking now secure against all input formats
- Filter/sort values validated at schema level
- Audit logging failures logged but don't disrupt service

Performance Impact:
- Certainty filtering now indexed (expected <50ms queries)
- All 4 meta-annotation filters now covered by composite index

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants