
Preserve extra icd_df columns in similarity output #8

Merged
03bennej merged 1 commit into main from claude/preserve-icd-columns-dFk0a
Feb 13, 2026

@03bennej
Collaborator

Previously, Phecoder.__init__ dropped all columns from icd_df except icd_code and icd_string, so any additional columns the user provided (e.g. frequency, category, source) were lost. Now the full DataFrame is retained, and the extra columns are merged into the similarity.parquet output for both per-model runs and ensemble runs.
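
For reviewers, the merge described above can be pictured roughly as follows. This is a minimal sketch and not the code from this PR: the helper name `attach_extra_columns`, the `phecode` and `similarity` column names, and the choice of a left join keyed on icd_code are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical helper, not taken from the Phecoder source.
ESSENTIAL_COLS = ["icd_code", "icd_string"]


def attach_extra_columns(similarity: pd.DataFrame, icd_df: pd.DataFrame) -> pd.DataFrame:
    """Left-join user-supplied metadata columns onto similarity rows by icd_code."""
    # Anything beyond the two essential columns counts as user metadata.
    extra_cols = [
        c for c in icd_df.columns
        if c not in ESSENTIAL_COLS and c not in similarity.columns
    ]
    if not extra_cols:
        return similarity
    # One metadata row per code, so the join cannot multiply similarity rows.
    meta = icd_df[["icd_code", *extra_cols]].drop_duplicates(subset="icd_code")
    return similarity.merge(meta, on="icd_code", how="left", validate="many_to_one")


# Assumed example data; the real similarity.parquet schema may differ.
icd_df = pd.DataFrame({
    "icd_code": ["E11", "I10"],
    "icd_string": ["Type 2 diabetes mellitus", "Essential hypertension"],
    "frequency": [120, 340],
})
similarity = pd.DataFrame({
    "phecode": ["250.2", "250.2", "401.1"],
    "icd_code": ["E11", "I10", "I10"],
    "similarity": [0.91, 0.12, 0.88],
})
out = attach_extra_columns(similarity, icd_df)
```

A left join keeps every similarity row even when a code has no metadata, and `validate="many_to_one"` raises if icd_code is not unique on the metadata side.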

The ICD fingerprint (used for cache/skip logic) is still computed on only the two essential columns so that adding metadata columns does not invalidate embedding caches.
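
One way to get that behaviour is to hash only the two essential columns. The sketch below is an assumption about the approach, not the fingerprint function in this repository: the name `icd_fingerprint`, the use of SHA-256, and the CSV serialization are illustrative choices.

```python
import hashlib

import pandas as pd

# Hypothetical names; the real fingerprint logic may be implemented differently.
ESSENTIAL_COLS = ["icd_code", "icd_string"]


def icd_fingerprint(icd_df: pd.DataFrame) -> str:
    """Hash only the essential columns so metadata columns cannot change the result."""
    core = (
        icd_df[ESSENTIAL_COLS]
        .astype(str)
        .sort_values(ESSENTIAL_COLS)  # make the hash independent of row order
        .reset_index(drop=True)
    )
    payload = core.to_csv(index=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


base = pd.DataFrame({
    "icd_code": ["E11", "I10"],
    "icd_string": ["Type 2 diabetes mellitus", "Essential hypertension"],
})
with_metadata = base.assign(frequency=[120, 340], source=["ehr", "claims"])
edited = base.assign(icd_string=["Type 2 diabetes", "Essential hypertension"])

fp_base = icd_fingerprint(base)
fp_with_metadata = icd_fingerprint(with_metadata)
fp_edited = icd_fingerprint(edited)
```

Here `fp_base` and `fp_with_metadata` are identical, so a cache keyed on the fingerprint stays valid after metadata is added, while `fp_edited` differs because an icd_string changed.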

https://claude.ai/code/session_01L3hQVCXxdrWBSHoFLzmLyy

@03bennej 03bennej merged commit a07519c into main Feb 13, 2026
@03bennej 03bennej deleted the claude/preserve-icd-columns-dFk0a branch February 13, 2026 19:47