Preserve extra icd_df columns in similarity output by 03bennej · Pull Request #8 · DiseaseNeuroGenomics/phecoder

03bennej · 2026-02-13T19:46:53Z

Previously, Phecoder.__init__ dropped every icd_df column except icd_code and icd_string, so any additional columns the user provided (e.g. frequency, category, source) were lost. The full DataFrame is now retained, and the extra columns are merged into the similarity.parquet output for both per-model runs and ensemble runs.
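
The merge step can be pictured with a minimal pandas sketch. It assumes the similarity table carries an icd_code column; the helper name attach_icd_metadata, the ESSENTIAL_COLS constant and the phecode/similarity column names are hypothetical illustrations, not phecoder's actual code. The sketch stops at the DataFrame and does not write similarity.parquet.

```python
import pandas as pd

# Hypothetical: the two columns the PR describes as essential.
ESSENTIAL_COLS = ["icd_code", "icd_string"]


def attach_icd_metadata(similarity: pd.DataFrame, icd_df: pd.DataFrame) -> pd.DataFrame:
    """Left-join every non-essential icd_df column onto a similarity table.

    Hypothetical helper; assumes `similarity` carries an `icd_code` column.
    """
    # Columns supplied beyond the essential two (and not already present),
    # kept in the user's original order.
    extra_cols = [
        c for c in icd_df.columns
        if c not in ESSENTIAL_COLS and c not in similarity.columns
    ]
    if not extra_cols:
        return similarity
    # One metadata row per ICD code so the join cannot duplicate similarity rows.
    meta = icd_df[["icd_code", *extra_cols]].drop_duplicates(subset="icd_code")
    # A left join keeps every similarity row, even if a code has no metadata.
    return similarity.merge(meta, on="icd_code", how="left")


# Toy inputs for illustration only; the similarity column names are assumptions.
icd_df = pd.DataFrame({
    "icd_code": ["E11.9", "I10"],
    "icd_string": ["Type 2 diabetes mellitus without complications",
                   "Essential (primary) hypertension"],
    "frequency": [1200, 3400],
    "category": ["endocrine", "circulatory"],
})
similarity = pd.DataFrame({
    "phecode": ["250.2", "250.2", "401.1"],
    "icd_code": ["E11.9", "I10", "I10"],
    "similarity": [0.91, 0.12, 0.88],
})

out = attach_icd_metadata(similarity, icd_df)
print(out)
```

The same helper would apply unchanged to an ensemble table, provided it also has an icd_code column. If a user's icd_df repeats an ICD code, this sketch keeps the first row's metadata; how phecoder itself resolves such duplicates is not stated in the PR.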

The ICD fingerprint (used for cache/skip logic) is still computed on only the two essential columns so that adding metadata columns does not invalidate embedding caches.
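
A minimal sketch of a fingerprint restricted to the two essential columns follows. The hashing scheme (SHA-256 over length-prefixed UTF-8 fields) and the name icd_fingerprint are assumptions for illustration; the PR does not say how phecoder computes its fingerprint.

```python
import hashlib

import pandas as pd

# Hypothetical: only these columns feed the cache fingerprint.
ESSENTIAL_COLS = ["icd_code", "icd_string"]


def icd_fingerprint(icd_df: pd.DataFrame) -> str:
    """SHA-256 over icd_code and icd_string only (hypothetical helper)."""
    h = hashlib.sha256()
    rows = icd_df[ESSENTIAL_COLS].astype(str).itertuples(index=False, name=None)
    for row in rows:
        for field in row:
            data = field.encode("utf-8")
            # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()


base = pd.DataFrame({
    "icd_code": ["E11.9", "I10"],
    "icd_string": ["Type 2 diabetes mellitus without complications",
                   "Essential (primary) hypertension"],
})
# Adding metadata columns should not change the fingerprint...
with_meta = base.assign(frequency=[1200, 3400], source=["user", "user"])
# ...but editing an ICD description should.
edited = base.assign(icd_string=["Type 2 diabetes", "Essential (primary) hypertension"])

fp_base = icd_fingerprint(base)
fp_meta = icd_fingerprint(with_meta)
fp_edited = icd_fingerprint(edited)
print(fp_base == fp_meta, fp_base == fp_edited)
```

This sketch is sensitive to row order, so reordering icd_df would change the hash; whether phecoder sorts rows before fingerprinting is not stated.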

https://claude.ai/code/session_01L3hQVCXxdrWBSHoFLzmLyy

03bennej merged commit a07519c into main Feb 13, 2026

03bennej deleted the claude/preserve-icd-columns-dFk0a branch February 13, 2026 19:47

03bennej merged 1 commit into main from claude/preserve-icd-columns-dFk0a

03bennej commented Feb 13, 2026

2 participants