Open
Labels: bug, good first issue
Description
Hi team,
Thanks for the great work on biolearn. The modular model interface and GeoData
integration make it really convenient to work with methylation-based predictors.
While using multiple models in a batch prediction workflow, I noticed that the model.predict()
output format varies depending on the model:
- Most models return a DataFrame where:
  - Sample IDs are stored as the row index (pandas `.index`)
  - The predicted value is stored in a single "Predicted" column
- Some models (e.g. Zhang_10) return sample IDs as a regular "id" or "SampleID" column, rather than as the DataFrame index
- Models like `TwelveCellDeconvoluteBloodEPIC` return transposed matrices:
  - Rows = cell types
  - Columns = sample IDs (requiring a `.T` to match other models)
To ensure downstream code (e.g. batch pipelines, `pd.concat`, or tidy reshaping) works reliably, would it be possible to standardize the `.predict()` output for all models?
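
For reference, here is a rough sketch of the kind of normalization I currently have to do on my side. It is only an illustration: `normalize_prediction` is a hypothetical helper, not part of biolearn, and it assumes the three output shapes described above plus a known list of sample IDs (used to detect the transposed case).

```python
import pandas as pd

# Column names that were observed to hold sample IDs (taken from this issue).
ID_COLUMNS = ("id", "SampleID")


def normalize_prediction(result, sample_ids):
    """Hypothetical helper: return a DataFrame indexed by sample ID.

    Assumes `result` has one of the three shapes described in this issue.
    """
    df = result.copy()

    # Shape 2: sample IDs live in a regular column, so move them to the index.
    for col in ID_COLUMNS:
        if col in df.columns:
            df = df.set_index(col)
            break

    # Shape 3: samples are on the columns (e.g. rows = cell types), so transpose.
    samples = set(sample_ids)
    if not samples.intersection(df.index) and samples.intersection(df.columns):
        df = df.T

    df.index.name = "sample_id"
    return df


# Toy inputs mimicking the three shapes (values are made up for illustration).
ids = ["s1", "s2"]
indexed = pd.DataFrame({"Predicted": [1.0, 2.0]}, index=ids)
id_column = pd.DataFrame({"SampleID": ids, "Predicted": [3.0, 4.0]})
transposed = pd.DataFrame({"s1": [0.1, 0.9], "s2": [0.2, 0.8]}, index=["CD4T", "NK"])

normalized = [normalize_prediction(r, ids) for r in (indexed, id_column, transposed)]
```

With every result indexed by sample ID, something like `pd.concat(normalized, axis=1, keys=[...])` should work without per-model special cases, but it would be much nicer if `.predict()` returned this shape directly.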
Really appreciate all the effort you've put into this framework!