
Suggestion: Standardize model.predict() output format across all models #161

@tnyamaguchi

Description


Hi team,

Thanks for the great work on biolearn. The modular model interface and GeoData integration make it really convenient to work with methylation-based predictors.

While using multiple models in a batch prediction workflow, I noticed that the model.predict() output format varies depending on the model:

  1. Most models return a DataFrame where:
     • Sample IDs are stored as the row index (pandas .index)
     • The predicted value is stored in a single "Predicted" column
  2. Some models (e.g. Zhang_10) return sample IDs as a regular "id" or "SampleID" column, rather than as the DataFrame index
  3. Models like TwelveCellDeconvoluteBloodEPIC return transposed matrices:
     • Rows = cell types
     • Columns = sample IDs (requiring a .T to match other models)
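Until the outputs are standardized, a caller can coerce the three layouts above into one. The sketch below is a hypothetical helper, not part of biolearn. It assumes the ID column is named "id" or "SampleID" (as described above) and that the caller knows the input sample IDs, which is how it detects the transposed case. The DataFrames are toy stand-ins with made-up values, not real model output.

```python
import pandas as pd

# Column names some models use for sample IDs (assumption based on the cases above).
ID_COLUMNS = ("id", "SampleID")


def normalize_prediction(result: pd.DataFrame, sample_ids=None) -> pd.DataFrame:
    """Hypothetical helper: return one row per sample with sample IDs as the index."""
    out = result.copy()

    # Case 2: sample IDs live in a regular column, so move that column into the index.
    for col in ID_COLUMNS:
        if col in out.columns:
            out = out.set_index(col)
            break

    # Case 3: samples are columns rather than rows, so transpose.
    # Detection relies on the caller passing the known sample IDs.
    if sample_ids is not None:
        ids = set(sample_ids)
        if ids.issubset(out.columns) and not ids.issubset(out.index):
            out = out.T

    # Clear the index name so frames from different models share an identical layout.
    out.index = out.index.rename(None)
    return out


# Toy stand-ins for the three layouts (values are made up).
samples = ["S1", "S2"]
as_index = pd.DataFrame({"Predicted": [40.1, 52.3]}, index=samples)
as_column = pd.DataFrame({"SampleID": samples, "Predicted": [1.2, 0.8]})
transposed = pd.DataFrame({"S1": [0.6, 0.4], "S2": [0.5, 0.5]}, index=["CD4T", "Neu"])

normalized = [
    normalize_prediction(as_index, samples),
    normalize_prediction(as_column, samples),
    normalize_prediction(transposed, samples),
]
```

Passing the sample IDs explicitly is a deliberate choice: guessing orientation from the shape alone would be ambiguous whenever the number of samples equals the number of output columns.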

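To ensure downstream code (e.g. batch pipelines, pd.concat, or tidy reshaping) works reliably, would it be possible to standardize the .predict() output across all models, for example by always returning one row per sample with sample IDs as the row index, as most models already do?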
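One way to pin down "standardized" is a small contract that every model's output could be checked against, for example in a test that loops over all models. The sketch below assumes the layout most models already use (one row per sample, sample IDs as the index, numeric output columns); the function name and its rules are hypothetical, and the frames hold made-up toy values. Once outputs share that layout, pd.concat can combine them column-wise without per-model fixes.

```python
import pandas as pd


def conforms_to_standard(result: pd.DataFrame, sample_ids) -> bool:
    """Hypothetical output contract for predict(): one row per input sample,
    sample IDs as a unique index, no leftover ID columns, numeric outputs only."""
    return (
        result.index.is_unique
        and set(result.index) == set(sample_ids)
        and not {"id", "SampleID"} & set(result.columns)
        and all(pd.api.types.is_numeric_dtype(dtype) for dtype in result.dtypes)
    )


# Toy outputs in the proposed layout (values are made up).
samples = ["S1", "S2"]
clock = pd.DataFrame({"Predicted": [40.1, 52.3]}, index=samples)
cells = pd.DataFrame({"CD4T": [0.6, 0.5], "Neu": [0.4, 0.5]}, index=samples)

# With a shared layout, results align on the sample index; the dict keys
# become the top level of a column MultiIndex.
combined = pd.concat({"Zhang_10": clock, "TwelveCellDeconvoluteBloodEPIC": cells}, axis=1)
```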

Really appreciate all the effort you've put into this framework!

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working), good first issue (Good for newcomers)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
