tidyCovariates is extremely slow and resource intensive for large data when using Andromeda >= 1.0.0 #308

Description

@schuemie

The new Andromeda makes most operations, including tidyCovariates(), much faster, but not when the covariate data is very large (in my case, 2 million subjects with > 160k covariates). This is caused by inefficiencies in how DuckDB handles the combination of filtering by covariate ID and normalization.
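
To make the pattern concrete, the combination looks roughly like this in dplyr/dbplyr terms. This is a sketch only, not the exact tidyCovariates() code; `idsToKeep` stands in for the vector of covariate IDs to retain, and the table layout assumed is the usual `covariateData$covariates` with `rowId`, `covariateId`, and `covariateValue` columns:

```r
# Sketch of the problematic pattern (illustration only).
# idsToKeep is a hypothetical vector of covariate IDs to retain.
library(dplyr)

covariateData$covariates %>%
  filter(covariateId %in% idsToKeep) %>%   # expands to a very large IN (...) list
  group_by(covariateId) %>%
  mutate(covariateValue = covariateValue /
           max(covariateValue, na.rm = TRUE)) %>%  # grouped mutate becomes a window function
  ungroup()
```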

I have created a fix that reduced the processing time for my data from more than 3 hours (after 3 hours it was only at 3%, so I stopped it) to about 3 minutes.

I will post a PR.
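
For illustration only (the PR contains the actual change, which may differ), the kind of restructuring that can avoid this bottleneck is to compute the small per-covariate pieces first and join them back in, rather than asking the database to filter and normalize in one large query. Same assumptions as above: the standard `covariateData$covariates` table and a hypothetical `idsToKeep` vector:

```r
# Illustration only, not necessarily what the PR does.
library(dplyr)

# Per-covariate maxima: one row per covariate, so small enough to collect()
# into R and restrict to the IDs we want to keep.
maxValues <- covariateData$covariates %>%
  group_by(covariateId) %>%
  summarise(maxValue = max(covariateValue, na.rm = TRUE)) %>%
  collect() %>%
  filter(covariateId %in% idsToKeep)

# Push the small table back into the database (copy = TRUE) and let a single
# join handle both the filtering and the normalization.
normalized <- covariateData$covariates %>%
  inner_join(maxValues, by = "covariateId", copy = TRUE) %>%
  mutate(covariateValue = covariateValue / maxValue) %>%
  select(rowId, covariateId, covariateValue)
```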
