Speed up COCOA with matrix operations #31

Open
j-lawson opened this issue Jan 5, 2021 · 0 comments


j-lawson commented Jan 5, 2021

Use matrix operations to calculate COCOA scores for data with discrete regions. For ATAC-seq data, the peak regions can be used. For DNA methylation data, it is probably best to use a tiled genome or create another segmentation, rather than having one row per CpG in the data matrix.
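As a rough illustration of the idea (not COCOA's actual implementation), the sketch below scores several region sets at once with a single matrix product. It assumes one feature score per discrete region (for example, one per ATAC-seq peak), a 0/1 region set overlap matrix with one row per region and one column per region set, and a score defined as the mean absolute feature score of the overlapping regions. The names `matmul`, `score_region_sets`, `feature_scores`, and `overlap` are hypothetical.

```python
# Hypothetical sketch: function names, variable names, and the
# mean-of-absolute-values score are assumptions, not COCOA's API.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def score_region_sets(feature_scores, overlap):
    """Mean absolute feature score per region set, via one matrix product.

    feature_scores: one score per region (length n_regions).
    overlap: n_regions x n_sets matrix, 1 if the region overlaps the set.
    """
    abs_row = [[abs(s) for s in feature_scores]]   # 1 x n_regions
    totals = matmul(abs_row, overlap)[0]           # summed scores per set
    counts = [sum(col) for col in zip(*overlap)]   # regions per set
    # A set with no overlapping regions gets NaN instead of a division error.
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


feature_scores = [0.5, -1.0, 0.25, 2.0]
overlap = [
    [1, 0],  # region 0 overlaps set A only
    [1, 1],  # region 1 overlaps both sets
    [0, 1],  # region 2 overlaps set B only
    [0, 0],  # region 3 overlaps neither set
]
scores = score_region_sets(feature_scores, overlap)
print(scores)  # [0.75, 0.625]
```

Plain Python lists keep the sketch dependency-free; in practice a dense or sparse matrix library would do the product.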

[x] Decide whether to chunk the matrix by region (e.g. chr1 for all samples) or by sample (e.g. the whole genome for a subgroup of samples). Calculating COCOA in chunks might make tiling unnecessary for DNA methylation data: each CpG could stay as its own row in the region set overlap matrix without the matrix reaching an infeasible size.

[x] Make a region set overlap matrix that supports more complex calculations, such as scoring by percent overlap for multibase data or, for single-base data, averaging CpGs within each region before averaging across all regions.

[x] Input checks for the correct format for functions involved in matrix scoring.

[x] Unit tests for creation of region set overlap matrix and scoring.

[ ] Update documentation.

[x] Update vignettes.

[x] Decide what output of "runCOCOA" will be.
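The chunking and percent-overlap items above can be sketched together. The example below is a minimal illustration under stated assumptions, not the package's method: rows of the overlap matrix are processed in chunks of regions, each entry is a fractional overlap weight between 0 and 1, and the score is a weighted mean of absolute feature scores. Because only running totals are kept, the full matrix never has to be held in memory at once. The name `chunked_set_scores` and its parameters are hypothetical.

```python
# Hypothetical sketch: names and the weighted-mean score are assumptions.

def chunked_set_scores(feature_scores, overlap, chunk_size):
    """Weighted mean absolute score per region set, computed chunk by chunk.

    feature_scores: one score per region (or per CpG).
    overlap: n_regions x n_sets matrix of overlap weights in [0, 1].
    chunk_size: number of rows processed per chunk.
    """
    n_sets = len(overlap[0])
    totals = [0.0] * n_sets    # running weighted sums of |score|
    weights = [0.0] * n_sets   # running sums of overlap weights
    for start in range(0, len(overlap), chunk_size):
        # In a real workflow this slice could be read from disk on demand.
        rows = overlap[start:start + chunk_size]
        values = feature_scores[start:start + chunk_size]
        for value, row in zip(values, rows):
            for j, weight in enumerate(row):
                totals[j] += abs(value) * weight
                weights[j] += weight
    return [t / w if w else float("nan") for t, w in zip(totals, weights)]


feature_scores = [0.5, -1.0, 0.25, 2.0, 0.75]
overlap = [
    [1.0, 0.0],
    [0.5, 1.0],
    [0.0, 0.5],
    [0.0, 0.0],
    [1.0, 0.25],
]
# The chunk size changes memory use, not the result.
full = chunked_set_scores(feature_scores, overlap, chunk_size=len(overlap))
small = chunked_set_scores(feature_scores, overlap, chunk_size=2)
print(full == small)
```

This sketch chunks by region (rows). Chunking by sample would instead split the data matrix by columns before the per-region feature scores are computed, which is the trade-off the first checklist item weighs.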
