CpG informative sites as index in PAMESdata #7

Closed
GMFranceschini opened this issue Jun 10, 2019 · 6 comments

Comments

@GMFranceschini

I noticed that the PAMES sites in the PAMESdata collection (450k sites) are stored as indexes rather than probe names. That might cause trouble with compute_purity() if those sites are used with a beta table whose rows are not exactly what the function expects (each index matching the proper probe).

For example, if a beta table is smaller or larger, the indexes will select the wrong sites. I hope this makes sense to you. What worries me most is that when this happens PAMES returns no error at all, yet the purity estimate is wrong because it is computed from the wrong sites.
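
To make the failure mode concrete, here is a minimal sketch in plain Python (not PAMES code). The probe IDs, beta values, and site positions are made up for illustration; the point is only that positional lookup keeps "working" on a table with a missing row, while lookup by name still finds the intended sites.

```python
# Hypothetical probe IDs and beta values; none of this comes from PAMESdata.
full_probes = ["cg01", "cg02", "cg03", "cg04", "cg05"]
full_betas = [0.10, 0.85, 0.20, 0.90, 0.15]

# Informative sites stored as 0-based row positions in the full table.
informative_idx = [1, 3]
informative_names = [full_probes[i] for i in informative_idx]

# A smaller table in which the row for "cg01" is missing.
small_probes = full_probes[1:]
small_betas = full_betas[1:]

# Positional lookup raises no error but lands on different probes.
by_index = [small_probes[i] for i in informative_idx]

# Lookup by probe name still selects the intended sites.
name_to_row = {probe: row for row, probe in enumerate(small_probes)}
by_name = [small_probes[name_to_row[p]] for p in informative_names]

print("intended:", informative_names)
print("by index:", by_index)
print("by name: ", by_name)
```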

Please let me know if I can help in any way. This is not urgent, but I think you might want to address it in the future.

@romagnolid
Collaborator

romagnolid commented Jun 12, 2019

You are right, maybe it's better to include both indexes and probe names, or probe names only. The latter option is a problem if a beta table has no probe names attached, but it would prevent the selection of wrong sites.
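
One way to picture the "both indexes and probe names" option is the sketch below, in plain Python rather than PAMES code. The function `select_sites` and all of its parameters are hypothetical: it uses probe names when the table carries them and falls back to positions only when it does not.

```python
def select_sites(betas, site_names, site_idx, row_names=None):
    """Pick informative rows from `betas` (hypothetical helper).

    If `row_names` is given, rows are matched by probe name and a
    missing probe raises an error. Without row names the function
    falls back to positions, which is only correct when the table
    has the expected row order.
    """
    if row_names is not None:
        lookup = {name: row for row, name in enumerate(row_names)}
        missing = [s for s in site_names if s not in lookup]
        if missing:
            raise KeyError(f"probes not found in table: {missing}")
        return [betas[lookup[s]] for s in site_names]
    # Position-based fallback: no way to verify the rows are the right ones.
    return [betas[i] for i in site_idx]


# Made-up data: a table that lacks the first probe of the full set.
row_names = ["cg02", "cg03", "cg04", "cg05"]
betas = [0.85, 0.20, 0.90, 0.15]

named = select_sites(betas, ["cg02", "cg04"], [1, 3], row_names=row_names)
unnamed = select_sites(betas, ["cg02", "cg04"], [1, 3])
```

With names the intended values are returned; without them the fallback silently returns the values at positions 1 and 3 of the shortened table.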

@GMFranceschini
Author

Great! A temporary solution could be to check the dimensions of the input beta matrix. This would take minimal effort and return an error if the input doesn't match the expected probe-set dimension, probably avoiding the problem in most cases.
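
A minimal sketch of such a dimension check, in plain Python and not taken from the package. The name `check_beta_table` and the constant `EXPECTED_N_PROBES` are assumptions made for illustration; the real expected row count would come from the reference probe set.

```python
# Hypothetical expected row count; the real value depends on the reference set.
EXPECTED_N_PROBES = 5


def check_beta_table(beta_rows, expected_rows=EXPECTED_N_PROBES):
    """Raise an error when the table cannot line up with index-based sites."""
    n_rows = len(beta_rows)
    if n_rows != expected_rows:
        raise ValueError(
            f"beta table has {n_rows} rows, expected {expected_rows}; "
            "index-based site selection would pick the wrong probes"
        )
    return beta_rows


ok = check_beta_table([0.10, 0.85, 0.20, 0.90, 0.15])

try:
    check_beta_table([0.85, 0.20, 0.90, 0.15])  # one row short
    failed = False
except ValueError:
    failed = True
```

A table with the right number of rows in a different order would still pass this check, which is why it only covers most cases rather than all of them.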

romagnolid added a commit that referenced this issue Aug 26, 2019
@romagnolid
Collaborator

I won't close the issue for now, let's see how the temporary solution plays out.

@jtlow

jtlow commented Jun 28, 2021

Hi, I'm trying out PAMES for the first time to calculate tumor purity for some 450k array data. It seems like the tumor_table needs to have exactly the same number of rows as the ref_table, is that correct? Are there any workarounds for cases where tumor_table may be missing data?

@romagnolid
Collaborator

Hi @jtlow, thanks for using our package!
That is correct. It is a way to ensure that the indexes of the CpG sites correspond between the tumor table and the reference (I might change that in a future release, maybe by using CpG probe names instead of indexes).

First of all, a simple workaround is to pass the same object (a matrix of beta values converted to percentages) to both the tumor_table and ref_table arguments of the function compute_purity, but you must be absolutely certain that the CpG sites you are passing have the correct correspondence.

For sparse missing data, or even entire rows (CpG sites) missing, you don't need to worry: the purity will be computed anyway as long as your table has the classic Illumina 450k format.
Otherwise, if your table has a different format, you can send me more details and I can help you find a solution.

@romagnolid
Collaborator

The new version uses probe names instead of indexes.


3 participants