Keeping low number of duplicates #76

Open
phoebe460 opened this issue Apr 11, 2024 · 3 comments

Comments

@phoebe460

Hi EPIC2 Developers,

First off, thank you for creating a great peak-calling tool. I am planning to use it to analyze my own ChIP-seq data. Is there a way to set --keep-duplicates to a low number, such as 1, instead of just True? That would remove the majority of PCR duplicates while still keeping a small number of reads at each position.

MACS3 supports something similar through its keep-duplicates flag, which is documented as follows:

--keep-dup

It controls the MACS3 behavior towards duplicate tags at the exact same location – the same coordination and the same strand. You can set this as auto, all, or an integer value. The auto option makes MACS3 calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as p-value cutoff; and the all option keeps every tag. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location. Default: 1

If you can clarify this for me before I start using your program, then that would be greatly appreciated.

Thank you,
Phoebe

@endrebak
Member

This is something I could consider. I think it makes sense. It should not be hard to allow keeping some duplicates, even though it will make the runtime a bit longer.

@phoebe460
Author

phoebe460 commented Apr 12, 2024

Hi @endrebak,

Thank you for your reply. If there is any way you could add this to your tool, that would be awesome. It would definitely help with my analysis, since the ChIP-seq dataset I am currently working with needs some, but not all, duplicates retained.

Keep me posted,
Phoebe

@endrebak
Member

I will not have the time to do this anytime soon. You can preprocess the data yourself to cap the number of duplicates and then use --keep-duplicates.
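
For anyone landing here, a minimal sketch of what such preprocessing could look like. It is not part of epic2. It assumes the reads are in a tab-separated BED file with the strand in column 6, and that a duplicate means the same chromosome, start, end, and strand (mirroring the MACS3 description quoted above). The function names `cap_duplicates` and `cap_duplicates_file` and the `max_dup` parameter are made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def cap_duplicates(lines: Iterable[str], max_dup: int = 1) -> Iterator[str]:
    """Yield BED lines, keeping at most `max_dup` reads per position and strand.

    Assumption: tab-separated BED with chrom, start, end in columns 1-3 and
    strand in column 6. Lines with fewer columns are treated as strand ".".
    """
    if max_dup < 1:
        raise ValueError("max_dup must be at least 1")
    seen = defaultdict(int)  # (chrom, start, end, strand) -> reads seen so far
    for line in lines:
        if not line.strip():
            continue  # drop blank lines
        if line.startswith(("#", "track", "browser")):
            yield line  # pass header lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        strand = fields[5] if len(fields) > 5 else "."
        key = (fields[0], fields[1], fields[2], strand)
        seen[key] += 1
        # Keep the first `max_dup` reads at this position, drop the rest.
        if seen[key] <= max_dup:
            yield line


def cap_duplicates_file(src: str, dst: str, max_dup: int = 1) -> None:
    """Read the BED file `src` and write the capped reads to `dst`."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(cap_duplicates(fin, max_dup))
```

The filtered file (and a control file filtered the same way, if there is one) can then be passed to epic2 with --keep-duplicates so that the retained duplicates are not removed again. The counter holds one entry per distinct read position, so memory grows with the number of distinct positions in the file.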
