Two questions about the pairs format #56
Hi Heng Li,
1. That's an interesting point, in terms of optimizing space. I think it
depends on the downstream process. For example, if you want to map
restriction enzyme sites to each of the two mates, the absolute strand of
each mate would make a difference, because one would expect the
relevant restriction site to be on the 3' side of the read.
2. We didn't specify the first column as a key, so it's legitimate to have
the same read id multiple times. Currently we don't have a formal
recommendation about how to encode triplets in a pairs file. Do you have a
case where you'd like to add triplets?
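
To make point 1 concrete, here is a minimal Python sketch (not part of the pairs spec or of pairix). It assumes the seven-column layout used by the example lines in this thread (read ID, chr1, pos1, chr2, pos2, strand1, strand2); the helper names `parse_pair`, `relative_strand` and `site_direction` are made up for illustration. It also assumes that the 3' side of a `+` read lies toward higher coordinates and that of a `-` read toward lower coordinates.

```python
def parse_pair(line):
    """Split one pairs line into (read_id, mate1, mate2).

    Each mate is a (chrom, pos, strand) tuple. The column order is an
    assumption taken from the example lines in this thread.
    """
    read_id, chr1, pos1, chr2, pos2, strand1, strand2 = line.split()[:7]
    return read_id, (chr1, int(pos1), strand1), (chr2, int(pos2), strand2)


def relative_strand(strand1, strand2):
    """Product of the two strands: '+' if they agree, '-' otherwise."""
    return "+" if strand1 == strand2 else "-"


def site_direction(strand):
    """Direction in which to look for the restriction site on the 3' side.

    A '+' read extends toward higher coordinates, a '-' read toward lower ones.
    """
    return "higher" if strand == "+" else "lower"


EXAMPLE = [
    "EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + -",
    "EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 - +",
]

for line in EXAMPLE:
    _, mate1, mate2 = parse_pair(line)
    print(relative_strand(mate1[2], mate2[2]),
          site_direction(mate1[2]),
          site_direction(mate2[2]))
```

Both lines collapse to the same relative strand, but the direction in which one would search for the restriction site is swapped for each mate, so a single relative-strand column would lose that information.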
Best,
Soo
On Mon, Apr 9, 2018, 12:26 PM Heng Li wrote:
1.
The example
<https://github.com/4dn-dcic/pairix/blob/master/pairs_format_specification.md#example-pairs-file>
in the spec gives two strands, one each for pos1 and pos2. However, I
speculate that only one relative strand (=strand1*strand2) is needed. For
example, do these two lines make a difference in downstream processing?
EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + -
EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 - +
2.
Another example in the spec shows that only one of the following two
lines should be retained:
EAS139:136:FC706VJ:2:1286:25:275154 chr1 10000 chr2 2000 + +
EAS139:136:FC706VJ:2:1286:25:275154 chr2 2000 chr1 10000 + +
which makes sense. However, is it legitimate to encode a triplet with
an identical first column, like
EAS139:136:FC706VJ:2:1286:25:275154 chr1 10000 chr2 2000 + +
EAS139:136:FC706VJ:2:1286:25:275154 chr2 2000 chr3 10000 + +
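
For readers who want to see how a consumer might cope with such lines, here is a rough Python sketch. It is not a recommended encoding and not part of pairix: it only assumes that the first column is a read ID that may repeat, and that columns 2 to 5 are chr1, pos1, chr2, pos2 as in the lines above. The helper names `contacts_by_read` and `distinct_loci` are hypothetical.

```python
from collections import defaultdict


def contacts_by_read(lines):
    """Group pairs lines by read ID (first column), skipping '#' header lines."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        mate1 = (fields[1], int(fields[2]))
        mate2 = (fields[3], int(fields[4]))
        groups[fields[0]].append((mate1, mate2))
    return groups


def distinct_loci(pairs):
    """Return the distinct (chrom, pos) loci of one read, in order of first appearance."""
    seen = []
    for mate1, mate2 in pairs:
        for locus in (mate1, mate2):
            if locus not in seen:
                seen.append(locus)
    return seen


TRIPLET = [
    "EAS139:136:FC706VJ:2:1286:25:275154 chr1 10000 chr2 2000 + +",
    "EAS139:136:FC706VJ:2:1286:25:275154 chr2 2000 chr3 10000 + +",
]

for read_id, pairs in contacts_by_read(TRIPLET).items():
    loci = distinct_loci(pairs)
    # A read ID seen on several lines with more than two loci is a multi-way contact.
    print(read_id, len(pairs), "lines,", len(loci), "loci:", loci)
```

A tool that treats the read ID as a unique key would silently drop or overwrite one of these two lines, which is why the question matters in practice.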
Thanks!
Thanks, @SooLee!