Count how many reads on each chromosome fall in the range of each kmer #1249

Closed
Fei-Guang opened this Issue Nov 8, 2016 · 1 comment

Fei-Guang commented Nov 8, 2016

Hello team,

I need help counting, for each chromosome, how many read start positions fall within each kmer range.

    
val reads = sc.loadAlignments("/data/sample.rmdup.bam")

val rdd1 = reads.rdd.flatMap(read => {
  // check whether the read is mapped, lest we get a null pointer exception
  if (read.getReadMapped) {
    Some((read.getContigName, read.getStart))
  } else {
    None
  }
})
val rdd2 = sc.textFile("/data/win_100k.use_50mer")
  .map(line => {
    // parse one range per line of the kmer window file
    val columns = line.split("\\s+") // splits on any whitespace, so tabs or spaces both work
    val contig = columns(0)
    val start = columns(4).toLong
    val end = columns.last.toLong
    (contig, (start, end))
  })
scala> rdd2.take(10)

(chr1,(10001,20000))
(chr1,(30001,40000))
(chr2,(110001,260000))
(chr2,(160001,360000))
(chr3,(260001,410000))
(chr3,(360001,460000))
(chr3,(410001,560000))
(chr4,(460001,610000))
(chr4,(560001,660000))
(chr4,(610001,710000))
scala> rdd1.take(10)

(chr1,10001)
(chr1,10015)
(chr1,10026)
(chr1,10030)
(chr1,30038)
(chr2,110101)
(chr2,160001)
(chr3,360101)
(chr3,410101)
(chr4,610100)
(chr4,610001)

How can I get the following count?

if rdd1._1 == rdd2._1 && rdd1._2 is in the range [rdd2._2._1, rdd2._2._2] then
count[rdd2(chr1,(10001,20000))] plus 1

The example result:

chr1,(10001,20000), 4
chr1,(30001,40000), 1
chr2,(110001,260000), 2
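
One way to read the counting rule above is as a per-contig range lookup. The following is a minimal sketch in plain Scala collections, not Spark, so that it runs on its own. The names `WindowCounts`, `countPerWindow`, `samplePositions`, and `sampleWindows` are hypothetical, the sample data is copied from the `take(10)` output above, and both ends of each range are assumed to be inclusive (which matches the example result).

```scala
object WindowCounts {
  // (contig, (start, end)), the same shape as the elements of rdd2
  type Window = (String, (Long, Long))

  // Read start positions, copied from the rdd1.take(10) output in the question
  val samplePositions: Seq[(String, Long)] = Seq(
    ("chr1", 10001L), ("chr1", 10015L), ("chr1", 10026L), ("chr1", 10030L),
    ("chr1", 30038L), ("chr2", 110101L), ("chr2", 160001L), ("chr3", 360101L),
    ("chr3", 410101L), ("chr4", 610100L), ("chr4", 610001L)
  )

  // First three windows from the rdd2.take(10) output in the question
  val sampleWindows: Seq[Window] = Seq(
    ("chr1", (10001L, 20000L)),
    ("chr1", (30001L, 40000L)),
    ("chr2", (110001L, 260000L))
  )

  // Count, for every window, the positions on the same contig that lie
  // inside [start, end]. Assumption: both ends are inclusive.
  def countPerWindow(positions: Seq[(String, Long)],
                     windows: Seq[Window]): Map[Window, Int] = {
    // Group positions by contig so each window only scans its own contig
    val byContig: Map[String, Seq[Long]] =
      positions.groupBy(_._1).map { case (contig, ps) => contig -> ps.map(_._2) }

    windows.map { case w @ (contig, (start, end)) =>
      val n = byContig.getOrElse(contig, Seq.empty[Long])
        .count(p => p >= start && p <= end)
      w -> n
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val counts = countPerWindow(samplePositions, sampleWindows)
    // Print in the "contig,(start,end), count" layout used in the question
    sampleWindows.foreach(w => println(s"${w._1},${w._2}, ${counts(w)}"))
  }
}
```

On the sample data this gives 4, 1, and 2 for the three windows, matching the example result. With the actual RDDs, the same idea could presumably be expressed as a `join` of `rdd2` with `rdd1` on the contig name, a `filter` on the range, and a `reduceByKey(_ + _)`. That variant is untested here, and windows with no matching reads would drop out of a plain join.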

@Fei-Guang Fei-Guang closed this Nov 8, 2016
