How to write the function called findoverlaps with minoverlap>=1? #48

@zhangchunyong999

Hello, bioinformaticians.
I want to write a function called findoverlaps, like the one in Bioconductor for R. Here is what I have tried so far.

using GenomicFeatures
using DataFrames
function number_interval(tp::Tuple)
	# Unpack Tuple.
	(i, interval) = tp

	# Setup numbered metadata.
	new_metadata = (
		i = i,
		original = GenomicFeatures.metadata(interval)
	)

	# Create new interval with numbered metadata.
	return Interval(
		seqname(interval),
		leftposition(interval),
		rightposition(interval),
		strand(interval),
		new_metadata
	)
end
function findoverlaps(query, subject)
    query_numbered = query |> enumerate .|> number_interval
    subject_numbered = subject |> enumerate .|> number_interval
    df = Vector{Tuple{Int64, Int64}}()
    for (q, r) in eachoverlap(query_numbered, subject_numbered)
        result = (
            GenomicFeatures.metadata(q).i,
            GenomicFeatures.metadata(r).i
        )
        push!(df, result)
    end
    rename!(DataFrame(df), [:queryHits, :subjectHits])
end

col = [
	Interval("chr1", 10628, 10683, '?', "abc")
	Interval("chr1", 10643, 10779, '?', "abc")
	Interval("chr1", 10645, 10748, '?', "abc")
	Interval("chr1", 10648, 10786, '?', "abc")
] |> IntervalCollection

hhh = [
	Interval("chr1", 10631, 10638)
	Interval("chr1", 10633, 10635)
	Interval("chr1", 10636, 10650)
	Interval("chr1", 10638, 10649)
	Interval("chr1", 10641, 10651)
] |> IntervalCollection

I ran the function findoverlaps, and it returned the following.

julia> overlap=findoverlaps(col,hhh)
14×2 DataFrame
 Row │ queryHits  subjectHits
     │ Int64      Int64
─────┼────────────────────────
   1 │         1            1
   2 │         1            2
   3 │         1            3
   4 │         1            4
   5 │         1            5
   6 │         2            3
   7 │         2            4
   8 │         2            5
   9 │         3            3
  10 │         3            4
  11 │         3            5
  12 │         4            3
  13 │         4            4
  14 │         4            5

However, I cannot work out how to handle minoverlap=5. For example, the first interval in hhh overlaps the first interval in col by more than 5 positions, so I want to output the index in col and the index in hhh for that pair. The second interval in hhh overlaps it by fewer than 5 positions, so that pair should not appear in the final DataFrame. The function above only seems to handle minoverlap=1. What should I do to solve this problem? Thanks to everyone for helping me!
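One possible direction, as a minimal sketch rather than anything provided by GenomicFeatures: the number of positions shared by two closed intervals is min(right ends) - max(left ends) + 1, and a pair is kept only when that width is at least minoverlap. The names overlap_width and findoverlaps_min below are hypothetical, and intervals are plain (left, right) tuples on a single sequence so the example runs without any packages. It also compares every pair by brute force instead of using eachoverlap.

```julia
# Number of positions shared by the closed intervals [a_left, a_right] and
# [b_left, b_right]; zero when they do not overlap at all.
overlap_width(a_left, a_right, b_left, b_right) =
    max(0, min(a_right, b_right) - max(a_left, b_left) + 1)

# Hypothetical helper (not part of GenomicFeatures): return (queryHit, subjectHit)
# index pairs whose shared width is at least `minoverlap`.
# Assumption: all intervals lie on the same sequence and are (left, right) tuples.
function findoverlaps_min(query, subject; minoverlap = 1)
    hits = Tuple{Int, Int}[]
    for (qi, (ql, qr)) in enumerate(query), (si, (sl, sr)) in enumerate(subject)
        if overlap_width(ql, qr, sl, sr) >= minoverlap
            push!(hits, (qi, si))
        end
    end
    return hits
end

# Same coordinates as col and hhh above, without seqname, strand or metadata.
col = [(10628, 10683), (10643, 10779), (10645, 10748), (10648, 10786)]
hhh = [(10631, 10638), (10633, 10635), (10636, 10650), (10638, 10649), (10641, 10651)]

hits = findoverlaps_min(col, hhh; minoverlap = 5)
```

With minoverlap = 1 this yields the same 14 pairs as the table above. With minoverlap = 5 the pair (1, 2) is dropped because those two intervals share only 3 positions. The same width check could presumably be applied inside the eachoverlap loop of findoverlaps by passing leftposition and rightposition of q and r to overlap_width and skipping the push! when the result is below the threshold.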
