In [2]:
library(GenomicRanges)

Loading required package: stats4

Loading required package: BiocGenerics


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min


Loading required package: S4Vectors


Attaching package: ‘S4Vectors’


The following objects are masked from ‘package:base’:

    expand.grid, I, unname


Loading required package: IRanges

Loading required package: GenomeInfoDb



## Rle

[GenomicRanges - Rle](https://kasperdanielhansen.github.io/genbioconductor/html/GenomicRanges_Rle.html)
An Rle (run-length-encoded) vector is a specific representation of a vector. The class is defined in the S4Vectors package, which is loaded together with IRanges. Watch out: base R also has a class called rle, which has much less functionality.

The run-length-encoded representation stores a vector as a sequence of runs, each with a length and a single value. For example:

In [6]:
rl = Rle(c(1,1,1,1,2,2,3,3,2,2))
rl

numeric-Rle of length 10 with 4 runs
  Lengths: 4 2 2 2
  Values : 1 2 3 2

This is a very efficient representation if

- the vector is very long
- there are a lot of consecutive elements with the same value
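To make the efficiency claim concrete, here is a minimal sketch that compares the memory footprint of a plain vector with its Rle form. The vector `x` and its run layout are made up for illustration; the actual savings depend on how many runs the data contains.

```r
library(S4Vectors)  # defines Rle; also attached when GenomicRanges is loaded

# Hypothetical data: one million elements in only three runs
x <- c(rep(0, 500000), rep(1, 400000), rep(2, 100000))
x_rle <- Rle(x)

nrun(x_rle)          # number of runs
object.size(x)       # size of the plain numeric vector
object.size(x_rle)   # size of the run-length-encoded version

# Decoding gives back the original vector
identical(as.numeric(x_rle), x)
```

With many short runs (for example, a vector with no repeated neighbours) the Rle form stores a length for every value and gives no saving.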

In [9]:
print(runLength(rl))
print(runValue(rl))
print(as.numeric(rl))

[1] 4 2 2 2
[1] 1 2 3 2
 [1] 1 1 1 1 2 2 3 3 2 2


## IRanges

In [5]:
IRanges(11:13,51:53)

IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        11        51        41
  [2]        12        52        41
  [3]        13        53        41
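The width column in the output follows from `width = end - start + 1`, because both endpoints are included. As a small sketch, the same ranges can be built from `start` and `width` instead of `start` and `end`; the names `ir_end` and `ir_width` are only illustrative.

```r
library(IRanges)

ir_end   <- IRanges(start = 11:13, end = 51:53)  # same ranges as in the cell above
ir_width <- IRanges(start = 11:13, width = 41)   # width is recycled to all three ranges

width(ir_end)   # end - start + 1 for each range
end(ir_width)   # start + width - 1 for each range
```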

## GRanges

In [4]:
gr = GRanges("chrZ",IRanges(11:13,51:53))
gr

GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chrZ     11-51      *
  [2]     chrZ     12-52      *
  [3]     chrZ     13-53      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

In [33]:
gr <- GRanges(
    seqnames="chrZ",                            # sequence name (here chromosome Z)
    ranges=IRanges(start=c(5,10),end=c(35,45)), # ranges as an IRanges object
    strand="+",                                 # strand, recycled to both ranges
    seqlengths=c(chrZ=100L))                    # sequence length (chrZ is 100 base pairs long)
print(gr)

GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chrZ      5-35      +
  [2]     chrZ     10-45      +
  -------
  seqinfo: 1 sequence from an unspecified genome


### genome

In [34]:
seqinfo(gr)

Seqinfo object with 1 sequence from an unspecified genome:
  seqnames seqlengths isCircular genome
  chrZ            100         NA   <NA>

In [35]:
genome(gr)

In [36]:
genome(gr) <- "hg19"
genome(gr)

In [37]:
seqinfo(gr)

Seqinfo object with 1 sequence from hg19 genome:
  seqnames seqlengths isCircular genome
  chrZ            100         NA   hg19

In [38]:
gr

GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chrZ      5-35      +
  [2]     chrZ     10-45      +
  -------
  seqinfo: 1 sequence from hg19 genome

### metadata columns

In [39]:
mcols(gr)

DataFrame with 2 rows and 0 columns

Assign values to a new metadata column:

In [40]:
mcols(gr)$value <- c(-1,4)
mcols(gr)

DataFrame with 2 rows and 1 column
      value
  <numeric>
1        -1
2         4

After the assignment, the metadata column is printed next to the ranges:

In [41]:
gr

GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |     value
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chrZ      5-35      + |        -1
  [2]     chrZ     10-45      + |         4
  -------
  seqinfo: 1 sequence from hg19 genome
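Metadata columns can also be read with the `$` shortcut and used to subset the ranges. The sketch below rebuilds a small object under the hypothetical name `gr_demo` so that it runs on its own; the cutoff of 0 is arbitrary.

```r
library(GenomicRanges)

# Hypothetical stand-in for the object above
gr_demo <- GRanges("chrZ",
                   IRanges(start = c(5, 10), end = c(35, 45)),
                   strand = "+")
mcols(gr_demo)$value <- c(-1, 4)

gr_demo$value                 # shortcut for mcols(gr_demo)$value
gr_demo[gr_demo$value > 0]    # keep only the ranges with a positive value
```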

### question: do I need to specify sequence length?

In [16]:
gr <- GRanges(
    seqnames="chrZ",                            # sequence name (here chromosome Z)
    ranges=IRanges(start=c(5,10),end=c(35,45)), # ranges as an IRanges object
    strand="+")                                 # strand, recycled to both ranges
    # seqlengths=c(chrZ=100L) is deliberately omitted, so the sequence length stays unspecified
print(gr)

GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chrZ      5-35      +
  [2]     chrZ     10-45      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
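So the answer appears to be no: the constructor works without `seqlengths`, and the seqinfo line simply reports "no seqlengths". As a sketch, the length can be filled in afterwards with the `seqlengths<-` setter. The object name `gr_nolen` is hypothetical, and the value 100 is assumed only to match the earlier example.

```r
library(GenomicRanges)

gr_nolen <- GRanges(
    seqnames = "chrZ",
    ranges = IRanges(start = c(5, 10), end = c(35, 45)),
    strand = "+")

seqlengths(gr_nolen)                     # NA for chrZ: the length is unknown
seqlengths(gr_nolen) <- c(chrZ = 100L)   # assumed length, set after construction
seqlengths(gr_nolen)
```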


In [19]:
gr <- GRanges(seqnames = "chr1", strand = c("+", "-", "+"),
              ranges = IRanges(start = c(1,3,5), width = 3))
gr

GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1       1-3      +
  [2]     chr1       3-5      -
  [3]     chr1       5-7      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths