In [16]:
suppressMessages(suppressWarnings(library("tidyverse")))
suppressMessages(suppressWarnings(library("GenomicRanges")))
suppressMessages(suppressWarnings(library("rtracklayer")))
suppressMessages(suppressWarnings(library("BSgenome.Hsapiens.UCSC.hg38")))

Introduction to GRanges for genomics analysis  
https://research.stowers.org/cws/CompGenomics/Tutorial/GRanges/guide.html

Using information from BSgenome packages
https://kasperdanielhansen.github.io/genbioconductor/html/GenomicRanges_seqinfo.html

In [2]:
genome_hg38 = BSgenome.Hsapiens.UCSC.hg38
genome_hg38

Human genome:
# organism: Homo sapiens (Human)
# genome: hg38
# provider: UCSC
# release date: Feb 2019
# 640 sequences:
#   chr1                    chr2                    chr3                   
#   chr4                    chr5                    chr6                   
#   chr7                    chr8                    chr9                   
#   chr10                   chr11                   chr12                  
#   chr13                   chr14                   chr15                  
#   ...                     ...                     ...                    
#   chr19_KV575254v1_alt    chr19_KV575255v1_alt    chr19_KV575256v1_alt   
#   chr19_KV575257v1_alt    chr19_KV575258v1_alt    chr19_KV575259v1_alt   
#   chr19_KV575260v1_alt    chr22_KN196485v1_alt    chr22_KN196486v1_alt   
#   chr22_KQ458387v1_alt    chr22_KQ458388v1_alt    chr22_KQ759761v1_alt   
#   chrX_KV766199v1_alt                                                    
# (use 'seqnames()' to see all the sequence

## Create a GRanges object

[A quick introduction to GRanges and GRangesList objects](https://bioconductor.org/packages/devel/bioc/vignettes/GenomicRanges/inst/doc/GRanges_and_GRangesList_slides.pdf)
```
- Each genomic range is described by a chromosome name, a start, an end, and a
strand.
- start and end are both 1-based positions relative to the 5' end of the plus strand
of the chromosome, even when the range is on the minus strand.
- start and end are both considered to be included in the interval (except when the
range is empty).
- The width of the range is the number of genomic positions included in it. So
width = end - start + 1.
- end is always >= start, except for empty ranges (a.k.a. zero-width ranges) where
end = start - 1.
Note that the start is always the leftmost position and the end the rightmost, even
when the range is on the minus strand.
Gotcha: A TSS is at the end of the range associated with a transcript located on the
minus strand.
```
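The width rule and the minus-strand gotcha quoted above can be checked on a toy example. This is a minimal sketch: the two ranges in `tx` are hypothetical transcripts made up for illustration, not data from this notebook. It relies on `resize()` being strand-aware for `GRanges` objects, so `fix = "start"` anchors at the 5' end of each range.

```r
suppressMessages(library(GenomicRanges))

# Two hypothetical transcripts with identical coordinates on opposite strands
tx = GRanges(
    seqnames = "chr1",
    ranges   = IRanges(start = c(100, 100), end = c(200, 200)),
    strand   = c("+", "-"))

# width = end - start + 1, regardless of strand
tx_width = width(tx)

# Strand-aware TSS: keep a 1 bp range anchored at the 5' end of each transcript
tss = resize(tx, width = 1, fix = "start")
tss_pos = start(tss)

print(tx_width)  # 101 101
print(tss_pos)   # 100 200
```

For the minus-strand transcript the TSS is at position 200, i.e. at `end(tx)`, while `start(tx)` is still 100.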

### Note that IRanges/GRanges are 1-based

In [26]:
ir = IRanges(
        start=c( 5,15, 46),
        end  =c(10,45, 46))
ir

IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         5        10         6
  [2]        15        45        31
  [3]        46        46         1

In [27]:
gr = GRanges(
    seqnames="chr1",                            
    ranges=IRanges(
        start=c( 5,15, 46),
        end  =c(10,45, 46)), 
    strand="+",
    seqlengths = seqlengths(genome_hg38))

mcols(gr)$score = c(10, 20, 30)
genome(gr) = "hg38"

print(gr)

GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr1      5-10      + |        10
  [2]     chr1     15-45      + |        20
  [3]     chr1        46      + |        30
  -------
  seqinfo: 640 sequences from hg38 genome


In [28]:
seqinfo(gr)

Seqinfo object with 640 sequences from hg38 genome:
  seqnames             seqlengths isCircular genome
  chr1                  248956422       <NA>   hg38
  chr2                  242193529       <NA>   hg38
  chr3                  198295559       <NA>   hg38
  chr4                  190214555       <NA>   hg38
  chr5                  181538259       <NA>   hg38
  ...                         ...        ...    ...
  chr22_KN196486v1_alt     153027       <NA>   hg38
  chr22_KQ458387v1_alt     155930       <NA>   hg38
  chr22_KQ458388v1_alt     174749       <NA>   hg38
  chr22_KQ759761v1_alt     145162       <NA>   hg38
  chrX_KV766199v1_alt      188004       <NA>   hg38

## Export the object to different file formats using rtracklayer

In [29]:
export(gr, "test.bed", format = "Bed")

In [30]:
dat = read_tsv(
    "test.bed", 
    col_names=c("Chrom", "Start", "End", "Name", "Score", "Strand"), 
    show_col_types=FALSE)
dat

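| Chrom | Start | End | Name | Score | Strand |
|-------|-------|-----|------|-------|--------|
| `<chr>` | `<dbl>` | `<dbl>` | `<chr>` | `<dbl>` | `<chr>` |
| chr1 | 4 | 10 | . | 10 | + |
| chr1 | 14 | 45 | . | 20 | + |
| chr1 | 45 | 46 | . | 30 | + |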
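The `Start` values in the BED file are one less than the starts in the `GRanges` object, because BED uses 0-based, half-open intervals while `GRanges` uses 1-based, closed intervals. The sketch below redoes that conversion by hand in base R on the values shown above. The `bed` data frame and the column names `Start1` and `Width` are introduced here for illustration only.

```r
# BED coordinates as written by export(): 0-based start, exclusive end
bed = data.frame(
    Chrom = "chr1",
    Start = c(4, 14, 45),
    End   = c(10, 45, 46))

# Convert back to 1-based, closed coordinates: only the start shifts by one
bed$Start1 = bed$Start + 1

# In BED coordinates the width is simply End - Start
bed$Width = bed$End - bed$Start

print(bed)
```

The recovered starts (5, 15, 46) and widths (6, 31, 1) match the `IRanges` object printed earlier.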


If the seqinfo/seqlengths are not provided, exporting to BigWig fails with the following error:
```
Error in .local(object, con, format, ...): Unable to determine seqlengths; either specify 'seqlengths' or specify a genome on 'object' that is known to BSgenome or UCSC
Traceback:

1. export(gr, "test.bw", format = "BigWig")
2. export(gr, "test.bw", format = "BigWig")
3. export(object, FileForFormat(con, format), ...)
4. export(object, FileForFormat(con, format), ...)
5. .local(object, con, format, ...)
6. stop("Unable to determine seqlengths; either specify ", "'seqlengths' or specify a genome on 'object' that ", 
 .     "is known to BSgenome or UCSC")
```

In [31]:
export(gr, "test.bw", format = "BigWig")
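If a `GRanges` object was built without `seqlengths`, one way to avoid the error quoted above is to set them afterwards. This is a minimal sketch: `gr2` is a hypothetical object created here for illustration, and the chr1 length is the value shown in the `seqinfo(gr)` output earlier.

```r
suppressMessages(library(GenomicRanges))

# Same ranges as before, but constructed without seqlengths
gr2 = GRanges(
    seqnames = "chr1",
    ranges   = IRanges(start = c(5, 15, 46), end = c(10, 45, 46)),
    strand   = "+",
    score    = c(10, 20, 30))

# Sequence lengths are unknown (NA) at this point
missing_before = is.na(seqlengths(gr2))

# Assign the length of chr1 by name (hg38 value from the seqinfo output)
seqlengths(gr2) = c(chr1 = 248956422L)

print(seqlengths(gr2))
```

With the lengths in place, `export(gr2, "test.bw", format = "BigWig")` should no longer stop at the seqlengths check. That call is left out of the block because BigWig export may not be available on every platform.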