### The general outline of this document is correct, but specific variable names, etc., may be incorrect. Please refer to the main [Usage Instructions](usage-instructions.ipynb) for up-to-date information.

## Creating the reference genome index files

This document describes the steps necessary to prepare a reference genome and matching annotation for use in the Rp-Bp and Rp-chi pipelines. The process needs to be run only once for each combination of reference genome and annotation set.

A sample call is shown for each step. Every program accepts the `--help` option, which prints its complete list of parameters.

**Input**
* Reference genome sequence
* GFF3/GTF annotations matching the reference sequence
* Ribosomal sequence

Please see the [usage instructions](usage-instructions.ipynb#creating-reference-genome-indices) for a high-level overview of this process.

### Building the ribosomal sequence index

The ribosomal sequence Bowtie2 index must be created with `bowtie2-build-s`.

In [None]:
bowtie2-build-s /path/to/my/input/ribosomal-fasta.fa /path/to/my/output/ribosomal-index
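After the build finishes, it can be useful to confirm that the index is complete. The sketch below assumes a small (non-large) Bowtie2 index, which consists of six files sharing the output prefix with the suffixes `.1.bt2`, `.2.bt2`, `.3.bt2`, `.4.bt2`, `.rev.1.bt2` and `.rev.2.bt2`. The function name `check_bowtie2_index` is hypothetical and is not part of the pipeline.

```shell
# Hypothetical helper (not part of Rp-Bp): check that all six files of a
# small Bowtie2 index exist and are non-empty for the given prefix.
check_bowtie2_index() {
    prefix="$1"
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "${prefix}.${suffix}.bt2" ]; then
            echo "missing or empty: ${prefix}.${suffix}.bt2" >&2
            missing=1
        fi
    done
    # Exit status 0 if every file was found, 1 otherwise.
    return "$missing"
}
```

For the call above, this would be used as `check_bowtie2_index /path/to/my/output/ribosomal-index`.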

### Creating the STAR index

The `STAR` genome index is used for mapping reads to the genome. It is created by the `create-star-reference` script, a light wrapper around `STAR --runMode genomeGenerate`. In the call below, `p` and `m` are placeholders for the values passed to `--num-procs` and `--mem`.

In [None]:
create-star-reference /path/to/my/input-annotations.gtf /path/to/my/input/reference-sequence.fa /path/to/my/output/star-index --num-procs p --mem m
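For orientation, the sketch below prints (without running it) a plain `STAR` index-generation command for the same inputs. The options shown are standard `STAR` parameters, but that the wrapper passes exactly these is an assumption, and no mapping for `--mem` is attempted. The function name `star_index_command` and the file names are hypothetical.

```shell
# Hypothetical helper: print a STAR genomeGenerate command for the inputs.
# Argument order mirrors the create-star-reference call: GTF, FASTA,
# output directory, then the number of threads (defaults to 1).
# Whether the wrapper uses exactly these options is an assumption.
star_index_command() {
    gtf="$1"
    fasta="$2"
    out_dir="$3"
    threads="${4:-1}"
    printf 'STAR --runMode genomeGenerate --genomeDir %s --genomeFastaFiles %s --sjdbGTFfile %s --runThreadN %s\n' \
        "$out_dir" "$fasta" "$gtf" "$threads"
}

# Dry run with placeholder file names; nothing is executed by STAR here.
star_index_command annotations.gtf reference-sequence.fa star-index 4
```

Printing the command rather than running it makes it easy to compare against what `create-star-reference --help` documents before committing to a long index build.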

### Extracting spliced transcript sequences

Spliced transcript sequences are extracted using `extract-transcript-fasta`. This script is a light wrapper around `gffread` from the Cufflinks package.

In [None]:
extract-transcript-fasta /path/to/my/input-annotations.gtf /path/to/my/input/reference-sequence.fa /path/to/my/output/transcript-sequences.fa
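A quick sanity check on the result is to count the FASTA records and compare that number with the number of transcripts expected from the annotations. The sketch below assumes the output is an uncompressed FASTA file in which each transcript starts with a `>` header line; the function name `count_fasta_records` is hypothetical.

```shell
# Hypothetical helper: count the records in an uncompressed FASTA file
# by counting header lines (lines beginning with '>').
count_fasta_records() {
    awk '/^>/ {n++} END {print n+0}' "$1"
}
```

For the call above, this would be used as `count_fasta_records /path/to/my/output/transcript-sequences.fa`. A count of 0 suggests that the annotations and the reference sequence do not use matching sequence names.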

### Extracting ORFs from transcripts

The open reading frames within the spliced transcript sequences are identified using `extract-orfs`. This script uses pybedtools.

In [None]:
extract-orfs /path/to/my/input/transcript-sequences.fa /path/to/my/output/orfs.bed.gz --num-procs p
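The output is a gzip-compressed BED file, so it can be inspected with standard tools. The sketch below counts the extracted ORFs, assuming one ORF per line and treating any line beginning with `#` as a header; whether the file actually contains such a header line is an assumption. The function name `count_bed_records` is hypothetical.

```shell
# Hypothetical helper: count the records in a gzip-compressed BED file.
# Blank lines and lines beginning with '#' (assumed to be headers or
# comments) are skipped.
count_bed_records() {
    gzip -dc "$1" | awk '!/^#/ && NF > 0 {n++} END {print n+0}'
}
```

For the call above, this would be used as `count_bed_records /path/to/my/output/orfs.bed.gz`. To look at the first few records instead, `gzip -dc /path/to/my/output/orfs.bed.gz | head -n 5` works the same way.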