Merged
49 changes: 49 additions & 0 deletions docusaurus/docs/Guides/color-guide.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
---
id: color-guide
title: Color Selection Guide
sidebar_label: Color Selection Guide
---

Some tools allow you to customize colors used in the output, specifically among the `figure-generation` tools. The following guide introduces the user to color customization in ScriptManager.

Color customization options are available in the [Heatmap Labeler][heatmap-labeler], [Four Color Sequence Plot][four-color], [Two Color Heatmap][heatmap], [Three Color Heatmap][three-color-heatmap], and [Composite Plot][composite] tools.


## Color Selector Window (GUI)

When the user opens up the color selector window, they will see several tabs, each visualizing a different method for selecting a custom color.

The default (first) tab, "Swatch", shows swatches for a fixed collection of colors to choose from.
![swatch-guide](/../static/md-img/swatch-guide.png)

The "HSV" tab allows the user to select a color based on the Hue, Saturation, and Value color system...
![hsv-guide](/../static/md-img/hsv-guide.png)

...while the "HSL" tab allows the user to select a color based on the Hue, Saturation, and Lightness color system...
![hsl-guide](/../static/md-img/hsl-guide.png)

...and the "RGB" tab allows the user to select a color based on the Red, Green, and Blue color system...
![rgb-guide](/../static/md-img/rgb-guide.png)

...and the "CMYK" tab allows the user to select a color based on the Cyan, Magenta, Yellow, and Black color system...
![cmyk-guide](/../static/md-img/cmyk-guide.png)


## Choosing colors from the command line (CLI)

ScriptManager's command-line tools typically indicate color using the `-c` flag followed by one or more hexadecimal color strings (hexstrings). A hexstring is a sequence of 6 characters (0-9 or A-F) in which each pair of characters encodes the Red, Green, or Blue value, in that order, as a number from 0 to 255. The help documentation points users to [this url][color-hex-url] to browse colors and get the corresponding hexstring.

:::caution

Do not put the pound symbol `#` in front of the hexadecimal string: bash treats an unquoted `#` at the start of a word as the beginning of a comment, so the token never reaches ScriptManager.

:::
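
As a minimal sketch of how a hexstring relates to its Red, Green, and Blue values, the following shell helper converts three 0-255 values into the six-character form expected after `-c`. The function name `rgb_to_hexstring` is hypothetical and is not part of ScriptManager.

```bash
# Hypothetical helper (not part of ScriptManager): convert R, G, B values
# (0-255 each) into a 6-character hexstring with no leading '#'.
rgb_to_hexstring() {
  for v in "$1" "$2" "$3"; do
    case $v in
      # Reject empty values, non-digits, and leading zeros (e.g. "08").
      ''|*[!0-9]*|0?*)
        echo "expected an integer 0-255 without leading zeros: '$v'" >&2
        return 1 ;;
    esac
    if [ "$v" -gt 255 ]; then
      echo "value out of range (0-255): $v" >&2
      return 1
    fi
  done
  # Two uppercase hex digits per channel, in Red, Green, Blue order.
  printf '%02X%02X%02X\n' "$1" "$2" "$3"
}

rgb_to_hexstring 148 0 211   # prints 9400D3
```

The resulting string can be passed directly after the `-c` flag, without the `#` prefix.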


[color-hex-url]:http://www.javascripter.net/faq/rgbtohex.htm

[four-color]:/docs/figure-generation/Four-color
[heatmap]:/docs/figure-generation/heatmap
[three-color-heatmap]:/docs/figure-generation/three-color-heatmap
[heatmap-labeler]:/docs/figure-generation/heatmap-labeler
[composite]:/docs/figure-generation/composite-plot
21 changes: 0 additions & 21 deletions docusaurus/docs/Guides/command-line.md
@@ -93,27 +93,6 @@ refer the user to the appropriate tool.
At any point in building a command, if you get stuck or are unsure of your options, use the `-h` flag to show options. This can list the available subcommands or parameter and argument options.


## Color Customization

Some tools allow you to customize colors used in the output, specifically among the `figure-generation` tools.

E.g. `composite`, `heatmap`, and `four-color`

Default colors are set for these tools so that no color needs to be specified for the program to execute. The following is an example of heatmap's default execution.

`java -jar ScriptManager.jar figure-generation heatmap nucleosomes.cdt `

In the case of `heatmap`, there are also preset color flags available for the user to choose from.

`java -jar ScriptManager.jar figure-generation heatmap nucleosomes.cdt --blue`

However, if you want to use a color outside the preset values, you can indicate RGB colors using _hexstrings_. These are a sequence of 6 characters, where each pair of characters represents a Red, Green, or Blue value (0-255 each). The help documentation points users to [this url](http://www.javascripter.net/faq/rgbtohex.htm) to browse colors and get the appropriate hexstring.

_Note: do not use the pound symbol `#` in front of the hexadecimal string, because bash treats it as the start of a comment, which hides the token from bash and thus from ScriptManager too._

`java -jar ScriptManager.jar figure-generation heatmap nucleosomes.cdt -c 9400D3`


## Output Options

### Default filename
32 changes: 19 additions & 13 deletions docusaurus/docs/bam-manipulation/bam-indexer.md
@@ -9,31 +9,37 @@ For most tools using BAM inputs (both within and without ScriptManager), a BAM i

ScriptManager's [TagPileup][tag-pileup], [Merge BAM replicates][merge-bam], [BAM Correlation][bam-correlation], and BAM Format Converter tools ([bam-to-bed][bam-to-bed], [bam-to-gff][bam-to-gff], [bam-to-bedgraph][bam-to-bedgraph], and [bam-to-scidx][bam-to-scidx]) are some example tools that require a `.bai` file.

![BAIIndexerWindow](/../static/md-img/BAMManipulation/BAIIndexerWindow.png)
<img src={require('/../static/md-img/BAMManipulation/BAIIndexerWindow.png').default} style={{width:70+'%'}}/>

After clicking "Index", ScriptManager will index all of the loaded BAM files and save the indexes to the "Output Directory" location with the `.bai` extension. Output files follow the usual naming convention for `.bai` files. If you are indexing the file `sample123.bam`, then the index file will be called `sample123.bam.bai`.
After clicking "Index", ScriptManager will index all of the loaded BAM files and save the indexes to the "Output Directory" location with the `.bai` extension. Output files follow the usual naming convention for `.bai` files. If you are indexing the file `sample123.bam`, then the index file will be called `sample123.bam.bai` and located within the same directory.

:::tip

It is standard practice to generate and save the index file in the same place as the `.bam` file it is indexing so that your bioinformatics tools can find it. Make sure your "Output Directory" is the same location as the input BAM files. This tool will not load files without the proper `.bam` extension.
It is standard practice to generate and save the index file in the same place with the same name as the `.bam` file it is indexing so that your bioinformatics tools can find it.

:::

### Command Line Interface
:::caution

The BAM file **MUST** be [sorted][sort-bam] to be indexed successfully.

:::

## Command Line Interface (Picard and Samtools)
_Command-line tools already exist for this function. This tool only exists as a GUI wrapper in ScriptManager._

Please see the [Samtools index tool][samtools-index] or the [Picard BuildBamIndex tool][picard-index].
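
As a rough sketch of the command-line equivalent, assuming `samtools` is installed and on your `PATH`, sorting and then indexing a BAM file might look like the following. The file names and the `bai_name` helper are hypothetical and only illustrate the naming convention described above.

```bash
# Hypothetical helper: name of the index file that conventionally
# accompanies a BAM file (sample123.bam -> sample123.bam.bai).
bai_name() {
  printf '%s.bai\n' "$1"
}

bam=sample123.bam            # example name from the text above
sorted=sample123.sorted.bam  # hypothetical name for the sorted copy

# Only run samtools when it is installed and the input actually exists.
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
  samtools sort -o "$sorted" "$bam"   # coordinate-sort first
  samtools index "$sorted"            # writes the .bai next to the BAM
  echo "index written to $(bai_name "$sorted")"
fi
```

Keeping the index next to the BAM file, under the same base name, is what lets downstream tools find it.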



[samtools-index]:http://www.htslib.org/doc/samtools-index.html
[picard-index]:https://broadinstitute.github.io/picard/command-line-overview.html#BuildBamIndex

[bam-correlation]:bam-statistics/bam-correlation.md
[bam-to-bedgraph]:bam-format-converter/bam-to-bedgraph.md
[bam-to-bed]:bam-format-converter/bam-to-bed.md
[bam-to-gff]:bam-format-converter/bam-to-gff.md
[bam-to-scidx]:bam-format-converter/bam-to-scidx.md
[bed-to-gff]:coordinate-manipulation/bed-to-gff.md
[merge-bam]:bam-manipulation/merge-bam.md
[tag-pileup]:read-analysis/tag-pileup.md
[bam-correlation]:/docs/bam-statistics/bam-correlation.md
[bam-to-bedgraph]:/docs/bam-format-converter/bam-to-bedgraph.md
[bam-to-bed]:/docs/bam-format-converter/bam-to-bed.md
[bam-to-gff]:/docs/bam-format-converter/bam-to-gff.md
[bam-to-scidx]:/docs/bam-format-converter/bam-to-scidx.md
[bed-to-gff]:/docs/coordinate-manipulation/bed-to-gff.md
[merge-bam]:/docs/bam-manipulation/merge-bam.md
[sort-bam]:/docs/bam-manipulation/sort-bam
[tag-pileup]:/docs/read-analysis/tag-pileup.md
56 changes: 48 additions & 8 deletions docusaurus/docs/bam-manipulation/filter-pip-seq.md
@@ -1,27 +1,67 @@
---
id: filter-pip-seq
title: Filter PIPseq
sidebar_label: filter-pip-seq
title: Filter PIP-seq
sidebar_label: Filter PIP-seq
---

![filter-pip-seq](/../static/icons/bam-manipulation/FilterPIP-seq_square.svg)

Filter BAM file by -1 nucleotide. Requires genome FASTA file.

![PIP-seq figure 1a from Lai 2017 ](/../static/md-img/Lai_2017_PIPseq_F1a.jpeg)

Permanganate (KMnO4) and piperidine treatment of DNA fragments preferentially oxidizes and cleaves off the T (thymine) at the 5' end of a single-stranded DNA fragment. When analyzing data from sequencing assays that use this treatment, such as PIP-seq ([Lai et al, 2017][pip-seq-paper]), using this tool to keep only reads that align to positions with a 'T' at the -1 position of the 5' end of read 1 can reduce the amount of noise (i.e. DNA fragments not cleaved by piperidine). This can clarify the signal in the downstream steps of your analysis.

<img src={require('/../static/md-img/BAMManipulation/FilterPIPseq.png').default} style={{width:70+'%'}}/>

### Genome input (FASTA)

Since the alignment files only capture the reference genome sequence at genomic positions covered by a read, the sequence upstream of the 5' end of Read 1 is not necessarily captured by the BAM file format. Thus the reference genome is required to determine the sequence upstream of the 5' end of read 1 (the basis for this filtering script).

:::note
Make sure that the genome build used for the genome input matches the genome that the BAM files were aligned to. If you aren't sure, compare the chromosome lengths in the genomic FASTA index file (FAI) against each BAM file header (`samtools view -H myfile.bam`).
:::
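
The comparison described in the note can be sketched as follows, assuming `samtools` is installed. The function names and file paths are hypothetical. The sketch relies only on the `@SQ` header lines (with their `SN:` and `LN:` fields) and on the first two columns of the FAI file (sequence name and length).

```bash
# Print "name<TAB>length" for each @SQ line of a SAM/BAM header read from stdin.
header_chrom_lengths() {
  awk -F'\t' '$1 == "@SQ" {
    name = ""; len = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) name = substr($i, 4)
      if ($i ~ /^LN:/) len = substr($i, 4)
    }
    print name "\t" len
  }'
}

# Print "name<TAB>length" from a FASTA index (.fai) read from stdin.
fai_chrom_lengths() {
  cut -f1,2
}

# Succeed only if the FAI and the BAM header list the same chromosomes
# with the same lengths. Usage: same_chrom_lengths genome.fa.fai myfile.bam
same_chrom_lengths() {
  fai_lengths=$(fai_chrom_lengths < "$1" | sort)
  bam_lengths=$(samtools view -H "$2" | header_chrom_lengths | sort)
  [ "$fai_lengths" = "$bam_lengths" ]
}
```

For example, `same_chrom_lengths genome.fa.fai myfile.bam || echo "genome build mismatch?"` would flag a BAM file whose header disagrees with the FASTA index. An exact match is a deliberately strict check; a mismatch is a prompt to look closer, not proof of a wrong build.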

### File inputs (BAM)

This script filters BAM-type files so make sure your input is properly formatted and uses the appropriate `.bam` extension. The script also supports bulk selection and processing of files.

### Output
The output file for this script is a filtered set of alignments in BAM format for each input BAM file. The `_PSfilter.bam` suffix is used for each output. For example, for a given `XXX.bam` input file, `XXX_PSfilter.bam` will be written to the user-selected output directory.

### Upstream sequence
The sequence upstream of the 5' end of read 1 to check for and filter by. If the sequence in the reference genome upstream of the 5' end of read 1 matches this sequence, the read-pair information is written to the output BAM file.

:::caution

For classic PIP-seq datasets the default "T" sequence should be used.

:::

This tool supports different sequences in case an as-yet-unknown biochemical assay or analysis requires filtering based on a different sequence. For example, a user investigating and comparing the rates of permanganate oxidation and piperidine cleavage at other nucleotides might compare BAM files filtered by other upstream sequences such as "C", which is known to be cleaved under such treatment (just not as frequently as "T").

### Generate BAI file (GUI only)
When this box is checked, the script automatically generates a BAI index file for each new filtered BAM file.

:::note
The CLI cannot index the resulting BAM file. The user must use the appropriate [samtools][samtools-index] or [Picard][picard-index] command to generate the BAI file.
:::

## Command Line Interface
Usage:
```bash
java -jar ScriptManager.jar bam-manipulation filter-pip-seq [-hV] [-f=<filterString>]
[-o=<output>] <bamFile> <genomeFASTA>
```

Description:

Filter BAM file by -1 nucleotide. Requires genome FASTA file. Note that this program does not index the resulting BAM file; the user must use the appropriate samtools command to generate the BAI file.

<img src={require('/../static/md-img/BAMManipulation/FilterPIPseq.png').default} style={{width:70+'%'}}/>

### Output Options

| Option | Description |
| ------ | ----------- |
| `-o, --output=<output>` | specify output file (default=`<bamFileNoExt>_PSfilter.bam`) |
| `-f, --filter=<filterString>` | filter by upstream sequence, works only for single-nucleotide A, T, C, or G (default seq='T') |
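
A minimal sketch of a full run, built from the usage line and options above: it filters one BAM file and then indexes the result with `samtools`, since the CLI does not create the BAI file itself. The input names (`XXX.bam`, `genome.fa`) and the `psfilter_output_name` helper are placeholders for illustration, and `samtools` is assumed to be installed.

```bash
# Hypothetical helper: derive the default output name
# <bamFileNoExt>_PSfilter.bam from an input BAM file name.
psfilter_output_name() {
  printf '%s_PSfilter.bam\n' "${1%.bam}"
}

bam=XXX.bam        # placeholder input name from the Output section
genome=genome.fa   # hypothetical reference FASTA path
out=$(psfilter_output_name "$bam")

# Only run the tools when the placeholder inputs actually exist.
if [ -f "$bam" ] && [ -f "$genome" ] && [ -f ScriptManager.jar ] \
   && command -v samtools >/dev/null 2>&1; then
  # Keep read pairs with a 'T' at the -1 position (the default filter).
  java -jar ScriptManager.jar bam-manipulation filter-pip-seq \
    -f=T -o="$out" "$bam" "$genome"
  # Index the filtered BAM separately.
  samtools index "$out"
fi
```

Passing `-o` explicitly makes it unambiguous which file to index afterwards.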

[samtools-index]:http://www.htslib.org/doc/samtools-index.html
[picard-index]:https://broadinstitute.github.io/picard/command-line-overview.html#BuildBamIndex

[pip-seq-paper]:https://pubmed.ncbi.nlm.nih.gov/27927716/
30 changes: 26 additions & 4 deletions docusaurus/docs/bam-manipulation/merge-bam.md
@@ -1,16 +1,38 @@
---
id: merge-bam
title: Merge BAM
sidebar_label: merge-bam
sidebar_label: Merge BAM
---

![merge-bam](/../static/icons/bam-manipulation/BAMReplicateMerge_square.svg)

CommandLine tools already exist for this function. This tool only exists as a GUI wrapper in ScriptManager.
Merges multiple BAM files into a single BAM file. Sorting is performed automatically. This is a RAM-intensive process; if the program freezes, increase the Java heap size.

Please see the [Samtools merge tool][samtools-merge] or the [Picard MergeBamAlignment tool][picard-merge].
<img src={require('/../static/md-img/BAMManipulation/MergeBAM.png').default} style={{width:70+'%'}}/>

This is frequently used for replicate merging. All loaded input files are merged into a single BAM file saved in the "Output Directory". The filename defaults to `merged_BAM.bam` but can be customized by the user.

:::tip

Make sure to keep the `.bam` file extension to follow bioinformatics best practices.

:::

### Use multiple CPUs
The user may speed up the merging by checking this box, which allows threading to parallelize the merge and sort algorithms.

### Generate BAI file
When this box is checked, the script automatically generates a BAI index file for the new merged BAM file.

:::note
The CLI cannot index the resulting BAM file. The user must use the appropriate [samtools][samtools-index] or [Picard][picard-index] command to generate the BAI file.
:::

## Command Line Interface (Picard and Samtools)
_Command-line tools already exist for this function. This tool only exists as a GUI wrapper in ScriptManager._

Please see the [Samtools merge tool][samtools-merge] or the [Picard MergeBamAlignment tool][picard-merge] for a command line tool that performs this function.

<img src={require('/../static/md-img/BAMManipulation/MergeBAM.png').default} style={{width:70+'%'}}/>

[samtools-merge]:http://www.htslib.org/doc/samtools-merge.html
[picard-merge]:https://broadinstitute.github.io/picard/command-line-overview.html#MergeBamAlignment
43 changes: 38 additions & 5 deletions docusaurus/docs/bam-manipulation/remove-duplicates.md
@@ -1,16 +1,49 @@
---
id: remove-duplicates
title: Remove Duplicates
sidebar_label: remove-duplicates
title: Mark Duplicates (Picard)
sidebar_label: Mark Duplicates
---

![remove-duplicates](/../static/icons/bam-manipulation/MarkDuplicates_square.svg)

CommandLine tools already exist for this function. This tool only exists as a GUI wrapper in ScriptManager.
Removes or marks duplicate reads in paired-end sequencing given identical 5' read positions. _Read more in the [Picard documentation][picard-markdup]_.

Please see the [Samtools markdup tool][samtools-markdup] or the [Picard MarkDuplicates tool][picard-markdup].
<img src={require('/../static/md-img/BAMManipulation/MarkDuplicates.png').default} style={{width:70+'%'}}/>

<img src={require('/../static/md-img/BAMManipulation/BamManipulation:remove-duplicaites.png').default} style={{width:70+'%'}}/>
### File inputs (BAM list)

This script filters BAM-type files so make sure your input is properly formatted and uses the appropriate `.bam` extension. The script also supports bulk selection and processing of files.

### Output file (BAM)
The output filename for this script is based on a user-customizable text field that defaults to `merged_BAM.bam`.

:::tip
Make sure if you change the filename that you keep the `.bam` file extension.
:::

### Upstream sequence
The sequence upstream of the 5' end of read 1 to check for and filter by. If the sequence in the reference genome upstream of the 5' end of read 1 matches this sequence, the read-pair information is written to the output BAM file.

:::caution

For classic PIP-seq datasets the default "T" sequence should be used.

:::

This tool supports different sequences in case an as-yet-unknown biochemical assay or analysis requires filtering based on a different sequence. For example, a user investigating and comparing the rates of permanganate oxidation and piperidine cleavage at other nucleotides might compare BAM files filtered by other upstream sequences such as "C", which is known to be cleaved under such treatment (just not as frequently as "T").

### Generate BAI file (GUI only)
When this box is checked, the script automatically generates a BAI index file for each new filtered BAM file.

:::note
The CLI cannot index the resulting BAM file. The user must use the appropriate [samtools][samtools-index] or [Picard][picard-index] command to generate the BAI file.
:::


## Command Line Interface (Picard and Samtools)
_Command-line tools already exist for this function. This tool only exists as a GUI wrapper in ScriptManager._

Please see the [Samtools markdup tool][samtools-markdup] or the [Picard MarkDuplicates tool][picard-markdup] for a command line tool that performs this function.

[samtools-markdup]:http://www.htslib.org/doc/samtools-markdup.html
[picard-markdup]:https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates
11 changes: 8 additions & 3 deletions docusaurus/docs/bam-manipulation/sort-bam.md
@@ -1,14 +1,19 @@
---
id: sort-bam
title: Sort BAM
sidebar_label: sort-bam
sidebar_label: Sort BAM
---
![sort-bam](/../static/icons/bam-manipulation/BamManipulation:SortBAM.svg)

Sort BAM files so that they can be efficiently extracted and manipulated. This is a RAM-intensive process; if the program freezes, increase the Java heap size.

![sort-bam](/../static/md-img/BAMManipulation/BamManipulation_SortBAM.png)
<img src={require('/../static/md-img/BAMManipulation/SortBAM.png').default} style={{width:70+'%'}}/>

CommandLine tools already exist for this function. This tool only exists as a GUI wrapper in ScriptManager.
Many bioinformatics tools require sorted BAM files so that they can be parsed efficiently. It is good practice to keep your BAM files sorted.


## Command Line Interface (Picard and Samtools)
_Command-line tools already exist for this function. This tool only exists as a GUI wrapper in ScriptManager._

Please see the [Samtools sort tool][samtools-sort] or the [Picard SortSam tool][picard-sort].
