Merge b37ff56 into abd06ac
alperuzun committed Aug 1, 2019
2 parents abd06ac + b37ff56 commit 4f70e45
Showing 1 changed file with 16 additions and 16 deletions.
32 changes: 16 additions & 16 deletions docs/src/filtering_vcf.md
@@ -1,14 +1,14 @@
#Variant Record and Sample Selection
# Filtering your VCF file: Variant Record and Sample Selection

##General Note:s: Extracting and Reshaping VCF Data
## General Notes: Extracting and Reshaping VCF Data

VIVA supports flexible filters for selecting variant records for visualization.

Additionally, the tool supports selecting and grouping samples by common traits for visualization.

Grouping samples is particularly useful for exploring phenotypic and genotypic associations, displaying differential distribution of variants between groups of samples, and identifying batch effect on coverage between groups of samples in variant analysis experiments.

##Choose a VCF file to Visualize *REQUIRED*
## Choose a VCF file to Visualize *REQUIRED*

Specify filename of VCF file.

@@ -19,16 +19,16 @@ Specify filename of VCF file.
*Note*: This is the *only required argument* for VIVA. If you run with none of the other options, default options will be used. These default options are described in detail below.

```
julia viva -f example.vcf [OPTIONS]
viva -f example.vcf [OPTIONS]
```

##Selecting Variant Records
## Selecting Variant Records

VIVA offers three filters for selecting variant records to visualize from VCF files.

It is recommended to use one or a combination of these filters to reduce the number of variant records extracted from the VCF for plotting, both for technical reasons and to keep the plots visually interpretable. The number of variant records that can be plotted is limited by the user's available computing resources and by the number of pixels their display has for showing data points. While it is possible to visualize many thousands of variant records at one time with VIVA, **we recommend visualizing fewer than 2000 variants** so that all data points can be displayed and your computing resources are not overburdened. However, VIVA is capable of extracting and plotting hundreds of thousands of data points from VCF files.

###Genomic range
### Genomic range

Select rows within a given genomic range.

@@ -39,10 +39,10 @@ Select rows within a given genomic range.
*Note*: To visualize genomic ranges within multiple chromosomes, you may create a batch script to run VIVA multiple times using different genomic ranges.

```
julia viva -f example.vcf -r chr1:20000-30000000
viva -f example.vcf -r chr1:20000-30000000
```
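The batch approach mentioned in the note above could be sketched roughly as follows. This is a minimal dry-run sketch, not part of VIVA: the helper name `viva_range_commands` and the second range are hypothetical placeholders, and only the `-f` and `-r` flags shown in this document are used. The function prints one `viva` command per range so the commands can be reviewed before they are run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VIVA): print one viva command per
# genomic range. The first argument is the VCF file; every remaining
# argument is a range in the chr:start-end form used with -r.
viva_range_commands() {
    local vcf="$1"
    shift
    local range
    for range in "$@"; do
        printf 'viva -f %s -r %s\n' "$vcf" "$range"
    done
}

# Placeholder file name and ranges, for illustration only.
viva_range_commands example.vcf chr1:20000-30000000 chr2:50000-900000
```

To execute the commands instead of printing them, the output can be piped to `sh`. This assumes file names without spaces or shell metacharacters.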

###Variant list
### Variant list

Select variants matching list of chromosomal positions.

@@ -51,10 +51,10 @@ Select variants matching list of chromosomal positions.
*arguments*: Provide the filename of a text file formatted with two columns in .csv format as an argument. There should be a header row with "chr" and "start" in row 1 of columns 1 and 2, respectively. Column 1 should contain the chromosome number in the format "chr1" or "1" and should match the syntax of the VCF file (that is, if the VCF file lists chromosome numbers in the form "chrX", use "chrX" in your positions list, not "X"). You can find an example of this file [here](https://github.com/compbiocore/VariantVisualization.jl/tree/master/test/test_files/positions_list.csv).

```
julia viva -f example.vcf -l "example_positions_list.txt"
viva -f example.vcf -l "example_positions_list.txt"
```
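As an illustration of the two-column layout described above, a positions list might be created like this. This is a sketch only: the helper name `write_positions_list` is hypothetical, the chromosome names and positions are made up, and the second column is assumed to hold the variant's start position. The chromosome syntax must still match your own VCF file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a small positions list to the given path.
# The header row is "chr,start"; the rows below it are invented examples.
write_positions_list() {
    cat > "$1" <<'EOF'
chr,start
chr1,20000
chr1,35000
chrX,150000
EOF
}

# Usage (writes the file into the current directory):
#   write_positions_list example_positions_list.txt
#   viva -f example.vcf -l "example_positions_list.txt"
```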

###Pass filter
### Pass filter

Select rows that passed filters originally set during variant calling and VCF file generation. Selects records with "PASS" in the FILTER column of the VCF file. This filter alone is often not stringent enough to reduce the number of variants for plotting and visual interpretation. For analyzing large VCF files with many "passed" filter records, use genomic range,

@@ -63,12 +63,12 @@ Select rows that passed filters originally set during variant calling and VCF fi
*arguments*: This flag is a positional argument and does not take options.

```
julia viva -f example.vcf -p
viva -f example.vcf -p
```

##Selecting and Grouping Samples
## Selecting and Grouping Samples

###Group samples by sample metadata traits
### Group samples by sample metadata traits

Group sample columns using your sample metadata and visualize metadata attributes in a colorbar above heatmap visualizations.

@@ -94,10 +94,10 @@ Metadata traits are stored as rownames in the first column of the table and shou
This matrix should be saved as a comma delimited .csv file. Microsoft Excel is commonly used for this purpose, but sometimes creates extra delimiter characters in the output file that produce an error in VIVA. You can check to make sure the .csv file was saved properly by opening the file with a text editor such as BBEdit to inspect for and delete empty values or extra delimiter characters at the end of each row.

```
julia viva -f example.vcf -g sample_metadata_matrix.csv case,control
viva -f example.vcf -g sample_metadata_matrix.csv case,control
```
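The manual check for extra delimiters described above can also be done from a terminal. The following is a minimal sketch using standard `grep` and `sed`. The function names are hypothetical, and it assumes that any commas or whitespace at the end of a row are unwanted, which also removes Windows carriage returns and genuinely empty last values.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a metadata .csv before passing it to -g.

# List rows (with line numbers) that end in a comma, optionally followed
# by whitespace. Exits non-zero when no such rows exist.
find_trailing_commas() {
    grep -n ',[[:space:]]*$' "$1"
}

# Write a cleaned copy with trailing commas and whitespace removed from
# every row; the input file is left untouched.
strip_trailing_commas() {
    sed 's/[,[:space:]]*$//' "$1" > "$2"
}

# Usage:
#   find_trailing_commas sample_metadata_matrix.csv
#   strip_trailing_commas sample_metadata_matrix.csv sample_metadata_matrix.clean.csv
```

Writing to a separate output file keeps the original export intact, so the two can be compared before the cleaned copy is used.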

###Select samples to include in visualization
### Select samples to include in visualization

Select specific samples to be extracted from the VCF for visualization.

@@ -108,5 +108,5 @@ Select specific samples to be extracted from the VCF for visualization.
*Note*: To use the sample selection feature in combination with the sample grouping feature, the sample metadata matrix must only contain the sample ids to be selected.

```
julia viva -f example.vcf --select_samples select_samples_list.txt
viva -f example.vcf --select_samples select_samples_list.txt
```
