# ClineHelpR

Plot BGC and INTROGRESS genomic cline results and correlate INTROGRESS clines with environmental variables.

ClineHelpR plots output from BGC (Bayesian Genomic Cline). After running BGC ourselves, we found that its results were not easy to plot, so we developed this package while working out how to do it.

Our package allows you to make several plots.

The BGC and INTROGRESS software packages are described elsewhere (Gompert and Buerkle, 2010, 2011, 2012; Gompert et al., 2012a, 2012b).

## Software Flow Diagram

<img src="img/flowchart_ClinePlotR.png" width=100% height=100% />

## Example Dataset

All example data are available from a Dryad Digital Repository (https://doi.org/10.5061/dryad.b2rbnzsc8) because the files are too large for GitHub. To run the examples, download the exampleData directory from Dryad and then run the R scripts in the ClineHelpR/scripts directory.

## Installation  

Below are some instructions for how to install ClineHelpR and its dependencies.

### Dependencies

ClineHelpR has several dependencies, most of which can be installed using Anaconda3.

The bgcPlotter functions require:

+ data.table
+ dplyr
+ bayestestR
+ scales
+ reshape2
+ ggplot2
+ forcats
+ gtools
+ RIdeogram
+ gdata
+ adegenet

The environmental functions require:

+ ENMeval
+ rJava
+ raster
+ sp
+ dismo

The INTROGRESS functions require:

+ introgress (not available from conda)
+ ggplot2
+ dplyr
+ scales

The vcf2bgc.py script requires:

+ Python 3.4 to 3.6 (inclusive)
+ PyVCF
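Because vcf2bgc.py supports only a narrow range of Python versions, it can help to check the active interpreter before running it. The following is a minimal sketch, not part of ClineHelpR, that tests the interpreter against the 3.4 to 3.6 range stated above and looks for PyVCF. It assumes PyVCF is importable under its usual module name, `vcf`; the helper names are hypothetical.

```python
import importlib.util
import sys

# Supported range for vcf2bgc.py as stated in this README: 3.4 <= Python <= 3.6.
MIN_VERSION = (3, 4)
MAX_VERSION = (3, 6)


def python_version_ok(version_info=sys.version_info):
    """Return True if the (major, minor) version lies within the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


def pyvcf_available():
    """Return True if a module named `vcf` (assumed to be PyVCF) can be found."""
    return importlib.util.find_spec("vcf") is not None


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("PyVCF importable:", pyvcf_available())
```

Comparing `(major, minor)` tuples means any 3.6.x patch release passes while 3.7 and later fail.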


### Installing the Dependencies  

Most of the dependencies can be installed with Anaconda3. The only one that cannot be installed via conda is the introgress R package. The conda commands below create an environment and install all the other dependencies, plus devtools, which is used to install ClineHelpR:

```
conda create -n clinehelpr python=3.6
conda activate clinehelpr
conda install -c conda-forge r-base r-data.table r-dplyr r-bayestestr r-scales r-reshape2 r-ggplot2 r-forcats r-gtools r-rideogram r-gdata r-adegenet r-enmeval r-rjava r-raster r-sp r-dismo r-devtools
```
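The conda package names follow conda-forge's usual convention for R packages: `r-` followed by the lowercased CRAN name. As a minimal sketch (the helpers below are hypothetical and not part of ClineHelpR), the install command can be generated from the dependency lists above, which makes it easy to confirm that every listed package is included.

```python
# R dependencies as listed in this README; introgress is omitted because it is
# not available from conda, and devtools is added to install ClineHelpR itself.
R_DEPENDENCIES = [
    "data.table", "dplyr", "bayestestR", "scales", "reshape2", "ggplot2",
    "forcats", "gtools", "RIdeogram", "gdata", "adegenet",
    "ENMeval", "rJava", "raster", "sp", "dismo", "devtools",
]


def conda_r_name(cran_name):
    """Map an R package name to the usual conda-forge name: r-<lowercase>."""
    return "r-" + cran_name.lower()


def conda_install_command(packages, channel="conda-forge"):
    """Build one conda install command for r-base plus the given R packages."""
    names = ["r-base"]
    for pkg in packages:
        name = conda_r_name(pkg)
        if name not in names:  # skip packages shared between function groups
            names.append(name)
    return " ".join(["conda", "install", "-c", channel] + names)


if __name__ == "__main__":
    print(conda_install_command(R_DEPENDENCIES))
```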

To install the additional pyVCF dependency for vcf2bgc.py:

```
conda install -c bioconda pyvcf
```

In our experience, installing R packages from the conda-forge and bioconda channels works better than using the default anaconda channel.


### Installing ClineHelpR

To install ClineHelpR, you can do the following:

```
# If you don't already have devtools installed, uncomment the next line
# install.packages("devtools")

# Install ClineHelpR
devtools::install_github("btmartin721/ClineHelpR")
``` 

Now load the library.  
```
library("ClineHelpR")
```

## Step 1: Data Filtering (optional)  

Data filtering is important for reducing noise and preventing uninformative sites from swamping out real signals in the data. Accordingly, we recommend applying missing data and minor allele frequency filters. 

Non-biallelic sites should also be removed, because BGC assumes that all sites are biallelic.

While filtering the data is outside the scope of ClineHelpR, below is a link to a GitHub Repository containing two scripts, *nremover.pl* and *phylipFilterPops.pl*, that can be used for appropriate data filtering:

https://github.com/tkchafin/scripts

+ *nremover.pl* applies per-individual and per-site missing data filters, a biallelic filter, and a minor allele frequency filter.

+ *phylipFilterPops.pl* can apply missing data filtering per-population. 

There are also numerous useful file conversion scripts in this repository that you might be interested in using.
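To make the three per-site criteria concrete, here is a minimal illustrative sketch that applies a missing-data filter, a biallelic filter, and a minor allele frequency (MAF) filter to one site's VCF-style genotype calls. It is not how *nremover.pl* implements filtering, and the default thresholds (`max_missing=0.5`, `min_maf=0.05`) are hypothetical placeholders, not recommendations.

```python
# Genotypes are VCF-style GT strings, e.g. "0/0", "0|1", "1/1", or "./." (missing).
def parse_gt(gt):
    """Return a list of allele indices, or None if the genotype is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return [int(a) for a in alleles]


def site_stats(genotypes):
    """Return (missing_fraction, n_distinct_alleles, minor_allele_frequency)."""
    if not genotypes:
        raise ValueError("a site needs at least one genotype")
    calls = [parse_gt(g) for g in genotypes]
    observed = [a for c in calls if c is not None for a in c]
    missing_fraction = sum(c is None for c in calls) / len(calls)
    if not observed:
        return missing_fraction, 0, 0.0
    counts = {}
    for allele in observed:
        counts[allele] = counts.get(allele, 0) + 1
    # Rarest-allele frequency; this equals the MAF only for biallelic sites.
    maf = min(counts.values()) / len(observed) if len(counts) > 1 else 0.0
    return missing_fraction, len(counts), maf


def keep_site(genotypes, max_missing=0.5, min_maf=0.05):
    """Keep biallelic sites that pass the missing-data and MAF thresholds."""
    missing, n_alleles, maf = site_stats(genotypes)
    return n_alleles == 2 and missing <= max_missing and maf >= min_maf
```

Monomorphic sites fail the biallelic check as well, so they are dropped along with multiallelic ones.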

## Step 2: File Conversion

The input data must be converted to the custom BGC file format. ClineHelpR includes a Python script to convert a VCF (variant call format) file to BGC format. 

+ ClineHelpR/scripts/*vcf2bgc.py*: Convert a VCF file to BGC format

Alternatively, two scripts included in the https://github.com/tkchafin/scripts GitHub repository can handle other file formats.  

+ *phylip2bgc.pl*: Convert a PHYLIP file to BGC format

+ *phylip2introgress.pl*: Convert a PHYLIP file to INTROGRESS format

Here we demonstrate the *vcf2bgc.py* script on the example data.

First, let's bring up the help menu to see the options:

```
python ../scripts/vcf2bgc.py -h
```

This prints:

```
usage: vcf2bgc.py -v VCF -m POPMAP --p1 P1 --p2 P2 --admixed ADMIXED
                  [-o OUTPREFIX] [-l] [-h]

Convert VCF file to BGC format (with genotype uncertainties). Currently only
handles three populations maximum (P1, P2, and Admixed).

Required Arguments:
  -v VCF, --vcf VCF     Input VCF file
  -m POPMAP, --popmap POPMAP
                        Two-column tab-separated population map file: inds
                        pops. No header line.
  --p1 P1               Parental population 1
  --p2 P2               Parental population 2
  --admixed ADMIXED     Admixed population (limit=1 population)

Optional Arguments:
  -o OUTPREFIX, --outprefix OUTPREFIX
                        Specify output prefix for BGC files.
  -l, --linkage         Toggle to create a linkage map file for BGC's linkage
                        model. Only use if you have a reference-mapped VCF
                        file; default = off.
  -h, --help            Displays this help menu
```


The popmap (population map) file contains two tab-separated columns. The first column contains the sample ID of each individual in the dataset, and the second column contains the population ID of that sample. The file must not have a header line.

For example: 

```
Ind1    parent1
Ind2    parent1
Ind3    parent1
Ind4    parent2
Ind5    parent2
Ind6    parent2
Ind7    admix
Ind8    admix
Ind9    admix
```
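Because a popmap with spaces instead of tabs, or with a stray header line, is an easy mistake to make, it can be worth validating the file before conversion. The following is a minimal sketch under the assumptions stated above (exactly two tab-separated columns, no header); `read_popmap` and `samples_per_population` are hypothetical helpers, not part of vcf2bgc.py, which does its own parsing.

```python
import csv
from collections import OrderedDict


def read_popmap(path):
    """Read a two-column, tab-separated popmap (no header) into {sample: pop}."""
    popmap = OrderedDict()  # keeps file order on Python 3.4-3.6 as well
    with open(path, newline="") as handle:
        for lineno, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
            if not row or not "".join(row).strip():
                continue  # ignore blank lines
            if len(row) != 2:
                raise ValueError("line {}: expected 2 tab-separated columns, got {}"
                                 .format(lineno, len(row)))
            sample, pop = row[0].strip(), row[1].strip()
            if sample in popmap:
                raise ValueError("line {}: duplicate sample ID {!r}".format(lineno, sample))
            popmap[sample] = pop
    return popmap


def samples_per_population(popmap):
    """Count how many samples are assigned to each population ID."""
    counts = OrderedDict()
    for pop in popmap.values():
        counts[pop] = counts.get(pop, 0) + 1
    return counts
```

A space-separated line is read as a single column and rejected, which catches the most common formatting error.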

Specify the parental populations with the `--p1` and `--p2` arguments and the admixed population with the `--admixed` argument. Each value should be one of the population IDs in the second column of the popmap file.

For example:

```
python ../scripts/vcf2bgc.py -v INPUT_VCF_FILE -m POPMAP_FILE --p1 parent1 --p2 parent2 --admixed admix
```
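When running the conversion from a Python workflow, one option is to assemble the argument list programmatically and check the population names first. This is a minimal sketch: `vcf2bgc_args` is a hypothetical helper, the flags are taken only from the help menu above, and the distinct-population check reflects the script's stated limit of three populations (P1, P2, and one admixed population).

```python
import shlex
import sys


def vcf2bgc_args(vcf, popmap, p1, p2, admixed, outprefix=None, linkage=False,
                 known_pops=None, script="../scripts/vcf2bgc.py",
                 python=sys.executable):
    """Assemble a vcf2bgc.py argument list using the flags from its help menu."""
    chosen = (p1, p2, admixed)
    if len(set(chosen)) != 3:
        raise ValueError("--p1, --p2 and --admixed must name three different populations")
    if known_pops is not None:  # e.g. the population IDs found in the popmap
        missing = [p for p in chosen if p not in known_pops]
        if missing:
            raise ValueError("not found in popmap: " + ", ".join(missing))
    args = [python, script, "-v", vcf, "-m", popmap,
            "--p1", p1, "--p2", p2, "--admixed", admixed]
    if outprefix is not None:
        args += ["-o", outprefix]
    if linkage:
        args.append("-l")  # only appropriate for reference-mapped VCF files
    return args


if __name__ == "__main__":
    args = vcf2bgc_args("INPUT_VCF_FILE", "POPMAP_FILE", "parent1", "parent2", "admix")
    print(" ".join(shlex.quote(a) for a in args))
    # subprocess.run(args, check=True) would execute it (Python 3.5+).
```

Passing a list rather than a single string to `subprocess.run` avoids shell-quoting problems with paths that contain spaces.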

Below we run vcf2bgc.py on the example dataset.