### Directions for inputting a VCF (Variant Call Format) SNP file into Haploview

#### by [Margaret Antonio](https://github.com/mmlantonio/) 16.08.24

#### Overview

Haploview provides visualization and analysis tools for haplotypes and linkage disequilibrium (LD). To analyze a VCF file in Haploview, we convert it into PLINK format (.ped and .map files) and load those files under Haploview's LINKAGE FORMAT data type.


#### Tools used (see the linked manuals for more information)

* [VCFtools](http://vcftools.sourceforge.net/man_latest.html)
* [PLINK](http://pngu.mgh.harvard.edu/~purcell/plink/)
* [Haploview](https://www.broadinstitute.org/scientific-community/science/programs/medical-and-population-genetics/haploview/haploview)


#### Command line

1. Remove indels from the VCF file, because Haploview's LINKAGE FORMAT does not accept indels.

    ```bash
    vcftools --vcf ../inputs/snpEff/snpEff_GATK_filtered.vcf --remove-indels --recode --recode-INFO-all --out gatk_filt_SNPonly 
    ```

2. Convert the SNP-only VCF file to PLINK format. VCFtools writes `gatk_filt_SNPonly.ped` and `gatk_filt_SNPonly.map`.

    ```bash
    vcftools --vcf gatk_filt_SNPonly.recode.vcf --plink --out gatk_filt_SNPonly
    ```

3. OPTIONAL. In PLINK, update the family and individual IDs. VCF files contain only individual (sample) IDs, so this is the point at which to specify family IDs. If there is no family ID, reuse the individual ID.

    ```bash
    plink --file gatk_filt_SNPonly --update-ids update/newIDs.txt --recode --out edit5_gatk_filt_SNPonly

    # The ID file has 4 columns: Old-FamilyID Old-IndivID New-FamilyID New-IndivID
    # Example=> SM:DH10 SM:DH10 DH10 DH10
    #           SM:DH12 SM:DH12 DH12 DH12
    ```

4. OPTIONAL. In PLINK, update phenotypes, coded as 1 = unaffected and 2 = affected.

    ```bash
    plink --file edit5_gatk_filt_SNPonly --pheno update/newPheno.txt --recode --out edit6_gatk_filt_SNPonly

    # The phenotype file has 3 columns: FamilyID IndivID Phenotype
    # Example=> DH8     DH8     2
    #           Ki2007  Ki2007  1
    ```

5. OPTIONAL. In PLINK, update sex.

    ```bash
    plink --file edit6_gatk_filt_SNPonly --update-sex update/newSEX.txt --recode --out edit7_gatk_filt_SNPonly

    # The --update-sex file has 3 columns: FamilyID IndivID Sex
    # PLINK sex coding: 1 = male, 2 = female, 0 = unknown
    ```

6. Open the final .map file in Excel as a text file. Delete every column except two: the SNP marker name (chr:pos, column 2) and the position in bp (column 4). Save the result as a tab-delimited text file.

7. Launch Haploview and open the LINKAGE FORMAT data type tab. Load the .ped file as the DATA FILE and the edited two-column file from step 6 as the LOCUS INFORMATION file.

    ```bash
    java -jar path/to/Haploview.jar
    ```
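The Excel edit in step 6 can also be scripted. The minimal sketch below makes a few assumptions:

* The input uses the standard 4-column PLINK .map layout: chromosome, SNP ID, genetic distance, and bp position.
* The output should be tab-separated.
* The function name `map_to_info` and the `.info` output file name are illustrative only. They are not part of PLINK or Haploview.

```bash
#!/usr/bin/env bash
# Sketch: build a two-column Haploview locus information file from a PLINK .map file.
# PLINK .map columns: chromosome  SNP-ID  genetic-distance(cM)  position(bp)
# The output keeps only:  SNP-ID  position(bp)   (tab-separated)
set -euo pipefail

# Hypothetical helper name, used here for illustration only.
map_to_info() {
    local map_file="$1"   # input .map file, e.g. from step 2 or step 5
    local info_file="$2"  # output two-column marker file for Haploview
    # Skip malformed lines with fewer than 4 fields; print column 2 and column 4.
    awk 'NF >= 4 { print $2 "\t" $4 }' "$map_file" > "$info_file"
}

# Usage (the output name is a hypothetical choice):
# map_to_info edit7_gatk_filt_SNPonly.map edit7_gatk_filt_SNPonly.info
```

This avoids round-tripping through Excel, which can silently reformat marker names or save in a non-text format.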
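The ID file for step 3 is otherwise written by hand. The minimal sketch below drafts it from the .ped file, under these assumptions:

* As in the step 3 example, the Family ID and Individual ID columns both hold the VCF sample name.
* A fixed prefix such as `SM:` should be dropped.
* The new Family ID simply repeats the new Individual ID. Edit the file afterwards if your samples belong to real families.

`make_update_ids` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: draft the 4-column file for `plink --update-ids` from a .ped file.
# Assumes columns 1 and 2 of the .ped file both hold the original sample name.
set -euo pipefail

# Hypothetical helper name, used here for illustration only.
make_update_ids() {
    local ped_file="$1"  # input .ped file (only the first two columns are read)
    local prefix="$2"    # prefix to strip from IDs, e.g. "SM:" (an assumption)
    local out_file="$3"  # output ID file for --update-ids
    awk -v prefix="$prefix" '{
        new_id = $2
        # Strip the prefix only when the ID actually starts with it.
        if (prefix != "" && index(new_id, prefix) == 1)
            new_id = substr(new_id, length(prefix) + 1)
        # Columns: Old-FamilyID Old-IndivID New-FamilyID New-IndivID
        print $1, $2, new_id, new_id
    }' "$ped_file" > "$out_file"
}

# Usage (paths follow step 3):
# mkdir -p update
# make_update_ids gatk_filt_SNPonly.ped "SM:" update/newIDs.txt
```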
