# Introduction to PLINK


### PLINK


Several tools can be used to perform a GWAS. PLINK is perhaps the most commonly used of them, and it is the tool we use in this tutorial. You do not need to download PLINK for this tutorial; however, if you want to run a GWAS in the future, you will need to install PLINK on your own computer. For download instructions, follow the link [here](http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml).

The data for this tutorial are supplied in PLINK's two text formats, PED and MAP.


#### PED Format


The fields in a PED file are:

| Column #  | Description |
| :---------: | :------------: |
|   1         | Family ID |
|   2         | Sample ID |
|   3         | Paternal ID (0 if the father is not in the dataset) |
|   4         | Maternal ID (0 if the mother is not in the dataset) |
|   5         | Sex (1 = Male, 2 = Female, any other number = unknown) |
|   6         | Phenotype (1 = Unaffected with the disease, 2 = Affected with the disease, 0 = Disease status unknown, any other number = value for quantitative trait analysis) |
|   7 onward  | The participant’s genotype information, two columns (one per allele) for each SNP. A 0 indicates that the nucleotide at that position is unknown |

The example below comes from the hapmap1.ped file in the data that we are using for this tutorial. In the line shown below, the first six fields are `HCB181 1 0 0 1 1`: family ID HCB181, sample ID 1, no recorded father or mother, male, and unaffected. If you open the hapmap1.ped file in Vim, the file will likely look slightly different unless you have turned off Vim's automatic line wrapping; the information in the file is the same.
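As a minimal sketch of how the sex and phenotype codes in the table can be read, the snippet below decodes columns 5 and 6 with `awk`. The three-sample file is hypothetical (it is not the tutorial data), and treating -9 as a missing phenotype follows PLINK's default missing-phenotype code rather than anything stated in the table. In a notebook cell, put `%%bash` on the first line.

```bash
# Hypothetical three-sample PED file with two SNPs each (not the tutorial data).
ped=$(mktemp)
cat > "$ped" <<'EOF'
FAM1 1 0 0 1 1 1 2 2 2
FAM2 1 0 0 2 2 2 2 0 0
FAM3 1 0 0 0 0 1 1 1 2
EOF

# Translate the sex (column 5) and phenotype (column 6) codes into labels.
decoded=$(awk '{
    if ($5 == 1) sex = "male"
    else if ($5 == 2) sex = "female"
    else sex = "unknown"

    if ($6 == 1) pheno = "unaffected"
    else if ($6 == 2) pheno = "affected"
    else if ($6 == 0 || $6 == -9) pheno = "missing"   # -9: assumed PLINK default
    else pheno = "quantitative"

    print $1, $2, sex, pheno
}' "$ped")

echo "$decoded"
rm -f "$ped"
```

Each output line lists the family ID, sample ID, decoded sex, and decoded phenotype for one sample.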


To view the data directly, create a new cell (like you did on the [first page](Welcome.ipynb)) and run the code below. The `head` command with `-n 1` displays the first line of the hapmap1.ped file.

```
%%bash
head -n 1 plink_data/hapmap1.ped
```


HCB181 1 0 0 1 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 0 0 2 2 2 2 1 1 2 2 2 2 2 2 0 0 0 0 2 2 1 1 1 1 2 2 2 2 2 1 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 1 2 1 2 0 0 1 2 2 2 2 2 0 0 2 2 1 1 1 2 1 1 2 1 1 2 2 2 0 0 2 2 2 2 2 2 2 2 2 2 2 2 1 2 0 0 2 2 2 2 1 2 2 2 1 2 2 2 1 1 2 2 2 2 0 0 2 2 1 2 1 1 0 0 1 2 2 2 2 2 2 2 1 2 1 2 2 2 2 1 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 1 2 1 2 1 1 1 2 1 2 1 2 0 0 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 1 1 2 2 1 2 1 0 0 1 2 2 2 1 2 0 0 2 1 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 2 1 1 1 2 2 2 1 2 2 2 2 2 1 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 0 0 0 0 1 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 0 0 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 0 0 2 2 2 2 2 1 2 2 1 1 2 2 1 2 2 2 2 2 2 2 2 2 1 1 1 2 1 1 2 2 1 2 1 1 2 2 0 0 2 1 1 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 0 0 1 1 2 2 1 1 1 2 2 2 2 2 0 0 2 2 1 2 2 1 1 2 2 2 0 0 1 1 2 2 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 1 2 1 2 1 2 2 2
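Because every SNP occupies two allele columns after the six leading fields, the number of SNPs on a PED line can be worked out from its field count. The sketch below does this for a hypothetical one-sample line with three SNPs (not the tutorial data) and also counts genotypes recorded as missing (`0 0`).

```bash
# Hypothetical one-sample PED line with three SNPs (not the tutorial data).
line="FAM1 1 0 0 1 1 1 2 2 2 0 0"

# Fields 1-6 describe the sample; every SNP then takes two allele columns.
n_fields=$(echo "$line" | awk '{ print NF }')
n_snps=$(( (n_fields - 6) / 2 ))
echo "fields: $n_fields, SNPs: $n_snps"

# A missing genotype appears as the allele pair "0 0"; count such pairs.
n_missing=$(echo "$line" | awk '{
    m = 0
    for (i = 7; i < NF; i += 2)
        if ($i == 0 && $(i + 1) == 0) m++
    print m
}')
echo "missing genotypes: $n_missing"
```

The same arithmetic should apply to the tutorial file, for example `head -n 1 plink_data/hapmap1.ped | awk '{ print (NF - 6) / 2 }'`.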

#### MAP Format


The fields in a MAP file are:

| Column #  | Description |
| :---------: | :------------: |
|   1         | Chromosome number |
|   2         | Marker ID (reference SNP cluster ID, used to access the SNP in dbSNP) |
|   3         | Genetic Distance (This is not used for most analyses, so it frequently defaults to 0 or is excluded) |
|   4         | Physical position in base pairs |

The example below comes from the hapmap1.map file in the data that we are using for this tutorial. Notice that the ‘Genetic Distance’ column is set to 0 for this dataset. If you open the hapmap1.map file in Vim, it may look slightly different depending on your line-wrapping settings; the information in the file is the same.


Again, to view the data directly, create a new cell and paste in the same code, changing the file name "hapmap1.ped" from the earlier example to "hapmap1.map". You may also want to increase the number of lines shown, since the lines in the map file are much shorter.

```
%%bash
head -n 10 plink_data/hapmap1.map
```

```
1 rs6681049 0 1
1 rs4074137 0 2
1 rs7540009 0 3
1 rs1891905 0 4
1 rs9729550 0 5
1 rs3813196 0 6
1 rs6704013 0 7
1 rs307347 0 8
1 rs9439440 0 9
1 rs3128342 0 10
```
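The MAP and PED files describe the same markers, so each PED line should hold six leading fields plus two allele columns per MAP line. The sketch below checks this on a hypothetical miniature dataset; the file names `toy.map` and `toy.ped` and the marker IDs are invented for illustration. It also tallies markers per chromosome from column 1 of the MAP file.

```bash
# Hypothetical miniature dataset (not the tutorial data): 3 markers, 2 samples.
dir=$(mktemp -d)
cat > "$dir/toy.map" <<'EOF'
1 rs0000001 0 1
1 rs0000002 0 2
2 rs0000003 0 1
EOF
cat > "$dir/toy.ped" <<'EOF'
FAM1 1 0 0 1 1 1 2 2 2 0 0
FAM2 1 0 0 2 2 2 2 1 2 1 1
EOF

# Number of markers = number of lines in the MAP file.
n_markers=$(awk 'END { print NR }' "$dir/toy.map")

# Count PED lines whose field count is not 6 + 2 * n_markers.
n_bad=$(awk -v m="$n_markers" 'NF != 6 + 2 * m { bad++ } END { print bad + 0 }' "$dir/toy.ped")

# Markers per chromosome, taken from column 1 of the MAP file.
per_chrom=$(awk '{ count[$1]++ } END { for (c in count) print c, count[c] }' "$dir/toy.map" | sort -n)

echo "markers: $n_markers, malformed PED lines: $n_bad"
echo "$per_chrom"
rm -rf "$dir"
```

To try this on the tutorial data, point the two `awk` commands at `plink_data/hapmap1.map` and `plink_data/hapmap1.ped` instead of the toy files.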


[Next Page](FileFormatting.ipynb)

[Previous Page](DataAcquisition.ipynb)

[Welcome Page](Welcome.ipynb)