# File Formatting

Getting data into PED and MAP formats can be somewhat tricky since the data found in databases such as dbGaP and CGHub are not in any universally standardized format. When using data from these databases, you may come across any number of different file formats, including the following:

1. VCF format
2. Microarray data
3. A format made up by the researcher

The files that we are going to be using for this tutorial are already in PED and MAP formats, so you will not need to worry about converting the tutorial files. However, in future studies, you may need to use some of the techniques described below. 

### VCF Format


Variant Call Format (VCF) is used to store genetic sequence variation, frequently from next-generation or Sanger sequencing. VCF files can be converted to PED and MAP files using the vcftools package (available at https://github.com/vcftools/vcftools) with the following command:
```
vcftools --vcf input_file.vcf --plink --out output_file
```
To use this command, change 'input_file.vcf' to your chosen input file name and change 'output_file' to your chosen output file name. The resulting output files will have the name specified with the --out flag and the file extensions .ped and .map, ready for input to PLINK.
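After a conversion, it can be reassuring to confirm that the two output files agree with each other. The sketch below assumes the standard PLINK text layout, in which each line of the MAP file describes one variant and each line of the PED file holds six leading columns (family ID, individual ID, father, mother, sex, phenotype) followed by two allele columns per variant. The function names and the tiny hand-written files are hypothetical stand-ins for real vcftools output, not part of vcftools or PLINK.

```python
import tempfile
from pathlib import Path


def count_map_variants(map_path):
    """Count variants in a MAP file (one non-empty line per variant)."""
    with open(map_path) as handle:
        return sum(1 for line in handle if line.strip())


def check_ped_against_map(ped_path, map_path):
    """Return the number of samples if every PED row has 6 + 2 * n_variants fields.

    Raises ValueError on the first row whose field count does not match.
    """
    n_variants = count_map_variants(map_path)
    expected_fields = 6 + 2 * n_variants
    n_samples = 0
    with open(ped_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != expected_fields:
                raise ValueError(
                    f"PED line {line_number}: expected {expected_fields} "
                    f"fields, found {len(fields)}"
                )
            n_samples += 1
    return n_samples


# Tiny hand-written example standing in for real conversion output (assumption).
workdir = Path(tempfile.mkdtemp())
map_file = workdir / "output_file.map"
ped_file = workdir / "output_file.ped"
map_file.write_text("1\trs0001\t0\t1000\n1\trs0002\t0\t2000\n")
ped_file.write_text(
    "FAM1\tIND1\t0\t0\t0\t0\tA\tG\tC\tC\n"
    "FAM2\tIND2\t0\t0\t0\t0\tA\tA\tC\tT\n"
)

n_variants = count_map_variants(map_file)
n_samples = check_ped_against_map(ped_file, map_file)
print(f"{n_samples} samples, {n_variants} variants")
```

To check your own files, point `ped_file` and `map_file` at the paths produced by the conversion instead of the temporary example files.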

### Microarray Format


Like sequencing methods, DNA microarrays are used to genotype DNA and give their results as data files. There are many different types of microarrays, resulting in many different formatting conventions. These files can be converted for use in PLINK, but because there are so many varieties, doing so is beyond the scope of this tutorial.


### Custom Format


Many researchers store their data in a format of their own design. This can make it difficult to understand what the rows and columns represent, and difficult to convert the data to PED and MAP formats. This is one reason it is so important to store data in a format that the rest of the scientific community can easily recognize and understand. Converting such data to PED and MAP formats usually requires a custom script tailored to that particular layout, and writing one for every possible layout is beyond the scope of this tutorial.
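Although a general-purpose converter is out of scope, a minimal sketch for one made-up layout may help show what such a script involves. It assumes a hypothetical "long" CSV table with the columns `sample_id`, `snp_id`, `chrom`, `pos`, and `genotype` (a two-letter call such as `AG`); none of these names come from a real dataset. Family ID is set to the sample ID, parents and sex are written as unknown (`0`), phenotype as missing (`-9`), genetic distance as `0`, and any genotype absent from the table as `0 0`.

```python
import csv
import io


def convert_long_table(rows):
    """Build MAP and PED lines from (sample, snp, chrom, pos, genotype) records."""
    variants = {}   # snp_id -> (chrom, pos), kept in first-seen order
    genotypes = {}  # sample_id -> {snp_id: two-letter genotype}
    for row in rows:
        variants.setdefault(row["snp_id"], (row["chrom"], row["pos"]))
        genotypes.setdefault(row["sample_id"], {})[row["snp_id"]] = row["genotype"]

    # MAP: chromosome, variant ID, genetic distance (0 = unknown), bp position
    map_lines = [
        f"{chrom}\t{snp}\t0\t{pos}" for snp, (chrom, pos) in variants.items()
    ]

    ped_lines = []
    for sample, calls in genotypes.items():
        # PED leading columns: family, individual, father, mother, sex, phenotype
        fields = [sample, sample, "0", "0", "0", "-9"]
        for snp in variants:
            call = calls.get(snp, "00")  # "0 0" marks a missing genotype
            fields.extend([call[0], call[1]])
        ped_lines.append("\t".join(fields))
    return map_lines, ped_lines


# Hypothetical input table; sample S2 has no call for rs0002.
custom_data = """sample_id,snp_id,chrom,pos,genotype
S1,rs0001,1,1000,AG
S1,rs0002,1,2000,CC
S2,rs0001,1,1000,AA
"""

map_lines, ped_lines = convert_long_table(csv.DictReader(io.StringIO(custom_data)))
print("\n".join(map_lines))
print("\n".join(ped_lines))
```

Writing `map_lines` and `ped_lines` to files with the same base name and the extensions .map and .ped would give a pair of files in the layout PLINK expects. A real dataset will almost certainly need different column handling.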


[Next Page](CompressFiles.ipynb)

[Previous Page](PLINKIntro.ipynb)

[Welcome Page](Welcome.ipynb)