
preprocessing

Pas-Kapli edited this page Sep 13, 2016 · 2 revisions

### 1.1. Infer the alignment:

```
$ mafft Anolis.fas > Anolis_mafft.fas
```

Be much more careful with the alignment when the dataset contains many indels (e.g. 16S rRNA).
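A quick sanity check on the MAFFT output, sketched under the assumption that the aligned file is ordinary FASTA whose sequences may be wrapped over several lines: every aligned sequence should have the same length, gaps included. The file `toy_aligned.fas` is hypothetical demo data; point the awk command at `Anolis_mafft.fas` to check the real alignment.

```shell
# Hypothetical toy alignment; sequences are wrapped over several lines on purpose.
cat > toy_aligned.fas <<'EOF'
>seqA
ACGT-ACGTA
CG
>seqB
ACGTTACG--
CG
>seqC
ACG--ACGTACG
EOF

# Join the lines of each record and print "name<TAB>length" (gaps are counted).
awk '
  /^>/ { if (name != "") print name "\t" len; name = substr($0, 2); len = 0; next }
  { gsub(/[ \t\r]/, ""); len += length($0) }
  END { if (name != "") print name "\t" len }
' toy_aligned.fas > toy_lengths.tsv

# A proper alignment has exactly one distinct sequence length.
n_lengths=$(cut -f2 toy_lengths.tsv | sort -u | wc -l | tr -d ' ')
echo "distinct sequence lengths: $n_lengths"
```

If more than one length is reported, the file is not a valid alignment and RAxML will refuse it.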

### 1.2. Remove identical sequences:

```
$ raxmlHPC-PTHREADS-SSE3 -f c -s Anolis_mafft.fas -m GTRGAMMA -n read_alignment
```

RAxML prints the following message on the screen and in the RAxML_info.read_alignment file:

```
Found 13 sequences that are exactly identical to other sequences in the alignment.
Normally they should be excluded from the analysis.
```

RAxML automatically writes a reduced alignment without the duplicate sequences, Anolis_mafft.fas.reduced, in PHYLIP format.
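To see which records are duplicates, a rough sketch like the following groups records whose aligned sequences are exactly identical. It is an illustration, not RAxML's own check: it compares sequence strings case-insensitively and treats gaps and ambiguity codes as ordinary characters, so its count may differ from RAxML's in edge cases. `toy_dups.fas` is hypothetical demo data.

```shell
# Hypothetical toy alignment containing duplicates.
cat > toy_dups.fas <<'EOF'
>s1
ACGT-A
>s2
ACGT-A
>s3
ACGTTA
>s4
acgt-a
>s5
ACGTTA
EOF

# Step 1: join wrapped lines so each record becomes "name<TAB>SEQUENCE" (upper-cased).
# Step 2: report every record whose sequence already appeared in an earlier record.
awk '
  /^>/ { if (name != "") print name "\t" toupper(seq); name = substr($0, 2); seq = ""; next }
  { gsub(/[ \t\r]/, ""); seq = seq $0 }
  END { if (name != "") print name "\t" toupper(seq) }
' toy_dups.fas |
awk -F'\t' '($2 in first) { print $1 " is identical to " first[$2]; next } { first[$2] = $1 }' > toy_dups.txt

cat toy_dups.txt
echo "identical sequences found: $(wc -l < toy_dups.txt | tr -d ' ')"
```

The reported number counts the records that repeat an earlier one, i.e. those that would be dropped while one representative of each group is kept.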

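Compare the number of sequences in the two files. The first line of the PHYLIP file gives the number of sequences and the alignment length:

```
$ head -n 1 Anolis_mafft.fas.reduced
```

Count the sequence headers in the original FASTA alignment:

```
$ grep ">" Anolis_mafft.fas | wc -l
```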

_**Q: How many sequences were in the fasta file and how many are there in the reduced phylip file?**_

#### Convert PHYLIP to FASTA (needed for mptp):

```
$ phylip_to_fasta.py Anolis_mafft.fas.reduced
```

The output file is named Anolis_mafft.fas.reduced.fasta.
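If `phylip_to_fasta.py` is not at hand, the conversion can be sketched with awk. This assumes sequential (non-interleaved) PHYLIP with each taxon's name and full sequence on one line separated by whitespace; interleaved files, or sequences split into space-separated blocks, would need a different parser. `toy.reduced` is hypothetical demo data, and the last lines compare the taxon count in the PHYLIP header with the number of FASTA records written.

```shell
# Hypothetical toy PHYLIP file standing in for Anolis_mafft.fas.reduced.
cat > toy.reduced <<'EOF'
3 8
taxonA ACGT-ACG
taxonB ACGTTACG
taxonC AC--TACG
EOF

# Skip line 1 (number of taxa, number of sites); every other line with at least
# two fields is "name sequence" and becomes a two-line FASTA record.
awk 'NR == 1 { next } NF >= 2 { print ">" $1; print $2 }' toy.reduced > toy.reduced.fasta

# Compare the taxon count declared in the PHYLIP header with the records written.
expected=$(head -n 1 toy.reduced | awk '{ print $1 }')
found=$(grep -c '^>' toy.reduced.fasta)
echo "taxa in PHYLIP header: $expected, FASTA records written: $found"
```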

### 1.3. Infer phylogenetic tree with RAxML:

```
$ raxmlHPC-PTHREADS-SSE3 -s Anolis_mafft.fas.reduced.fasta -m GTRGAMMA -n Anolis -p $RANDOM -T 2 -o GBGC12094-13_Polychrus
```

_The -o argument roots the tree on the given outgroup, so RAxML returns a rooted phylogeny; without it, the tree can be rooted later with mptp._
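Because identical sequences were dropped in step 1.2, it may be worth confirming that the outgroup passed to `-o` is still present in `Anolis_mafft.fas.reduced.fasta` before launching RAxML; if it was among the removed sequences, RAxML cannot root on it. A minimal check, using a hypothetical toy file in place of the real alignment and assuming each FASTA header is exactly `>name`:

```shell
outgroup="GBGC12094-13_Polychrus"

# Hypothetical toy stand-in for Anolis_mafft.fas.reduced.fasta.
cat > toy_tree_input.fasta <<'EOF'
>GBGC12094-13_Polychrus
ACGT-ACG
>Anolis_sp1
ACGTTACG
EOF

# -x: the whole header line must match; -F: the name is taken literally, not as a regex.
if grep -qxF ">$outgroup" toy_tree_input.fasta; then
  status="found"
else
  status="missing"
fi
echo "outgroup $outgroup: $status"
```

If the outgroup is reported as missing, choose another outgroup taxon or drop `-o` and root the tree with mptp later.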
