preprocessing
Pas-Kapli edited this page Sep 13, 2016 · 2 revisions
```$ mafft Anolis.fas > Anolis_mafft.fas```
Be much more careful with the alignment when the dataset contains many indels (e.g. 16S rRNA).
```$ raxmlHPC-PTHREADS-SSE3 -f c -s Anolis_mafft.fas -m GTRGAMMA -n read_alignment```
The -f c option only checks that RAxML can read the alignment. RAxML prints the following message on the screen and in the RAxML_info.read_alignment file:
```
Found 13 sequences that are exactly identical to other sequences in the alignment.
Normally they should be excluded from the analysis.
```
RAxML automatically writes a reduced alignment, Anolis_mafft.fas.reduced, in phylip format.
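Check the difference between the two files. The first line of the phylip file gives the number of sequences and sites, and each sequence in the fasta file has one header line starting with ">":
```$ head -n 1 Anolis_mafft.fas.reduced```
```$ grep ">" Anolis_mafft.fas | wc -l```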
_**Q: How many sequences were in the fasta file and how many are there in the reduced phylip file?**_
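The two checks above can also be wrapped into small helper functions so that the comparison is done in one step. This is only a sketch: the function names `count_fasta`, `count_phylip` and `removed_sequences` are made up for illustration, and it assumes the phylip header line starts with the number of sequences.

```shell
# Count sequences in a fasta file: one header line (starting with ">") per sequence.
count_fasta() {
    grep -c "^>" "$1"
}

# Read the sequence count from the phylip header line ("<n_sequences> <n_sites>").
count_phylip() {
    awk 'NR == 1 {print $1; exit}' "$1"
}

# Print how many sequences are missing from the reduced phylip file ($2)
# compared with the original fasta file ($1).
removed_sequences() {
    echo $(( $(count_fasta "$1") - $(count_phylip "$2") ))
}
```

For this dataset the call would be ```$ removed_sequences Anolis_mafft.fas Anolis_mafft.fas.reduced```.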
#### Convert phylip to fasta (we will need it for mptp):
``` $ phylip_to_fasta.py Anolis_mafft.fas.reduced ```
The output file will be named Anolis_mafft.fas.reduced.fasta
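If phylip_to_fasta.py is not at hand, the same conversion can be approximated with awk. This is a minimal sketch and not the script used above: the function name `phylip_to_fasta_sketch` is hypothetical, and it assumes a sequential phylip file in which every record sits on a single line as "name sequence" (interleaved files will not work).

```shell
# Convert a sequential phylip file (header line, then "name sequence" per line) to fasta.
# Assumption: each sequence is on one line; the header line (NR == 1) is skipped.
phylip_to_fasta_sketch() {
    awk 'NR > 1 && NF >= 2 {
        seq = ""
        for (i = 2; i <= NF; i++) seq = seq $i   # rejoin sequence blocks separated by spaces
        print ">" $1
        print seq
    }' "$1"
}
```

A possible call, writing to the file name used in the next step: ```$ phylip_to_fasta_sketch Anolis_mafft.fas.reduced > Anolis_mafft.fas.reduced.fasta```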
### 1.3. Infer phylogenetic tree with RAxML:
``` $ raxmlHPC-PTHREADS-SSE3 -s Anolis_mafft.fas.reduced.fasta -m GTRGAMMA -n Anolis -p $RANDOM -T 2 -o GBGC12094-13_Polychrus ```
_With the -o argument RAxML returns a phylogeny rooted on the given outgroup; without it, we can root the tree with mptp later._
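The name given to -o has to match a sequence name in the alignment, so it can be worth checking it before starting the run. The helper below is a sketch with a hypothetical name (`has_outgroup`); it assumes that the fasta header lines contain only the sequence name, with nothing after it.

```shell
# Succeed (exit status 0) only if the label ($2) is a header line of the fasta file ($1).
# -x matches the whole line, -F treats the label as a fixed string, -q suppresses output.
has_outgroup() {
    grep -qxF ">$2" "$1"
}
```

For example: ```$ has_outgroup Anolis_mafft.fas.reduced.fasta GBGC12094-13_Polychrus && echo "outgroup found"```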