ContentMine/phylotree

phylotree

ami-phylo analyses images and diagrams to extract phylogenetic trees. This repository holds the complete analysis of ca. 4300 figure image files from the International Journal of Systematic and Evolutionary Microbiology (IJSEM), carried out as Open Notebook Science. The intention is that everything used in the analysis is either available here or openly available and linked from here.

The main sections are:

description of the workflow

  • Scrape the figure images from the IJSEM journal website (note: this was originally performed on the older HighWire platform, not the newer Ingenta platform).
  • Manually filter out figures that do not contain a phylogeny, using Shotwell.
  • Pass each remaining figure to ami-phylo for analysis with this bash loop:
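#!/bin/bash
# Run ami-phylo's RunPhylo on each image path listed (one per line) in list-of-input-images.txt.
# timeout kills any run still going after 60 s; -e -X make Maven print error and debug output.
while IFS= read -r i; do
      # </dev/null stops mvn from reading stdin and swallowing the rest of the image list
      timeout 60s mvn exec:java -Dexec.mainClass='org.xmlcml.ami2.plugins.phylotree.RunPhylo' \
            -Dexec.args="$i ./all-output/$i" -e -X </dev/null | tee "$i.log"
done < list-of-input-images.txt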
  • Check the results for OCR errors and for errors in the Newick tree structure.
  • Standardise taxon names across the different studies.
  • Feed the cleaned Newick data to mrpmatrix to create a supertree matrix.
  • Analyse the supertree matrix with TNT.
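The structural half of the results check can be partly automated. The sketch below is not part of ami-phylo: it assumes each extracted tree has been saved as a plain-text Newick string in its own file, and the check_newick function name, the all-trees directory and the .nwk extension are all hypothetical. It only verifies that parentheses balance and that the tree ends with ';', so it cannot catch OCR errors in taxon names, and it does not handle quoted labels that themselves contain parentheses.

```bash
#!/usr/bin/env bash
# Hypothetical helper: structural sanity check for one Newick tree string.
# It only checks that parentheses balance, that no ')' closes an unopened '(',
# and that the tree ends with ';'. It does not detect OCR errors in labels and
# does not handle quoted labels that themselves contain parentheses.
check_newick() {
  local tree="$1" depth=0 ch i
  # Strip trailing whitespace (e.g. a final newline) before looking for ';'.
  tree="${tree%"${tree##*[![:space:]]}"}"
  if [[ "$tree" != *";" ]]; then
    echo "missing terminating ';'"
    return 1
  fi
  for (( i = 0; i < ${#tree}; i++ )); do
    ch="${tree:i:1}"
    if [[ "$ch" == "(" ]]; then
      depth=$((depth + 1))
    elif [[ "$ch" == ")" ]]; then
      depth=$((depth - 1))
      if (( depth < 0 )); then
        echo "unmatched ')' at position $i"
        return 1
      fi
    fi
  done
  if (( depth != 0 )); then
    echo "$depth unclosed '('"
    return 1
  fi
  echo "ok"
}

# Usage: pass one file per tree, e.g. ./check_newick.sh all-trees/*.nwk
# (the directory name and .nwk extension are assumptions, not ami-phylo output names).
for f in "$@"; do
  printf '%s: %s\n' "$f" "$(check_newick "$(cat "$f")")"
done
```

Any line that does not end in ok marks a tree to compare against its source figure; OCR mistakes in taxon names still need checking by hand.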

specification of files, errors, protocols

Figure images were obtained from IJSEM articles published from 2003 to 2014 inclusive, a set of 4705 articles. From these, 4341 figures containing a dendrogram were extracted.

input and output files (large)

errors

About

A repository for ami-phylotree development
