Scientific Name Parser


Parses a taxonomic scientific name and breaks it into semantic elements.


To install the gem you need RubyGems >= 1.3.6

$ sudo gem install biodiversity #for ruby 1.8.x
$ sudo gem install biodiversity19 #for ruby 1.9.x

Example usage

As a command line script

You can parse a file of taxonomic names from the command line. The file should contain one scientific name per line.

nnparser file_with_names

As a socket server

If you do not use Ruby and need fast access to the parser functionality, you can use a socket server.



parserver --output=canonical

to return a canonical form of the name string

parserver --output=canonical_with_rank

the same as above, but infraspecies' rank is shown if available

parserver --port 5555

run socket server on a different port

You can then access the server on port 4334 (unless you chose a different one) using the socket client library of your programming language. A socket client example script is in the examples directory of the gem.
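For readers who do use Ruby, the exchange with the socket server can be sketched with the standard library alone. This is a minimal sketch, not the example script shipped with the gem: the method name `parse_via_socket` is hypothetical, and it assumes the server answers each name with exactly one line of text.

```ruby
require 'socket'

# Hypothetical helper: sends one name to a running parserver and returns
# the reply line. Assumes one reply line per name; 4334 is the port
# mentioned in this README.
def parse_via_socket(name, host = 'localhost', port = 4334)
  socket = TCPSocket.new(host, port)
  socket.puts(name)      # one scientific name per line
  reply = socket.gets    # read a single reply line
  socket.puts('exit')    # one of the documented stop words
  reply && reply.chomp
ensure
  socket.close if socket
end

# Usage, with a server already running in another terminal:
#   puts parse_via_socket('Plantago major')
```

Opening a new connection per name keeps the sketch short; a client that parses many names would likely keep one connection open and send the names line by line.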

To check that the socket server works for you:

#run server in one terminal

#in another terminal window type
telnet localhost 4334

If you enter a line with a scientific name, the server sends back the parsed information in JSON format.

To stop the telnet client, type any of 'end', 'exit', 'q', '.' (without quotes) instead of a scientific name.

$ telnet localhost 4334
Trying ::1...
Connected to localhost.
Escape character is '^]'.
Acacia abyssinica Hochst. ex Benth. ssp. calophylla Brenan
{"scientificName":{"canonical":"Acacia abyssinica calophylla","parsed":true,"parser_run":1,"verbatim":"Acacia abyssinica Hochst. ex Benth. ssp. calophylla Brenan\r\n","positions":{"0":["genus",6],"18":["author_word",25],"29":["author_word",35],"7":["species",17],"41":["infraspecies",51],"52":["author_word",58]},"hybrid":false,"normalized":"Acacia abyssinica Hochst. ex Benth. ssp. calophylla Brenan","details":[{"species":{"basionymAuthorTeam":{"exAuthorTeam":{"author":["Benth."],"authorTeam":"Benth."},"author":["Hochst."],"authorTeam":"Hochst."},"string":"abyssinica","authorship":"Hochst. ex Benth."},"infraspecies":[{"basionymAuthorTeam":{"author":["Brenan"],"authorTeam":"Brenan"},"string":"calophylla","rank":"ssp.","authorship":"Brenan"}],"genus":{"string":"Acacia"}}]}}
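The "positions" field in this response can be read as a map from a start offset to an element type and an end offset. The sketch below uses a trimmed copy of the response above (only "verbatim" and "positions" are kept) to cut each element out of the verbatim string. Treating the end offset as exclusive is an inference from this one sample, not a documented guarantee.

```ruby
require 'json'

# Trimmed copy of the response shown above; the other fields are omitted.
response = '{"scientificName":{"verbatim":"Acacia abyssinica Hochst. ex Benth. ssp. calophylla Brenan\r\n","positions":{"0":["genus",6],"18":["author_word",25],"29":["author_word",35],"7":["species",17],"41":["infraspecies",51],"52":["author_word",58]}}}'

name = JSON.parse(response)['scientificName']

# Each entry is start => [type, stop]; slice verbatim[start...stop]
# (stop assumed exclusive) and order the elements by start offset.
elements = name['positions']
  .map { |start, (type, stop)| [start.to_i, type, name['verbatim'][start.to_i...stop]] }
  .sort_by(&:first)

elements.each do |start, type, text|
  puts format('%-3d %-12s %s', start, type, text)
end
```

Sorting is needed because the keys are strings and arrive in no particular order; in the sample, "7" follows "29".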

As a library

You can also use the parser as a Ruby library:

require 'biodiversity'

parser =

# to parse a scientific name into a ruby hash
parser.parse("Plantago major")

#to get json representation

# to clean name up
parser.parse("             Plantago       major    ")[:scientificName][:normalized]

# to get only cleaned up latin part of the name
parser.parse("Pseudocercospora dendrobii (H.C. Burnett) U. Braun & Crous 2003")[:scientificName][:canonical]

# to get detailed information about elements of the name
parser.parse("Pseudocercospora dendrobii (H.C. Burnett 1883) U. Braun & Crous 2003")[:scientificName][:details]

# to parse using several CPUs (4 seems to be optimal)
parser = # will try to run 4 processes if hardware allows
array_of_names = ["Betula alba", "Homo sapiens"....]
parser.parse(array_of_names) # -> {"Betula alba" => {:scientificName...}, "Homo sapiens" => {:scientificName...}, ...}

The parallel parser takes a list of names and returns a hash with names as keys and parsed data as values.
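A hash of that shape can be walked like any other Ruby hash. The sketch below does not call the gem: `parsed` is a hand-written stand-in for the returned hash, with illustrative entries that keep only the `:canonical` field, and `canonical_by_name` is a hypothetical variable name.

```ruby
# Stand-in for the hash returned by a parallel parse (illustrative data,
# reduced to the :canonical field used in the library examples).
parsed = {
  'Betula alba'  => { :scientificName => { :canonical => 'Betula alba' } },
  'Homo sapiens' => { :scientificName => { :canonical => 'Homo sapiens' } }
}

# Build a lookup from each input string to its canonical form.
canonical_by_name = {}
parsed.each do |name, data|
  canonical_by_name[name] = data[:scientificName][:canonical]
end

canonical_by_name.each { |name, canonical| puts "#{name} -> #{canonical}" }
```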

# to resolve lsid and get back RDF file

Copyright © 2009-2011 Marine Biological Laboratory. See LICENSE.txt for further details.
