
Parses taxonomic scientific names and breaks them into semantic elements.

Important: Biodiversity parser >= 4.0.0 uses a binding to gnparser and is not backward compatible with older versions. However, it is much faster and better than previous versions.

This gem no longer has a remote server or a command line executable. For such features use gnparser.


sudo gem install biodiversity

The gem should work on Linux, Mac and Windows (64-bit) machines.


The fastest way to go through a massive number of names is to call Biodiversity::Parser.parse_ary([big array], simple: true).

For example parsing a large file with one name per line:

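#!/usr/bin/env ruby

require 'biodiversity'

P = Biodiversity::Parser
count = 0
# Read the file lazily and parse it in slices of 50,000 names
File.foreach('all_names.txt', chomp: true).each_slice(50_000) do |sl|
  count += 1
  res = P.parse_ary(sl, simple: true)
  puts count * 50_000 # approximate number of names processed so far
  puts res[0]         # first parsed name of the slice, as a progress sample
end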
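
Real name lists often contain blank lines and stray whitespace. The sketch below shows one way to clean each slice before handing it to the parser. The helper name each_name_batch is hypothetical and is not part of the gem. It uses only the Ruby standard library, so it runs without biodiversity installed.

```ruby
require 'stringio'

# Hypothetical helper (not part of the biodiversity gem): reads lines from any
# IO-like object, slices them into batches, and yields each batch with
# surrounding whitespace stripped and blank lines removed.
def each_name_batch(io, batch_size = 50_000)
  return enum_for(:each_name_batch, io, batch_size) unless block_given?

  io.each_line.each_slice(batch_size) do |lines|
    names = lines.map(&:strip).reject(&:empty?)
    yield names unless names.empty?
  end
end

# Small in-memory sample standing in for a large file of names.
sample = StringIO.new("Plantago major\n\n  Homo sapiens  \nPica pica\n")
each_name_batch(sample, 2) { |batch| p batch }
```

With the gem installed, each yielded batch could be passed to Biodiversity::Parser.parse_ary(batch, simple: true). Slicing happens before cleaning, so a batch may hold fewer names than batch_size.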

Here are comparative results of running parsers against a file with 24 million names on a 4CPU hyperthreaded laptop:

Program       Version  Full/Simple  Names/min
gnparser      0.12.0   Simple       3,000,000
biodiversity  4.0.1    Simple       2,000,000
biodiversity  4.0.1    Full JSON      800,000
biodiversity  3.5.1    n/a             40,000
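
To get a comparable names-per-minute figure on your own machine, a rough timing harness along these lines may help. This is a sketch, not the method used to produce the numbers above. The names_per_minute helper and the stand-in parser are hypothetical, and the stub exists only so the example runs without the gem.

```ruby
require 'benchmark'

# Hypothetical helper: times one call of `parser` (any callable that accepts
# an array of names) and converts the elapsed time to names per minute.
def names_per_minute(names, parser)
  seconds = Benchmark.realtime { parser.call(names) }
  return Float::INFINITY if seconds.zero?

  (names.size * 60 / seconds).round
end

# Stand-in parser that only collapses whitespace; it is NOT the real parser.
stub_parser = ->(names) { names.map { |n| n.split.join(' ') } }

sample = Array.new(10_000) { 'Plantago   major L.' }
puts "#{names_per_minute(sample, stub_parser)} names/min"
```

To measure the real parser, replace the stub with ->(names) { Biodiversity::Parser.parse_ary(names, simple: true) }. A single run is noisy, so repeat it a few times before comparing results.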

Example usage

You can use it as a library in Ruby:

require 'biodiversity'

# to find the gem version number
Biodiversity.version

# Note that the version in parsed output will correspond to the version of
# gnparser.

# to parse a scientific name into a simple Ruby hash
Biodiversity::Parser.parse("Plantago major", simple: true)

# to parse many scientific names using all computer CPUs
Biodiversity::Parser.parse_ary(["Plantago major", ... ], simple: true)

# to parse a scientific name into a very detailed Ruby hash
Biodiversity::Parser.parse("Plantago major")

# to parse many scientific names with all details using all computer CPUs
Biodiversity::Parser.parse_ary(["Plantago major", ... ])

# to get JSON representation
Biodiversity::Parser.parse("Plantago major").to_json

# to clean name up
Biodiversity::Parser.parse("      Plantago       major    ")[:normalized]

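# to get the canonical form with or without infraspecies ranks, as well as
# the stemmed version
parsed = Biodiversity::Parser.parse("Seddera latifolia H. & S. var. latifolia")
parsed[:canonical][:full]    # canonical form with infraspecies ranks
parsed[:canonical][:simple]  # canonical form without ranks
parsed[:canonical][:stemmed] # stemmed canonical form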

# to get detailed information about elements of the name
Biodiversity::Parser.parse("Pseudocercospora dendrobii (H.C. Burnett 1883) U. \
Braun & Crous 2003")[:details]

# to parse a botanical cultivar
Biodiversity::Parser.parse("Sarracenia flava 'Maxima'", with_cultivars: true)

'Surrogate' is a broad group that includes 'Barcode of Life' names and various undetermined names containing cf., sp., spp. or nr.:

Biodiversity::Parser.parse("Coleoptera BOLD:1234567")[:surrogate]

What is "nameStringID" in the parsed results?

The ID field contains a UUID v5 hexadecimal string. The ID is generated from the bytes of the name string itself, so an identical ID can be generated in any popular programming language. You can read more about UUID version 5 in a blog post.

For example, "Homo sapiens" should generate the UUID "16f235a0-e4a3-529c-9b83-bd15fe722110".
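
As an illustration, UUID v5 generation can be sketched in plain Ruby with the standard library. The uuid_v5 helper below is hypothetical. The namespace is also an assumption that this document does not state: the sketch derives it from the standard DNS namespace and the domain "globalnames.org". Check it against the blog post before relying on the resulting IDs.

```ruby
require 'digest/sha1'

# Standard DNS namespace UUID defined by RFC 4122.
DNS_NAMESPACE = '6ba7b810-9dad-11d1-80b4-00c04fd430c8'

# Hypothetical helper: computes a UUID v5 (SHA-1 based) for `name` within
# `namespace_uuid`, following RFC 4122.
def uuid_v5(namespace_uuid, name)
  ns_bytes = [namespace_uuid.delete('-')].pack('H*')
  hash = Digest::SHA1.digest(ns_bytes + name.encode('UTF-8').b)
  bytes = hash.bytes.first(16)
  bytes[6] = (bytes[6] & 0x0f) | 0x50 # set version to 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80 # set RFC 4122 variant
  hex = bytes.pack('C*').unpack1('H*')
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
end

# ASSUMPTION: the namespace for names is derived from "globalnames.org".
gn_namespace = uuid_v5(DNS_NAMESPACE, 'globalnames.org')

# If the assumption holds, this should match the "Homo sapiens" UUID above.
puts uuid_v5(gn_namespace, 'Homo sapiens')
```

Because the ID depends only on the namespace and the bytes of the name string, the same name always produces the same ID, which is why it can be reproduced outside this gem.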


Authors: Dmitry Mozzherin

Contributors: Patrick Leary, Hernán Lucas Pereira

Copyright (c) 2008-2021 Dmitry Mozzherin. See LICENSE for further details.