[View in Colaboratory](https://colab.research.google.com/github/RyanJayGray/Display_images/blob/master/Radiolarian_classifier.ipynb)

![Effective neural net classification of radiolaria from a sparse dataset](http://paleobots.com/classifier/images/paperimages/smallradiotitle.png)
## Abstract
A classifier based on artificial neural nets was created to identify Radiolaria tests in digitized microscope slide images. The deep neural networks were trained on images categorized by taxa. Several architectures were evaluated. A total of 1987 images of sixteen radiolarian species were used to build sparse datasets. Species were specifically chosen to test the ability to automatically identify closely related forms within two genera (6 species within Antarctissa and 7 within Cycladophora), as well as larger differences between these and two other genera (Helotholus and Lithomelissa). Transmitted light microscope images at different focal planes, without removal of local background (e.g. other microfossils), were used. Through optimization of top-n error rates, a method of dataset preparation and an associated Radiolarian classifier were created and refined. When exposed to new images distinct from those in its training or validation sets, the classifier achieved an average top-1 accuracy of 70% for the 16 taxa, including discrimination between congeneric forms. A visualization was also created in the form of an interactive dendrogram, using top-5 inference certainties as a measure of morphological distance in a force-directed graph. An application program for mobile devices was created to enable use of the classifier in conjunction with any microscope equipped with a display monitor.


## Introduction
***Please read and execute this section before continuing***. (This will instantiate the notebook, making all of the code executable, including the live Radiolarian classifier.)

The development of the radiolarian classifier proceeded in three phases: curation, training, and classification. Curation involved preparing the image files by normalizing them, organizing them by taxa, and naming them with a rigorous naming scheme. During the training stage hyperparameters were selected, datasets were refined, and a neural net architecture was selected using top-n classification rate as a guiding metric. During the classification stage the neural net was run on a new set of images. The resulting inferences were displayed in order of certainty. Focal plane images were processed as sets. The classifier used these associated images to achieve a higher recognition rate. Resulting inference data was used to render an interactive dendrogram for exploring the relationships between the specimens. The classifier system was also embedded within an app for use on mobile devices.

This notebook includes a radiolarian classifier and associated documentation. It classifies radiolarian datasets containing the 16 species for which it has been trained. A method and means of curation and training are also provided to expand upon the classifier's existing capability. Most of the code and tools used in this project are included in this notebook. The primary intent of this notebook is to allow for complete reproducibility of the associated paper's results. This methodology can also be applied to the creation of other image classifiers from sparse datasets. You are encouraged to make derivative works. If you make use of the code in this notebook, please credit the authors. This notebook was written by Ryan Jay Gray ([Paleobots.com](http://paleobots.com)) with generous support, radiolarian images, and advice from Drs. David Lazarus and Johan Renaudie of the [Museum für Naturkunde, Berlin](https://www.museumfuernaturkunde.berlin/).

Place your cursor between the brackets below and press the triangular 'Run cell' icon to load the environment. This will allow execution of the code necessary to curate a dataset, train a neural net, run the classifier and visualize the resulting data with an interactive dendrogram. This setup step may take a minute or more. Once execution of this cell is complete, the Google Colab virtual environment will contain all of the datasets, code, labels and models necessary to reproduce the results in *Effective neural net classification of radiolaria from a sparse dataset*. Keep in mind that the Colaboratory environment is temporary. If you need to keep the results of your classification, save them to your local computer or copy this notebook to your own development environment. If you wish to begin classification of radiolarians independently of this notebook, you may install the [app](#scrollTo=jpORvZtyNOiU).



In [0]:
```
# <-- Click inside these brackets to
# instantiate this notebook.
#
# Environment setup code for running this notebook
# in Google Colaboratory's virtual environment
!mkdir classifier
!wget http://paleobots.com/classifier/environment.tar.gz
!tar --strip-components 1 -xzf environment.tar.gz
!rm environment.tar.gz
!apt-get install -y axel imagemagick > /dev/null
!echo Environment is instantiated
```

Now the notebook is live. You can read this document serially, click on the Table of contents to the left, or use these quick links to major topics:
<br><br>
>[<img src="http://paleobots.com/classifier/images/paperimages/bclassifier.png">](#scrollTo=8n2fc9qBvXSA)[<img src="http://paleobots.com/classifier/images/paperimages/bcuration.png">](#scrollTo=Vyz709YhgKat)[<img src="http://paleobots.com/classifier/images/paperimages/btraining.png">](#scrollTo=yPjA28eAiDZC)[<img src="http://paleobots.com/classifier/images/paperimages/bclassification.png">](#scrollTo=pA87_oTojoqR)[<img src="http://paleobots.com/classifier/images/paperimages/bvisualization.png">](#scrollTo=FXc_VvDjjrml)[<img src="http://paleobots.com/classifier/images/paperimages/bapplication.png">](#scrollTo=jpORvZtyNOiU)
<br><br><br><br><br><br><br>

<br><br>

---
<br>

<p align="center">
<img src="http://paleobots.com/classifier/images/paperimages/bcuration.png" alt="Curation">
</p>

## Curation
The original images were in multiple file formats and were heavily annotated through their file names. Care was taken to preserve all of the original information embedded in the file names while accommodating the syntactic requirements of the training and classification code. During training, the neural net acquires class names for taxa from directory names, so the image files were rigorously sorted into one directory per taxon, and each directory was named for the taxon it contains. Files were uniformly converted to JPEG format. The classifier was written to acquire focal planes from letter designations before the files' extensions (i.e., `*a.jpg`, `*b.jpg`, etc.). The classifier determines top-n accuracy rates by comparing file names with classes, so abbreviations were expanded and file names were vetted to ensure compliance with the naming system.
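
The sorting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the curation code used for the paper: the function name `sort_by_taxon`, its arguments, and the choice to copy rather than move files are assumptions. It only relies on the stated convention that the first two space-delimited fields of a file name are the genus and species.

```python
import shutil
from pathlib import Path


def sort_by_taxon(source_dir, dataset_dir):
    """Copy each JPEG into a directory named '<Genus> <species>'.

    Hypothetical helper: assumes every file name starts with the genus
    and species as its first two space-delimited fields.
    Returns a dict mapping each taxon to the number of files copied.
    """
    source_dir, dataset_dir = Path(source_dir), Path(dataset_dir)
    counts = {}
    for image in sorted(source_dir.glob("*.jpg")):
        fields = image.name.split(" ")
        if len(fields) < 3:
            # Too few fields to contain genus, species and further metadata.
            continue
        taxon = f"{fields[0]} {fields[1]}"
        target = dataset_dir / taxon
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(image, target / image.name)  # keep the original file name
        counts[taxon] = counts.get(taxon, 0) + 1
    return counts
```

Copying instead of moving leaves the original collection untouched, which fits the goal of preserving the information carried by the original file names.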


### Nomenclature

**Initial Radiolaria microphotograph collection summary**

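This table lists the number of specimens of each initial species found in each core sample. Genus names in the column headers are abbreviated: A for Antarctissa, L for Lithomelissa, H for Helotholus, and C for Cycladophora. The images were initially organized in directories named for the samples.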



<table cellpadding="2" cellspacing="0">
	<col width="152">
	<col width="89">
	<col width="66">
	<col width="81">
	<col width="74">
	<col width="82">
	<col width="68">
	<col width="47">
	<col width="62">
	<col width="84">
	<col width="58">
	<col width="82">
	<col width="85">
	<col width="71">
	<col width="94">
	<col width="65">
	<col width="105">
	<col width="49">

<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left"><font size="1">Sample</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">A denticulata</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">A ballista</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">A cylindrica</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">A strelkovi</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">A deflandrei</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">A robusta</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">L stigi</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">L setosa</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">H praevema</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">H vema</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">C davisiana</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">C pliocenica</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">C bicornis</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">C cornutoides</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">C cosma</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">C spongothorax</font></p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left"><font size="1">C golli</font></p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">1138A-2R-4 27/31cm</p>
		</td>
		<td style="border: none; padding: 0in" sdval="16" sdnum="1033;">
			<p align="right">16</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="19" sdnum="1033;">
			<p align="right">19</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="7" sdnum="1033;">
			<p align="right">7</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="13" sdnum="1033;">
			<p align="right">13</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="7" sdnum="1033;">
			<p align="right">7</p>
		</td>
		<td style="border: none; padding: 0in" sdval="2" sdnum="1033;">
			<p align="right">2</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">751A-1H-2 7/7cm</p>
		</td>
		<td style="border: none; padding: 0in" sdval="5" sdnum="1033;">
			<p align="right">5</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="5" sdnum="1033;">
			<p align="right">5</p>
		</td>
		<td style="border: none; padding: 0in" sdval="1" sdnum="1033;">
			<p align="right">1</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="14" sdnum="1033;">
			<p align="right">14</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="52" sdnum="1033;">
			<p align="right">52</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="5" sdnum="1033;">
			<p align="right">5</p>
		</td>
		<td style="border: none; padding: 0in" sdval="13" sdnum="1033;">
			<p align="right">13</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">693A-6R-5 48/55cm</p>
		</td>
		<td style="border: none; padding: 0in" sdval="6" sdnum="1033;">
			<p align="right">6</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="15" sdnum="1033;">
			<p align="right">15</p>
		</td>
		<td style="border: none; padding: 0in" sdval="4" sdnum="1033;">
			<p align="right">4</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">747A-3H-4 16/24cm</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">751A-3-4 85/87cm</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="1" sdnum="1033;">
			<p align="right">1</p>
		</td>
		<td style="border: none; padding: 0in" sdval="26" sdnum="1033;">
			<p align="right">26</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="6" sdnum="1033;">
			<p align="right">6</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="30" sdnum="1033;">
			<p align="right">30</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="3" sdnum="1033;">
			<p align="right">3</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">693A-18R-4 101/107cm</p>
		</td>
		<td style="border: none; padding: 0in" sdval="47" sdnum="1033;">
			<p align="right">47</p>
		</td>
		<td style="border: none; padding: 0in" sdval="19" sdnum="1033;">
			<p align="right">19</p>
		</td>
		<td style="border: none; padding: 0in" sdval="29" sdnum="1033;">
			<p align="right">29</p>
		</td>
		<td style="border: none; padding: 0in" sdval="31" sdnum="1033;">
			<p align="right">31</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="6" sdnum="1033;">
			<p align="right">6</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="7" sdnum="1033;">
			<p align="right">7</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">689B-3H-3 116/118cm</p>
		</td>
		<td style="border: none; padding: 0in" sdval="4" sdnum="1033;">
			<p align="right">4</p>
		</td>
		<td style="border: none; padding: 0in" sdval="25" sdnum="1033;">
			<p align="right">25</p>
		</td>
		<td style="border: none; padding: 0in" sdval="7" sdnum="1033;">
			<p align="right">7</p>
		</td>
		<td style="border: none; padding: 0in" sdval="10" sdnum="1033;">
			<p align="right">10</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="25" sdnum="1033;">
			<p align="right">25</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="3" sdnum="1033;">
			<p align="right">3</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">1138A-14R-2 50/58cm</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">689B-4H-4 116/118cm</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">1138A-17-2 105/107cm</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="87" sdnum="1033;">
			<p align="right">87</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="3" sdnum="1033;">
			<p align="right">3</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="6" sdnum="1033;">
			<p align="right">6</p>
		</td>
		<td style="border: none; padding: 0in" sdval="12" sdnum="1033;">
			<p align="right">12</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">278-20-1 77/78cm</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="1" sdnum="1033;">
			<p align="right">1</p>
		</td>
		<td style="border: none; padding: 0in" sdval="36" sdnum="1033;">
			<p align="right">36</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="8" sdnum="1033;">
			<p align="right">8</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="24" sdnum="1033;">
			<p align="right">24</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">Total specimens</p>
		</td>
		<td style="border: none; padding: 0in" sdval="78" sdnum="1033;">
			<p align="right">78</p>
		</td>
		<td style="border: none; padding: 0in" sdval="45" sdnum="1033;">
			<p align="right">45</p>
		</td>
		<td style="border: none; padding: 0in" sdval="82" sdnum="1033;">
			<p align="right">82</p>
		</td>
		<td style="border: none; padding: 0in" sdval="65" sdnum="1033;">
			<p align="right">65</p>
		</td>
		<td style="border: none; padding: 0in" sdval="88" sdnum="1033;">
			<p align="right">88</p>
		</td>
		<td style="border: none; padding: 0in" sdval="36" sdnum="1033;">
			<p align="right">36</p>
		</td>
		<td style="border: none; padding: 0in" sdval="0" sdnum="1033;">
			<p align="right">0</p>
		</td>
		<td style="border: none; padding: 0in" sdval="27" sdnum="1033;">
			<p align="right">27</p>
		</td>
		<td style="border: none; padding: 0in" sdval="31" sdnum="1033;">
			<p align="right">31</p>
		</td>
		<td style="border: none; padding: 0in" sdval="30" sdnum="1033;">
			<p align="right">30</p>
		</td>
		<td style="border: none; padding: 0in" sdval="65" sdnum="1033;">
			<p align="right">65</p>
		</td>
		<td style="border: none; padding: 0in" sdval="13" sdnum="1033;">
			<p align="right">13</p>
		</td>
		<td style="border: none; padding: 0in" sdval="15" sdnum="1033;">
			<p align="right">15</p>
		</td>
		<td style="border: none; padding: 0in" sdval="15" sdnum="1033;">
			<p align="right">15</p>
		</td>
		<td style="border: none; padding: 0in" sdval="14" sdnum="1033;">
			<p align="right">14</p>
		</td>
		<td style="border: none; padding: 0in" sdval="12" sdnum="1033;">
			<p align="right">12</p>
		</td>
		<td style="border: none; padding: 0in" sdval="24" sdnum="1033;">
			<p align="right">24</p>
		</td>
	</tr>
</table>

For each taxon of Radiolaria, a directory was named following the conventions of the International Commission on Zoological Nomenclature.

Examples:

* Antarctissa ballista
* Antarctissa cylindrica
* Cycladophora spongothorax
* Cycladophora bicornis
* Helotholus praevema

This diagram shows the basic structure of a dataset. The root is a folder that names the entire set of images. Names such as dataset022 were used to mark the version of the dataset it contains. The directories within were named after the species they contain, for example "Cycladophora bicornis". This directory holds all of the Cycladophora bicornis image files during training. The directory name defines the class. During classification, this same string is applied to label inferences for all of the image files in the test set. This label is also used to determine the accuracy of results for datasets which conform to the same naming system. For example, if the inference "Cycladophora bicornis" appears in the top-n inferences for the file "Cycladophora bicornis Axioskop 40X jr-0079 693A-18-4,101.jpg", the tally of correct identifications is incremented.

![Image not loading](http://paleobots.com/classifier/images/paperimages/directory.png "Directory Structure for datasets")
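The directory layout described above lends itself to a quick programmatic check. The following is a minimal sketch rather than the notebook's own code: the helper name `count_images_per_class` and the set of accepted file extensions are assumptions made for illustration. It treats each sub-directory of the dataset root as a class label and counts the image files inside it.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset at hand.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".bmp", ".png"}


def count_images_per_class(dataset_root):
    """Return {class_label: image_count} for root/<species name>/<image files>."""
    counts = {}
    for class_dir in sorted(Path(dataset_root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files at the dataset root
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Running such a helper on a dataset root before training gives a quick view of which classes are especially sparse.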



The following convention was used to name all of the image files in the datasets. The semantic delimiters are the space and period characters. The values in the first two fields (genus and species) must be spelled correctly because the classifier compares them with its inferences to compute an accuracy rate for a labeled dataset. The last period in the file name marks the start of the final field, the file format, which is used to restrict processing to image files. A letter immediately before this period identifies the focal plane. If that character is numeric instead, the image is the only focal plane of the specimen.


Naming Scheme:

<table cellpadding="2" cellspacing="0">
	<col width="85">
	<col width="85">
	<col width="85">
	<col width="85">
	<col width="85">
	<col width="85">
	<col width="85">
	<col width="85">
	<col width="85">
	<col width="85">
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">Genus</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">Species</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">Sample</p>
		</td>
		<td style="border: none; padding: 0in">
			<p><br/>

			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">Microscope</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">Magnification</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">Microscopist</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">Specimen # 
			</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">Focal Plane</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">File Format</p>
		</td>
	</tr>
	<tr>
		<td height="17" style="border: none; padding: 0in">
			<p align="left">Antarctissa</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">strelkovi</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">1138A-2-4</p>
		</td>
		<td style="border: none; padding: 0in" sdval="27" sdnum="1033;">
			<p align="left">27</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">Axioskop</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">40X</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">jr</p>
		</td>
		<td style="border: none; padding: 0in" sdval="-10" sdnum="1033;">
			<p align="left">-10</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">a</p>
		</td>
		<td style="border: none; padding: 0in">
			<p align="left">.jpg</p>
		</td>
	</tr>
</table>
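The parts of the naming scheme that the classifier depends on can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the notebook's classifier code: the function name `parse_image_filename` and the returned dictionary keys are hypothetical. It extracts only the fields described above: the genus and species (first two space-delimited fields), the file format (after the last period), and the focal plane (a letter immediately before the last period).

```python
import os


def parse_image_filename(filename):
    """Extract the label, focal plane, and file format from a dataset filename."""
    stem, extension = os.path.splitext(filename)  # split at the last period
    fields = stem.split(" ")                      # space-delimited fields
    if len(fields) < 2:
        raise ValueError("expected at least genus and species: %r" % filename)
    last_char = stem[-1]
    return {
        # Genus and species together form the class label.
        "label": fields[0] + " " + fields[1],
        # A letter marks the focal plane; a digit means a single-plane image.
        "focal_plane": last_char if last_char.isalpha() else None,
        "extension": extension.lower(),
    }
```

For the filename quoted earlier, "Cycladophora bicornis Axioskop 40X jr-0079 693A-18-4,101.jpg", this returns the label "Cycladophora bicornis" with no focal plane, because the character before the period is numeric.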

### Image file pre-processing

Image files needed to be processed to eliminate issues such as file corruption, mismatched codecs, improperly labeled images, and incorrect color spaces. Files were visually examined for corruption, incorrect color spaces, and mislabeling. Corrupted files were reacquired from original sources. Bad Peggy was used to detect issues with JPEG encoding. The ImageMagick mogrify command was used to convert files to a single colorspace (in the case of these datasets, grayscale encoded into RGB). A codec mismatch was found between some source BMP-format files and the Python image decoder. To overcome this, and also to significantly reduce the size of the datasets, mogrify was used to convert BMP-format files to JPEGs:

In [0]:
!cp "images/paperimages/A. deflandrei 28 1138A-17-2,105 Olympus BH-2 30X dbl.bmp" test.bmp
!mogrify -format jpg test.bmp

### Normalization of source images and programmatic display of images with units and labels



Several microscope-objective-digitizer combinations were used to gather the images, including an Axioskop with a 40X objective and AmScope MU300 digitizer, and an Olympus BH-2 with a 30X objective and AmScope MU300 digitizer. The digitizers recorded images at two resolutions, 1024x768 and 2048x1536. Images were captured in a variety of lighting conditions.

To normalize the brightness and contrast of the images

To normalize the scale of the images, a calibration slide was photographed. The resulting images were opened in the GNU Image Manipulation Program (GIMP), which was used to find the resolution of the stage and verify the uniformity of the digitizing. The calibration and specimen images were then superimposed to confirm that the specimens fell within the expected size range. The pixels per micrometer were calculated by dividing the number of pixels measured by the corresponding distance in micrometers. The image directory for the corresponding objective was then opened in a terminal and its images were resized with the mogrify command. This gave the neural net a consistent sense of size across all of the images.

The source images were taken on different configurations of microscope, objective lens, and digitizer. Without correction, the neural net could misclassify an image because of its misrepresented size, so the images had to be normalized. Normalization also allowed the Radiolaria specimens to be shown with proper units, regardless of the size of the source file. This matters because the images were cropped to train the neural net only on the specimens, not on the specimens and background debris. First, the normalization coefficient of the Axioskop with 40X objective and AmScope MU300 digitizer was found by comparing a calibration slide photographed on that instrument with an image of a specimen from the same instrument. The calibration slide was superimposed on the specimen image to check that the size range was roughly correct. The calibration slide was then rotated to align with a horizontal guide and measured in pixels, and the number of pixels per micrometer was calculated.

There were two types of image files -- those created by the Axioskop with 40X objective and AmScope MU300 digitizer, and those created by the Olympus BH-2 with a 30X objective lens and AmScope MU300 digitizer. The Olympus images have dimensions of 1024x768 or 2048x1536, while the Axioskop optical stack produced images of 2048x1536. A normalization workflow was developed to achieve the same pixels per micrometer (ppm) throughout the resulting datasets. To preserve information, the decision was made to enlarge the smaller files by the appropriate factor. The Olympus files are smaller, so they were processed to match the ppm of the Axioskop images. Dividing the Axioskop normalization coefficient by the Olympus coefficient gave the resize ratio, which was expressed as a percentage and used to resize the Olympus files.

To resize images an ImageMagick mogrify command was used. The folders with only Olympus files in them were opened in a terminal. The mogrify command was applied, and the images were normalized.

Performing these calculations also allowed the Radiolaria specimens to be displayed with proper units, regardless of the size of the source files. This is important because the images were cropped in one family of datasets to train the neural net only on the specimen images, not the specimen and background debris.

GNU Image Manipulation Program (GIMP) was used to take initial measurements and then find proper coefficients through successive approximation.

All code to perform calculations and programmatic manipulation of images is contained in the Python code sections.


Procedure:

Normalize sizes of images: 

  1. Calculate normalization coefficient for Axioskop with 40X objective and AmScope MU300 digitizer
   1. Select an image from the first set created with an Axioskop with 40X objective and AmScope MU300 digitizer
   2. Find dimensions of image in pixels
   3. Load corresponding calibration slide (image of calibration slide taken on same 40X objective)
   4. Superimpose the two images using GIMP to see if the image is roughly in the size range for Radiolaria 
   5. Take initial measure of distance and rotation
   
   ![reticle_grid_Axioskop_40X_rjg.png not found](http://paleobots.com/classifier/images/paperimages/example_astrelkovi_w_reticle_grid_small.jpg "Axioskop_40X")
   6. Verify calibration image has no skew by reversing rotation and placing horizontal and vertical guides (make sure calibration slide is perpendicular to x-axis)
   7. Calculate pixels per micrometer by dividing the number of pixels measured by the corresponding distance in micrometers
   8. Resize the images by opening their directory in a terminal and using the mogrify command, scaling the images with the coefficient from step 7
  2. Calculate normalization coefficient for Olympus BH-2 with a 30X objective lens and AmScope MU300 digitizer
   1. Select an image from the second set, created with an Olympus BH-2 with 30X objective and AmScope MU300 digitizer
   2. Find dimensions of image in pixels
   3. Find corresponding calibration slide (image of calibration slide taken on same 30X objective)
   4. Superimpose the two images using GIMP to see if the image is roughly in the size range for Radiolaria 
   5. Take initial measure of distance and rotation
   6. Verify calibration image has no skew by reversing rotation and placing horizontal and vertical guides (make sure calibration slide is perpendicular to x-axis)
   7. Calculate pixels per micrometer by dividing the number of pixels measured by the corresponding distance in micrometers
   8. Resize the images by opening their directory in a terminal and using the mogrify command, scaling the images with the coefficient from step 7
  3. Programmatically load and display images with proper units, grids, titles, and labels

**Calibrate image units**

* Select an image from the first set created with an Axioskop with 40X objective and AmScope MU300 digitizer:

![astrelkovi.jpg](http://paleobots.com/classifier/images/paperimages/astrelkovi.jpg "A. Strelkovi")

* The dimensions of this image are 2048x1536 pixels

* Now, look at the calibration slide for this microscope configuration. (midpoints of minor bars are 10µm apart):

![scalex40.jpg not loading](http://paleobots.com/classifier/images/paperimages/scalex40.jpg "Calibration bars")

**Superimpose the two images to see if they are roughly in the size range for Radiolaria**

![Image not loading](http://paleobots.com/classifier/images/paperimages/sanity1.jpg "Simple overlay of calibration bars and specimen photograph")

Check! Individual is ~ 100µm

* Take initial measure of distance and rotation:

![Image not loading](http://paleobots.com/classifier/images/paperimages/anglecorrection.png "Initial measure")

**Verify calibration image has no skew by reversing rotation and placing horizontal and vertical guides**

![Image not loading](http://paleobots.com/classifier/images/paperimages/perpendicularitytest.png "Corrected for rotation")

Check! Image is orthogonal.


**Calculate pixels per micrometer for this configuration of microscope, objective lens, and digitizer**

In [0]:
print("1781.5/200 = "+str(1781.5/200))

Now derive a coefficient to match 10 pixels per micrometer. This allows for the application of a useful simulated reticle grid to images.

In [0]:
print(str(10/8.9075))

Testing this value by scaling the calibration slide image shows that the single measurement has introduced an error of about one percent. Successive approximation produces a more precise value of 1.118164063. In the process a reticle grid was created:

![reticle_grid_Axioskop_40X_rjg.png not found](http://paleobots.com/classifier/images/paperimages/example_astrelkovi_w_reticle_grid_small.jpg "Axioskop_40X")

Images can be displayed, using this coefficient, with proper units for this configuration of microscope, objective lens, and digitizer.

**For the second microscope, an Olympus BH-2 with a 30X objective lens and Amscope MU300 digitizer at 1024x768 resolution:**

**Calibrate our image units**

* Select an image from the second set:

![Image not loading](http://paleobots.com/classifier/images/paperimages/A.deflandrei.jpg "A. deflandrei")

* The dimensions of this image are 1024x768 pixels

* Look at the calibration slide for this microscope configuration. (midpoints of minor bars are 10µm apart):

![Image not loading](http://paleobots.com/classifier/images/paperimages/calibrateLazarus.jpg "Calibration bars")

**Do a rough comparison**

* Superimpose the two images to see if they are roughly in the size range for Radiolaria:

![Image not loading](http://paleobots.com/classifier/images/paperimages/sanity2.jpg "Simple overlay of calibration bars and specimen photograph")

Check! Individual is ~ 85µm. This test was conducted in GNU Image Manipulation Program by reducing the alpha of the closer layer.

* Take initial measure of distance and rotation:

![Image not loading](http://paleobots.com/classifier/images/paperimages/measureminiruler.jpg "Initial measure")

* Verify calibration image has no skew by reversing rotation and placing horizontal and vertical guides:

![Image not loading](http://paleobots.com/classifier/images/paperimages/correctedimage2.png "Corrected for rotation")

Check! Image is orthogonal.



Calculate pixels per micrometer:

In [0]:
print("736/250 = "+str(736/250))

Now derive a coefficient to match 10 pixels per micrometer. This allows for the application of a useful simulated reticle grid to images.

In [0]:
print(""+str(10/2.944))

And verify results. Check!

![example_adeflandrei_w_reticle_grid_small.jpg not found](http://paleobots.com/classifier/images/paperimages/example_adeflandrei_w_reticle_grid_small.jpg "Closeup")
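Both calibrations follow the same arithmetic, which can be collected into two small helpers. This is a sketch for illustration: the function names are hypothetical, and the measurements are the ones quoted in the cells above (1781.5 pixels over 200 µm for the Axioskop, 736 pixels over 250 µm for the Olympus BH-2 at 1024x768).

```python
def pixels_per_micrometer(measured_pixels, measured_micrometers):
    """Pixels per micrometer from one measurement of the calibration slide."""
    return measured_pixels / measured_micrometers


def scale_coefficient(ppm, target_ppm=10.0):
    """Factor by which an image must be scaled to reach target_ppm."""
    return target_ppm / ppm


# Measurements taken from the calibration cells in this notebook.
axioskop_ppm = pixels_per_micrometer(1781.5, 200)  # Axioskop, 40X objective
olympus_ppm = pixels_per_micrometer(736, 250)      # Olympus BH-2, 30X objective

print("Axioskop:", axioskop_ppm, "ppm, coefficient", scale_coefficient(axioskop_ppm))
print("Olympus: ", olympus_ppm, "ppm, coefficient", scale_coefficient(olympus_ppm))
```

These are single-measurement estimates. As noted for the Axioskop, successive approximation against the calibration image can refine the coefficient further.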

Display images with proper units for this configuration of microscope, objective lens, and digitizer.


* First load and plot an image:

In [0]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

image = mpimg.imread('images/paperimages/A. deflandrei 28 1138A-17-2,105 Olympus BH-2 30X dbl.bmp')
print(image.shape)

In [0]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

image = mpimg.imread('images/paperimages/adeflandrei_cropped.jpg')
plt.title("A. deflandrei\nDimensions:" + str(image.shape))
plt.xlabel('pixels')
plt.ylabel('pixels')
plt.imshow(image)
plt.show()

Notice that pixels are the units. The approximated ratios can now be applied. 

Steps:
1. Parse original filename for microscope-objective-digitizer configuration.
2. Verify resolution match.
3. Plot image with proper units.
4. Plot image of arbitrary size with proper units, title, axis labels, etc.


1.&nbsp;Original filename ("A. deflandrei 28 1138A-17-2,105 Olympus BH-2 30X dbl.bmp") indicates  Olympus BH-2 with a 30X objective lens. 

2.&nbsp;Load the image and check resolution:

In [0]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

image = mpimg.imread('images/paperimages/A. deflandrei 28 1138A-17-2,105 Olympus BH-2 30X dbl.bmp')
print(image.shape)

Resolution is the expected 1024x768 pixels.

3.&nbsp;Plot image programmatically with proper units

First, plot the larger image with proper units. Also add major and minor gridlines to simulate a reticle.

In [0]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

import sys  

image = mpimg.imread('images/paperimages/A. deflandrei 28 1138A-17-2,105 Olympus BH-2 30X dbl.bmp')
olyPPM = 2.944 #Olympus BH-2 30X pixels per micrometer

fig = plt.gcf()
fig_size = plt.rcParams["figure.figsize"]
fig.set_size_inches(fig_size[0]*2, fig_size[1]*2)

plt.xlabel('micrometers')
plt.ylabel('micrometers')
xDim = image.shape[1]/olyPPM
yDim = image.shape[0]/olyPPM
plt.imshow(image, extent=[0,xDim,0,yDim],alpha=0.7)
plt.axis([0,xDim,0,yDim])
plt.grid(color='#00FF00',alpha=.2)
plt.minorticks_on()
plt.grid(True, which='minor', color='g', linestyle=':', alpha=0.15)  # pass the flag positionally; the 'b' keyword is not accepted by recent Matplotlib releases
plt.show()

4. Plot image of arbitrary size with proper units, title, axis labels, etc.

Load an image cropped to display primarily the specimen of interest. This is an image of arbitrary dimensions.

In [0]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.ticker as ticker

image = mpimg.imread('images/paperimages/adeflandrei_cropped.jpg')

plt.xlabel('micrometers')
plt.ylabel('micrometers')

olyPPM = 2.944 #Olympus BH-2 30X pixels per micrometer
xDim = image.shape[1]/olyPPM
yDim = image.shape[0]/olyPPM
plt.imshow(image, extent=[0,xDim,0,yDim], alpha=1)

#make this small image easier to see
fig = plt.gcf()
fig_size = plt.rcParams["figure.figsize"]
fig.set_size_inches(fig_size[0]*2, fig_size[1]*2)

#make sure ticks display with proper interval
plt.xticks(np.arange(0, xDim, 10))
plt.grid(color='#00C000',alpha=.2, linestyle='-', linewidth=1)

#plt.grid(color='g', linestyle=':', linewidth=.3)
plt.minorticks_on()
plt.grid(True, which='minor', color='g', linewidth=.3, linestyle=':', alpha=0.2)  # 'b' keyword is not accepted by recent Matplotlib releases
plt.title("A. deflandrei")
plt.show()



Dimensions in micrometers are properly calculated and displayed, regardless of cropping.

**Normalize source image files**

The images needed to be resized to the appropriate pixels per micrometer. Images were sorted into two directories: those created by the Axioskop with 40X objective and AmScope MU300 digitizer, and those created by the Olympus BH-2 with a 30X objective lens and AmScope MU300 digitizer. The images with the 30X objective have a resolution of 1024x768 pixels, while those from the 40X objective have larger dimensions of 2048x1536. To preserve information, files were never made smaller; instead, they were scaled larger. The Olympus files are smaller, so they were expanded to match the pixels per micrometer of the Axioskop files.

Axioskop with 40X objective and AmScope MU300 digitizer = 8.9075 ppm 

Olympus BH-2 with a 30X objective lens and AmScope MU300 digitizer = 2.944 ppm 



In [0]:
print("Ratio of Axioskop/Olympus PPM:"+str(8.9075/2.944))

For images taken on the Olympus BH-2 with 30X objective at the higher resolution of 2048x1536, this coefficient is halved. This also applies to the Olympus BH-2 with the 1024x768 resolution capture combined with a 60X objective:

In [0]:
print("Ratio of Axioskop/Olympus at 2048x1536 PPM:"+str(8.9075/2.944/2))

Now that the ratio for resizing has been found, the mogrify command can be used on the Olympus images. Mogrify resizes by percentage, so round the ratio to the nearest hundredth and convert it to a percentage (3.03 becomes 303%). For example:

In [0]:
import matplotlib.image as mpimg

!cp "images/paperimages/A. deflandrei 28 1138A-17-2,105 Olympus BH-2 30X dbl.bmp" test.bmp
image = mpimg.imread('test.bmp')
print(image.shape)
!mogrify -resize 303% test.bmp
image = mpimg.imread('test.bmp')
print(image.shape)
!mogrify -format jpg test.bmp
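The conversion from ppm ratio to a mogrify percentage can be sketched as follows. This is a hypothetical helper, not part of the notebook's code: the function names are assumptions, and the command list mirrors the `mogrify -resize 303% test.bmp` invocation shown above without executing it.

```python
def resize_percentage(target_ppm, source_ppm):
    """Return the resize percentage (e.g. '303%') that brings source_ppm up to target_ppm."""
    ratio = round(target_ppm / source_ppm, 2)  # round the ratio to the nearest hundredth
    return "{:.0f}%".format(ratio * 100)


def mogrify_resize_command(target_ppm, source_ppm, path):
    """Build (but do not run) the argument list for an in-place mogrify resize."""
    return ["mogrify", "-resize", resize_percentage(target_ppm, source_ppm), path]


# Axioskop (8.9075 ppm) as the target, Olympus BH-2 at 1024x768 (2.944 ppm) as the source.
print(mogrify_resize_command(8.9075, 2.944, "test.bmp"))
```

The argument list could be passed to `subprocess.run` on a system with ImageMagick installed. Because mogrify overwrites files in place, working on copies is the safer choice.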

### Image processing

Two families of datasets were created: one in which images were used in pre-processed and normalized format, and another in which additional image processing was performed. In the more processed family, images were cropped closely about the specimen of interest, and the brightness and contrast of the images were randomized about a mean by ±20%. The final versions of these families are dataset024 and dataset012: dataset024 contains the pre-processed and normalized images, and dataset012 contains the further processed images.

Both families of datasets were used for training and classification with and without additional image processing, including random flip, rotation, cropping, and brightness manipulations. The additional image processing was not found to significantly affect the recognition rate.

Augmentation was used to support the determination to include multiple focal plane images. Particularly sparse image classes were subject to augmentation via image processing. Operations included random crop, flip, rotation, skew and distortion. For this particular set of tests, augmentation produced useful results. More information is provided under the classification section of this notebook. Augmentation was not used in the creation of training and test data for the classifier itself.


<br><br>

---
<br><p align="center">
![alt text](http://paleobots.com/classifier/images/paperimages/btraining.png)
</p>
## Training
After curation, the datasets were used to train the neural net, and the neural net was optimized. Because the dataset was sparse, transfer learning was employed. Transfer learning is a strategy to increase the recognition rate for small datasets -- neural nets already trained on the Imagenet dataset supplied features for the layers trained on the Radiolarian images. The earlier layers of these pre-trained neural nets contain features useful for classifying the wide variety of image classes in the larger dataset. The architectures evaluated were the 16 versions of Mobilenet V1 and Inception V3. To optimize the neural net, many training tests were conducted in which datasets were refined and hyperparameters were changed. These hyperparameters included learning rate, number of steps, batch size, and transient distortions that did not affect test sets. The Mobilenet architecture was chosen during this phase on the basis of dedicated tests. Mobilenet_1.0_224 was eventually selected and is employed in the code in this notebook.

#### Procedure for Tests
Pre-classified images from the curation stage were sorted into datasets appropriate to each series of tests.

1. Train the neural net
    1. Select a subset of the larger dataset for a test.
    2. Select architecture and hyperparameters for neural net, or a range of these in a Jupyter notebook. Once hyperparameters are selected, they will become constant for the remainder of the experiment.
    3. Train the neural net (this employs the hardware and software materials listed in the appendices)
    4. Use the neural net within the classifier against an unused test set or sets
    5. Record the data, and archive the dataset and results on Dropbox
    6. Iterate from step 1 until sufficient data is acquired to establish constant hyperparameters and a neural net architecture
    7. Constants to establish:
        1. The selection of taxa
        2. Hyperparameters
            1. Learning rate
            2. Number of steps (of training)
            3. Batch size
            4. Number of epochs
            5. Distortions
        3. Neural net architecture
2. Classify images


#### Evaluation of architectures

Different neural net architectures suitable for transfer learning were evaluated: sixteen variations on Mobilenet and one version of Inception. Mobilenet is a lightweight convolutional neural network with 30 layers, each layer capturing more specific features than the last. It exposes two hyperparameters that can be adjusted for specific datasets: the image resolution and the width multiplier. The possible multiplier values are 1.0, 0.75, 0.50, and 0.25. The possible image resolutions are 224, 192, 160, and 128. These parameters were varied to produce sixteen potential architectures. Each of these variations was tested and evaluated by top-1 classification rate. Inception had been demonstrated to achieve a high recognition rate in the ImageNet Large Scale Visual Recognition Challenge in 2015. Inception was evaluated, but with all other variables held constant, Mobilenet delivered a higher recognition rate. Mobilenet's faster execution also allowed quick training runs to find a useful set of hyperparameters.

Architectures tested:
* Mobilenet_1.0_224
* MobileNet_v1_0.50_160
* MobileNet_v1_0.25_128
* Inception v3
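The sixteen Mobilenet variants are the Cartesian product of the four width multipliers and the four image resolutions. A minimal sketch of that enumeration follows. The function name is hypothetical, and the name pattern `mobilenet_<width>_<resolution>` is assumed from the `--architecture` values used in this notebook's training commands (for example `mobilenet_1.0_224`).

```python
from itertools import product

# Values taken from the text above; widths are kept as strings to preserve "0.50".
WIDTH_MULTIPLIERS = ["1.0", "0.75", "0.50", "0.25"]
IMAGE_RESOLUTIONS = [224, 192, 160, 128]


def mobilenet_architectures():
    """Enumerate every width-multiplier / resolution combination as an architecture name."""
    return [
        "mobilenet_{}_{}".format(width, resolution)
        for width, resolution in product(WIDTH_MULTIPLIERS, IMAGE_RESOLUTIONS)
    ]


for name in mobilenet_architectures():
    print(name)
```

Looping over this list is one way to drive a series of training runs in which only the architecture changes.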

#### Procedure
1. Select constants from prior tests
    1. A set of images for training and testing
    2. The selection of taxa
    3. Hyperparameters
        1. Learning rate
        2. Activation function
        3. Number of training steps
    4. Neural net architecture
2. Select architecture
3. Train the neural net (this employs the hardware and software materials listed)
4. Test the neural net against an unused test set
5. Record the results
6. Iterate from step 2 until sufficient data is acquired to select an architecture


#### Dataset 
Description:
These images are normalized.

The taxa in this dataset are 
1. Antarctissa ballista: 43 images
2. Antarctissa cylindrica: 135 images
3. Antarctissa denticulata: 124 images
4. Antarctissa strelkovi: 130 images
5. Antarctissa robusta: 35 images
6. Cycladophora golli: 45 images
7. Helotholus praevema: 50 images
8. Antarctissa deflandrei: 91 images
9. Cycladophora bicornis: 40 images
10. Cycladophora davisiana: 65 images
11. Helotholus vema: 58 images
12. Lithomelissa setosa: 44 images


#### Parameters

* Architectures: Inception v3 and 16 variations on Mobilenet
    * Convolution dimension: 3x3
    * Activation function: rectified linear unit
    * Depth: 28 to 30
* Preliminary dataset: Imagenet
* Final layer dataset: archevaldataset.tar.gz
* Learning rate: default=0.01
* Number of steps: 400
* Training percentage: 80%
* Validation percentage: 10%
* Testing percentage: 10%

python -m scripts.retrain   --bottleneck_dir=tf_files/bottlenecks   --model_dir=tf_files/models/   --summaries_dir=tf_files/training_summaries/""   --output_graph=tf_files/retrained_graph.pb   --output_labels=tf_files/retrained_labels.txt   --architecture="mobilenet_1.0_160" --image_dir=/home/paleo/1radiolaria/archevaldataset --print_misclassified_test_images --how_many_training_steps=400

width multiplier: 1.0 0.75 0.50 0.25
image resolution: 224, 192, 160, 128


#### Results
Mobilenet 1.0:

* mobilenet_v1_1.0_224: Final test accuracy = 70.9% (N=86)
* mobilenet_v1_1.0_192: Final test accuracy = 80.2% (N=86)
* mobilenet_v1_1.0_160: Final test accuracy = 75.6% (N=86)
* mobilenet_v1_1.0_128: Final test accuracy = 74.4% (N=86)

Mobilenet 0.75:

* mobilenet_v1_0.75_224: Final test accuracy = 75.6% (N=86)
* mobilenet_v1_0.75_192: Final test accuracy = 73.3% (N=86)
* mobilenet_v1_0.75_160: Final test accuracy = 75.6% (N=86)
* mobilenet_v1_0.75_128: Final test accuracy = 76.7% (N=86)

Mobilenet 0.50:

* mobilenet_v1_0.50_224: Final test accuracy = 73.3% (N=86)
* mobilenet_v1_0.50_192: Final test accuracy = 69.8% (N=86)
* mobilenet_v1_0.50_160: Final test accuracy = 74.4% (N=86)
* mobilenet_v1_0.50_128: Final test accuracy = 72.1% (N=86)

Mobilenet 0.25:

* mobilenet_v1_0.25_224: Final test accuracy = 68.6% (N=86)
* mobilenet_v1_0.25_192: Final test accuracy = 68.6% (N=86)
* mobilenet_v1_0.25_160: Final test accuracy = 61.6% (N=86)
* mobilenet_v1_0.25_128: Final test accuracy = 72.1% (N=86)

Others:

* Inception V3: Final test accuracy = 68.6% (N=86)

#### Observations
Mobilenet versions provide better results for the datasets than Inception v3 for the chosen parameters. The architecture that provides the best result for this series of tests is mobilenet_v1_1.0_192. Since this architecture provided the best results for a dataset and hyperparameters that had been demonstrated to be most useful in previous tests, it was used for the tests afterward. Subsequent tests and refinements to the workflow led to the selection of the 224x224 image resolution.





#### Optimization metric: top-n accuracy

The objective was to create a Radiolarian classifier that delivers human-level accuracy. The literature indicates that human-level accuracy in image recognition is imperfect. Russakovsky, Karpathy et al. observe that humans make mistakes and do not always agree on the identification of certain images. Karpathy notes that some images are not of the best quality. They record human-level accuracy in a range from 93.6% to 97% on images taken from the Imagenet dataset, depending on image size. The Imagenet Large Scale Visual Recognition Challenge uses top-n classification rates, defined as:

<img src="http://paleobots.com/classifier/images/paperimages/topn.png" width="241" height="152" hspace="200"/>

“In this task, given an image an algorithm will produce 5 class labels $c_i, i=1,\dots,n$ in decreasing order of confidence” (6)

The classifier in this project was optimized using the top-1 classification rate. Top-n rates were used as an indicator of the impact of changes made to the input data, hyperparameters, and architecture.
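As a concrete illustration of the metric, the sketch below computes a top-n classification rate from ranked inferences. It is a minimal example under stated assumptions rather than the classifier's own accounting code: the function name and the input layout (one list of labels per image, ordered from greatest to least certainty) are hypothetical.

```python
def top_n_accuracy(ranked_predictions, true_labels, n=1):
    """Fraction of images whose true label appears among the first n ranked labels."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("one ranked list is required per true label")
    if not true_labels:
        return 0.0
    correct = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:n]
    )
    return correct / len(true_labels)


# Hypothetical inferences for three images of Cycladophora bicornis.
ranked = [
    ["Cycladophora bicornis", "Cycladophora golli", "Helotholus vema"],
    ["Cycladophora golli", "Cycladophora bicornis", "Helotholus vema"],
    ["Helotholus vema", "Cycladophora golli", "Cycladophora bicornis"],
]
truth = ["Cycladophora bicornis"] * 3
for n in (1, 2, 3):
    print("top-%d accuracy: %.3f" % (n, top_n_accuracy(ranked, truth, n)))
```

By construction the rate can only stay the same or rise as n increases, which is why top-1 is the strictest of these measures.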


### Example training run

A neural net was trained with a dataset of 16 taxa of Radiolaria.

In [0]:
!python \
-m code.train --validation_percentage=12 \
--test_seed=1 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
'#--print_misclassified_test_images' \
--how_many_training_steps=500 \
--learning_rate=0.01

<br><br>

---
<br>
<p align="center">
![alt text](http://paleobots.com/classifier/images/paperimages/bclassification.png)
</p>
## Classification

After the neural net architecture and hyperparameters were selected, the classifier code was written in Python to perform classification on datasets labeled according to the naming system established in the curation stage. A Monte Carlo cross-validation strategy was implemented. An algorithm was selected and implemented for treating the focal planes of a specimen as a group during set selection and classification. Code was written to calculate per-image and per-specimen tallies and accuracies. The classifier read images from a specified dataset and fed them to the specified trained neural net with its associated set of class labels. It processed the input focal planes for each specimen as an associated group and produced five inferences, ordered from greatest to least certainty, for each image and for each specimen. In determining final inferences and accuracy rates, only the focal plane of greatest certainty for each specimen was considered. The classifier took a top_n argument from 1 to 5 and reported its accuracy under that metric, based on image labels provided by domain experts.
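The per-specimen step can be sketched as follows. This is an illustration only, not the code in the notebook's `classify` module: the function name and input layout are hypothetical, and it assumes that "the focal plane of greatest certainty" means the plane whose top-ranked inference has the highest certainty.

```python
def best_focal_plane_per_specimen(image_results):
    """Keep, for each specimen, the focal plane whose top inference is most certain.

    image_results: iterable of (specimen_id, focal_plane, inferences), where
    inferences is a list of (label, certainty) pairs sorted by descending certainty.
    Returns {specimen_id: (focal_plane, inferences)}.
    """
    best = {}
    for specimen_id, focal_plane, inferences in image_results:
        top_certainty = inferences[0][1]
        current = best.get(specimen_id)
        if current is None or top_certainty > current[1][0][1]:
            best[specimen_id] = (focal_plane, inferences)
    return best


# Hypothetical per-image results: specimen 0079 has two focal planes, 0101 has one.
results = [
    ("0079", "a", [("Cycladophora bicornis", 0.60), ("Cycladophora golli", 0.30)]),
    ("0079", "b", [("Cycladophora golli", 0.85), ("Cycladophora bicornis", 0.10)]),
    ("0101", None, [("Helotholus vema", 0.70)]),
]
for specimen, (plane, inferences) in best_focal_plane_per_specimen(results).items():
    print(specimen, plane, inferences[0])
```

Per-specimen accuracy can then be computed from the retained inferences in the same way as per-image accuracy.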




### Example classification run

A classification was performed of the radiolarian images that were excluded from the earlier training run. Each filename was displayed by the classifier, followed by the image itself with units, then the top five classifier inferences and their associated certainty. (Only the first ten images are displayed here to conform to Colaboratory’s output constraint.) Multiple focal planes were included for some specimens. A letter before the ‘.jpg’ extension indicated the focal plane, while the number before the letter indicated the specimen number. At the end of the first run of images, the total count and percentage correctly identified were displayed. In the subsequent accounting, the classifier retained only the focal plane for each image with the inference of greatest certainty. Individual specimens were presented with top five inferences. Finally a total count of specimens and the percentage of them correctly identified by the classifier were displayed.

In [0]:
import sys
sys.path.insert(0, 'code')

import classify

model_file = "models/example.pb"
dataset = "test_sets/example"
label_file = "labels/example.txt"

classify.evaluateDirectory(model_file,dataset,label_file,1)

### Significance of focal planes

The classifier performed uniformly better when given input datasets with multiple focal planes per specimen. Because some sets contained only single-plane images per specimen, additional tests were conducted to quantify the effect of the number of focal planes on the neural net classification accuracy for fossil Radiolaria, as measured by the percentage correctly identified.

**Independent Variables:**

The independent variable for this series of tests was the number of focal planes for a single specimen included in the training set.

* Control: human expert recognition
* First level: 1 focal plane
* Second level: 2 focal planes
* Third level: 3 focal planes
* Fourth level: 4 focal planes

**Dependent Variable:**

The dependent variable for this series of tests was the recognition accuracy, expressed as a percentage of the rate of a human expert (assumed to be 100% for this experiment). During testing the neural net was exposed to approximately 114 samples. The dependent variable was displayed in a scatterplot.

**Hypothesis:**

If a neural net is trained with the greatest number of planes available for a single specimen for the classification of radiolaria, then it will achieve better results because it can acquire more features associated with a single specimen. The micropaleontologists who captured the images indicated that multiple planes were required for effective recognition by humans. For this experiment, the greatest number of focal planes is four.

![Image not loading](http://paleobots.com/classifier/images/paperimages/4focal.png)
> *Image series: Four focal planes of Antarctissa deflandrei*

![Image not loading](http://paleobots.com/classifier/images/paperimages/focal.jpg)
> *Contribution of focal planes to whole image perception*



**Procedure**

1. Train the neural net
    1. Organize the images into directories by taxa
    2. Name each directory after the taxon it will be containing
    3. Populate each directory with the species
        * mogrify  -colorspace Gray "directory used"
    4. Select architecture and hyperparameters for neural net
        * mobilenet_1.0_192 
    5. Train the neural net (this employs the hardware and software materials listed) 
    6. Test the neural net against an unused test set
    7. Record the data, and archive the dataset and results
    8. Iterate until sufficient data is acquired to establish constants
    9. Constants to establish
        1. Learning rate
        2. Activation function
        3. Number of training steps
        4. Distortions
        5. Neural net architecture
        
**Dataset** 
The taxa in this dataset were 
1. Antarctissa Cylindrica 84 images
2. Antarctissa Deflandrei 86 images
3. Cycladophora Pliocenica 37 images
4. Cycladophora Spongothorax 17 images 

       
**Parameters**       
       
* Architecture: mobilenet_1
    * Convolution dimension: 3x3
    * Activation function: rectified linear unit
    * Depth: 30
* Preliminary dataset: ImageNet
* Final layer dataset: mobilenet_1.0_192
* Learning rate: 0.01 (default)
* Number of steps: 400
* Training percentage: 80%
* Validation percentage: 10%
* Testing percentage: 10%

Command line:
rm -rf tf_files/bottlenecks
python -m scripts.retrain   --bottleneck_dir=tf_files/bottlenecks   --model_dir=tf_files/models/   --summaries_dir=tf_files/training_summaries/""   --output_graph=/home/paleo/Desktop/Science_fair_Radiolaria/Datasets/experiment1_0.pb   --output_labels=tf_files/retrained_labels.txt   --architecture="mobilenet_1.0_192" --image_dir=/home/paleo/Desktop/Science_fair_Radiolaria/Datasets/1FocalPlane --print_misclassified_test_images --how_many_training_steps=400 seed=1,2,3,4,5,6,7,8,9,10,11,12,13,14

**Single focal plane**

Training and testing was performed with a neural net on a dataset of images with a single focal plane per specimen. 

**Results:**
* no_seed=          Final test accuracy = 87.5%  
* testing_seed=1    Final test accuracy = 86.7%
* testing_seed=2    Final test accuracy = 95.2%          
* testing_seed=3    Final test accuracy = 96.0% 
* testing_seed=4    Final test accuracy = 88.5% 
* testing_seed=5    Final test accuracy = 100.0% 
* testing_seed=6    Final test accuracy = 87.5% 
* testing_seed=7    Final test accuracy = 95.2%        
* testing_seed=8    Final test accuracy = 90.9% 
* testing_seed=9    Final test accuracy = 90.5%         
* testing_seed=10   Final test accuracy = 87.5%           
* testing_seed=11   Final test accuracy = 76.5% 
* testing_seed=12   Final test accuracy = 88.2%           
* testing_seed=13   Final test accuracy = 86.4%          
* testing_seed=14   Final test accuracy = 91.7%          
            
* Mean accuracy = 89.89%


**Two focal planes**

Training and testing was performed with a neural net on a dataset of images with two different focal planes per specimen. 

**Results:** 
* no_seed=          Final test accuracy = 94.7%
* testing_seed=1    Final test accuracy = 96.2%
* testing_seed=2    Final test accuracy = 100.0%
* testing_seed=3    Final test accuracy = 96.8% 
* testing_seed=4    Final test accuracy = 94.1% 
* testing_seed=5    Final test accuracy = 96.0% 
* testing_seed=6    Final test accuracy = 83.3%
* testing_seed=7    Final test accuracy = 97.1%    
* testing_seed=8    Final test accuracy = 96.8%
* testing_seed=9    Final test accuracy = 95.0%        
* testing_seed=10   Final test accuracy = 92.1%          
* testing_seed=11   Final test accuracy = 96.8%
* testing_seed=12   Final test accuracy = 96.2% 
* testing_seed=13   Final test accuracy = 91.7%          
* testing_seed=14   Final test accuracy = 90.7% 

* Mean accuracy = 94.5%


**Three focal planes**

Training and testing was performed with a neural net on a dataset of images with three different focal planes per specimen. 

**Results:**
* no_seed=           Final test accuracy = 100.0%             
* testing_seed=1     Final test accuracy = 100.0% 
* testing_seed=2     Final test accuracy = 100.0%
* testing_seed=3     Final test accuracy = 100.0%
* testing_seed=4     Final test accuracy = 100.0% 
* testing_seed=5     Final test accuracy = 100.0%
* testing_seed=6     Final test accuracy = 100.0%
* testing_seed=7     Final test accuracy = 100.0%
* testing_seed=8     Final test accuracy = 93.8%
* testing_seed=9     Final test accuracy = 100.0%   
* testing_seed=10    Final test accuracy = 100.0%       
* testing_seed=11    Final test accuracy = 100.0%
* testing_seed=12    Final test accuracy = 92.3%        
* testing_seed=13    Final test accuracy = 100.0%        
* testing_seed=14    Final test accuracy = 100.0%      
            
* Mean accuracy = 99.07%


**Four focal planes**

Training and testing was performed with a neural net on a dataset of images with four different focal planes per specimen. 

**Results:**
* no_seed=            Final test accuracy = 100.0%
* testing_seed=1      Final test accuracy = 100.0%
* testing_seed=2      Final test accuracy = 100.0%
* testing_seed=3      Final test accuracy = 100.0% 
* testing_seed=4      Final test accuracy = 100.0%
* testing_seed=5      Final test accuracy = 100.0%
* testing_seed=6      Final test accuracy = 100.0%
* testing_seed=7      Final test accuracy = 100.0%
* testing_seed=8      Final test accuracy = 100.0%
* testing_seed=9      Final test accuracy = 100.0%      
* testing_seed=10     Final test accuracy = 100.0%      
* testing_seed=11     Final test accuracy = 100.0%
* testing_seed=12     Final test accuracy = 100.0%      
* testing_seed=13     Final test accuracy = 100.0%        
* testing_seed=14     Final test accuracy = 100.0%    
            
* Mean accuracy = 100%

**Results and Summary**

This experiment tested the effect of adding focal planes of microphotographs of Radiolaria on the recognition rate of a neural net. The initial hypothesis was: if a neural net is trained with the greatest number of planes available for each specimen for the classification of radiolaria, then it will achieve better results because it can acquire more features. The hypothesis is supported by the data collected.
The independent variable was the number of focal planes, ranging from one to four. The dependent variable was the recognition rate, measured as the percentage correctly classified. The control group was human recognition, which was taken to be 100% for this experiment.
Two tables and one graph were created. The first table displays the 15 trials conducted for each level of the independent variable (IV). The mean recognition rates were 89.89% for one focal plane, 94.5% for two, 99.07% for three, and 100% for four, so the mean increased with each added level, as seen in table two, which displays the mean recognition rate and error bars for each level of the IV. The standard deviation also decreased with each focal plane added: 6% for one plane, 4% for two, 3% for three, and 0% for four.
Table three displays the statistical results. A chi-squared test was used to analyze the results. The original expected recognition rates were 50% for one focal plane, 70% for two, 80% for three, and 90% for four, as shown in table three. Unlike in a typical chi-squared test, average recognition rates had to be used because of the number of trials. The resulting chi-squared statistic (χ²) was approximately 30.144. Since there are four IV levels, there are 3 degrees of freedom, so the third row of the “Level of Probability table” was consulted. According to this table, the results had a less than 0.1% chance of arising at random. The null hypothesis, that all of the recognition rates are the same, was therefore rejected, and the initial hypothesis was supported.
![Image not loading](http://paleobots.com/classifier/images/paperimages/scatterplot.png "Mean Accuracy")
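
The per-level means and spreads summarized above can be recomputed from the listed trial results. The following is a minimal sketch using only the Python standard library. The accuracies are copied from the four results lists in this section; the choice of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

# Final test accuracies (%) copied from the results lists above,
# in the order no_seed, testing_seed=1 ... testing_seed=14.
accuracy_by_planes = {
    1: [87.5, 86.7, 95.2, 96.0, 88.5, 100.0, 87.5, 95.2,
        90.9, 90.5, 87.5, 76.5, 88.2, 86.4, 91.7],
    2: [94.7, 96.2, 100.0, 96.8, 94.1, 96.0, 83.3, 97.1,
        96.8, 95.0, 92.1, 96.8, 96.2, 91.7, 90.7],
    3: [100.0] * 8 + [93.8, 100.0, 100.0, 100.0, 92.3, 100.0, 100.0],
    4: [100.0] * 15,
}

# (mean, sample standard deviation) for each number of focal planes.
summary = {
    planes: (mean(values), stdev(values))
    for planes, values in accuracy_by_planes.items()
}

for planes, (avg, spread) in summary.items():
    print(f"{planes} focal plane(s): mean = {avg:.2f}%, std dev = {spread:.2f}")
```

Keeping the raw trials in one structure makes it straightforward to regenerate the scatterplot or to add further levels if more focal planes become available.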

**Image augmentation**
The number of specimens with multiple focal planes was fewer than the number with single focal planes. If the number of images is too small the neural net will not run. Therefore the images required augmentation. Marcus D. Bloice's Augmentor was selected for its simple API and its independence from other graphics libraries. The operations it performed were rotations, reflections, and random small distortions on a training set kept distinct from the final test set.

Definition of image augmentation:
+ Deep networks need large amount of training data to achieve good performance. To build a powerful image classifier using very little training data, image augmentation is usually required to boost the performance of deep networks. Image augmentation artificially creates training images through different ways of processing or combination of multiple processing, such as random rotation, shifts, shear and flips, etc. (10)

**Example augmentation:**

In [0]:
!pip install Augmentor
!wget http://paleobots.com/classifier/datasets_extended.tar.gz
!tar -xzf datasets_extended.tar.gz


In [0]:
import Augmentor
p = Augmentor.Pipeline("datasets_extended/4FocalPlanes/Cycladophora Spongothorax")

p.rotate90(probability=0.5)
p.rotate270(probability=0.5)
p.flip_left_right(probability=0.8)
p.flip_top_bottom(probability=0.3)
p.random_distortion(probability=1, grid_width=4, grid_height=4, magnitude=8)

p.sample(40)

**Procedure for tests of classifier performance on a variety of datasets:**

1. Train the neural net
    1. Organize the images into directories by taxa
    2. Name each directory after the taxon it will be containing
    3. Populate each directory with the species
    4. Select a subset of the taxa for an experiment
    5. Scale: Images were scaled to have a uniform pixels per micrometer. The mogrify command was used on the Olympus images with the appropriate coefficient calculated in Curation. 
    6. Select architecture and hyperparameters for neural net
    7. Train the neural net (this employs the hardware and software materials listed) 
    8. Test the neural net against an unused test set
    9. Record the data, and archive the dataset and results
    10. Repeat from 7
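
The scaling step in the procedure above amounts to a single ratio. The sketch below shows the arithmetic only: the pixels-per-micrometer values and the `olympus` directory name are hypothetical placeholders, and the actual coefficient is the one calculated in the Curation section. It prints an ImageMagick `mogrify -resize` command rather than running it.

```python
def resize_percent(source_px_per_um, target_px_per_um):
    """Percentage by which to resize images so they match the target scale."""
    if source_px_per_um <= 0 or target_px_per_um <= 0:
        raise ValueError("scales must be positive")
    return 100.0 * target_px_per_um / source_px_per_um


def mogrify_resize_command(directory, percent):
    """Build (but do not run) an ImageMagick command that resizes JPEGs in place."""
    return f"mogrify -resize {percent:.2f}% {directory}/*.jpg"


# Hypothetical scales: source images at 4.0 px/um, target dataset at 2.0 px/um.
percent = resize_percent(source_px_per_um=4.0, target_px_per_um=2.0)
command = mogrify_resize_command("olympus", percent)
print(command)
```

Because `mogrify` overwrites files in place, it is safest to run such a command on a copy of the image directory.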

**Datasets**

*Training, validation and test*

The taxa in this dataset are 
1. Antarctissa Ballista 43 images
2. Antarctissa Cylindrica 135 images
3. Antarctissa Denticulata 124 images
4. Antarctissa Strelkovi 130 images
5. Antarctissa Robusta 35 images
6. Cycladophora Golli 45 images 
7. Helotholus Praevema 50 images
8. Antarctissa Deflandrei 91 images
9. Cycladophora Bicornis 40 images
10. Cycladophora Davisiana 65 images
11. Helotholus Vema 58 images
12. Lithomelissa Setosa 44 images

*Extended test set*

The taxa in this dataset are 
1. Antarctissa Ballista
2. Antarctissa Cylindrica
3. Antarctissa Denticulata
4. Antarctissa Strelkovi
5. Antarctissa Robusta
6. Cycladophora Golli
7. Helotholus Praevema
8. Antarctissa Deflandrei
9. Cycladophora Bicornis
10. Cycladophora Davisiana
11. Helotholus Vema
12. Lithomelissa Setosa



**Parameters**

Architecture: mobilenet_1.0_224
    * convolution dimension: 3x3
    * Activation function: rectified linear unit
    * depth: 30 
Preliminary dataset: Imagenet 
Final layer dataset: dataset_jpgs2size
Learning rate: default=0.01
Number of steps: 400
Training percentage: 80%
Validation percentage: 10%
Testing percentage: 10%
width multiplier: 1.0
image resolution: 192

python -m scripts.retrain   --bottleneck_dir=tf_files/bottlenecks   --model_dir=tf_files/models/   --summaries_dir=tf_files/training_summaries/""   --output_graph=tf_files/retrained_graph.pb   --output_labels=tf_files/retrained_labels.txt   --architecture="mobilenet_1.0_192" --image_dir=/home/paleo/1radiolaria/dataset_jpgs2size --print_misclassified_test_images --how_many_training_steps=400
        
        


### Cross Validation

Monte Carlo cross validation, also known as repeated random sub-sampling, was employed to assess predictive accuracy and the significance of focal planes. An incremented seed was fed to a pseudo-random number generator to select different training, validation and test sets for each run. In this example of ten classification runs, the mean accuracy of specimen classification was 72.63% with a standard deviation of 0.02 (about 2 percentage points). The mean accuracy of image classification was 68.88%. In every run, considering multiple focal planes per specimen gave a higher recognition rate, with a mean improvement of 3.75 percentage points. In hundreds of informal observations, consideration of multiple focal planes improved recognition rate in all but two instances.

<table cellspacing="0" border="0">
	<tr><th align="left">Seed</th><th align="right">Accuracy, per image</th><th align="right">Accuracy, per specimen</th><th align="right">Improvement</th><th align="right">Train time</th><th align="right">Classify time</th></tr>
	<tr><td align="left">1</td><td align="right">71.91%</td><td align="right">73.68%</td><td align="right">1.77%</td><td align="right">0:00:34</td><td align="right">0:00:53.49</td></tr>
	<tr><td align="left">2</td><td align="right">67.42%</td><td align="right">72.63%</td><td align="right">5.21%</td><td align="right">0:00:34</td><td align="right">0:01:07.82</td></tr>
	<tr><td align="left">3</td><td align="right">66.85%</td><td align="right">70.53%</td><td align="right">3.68%</td><td align="right">0:00:34</td><td align="right">0:00:55.43</td></tr>
	<tr><td align="left">4</td><td align="right">70.22%</td><td align="right">73.68%</td><td align="right">3.46%</td><td align="right">0:00:34</td><td align="right">0:01:24.49</td></tr>
	<tr><td align="left">5</td><td align="right">67.42%</td><td align="right">74.74%</td><td align="right">7.32%</td><td align="right">0:00:34</td><td align="right">0:01:39.75</td></tr>
	<tr><td align="left">6</td><td align="right">68.54%</td><td align="right">71.58%</td><td align="right">3.04%</td><td align="right">0:00:34</td><td align="right">0:01:57.45</td></tr>
	<tr><td align="left">7</td><td align="right">70.79%</td><td align="right">75.79%</td><td align="right">5.00%</td><td align="right">0:00:34</td><td align="right">0:02:14.70</td></tr>
	<tr><td align="left">8</td><td align="right">66.29%</td><td align="right">70.53%</td><td align="right">4.24%</td><td align="right">0:00:34</td><td align="right">0:02:32.39</td></tr>
	<tr><td align="left">9</td><td align="right">70.79%</td><td align="right">73.68%</td><td align="right">2.89%</td><td align="right">0:00:34</td><td align="right">0:02:52.77</td></tr>
	<tr><td align="left">10</td><td align="right">68.54%</td><td align="right">69.47%</td><td align="right">0.93%</td><td align="right">0:00:34</td><td align="right">0:03:12.49</td></tr>
	<tr><td align="left">Mean</td><td align="right">68.88%</td><td align="right">72.63%</td><td align="right">3.75%</td><td align="right"></td><td align="right"></td></tr>
	<tr><td align="left">Std Dev.</td><td align="right">1.83%</td><td align="right">1.94%</td><td align="right">1.73%</td><td align="right"></td><td align="right"></td></tr>
</table>
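
The repeated random sub-sampling described above can be sketched as follows. This is an illustration under stated assumptions, not the selection code in `code/train.py`: the function names, the 10% test fraction, and the filename convention (a focal plane letter directly before `.jpg`) are hypothetical. The point it shows is that a seeded generator picks whole specimens, so every focal plane of a specimen lands on the same side of the split.

```python
import random
from collections import defaultdict


def specimen_id(filename):
    """Hypothetical rule: drop the focal plane letter and the '.jpg' extension."""
    return filename[: -len("a.jpg")]


def split_by_specimen(filenames, seed, test_fraction=0.1):
    """Return (train, test) file lists, keeping each specimen's focal planes together."""
    groups = defaultdict(list)
    for name in filenames:
        groups[specimen_id(name)].append(name)

    specimens = sorted(groups)       # fixed order, so the seed alone decides the split
    rng = random.Random(seed)        # one independent generator per run
    rng.shuffle(specimens)

    n_test = max(1, round(len(specimens) * test_fraction))
    test = [f for s in specimens[:n_test] for f in groups[s]]
    train = [f for s in specimens[n_test:] for f in groups[s]]
    return train, test


# Ten made-up specimens with two focal planes each, split with ten seeds.
files = [f"taxon_{i}{plane}.jpg" for i in range(1, 11) for plane in "ab"]
for seed in range(1, 11):
    train, test = split_by_specimen(files, seed)
    print(seed, sorted(test))
```

Averaging the accuracies obtained over such seeded splits gives the mean and standard deviation reported for the cross validation.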

<br>
### Example cross validation run

The results in the table above were generated by the example training and classification code below. To execute all of the 10 separate cells sequentially, issue the "Run all" or "Run after" commands.


In [0]:
import sys
sys.path.insert(0, 'code')
import classify

model_file = "models/example.pb"
dataset = "test_sets/example"
label_file = "labels/example.txt"

!python \
-m code.train --validation_percentage=12 \
--test_seed=1 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
--how_many_training_steps=500 \
--learning_rate=0.01

classify.evaluateDirectory(model_file,dataset,label_file,1)

In [0]:
!python \
-m code.train --validation_percentage=12 \
--test_seed=2 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
--how_many_training_steps=500 \
--learning_rate=0.01

classify.evaluateDirectory(model_file,dataset,label_file,1)

In [0]:
!python \
-m code.train --validation_percentage=12 \
--test_seed=3 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
--how_many_training_steps=500 \
--learning_rate=0.01

classify.evaluateDirectory(model_file,dataset,label_file,1)

In [0]:
!python \
-m code.train --validation_percentage=12 \
--test_seed=4 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
--how_many_training_steps=500 \
--learning_rate=0.01

classify.evaluateDirectory(model_file,dataset,label_file,1)

In [0]:
!python \
-m code.train --validation_percentage=12 \
--test_seed=5 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
--how_many_training_steps=500 \
--learning_rate=0.01

classify.evaluateDirectory(model_file,dataset,label_file,1)

In [0]:
!python \
-m code.train --validation_percentage=12 \
--test_seed=6 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
--how_many_training_steps=500 \
--learning_rate=0.01

classify.evaluateDirectory(model_file,dataset,label_file,1)

In [0]:
!python \
-m code.train --validation_percentage=12 \
--test_seed=7 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
--how_many_training_steps=500 \
--learning_rate=0.01

classify.evaluateDirectory(model_file,dataset,label_file,1)

In [0]:
!python \
-m code.train --validation_percentage=12 \
--test_seed=8 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
--how_many_training_steps=500 \
--learning_rate=0.01

classify.evaluateDirectory(model_file,dataset,label_file,1)

In [0]:
!python \
-m code.train --validation_percentage=12 \
--test_seed=9 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
--how_many_training_steps=500 \
--learning_rate=0.01

classify.evaluateDirectory(model_file,dataset,label_file,1)

In [0]:
!python \
-m code.train --validation_percentage=12 \
--test_seed=10 \
--bottleneck_dir=tmp/bottlenecks \
--test_set=test_sets/example/ \
--model_dir=tmp/models/ \
--summaries_dir=tmp/training_summaries/ \
--output_graph=models/example.pb   \
--output_labels=labels/example.txt   \
--architecture="mobilenet_1.0_224" \
--image_dir=datasets/dataset202s \
--how_many_training_steps=500 \
--learning_rate=0.01

classify.evaluateDirectory(model_file,dataset,label_file,1)
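
The ten cells above differ only in the value of `--test_seed`, so the same runs could be driven from a loop. The sketch below only builds and prints the commands; the helper name `build_train_command` is hypothetical, and the flags are copied from the cells above rather than checked against `code/train.py`.

```python
import shlex

# Flags shared by every run, copied from the cells above.
BASE_FLAGS = {
    "validation_percentage": "12",
    "bottleneck_dir": "tmp/bottlenecks",
    "test_set": "test_sets/example/",
    "model_dir": "tmp/models/",
    "summaries_dir": "tmp/training_summaries/",
    "output_graph": "models/example.pb",
    "output_labels": "labels/example.txt",
    "architecture": "mobilenet_1.0_224",
    "image_dir": "datasets/dataset202s",
    "how_many_training_steps": "500",
    "learning_rate": "0.01",
}


def build_train_command(seed):
    """Return the argument list for one training run with the given test seed."""
    command = ["python", "-m", "code.train", f"--test_seed={seed}"]
    command += [f"--{flag}={value}" for flag, value in BASE_FLAGS.items()]
    return command


for seed in range(1, 11):
    print(shlex.join(build_train_command(seed)))
    # To execute a run instead of printing it, one could call
    # subprocess.run(build_train_command(seed), check=True) and then
    # classify.evaluateDirectory(model_file, dataset, label_file, 1).
```

Separate cells remain useful in Colaboratory when each run's output should be inspected on its own; the loop is more convenient when only the summary figures are needed.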

### Example large dataset classification run

A classification was performed on the whole dataset of radiolarian images containing 1,987 image files from 16 different taxa. Each filename was displayed by the classifier, followed by the image itself with units, then the top five classifier inferences and their associated certainty. (Only the first ten images are displayed here to conform to Colaboratory’s output constraint.) Multiple focal planes were included for some specimens. A letter before the ‘.jpg’ extension indicated the focal plane, while the number before the letter indicated the specimen number. At the end of the first run of images, the total count and percentage correctly identified were displayed. In the subsequent accounting, the classifier retained, for each specimen, only the focal plane with the inference of greatest certainty. Individual specimens were presented with their top five inferences. Finally, the total count of specimens and the percentage of them correctly identified by the classifier were displayed.

In [0]:
import sys
sys.path.insert(0, 'code')

import classify

model_file = "models/example.pb"
dataset = "datasets/dataset100s"
label_file = "labels/example.txt"

classify.evaluateDirectory(model_file,dataset,label_file,1)

### Analysis/Display of results

Results were generally displayed in tabular form. Raw classification results were presented as generated. The file name, as labeled by a domain expert, was presented at the beginning of each output, followed by the classifier's top five classification inferences and their certainty. The top five inferences were sorted by certainty. At the end of the raw results, a tally of images and a top-1 accuracy assessment were shown.

Once raw results were presented, the classifier processed results by specimen, considering only the focal plane inference of greatest certainty. A tally of specimens and the respective top-1 accuracy was shown.

A force directed graph was also rendered. This graph included representations of organization. The root of the graph was radiolaria, branching out to the genus, and then the separate taxa, and then individual specimens. The forces that directed the nodes were the certainty of inference for each specimen. Individuals were drawn closer to the nodes representing inferences of greater certainty. It is possible this tool could be used for showing morphological distance. If a specimen of a certain species seems closer to another, it could be more closely related to that species. Since fossil radiolaria do not contain DNA, this tool could be useful, because it uses the images of tests of fossil radiolaria to compute morphological distance. 
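
As an illustration of how inference certainties can drive such a graph, the sketch below turns per-specimen inferences into the node and link lists that force-directed layouts commonly consume. It is not the code in `dendrogram.py`: the function name, the field names (`id`, `kind`, `source`, `target`, `strength`), and the example certainties are assumptions. Structural links (root to genus, genus to taxon) get a fixed strength, while each specimen is linked to its inferred taxa with a strength equal to the certainty of that inference.

```python
import json


def build_graph(specimen_inferences, root="Radiolaria"):
    """Build nodes and links from {specimen: [(taxon, certainty), ...]}."""
    nodes = {root: {"id": root, "kind": "root"}}
    links = []
    for specimen, ranked in specimen_inferences.items():
        nodes[specimen] = {"id": specimen, "kind": "specimen"}
        for taxon, certainty in ranked:
            genus = taxon.split()[0]
            if genus not in nodes:
                nodes[genus] = {"id": genus, "kind": "genus"}
                links.append({"source": root, "target": genus, "strength": 1.0})
            if taxon not in nodes:
                nodes[taxon] = {"id": taxon, "kind": "taxon"}
                links.append({"source": genus, "target": taxon, "strength": 1.0})
            # More certain inferences pull the specimen closer to that taxon.
            links.append({"source": taxon, "target": specimen, "strength": certainty})
    return {"nodes": list(nodes.values()), "links": links}


# Made-up certainties for a single specimen.
graph = build_graph({
    "specimen_12": [("Antarctissa cylindrica", 0.80), ("Antarctissa deflandrei", 0.15)],
})
print(json.dumps(graph, indent=2))
```

A specimen with comparable certainties for two taxa would settle between them in the layout, which is what makes the graph readable as a rough picture of morphological distance.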

<br><br>

---
<br>

<p align="center">
![Live classifier](http://paleobots.com/classifier/images/paperimages/bclassifier.png)
</p>
## Live classifier
This Radiolarian classifier will identify any Radiolarian images in a dataset that is uploaded in a gzipped file. The classifier's neural net has been trained on a set of images from these taxa:

> Antarctissa cylindrica, Cycladophora cosma, Cycladophora spongothorax, Antarctissa deflandrei, Cycladophora davisiana, Helotholus praevema, Antarctissa denticulata, Cycladophora golli, Helotholus vema, Antarctissa robusta, Cycladophora humerus, Lithomelissa setosa, Antarctissa strelkovi, Cycladophora pliocenica

If the uploaded dataset is labeled according to the system described in the Curation section of this notebook, the classifier will also make a determination of its classification accuracy.

In [0]:
# <-- Click inside these brackets to
# run the retrained classifier
# It will prompt you to upload your gzipped dataset

import os
import sys
import tarfile

from google.colab import files

sys.path.insert(0, 'code')
import classify

top_n = 5  # number of ranked inferences counted as a hit (1 to 5)

# Prompt for an upload and take the first uploaded file
uploaded = files.upload()
fname = list(uploaded.keys())[0]

# Unpack the archive; tarfile.open() detects gzip compression on its own
if fname.endswith((".tar.gz", ".tar")):
    with tarfile.open(fname) as tar:
        tar.extractall()
os.remove(fname)

model_file = "tmp/graph024.pb"
label_file = "tmp/classifier_labels.txt"
# The extracted dataset directory is expected to share the archive's base name
dataset = os.path.splitext(os.path.splitext(fname)[0])[0]

classify.evaluateDirectory(model_file, dataset, label_file, top_n)

## Visualization

Classification results were fed to a force-directed graph using the D3JS library.(7) This interactive graph shows an attractive force between nodes that is proportional to the neural net's certainty in their classification. A repulsive charge distributes the nodes to allow for visualization of relationships. Users of the graph can hover for individual identification and drag nodes to obtain new views.

In [0]:
# <-- Click inside these brackets to create the HTML code
# Then click the "Draw dendrogram" button to create the live graph
import sys
sys.path.insert(0, 'code')
import dendrogram

dendrogram.draw()

## Application

An Android application was written to allow for real-time classification of radiolarians. [The app is available here](https://play.google.com/store/apps/details?id=com.paleobots.radioclassifier).

![Image not loading](http://paleobots.com/classifier/images/paperimages/appfar.png)
> *The application performing real-time classification of radiolarians from a computer display*

![Image not loading](http://paleobots.com/classifier/images/paperimages/appclose.png)
> *Close-up of the application at work*

## Appendices

### Bibliography

(1) "A Primer on Python for Life Science Researchers - PLOS." 30 Nov. 2007, http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030199.

(2) "Life History and Ecology of the Radiolaria." http://www.ucmp.berkeley.edu/protista/radiolaria/radlh.html.

(3) "Late Pleistocene-Holocene radiolarian ... - ePIC - AWI." http://epic.awi.de/10506/.

(4) "RadSS: A radiolarian classifier using support vector ... - IEEE Xplore." 19 Dec. 2016, http://ieeexplore.ieee.org/document/7785347/.

(5) Keçeli, Ali Seydi, Kaya, Aydin, & Uzunçimen Keçeli, Seda. (2017). Classification of radiolarian images with hand-crafted and deep features. Computers & Geosciences, 109. doi:10.1016/j.cageo.2017.08.011. https://www.researchgate.net/publication/319158131_Classification_of_radiolarian_images_with_hand-crafted_and_deep_features

(6) "[1409.0575] ImageNet Large Scale Visual Recognition Challenge - arXiv." 1 Sep. 2014, https://arxiv.org/abs/1409.0575.

(7) "A Hybrid Space-Filling and Force-Directed Layout Method for ...." http://vis.cs.ucdavis.edu/papers/pacificvis09_Itoh.pdf. Accessed 2 Jan. 2018.

(8) "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision ...." 17 Apr. 2017, https://arxiv.org/abs/1704.04861. Accessed 8 Apr. 2018.

(9) "Image Augmentation for Deep Learning – Towards Data Science." 10 Jul. 2017, https://towardsdatascience.com/image-augmentation-for-deep-learning-histogram-equalization-a71387f609b2. Accessed 8 Jan. 2018.

### Training code

In [0]:
!cat code/train.py

### Classification code

In [0]:
!cat code/classify.py

### Visualization code

In [0]:
!cat code/dendrogram.py

### Selection of tools based on objectives

The objective of the research was to create a radiolarian classifier that reaches a reproducible human-level recognition rate. This objective guided the selection of tools. The decision was made to use open-source materials and to perform the task with tools that run on inexpensive, commonly available CPUs and GPUs. For documentation, Jupyter Notebook is an open-source interactive environment that executes code and renders HTML, which supports easy access and reproducibility for other researchers. Python is a widely used, open-source programming language for data science. It supports a wide variety of tools and is the foundation for programs such as Jupyter Notebook. Machine learning environments that use Python on the open-source GNU/Linux OS include Caffe, Theano, and TensorFlow. For transfer learning, the ImageNet database provides a large labeled image dataset. Google provides open-source pretrained models based on this database that already contain image recognition features.

### Equipment and Materials

Computer: Linux OS, AMD FX-6100 CPU, NVIDIA 1060 GT GPU

Radiolarian images: 1150 images of 17 taxa in BMP and JPEG format; resolutions 1024x768 and 2048x1536; 24-bit RGB. Species: A. denticulata, A. ballista, A. cylindrica, A. strelkovi, A. deflandrei, A. robusta, L. setosa, H. praevema, H. vema, C. davisiana, C. pliocenica, C. bicornis, C. cornutoides, C. cosma, C. spongothorax, C. golli

Free and open-source software development environment: Linux, Python 3.6, Anaconda, Spyder, Jupyter Notebook, NumPy, MobileNet, Matplotlib, GNU Image Manipulation Program, LibreOffice

Proprietary software: NVIDIA CUDA, cuDNN, Dropbox