Minor additions and bug fixes (#13)
* Water raman scans processing and viz

* Debugging the S3 demo data download

* attempting to migrate from circleCI to github actions

* attempting to migrate from circleCI to github actions

* attempting to migrate from circleCI to github actions

* attempting to migrate from circleCI to github actions

* attempting to migrate from circleCI to github actions

* attempting to migrate from circleCI to github actions

* attempting to migrate from circleCI to github actions

* Playing with github actions. Publish to pypi on release.

* integrating pre-commit and black

* getting the GH action linter working

* GH action for docs

* GH action for docs

* Debugging GH action for docs

* Debugging GH action for docs

* Debugging GH action for docs

* increment minor version for new release

* added some tests for new plotting functions.

* debugging codecov GH action.

* debugging codecov GH action.

* debugging codecov GH action.

* debugging codecov GH action.

* Update README

* JOSS paper prep.

* Added the MIT REMORA instrument and fixed minor bugs.
drewmee committed Jun 11, 2021
1 parent 0f41b69 commit d9407b5
Showing 16 changed files with 254 additions and 53 deletions.
31 changes: 31 additions & 0 deletions .github/workflows/codecov.yml
@@ -0,0 +1,31 @@
name: Codecov
on: [push]
jobs:
  run:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
    env:
      OS: ${{ matrix.os }}
      PYTHON: '3.7'
    steps:
    - uses: actions/checkout@master
    - name: Setup Python
      uses: actions/setup-python@master
      with:
        python-version: 3.7
    - name: Generate coverage report
      run: |
        python -m pip install --upgrade pip
        pip install -e .[tests]
        pip install pytest-cov
        pytest --cov=./ --cov-report=xml
    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v1.0.5
      with:
        token: ${{ secrets.CODECOV_TOKEN }}
        file: ./coverage.xml
        flags: unittests
        name: codecov-umbrella
        fail_ci_if_error: true
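Note on the workflow above: `pytest --cov=./ --cov-report=xml` writes a Cobertura-style `coverage.xml`, which is the file the upload step points at. As a rough sketch of what that report contains, the snippet below reads the overall line rate from the root element. The helper name `line_coverage_percent` and the inline sample report are illustrative assumptions, not part of this commit.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text):
    """Return overall line coverage (0-100) from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    # The root <coverage> element stores the covered fraction in "line-rate".
    return 100.0 * float(root.attrib["line-rate"])


# Hypothetical, heavily trimmed report used only for illustration.
sample_report = '<coverage line-rate="0.85" branch-rate="0"><packages/></coverage>'
percent = line_coverage_percent(sample_report)
print(f"line coverage: {percent:.1f}%")
```

A check like this could be run locally before pushing, to confirm that the report exists and parses.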
8 changes: 4 additions & 4 deletions README.md
@@ -1,13 +1,13 @@
# PyEEM

![Test](https://github.com/drewmee/PyEEM/workflows/Test/badge.svg)
[![Read the Docs](https://readthedocs.org/projects/pyeem/badge/?version=latest)](https://pyeem.readthedocs.io/)
[![PyPi version](https://img.shields.io/pypi/v/pyeem.svg 'pypi version')](https://pypi.org/project/pyeem/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pyeem.svg)](https://pypi.org/project/pyeem/)
[![Test](https://github.com/drewmee/PyEEM/workflows/Test/badge.svg)](https://github.com/drewmee/PyEEM/actions?query=workflow%3ATest)
[![Read the Docs](https://readthedocs.org/projects/pyeem/badge/?version=latest)](https://pyeem.readthedocs.io/)
[![codecov](https://codecov.io/gh/drewmee/PyEEM/branch/master/graph/badge.svg?token=RAPG3XDZ6H)](https://codecov.io/gh/drewmee/PyEEM)
[![Code style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/drewmee/PyEEM/master?filepath=docs%2Fsource%2Ftutorials%2Fnotebooks)
[![License](https://img.shields.io/github/license/mashape/apistatus.svg)](https://github.com/drewmee/PyEEM/blob/master/LICENSE)
[![Code style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
<!--- Badge for codecov -->

Python library for the preprocessing, analysis, and visualization of Excitation Emission Matrices (EEMs).

21 changes: 21 additions & 0 deletions docs/source/LICENSE_opcsim
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2016-2020 David H Hagan and Jesse H Kroll

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
File renamed without changes.
42 changes: 34 additions & 8 deletions docs/source/tutorials/notebooks/tutorial_2.ipynb

Large diffs are not rendered by default.

40 changes: 12 additions & 28 deletions paper/paper.md
@@ -1,5 +1,5 @@
---
title: 'PyEEM: A Python library for the preprocessing, correction, deconvolution and analysis of Excitation Emission Matrices (EEMs).'
title: 'PyEEM: A Python library for the preprocessing, correction, and analysis of Excitation Emission Matrices (EEMs).'
tags:
- python
- fluorescence
@@ -9,10 +9,12 @@ tags:
authors:
- name: Drew Meyers
affiliation: "1, 2"
- name: Jay W Rutherford
affiliation: 3
- name: Qinmin Zheng
affiliation: 2
- name: Fabio Duarte
affiliation: "2, 3"
affiliation: "2, 4"
- name: Carlo Ratti
affiliation: 2
- name: Harold H Hemond
@@ -24,42 +26,24 @@ affiliations:
index: 1
- name: Senseable City Lab, Massachusetts Institute of Technology
index: 2
- name: Pontifícia Universidade Católica do Paraná, Brazil
- name: Department of Chemical Engineering, University of Washington
index: 3
- name: Pontifícia Universidade Católica do Paraná, Brazil
index: 4
date: 2020-07-08
bibliography: paper.bib
---

# Statement of Need

Fluorescence Excitation and Emission Matrix Spectroscopy (EEMs) is a popular analytical technique in environmental monitoring. In particular, it has been applied extensively to investigate the composition and concentration of dissolved organic matter (DOM) in aquatic systems [@Coble1990;@McKnight2001;@Fellman2010]. Historically, EEMs have been combined with multi-way techniques such as PCA, ICA, and PARAFAC in order to decompose chemical mixtures [@Bro1997;@Stedmon2008;@Murphy2013;@CostaPereira2018]. More recently, machine learning approaches such as convolutional neural networks (CNNs) and autoencoders have been applied to EEMs for source sepearation of chemical mixtures [@Cuss2016;@Peleato2018;@Ju2019;@Rutherford2020]. However, before these source separation techniques can be performed, several preprocessing and correction steps must be applied to the raw EEMs. In order to achieve comparability between studies, standard methods to apply these corrections have been developed [@Ohno2002;@Bahram2006;@Lawaetz2009;@R.Murphy2010;@Murphy2011;@Kothawala2013]. These standard methods have been implemented in Matlab and R packages [@Murphy2013;@Massicotte;Pucher2019]. However until PyEEM, no Python package existed which implemented these standard correction steps. Furthermore, the Matlab and R implementations impose metadata schemas on users which limit their ability to track several important metrics corresponding with each measurement set. By providing a Python implementation, researchers will now be able to more effectively leverage Python's large scienfitic computing ecosystem when working with EEMs.

In addition to the implementation of the preprocessing and correction steps, PyEEM also provides researchers with the ability to create augmented mixture and single source training data from a small set of calibration EEM measurements. The augmentation technique relies on the fact that fluorescnce spectra are linearly additive in mixtures, according to Beer's law [source]. This augmentation technique was first described in Rutherford et al., in which it was used to train a CNN to predict the concentration of single sources of pollutants in spectral mixtures [@Rutherford2020]. Additionally, augmented and synthetic data has shown promise in improving the performace of deep learning models in several fields [@Nikolenko2019].

PyEEM provides the first open source implementation of such an augmentation technique for EEMs. PyEEM also provides plots toolbox useful in the interpretation of EEMs... [@Hansen2018]

# Summary

- A summary describing the high-level functionality and purpose of the software for a diverse, non-specialist audience...
- Description of how the software enables some new research challenges to be addressed or makes addressing research challenges significantly better (e.g., faster, easier, simpler)...
- Description of how the software is feature-complete (i.e. no half-baked solutions) and designed for maintainable extension (not one-off modifications of existing tools)...
Fluorescence Excitation and Emission Matrix Spectroscopy (EEMs) is a popular analytical technique in environmental monitoring. In particular, it has been applied extensively to investigate the composition and concentration of dissolved organic matter (DOM) in aquatic systems [@Coble1990;@McKnight2001;@Fellman2010]. Historically, EEMs have been combined with multi-way techniques such as PCA, ICA, and PARAFAC in order to decompose chemical mixtures [@Bro1997;@Stedmon2008;@Murphy2013;@CostaPereira2018]. More recently, deep learning approaches such as convolutional neural networks (CNNs) and autoencoders have been applied to EEMs for source separation of chemical mixtures [@Cuss2016;@Peleato2018;@Ju2019;@Rutherford2020]. However, before these source separation techniques can be performed, several preprocessing and correction steps must be applied to the raw EEMs. In order to achieve comparability between studies, standard methods to apply these corrections have been developed [@Ohno2002;@Bahram2006;@Lawaetz2009;@R.Murphy2010;@Murphy2011;@Kothawala2013]. PyEEM provides a Python implementation for these standard preprocessing and correction steps for EEM measurements produced by several common spectrofluorometers.

PyEEM is a python library for the preprocessing, correction, deconvolution and analysis of Excitation Emission Matrices (EEMs)...
In addition to the implementation of the standard preprocessing and correction steps, PyEEM also provides researchers with the ability to create augmented single source and mixture training data from a small set of calibration EEM measurements. The augmentation technique relies on the fact that fluorescence spectra are linearly additive in mixtures, according to Beer's law. This augmentation technique was first described in Rutherford et al., in which it was used to train a CNN to predict the concentration of single sources of pollutants in spectral mixtures [@Rutherford2020]. Additionally, augmented and synthetic data has shown promise in improving the performance of deep learning models in several fields [@Nikolenko2019].

- Supported instruments, example datasets
- Metadata schema [@Hansen2018]
- Preprocessing, corrections, and filtering:
- Cropping and wavelength filtering [SOURCE]
- Blank subtraction [SOURCE]
- Scattering removal [@Bahram2006]
- Include Zepp 2004.
- Inner-filter effect correction [@Ohno2002;@Kothawala2013]
- Raman normalization [@Lawaetz2009;@Murphy2011]
- Augmentation [@Rutherford2020]
- plots [@Hansen2018]
Finally, PyEEM provides an extensive visualization toolbox, based on Matplotlib, which is useful in the interpretation of EEM datasets. This visualization toolbox includes various ways of plotting EEMs, the visualization of the Raman scatter peak area over time, and more.

# Acknowledgements
# Statement of Need

We acknowledge contributions from...
Prior to PyEEM, no open source Python package existed to work with EEMs. However, such libraries have existed for MATLAB and R for some time [@Murphy2013;@Massicotte;@Pucher2019]. By providing a Python implementation, researchers will now be able to more effectively leverage Python's large scientific computing ecosystem when working with EEMs. Furthermore, the existing libraries in MATLAB and R do not provide deep learning techniques for decomposing chemical mixtures from EEMs. These libraries provide PARAFAC methods for performing such a task. However, although this technique has been widely used for some time, it has its limitations, and recent work has shown promise in using deep learning approaches. For this reason, PyEEM provides a toolbox for generating augmented training data as well as an implementation of the CNN architecture reported in Rutherford et al., which has been shown to successfully decompose spectral mixtures [@Rutherford2020].

# References
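
Note on the augmentation described in paper.md: the paper states that fluorescence spectra are linearly additive in mixtures. A minimal sketch of that idea follows, assuming each source has one prototypical spectrum measured at a known concentration. The function name `synthesize_mixture` and the toy arrays are hypothetical and are not PyEEM's API.

```python
import numpy as np


def synthesize_mixture(prototypes, proto_concs, target_concs):
    """Scale each prototypical spectrum to a target concentration and sum."""
    first = next(iter(prototypes.values()))
    mixture = np.zeros_like(first, dtype=float)
    for source, eem in prototypes.items():
        # Linear additivity: intensity is assumed proportional to concentration.
        scale = target_concs[source] / proto_concs[source]
        mixture += scale * eem
    return mixture


rng = np.random.default_rng(0)
# Toy 4x3 "EEMs" (emission x excitation) standing in for real measurements.
prototypes = {"a": rng.random((4, 3)), "b": rng.random((4, 3))}
proto_concs = {"a": 1.0, "b": 2.0}
mix = synthesize_mixture(prototypes, proto_concs, {"a": 0.5, "b": 1.0})
```

In this toy case both sources are scaled by 0.5, so the result is simply half of each prototype added together. Real spectra depart from linearity at high absorbance, which is why the inner-filter correction matters before any such scaling.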
12 changes: 11 additions & 1 deletion pyeem/analysis/models/rutherfordnet.py
@@ -16,6 +16,8 @@
 )
 from tensorflow.keras.models import Sequential

+# from tensorflow.keras.optimizers import Adam
+

 class RutherfordNet:
     """The convolutional neural network (CNN) described in Rutherford et al. 2020."""
@@ -86,6 +88,12 @@ def create_model(
         default_compile_kws = dict(
             loss="mean_squared_error", optimizer="adam", metrics=["accuracy"]
         )
+        """
+        opt = Adam(learning_rate=0.0001)
+        default_compile_kws = dict(
+            loss="mean_squared_error", optimizer=opt, metrics=["accuracy"]
+        )
+        """
         compile_kws = dict(default_compile_kws, **compile_kws)
         model.compile(**compile_kws)
         return model
@@ -229,7 +237,9 @@ def get_test_data(self, dataset, routine_results_df):
         """
         test_samples_df = self._isolate_test_samples(dataset, routine_results_df)

-        sources = test_samples_df.index.get_level_values("source").unique().values
+        sources = (
+            test_samples_df.index.get_level_values("source").unique().dropna().values
+        )
         sources = np.delete(sources, np.where(sources == "mixture"))

         X = []
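Note on the `get_test_data` change: it adds `.dropna()` so that samples with a missing `source` label do not end up in the list of sources. The toy example below sketches the effect on a small MultiIndex. The frame and its labels are made up for illustration, and the boolean mask at the end is a stand-in for the `np.delete` call in the diff.

```python
import numpy as np
import pandas as pd

# Hypothetical index: one row has no source label (NaN).
index = pd.MultiIndex.from_tuples(
    [("cigarette", 0), ("diesel", 1), ("mixture", 2), (np.nan, 3)],
    names=["source", "sample"],
)
test_samples_df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Unique source labels, with missing labels dropped.
sources = test_samples_df.index.get_level_values("source").unique().dropna().values
# Keep only the single sources; "mixture" is not a source in its own right.
sources = sources[sources != "mixture"]
print(sources)
```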
6 changes: 4 additions & 2 deletions pyeem/augmentation/base.py
@@ -40,7 +40,7 @@ def prototypical_spectrum(dataset, source_df):
     )

     proto_eems = []
-    for index, row in source_df.iterrows():
+    for index, row in source_df[source_df["prototypical_sample"]].iterrows():
         eem_path = row["hdf_path"]
         eem = pd.read_hdf(dataset.hdf, key=eem_path)
         proto_eems.append(eem)
@@ -51,11 +51,13 @@ def prototypical_spectrum(dataset, source_df):
"concentration"
].mean()

"""
weights = []
for i in range(len(proto_eems)):
weights.append(random.uniform(0, 1))

proto_eem = np.average([eem.values for eem in proto_eems], axis=0, weights=weights)
"""
proto_eem = np.average([eem.values for eem in proto_eems], axis=0)

proto_eem = pd.DataFrame(
data=proto_eem, index=proto_eems[0].index, columns=proto_eems[0].columns
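Note on the `prototypical_spectrum` change: only rows flagged `prototypical_sample` now contribute, and the random weights are replaced by a plain mean. A small sketch of both steps follows, using in-memory frames in place of the HDF5 store. All names and values other than the `prototypical_sample` and `concentration` columns are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata: the second sample is not flagged as prototypical.
source_df = pd.DataFrame(
    {"prototypical_sample": [True, False, True], "concentration": [1.0, 5.0, 3.0]}
)
# Toy constant-valued EEMs standing in for the spectra read from HDF5.
eems = [
    pd.DataFrame(np.full((2, 2), v), index=[300.0, 310.0], columns=[250.0, 260.0])
    for v in (2.0, 10.0, 4.0)
]

# A boolean mask keeps only the flagged rows, as in the updated loop.
proto_rows = source_df[source_df["prototypical_sample"]]
proto_eems = [eems[i] for i in proto_rows.index]

# Unweighted element-wise mean across the selected spectra.
proto_eem = pd.DataFrame(
    np.average([eem.values for eem in proto_eems], axis=0),
    index=proto_eems[0].index,
    columns=proto_eems[0].columns,
)
proto_conc = proto_rows["concentration"].mean()
```

Dropping the random weights makes the prototypical spectrum deterministic for a given set of flagged samples.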
4 changes: 4 additions & 0 deletions pyeem/instruments/MIT/__init__.py
@@ -0,0 +1,4 @@
from .remora import Remora

name = "MIT"
instruments = [Remora]
83 changes: 83 additions & 0 deletions pyeem/instruments/MIT/remora.py
@@ -0,0 +1,83 @@
import pandas as pd


class Remora:
    """The MIT REMORA, a compact, field-deployable spectrofluorometer."""

    manufacturer = "MIT"
    """Name of Manufacturer."""

    name = "REMORA"
    """Name of Instrument."""

    supported_models = ["REMORA-V1"]
    """List of supported models."""

    def __init__(self, model, sn=None):
        """
        Args:
            model (str): The model name of the instrument.
            sn (str or int, optional): The serial number of the instrument.
                Defaults to None.
        """
        self.model = model
        self.sn = sn

    @staticmethod
    def load_eem(filepath):
        """Loads an Excitation Emission Matrix which is generated by the instrument.

        Args:
            filepath (str): The filepath of the data file.

        Returns:
            pandas.DataFrame: An Excitation Emission Matrix.
        """
        eem_df = pd.read_csv(filepath, index_col=0)
        eem_df.columns = eem_df.columns.astype(float)
        eem_df = eem_df.sort_index(axis=0)
        eem_df = eem_df.sort_index(axis=1)
        eem_df.index.name = "emission_wavelength"
        return eem_df

    @staticmethod
    def load_absorbance(filepath):
        """Loads an absorbance spectrum which is generated by the instrument.

        Args:
            filepath (str): The filepath of the data file.

        Returns:
            pandas.DataFrame: An absorbance spectrum.
        """
        absorb_df = pd.read_csv(filepath, index_col=0)
        absorb_df.index.name = "excitation_wavelength"
        # sort_index returns a new frame, so the result must be reassigned.
        absorb_df = absorb_df.sort_index(axis=0)
        absorb_df.index = absorb_df.index.astype("float64")
        return absorb_df

    @staticmethod
    def load_water_raman(filepath):
        """Loads a water Raman spectrum which is generated by the instrument.

        Args:
            filepath (str): The filepath of the data file.

        Returns:
            pandas.DataFrame: A water Raman spectrum.
        """
        raman_df = pd.read_csv(filepath, index_col=0)
        raman_df.columns = raman_df.columns.astype(float)
        raman_df = raman_df.sort_index(axis=0)

        raman_df = raman_df.rename(columns={raman_df.columns[0]: "intensity"})
        raman_df.index.name = "emission_wavelength"
        return raman_df

    @staticmethod
    def load_spectral_corrections():
        """TODO - Should load instrument specific spectral corrections which will
        be used in data preprocessing.

        Raises:
            NotImplementedError: On the TODO list...
        """
        raise NotImplementedError()
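
Note on the REMORA loaders: the EEM loader expects a CSV whose first column holds emission wavelengths and whose header row holds excitation wavelengths. The sketch below mirrors those steps on an in-memory CSV so the resulting layout can be inspected without an instrument file. The function name `load_eem_csv` and the sample values are assumptions; the real REMORA file layout may differ.

```python
import io

import pandas as pd


def load_eem_csv(source):
    """Read an EEM laid out as emission rows by excitation columns."""
    eem_df = pd.read_csv(source, index_col=0)
    # Header labels arrive as strings; convert them to numeric wavelengths.
    eem_df.columns = eem_df.columns.astype(float)
    # Sort both axes so the wavelengths increase monotonically.
    eem_df = eem_df.sort_index(axis=0).sort_index(axis=1)
    eem_df.index.name = "emission_wavelength"
    return eem_df


# Hypothetical two-by-two scan with deliberately unsorted wavelengths.
csv_text = "em,260,250\n310,0.3,0.4\n300,0.1,0.2\n"
eem = load_eem_csv(io.StringIO(csv_text))
print(eem)
```

Because `pandas.read_csv` accepts file-like objects as well as paths, the same function can be exercised in tests without touching disk.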
11 changes: 9 additions & 2 deletions pyeem/instruments/__init__.py
@@ -1,6 +1,13 @@
-from . import agilent, horiba, tecan
+from . import MIT, agilent, horiba, tecan
 from .base import _get_dataset_instruments_df, get_supported_instruments

 supported, _supported = get_supported_instruments()

-__all__ = ["agilent", "horiba", "tecan", "get_supported_instruments", "supported"]
+__all__ = [
+    "agilent",
+    "horiba",
+    "tecan",
+    "MIT",
+    "get_supported_instruments",
+    "supported",
+]
3 changes: 2 additions & 1 deletion pyeem/instruments/base.py
@@ -1,6 +1,6 @@
 import pandas as pd

-from . import agilent, horiba, tecan
+from . import MIT, agilent, horiba, tecan


 def get_supported_instruments():
@@ -17,6 +17,7 @@ def get_supported_instruments():
         agilent.name: agilent.instruments,
         horiba.name: horiba.instruments,
         tecan.name: tecan.instruments,
+        MIT.name: MIT.instruments,
     }
     # instruments = [Aqualog, Fluorolog, Cary]
     df = pd.DataFrame()
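Note on registering the new manufacturer: the diff adds `MIT` to a dictionary mapping manufacturer names to lists of instrument classes. How `get_supported_instruments` turns that mapping into a table is not visible in this hunk, so the following is only a guess at the general pattern. The stand-in `Remora` class and the column names are hypothetical.

```python
import pandas as pd


class Remora:
    """Stand-in for the instrument class added in this commit."""

    manufacturer = "MIT"
    name = "REMORA"
    supported_models = ["REMORA-V1"]


# Manufacturer name -> instrument classes, mirroring the dict in the diff.
registry = {"MIT": [Remora]}

# Flatten the registry into one row per instrument.
rows = [
    {
        "manufacturer": manufacturer,
        "name": instrument.name,
        "supported_models": ", ".join(instrument.supported_models),
        "object": instrument,
    }
    for manufacturer, instruments in registry.items()
    for instrument in instruments
]
supported = pd.DataFrame(rows).set_index(["manufacturer", "name"])

# Look up the class by manufacturer and instrument name.
instrument_cls = supported.loc[("MIT", "REMORA"), "object"]
```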
2 changes: 1 addition & 1 deletion pyeem/plots/augmentations.py
@@ -157,7 +157,7 @@ def single_source_animation(
     max_val = ss_np.max()

     default_plot_kws = dict(vmin=min_val, vmax=max_val)
-    plot_kws = dict(default_fig_kws, **plot_kws)
+    plot_kws = dict(default_plot_kws, **plot_kws)

     default_kwargs = dict(zlim_min=min_val, zlim_max=max_val, title=None)
     kwargs = dict(default_kwargs, **kwargs)
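Note on the `plot_kws` fix: the one-line change merges the caller's keywords into `default_plot_kws` instead of the unrelated `default_fig_kws`. The pattern `dict(defaults, **overrides)` copies the defaults and lets caller-supplied keys win. A small sketch follows, with a hypothetical helper `merge_kwargs` that is not part of PyEEM.

```python
def merge_kwargs(defaults, overrides=None):
    """Return a new dict in which caller overrides take precedence over defaults."""
    # dict(mapping, **kwargs) copies `defaults`, then applies the overrides.
    return dict(defaults, **(overrides or {}))


default_plot_kws = dict(vmin=0.0, vmax=1.0)
plot_kws = merge_kwargs(default_plot_kws, {"vmax": 5.0, "cmap": "viridis"})
print(plot_kws)
```

The defaults dictionary is left untouched, so it can safely be reused across calls.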
2 changes: 1 addition & 1 deletion setup.py
@@ -32,7 +32,7 @@
         "numpy<1.19.0,>=1.18.5",
         "pandas>=1.0.5",
         "xlrd >= 1.0.0",
-        "h5py>=2.10.0",
+        "h5py<2.11.0,>=2.10.0",
         "tables>=3.6.1",
         "matplotlib>=3.3.0",
         "celluloid>=0.2.0",
2 changes: 1 addition & 1 deletion tests/test_instruments.py
@@ -4,7 +4,7 @@


 class TestInstruments:
-    manufacturers = ["Agilent", "Horiba", "Tecan"]
+    manufacturers = ["Agilent", "Horiba", "Tecan", "MIT"]
     """
     manuf_instruments = {
         pyeem.instruments.agilent.name: pyeem.instruments.agilent.instruments,