ml-weight-fraction

This project uses machine learning, implemented in Python, to approximate the weight fractions of emerging chemicals in consumer products.

Launching the project

Before cloning this repository, try launching it through your browser:

Through the Binder interface, users can access all of our code as interactive Jupyter notebooks in a self-contained, executable virtual environment. In other words, the code for this project can be reproduced in full simply by clicking the button above.

Suggested Software

If you prefer to clone our Git repository, we suggest these software versions:

Python 3.7.3
Poetry 1.0 or higher

After cloning, launch the project by running the following in your command line interface:

cd ml-weight-fraction
poetry install
poetry run jupyter lab
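
Before running the commands above, it can help to confirm that the local setup resembles the suggested software. The following is a minimal sketch, not part of the repository: the function name check_environment is hypothetical, it only compares the Python major.minor version against the suggested 3.7 series, and it only checks that a poetry executable is on PATH (it does not verify that Poetry is 1.0 or higher).

```python
import shutil
import sys

SUGGESTED_PYTHON = (3, 7)  # major.minor of the suggested Python 3.7.3


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a list of warnings; an empty list means nothing to flag.

    Hypothetical helper for illustration only. The parameters exist so the
    check can be exercised without changing the real interpreter or PATH.
    """
    warnings = []
    major_minor = tuple(version_info[:2])
    if major_minor != SUGGESTED_PYTHON:
        warnings.append(
            "Python {}.{} differs from the suggested 3.7 series.".format(*major_minor)
        )
    if which("poetry") is None:
        warnings.append("Poetry was not found on PATH.")
    return warnings


if __name__ == "__main__":
    for line in check_environment() or ["No differences from the suggested software found."]:
        print(line)
```

A warning here does not necessarily mean the project will fail to run; the versions listed are suggestions.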

Core Components

The suggested order in which to read and execute the Jupyter notebooks:

functions.ipynb: Contains functions used across multiple notebooks (optional with Binder).
modelpipeline.ipynb: Framework for data-poor scenarios that tests and optimizes multiple machine learning prediction algorithms; includes (optional) augmentation of nanomaterials product data with organics product data.
organicspipeline.ipynb: Framework for data-rich scenarios that tests and optimizes multiple machine learning prediction algorithms.
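
The optional augmentation mentioned for modelpipeline.ipynb can be pictured as adding a sample of organics records to the nanomaterials training data. The sketch below illustrates that general idea only and is not the code used in the notebooks: the function name augment_training_set, the record keys, and the toy values are all hypothetical placeholders. It assumes that only the training portion is augmented, so any held-out nanomaterials records stay untouched.

```python
import random


def augment_training_set(enm_train, organics, n_extra, seed=0):
    """Return enm_train followed by n_extra records sampled from organics.

    Sampling is without replacement and seeded so results are repeatable.
    If n_extra exceeds the number of organics records, all of them are used.
    """
    if n_extra < 0:
        raise ValueError("n_extra must be non-negative")
    rng = random.Random(seed)
    organics = list(organics)
    n_extra = min(n_extra, len(organics))
    return list(enm_train) + rng.sample(organics, n_extra)


# Toy records with made-up weight fractions, for illustration only.
enm_train = [{"source": "enm", "wf": 0.02}, {"source": "enm", "wf": 0.10}]
organics = [{"source": "organics", "wf": w} for w in (0.01, 0.05, 0.20, 0.40)]

augmented = augment_training_set(enm_train, organics, n_extra=3)
print(len(augmented))  # 5 records: 2 ENM + 3 organics
```

Keeping the original nanomaterials records first and unchanged makes it easy to tell afterwards which rows came from the data-rich source.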

Preprocessed data files are provided. To run the preprocessing steps yourself and generate data summaries, use these notebooks:

preprocessENM.ipynb: Preprocessing steps for the nanomaterials product dataset (data-poor).
preprocessorganics.ipynb: Preprocessing steps for the bulk-scale organic chemical product dataset (data-rich).

Acronyms and Abbreviations

arr = array
bp = boiling point
cv = cross validation
df = data frame
enm = engineered nanomaterials
matrix_F = the product matrix is a formulation (i.e., not a solid)
ml = machine learning
mp = melting point
mw = molecular weight
oecd = Organisation for Economic Co-operation and Development
prop = property
puc = product use category
rbf = radial basis function (a non-linear kernel used with SVM)
rfc = random forest classifier
svc = support vector classifier (SVM applied to categorical targets)
svm = support vector machine
wf = weight fraction

Status

This project has been submitted for publication. Data sources are provided in the article.

Contact

Luka L. Thornton (thorn.luka@gmail.com)
