Developing functional data analysis tools for analyzing high-throughput experimental spectra

Bella-cell/VAE

Models for Infinity Dimensions (MID)

The MID project aims to produce a machine learning toolkit that helps reveal the physical meaning behind complex datasets. With growing research interest in using high-throughput experiments to characterize newly synthesized samples and materials, large volumes of spectral data are being collected, and an efficient toolkit for analyzing them is greatly needed. In this project, the toolkit is built from several unsupervised analytical methods, including standard PCA, tangent PCA, and a variational autoencoder (VAE). These methods reduce the dimension of a given dataset and produce plots of the resulting low-dimensional spectra, making it easier for users to understand the physical meaning of the originally complex data. Moreover, by adding and subtracting one standard deviation to and from each newly generated spectrum, users can observe more information about the data and even make predictions for future experiments.
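As a rough illustration of the standard PCA step and the "plus or minus one standard deviation" view described above, the sketch below uses only NumPy on synthetic curves. The function name `pca_modes`, the synthetic data, and the choice of two components are assumptions made for this example and are not taken from the MID code.

```python
import numpy as np

def pca_modes(spectra, n_components=2):
    """Return the mean spectrum, principal directions, and per-component score std."""
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # SVD of the centered data: the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Standard deviation of the sample scores along each component.
    score_std = singular_values[:n_components] / np.sqrt(spectra.shape[0] - 1)
    return mean, components, score_std

# Synthetic stand-in data (hypothetical): 448 curves sampled at 101 points.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
centers = rng.normal(0.5, 0.05, size=(448, 1))
spectra = np.exp(-((grid - centers) ** 2) / 0.02)

mean, components, score_std = pca_modes(spectra)
# Mean spectrum shifted by +/- one standard deviation along the first component.
upper = mean + score_std[0] * components[0]
lower = mean - score_std[0] * components[0]
```

Plotting `mean`, `upper`, and `lower` against `grid` shows how the first component deforms the average curve, which is one way to read a component in physical terms.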


Software Dependencies

To run the Jupyter notebooks and Python files, please ensure you have the following:

  • Python 3.7
  • Python packages listed in environment.yml and in the Installation section.

Installation

Install and activate the 'MID' environment in your desired directory with the following commands:

git clone ......

cd MID

conda env create -f environment.yml

conda activate MID

This environment contains the following packages:

  • jupyter
  • pandas
  • numpy
  • scikit-learn
  • pip
  • pip:
    • plotly==5.6.0

Organization

  • UW Chemical Engineering
  • Clean Energy Institute
  • UW Direct

Spectrum Data

The dataset was obtained from UV-vis spectroscopy of a material in Prof. Lilo Pozzo’s laboratory. It is composed of 448 samples with 101 features each, so the original dimension of the dataset is 101, which is too complex to analyze directly and calls for dimension reduction. Moreover, the data are functional data, which means they cannot simply be expressed as vectors and need more advanced analytical methods, such as tangent PCA and VAE, to process them. To see how the spectrum data was obtained, please see the data directory.
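To make the case for dimension reduction concrete, the sketch below builds a synthetic array with the same shape as the dataset (448 samples, 101 features) and counts how many principal components are needed to capture 95% of the variance. The synthetic curves and the 95% threshold are assumptions for illustration; the real UV-vis spectra are not used here.

```python
import numpy as np

# Synthetic stand-in (hypothetical): 448 smooth curves on a 101-point grid.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 101)
amplitude = rng.uniform(0.5, 1.5, size=(448, 1))
center = rng.normal(0.5, 0.05, size=(448, 1))
data = amplitude * np.exp(-((grid - center) ** 2) / 0.02)

# Fraction of total variance carried by each principal component.
centered = data - data.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Smallest number of components whose cumulative variance reaches 95%.
n_needed = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
print(data.shape, n_needed)
```

Because neighboring points on a smooth curve are strongly correlated, far fewer than 101 components are typically needed for data of this kind.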

Citation for accompanying publication: Link


Authors

Bella Wu, Material Science and Engineering
Kim, Material Science and Engineering
Lilo Yeh, Material Science and Engineering
Nick Leu, Material Science and Engineering
William Lin, Material Science and Engineering
