A collection of ensemble models that predict interfacial tension (beta) from the edge profile of a pendant drop.
pdt-regressor can currently predict Beta values from a drop profile feature set with an RMSE of 0.004. Feature datasets are currently generated by solving an ODE for the droplet radii across a range of Beta and Smax values.
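For illustration, below is a minimal sketch of how such a feature set could be generated, assuming one common dimensionless (Bashforth-Adams) form of the Young-Laplace shape equations; the function name, sign convention, and sampling choices here are assumptions, not the project's actual implementation.

# Hypothetical sketch: generate one radii feature vector by integrating a
# common dimensionless form of the Young-Laplace (Bashforth-Adams) equations.
# Names and the exact formulation are illustrative, not the project's API.
import numpy as np
from scipy.integrate import solve_ivp

def profile_radii(beta, s_max, n_points=200):
    """Return drop radii x(s) sampled along the arc length s in (0, s_max]."""
    def rhs(s, y):
        x, z, phi = y
        # Near the apex x -> 0 and sin(phi)/x -> 1 (curvature is isotropic there)
        sin_term = np.sin(phi) / x if x > 1e-8 else 1.0
        return [np.cos(phi), np.sin(phi), 2.0 - beta * z - sin_term]

    s_eval = np.linspace(1e-6, s_max, n_points)
    sol = solve_ivp(rhs, (s_eval[0], s_max), [1e-6, 0.0, 1e-6], t_eval=s_eval)
    return sol.y[0]  # x(s): the radii used as features

# One row of a feature dataset for a given (Beta, Smax) pair
features = profile_radii(beta=0.6, s_max=3.0)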
This model will also be trained and tested on real droplet image profiles, where the feature datasets are extracted from the output of the pdt-canny-edge-detector.
Profile data will be stored in the /data folder in .csv format.
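As an example of consuming those files, here is a minimal sketch using pandas; the file name and the Beta target column are assumptions.

# Hypothetical example: load a profile feature set from the /data folder.
# The file name and the 'Beta' column are assumptions.
import pandas as pd

df = pd.read_csv("data/pdt-dataset.csv")
X = df.drop(columns=["Beta"])  # radii features
y = df["Beta"]                 # regression target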
The finalized models, feature extraction, and data preparation will be placed in a single pdt_regressor.py application that can be run as a complete system.
However, for development and understanding, it is often useful to use Jupyter Notebooks to visualize the data and step through parameter tuning. For that reason, a Jupyter Notebook-enabled IDE such as PyCharm is recommended. Students and researchers can access the Professional version for free.
To use and develop this project, either download the .zip file from the repository or clone it:
git clone https://github.com/DmitriLyalikov/pdt_regressor.git
Open the project in your IDE and run
pip install -e .
in the PyCharm IDE terminal. This installs all the library dependencies for the project, such as scikit-learn, xgboost, and pandas.
Trained models are stored in the /models folder as .pkl files. They can be saved and loaded as an XGBRegressor object using the pickle Python package.
To load and use a model:
import pickle

# Load the model from the models folder
with open("../models/pdt-regression-model.pkl", 'rb') as f:
    model = pickle.load(f)
To predict and get the Root-Mean-Squared Error (RMSE) of the prediction:
import numpy as np
from sklearn.metrics import mean_squared_error

y_pred = model.predict(X_test)
reg_mse = mean_squared_error(y_test, y_pred)
reg_rmse = np.sqrt(reg_mse)
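Conversely, a newly trained model can be written back to the /models folder with pickle; the file name below is only an example.

import pickle

# Save a (re)trained model back to the models folder
with open("../models/pdt-regression-model.pkl", 'wb') as f:
    pickle.dump(model, f)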
XGBoost hyperparameters are used to improve the performance of the model, reduce variance, and minimize overfitting.
Some important hyperparameters are learning_rate (eta), max_depth, n_estimators, and subsample. A complete list of XGBoost hyperparameters can be found in the XGBoost documentation.
Hyperparameters depend on the model, data, and regression method, and are generally found empirically. Included in XGBoost.ipynb is a grid_search function that automates the tuning process by finding the best value for each parameter provided in params:
grid_search(params={'max_depth': [1, 2, 3, 4, 5, 6]})
This will yield the output:
Best params: {'max_depth': 6}
Training score: 951.398
For full examples of usage, consult the provided XGBoost.ipynb. Hyperparameters should generally be tested together, as one may or may not affect another's best value.
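For reference, a grid_search helper along these lines could be built on scikit-learn's GridSearchCV; this is only a sketch of the idea, not the exact function in XGBoost.ipynb, and the scoring metric and cross-validation settings are assumptions.

# Hypothetical sketch of a grid_search helper built on GridSearchCV.
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

def grid_search(params, X, y):
    """Fit an XGBRegressor over a parameter grid and report the best combination."""
    search = GridSearchCV(
        XGBRegressor(),
        param_grid=params,
        scoring="neg_root_mean_squared_error",
        cv=5,
    )
    search.fit(X, y)
    print("Best params:", search.best_params_)
    print("Training score:", search.best_score_)
    return search.best_estimator_

# Example, mirroring the call above
# best_model = grid_search({'max_depth': [1, 2, 3, 4, 5, 6]}, X_train, y_train)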
Train/Test with pdt-dataset (2500 entries, Beta [0.4,0.8], Smax=True)
- n_estimators: 800
- learning_rate: 0.1
- max_depth: 5
Accuracy score on test data: 0.999; RMSE: 0.00343
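A hedged sketch of reproducing a run like this with the hyperparameters listed above; the file name, column names, and train/test split settings are assumptions.

# Hypothetical end-to-end training sketch using the hyperparameters above.
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("data/pdt-dataset.csv")
X, y = df.drop(columns=["Beta"]), df["Beta"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBRegressor(n_estimators=800, learning_rate=0.1, max_depth=5)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy (R^2):", model.score(X_test, y_test))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))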