A collection of ensemble models that predict interfacial tension (beta) from the edge profile of a pendant drop.
pdt-regressor can currently predict Beta values from a drop profile feature set with an RMSE of 0.004. Feature datasets are currently generated by solving an ODE for the droplet radii across a range of Beta and Smax values.
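For illustration, below is a minimal sketch of how such a feature set could be generated, assuming one common dimensionless (Bashforth-Adams) form of the Young-Laplace shape equations; the function name, sign convention, and sampling choices here are assumptions, not the project's actual implementation.

# Hypothetical sketch: generate one radii feature vector by integrating a
# common dimensionless form of the Young-Laplace (Bashforth-Adams) equations.
# Names and the exact formulation are illustrative, not the project's API.
import numpy as np
from scipy.integrate import solve_ivp

def profile_radii(beta, s_max, n_points=200):
    """Return drop radii x(s) sampled along the arc length s in (0, s_max]."""
    def rhs(s, y):
        x, z, phi = y
        # Near the apex x -> 0 and sin(phi)/x -> 1 (curvature is isotropic there)
        sin_term = np.sin(phi) / x if x > 1e-8 else 1.0
        return [np.cos(phi), np.sin(phi), 2.0 - beta * z - sin_term]

    s_eval = np.linspace(1e-6, s_max, n_points)
    sol = solve_ivp(rhs, (s_eval[0], s_max), [1e-6, 0.0, 1e-6], t_eval=s_eval)
    return sol.y[0]  # x(s): the radii used as features

# One row of a feature dataset for a given (Beta, Smax) pair
features = profile_radii(beta=0.6, s_max=3.0)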
This model will also be trained and tested on real droplet image profiles, where the feature datasets are extracted from the output of the pdt-canny-edge-detector.
Profile data will be stored in the /data folder in .csv format.
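As an example of consuming those files, here is a minimal sketch using pandas; the file name and the Beta target column are assumptions.

# Hypothetical example: load a profile feature set from the /data folder.
# The file name and the 'Beta' column are assumptions.
import pandas as pd

df = pd.read_csv("data/pdt-dataset.csv")
X = df.drop(columns=["Beta"])  # radii features
y = df["Beta"]                 # regression target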
The finalized models, feature extraction, and data preparation will be placed in a single pdt_regressor.py application that can be run as a complete system.
However, for development and understanding, it is often useful to use Jupyter Notebooks to visualize the data and step through parameter tuning. For that reason, a Jupyter Notebook-enabled IDE such as PyCharm is recommended. Students and researchers can access the Professional version for free.
To use and develop this project, either download the .zip file from the repository or clone it:
git clone https://github.com/DmitriLyalikov/pdt_regressor.git
Open the project in your IDE and run
pip install -e .
in the PyCharm IDE terminal. This installs all the library dependencies for the project, such as scikit-learn, xgboost, and pandas.
Trained models are stored in the /models folder as .pkl files. They can be saved and loaded as an XGBRegressor object using the pickle Python package.
To load and use a model:
import pickle

# Load the model from the models folder
with open("../models/pdt-regression-model.pkl", 'rb') as f:
    model = pickle.load(f)
To predict and get the Root-Mean-Squared Error (RMSE) of the prediction:
import numpy as np
from sklearn.metrics import mean_squared_error

y_pred = model.predict(X_test)
reg_mse = mean_squared_error(y_test, y_pred)
reg_rmse = np.sqrt(reg_mse)
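Conversely, a newly trained model can be written back to the /models folder with pickle; the file name below is only an example.

import pickle

# Save a (re)trained model back to the models folder
with open("../models/pdt-regression-model.pkl", 'wb') as f:
    pickle.dump(model, f)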
XGBoost hyperparameters are used to improve the performance of the model, reduce variance, and minimize overfitting.
Some important hyperparameters are learning_rate (eta), max_depth, n_estimators, and subsample. A complete list of XGBoost hyperparameters can be found in the XGBoost documentation.
Hyperparameters depend on the model, data, and regression method, and are generally found empirically. Included in XGBoost.ipynb is a grid_search function that automates the tuning process by finding the best value for each parameter provided in params:
grid_search(params={'max_depth': [1, 2, 3, 4, 5, 6]})
This will yield the output:
Best params: {'max_depth': 6}
Training score: 951.398
For full examples of usage, consult the provided XGBoost.ipynb. Hyperparameters should generally be tested together, as one may or may not affect another's best value.
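For reference, a grid_search helper along these lines could be built on scikit-learn's GridSearchCV; this is only a sketch of the idea, not the exact function in XGBoost.ipynb, and the scoring metric and cross-validation settings are assumptions.

# Hypothetical sketch of a grid_search helper built on GridSearchCV.
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

def grid_search(params, X, y):
    """Fit an XGBRegressor over a parameter grid and report the best combination."""
    search = GridSearchCV(
        XGBRegressor(),
        param_grid=params,
        scoring="neg_root_mean_squared_error",
        cv=5,
    )
    search.fit(X, y)
    print("Best params:", search.best_params_)
    print("Training score:", search.best_score_)
    return search.best_estimator_

# Example, mirroring the call above
# best_model = grid_search({'max_depth': [1, 2, 3, 4, 5, 6]}, X_train, y_train)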
Train/Test with pdt-dataset (2500 entries, Beta [0.4,0.8], Smax=True)
- n_estimators: 800
- learning_rate: 0.1
- max_depth: 5
Accuracy score on test data: 0.999; RMSE: 0.00343
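A hedged sketch of reproducing a run like this with the hyperparameters listed above; the file name, column names, and train/test split settings are assumptions.

# Hypothetical end-to-end training sketch using the hyperparameters above.
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("data/pdt-dataset.csv")
X, y = df.drop(columns=["Beta"]), df["Beta"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBRegressor(n_estimators=800, learning_rate=0.1, max_depth=5)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy (R^2):", model.score(X_test, y_test))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))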