# Practicum II Part II

## Study on Errors of Pressure and Viscosity
Special thanks to Dr. Charles Thangaraj for helping me set up the base used for this project in the summer of 2023.
## Objective
* This research project aims to study errors (calculation deviations) associated with the 1D Poiseuille model found in "Code Development for Flow Simulation" by Christopher Lama and Dr. Arati Nanda Pati of the University of St Thomas-Houston.
* In 2023, a traditional simulation approach was taken: the 1D Poiseuille model was solved repeatedly while varying the pressure, the viscosity, and the length and radius of the arteries.
    * It proved extremely computationally expensive, and bottlenecks significantly delayed the acquisition of results.
    * Multithreading was implemented with 10 threads.
    * The traditional simulation was executed with a matrix size of 1000 over a total of 10 combinations of parameters and around 100,000 iterations, for a total of 1,000,000 rows of generated data. The resulting data included both relative and absolute errors, as well as the viscosity (Nu) and pressure (P) of blood and the length and radius of the artery.
    * The data was written to a file.
    * This approach depends on matrix operations as the backbone of all computations.
    * As the required accuracy increases, the computational cost of the matrix operations grows rapidly.
* Rather than delivering results through an iterative mathematical procedure, this practicum project implements a machine learning approximation strategy.
    * The data file was parsed and then loaded into a pandas DataFrame for model training.
    * The machine learning model is trained on viscosity (Nu), pressure (P), and the length and radius of the artery in order to predict the absolute error.
    * Estimating the simulation errors with a machine learning model removes a significant amount of computational cost.
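
To make the traditional procedure concrete, here is a minimal sketch of a single matrix-based solve and its error measurement. It assumes a simplified planar Poiseuille profile, `nu * u''(y) = -p / length` with no-slip walls at `y = ±radius`, discretized with second-order finite differences. The function name `poiseuille_errors`, the discretization, and the parameter values are illustrative assumptions and are not taken from the cited project's code.

```python
import numpy as np

def poiseuille_errors(n, length, radius, nu, p):
    """Solve an assumed planar Poiseuille problem on n interior points.

    Returns the mean absolute error and mean relative error of the
    finite-difference solution against the analytic parabola.
    """
    # Grid including the two wall points, where the velocity is zero.
    y = np.linspace(-radius, radius, n + 2)
    h = y[1] - y[0]

    # Second-difference matrix (dense here for brevity; it is tridiagonal).
    A = (
        np.diag(-2.0 * np.ones(n))
        + np.diag(np.ones(n - 1), 1)
        + np.diag(np.ones(n - 1), -1)
    ) / h**2
    b = np.full(n, -p / (nu * length))

    u_num = np.linalg.solve(A, b)
    # Analytic solution of nu * u'' = -p / length with u(+-radius) = 0.
    u_exact = p / (2.0 * nu * length) * (radius**2 - y[1:-1] ** 2)

    abs_err = np.abs(u_num - u_exact)
    rel_err = abs_err / np.abs(u_exact)  # interior points only, so u_exact > 0
    return abs_err.mean(), rel_err.mean()

# Hypothetical parameter values, chosen only for illustration.
abs_err, rel_err = poiseuille_errors(n=50, length=1.0, radius=0.5, nu=0.04, p=100.0)
print(f"mean absolute error: {abs_err:.3e}, mean relative error: {rel_err:.3e}")
```

In the traditional approach, a solve of this kind is repeated for every parameter combination, which is where the cost accumulates. In this simplified setting the central-difference scheme reproduces a parabola exactly, so the measured error is round-off only; the errors in the actual dataset come from the project's own model.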

## Import Libraries

In [2]:
from sklearn.ensemble import StackingRegressor, GradientBoostingRegressor, AdaBoostRegressor, HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error, r2_score
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np


## Data Processing

In [3]:
df = pd.read_csv("simulation_dataset.csv")

X = df[["Length", "Radius", "Nu", "P"]]
y = df["AvgAbsError"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=62)

## Model Setup
* Stacking strategy: the models were chosen for their applicability to, and performance on, purely numerical data.
    * Base Models: 
        * Gradient Boosting Regressor
        * AdaBoost Regressor
        * Histogram Gradient Boosting Regressor
    * Meta Model:
        * Ridge Regression


In [4]:

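# Base learners: three boosting regressors with matching round counts and learning rates.
baseModels = [
    ("gbr", GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)),
    ("ada", AdaBoostRegressor(n_estimators=100, learning_rate=0.1)),
    ("hgb", HistGradientBoostingRegressor(max_iter=100, learning_rate=0.1)),
]

# Meta model: ridge regression with built-in cross-validation of its regularization strength.
metaModel = RidgeCV()

# passthrough=False: the meta model sees only the base-model predictions, not the original features.
stacked = StackingRegressor(
    estimators=baseModels,
    final_estimator=metaModel,
    passthrough=False,
    n_jobs=-1,  # fit the base models in parallel on all available cores
)

stacked.fit(X_train, y_train)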


## Generate Predictions

In [5]:
y_pred = stacked.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# MAPE: scikit-learn returns it as a fraction (1.0 means 100%), not as a percentage.
meanAbsError = mean_absolute_percentage_error(y_test, y_pred)


## Results

In [6]:
print(f"Mean Squared Error MSE:          {mse:.6e}")
print(f"Root Mean Squared Error RMSE:    {rmse:.6e}")
print(f"Mean Absolute Error MAE:         {mae:.6e}")
print(f"Mean Absolute Percentage Error:  {meanAbsError:.6f}")
print(f"R-Squared Score:                 {r2:.6f}")

Mean Squared Error MSE:          6.692354e-04
Root Mean Squared Error RMSE:    2.586959e-02
Mean Absolute Error MAE:         8.020406e-04
Mean Absolute Percentage Error:  42.146284
R-Squared Score:                 0.312574


* The Mean Squared Error is very small (about 6.7e-04), so the prediction errors are small in absolute terms and the predictions stay close to the mathematically computed errors.
* The Mean Absolute Error is below 0.001 and the Root Mean Squared Error is about 0.026, so the model strategy can estimate the errors and their magnitudes. The RMSE is roughly 32 times the MAE, which suggests that a small number of predictions miss by a comparatively large amount.
* However, the Mean Absolute Percentage Error is 42.15, which contradicts most of the other statistics presented here. scikit-learn reports this metric as a fraction rather than a percentage, so the value corresponds to roughly 4,215%. This is most likely due to how the metric is calculated: it divides each absolute error by the corresponding true value (here, the simulated error), so it produces very high values when the true values are very small. Source: https://valuechainplanning.com/blog-details/70
* The R-squared score is 0.31, meaning the model explains about 31% of the variance in the target. This leaves room for improvement, as the target variable varies widely.
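
The behavior of the Mean Absolute Percentage Error described above can be reproduced with a few hand-picked numbers. The values below are hypothetical and chosen only so the arithmetic is easy to follow; they are not drawn from the dataset. Two of the three true values are tiny, so a small absolute miss becomes a very large relative miss.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Hypothetical values: two tiny targets and one larger one.
y_true_demo = np.array([1e-6, 1e-6, 0.5])
y_pred_demo = np.array([1e-4, 1e-4, 0.5])

mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)
mape_demo = mean_absolute_percentage_error(y_true_demo, y_pred_demo)

# Each tiny target is missed by 9.9e-5, which is 99 times its own size.
print(f"MAE:  {mae_demo:.3e}")                # small in absolute terms
print(f"MAPE: {mape_demo:.2f} (fraction)")    # (99 + 99 + 0) / 3 = 66
print(f"MAPE: {mape_demo * 100:.0f}%")        # the same value as a percentage
```

The MAE stays below 1e-4 while the MAPE is 66 as a fraction, or 6,600% as a percentage, which mirrors the pattern seen in the results.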


## Conclusion

### Objective
* The objective was to develop an estimation-based alternative to the traditional error study procedures. Stacked machine learning models were trained on previously calculated error estimates from "Code Development for Flow Simulation" by Christopher Lama and Dr. Arati Pati. 
### Dataset
* The data was acquired from a previous iteration of the project, in which the relative and absolute errors, the viscosity (Nu) and pressure (P) of blood, and the length and radius of the artery were written to a text file that was then parsed into a CSV file. It contains 1,000,000 rows of data.
### Results
* Results indicate that the model is suited to computational error estimation as a way to save computational costs.
    * Mean Squared Error MSE:          6.692354e-04
    * Root Mean Squared Error RMSE:    2.586959e-02
    * Mean Absolute Error MAE:         8.020406e-04
    * Because of how the Mean Absolute Percentage Error is calculated, it is especially unreliable when the true values are very small. It scored 42.15, which scikit-learn reports as a fraction (roughly 4,215%).
    * The R-Squared Score of 0.31 (about 31% of the variance explained) shows that model accuracy can still be improved.
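
One way to read the R-Squared Score is as the fractional reduction in mean squared error relative to always predicting the mean of the target. The sketch below checks that identity on synthetic numbers; the skewed target and the noise level are assumptions made for illustration and do not reproduce the project's data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Hypothetical skewed target: mostly small values with a long tail.
y_true_syn = rng.exponential(scale=1e-3, size=1000)
# Hypothetical predictions: the truth plus noise.
y_hat_syn = y_true_syn + rng.normal(scale=8e-4, size=1000)

mse_model = mean_squared_error(y_true_syn, y_hat_syn)
# Baseline that always predicts the mean of the target.
mse_baseline = mean_squared_error(y_true_syn, np.full_like(y_true_syn, y_true_syn.mean()))

r2_syn = r2_score(y_true_syn, y_hat_syn)
print(f"R^2:                      {r2_syn:.4f}")
print(f"1 - MSE_model / MSE_mean: {1 - mse_model / mse_baseline:.4f}")
```

Under this reading, a score of 0.31 means the stacked model's squared error is about 31% lower than that of the mean-only baseline on the test set.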
    
### References
* Lama, Christopher; Pati, Arati. "Code Development for Flow Simulation". https://github.com/ChristopherL891123/Research--Code-Development-for-Flow-Simulation
* Lama, Christopher; Thangaraj, Charles. "Study on Errors of Pressure and Viscosity". https://github.com/ChristopherL891123/Research--Study-On-Errors-of-Pressure-and-Viscosity
* https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
* https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html
* https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html
* https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_percentage_error.html

