# Individual Project
Michael Labarge

---

# Repository Info

Title: SPLINTER (Spline Interpolation)

Authors: Bjarne Grimstad and others

Link: https://github.com/bgrimstad/splinter

# Package Goals

This is a general-purpose package for multivariate function approximation with splines. Approximations are represented by tensor-product B-splines, which form a piecewise polynomial basis.

Applications of this package include function approximation, regression, data smoothing, data reduction, and more.

The overall goal of this package is to provide fast spline approximations for use in nonlinear optimization, specifically petroleum production optimization.

# Stakeholder Information

Developed by: Solution Seeker (CTO - Bjarne Grimstad)

The software has been made available for public use; there is no charge for use in academic research or publications. The authors only ask that you cite them in your work.

They are looking to build a fast nonlinear optimizer on top of SPLINTER, and for their purposes this package will eventually be used to optimize petroleum production. This could benefit the oil industry considerably, since more efficient methods could be developed for producing specific, high-purity products of a given chain length with fewer of the byproducts typically associated with petroleum production.

Communication within this project takes place in the GitHub issues tab, which contains over 100 issues covering new ideas and potential bugs in the program.

# Metrics and Features

Accuracy is governed by the choice of spline degree: we can choose linear, quadratic, cubic, or higher-order B-splines, depending on how we expect the data to behave.
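To make the role of the degree concrete, here is a minimal pure-Python sketch of the Cox-de Boor recursion, which defines B-spline basis functions of any degree. This is not SPLINTER's C++ implementation; the clamped, evenly spaced knot vector on [0, 1] and the helper names (`bspline_basis`, `clamped_uniform_knots`, `basis_values`) are hypothetical choices for illustration. The loop checks two standard properties: the basis functions sum to one, and only degree + 1 of them are nonzero at any point.

```python
from typing import List, Sequence


def bspline_basis(i: int, p: int, knots: Sequence[float], x: float) -> float:
    """Value of the i-th B-spline basis function of degree p at x (Cox-de Boor recursion)."""
    if p == 0:
        # Degree 0: indicator of the half-open knot span [t_i, t_{i+1})
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + p] > knots[i]:  # skip terms whose denominator is zero (repeated knots)
        left = (x - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, knots, x)
    right = 0.0
    if knots[i + p + 1] > knots[i + 1]:
        right = ((knots[i + p + 1] - x) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, knots, x))
    return left + right


def clamped_uniform_knots(p: int, spans: int) -> List[float]:
    """Hypothetical clamped knot vector on [0, 1]: p+1 repeated end knots and `spans` equal spans."""
    interior = [k / spans for k in range(1, spans)]
    return [0.0] * (p + 1) + interior + [1.0] * (p + 1)


def basis_values(p: int, spans: int, x: float) -> List[float]:
    """All degree-p basis functions of the clamped knot vector, evaluated at x."""
    knots = clamped_uniform_knots(p, spans)
    n_basis = len(knots) - p - 1
    return [bspline_basis(i, p, knots, x) for i in range(n_basis)]


for degree in (1, 2, 3, 4):  # linear, quadratic, cubic, quartic
    values = basis_values(degree, spans=4, x=0.37)
    nonzero = sum(1 for v in values if v > 0)
    print(f"degree {degree}: {len(values)} basis functions, {nonzero} nonzero, sum = {sum(values):.12f}")
```

The local support shown here (degree + 1 nonzero functions) is why raising the degree makes each evaluation more expensive but lets the spline follow more curvature within each knot span.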

B-splines are stable and well conditioned for up to approximately 20 variables. In practice, however, the number of samples when building a B-spline is limited to about 100,000, which effectively limits the number of variables to about 5-6. Therefore, within its practical range, this package uses a well-conditioned, stable algorithm in nearly every case.
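The link between the 100,000-sample limit and 5-6 variables is simple arithmetic if we assume the samples lie on a full tensor grid with the same number of points per variable: the grid holds n^d samples. The helper below (`max_variables`, a hypothetical name) finds the largest d that fits a given sample budget.

```python
def max_variables(sample_budget: int, points_per_variable: int) -> int:
    """Largest d such that points_per_variable ** d <= sample_budget (assumes a full tensor grid)."""
    if points_per_variable < 2:
        raise ValueError("need at least 2 points per variable")
    d = 0
    while points_per_variable ** (d + 1) <= sample_budget:
        d += 1
    return d


for n in (6, 7, 10):
    d = max_variables(100_000, n)
    print(f"{n} points per variable: up to {d} variables ({n ** d:,} samples)")
```

With 6 to 10 points per variable, the budget allows 5 or 6 variables, which matches the range quoted above.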

Cost appears in the choice between B-splines and P-splines (penalized B-splines). P-splines are much more demanding to solve and effectively limit the sample size to about 10,000. Because P-splines smooth the data instead of interpolating it, a large least-squares problem must be solved, which causes this reduction in scope.
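In the usual Eilers-Marx formulation, a P-spline fit solves the penalized least-squares system (BᵀB + λDᵀD)c = Bᵀy, where B holds the basis functions evaluated at the samples and D is a difference matrix on the coefficients. The sketch below takes the simplest member of that family, where B is the identity (a Whittaker-style smoother), with a second-order difference penalty; SPLINTER's actual basis and penalty may differ, and all names here are hypothetical. The dense O(n³) solve illustrates why smoothing costs more than interpolation as the number of samples grows; practical codes exploit sparse or banded structure instead.

```python
from typing import List

Matrix = List[List[float]]


def second_difference_penalty(n: int) -> Matrix:
    """D^T D, where D is the (n-2) x n second-order difference matrix (rows 1, -2, 1)."""
    d = [[0.0] * n for _ in range(n - 2)]
    for r in range(n - 2):
        d[r][r], d[r][r + 1], d[r][r + 2] = 1.0, -2.0, 1.0
    return [[sum(d[k][i] * d[k][j] for k in range(n - 2)) for j in range(n)] for i in range(n)]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting: O(n^3) in the number of unknowns."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def penalized_smooth(y: List[float], lam: float) -> List[float]:
    """Minimize ||y - z||^2 + lam * ||D z||^2 by solving (I + lam * D^T D) z = y."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 samples for a second-order difference penalty")
    p = second_difference_penalty(n)
    a = [[(1.0 if i == j else 0.0) + lam * p[i][j] for j in range(n)] for i in range(n)]
    return solve(a, y)


def roughness(z: List[float]) -> float:
    """Sum of squared second differences: the quantity the penalty discourages."""
    return sum((z[i] - 2 * z[i + 1] + z[i + 2]) ** 2 for i in range(len(z) - 2))


noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
smooth = penalized_smooth(noisy, lam=10.0)
print(f"roughness before: {roughness(noisy):.2f}, after: {roughness(smooth):.4f}")
```

Setting `lam` to zero returns the data unchanged (pure interpolation), while larger values trade fidelity for smoothness; straight-line data passes through untouched because its second differences are already zero.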

For nonlinear modeling, the program makes some choices that determine the important features of a given dataset. This can allow the package to handle a system of more than 6 variables by identifying the important features and the variables that retain them when others are dropped. This choice is made deliberately to keep the algorithm well conditioned and stable, even when the problem at hand may not be.

Imagine that we want to separate three hydrocarbons of varying chain length in a distillation column. The concentration of each species in the gas and liquid phases depends on pressure and temperature, so we have 5 variables: three concentrations, T, and P. A surface exists relating T, P, and the concentration of each species in the liquid phase, and all possible combinations of gas/liquid concentrations can be described in this manner. By using this package to accurately determine the shape of this surface, it is possible to map a shortest path between feed and product. This knowledge is essential for designing systems that separate these components efficiently. For stakeholders, we might present a well-defined ternary interaction plot and then show that the package comes within a certain tolerance of this "surface" while needing only approximately 500 experimentally determined points.

# Example in 3D

First, we compare accuracy and cost for the unmodified example, where cost depends on the basis (spline degree) used to solve the problem.
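The example approximates the Rosenbrock function from samples on an evenly spaced N × N grid. The sketch below shows how such samples could be generated; the domain [-2, 2]² and the standard coefficients a = 1, b = 100 are assumptions, since the exact setup of the original example is not stated here.

```python
from typing import List, Tuple


def rosenbrock(x: float, y: float, a: float = 1.0, b: float = 100.0) -> float:
    """Classic 2D Rosenbrock function f(x, y) = (a - x)^2 + b (y - x^2)^2."""
    return (a - x) ** 2 + b * (y - x * x) ** 2


def grid_samples(n: int, lo: float = -2.0, hi: float = 2.0) -> List[Tuple[float, float, float]]:
    """n x n evenly spaced samples (x, y, f(x, y)) on the square [lo, hi]^2 (assumed domain)."""
    axis = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    return [(x, y, rosenbrock(x, y)) for x in axis for y in axis]


for n in (5, 10):
    print(f"N = {n}: {len(grid_samples(n))} samples")
# N = 5: 25 samples
# N = 10: 100 samples
```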



# Original Surface Generated by the Rosenbrock Function (5 × 5 Grid, 25 Samples)

![N5-Original.png](attachment:5982f4aa-a2f9-47d6-b21f-c08cd0f6a047.png)

# Generated Surfaces (Left) and Associated Error (Right) - Linear, Quadratic, Cubic, Quartic (25 Samples)

![N5-Generated.png](attachment:3b0ccd46-b4ab-4f16-9303-e511cc7f30a1.png)
![N5-Error.png](attachment:25ea9791-ce20-4d5c-887a-db0fbe1281f9.png)

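Note that the quartic error is very small (on the order of 1E-12) with only 25 samples. This is expected: the Rosenbrock function is a polynomial of degree 4 in x and degree 2 in y, so a quartic tensor-product spline can represent it exactly, and the remaining error is floating-point round-off.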
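A quick way to check this reasoning: with 5 samples per axis, a quartic spline has (assuming no interior knots in this case) a single polynomial piece per axis, so it coincides with degree-4 polynomial interpolation. The sketch below uses tensor-product Lagrange interpolation as a stand-in for SPLINTER's quartic B-spline on an assumed [-2, 2]² grid and measures the error at a few arbitrary off-grid points; the helper names are hypothetical.

```python
from typing import List, Sequence


def rosenbrock(x: float, y: float, a: float = 1.0, b: float = 100.0) -> float:
    """f(x, y) = (a - x)^2 + b (y - x^2)^2: degree 4 in x, degree 2 in y."""
    return (a - x) ** 2 + b * (y - x * x) ** 2


def lagrange_weights(nodes: Sequence[float], t: float) -> List[float]:
    """Lagrange basis polynomials of the given nodes, evaluated at t."""
    weights = []
    for i, xi in enumerate(nodes):
        w = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                w *= (t - xj) / (xi - xj)
        weights.append(w)
    return weights


def tensor_interpolate(nodes: Sequence[float], values: Sequence[Sequence[float]],
                       x: float, y: float) -> float:
    """Tensor-product interpolant: sum over i, j of L_i(x) * L_j(y) * f(x_i, y_j)."""
    wx = lagrange_weights(nodes, x)
    wy = lagrange_weights(nodes, y)
    n = len(nodes)
    return sum(wx[i] * wy[j] * values[i][j] for i in range(n) for j in range(n))


nodes = [-2.0 + k for k in range(5)]  # 5 evenly spaced nodes per axis on [-2, 2] (assumed domain)
values = [[rosenbrock(xi, yj) for yj in nodes] for xi in nodes]  # 5 x 5 = 25 samples
off_grid = [(-1.3, 0.7), (0.1, -1.9), (1.77, 1.01)]
max_err = max(abs(tensor_interpolate(nodes, values, x, y) - rosenbrock(x, y)) for x, y in off_grid)
print(f"max error at off-grid points: {max_err:.1e}")
```

The printed error should sit at round-off level, consistent with the quartic result above; a lower degree cannot reproduce the x⁴ term, so its error stays visibly nonzero.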

Next, we quadruple the sample size, going from a 5 × 5 grid to a 10 × 10 grid (100 samples). Note that the runtime also increases by nearly 4x.

# Original Surface Generated by the Rosenbrock Function (10 × 10 Grid, 100 Samples)

![N10-Original.png](attachment:e1c98acf-1b97-4654-9429-ccd02d5b0400.png)

# Generated Surfaces (Left) and Associated Error (Right) - Linear, Quadratic, Cubic, Quartic (100 Samples)

![N10-Generated.png](attachment:5196df57-7a31-4ce5-b7b3-9e22f4cdd1b9.png)
![N10-Error.png](attachment:5877c4ed-d0ae-4146-a6c8-61808623c3b0.png)

Linear error has dropped about 5-fold, quadratic error nearly 10-fold, and cubic error about 25-fold. Quartic error is nearly eliminated except in areas with a large gradient.
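These reductions are roughly what the classical interpolation estimate predicts, assuming both grids span the same domain with evenly spaced points: the error of a degree-p spline interpolant of a smooth function scales like h^(p+1), where h is the grid spacing. Going from 5 to 10 points per axis shrinks h by a factor of 9/4 (4 intervals become 9), not 2. The sketch below only evaluates that formula exactly; it does not call SPLINTER, and `predicted_error_ratio` is a hypothetical helper.

```python
from fractions import Fraction


def predicted_error_ratio(n_coarse: int, n_fine: int, degree: int) -> Fraction:
    """(h_coarse / h_fine) ** (degree + 1) for evenly spaced grids on the same interval."""
    # With n points per axis on an interval of length L, the spacing is h = L / (n - 1).
    spacing_ratio = Fraction(n_fine - 1, n_coarse - 1)
    return spacing_ratio ** (degree + 1)


for degree, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    ratio = predicted_error_ratio(5, 10, degree)
    print(f"{name:<9} predicted {float(ratio):.1f}x")
# linear    predicted 5.1x
# quadratic predicted 11.4x
# cubic     predicted 25.6x
```

The observed 5-, 10-, and 25-fold reductions line up closely with these estimates. The quartic case is excluded because the Rosenbrock function is already reproduced exactly at that degree.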

# Questions About the Software

In nonlinear systems, how does the program decide which features of a given surface are important?

When variables are dropped, what tolerances are used to determine whether the surface is still representative?

# Experiment Proposal

It would be very interesting to apply this software to a chemical reactor system, where kinetic rate laws are well defined and polynomial in the concentrations. Concentration profiles are available for a wide variety of systems, so there would be plenty of data to choose from.

I would like to test the applicability of this software to various systems, such as zero- to second-order chemical reactions, and possibly systems with multiple reactions. For reversible reactions in particular, this type of surface visualization is essential for designing efficient reactor conditions. Therefore, using existing chemical data, I would like to establish a baseline accuracy and determine the number of samples required to represent these systems.
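As a starting point for such a baseline, the zero-, first-, and second-order cases have closed-form integrated rate laws, so exact concentration profiles can be generated and then sampled at different densities before fitting. The sketch below assumes a single irreversible reaction A → products with rate k[A]^n; the function names, initial concentration, rate constant, and time window are hypothetical, and the units of k depend on the reaction order.

```python
import math
from typing import List, Tuple


def concentration(order: int, c0: float, k: float, t: float) -> float:
    """[A](t) for an irreversible reaction A -> products with rate = k [A]^order."""
    if order == 0:
        return max(c0 - k * t, 0.0)  # linear decay until A is used up at t = c0 / k
    if order == 1:
        return c0 * math.exp(-k * t)  # exponential decay, half-life ln(2) / k
    if order == 2:
        return c0 / (1.0 + k * c0 * t)  # from 1/[A] = 1/c0 + k t
    raise ValueError("order must be 0, 1, or 2")


def sample_profile(order: int, c0: float, k: float, t_end: float, n: int) -> List[Tuple[float, float]]:
    """n evenly spaced (t, [A]) samples on [0, t_end], e.g. as input to a spline fit."""
    times = [t_end * i / (n - 1) for i in range(n)]
    return [(t, concentration(order, c0, k, t)) for t in times]


for order in (0, 1, 2):
    profile = sample_profile(order, c0=2.0, k=0.5, t_end=6.0, n=7)  # hypothetical values
    print(order, [round(c, 3) for _, c in profile])
```

Comparing a spline fitted to these samples against the exact curve, while varying `n`, would give the sample-count baseline described above.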