Greenbox: a module for sensitivity analysis and model emulation

Greenbox is a Python module for Monte Carlo three-point-estimate sensitivity analysis and model emulation in Excel. It relies on xlwings to operate Excel from within Python. To install, clone this repository (or just download greenbox.py and requirements.txt and do

pip install -r requirements.txt

Sensitivity analysis

As a demonstration, we'll study the "range of expression" of a simple neural network:

Here, _x1 and _x2 are fixed inputs; _a1,_a2 are weights leading into the intermediate unit _a, _b1,_b2 are weights leading into the intermediate unit _b and finally _y is linked to _a and _b by the weights _wa, _wb. A more realistic use case, of course, are those financial spreadsheets that grow by accretion of consensus and that can't really be developed by alternate methodologies (such as Jupyter notebooks). Our goal is to specify probability distributions for the inputs such as

and arrive at a result such as this:

as well as a spreadsheet containing histogram data for plotting directly in Excel, the full set of sample inputs and resulting outputs and some summary statistics.

To use Greenbox, first give names to the cells you want to change (inputs) or watch (outputs)

Then define two named ranges named greenbox and bluebox. The first will have five columns as follows:

The first column is the name of the cell that will be stressed; the next three columns are minimum, central and maximum estimates and the last is a concentration parameter. The larger this parameter, the more the distribution is concentrated around its central value. $\kappa = 2$ gives the uniform distribution; $\kappa=1$ gives the Jeffrey's distribution where the extremes are more likely than the center.

The bluebox is simply a list of outputs.

Python will actually only read the first column; the second says =INDIRECT() to the cell name so we can watch the numbers shuffle in one place, but that's optional.

To run simulations, we import the Greenbox and Bluebox classes from the greenbox module and use them in script mode as follows:

from greenbox import *
import xlwings as xw

n_samples = 500
print('{} samples'.format(n_samples))

# we pass an object that supports the .Range() method. The simples is `xw` itself,
# but this requires the target spreadsheet to remain in Excel focus all the time
greenbox = Greenbox(xw)

# notionally a greenbox could correspond to a number of blueboxes
# we can even pass a separate .Range-supporting object using the  `excel_ref` parameter at init.
bluebox = Bluebox(greenbox)

# this does the heavy lifting.
bluebox.sample(n_samples)

# this saves histogram plots in png format for the greenbox (input) and bluebox (output) variables.
bluebox.plot()
# on an environment such as Jupyter you can pass save=False.

# this makes an excel spreadsheet
bluebox.to_excel(filename = 'sensitivity.xlsx)

(Note that realistically you should be sampling ~1K for this problem size and ~10K for any kind of complex model with >5 random inputs).

from greenbox import *
import xlwings as xw

n_samples = 500

greenbox = Greenbox(xw)
bluebox = Bluebox(greenbox)

bluebox.sample(n_samples)

bluebox.plot(inputs = False, outputs = True, save = False)

Model emulation

The idea of model emulation is to approximately replicate a spreadsheet that's too complex to understand with a statistical or machine learning model that's too opaque to understand. (Sad trombone) Here we will approximate a spreadsheet by a decision tree that is sort of visually inspectable.

from greenbox import *
import xlwings as xw
n_samples = 10000
greenbox = Greenbox(xw)
bluebox = Bluebox(greenbox)
bluebox.sample(n_samples)

from sklearn.tree import DecisionTreeRegressor
tree = DecisionTreeRegressor()
tree.fit(X=bluebox.input_samples, y = bluebox.outcomes['_y'])
export_graphviz(decision_tree = tree, out_file="test.dot", feature_names = bluebox.input_names,
            label = None, impurity = False, node_ids = False, rounded = True, filled = True)

The main idea in this code snippet is that input samples and outcomes are saved in the bluebox object as the input_samples and outcomes attributes, each of which is a pandas DataFrame. Here are the results.

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
LICENSE		LICENSE
README.md		README.md
bluebox.png		bluebox.png
cell_names.png		cell_names.png
greenbox.png		greenbox.png
greenbox.py		greenbox.py
input _a1.png		input _a1.png
input _a2.png		input _a2.png
input _b1.png		input _b1.png
input _b2.png		input _b2.png
input _wa.png		input _wa.png
input _wb.png		input _wb.png
net1.png		net1.png
output _a.png		output _a.png
output _b.png		output _b.png
output _y.png		output _y.png
requirements.txt		requirements.txt
test.py		test.py
tree.png		tree.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Greenbox: a module for sensitivity analysis and model emulation

Sensitivity analysis

Model emulation

About

Releases

Packages

Languages

License

asemic-horizon/stanton

Folders and files

Latest commit

History

Repository files navigation

Greenbox: a module for sensitivity analysis and model emulation

Sensitivity analysis

Model emulation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages