ForestfortheTrees

This library generates visual explanations of Gradient Boosting models. The best place to start is the Interactive Notebook on Binder, an interactive Jupyter notebook (an "Explainable") that showcases the value of the library and provides sample code.

You can see more of my work on my website, or check out the presentation I gave at VISxAI 2019. My presentation even got mentioned in Uncharted's VIS 2019 Highlights!

Installation

As an alternative to Binder, you can run notebook.ipynb locally by cloning the repository and then performing the following steps:

1. Navigate into the package directory:
   cd ForestForTheTrees
2. Create the conda environment:
   conda env create -f binder/environment.yml
3. Activate the conda environment:
   conda activate ForestForTheTrees
4. Run the postBuild script, which installs the JupyterLab extension required to display interactive widgets:
   bash binder/postBuild
   Or run the command directly:
   jupyter labextension install @jupyter-widgets/jupyterlab-manager
5. Start Jupyter Lab, run all cells, and begin interacting with the notebook:
   jupyter lab notebook.ipynb

Note that a recent version of Jupyter Lab (included in the environment) is required to run this notebook. The classic Jupyter Notebook interface will not work, at least out of the box, because of some peculiarities in how Altair, ipywidgets, and Jupyter interact.
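Before launching the notebook, it can help to confirm that the packages named in this README are importable from the active environment. The following is a minimal sketch using only the standard library; the helper name missing_packages and the REQUIRED list are illustrative assumptions, not part of ForestForTheTrees, and the check says nothing about package versions or whether the JupyterLab widget extension is installed.

```python
from importlib.util import find_spec

# Import names of the packages this README mentions
# (scikit-learn is imported as "sklearn"). This list is an assumption
# based on the README text, not read from binder/environment.yml.
REQUIRED = ["jupyterlab", "ipywidgets", "altair", "numpy", "pandas", "sklearn"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, the conda environment is probably not activated or was not created successfully.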

I recommend running all cells as soon as the notebook is opened. Because the interactive widgets cannot save their state, the notebook is saved without output. If you then read through the full document, each cell will have run by the time you get to it. This applies whether viewing locally or via Binder.

Usage

As mentioned above, the best way to get a sense of how Gradient Boosting models can be explained with ForestForTheTrees is to open the Binder notebook linked above. To get started quickly, adapt the minimal example below:

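#imports (the ft alias assumes ForestForTheTrees.py is on the Python path)
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
import ForestForTheTrees as ft

#load dataset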
dataset_df = pd.read_csv("Some_file.csv")
target_column = "Target"  #the value to predict

#build model
model = GradientBoostingRegressor(
    n_estimators = 100
)

#fit model
model.fit(
    dataset_df.drop(target_column, axis = 1),
    dataset_df.loc[:,target_column]
) #you should build a good model here using train/test split

#initialize ForestForTheTrees with dataset, model, and target
f2t = ft.ForestForTheTrees(
    dataset = dataset_df, #pass bike instead to use the sample dataset
    model = model,
    target_col = target_column #must be the column the model was fit to predict
)

#extract the underlying structure of the model
#this must be called before displaying the visual explanation
f2t.extract_components()

#output the visual explanation at the selected fidelity
f2t.explain(
    fidelity_threshold = .95
)
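The comment in the example above notes that a real model should be built with a train/test split. A minimal sketch of that step with scikit-learn follows. The synthetic data, variable names, and hyperparameters are assumptions chosen for illustration; in practice the features and target would come from your own dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration only):
# two features with a nonlinear relationship to the target.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.1, size=500)

# Hold out 25% of the rows so the score reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit only on the training rows.
model = GradientBoostingRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# R^2 on each split; a large gap between the two suggests overfitting.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

Checking the held-out score before generating an explanation is worthwhile because a visual explanation of a poorly fitting model describes the model, not the underlying data.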

Development

This library is under active development. Please review the Issues tab for current priorities. Feature requests and bug reports are welcome! If you find this library useful, please feel free to message me and let me know how it went.

ForestForTheTrees was developed using Python and the Python data science stack, particularly numpy, pandas, and scikit-learn. Altair was used for data visualization.

