Commit

including decision boundary function
dvgodoy committed Apr 30, 2018
1 parent 528877a commit 7c5c8e3
Showing 8 changed files with 7,361 additions and 16,652 deletions.
44 changes: 37 additions & 7 deletions README.md
@@ -17,13 +17,14 @@ It contains:

The available visualizations are:
- ***Feature Space***: plot of a 2-D grid representing the twisted and turned feature space, corresponding to the output of a hidden layer (only 2-unit hidden layers supported for now);
- ***Decision Boundary***: plot of a 2-D grid representing the original feature space, together with the decision boundary (only 2-dimensional inputs supported for now);
- ***Probabilities***: histograms of the resulting class probabilities for the inputs, corresponding to the output of the final layer (only binary classification supported for now);
- ***Loss and Metric***: line plot for the loss and a chosen metric, computed over all the inputs;
- ***Losses***: histogram of the losses computed over all the inputs (only binary cross-entropy loss supported for now).
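Both the ***Probabilities*** and ***Losses*** panels work with per-input values rather than a single averaged number. Here is a minimal NumPy sketch of computing the per-input binary cross-entropy and grouping the probabilities by true class, which mirrors the separate negative/positive axes that `build_probability_histogram` takes. The targets and probabilities are made up, and this is not DeepReplay's own code.

```python
import numpy as np

def per_input_bce(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy of each input separately (no averaging)."""
    # Clip so that log(0) can never occur
    p = np.clip(y_prob, eps, 1 - eps)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Made-up targets and predicted probabilities for five inputs
y_true = np.array([0, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.6, 0.8, 0.5, 0.99])

losses = per_input_bce(y_true, y_prob)

# Probabilities grouped by true class: one histogram per class
prob_negative = y_prob[y_true == 0]
prob_positive = y_prob[y_true == 1]

bins = np.linspace(0, 1, 11)
neg_counts, _ = np.histogram(prob_negative, bins=bins)
pos_counts, _ = np.histogram(prob_positive, bins=bins)

# Histogram of the per-input losses themselves
loss_counts, loss_edges = np.histogram(losses, bins=10)

# Averaging the per-input losses gives the usual single loss value
mean_loss = losses.mean()
```
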

Feature Space | Class Probability | Loss/Metric | Losses
:-:|:-:|:-:|:-:
![Feature Space](/images/feature_space.png) | ![Probability Histogram](/images/prob_histogram.png) | ![Loss and Metric](/images/loss_and_metric.png) | ![Loss Histogram](/images/loss_histogram.png)
Feature Space | Decision Boundary | Class Probability | Loss/Metric | Losses
:-:|:-:|:-:|:-:|:-:
![Feature Space](/images/feature_space.png) | ![Decision Boundary](/images/decision_boundary.png) | ![Probability Histogram](/images/prob_histogram.png) | ![Loss and Metric](/images/loss_and_metric.png) | ![Loss Histogram](/images/loss_histogram.png)

### Google Colab

@@ -129,17 +130,43 @@ fs.animate().save('feature_space_animation.mp4', dpi=120, writer=writer)

## FAQ

### Grid lines are missing!
### 1. Grid lines are missing!

Does your input have more than 2 dimensions? If so, this is expected, as grid lines are only plotted for 2-dimensional inputs.

If your input is 2-dimensional and grid lines are missing nonetheless, please open an [issue](https://github.com/dvgodoy/deepreplay/issues).
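The 2-dimensional restriction presumably exists because the grid has to be laid over the input space before it can be bent. The NumPy sketch below illustrates the idea with one made-up 2-unit `tanh` layer instead of a trained Keras model. It builds a regular grid over the 2-D input space and maps every point on every grid line through the layer, so the straight lines come out curved. With more than 2 input dimensions, there is no single 2-D grid to start from.

```python
import numpy as np

# Made-up weights for a 2-unit hidden layer with tanh activation
W = np.array([[1.0, -0.5],
              [0.5, 1.0]])
b = np.array([0.0, 0.2])

def hidden_layer(points):
    """Maps (m, 2) input points to (m, 2) hidden-layer outputs."""
    return np.tanh(points @ W + b)

# Grid lines in the 2-D input space: 5 vertical and 5 horizontal lines,
# each sampled at 20 points -> array of shape (10, 20, 2)
ticks = np.linspace(-1, 1, 5)
samples = np.linspace(-1, 1, 20)
vertical = [np.column_stack([np.full_like(samples, t), samples]) for t in ticks]
horizontal = [np.column_stack([samples, np.full_like(samples, t)]) for t in ticks]
grid_lines = np.array(vertical + horizontal)

# Every point of every line goes through the layer; the affine part alone
# would keep the lines straight, the tanh is what bends them
bent_lines = hidden_layer(grid_lines.reshape(-1, 2)).reshape(grid_lines.shape)
```
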

### My hidden layer has more than 2 units! How can I plot it anyway?
### 2. My hidden layer has more than 2 units! How can I plot it anyway?

Apart from toy datasets, the (last) hidden layer will most likely have more than 2 units. But ***DeepReplay*** only supports ***FeatureSpace*** plots based on 2-unit hidden layers. So, what can you do?

Well, you can add an extra hidden layer with ***2 units*** and a ***LINEAR*** activation function and tell ***DeepReplay*** to use this layer for plotting the ***FeatureSpace***!
There are two different ways of handling this: if your inputs are 2-dimensional, you can plot them directly, together with the decision boundary. Otherwise, you can (train and) plot 2-dimensional embeddings.

#### 2.1 Using Raw Inputs

Instead of using ***FeatureSpace***, you can use ***DecisionBoundary*** and plot the inputs in their original feature space, with the decision boundary as of any given epoch.

In this case, there is no need to specify any layer, as it will use the raw inputs.

```python
## Input layer has 2 units
## Hidden layer has 10 units
model = Sequential()
model.add(Dense(input_dim=2, units=10, kernel_initializer='he_normal', activation='tanh'))

## Typical output layer for binary classification
model.add(Dense(units=1, kernel_initializer='normal', activation='sigmoid', name='output'))

...

fs = replay.build_decision_boundary(ax_fs)
```

For an example, check the [Circles Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/circles_dataset.ipynb).
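In this commit, `build_decision_boundary` keeps a fixed contour grid over `xlim`/`ylim`. It predicts every grid point with each epoch's weights and stores `prediction > .5` as a 0/1 map. The self-contained NumPy sketch below reproduces that loop using a logistic-regression stand-in for the Keras model and three made-up weight snapshots. `make_contour_grid` and `predict_proba` are hypothetical helpers, not DeepReplay functions.

```python
import numpy as np

def make_contour_grid(xlim, ylim, n_points):
    """Hypothetical stand-in for build_2d_grid: (n, n, 2) array of (x, y) points."""
    xs = np.linspace(xlim[0], xlim[1], n_points)
    ys = np.linspace(ylim[0], ylim[1], n_points)
    return np.stack(np.meshgrid(xs, ys), axis=-1)

def predict_proba(points, w, b):
    """Stand-in model: logistic regression, (m, 2) points -> (m, 1) probabilities."""
    z = points @ w + b
    return (1.0 / (1.0 + np.exp(-z)))[:, None]

# Made-up weight snapshots, one per epoch (index 0 = before training)
weights_per_epoch = [
    (np.array([0.1, 0.0]), 0.0),
    (np.array([1.0, -1.0]), 0.2),
    (np.array([2.0, -2.5]), 0.3),
]

contour = make_contour_grid((-1.5, 1.5), (-1.5, 1.5), n_points=50)
output_shape = contour.shape[:2] + (-1,)   # (50, 50, -1)

bent_preds = []
for w, b in weights_per_epoch:
    # The grid never changes; only the weights used for prediction do
    probs = predict_proba(contour.reshape(-1, 2), w, b)
    # Threshold at 0.5 to get the predicted class of every grid point
    bent_preds.append((probs.reshape(output_shape) > .5).astype(int))

bent_preds = np.array(bent_preds)          # (n_epochs, 50, 50, 1)
```

With one 0/1 map per epoch, an animation frame only needs to draw the map for its epoch. The sketch uses the builtin `int` rather than `np.int`, because that alias was removed in NumPy 1.24.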

#### 2.2 Using Embeddings

You can add an extra hidden layer with ***2 units*** and a ***LINEAR*** activation function and tell ***DeepReplay*** to use this layer for plotting the ***FeatureSpace***!
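With such a layer in place, the 2-unit LINEAR layer produces a 2-D embedding for each input. The sigmoid output unit is then just a logistic regression over those two numbers. The NumPy sketch below shows that forward pass with small random weights and a made-up batch. It uses 57 features only to mirror a Spambase-sized input, leaves out any layers that would normally sit between the inputs and the 2-unit layer, and trains nothing.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(features, w, b):
    """Plain logistic regression: sigmoid of a weighted sum plus bias."""
    return sigmoid(features @ w + b)

# Made-up batch of 4 inputs with 57 features each
X = rng.normal(size=(4, 57))

# 'hidden' layer: 2 units, LINEAR activation -> one 2-D embedding per input
W_hidden = rng.normal(scale=0.1, size=(57, 2))
b_hidden = np.zeros(2)
embeddings = X @ W_hidden + b_hidden        # shape (4, 2)

# 'output' layer: 1 unit, sigmoid activation, fed only by the embeddings
w_out = rng.normal(size=2)
b_out = 0.1
probs = logistic_regression(embeddings, w_out, b_out)   # shape (4,)
```
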

```python
## Input layer has 57 units
@@ -160,7 +187,10 @@ fs = replay.build_feature_space(ax_fs, layer_name='hidden')

By doing so, you will be including a transformation from a high-dimensional space to a 2-dimensional space, which is also going to be learned by the network.

For examples, check either the [Circles Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/circles_dataset.ipynb) or [UCI Spambase Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/UCI_spambase_dataset.ipynb) notebooks.
In fact, you will also be training 2-dimensional embeddings, which then feed the last layer. You can think of that last layer as a logistic regression with 2 inputs: the embeddings.

For examples, check either the [Moons Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/moons_dataset.ipynb) or [UCI Spambase Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/UCI_spambase_dataset.ipynb) notebooks.


## Comments, questions, suggestions, bugs

2 changes: 1 addition & 1 deletion deepreplay/plot.py
@@ -6,7 +6,7 @@
import seaborn as sns
from collections import namedtuple
from matplotlib import animation
matplotlib.rcParams['animation.writer'] = 'ffmpeg'
matplotlib.rcParams['animation.writer'] = 'avconv'
sns.set_style('white')

FeatureSpaceData = namedtuple('FeatureSpaceData', ['line', 'bent_line', 'prediction', 'target'])
98 changes: 98 additions & 0 deletions deepreplay/replay.py
@@ -97,11 +97,13 @@ def __init__(self, replay_filename, group_name, model_filename=''):
        self._loss_hist_data = None
        self._loss_and_metric_data = None
        self._prob_hist_data = None
        self._decision_boundary_data = None
        # Attributes for the visualizations - Plot objects
        self._feature_space_plot = None
        self._loss_hist_plot = None
        self._loss_and_metric_plot = None
        self._prob_hist_plot = None
        self._decision_boundary_plot = None

    def _retrieve_weights(self):
        # Generates ranges for the number of different weight arrays in each layer
@@ -124,6 +126,10 @@ def _make_function(self, inputs, layer):
    def _predict_proba(self, inputs, weights):
        return self._get_output([self.learning_phase, inputs] + weights)

    @property
    def decision_boundary(self):
        return self._decision_boundary_plot, self._decision_boundary_data

    @property
    def feature_space(self):
        return self._feature_space_plot, self._feature_space_data
@@ -328,6 +334,98 @@ def build_probability_histogram(self, ax_negative, ax_positive, epoch_start=0, e
        self._prob_hist_plot = ProbabilityHistogram(ax_negative, ax_positive).load_data(self._prob_hist_data)
        return self._prob_hist_plot

    def build_decision_boundary(self, ax, contour_points=1000, xlim=(-1, 1), ylim=(-1, 1), display_grid=True,
                                epoch_start=0, epoch_end=-1):
        """Builds a FeatureSpace object to be used for plotting and
        animating the raw inputs and the decision boundary.
        The underlying data, that is, grid lines, inputs and contour
        lines, as well as the corresponding predictions for the
        contour lines, can later be accessed as the second element of
        the `decision_boundary` property.

        Only inputs with 2 dimensions are supported!

        Parameters
        ----------
        ax: AxesSubplot
            Subplot of a Matplotlib figure.
        contour_points: int, optional
            Number of points in each axis of the contour.
            Default is 1,000.
        xlim: tuple of ints, optional
            Boundaries for the X axis of the grid.
        ylim: tuple of ints, optional
            Boundaries for the Y axis of the grid.
        display_grid: boolean, optional
            If True, display grid lines (for 2-dimensional inputs).
            Default is True.
        epoch_start: int, optional
            First epoch to consider.
        epoch_end: int, optional
            Last epoch to consider.

        Returns
        -------
        decision_boundary_plot: FeatureSpace
            An instance of a FeatureSpace object to make plots and
            animations.
        """
        input_dims = self.model.input_shape[-1]
        assert input_dims == 2, 'Only models with 2-dimensional inputs are supported!'

        if epoch_end == -1:
            epoch_end = self.n_epochs
        epoch_end = min(epoch_end, self.n_epochs)

        X = self.inputs
        y = self.targets

        y_ind = y.squeeze().argsort()
        X = X.squeeze()[y_ind].reshape(X.shape)
        y = y.squeeze()[y_ind]

        n_classes = len(np.unique(y))

        # Builds a 2D grid and the corresponding contour coordinates
        grid_lines = np.array([])
        if display_grid:
            grid_lines = build_2d_grid(xlim, ylim)

        contour_lines = build_2d_grid(xlim, ylim, contour_points, contour_points)
        get_predictions = self._make_function(self.model.inputs, self.model.layers[-1])

        bent_lines = []
        bent_inputs = []
        bent_contour_lines = []
        bent_preds = []
        # For each epoch, uses the corresponding weights
        for epoch in range(epoch_start, epoch_end + 1):
            weights = self.weights[epoch]

            bent_lines.append(grid_lines)
            bent_inputs.append(X)
            bent_contour_lines.append(contour_lines)

            inputs = [TEST_MODE, contour_lines.reshape(-1, 2)] + weights
            output_shape = (contour_lines.shape[:2]) + (-1,)
            # Makes predictions for each point in the contour surface
            bent_preds.append((get_predictions(inputs=inputs)[0].reshape(output_shape) > .5).astype(int))

        # Turns lists into ndarrays and wraps them as namedtuples
        bent_inputs = np.array(bent_inputs)
        bent_lines = np.array(bent_lines)
        bent_contour_lines = np.array(bent_contour_lines)
        bent_preds = np.array(bent_preds)

        line_data = FeatureSpaceLines(grid=grid_lines, input=X, contour=contour_lines)
        bent_line_data = FeatureSpaceLines(grid=bent_lines, input=bent_inputs, contour=bent_contour_lines)
        self._decision_boundary_data = FeatureSpaceData(line=line_data, bent_line=bent_line_data,
                                                        prediction=bent_preds, target=y)

        # Creates a FeatureSpace plot object and loads data into it
        self._decision_boundary_plot = FeatureSpace(ax, True).load_data(self._decision_boundary_data)
        return self._decision_boundary_plot

    def build_feature_space(self, ax, layer_name, contour_points=1000, xlim=(-1, 1), ylim=(-1, 1), scale_fixed=True,
                            display_grid=True, epoch_start=0, epoch_end=-1):
        """Builds a FeatureSpace object to be used for plotting and
17 changes: 6 additions & 11 deletions examples/circles_dataset.py
@@ -15,7 +15,7 @@

X, y = make_circles(n_samples=2000, random_state=27, noise=0.03)

sgd = SGD(lr=0.01)
sgd = SGD(lr=0.02)

he_initializer = he_normal(seed=42)
normal_initializer = normal(seed=42)
@@ -30,10 +30,6 @@
model.add(Dense(units=3,
kernel_initializer=he_initializer))
model.add(Activation('relu'))
model.add(Dense(units=2,
kernel_initializer=normal_initializer,
activation='linear',
name='hidden'))
model.add(Dense(units=1,
kernel_initializer=normal_initializer,
activation='sigmoid',
@@ -43,7 +39,7 @@
optimizer=sgd,
metrics=['acc'])

model.fit(X, y, epochs=300, batch_size=16, callbacks=[replaydata])
model.fit(X, y, epochs=200, batch_size=16, callbacks=[replaydata])

replay = Replay(replay_filename='circles_dataset.h5', group_name=group_name)

@@ -54,14 +50,13 @@
ax_lm = plt.subplot2grid((2, 4), (0, 3))
ax_lh = plt.subplot2grid((2, 4), (1, 3))

fs = replay.build_feature_space(ax_fs, layer_name='hidden',
display_grid=False, scale_fixed=False)
fs = replay.build_decision_boundary(ax_fs, xlim=(-1.5, 1.5), ylim=(-1.5, 1.5))
ph = replay.build_probability_histogram(ax_ph_neg, ax_ph_pos)
lh = replay.build_loss_histogram(ax_lh)
lm = replay.build_loss_and_metric(ax_lm, 'acc')

sample_figure = compose_plots([fs, ph, lm, lh], 280)
sample_figure.savefig('circles.png', dpi=120, format='png')
sample_figure = compose_plots([fs, ph, lm, lh], 150)
sample_figure.savefig('circles_db.png', dpi=120, format='png')

sample_anim = compose_animations([fs, ph, lm, lh])
sample_anim.save(filename='circles.mp4', dpi=120, fps=5)
sample_anim.save(filename='circles_db.mp4', dpi=120, fps=5)
56 changes: 56 additions & 0 deletions examples/comparison_activation_functions.py
@@ -0,0 +1,56 @@
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.initializers import glorot_normal, normal

from deepreplay.datasets.parabola import load_data
from deepreplay.callbacks import ReplayData
from deepreplay.replay import Replay
from deepreplay.plot import compose_animations, compose_plots

import matplotlib.pyplot as plt

X, y = load_data()

sgd = SGD(lr=0.05)

for activation in ['sigmoid', 'tanh', 'relu']:
    glorot_initializer = glorot_normal(seed=42)
    normal_initializer = normal(seed=42)

    replaydata = ReplayData(X, y, filename='comparison_activation_functions.h5', group_name=activation)

    model = Sequential()
    model.add(Dense(input_dim=2,
                    units=2,
                    kernel_initializer=glorot_initializer,
                    activation=activation,
                    name='hidden'))

    model.add(Dense(units=1,
                    kernel_initializer=normal_initializer,
                    activation='sigmoid',
                    name='output'))

    model.compile(loss='binary_crossentropy',
                  optimizer=sgd,
                  metrics=['acc'])

    model.fit(X, y, epochs=150, batch_size=16, callbacks=[replaydata])

fig, axs = plt.subplots(1, 3, figsize=(12, 4))

replays = []
for activation in ['sigmoid', 'tanh', 'relu']:
    replays.append(Replay(replay_filename='comparison_activation_functions.h5', group_name=activation))

spaces = []
for ax, replay in zip(axs, replays):
    spaces.append(replay.build_feature_space(ax, layer_name='hidden'))

sample_figure = compose_plots(spaces, 80)
sample_figure.savefig('comparison.png', dpi=120, format='png')

sample_anim = compose_animations(spaces)
sample_anim.save(filename='comparison.mp4', dpi=120, fps=5)

Binary file added images/decision_boundary.png
23,794 changes: 7,162 additions & 16,632 deletions notebooks/circles_dataset.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@ def readme():
return f.read()

setup(name='deepreplay',
version='0.1.0a5',
version='0.1.0a6',
install_requires=['matplotlib', 'numpy', 'h5py', 'seaborn', 'keras', 'sklearn'],
description='"Hyper-parameters in Action!" visualizing tool for Keras models.',
long_description=readme(),
