Merge pull request #7 from dvgodoy/boundaries

Boundaries

dvgodoy committed May 2, 2018
2 parents 528877a + 1f9f937 commit afdc435
Showing 8 changed files with 7,385 additions and 16,654 deletions.
48 changes: 39 additions & 9 deletions README.md
@@ -16,14 +16,15 @@ It contains:
- a class ***Replay***, which leverages the collected data to build several kinds of visualizations.

The available visualizations are:
- ***Feature Space***: plot of a 2-D grid representing the twisted and turned feature space, corresponding to the output of a hidden layer (only 2-unit hidden layers supported for now);
- ***Probabilities***: histograms of the resulting class probabilities for the inputs, corresponding to the output of the final layer (only binary classification supported for now);
- ***Feature Space***: plot representing the twisted and turned feature space, corresponding to the output of a hidden layer (only 2-unit hidden layers supported for now), including grid lines if the input is 2-dimensional;
- ***Decision Boundary***: plot of a 2-D grid representing the original feature space, together with the decision boundary (only 2-dimensional inputs supported for now);
- ***Probabilities***: two histograms of the resulting classification probabilities for the inputs, corresponding to the output of the final layer (only binary classification supported for now);
- ***Loss and Metric***: line plot for the loss and a chosen metric, computed over all the inputs;
- ***Losses***: histogram of the losses computed over all the inputs (only binary cross-entropy loss supported for now).

Feature Space | Class Probability | Loss/Metric | Losses
:-:|:-:|:-:|:-:
![Feature Space](/images/feature_space.png) | ![Probability Histogram](/images/prob_histogram.png) | ![Loss and Metric](/images/loss_and_metric.png) | ![Loss Histogram](/images/loss_histogram.png)
Feature Space | Decision Boundary | Class Probability | Loss/Metric | Losses
:-:|:-:|:-:|:-:|:-:
![Feature Space](/images/feature_space.png) | ![Decision Boundary](/images/decision_boundary.png) | ![Probability Histogram](/images/prob_histogram.png) | ![Loss and Metric](/images/loss_and_metric.png) | ![Loss Histogram](/images/loss_histogram.png)
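
Putting the pieces together, the sketch below shows the collect-and-replay workflow on the bundled parabola toy dataset: data is gathered during training by the ***ReplayData*** callback and the visualizations are built afterwards from a ***Replay*** object. This is a minimal sketch only; the filename, group name, tiny model, and chosen epoch are all illustrative.

```python
import matplotlib.pyplot as plt
from keras.layers import Dense
from keras.models import Sequential

from deepreplay.datasets.parabola import load_data
from deepreplay.callbacks import ReplayData
from deepreplay.replay import Replay
from deepreplay.plot import compose_plots

X, y = load_data()

## Collects the data needed to replay training into the given HDF5 file/group
replaydata = ReplayData(X, y, filename='quickstart.h5', group_name='quickstart')

model = Sequential()
model.add(Dense(input_dim=2, units=2, activation='sigmoid', name='hidden'))
model.add(Dense(units=1, activation='sigmoid', name='output'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['acc'])

model.fit(X, y, epochs=150, batch_size=16, callbacks=[replaydata])

## Replays training from the collected data and builds two of the visualizations
replay = Replay(replay_filename='quickstart.h5', group_name='quickstart')

fig, (ax_fs, ax_lm) = plt.subplots(1, 2, figsize=(10, 4))
fs = replay.build_feature_space(ax_fs, layer_name='hidden')
lm = replay.build_loss_and_metric(ax_lm, 'acc')

## Composes a single static figure for a chosen epoch
compose_plots([fs, lm], epoch=80).savefig('quickstart.png', dpi=120, format='png')
```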

### Google Colab

@@ -129,17 +130,43 @@ fs.animate().save('feature_space_animation.mp4', dpi=120, writer=writer)

## FAQ

### Grid lines are missing!
### 1. Grid lines are missing!

Does your input have more than 2 dimensions? If so, this is expected, as grid lines are only plotted for 2-dimensional inputs.

If your input is 2-dimensional and grid lines are missing nonetheless, please open an [issue](https://github.com/dvgodoy/deepreplay/issues).

### My hidden layer has more than 2 units! How can I plot it anyway?
### 2. My hidden layer has more than 2 units! How can I plot it anyway?

Apart from toy datasets, it is likely the (last) hidden layer has more than 2 units. But ***DeepReplay*** only supports ***FeatureSpace*** plots based on 2-unit hidden layers. So, what can you do?

Well, you can add an extra hidden layer with ***2 units*** and a ***LINEAR*** activation function and tell ***DeepReplay*** to use this layer for plotting the ***FeatureSpace***!
There are two different ways of handling this: if your inputs are 2-dimensional, you can plot them directly, together with the decision boundary. Otherwise, you can (train and) plot a 2-dimensional latent space.

#### 2.1 Using Raw Inputs

Instead of using ***FeatureSpace***, you can use ***DecisionBoundary*** and plot the inputs in their original feature space, with the decision boundary as of any given epoch.

In this case, there is no need to specify any layer, as it will use the raw inputs.

```python
## Input layer has 2 units
## Hidden layer has 10 units
model = Sequential()
model.add(Dense(input_dim=2, units=10, kernel_initializer='he_normal', activation='tanh'))

## Typical output layer for binary classification
model.add(Dense(units=1, kernel_initializer='normal', activation='sigmoid', name='output'))

...

fs = replay.build_decision_boundary(ax_fs)
```

For an example, check the [Circles Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/circles_dataset.ipynb).
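
Below is a minimal sketch of how the ***DecisionBoundary*** plot can be composed and animated together with the other visualizations, following the same pattern as the Circles example. It assumes the replay file was already written by the ***ReplayData*** callback and that the model was compiled with the `'acc'` metric; the group name and output filenames are placeholders.

```python
import matplotlib.pyplot as plt
from deepreplay.replay import Replay
from deepreplay.plot import compose_plots, compose_animations

## Assumes training data was already collected into this file by the ReplayData callback
replay = Replay(replay_filename='circles_dataset.h5', group_name='part1')

fig = plt.figure(figsize=(12, 6))
ax_db = plt.subplot2grid((2, 2), (0, 0), rowspan=2)
ax_lm = plt.subplot2grid((2, 2), (0, 1))
ax_lh = plt.subplot2grid((2, 2), (1, 1))

## Raw 2-dimensional inputs, plus the decision boundary learned at each epoch
db = replay.build_decision_boundary(ax_db, xlim=(-1.5, 1.5), ylim=(-1.5, 1.5))
lm = replay.build_loss_and_metric(ax_lm, 'acc')
lh = replay.build_loss_histogram(ax_lh)

## Static figure at a chosen epoch...
compose_plots([db, lm, lh], epoch=150).savefig('boundary.png', dpi=120, format='png')

## ...or an animation spanning the recorded epochs
compose_animations([db, lm, lh]).save(filename='boundary.mp4', dpi=120, fps=5)
```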

#### 2.2 Using a Latent Space

You can add an extra hidden layer with ***2 units*** and a ***LINEAR*** activation function and tell ***DeepReplay*** to use this layer for plotting the ***FeatureSpace***!

```python
## Input layer has 57 units
@@ -160,7 +187,10 @@ fs = replay.build_feature_space(ax_fs, layer_name='hidden')

By doing so, you will be including a transformation from a high-dimensional space to a 2-dimensional space, which is also going to be learned by the network.

For examples, check either the [Circles Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/circles_dataset.ipynb) or [UCI Spambase Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/UCI_spambase_dataset.ipynb) notebooks.
In fact, the model will be learning a 2-dimensional latent space, which then feeds the last layer. You can think of this as a logistic regression whose 2 inputs are the latent factors.
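
As a minimal sketch of this pattern, the model could look like the one below. The layer sizes, activations, and optimizer are illustrative only; the point is the extra ***2-unit LINEAR*** layer named `hidden`.

```python
from keras.layers import Dense
from keras.models import Sequential

model = Sequential()
## Any architecture on top of the high-dimensional input (57 features here, as in Spambase)
model.add(Dense(input_dim=57, units=10, activation='tanh'))
## Extra 2-unit LINEAR layer: this is the latent space FeatureSpace will plot
model.add(Dense(units=2, activation='linear', name='hidden'))
## Sigmoid output on top of it, i.e. a logistic regression on the two latent factors
model.add(Dense(units=1, activation='sigmoid', name='output'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['acc'])
```

After training with the ***ReplayData*** callback, `replay.build_feature_space(ax, layer_name='hidden')` plots this latent space, exactly as in the snippet above.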

For examples, check either the [Moons Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/moons_dataset.ipynb) or [UCI Spambase Dataset](https://github.com/dvgodoy/deepreplay/blob/master/notebooks/UCI_spambase_dataset.ipynb) notebooks.


## Comments, questions, suggestions, bugs

24 changes: 22 additions & 2 deletions deepreplay/plot.py
@@ -150,14 +150,15 @@ def compose_plots(objects, epoch, title=''):
title += ' - '
fig.suptitle('{}Epoch {}'.format(title, epoch), fontsize=14)
fig.tight_layout()
fig.subplots_adjust(top=0.9)
fig.subplots_adjust(top=0.85)
return fig

class Basic(object):
"""Basic plot class, NOT to be instantiated directly.
"""
def __init__(self, ax):
self._title = ''
self._custom_title = ''
self.n_epochs = 0

self.ax = ax
@@ -166,7 +167,11 @@ def __init__(self, ax):

@property
def title(self):
return self._title if isinstance(self._title, tuple) else (self._title,)
title = self._title
if not isinstance(title, tuple):
title = (self._title,)
title = tuple([' '.join([self._custom_title, t]) for t in title])
return title

@property
def axes(self):
@@ -183,6 +188,20 @@ def _prepare_plot(self):
def _update(i, object, epoch_start=0):
pass

def set_title(self, title):
"""Prepends a custom title to the plot.
Parameters
----------
title: String
Custom title to prepend.
Returns
-------
None
"""
self._custom_title = title

def plot(self, epoch):
"""Plots data at a given epoch.
@@ -353,6 +372,7 @@ class ProbabilityHistogram(Basic):
"""
def __init__(self, ax1, ax2):
self._title = ('Negative Cases', 'Positive Cases')
self._custom_title = ''
self.ax1 = ax1
self.ax2 = ax2
self.ax1.clear()
98 changes: 98 additions & 0 deletions deepreplay/replay.py
@@ -97,11 +97,13 @@ def __init__(self, replay_filename, group_name, model_filename=''):
self._loss_hist_data = None
self._loss_and_metric_data = None
self._prob_hist_data = None
self._decision_boundary_data = None
# Attributes for the visualizations - Plot objects
self._feature_space_plot = None
self._loss_hist_plot = None
self._loss_and_metric_plot = None
self._prob_hist_plot = None
self._decision_boundary_plot = None

def _retrieve_weights(self):
# Generates ranges for the number of different weight arrays in each layer
@@ -124,6 +126,10 @@ def _make_function(self, inputs, layer):
def _predict_proba(self, inputs, weights):
return self._get_output([self.learning_phase, inputs] + weights)

@property
def decision_boundary(self):
return self._decision_boundary_plot, self._decision_boundary_data

@property
def feature_space(self):
return self._feature_space_plot, self._feature_space_data
@@ -328,6 +334,98 @@ def build_probability_histogram(self, ax_negative, ax_positive, epoch_start=0, e
self._prob_hist_plot = ProbabilityHistogram(ax_negative, ax_positive).load_data(self._prob_hist_data)
return self._prob_hist_plot

def build_decision_boundary(self, ax, contour_points=1000, xlim=(-1, 1), ylim=(-1, 1), display_grid=True,
epoch_start=0, epoch_end=-1):
"""Builds a FeatureSpace object to be used for plotting and
animating the raw inputs and the decision boundary.
The underlying data, that is, grid lines, inputs and contour
lines, as well as the corresponding predictions for the
contour lines, can be later accessed as the second element of
the `decision_boundary` property.
Only inputs with 2 dimensions are supported!
Parameters
----------
ax: AxesSubplot
Subplot of a Matplotlib figure.
contour_points: int, optional
Number of points in each axis of the contour.
Default is 1,000.
xlim: tuple of ints, optional
Boundaries for the X axis of the grid.
ylim: tuple of ints, optional
Boundaries for the Y axis of the grid.
display_grid: boolean, optional
If True, display grid lines (for 2-dimensional inputs).
Default is True.
epoch_start: int, optional
First epoch to consider.
epoch_end: int, optional
Last epoch to consider.
Returns
-------
decision_boundary_plot: FeatureSpace
An instance of a FeatureSpace object to make plots and
animations.
"""
input_dims = self.model.input_shape[-1]
assert input_dims == 2, 'Only layers with 2-dimensional inputs are supported!'

if epoch_end == -1:
epoch_end = self.n_epochs
epoch_end = min(epoch_end, self.n_epochs)

X = self.inputs
y = self.targets

y_ind = y.squeeze().argsort()
X = X.squeeze()[y_ind].reshape(X.shape)
y = y.squeeze()[y_ind]

n_classes = len(np.unique(y))

# Builds a 2D grid and the corresponding contour coordinates
grid_lines = np.array([])
if display_grid:
grid_lines = build_2d_grid(xlim, ylim)

contour_lines = build_2d_grid(xlim, ylim, contour_points, contour_points)
get_predictions = self._make_function(self.model.inputs, self.model.layers[-1])

bent_lines = []
bent_inputs = []
bent_contour_lines = []
bent_preds = []
# For each epoch, uses the corresponding weights
for epoch in range(epoch_start, epoch_end + 1):
weights = self.weights[epoch]

bent_lines.append(grid_lines)
bent_inputs.append(X)
bent_contour_lines.append(contour_lines)

inputs = [TEST_MODE, contour_lines.reshape(-1, 2)] + weights
output_shape = (contour_lines.shape[:2]) + (-1,)
# Makes predictions for each point in the contour surface
bent_preds.append((get_predictions(inputs=inputs)[0].reshape(output_shape) > .5).astype(np.int))

# Makes lists into ndarrays and wrap them as namedtuples
bent_inputs = np.array(bent_inputs)
bent_lines = np.array(bent_lines)
bent_contour_lines = np.array(bent_contour_lines)
bent_preds = np.array(bent_preds)

line_data = FeatureSpaceLines(grid=grid_lines, input=X, contour=contour_lines)
bent_line_data = FeatureSpaceLines(grid=bent_lines, input=bent_inputs, contour=bent_contour_lines)
self._decision_boundary_data = FeatureSpaceData(line=line_data, bent_line=bent_line_data,
prediction=bent_preds, target=y)

# Creates a FeatureSpace plot object and load data into it
self._decision_boundary_plot = FeatureSpace(ax, True).load_data(self._decision_boundary_data)
return self._decision_boundary_plot

def build_feature_space(self, ax, layer_name, contour_points=1000, xlim=(-1, 1), ylim=(-1, 1), scale_fixed=True,
display_grid=True, epoch_start=0, epoch_end=-1):
"""Builds a FeatureSpace object to be used for plotting and
13 changes: 4 additions & 9 deletions examples/circles_dataset.py
@@ -15,7 +15,7 @@

X, y = make_circles(n_samples=2000, random_state=27, noise=0.03)

sgd = SGD(lr=0.01)
sgd = SGD(lr=0.02)

he_initializer = he_normal(seed=42)
normal_initializer = normal(seed=42)
@@ -30,10 +30,6 @@
model.add(Dense(units=3,
kernel_initializer=he_initializer))
model.add(Activation('relu'))
model.add(Dense(units=2,
kernel_initializer=normal_initializer,
activation='linear',
name='hidden'))
model.add(Dense(units=1,
kernel_initializer=normal_initializer,
activation='sigmoid',
@@ -43,7 +39,7 @@
optimizer=sgd,
metrics=['acc'])

model.fit(X, y, epochs=300, batch_size=16, callbacks=[replaydata])
model.fit(X, y, epochs=200, batch_size=16, callbacks=[replaydata])

replay = Replay(replay_filename='circles_dataset.h5', group_name=group_name)

@@ -54,13 +50,12 @@
ax_lm = plt.subplot2grid((2, 4), (0, 3))
ax_lh = plt.subplot2grid((2, 4), (1, 3))

fs = replay.build_feature_space(ax_fs, layer_name='hidden',
display_grid=False, scale_fixed=False)
fs = replay.build_decision_boundary(ax_fs, xlim=(-1.5, 1.5), ylim=(-1.5, 1.5))
ph = replay.build_probability_histogram(ax_ph_neg, ax_ph_pos)
lh = replay.build_loss_histogram(ax_lh)
lm = replay.build_loss_and_metric(ax_lm, 'acc')

sample_figure = compose_plots([fs, ph, lm, lh], 280)
sample_figure = compose_plots([fs, ph, lm, lh], 150)
sample_figure.savefig('circles.png', dpi=120, format='png')

sample_anim = compose_animations([fs, ph, lm, lh])
58 changes: 58 additions & 0 deletions examples/comparison_activation_functions.py
@@ -0,0 +1,58 @@
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.initializers import glorot_normal, normal

from deepreplay.datasets.parabola import load_data
from deepreplay.callbacks import ReplayData
from deepreplay.replay import Replay
from deepreplay.plot import compose_animations, compose_plots

import matplotlib.pyplot as plt

X, y = load_data()

sgd = SGD(lr=0.05)

for activation in ['sigmoid', 'tanh', 'relu']:
glorot_initializer = glorot_normal(seed=42)
normal_initializer = normal(seed=42)

replaydata = ReplayData(X, y, filename='comparison_activation_functions.h5', group_name=activation)

model = Sequential()
model.add(Dense(input_dim=2,
units=2,
kernel_initializer=glorot_initializer,
activation=activation,
name='hidden'))

model.add(Dense(units=1,
kernel_initializer=normal_initializer,
activation='sigmoid',
name='output'))

model.compile(loss='binary_crossentropy',
optimizer=sgd,
metrics=['acc'])

model.fit(X, y, epochs=150, batch_size=16, callbacks=[replaydata])

fig, axs = plt.subplots(1, 3, figsize=(12, 4))

replays = []
for activation in ['sigmoid', 'tanh', 'relu']:
replays.append(Replay(replay_filename='comparison_activation_functions.h5', group_name=activation))

spaces = []
for ax, replay, activation in zip(axs, replays, ['sigmoid', 'tanh', 'relu']):
space = replay.build_feature_space(ax, layer_name='hidden')
space.set_title(activation)
spaces.append(space)

sample_figure = compose_plots(spaces, 80)
sample_figure.savefig('comparison.png', dpi=120, format='png')

#sample_anim = compose_animations(spaces)
#sample_anim.save(filename='comparison.mp4', dpi=120, fps=5)

Binary file added images/decision_boundary.png
