# [DeepSphere]: a spherical convolutional neural network
[DeepSphere]: https://github.com/SwissDataScienceCenter/DeepSphere

[Nathanaël Perraudin](https://perraudin.info), [Michaël Defferrard](http://deff.ch), Tomasz Kacprzak, Raphael Sgier

# Demo: whole sphere classification

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
import os
import shutil

# Run on CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import healpy as hp
import tensorflow as tf

from deepsphere import models, experiment_helper, plot
from deepsphere.data import LabeledDataset

In [None]:
plt.rcParams['figure.figsize'] = (17, 5)

In [None]:
EXP_NAME = 'whole_sphere'

## 1 Data loading

The data consists of a toy dataset that is sufficiently small to have fun with. It is made of 200 maps of size `NSIDE=64` split into 2 classes.

The maps contain realisations of a Gaussian random field, produced with the `synfast` function from the healpy package.
The input power spectra were taken from a ΛCDM model with two sets of parameters.
These maps are not realistic cosmological mass maps, just a toy dataset.
We downsampled them to `Nside=64` to make the processing faster.
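
For orientation, the array sizes that appear in this notebook follow from the resolution parameter. The snippet below is a minimal sketch, assuming the standard HEALPix relation of 12 · Nside² pixels per map and assuming the power spectrum is estimated up to the default maximum multipole 3 · Nside − 1 used by healpy's `anafast`. The helper names are hypothetical and not part of DeepSphere.

```python
def healpix_npix(nside):
    """Number of pixels of a HEALPix map at resolution `nside`."""
    return 12 * nside ** 2


def default_num_multipoles(nside):
    """Number of multipoles ell = 0 .. 3*nside - 1 (assumed default lmax)."""
    return 3 * nside


NSIDE = 64
print(healpix_npix(NSIDE))            # 49152 pixels per map
print(default_num_multipoles(NSIDE))  # 192 power spectrum coefficients
```

Under these assumptions, each raw map is a vector of 49152 values, while its power spectrum is summarized by only 192 numbers.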

In [None]:
data = np.load('data/maps_downsampled_64.npz')
assert len(data['class1']) == len(data['class2'])
nclass = len(data['class1'])  # Number of maps per class.

Let us plot a map of each class. The differences are hard to spot visually.

In [None]:
cmin = min(np.min(data['class1']), np.min(data['class2']))
cmax = max(np.max(data['class1']), np.max(data['class2']))
cm = plt.cm.RdBu_r
cm.set_under('w')
hp.mollview(data['class1'][0], title='class 1', nest=True, cmap=cm, min=cmin, max=cmax)
hp.mollview(data['class2'][0], title='class 2', nest=True, cmap=cm, min=cmin, max=cmax)

However, those maps have different power spectral densities (PSDs).

In [None]:
sample_psd_class1 = np.empty((nclass, 192))
sample_psd_class2 = np.empty((nclass, 192))

for i in range(nclass):
    sample_psd_class1[i] = experiment_helper.psd(data['class1'][i])
    sample_psd_class2[i] = experiment_helper.psd(data['class2'][i])

In [None]:
ell = np.arange(sample_psd_class1.shape[1])
plot.plot_with_std(ell, sample_psd_class1*ell*(ell+1), label='class 1, Omega_matter=0.3, mean', color='b')
plot.plot_with_std(ell, sample_psd_class2*ell*(ell+1), label='class 2, Omega_matter=0.5, mean', color='r')
plt.legend(fontsize=16);
plt.xlim([10, np.max(ell)])
plt.ylim([1e-6, 1e-3])
# plt.yscale('log')
plt.xscale('log')
# Raw strings avoid invalid escape sequences such as '\e' and '\c'.
plt.xlabel(r'$\ell$: spherical harmonic index', fontsize=18)
plt.ylabel(r'$C_\ell \cdot \ell \cdot (\ell+1)$', fontsize=18)
plt.title('Power Spectrum Density, 3-arcmin smoothing, noiseless, Nside=1024', fontsize=18);

## 2 Data preparation

Let us split the data into training and testing sets. The raw data is stored in `x_raw` and the power spectral densities in `x_psd`.

In [None]:
# Normalize and transform the data, i.e. extract features.
x_raw = np.vstack((data['class1'], data['class2']))
x_raw = x_raw / np.mean(x_raw**2)  # Apply some normalization (scaling only, the data is not centered).
x_psd = preprocessing.scale(np.vstack((sample_psd_class1, sample_psd_class2)))

# Create the label vector
labels = np.zeros([x_raw.shape[0]], dtype=int)
labels[nclass:] = 1

# Random train / test split
ntrain = 150
ret = train_test_split(x_raw, x_psd, labels, test_size=2*nclass-ntrain, shuffle=True)
x_raw_train, x_raw_test, x_psd_train, x_psd_test, labels_train, labels_test = ret

print('Class 1 VS class 2')
print('  Training set: {} / {}'.format(np.sum(labels_train==0), np.sum(labels_train==1)))
print('  Test set: {} / {}'.format(np.sum(labels_test==0), np.sum(labels_test==1)))

## 3 Classification using SVM

As a baseline, let us classify our data using an SVM classifier.

* An SVM based on the raw features cannot discriminate the data because the dimensionality of the data is too large.
* The PSD features, however, are linearly separable.
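
To illustrate why a low-dimensional statistic can separate classes that are hard to tell apart pixel by pixel, here is a toy sketch in NumPy. It is built on assumptions made purely for illustration: the two classes are zero-mean white noise that differ only in variance, whereas the maps in this notebook are correlated Gaussian fields and the notebook uses the full PSD rather than a single number.

```python
import numpy as np

rng = np.random.default_rng(0)
n_maps, n_pix = 100, 3072  # toy sizes, not those of the dataset

# Two classes of zero-mean white noise that differ only in their variance.
class1 = rng.normal(0.0, 1.0, size=(n_maps, n_pix))
class2 = rng.normal(0.0, 1.2, size=(n_maps, n_pix))

# A single summary statistic per map: the variance over the pixels.
var1 = class1.var(axis=1)
var2 = class2.var(axis=1)

# Threshold halfway between the two expected variances (1.0 and 1.44).
threshold = 0.5 * (1.0 + 1.2 ** 2)
accuracy = 0.5 * ((var1 < threshold).mean() + (var2 > threshold).mean())
print('Accuracy of the variance threshold: {:.0%}'.format(accuracy))
```

Every pixel has the same zero mean in both classes, so no individual pixel value is informative on its own, yet one well-chosen statistic per map separates them with a single threshold.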

In [None]:
clf = SVC(kernel='rbf')
clf.fit(x_raw_train, labels_train) 

e_train = experiment_helper.model_error(clf, x_raw_train, labels_train)
e_test = experiment_helper.model_error(clf, x_raw_test, labels_test)
print('The training error is: {}%'.format(e_train*100))
print('The testing error is: {}%'.format(e_test*100))

In [None]:
clf = SVC(kernel='linear')
clf.fit(x_psd_train, labels_train) 

e_train = experiment_helper.model_error(clf, x_psd_train, labels_train)
e_test = experiment_helper.model_error(clf, x_psd_test, labels_test)
print('The training error is: {}%'.format(e_train*100))
print('The testing error is: {}%'.format(e_test*100))

## 4 Classification using DeepSphere

Let us now classify our data using a spherical convolutional neural network.

Three types of architectures are suitable for this task:
1. Classic CNN: the classic ConvNet composed of some convolutional layers followed by some fully connected layers.
2. Stat layer: a statistical layer, which computes some statistics over the pixels, is inserted between the convolutional and fully connected layers. The role of this added layer is to make the prediction invariant to the position of the pixels on the sphere.
3. Fully convolutional: the fully connected layers are removed and the network outputs many predictions at various spatial locations that are then averaged.
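
The invariance provided by a statistical layer can be sketched in a few lines of NumPy. The function below is hypothetical: it mimics what a `meanvar` statistic could compute, and is not DeepSphere's implementation.

```python
import numpy as np


def meanvar_statistics(features):
    """Mean and variance over the pixels of a (n_pixels, n_features) array."""
    return np.concatenate([features.mean(axis=0), features.var(axis=0)])


rng = np.random.default_rng(0)
features = rng.normal(size=(3072, 5))  # 5 feature maps on 3072 pixels
# Same values, moved to other pixel positions.
shuffled = features[rng.permutation(len(features))]

same = np.allclose(meanvar_statistics(features), meanvar_statistics(shuffled))
print(same)  # True
```

The statistics discard pixel positions entirely, so the layers that follow see the same 10 numbers wherever a pattern sits on the sphere.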

On this simple task, all three architectures can reach 100% test accuracy. However, going from the first to the third, the number of parameters to learn decreases and training converges faster. A fully convolutional network is much faster and more parameter-efficient. It does, however, assume that all pixels have the same importance and that their location does not matter. While that is true for cosmological applications, it may not be for others.
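
A rough weight count makes the difference in model size concrete. This is a back-of-the-envelope sketch under stated assumptions: each graph convolution has `F_in * F_out * K` weights, biases and batch normalization parameters are ignored, and without a statistical layer the first fully connected layer sees every pixel at the last resolution in `nsides`. The function `count_weights` is hypothetical; the hyper-parameters are the ones used in this notebook.

```python
def npix(nside):
    """Number of HEALPix pixels at resolution `nside`."""
    return 12 * nside ** 2


def count_weights(nsides, F, K, M, n_stats=None, f_in=1):
    """Approximate number of weights (biases and batch norm ignored).

    n_stats is the number of statistics per feature map,
    or None when there is no statistical layer.
    """
    conv = 0
    for f_out, k in zip(F, K):
        conv += f_in * f_out * k
        f_in = f_out
    # Input size of the first fully connected layer.
    dim = (npix(nsides[-1]) if n_stats is None else n_stats) * F[-1]
    fc = 0
    for m in M:
        fc += dim * m
        dim = m
    return conv + fc


K = 10
classic = count_weights([64, 32, 16, 16], [5, 5, 5], [K] * 3, [50, 2])
stat = count_weights([64, 32, 16, 16], [5, 5, 5], [K] * 3, [50, 2], n_stats=2)
fcn = count_weights([64, 32, 16, 8, 8], [5, 5, 5, 2], [K] * 4, [])
print(classic, stat, fcn)  # 768650 1150 650
```

Under these assumptions, almost all the weights of the classic CNN sit in its first fully connected layer, which is exactly what the statistical layer and the fully convolutional design avoid.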

In [None]:
params = dict()
params['dir_name'] = EXP_NAME

# Types of layers.
params['conv'] = 'chebyshev5'  # Graph convolution: chebyshev5 or monomials.
params['pool'] = 'max'  # Pooling: max or average.
params['activation'] = 'relu'  # Non-linearity: relu, elu, leaky_relu, softmax, tanh, etc.
params['statistics'] = None  # Statistics (for invariance): None, mean, var, meanvar, hist.

# Architecture.
architecture = 'fully_convolutional'

if architecture == 'classic_cnn':
    params['statistics'] = None
    params['nsides'] = [64, 32, 16, 16]  # Pooling: HEALPix Nside per layer (12 * Nside**2 pixels).
    params['F'] = [5, 5, 5]  # Graph convolutional layers: number of feature maps.
    params['M'] = [50, 2]  # Fully connected layers: output dimensionalities.

elif architecture == 'stat_layer':
    params['statistics'] = 'meanvar'
    params['nsides'] = [64, 32, 16, 16]  # Pooling: HEALPix Nside per layer (12 * Nside**2 pixels).
    params['F'] = [5, 5, 5]  # Graph convolutional layers: number of feature maps.
    params['M'] = [50, 2]  # Fully connected layers: output dimensionalities.

elif architecture == 'fully_convolutional':
    params['statistics'] = 'mean'
    params['nsides'] = [64, 32, 16, 8, 8]
    params['F'] = [5, 5, 5, 2]
    params['M'] = []

params['K'] = [10] * len(params['F'])  # Polynomial orders.
params['batch_norm'] = [True] * len(params['F'])  # Batch normalization.

# Regularization.
params['regularization'] = 0  # Amount of L2 regularization over the weights (will be divided by the number of weights).
params['dropout'] = 0.5  # Fraction of neurons to keep.

# Training.
params['num_epochs'] = 12  # Number of passes through the training data.
params['batch_size'] = 16  # Number of samples per training batch. Should be a power of 2 for greater speed.
params['eval_frequency'] = 15  # Frequency of model evaluations during training (influences training time).
params['scheduler'] = lambda step: 1e-1  # Constant learning rate.
params['optimizer'] = lambda lr: tf.train.GradientDescentOptimizer(lr)
#params['optimizer'] = lambda lr: tf.train.MomentumOptimizer(lr, momentum=0.5)
#params['optimizer'] = lambda lr: tf.train.AdamOptimizer(lr, beta1=0.9, beta2=0.999, epsilon=1e-8)
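
The scheduler maps a training step to a learning rate. The sketch below estimates the number of steps implied by the settings above and shows a decaying alternative to the constant rate. It rests on several assumptions: one step processes one batch, the step count is rounded down, and the scheduler receives a plain Python number. If the library passes a TensorFlow tensor as `step`, the same formula would have to be written with TensorFlow operations. The decay bounds are arbitrary.

```python
num_epochs, batch_size, ntrain = 12, 16, 150  # values used in this notebook

# Assumed rounding: integer division.
num_steps = num_epochs * ntrain // batch_size
print(num_steps)  # 112

# Hypothetical exponential decay from 1e-1 down to 1e-2 over the whole run.
lr_start, lr_end = 1e-1, 1e-2


def scheduler(step):
    """Learning rate at a given training step (plain Python sketch)."""
    return lr_start * (lr_end / lr_start) ** (step / num_steps)


print(scheduler(0), scheduler(num_steps))
```

With an evaluation every 15 steps, such a run would evaluate the model about 7 times.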

In [None]:
model = models.deepsphere(**params)

In [None]:
# Cleanup before running again.
shutil.rmtree('summaries/{}/'.format(EXP_NAME), ignore_errors=True)
shutil.rmtree('checkpoints/{}/'.format(EXP_NAME), ignore_errors=True)

In [None]:
training = LabeledDataset(x_raw_train, labels_train)
testing = LabeledDataset(x_raw_test, labels_test)

In [None]:
accuracy_validation, loss_validation, loss_training, t_step = model.fit(training, testing)

In [None]:
plot.plot_loss(loss_training, loss_validation, t_step, params['eval_frequency'])

In [None]:
error_train = experiment_helper.model_error(model, x_raw_train, labels_train)
error_test = experiment_helper.model_error(model, x_raw_test, labels_test)
print('The training error is: {:.2%}'.format(error_train))
print('The testing error is: {:.2%}'.format(error_test))

## 5 Filters visualization

The package offers a few different visualizations of the learned filters. First, we can simply look at the Chebyshev coefficients. This visualization is not very interpretable for humans, but it can help debug problems related to optimization.
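
For context on what these coefficients parametrize, here is a minimal NumPy sketch of a Chebyshev graph filter, $y = \sum_k \theta_k T_k(\tilde{L}) x$, computed with the recurrence $T_k = 2 \tilde{L} T_{k-1} - T_{k-2}$. The ring graph, the combinatorial Laplacian, and the coefficient values are assumptions chosen for illustration. DeepSphere builds its graph from the HEALPix sampling and may normalize the Laplacian differently.

```python
import numpy as np


def chebyshev_filter(L, x, theta):
    """Apply y = sum_k theta[k] * T_k(L_scaled) @ x with the Chebyshev recurrence."""
    lmax = np.linalg.eigvalsh(L).max()
    L_scaled = 2.0 * L / lmax - np.eye(L.shape[0])  # spectrum rescaled to [-1, 1]
    t_prev, t_curr = x, L_scaled @ x                # T_0 x and T_1 x
    y = theta[0] * t_prev
    if len(theta) > 1:
        y = y + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * (L_scaled @ t_curr) - t_prev
        y = y + theta[k] * t_curr
    return y


# Toy graph: a ring of 8 vertices.
n = 8
A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
L = np.diag(A.sum(axis=1)) - A  # combinatorial Laplacian

x = np.zeros(n)
x[0] = 1.0                          # impulse at vertex 0
theta = np.array([0.5, -0.3, 0.2])  # 3 arbitrary coefficients
y = chebyshev_filter(L, x, theta)
print(np.round(y, 3))
```

The response to the impulse is nonzero only on vertices at most two hops away from vertex 0: a polynomial of order 2 in the Laplacian yields a filter localized within 2 hops.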

In [None]:
layer = 2
model.plot_chebyshev_coeffs(layer)

We observe the Chebyshev polynomials, i.e., the filters in the graph spectral domain. This visualization helps to understand which graph frequencies are picked up by the filtering operation. It is mostly interpretable by people from the graph signal processing community.
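
The spectral response of a Chebyshev filter can be evaluated directly from its coefficients. The sketch below uses `numpy.polynomial.chebyshev.chebval`. The coefficients and the largest eigenvalue are hypothetical values, not ones learned by the model.

```python
import numpy as np
from numpy.polynomial import chebyshev

theta = np.array([0.5, -0.3, 0.2])  # hypothetical Chebyshev coefficients
lmax = 2.0                          # assumed largest Laplacian eigenvalue
lam = np.linspace(0.0, lmax, 5)     # graph frequencies (eigenvalues)

# Rescale the eigenvalues to [-1, 1], then evaluate sum_k theta[k] * T_k.
response = chebyshev.chebval(2.0 * lam / lmax - 1.0, theta)
print(response)
```

Each value of `response` is the gain that the filter applies to the corresponding graph frequency, which is the kind of curve the plot above displays for the learned filters.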

In [None]:
model.plot_filters_spectral(layer);

Here comes one of the most human-friendly representations of the filters. It consists of a section of the filters "projected" on the sphere. Because of the irregularity of the HEALPix sampling, this representation of the filters may not look very smooth.

In [None]:
mpl.rcParams.update({'font.size': 16})
model.plot_filters_section(layer, title='');

Finally, we can simply look at the filters on the sphere. This representation clearly displays the sampling artifacts.

In [None]:
plt.rcParams['figure.figsize'] = (10, 10)
model.plot_filters_gnomonic(layer, title='')