# [DeepSphere]: a spherical convolutional neural network
[DeepSphere]: https://github.com/SwissDataScienceCenter/DeepSphere

[Nathanaël Perraudin](https://perraudin.info), [Michaël Defferrard](http://deff.ch), Tomasz Kacprzak, Raphael Sgier

# Demo: 2D ConvNet on parts of the sphere

This demo uses the whole dataset, smoothing, and the addition of noise.

**You need a private dataset to execute this notebook.**
See the [README](https://github.com/SwissDataScienceCenter/DeepSphere/tree/master#reproducing-the-results-of-the-paper).
You can, however, adapt it to your own data.

### 0.1 Load packages

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
import os
import shutil

# Run on first GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# To get the CUDA profiler (do it on the CLI before starting jupyter):
# export LD_LIBRARY_PATH=/usr/local/cuda-9.0/extras/CUPTI/lib64

import numpy as np
import matplotlib.pyplot as plt

from deepsphere import models, experiment_helper, plot, utils
from deepsphere.data import LabeledDatasetWithNoise, LabeledDataset
import hyperparameters

In [None]:
plt.rcParams['figure.figsize'] = (17, 5)

### 0.2 Definition of the parameters

#### A) Non tunable parameters
These parameters are fixed; changing them requires modifying the preprocessing script.

In [None]:
Nside = 1024
sigma = 3
data_path = 'data/same_psd/'

#### B) Tunable parameters
These parameters can be changed.

Setting `sigma_noise = 0` selects the noiseless setting, which lets this notebook run in an acceptable time. In the noisy case (here `sigma_noise = 2`), the training of the network needs considerably more iterations.

In [None]:
order = 2  # 1,2,4,8 correspond to 12,48,192,768 parts of the sphere.
sigma_noise = 2  # Amount of noise for the experiment
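
As a quick check of the comment above, the sketch below computes the number of patches and the number of pixels per patch for each `order`. It assumes, as the comment implies, that the sphere is cut into `12 * order**2` equal patches; `patch_geometry` is a hypothetical helper, not part of DeepSphere.

```python
def patch_geometry(nside, order):
    """Number of patches and pixels per patch when a HEALPix sphere is cut into 12 * order**2 parts."""
    n_patches = 12 * order ** 2
    npix_sphere = 12 * nside ** 2
    npix_patch = npix_sphere // n_patches  # equals (nside // order) ** 2
    return n_patches, npix_patch

# Loop variable named demo_order so that the notebook's `order` is not overwritten.
for demo_order in (1, 2, 4, 8):
    n_patches, npix_patch = patch_geometry(1024, demo_order)
    print(f"order={demo_order}: {n_patches} patches of {npix_patch} pixels")
```

A smaller `order` therefore gives fewer but larger samples.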

# 1 Data preparation

### 1.1 Data download
Set `download` to `True` to download the dataset from Zenodo.

In [None]:
download = False
if download:
    %run -i 'download.py'

### 1.2 Data preprocessing
Apply the preprocessing steps.
1. Remove the mean of the maps.
2. Smooth with a radius of 3 arcmin (`sigma` parameter).
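
Both steps have a simple effect on the power spectrum. The sketch below assumes that the 3 arcmin is the standard deviation of a Gaussian kernel (the preprocessing script may instead use it as a FWHM). Under that assumption, smoothing multiplies each harmonic coefficient $a_{\ell m}$ by $b_\ell = \exp(-\ell(\ell+1)\sigma^2/2)$, so the PSD is multiplied by $b_\ell^2$, while removing the mean removes the monopole ($\ell = 0$). The helpers `remove_mean` and `gaussian_beam` are hypothetical.

```python
import numpy as np

def remove_mean(maps):
    """Subtract from each map (one map per row) its own mean."""
    maps = np.asarray(maps, dtype=float)
    return maps - maps.mean(axis=1, keepdims=True)

def gaussian_beam(ell, sigma_arcmin):
    """Transfer function b_ell of a Gaussian kernel with std `sigma_arcmin` (assumed, in arcmin)."""
    sigma_rad = np.radians(sigma_arcmin / 60.0)
    ell = np.asarray(ell, dtype=float)
    return np.exp(-0.5 * ell * (ell + 1) * sigma_rad ** 2)

# Attenuation of the power spectrum up to ell = 3 * Nside - 1 for Nside = 1024.
ell_demo = np.arange(3 * 1024)
psd_attenuation = gaussian_beam(ell_demo, 3) ** 2
```

With healpy installed, the smoothing itself could be done with `healpy.smoothing(m, sigma=np.radians(3 / 60))`, which takes the kernel width in radians.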

Set `preprocess` to `True` to execute the preprocessing script.

In [None]:
preprocess = False
if preprocess:
    %run -i 'data_preprocess.py'

Let us display the resulting PSDs of the preprocessed data. We pre-computed the PSDs for faster execution.

In [None]:
compute = False
if compute:
    psd = experiment_helper.psd
    ds1 = np.load(data_path+'smoothed_class1_sigma{}.npz'.format(sigma))['arr_0']
    ds2 = np.load(data_path+'smoothed_class2_sigma{}.npz'.format(sigma))['arr_0']
    # Convert to arrays so that `.shape` works in the plotting cell below.
    psds_img1 = np.array([psd(img) for img in ds1])
    psds_img2 = np.array([psd(img) for img in ds2])
    np.savez('results/psd_data_sigma{}'.format(sigma), psd_class1=psds_img1, psd_class2=psds_img2)
else:
    # Load the pre-computed PSDs once and read both classes from the same file.
    psds = np.load('results/psd_data_sigma{}.npz'.format(sigma))
    psds_img1 = psds['psd_class1']
    psds_img2 = psds['psd_class2']

The PSDs of the two classes are almost indistinguishable.

Spoiler alert! This is why PSD features are not good enough to classify the data.

In [None]:
ell = np.arange(psds_img1.shape[1])

# Raw strings avoid invalid escape sequences (e.g. '\O', '\s') in the LaTeX labels.
plot.plot_with_std(ell, np.stack(psds_img1)*ell*(ell+1), label=r'class 1, $\Omega_m=0.31$, $\sigma_8=0.82$, $h=0.7$', color='r')
plot.plot_with_std(ell, np.stack(psds_img2)*ell*(ell+1), label=r'class 2, $\Omega_m=0.26$, $\sigma_8=0.91$, $h=0.7$', color='b')
plt.legend(fontsize=16);
plt.xlim([11, np.max(ell)])
plt.ylim([1e-6, 5e-4])
plt.yscale('log')
plt.xscale('log')
plt.xlabel(r'$\ell$: spherical harmonic index', fontsize=18)
plt.ylabel(r'$C_\ell \cdot \ell \cdot (\ell+1)$', fontsize=18)
plt.title('Power Spectrum Density, 3-arcmin smoothing, noiseless, Nside=1024', fontsize=18);
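
`plot.plot_with_std` presumably draws the mean spectrum across maps with a band of one standard deviation. The sketch below computes these quantities with NumPy, together with a per-$\ell$ separation score (Cohen's d: the difference of the class means divided by the pooled standard deviation). Values well below 1 mean that the two classes overlap at that multipole. The helper names and the synthetic spectra are illustrative only, not the notebook's data.

```python
import numpy as np

def mean_and_std(spectra, ell):
    """Mean and std across maps of ell*(ell+1)*C_ell; spectra has shape (n_maps, n_ell)."""
    scaled = np.asarray(spectra) * ell * (ell + 1)
    return scaled.mean(axis=0), scaled.std(axis=0)

def separation(spectra1, spectra2):
    """Per-ell Cohen's d: |mean1 - mean2| / pooled std (equal sample sizes assumed)."""
    s1, s2 = np.asarray(spectra1), np.asarray(spectra2)
    pooled_std = np.sqrt(0.5 * (s1.var(axis=0) + s2.var(axis=0)))
    return np.abs(s1.mean(axis=0) - s2.mean(axis=0)) / pooled_std

# Synthetic example: two classes whose spectra differ by 1% with 10% scatter.
rng = np.random.default_rng(0)
ell_demo = np.arange(100)
class_a = 1e-6 * (1.00 + 0.1 * rng.standard_normal((50, 100)))
class_b = 1e-6 * (1.01 + 0.1 * rng.standard_normal((50, 100)))
mean_a, std_a = mean_and_std(class_a, ell_demo)
score = separation(class_a, class_b)
```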


### 1.3 Data loading
The following functions will
1. Load the preprocessed data.
2. Create samples by dividing the complete spheres into patches (based on HEALPix sampling). See the function `hp_split` in `experiment_helper.py` for details.

The function that loads the testing data additionally adds noise to the samples.
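
In the HEALPix NESTED ordering, all pixels that fall inside a coarser pixel have contiguous indices, so cutting a map into `12 * order**2` patches can be as simple as a reshape. The sketch below assumes nested maps and additive white Gaussian noise of standard deviation `sigma_noise`; `split_nested_map` and `add_gaussian_noise` are hypothetical helpers, and `hp_split` may differ in its details.

```python
import numpy as np

def split_nested_map(hp_map, order):
    """Cut a NESTED-ordered HEALPix map into 12 * order**2 patches of contiguous pixels."""
    hp_map = np.asarray(hp_map)
    n_patches = 12 * order ** 2
    if hp_map.size % n_patches:
        raise ValueError("map size is not divisible by 12 * order**2")
    return hp_map.reshape(n_patches, -1)

def add_gaussian_noise(patches, sigma_noise, rng=None):
    """Add white Gaussian noise with standard deviation `sigma_noise` (assumed noise model)."""
    rng = np.random.default_rng() if rng is None else rng
    return patches + sigma_noise * rng.standard_normal(patches.shape)

toy_map = np.arange(12 * 4 ** 2)  # toy nested map with Nside = 4
toy_patches = split_nested_map(toy_map, 2)  # 48 patches of 4 pixels each
noisy_patches = add_gaussian_noise(toy_patches, 2.0, rng=np.random.default_rng(0))
```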

In [None]:
x_raw_train, labels_raw_train, x_raw_std = experiment_helper.get_training_data(sigma, order)

In [None]:
x_raw_test, labels_test, _ = experiment_helper.get_testing_data(sigma, order, sigma_noise, x_raw_std)

# 4 Classification using a 2D ConvNet

Let us now classify our data using a standard 2D convolutional neural network, applied to the sphere patches rearranged as images.

### 4.1 Preparation of the dataset
Let us create the data for the network. The features are simply the raw pixel values of the patches.

In [None]:
ret = experiment_helper.data_preprossing(x_raw_train, labels_raw_train, x_raw_test, sigma_noise, feature_type=None, train_size=0.8)
features_train, labels_train, features_validation, labels_validation, features_test = ret

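First, the pixels of each patch are reordered into a 2D image with `build_index`, so that a standard 2D ConvNet can process them, and the test set is shuffled. The network then uses a Dataset object that needs to be initialized: `LabeledDatasetWithNoise` adds noise to the raw data during training, slowly increasing the noise level over `nit` iterations up to `end_level` (here `sigma_noise`).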

In [None]:
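from deepsphere.cnn import build_index

# Each patch is an nx x nx grid of pixels.
nx = Nside // order
nlevels = int(np.round(np.log2(nx)))
# Pixel permutation from HEALPix nested order to a 2D grid.
# (np.int was removed in NumPy 1.24, so the builtin int is used.)
index = build_index(nlevels).astype(int)

features_train = features_train[:, index]
features_validation = features_validation[:, index]
features_test = features_test[:, index]

# Shuffle the test set, applying the same permutation to features and labels.
shuffle = np.random.permutation(len(features_test))
features_test = features_test[shuffle]
labels_test = labels_test[shuffle]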
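
Within a HEALPix base pixel, the NESTED index of a pixel interleaves the bits of its two grid coordinates (a Z-order, or Morton, curve). A permutation like the one returned by `build_index` can therefore be built by interleaving bits. The sketch below is a hypothetical re-implementation for illustration only; which axis becomes rows is a convention and may differ from DeepSphere's.

```python
import numpy as np

def raster_to_nested_index(nlevels):
    """Permutation p such that patch[p].reshape(n, n) is a 2D image, with n = 2**nlevels.

    Assumes the patch is stored in NESTED (Z-order) layout: bit b of the column
    goes to bit 2*b of the nested index, and bit b of the row to bit 2*b + 1.
    """
    n = 2 ** nlevels
    rows, cols = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    nested = np.zeros((n, n), dtype=np.int64)
    for b in range(nlevels):
        nested |= ((cols >> b) & 1) << (2 * b)
        nested |= ((rows >> b) & 1) << (2 * b + 1)
    return nested.ravel()

toy_patch = np.arange(16)  # a 4x4 patch stored in nested order
toy_image = toy_patch[raster_to_nested_index(2)].reshape(4, 4)
```

Each 2x2 block of `toy_image` holds four consecutive nested indices, which is the Z-order structure the reordering undoes.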

In [None]:
training = LabeledDatasetWithNoise(features_train, labels_train, end_level=sigma_noise)
validation = LabeledDataset(features_validation, labels_validation)
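
`LabeledDatasetWithNoise` is described as slowly increasing the noise over `nit` iterations. The sketch below shows one possible schedule, a linear ramp from `start_level` to `end_level`; the linear shape and the helper names are assumptions, not DeepSphere's implementation. A ramp of this kind is a form of curriculum: the network sees cleaner data early in training.

```python
import numpy as np

def noise_level(step, start_level, end_level, nit):
    """Noise std at training `step`: linear ramp over `nit` steps, then constant."""
    if nit <= 0 or step >= nit:
        return end_level
    return start_level + (end_level - start_level) * step / nit

def noisy_batch(x, step, start_level, end_level, nit, rng):
    """Add white Gaussian noise with the scheduled std to a batch `x`."""
    std = noise_level(step, start_level, end_level, nit)
    return x + std * rng.standard_normal(x.shape)

rng_demo = np.random.default_rng(0)
batch_demo = np.zeros((8, 16))
first_batch = noisy_batch(batch_demo, step=0, start_level=0.0, end_level=2.0, nit=1000, rng=rng_demo)
```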

### 4.2 Building the Network

We now create the neural network. We use a single fully convolutional architecture (see the exact parameters in `hyperparameters.py`) for all the problems, that is, for all configurations of `order` and `sigma_noise`. A smaller `order` means more pixels per sample, hence more data for each prediction. This translates into higher accuracy, as the network is more confident about its predictions (which are averaged across spatial locations).
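
Because the predictions are averaged across spatial locations, the same fully convolutional network can process patches of any size, and larger patches (smaller `order`) average over more locations. The sketch below illustrates this last step with NumPy on random logits; that the network averages logits (rather than another quantity) is an assumption, and the helper names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predict_from_feature_maps(feature_maps):
    """Class probabilities from per-location logits of shape (batch, height, width, n_classes)."""
    pooled = feature_maps.mean(axis=(1, 2))  # global average pooling over space
    return softmax(pooled)

rng_demo = np.random.default_rng(0)
# Same function, two patch sizes: the output shape only depends on batch and classes.
probs_small = predict_from_feature_maps(rng_demo.standard_normal((4, 16, 16, 2)))
probs_large = predict_from_feature_maps(rng_demo.standard_normal((4, 64, 64, 2)))
```

If the per-location logits were independent, the standard deviation of their average would shrink as one over the square root of the number of locations, which is one way to see why more pixels per sample help.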

For the paper, we selected a conservative set of parameters that gave good results across the board. To train faster, decrease `num_epochs`, or interrupt training whenever you get bored. To reproduce all the results from the paper, the easiest way is to run the `experiments_deepsphere.py` script.

In [None]:
ntype = 'CNN-2d'
EXP_NAME = '40sim_{}sides_{:0.1f}noise_{}order_{}sigma_{}'.format(Nside, sigma_noise, order, sigma, ntype)

In [None]:
params = hyperparameters.get_params(training.N, EXP_NAME, order, Nside, ntype)
# params['profile'] = True  # See computation time and memory usage in Tensorboard.
# params['debug'] = True  # Debug the model in Tensorboard.
model = models.cnn2d(**params)

In [None]:
# Cleanup before running again.
shutil.rmtree('summaries/{}/'.format(EXP_NAME), ignore_errors=True)
shutil.rmtree('checkpoints/{}/'.format(EXP_NAME), ignore_errors=True)

### 4.3 Find an optimal learning rate (optional)

The learning rate is the most important hyper-parameter. One technique to find a good value is to monitor the validation loss while increasing the learning rate. The optimal learning rate can then be defined as the largest value for which the validation loss still decreases.

In [None]:
# backup = params.copy()
# 
# params, learning_rate = utils.test_learning_rates(params, training.N, 1e-6, 1e-1, num_epochs=20)
# 
# shutil.rmtree('summaries/{}/'.format(params['dir_name']), ignore_errors=True)
# shutil.rmtree('checkpoints/{}/'.format(params['dir_name']), ignore_errors=True)
# 
# model = models.cnn2d(**params)
# _, loss_validation, _, _ = model.fit(training, validation)
# 
# params.update(backup)
#
# plt.semilogx(learning_rate, loss_validation, '.-')
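
The commented-out cell above sweeps learning rates between `1e-6` and `1e-1`. The sketch below shows how the selection rule could be applied to the recorded losses, assuming log-spaced learning rates (consistent with the `semilogx` plot); `lr_range` and `largest_decreasing_lr` are hypothetical helpers, not part of `deepsphere.utils`.

```python
import numpy as np

def lr_range(lr_min, lr_max, n):
    """n learning rates, evenly spaced on a log scale between lr_min and lr_max."""
    return np.logspace(np.log10(lr_min), np.log10(lr_max), n)

def largest_decreasing_lr(learning_rates, losses):
    """Largest learning rate reached while the loss was still decreasing."""
    losses = np.asarray(losses, dtype=float)
    rising = np.flatnonzero(np.diff(losses) >= 0)  # steps where the loss stops decreasing
    stop = rising[0] if rising.size else len(losses) - 1
    return learning_rates[stop]

# Toy loss curve: decreases, then diverges once the learning rate is too large.
lrs_demo = lr_range(1e-6, 1e-1, 6)
losses_demo = [5.0, 4.0, 3.0, 3.5, 10.0, 20.0]
best_lr = largest_decreasing_lr(lrs_demo, losses_demo)  # about 1e-4 for this toy curve
```

Real validation losses are noisy, so smoothing them before applying the rule avoids stopping at a spurious uptick.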

### 4.4 Training the network

Here are a few remarks.
* The model will create tensorboard summaries in the `summaries` folder. Start tensorboard with `cd summaries` then `tensorboard --logdir .`, and open <http://localhost:6006> in a browser tab to visualize training progress and statistics about the learned parameters. You can debug the model by setting `params['debug'] = True` and launching tensorboard with `tensorboard --logdir . --debugger_port 6064`.
* You probably need a GPU to train the model in an acceptable amount of time.
* You will get slightly different results every time the network is trained.

In [None]:
accuracy_validation, loss_validation, loss_training, t_step = model.fit(training, validation)

We can see below that the classifier does not overfit the training data.

In [None]:
plot.plot_loss(loss_training, loss_validation, t_step, params['eval_frequency'])

In [None]:
error_validation = experiment_helper.model_error(model, features_validation, labels_validation)
print('The validation error is {:.2%}'.format(error_validation), flush=True)

In [None]:
error_test = experiment_helper.model_error(model, features_test, labels_test)
print('The testing error is {:.2%}'.format(error_test), flush=True)
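
The error reported here is presumably the fraction of misclassified samples. Assuming `model_error` compares predicted and true labels, it amounts to the sketch below; `classification_error` is a hypothetical stand-in that takes predictions directly rather than a model.

```python
import numpy as np

def classification_error(predicted_labels, true_labels):
    """Fraction of samples whose predicted label differs from the true label."""
    predicted_labels = np.asarray(predicted_labels)
    true_labels = np.asarray(true_labels)
    if predicted_labels.shape != true_labels.shape:
        raise ValueError("predictions and labels must have the same shape")
    return float(np.mean(predicted_labels != true_labels))

toy_error = classification_error([0, 1, 1, 0], [0, 1, 0, 0])
print('The toy error is {:.2%}'.format(toy_error))  # prints "The toy error is 25.00%"
```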

In [None]:
import tensorflow as tf

In [None]:
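def plotconv(conv, ax=None):
    """Tile 2D convolution kernels into a single image.

    conv has shape (kernel_height, kernel_width, n_in, n_out). Input channels
    are stacked vertically and output channels horizontally, separated by a
    one-pixel gap.
    """
    sx, sy, nx, ny = conv.shape
    mat = np.zeros((sx*nx+nx-1, sy*ny+ny-1))
    for i in range(nx):
        for j in range(ny):
            mat[i+i*sx:i+(i+1)*sx, j+j*sy:j+(j+1)*sy] = conv[:, :, i, j]
    if ax is None:
        ax = plt.gca()
    # Symmetric color range so that zero maps to white.
    v = np.max(np.abs(mat))
    ax.imshow(mat, cmap=plt.cm.RdBu, vmin=-v, vmax=v)

    # One tick centered on each tile; set on `ax` so a passed-in axis is labeled.
    ticks = np.arange(nx)*(sx+1)+(sx+1)/2-1
    ax.set_yticks(ticks)
    ax.set_yticklabels(['In {}'.format(i+1) for i in range(nx)])

    ticks = np.arange(ny)*(sy+1)+(sy+1)/2-1
    ax.set_xticks(ticks)
    ax.set_xticklabels(['Out {}'.format(i+1) for i in range(ny)])

    return ax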

In [None]:
# List all nodes of the graph:
# [n.name for n in model.graph.as_graph_def().node]
# List all trainable variables:
# with model.graph.as_default():
#     print(tf.trainable_variables())
os.makedirs('figures', exist_ok=True)
for layer in range(1, 6):
    plt.figure(figsize=(20, 20))
    conv = model.get_var('conv{}/conv2d/w'.format(layer))
    plotconv(conv)
    plt.title('Convolution layer {}'.format(layer))
    plt.savefig('figures/conv_kernel_layer{}.pdf'.format(layer), bbox_inches='tight')