# CurvLearn Tutorial
In this tutorial, you will learn how to build a non-Euclidean binary classification model. The steps are:
- Define the manifold and Riemannian tensors.
- Build a non-Euclidean model from manifold operations.
- Define the loss function and apply Riemannian optimization.

Let's start!

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os,sys
sys.path.append(os.path.dirname(os.path.dirname(sys.path[0])))

import numpy as np
import tensorflow as tf

Define hyperparameters.

In [2]:
epochs = 500
batch_size = 1024
log_steps = 100
learning_rate = 1e-3

CurvLearn currently supports the following manifolds:
- Constant curvature manifolds
    - ```curvlearn.manifolds.Euclidean``` - Euclidean space with zero curvature.
    - ```curvlearn.manifolds.Stereographic``` - Constant curvature stereographic projection model. The curvature can be positive, negative or zero.
    - ```curvlearn.manifolds.PoincareBall``` - The stereographic projection of the Lorentz model with negative curvature.
    - ```curvlearn.manifolds.ProjectedSphere``` - The stereographic projection of the sphere model with positive curvature.
- Mixed curvature manifolds
    - ```curvlearn.manifolds.Product``` - Mixed-curvature space consisting of multiple manifolds with different curvatures.
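
To make the role of the curvature concrete, here is a minimal NumPy sketch of the geodesic distance in the constant-curvature stereographic model, written from the standard textbook formulas. It is not CurvLearn's implementation. The helper names (`mobius_add`, `artan_k`, `distance`) are hypothetical, and the sign convention (`k < 0` hyperbolic, `k > 0` spherical, `k == 0` flat) is an assumption that may differ from how CurvLearn interprets `c`.

```python
import numpy as np

def mobius_add(x, y, k):
    """Mobius addition in the kappa-stereographic model; equals x + y when k == 0."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 - 2 * k * xy - k * y2) * x + (1 + k * x2) * y
    den = 1 - 2 * k * xy + k ** 2 * x2 * y2
    return num / den

def artan_k(r, k):
    """Curvature-dependent inverse tangent: arctan, identity, or arctanh."""
    if k > 0:
        s = np.sqrt(k)
        return np.arctan(s * r) / s
    if k < 0:
        s = np.sqrt(-k)
        return np.arctanh(s * r) / s
    return r

def distance(x, y, k):
    """Geodesic distance between two points for curvature k."""
    return 2.0 * artan_k(np.linalg.norm(mobius_add(-x, y, k)), k)

x = np.array([0.1, 0.2])
y = np.array([-0.3, 0.05])
for k in (-1.0, 0.0, 1.0):
    print("curvature", k, "distance", distance(x, y, k))
```

With this convention the flat case returns twice the Euclidean distance, because the metric at `k == 0` is a constant multiple of the identity. A single formula covers all three regimes, which is what allows the curvature to be treated as a trainable scalar.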

In this tutorial, we use the stereographic model with trainable curvature. 

In [3]:
from curvlearn.manifolds import Stereographic

manifold = Stereographic()
curvature = tf.get_variable(name="curvature", initializer=tf.constant(0.0, dtype=manifold.dtype), trainable=True)

print(manifold.name)


Stereographic


Generate a random binary classification dataset.
One sparse feature and 8 dense features are used to predict the 0/1 label.

In [4]:
global_step = tf.get_variable(name='global_step',initializer=tf.constant(0), trainable=False)

dense = np.random.rand(10000, 8)
sparse = np.random.randint(0, 1000, [10000, 1])
labels = np.random.choice([0, 1], size=10000, replace=True)

dataset = tf.data.Dataset.from_tensor_slices(
    {
        'dense': tf.cast(dense, tf.float32),
        'sparse': tf.cast(sparse, tf.int32),
        'labels': tf.cast(labels, tf.float32)
    }
)
dataset = dataset.shuffle(batch_size * 10).batch(batch_size, drop_remainder=False).repeat(epochs)

iterator = tf.data.make_one_shot_iterator(dataset)
batch = iterator.get_next()
dense, sparse, labels = batch['dense'], batch['sparse'], batch['labels']




A tensor on a given manifold is defined through the wrapper function `manifold.variable`.
The variable name determines how the tensor is optimized:
- If the name contains "*RiemannianParameter*", the variable is a Riemannian tensor and should be optimized by Riemannian optimizers.
- Otherwise, the variable is a Euclidean (tangent) tensor that is projected onto the manifold. In this case, Riemannian optimizers behave equivalently to vanilla Euclidean optimizers.

Here we optimize the dense embedding in Euclidean space and the sparse embedding in curved space.

In [5]:
embedding_table = tf.get_variable(
    name='RiemannianParameter/embedding',
    shape=(1000, 8),
    dtype=manifold.dtype,
    initializer=tf.truncated_normal_initializer(0.001)
)
embedding_table = manifold.variable(embedding_table, c=curvature)
sparse_embedding = tf.squeeze(tf.nn.embedding_lookup(embedding_table, sparse), axis=1)
dense_embedding = manifold.variable(dense, c=curvature)

Building Riemannian neural networks requires replacing Euclidean tensor operations with manifold operations.

CurvLearn currently supports the following basic operations:
- ```variable(t, c)``` - Defines a riemannian variable from manifold or tangent space at origin according to its name.
- ```to_manifold(t, c, base)``` - Converts a tensor ```t``` in the tangent space of ```base``` point to the manifold.
- ```to_tangent(t, c, base)``` - Converts a tensor ```t``` in the manifold to the tangent space of ```base``` point.
- ```weight_sum(tensor_list, a, c)``` - Computes the sum of tensor list ```tensor_list``` with weight list ```a```.
- ```mean(t, c, axis)``` - Computes the average of elements along ```axis``` dimension of a tensor ```t```.
- ```sum(t, c, axis)``` - Computes the sum of elements along ```axis``` dimension of a tensor ```t```.
- ```concat(tensor_list, c, axis)``` - Concatenates tensor list ```tensor_list``` along ```axis``` dimension.
- ```matmul(t, m, c)``` - Multiplies tensor ```t``` by euclidean matrix ```m```.
- ```add(x, y, c)``` - Adds tensor ```x``` and tensor ```y```.
- ```add_bias(t, b, c)``` - Adds a euclidean bias vector ```b``` to tensor ```t```.
- ```activation(t, c_in, c_out, act)``` - Computes the value of the activation function ```act``` for the input tensor ```t```.
- ```linear(t, in_dim, out_dim, c_in, c_out, act, scope)``` - Computes the linear transformation for the input tensor ```t```.
- ```distance(src, tar, c)``` - Computes the squared geodesic/distance between ```src``` and ```tar```.

Complex operations can be decomposed into basic operations explicitly or realized in tangent space implicitly.
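
The tangent-space route mentioned above can be sketched in NumPy as follows: map a point to the tangent space at the origin, apply an ordinary Euclidean function, and map the result back to the manifold. This is an illustration under textbook formulas for the stereographic model, not CurvLearn's code. The names `exp0`, `log0`, and `tangent_apply` are hypothetical, and whether `manifold.activation` is implemented this way is an assumption.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero at the origin

def tan_k(r, k):
    """Curvature-dependent tangent: tan, identity, or tanh."""
    if k > 0:
        s = np.sqrt(k)
        return np.tan(s * r) / s
    if k < 0:
        s = np.sqrt(-k)
        return np.tanh(s * r) / s
    return r

def artan_k(r, k):
    """Inverse of tan_k."""
    if k > 0:
        s = np.sqrt(k)
        return np.arctan(s * r) / s
    if k < 0:
        s = np.sqrt(-k)
        return np.arctanh(s * r) / s
    return r

def exp0(v, k):
    """Exponential map at the origin: tangent vector -> point on the manifold."""
    n = max(np.linalg.norm(v), EPS)
    return tan_k(n, k) * v / n

def log0(x, k):
    """Logarithmic map at the origin: point on the manifold -> tangent vector."""
    n = max(np.linalg.norm(x), EPS)
    return artan_k(n, k) * x / n

def elu(v):
    """Plain Euclidean ELU."""
    return np.where(v > 0, v, np.expm1(np.minimum(v, 0.0)))

def tangent_apply(x, fn, k):
    """Apply a Euclidean function fn to a manifold point via the tangent space at the origin."""
    return exp0(fn(log0(x, k)), k)

k = -1.0
x = exp0(np.array([0.5, -1.2]), k)   # a point inside the unit ball
y = tangent_apply(x, elu, k)         # ELU applied "implicitly" in tangent space
print(x, y)
```

For `k == 0` both maps are the identity, so `tangent_apply` reduces to calling `fn` directly.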

Here we use two fully-connected layers as our model backbone.

In [6]:
x = manifold.concat([sparse_embedding, dense_embedding], axis=1, c=curvature)
x = manifold.linear(x, 16, 256, curvature, curvature, tf.nn.elu, 'hidden_layer_1')
x = manifold.linear(x, 256, 32, curvature, curvature, tf.nn.elu, 'hidden_layer_2')



Because non-Euclidean geometry can only be expressed through geodesics, we use the Fermi-Dirac decoder to turn the distance to the origin into a probability. Cross entropy is used as the loss function.

In [7]:
origin = manifold.proj(tf.zeros([32], dtype=manifold.dtype), c=curvature)
distance = tf.squeeze(manifold.distance(x, origin, c=curvature))
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=1.0 - 1.0*distance))
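
The decoder in the cell above can be checked with a small NumPy sketch. One way to read the logits `1.0 - 1.0*distance` is as a Fermi-Dirac decoder with radius `r = 1` and temperature `t = 1`. The parameter names `r` and `t` are assumptions introduced here, since the cell hard-codes both constants.

```python
import numpy as np

def fermi_dirac(distance, r=1.0, t=1.0):
    """Fermi-Dirac decoder: probability decreases as distance grows past r."""
    return 1.0 / (np.exp((distance - r) / t) + 1.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_logits(labels, logits):
    """Numerically stable binary cross entropy computed from logits."""
    return np.mean(
        np.maximum(logits, 0.0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    )

distance = np.array([0.0, 0.5, 1.0, 3.0])
labels = np.array([1.0, 1.0, 0.0, 0.0])

prob = fermi_dirac(distance)
logits = 1.0 - distance  # same expression as in the tutorial cell

# Cross entropy computed directly from the decoder probabilities.
bce_from_prob = -np.mean(labels * np.log(prob) + (1.0 - labels) * np.log(1.0 - prob))

print(prob)
print(bce_from_logits(labels, logits), bce_from_prob)
```

The decoder output equals `sigmoid(1 - distance)`, so passing `1 - distance` as logits to a sigmoid cross entropy gives the same loss as applying the decoder first and then taking the cross entropy.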

CurvLearn currently supports the following optimizers:
- ```curvlearn.optimizers.rsgd``` - Riemannian stochastic gradient optimizer.
- ```curvlearn.optimizers.radagrad``` - Riemannian Adagrad optimizer.
- ```curvlearn.optimizers.radam``` - Riemannian Adam optimizer.
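
As a rough illustration of what a Riemannian optimizer does differently, the sketch below performs one step of plain Riemannian SGD in the stereographic model: rescale the Euclidean gradient by the inverse metric, then move along the geodesic with the exponential map. It uses textbook formulas and hypothetical helper names (`conformal_factor`, `exp_map`, `rsgd_step`). It is neither RAdam nor CurvLearn's implementation, and the sign convention (`k < 0` hyperbolic) is an assumption.

```python
import numpy as np

def tan_k(r, k):
    """Curvature-dependent tangent: tan, identity, or tanh."""
    if k > 0:
        s = np.sqrt(k)
        return np.tan(s * r) / s
    if k < 0:
        s = np.sqrt(-k)
        return np.tanh(s * r) / s
    return r

def mobius_add(x, y, k):
    """Mobius addition in the kappa-stereographic model."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 - 2 * k * xy - k * y2) * x + (1 + k * x2) * y
    den = 1 - 2 * k * xy + k ** 2 * x2 * y2
    return num / den

def conformal_factor(x, k):
    """Scale factor lambda_x of the metric g_x = lambda_x^2 * I."""
    return 2.0 / (1.0 + k * np.dot(x, x))

def exp_map(x, v, k):
    """Move from x along the geodesic with initial velocity v."""
    n = np.linalg.norm(v)
    if n == 0.0:
        return x
    lam = conformal_factor(x, k)
    return mobius_add(x, tan_k(lam * n / 2.0, k) * v / n, k)

def rsgd_step(x, euclidean_grad, lr, k):
    """One Riemannian SGD step: inverse-metric rescaling, then exponential map."""
    lam = conformal_factor(x, k)
    riemannian_grad = euclidean_grad / lam ** 2
    return exp_map(x, -lr * riemannian_grad, k)

def loss(p, target):
    """Toy objective used only to produce a gradient."""
    return np.sum((p - target) ** 2)

k = -1.0
x = np.array([0.3, 0.4])
target = np.array([0.1, -0.2])
grad = 2.0 * (x - target)            # Euclidean gradient of the toy loss at x
x_new = rsgd_step(x, grad, lr=0.1, k=k)
print(x_new, loss(x, target), loss(x_new, target))
```

Because the update follows a geodesic, the new point stays on the manifold without an extra projection step. For `k == 0` the step reduces to an ordinary gradient step, up to the constant factor that comes from the metric scale.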

Here we apply the Riemannian Adam optimizer to minimize the loss.

In [8]:
from curvlearn.optimizers import RAdam
optimizer = RAdam(learning_rate=learning_rate, manifold=manifold, c=curvature)
train_op = optimizer.minimize(loss)

Now a non-Euclidean binary classification model is built successfully.

Let's check the performance!

In [9]:
ops = [train_op, curvature, loss] + tf.get_collection(tf.GraphKeys.UPDATE_OPS)

batch_idx = 0
global_init = tf.global_variables_initializer()
local_init = tf.local_variables_initializer()
cp = tf.ConfigProto()
cp.gpu_options.allow_growth = True

with tf.Session(config=cp) as sess:
    sess.run([global_init, local_init])
    while True:
        try:
            batch_idx += 1
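            # Fetch into a new name so the `loss` tensor is not shadowed by its value;
            # slicing ignores any extra results from UPDATE_OPS.
            _, c, loss_value = sess.run(ops)[:3]
            if batch_idx % log_steps == 1:
                print('No.{} batches, curvature {}, loss {}'.format(batch_idx, c, loss_value))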

        except tf.errors.OutOfRangeError:
            print('Finish train')
            break









No.1 batches, curvature 0.0009999870089814067, loss 4.15216064453125
No.101 batches, curvature 0.015087426640093327, loss 0.685605525970459
No.201 batches, curvature 0.01416726689785719, loss 0.6863309144973755
No.301 batches, curvature 0.012791029177606106, loss 0.687471866607666
No.401 batches, curvature 0.011037757620215416, loss 0.683305561542511
No.501 batches, curvature 0.008938672952353954, loss 0.6784000396728516
No.601 batches, curvature 0.006404845044016838, loss 0.6809292435646057
No.701 batches, curvature 0.003494514850899577, loss 0.6842485666275024
No.801 batches, curvature -0.00011886192805832252, loss 0.6720203161239624
No.901 batches, curvature -0.00437968410551548, loss 0.6828656196594238
No.1001 batches, curvature -0.009167192503809929, loss 0.6769402027130127
No.1101 batches, curvature -0.014364367350935936, loss 0.6710186004638672
No.1201 batches, curvature -0.01929965801537037, loss 0.6705893278121948
No.1301 batches, curvature -0.021828362718224525, loss 0.660495

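Since our dataset is generated without any geometric prior, the learned curvature stays close to zero and the space is almost Euclidean. The loss likewise stays near ln 2 ≈ 0.693, the cross entropy of a chance-level prediction on random binary labels.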

Check the performance on real datasets ([recommendation](hyperml/README.md), [link prediction](hgcn/README.md), [tree pretrain](tree_pretrain/README.md)) to see the advantages of non-Euclidean geometry.