# Visit loss and Normalization
Visit loss implicitly normalizes along classes, i.e. it assumes that the unsupervised samples have the same class distribution as the supervised samples.

This can be a problem in the following cases:
- Settings with few supervised samples, where we do not want to equalize class counts by resampling when creating batches, as in active learning.
- Settings with many classes, where a single batch cannot cover all classes.

This problem is made worse by the fact that the labels of the unsupervised samples are unknown, so one cannot draw class-balanced batches from the unsupervised training data.

Examples of this normalization:

#### unbalanced supervised samples (active learning)
1 supervised sample of class A, 2 supervised samples of class B, and 2 unsupervised samples. Rows of $P_{ab}$ are supervised samples, columns are unsupervised samples:

\begin{equation*}
P_{ab} =  \begin{bmatrix}
1 & 0 \\
0 & 1 \\
0 & 1
\end{bmatrix}
\end{equation*}

-> The visit probability is (0.33, 0.67), but p_target is (0.5, 0.5). Visit loss therefore penalizes this assignment, although each supervised sample walks to exactly one unsupervised sample.
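
The mismatch can be checked numerically. This is a minimal NumPy sketch, assuming (as in the notebook cells further down) that the visit probability is the column-wise mean of $P_{ab}$. The helper name `visit_probability` is introduced here for illustration only.

```python
import numpy as np

def visit_probability(p_ab):
    """Column-wise mean of P_ab: how often each unsupervised sample is visited."""
    return np.asarray(p_ab, dtype=float).mean(axis=0)

# 1 supervised sample of class A, 2 of class B, 2 unsupervised samples
P_ab = np.array([[1, 0],
                 [0, 1],
                 [0, 1]])

visit = visit_probability(P_ab)
# Uniform target over the unsupervised samples
target = np.full(P_ab.shape[1], 1.0 / P_ab.shape[1])

print(visit)   # roughly (1/3, 2/3)
print(target)  # (0.5, 0.5)
```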


#### unbalanced unsupervised samples
1 supervised sample of class A, 1 supervised sample of class B, and 4 unsupervised samples. By (bad) luck, the unsupervised batch contains 1 sample of A and 3 of B, so a good model yields the following $P_{ab}$:

\begin{equation*}
P_{ab} =  \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0.33 & 0.33 & 0.33
\end{bmatrix}
\end{equation*}

-> The visit probability is (0.5, 0.165, 0.165, 0.165), but p_target is (0.25, 0.25, 0.25, 0.25). Visit loss therefore penalizes a model that assigns every unsupervised sample to the correct class.
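
To put a number on this mismatch, the sketch below assumes that visit loss is the cross-entropy between the uniform target and the visit probability, which is the usual formulation but is not spelled out in this notebook. The function name `visit_loss` and the small `eps` are illustrative choices.

```python
import numpy as np

def visit_loss(p_ab, eps=1e-8):
    """Cross-entropy between a uniform target and the visit probability of P_ab."""
    visit = np.asarray(p_ab, dtype=float).mean(axis=0)
    target = np.full(visit.shape, 1.0 / visit.size)
    return float(-np.sum(target * np.log(visit + eps)))

# Correct model, but the unsupervised batch holds 1 sample of A and 3 of B
unbalanced = np.array([[1, 0, 0, 0],
                       [0, 1/3, 1/3, 1/3]])

# Hypothetical walk that visits every unsupervised sample equally often
uniform = np.full((2, 4), 0.25)

loss_unbalanced = visit_loss(unbalanced)
loss_uniform = visit_loss(uniform)

print(loss_unbalanced)  # above the minimum
print(loss_uniform)     # the minimum, close to log(4)
```

Under this assumption the correct assignment has a higher loss than the uniform walk, so minimizing visit loss would pull the model away from it.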


In [2]:
import numpy as np
import tensorflow as tf

In [3]:
# Simple case: unbalanced supervised samples (1 x class A, 2 x class B)
P_ab = np.asarray([[1,0], [0,1],[0,1]])
P_ba = np.asarray([[1,0,0],[0,1,0]])
labels_raw = np.asarray([0,1,1], np.int64)
num_classes = 2

# Simple case with unbalanced unsupervised samples.
# This overrides the case above; the outputs below were produced with it.
P_ab = np.asarray([[1,0,0,0], [0,0.33,0.33,0.33]])
P_ba = np.asarray([[1,0],[0,1],[0,1],[0,1]])
labels_raw = np.asarray([0,1], np.int64)
num_classes = 2

# Longer case (uncomment to use)
#P_ab = np.asarray([[0.4,0.6,0,0], [0,0,0,1], [0.6,0.4,0,0],[0,0,1,0]])
#labels_raw = np.asarray([0,1,0,1], np.int64)
#num_classes = 2

# Case where a class has no samples (uncomment to use)
#P_ab = np.asarray([[0.4,0.6,0,0], [0,0,0,1], [0.6,0.4,0,0],[0,0,1,0]])
#labels_raw = np.asarray([0,1,0,1], np.int64)
#num_classes = 3

In [4]:
# current implementation: visit probability is the column-wise mean of P_ab

p = tf.placeholder(shape=[None, None], dtype=tf.float32)

visit_probability = tf.reduce_mean(
        p, [0], keep_dims=True, name='visit_prob')

sess = tf.InteractiveSession()
print(sess.run([visit_probability], {p: P_ab}))

[array([[ 0.5       ,  0.16500001,  0.16500001,  0.16500001]], dtype=float32)]


### Class-normalized visit loss

Visit loss assumes an equal class distribution among supervised samples. If that is not true, this assumption can be removed by scaling each row of P_ab by the inverse of the size of its class, and then using the sum instead of the mean to calculate the visit probability:

In [5]:
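# class normalized

# from semisup loss: p_target[i, j] is 1 / (size of the class of sample i)
# if supervised samples i and j share a label, else 0
labels = tf.placeholder(shape=[None,], dtype=tf.float32)
equality_matrix = tf.equal(tf.reshape(labels, [-1, 1]), labels)
equality_matrix = tf.cast(equality_matrix, tf.float32)
p_target = (equality_matrix / tf.reduce_sum(
    equality_matrix, [1], keep_dims=True))
# the diagonal holds one weight per supervised sample: 1 / (size of its class)
scale_f = tf.diag_part(p_target)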

p = tf.placeholder(shape=[None, None], dtype=tf.float32)
p_norm = tf.transpose(tf.multiply(tf.transpose(p), scale_f))

visit_probability = tf.reduce_sum(
        p_norm, [0], keep_dims=True, name='visit_prob') 

# renormalize so that the visit probability sums to 1
visit_probability = visit_probability * (1 / tf.reduce_sum(visit_probability))

sess = tf.InteractiveSession()
print(sess.run([p_norm, visit_probability], {labels: labels_raw, p: P_ab}))

[array([[ 1.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.33000001,  0.33000001,  0.33000001]], dtype=float32), array([[ 0.50251257,  0.16582915,  0.16582915,  0.16582915]], dtype=float32)]
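
The output above uses the case with balanced supervised samples, where class normalization changes little. The following NumPy sketch applies the same idea to the unbalanced supervised example (1 sample of class A, 2 of class B). The function name is illustrative, and the logic mirrors the TensorFlow cell rather than reusing it.

```python
import numpy as np

def class_normalized_visit_probability(p_ab, labels):
    """Weight each supervised row by 1 / (size of its class), sum, renormalize."""
    p_ab = np.asarray(p_ab, dtype=float)
    labels = np.asarray(labels)
    # class_counts[i] = number of supervised samples sharing the label of sample i
    class_counts = (labels[:, None] == labels[None, :]).sum(axis=1)
    weighted = p_ab / class_counts[:, None]
    visit = weighted.sum(axis=0)
    return visit / visit.sum()

P_ab = np.array([[1, 0],
                 [0, 1],
                 [0, 1]])
labels = np.array([0, 1, 1])

plain = P_ab.mean(axis=0)
normalized = class_normalized_visit_probability(P_ab, labels)

print(plain)       # roughly (1/3, 2/3)
print(normalized)  # [0.5 0.5]
```

In this example the class-normalized visit probability matches the uniform target, whereas the plain mean does not.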


# Proximity loss
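Visit loss assumes an equal class distribution among unsupervised samples. If that is not true (e.g. with a small unsupervised batch size and a large number of classes), this assumption can be removed by computing the visit probability from the round-trip matrix $P_{bab} = P_{ba} P_{ab}$ instead of $P_{ab}$: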

In [6]:
p_ab = tf.placeholder(shape=[None, None], dtype=tf.float32)
p_ba = tf.placeholder(shape=[None, None], dtype=tf.float32)

p_bab = tf.matmul(p_ba, p_ab, name='p_bab')

visit_probability = tf.reduce_mean(p_bab, [0], name='visit_prob_bab', keep_dims=True)

sess = tf.InteractiveSession()
print(sess.run([visit_probability], {p_ab: P_ab, p_ba: P_ba}))

[array([[ 0.25  ,  0.2475,  0.2475,  0.2475]], dtype=float32)]
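
The 0.2475 entries come from rounding 1/3 to 0.33 in `P_ab`. As a cross-check, here is a NumPy sketch of the same computation with exact thirds. The matrices are otherwise the ones from the unbalanced unsupervised case, and the variable names are illustrative.

```python
import numpy as np

# Supervised -> unsupervised walk (2 supervised, 4 unsupervised samples)
P_ab = np.array([[1, 0, 0, 0],
                 [0, 1/3, 1/3, 1/3]])
# Unsupervised -> supervised walk
P_ba = np.array([[1, 0],
                 [0, 1],
                 [0, 1],
                 [0, 1]])

# Round trip: unsupervised -> supervised -> unsupervised
P_bab = P_ba @ P_ab

visit_ab = P_ab.mean(axis=0)    # depends on the class mix of the unsupervised batch
visit_bab = P_bab.mean(axis=0)  # every unsupervised sample starts one walk

print(visit_ab)
print(visit_bab)  # [0.25 0.25 0.25 0.25]
```

For this example the visit probability computed from $P_{bab}$ equals the uniform target exactly, while the one computed from $P_{ab}$ does not.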
