## Finite Notebook Fun for Lessons 1/2:

### 1. Checking a few things about x-entropy

Imagine a probability distribution $p$ across 5 outcomes:

In [1]:
import numpy as np

In [2]:
p = np.array([0.3,0.2,0.1,0.25,0.15])
print(p)

[0.3  0.2  0.1  0.25 0.15]


Calculate the entropy (using the natural logarithm, so the result is in nats):

In [3]:
S = -(0.3 * np.log(0.3) + 0.2 * np.log(0.2) + 0.1 * np.log(0.1) + 0.25 * np.log(0.25) + 0.15 * np.log(0.15))
print(S)

1.5444795210968603


Tedious and silly. Better way? Use vectors:

In [4]:
S = np.dot(p,-np.log(p))
print(S)

1.5444795210968603


Same result. Good. 

Now imagine a 2nd distribution $q$ which is close to $p$ but differs a bit. 
Construct it by adding a small vector $\delta$ to $p$. The entries of $\delta$ have a mean of zero (so $q$ still sums to $1$), and their scale is set by a small number $\epsilon$: 

In [5]:
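def create_q(p0, epsilon=0.03):
    # Random perturbation with entries in [0, epsilon), one per outcome
    delta = epsilon * np.random.random(len(p0))
    # Subtract the mean so the entries sum to zero and q still sums to 1
    delta = delta - np.mean(delta)

    return p0 + delta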

q = create_q(p)
print('q: ', q)

q:  [0.29191397 0.20723054 0.0899192  0.26063432 0.15030197]


What is the cross entropy?

In [6]:
ce = -np.dot(p, np.log(q))
print(ce)

1.5454834617016764


Close! Smaller or bigger than entropy?

In [7]:
print((ce - S) > 0)

True


Let's try that for a few more $q$:

In [8]:
for attempt in range(5):
    q = create_q(p)
    ce = -np.dot(p, np.log(q))
    #print(q)
    print('ce > S? ', (ce-S)>0)
    

ce > S?  True
ce > S?  True
ce > S?  True
ce > S?  True
ce > S?  True


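Indeed... $ce$ is never smaller than $S$. Here this is only 'experimentally verified'; the proof is Gibbs' inequality: the gap $ce - S = \sum_i p_i \log(p_i/q_i)$ is the Kullback-Leibler divergence, which is non-negative. $ce = S$ exactly when $q=p$, therefore minimizing $ce$ over $q$ drives $q \rightarrow p$.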
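
A minimal numeric sketch of that claim over many more random $q$. It compares the gap $ce - S$ with $\sum_i p_i \log(p_i/q_i)$ and checks that the two agree. The helper name `kl_gap`, the seed, and the number of trials are illustrative choices, not part of the original notebook:

```python
import numpy as np

p = np.array([0.3, 0.2, 0.1, 0.25, 0.15])

def kl_gap(p, q):
    """Return (ce - S, sum_i p_i log(p_i / q_i)); both should agree up to rounding."""
    S = -np.dot(p, np.log(p))        # entropy of p (nats)
    ce = -np.dot(p, np.log(q))       # cross entropy of p relative to q
    kl = np.dot(p, np.log(p / q))    # Kullback-Leibler divergence
    return ce - S, kl

rng = np.random.default_rng(0)       # seed chosen only for reproducibility
gaps = []
for _ in range(1000):
    delta = 0.03 * rng.random(len(p))
    q = p + (delta - delta.mean())   # zero-mean perturbation keeps sum(q) == 1
    gap, kl = kl_gap(p, q)
    assert np.isclose(gap, kl)       # the gap is the KL divergence
    gaps.append(gap)

print('smallest gap over all trials:', min(gaps))
print('gap at q = p:', kl_gap(p, p)[0])
```

The smallest gap should be small but not negative, and the gap at $q = p$ should vanish.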

### 2. Familiarization with Softmax

Imagine your output layer (which does not have a non-linearity) returns these 5 numbers:

In [9]:
o = 10 * np.random.random(5) - 4
print('o: ', o)

o:  [1.49658055 0.15186203 3.40171175 0.37277461 3.49266588]


Let's first exponentiate all values:

In [10]:
exp_o = np.exp(o)
print('outputs: ', exp_o)

outputs:  [ 4.46639033  1.16399963 30.01543501  1.4517571  32.87346771]


Great. All numbers are positive. But they don't sum to $1$. Simple solution: divide all by the sum:

In [11]:
sum_exp_o = np.sum(exp_o)
print('sum of all output values: ', sum_exp_o)

sum of all output values:  69.97104976778468


In [12]:
q = exp_o/sum_exp_o
print('model probabilities q: ', q)

model probabilities q:  [0.06383198 0.01663545 0.42896934 0.02074797 0.46981527]


Cool! Sums to 1 as desired. (Not a surprise.)
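
The three steps above (exponentiate, sum, divide) are the softmax function. A minimal sketch that wraps them into one helper, using the values of `o` printed above. Subtracting the maximum before exponentiating is a common numerical-stability trick that is not in the original notebook; it does not change the result because the common factor cancels in the ratio. The name `softmax` is a local helper here, not a library call:

```python
import numpy as np

def softmax(o):
    """Map real-valued outputs to probabilities that sum to 1."""
    shifted = o - np.max(o)          # avoids overflow in exp for large inputs
    exp_o = np.exp(shifted)
    return exp_o / np.sum(exp_o)

# Values copied from the printed output of `o` above
o = np.array([1.49658055, 0.15186203, 3.40171175, 0.37277461, 3.49266588])
q = softmax(o)
print('model probabilities q: ', q)
print('sum: ', q.sum())

# Inputs this large would overflow np.exp without the shift
big = softmax(np.array([1000.0, 1001.0, 1002.0]))
print('large inputs: ', big)
```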

### 3. Most Basic Keras Lingo

Simple 3-class classification model. See: https://www.tensorflow.org/guide/keras

In [13]:
import tensorflow as tf
from tensorflow.keras import layers

Setting up a model:

In [14]:
# Define 'sequential' model (vs. 'functional'... we'll discuss later.)
model = tf.keras.Sequential([
    # Adds a densely-connected layer with 8 units to the model:
    layers.Dense(8, activation='relu', input_shape=(4,)),  # '4' is the number of features
    # Add another:
    layers.Dense(8, activation='relu'),
    # Add a softmax layer with 3 output units:
    layers.Dense(3, activation='softmax'),
])


W0903 19:34:01.504240 4732507584 deprecation.py:506] From /anaconda3/envs/tf1_14/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Compiling the model, i.e., specifying the optimizer, the loss, and the metrics:

In [15]:
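# Configure a model for categorical classification.
# Note: tf.train.RMSPropOptimizer is the TensorFlow 1.x API (the warnings in this
# notebook come from a tf1_14 environment); in TensorFlow 2.x use
# tf.keras.optimizers.RMSprop(0.01) instead.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.01),
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=[tf.keras.metrics.categorical_accuracy])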
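
To connect this with section 1: for one-hot labels $y$ and model probabilities $q$, categorical cross entropy is $-\sum_i y_i \log q_i$ per sample, averaged over the batch, and categorical accuracy is the fraction of samples whose largest predicted probability falls on the labelled class. A minimal NumPy sketch of both definitions, with made-up example arrays `y_true` and `y_pred`; it illustrates the formulas only and is not Keras' actual implementation, which may differ in details such as clipping probabilities away from zero:

```python
import numpy as np

# Made-up one-hot labels and predicted probabilities for 4 samples, 3 classes
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.2, 0.5, 0.3]])

# Cross entropy per sample: only the log-probability of the true class survives
per_sample_ce = -np.sum(y_true * np.log(y_pred), axis=1)
mean_ce = per_sample_ce.mean()

# Accuracy: does the most probable class match the labelled class?
accuracy = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

print('per-sample cross entropy: ', per_sample_ce)
print('mean cross entropy: ', mean_ce)
print('accuracy: ', accuracy)
```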


Create some fake input data:

In [16]:
import numpy as np

def random_one_hot_labels(shape):
    n, n_class = shape
    classes = np.random.randint(0, n_class, n)
    labels = np.zeros((n, n_class))
    labels[np.arange(n), classes] = 1
    return labels

data = np.random.random((1000, 4))
labels = random_one_hot_labels((1000, 3))


And then train your model. (Well... there is nothing to learn here, since the labels are random by construction and carry no pattern.)

In [17]:
model.fit(data, labels, epochs=10, batch_size=32)

W0903 19:34:01.666284 4732507584 deprecation.py:506] From /anaconda3/envs/tf1_14/lib/python3.7/site-packages/tensorflow/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0xb3070cf98>

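Results surprising? They shouldn't be: with random labels across 3 classes, the accuracy stays near chance level (about $1/3$) and the loss near $\ln 3 \approx 1.10$.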
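
A minimal sketch of the chance-level baseline to compare a training run against. It rebuilds random one-hot labels the same way as `random_one_hot_labels` and scores a 'model' that always predicts the uniform distribution. The seed and the variable names are assumptions made for this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)       # seed chosen only for reproducibility
n, n_class = 1000, 3

# Random one-hot labels, as in random_one_hot_labels above
classes = rng.integers(0, n_class, n)
labels = np.zeros((n, n_class))
labels[np.arange(n), classes] = 1

# A model that has learned nothing: uniform probabilities for every sample
uniform = np.full((n, n_class), 1.0 / n_class)
baseline_loss = -np.mean(np.sum(labels * np.log(uniform), axis=1))

# Accuracy of always guessing the most frequent class
baseline_acc = labels.mean(axis=0).max()

print('baseline loss: ', baseline_loss, ' ln(3): ', np.log(3))
print('baseline accuracy: ', baseline_acc)
```

A trained model that ends up far from these two numbers on random labels is more likely memorizing noise than learning a pattern.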