# Dive Deeper into Classification Loss Functions

- For a two-class classification problem, the last-layer activation function is sigmoid and the loss function is binary cross entropy

- For a multi-class classification problem, the last-layer activation function is softmax and the loss function is (categorical) cross entropy

- Let's learn more about the binary cross entropy and cross entropy functions

## Let's review the sigmoid and softmax functions

- Read the following blog post for 7 minutes

https://www.depends-on-the-definition.com/guide-to-multi-label-classification-with-neural-networks/
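
As a quick refresher alongside the blog post, here is a minimal NumPy sketch of both activations. The function names `sigmoid` and `softmax` and the sample scores are chosen for illustration only; they are not taken from the post.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Map a vector of scores to probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max()        # shifting by the max leaves the result unchanged and avoids overflow
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

print(sigmoid(0.0))              # 0.5
print(softmax([2.0, 1.0, 0.1]))  # three probabilities that sum to 1
```

For two classes with scores `[z, 0]`, the first softmax output equals `sigmoid(z)`, which is why sigmoid is enough for the two-class case.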

## Binary Cross Entropy in NumPy

- For each data sample in a two-class classification problem, `y_true` and `y_pred` are scalars

In [2]:
import numpy as np

def binary_cross_entropy_loss(y_pred, y):
    return (-y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred)).mean()

In [3]:
y_true = 1
y_pred = 0.9
# y_pred = 0.2

print(binary_cross_entropy_loss(y_pred, y_true))

0.10536051565782628
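
The `.mean()` in the function above only matters for a batch of samples. The sketch below applies the same formula to arrays and clips the predictions first, since a prediction of exactly 0 or 1 would otherwise produce `nan` or `inf`. The name `binary_cross_entropy_batch`, the value of `eps`, and the sample data are illustrative assumptions.

```python
import numpy as np

def binary_cross_entropy_batch(y_pred, y_true, eps=1e-7):
    """Mean binary cross entropy over a batch; eps is an illustrative clipping constant."""
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    y_true = np.asarray(y_true, dtype=float)
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return per_sample.mean()

y_true_batch = [1, 0, 1, 1]
y_pred_batch = [0.9, 0.1, 0.8, 1.0]   # the last prediction is exactly 1
print(binary_cross_entropy_batch(y_pred_batch, y_true_batch))
```

Without the clipping step, the last sample would evaluate `0 * np.log(0)`, which is `nan` in NumPy.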


## Binary Cross Entropy in TensorFlow

In [4]:
import tensorflow as tf

y_t = 1
y_p = 0.9
# y_p = 0.2

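# sigmoid_cross_entropy_with_logits expects logits, so the probability y_p is converted to a logit first
cost = tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.convert_to_tensor(y_t, dtype=tf.float32),
                                               logits=tf.convert_to_tensor(np.log(y_p/(1-y_p)), dtype=tf.float32))
binary_cross_entropy = tf.reduce_mean(cost)

# TensorFlow 2.x executes eagerly, so tf.Session is no longer needed (or available)
# The value should agree with the NumPy result above up to float32 precision
print(binary_cross_entropy.numpy())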
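
The TensorFlow cell above passes `np.log(y_p/(1-y_p))` because the function works on logits (pre-sigmoid scores), not probabilities. The NumPy sketch below shows that conversion and a numerically stable way to compute the loss from a logit, following the formula `max(x, 0) - x * z + log(1 + exp(-abs(x)))` given in the TensorFlow documentation. The helper names `logit` and `bce_from_logits` are made up for this example.

```python
import numpy as np

def logit(p):
    """Inverse of the sigmoid: turn a probability into a raw score."""
    return np.log(p / (1 - p))

def bce_from_logits(x, z):
    """Binary cross entropy for logit x and label z, in a numerically stable form."""
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

x = logit(0.9)
print(bce_from_logits(x, 1))   # close to -np.log(0.9)
```

Working on logits avoids computing `log(sigmoid(x))` directly, which loses precision for large positive or negative scores.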

## Binary Cross Entropy in Keras

In [6]:
from keras import backend as K

def f_k_binary_cross_entropy(y_tr, y_pr):
    return K.mean(K.binary_crossentropy(y_tr, y_pr), axis=-1)

y_t = [1]
y_p = [0.9]

print(K.eval(f_k_binary_cross_entropy(K.constant(y_t), K.constant(y_p))))

0.10536041


## Review of Entropy

In [7]:
import numpy as np

def entropy(p):
    H = np.array([-p[i]*np.log2(p[i]) for i in range(len(p))]).sum()
    return H
    
p = [.5, .5]
print(entropy(p))

p = [.9, .1]
print(entropy(p))

1.0
0.4689955935892812


## What if one of the probabilities is 0?

In [8]:
p= [1, 0]
print(entropy(p))

nan




## So we should modify our entropy implementation

In [9]:
import numpy as np

eps = 1e-6

def entropy(p):
    H = np.array([-p[i]*np.log2(np.clip(p[i], eps, 1-eps)) for i in range(len(p))]).sum()
    return H

In [10]:
p= [1, 0]
print(entropy(p))

1.4426957622784505e-06
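
Clipping leaves a small residual instead of an exact 0. An alternative, sketched below, is to use the convention 0 · log 0 = 0 and simply skip zero-probability entries. The name `entropy_masked` is hypothetical and is chosen so that it does not overwrite the `entropy` function above.

```python
import numpy as np

def entropy_masked(p):
    """Shannon entropy in bits, skipping zero-probability entries (0 * log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    nonzero = p > 0
    return -np.sum(p[nonzero] * np.log2(p[nonzero]))

print(entropy_masked([1, 0]))      # zero entropy, with no clipping residual
print(entropy_masked([0.5, 0.5]))  # 1.0
```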


## Cross Entropy

- For each data sample in a multi-class classification problem, `y_true` and `y_pred` are vectors
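
Written out, with $y$ standing for `y_true` and $\hat{y}$ for `y_pred` (notation chosen here for illustration), the cross entropy of one sample is:

$H(y, \hat{y}) = -\sum_{i} y_i \log(\hat{y}_i)$

When $y$ is one-hot with true class $c$, every term except one vanishes:

$H(y, \hat{y}) = -\log(\hat{y}_c)$

For example, a predicted probability of 0.4 for the true class gives a loss of $-\ln(0.4) \approx 0.916$.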

## Cross Entropy in NumPy

In [11]:
import numpy as np

eps = 1e-6

def cross_entropy(p, q):
    # Clip q away from 0 and 1 so that np.log never receives 0
    return -sum([p[i]*np.log(np.clip(q[i], eps, 1-eps)) for i in range(len(p))])

y_t = np.array([1, 0, 0, 0, 0])
y_p = np.array([0.4, 0.3, 0.05, 0.05, 0.2])
# y_p = np.array([0.98, 0.01, 0, 0, 0.01])
print(cross_entropy(y_t, y_p))

0.916290731874155


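## Cross Entropy in SciPy

In [12]:
from scipy.stats import entropy as scipy_entropy

def cross_entropy_via_scipy(x, y):
    ''' SEE: https://en.wikipedia.org/wiki/Cross_entropy'''
    # scipy's entropy(x, y) is the KL divergence KL(x || y); adding H(x) gives the cross entropy.
    # For a one-hot x, H(x) = 0, so both values coincide.
    return scipy_entropy(x) + scipy_entropy(x, y)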
    
print(cross_entropy_via_scipy(y_t, y_p))

0.9162907318741551


## Cross Entropy in Keras

In [13]:
from keras import backend as K

def keras_categorical_crossentropy(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred)

In [14]:
import numpy as np
import tensorflow as tf

#https://medium.com/activating-robotic-minds/demystifying-cross-entropy-e80e3ad54a8
# The data are from the above link
y_t = np.array([1, 0, 0, 0, 0])
y_p = np.array([0.4, 0.3, 0.05, 0.05, 0.2])
# y_p = np.array([0.98, 0.01, 0, 0, 0.01])

print(K.eval(keras_categorical_crossentropy(K.constant(y_t), K.constant(y_p))))

0.9162909


## Cross Entropy in TensorFlow

In [15]:
# Reference: https://github.com/tensorflow/tensorflow/issues/2462
y_t = np.array([1, 0, 0, 0, 0])
y_p = np.array([0.4, 0.3, 0.05, 0.05, 0.2])
# y_p = np.array([0.98, 0.01, 0, 0, 0.01])

y_pred_tf = tf.convert_to_tensor(y_p, np.float32)
y_true_tf = tf.convert_to_tensor(y_t, np.float32)
eps = 1e-6
clipped_y_pred_tf = tf.clip_by_value(y_pred_tf, eps, 1-eps)
# TensorFlow 2.x: tf.log is now tf.math.log, and eager execution replaces tf.Session
loss_tf = tf.reduce_mean(-tf.reduce_sum(y_true_tf * tf.math.log(clipped_y_pred_tf)))
print(loss_tf.numpy())

## Another implementation of Cross Entropy in TensorFlow

In [16]:
import numpy as np
import tensorflow as tf

y_t = np.array([1, 0, 0, 0, 0])
y_p = np.array([0.4, 0.3, 0.05, 0.05, 0.2])
# y_p = np.array([0.98, 0.01, 0, 0, 0.01])

# tf.losses.softmax_cross_entropy is not available under that name in TensorFlow 2.x;
# tf.nn.softmax_cross_entropy_with_logits accepts the same one-hot labels and logits.
# softmax(log(y_p)) recovers y_p because y_p already sums to 1.
cost = tf.nn.softmax_cross_entropy_with_logits(labels=tf.convert_to_tensor(y_t, dtype=tf.float32),
                                               logits=tf.convert_to_tensor(np.log(y_p), dtype=tf.float32))
print(cost.numpy())

## Activity: Would we get the same result if we switched y_t and y_p?

In [17]:
print(cross_entropy(y_t, y_p))
print(cross_entropy(y_p, y_t))

0.916290731874155
8.289306734778764
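
The two numbers can be reproduced by hand, which shows where the asymmetry comes from. The sketch below assumes the same `eps = 1e-6` clipping as the NumPy `cross_entropy` defined earlier; the variable names `forward` and `reverse` are illustrative.

```python
import numpy as np

eps = 1e-6  # same clipping constant as in the NumPy implementation
y_t = np.array([1, 0, 0, 0, 0])
y_p = np.array([0.4, 0.3, 0.05, 0.05, 0.2])

# H(y_t, y_p): only the true class contributes, so the loss is -log(0.4)
forward = -np.log(y_p[0])

# H(y_p, y_t): the four zero labels are clipped to eps and the 1 to 1 - eps,
# so 0.6 of the probability mass is multiplied by -log(1e-6), about 13.8
reverse = -(y_p[0] * np.log(1 - eps) + y_p[1:].sum() * np.log(eps))

print(forward, reverse)
```

Cross entropy weights the log of the second argument by the first, so swapping them changes which distribution is penalized for assigning (near) zero probability.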


## Question: How can we use Deep Learning (a CNN, for example) for sound classification?

- What is a spectrogram: https://towardsdatascience.com/understanding-audio-data-fourier-transform-fft-spectrogram-and-speech-recognition-a4072d228520 

- https://medium.com/x8-the-ai-community/audio-classification-using-cnn-coding-example-f9cbd272269e

- https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8605515
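
One common route is to turn the audio into a spectrogram and treat it as an image for the CNN. The sketch below computes a magnitude spectrogram with plain NumPy. The function name `spectrogram`, the frame and hop sizes, and the synthetic 1 kHz tone are all assumptions for illustration and do not come from the linked articles.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform (illustrative parameters)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rows: frequency bins, columns: time frames
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 8000                      # Hz, assumed for this toy signal
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)     # one second of a 1 kHz tone
spec = spectrogram(tone)
print(spec.shape)                       # (129, 61): frequency bins x time frames
```

The resulting 2D array can be fed to a CNN in the same way as a single-channel image, with the class label (for example, the type of sound) as the target.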

## Resource

- https://machinelearningmastery.com/cross-entropy-for-machine-learning/

$H(P) = -\sum_{x \in X} p(x) \log(p(x))$

$H(P, Q) = H(P) + KL(P \| Q)$

$H(P, Q) \neq H(Q, P)$

In [None]:
from math import log2
 
# calculate the kl divergence KL(P || Q)
def kl_divergence(p, q):
	return sum(p[i] * log2(p[i]/q[i]) for i in range(len(p)))
 
# calculate entropy H(P)
def entropy(p):
	return -sum([p[i] * log2(p[i]) for i in range(len(p))])
 
# calculate cross entropy H(P, Q)
def cross_entropy(p, q):
	return entropy(p) + kl_divergence(p, q)
 
# define data
p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]
# calculate H(P)
en_p = entropy(p)
print('H(P): %.3f bits' % en_p)
# calculate kl divergence KL(P || Q)
kl_pq = kl_divergence(p, q)
print('KL(P || Q): %.3f bits' % kl_pq)
# calculate cross entropy H(P, Q)
ce_pq = cross_entropy(p, q)
print('H(P, Q): %.3f bits' % ce_pq)
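
As a sanity check on the identities above, the cross entropy can also be computed directly from its definition and compared with `H(P) + KL(P || Q)`. This is a small sketch using only the standard library; the name `cross_entropy_direct` is hypothetical and chosen to avoid overwriting the functions in the cell above.

```python
from math import log2, isclose

def cross_entropy_direct(p, q):
    """H(P, Q) = -sum_x p(x) * log2(q(x)), computed without going through KL."""
    return -sum(p_i * log2(q_i) for p_i, q_i in zip(p, q))

p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]

h_p = -sum(p_i * log2(p_i) for p_i in p)
kl_pq = sum(p_i * log2(p_i / q_i) for p_i, q_i in zip(p, q))

print('H(P, Q) direct: %.3f bits' % cross_entropy_direct(p, q))
print(isclose(cross_entropy_direct(p, q), h_p + kl_pq))                  # True
print(isclose(cross_entropy_direct(p, q), cross_entropy_direct(q, p)))   # False
```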