<a href="https://colab.research.google.com/github/GDS-Education-Community-of-Practice/DSECOP/blob/main/Intro_to_Deep_Learning/12_Using_Deep_Learning_to_Find_Hot_Jupiters.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lecture XII: Using Deep Learning to Find Hot-Jupiters
## Lecture X-Solution

The first step is to read and clean the data. Here, I use pandas to read the file `NASAExoplanetsData.csv`. We do not need all of its columns: as mentioned in Lecture X, we only keep the 7 input quantities plus the `label` column. It is essential to normalize the inputs before feeding them to a neural network, so inside the loop over `parameters` every input column is mean-normalized, $(x - \bar{x}) / (x_{max} - x_{min})$. The `label` column, which holds the type of the exoplanet, must stay 0 or 1 and is not normalized. Next, the data are split at random into a training set and a test set with a ratio of 6:4 (the `sample` and `drop` calls). Finally, two matrices (X, Y) are built for each set as the inputs of our neural network, with one column per sample.

In [1]:
def readData():

    import pandas as pd
    import numpy as np

    df = pd.read_csv('NASAExoplanetsData.csv')#, encoding = 'utf-8')
    
    # read file: NASAExoplanet_header.txt to understand the header of the data
    # (the with-block closes the file once the lines are read)
    with open('NASAExoplanet_header.txt', 'r') as f:
        lines = f.readlines()

    header = {}
    for i, l in enumerate(lines):
        parts = l.split('\t')
        header[parts[2].rstrip('\n')] = parts[1]

    # define the parameters that we need, use the file: NASAExoplanet_header to find which columns of data you need
    parameters = ['Orbital Period [days]', 'Planetary Radius [Earth radii]', 
                  'Equilibrium Temperature [K]', 
                  'Stellar Surface Gravity [log10(cm/s**2)]', 
                  'Stellar Effective Temperature [K]','label', 
                  'Transit Duration [hrs]', 'Stellar Radius [Solar radii]']

    # store needed data
    data = pd.DataFrame()
    for p in parameters:
        
        if p != 'label':
       
            data[p]  = (df[header[p]] - df[header[p]].mean()) / (df[header[p]].max() - df[header[p]].min()) #normalizing data
        else:
            data[p] = df[header[p]]
        
    # randomly split data in two sets with ratio 6:4  
    trainingSet = data.sample(frac = 0.6) 
    testSet = data.drop(trainingSet.index)
    
    # make the matrices X and Y for the training set
    dataX = trainingSet.loc[:, trainingSet.columns != 'label'] 
    dataY = trainingSet['label']
    
    X = np.atleast_2d(dataX.to_numpy()).T
    Y = np.atleast_2d(dataY.to_numpy())
    
    # make the matrices X and Y for the test set
    dataX_test = testSet.loc[:, testSet.columns != 'label']
    dataY_test = testSet['label']
    
    X_test = np.atleast_2d(dataX_test.to_numpy()).T
    Y_test = np.atleast_2d(dataY_test.to_numpy())
    
    
    return X, Y, X_test, Y_test

X_train, Y_train, X_test, Y_test = readData()
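
The normalization and the 6:4 split can be tried on a small table before touching the real file. The sketch below is illustrative only: the column names and values are made up, and `random_state` is added just so the split is repeatable (the function above does not fix a seed).

```python
import numpy as np
import pandas as pd

# Toy table standing in for the exoplanet data (hypothetical values).
toy = pd.DataFrame({
    "period": [1.0, 2.0, 3.0, 4.0, 10.0],
    "radius": [10.0, 12.0, 11.0, 1.0, 2.0],
    "label": [1, 1, 1, 0, 0],
})

features = [c for c in toy.columns if c != "label"]

# Mean normalization: subtract the column mean, divide by the column range.
normalized = toy.copy()
normalized[features] = (toy[features] - toy[features].mean()) / (
    toy[features].max() - toy[features].min()
)

# 6:4 random split; random_state is fixed only to make the sketch repeatable.
train = normalized.sample(frac=0.6, random_state=0)
test = normalized.drop(train.index)

# Features as rows, samples as columns, matching the (n_0, m) layout used here.
X = train[features].to_numpy().T
Y = np.atleast_2d(train["label"].to_numpy())

print(X.shape, Y.shape)
```

After this normalization every input column has zero mean and a range of exactly 1, while the labels are left untouched.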

Now it's time to build our neural network. As mentioned in Lecture X, we want a neural network with 2 hidden layers with three nodes in each hidden layer. Let's write the network in a general form, with $L$ layers (the hidden layers plus the output layer) and a list of the number of nodes in each layer, $[n_0, n_1, \ldots, n_L]$. Note that $n_0$ is the number of input parameters, which in our case is 7, and $n_L$ has to be 1 (it's a binary classification problem). We use $\tanh(z)$ as the activation function for all hidden layers; the last layer uses the sigmoid function instead.

In [2]:
import numpy as np

def activation_sigmoid(z):
    return 1 /(1 + np.exp(-z))

def activation_sigmoid_der(z): # derivative of the sigmoid function which is going to be used in backpropagation (in gradient descent method)
    a = (np.exp(-z))/(1 + np.exp(-z))**2
    return a

def activation_tanh(z):
    return np.tanh(z)

def activation_tanh_der(z):   # derivative of the tanh(z) which is going to be used in backpropagation (in gradient descent method)
    a = 1 - (np.tanh(z))**2
    return a
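
A quick way to gain confidence in the two derivative functions is to compare them with a central finite difference. This is a minimal sketch: the helper `central_difference` and the shortened function names are hypothetical and not part of the lecture code. The sigmoid derivative is written as $s(1-s)$, which is algebraically the same as the expression used above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_der(z):
    # Equivalent to exp(-z) / (1 + exp(-z))**2, written via the sigmoid itself.
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_der(z):
    return 1.0 - np.tanh(z) ** 2

def central_difference(f, z, h=1e-5):
    # Symmetric finite-difference estimate of df/dz.
    return (f(z + h) - f(z - h)) / (2.0 * h)

z = np.linspace(-3.0, 3.0, 13)
sigmoid_error = np.max(np.abs(sigmoid_der(z) - central_difference(sigmoid, z)))
tanh_error = np.max(np.abs(tanh_der(z) - central_difference(np.tanh, z)))
print(sigmoid_error, tanh_error)
```

Both errors should be tiny compared with the derivatives themselves; a large value would point to a mistake in the analytic formula.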

We use the logistic regression method, in which we have
\begin{equation}
Z^{[l]} = (\omega^{[l]}) . A^{[l-1]} + b^{[l]}
\end{equation}
\begin{equation}
A^{[l]} = activation(Z^{[l]})
\end{equation}

where $l$ is the layer index ($l = 1, \ldots, L$). In each layer, we also need $\frac{dA}{dZ}$.

In [3]:
def logestic_regression(l, X, omega, b): 
    
    Z = np.dot(omega, X) + b
    if l == L: # last layer is different because of the different activation function
        
        A = activation_sigmoid(Z)
        A_prim = activation_sigmoid_der(Z)
    
    else:
        A = activation_tanh(Z)
        A_prim = activation_tanh_der(Z)
    
    return A, A_prim
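
To see how the two equations chain from layer to layer, the following sketch runs one forward pass and records the shape of each $A^{[l]}$. It assumes the layer sizes `[7, 4, 3, 1]` used later in this notebook, a hypothetical batch of `m = 5` samples, and random weights; it does not reuse the function above.

```python
import numpy as np

rng = np.random.default_rng(0)

N_L = [7, 4, 3, 1]   # nodes per layer, input layer first
L = len(N_L) - 1     # number of layers that carry weights
m = 5                # hypothetical number of samples

A = rng.normal(size=(N_L[0], m))  # stand-in for the input matrix X
shapes = []
for l in range(1, L + 1):
    omega = rng.normal(size=(N_L[l], N_L[l - 1]))
    b = np.zeros((N_L[l], 1))
    Z = omega @ A + b  # b broadcasts across the m columns
    # sigmoid on the last layer, tanh on every other layer
    A = 1.0 / (1.0 + np.exp(-Z)) if l == L else np.tanh(Z)
    shapes.append(A.shape)

print(shapes)
```

Each $A^{[l]}$ has $n_l$ rows and $m$ columns, and the final output lies strictly between 0 and 1 because of the sigmoid.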

Back propagation is the most challenging step in building a neural network. Based on Lectures IV and V, in each layer, we need to compute the derivative of the cost function with respect to $\omega^{[l]}$ and $b^{[l]}$ to modify them by using eq.1 and eq.2 in Lecture IV.

Starting with the last layer ($l = L$), what we want is:

\begin{equation}
\frac{\partial J}{\partial \omega^{[L]}} = \frac{\partial J}{\partial A^{[L]}} \frac{\partial A^{[L]}}{\partial Z^{[L]}} \frac{\partial Z^{[L]}}{\partial \omega^{[L]}} \tag{3}
\label{eq:lastlayer1}
\end{equation}

\begin{equation}
\frac{\partial J}{\partial b^{[L]}} = \frac{\partial J}{\partial A^{[L]}} \frac{\partial A^{[L]}}{\partial Z^{[L]}} \frac{\partial Z^{[L]}}{\partial b^{[L]}} \tag{4}
\label{eq:lastlayer2}
\end{equation}

Let's define $\delta^{[L]}$ as:

\begin{equation}
\delta^{[L]} = \frac{\partial J}{\partial A^{[L]}}  \frac{\partial A^{[L]}}{\partial Z^{[L]}} .\tag{5}
\end{equation}

Rewriting eq. \ref{eq:lastlayer1} and eq. \ref{eq:lastlayer2} using $\delta^{[L]}$, we have

\begin{equation}
\frac{\partial J}{\partial \omega^{[L]}} = \delta^{[L]} \frac{\partial Z^{[L]}}{\partial \omega^{[L]}} \tag{6}
\label{eq:lastlayer3}
\end{equation}

\begin{equation}
\frac{\partial J}{\partial b^{[L]}} = \delta^{[L]} \frac{\partial Z^{[L]}}{\partial b^{[L]}} \tag{7}
\label{eq:lastlayer4}
\end{equation}

Using eq.5 in Lecture IV, $\delta^{[L]}  = (A^{[L]} - Y)$.
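
The identity $\delta^{[L]} = A^{[L]} - Y$ can be checked numerically. The sketch below assumes that the cost from Lecture IV is the binary cross-entropy, which is not restated in this notebook; with a different loss the result changes. The sample values of `z` and `y` are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(z, y):
    # Per-sample binary cross-entropy as a function of the pre-activation z
    # (assumed form of the loss).
    a = sigmoid(z)
    return -(y * np.log(a) + (1.0 - y) * np.log(1.0 - a))

z = np.array([-2.0, -0.5, 0.3, 1.7])
y = np.array([0.0, 1.0, 0.0, 1.0])

h = 1e-6
numerical = (cross_entropy(z + h, y) - cross_entropy(z - h, y)) / (2.0 * h)
analytic = sigmoid(z) - y  # the claimed delta^{[L]} = A^{[L]} - Y

max_gap = np.max(np.abs(numerical - analytic))
print(max_gap)
```

The factor $\frac{\partial A^{[L]}}{\partial Z^{[L]}}$ cancels against the denominator of $\frac{\partial J}{\partial A^{[L]}}$ for this particular pairing of loss and activation, which is why no derivative of the sigmoid appears in $\delta^{[L]}$.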

For the layer just before the last one, i.e., $l = L - 1$, we want to calculate:
\begin{equation}
\frac{\partial J}{\partial \omega^{[L - 1]}} = \frac{\partial J}{\partial A^{[L]}} \frac{\partial A^{[L]}}{\partial Z^{[L]}} \frac{\partial Z^{[L]}}{\partial A^{[L - 1]}} \frac{\partial A^{[L - 1]}}{\partial Z^{[L - 1]}} \frac{\partial Z^{[L - 1]}}{\partial \omega^{[L - 1]}} \tag{8} 
\label{eq:l_1layer1}
\end{equation}

\begin{equation}
\frac{\partial J}{\partial b^{[L - 1]}} = \frac{\partial J}{\partial A^{[L]}} \frac{\partial A^{[L]}}{\partial Z^{[L]}} \frac{\partial Z^{[L]}}{\partial A^{[L - 1]}} \frac{\partial A^{[L - 1]}}{\partial Z^{[L - 1]}} \frac{\partial Z^{[L - 1]}}{\partial b^{[L - 1]}}  \tag{9}
\label{eq:l_1layer2}
\end{equation}

Based on the logistic regression method, we know that $\frac{\partial Z^{[L]}}{\partial A^{[L-1]}} = (\omega^{[L]})^{T}$ and $\frac{\partial Z^{[L - 1]}}{\partial \omega^{[L-1]}} = (A^{[L - 2]})^{T}$. Thus, using these relations and $\delta^{[L]}$ in eq. \ref{eq:l_1layer1}, we have

\begin{equation}
\frac{\partial J}{\partial \omega^{[L - 1]}} = \{[(\omega^{[L]})^{T} . \delta^{[L]}] \odot \frac{d A^{[L -1]}}{d Z^{[L - 1]}}\} . (A^{[L - 2]})^{T} \tag{10}
\label{eq:l_1layer3}
\end{equation}

where the $\odot$ operator is element-wise multiplication, also called the **Hadamard product**.

Now let's define

\begin{equation}
\delta^{[L - 1]} = ((\omega^{[L]})^{T} . \delta^{[L]}) \odot \frac{d A^{[L -1]}}{d Z^{[L - 1]}}\tag{11}
\end{equation}

So for eq. \ref{eq:l_1layer3} we have
\begin{equation}
\frac{\partial J}{\partial \omega^{[L - 1]}} = \delta^{[L - 1]} . (A^{[L - 2]})^{T} \tag{12}
\label{eq:l_1layer4}
\end{equation}


Generalizing this result, for the layer $l < L$, we have
\begin{equation}
\delta^{[l]} = [(\omega^{[l + 1]})^{T} . \delta^{[l + 1]}] \odot \frac{d A^{[l]}}{d Z^{[l]}}  \tag{13}
\label{eq:delta}
\end{equation}
\begin{equation}
\frac{\partial J}{\partial \omega^{[l]}} = \delta^{[l]} . (A^{[l - 1]})^{T} \tag{14}
\label{eq:cost_der}
\end{equation}

With a similar approach we can find $\frac{\partial J}{\partial b^{[l]}}$.
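
As a sketch of that step, and assuming the same convention as eq. \ref{eq:cost_der} (no $1/m$ factor inside the gradient), note that $Z^{[l]}$ depends on $b^{[l]}$ with unit slope, $\frac{\partial Z^{[l]}}{\partial b^{[l]}} = 1$, and that the same bias is shared by all $m$ samples. Writing $\delta^{[l]}_{j,i}$ for row $j$ (node) and column $i$ (sample) of $\delta^{[l]}$, an index notation introduced here only for illustration, the contributions of the samples add up:

\begin{equation}
\frac{\partial J}{\partial b^{[l]}_{j}} = \sum_{i=1}^{m} \delta^{[l]}_{j,i}
\end{equation}

This is the row-wise sum `np.sum(d, axis=1)` that appears in the `update` function; the $1/m$ averaging is applied there together with the learning rate.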


Therefore, in the back propagation algorithm, we take these steps:

1. for the last layer $l = L$, use eq. 5 to compute $\delta^{[L]}$,
2. then, for all the other layers $l < L$, we calculate $\delta^{[l]}$ using eq. \ref{eq:delta};
3. having $\delta^{[l]}$, we can now calculate $\frac{\partial J}{\partial \omega^{[l]}}$ and $\frac{\partial J}{\partial b^{[l]}}$;
4. finally, using eq. 1 and eq. 2 in Lecture IV, we compute the new *weight* and *bias* ($\omega^{[l]}$ and $b^{[l]}$).

Following the shapes of the matrices through this calculation can help to better understand the equations presented here.
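
One way to follow the matrix shapes is to run the $\delta$ recursion on random arrays and print what comes out. The sketch below assumes the layer sizes `[7, 4, 3, 1]` and a hypothetical batch of `m = 6` samples; the arrays are random stand-ins, so only the shapes are meaningful.

```python
import numpy as np

rng = np.random.default_rng(1)
N_L = [7, 4, 3, 1]
L = len(N_L) - 1
m = 6  # hypothetical number of samples

# Random stand-ins with the right shapes; the values themselves do not matter.
omega = {l: rng.normal(size=(N_L[l], N_L[l - 1])) for l in range(1, L + 1)}
A = {l: rng.normal(size=(N_L[l], m)) for l in range(0, L + 1)}
A_prim = {l: rng.normal(size=(N_L[l], m)) for l in range(1, L + 1)}

delta = {L: rng.normal(size=(N_L[L], m))}  # shape (n_L, m)
for l in range(L - 1, 0, -1):
    # (n_l, n_{l+1}) @ (n_{l+1}, m) -> (n_l, m), then element-wise with dA/dZ
    delta[l] = (omega[l + 1].T @ delta[l + 1]) * A_prim[l]

# delta^{[l]} (n_l, m) times A^{[l-1]}^T (m, n_{l-1}) -> (n_l, n_{l-1})
grad_shapes = {l: (delta[l] @ A[l - 1].T).shape for l in range(1, L + 1)}
print(grad_shapes)
```

Each gradient has the same shape as the weight matrix it updates, which is a useful sanity check when a transpose is misplaced.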

In [4]:
def back_propagation(l, A_prim, w, D): 
    
    delta = np.atleast_2d(np.dot(np.transpose(w), D) * A_prim)
    
    return delta
    
def update(omega, learning_rate, A, D, b, m):
    a = np.atleast_2d(A)
    d = np.atleast_2d(D)
    ad = np.dot(d, a.T)
    omega -= (learning_rate/m) * ad
    
    sigma = (np.atleast_2d(np.sum(d, axis=1))).T
    b -= (learning_rate/m) * sigma
    
    return omega, b
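
A common way to test a back propagation implementation is a numerical gradient check. The sketch below is not the lecture's code: it rebuilds a small hypothetical network (3 inputs, 4 hidden nodes, 1 output), assumes a summed binary cross-entropy cost, and compares $\delta^{[1]} . (A^{[0]})^{T}$ with central finite differences of the cost.

```python
import numpy as np

rng = np.random.default_rng(2)
N_L = [3, 4, 1]  # small hypothetical network
L = len(N_L) - 1
m = 8

X = rng.normal(size=(N_L[0], m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
omega = [None] + [rng.normal(size=(N_L[l], N_L[l - 1])) for l in range(1, L + 1)]
b = [None] + [rng.normal(size=(N_L[l], 1)) for l in range(1, L + 1)]

def forward(omega, b):
    A = [X]
    for l in range(1, L + 1):
        Z = omega[l] @ A[-1] + b[l]
        A.append(1.0 / (1.0 + np.exp(-Z)) if l == L else np.tanh(Z))
    return A

def cost(omega, b):
    # Summed binary cross-entropy (assumed loss, no 1/m factor).
    a = forward(omega, b)[-1]
    return -np.sum(Y * np.log(a) + (1.0 - Y) * np.log(1.0 - a))

# Analytic gradient of the first-layer weights from the delta recursion.
A = forward(omega, b)
delta = A[L] - Y
for l in range(L - 1, 0, -1):
    delta = (omega[l + 1].T @ delta) * (1.0 - A[l] ** 2)  # tanh'(Z) = 1 - A**2
analytic = delta @ A[0].T

# Numerical gradient by central differences, one weight at a time.
numerical = np.zeros_like(omega[1])
h = 1e-6
for idx in np.ndindex(*omega[1].shape):
    original = omega[1][idx]
    omega[1][idx] = original + h
    plus = cost(omega, b)
    omega[1][idx] = original - h
    minus = cost(omega, b)
    omega[1][idx] = original  # restore the weight
    numerical[idx] = (plus - minus) / (2.0 * h)

max_gap = np.max(np.abs(analytic - numerical))
print(max_gap)
```

If the two gradients disagree by more than a small tolerance, the usual suspects are a missing transpose or the wrong activation derivative in one of the layers.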

Having all the needed functions and data, we can now build our neural network. The first step is to initialize variables such as $\omega^{[l]}$ and $b^{[l]}$ (the initialization loop at the top of `NN`).

In [5]:
#----------------------------------------------------------------------------------------------------------------

def NN(L, N_L, n_iteration, learning_rate, X_t, Y):
    
    
    m = X_t.shape[1]
    
    
    # DEFINE VARIABLES:
    #--------------------
    A = np.zeros((L+1), dtype=list)
    A_prim = np.zeros((L+1),dtype=list)
    omega = np.zeros((L+1),dtype=list)
    b = np.zeros((L+1),dtype=list)
    delta = np.zeros((L+1),dtype=list)
    
    
    # INITIALIZATION:
    #--------------------
    for l in range(1, L+1, 1):
        
        n_o = N_L[l]       # number of nodes
        n_i = N_L[l- 1]    # number of inputs (or nodes of the previous layer)
        
        omega[l] = np.random.rand(n_o, n_i) * np.sqrt(1/n_i)
        b[l] = np.random.rand(n_o, 1)
        A[l] = np.zeros((n_o, m), dtype=float)
        A_prim[l] = np.zeros((n_o, m), dtype=float)
        delta[l] = np.zeros((n_o, m), dtype=float)
            

    for k in range(n_iteration):
        
        # FORWARD:
        #--------------------
        A[0] = X_t
        
        for l in range(1, L+1, +1):
            
            n_o = N_L[l]         # number of nodes
            #n_i = N_L[l - 1]    # number of inputs (or nodes of the previous layer)
            
           
            A[l], A_prim[l] = logestic_regression(l, A[l-1], omega[l], b[l])
            
        
        
        # BACKWARD:
        #--------------------
        for l in range(L, 0, -1):
            
            if l == L:        
                
                error = A[l] - Y
                delta[l] = np.atleast_2d(error)  # Note that if you define the loss function differently, this term would be different
                                                # for example, if you define the loss as \Sigma (A - Y)^2, delta[L] would be error * A_prim[L] (up to a constant factor)
                
            else:
                delta[l] = back_propagation(l, A_prim[l], omega[l + 1], delta[l + 1])
            
            
            
        # UPDATE:
        #--------------------
        for l in range(1, L+1, 1):
            omega[l], b[l] = update(omega[l], learning_rate, A[l-1], delta[l], b[l], m)
            
            
             
    return omega, b
#----------------------------------------------------------------------------------------------------------------    

Now it's time to test the *learned* model parameters, $\omega^{[l]}$ and $b^{[l]}$. 

In [6]:
def predict(L, omega, b, X, Y):
    
    m = X.shape[1]
    A = np.zeros((L+1), dtype=list)
    
    
    A[0] = X
    for l in range(1, L+1, 1):
        # only the activations (first return value) are needed for prediction
        A[l] = logestic_regression(l, A[l-1], omega[l], b[l])[0]
    
    # a prediction is correct when |A[L] - Y| < 0.5,
    # i.e. A[L] > 0.5 is read as 1 and A[L] < 0.5 is read as 0
    counter = np.count_nonzero(np.abs(A[L] - Y) < 0.5)
        
    correctness = (counter/m)*100 # percentage of correct results
    
    return A[L], correctness
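
The accuracy rule used in `predict` can be illustrated on a handful of made-up outputs. The arrays below are hypothetical; the sketch only shows that thresholding the sigmoid output at 0.5 and counting $|a - y| < 0.5$ give the same percentage.

```python
import numpy as np

# Hypothetical network outputs (probabilities) and true labels, both (1, m).
A_out = np.array([[0.91, 0.12, 0.55, 0.40, 0.08]])
Y_true = np.array([[1, 0, 0, 1, 0]])

# Threshold the sigmoid output at 0.5 and compare with the labels.
predicted = (A_out > 0.5).astype(int)
accuracy = np.mean(predicted == Y_true) * 100

# Same count written as |a - y| < 0.5, the form used in predict().
same = np.count_nonzero(np.abs(A_out - Y_true) < 0.5) / Y_true.shape[1] * 100

print(accuracy, same)
```

Here three of the five samples are classified correctly, so both expressions give 60%.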

Let's run the code with these hyperparameters:

- learning_rate = 0.03
- n_iteration = 5000
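- L = 3 layers (2 hidden layers plus the output layer)
- number of nodes in each layer: [7, 4, 3, 1] (note that 7 corresponds to the input layer (l=0))

Run the program 100 times and count how many runs end with a test accuracy below 80%, to see whether the code converges reliably with these hyperparameters.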

In [None]:
learning_rate = 0.03
n_iteration = 5000
L = 3
N_L = [7, 4, 3, 1]

number_of_efficieny_less_than_80 = 0.

for i in range(100):
    omega, b = NN(L, N_L, n_iteration, learning_rate, X_train, Y_train)
    output, efficieny = predict(L, omega, b, X_test, Y_test)
    
    if efficieny < 80.:
        number_of_efficieny_less_than_80 += 1
print(number_of_efficieny_less_than_80)

Try with

- learning_rate = 0.03
- n_iteration = 10000


In [708]:
learning_rate = 0.03
n_iteration = 10000
L = 3
N_L = [7, 4, 3, 1]

number_of_efficieny_less_than_80 = 0.

for i in range(100):
    omega, b = NN(L, N_L, n_iteration, learning_rate, X_train, Y_train)
    output, efficieny = predict(L, omega, b, X_test, Y_test)
    
    if efficieny < 80.:
        number_of_efficieny_less_than_80 += 1
print(number_of_efficieny_less_than_80)

4.0


Now let's repeat the procedure with the help of `keras`:

In [None]:
#import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# define the keras model
model = Sequential()
model.add(Dense(3, input_shape=(7,), activation='tanh'))
model.add(Dense(3, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))

# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit the keras model on the dataset
# (keras expects one row per sample, so the (features, samples) matrices are transposed)
model.fit(X_train.T, Y_train.T, epochs=150, batch_size=10)

# evaluate the keras model
_, accuracy = model.evaluate(X_test.T, Y_test.T)
print('Accuracy: %.2f' % (accuracy*100))