# Deep Learning

## Authors
J. Brinchman, B.W. Holwerda

## Learning Goals
* Deep learning
* Follow up on the PCA analysis
* How well do the predictions line up with the truth?

## Keywords
TensorFlow, Keras, Deep Learning

## Companion Content


## Summary
This is a deep learning exercise on the same spectra as the PCA assignment. Deep learning typically needs a lot of computing power (often provided by GPUs), but this exercise is meant to be computationally cheap.

<hr>

# Deep learning on Pickles

In this notebook we will use a deep learning approach to classify stars on the basis of their spectra. For this we will use the Pickles library. Note that this library is really too small to train a proper deep learning model, but it should be sufficient for a first start.

The library we will use is *TensorFlow* with the *keras* interface in Python. 

If the imports below do not work, make sure that both are installed.

In [1]:
import numpy as np
import matplotlib.pyplot as plt

from keras.models import Sequential
from keras.layers import Dense

%matplotlib inline

## Get the spectrum library

The cells below need matplotlib, pandas and astropy to be installed.

You will also need the file `PCA_pickles_driver.py` in the directory of this assignment.

In [2]:
import PCA_pickles_driver as pp
from importlib import reload
reload(pp)

<module 'PCA_pickles_driver' from '/Users/holwerda/Dropbox/PHYS650/2021S/Assignments/Week 14 - Deep Learning/PCA_pickles_driver.py'>

In [3]:
wave, flux, dflux = pp.load_pickles_library()
t_overview = pp.load_overview_table()

We also want to limit our attention to the optical wavelength region:

In [4]:
i_use, = np.where((wave>3000) & (wave < 10000))

flux_use = flux[i_use, :]
dflux_use = dflux[i_use, :]
wave_use = wave[i_use]

X = flux_use.T
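The selection above can be tried out on a small synthetic array. This is only a sketch: `wave_demo` and `flux_demo` are made-up stand-ins for the arrays returned by `pp.load_pickles_library()`, and it assumes, as the cell above implies, that the flux array is stored as (wavelength, spectrum).

```python
import numpy as np

# Synthetic stand-in for the library (hypothetical values):
# 5 flat spectra on a wavelength grid from 1000 to 25000 in steps of 50.
wave_demo = np.arange(1000.0, 25001.0, 50.0)
flux_demo = np.ones((wave_demo.size, 5))      # shape (n_wavelengths, n_spectra)

# np.where returns a tuple of index arrays; the trailing comma unpacks the first one
i_demo, = np.where((wave_demo > 3000) & (wave_demo < 10000))

# Keep only the selected wavelengths, then transpose so that each row is one spectrum
X_small = flux_demo[i_demo, :].T

print(flux_demo.shape, X_small.shape)
```

The transpose matters because both scikit-learn and Keras expect one sample per row and one feature (here, one wavelength point) per column.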

## Dividing into test and training samples

This is a crucial step and it is important to get it right. I therefore use a `StratifiedShuffleSplit`, which creates a training sample with the same relative number of examples in each class (here, each spectral class) as the full sample.

This may still lead to overfitting for O and B stars, where we have few examples.

In [5]:
from sklearn.model_selection import StratifiedShuffleSplit

In [6]:
label = np.round(t_overview['numtype'])
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.1, random_state=0)
sss.get_n_splits(X, label )

#for train_index, test_index in sss.split(X, label):
#    print("TRAIN:", train_index, "TEST:", test_index)
#    X_train, X_test = X[train_index], X[test_index]
#    y_train, y_test = label[train_index], label[test_index]

5

In [7]:
# Get one training set
tmp = sss.split(X, label)

`StratifiedShuffleSplit` is a bit more complicated than other splitting functions. Its `split` method returns a generator over all `n_splits` pairs of training and test indices, so after calling `split` (as above) you fetch the first pair with `next`.

In [8]:
i_train, i_test = next(tmp)

In [9]:
len(i_train), len(i_test)

(117, 14)
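To see what the stratification does, here is a minimal sketch on made-up labels (`y_toy` and `X_toy` are hypothetical, not the Pickles data). It pulls the first split from the generator with the built-in `next`, which for a fresh generator does the same as `send(None)`, and then counts the members of each class in the two parts.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical labels: three classes with 60, 30 and 10 members
y_toy = np.repeat([0, 1, 2], [60, 30, 10])
X_toy = np.zeros((y_toy.size, 4))      # the feature values do not affect the split

sss_toy = StratifiedShuffleSplit(n_splits=5, test_size=0.1, random_state=0)

# split() returns a generator; next() yields the first (train, test) index pair
i_tr, i_te = next(sss_toy.split(X_toy, y_toy))

# Number of members of each class in the training and test parts
train_counts = np.bincount(y_toy[i_tr], minlength=3)
test_counts = np.bincount(y_toy[i_te], minlength=3)
print(train_counts, test_counts)
```

Both parts keep the 6:3:1 class ratio of the full sample. Note that the rarest class contributes a single test example here, which is the same situation as the sparsely populated O and B classes in the real library.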

In [10]:
n_X, n_data = X.shape
n_train = len(i_train)
n_test = len(i_test)
n_X, n_data

(131, 1399)

## Set up the keras model

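I will use a sequential model in Keras, which lets you add the layers you want one at a time. Here we stack three dense hidden layers with ReLU activation (`n_data`, 250 and 100 units) and finish with a single linear output unit, so the network predicts the numerical spectral type as a continuous value.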

In [11]:
model = Sequential()
model.add(Dense(n_data, input_dim=n_data, activation='relu'))
model.add(Dense(250, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='linear'))
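It is instructive to count how many free parameters this network has. The sketch below does the arithmetic by hand, assuming the 1399 wavelength points per spectrum found from `X.shape` above; `dense_params` and `n_features` are helper names introduced only for this illustration. In Keras itself, `model.summary()` reports the same kind of count per layer.

```python
def dense_params(n_in, n_out):
    """Number of weights plus biases in a fully connected layer."""
    return n_in * n_out + n_out

n_features = 1399   # wavelength points per spectrum, taken from X.shape above

# Input size followed by the number of units in each of the four Dense layers
layer_sizes = [n_features, n_features, 250, 100, 1]

n_params = sum(dense_params(n_in, n_out)
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
print(n_params)
```

This comes to roughly 2.3 million parameters, to be constrained by 117 training spectra, which illustrates why the library is too small for a proper deep learning model.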

In [12]:
# This is a regression on the numerical spectral type,
# so we use the mean squared error as the loss.
model.compile(optimizer='Adam',
              loss='mse')

In [13]:
t_overview.columns

<TableColumns names=('SPType','Lumclass','Metal','numtype','metflag','numlclass','file')>

In [15]:
train_X = X[i_train, :]
train_y = label[i_train]
test_X = X[i_test, :]
test_y = label[i_test]

history = model.fit(train_X, train_y, validation_data=(test_X, test_y), epochs=150, verbose=0)

In [16]:
train_mse = model.evaluate(train_X, train_y, verbose=0)
test_mse = model.evaluate(test_X, test_y, verbose=0)
print("Train= {0}.  Test= {1}".format(train_mse, test_mse))

Train= 0.08867886662483215.  Test= 1.6944500207901
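Because the model was compiled with `loss='mse'` and no extra metrics, `model.evaluate` returns the mean squared error between the labels and the predictions. The sketch below computes the same quantity with numpy on made-up numbers (the helper `mse` is introduced here for illustration) and takes the square root of the two losses printed above, which puts them in the same units as the labels.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, the quantity minimised with loss='mse'."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical labels and predictions: a single error of 2 among three samples
print(mse([1, 2, 3], [1, 2, 5]))

# Root mean square error corresponding to the losses printed above
rms_train = np.sqrt(0.08867886662483215)
rms_test = np.sqrt(1.6944500207901)
print(rms_train, rms_test)
```

The test loss is about 19 times the training loss, a sign that the network reproduces the training spectra much better than it generalises to spectra it has not seen.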


### Exercise 1

`history.history` holds the training and validation history of the fit.

1. Explore the structure of `history.history['loss']` and `history.history['val_loss']`.
2. Plot both as a function of epoch (see the cells above).
3. Can you tell how fast the algorithm is learning? At what epoch does it appear to be mostly done?

In [20]:
# student work here


In [21]:
# student work here


### Exercise 2

Zoom in on the y-axis. Does that change your view of when the algorithm has stopped learning?

In [22]:
# student work here


### Exercise 4 - Comparing predicted to observed class

You get the predicted class from `model.predict` given the $X$ input data.
Compare the true spectral classes of the test spectra (their labels, i.e. `test_y`)
to the values predicted by the algorithm. How do they compare?

In [139]:
# the predicted class for the test spectra:
y_pred = model.predict(test_X)[:,0]
np.shape(y_pred)

(14,)

In [23]:
# student work


### Exercise 5

It is usually more useful to look at a residual plot. Compute the residual between the predicted and the true class for the test spectra and plot it. Where does the classification go wrong?

In [24]:
# student work here


### Exercise 6

What could be the cause for where misclassifications occur? 

*student answer here*

### Exercise 7

Let's see how much difference the initial inputs make. If you split the sample 50/50 between training and test, what happens?
What happens if the test sample is one tenth of the starting sample of spectra?


*student answer here*