# Detecting Epileptic Seizures through EEG Data: Part 2

In this project, we'll be trying to figure out whether a person is experiencing a seizure from their EEG reading.

## Goals for today:
1. Introduction to signal processing
2. Create EEG spectrograms
3. Understand what a CNN is
4. Use a CNN to classify EEG spectrograms
5. Limitations of CNNs for time series data

In [None]:
#@title ##Import libraries and create helper functions!

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay
from sklearn import model_selection
from sklearn.model_selection import train_test_split

import tensorflow as tf

!pip install hypopt
from hypopt import GridSearch

import keras
from keras.models import Sequential
from keras.layers import Activation, MaxPooling2D, Dropout, Flatten, Reshape, Dense, Conv2D, GlobalAveragePooling2D
from keras.wrappers.scikit_learn import KerasClassifier
import keras.optimizers as optimizers
from keras.callbacks import ModelCheckpoint
monitor = ModelCheckpoint('./model.hdf5', 
                          monitor='val_accuracy', 
                          verbose=0, 
                          save_best_only=True, 
                          save_weights_only=False, 
                          mode='auto', 
                          save_freq='epoch')

import gdown

## Utils function to combine 23 chunks from the same patient into one big chunk
# @author Siyi Tang
def prepare_data(eeg_df):
  file_names = eeg_df['Unnamed: 0'].tolist()

  subject_ids = []
  chunk_ids = []
  for fn in file_names:
    subject_ids.append(fn.split('.')[-1])
    chunk_ids.append(fn.split('.')[0])
  subject_ids = list(set(subject_ids))
  assert len(subject_ids) == 500

  sub2ind = {}
  for ind, sub in enumerate(subject_ids):
    sub2ind[sub] = ind

  eeg_combined = np.zeros((500, int(178*23)))
  labels_combined = np.zeros(500)
  labels_chunks = np.zeros((500, 23))
  labels_dict = {}
  for i in range(len(eeg_df)):
    fn = eeg_df.iloc[i]['Unnamed: 0']
    subject_id = fn.split('.')[-1]
    subject_ind = sub2ind[subject_id]

    chunk_id = int(fn.split('.')[0].split('X')[-1])
    start_idx = (chunk_id - 1) * 178
    end_idx = start_idx + 178
    eeg_combined[subject_ind, start_idx:end_idx] = eeg_df.iloc[i].values[1:-1]

    if subject_id not in labels_dict:
      labels_dict[subject_id] = []
    labels_dict[subject_id].append(eeg_df.iloc[i].values[-1])

  for sub_id, labels in labels_dict.items():
    sub_ind = sub2ind[sub_id]
    is_seizure = int(np.any(np.array(labels) == 1))
    labels_combined[sub_ind] = is_seizure
    labels = np.array(labels)
    labels = np.where(labels>1, 0, labels)
    labels_chunks[sub_ind,:] = labels

  return eeg_combined, labels_combined, labels_chunks


def plot_acc(history, ax = None, xlabel = 'Epoch #'):
  # Plot training and validation accuracy per epoch and mark the epoch with the best validation accuracy.
  history = history.history
  history.update({'epoch':list(range(len(history['val_accuracy'])))})
  history = pd.DataFrame.from_dict(history)

  best_epoch = history.sort_values(by = 'val_accuracy', ascending = False).iloc[0]['epoch']

  if not ax:
    f, ax = plt.subplots(1,1)
  sns.lineplot(x = 'epoch', y = 'val_accuracy', data = history, label = 'Validation', ax = ax)
  sns.lineplot(x = 'epoch', y = 'accuracy', data = history, label = 'Training', ax = ax)
  ax.axhline(0.5, linestyle = '--',color='red', label = 'Chance')
  ax.axvline(x = best_epoch, linestyle = '--', color = 'green', label = 'Best Epoch')  
  ax.legend(loc = 1)    
  ax.set_ylim([0.4, 1])

  ax.set_xlabel(xlabel)
  ax.set_ylabel('Accuracy (Fraction)')
  
  plt.show()


In [None]:
#@title Run this to get the x and y datasets we created last class

data_path = 'https://storage.googleapis.com/inspirit-ai-data-bucket-1/Data/Deep%20Dives/AI%20%2B%20Healthcare/Projects%20(Session%206%2B)/Seizure%20Prediction%20/data.csv'
uci_epilepsy = './uci_epilepsy'
gdown.download(data_path, uci_epilepsy, False)


EEG = pd.read_csv(uci_epilepsy)
eeg, labels, __ = prepare_data(EEG)
x = eeg.astype('float')
y = labels.astype('float')



In [None]:
len(eeg[0])

## Introduction to the Frequency Domain
So far, we've been looking at our data in the time domain. That means we're looking at our EEG as a series of values in time. (In other words, if we plot our EEG, the x-axis, or the *domain* of the plot, has units of time!)

We can also choose to explore our data in the *frequency domain*. In the frequency domain, the x axis of our plot is the frequency of the signal. 

To understand the frequency domain, we need to start by understanding the composition of our EEG data.






**This next part is going to get a little technical! Don't worry if it doesn't completely make sense right now - you can still complete this project without fully understanding this part.**

### Introduction to Waves

EEG data is a waveform - a signal that consists of a single amplitude that changes in time. (Sound, radio signals, and seismic waves are also waveforms!)

The simplest type of waveform is a sinusoidal wave. A sinusoidal wave is a function made of the sum of sine and cosine functions. The most basic sinusoid function is just a single sine wave. A generic sine wave is shown below:

![](https://drive.google.com/uc?id=1BB_0nwPMvDEkDL9vM-HbYUelHnjYU6vg)


The important parts of this wave are the **frequency** (how quickly the wave alternates in time) and the **amplitude** (how tall the wave is). This wave is mathematically represented by the formula:

$y(t) = A\sin\left(\dfrac{2\pi}{\lambda}t + \phi\right)$

In the wave pictured above, $\phi = 0$, but $\phi$ can also be non-zero. A non-zero value of $\phi$ shifts the wave to the left (if $\phi > 0$) or to the right (if $\phi < 0$), as shown below:

![](https://drive.google.com/uc?id=1gp1HjKRuUO-VOBfDeJeuMD4VF8X_FA1X)

(In the above picture, $\dfrac{2\pi}{\lambda}$ is simplified to $\omega$ but it's the same form of wave!) $\phi$ is known as the **phase** of the wave.

These three variables (amplitude, frequency, and phase) determine the shape of the sine wave.
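To make the three parameters concrete, here is a minimal NumPy sketch of the formula above. It assumes that $\lambda$ plays the role of the wave's period (so the frequency is $1/\lambda$); the function name `sine_wave` and the specific numbers are made up for illustration.

```python
import numpy as np

def sine_wave(t, amplitude=1.0, period=1.0, phase=0.0):
    """Evaluate y(t) = A * sin(2*pi/period * t + phase)."""
    return amplitude * np.sin(2 * np.pi / period * t + phase)

# Two seconds of time points, spaced 1 ms apart
t = np.linspace(0.0, 2.0, 2001)

# Amplitude 2, period 0.5 s (a frequency of 1 / 0.5 = 2 Hz), no phase shift
base = sine_wave(t, amplitude=2.0, period=0.5)

# Same wave with a positive phase of pi/2, which moves it to the left
shifted = sine_wave(t, amplitude=2.0, period=0.5, phase=np.pi / 2)

print("Tallest point of the base wave:", base.max())
print("Base wave at t = 0:", base[0])
print("Shifted wave at t = 0:", shifted[0])
```

The base wave starts at zero, while the shifted wave already sits at its peak at $t = 0$: a positive phase has pulled the whole curve to the left.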


If you're not familiar with waves, you might consider going over [these](https://www.mathsisfun.com/physics/waves-introduction.html) [two](https://www.mathsisfun.com/algebra/amplitude-period-frequency-phase-shift.html) quick primers from Math is Fun!

### Representing EEG waves as Sine Waves

It's cool that we can represent a wave as a simple mathematical formula, but our EEG data looks a lot messier than the waves we saw above! For reference, here's the EEG we plotted in Colab 1: 

![](https://drive.google.com/uc?id=1G3KK6LBKX1Iiz4dV2nt2LXnWSuWkv3IU)


This looks very different from our simple sine waves, but it turns out that we can represent it pretty well as a sum of sinusoids - a bunch of sinusoids with different amplitudes, different phases, and different frequencies all added on top of each other!

Here's an example of how this process can work:

![](https://drive.google.com/uc?id=1JIguickzYUJ3Y4N_8VQDlwJ0jwQIC3q8)

The red wave at the bottom is the sum of the blue and green waves.

This is still a pretty simple waveform, but it turns out if we add hundreds of different sine waves (all with different amplitudes, frequencies, and phases), then we can accurately represent most EEG signals!



### So... how does this relate to the frequency domain?

If we know that an EEG is formed as the sum of sine waves with different frequencies, we can plot those sine waves on a frequency axis. 

This is done in the picture below: we have one signal that is formed as the sum of two sinusoids. We can represent that signal by projecting it onto the time domain (the way we've been looking at it so far!) or we can represent that signal by projecting it onto the frequency domain (the projection pictured on the right side of the picture).

![](https://drive.google.com/uc?id=1wO2Lw0JLdCZNP1dk3Lggy9Pes8tq1cLy)

This signal looks a lot simpler in the frequency domain! It can be described as just two spikes at two different frequencies. The frequency domain can be a really useful way for us to represent signals, for purposes of clarity, compression, filtering, and, in our case, classification!
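The "two spikes" idea can be checked numerically. The sketch below builds a signal from two sine waves and uses NumPy's fast Fourier transform to move it into the frequency domain. The sampling rate, frequencies, and amplitudes are assumptions chosen for the example, not values from the EEG dataset.

```python
import numpy as np

fs = 200.0                    # assumed sampling rate in Hz
t = np.arange(400) / fs       # 400 samples = 2 seconds

# Sum of a 5 Hz wave (amplitude 1.0) and a 20 Hz wave (amplitude 0.5)
signal = 1.0 * np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# rfft gives the frequency content of a real-valued signal
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# Rescale so each spike's height matches the amplitude of its sine wave
amplitudes = 2 * np.abs(spectrum) / len(signal)

# The two largest spikes sit at the two frequencies we mixed together
top_two = np.sort(freqs[np.argsort(amplitudes)[-2:]])
print("Strongest frequencies (Hz):", top_two)
```

Every other frequency bin is essentially zero, which is why the frequency-domain picture of this signal is so much simpler than its time-domain picture.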

We're going to look at a specific way to project a signal into the frequency domain, called a spectrogram.

### Introduction to Spectrograms


There's only one problem with representing our EEG signals in the frequency domain, and that's that the frequencies in an EEG signal can change over time!

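A spectrogram handles this by cutting the signal into short windows and computing the frequency content of each window separately. The result is a picture with time on the x-axis, frequency on the y-axis, and color showing how strong each frequency is at each moment.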
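To see why frequencies that change over time call for a time-and-frequency picture, here is a simplified sketch in NumPy. It chops a signal into short windows and finds the strongest frequency in each one. The signal, sampling rate, and window length are all assumptions for illustration, and the windows here do not overlap, which is a simplification of what plotting libraries usually do.

```python
import numpy as np

fs = 128.0                     # assumed sampling rate in Hz
n = 512
t = np.arange(n) / fs          # 4 seconds of signal

# The frequency jumps from 8 Hz to 32 Hz halfway through
signal = np.where(t < 2.0,
                  np.sin(2 * np.pi * 8 * t),
                  np.sin(2 * np.pi * 32 * t))

# Split the signal into 8 windows of 64 samples (0.5 s each)
window_len = 64
windows = signal.reshape(n // window_len, window_len)

# Frequency content of every window: rows are time, columns are frequency
power = np.abs(np.fft.rfft(windows, axis=1)) ** 2
freqs = np.fft.rfftfreq(window_len, d=1 / fs)

# Strongest frequency in each window
dominant = freqs[np.argmax(power, axis=1)]
print("Dominant frequency per window (Hz):", dominant)
```

A single Fourier transform of the whole signal would show both 8 Hz and 32 Hz but not *when* each one occurred. The `power` array keeps that timing, and plotting it as an image gives a basic spectrogram.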

## Create EEG Spectrograms for our dataset
Similarly to how we visualized one EEG as a time series in the first notebook, we will now visualize that same EEG as a spectrogram.

Then we will convert all of our EEGs to spectrograms.

In order to convert an EEG to a spectrogram, we will need to understand what the *sampling rate ($f_s$)* of the data is. The sampling rate of the data is the number of samples taken per second. 

Review the [UCI Epileptic Seizure Recognition Data Set](https://archive.ics.uci.edu/ml/datasets/Epileptic+Seizure+Recognition#) **Attribute Information** section to figure out how many seconds of EEG recordings were taken per patient. Remember: while the UCI data talks about 'chunks' of EEG, we have recombined those chunks into one continuous data stream per person for the purposes of this notebook! So we're looking for the number of seconds of recording per patient, not per chunk.
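As a worked example of the arithmetic, the sketch below computes the sampling rate along with two quantities that follow from it. It assumes the combined recording has 23 chunks of 178 points each (the sizes used in `prepare_data`) and lasts 23.5 seconds; `nfft = 32` is the window length used for the spectrogram later in this notebook.

```python
# Samples per recording: 23 chunks of 178 points, as combined by prepare_data
num_samples = 178 * 23
duration_s = 23.5              # assumed length of one recording in seconds

# Sampling rate: how many samples were taken per second
fs = num_samples / duration_s

# Highest frequency that can be represented at this sampling rate
nyquist = fs / 2

# Spacing between neighbouring frequency bins for a 32-sample window
nfft = 32
freq_resolution = fs / nfft

print(f"Sampling rate: {fs:.1f} Hz")
print(f"Highest representable frequency: {nyquist:.1f} Hz")
print(f"Frequency resolution with nfft={nfft}: {freq_resolution:.2f} Hz")
```

The sampling rate comes out at roughly 174 Hz, so the spectrogram's frequency axis can only extend to about 87 Hz.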

In [None]:
#@title What is the length of each EEG in seconds?
duration =  23.5#@param {type:"number"}

if duration == 23.5:
  print("Correct!")
else:
  print("Hmm, that's not correct. Try again!")

In [None]:
#@title How many data points (samples) are in one EEG?


In [None]:
num_samples = ### YOUR CODE HERE ###

In [None]:
#@title Solution
num_samples = len(eeg[0])
print("Number of samples is: "+str(num_samples))

In [None]:
# Calculate the sampling rate from the duration and number of samples in each EEG!
fs = num_samples/duration

# set spectrogram parameters
nfft = 32
overlap = 16

# Extract first EEG sample
eeg1 = x[0,:]

# Plot spectrogram
Sxx, f, t, imageAxis = plt.specgram(eeg1, Fs=fs, NFFT=nfft, noverlap=overlap)
plt.xlabel('Time')
plt.ylabel('Frequency')
plt.show()   



### Next, let's convert all of our EEGs to spectrograms
Note: this step may take a while!
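Before running the conversion, it can help to predict the shape of each spectrogram. The helper below is a sketch of the usual window-counting arithmetic, under the assumption that the spectrogram is one-sided (real input) and that incomplete windows at the end of the signal are dropped; `specgram_shape` is a made-up name, not a library function.

```python
def specgram_shape(num_samples, nfft, noverlap):
    """Predict (frequency bins, time bins) for a one-sided spectrogram."""
    step = nfft - noverlap                  # how far each window slides
    n_freq_bins = nfft // 2 + 1             # frequencies from 0 up to fs / 2
    n_time_bins = (num_samples - nfft) // step + 1
    return n_freq_bins, n_time_bins

# 23 chunks of 178 samples, with the nfft and overlap values used above
shape = specgram_shape(178 * 23, nfft=32, noverlap=16)
print(shape)

# Cross-check the time-bin count by listing every window start position
window_starts = list(range(0, 178 * 23 - 32 + 1, 32 - 16))
print(len(window_starts))
```

If the shape printed by the conversion cell differs from this prediction, that is a hint that the parameters or the input length are not what you expected.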



In [None]:
# Convert all the EEGs to spectrograms
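x_spec = []
for eeg_signal in x:  # separate loop name so the `eeg` array loaded earlier is not overwritten
  Sxx, f, t, imageAxis = plt.specgram(eeg_signal, Fs=fs, NFFT=nfft, noverlap=overlap)
  x_spec.append(Sxx)
plt.close()  # discard the figure that plt.specgram drew into on every iteration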

x_spec = np.array(x_spec)
print(x_spec.shape)

## Let's use convolutional neural nets on our spectrograms!

We've turned our EEGs into spectrograms, which are fancy pictures. ...what type of classifier works well on pictures?

CNNs!

It may be worth it to review the notebook and slide deck for CNNs before you go forward. 

Next, remind yourself of our machine learning pipeline below:

In [None]:
#@title Machine Learning Pipeline
 
one = 'Split the data into training and testing data sets' #@param ["Collect input and output data", "Fit the model to the training data", "Create the model", "Split the data into training and testing data sets", "Test the model on the testing data"]
two = 'Create the model' #@param ["Collect input and output data", "Fit the model to the training data", "Create the model", "Split the data into training and testing data sets", "Test the model on the testing data"]
three = 'Fit the model to the training data' #@param ["Collect input and output data", "Fit the model to the training data", "Create the model", "Split the data into training and testing data sets", "Test the model on the testing data"]
four = 'Test the model on the testing data' #@param ["Collect input and output data", "Fit the model to the training data", "Create the model", "Split the data into training and testing data sets", "Test the model on the testing data"]
five = 'Collect input and output data' #@param ["Collect input and output data", "Fit the model to the training data", "Create the model", "Split the data into training and testing data sets", "Test the model on the testing data"]

First, we'll do a little reshaping of our data to make it fit our CNN. 

In [None]:
x_spec_reshaped = np.reshape(x_spec,(x_spec.shape[0],x_spec.shape[1],x_spec.shape[2],1))

How does our data change after reshaping? Try comparing the shape of our data before and after our transformation.

In [None]:
# Your Response Here

In [None]:
#@title Instructor Solution { display-mode: "form" }
print(x_spec.shape)
print(x_spec_reshaped.shape)

We had to make this modification because, by default, TensorFlow models expect image data to be shaped in the format `[NUM_SAMPLES, IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CHANNELS]`. What does `NUM_CHANNELS` signify? Well, an RGB color image would have three channels for every pixel. Each channel would convey the red, green, or blue information of that pixel. However, in our case, since our image is grayscale, how many channels would our image have? What would that channel signify?
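The effect of adding a channel axis is easiest to see on a small dummy array. This sketch uses made-up zero-filled "images" rather than the real spectrograms, and shows two equivalent ways of adding the axis in NumPy.

```python
import numpy as np

# Five dummy single-channel "images", each 17 pixels tall and 254 wide
batch = np.zeros((5, 17, 254))

# Option 1: explicit reshape, as done in the cell above
reshaped = batch.reshape(batch.shape[0], batch.shape[1], batch.shape[2], 1)

# Option 2: expand_dims appends a new axis of length 1
# (it returns a new array, so the result must be assigned to a variable)
expanded = np.expand_dims(batch, axis=-1)

print(batch.shape, "->", reshaped.shape)
print("Both options agree:", reshaped.shape == expanded.shape)

# For comparison, a batch of RGB images would carry three channels
rgb_batch = np.zeros((5, 17, 254, 3))
print("RGB batch shape:", rgb_batch.shape)
```

No pixel values change in either option: the array simply gains one extra dimension so that every pixel has an explicit list of channels, even if that list has length one.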

In [None]:
# Your Response Here

In [None]:
#@title Instructor Solution { display-mode: "form" }
# The single channel holds the spectrogram's intensity (signal power) at each time-frequency pixel, much like a grayscale brightness value.

Next, we need to split our x_spec and y data into training and testing data sets.

In [None]:
x_spec_reshaped = x_spec_reshaped.astype('float')
x_train, x_test, y_train, y_test = train_test_split(x_spec_reshaped, y, test_size=0.2, random_state=1)

Let's start off with creating our CNN model! This is the basic skeleton for a Keras CNN model.

In [None]:
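cnn = Sequential()

# Your Layers Here

# Output layer: a single sigmoid unit gives the seizure probability
cnn.add(Dense(1))
cnn.add(Activation('sigmoid'))

# Compile only after every layer has been added
opt = keras.optimizers.SGD(learning_rate=1e-6, momentum=0.95)
cnn.compile(loss='binary_crossentropy',
              optimizer=opt,
              metrics=['accuracy']) 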

In [None]:
#@title Instructor Solution { display-mode: "form" }
cnn = Sequential()

cnn.add(Conv2D(64, (3, 3)))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2, 2)))

cnn.add(Conv2D(64, (3, 3)))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2, 2)))
cnn.add(Dropout(0.25))

cnn.add(Flatten())

cnn.add(Dense(128))
cnn.add(Activation('relu'))
cnn.add(Dense(64))
cnn.add(Activation('relu'))
cnn.add(Dropout(0.25))
cnn.add(Dense(1))
cnn.add(Activation('sigmoid'))

opt = keras.optimizers.SGD(learning_rate=1e-6, momentum=0.95)
cnn.compile(loss='binary_crossentropy',
              optimizer=opt,
              metrics=['accuracy']) 

Taking a look at the code above, what do the following lines do:
1.   `cnn = Sequential()`
2.   `cnn.compile()`
3.   `cnn.add()`



In [None]:
# Your Response Here

Recall that CNNs use convolutional layers and max pooling layers to "extract features" from images. At a high level, how do both of those layers work?

In [None]:
# Your Response Here

In [None]:
#@title Instructor Solution { display-mode: "form" }
# Refer to the CNN review at the bottom of the notebook

How would we add them to our model with code?

In [None]:
# Your Response Here

In [None]:
#@title Instructor Solution { display-mode: "form" }
# Refer to the CNN review at the bottom of the notebook

Now, go back up to our CNNs and add sets of (convolutional + max pooling) layers!

Also recall that we used "image flattening" and "fully connected layers" to turn our images into probabilistic outputs for classification. Once again, at a high level, how do both of those layers work?

In [None]:
# Your Response Here

In [None]:
#@title Instructor Solution { display-mode: "form" }
# Refer to the CNN review at the bottom of the notebook

How would we add them to our model with code?

In [None]:
# Your Response Here

In [None]:
#@title Instructor Solution { display-mode: "form" }
# Refer to the CNN review at the bottom of the notebook

Once again, go up to our skeleton model's definition and add our flattening and fully connected "Dense" layers. The Instructor Solution cell right after the skeleton shows one example of what your code could look like.

Once you've finalized your model's architecture, let's train it and view its accuracy!

In [None]:
cnn.fit(x_train, y_train, epochs = 30, validation_data = (x_test, y_test), shuffle = True, callbacks = [monitor])
plot_acc(cnn.history)   

Now, go back and revise your model's architecture and edit your hyperparameters. Could you use more training epochs? Do you need more convolutional layers? Once you've made your edits, let's train our model again.

In [None]:
cnn.fit(x_train, y_train, epochs = 30, validation_data = (x_test, y_test), shuffle = True, callbacks = [monitor])
plot_acc(cnn.history)   

Great job! What are some other ways we could increase our model's performance?

In [None]:
# Your Response Here

What if we could use an automated method to test out a range of possible hyperparameters and model architectures? For that, we can use a grid search!

A grid search iterates through all the parameters you specify, and creates/tests a new model for all the possible parameter combinations.
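The "all possible combinations" part can be sketched with Python's standard library. The grid below is a small hypothetical one, chosen only to show how the combinations are enumerated; a real grid search would also train and score a model for each combination.

```python
from itertools import product

# Hypothetical toy grid: two choices for each of two hyperparameters
toy_grid = {
    'epochs':  [10, 20],
    'dropout': [0.2, 0.5],
}

# product() yields every way of picking one value from each list
names = list(toy_grid)
combinations = [dict(zip(names, values))
                for values in product(*toy_grid.values())]

for combo in combinations:
    print(combo)   # a grid search would build and test one model per line

print("Number of models to train:", len(combinations))
```

The number of combinations is the product of the list lengths, so adding values to any one hyperparameter multiplies the total training time.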

In [None]:
param_grid = {
              'epochs' :              [20, 30, 40],
              'layers' :              [1, 2, 4],
              'dropout' :             [0.2, 0.3, 0.5],
              'activation' :          ['relu', 'elu']
             }

How many models would be created and tested with the parameter grid above?

In [None]:
# Your Response Here

In [None]:
#@title Instructor Solution { display-mode: "form" }
# There would be 54 models created in total

Let's use the following class as our Grid Search CNN. Feel free to edit the model's architecture. Note, however, that the hyperparameters are stored as class attributes and updated through `set_params()`, which also rebuilds the model; `__init__` itself does nothing.

In [None]:
#@title Run this to Define our Grid Search CNN Class { display-mode: "form" }
class gridSearchCNN():
    
    keras_model = None
    model = Sequential()
    #epochs=10
    epochs=30
    batch_size=10
    layers=2
    dropout=0.5
    activation='relu'
    
    def __init__(self, **params):
      pass
  
    def fit(self, X, y, sample_weight = None):
        print("Fitting")
        self.keras_model.fit(X,y)
        print("Fitted \n")
        return self.keras_model
    def predict(self, X):
        return self.keras_model.predict(X)
    def predict_proba(self, X):
        return self.keras_model.predict_proba(X)
    def score(self, X, y, sample_weight = None):
        print("Scoring")
        score = self.keras_model.score(X,y)
        print("Scored \n")
        return score
        #y_pred = self.keras_model.predict(X)
        #roc_auc_score_val = roc_auc_score(y, y_pred)
        #return roc_auc_score_val
                
    def createKerasCNN(self,):
      
      def create_model():
        self.model = Sequential() 
        self.model.add(Reshape((x_spec.shape[1],x_spec.shape[2], 1)))
        
        for i in range(self.layers):
          self.model.add(Conv2D(64, (3, 3), padding='same'))
          self.model.add(Activation(self.activation))
        
        self.model.add(Conv2D(64, (3, 3)))
        self.model.add(Activation(self.activation))
        self.model.add(MaxPooling2D(pool_size=(2, 2)))
        self.model.add(Dropout(self.dropout / 2.0))

        self.model.add(Flatten())
        self.model.add(Dense(128))
        self.model.add(Activation(self.activation))
        self.model.add(Dense(64))
        self.model.add(Activation(self.activation))
        self.model.add(Dropout(self.dropout))
        self.model.add(Dense(1))
        self.model.add(Activation('sigmoid'))


        opt = keras.optimizers.SGD(learning_rate=1e-6, momentum=0.95)
        self.model.compile(loss='binary_crossentropy',
                      optimizer=opt,
                      metrics=['accuracy'])
        
        return self.model

      return KerasClassifier(build_fn=create_model, epochs=self.epochs, 
                            batch_size=self.batch_size, verbose=2)

    def get_params(self, deep = True):
        return {
            'epochs': self.epochs,
            'batch_size': self.batch_size,
            'layers': self.layers,
            'dropout': self.dropout,
            'activation': self.activation
            }

    def set_params(self, **params):
      if 'epochs' in params.keys():
        self.epochs = params['epochs']
      if 'batch_size' in params.keys():
        self.batch_size = params['batch_size']
      if 'layers' in params.keys():
        self.layers = params['layers']
      if 'dropout' in params.keys():
        self.dropout = params['dropout']
      if 'activation' in params.keys():
        self.activation = params['activation']
      
      self.keras_model = self.createKerasCNN()
      return self

Define a parameter grid that has fewer possible hyperparameters. This is so that we can decrease our execution time.

In [None]:
param_grid = {
              # Your Code Here 
              }

In [None]:
#@title Instructor Solution { display-mode: "form" }
param_grid = {
              'epochs' :              [1, 2],
              'dropout' :             [0.2, 0.3],
             }

However, since we are manually editing the parameters for the model, we need to validate these "parameter tuned models" on a separate dataset.

Why would that be?

In [None]:
# Your Response Here

This additional slice of our dataset is called the "validation set." Now, let's perform the **test, train, and validation** splits!
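One way to picture a three-way split is to shuffle the row indices and cut the shuffled list into three pieces. The sketch below does this with NumPy alone; it is an alternative illustration rather than the `train_test_split` approach used elsewhere in this notebook, and the function name and the 60/20/20 proportions are assumptions.

```python
import numpy as np

def three_way_split(n_samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Return shuffled, non-overlapping train / validation / test indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)      # every index exactly once

    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))

    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]      # whatever is left over
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = three_way_split(500)
print(len(train_idx), len(val_idx), len(test_idx))

# No sample may appear in more than one slice
overlap = set(train_idx) & set(val_idx) | set(val_idx) & set(test_idx) | set(train_idx) & set(test_idx)
print("Overlapping samples:", len(overlap))
```

The indices can then be used to select rows, for example `x_spec[train_idx]` and `y[train_idx]`. The key property is that the three slices never share a sample.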

In [None]:
# Your Code Here

In [None]:
#@title Instructor Solution { display-mode: "form" }
x_train, x_test, y_train, y_test = train_test_split(x_spec, y, test_size=0.2)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.25)

Now, let's implement our grid search! We'll be using the `GridSearch` class from the `hypopt` library for this.

In [None]:
gs = GridSearch(model=gridSearchCNN(),param_grid=param_grid,parallelize=False)

Now that we've created our GridSearch model, we can go ahead and use the `.fit()` and `.score()` functions to train our model and evaluate its performance! Also, make sure you're using the correct dataset slices for each step.

In [None]:
# Your Code Here

In [None]:
#@title Instructor Solution { display-mode: "form" }
gs.fit(x_train, y_train, x_val, y_val,verbose=1)

Now let's evaluate our best model's accuracy!

In [None]:
# Your Code Here

In [None]:
#@title Instructor Solution { display-mode: "form" }
gs.score(x_test,y_test)

Great job! We've now created multiple grid search optimized CNNs! Go back and edit your Grid Search parameters to include more variables. Do more layers result in better performance? What dropout value works best? Do certain activation functions work better than others?

In [None]:
# Your Code Here

## Limitations of CNNs for spectrogram classification

This method seems pretty cool! Any drawbacks? Take a look at [this article](https://towardsdatascience.com/whats-wrong-with-spectrograms-and-cnns-for-audio-processing-311377d7ccd). Do the arguments the author makes about why CNNs may not be the best way to analyze audio signals also apply to EEG signals? Why or why not? Discuss with your classmates.



# You've finished this notebook 😊 Congrats!


# Extra CNN Review

Let's take a look at an example of a CNN's architecture.

![alt text](https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png)

The first segment of the CNN applies transformations to the image itself. As the image passes through these layers, "features" are identified which can be used to distinguish between the classes. 
The most important layers in this part of the CNN are convolutional and max pooling layers. 

Convolutional layers look through the image in a sliding window and extract features. An example of a line in Python adding a convolutional layer is `cnn.add(Conv2D(64, (3, 3), padding='same'))`. You can specify the layer's activation function by adding `cnn.add(Activation(activation))` after the convolutional layer's definition.
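The sliding-window idea can be written out by hand for a single filter. This is a plain NumPy sketch of the operation, not how Keras implements it: the tiny image and the hand-picked edge-detecting kernel are assumptions for illustration, whereas a real `Conv2D` layer learns its kernel values during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; no padding, stride of 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the window by the kernel element-wise and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 5x5 image: dark on the left two columns, bright on the right three
image = np.zeros((5, 5))
image[:, 2:] = 1.0

# Kernel that responds when brightness increases from left to right
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

response = conv2d_valid(image, kernel)
print(response)
```

The output is largest where the window straddles the dark-to-bright edge and zero where the window sees only uniform brightness. That is the sense in which a convolutional filter "extracts a feature".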

Max pooling layers decrease the resolution of the image. This reduces the complexity of the image (fewer pixels) and helps prevent the model from overfitting. An example of a line in Python adding a max pooling layer is `cnn.add(MaxPooling2D(pool_size=(2, 2)))`.
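Max pooling with a 2x2 window can also be sketched in a few lines of NumPy. The function name and the example values are made up; rows or columns that do not fill a complete 2x2 block are dropped here, which is an assumption of this sketch.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]   # drop any leftover row/column
    # Group the pixels into 2x2 blocks, then take the max of each block
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

feature_map = np.array([[ 1,  2,  5,  6],
                        [ 3,  4,  7,  8],
                        [ 9, 10, 13, 14],
                        [11, 12, 15, 16]])

pooled = max_pool_2x2(feature_map)
print(pooled)
print(feature_map.shape, "->", pooled.shape)
```

Each 2x2 block collapses to its single largest value, so the 4x4 input becomes a 2x2 output with a quarter of the pixels.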

<!---Dropout layers are layers in which some neurons are randomly removed from a layer during training. These can be used to prevent overfitting during training by temporarily reducing the model's complexity. The code to add a dropout layer is `cnn.add(Dropout(dropout))`.--->


Now, let's translate the above CNN architecture into code.

We start with `cnn = Sequential()` which defines that our CNN will be linearly structured. This means that every layer in the model uses the output of the previous layer as its input.

`Reshape` adds the single channel dimension that `Conv2D` layers expect, turning each spectrogram in `x_spec` from a 2D array into a 3D (height, width, 1) input.

In [None]:
## Define number of hidden layers and loss function
## (the Reshape layer added below supplies the channel dimension, so x_spec is left unchanged here)
num_hidden_layers = 2
loss_fxn = 'binary_crossentropy'

## Our CNN will be linearly structured
cnn = Sequential()

Next, we sequentially add layers to the CNN such as convolutional layers and max pooling layers. 

In [None]:
cnn.add(Reshape((x_spec.shape[1],x_spec.shape[2], 1))) # add the channel dimension
cnn.add(Conv2D(32, (3, 3), padding = 'same'))
cnn.add(Activation('relu'))
cnn.add(MaxPooling2D(pool_size=(2, 2)))

for i in range(num_hidden_layers-1):
    cnn.add(Conv2D(32, (3, 3), padding = 'same'))
    cnn.add(Activation('relu'))
    cnn.add(MaxPooling2D(pool_size=(2, 2)))

Now, let's move on to the second major component of CNNs, the fully connected layers. After the image has been transformed in all of the previous CNN layers, we need to actually classify the image. We need to somehow convert the image into a probability of the image belonging to a class.

To do this, we first **flatten** the image. With image flattening, we can represent a 2D matrix (image) as a 1D array (feature vector). 

![alt text](https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/73_blog_image_1.png)

We need to perform this transformation so that the fully connected part of our network can take in our image data. You can add a flattening layer using `cnn.add(Flatten())`. We can then add fully connected layers using `cnn.add(Dense(128))`, specifying the activation function with `cnn.add(Activation(activation))`. You can also add dropout layers here.
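Here is a small NumPy sketch of what flattening and a single fully connected output unit compute. The weights are random and untrained, so the resulting probability is meaningless; all the names and numbers are assumptions chosen only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2x3 "feature map" standing in for the output of the conv layers
feature_map = np.arange(6, dtype=float).reshape(2, 3)

# Flatten: read the rows one after another into a 1D feature vector
flat = feature_map.reshape(-1)
print("Flattened:", flat)

# One fully connected unit: a weighted sum of every feature plus a bias
weights = rng.normal(scale=0.1, size=flat.shape[0])   # hypothetical, untrained
bias = 0.0
z = flat @ weights + bias

# Sigmoid squashes the weighted sum into a value between 0 and 1
probability = 1.0 / (1.0 + np.exp(-z))
print("Output probability:", probability)
```

A `Dense(128)` layer does the same weighted sum 128 times with 128 different weight vectors, and training adjusts those weights so that the final sigmoid output becomes a useful probability.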


In [None]:
## Flatten the 2D matrix into 1D array
cnn.add(Flatten()) 

## Add dense (fully connected) layers
cnn.add(Dense(units = 128, activation = 'relu'))
#cnn.add(Dropout(dropout))
cnn.add(Dense(units = 64, activation = 'relu'))

## The last dense layer produces the output probability, and thus needs a sigmoid activation
cnn.add(Dense(units = 1, activation = 'sigmoid'))

Once we are done defining the CNN architecture, we also need to choose the performance metric used to evaluate our model and the optimization function used to optimize the model parameters. We can then compile the model using `cnn.compile()`.

Finally, we can train the model with `cnn.fit()`, and visualize its performance over time with `plot_acc()`.
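The loss used when compiling this model is binary cross-entropy, and the metric is accuracy. The sketch below computes both by hand in NumPy for a few made-up predictions, to show what each number rewards; the function name and the example values are assumptions.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Average negative log-probability assigned to the correct class."""
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1.0, 0.0, 1.0, 0.0])      # 1 = seizure, 0 = no seizure

confident = np.array([0.9, 0.1, 0.8, 0.2])   # correct and fairly sure
unsure = np.full(4, 0.5)                     # always guesses 50/50

loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)
print("Loss when confident and correct:", loss_confident)
print("Loss when always guessing 0.5:  ", loss_unsure)

# Accuracy only checks which side of 0.5 each prediction falls on
accuracy_confident = float(np.mean((confident > 0.5) == (y_true == 1)))
print("Accuracy of the confident predictions:", accuracy_confident)
```

The loss falls as the predicted probabilities move toward the correct labels, which gives the optimizer a smooth signal to follow. Accuracy only changes when a prediction crosses the 0.5 threshold, which is why it is used for reporting rather than for training.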

In [None]:
## Compile the model with the SGD optimizer and accuracy as the performance metric
cnn.compile(loss=loss_fxn,
            optimizer=keras.optimizers.SGD(learning_rate=1e-6, momentum=0.95),
            metrics=['accuracy']) 

cnn.fit(x_train, y_train, epochs = 100, validation_data = (x_test, y_test), shuffle = True, callbacks = [monitor])

## Plot accuracy over time
plot_acc(cnn.history) 