### Learning the digits
You're going to build a model on the digits dataset, a sample dataset that comes preloaded with scikit-learn. The digits dataset consists of 8x8 pixel images of handwritten digits from 0 to 9.

You want to distinguish between the 10 possible digits given an image, so you are dealing with multi-class classification.   
The dataset has already been partitioned into X_train, y_train, X_test, and y_test, using 30% of the data as testing data. The labels are already one-hot encoded vectors, so you don't need to use Keras' to_categorical() function.   
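
To make the label format concrete, here is a minimal sketch of one-hot encoding in plain Python. The helper name `one_hot` and the three example labels are made up for illustration; the exercise data already arrives in this form, so nothing like this needs to be run.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot vectors (illustrative helper)."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes   # start with all zeros
        row[label] = 1.0            # flag the position of the true class
        encoded.append(row)
    return encoded

# Three hypothetical digit labels, 10 possible classes
example = one_hot([0, 3, 9], num_classes=10)
for row in example:
    print(row)
```

Each row has exactly one 1.0, at the index of the digit it represents, which is the shape of target that categorical_crossentropy expects.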

Let's build this new model!

### Instructions
Add a Dense layer of 16 neurons with relu activation and an input_shape that takes the total number of pixels of the 8x8 digit image.  
Add a Dense layer with 10 outputs and softmax activation.  
Compile your model with adam, categorical_crossentropy, and accuracy metrics.  
Make sure your model works by predicting on X_train.
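
As a quick sanity check on the architecture described above, the number of trainable parameters can be worked out by hand. This is only an arithmetic sketch; the helper `dense_params` is a hypothetical name, not part of Keras.

```python
# Layer sizes taken from the instructions above
n_inputs = 8 * 8      # one input per pixel of the flattened 8x8 image
n_hidden = 16
n_outputs = 10

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

hidden_params = dense_params(n_inputs, n_hidden)
output_params = dense_params(n_hidden, n_outputs)
total_params = hidden_params + output_params
print(n_inputs, hidden_params, output_params, total_params)  # 64 1040 170 1210
```

Calling model.summary() on the finished model should report the same total.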

In [None]:
# Instantiate a Sequential model
model = Sequential()

# Input and hidden layer with input_shape, 16 neurons, and relu 
model.add(Dense(16, input_shape=(64,), activation='relu'))

# Output layer with 10 neurons (one per digit) and softmax
model.add(Dense(10, activation='softmax'))

# Compile your model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics = ['accuracy'])

# Test if your model is well assembled by predicting before training
print(model.predict(X_train))

""" Predicting on training data inputs before training can help you quickly check that 
your model works as expected.
"""

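When predicting before training, the values themselves are meaningless, but each output row should still contain 10 probabilities that sum to 1, because the last layer uses softmax. A minimal plain-Python sketch of that property, with made-up scores standing in for the layer's raw outputs:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Ten made-up output scores, one per digit
scores = [0.1, -0.3, 0.05, 0.0, 0.2, -0.1, 0.3, -0.2, 0.15, 0.0]
probs = softmax(scores)
print(len(probs), round(sum(probs), 6))
```

If a row from model.predict(X_train) does not have 10 entries, or does not sum to roughly 1, the output layer is probably misconfigured.
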
### Is the model overfitting?
Let's train the model you just built and plot its learning curve to check whether it's overfitting! You can use the preloaded function plot_loss() to plot training loss against validation loss; both are available from the history callback.

If you want to inspect the plot_loss() function code, paste this in the console: show_code(plot_loss)

### Instructions
Train your model for 60 epochs, using X_test and y_test as validation data.
Use plot_loss() passing loss and val_loss as extracted from the history attribute of the h_callback object.


In [None]:
# Train your model for 60 epochs, using X_test and y_test as validation data
h_callback = model.fit(X_train, y_train, epochs=60, validation_data=(X_test, y_test), verbose=0)

# Extract from the h_callback object loss and val_loss to plot the learning curve
plot_loss(h_callback.history['loss'], h_callback.history['val_loss'])
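
The plot can be complemented with a small numeric check. The sketch below is not part of the exercise: `summarize_curve` is a hypothetical helper and the two curves are made up. In practice you would pass the `loss` and `val_loss` lists from h_callback.history.

```python
def summarize_curve(loss, val_loss):
    """Report where validation loss bottoms out and how far it ends above training loss."""
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        'best_epoch': best_epoch,                            # 0-based epoch of lowest val_loss
        'epochs_after_best': len(val_loss) - 1 - best_epoch, # epochs trained past that point
        'final_gap': val_loss[-1] - loss[-1],                # validation minus training loss
    }

# Made-up curves: training loss keeps falling, validation loss turns upward
loss = [1.0, 0.6, 0.4, 0.3, 0.2, 0.15]
val_loss = [1.1, 0.7, 0.5, 0.45, 0.5, 0.6]
summary = summarize_curve(loss, val_loss)
print(summary)
```

A validation loss that bottoms out early and then climbs while training loss keeps dropping is the usual signature of overfitting.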

### Do we need more data?
It's time to check whether the digits dataset model you built benefits from more training examples!

In order to keep code to a minimum, various things are already initialized and ready to use:   

The model you just built.  
X_train, y_train, X_test, and y_test.  
The initial_weights of your model, saved after using model.get_weights().  
A pre-defined list of training sizes: training_sizes.  
A pre-defined early stopping callback monitoring loss: early_stop.  
Two empty lists to store the evaluation results: train_accs and test_accs.  
Train your model on the different training sizes and evaluate the results on X_test. End by plotting the results with plot_results().  

The full code for this exercise can be found on the slides!  
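
The exercise does not show how early_stop was configured, so the following is only a simplified sketch of the idea behind an early stopping callback that monitors loss. The function name `stopping_epoch`, the patience values, and the loss sequence are all assumptions made for illustration, and this is not Keras's own implementation.

```python
def stopping_epoch(losses, patience=1, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never stops early."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up training losses that stall after the third epoch
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.5]
print(stopping_epoch(losses, patience=1))
print(stopping_epoch(losses, patience=2))
```

A larger patience tolerates short plateaus, at the cost of a few extra epochs when training has really stalled.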

### Instructions
Get a fraction of the training data determined by the size we are currently evaluating in the loop.  
Set the model weights to the initial_weights with set_weights() and train your model on the fraction of training data using early_stop as a callback.   
Evaluate and store the accuracy for the training fraction and the test set.  
Call plot_results() passing in the training and test accuracies for each training size.

In [None]:
for size in training_sizes:
    # Get a fraction of training data (we only care about the training data)
    X_train_frac, y_train_frac = X_train[:size], y_train[:size]

    # Reset the model to the initial weights and train it on the new training data fraction
    model.set_weights(initial_weights)
    model.fit(X_train_frac, y_train_frac, epochs = 50, callbacks = [early_stop])

    # Evaluate and store both: the training data fraction and the complete test set results
    train_accs.append(model.evaluate(X_train_frac, y_train_frac)[1])
    test_accs.append(model.evaluate(X_test, y_test)[1])
    
# Plot train vs test accuracies
plot_results(train_accs, test_accs)

### Comparing activation functions
Comparing activation functions involves a bit of coding, but nothing you can't do! 

You will try out different activation functions on the multi-label model you built for your farm irrigation machine in chapter 2. The function get_model('relu') returns a copy of this model and applies the 'relu' activation function to its hidden layer. 

You will loop through several activation functions, generate a new model for each and train it. By storing the history callback in a dictionary you will be able to visualize which activation function performed best in the next exercise!

X_train, y_train, X_test, y_test are ready for you to use when training your models.   
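
For reference, the four activation functions compared in this exercise can be written in a few lines of plain Python. This is a sketch for intuition only: the `alpha` slope of the leaky ReLU is an illustrative value and may differ from whatever get_model() uses internally.

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.1):
    # alpha is an assumed slope for negative inputs, chosen for illustration
    return x if x > 0 else alpha * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

activation_fns = {
    'relu': relu,
    'leaky_relu': leaky_relu,
    'sigmoid': sigmoid,
    'tanh': math.tanh,
}

# Evaluate each function at a negative, zero, and positive input
for name, fn in activation_fns.items():
    print(name, [round(fn(x), 3) for x in (-2.0, 0.0, 2.0)])
```

The main differences show up for negative inputs: relu outputs zero, leaky_relu keeps a small slope, and sigmoid and tanh saturate.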

### Instructions
Fill up the activation functions array with relu, leaky_relu, sigmoid, and tanh.   
Get a new model for each iteration with get_model() passing the current activation function as a parameter.  
Fit your model on the training data, passing validation_data, 20 epochs, and verbose set to 0.

In [None]:
# Activation functions to try
activations = ['relu', 'leaky_relu', 'sigmoid', 'tanh']

# Loop over the activation functions
activation_results = {}

for act in activations:
  # Get a new model with the current activation
  model = get_model(act)
  # Fit the model and store the history results
  h_callback = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, verbose=0)
  activation_results[act] = h_callback

### Comparing activation functions II
What you coded in the previous exercise has been executed to obtain the activation_results variable, this time using 100 epochs instead of 20. This way you will have more epochs to further compare how the training evolves per activation function.   

For every h_callback of each activation function in activation_results:  
 
The h_callback.history['val_loss'] has been extracted.  
The h_callback.history['val_acc'] has been extracted.   
Both are saved into two dictionaries: val_loss_per_function and val_acc_per_function.  

Pandas is also loaded as pd for you to use. Let's plot some quick validation loss and accuracy charts!  
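
The extraction described above could look roughly like the sketch below. It uses `SimpleNamespace` objects with made-up numbers as stand-ins for the Keras history callbacks, since only their `.history` dictionary matters here. Note that some newer Keras versions name the key 'val_accuracy' rather than 'val_acc'.

```python
from types import SimpleNamespace

# Stand-ins for the history callbacks stored per activation function (values are made up)
activation_results = {
    'relu': SimpleNamespace(history={'val_loss': [0.9, 0.5], 'val_acc': [0.6, 0.8]}),
    'tanh': SimpleNamespace(history={'val_loss': [0.8, 0.6], 'val_acc': [0.7, 0.75]}),
}

# One list of per-epoch values for each activation function
val_loss_per_function = {act: h.history['val_loss'] for act, h in activation_results.items()}
val_acc_per_function = {act: h.history['val_acc'] for act, h in activation_results.items()}

print(val_loss_per_function)
print(val_acc_per_function)
```

Passing such a dictionary to pd.DataFrame() yields one column per activation function and one row per epoch, which is why a single plot() call draws one line per function.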

### Instructions
Use pd.DataFrame() to create a new DataFrame from the val_loss_per_function dictionary.   
Call plot() on the DataFrame.  
Create another pandas DataFrame from val_acc_per_function.  
Once again, plot the DataFrame.

In [None]:
# Create a dataframe from val_loss_per_function
val_loss = pd.DataFrame(val_loss_per_function)

# Call plot on the dataframe
val_loss.plot()
plt.show()

# Create a dataframe from val_acc_per_function
val_acc = pd.DataFrame(val_acc_per_function)

# Call plot on the dataframe
val_acc.plot()
plt.show()

### Preparing a model for tuning
Let's tune the hyperparameters of a binary classification model that does well classifying the breast cancer dataset.   

You've seen that the first step to turn a model into a sklearn estimator is to build a function that creates it. The definition of this function is important since hyperparameter tuning is carried out by varying the arguments your function receives.  
  
Build a simple create_model() function that receives both a learning rate and an activation function as arguments. The Adam optimizer has been imported as an object from keras.optimizers so that you can also change its learning rate parameter.  

### Instructions
Set the learning rate of the Adam optimizer object to the one passed in the arguments.
Set the hidden layers activations to the one passed in the arguments.
Pass the optimizer and the binary cross-entropy loss to the .compile() method.

In [None]:
# Creates a model given an activation and learning rate
def create_model(learning_rate, activation):

    # Create an Adam optimizer with the given learning rate
    opt = Adam(lr=learning_rate)

    # Create your binary classification model
    model = Sequential()
    model.add(Dense(128, input_shape=(30,), activation=activation))
    model.add(Dense(256, activation=activation))
    model.add(Dense(1, activation='sigmoid'))

    # Compile your model with your optimizer, loss, and metrics
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
    return model

### Tuning the model parameters
It's time to try out different parameters on your model and see how well it performs!   

The create_model() function you built in the previous exercise is ready for you to use.  

Since fitting the RandomizedSearchCV object would take too long, the results you'd get are printed by the show_results() function. You could try random_search.fit(X, y) in the console yourself to check that it works once you have built everything else, but the exercise will probably time out (so copy your code first if you try this, or you may lose your progress!).   

You don't need to pass the optional epochs and batch_size parameters when building your KerasClassifier object, since you are already passing them as params to the random search.   

### Instructions
Import KerasClassifier from keras scikit_learn wrappers.   
Use your create_model function when instantiating your KerasClassifier.  
Set 'relu' and 'tanh' as activation, 32, 128, and 256 as batch_size, 50, 100, and 200 epochs, and learning_rate of 0.1, 0.01, and 0.001.   
Pass your converted model and the chosen params as you build your RandomizedSearchCV object.
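
To get a feel for why the full search is slow, the size of the search space can be counted in plain Python. This sketch does not use scikit-learn's sampler; it only mimics the idea with `random.sample`. The seed is arbitrary, and `n_iter = 10` is RandomizedSearchCV's default, which applies here because the exercise code does not set it.

```python
import itertools
import random

params = {'activation': ['relu', 'tanh'], 'batch_size': [32, 128, 256],
          'epochs': [50, 100, 200], 'learning_rate': [0.1, 0.01, 0.001]}

# Every possible combination of the listed values
keys = list(params)
grid = [dict(zip(keys, values))
        for values in itertools.product(*(params[k] for k in keys))]

random.seed(0)                          # arbitrary seed, for repeatability only
n_iter = 10                             # default number of sampled combinations
sampled = random.sample(grid, n_iter)   # draw without replacement

n_folds = 3
n_fits = n_iter * n_folds               # models trained during the search itself
print(len(grid), len(sampled), n_fits)  # 54 10 30
```

Even with only 10 of the 54 combinations sampled, 3-fold cross-validation means 30 separate training runs of up to 200 epochs each.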

In [None]:
# Import KerasClassifier from keras scikit learn wrappers
from keras.wrappers.scikit_learn import KerasClassifier

# Create a KerasClassifier
model = KerasClassifier(build_fn=create_model)

# Define the parameters to try out
params = {'activation': ['relu', 'tanh'], 'batch_size': [32, 128, 256], 
          'epochs': [50, 100, 200], 'learning_rate': [.1, .01, .001]}

# Create a randomized search CV object passing in the parameters to try
random_search = RandomizedSearchCV(model, param_distributions=params, cv=KFold(3))

# Running random_search.fit(X, y) would start the search, but it takes too long!
show_results()

""" <script.py> output:
    Best: 
    0.975395 using {learning_rate: 0.001, epochs: 50, batch_size: 128, activation: relu} 
    Other: 
    0.956063 (0.013236) with: {learning_rate: 0.1, epochs: 200, batch_size: 32, activation: tanh} 
    0.970123 (0.019838) with: {learning_rate: 0.1, epochs: 50, batch_size: 256, activation: tanh} 
    0.971880 (0.006524) with: {learning_rate: 0.01, epochs: 100, batch_size: 128, activation: tanh} 
    0.724077 (0.072993) with: {learning_rate: 0.1, epochs: 50, batch_size: 32, activation: relu} 
    0.588752 (0.281793) with: {learning_rate: 0.1, epochs: 100, batch_size: 256, activation: relu} 
    0.966608 (0.004892) with: {learning_rate: 0.001, epochs: 100, batch_size: 128, activation: tanh} 
    0.952548 (0.019734) with: {learning_rate: 0.1, epochs: 50, batch_size: 256, activation: relu} 
    0.971880 (0.006524) with: {learning_rate: 0.001, epochs: 200, batch_size: 128, activation: relu}
    0.968366 (0.004239) with: {learning_rate: 0.01, epochs: 100, batch_size: 32, activation: relu}
    0.910369 (0.055824) with: {learning_rate: 0.1, epochs: 100, batch_size: 128, activation: relu}
"""

### Training with cross-validation
Time to train your model with the best parameters found: a learning rate of 0.001, 50 epochs, a batch_size of 128, and relu activations.   

The create_model() function from the previous exercise is ready for you to use. X and y are loaded as features and labels.  

Use the best values found for your model when creating your KerasClassifier object so that they are used when performing cross-validation.   

End this chapter by training an awesome tuned model on the breast cancer dataset!

### Instructions
Import KerasClassifier from keras scikit_learn wrappers.  
Create a KerasClassifier object providing the best parameters found.  
Pass your model, features and labels to cross_val_score to perform cross-validation with 3 folds.
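
The basic idea behind 3-fold cross-validation is that every sample is used for testing exactly once. The sketch below splits indices into consecutive, unshuffled folds in plain Python. `kfold_indices` is a hypothetical helper, and the splitter cross_val_score actually uses may differ in detail (for example, it may stratify by class).

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for consecutive, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)   # the first folds absorb the remainder
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Ten made-up samples split into 3 folds
folds = list(kfold_indices(n_samples=10, n_splits=3))
for train_idx, test_idx in folds:
    print('train:', train_idx, 'test:', test_idx)
```

The model is retrained from scratch on each training split, and the per-fold accuracies are what kfolds.mean() and kfolds.std() summarize.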

In [None]:
# Import KerasClassifier from keras wrappers
from keras.wrappers.scikit_learn import KerasClassifier

# Create a KerasClassifier
# Pass the function itself; its arguments are given as keyword parameters
model = KerasClassifier(build_fn=create_model,
                        learning_rate=.001,
                        activation='relu',
                        epochs=50,
                        batch_size=128,
                        verbose=0)

# Calculate the accuracy score for each fold
kfolds = cross_val_score(model, X, y, cv=3)

# Print the mean accuracy
print('The mean accuracy was:', kfolds.mean())

# Print the accuracy standard deviation
print('With a standard deviation of:', kfolds.std())

""" <script.py> output:
    The mean accuracy was: 0.9718834066666666
    With a standard deviation of: 0.002448915612216046
"""