<a href="https://colab.research.google.com/github/dlab-berkeley/Computational-Social-Science-Training-Program/blob/master/Deep_Learning_and_Tensorflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **GPU, Deep Learning, and TensorFlow**


This notebook will introduce you to the fundamentals of TensorFlow and explore techniques for deep learning with text data. Key concepts covered in this notebook include:

1. Google Colab and GPUs
2. Tensors and basic tensor operations
3. Using TensorFlow/Keras to adapt and tune neural nets
4. Existing resources to help analyze language data


With these basic building blocks, you will be equipped to explore and implement deep learning algorithms for your own project. 

# Google Colab

---



Objectives:

- Set up a Google Colab notebook
- Create, delete, run, and edit cells
- Cover variable, notebook and package management

## Introducing Google Colab


Google Colab is a platform for cloud-based computation and coding. It is similar to a Jupyter notebook, in which individual cells containing code can be executed. It doesn't require local installation on your computer, and notebooks can be shared and edited by multiple people at the same time. However, a Colab notebook requires an internet connection, whereas Jupyter notebooks can be run offline on your own machine. Google Colab notebooks use the .ipynb format, and can be saved and opened either directly or via Google Drive.



## Basic Operations

Google Colab has several features that help organize code and long notebooks. A few key things to know in order to use this notebook effectively are:

- To add a cell, use the Insert tab in the upper bar, or press the +Code/+Text buttons in the top left of the window.

- ctrl/cmd+alt+n opens a scratch cell. This is a place to test code without needing to edit the main notebook.

- Text cells can be edited and formatted with the buttons at the top of the cell.

- The buttons at the top right of the cell give you options to move, modify, and delete the cell.

- You can run code with shift+enter, or by clicking the run button at the top left of the cell.

- For more commands, use ctrl+shift+p and select the desired command from the command palette.

An example code cell is below:

In [None]:
print("Welcome to Google Colab")
x=12+78

Welcome to Google Colab


The buttons on the left panel help manage the notebook (search, table of contents, files). This is important for organizing your code and navigating long notebooks.



## Package Management

Like Anaconda, Google Colab comes with many packages already available, and you can also install additional packages using pip. Use the following lines of code to see which packages you have and which ones you need to install. Because the notebook runs in the cloud, I suggest checking the installed packages to confirm that you have the right ones for your project.



```
#check which packages you have available (listed alphabetically). The version numbers are also available, which can be useful for diagnosing code that behaves differently between computers.
!pip list

#install a new package
!pip install numpy 
```
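The same check can also be done from Python instead of the shell. The following is a minimal sketch that uses the standard-library `importlib.metadata` module; the helper name `installed_version` is hypothetical and introduced here only for illustration.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# Report on the packages this notebook relies on
for name in ["numpy", "pandas", "tensorflow", "transformers"]:
    version = installed_version(name)
    print(name, "->", version if version is not None else "not installed (use !pip install)")
```

This can be handy at the top of a notebook, since it reports missing packages in one pass rather than failing at the first import.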





In the following cell are the packages that you will need to complete this notebook:

In [1]:
#import packages for deep learning
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf


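from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
#recent Keras versions no longer provide keras.utils.np_utils; this alias keeps np_utils.to_categorical working
from tensorflow.keras import utils as np_utils
from tensorflow.keras.optimizers import RMSprop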

## Customization

Finally, there are several settings that you can customize if you so choose. These can be found under Tools -> Settings, where you can change the font size, background, and other aesthetic settings of the notebook to suit you. 

In addition, in Tools -> Keyboard shortcuts you can view and adapt shortcuts to your preferences as well.




## Challenge

Try out the following exercises to get comfortable with the new interface:

1) Open the editor settings (Tools -> Settings -> Editor) and select "Show line numbers". Now your cells will have line numbers next to them, which we can refer to when discussing code during this workshop.

2) Make a new code cell below and save the product of 60 and 72 to a new variable. Then check the value of the variable in the variable tab to the left.

3) How many sections (main headers) are there in this notebook? Navigate to the Manipulating Tensors section. Collapse that section.

4) How many times does 'variable' appear in this notebook? (hint, use the search feature on the left)

5) The cell below contains code, but is in a text format. Use the Command Palette (ctrl+shift+p) and use the appropriate command to change the cell type.

print("Turn me into a code cell")

In [3]:
#solutions
#1) follow the directions in the question
#2) 
new = 60*72
#3) 9 sections
#4) 5 times
#5) ctrl + shift + p --> "convert to a code cell"
print("Turn me into a code cell")

Turn me into a code cell


# Introduction to GPU

Objectives:
- Understand the benefits of GPUs
- Set up GPU for Google Colab
- Compare performance on tasks vs CPU

As you've found in previous workshops, some models take a significant amount of time to run. Models may also exceed the capacity of the local computer's processing power. This will result either in code that never finishes running or in an error message indicating that the code has timed out without completing.

To counteract this issue, GPUs and TPUs are parallel processing units that greatly speed up model training. This can make models that are otherwise impractical to train feasible (think minutes rather than hours).

TPUs are accelerators that Google designed specifically for tensor computations such as those in TensorFlow, and for some workloads they are even faster than GPUs.

## GPU Access
Oftentimes you need to pay for cloud services and access to GPUs, but one advantage of Colab is that it provides free access to a certain amount of GPU/TPU time. This access is somewhat limited, but should be more than enough for what we are using it for today. We will discuss limitations and further options for long-term use in a later section of the workshop.


Additional resource: https://colab.research.google.com/notebooks/gpu.ipynb#scrollTo=sXnDmXR7RDr2

The notebook will automatically choose which device (read: GPU vs CPU) to run the code on, but if you want to make sure that something is being run on a certain device, you can select a specific device as in the snippet below. 


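```
# Pin a computation to a specific device.
# '/CPU:0' always exists; use '/GPU:0' once a GPU runtime is enabled.
with tf.device('/CPU:0'):
  a = tf.random.uniform((1000, 1000))
  result = tf.matmul(a, a)
print(result.device)
```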

For now, we will trust the notebook's and TensorFlow's allocation of computing power.



## Challenge

1) Run the following lines of code to test how fast your computer can do a task. Report the results.


In [None]:
import timeit

print(timeit.timeit('[x**2 for x in range(10)]'))

2.9254725389999976


2) Change the settings to use a GPU: Edit --> Notebook Settings --> Hardware Accelerator --> GPU. Run the code below to make sure the GPU is enabled.

In [None]:
#run this code to check that you have the GPU enabled
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


3) Re-run the same timing task and report your results. How much of a difference is there in timing?

In [None]:
import timeit
print(timeit.timeit('[x**2 for x in range(10)]'))

2.9785137800000143


4) Change the task to be more complex (= longer to run), and compare the GPU vs no GPU time.

As we run more complex tasks, the efficiency gain from GPUs becomes more and more noticeable. If you are curious, you can compare the timing of the tasks in this notebook on GPU/TPU/CPU and note the difference. Even though in this notebook we are working with a fairly small dataset and task, these differences become important at larger scale.
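One note on the timings recorded above: the list comprehension is plain Python, so it runs on the CPU whichever runtime is selected, which is why the two results are nearly identical. The sketch below shows one way to scale the task up with `timeit`. The `number=1000` repetition count and the variable names are choices made for this illustration (by default `timeit.timeit` repeats the statement 1,000,000 times).

```python
import timeit

# Time a small and a larger pure-Python task, 1000 repetitions each.
# Both run on the CPU: only TensorFlow operations are dispatched to the GPU.
small = timeit.timeit('[x**2 for x in range(10)]', number=1000)
large = timeit.timeit('[x**2 for x in range(1000)]', number=1000)

print('small task: {:.4f} s'.format(small))
print('large task: {:.4f} s'.format(large))
print('the large task took about {:.0f}x longer'.format(large / small))
```

To see a GPU speedup, the timed statement would need to be a TensorFlow operation, for example a large `tf.matmul`, run once on each runtime type.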

# Manipulating Tensors

Objectives:
- Understand the tensor data type
- Index, reshape, and slice tensors

In the Neural Networks section of the course, you were introduced to Multilayer Perceptrons as a basic building block of neural networks. You were also introduced to using Keras to build a neural network for digit recognition. In this section we will further explore the powerful Keras/Tensorflow framework for neural networks.

Tensors are the key data structure in TensorFlow. They are similar to numpy arrays, but can be used with GPUs, which is necessary for large calculations. Tensors have one or more dimensions, are rectangular, and are immutable. Every entry in a tensor must have the same datatype (usually float).

A 3-dimensional tensor can be visually represented in a few different ways. It can be represented as an m x n x p block:


![3-axis_block.png](https://drive.google.com//uc?id=1ZQIeFD5zm-Nnh28bfgooXnb0AqlaWNzB)


Or the block can be flattened out into three two-dimensional arrays:

![3-axis_numpy.png](https://drive.google.com//uc?id=1TyHhSZ66fJcFYGGkHrnmI3ZlNZ7-5RJW)


The *shape* of this tensor is 3x2x5 (three 2x5 arrays) and the *size* is 30, since there are 30 total entries in the tensor. Although they are harder to visualize, tensors can have many more dimensions.


In TensorFlow, since we are handling every step of the process, tracking the dimensions, shapes, and sizes of the tensors is an essential skill.

Images from: https://www.tensorflow.org/guide/tensor
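As a quick check of these terms, here is a minimal sketch using a numpy array with 30 entries, arranged as three two-dimensional arrays like the flattened picture. The variable name `data` is illustrative. TensorFlow offers `tf.shape`, `tf.rank`, and `tf.size` for the same quantities on tensors.

```python
import numpy as np

# Build a 3 x 2 x 5 example: three blocks, each with 2 rows and 5 columns
data = np.arange(30).reshape(3, 2, 5)

print(data.shape)  # (3, 2, 5): the length of each axis
print(data.ndim)   # 3: the number of axes
print(data.size)   # 30: the total number of entries, 3 * 2 * 5
```

The size is always the product of the entries in the shape, which is a useful sanity check before reshaping.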

## Tensor Operations

Just like manipulating dataframes or arrays, manipulating tensors is an important skill. There are some basic tensor operations that are useful to know. We will explore these with the same example tensor from above.

### Creating a tensor from an array

Tensors are similar to numpy arrays, and TensorFlow will automatically convert an array to a tensor when it is passed to a TensorFlow operation. Similarly, .numpy() converts a tensor to an array.


More commonly, you will use methods that process data and output a tensor that you can then work with, and most conversions to tensors will be handled automatically within those methods.

In [21]:

data=np.array([[[0,1,2,3,4],[5,6,7,8,9]],
      [[10,11,12,13,14],[15,16,17,18,19]],
      [[20,21,22,23,24],[25,26,27,28,29]]])
print('The original data type is:',type(data))
sample_tensor=tf.convert_to_tensor(data) #explicitly convert the numpy array to a tensor
print('The new data type is:',type(sample_tensor))
print(sample_tensor)

The original data type is: <class 'numpy.ndarray'>
The new data type is: <class 'tensorflow.python.framework.ops.EagerTensor'>
tf.Tensor(
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]]

 [[10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]]], shape=(3, 2, 5), dtype=int64)


### Indexing
Indexing allows you to select subsections of the tensor. This is a very useful skill, but can get confusing with high dimensions. You can select a single number, or range of numbers in the tensor by specifying the position of the number in each dimension.

In [None]:
#get a single number
sample_tensor[0,0,0]
print(sample_tensor[0,0,0].numpy()) #.numpy() converts it to an array to print

#get a range of values
print(sample_tensor[0:2,1:2,:3].numpy()) 


0
[[[ 5  6  7]]

 [[15 16 17]]]


### Slicing

Another important skill is to take a slice of a tensor. This is a subset of the tensor that is of a smaller dimension than the original tensor. Compare the following slices to the image above. Which part of the tensor is being sliced? What is the shape of each of the slices?



In [22]:
#index a slice
print(sample_tensor[0,:,:].numpy())

print(sample_tensor[:,0,:].numpy())

print(sample_tensor[:,:,0].numpy())

[[0 1 2 3 4]
 [5 6 7 8 9]]
[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]]
[[ 0  5]
 [10 15]
 [20 25]]


### Reshaping
Reshaping is another key operation for manipulating tensors. As with arrays, reshaping rearranges the same values into a new shape, and can increase or decrease the number of dimensions. For example, you can change the three-dimensional tensor into one dimension or two dimensions.

In [None]:
#reshaping
#https://www.tensorflow.org/api_docs/python/tf/reshape

print("\nOriginal tensor:")
print(sample_tensor)
print("\nShaped tensor:")
print(tf.reshape(sample_tensor,(2,3,5))) #regroup the same 30 values into shape (2,3,5); the order of the values is unchanged

print("\nCompare the size of the two tensors:")
print(tf.size(sample_tensor),tf.size(tf.reshape(sample_tensor,(2,3,5))))


Original tensor:
tf.Tensor(
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]]

 [[10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]]], shape=(3, 2, 5), dtype=int32)

Shaped tensor:
tf.Tensor(
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]]

 [[15 16 17 18 19]
  [20 21 22 23 24]
  [25 26 27 28 29]]], shape=(2, 3, 5), dtype=int32)

Compare the size of the two tensors:
tf.Tensor(30, shape=(), dtype=int32) tf.Tensor(30, shape=(), dtype=int32)


You can also change the 3-dimensional tensor to a 1-dimensional tensor

In [None]:
#flatten tensor
print(tf.reshape(sample_tensor,(30,))) #(30,) is a one-element tuple giving the new shape



tf.Tensor(
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29], shape=(30,), dtype=int32)


Or you can change it to two dimensions. 

In [None]:
#reduce to two dimensions
print(tf.reshape(sample_tensor,(2,15)))
print(tf.reshape(sample_tensor,(3,10)))


tf.Tensor(
[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
 [15 16 17 18 19 20 21 22 23 24 25 26 27 28 29]], shape=(2, 15), dtype=int32)
tf.Tensor(
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]], shape=(3, 10), dtype=int32)


However, the size of the reshaped tensor must match the size of the original tensor. For example, the following code raises an error:

In [24]:
#this code raises an error: 2*10=20 entries cannot hold the tensor's 30 values

print(tf.reshape(sample_tensor,(2,10)))

InvalidArgumentError: ignored

With reshaping, it is essential to be confident that the new shape is appropriate for the task. If the sizes don't match, the code will return an error. Harder to track down is reshaping where the sizes match but the values do not end up along the intended dimensions. This matters because such reshaping bugs do not throw errors but still cause problems in the final model.
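A small illustration of such a silent bug, sketched with numpy for brevity (`tf.reshape` and `tf.transpose` behave the same way on tensors). The data and variable names are made up for this example: the goal is to turn a 2x3 array with one sample per row into a 3x2 array with one feature per row.

```python
import numpy as np

# One sample per row: 2 samples, 3 features each
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Goal: one feature per row, i.e. shape (3, 2)
reshaped = data.reshape(3, 2)   # refills the values in reading order
transposed = data.T             # actually swaps the two axes

print(reshaped)
# [[1 2]
#  [3 4]
#  [5 6]]
print(transposed)
# [[1 4]
#  [2 5]
#  [3 6]]

# Same shape, no error, but different contents
print(reshaped.shape == transposed.shape)    # True
print(np.array_equal(reshaped, transposed))  # False
```

Both results have the requested shape, so nothing fails, but only the transpose keeps each feature's values together. Printing a small example like this before reshaping real data is a cheap way to catch the problem.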

## Challenge

You are working with a dataset with two-dimensional data (sample x data_dim1 x data_dim2). Use sample_tensor_2 for the input and complete the following challenges below. 


In [25]:
data=[[[2,4],[6,8]],
      [[10,12],[14,16]],
      [[1,3],[5,7]],
      [[9,11],[13,15]],
      ]
sample_tensor_2=tf.stack(data)  
print(sample_tensor_2.numpy())
print(tf.shape(sample_tensor_2))

[[[ 2  4]
  [ 6  8]]

 [[10 12]
  [14 16]]

 [[ 1  3]
  [ 5  7]]

 [[ 9 11]
  [13 15]]]
tf.Tensor([4 2 2], shape=(3,), dtype=int32)


1) What is the total size of the dataset?

2) How many samples are in the dataset? How many entries are there per sample?

3) Index to select the third data sample.

4) What do you predict the following code will do? What is the shape of the output?

In [49]:
sample_tensor_2[:,:,1]

<tf.Tensor: shape=(4, 2), dtype=int32, numpy=
array([[ 4,  8],
       [12, 16],
       [ 3,  7],
       [11, 15]], dtype=int32)>

5) You want to choose the first three data samples in the tensor. The following code gives an error. What does the error mean? How would you fix the code?





In [40]:
tf.reshape(sample_tensor_2,(3,2,2))

InvalidArgumentError: ignored

6) Let's say we want to reshape the dataset to one-dimensional data for a simplified model (sample x data). What should the shape of the output be? Does the code below achieve that goal? If not, correct it.

In [41]:
tf.reshape(sample_tensor_2,(4,4,1))

<tf.Tensor: shape=(4, 4, 1), dtype=int32, numpy=
array([[[ 2],
        [ 4],
        [ 6],
        [ 8]],

       [[10],
        [12],
        [14],
        [16]],

       [[ 1],
        [ 3],
        [ 5],
        [ 7]],

       [[ 9],
        [11],
        [13],
        [15]]], dtype=int32)>

In [53]:
#Solutions:
#1) 
print('The size is:',tf.size(sample_tensor_2))
#2) 
print('The number of samples is',tf.shape(sample_tensor_2)[0].numpy())
print('The number of data per sample is',tf.shape(sample_tensor_2)[1].numpy()*tf.shape(sample_tensor_2)[2].numpy())
#3)
print(sample_tensor_2[2,:,:])
#4) slice each sample for the second position in dimension 2, 4x2
#5) The error is that the size of the reshape is not the same as the original 
# to do the operation in the question, index rather than reshape
sample_tensor_2[:3,:,:]

#6) The output should be 4x4. The provided code gives 4x4x1; it needs to be changed to:
tf.reshape(sample_tensor_2,(4,4))

The size is: tf.Tensor(16, shape=(), dtype=int32)
The number of samples is 4
The number of data per sample is 4
tf.Tensor(
[[1 3]
 [5 7]], shape=(2, 2), dtype=int32)


<tf.Tensor: shape=(4, 4), dtype=int32, numpy=
array([[ 2,  4,  6,  8],
       [10, 12, 14, 16],
       [ 1,  3,  5,  7],
       [ 9, 11, 13, 15]], dtype=int32)>

# Revisiting Deep Learning









Objectives: 
- Code and optimize a neural network
- Adapt a network to new data

In previous sections of this course, you have covered neural networks and deep learning for classifying the MNIST dataset. In this section, we will revisit deep learning with Python with a few examples. 

We will start with the classification problem (student loan vs checking/savings account) from the NLP section of the course. We will use the same embeddings we trained for the final logistic regression problem in that section of the course.

First, we will load in the data and split it into training and validation data.

In [54]:
word2vec_features_df=pd.read_csv('embeddings.csv')
y=pd.read_csv('y.csv')
y_vals=y['Product_binary'].values
X_train, X_test, y_train, y_test = train_test_split(word2vec_features_df, 
                                                    y_vals, 
                                                    train_size = .80, 
                                                    test_size=0.20, 
                                                    random_state = 10)


FileNotFoundError: ignored

Next we define the model. In Keras, each layer of the model has to be individually specified. This allows significant control over the model, including different parameters for each layer.

This model has two dense layers with 128 neurons each, each followed by a dropout layer that randomly drops 20% of the layer's units during training. The final output layer uses a sigmoid activation function, which outputs a value between 0 and 1 that is interpreted as the probability of class 1.
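To make the output layer concrete: the sigmoid returns a number between 0 and 1, and a cutoff turns that number into a 0/1 label. The sketch below uses only the standard library; the raw scores are hypothetical values chosen for illustration, and 0.5 is the usual cutoff, assumed here.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw scores from the last layer, before the activation
for z in [-4.0, 0.0, 2.5]:
    probability = sigmoid(z)
    label = 1 if probability >= 0.5 else 0  # 0.5 is an assumed cutoff
    print(z, round(probability, 3), label)
```

Negative scores map below 0.5 and positive scores above it, so the sign of the raw score decides the predicted class under this cutoff.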

In [None]:
def NN_model():
    # create model
    model = Sequential()

    # A fully connected layer with 128 neurons
    model.add(Dense(128, input_dim=301,activation='relu'))

    # A dropout layer that randomly excludes 20% of neurons in the layer 
    model.add(Dropout(0.2))

    # A fully connected layer with 128 neurons
    model.add(Dense(128, activation='relu'))

    # A dropout layer that randomly excludes 20% of neurons in the layer 
    model.add(Dropout(.2))
    
    # An output layer with binary classification
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile model with crossentropy
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

Finally, we fit and evaluate the model.

In [None]:
model = NN_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, verbose=2)

# Evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)

print("NN Error: %.2f%%" % (100-scores[1]*100))
model.summary() #summary() prints the table itself, so no print() is needed

NameError: ignored

This is a simple neural network with a couple of densely connected layers and a couple of dropout layers. When working with neural nets, it's often a good idea to start with a simple net to make sure the basics of the code work, then gradually create more complicated architectures once the code runs smoothly.

## Challenge


1) Adapt this model to another set of data. First, let's load in the MNIST digits dataset. We are going to flatten the data for now to create a two dimensional tensor with 784 columns to fit with the neural net we are working on. Note: instead of two classes, the MNIST dataset uses 10 classes (one for each digit 0-9). 

In [55]:
#this model gives errors. fix it!
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# flatten each 28x28 image to [samples][784 pixels]
X_train = X_train.reshape(X_train.shape[0], 28*28)
X_test = X_test.reshape(X_test.shape[0], 28*28)
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


Here is the same code from the NN model above. What do you need to change in order to run the same model on the new data? Note which parameters and values you need to change. How does this relate to the differences in the data? Edit the code to work with the new data shape and execute it.

Hint: use tf.shape() to compare the shapes of the MNIST and original datasets.

In [56]:

def diff_CNN_model():
    # create model
    model = Sequential()
    
    model.add(Dense(128, input_dim=301,activation='relu')) #change input dim

    # A dropout layer that randomly excludes 20% of neurons in the layer 
    model.add(Dropout(0.2))
    
    # A fully connected layer with 128 neurons
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.2))
    
    # An output layer with softmax as in MLP
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile model as before in MLP
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [62]:
model = diff_CNN_model()
# Fit the model
print(X_train.shape)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

# Evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("NN Error: %.2f%%" % (100-scores[1]*100))

(60000, 784)
Epoch 1/10
300/300 - 1s - loss: 5.7785 - accuracy: 0.6342 - val_loss: 0.7637 - val_accuracy: 0.8248 - 1s/epoch - 4ms/step
Epoch 2/10
300/300 - 1s - loss: 0.8961 - accuracy: 0.7787 - val_loss: 0.5545 - val_accuracy: 0.8810 - 764ms/epoch - 3ms/step
Epoch 3/10
300/300 - 1s - loss: 0.6857 - accuracy: 0.8330 - val_loss: 0.4486 - val_accuracy: 0.8989 - 699ms/epoch - 2ms/step
Epoch 4/10
300/300 - 1s - loss: 0.5646 - accuracy: 0.8586 - val_loss: 0.3877 - val_accuracy: 0.9115 - 768ms/epoch - 3ms/step
Epoch 5/10
300/300 - 1s - loss: 0.4874 - accuracy: 0.8763 - val_loss: 0.3654 - val_accuracy: 0.9242 - 734ms/epoch - 2ms/step
Epoch 6/10
300/300 - 1s - loss: 0.4398 - accuracy: 0.8903 - val_loss: 0.3114 - val_accuracy: 0.9259 - 716ms/epoch - 2ms/step
Epoch 7/10
300/300 - 1s - loss: 0.3889 - accuracy: 0.9001 - val_loss: 0.2898 - val_accuracy: 0.9375 - 774ms/epoch - 3ms/step
Epoch 8/10
300/300 - 1s - loss: 0.3573 - accuracy: 0.9078 - val_loss: 0.2831 - val_accuracy: 0.9397 - 737ms/epoch -

2) Optimize NLP model

The logit model from the challenge question in the NLP section, which was used to classify the customer complaint data, had an accuracy of 78.5%. What is the accuracy of the first neural network model on the same data? Try changing the model to improve accuracy. What configuration gave you the best results? (Hint: you can look at the next section if you need inspiration)

In [None]:
#your code here


In [60]:
#Solution 
#1) 
def diff_CNN_model():
    # create model
    model = Sequential()
    
    model.add(Dense(128, input_dim=784,activation='relu')) #change input dim

    # A dropout layer that randomly excludes 20% of neurons in the layer 
    model.add(Dropout(0.2))
    
    # A fully connected layer with 128 neurons
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.2))
    
    # An output layer with softmax as in MLP
    model.add(Dense(10, activation='softmax')) #change dimension to number of categories
    
    # Compile model as before in MLP
    #change to categorical crossentropy
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


model = diff_CNN_model()
# Fit the model
print(X_train.shape)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

# Evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("NN Error: %.2f%%" % (100-scores[1]*100))

#2) varies

# Optimizing Neural Nets


Objectives:
- Explore strategies to optimize a neural net
- Implement an optimizer with custom settings
- Grid search parameters

Optimizing neural nets is key to using these powerful models effectively, as with any ML model. However, neural nets have many parameters that can be tuned, which makes them a challenge for traditional optimization methods such as grid search.

In the previous challenge, we experimented with improving the accuracy of the model. The following strategies can help guide the optimization process when fine-tuning a model.

1. Feature engineering (refer to Natural Language Processing Notebook)

2. Try a smaller network (minimize redundancy) or a larger network (capture more complex relationships)

3. Change learning rate
4. Use appropriate architecture for the data/task

5. Test parameters

6. Decrease batch size 

Depending on the task, data, and neural network used, there may be a significant amount of tuning necessary in order to achieve an optimal result. This is one reason why leveraging existing models that are already optimized can give a huge advantage for language tasks. 

For further reference, see this article: https://towardsdatascience.com/optimizing-neural-networks-where-to-start-5a2ed38c8345


For this notebook we will start with changing the learning rate. 

In previous examples, we passed the optimizer to the compile function by name:
```
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```
This uses the default parameters for the optimizer. Now that we are customizing the parameters, we want to create the optimizer object ourselves, and then pass that object into the .compile() function.

```
opt=tf.keras.optimizers.Adam()
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
```

Here is the documentation for that function: https://keras.io/api/optimizers/adam/

What is the default parameter for learning rate? What are some of the other parameters for the Adam optimizer?

## Challenge

1) Edit the code below to use the Adam optimizer function with default parameters.

2) Test the following learning rates: [.0001,.001,.01,.1]. Which one performs the best? Which one performs the worst?



In [63]:
#load in data to use for this test
from tensorflow.keras.optimizers import Adam

#cnn classification for neural nets
word2vec_features_df=pd.read_csv('embeddings.csv')
y=pd.read_csv('y.csv')
y_vals=y['Product_binary'].values
X_train, X_test, y_train, y_test = train_test_split(word2vec_features_df, 
                                                    y_vals, 
                                                    train_size = .80, 
                                                    test_size=0.20, 
                                                    random_state = 10)
#print(word2vec_features_df.shape)
#print(X_train.shape,X_test.shape,y_train.shape,y_test.shape)

FileNotFoundError: ignored

3) What if we wanted to systematically test a number of learning rates, for example with a for loop? 

Using the pseudocode below as a framework, create a loop to optimize learning rate. 
```
learning-rate-list
for rate in learning-rate-list:
    train model
    record accuracy
select best model
```
Hint: non-linear sampling of the parameter space (for example, learning rates spaced by factors of 10) can be helpful in generating a list of values to test.

What was the best learning rate you found? What was the best error associated with it? Bonus: how long did this take with/without GPU?
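The loop pattern in the pseudocode can be sketched without Keras. In the sketch below, a toy one-parameter gradient-descent problem stands in for the neural net, so the function `train_toy_model`, its loss, and the numbers it prints are illustrative assumptions and say nothing about the best learning rate for the complaint data. With Keras, the body of the loop would instead build and fit a model with that rate and record its validation accuracy.

```python
def train_toy_model(learning_rate, steps=100):
    """Minimize the toy loss (w - 3)**2 by gradient descent and return the final loss."""
    w = 0.0
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient
    return (w - 3.0) ** 2

# Log-spaced candidates: each rate is 10x the previous one
learning_rates = [10.0 ** exponent for exponent in range(-4, 0)]

# Train once per rate and record the result
results = {}
for rate in learning_rates:
    results[rate] = train_toy_model(rate)
    print('learning rate {:g}: final loss {:.6f}'.format(rate, results[rate]))

# Select the rate with the lowest final loss
best_rate = min(results, key=results.get)
print('best learning rate:', best_rate)
```

Storing every result in a dictionary, rather than only the running best, makes it easy to inspect how sensitive the outcome is to the learning rate afterwards.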

In [None]:
#solution
##1
def NN_model():
    # create model
    model = Sequential()
    
    model.add(Dense(128, input_dim=301,activation='relu')) #change input dim

    # A dropout layer that randomly excludes 20% of neurons in the layer 
    model.add(Dropout(0.2))
    
    # A fully connected layer with 128 neurons
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.2))
    
    # An output layer with sigmoid for binary classification
    model.add(Dense(1, activation='sigmoid'))
    adam_opt=Adam(learning_rate=.1)
    # Compile model as before in MLP
    model.compile(loss='binary_crossentropy', optimizer=adam_opt, metrics=['accuracy'])
    return model

In [None]:
model = NN_model()
# Fit the model
print(X_train.shape)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

# Evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("NN Error: %.2f%%" % (100-scores[1]*100))


#2, #3 under construction

(800, 301)
Epoch 1/10
4/4 - 1s - loss: 147.0860 - accuracy: 0.5450 - val_loss: 2.8241 - val_accuracy: 0.7850 - 925ms/epoch - 231ms/step
Epoch 2/10
4/4 - 0s - loss: 1.7689 - accuracy: 0.5713 - val_loss: 0.5360 - val_accuracy: 0.7850 - 36ms/epoch - 9ms/step
Epoch 3/10
4/4 - 0s - loss: 0.6445 - accuracy: 0.7862 - val_loss: 0.5481 - val_accuracy: 0.7850 - 33ms/epoch - 8ms/step
Epoch 4/10
4/4 - 0s - loss: 0.5769 - accuracy: 0.7763 - val_loss: 0.5450 - val_accuracy: 0.7850 - 38ms/epoch - 10ms/step
Epoch 5/10
4/4 - 0s - loss: 0.5661 - accuracy: 0.7862 - val_loss: 0.5327 - val_accuracy: 0.7850 - 34ms/epoch - 9ms/step
Epoch 6/10
4/4 - 0s - loss: 0.5716 - accuracy: 0.7850 - val_loss: 0.5289 - val_accuracy: 0.7850 - 34ms/epoch - 8ms/step
Epoch 7/10
4/4 - 0s - loss: 0.5317 - accuracy: 0.7862 - val_loss: 0.5231 - val_accuracy: 0.7850 - 38ms/epoch - 9ms/step
Epoch 8/10
4/4 - 0s - loss: 0.5239 - accuracy: 0.7862 - val_loss: 0.5207 - val_accuracy: 0.7850 - 36ms/epoch - 9ms/step
Epoch 9/10
4/4 - 0s - l

# Further Neural Nets

The purpose of this notebook has been to give basic strategies and tools for working with neural nets. In practice, your net will likely involve more complex architectures and properties. For example, you may include convolutional layers, preprocessing layers, regularization layers, and recurrent layers.



# Hugging Face

Objectives:
- Explore tasks and data available in Hugging Face transformers
- Choose an appropriate language task
- Implement a transformer on local data

In practice, these models require significant data and computational power, which can exceed the resources available to the analyst. We can circumvent this problem by using pre-trained models. Like pre-trained embedding models, these models are trained on a large dataset. While that dataset may not perfectly align with the data or task you have, it can help create a more robust system, and the models can be fine-tuned to your data and goals.

[Hugging Face](https://huggingface.co/models) hosts a collection of pretrained models from a variety of datasets and sources, with an easy-to-use interface. In this section, we will explore the use of the Hugging Face transformers library to streamline language task processing.



In [64]:
#install the transformers library
!pip install transformers

Collecting transformers
  Downloading transformers-4.15.0-py3-none-any.whl (3.4 MB)

The simplest strategy is to use the pipeline method, where you select the task and the pre-trained model (there are multiple models available for many of the tasks).

In [None]:
from transformers import pipeline
classifier = pipeline("sentiment-analysis") 


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)



The key to using these models, since the preprocessing is built in, is understanding the format of the data necessary for the model. This model takes the raw text as input rather than the word embeddings, so let's reload our data appropriately.

In [None]:
cfpb=pd.read_csv('CFPB 2020 Complaints.csv')
complaints=cfpb['Consumer complaint narrative']
complaints=complaints[~complaints.isna()]
print(complaints.values[0])
#run the classifier on the first complaint and show the predicted label and score
print(classifier(complaints.values[0]))

FileNotFoundError: ignored

Then use the pipeline on the example data, and look at the results.

In [None]:
#ex

While this doesn't work for every task (for example, the specific classification task that we were working with above), this is a valuable and powerful tool for quick, out-of-the-box models that don't take very long to initialize and tune.

## Challenge 

1) Choose one task from the [Hugging Face task summary](https://huggingface.co/docs/transformers/task_summary). Select the task and model, and build a pipeline. What format should the data be in? What preprocessing steps are included in the model, if any? Run it on the CFPB data. Interpret the results. What challenges do you run into?

2) How do different pre-trained models perform on the same task with the same data?


# Next Steps

This lab has introduced Colab as a way to use GPUs to speed up computation and explored further applications of deep learning to natural language processing.

In practice, using deep learning for computational social science requires building on the foundational concepts covered in this notebook to implement models with more complicated data and architecture. However, there are many strategies that can help you navigate the model ecosystem, some of which we will discuss here:

1. Documentation (and other resources like tutorials) is a goldmine of information for implementing particular algorithms and completing specific tasks. This is one reason why reading and translating code written by others is a key skill. 

2. Debugging and interpreting error messages, as well as leveraging online resources in order to resolve them, is another key concept. Resources like documentation and Stack Overflow help solve common errors and get code working faster. In addition, checking your code as you go and forming expectations of the results at each step will also help you to code smoothly.

3. Computational resources are important for running complex models. Google Colab has access to GPUs, but does have limitations for large and extended jobs. In those cases, options are: [add further resources here]

4. Further resources [add some books on deep learning and nlp]




