<a href="https://colab.research.google.com/github/dlab-berkeley/Computational-Social-Science-Training-Program/blob/master/Deep_Learning_and_Tensorflow_solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **GPU, Deep Learning, and Tensorflow**


This notebook will introduce you to the fundamentals of Tensorflow and explore techniques for deep learning with text data. Key concepts covered in this notebook include:

1. Google Colab and GPUs
2. Tensors and basic tensor operations
3. Use tensorflow/keras to adapt and tune neural nets
4. Existing resources to help analyze language data


With these basic building blocks, you will be equipped to explore and implement deep learning algorithms for your own project. 

# Google Colab

---



Objectives:

- Set up a Google Colab notebook
- Create, delete, run, and edit cells
- Cover variable, notebook and package management

15 minutes

## Introducing Google Colab


Google Colab is a platform for cloud-based computation and coding. It is similar to a Jupyter Notebook, where individual cells of code are executed sequentially. It doesn't require local installation on your computer and can be shared and edited by multiple people at the same time. However the Colab notebook requires you to be connected to the internet, while jupyter notebooks can be run on your local machine. Google Colab notebooks are in the .ipynb format, and can be saved and opened either directly or via Google Drive. 



##Basic Operations

Google Colab has several features that help organize code and long notebooks. A few key concept to know to use this notebook effectively are:

- Use the Insert tab in the upper bar, or press the +Code/+Text buttons in the top left of the window.

- Text cells can be edited and formatting with the buttons at the top of the cell.

- The buttons at the top right of the cell give you options to move, modify and delete the cell. 

- You can run code with shift+enter, or by clicking the top left of the box.

- For more commands, use ctl+shift+p and select the desired command from the command palette

An example code cell is below. Try executing and editing the cell.

In [55]:
print("Welcome to Google Colab")
x=12+78

Welcome to Google Colab


The buttons on the left panel help manage the notebook (search, table of contents, files). This is important for organizing your code and navigating long notebooks.



## Package Management

Like Anaconda, Google Colab comes with many packages already available, and you can also install local packages using pip. Use the following lines of code in order to see which packages you have and which ones you need to install.



```
#check which packages you have available (listed alphabetically). The version numbers are also avaliable which can be useful in determining issues with coding between computers.
!pip list

#install a new package
!pip install numpy 
```




<List of packages that we will use in this tutorial>

In the following cell are the packages that you will need to complete this notebook. Run this line of code to make sure that everything is installed properly.

In [1]:
#import packages for deep learning
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf


from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import np_utils
from tensorflow.keras.optimizers import RMSprop

## Customization

Finally, there are several settings that you can customize if you so choose. These can be found under Tools -> Settings, where you can change the font size, background, and other aesthetic settings of the notebook to suit you. 

In addition, in Tools -> Keyboard shortcuts you can view and adapt shortcuts to your preferences as well.




## Challenge

Try out the following exercises to get comfortable with the new interface:

1) Open the editor settings (Tools-> Settings->Editor) and select "Show line numbers". Now your cells will have line numbers next to them, which we can refer to when discussing code during this workshop.

2) Make a new code cell below and save the product of 60 and 72 to a new variable. Then check the value of the variable in the variable tab to the left.


In [2]:
#solutions
#1) follow the directions in the question. You should see the line numbers in 
# each cell on the left side
#2) 
new = 60*72 #check the value of this variable using the variable panel on the left

# Introduction to GPU

Objectives:
- Understand the benefits of GPUs
- Set up GPU for Google Colab
- Compare performance on tasks vs CPU


10 minutes

As you've found in your previous models, some models take a significant amount of time to run. Models may also exceed the capacity of the local computer's processing power. This will either result in code that never finished running, or an error message indicating that the code has timed out without completing. 

To counteract this issue, TPU/GPU are parallel processing units that greatly speed up models. This can make some models that are otherwse impossible to train possible (Think minutes rather than hours)

TPU is made specifically for tensorflow architecture, and speeds it up even more than GPUs.

## GPU Access
Oftentimes you need to pay for cloud services and access to GPUs, but one advantage of Colab is that it has free access to a certain amount of GPU/TPU units. This access is somewhat limited, but should be more than enough for what we are using it for today. We will discuss limitations and further options for long-term use in a later section of the workshop.


Additional resource: https://colab.research.google.com/notebooks/gpu.ipynb#scrollTo=sXnDmXR7RDr2

The notebook will automatically choose which device (read: GPU vs CPU) to run the code on, but if you want to make sure that something is being run on a certain device, you can select a specific device as in the snippet below. 


```
# This is formatted as code
with tf.device(device_name):
  #put task here
  #return output
```

For now, we will trust the notebook's/ Tensorflow's allocation of computing power.



## Challenge

1) Run the following lines of code to test how fast your computer can do a task. Report the results


In [None]:
import timeit

print(timeit.timeit('[x**2 for x in range(10)]'))

6.2927398080000785


2)  Change the settings to use GPU:  Edit --> Notebook Settings --> Hardware Accelerator --> GPU
. Run the code below to make sure GPU is enabled.

In [None]:
#run this code to check that you have the GPU enabled
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


3) Re-run the same timing task and report your results. How much of a difference is there in timing?

In [None]:
import timeit
print(timeit.timeit('[x**2 for x in range(10)]'))

4.587992389000192


As we run more complex tasks, the efficiency of GPUs becomes more and more of a difference. If you are curious, you can compare the timing of the tasks in this notebook with GPU/TPU/CPU and note the difference. Even though in this notebook we are working with fairly small dataset and task, these differences will be important at larger scale.

# Manpulating Tensors

Objectives:
- Understand the tensor data type
- Index, reshape, and slice tensors

20 minutes

In the Neural Networks section of the course, you were introduced to Multilayer Perceptrons (MLP) as a basic building block of neural networks. You were also introduced to using Keras to build a neural network for digit recognition. Neural networks are powerful deep learning tools that can learn complex relationships in data. In this section we will further explore the powerful [Tensorflow](https://www.tensorflow.org/) framework for neural networks.

First, we will cover [tensors](https://www.tensorflow.org/guide/tensor), which are an essential concept for interacting with deep learning models. Tensors are the key data structure in Tensorflow and are simlar to numpy arrays but are compatible with GPUs. Tensors have one or more dimensions, are rectangular, and are immutable. Every entry in a tensor must have the same datatype (usually float).

A 3-dimensional tensor can be visually represented in a few different ways. It can be represented as a mxnxp dimensional block:


![3-axis_block.png](https://drive.google.com//uc?id=1ZQIeFD5zm-Nnh28bfgooXnb0AqlaWNzB)


Or the block can be flattened out to three mxn dimensional arrays:

![3-axis_numpy.png](https://drive.google.com//uc?id=1TyHhSZ66fJcFYGGkHrnmI3ZlNZ7-5RJW)


The *shape* of this tensor is 5x3x2 and the *size* is 30, since there are 30 total units in the tensor. Although they are harder to visualize, tensors can have many dimensions.


In Tensorflow, tracking the dimensions, shapes, sizes of the tensors is an essential skill for working with this code. This is key in allowing us to take datasets and adapt them to existing models. We will work through an exercise demonstrating these ideas in the next sections.

Images from:  https://www.tensorflow.org/guide/tensor

## Tensor Operations

Just like manipulating dataframes or arrays, manipulating tensors is an important skill. There are some basic tensor operations that it is useful to be aware of for manipulating tensors. We will focus on the key operations for deep learning: indexing and reshaping. 

### Creating a tensor from an array

Tensors are similar to numpy array and tensorflow will automatically convert an array to a tensor when using tensorflow operations. Similarly, .numpy() can convert a tensor to an array.


More commonly, you will most likely be using methods that process data and output a tensor that you can then work with, and most conversions to tensors will be handled automatically within those methods. 

In [12]:
#make a sample tensor
data=np.array([[[0,1,2,3,4],[5,6,7,8,9]],
      [[10,11,12,13,14],[15,16,17,18,19]],
      [[20,21,22,23,24],[25,26,27,28,29]]])
print('The original data type is:',type(data))
sample_tensor=tf.concat(data,2)
print('The new data type is:',type(sample_tensor))
print(sample_tensor)

#to convert from tensor to array: sample_tensor.numpy()

The original data type is: <class 'numpy.ndarray'>
The new data type is: <class 'tensorflow.python.framework.ops.EagerTensor'>
tf.Tensor(
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]]

 [[10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]]], shape=(3, 2, 5), dtype=int64)


What is the shape of the tensor? What is the total size?

###Indexing

[Indexing](https://www.tensorflow.org/guide/tensor) allows you to select subsections of the tensor, similar to subsetting an array or DataFrame. You can select a single number, or range of numbers in the tensor by specifying the position of the number in each dimension. It's useful to build an intuition for indexing, since it is a common method in Tensorflow.

In [13]:
#get a single number
sample_tensor[0,0,0]
print(sample_tensor[0,0,0].numpy()) #.numpy() converts it to an array to print


0


In [22]:
#get a range of values
print(sample_tensor[0:2,1:2,3].numpy()) 


[[ 8]
 [18]]


In [21]:
#take all items in a dimension:
print(sample_tensor[0,:,:].numpy())

[[0 1 2 3 4]
 [5 6 7 8 9]]


Refer to the output of the following code. What is the output?



```
# This is formatted as code
sample_tensor[:,0,:].numpy()
```



In [16]:
print(sample_tensor)


tf.Tensor(
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]]

 [[10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]]], shape=(3, 2, 5), dtype=int64)


Getting used to translating between the printed tensor dimensions and the output is an essential skill for selecting relevant information from a tensor. 

### Reshaping
[Reshaping](https://www.tensorflow.org/api_docs/python/tf/reshape) is another key operation for manipulating tensors. Reshaping tensors, like arrays, can include switching, increasing, and decreasing dimensions. For example, you can change the three dimensional tensor into one dimension or two dimensions. The best way to understand the arguments for reshaping is to practice and look at examples. A series of sample reshapings are given below.

In [23]:
#reshaping

print("\nOriginal tensor:")
print(sample_tensor)
print("\nShaped tensor:")
print(tf.reshape(sample_tensor,(2,3,5))) #switch first and second dimensions

print("\nCompare the size of the two tensors:")
print(tf.size(sample_tensor),tf.size(tf.reshape(sample_tensor,(2,3,5))))


Original tensor:
tf.Tensor(
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]]

 [[10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]]], shape=(3, 2, 5), dtype=int64)

Shaped tensor:
tf.Tensor(
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]]

 [[15 16 17 18 19]
  [20 21 22 23 24]
  [25 26 27 28 29]]], shape=(2, 3, 5), dtype=int64)

Compare the size of the two tensors:
tf.Tensor(30, shape=(), dtype=int32) tf.Tensor(30, shape=(), dtype=int32)


You can also change the 3-dimensional tensor to a 1-dimensional tensor

In [24]:
#flatten tensor
print(tf.reshape(sample_tensor,(30)))



tf.Tensor(
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29], shape=(30,), dtype=int64)


Or you can change it to two dimensions. 

In [25]:
#reduce to two dimensions
print(tf.reshape(sample_tensor,(2,15)))
print(tf.reshape(sample_tensor,(3,10)))


tf.Tensor(
[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
 [15 16 17 18 19 20 21 22 23 24 25 26 27 28 29]], shape=(2, 15), dtype=int64)
tf.Tensor(
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]], shape=(3, 10), dtype=int64)


However, the size of the reshaped tensor must match the size of the original tensor. For example, the following code will not run without an error:

In [26]:
#this code will run with an error

print(tf.reshape(sample_tensor,(2,10)))

InvalidArgumentError: ignored

With reshaping, it is essential to be confident that the type of reshaping is appropriate for the task. If the sizes don't match, the code will return an error. Even harder to track down is reshaping where the sizes match, but the dimensions do not align with intended reshaping. This is important because with reshaping you can get bugs that do not throw errors but result in problems in the final model. 

Let's do some practice with indexing and reshaping.

In [28]:
data=[[[2,4],[6,8]],
      [[10,12],[14,16]],
      [[1,3],[5,7]],
      [[9,11],[13,15]],
      ]
sample_tensor_2=tf.stack(data)  
print(sample_tensor_2.numpy())
print(tf.shape(sample_tensor_2))


[[[ 2  4]
  [ 6  8]]

 [[10 12]
  [14 16]]

 [[ 1  3]
  [ 5  7]]

 [[ 9 11]
  [13 15]]]
tf.Tensor([4 2 2], shape=(3,), dtype=int32)


## Challenge

Use sample_tensor_2 for the input and complete the following challenges below. 


1) What is the total size of the dataset?

2) How many samples are in the dataset? How many entries are there per sample?

3) What do you predict the following code will do? What is the shape of the output?

In [None]:
sample_tensor_2[:,:,1]

<tf.Tensor: shape=(4, 2), dtype=int32, numpy=
array([[ 4,  8],
       [12, 16],
       [ 3,  7],
       [11, 15]], dtype=int32)>

In [None]:
#Solutions:
#1) 
print('The size is:',tf.size(sample_tensor_2))
#2) 
print('The number of samples is',tf.shape(sample_tensor_2)[0].numpy())
print('The number of data per sample is',tf.shape(sample_tensor_2)[1].numpy()*tf.shape(sample_tensor_2)[2].numpy())

#3) slice each sample for the second position in dimension 2, 4x2



The size is: tf.Tensor(16, shape=(), dtype=int32)
The number of samples is 4
The number of data per sample is 4
tf.Tensor(
[[1 3]
 [5 7]], shape=(2, 2), dtype=int32)


<tf.Tensor: shape=(4, 4), dtype=int32, numpy=
array([[ 2,  4,  6,  8],
       [10, 12, 14, 16],
       [ 1,  3,  5,  7],
       [ 9, 11, 13, 15]], dtype=int32)>

## Reshaping a dataset


Objectves:
  - Adapt a model to an existing architecture

10 minutes

An important skill is being able to reshape a dataset into a shape approprate for a given model. For example, tensor from the challenges above was three dimensions, with two dimensions of features per sample.

In [33]:
print(sample_tensor_2)

tf.Tensor(
[[[ 2  4]
  [ 6  8]]

 [[10 12]
  [14 16]]

 [[ 1  3]
  [ 5  7]]

 [[ 9 11]
  [13 15]]], shape=(4, 2, 2), dtype=int32)


However, many common neural networks would expect 1-dimensional data as an input, so we can use reshaping to get 1-dimensional data. What shape would we expect the input tensor to be to fit the model? Hint: it still needs to be the same size as the original tensor. 

Once we have an expectation of what to do, then we can translate it into code. Which of the following options do you thnk would work? 



In [30]:
tf.reshape(sample_tensor_2,(4,4,1))

tf.reshape(sample_tensor_2,(4,4))

<tf.Tensor: shape=(4, 4), dtype=int32, numpy=
array([[ 2,  4,  6,  8],
       [10, 12, 14, 16],
       [ 1,  3,  5,  7],
       [ 9, 11, 13, 15]], dtype=int32)>

Since we want the features in each sample to be one-dimensional, we would go with option two. Finally, we would check the output tensor to make sure it matches our expectations.

In [34]:
print(tf.reshape(sample_tensor_2,(4,4)))

tf.Tensor(
[[ 2  4  6  8]
 [10 12 14 16]
 [ 1  3  5  7]
 [ 9 11 13 15]], shape=(4, 4), dtype=int32)


This process is very common in taking a raw dataset and adapting it to a neural network architecture.


# Deep Learning









Objectives: 
- Code and optimize a neural network
- Adapt a network to new data

20 minutes

In previous sections of this course, you have covered neural networks and deep learning for classifying the MNIST dataset. The task was classifying handwritten digits 0-9 based on images. In this section, we will revisit deep learning in Python with text data. 

We will start with the classificaton problem (student loan vs checking/savings account) from the NLP section of the course, where customer complaint data was used classify what type of account the complaint was related to. We will use the same embeddings we trained for the final logistic regression problem in that section of the course.

First, we will load in the data and split it into training and validation data.

In [58]:
word2vec_features_df=pd.read_csv('https://github.com/dlab-berkeley/Computational-Social-Science-Training-Program/raw/master/data/embeddings.csv')
y=pd.read_csv('https://github.com/dlab-berkeley/Computational-Social-Science-Training-Program/raw/master/data/y.csv')
y_vals=y['Product_binary'].values
X_train, X_test, y_train, y_test = train_test_split(word2vec_features_df, 
                                                    y_vals, 
                                                    train_size = .80, 
                                                    test_size=0.20, 
                                                    random_state = 10)


Next we define the model. In Keras, each layer of the model has to be individually specified. This allows significant control over the model, including different parameters for each level.

This model has a dense layer with 128 neurons in each, and a dropout layer where 20% of the connections are dropped out for each layer. The final output layer uses a sigmoid activation function to create a final binary output (0 or 1). 

In [59]:
def NN_model():
    # create model
    model = Sequential()

    # A fully connected layer with 128 neurons
    model.add(Dense(128, input_dim=301,activation='relu'))

    # A dropout layer that randomly excludes 20% of neurons in the layer 
    model.add(Dropout(0.2))

    # An output layer with binary classification
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile model with crossentropy
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

Finally, we fit and evaluate the model.

In [60]:
model = NN_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, verbose=0)

# Evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)

print("NN Error: %.2f%%" % (100-scores[1]*100))
print(model.summary())

NN Error: 21.50%
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_12 (Dense)            (None, 128)               38656     
                                                                 
 dropout_7 (Dropout)         (None, 128)               0         
                                                                 
 dense_13 (Dense)            (None, 1)                 129       
                                                                 
Total params: 38,785
Trainable params: 38,785
Non-trainable params: 0
_________________________________________________________________
None


This is a simple neural network with a couple of densely connected layers and a couple of dropout layers. When working with neural nets, it's often a good idea to start with a simple net to make sure the basics of the code work, then gradually create more complicated architectures once the code runs smoothly.

Now, let's use our tensor knowledge to adapt this architecture to another set of data. First, let's load in the MNIST digits dataset (in practice, we would likely be using a dataset more similar to the one in the original model). The MNIST dataset is three dimensions (n_samplesx28x28), so we need to flatten the data for now to create a two dimensional tensor  n_samplesx784 to fit with the neural net we are working on. Note: instead of two classes, the MNIST dataset uses 10 classes (one for each digit 0-9). 

In [40]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to [samples][width][height][pixels]
X_train = X_train.reshape(X_train.shape[0], 28*28)
X_test = X_test.reshape(X_test.shape[0], 28*28)
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


Here is the same code from the NN model above. What do you need to change in order to run the same model on the new data? Note which parameters and values you need to change. How does this relate to the differences in the data? Let's edit the code to work with the new data shape and execute it.

Hint: use tf.shape() to see the compare the shapes of the MNIST and original dataset

These are the lines of code we need to change to make this model work with new data: 


In line 6: 
```
model.add(Dense(128, input_dim=784,activation='relu')) #change input dim
```
The embeddings dataset had 301 features, or columns, the new MNIST dataset has 784, so we need to make sure to match the numbers in model architecture. 

In line 12: 

```
    model.add(Dense(10, activation='softmax')) #change dimension to number of categories
```
The final layer needs to have 10 categories, rather than two, since there are more classes in the MNIST dataset. In addition, the activation function needs to be changed to softmax.

In lne 15:

```
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

```

Again, because of the number of classes, the loss function used must be categorical cross entropy rather than binary cross entropy. 

Here is the updated model:





In [41]:
def diff_CNN_model():
    # create model
    model = Sequential()
    
    model.add(Dense(128, input_dim=784,activation='relu')) #change input dim

    model.add(Dropout(0.2))
    
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.2))
    
    model.add(Dense(10, activation='softmax')) #change dimension to number of categories and activation function
    
    #change to categorical crossentropy
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


model = diff_CNN_model()
# Fit the model
print(X_train.shape)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

# Evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("NN Error: %.2f%%" % (100-scores[1]*100))

(60000, 784)
Epoch 1/10
300/300 - 3s - loss: 5.1993 - accuracy: 0.6690 - val_loss: 0.7474 - val_accuracy: 0.8304 - 3s/epoch - 10ms/step
Epoch 2/10
300/300 - 2s - loss: 0.8942 - accuracy: 0.7889 - val_loss: 0.5607 - val_accuracy: 0.8722 - 2s/epoch - 7ms/step
Epoch 3/10
300/300 - 2s - loss: 0.6801 - accuracy: 0.8358 - val_loss: 0.4515 - val_accuracy: 0.9023 - 2s/epoch - 7ms/step
Epoch 4/10
300/300 - 2s - loss: 0.5592 - accuracy: 0.8656 - val_loss: 0.3760 - val_accuracy: 0.9144 - 2s/epoch - 6ms/step
Epoch 5/10
300/300 - 2s - loss: 0.4838 - accuracy: 0.8822 - val_loss: 0.3555 - val_accuracy: 0.9227 - 2s/epoch - 5ms/step
Epoch 6/10
300/300 - 1s - loss: 0.4250 - accuracy: 0.8959 - val_loss: 0.3256 - val_accuracy: 0.9285 - 943ms/epoch - 3ms/step
Epoch 7/10
300/300 - 1s - loss: 0.3735 - accuracy: 0.9068 - val_loss: 0.2784 - val_accuracy: 0.9346 - 967ms/epoch - 3ms/step
Epoch 8/10
300/300 - 1s - loss: 0.3363 - accuracy: 0.9152 - val_loss: 0.2726 - val_accuracy: 0.9374 - 933ms/epoch - 3ms/step
E

# Optimizing Neural Nets


Objectives:
- Explore strategies to optimize a neural net
- Implement an optimizer with custom settings
- Grid search parameters

20 minutes

Optimizing neural nets is a key point of using these powerful models effectively, as with any ML models. However, neural nets have many parameters that can be tuned and are a challenge for traditional optmization methods such as grid search.

In the previous challenge, we experimented with improving the accuracy of the model. The following strategies can help guide the optmization process for fine-tuning algorithms.

1. Feature engineering (refer to Natural Language Processing Notebook)

2. Try a smaller network (minimize redundancy) or a larger network (capture more complex relationships)

3. Change learning rate
4. Use appropriate architecture for the data/task

5. Test parameters

6. Decrease batch size 

Depending on the task, data, and neural network used, there may be a significant amount of tuning necessary in order to achieve an optimal result. This is one reason why leveraging existing models that are already optimized can give a huge advantage for language tasks. 

Further reference this article: https://towardsdatascience.com/optimizing-neural-networks-where-to-start-5a2ed38c8345 


For this notebook we will start with changing the learning rate. 

In previous examples, we passed the optimizer to the compile funciton 
```
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

```
Which uses the default parameters for the function. Now that we are customizing the parameters, we want to use the actual optimizer function, and then pass that optimizer into the .compile() function.

```
model.compile(....,opt=keras.optimizers.Adam())
```

Here is the documentation for that function: https://keras.io/api/optimizers/adam/

What is the default parameter for learning rate? What are some of the other parameters for the Adam optimizer?

##Challenge

Test the following learning rates: [.0001,.001,.01,.1]. Which one performs the best? Which one performs the worst?



In [45]:
#load in data to use for this test
from tensorflow.keras.optimizers import Adam

#cnn classification for neural nets
word2vec_features_df=pd.read_csv('https://github.com/dlab-berkeley/Computational-Social-Science-Training-Program/raw/master/data/embeddings.csv')

y=pd.read_csv('https://github.com/dlab-berkeley/Computational-Social-Science-Training-Program/raw/master/data/y.csv')
y_vals=y['Product_binary'].values
X_train, X_test, y_train, y_test = train_test_split(word2vec_features_df, 
                                                    y_vals, 
                                                    train_size = .80, 
                                                    test_size=0.20, 
                                                    random_state = 10)
#print(word2vec_features_df.shape)
#print(X_train.shape,X_test.shape,y_train.shape,y_test.shape)

In [46]:
#solution
##1
def NN_model():
    # create model
    model = Sequential()
    
    model.add(Dense(128, input_dim=301,activation='relu')) #change input dim

    # A dropout layer that randomly excludes 20% of neurons in the layer 
    model.add(Dropout(0.2))
    
    # A fully connected layer with 128 neurons
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.2))
    
    # An output layer with softmax as in MLP
    model.add(Dense(1, activation='sigmoid'))
    adam_opt=Adam(learning_rate=.1)
    # Compile model as before in MLP
    model.compile(loss='binary_crossentropy', optimizer=adam_opt, metrics=['accuracy'])
    return model

In [47]:
model = NN_model()
# Fit the model
print(X_train.shape)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

# Evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))



(800, 301)
Epoch 1/10
4/4 - 1s - loss: 137.0990 - accuracy: 0.6562 - val_loss: 15.7502 - val_accuracy: 0.2150 - 862ms/epoch - 215ms/step
Epoch 2/10
4/4 - 0s - loss: 5.8819 - accuracy: 0.3587 - val_loss: 3.8590 - val_accuracy: 0.7850 - 39ms/epoch - 10ms/step
Epoch 3/10
4/4 - 0s - loss: 1.3224 - accuracy: 0.7837 - val_loss: 1.0794 - val_accuracy: 0.7850 - 34ms/epoch - 8ms/step
Epoch 4/10
4/4 - 0s - loss: 2.0045 - accuracy: 0.7825 - val_loss: 0.5748 - val_accuracy: 0.7850 - 35ms/epoch - 9ms/step
Epoch 5/10
4/4 - 0s - loss: 0.5777 - accuracy: 0.7862 - val_loss: 0.5711 - val_accuracy: 0.7850 - 52ms/epoch - 13ms/step
Epoch 6/10
4/4 - 0s - loss: 0.5623 - accuracy: 0.7862 - val_loss: 0.5472 - val_accuracy: 0.7850 - 34ms/epoch - 9ms/step
Epoch 7/10
4/4 - 0s - loss: 0.5403 - accuracy: 0.7862 - val_loss: 0.5328 - val_accuracy: 0.7850 - 38ms/epoch - 10ms/step
Epoch 8/10
4/4 - 0s - loss: 0.5319 - accuracy: 0.7862 - val_loss: 0.5236 - val_accuracy: 0.7850 - 35ms/epoch - 9ms/step
Epoch 9/10
4/4 - 0s 

# Challenge: Optimizing a Neural Net


The logit model from the challenge question in NLP section used to classify the customer complaint data had an accuracy of 78.5%. What is the accuracy of the first neural network model on the same data? Hint: (read the output) Try changing the model to improve accuracy. What configuration gave you the best results? Try changing the parameters of the existing layers, or adding more layers.

In [44]:
word2vec_features_df=pd.read_csv('https://github.com/dlab-berkeley/Computational-Social-Science-Training-Program/raw/master/data/embeddings.csv')
y=pd.read_csv('https://github.com/dlab-berkeley/Computational-Social-Science-Training-Program/raw/master/data/y.csv')
y_vals=y['Product_binary'].values
X_train, X_test, y_train, y_test = train_test_split(word2vec_features_df, 
                                                    y_vals, 
                                                    train_size = .80, 
                                                    test_size=0.20, 
                                                    random_state = 10)

In [50]:
#original model

def NN_model():
    # create model
    model = Sequential()

    # A fully connected layer with 128 neurons
    model.add(Dense(128, input_dim=301,activation='relu'))

    # A dropout layer that randomly excludes 20% of neurons in the layer 
    model.add(Dropout(0.2))

    # An output layer with binary classification
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile model with crossentropy
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

    model = NN_model()
    
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, verbose=0)

# Evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)

print("NN Error: %.2f%%" % (100-scores[1]*100))
print(model.summary())

NN Error: 21.50%
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_7 (Dense)             (None, 128)               38656     
                                                                 
 dropout_4 (Dropout)         (None, 128)               0         
                                                                 
 dense_8 (Dense)             (None, 128)               16512     
                                                                 
 dropout_5 (Dropout)         (None, 128)               0         
                                                                 
 dense_9 (Dense)             (None, 1)                 129       
                                                                 
Total params: 55,297
Trainable params: 55,297
Non-trainable params: 0
_________________________________________________________________
None


In practice we often take advantage of existing code and architecture to help accomplish deep learning tasks. This can range from taking a neural network architecture and adapting it to new data (as in our exercise above) to using other packages with pre-trained models. In the next section we will explore one such package called Huggingface.

# Huggingface

Objectives:
- Explore tasks and data available in Huggingface transformers
- Choose an appropriate language task
- Implement a transformer on local data

20 minutes

In reality, these models  require significant data and computational power, which can exceed the resources available to the analyst. We can circumvent this problem by using pre-trained models. Like a pre-trained embedding model, pre-trained models are trained on a large dataset. While this may not perfectly align with the data or task you have, it can help create a more robust system that can be fine-tuned to your data and goals.

[Huggingface](https://huggingface.co/models) is a set of pretrained models from a variety of datasets and sources with an easy-to-use interface. In this section, we will explore the use of the Huggingface library to streamline language task processing.



In [51]:
#install the transformers library
!pip install transformers

Collecting transformers
  Downloading transformers-4.16.2-py3-none-any.whl (3.5 MB)
[K     |████████████████████████████████| 3.5 MB 5.2 MB/s 
[?25hCollecting tokenizers!=0.11.3,>=0.10.1
  Downloading tokenizers-0.11.5-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.8 MB)
[K     |████████████████████████████████| 6.8 MB 33.9 MB/s 
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
[K     |████████████████████████████████| 596 kB 45.1 MB/s 
Collecting sacremoses
  Downloading sacremoses-0.0.47-py2.py3-none-any.whl (895 kB)
[K     |████████████████████████████████| 895 kB 46.3 MB/s 
[?25hCollecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
[K     |████████████████████████████████| 67 kB 5.2 MB/s 
Installing collected packages: pyyaml, tokenizers, sacremoses, huggingface-hub, transformers
  Attempting uninstall: pyyaml
  

The simplest strategy is to use the pipeline method, where you select the task and the pre-trained model (there are multiple models available for many of the tasks)

In [52]:
from transformers import pipeline
classifier = pipeline("sentiment-analysis") 


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


Downloading:   0%|          | 0.00/629 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/255M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/226k [00:00<?, ?B/s]

The key to using these models, since the preprocessing is built in, is understanding the format of the data necessary for the model. This model takes the raw text as input rather than the word embeddings, so let's reload our data appropriately.

In [53]:
cfpb=pd.read_csv('https://raw.githubusercontent.com/dlab-berkeley/Computational-Social-Science-Training-Program/master/data/CFPB%202020%20Complaints.csv')
complaints=cfpb['Consumer complaint narrative']
complaints=complaints[~complaints.isna()]
classifier(complaints.values[0])
print(complaints.values[0])

Reviewed my credit report in XX/XX/XXXX and noticed a lot of errors, inconsistent, and incorrect information. Sent a letter to Equifax on XX/XX/XXXX via mail asking them for an investigation and to verify all the dates and amounts were correct and fix the incorrect reporting on my credit. They did not respond at all so I sent another letter on XX/XX/XXXX via mail, again asking for an investigation and proof. They still didnt respond to that letter so I sent a third letter on XX/XX/XXXX certified mail so I have proof that they signed for my letter.

Last week I received two letters from Equifax dated XX/XX/XXXX on the same day. The said that they could not locate my credit file and needed me to send proof of identification and address. With all three letters I sent a copy of my Arizona drivers license and my XXXX direct deposit sub as my proof of address. The second letter said that they received my request to be removed from the promotions list and that it was added to my credit file. 

Then use the pipeline on the example data, and look at the results.

In [54]:
for k in range(10):
  print(complaints.values[k])
  print(classifier(complaints.values[k]))

Reviewed my credit report in XX/XX/XXXX and noticed a lot of errors, inconsistent, and incorrect information. Sent a letter to Equifax on XX/XX/XXXX via mail asking them for an investigation and to verify all the dates and amounts were correct and fix the incorrect reporting on my credit. They did not respond at all so I sent another letter on XX/XX/XXXX via mail, again asking for an investigation and proof. They still didnt respond to that letter so I sent a third letter on XX/XX/XXXX certified mail so I have proof that they signed for my letter.

Last week I received two letters from Equifax dated XX/XX/XXXX on the same day. The said that they could not locate my credit file and needed me to send proof of identification and address. With all three letters I sent a copy of my Arizona drivers license and my XXXX direct deposit sub as my proof of address. The second letter said that they received my request to be removed from the promotions list and that it was added to my credit file. 

As you might expect, the complaints dataset has mostly negative values. While this is somewhat of a trivial example, it highlights how in just a few lines of code and no preprocessing we can implement a model on our own data. While this doesn't work for every task, for example the specific classification task that we were working with above, this is a valuable and powerful tool for quick, out-of-the-box models that don't take very long to initialize and tune.

## Challenge 

Let's practice with another task from [huggingface](https://huggingface.co/docs/transformers/task_summary). 

Let's say we want to check our data for grammatical correctness. We will use the CoLA model ("textattack/distilbert-base-uncased-CoLA") in the Text Classification pipeline ('text-classification') What is the grammatical correctness of each of the first 15 entries in the cfpb dataset?


In [56]:
#solution
#classifier = pipeline("text-classification", model = "textattack/distilbert-base-uncased-CoLA")
#classifier("I went to the bus.")


There are thousands of models on huggingface that can be used for a variety of language tasks. This can be a great way to use the models already available to increase our modeling power. 

# Next Steps

This lab has introduced Colab as a way to use GPUs to speed up processing power and explored further applications of deep learning to natural language processing. 

In practice, using deep learning for computational social science requires building on the foundational concepts covered in this notebook to implement models with more complicated data and architecture. However, there are many strategies can help you navigate the model ecosystem, some of which we will discuss here: 

1. Documentation (and other resources like tutorials) is a goldmine of information for implementing particular algorithms and completing specific tasks. This is one reason why reading and translating code written by others is a key skill. 

2. Debugging and interpreting error messages, as well as leveraging online resources in order to resolve them, is another key concept. Resources like documentation and Stack Overflow help solve common errors and get code working faster. 

3. Computational resources are important for running complex models. Google Colab has access to GPUs, but does have limitations for large and extended jobs. In those cases, options are paid services such as Google Colab Pro or on-campus [resources](https://docs-research-it.berkeley.edu/services/high-performance-computing/overview/). 

4. Further resources:
  - Deep Learning with Python (Francois Chollet)
  - [Huggingface course](https://huggingface.co/course/chapter1/1)
  - [Tensorflow](https://www.tensorflow.org/tutorials)




