
# Exercises due by EOD 2020.12.04

## goal

in this homework assignment we will work with the `tensorflow` and `keras` deep learning frameworks and will look at the acceleration possible when using `gpu`s

## method of delivery

*as mentioned in our first lecture, the method of delivery may change from assignment to assignment. we will include this section in every assignment to provide an overview of how we expect homework results to be submitted, and to provide background notes or explanations for "new" delivery concepts or methods.*

this week you will be submitting the results of your homework via upload to your `s3` submission bucket

summary:

| exercise | deliverable | method of delivery | points |
|----------|-------------|--------------------|--------|
| 1 | a file `load_train_positions.py` | uploaded to your `s3` homework bucket | 10 |
| 2 | a file `boston_keras.py` | uploaded to your `s3` homework bucket | 15 |
| 3 | a commit which resolves a `github` issue and a `merge`d pull request | will be seen on `github` | 10 |

there is also a completely optional, ungraded exercise #4 for anyone interested in a walkthrough of spinning up and using a GPU `ec2` instance

total points: 35

<div style="border: 1px solid lightgrey;">

# exercise 1: load a `csv` as a `tensorflow` `dataset`

let's use the `tensorflow` `dataset` `api` to process a large `csv` file as a tensor and do some simple calculations

## 1.1: acquire the `csv`

we will use the `1GB` `train_positions.csv` file I have made publicly available on `s3`. download it to the `/tmp` directory on your `ubuntu` `ec2` server with the command

```sh
# the -P /tmp will save the resulting file in the /tmp directory
wget https://s3.amazonaws.com/shared.rzl.gu511.com/train_positions/train_positions.csv -P /tmp
```

### 1.1.1: out of disk space?

if in the process of downloading this file you run out of disk space, try running the following and then downloading the file again:

```sh
sudo apt-get clean
sudo apt autoremove -y
conda clean -y --all
```

if that *still* doesn't work, you will need to increase the size of your `ec2`'s hard disk (its `ebs` volume) through the web console. on the `ec2` dashboard, click on the `ec2` instance with the hard drive you wish to expand, and in the bottom panel find the root device link. click on that link and a popup will show the `ebs` id link; click that link

<br><div align="center"><img src="http://drive.google.com/uc?export=view&id=1mW1APVujBcS_C31Vd_kWR1Q7Ox1g1pFo" width="1000px"></div>

that link takes you to the `ebs` volume page. right click the volume row in the top panel and choose to modify the volume. increase the disk size by at least 1 GB.
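before retrying the download, it can help to confirm how much room `/tmp` actually has. the snippet below is a minimal sketch using only the `python` standard library; the helper name `free_gib` and the 2 GiB margin are illustrative choices, not part of the assignment

```python
import shutil


def free_gib(path="/tmp"):
    """return the free space, in GiB, on the filesystem holding `path`"""
    return shutil.disk_usage(path).free / 2**30


# the csv is roughly 1 GB, so leave some headroom (2 GiB is an arbitrary margin)
needed_gib = 2
available = free_gib("/tmp")
print(f"{available:.1f} GiB free in /tmp")
if available < needed_gib:
    print("not enough room yet; clean up or grow the volume before downloading")
```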

## 1.2: create a `CsvDataset` object

in addition to the core `tensorflow` routines and `api`s, the `tensorflow` developers have a rigorous process for allowing developers to contribute new or experimental features. historically these new features were collected in the `tf.contrib` namespace (removed in `tensorflow` 2.x), but for datasets there is a special place for experimental (soon-to-be standard?) methods and classes: `tf.data.experimental`.

one of the classes defined in that namespace is `tf.data.experimental.CsvDataset`. look at [the docstring](https://www.tensorflow.org/api_docs/python/tf/data/experimental/CsvDataset)

```python
help(tf.data.experimental.CsvDataset)
```

why, in 2020, is `tensorflow`'s `csv` reader function experimental? beats me!

### 1.2.1: initialization arguments

a quick review of [the *initialization* function documentation](https://www.tensorflow.org/api_docs/python/tf/data/experimental/CsvDataset#__init__) for this class (the one that is called to build our `CsvDataset` object)

```python
help(tf.data.experimental.CsvDataset.__init__)
```

shows us what arguments we have and gives us an idea of what we have to do to build this object.

```python
__init__(filenames,
         record_defaults,
         compression_type=None,
         buffer_size=None,
         header=False,
         field_delim=',',
         use_quote_delim=True,
         na_value='',
         select_cols=None)
```

+ `filenames`: a single filename string, a list of filename strings, or a `tensor` of filenames as strings
+ `record_defaults`: a list of default values for incoming records
    + each feature is represented by either a default value (e.g. `''`) if it *is not* required, or a `tensorflow` `dtype` if it *is* required (e.g. `tf.int32`)
+ `header`: a `bool` indicating whether or not the file has a `header` row

for example, if we had a `csv` file named `data.csv`, with columns

| column name | contains `null` values | data type | suggested default value |
|-|-|-|-|
| `column_a` | no | an integer | `tf.int32` |
| `column_b` | no | a string | `''` |

that had no header, we would want to run

```python
import tensorflow as tf

filenames = 'data.csv'
record_defaults = [tf.int32, '']
header = False

data = tf.data.experimental.CsvDataset(
    filenames=filenames,
    record_defaults=record_defaults,
    header=header)
```

we will need to specify values for those three arguments for our real `train_positions.csv` dataset; the rest of the arguments can be left as defaults.
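the required-versus-optional rule for `record_defaults` can be pictured in plain `python`. the sketch below is only an analogy built on the standard `csv` module, not how `tensorflow` implements it: the helper `parse_row` and the toy data are hypothetical, a bare `python` type stands in for a `tensorflow` `dtype` (required field), and a concrete value stands in for a default (optional field)

```python
import csv
import io


def parse_row(row, record_defaults):
    """convert one csv row using CsvDataset-style defaults (plain-python analogy)"""
    parsed = []
    for raw, default in zip(row, record_defaults):
        if isinstance(default, type):
            # a bare type plays the role of a tensorflow dtype: the field is required
            if raw == "":
                raise ValueError("required field is empty")
            parsed.append(default(raw))
        else:
            # a concrete value is used whenever the field is empty
            parsed.append(type(default)(raw) if raw != "" else default)
    return parsed


toy_csv = io.StringIO("7,abc\n8,\n")
# first column is a required int; second column falls back to '' when empty
rows = [parse_row(row, [int, ""]) for row in csv.reader(toy_csv)]
print(rows)
```

an empty value in the first (required) column would raise an error here instead of being silently filled in, which is the behavior the required `dtype` entries are meant to give you.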

#### 1.2.1.1: `record_defaults`

check the first few records of the `csv` file with

```sh
head -n20 /tmp/train_positions.csv
```

the following table summarizes the columns, whether or not they contain null values, and the suggested default value we should use in the `record_defaults` parameter. use this table to construct the `record_defaults` list

| column name | contains `null` values | suggested default value |
|-|-|-|
| `carcount` | no | `tf.int32` |
| `circuitid` | no | `tf.int32` |
| `destinationstationcode` | no | `''` |
| `directionnum` | no | `tf.int32` |
| `linecode` | no | `''` |
| `secondsatlocation` | no | `tf.int32` |
| `servicetype` | no | `tf.string` |
| `trainid` | no | `tf.string` |
| `timestamp` | no | `tf.string` |

### 1.2.2: invoking `CsvDataset`

fill in the code below to create your dataset

```python
import tensorflow as tf

filenames = #----------------#
            # FILL THIS IN!! #
            #----------------#
record_defaults = #----------------#
                  # FILL THIS IN!! #
                  #----------------#
header = #----------------#
         # FILL THIS IN!! #
         #----------------#
        
train_positions_dataset = tf.data.experimental.CsvDataset(
    filenames=filenames,
    record_defaults=record_defaults,
    header=header)
```

## 1.3: create a `batch`ed `dataset`

using your `train_positions_dataset` object's [`.batch`](https://www.tensorflow.org/api_docs/python/tf/data/experimental/CsvDataset#batch) method, create a `dataset` that has a batch size of 3 by filling in the code below

```python
BATCH_SIZE = 3

# make a batched dataset
tp_batched = #----------------#
             # FILL THIS IN!! #
             #----------------#

# imported for assertion tests
from tensorflow.python.data.ops.dataset_ops import BatchDataset
assert isinstance(tp_batched, BatchDataset)
```

you can verify that this worked by executing


```python
for elem in tp_batched:
    break

assert elem[1].numpy().tolist() == [2009, 1912, 1480]
assert elem[-2].numpy().tolist() == [b'067', b'175', b'182']
```
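note that the assertions index `elem` by column, not by row: each element of a batched dataset is a tuple with one entry per `csv` column, and each entry holds `BATCH_SIZE` values. the sketch below mimics that regrouping in plain `python` on made-up rows. the values and the helper `batch_columns` are hypothetical, and this is an analogy rather than how `tensorflow` implements `.batch`

```python
from itertools import islice


def batch_columns(rows, batch_size):
    """yield tuples of columns, each column holding up to `batch_size` values"""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, batch_size))
        if not chunk:
            return
        # transpose: a list of rows becomes a tuple of columns
        yield tuple(list(column) for column in zip(*chunk))


toy_rows = [(6, 101, "A"), (8, 102, "B"), (6, 103, "C"), (8, 104, "D")]
batches = list(batch_columns(toy_rows, batch_size=3))
first = batches[0]
print(first[1])    # the second column of the first three rows
print(batches[1])  # the final batch is smaller when the rows run out
```

`tensorflow`'s `.batch` likewise keeps a smaller final batch unless you pass `drop_remainder=True`.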

## 1.4: put it together

fill in the following block of `python` code and save the results as `load_train_positions.py`

```python
import tensorflow as tf

# imported for assertion tests
from tensorflow.python.data.ops.dataset_ops import BatchDataset

BATCH_SIZE = 3

def build_train_positions_dataset():
    filenames = #----------------#
                # FILL THIS IN!! #
                #----------------#
    record_defaults = #----------------#
                      # FILL THIS IN!! #
                      #----------------#
    header = #----------------#
             # FILL THIS IN!! #
             #----------------#

    train_positions_dataset = tf.data.experimental.CsvDataset(
        filenames=filenames,
        record_defaults=record_defaults,
        header=header)

    # make a batched dataset
    tp_batched = #----------------#
                 # FILL THIS IN!! #
                 #----------------#

    assert isinstance(tp_batched, BatchDataset)

    return tp_batched


def validate():
    tp_batched = build_train_positions_dataset()
    
    for elem in tp_batched:
        break
        
    assert elem[1].numpy().tolist() == [2009, 1912, 1480]
    assert elem[-2].numpy().tolist() == [b'067', b'175', b'182']
    
    print("you're all good!")


if __name__ == '__main__':
    validate()
```

if everything works as expected, you should be able to run (from the `bash` command line)

```sh
python load_train_positions.py
```

and the result will be nothing but a handful of noisy `tensorflow` log messages followed by the phrase `you're all good!`, e.g.

```
2020-11-20 01:20:52.942501: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-11-20 01:20:52.942682: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-11-20 01:20:55.932371: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-11-20 01:20:55.932620: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-11-20 01:20:55.932778: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-172-31-76-204): /proc/driver/nvidia/version does not exist
2020-11-20 01:20:55.935850: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-20 01:20:55.982617: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2400080000 Hz
2020-11-20 01:20:55.982939: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5600e9d8eec0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-20 01:20:55.983034: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
you're all good!
```


##### upload your filled-in file `load_train_positions.py` to your s3 homework bucket

<div style="border: 1px solid lightgrey;">

# exercise 2: simple `keras` models

let's create a pair of simple `keras` models to predict housing prices.

in the cells below, we will build out a `python` program block by block. the end result will be a full program we will save as a file `boston_keras.py`. in a `python` session, execute the code in each part of the exercise before moving on to the next part

## 2.1: load the Boston housing price dataset

`keras` provides built-in access to a number of datasets via the [`keras.datasets`](https://keras.io/datasets/#boston-housing-price-regression-dataset) module. we will use that to load train and test data in a format that is immediately consumable in a `keras` model

```python
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data()
```

additionally, let's normalize the predictor data:

```python
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std
```
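note that the test data is scaled with the *training* mean and standard deviation, so nothing about the test set leaks into the preprocessing. here is a small sketch of the same arithmetic on one made-up column, using only the standard library (the numbers are hypothetical; `pstdev` is used because `numpy`'s `.std()` computes the population standard deviation by default)

```python
from statistics import mean, pstdev

train_col = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test_col = [5.0, 11.0]

# statistics come from the training column only
mu = mean(train_col)
sigma = pstdev(train_col)  # population std, matching numpy's default ddof=0

train_scaled = [(v - mu) / sigma for v in train_col]
test_scaled = [(v - mu) / sigma for v in test_col]  # reuses the train mu and sigma

print(mu, sigma)
print(test_scaled)
```

after scaling, the training column has mean 0 and standard deviation 1; the test column generally will not, and that is expected.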

## 2.2: a linear model


### 2.2.1: build the model

we can build a linear regression in `keras` quite easily -- a linear regression is simply a

+ one-layer `Sequential` model
+ in which the one layer
    + is `Dense`
    + has only one set of weights (i.e. is only one node tall, aka 1 `unit`)
    + takes our `x` datasets as `inputs` (this defines `input_dim`)
    + has a `linear` `activation` (this is the default `activation` value, so no argument is necessary to `Dense(...)`)

fill in the below snippet to create a linear model

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

linear_model = #----------------#
               # FILL THIS IN!! #
               #----------------#
```

after doing so, you should be able to run

```python
linear_model.summary()
```

and see

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 1)                 14        
=================================================================
Total params: 14
Trainable params: 14
Non-trainable params: 0
_________________________________________________________________
```

*(the layer name may be `dense_N` for some integer `N`)*
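the `Param #` of 14 is one weight per predictor column plus a single bias term. the sketch below spells out that arithmetic and what a 1-unit `Dense` layer with a `linear` activation computes for a single row; the helper names are hypothetical and nothing here calls `keras`

```python
def dense_param_count(input_dim, units):
    """weights plus biases for one fully connected layer"""
    return input_dim * units + units


def linear_predict(x, weights, bias):
    """what a 1-unit Dense layer with linear activation computes for one row"""
    return sum(w * v for w, v in zip(weights, x)) + bias


n_features = 13  # the boston housing data has 13 predictor columns
print(dense_param_count(n_features, units=1))
print(linear_predict([1.0, 2.0], weights=[0.5, 0.25], bias=1.0))
```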

### 2.2.2: `compile`

furthermore, we want to `compile` this model to use the `adam` optimizer algorithm to optimize a `mse` `loss` function. let's track the `mean_absolute_error` `metric` as well

```python
linear_model.compile(loss=,  # FILL THIS IN!!
                     optimizer=,  # FILL THIS IN!!
                     metrics=)  # FILL THIS IN!!
```

### 2.2.3: `fit`

finally, let's fit our training dataset. let's use validation within each `epoch` with a `validation_split` of 0.05. also, in order to treat both model types on an equal footing, rather than stop after a fixed number of epochs we will stop once the validation `mse` stops improving and keep the model with the best value. to do this, we will use the `EarlyStopping` and `ModelCheckpoint` `callback`s.

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

es_callback = EarlyStopping(monitor='val_loss',
                            min_delta=0.01,
                            patience=100)

mc_callback = ModelCheckpoint('linear.hdf5',
                              monitor='val_loss',
                              save_best_only=True)

callbacks = [es_callback, mc_callback]
```
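roughly speaking, `EarlyStopping` ends training once `val_loss` has gone `patience` epochs without improving by more than `min_delta`, while `ModelCheckpoint` with `save_best_only=True` overwrites the saved file only when `val_loss` hits a new best. the plain-`python` sketch below is a simplified model of that documented behavior, not the `keras` source; the function name and the loss values are made up

```python
def simulate_callbacks(val_losses, min_delta=0.01, patience=100):
    """return (epoch training stops at, epoch of the last saved checkpoint)"""
    best_for_stopping = float("inf")
    best_for_checkpoint = float("inf")
    checkpoint_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        # checkpoint logic: any new best value triggers a save
        if loss < best_for_checkpoint:
            best_for_checkpoint = loss
            checkpoint_epoch = epoch
        # early-stopping logic: only improvements larger than min_delta count
        if loss < best_for_stopping - min_delta:
            best_for_stopping = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, checkpoint_epoch
    return len(val_losses) - 1, checkpoint_epoch


toy_losses = [1.0, 0.5, 0.4, 0.41, 0.395, 0.42, 0.43]
result = simulate_callbacks(toy_losses, min_delta=0.01, patience=3)
print(result)
```

in this toy run training stops at epoch 5, but the file on disk comes from epoch 4: the small improvement there was too small to reset the patience counter, yet it was still a new best for the checkpoint. that is why we reload the saved file rather than using the model as it stands when training ends.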

call `fit` on the `x` and `y` train datasets with the `validation_split` value set to 0.05, the `verbose` value set to 0, the number of `epochs` set to 10,000, and the `callbacks` list defined above

```python
linear_model.fit(
    # FILL THIS IN!!
)
```

on your `ec2` instance, this could take a minute or two, but should not take many minutes.
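the `keras` documentation states that `validation_split` holds out rows from the *end* of the `x` and `y` arrays, before any shuffling. the sketch below estimates how many rows that leaves for training; the helper `split_sizes` is hypothetical, the exact rounding is an assumption, and 404 is the number of training rows returned by the default `load_data()` split

```python
import math


def split_sizes(n_samples, validation_split):
    """return (n_train, n_validation) when the tail of the data is held out"""
    n_train = math.floor(n_samples * (1.0 - validation_split))
    return n_train, n_samples - n_train


sizes = split_sizes(404, 0.05)
print(sizes)
```

with only a couple dozen validation rows, `val_loss` is a noisy signal, which is part of why the callbacks use a generous `patience`.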

### 2.2.4: `evaluate`

load the saved best model and view its final error on the held-out test data:

```python
best_linear_model = keras.models.load_model('linear.hdf5')
linear_test_mse, linear_test_mae = best_linear_model.evaluate(x_test, y_test)
print(f"linear test mse: {linear_test_mse}")
print(f"linear test mae: {linear_test_mae}")
```
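`evaluate` returns the loss first and then the metrics in the order they were passed to `compile`, which is why the two values unpack as `mse` and then `mae`. as a reminder of what those two numbers measure, here is a minimal plain-`python` sketch on made-up values (the helper functions are for illustration only and are not `keras` calls)

```python
def mse(y_true, y_pred):
    """mean squared error: average of the squared differences"""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """mean absolute error: average of the absolute differences"""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0]
y_pred = [1.0, 9.0]
print(mse(y_true, y_pred), mae(y_true, y_pred))
```

because `mse` squares each error, a few badly missed houses move it much more than they move `mae`.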

## 2.3: a deep neural net model

let's repeat the above but with a neural network architecture. create a new `Sequential` model with the following:

+ several layers
    + one 20-node layer with `relu` activation and `input_dim` determined by the shape of `x_test`
    + one 10-node layer with `relu` activation
    + one 6-node layer with `relu` activation
    + one 1-node output layer with the default activation
+ compile with
    + an `adam` optimizer
    + a `mse` loss
    + a `mean_absolute_error` metric
+ fit with
    + 10,000 `epochs`
    + a `validation_split` of 0.05

```python
dnn_model = #----------------#
            # FILL THIS IN!! #
            #----------------#

dnn_model.compile(loss=,  # FILL THIS IN!!
                  optimizer=,  # FILL THIS IN!!
                  metrics=)  # FILL THIS IN!!

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

es_callback = EarlyStopping(monitor='val_loss',
                            min_delta=0.01,
                            patience=100)

mc_callback = ModelCheckpoint('dnn.hdf5',
                              monitor='val_loss',
                              save_best_only=True)

callbacks = [es_callback, mc_callback]

dnn_model.fit(
    # FILL THIS IN!!
)

best_dnn_model = keras.models.load_model('dnn.hdf5')
dnn_test_mse, dnn_test_mae = best_dnn_model.evaluate(x_test, y_test)
print(f"dnn test mse: {dnn_test_mse}")
print(f"dnn test mae: {dnn_test_mae}")
```
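one way to sanity-check your architecture is to compare `dnn_model.summary()` against the parameter counts you would expect by hand: each `Dense` layer has one weight per (input, unit) pair plus one bias per unit. the sketch below does that arithmetic for the layer sizes listed above, assuming the 13 predictor columns of the boston housing data; the helper name is hypothetical

```python
def dense_param_count(input_dim, units):
    """weights plus biases for one fully connected layer"""
    return input_dim * units + units


layer_units = [20, 10, 6, 1]
input_dim = 13  # number of predictor columns in x_train
counts = []
for units in layer_units:
    counts.append(dense_param_count(input_dim, units))
    input_dim = units  # the next layer sees this layer's outputs

print(counts, sum(counts))
```

if the `Param #` column of your summary disagrees with these per-layer counts, double-check the number of units in each layer and the `input_dim` of the first one.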

## 2.4: bring it all together

fill in all of the above in one file named `boston_keras.py`

```python
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

def main():
    # load boston data
    (x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data()
    
    # standardize
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    x_train = (x_train - mean) / std
    x_test = (x_test - mean) / std
    
    # linear model ------------------------------------------------------------
    print('{:-<80}'.format('linear model '))
    
    # init
    linear_model = #----------------#
                   # FILL THIS IN!! #
                   #----------------#

    linear_model.summary()
    
    # compile
    linear_model.compile(loss=,  # FILL THIS IN!!
                         optimizer=,  # FILL THIS IN!!
                         metrics=)  # FILL THIS IN!!
    
    # linear callbacks
    es_callback = EarlyStopping(monitor='val_loss',
                                min_delta=0.01,
                                patience=1000)

    mc_callback = ModelCheckpoint('linear.hdf5',
                                  monitor='val_loss',
                                  save_best_only=True)

    callbacks = [es_callback, mc_callback]

    # fit
    print('starting training...')
    linear_model.fit(
        # FILL THIS IN!!
    )
    print('finished training...')
    
    # evaluate
    best_linear_model = keras.models.load_model('linear.hdf5')
    linear_test_mse, linear_test_mae = best_linear_model.evaluate(x_test, y_test)
    print(f"linear test mse: {linear_test_mse}")
    print(f"linear test mae: {linear_test_mae}")
    
    # dnn model ---------------------------------------------------------------
    print('{:-<80}'.format('dnn model '))

    # init
    dnn_model = #----------------#
                # FILL THIS IN!! #
                #----------------#

    dnn_model.summary()
            
    # compile
    dnn_model.compile(loss=,  # FILL THIS IN!!
                      optimizer=,  # FILL THIS IN!!
                      metrics=)  # FILL THIS IN!!
    
    # dnn callbacks
    es_callback = EarlyStopping(monitor='val_loss',
                                min_delta=0.01,
                                patience=1000)

    mc_callback = ModelCheckpoint('dnn.hdf5',
                                  monitor='val_loss',
                                  save_best_only=True)

    callbacks = [es_callback, mc_callback]

    # fit
    print('starting training...')
    dnn_model.fit(
        # FILL THIS IN!!
    )
    print('finished training...')

    # evaluate
    best_dnn_model = keras.models.load_model('dnn.hdf5')
    dnn_test_mse, dnn_test_mae = best_dnn_model.evaluate(x_test, y_test)
    print(f"dnn test mse: {dnn_test_mse}")
    print(f"dnn test mae: {dnn_test_mae}")


if __name__ == '__main__':
    main()
```


##### upload your filled-in `boston_keras.py` to your `s3` homework submission bucket

<div style="border: 1px solid lightgrey;">

# exercise 3: resolving an issue with a `commit` to `master`

`github` -- *not* `git` itself -- has a concept of "issues". issues are a way to record "issues" you have with the code as it is, including feature requests, bugs, and improvements. you can view these from the "issues" tab on the main repository page:

<br><div align="center"><img src="http://drive.google.com/uc?export=view&id=1SYqdKMaNsFvWxEsyAFARir4lZkLZIQ_U"></div>

these provide a great way for people to communicate and discuss their development efforts. you should use them!

`github` also has other integrations with `issues`, including (importantly) the ability to *close* issues by referencing them in commit messages. that's what we're going to do.

I have already added an issue to your repositories requesting a simple change be made. the issue's title is **pin version numbers in requirements file**, and the goal is to hard-code the version numbers of the packages we want users to install and use to run the `dspipeline.py` file.

## 3.1: viewing and assigning the issue

log in to `github` and click on the "issues" tab, and open the issue I created for you. in particular, I want you to **assign** it to yourself -- click on the "assign yourself" link on the issues page

<br><div align="center"><img src="http://drive.google.com/uc?export=view&id=1AHYHy4b_A5k4CFC6IUj6Fyyx5_ICnNOz"></div>

## 3.2: editing the `requirements.txt` file

locally, edit the `requirements.txt` file to read

```
numpy==1.15.2
pandas==0.23.4
plotly==3.3.0
scikit-learn==0.20.0
```

hold off on committing for just a moment
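if you want to double-check your edit before committing, a few lines of `python` can flag any requirement that is not pinned to an exact version. this is a minimal sketch: the helper `unpinned` is hypothetical and only looks for `==`, so it does not understand every requirement syntax `pip` accepts

```python
def unpinned(requirement_lines):
    """return the requirement lines that do not pin an exact version"""
    loose = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and "==" not in line:
            loose.append(line)
    return loose


pinned_text = """numpy==1.15.2
pandas==0.23.4
plotly==3.3.0
scikit-learn==0.20.0
"""
print(unpinned(pinned_text.splitlines()))
print(unpinned(["numpy", "pandas>=0.23"]))
```

to run it against your real file, pass `open('requirements.txt').read().splitlines()` instead of the inline string.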

## 3.3: background on resolving `github` issues with `commit` messages

`github` [allows users to resolve and close issues with `commit` messages](https://help.github.com/articles/closing-issues-using-keywords/). read the documentation on that page!

if you make a `commit` to any branch (including `master`) with a message that uses keywords like "fixes" or "resolves" and references the issue by number, `github` will link that commit and the named issue. if that commit is to the `master` branch, `github` will automatically close the issue with a reference to the `commit` `sha`:

<br><div align="center"><img src="http://drive.google.com/uc?export=view&id=1vn8HX4wkn2HhjOBGu842wMuVQSWxHtIu" width="700px"></div>

if the `commit` is made to a non-`master` branch, it will allow users to create something called a "pull request" -- more on that in a future exercise!
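to get a feel for which messages would qualify, the sketch below approximates the keyword matching with a regular expression. it is an illustration only, not `github`'s actual implementation: the pattern covers the documented keywords (`close`, `fix`, `resolve` and their `-s` / `-d` forms) followed by `#` and a number, and the function name is hypothetical

```python
import re

# keyword, whitespace, then "#<issue number>"; case-insensitive
CLOSING_PATTERN = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)",
    re.IGNORECASE,
)


def issues_closed_by(message):
    """return the issue numbers a commit message would (approximately) close"""
    return [int(number) for number in CLOSING_PATTERN.findall(message)]


print(issues_closed_by("requirements.txt: pin versions, fixes #1"))
print(issues_closed_by("requirements.txt: pin versions (see #1)"))
```

the second message mentions the issue, so `github` would link to it, but without a closing keyword the issue stays open.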

## 3.4: actually resolving `github` issues with `commit` messages

`add` and `commit` your update to `requirements.txt` to the `master` branch with commit message

```
requirements.txt: pin versions, fixes #YOUR_ISSUE_NUMBER
```

where you replace `YOUR_ISSUE_NUMBER` with the number of your issue as seen in `github`:

<br><div align="center"><img src="http://drive.google.com/uc?export=view&id=1otKcV-eudPkQm-EbJ3alevItgcHhIl_O" width="500px"></div>

your issue number is *probably* 1, but double-check! for example, my commit message was

```
requirements.txt: pin versions, fixes #1
```

after you've `commit`ed, `push` to `origin` `master`

## 3.5: verify that the issue in your `github` repo is closed

check your issue page in `github` and verify that it appears closed and references the commit message you made:

<br><div align="center"><img src="http://drive.google.com/uc?export=view&id=1vn8HX4wkn2HhjOBGu842wMuVQSWxHtIu" width="700px"></div>


##### submission will be verified via `github`

<div style="border: 1px solid lightgrey;">

# <span style="color:red;font-weight:bold">OPTIONAL</span> exercise 4: benchmark differences in performance using `gpu`s <span style="color:red;font-weight:bold">OPTIONAL</span>

<span style="color:red;font-weight:bold">this exercise is optional and ungraded; it is included for anyone interested in the details of acquiring and using a `gpu`</span>

let's spin up a `gpu` `ec2` instance and do a simple benchmark to see the performance improvements available via `gpu`s

## 4.1: `gpu`s are expensive!

go check [the per-hour price](https://aws.amazon.com/ec2/pricing/on-demand/) of `gpu` compute for a `p3.2xlarge` instance in the US East (Virginia) region. as of writing, it is 3.06 USD per hour.

we don't want to leave that on for long, so let's make this quick!
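to put that hourly price in perspective, here is a quick back-of-the-envelope calculation. it assumes the 3.06 USD per hour figure quoted above (check the pricing page for the current rate) and ignores billing granularity, storage, and data transfer; the function name is hypothetical

```python
HOURLY_RATE_USD = 3.06  # p3.2xlarge on-demand price quoted above; may have changed


def cost_usd(hours, hourly_rate=HOURLY_RATE_USD):
    """approximate compute cost of leaving the instance running for `hours`"""
    return round(hours * hourly_rate, 2)


print(cost_usd(0.5))      # a quick half-hour benchmark session
print(cost_usd(24))       # forgetting the instance for a day
print(cost_usd(24 * 30))  # forgetting it for a 30-day month
```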

## 4.2: spin up a `p3.2xlarge` instance

`aws` has already created a deep learning `ami` for us, so let's use it and save time (and money) on downloads.

+ open the `ec2` web console and create a new instance
+ `ami`: scroll down to "Deep Learning AMI (Ubuntu 16.04) Version 36.0" and select that record
    + note this is ***NOT*** the Ubuntu 18.04 version, we are using 16.04 (I have not tested this benchmarking script on 18.04)
    + if the version is larger (e.g. 36.1), select and continue
+ instance type: `p3.2xlarge`
    + click "review and launch"
+ launch the instance
    + make sure you have your `ssh` key saved somewhere easy!
    
**you may receive the following error:**

<br><div align="center"><img src="http://drive.google.com/uc?export=view&id=16VCyePLZeDUBI6mymcS_0Zwv4dgtKcvJ"></div>

if you do, you are limited to 0 `p3.2xlarge` instances (verify [here](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Limits:)), and the only path forward is to request a limit increase for your account. if you would like to do so, head to https://console.aws.amazon.com/support/home#/case/create?issueType=service-limit-increase&limitType=service-code-ec2-instances to request an increase in the `ec2` instance type limit up to a limit value of 1. I am not sure how long this will take!!

## 4.3: log in

after your `ec2` instance is up and running, log in to it using username `ubuntu` and providing the path to the private key `.pem` file you either downloaded just now or when you created that key pair for a previous `ec2` instance

if you don't know where this key file is, *terminate the instance* (right click > instance state > terminate) and start over.

## 4.4: download a benchmark

log in to your new `ec2` instance and then download [this public `gist`](https://gist.github.com/RZachLamberty/fe8e05060b809e90fd2722feeb80fcda):

```sh
wget https://gist.githubusercontent.com/RZachLamberty/fe8e05060b809e90fd2722feeb80fcda/raw/8009c87f1b7e49c96181b011f226a5097ab89be3/cpu_gpu_benchmark.py
```

## 4.5: activate an environment

the good folks at `aws` have pre-configured this `ami` with a ton of different deep-learning-capable environments, and the commands for entering any one of them are printed out when you log in to the `ami`. one in particular is for us right now:

```
for TensorFlow(+Keras2) with Python3 (CUDA 9.0 and Intel MKL-DNN) ___________________________ source activate tensorflow_p36
```

run that command to activate that environment

## 4.6: run the benchmark

now that we have everything we need already installed (thanks, `aws` `ami`!), go ahead and run the benchmark script:

```sh
python cpu_gpu_benchmark.py
```

> **note**: the first time you run on this machine, the process of initializing `tensorflow` for the first time may require enough overhead to cause an error in the benchmarking script. you will see a `ValueError: Empty data passed with indices specified.` or `ValueError: could not broadcast input array from shape (0) into shape (1)` error. if you see this, just run the benchmark script again. if you see it more than three times, terminate your instance and send me an email.

the final output will be a dataframe which lists the amount of time it took to create random matrices of increasing sizes and multiply them, as well as the ratio of the speeds for the different devices for each operation.

it will also output a `csv` named `results.csv`. download that file (use `scp` to copy it from your `ec2` to your laptop, or just open it, highlight, and copy-paste to a local file)

## 4.7: TERMINATE YOUR `gpu` INSTANCE!!

don't forget to go back into the `ec2` web console and terminate your instance (right click > instance state > terminate).

<div style="border: 1px solid lightgrey;">