# Running Jupyter Notebooks on FloydHub

[Jupyter Notebooks](https://jupyter.org/) are great for interactively writing, running and sharing your code right from your browser. This tutorial teaches you the basics of running GPU-powered Notebooks on FloydHub.

Running a Jupyter Notebook on FloydHub is easy. Simply type this in your terminal:
```
floyd run --mode jupyter --gpu
```
This will open a Jupyter Notebook with GPU support, running on FloydHub's servers. If you're viewing this Notebook on FloydHub, you probably already did that! For more info, here's a [quick start tutorial](https://docs.floydhub.com/getstarted/quick_start_jupyter/).

### CPU and GPU Support

Notice the `--gpu` flag in the above command? That's all you need to do to get access to a powerful GPU in the cloud. You can view the stats and usage of your GPU by executing the command in the cell below (press **`shift + enter`**).

In [None]:
!nvidia-smi

Note that you will see the GPU stats only if you use the `--gpu` flag in your `floyd run` command. You can run your Notebook on a CPU machine by omitting the flag or using `--cpu`.
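
If you want the same Notebook to run on both CPU and GPU machines, you can guard the GPU check in Python instead of calling `!nvidia-smi` directly. This is a minimal sketch, assuming `nvidia-smi` is only on the `PATH` of GPU machines; the helper name `gpu_stats` is ours and not part of FloydHub.

```python
import shutil
import subprocess


def gpu_stats():
    """Return the text output of nvidia-smi, or None if it is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Assumption: no nvidia-smi on PATH means we are on a CPU machine.
        return None
    result = subprocess.run(
        [exe],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout


stats = gpu_stats()
print(stats if stats is not None else "No GPU detected; running on CPU.")
```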

### Deep Learning Environments

FloydHub comes with fully-configured and optimized environments for all deep learning frameworks! So, you don't have to fiddle with installing CUDA drivers, the framework(s) of your choice and all their dependencies.

The default environment includes the latest versions of TensorFlow and Keras. Go ahead and run the next two cells.

In [None]:
import tensorflow as tf
print(tf.__name__)
print(tf.__version__)

In [None]:
import keras
print(keras.__name__)
print(keras.__version__)

If you want to use a different framework, you can specify it with the `--env` flag when you start your Notebook. Want a PyTorch Notebook? Start your Notebook with the following command from your local terminal:
```
floyd run --mode jupyter --gpu --env pytorch-0.2
```
You can see the complete list of deep learning environments [here](https://docs.floydhub.com/guides/environments/).

### Installing Dependencies

All the environments also include many common machine learning and deep learning libraries, such as [NumPy](http://www.numpy.org/), [pandas](http://pandas.pydata.org/) and [Matplotlib](https://matplotlib.org/).

In [None]:
import numpy as np
a = np.arange(15).reshape(3, 5)
print(a)

Of course, we might not have all the packages you want. You can install your own packages from inside your Notebook! Let's install the `plotly` Python package.

In [None]:
! pip install plotly

You might have more involved requirements. We've got you covered!

Say you want to install multiple Python packages. See how to use [floyd_requirements.txt](https://docs.floydhub.com/guides/jobs/installing_dependencies/#installing-python-dependencies).

Or, your dependency isn't a Python package at all and you want to install it via `apt-get` or even compile it from source. Take a look at our in-depth guide on [installing extra dependencies](https://docs.floydhub.com/guides/jobs/installing_dependencies/#installing-non-python-dependencies).
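
Before installing anything, it can help to check which packages the environment already provides. The sketch below uses only the standard library; the helper name `missing_packages` and the module names passed to it are illustrative, and it only handles top-level module names (no dots).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example module names; replace them with what your Notebook needs.
print(missing_packages(["json", "plotly"]))
```

Anything the function reports as missing is a candidate for `pip install` or for your requirements file.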

### Training a model for handwritten digit recognition

MNIST is a simple computer vision dataset of handwritten digits like these:
<img src="https://www.tensorflow.org/images/MNIST.png" width="300"/>
Owing to its popularity, it is commonly called the "Hello World" of machine learning! You can read more about it in [TensorFlow's tutorial](https://www.tensorflow.org/get_started/mnist/beginners).

We will now train a simple multilayer perceptron model to recognize the handwritten digits using Keras.

In [None]:
from keras.models import Sequential, save_model, load_model
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
from keras.datasets import mnist
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint

In [None]:
# Hyper parameters
batch_size = 128
nb_epoch = 10

# Parameters for MNIST dataset
nb_classes = 10

# Parameters for MLP
prob_drop_input = 0.2               # drop probability for dropout @ input layer
prob_drop_hidden = 0.5              # drop probability for dropout @ fc layer

In [None]:
# Load MNIST dataset from the internet (https://s3.amazonaws.com/img-datasets/mnist.npz)
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Flatten each 28x28 image into a 784-dimensional vector, scale pixels to [0, 1],
# and one-hot encode the labels
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
Y_Train = np_utils.to_categorical(y_train, nb_classes)
Y_Test = np_utils.to_categorical(y_test, nb_classes)

In [None]:
# Multilayer Perceptron model
model = Sequential()
model.add(Dense(activation="sigmoid", units=625, input_dim=784, kernel_initializer="normal", name="dense1"))
model.add(Dropout(prob_drop_input, name='dropout1'))
model.add(Dense(activation="sigmoid", units=625, input_dim=625, kernel_initializer="normal", name="dense2"))
model.add(Dropout(prob_drop_hidden, name='dropout2'))
model.add(Dense(activation="softmax", units=10, input_dim=625, kernel_initializer="normal", name="dense3"))
model.compile(optimizer=RMSprop(lr=0.001, rho=0.9), loss='categorical_crossentropy', metrics=['accuracy'])

# Print summary of the model
model.summary()

In [None]:
# Save the model (before training) and set up a directory for per-epoch checkpoints
save_model(model, '/output/model_mlp')
!mkdir -p /output/logs
checkpoint = ModelCheckpoint(filepath='/output/logs/weights.epoch.{epoch:02d}-val_loss.{val_loss:.2f}.hdf5', verbose=0)

# Start training model
history = model.fit(X_train, Y_Train, epochs=nb_epoch, batch_size=batch_size, verbose=1,
                    callbacks=[checkpoint], validation_data=(X_test, Y_Test))

We trained our model for 10 epochs. At the end of the 10th epoch, the accuracy on the validation data (here, the MNIST test set) is around 97%. Not bad!
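
The `history` object returned by `model.fit` keeps the per-epoch metrics in its `history` attribute, a dictionary of lists. As a small sketch, the hypothetical helper below finds the epoch with the lowest validation loss; the numbers in the example are made up for illustration and are not results from this Notebook.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return (1-based epoch number, value) where `metric` is lowest."""
    values = history_dict[metric]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Made-up numbers for illustration; in the Notebook, pass history.history instead.
example = {"val_loss": [0.31, 0.18, 0.12, 0.14]}
print(best_epoch(example))  # (3, 0.12)
```

This tells you which of the per-epoch checkpoint files is likely the one worth keeping.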

Now, let's evaluate the final model on the test set.

In [None]:
# Evaluate
evaluation = model.evaluate(X_test, Y_Test, verbose=1)
print('\nSummary: Loss over the test dataset: %.2f, Accuracy: %.2f' % (evaluation[0], evaluation[1]))

That's it, folks! You learnt some of the basics of using Jupyter Notebook on FloydHub and trained a pretty sleek model to recognize handwritten digits. Feel free to play around! (And don't forget to shut down your job.)

Below, we'll talk about slightly more advanced FloydHub constructs. 

* How do you save your data so you can come back later and use it? 
* How do you find and use others' public datasets in your job?
* How do you restart your old Notebook?

### Saving Output Data on FloydHub

We just trained a model that recognizes handwritten digits with about 97% accuracy! We, of course, want to save the model that we trained so we can use it later.

If you look at the code above, you will notice that the model is saved to `/output/model_mlp` and the per-epoch checkpoints are written under `/output/logs`:
```
save_model(model, '/output/model_mlp')
!mkdir -p /output/logs
checkpoint = ModelCheckpoint(filepath='/output/logs/weights.epoch.{epoch:02d}-val_loss.{val_loss:.2f}.hdf5', verbose=0)
```

**The `/output` directory is a special directory on FloydHub.** Any directories, subdirectories or files that you create under the `/output` directory will be saved for you to use later, even after you close your Jupyter Notebook. 

**tl;dr: Save any data that you want to persist under `/output`. Data stored in any other location will be deleted when you end your Jupyter Notebook job.** Please see our extensive guide on [saving persistent outputs on FloydHub](https://docs.floydhub.com/guides/data/storing_output/).
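
If you also run the same Notebook on your own machine, where `/output` usually does not exist, a small path helper keeps the code portable. This is only a sketch: the helper name `output_path` and the fallback to the system temp directory are our assumptions, not FloydHub conventions.

```python
import os
import tempfile


def output_path(filename, root="/output"):
    """Build a path under `root`, or under the temp directory if `root` is not a writable directory."""
    if not (os.path.isdir(root) and os.access(root, os.W_OK)):
        # Assumption: outside FloydHub, a temp directory is an acceptable fallback.
        root = tempfile.gettempdir()
    return os.path.join(root, filename)


print(output_path("model_mlp"))
```

You could then call `save_model(model, output_path('model_mlp'))` instead of hard-coding the path.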

### Using FloydHub's Public Datasets in your Jobs

FloydHub has a ton of popular datasets. These are community-contributed datasets for many machine learning and deep learning tasks. You can find them on the [Explore Page](https://www.floydhub.com/explore/trending) or by using the [Search box](https://www.floydhub.com/search/datasets?query=).

In the above example, we downloaded the MNIST dataset from the internet using this line of code:
```
(X_train, y_train), (X_test, y_test) = mnist.load_data()
```
This works well because the MNIST dataset is only about 11MB. If you had a larger dataset, it'd be a pain to download it every time. We highly recommend [creating a separate dataset](https://docs.floydhub.com/guides/create_and_upload_dataset/) or using a publicly available dataset. Here's a public MNIST dataset on FloydHub: [https://www.floydhub.com/redeipirati/datasets/mnist](https://www.floydhub.com/redeipirati/datasets/mnist)

To use a public dataset in your job, you need to _mount_ it when you execute your `floyd run` command:
```
floyd run --mode jupyter --gpu --data redeipirati/datasets/mnist/1:mnist
```
This will make this dataset available at `/mnist` for your code to access. You can read more about mounting datasets in our [docs here](https://docs.floydhub.com/guides/data/mounting_data/).
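
Once a dataset is mounted, a quick way to see what it contains is to walk the mount point from a Notebook cell. The sketch below uses a hypothetical helper, `list_files`, and makes no assumption about how the files inside this particular dataset are laid out.

```python
import os


def list_files(mount_point):
    """Return sorted relative paths of all files under `mount_point` (empty if it does not exist)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, mount_point))
    return sorted(found)


# `/mnist` matches the mount name used in the `floyd run` command.
for path in list_files("/mnist"):
    print(path)
```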

### Saving and Stopping your Notebook

You can save the progress you've made in your Notebook by clicking `File -> Save and Checkpoint`. You can view your saved Notebook from the `Code` tab of your job.

Here's [an example](https://www.floydhub.com/emilwallner/projects/deep-learning-from-scratch/3/code/MNIST_deep_learning.ipynb).

**Once you're done working on your Notebook, don't forget to shut down your job!** You can shut down your job by clicking on `Cancel` in your job's dashboard. Here's [our guide](https://docs.floydhub.com/guides/stop_job/).

<img src="https://docs.floydhub.com/img/stop_job.jpg" width="500"/>

**Note that simply closing the Notebook tab does not shut down the job.** Since Jupyter Notebooks are interactive development environments, we don't know if you're done for the day or if you're going to come back and continue working on your Notebook. So, we'll keep your Notebook running (and charge you) until you explicitly shut it down.

### Restarting your Notebook

You can also restart your Notebook to continue working from where you left off by clicking on the `Restart` button. Here's the [guide for it](https://docs.floydhub.com/guides/restart_job/).

<img src="https://docs.floydhub.com/img/restart_jupyter.gif" width="500" />

## FloydHub Best Practices

### a. Keeping code separate from data

A data science best practice is to keep your code cleanly separated from the data it uses. This lets you structure your experiments/Jobs more elegantly, reduces the amount of code you need to upload to FloydHub, and speeds up your experiment iterations.

### b. Sync your remote experiments locally

If you have followed this tutorial, you have probably noticed that we worked only on a remote FloydHub Job, so the code you have locally is not synced with the current state of the Jupyter Notebook.

If you'd like to update everything locally, you can download it from the Output tab of the Job's Overview page in the Web Dashboard, or by using the CLI with `floyd data clone <output>`.

You can read more on [output download](https://docs.floydhub.com/guides/download_output/) on our docs.

### c. Using .floydignore

Using a `.floydignore` file can speed up your uploads and experiment iterations if your project contains items that your experiments don't need (such as docs, images and video). See our FAQ about [long sync](https://docs.floydhub.com/faqs/job/#my-job-is-taking-a-while-to-sync-changes-how-do-i-make-it-go-faster).

**Note**: If your internet connection has low upload bandwidth, this file can noticeably improve your experience with our service.
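
To decide what is worth ignoring, it helps to know which files dominate your upload. The following is a minimal sketch that you can run locally in your project directory; the helper name `largest_files` is hypothetical, and the script only reports sizes without reading or writing any FloydHub configuration.

```python
import os


def largest_files(project_dir, top=5):
    """Return up to `top` (size_in_bytes, relative_path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(project_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):  # skips broken symlinks
                sizes.append((os.path.getsize(full), os.path.relpath(full, project_dir)))
    return sorted(sizes, reverse=True)[:top]


for size, path in largest_files("."):
    print("%10d  %s" % (size, path))
```

Large files that your experiments never read are good candidates to exclude from the upload.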