# Using Google Colab for Fast.ai

Welcome! Here is my one-stop shop for getting all the Fast.ai lessons to work on Google Colab. I'll be updating this as I work through new lessons. If you have suggestions or improvements, DM me on Twitter at @corythesaurus.

My general workflow is to open each Fast.ai notebook and make a copy of it to save in my Drive, so I can add in my own cells as needed (and save them for later!). You can do that from within Colab: *File > Open Notebook... > click on "Github" tab > search for "fastai"*. All the notebooks should be there. Once you open a notebook, you can make a copy of it: *File > Save a copy in Drive...*. 

Finally, make sure you've enabled the GPU! *Edit > Notebook settings > set "Hardware Accelerator" to GPU.*

### Contributions from @denis-trofimov
* bump the PyTorch version
* make utilities quieter (less verbose)
* combine install commands into a single command in some sections

## Installing dependencies ##
We need to install fastai and PyTorch manually, and possibly other packages that fastai depends on (see [here](https://github.com/fastai/fastai/blob/master/requirements.txt)).

If I get stuck, I refer to [this fastai forum thread](http://forums.fast.ai/t/colaboratory-and-fastai/10122/6) and [this blog post](https://towardsdatascience.com/fast-ai-lesson-1-on-google-colab-free-gpu-d2af89f53604). This is also a handy resource for using PyTorch in Colab: https://jovianlin.io/pytorch-with-gpu-in-google-colab/ (and his [example notebook](https://colab.research.google.com/drive/1jxUPzMsAkBboHMQtGyfv5M5c7hU8Ss2c#scrollTo=ed-8FUn2GqQ4)!). See also this [post](https://medium.com/@chsafouane/getting-started-with-pytorch-on-google-colab-811c59a656b6).

In [0]:
# Check python version
import sys
sys.version

'3.6.3 (default, Oct  3 2017, 21:45:48) \n[GCC 7.2.0]'

In [None]:
# Install PyTorch, fastai, and torchvision
!pip3 install -q http://download.pytorch.org/whl/cu80/torch-0.3.1-cp36-cp36m-linux_x86_64.whl fastai torchvision
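The wheel in the install cell is tagged `cp36`, so pip will only accept it on CPython 3.6. Here is a minimal sketch of that compatibility check, assuming the standard wheel naming scheme (`name-version-pythontag-abitag-platform.whl`). The helper names `interpreter_tag` and `wheel_matches` are my own and not part of pip or fastai.

```python
import sys

# Wheel filename copied from the install cell in this notebook.
WHEEL = "torch-0.3.1-cp36-cp36m-linux_x86_64.whl"

def interpreter_tag(version_info=sys.version_info):
    """Build the CPython tag (e.g. 'cp36') for a version tuple."""
    return "cp{}{}".format(version_info[0], version_info[1])

def wheel_matches(wheel_name, version_info=sys.version_info):
    """True when the wheel's Python tag matches the given interpreter version."""
    # Assumes the usual layout: name-version-pythontag-abitag-platform.whl
    python_tag = wheel_name[:-len(".whl")].split("-")[2]
    return python_tag == interpreter_tag(version_info)

print(interpreter_tag(), wheel_matches(WHEEL))
```

If this prints `False`, the pinned wheel will probably be rejected by pip, and you would need a wheel built for the Python version your runtime actually has.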

## GPU setup ##
Google is very generous and gives Colab users access to a GPU. Make sure it's enabled: Edit > Notebook settings > set "Hardware accelerator" to GPU.

The following is just to assuage your fears that you're being rate-limited or otherwise; you don't need to add these cells to your notebooks to get them to run. Just make sure you've enabled the GPU in the notebook settings. This is easy to forget :)

### Check that the GPU is available

In [0]:
import torch
torch.cuda.is_available()

True

In [0]:
torch.backends.cudnn.enabled

True
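If you'd rather have one check that doesn't crash when PyTorch isn't installed yet, something like the following could work. It is only a sketch: `pick_device` is a hypothetical helper name, and the returned strings are just labels.

```python
import importlib.util

def pick_device():
    """Return 'cuda' when PyTorch is installed and sees a GPU, else 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this runtime yet
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

print("Running on:", pick_device())
```

Seeing `cpu` here after installing PyTorch usually means the GPU was not enabled in the notebook settings.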

### Check how much of the GPU is available

I'm using the following code from [a Stack Overflow thread](https://stackoverflow.com/questions/48750199/google-colaboratory-misleading-information-about-its-gpu-only-5-ram-available) to check what % of the GPU's memory is in use right now. 100% is bad; 0% is good (all free for me to use!).

In [None]:
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip -q install gputil psutil humanize
import psutil
import humanize
import os
import GPUtil as GPU
GPUs = GPU.getGPUs()
# XXX: Colab provides at most one GPU, and even that isn't guaranteed
gpu = GPUs[0]
def printm():
 process = psutil.Process(os.getpid())
 print("Gen RAM Free: " + humanize.naturalsize( psutil.virtual_memory().available ), " | Proc size: " + humanize.naturalsize( process.memory_info().rss))
 print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))
printm()
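As an alternative that needs no extra packages, the same numbers can be read from `nvidia-smi` itself. This is a sketch under the assumption that `nvidia-smi` is on the PATH and supports `--query-gpu` with CSV output; `parse_memory` and `gpu_memory` are names I made up for illustration.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_memory(csv_text):
    """Parse 'used, total' MiB rows into (used, total, percent_used) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (float(field) for field in line.split(","))
        rows.append((used, total, round(100 * used / total, 1)))
    return rows

def gpu_memory():
    """Return one row per GPU, or [] when nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return parse_memory(result.stdout) if result.returncode == 0 else []

print(gpu_memory())
```

An empty list means no GPU was found, which is the case the `GPUs[0]` lookup above does not guard against.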

### Special additions for lessons 4 and 6

In [0]:
!pip3 -q install spacy
!python -m spacy download en

## Import all the libraries ##

In [0]:
# This file contains all the main external libs we'll use
from fastai.imports import *

In [0]:
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

### Save or load files with Google Drive

Run this code first to install the necessary libraries and perform authorization.

In [0]:
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa > /dev/null 2>&1
!apt-get update -qq > /dev/null 2>&1
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Click the link, copy the verification code, and paste it into the text box.

Once authorization completes, mount your Google Drive:

In [0]:
!mkdir -p drive
!google-drive-ocamlfuse drive
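Once the mount succeeds, `drive` behaves like an ordinary folder, so saving a file is just a copy. Here is a small sketch: the `fastai` subfolder and the `save_to_drive` helper are hypothetical choices of mine, not something the mount creates.

```python
import shutil
from pathlib import Path

def save_to_drive(src, drive_root="drive", subdir="fastai"):
    """Copy `src` into <drive_root>/<subdir>/ and return the new path."""
    target_dir = Path(drive_root) / subdir
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps timestamps and returns the path of the copied file
    return Path(shutil.copy2(str(src), str(target_dir)))

# Example (hypothetical file name):
# save_to_drive("my_weights.h5")
```

Loading a file back is the reverse: read it from `drive/<subdir>/` like any local path.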

## Cloning the fastai git repo ##
You likely don't actually need to do this, but if you want direct access to the .xlsx files, or want to inspect or fork the code... clone the fastai courses repository!

In [0]:
!git clone https://github.com/fastai/courses.git

Cloning into 'courses'...
remote: Counting objects: 765, done.
remote: Total 765 (delta 0), reused 0 (delta 0), pack-reused 765
Receiving objects: 100% (765/765), 22.40 MiB | 41.70 MiB/s, done.
Resolving deltas: 100% (409/409), done.


In [0]:
!pwd

/content


In [0]:
!ls courses

deeplearning1  deeplearning2  LICENSE.txt  README.md  requirements.txt	setup


In [0]:
!ls courses/deeplearning1

excel  nbs


In [0]:
!ls courses/deeplearning1/excel

collab_filter.xlsx  entropy_example.xlsx  layers_example.xlsx
conv-example.xlsx   graddesc.xlsm


## Accessing the fastai data files (lessons 1, 3, 4) ##
If you get a fastai URL to a .zip or .tgz file, follow these directions to import the data into your notebook.

Here's the snippet from Lesson 1: *The dataset is available at http://files.fast.ai/data/dogscats.zip. You can download it directly on your server by running the following line in your terminal. wget http://files.fast.ai/data/dogscats.zip. You should put the data in a subdirectory of this notebook's directory, called data/. Note that this data is already available in Crestle and the Paperspace fast.ai template.*

### If it's a .zip file (lesson 1):

#### Lesson 1: Dogs & Cats data

In [0]:
# Get the file from fast.ai URL, unzip it, and put it into the folder 'data'
# -q to make the unzipping less verbose.
!mkdir -p data
!wget -q http://files.fast.ai/data/dogscats.zip && unzip -qq dogscats.zip -d data/

In [0]:
# Check to make sure the data is where you think it is:
!ls

data  datalab  dogscats.zip


In [0]:
# Check to make sure the folders all unzipped properly:
!ls data/dogscats

models	sample	test1  train  valid
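If you prefer to stay in Python rather than shelling out to `wget` and `unzip`, the standard library can do the same job. This is a sketch, not the fastai way of doing it: `fetch_and_unzip` is a name I picked, and the download is skipped when the archive already exists.

```python
import urllib.request
import zipfile
from pathlib import Path

def fetch_and_unzip(url, dest="data", archive=None):
    """Download `url` (unless the archive already exists) and extract into `dest`."""
    archive = Path(archive or url.rsplit("/", 1)[-1])
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(str(archive)) as zf:
        zf.extractall(str(dest))
    # Return the top-level entries so you can eyeball the result
    return sorted(p.name for p in Path(dest).iterdir())

# Example, using the Lesson 1 URL from above:
# fetch_and_unzip("http://files.fast.ai/data/dogscats.zip")
```

The returned list plays the role of the `!ls data` check.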


### If it's a .tgz file (lessons 3 and 4):

#### Lesson 3: Rossmann data

In [None]:
# Get the Rossmann data from the fast.ai URL, and make a nested directory to extract it into later.
# mkdir -p creates any missing parent directories, so the whole nested path is made in one go
!wget -q http://files.fast.ai/part2/lesson14/rossmann.tgz && mkdir -p ~/data/rossmann

In [0]:
# Extract the .tgz file
# -x to extract
# -v for verbose    # NOTE: I usually turn this off; it prints a lot...
# -z to decompress with gzip
# -f for the archive file (must come last in the flag group, right before the file name)
# -C to extract the contents into a different directory
!tar -xzf rossmann.tgz -C ~/data/rossmann/

In [0]:
# Remove the .tgz file
!rm rossmann.tgz

In [0]:
# Make sure the data's where we think it is:
!ls ~/data/rossmann

googletrend.csv        state_names.csv	store_states.csv  train.csv
sample_submission.csv  store.csv	test.csv	  weather.csv
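The `tar -xzf ... -C ...` and `rm` steps can also be done with Python's `tarfile` module. This is a rough equivalent, assuming you trust the archive (`extractall` writes whatever paths the tarball contains). `extract_tgz` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path

def extract_tgz(archive, dest, remove_archive=False):
    """Extract a gzip-compressed tarball into `dest`, creating it if needed."""
    dest = Path(dest).expanduser()          # so "~/data/rossmann" works
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(dest))
    if remove_archive:
        Path(archive).unlink()              # same effect as `rm rossmann.tgz`
    return sorted(p.name for p in dest.iterdir())

# Example, mirroring the cells above:
# extract_tgz("rossmann.tgz", "~/data/rossmann", remove_archive=True)
```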


#### Lesson 4: IMDB data

In [None]:
# Get the IMDB data from the fastai URL: 
!wget -q http://files.fast.ai/data/aclImdb.tgz

In [0]:
# Make sure it imported properly:
!ls

aclImdb.tgz  data  datalab  dogscats.zip


In [0]:
# Extract the .tgz file
# -x to extract
# -v for verbose    # NOTE: I usually turn this off; it prints a lot...
# -z to decompress with gzip
# -f for the archive file (must come last in the flag group, right before the file name)
# -C to extract the contents into a different directory
!tar -xzf aclImdb.tgz -C data/

In [0]:
# Remove the original .tgz file
!rm aclImdb.tgz

In [0]:
# Make sure the data is where we think it is:
!ls data/aclImdb

imdbEr.txt  imdb.vocab	README	test  train


## Getting data from Kaggle, using the Kaggle API (lesson 2)

Install the Kaggle API; authenticate; and then use the Kaggle command line interface to access data.

In [0]:
# Install the Kaggle API
!pip3 -q install kaggle

In [0]:
# Import kaggle.json from Google Drive
# This snippet will output a link which needs authentication from any Google account

from googleapiclient.discovery import build
import io, os
from googleapiclient.http import MediaIoBaseDownload
from google.colab import auth

auth.authenticate_user()

drive_service = build('drive', 'v3')
results = drive_service.files().list(
        q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])

filename = "/content/.kaggle/kaggle.json"
os.makedirs(os.path.dirname(filename), exist_ok=True)

request = drive_service.files().get_media(fileId=kaggle_api_key[0]['id'])
fh = io.FileIO(filename, 'wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
os.chmod(filename, 0o600)  # octal mode: owner read/write only (same as `chmod 600`)
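A note on the permission bits: `chmod 600` on the command line is octal, so in Python the mode has to be written as `0o600`. The decimal number `600` is a different bit pattern. The sketch below shows the difference on a throwaway temp file; `file_mode` is a helper name I made up, and the output assumes a Linux runtime like Colab's.

```python
import os
import stat
import tempfile

def file_mode(path):
    """Return the permission bits of `path` as an octal string such as '0o600'."""
    return oct(stat.S_IMODE(os.stat(path).st_mode))

# Throwaway file so we do not touch the real kaggle.json
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o600)
print(file_mode(demo_path))  # prints 0o600: owner read/write only
print(oct(600))              # prints 0o1130: what decimal 600 really means
os.remove(demo_path)
```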

**Now we have the Kaggle API set up!**

Here are a few examples of what we can do now, using the Kaggle API:

```
!kaggle competitions list
!kaggle datasets download -d stanfordu/street-view-house-numbers -w -f street-view-house-numbers.zip
```
More documentation on the Kaggle API here: https://github.com/Kaggle/kaggle-api

**Typical workflow:**

Download the zip file of a dataset:
```
!kaggle datasets download -d 
```
Then unzip the file (add `-d <directory>` to extract it into a specific directory):
```
!unzip street-view-house-numbers.zip 
```
Check to make sure it's there:
```
!ls
```

*This post was helpful for this lesson 2 data in particular: http://forums.fast.ai/t/how-to-download-data-for-lesson-2-from-kaggle-for-planet-competition/7684/38*

In [0]:
# List the files for the Planet data
!kaggle competitions files -c planet-understanding-the-amazon-from-space

In [0]:
# -c: competition name
# -f: which file you want to download
# -p: path to where the file should be saved
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p ~/data/planet/
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f test-jpg.tar.7z -p ~/data/planet/
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv.zip -p ~/data/planet/

In [0]:
# Make sure the data is where you think it is:
!ls ~/data/planet

In [0]:
# To extract the .7z files, we need to install p7zip
# This was helpful: http://forums.fast.ai/t/unzipping-tar-7z-files-in-google-collab-notebook/14857/4
!apt-get -qq install p7zip-full

In [0]:
# Decompress the 7zip files
# -d: decompress the given archive
!p7zip -d ~/data/planet/test-jpg.tar.7z 
!p7zip -d ~/data/planet/train-jpg.tar.7z 

In [0]:
# Extract the .tar files (they land in the current directory)
!tar -xf ~/data/planet/test-jpg.tar
!tar -xf ~/data/planet/train-jpg.tar

In [0]:
# Move the unzipped folders into data/planet/
!mv test-jpg ~/data/planet/ && mv train-jpg ~/data/planet/

In [0]:
# Unzip the regular .zip file
!unzip -qq ~/data/planet/train_v2.csv.zip -d ~/data/planet/

In [0]:
# Make sure everything looks as it should:
!ls ~/data/planet/
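Rather than eyeballing the `ls` output, you could check for the expected entries in code. This is a sketch: `missing_entries` is a hypothetical helper, and the expected names are my assumption based on the download and extraction cells above (in particular, that `train_v2.csv.zip` unpacks to `train_v2.csv`).

```python
from pathlib import Path

def missing_entries(root, expected):
    """Return the names in `expected` that do not exist under `root`."""
    root = Path(root).expanduser()
    return [name for name in expected if not (root / name).exists()]

# Assumed names, inferred from the cells above
PLANET_EXPECTED = ["train-jpg", "test-jpg", "train_v2.csv"]
print(missing_entries("~/data/planet", PLANET_EXPECTED))
```

An empty list means every expected file or folder is in place.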

## Now we're ready to go! ##