<a href="https://colab.research.google.com/github/cyuancheng/fastai_v3/blob/master/Using_Google_Colab_for_Fastai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using Google Colab for Fast.ai

Welcome! Here is my one-stop-shop for getting all the Fast.ai lessons to work on Google Colab. I'll be updating this as I work through new lessons. If you have suggestions or improvements, DM me on Twitter at @corythesaurus.

My general workflow is to open each Fast.ai notebook and make a copy of it to save in my Drive, so I can add in my own cells as needed (and save them for later!). You can do that from within Colab: *File > Open Notebook... > click on "Github" tab > search for "fastai"*. All the notebooks should be there. Once you open a notebook, you can make a copy of it: *File > Save a copy in Drive...*. 

Finally, make sure you've enabled the GPU! *Edit > Notebook settings > set "Hardware Accelerator" to GPU.*

In [13]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'fastai-v3/'


Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/gdrive
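
Mounting only makes Drive visible; the `fastai-v3/` folder itself still has to exist before anything can be saved into it. Here is a minimal sketch that creates it on demand. The helper name `ensure_dir` is my own and is not part of Colab or fastai.

```python
from pathlib import Path

def ensure_dir(path):
    """Create `path` (and any missing parents) if needed and return it as a Path."""
    p = Path(path)
    p.mkdir(parents=True, exist_ok=True)
    return p

# In Colab, after mounting Drive, you might call:
# base_path = ensure_dir("/content/gdrive/My Drive/fastai-v3/")
```

Calling it twice is harmless, since `exist_ok=True` makes the second call a no-op.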


## Installing dependencies ##
We need to manually install fastai and PyTorch, and possibly other packages that fastai depends on (see [here](https://github.com/fastai/fastai/blob/master/requirements.txt)).

I will be referring to [this fastai forum thread](http://forums.fast.ai/t/colaboratory-and-fastai/10122/6) and [this blog post](https://towardsdatascience.com/fast-ai-lesson-1-on-google-colab-free-gpu-d2af89f53604) if I get stuck. This is also a handy resource for using PyTorch in Colab: https://jovianlin.io/pytorch-with-gpu-in-google-colab/ (and his [example notebook](https://colab.research.google.com/drive/1jxUPzMsAkBboHMQtGyfv5M5c7hU8Ss2c#scrollTo=ed-8FUn2GqQ4)!), as is this [post](https://medium.com/@chsafouane/getting-started-with-pytorch-on-google-colab-811c59a656b6).

In [1]:
# Check python version
import sys
sys.version

'3.6.7 (default, Oct 22 2018, 11:32:17) \n[GCC 8.2.0]'

In [2]:
# Install fastai
!pip3 install fastai

Collecting numpy>=1.15 (from fastai)
  Downloading https://files.pythonhosted.org/packages/35/d5/4f8410ac303e690144f0a0603c4b8fd3b986feb2749c435f7cdbb288f17e/numpy-1.16.2-cp36-cp36m-manylinux1_x86_64.whl (17.3MB)
    100% |████████████████████████████████| 17.3MB 2.5MB/s 
featuretools 0.4.1 has requirement pandas>=0.23.0, but you'll have pandas 0.22.0 which is incompatible.
albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.8 which is incompatible.
Installing collected packages: numpy
  Found existing installation: numpy 1.14.6
    Uninstalling numpy-1.14.6:
      Successfully uninstalled numpy-1.14.6
Successfully installed numpy-1.16.2
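
To confirm which versions actually ended up installed, something like the following can help. It is only a sketch and assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library. `installed_version` is a hypothetical helper name.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Print the versions of the packages this notebook cares about
for name in ("fastai", "torch", "torchvision", "numpy"):
    print(name, installed_version(name))
```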


In [2]:
# Install PyTorch and torchvision
# I haven't needed to do this, but here's how just in case.
# (Old pinned wheel, kept for reference:)
#!pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl
!pip3 install torch torchvision



### Special additions for particular lessons

In [3]:
# Lesson 4
!pip3 install spacy
!python -m spacy download en


    Linking successful
    /usr/local/lib/python3.6/dist-packages/en_core_web_sm -->
    /usr/local/lib/python3.6/dist-packages/spacy/data/en

    You can now load the model via spacy.load('en')



## Import all the libraries ##

In [4]:
# This file contains all the main external libs we'll use
from fastai.imports import *

ImportError: ignored

In [5]:
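# NOTE: these modules belong to the old fastai 0.7 API. A plain `pip3 install fastai`
# installs fastai 1.x or later, which no longer ships them, so this cell fails
# with the ModuleNotFoundError shown below.
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *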

ModuleNotFoundError: ignored
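
When an import fails like this, it helps to check which modules the installed fastai release actually provides. Below is a small sketch using only the standard library. `module_available` is a hypothetical helper, and the two module names are just the ones used in the cells above.

```python
import importlib.util

def module_available(name):
    """Return True if Python can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first, so a missing parent
        # (e.g. no "fastai" at all) surfaces here as an ImportError
        return False

for name in ("fastai.imports", "fastai.conv_learner"):
    print(name, "available" if module_available(name) else "missing")
```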

## GPU setup ##
Google generously gives Colab users access to a GPU. Make sure it's enabled: *Edit > Notebook settings > set "Hardware accelerator" to GPU.*

The following cells are only here to reassure you that you aren't being rate-limited or otherwise short-changed on GPU; you don't need to add them to your notebooks to get them to run. Just make sure you've enabled the GPU in the notebook settings. This is easy to forget :)

### Check that the GPU is available

In [7]:
import torch
torch.cuda.is_available()

True

In [8]:
torch.backends.cudnn.enabled

True
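
In your own cells, the same check can be used to pick a device once and reuse it. This is a minimal sketch; `pick_device` is my own helper name, and it falls back to the CPU when PyTorch is missing or no GPU is visible.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed at all
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print("Running on:", device)
```

If this prints `cpu` on Colab, the hardware accelerator is most likely still set to "None" in the notebook settings.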

### Check how much of the GPU is available

I'm using the following code from [a stackoverflow thread](https://stackoverflow.com/questions/48750199/google-colaboratory-misleading-information-about-its-gpu-only-5-ram-available) to check what % of the GPU is being utilized right now. 100% is bad; 0% is good (all free for me to use!).

In [9]:
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize
import os

import GPUtil as GPU
import humanize
import psutil

GPUs = GPU.getGPUs()
# Colab provides at most one GPU, and even that one isn't guaranteed,
# so fall back to None instead of crashing on an empty list.
gpu = GPUs[0] if GPUs else None

def printm():
    # Report free system RAM, this process's memory use, and GPU memory use
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available), " | Proc size: " + humanize.naturalsize(process.memory_info().rss))
    if gpu is None:
        print("No GPU found; check Edit > Notebook settings")
        return
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))

printm()

Collecting gputil
  Downloading https://files.pythonhosted.org/packages/ed/0e/5c61eedde9f6c87713e89d794f01e378cfd9565847d4576fa627d758c554/GPUtil-1.4.0.tar.gz
Building wheels for collected packages: gputil
  Building wheel for gputil (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/3d/77/07/80562de4bb0786e5ea186911a2c831fdd0018bda69beab71fd
Successfully built gputil
Installing collected packages: gputil
Successfully installed gputil-1.4.0
Gen RAM Free: 12.9 GB  | Proc size: 241.5 MB
GPU RAM Free: 11430MB | Used: 11MB | Util   0% | Total 11441MB
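
If you'd rather not install extra packages, a rough alternative is to ask `nvidia-smi` directly. This is a sketch, not the method from the thread linked above. The function names are mine, and it assumes `nvidia-smi` is on the PATH and reports plain integer values.

```python
import shutil
import subprocess

def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines into (used, total, percent_used) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total, 100.0 * used / total))
    return rows

def query_gpu_memory():
    """Ask nvidia-smi for memory usage; return [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)

for used, total, pct in query_gpu_memory():
    print(f"GPU RAM used: {used}MB of {total}MB ({pct:.0f}%)")
```

Keeping the parsing separate from the subprocess call makes it easy to try on a machine without a GPU.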


## Cloning the fastai git repo ##
You likely don't actually need to do this, but if you want direct access to the Excel (.xlsx) files, or want to inspect or fork their code... clone the fastai courses repository!

In [0]:
!git clone https://github.com/fastai/courses.git

Cloning into 'courses'...
remote: Counting objects: 765, done.
remote: Total 765 (delta 0), reused 0 (delta 0), pack-reused 765
Receiving objects: 100% (765/765), 22.40 MiB | 41.70 MiB/s, done.
Resolving deltas: 100% (409/409), done.


In [0]:
!pwd

/content


In [0]:
!ls courses

deeplearning1  deeplearning2  LICENSE.txt  README.md  requirements.txt	setup


In [0]:
!ls courses/deeplearning1

excel  nbs


In [0]:
!ls courses/deeplearning1/excel

collab_filter.xlsx  entropy_example.xlsx  layers_example.xlsx
conv-example.xlsx   graddesc.xlsm


## Accessing the fastai data files (lessons 1, 3, 4) ##
If you get a fastai URL to a .zip or .tgz - follow these directions to import the data into your notebook.

Here's the snippet from Lesson 1: *The dataset is available at http://files.fast.ai/data/dogscats.zip. You can download it directly on your server by running the following line in your terminal. wget http://files.fast.ai/data/dogscats.zip. You should put the data in a subdirectory of this notebook's directory, called data/. Note that this data is already available in Crestle and the Paperspace fast.ai template.*

### If it's a .zip file (lesson 1):

#### Lesson 1: Dogs & Cats data

In [0]:
# Get the file from fast.ai URL, unzip it, and put it into the folder 'data'
# This uses -qq to make the unzipping less verbose.
!wget http://files.fast.ai/data/dogscats.zip && unzip -qq dogscats.zip -d data/

In [0]:
# Check to make sure the data is where you think it is:
!ls

data  datalab  dogscats.zip


In [0]:
# Check to make sure the folders all unzipped properly:
!ls data/dogscats

models	sample	test1  train  valid
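
The same download-and-unzip step can also be done in pure Python, which is handy if you want it inside a function rather than a shell line. This is a sketch using only the standard library. `download` and `extract_zip` are hypothetical helper names.

```python
import urllib.request
import zipfile
from pathlib import Path

def download(url, dest):
    """Download `url` to the local path `dest` and return it as a Path."""
    dest = Path(dest)
    urllib.request.urlretrieve(url, str(dest))
    return dest

def extract_zip(archive, target_dir):
    """Extract every member of `archive` into `target_dir`; return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(str(archive)) as zf:
        zf.extractall(str(target))
        return zf.namelist()

# Usage in Colab (needs network access, so left commented out here):
# archive = download("http://files.fast.ai/data/dogscats.zip", "dogscats.zip")
# extract_zip(archive, "data/")
```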


### If it's a .tgz file (lessons 3 & 4):

#### Lesson 3: Rossmann data

In [0]:
# Get the Rossmann data from the fast.ai URL, and create a nested directory to extract it into later.
# mkdir -p creates any missing parent directories, so the whole nested path is made in one go
!wget http://files.fast.ai/part2/lesson14/rossmann.tgz && mkdir -p ~/data/rossmann

--2018-07-18 21:05:33--  http://files.fast.ai/part2/lesson14/rossmann.tgz
Resolving files.fast.ai (files.fast.ai)... 67.205.15.147
Connecting to files.fast.ai (files.fast.ai)|67.205.15.147|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7730448 (7.4M) [text/plain]
Saving to: ‘rossmann.tgz’


2018-07-18 21:05:33 (27.8 MB/s) - ‘rossmann.tgz’ saved [7730448/7730448]



In [0]:
# Extract the .tgz file
# -x to extract
# -v for verbose (omitted here; it prints every file name)
# -z to decompress with gzip
# -f for the archive file (must come last, right before the file name)
# -C to extract the contents into a different directory
!tar -xzf rossmann.tgz -C ~/data/rossmann/

In [0]:
# Remove the .tgz file
!rm rossmann.tgz

In [0]:
# Make sure the data's where we think it is:
!ls ~/data/rossmann

googletrend.csv        state_names.csv	store_states.csv  train.csv
sample_submission.csv  store.csv	test.csv	  weather.csv
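
For reference, the `tar -xzf ... -C ...` step has a direct standard-library equivalent. This is a sketch and `extract_tgz` is a hypothetical helper. As with `tar` itself, only run it on archives you trust.

```python
import tarfile
from pathlib import Path

def extract_tgz(archive, target_dir):
    """Extract a gzip-compressed tar archive into `target_dir` (created if missing)."""
    target = Path(target_dir).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tf:
        names = tf.getnames()
        tf.extractall(str(target))
    return names

# Usage in Colab, mirroring the cells above:
# extract_tgz("rossmann.tgz", "~/data/rossmann")
```

The `"r:gz"` mode plays the role of the `-z` flag, and `target_dir` plays the role of `-C`.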


#### Lesson 4: IMDB data

In [10]:
# Get the IMDB data from the fastai URL: 
!wget http://files.fast.ai/data/aclImdb.tgz

--2019-03-18 12:04:07--  http://files.fast.ai/data/aclImdb.tgz
Resolving files.fast.ai (files.fast.ai)... 67.205.15.147
Connecting to files.fast.ai (files.fast.ai)|67.205.15.147|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 145982645 (139M) [text/plain]
Saving to: ‘aclImdb.tgz’


2019-03-18 12:04:12 (31.6 MB/s) - ‘aclImdb.tgz’ saved [145982645/145982645]



In [34]:
# Make sure it imported properly:
!ls

aclImdb.tgz  base_dir  gdrive  sample_data


In [33]:
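# NOTE: this move fails (see the error below) because aclImdb.tgz has not been
# extracted yet, so /content/aclImdb does not exist. Extract the archive first.
#!cd /content
#!ls
!mv  /content/aclImdb  base_dir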

mv: cannot stat '/content/aclImdb': No such file or directory


In [0]:
# Extract the .tgz file
# -x to extract
# -v for verbose (prints every file name; I usually turn this off)
# -z to decompress with gzip
# -f for the archive file (must come last, right before the file name)
# -C to extract the contents into a different directory
!tar -xvzf aclImdb.tgz -C /content/gdrive/My\ Drive/fastai-v3

aclImdb/
aclImdb/imdbEr.txt
aclImdb/train/
aclImdb/train/neg/
aclImdb/train/neg/2966_1.txt
aclImdb/train/neg/9679_2.txt
aclImdb/train/neg/308_1.txt
aclImdb/train/neg/9735_1.txt
aclImdb/train/neg/10126_2.txt
aclImdb/train/neg/10425_3.txt
aclImdb/train/neg/5349_4.txt
aclImdb/train/neg/10511_1.txt
aclImdb/train/neg/2527_2.txt
aclImdb/train/neg/8238_1.txt
aclImdb/train/neg/2636_1.txt
aclImdb/train/neg/5685_1.txt
aclImdb/train/neg/8069_1.txt
aclImdb/train/neg/8252_3.txt
aclImdb/train/neg/2026_2.txt
aclImdb/train/neg/2185_1.txt
aclImdb/train/neg/9817_2.txt
aclImdb/train/neg/1780_1.txt
aclImdb/train/neg/249_3.txt
aclImdb/train/neg/2569_3.txt
aclImdb/train/neg/5421_4.txt
aclImdb/train/neg/2191_3.txt
aclImdb/train/neg/11209_3.txt
aclImdb/train/neg/8470_2.txt
aclImdb/train/neg/5827_2.txt
aclImdb/train/neg/5506_2.txt
aclImdb/train/neg/9281_1.txt
aclImdb/train/neg/11097_2.txt
aclImdb/train/neg/4045_1.txt
aclImdb/train/neg/11265_1.txt
aclImdb/train/neg/11058_1.txt
aclImdb/train/neg/5541_2.txt
aclIm

In [0]:
# Remove the original .tgz file
!rm aclImdb.tgz

In [0]:
# Make sure the data is where we think it is (the folder we extracted into above):
!ls /content/gdrive/My\ Drive/fastai-v3/aclImdb

imdbEr.txt  imdb.vocab	README	test  train
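
A quick sanity check after extracting is to count the review files per folder. In this sketch only `train/neg` is confirmed by the extraction log above. The `pos` and `test` folder names are assumptions about the archive layout, and `count_reviews` is a hypothetical helper.

```python
from pathlib import Path

def count_reviews(root):
    """Count .txt files in each <split>/<label> folder that exists under `root`."""
    counts = {}
    for split in ("train", "test"):          # "test" layout is an assumption
        for label in ("neg", "pos"):         # "pos" is an assumption
            folder = Path(root) / split / label
            if folder.is_dir():
                counts[f"{split}/{label}"] = sum(1 for _ in folder.glob("*.txt"))
    return counts

# Usage in Colab (path assumed to match wherever you extracted the archive):
# print(count_reviews("/content/gdrive/My Drive/fastai-v3/aclImdb"))
```

Folders that don't exist are simply skipped, so a missing key is a hint that the extraction was incomplete.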


## Getting data from Kaggle, using the Kaggle CLI (lesson 2)

Install the Kaggle API; authenticate; and then use the Kaggle command line interface to access data.

In [0]:
# Install the Kaggle API
!pip3 install kaggle

In [0]:
# Import kaggle.json from Google Drive
# This snippet will output a link which needs authentication from any Google account

import io, os
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.colab import auth

auth.authenticate_user()

# Search Drive for the API token file by name
drive_service = build('drive', 'v3')
results = drive_service.files().list(
        q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])
if not kaggle_api_key:
    raise FileNotFoundError("kaggle.json was not found in your Google Drive")

# The Kaggle CLI looks for its token in ~/.kaggle/kaggle.json
filename = os.path.expanduser("~/.kaggle/kaggle.json")
os.makedirs(os.path.dirname(filename), exist_ok=True)

# Download the file in chunks
request = drive_service.files().get_media(fileId=kaggle_api_key[0]['id'])
with io.FileIO(filename, 'wb') as fh:
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))
# Octal 0o600 = read/write for the owner only (a bare 600 would be read as decimal)
os.chmod(filename, 0o600)
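
If you'd rather not search Drive, another option is to write the credentials file yourself. This sketch assumes the usual `kaggle.json` layout with a `username` and a `key` field, and the CLI's default `~/.kaggle` location. `write_kaggle_credentials` is a hypothetical helper, not part of the Kaggle API.

```python
import json
import os
from pathlib import Path

def write_kaggle_credentials(username, key, config_dir="~/.kaggle"):
    """Write kaggle.json with owner-only permissions and return its path."""
    folder = Path(config_dir).expanduser()
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "kaggle.json"
    path.write_text(json.dumps({"username": username, "key": key}))
    # 0o600 is octal: read/write for the owner, nothing for anyone else.
    # A bare 600 would be decimal (octal 1130) and set the wrong bits.
    os.chmod(str(path), 0o600)
    return path

# Example with placeholder values (replace with your own):
# write_kaggle_credentials("your-username", "your-api-key")
```

Note that file modes passed to `os.chmod` need the `0o` prefix to be read as octal.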

**Now we have the Kaggle API set up!**

Here are a few examples of what we can do now, using the Kaggle API:

```
!kaggle competitions list
!kaggle datasets download -d stanfordu/street-view-house-numbers -w -f street-view-house-numbers.zip
```
More documentation on the Kaggle API here: https://github.com/Kaggle/kaggle-api

**Typical workflow:**

Download the zip file of a dataset:
```
!kaggle datasets download -d 
```
And then unzip the file and move to a directory:
```
!unzip street-view-house-numbers.zip 
```
Check to make sure it's there:
```
!ls
```

*This post was helpful for this lesson 2 data in particular: http://forums.fast.ai/t/how-to-download-data-for-lesson-2-from-kaggle-for-planet-competition/7684/38*

In [0]:
# List the files for the Planet data
!kaggle competitions files -c planet-understanding-the-amazon-from-space

In [0]:
# -c: competition name
# -f: which file you want to download
# -p: path to where the file should be saved
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p ~/data/planet/
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f test-jpg.tar.7z -p ~/data/planet/
!kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv.zip -p ~/data/planet/
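
The three near-identical download lines can also be driven from a loop. This sketch only wraps the same CLI call with the `-c`, `-f` and `-p` flags used above. The helper names are mine, and it assumes the `kaggle` command is installed and authenticated.

```python
import subprocess

COMPETITION = "planet-understanding-the-amazon-from-space"
FILES = ["train-jpg.tar.7z", "test-jpg.tar.7z", "train_v2.csv.zip"]

def build_download_cmd(competition, filename, dest):
    """Return the argument list for one `kaggle competitions download` call."""
    return ["kaggle", "competitions", "download",
            "-c", competition, "-f", filename, "-p", dest]

def download_all(dest, run=subprocess.run):
    """Download every file in FILES into `dest`, stopping on the first failure."""
    for filename in FILES:
        run(build_download_cmd(COMPETITION, filename, dest), check=True)

# Usage in Colab (requires the Kaggle CLI and credentials):
# import os; download_all(os.path.expanduser("~/data/planet/"))
```

Unlike a `!` shell line, `subprocess` does not expand `~`, which is why the usage line calls `os.path.expanduser` first.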

In [0]:
# Make sure the data is where you think it is:
!ls ~/data/planet

In [0]:
# In order to unzip the 7z files, need to install p7zip
# This was helpful: http://forums.fast.ai/t/unzipping-tar-7z-files-in-google-collab-notebook/14857/4
!apt-get install p7zip-full

In [0]:
# Extract the 7zip files
# -d: decompress the given archive
!p7zip -d ~/data/planet/test-jpg.tar.7z 
!p7zip -d ~/data/planet/train-jpg.tar.7z 

In [0]:
# Extract the .tar files (into the current directory)
!tar -xvf ~/data/planet/test-jpg.tar
!tar -xvf ~/data/planet/train-jpg.tar

In [0]:
# Move the unzipped folders into data/planet/
!mv test-jpg ~/data/planet/ && mv train-jpg ~/data/planet/

In [0]:
# Unzip the regular .zip file
!unzip ~/data/planet/train_v2.csv.zip -d ~/data/planet/

In [0]:
# Make sure everything looks as it should:
!ls ~/data/planet/

## Now we're ready to go! ##