# Setting up a GPU Deep Learning Server
Most of the work you end up doing with *most* deep learning frameworks will require a GPU. Nvidia is the way to go, but I will be doing this course using WSL 2 and DirectML, in order to use the discrete Intel graphics on this machine.


If you do not have the time or need to set up your own machine, Google Colaboratory (Colab) is a good bet: you get access to a good GPU and everything is hosted on a server, so your own machine does not need great specs.

After a lot of work and tinkering, I have managed to get all of this working on a GPU (a relatively old GTX 950). The compromise is having to manage its VRAM: it only has 2GB, so each batch has to be kept relatively small (here, by shrinking the images).

In [2]:
from fastai.vision.all import * 
path = untar_data(URLs.PETS)/'images'
import torch
torch.cuda.empty_cache()
import gc
gc.collect()


def is_cat(x): return x[0].isupper() 
dls = ImageDataLoaders.from_name_func( path, get_image_files(path), valid_pct=0.2, seed=42, label_func=is_cat, item_tfms=Resize(50))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

#Additional Info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

print(torch.cuda.memory_summary(device=None, abbreviated=False))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

Using device: cuda

NVIDIA GeForce GTX 950
Memory Usage:
Allocated: 0.0 GB
Cached:    0.0 GB
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |   38400 B  |   38400 B  |   38400 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |   38400 B  |   38400 B  |   38400 B  |
|---------------------------------------------------------------------------|
| Active memory         |       0 B  |   38400 B  |   38400 B  |   38400 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |   38400 B 

[W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.


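| epoch | train_loss | valid_loss | error_rate | time |
|-------|------------|------------|------------|-------|
| 0 | 0.929192 | 0.689685 | 0.296346 | 00:14 |


| epoch | train_loss | valid_loss | error_rate | time |
|-------|------------|------------|------------|-------|
| 0 | 0.542773 | 0.415126 | 0.188769 | 00:16 |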


In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

#Additional Info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))

Using device: cuda

NVIDIA GeForce GTX 950


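You can use the ``error_rate`` column in the table above to judge the accuracy of the model: it is the fraction of validation images that were classified incorrectly, so the accuracy is 1 minus the error rate. This is vital when building deep learning models, as it shows how reliable the model is on images it was not trained on.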
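As a quick check of that relationship, here is a minimal sketch that converts the two ``error_rate`` values from the training output above into accuracies. The dictionary keys and the helper name ``accuracy_from_error_rate`` are my own labels for illustration, not part of fastai.

```python
# error_rate values copied from the two tables of training output above.
error_rates = {"first table": 0.296346, "second table": 0.188769}


def accuracy_from_error_rate(error_rate: float) -> float:
    """Accuracy is the fraction classified correctly, i.e. 1 - error rate."""
    return 1.0 - error_rate


# Convert every error rate and print it as a percentage.
accuracies = {name: accuracy_from_error_rate(e) for name, e in error_rates.items()}
for name, accuracy in accuracies.items():
    print(f"{name}: {accuracy:.1%} of validation images classified correctly")
```

So even with tiny 50 pixel images, the model ends up getting roughly four out of five validation images right.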

Here is the image I will be using to judge if this model is accurate.

In [None]:
from IPython.display import Image
Image(url= "Images/cat.jpg")

This is the end of the first classifier model from the book.

# What is Machine Learning?

The classifier made above is a deep learning model. Deep Learning models use neural networks, which were first developed in the 1950s, and are an incredibly versatile and powerful method of Machine Learning.

Deep Learning is a more modern and complex form of Machine Learning, and I will write down some more notes from the book in the following sections.

*Machine Learning* is similar to normal programming in that it is a way to get computers to complete a specific task. The main difference is that the programmer does not have to account for every individual case manually: the computer works the cases out itself, which can be far more efficient.

Right at the beginning of computing, all the way back in 1949, an IBM researcher, Arthur Samuel, started working on a different way of getting machines to complete tasks, which he called *'Machine Learning'*. He described it as follows:

- *"Programming a computer for such computations is, at best, a difficult task, not primarily because of any inherent complexity in the computer itself but, rather, because of the need to spell out every minute step of the process in the most exasperating detail. Computers, as any programmer will tell you, are giant morons, not giant brains"*
- *"Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment so as to maximize the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programmed would “learn” from its experience."*

There are a number of powerful concepts embedded in this short statement:
- The idea of a “weight assignment”
- The fact that every weight assignment has some “actual performance”
- The requirement that there be an “automatic means” of testing that performance
- The need for a “mechanism” (i.e., another automatic process) for improving the performance by changing the weight assignments
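To make these four ideas concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: the task (learning to multiply by 3), the function names ``performance`` and ``improve``, and the update rule, which is a simple try-the-neighbours search rather than the gradient descent that deep learning libraries actually use.

```python
def performance(weight: float) -> float:
    """Automatic means of testing a weight assignment: higher is better.

    Hypothetical task: the prediction weight * x should match the target 3 * x.
    """
    inputs = [1.0, 2.0, 3.0]
    targets = [3.0 * x for x in inputs]
    squared_error = sum((weight * x - t) ** 2 for x, t in zip(inputs, targets))
    return -squared_error


def improve(weight: float, step: float) -> float:
    """Mechanism for altering the weight: keep whichever nearby value performs best."""
    candidates = [weight - step, weight, weight + step]
    return max(candidates, key=performance)


weight = 0.0  # the initial weight assignment
step = 0.5
for _ in range(20):
    new_weight = improve(weight, step)
    if new_weight == weight:
        step /= 2  # no neighbour was better, so search more finely
    weight = new_weight

print(f"learned weight: {weight:.3f}")
```

Nobody tells the program that the answer is 3; it only gets a score for each weight it tries and a rule for changing the weight, which is the sense in which it "learns" from experience.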

# Rundown of the above code
Here I'll go through the above code line by line and explain what is happening.

In [None]:
from fastai.vision.all import *

This line imports everything that the ``fastai.vision.all`` module exports. This provides a lot of functions and classes which will be used to create a wide variety of models.

It is often advised not to import everything from a framework or library into an environment (i.e. importing with ``import *``), but this library has been designed to be used like this, so it isn't much of an issue, and it can make things more convenient.

In [None]:
path = untar_data(URLs.PETS)/'images'

This line downloads a standard dataset from the 'fast-ai datasets' collection (https://course.fast.ai/datasets), extracts it, and assigns the path of its ``images`` folder to the variable ``path``.
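The ``/'images'`` part works because the value returned by ``untar_data`` is a path object, and path objects use ``/`` to join folder names. Here is a small sketch of the same idea using only the standard library; the folder names are placeholders I have assumed, as the real location depends on how fastai is configured on the machine.

```python
from pathlib import Path

# Placeholder for the folder the dataset was extracted into (assumed, not the real location).
dataset_root = Path("data") / "oxford-iiit-pet"

# The same "/" operator as in the notebook line: step into the images subfolder.
images_dir = dataset_root / "images"

print(images_dir.as_posix())
print(images_dir.name)
```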

In [None]:
def is_cat(x): return x[0].isupper()

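This defines a function that labels each image based on a filename rule set by the dataset's creators: the filenames of cat images start with an uppercase letter, so the function returns ``True`` when the first character of the filename is uppercase.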
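Here is a quick sketch of the rule in action. The filenames are illustrative examples in the style of the dataset (breed name, underscore, number) and are not checked against the actual download.

```python
def is_cat(filename: str) -> bool:
    """Return True when the filename starts with an uppercase letter (a cat image)."""
    return filename[0].isupper()


# Illustrative filenames: capitalised breed names stand for cats, lowercase for dogs.
example_filenames = ["Abyssinian_1.jpg", "beagle_32.jpg", "Bengal_10.jpg", "pug_7.jpg"]

labels = {name: is_cat(name) for name in example_filenames}
for name, label in labels.items():
    print(f"{name}: {'cat' if label else 'not a cat'}")
```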

In [None]:
dls = ImageDataLoaders.from_name_func( path, get_image_files(path), valid_pct=0.2, seed=42, label_func=is_cat, item_tfms=Resize(50))

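This line tells fastai what kind of dataset is being used, and how it is structured. ``valid_pct=0.2`` holds back 20% of the images for validation, and ``item_tfms=Resize(50)`` resizes every image to 50x50 pixels. I have reduced the image size to 50 (the original value was 224), as otherwise I run into errors with my GPU running out of memory, as seen below: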

In [5]:
from fastai.vision.all import * 
path = untar_data(URLs.PETS)/'images'
import torch
torch.cuda.empty_cache()
import gc
gc.collect()


def is_cat(x): return x[0].isupper() 
dls = ImageDataLoaders.from_name_func( path, get_image_files(path), valid_pct=0.2, seed=42, label_func=is_cat, item_tfms=Resize(224)) #original value here was 224

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

#Additional Info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

print(torch.cuda.memory_summary(device=None, abbreviated=False))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

Using device: cuda

NVIDIA GeForce GTX 950
Memory Usage:
Allocated: 0.0 GB
Cached:    0.0 GB
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |  753152 B  |  753152 B  |  753152 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |  753152 B  |  753152 B  |  753152 B  |
|---------------------------------------------------------------------------|
| Active memory         |       0 B  |  753152 B  |  753152 B  |  753152 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |  753152 B 

[W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.


epoch,train_loss,valid_loss,error_rate,time


RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 1.95 GiB total capacity; 963.13 MiB already allocated; 27.00 MiB free; 1016.00 MiB reserved in total by PyTorch)
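To get a feel for why the image size matters so much, here is a rough back-of-the-envelope sketch of how much memory a single batch of input images takes. It rests on assumptions that are not stated in the notebook: a batch of 64 images (which I believe is fastai's default), 3 colour channels, and 32-bit floats. It only counts the input batch itself, not the model or its intermediate activations, which take far more.

```python
BYTES_PER_FLOAT32 = 4
CHANNELS = 3      # RGB images
BATCH_SIZE = 64   # assumption: the default batch size, since none was set in the notebook


def input_batch_bytes(image_size: int, batch_size: int = BATCH_SIZE) -> int:
    """Bytes needed to hold one batch of square float32 images."""
    return batch_size * CHANNELS * image_size * image_size * BYTES_PER_FLOAT32


for size in (224, 50):
    mib = input_batch_bytes(size) / 1024**2
    print(f"{size}x{size} images: {mib:.2f} MiB per input batch")

ratio = input_batch_bytes(224) / input_batch_bytes(50)
print(f"A 224 pixel batch is about {ratio:.1f}x larger than a 50 pixel batch")
```

The memory used inside a convolutional network grows with the number of pixels in much the same way, so going from 50 to 224 pixels asks for roughly twenty times as much memory per batch, which is too much for a 2GB card. Passing a smaller batch size to the data loaders (the ``bs`` argument, as far as I know) should be another way to trade speed for memory while keeping larger images.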