<a href="https://colab.research.google.com/github/Jotadebeese/ContactsList/blob/main/transfer_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transfer Learning with PyTorch

This notebook implements transfer learning for a computer vision classification task using torchvision.

In [3]:
# Regular imports
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

# Try to get torchinfo
try:
    from torchinfo import summary
except ImportError:
    print("[INFO] Couldn't find torchinfo, installing it.")
    !pip install -q torchinfo
    from torchinfo import summary

# Try to import the PyTorch scripts directory; if it is missing, download it from GitHub
try:
    from modular_scripts import data_setup, engine, utils
except ImportError:
    # Get the scripts
    print("[INFO] Couldn't find scripts, downloading them from GitHub.")
    !git clone https://github.com/Jotadebeese/pytorch_scripts
    !mv pytorch_scripts/modular_scripts .
    !rm -rf pytorch_scripts
    from modular_scripts import data_setup, engine, utils

[INFO] Couldn't find scripts, downloading them from GitHub.
Cloning into 'pytorch_scripts'...
remote: Enumerating objects: 67, done.
remote: Counting objects: 100% (67/67), done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 67 (delta 32), reused 60 (delta 25), pack-reused 0
Receiving objects: 100% (67/67), 20.30 KiB | 10.15 MiB/s, done.
Resolving deltas: 100% (32/32), done.


## 1. Get data

Download and unzip the dataset with the `get_data` function from `utils.py`.

In [4]:
image_path = utils.get_data(zip_file_id='17oNGRMw72dcTOhbm_H4Gu7GrCx_LPpVp')

'data/images_dataset' does not exist, creating directory...


Downloading...
From: https://drive.google.com/uc?id=17oNGRMw72dcTOhbm_H4Gu7GrCx_LPpVp
To: /content/data/dataset.zip
100%|██████████| 549M/549M [00:06<00:00, 82.4MB/s]


Unzipping data...


### 1.1 Converting Images to jpg format

Convert the images in every class folder to jpg with `image_convertor` from `utils`.

In [5]:
# Convert the images of every class folder to jpg
class_names = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]

for class_name in class_names:
    utils.image_convertor(path=f"data/rubbish_dataset/{class_name}/",
                          format="jpg")

492it [00:03, 133.80it/s]


66 images converted to 'jpg' in 'data/rubbish_dataset/cardboard'


772it [00:02, 369.82it/s]


42 images converted to 'jpg' in 'data/rubbish_dataset/glass'


672it [00:00, 91080.34it/s]


0 images converted to 'jpg' in 'data/rubbish_dataset/metal'


743it [00:00, 91655.18it/s]


0 images converted to 'jpg' in 'data/rubbish_dataset/paper'


632it [00:00, 80133.01it/s]


0 images converted to 'jpg' in 'data/rubbish_dataset/plastic'


456it [00:05, 76.02it/s] 

125 images converted to 'jpg' in 'data/rubbish_dataset/trash'
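
A quick way to confirm the conversion is to count the files in each class folder by extension. The snippet below is a minimal sketch using only the standard library. The helper name `count_suffixes` is hypothetical (it is not part of `utils`), and the folder layout is assumed from the paths used above.

```python
from collections import Counter
from pathlib import Path


def count_suffixes(folder):
    """Count files under `folder` (recursively) by lowercased file extension."""
    return Counter(p.suffix.lower() for p in Path(folder).rglob("*") if p.is_file())


# Assumed layout: one sub-folder per class under data/rubbish_dataset
dataset_dir = Path("data/rubbish_dataset")
if dataset_dir.is_dir():
    for class_dir in sorted(d for d in dataset_dir.iterdir() if d.is_dir()):
        print(class_dir.name, dict(count_suffixes(class_dir)))
```

Any extension other than the jpg one showing up in a class folder would point to files the converter skipped.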





### 1.2 Splitting Data into train, validation and test

The split is done with the `split-folders` package.

Source: https://github.com/jfilter/split-folders

In [6]:
# get split-folders ready to use
import shutil

try:
    import splitfolders
except ImportError:
    !pip install split-folders[full]
    import splitfolders

# Define input and output folders
input_folder = "data/rubbish_dataset"
output_folder = str(image_path)

splitfolders.ratio(input_folder, output=output_folder,
    seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False) # default values

shutil.rmtree(input_folder)

Collecting split-folders[full]
  Downloading split_folders-0.5.1-py3-none-any.whl (8.4 kB)
Installing collected packages: split-folders
Successfully installed split-folders-0.5.1


Copying files: 3767 files [00:02, 1843.22 files/s]
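
To make the 80/10/10 ratio concrete, the sketch below mimics a seeded ratio split on placeholder file names. It illustrates the idea only and is not the `split-folders` implementation: the function `ratio_split`, the rounding-down rule, and the placeholder names are assumptions. The per-class counts are read from the progress-bar totals in the conversion output above, and they sum to the 3767 files reported here.

```python
import random


def ratio_split(items, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Shuffle `items` reproducibly and cut them into train/val/test lists."""
    items = sorted(items)  # fixed starting order, so the seed alone decides the shuffle
    random.Random(seed).shuffle(items)
    n_train = int(ratio[0] * len(items))  # assumption: sizes are rounded down
    n_val = int(ratio[1] * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the last split
    return train, val, test


# Per-class counts, read from the progress-bar totals of the conversion step
class_counts = {"cardboard": 492, "glass": 772, "metal": 672,
                "paper": 743, "plastic": 632, "trash": 456}

for name, count in class_counts.items():
    files = [f"{name}_{i}.jpg" for i in range(count)]  # placeholder file names
    train, val, test = ratio_split(files)
    print(f"{name}: train={len(train)} val={len(val)} test={len(test)}")
```

Splitting each class separately, as done here, keeps the class proportions roughly equal across the three splits.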


In [7]:
# Set up directory paths
train_dir = image_path / "train"
test_dir = image_path / "test"
val_dir = image_path / "val"

train_dir, test_dir, val_dir

(PosixPath('data/images_dataset/train'),
 PosixPath('data/images_dataset/test'),
 PosixPath('data/images_dataset/val'))
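
Before building the DataLoaders, it can help to check that each split holds the expected number of images per class. The following is a minimal sketch, assuming each split directory contains one sub-folder per class. `count_images` is a hypothetical helper, not part of the downloaded scripts.

```python
from pathlib import Path


def count_images(split_dir):
    """Return {class_name: number_of_files} for one split directory."""
    split_dir = Path(split_dir)
    return {
        class_dir.name: sum(1 for p in class_dir.iterdir() if p.is_file())
        for class_dir in sorted(split_dir.iterdir())
        if class_dir.is_dir()
    }


# Assumed root, matching the paths printed above
root = Path("data/images_dataset")
for split in ("train", "val", "test"):
    split_dir = root / split
    if split_dir.is_dir():
        counts = count_images(split_dir)
        print(split, sum(counts.values()), counts)
```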

## 2. Create Datasets and DataLoaders

To do so, we use the `create_dataLoaders()` function defined in `data_setup.py`.