# Rubbish Classifier

This is a first attempt at building a rubbish classifier by training a deep learning model for computer vision.

## 0. Importing PyTorch and setting up device-agnostic code

In [1]:
import torch
from torch import nn 

# Setup device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

## 1. Getting data

For our rubbish classifier we have the following nine label classes:

1. cans (soft drinks, beer, etc.)
2. carton_boxes (milk boxes, juice boxes, etc.)
3. coffee_cups (takeaway coffee cups)
4. glass_bottles (beer bottles, wine bottles, spirits bottles, etc.)
5. paper_bags (paper shopping bags)
6. plastic_bags (plastic shopping bags)
7. plastic_bottles (plastic milk bottles, water, juice, etc.)
8. takeaway_containers (cardboard takeaway containers)
9. tissues (tissues, napkins)

Ideally, the data is divided into 75% training data and 25% test data.

However, `rubbish_classifier_v0.001` is our first approach, and at the moment we only have enough data to train on the classes **cans, carton_boxes, coffee_cups** and **glass_bottles**.
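As a rough illustration of the 75%/25% idea, here is a minimal sketch that shuffles a list of file names and cuts it at the 75% mark. The helper `split_train_test` and the file names are hypothetical and only stand in for one class folder; this is not the splitting method used later in this notebook.

```python
import random

def split_train_test(items, train_ratio=0.75, seed=42):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for one class folder
file_names = [f"cans_{i:03d}.jpg" for i in range(112)]
train_files, test_files = split_train_test(file_names)
print(len(train_files), len(test_files))
```

Using a fixed seed keeps the split stable between runs, which makes later results easier to compare.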

In [2]:
import requests
import zipfile
import os
from pathlib import Path

# Setup a path to a data folder
data_path = Path("data/")
images_path = data_path / "images_dataset"

# Create the images folder if it doesn't exist yet.
if images_path.is_dir():
    print(f"'{images_path}' directory already exists, skipping directory creation...")
else:
    print(f"'{images_path}' does not exist, creating directory...")
    images_path.mkdir(parents=True, exist_ok=True)

# Download data
try:
    import gdown
except ImportError:
    !pip install gdown
    import gdown

url = 'https://drive.google.com/uc?id=1M-jVb1fw5TZrHsBecIE-EYDkcNVqoCSq'
zip_path = data_path / "images.zip"
gdown.download(url, str(zip_path), quiet=False)

# Unzip data
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    print("Unzipping data...")
    zip_ref.extractall(data_path)

# Remove the archive once it has been extracted
os.remove(zip_path)

'data\images_dataset' does not exist, creating directory...


Downloading...
From: https://drive.google.com/uc?id=1M-jVb1fw5TZrHsBecIE-EYDkcNVqoCSq
To: c:\Users\juanb\Documents\AI\rubbish_classifier\data\images.zip
100%|██████████| 481M/481M [00:25<00:00, 19.1MB/s] 


Unzipping data...
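The download cell above fetches the archive every time it runs. One possible way to avoid repeated downloads is to check first whether a target folder already holds files. The sketch below uses a hypothetical helper, `needs_download`, and demonstrates it on a temporary directory rather than on the real data folder.

```python
from pathlib import Path
import tempfile

def needs_download(folder: Path) -> bool:
    """Return True when `folder` is missing or contains no entries."""
    return not folder.is_dir() or not any(folder.iterdir())

# Demonstrate on a throwaway directory rather than the real data folder
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "images_dataset"
    missing = needs_download(demo_path)      # folder does not exist yet
    demo_path.mkdir(parents=True)
    empty = needs_download(demo_path)        # folder exists but is empty
    (demo_path / "example.jpg").touch()
    populated = needs_download(demo_path)    # folder now holds a file
print(missing, empty, populated)
```

Whether such a guard fits depends on which folder is treated as the "already prepared" marker, so take it as a starting point only.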


### 1.1 Converting all images to jpg format

`tqdm` source https://github.com/tqdm/tqdm

In [3]:
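# Function to convert images from any format to the given format (e.g. jpg)
import os
from pathlib import Path
from PIL import Image

# Install tqdm to show a smart progress meter
try:
    from tqdm import tqdm
except ImportError:
    !pip install tqdm
    from tqdm import tqdm

def image_convertor(path: str, format: str):
    """Convert every file in `path` to `format` and delete the original."""
    count = 0
    path = Path(path)
    target_ext = "." + format.lower()
    for file in tqdm(path.glob("./*")):
        f, e = os.path.splitext(file)
        if e.lower() == target_ext:
            continue  # already in the target format
        try:
            with Image.open(file) as img:
                # JPEG cannot store an alpha channel or a palette, so convert to RGB first
                if target_ext in (".jpg", ".jpeg"):
                    img = img.convert("RGB")
                img.save(f + target_ext)
        except OSError:
            print("cannot convert", file)
            continue  # keep the original if the conversion failed
        # Remove the original only after a successful conversion
        os.remove(file)
        count += 1
    print(f"{count} images converted to '{format}' in '{path}'")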

In [4]:
# cans class conversion
image_convertor(path="data/dataset/cans/",
                format="jpg")
# carton_boxes class conversion
image_convertor(path="data/dataset/carton_boxes/",
                format="jpg")
# coffee_cups class conversion
image_convertor(path="data/dataset/coffee_cups/",
                format="jpg")
# glass_bottles class conversion
image_convertor(path="data/dataset/glass_bottles/",
                format="jpg")

112it [00:00, 55494.63it/s]


0 images converted to 'jpg' in 'data\dataset\cans'


89it [00:07, 11.81it/s]


66 images converted to 'jpg' in 'data\dataset\carton_boxes'


169it [00:14, 11.95it/s]


125 images converted to 'jpg' in 'data\dataset\coffee_cups'


121it [00:04, 25.33it/s]

42 images converted to 'jpg' in 'data\dataset\glass_bottles'
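To double-check a conversion like the one above, one option is to count the file extensions left in a class folder. The sketch below uses a hypothetical helper, `count_extensions`, and runs it on a temporary folder with made-up file names; pointing it at a real class folder is left to the reader.

```python
from collections import Counter
from pathlib import Path
import tempfile

def count_extensions(folder) -> Counter:
    """Count the files in `folder` by lower-cased extension."""
    return Counter(p.suffix.lower() for p in Path(folder).iterdir() if p.is_file())

# Demonstrate on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.JPG", "c.png"):
        (Path(tmp) / name).touch()
    counts = count_extensions(tmp)
print(counts)
```

After a complete conversion, the only key you would expect to see is `.jpg`.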





### 1.2 Split data into train and test datasets using `split-folders`

Source https://github.com/jfilter/split-folders

In [5]:
# Get split-folders ready to use
import shutil

try:
    import splitfolders
except ImportError:
    !pip install split-folders[full]
    import splitfolders

# Define input and output folders
input_folder = "data/dataset/"
output_folder = str(images_path)

splitfolders.fixed(input_folder, output=output_folder,
                   seed=42, fixed=(0, 30), move=True)

# Clean up the leftover source folder and the val folder, which is not used here
shutil.rmtree("data/dataset")
shutil.rmtree(images_path / "val")

Copying files: 491 files [00:00, 1111.91 files/s]
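A fixed-size split gives each class a different test share. The arithmetic below is a sketch under two assumptions: that `fixed=(0, 30)` means 0 validation and 30 test images per class, and that the tqdm counters in section 1.1 are the file counts per class (they add up to the 491 files reported above).

```python
# Files per class, taken from the tqdm counters in section 1.1 (112 + 89 + 169 + 121 = 491)
class_sizes = {"cans": 112, "carton_boxes": 89, "coffee_cups": 169, "glass_bottles": 121}
TEST_PER_CLASS = 30  # assumed meaning of fixed=(0, 30): 0 val, 30 test images per class

split_summary = {}
for name, total in class_sizes.items():
    train = total - TEST_PER_CLASS
    test_share = round(100 * TEST_PER_CLASS / total, 1)  # test images as % of the class
    split_summary[name] = (train, TEST_PER_CLASS, test_share)

for name, (train, test, pct) in split_summary.items():
    print(f"{name:<14} train={train:>3}  test={test:>3}  test share={pct}%")
```

Under these assumptions the test share ranges from roughly 18% (coffee_cups) to roughly 34% (carton_boxes), so the split only approximates the 75%/25% target.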
