# Image Classification Lab

## Create an image dataset

Use the `bing-image-downloader` Python package to build a dataset with at least two classes.
Each class should contain at least 100 images.
Here's a simple example:

```python
!pip install -Uqq bing-image-downloader

from bing_image_downloader import downloader

classes = ['puppies','kittens']
for c in classes:
    # `class` is a reserved word in Python; pass the loop variable instead
    downloader.download(c, output_dir='data', verbose=False)
```

Feel free to use this example, but it will be more interesting if you use something that's relevant to you.
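
Since the lab asks for at least 100 images per class, it can help to check the download before moving on. The following is a minimal sketch, assuming the downloader writes one subfolder per query under `data` (the label-from-parent-folder approach used later in this lab relies on the same layout). `count_images_per_class`, `classes_below_minimum`, and the suffix list are hypothetical names and choices introduced here for illustration.

```python
from pathlib import Path

# Assumption: common image suffixes; extend this set if your downloads use others.
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp'}

def count_images_per_class(root):
    """Return {class_folder_name: number_of_image_files} for each subfolder of root."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.rglob('*')
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts

def classes_below_minimum(counts, min_images=100):
    """Return the class names that have fewer than min_images files."""
    return [name for name, n in counts.items() if n < min_images]

counts = count_images_per_class('data')
print(counts)
print('Too few images:', classes_below_minimum(counts))
```

If a class comes up short, re-running the download for that query or choosing a broader search term are the obvious options.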

In [1]:
!pip install -Uqq bing-image-downloader

In [2]:
from bing_image_downloader import downloader
import os

In [None]:
%%time
categories = ['puppy','kitten','baby pig']
for c in categories:
    downloader.download(c, output_dir='data', verbose=False)

## Train a CNN from scratch using `torch`

In [4]:
# Imports here
from PIL import Image
from torchvision import transforms as T #Or use albumentations
from pathlib import Path
import torch
from torch import nn, optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import pandas as pd

### Build your `Dataset`s and `DataLoader`s

By now, you should be familiar with building datasets and dataloaders.
How you build them is up to you; at the end, you should have a `train_dl` and a `valid_dl` object.

In [None]:
def load_image(path):
    return Image.open(path)

In [None]:
path = Path('data')

In [None]:
extensions = set(i.suffix for i in path.glob('**/*') if len(i.suffix))

In [None]:
image_files = [i for i in path.glob(pattern='**/*') if i.suffix in extensions]

In [None]:
def label_img_from_path(path):
    path = Path(path)
    return path.parent.name

In [None]:
data = pd.DataFrame(image_files, columns=['img_path'])

In [None]:
data['label'] = data.img_path.map(label_img_from_path)
data['label_idx'] = LabelEncoder().fit_transform(data['label'])

In [None]:
data.groupby('label').label_idx.unique().explode().to_frame()
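
The cell above inspects the label-to-index mapping after the fact. An alternative, sketched here under the assumption that you keep a reference to the fitted encoder (the variable name `encoder` is hypothetical), is to read the mapping directly from the encoder and use it later to turn predicted indices back into class names. `LabelEncoder` assigns indices to the sorted unique labels.

```python
from sklearn.preprocessing import LabelEncoder

# Stand-in labels matching the three categories downloaded earlier.
labels = ['puppy', 'kitten', 'baby pig', 'puppy']

encoder = LabelEncoder()
label_idx = encoder.fit_transform(labels)

# classes_ holds the sorted unique labels; the position is the index.
mapping = {name: i for i, name in enumerate(encoder.classes_)}
print(mapping)

# Map indices (for example, model predictions) back to class names.
print(encoder.inverse_transform([0, 2]))
```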

In [None]:
train, valid = train_test_split(data, test_size=0.2, stratify=data.label_idx)

In [None]:
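class AnimalDataset(Dataset):
    def __init__(self, df):
        self.df = df

    # len(ds) and DataLoader rely on the dunder method __len__, not a method named len
    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        # Convert to RGB so palette or grayscale downloads also yield 3 channels
        return Image.open(row.img_path).convert('RGB'), row.label_idx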

In [None]:
train_ds = AnimalDataset(train)
valid_ds = AnimalDataset(valid)

In [None]:
img, label = train_ds[0]

In [None]:
label

In [None]:
img

In [74]:
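# T.ToTensor is not an nn.Module, so it cannot go inside nn.Sequential / torch.jit.script;
# T.Compose chains plain callables instead.
tfms = T.Compose([
    T.Resize(size=224),  # an int resizes the shorter edge to 224, keeping the aspect ratio
    T.ToTensor()         # PIL image -> float tensor of shape (C, H, W)
])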

In [71]:
tfms(img).shape

torch.Size([3, 224, 336])
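
The width of 336 in the shape above follows from how `T.Resize` treats an integer size: it scales the shorter edge to that value and keeps the aspect ratio, so images of different shapes come out with different sizes. The arithmetic can be sketched as below. `resized_hw` is a hypothetical helper, not a torchvision function, and the 400 x 600 source size is an assumed example of a 3:2 image, not the actual size of the image used above.

```python
def resized_hw(height, width, size):
    """Approximate the (height, width) after scaling the shorter edge to `size`."""
    if height <= width:
        return size, int(size * width / height)
    return int(size * height / width), size

print(resized_hw(400, 600, 224))  # landscape 3:2 image -> (224, 336)
print(resized_hw(600, 400, 224))  # the same image in portrait orientation
print(resized_hw(500, 500, 224))  # square image
```

Because the default collate function stacks tensors of equal shape, a batch of such images will not stack unless they all end up the same size. Passing a `(height, width)` tuple to `T.Resize`, or adding `T.CenterCrop(224)` after it, are two common ways to enforce that.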

In [72]:
DataLoader?

In [None]:
train_dl = DataLoader(train_ds, )

## Train the same CNN, but add image augmentation with `pytorch-lightning` or `fastai`

## Fine-tune a pre-trained model