### This is the best way I've found to work with all of these images!

We've all had problems when we needed to work with a HUGE (*OK, this isn't a huge dataset, but it's large* :D ) dataset of images. But calm down, we're here to help you!! <br/>
We all want to play around with all of these whales, but that can be a challenge. If you've also had trouble working with large image datasets, this is the Kernel for you.

Thinking about exploring this dataset and creating a simpler model? I've also created a more hands-on approach to this competition. You can find that Kernel here: <br/><br/>
*  [Whales. A Simple Guide!](https://www.kaggle.com/jhonatansilva31415/whales-a-simple-guide/)

For a more in-depth explanation of the details of this Kernel, I've made a video on YouTube.
## Full Video Explanation of this Kernel
[How to work with large image datasets](https://www.youtube.com/watch?v=myYMrZXpn6U)
## Full Video Explanation of the previous Kernel
[KAGGLE KERNELS - HOW TO START AT 2019](https://www.youtube.com/watch?v=AXcTm4gFerE)

<br/> 
If you are still here ( I'm glad you are .0. ) let's move on :D 

## Notebook Content
1. [The Libraries we all Like](#first-bullet)
2. [Our Own customizable Whales Class](#second-bullet)
3. [Having a look at the dataset](#third-bullet)
4. [Transforming our images ](#forth-bullet)
5. [Transforming one Whale](#fifth-bullet)
6. [Creating our transformed dataset](#sixth-bullet)
7. [Loading it up](#seventh-bullet)

#### Disclaimer
This tutorial was only possible thanks to the great documentation on the PyTorch website. I've made some adaptations from the [DATA LOADING AND PROCESSING TUTORIAL](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html)

### The Libraries we all Like <a class="anchor" id="first-bullet"></a>
Pandas, NumPy and Matplotlib are in practically all the Kernels I see. Apart from those, we are going to be using **PyTorch**.

In [None]:
from __future__ import print_function, division

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import time
import os
import copy

from os import listdir, makedirs, getcwd, remove
from os.path import isfile, join, abspath, exists, isdir, expanduser
from torch.optim import lr_scheduler
from skimage import io, transform
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, models, transforms, utils
from torch.autograd import Variable
from IPython.display import clear_output

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

plt.ion()   # interactive mode

In [None]:
image_size = 128

In [None]:
humpback_whales_path = '/kaggle/input' 
train_path = os.path.join(humpback_whales_path,'train.csv')
humpback_whales_train_path = os.path.join(humpback_whales_path,'train')

### Our Own customizable Whales Class  <a class="anchor" id="second-bullet"></a>
To handle this entire dataset we're going to create our own class for it. In it we define the dataframe, the root directory, and the transformations we are going to pass to it. <br/> 
Another thing we need to create ( this is where the **magic** happens ) is `__getitem__`, which is what lets us index and iterate through the dataset. <br/>
Last but not least, we are going to use the same label encoding as [here](https://www.kaggle.com/jhonatansilva31415/whales-a-simple-guide/). You can find out more in the [Video](https://www.youtube.com/watch?v=AXcTm4gFerE)

In [None]:
class WhalesDS(Dataset):
    """ Humpback Whale Identification Challenge dataset. """
    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.csv_file = csv_file
        self.whales_frame = self.encode()
        self.root_dir = root_dir
        self.transform = transform
        
    def __len__(self):
        return len(self.whales_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.whales_frame.iloc[idx, 0])
        image = io.imread(img_name)
        label = self.whales_frame.iloc[idx,1]
        sample = {'image': image, 'label': label}

        if self.transform:
            sample = self.transform(sample)

        return sample

    def encode(self):
        """ Map each whale Id string to an integer label. """
        # Read the csv passed to the constructor instead of a global path
        df = pd.read_csv(self.csv_file)
        unique_classes = pd.unique(df['Id'])
        encoding = {whale_id: index for index, whale_id in enumerate(unique_classes)}
        # Only touch the label column, so file names are never rewritten
        df['Id'] = df['Id'].map(encoding)
        return df
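
To make the encoding step a bit more concrete, here is a minimal sketch on a tiny made-up dataframe. The rows and the `Image` column name are assumptions for illustration (the class above only relies on the `Id` column and on column positions), and `decoding` is a hypothetical helper that is not part of the Kernel.

```python
import pandas as pd

# Hypothetical miniature of train.csv: made-up rows, assumed column names.
df = pd.DataFrame({
    "Image": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "Id": ["new_whale", "w_123", "new_whale", "w_456"],
})

# Map each distinct Id to an integer, in order of first appearance.
unique_classes = pd.unique(df["Id"])
encoding = {whale_id: index for index, whale_id in enumerate(unique_classes)}

# Hypothetical inverse mapping, handy to turn predictions back into whale Ids.
decoding = {index: whale_id for whale_id, index in encoding.items()}

df["Id"] = df["Id"].map(encoding)
print(df["Id"].tolist())   # [0, 1, 0, 2]
print(decoding[1])         # w_123
```

Keeping the inverse dictionary around is one possible way to get back from a predicted number to a whale Id when building a submission.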

### Having a look at the dataset <a class="anchor" id="third-bullet"></a>
We can instantiate our WhalesDS dataset and pass it the path to the csv file ( If you've downloaded this Kernel to your **personal computer**, change this to the location of the files). Then we can iterate through it and explore the images ( already a matrix ) with `sample['image']`. Remember that our labels are encoded when the dataset is instantiated, so you won't be getting "new_whale" **but a number**.

In [None]:
whales_ds = WhalesDS(csv_file=train_path,
                     root_dir=humpback_whales_train_path)

fig = plt.figure()

for i in range(len(whales_ds)):
    sample = whales_ds[i]
    print(i, sample['image'].shape, sample['label'])

    ax = plt.subplot(1, 4, i + 1)
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    plt.imshow(sample['image'])

    if i == 3:
        plt.show()
        break

### Transforming our images  <a class="anchor" id="forth-bullet"></a>
Now we have control over our labels and images. This lets us do some work to prepare them for our model. Here we are using some custom solutions: we are creating our own **Rescale** and **RandomCrop**, and a **ToTensor** transform that converts the result to a tensor. You can have a look at some out-of-the-box solutions from PyTorch [here](https://pytorch.org/docs/stable/torchvision/transforms.html)!
<br/>
But let me tell you why it is good to create our own functions: this dataset mixes different types of images (RGB and grayscale), while a batch needs every image to have the same shape, and mixing them will lead you to some crazy troubleshooting with crazy errors (*Believe me hahah*). <br/>
It is **worth** the time to create these classes.

In [None]:
class Rescale(object):
    """Rescale the image in a sample to a given size.

    Args:
        output_size (tuple or int): Desired output size. If tuple, output is
            matched to output_size. If int, smaller of image edges is matched
            to output_size keeping aspect ratio the same.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, label = sample['image'], sample['label']

        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size

        new_h, new_w = int(new_h), int(new_w)

        img = transform.resize(image, (new_h, new_w))

        return {'image': img, 'label': label}


class RandomCrop(object):
    """Crop randomly the image in a sample.

    Args:
        output_size (tuple or int): Desired output size. If int, square crop
            is made.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        if isinstance(output_size, int):
            self.output_size = (output_size, output_size)
        else:
            assert len(output_size) == 2
            self.output_size = output_size

    def __call__(self, sample):
        image, label = sample['image'], sample['label']

        h, w = image.shape[:2]
        new_h, new_w = self.output_size

        # randint's upper bound is exclusive; + 1 also covers h == new_h
        top = np.random.randint(0, h - new_h + 1)
        left = np.random.randint(0, w - new_w + 1)

        image = image[top: top + new_h,
                      left: left + new_w]

        return {'image': image, 'label': label}


class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        image, label = sample['image'], sample['label']

        # The original tutorial code didn't expect grayscale images:
        # repeat a 2-D (H x W) image into 3 identical channels
        if image.ndim == 2:
            image = np.stack((image,)*3, axis=-1)
        # swap color axis because
        # numpy image: H x W x C
        # torch image: C x H x W
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'label': torch.tensor(label)}
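
The grayscale handling is the part that tends to bite, so here is a small NumPy-only sketch of just that step. `to_channels_first` is a hypothetical helper name, and the zero-filled arrays stand in for real whale images.

```python
import numpy as np

def to_channels_first(image):
    """Return a C x H x W array, repeating grayscale into 3 channels."""
    if image.ndim == 2:                          # grayscale: H x W
        image = np.stack((image,) * 3, axis=-1)  # -> H x W x 3
    return image.transpose((2, 0, 1))            # H x W x C -> C x H x W

gray = np.zeros((128, 128))      # stand-in for a grayscale image
rgb = np.zeros((128, 128, 3))    # stand-in for an RGB image

print(to_channels_first(gray).shape)  # (3, 128, 128)
print(to_channels_first(rgb).shape)   # (3, 128, 128)
```

Both inputs end up with the same shape, which is what lets them sit together in one batch later on.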

### Transforming one Whale  <a class="anchor" id="fifth-bullet"></a>
We can test this out on a sample, `whales_ds[65]` ( Not that **random** hah )

In [None]:
scale = Rescale(int(image_size*1.25))
crop = RandomCrop(image_size)
composed = transforms.Compose([Rescale(int(image_size*1.25)),
                               RandomCrop(image_size)])

# Apply each of the above transforms on sample.
fig = plt.figure()
sample = whales_ds[65]
for i, tsfrm in enumerate([scale, crop, composed]):
    transformed_sample = tsfrm(sample)

    ax = plt.subplot(1, 3, i + 1)
    ax.set_title(type(tsfrm).__name__)

    plt.imshow(transformed_sample['image'])
plt.show()
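
If you want to check the sizes by hand, the arithmetic behind `Rescale(int(image_size*1.25))` followed by `RandomCrop(image_size)` can be sketched without loading any image. `rescaled_shape` is a hypothetical helper that mirrors the logic of `Rescale` for an integer size, and the 700 x 1050 photo is made up.

```python
def rescaled_shape(h, w, output_size):
    """Shape after matching the smaller edge to output_size (as in Rescale)."""
    if h > w:
        new_h, new_w = output_size * h / w, output_size
    else:
        new_h, new_w = output_size, output_size * w / h
    return int(new_h), int(new_w)

image_size = 128
target = int(image_size * 1.25)   # 160

# Hypothetical 700 x 1050 (height x width) photo.
new_h, new_w = rescaled_shape(700, 1050, target)
print(new_h, new_w)   # 160 240

# Number of valid top-left positions for a 128 x 128 crop along each axis.
print(new_h - image_size + 1, new_w - image_size + 1)   # 33 113
```

Rescaling a little larger than the crop size is what leaves the random crop some room to move around.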

### Creating our transformed dataset  <a class="anchor" id="sixth-bullet"></a>
With all of this created we can now instantiate our WhalesDS and pass our transform to the class and that's it!

In [None]:
transformed_dataset = WhalesDS(csv_file=train_path,
                               root_dir=humpback_whales_train_path,
                               transform=transforms.Compose([
                                   Rescale(int(image_size*1.25)),
                                   RandomCrop(image_size),
                                   ToTensor()
                               ]))

for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]

    print(i, sample['image'].size(), sample['label'])

    if i == 3:
        break

### Loading it up  <a class="anchor" id="seventh-bullet"></a>
Now we can use the [DataLoader](https://pytorch.org/docs/stable/data.html) to iterate through our dataset in batches! This is **IT** <br/>
You can now play around with any model.

In [None]:
dataloader = DataLoader(transformed_dataset, batch_size=4,
                        shuffle=True, num_workers=4)
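
To see what the DataLoader does with our dictionary samples without touching the real images, here is a minimal sketch with a toy dataset. `ToyDS` is a hypothetical stand-in for `WhalesDS` that returns random tensors, and `num_workers=0` is used only to keep the sketch in a single process.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDS(Dataset):
    """Hypothetical stand-in for WhalesDS: random 'images', integer labels."""
    def __init__(self, n=10, image_size=128):
        self.n = n
        self.image_size = image_size

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        image = torch.rand(3, self.image_size, self.image_size)
        return {'image': image, 'label': torch.tensor(idx)}

# The default collate function stacks each dictionary key into one batch tensor.
loader = DataLoader(ToyDS(), batch_size=4, shuffle=False, num_workers=0)

for batch in loader:
    print(batch['image'].shape, batch['label'])
# 10 samples with batch_size=4 give batches of 4, 4 and 2 samples.
```

Each batch is still a dictionary, only now `batch['image']` has an extra leading batch dimension.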

In [None]:
# Helper function to show a batch
def show_whale_batch(sample_batched):
    """Show whales for a batch of samples."""
    images_batch = sample_batched['image']

    # Tile the batch into one C x H x W grid, then move channels last for imshow
    grid = utils.make_grid(images_batch)
    plt.imshow(grid.numpy().transpose((1, 2, 0)))
    plt.title('Batch from dataloader')

for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched['image'].size(),
          sample_batched['label'])
    # observe 4th batch and stop.
    if i_batch == 3:
        plt.figure()
        show_whale_batch(sample_batched)
        plt.axis('off')
        plt.ioff()
        plt.show()
        break

### Resources  <a class="anchor" id="seventh-bullet"></a>
Here are some resources that I put together ( disclaimer here, these blog posts are from my site haha )

### Building a very simple sequential model <a class="anchor" id="seventh-bullet"></a>

This is a great way to play around if you are a beginner in the area. If you don't know much about building Neural Networks, I have a few resources:

1. [Creating a Perceptron](https://jhonatandasilva.com/build-your-own-perceptron/)
2. [What are the building blocks of Deep Learning](https://jhonatandasilva.com/perceptrons/) 
3. [Play around with Neural Nets](https://jhonatandasilva.com/play-with-nn/)
4. [Training your Neural Net](https://jhonatandasilva.com/training-your-neural-networks/)
5. [When all comes together](https://jhonatandasilva.com/mnist-pytorch/) 

Exploring more on the Vision side there's also

1. [How Neural Nets sees the world ](https://jhonatandasilva.com/how-nn-sees-the-world/)

<img src="https://jhonatandasilva.com/wp-content/uploads/2018/12/cnns.gif" alt="drawing" width="400"/>

You can look up more resources on CNNs here

* [CNNs made it easy](https://jhonatandasilva.com/cnns-made-it-easy/) 
* [How the layers of CNNs works](https://jhonatandasilva.com/cnns-layers/)
