# DATA20001 Deep Learning - Group Project
## Image project

**Due Thursday, December 13, before 23:59.**

The task is to learn to assign the correct labels to a set of images. The images are originally from a photo-sharing site and were released under Creative Commons licenses that allow sharing. The training set contains 20 000 images. We have resized and cropped them to 128x128 to make the task a bit more manageable.

We're only giving you the code for downloading the data. The rest you'll have to do yourselves.

Some comments and hints particular to the image project:

- One image may belong to many classes in this problem, i.e., it's a multi-label classification problem. In fact there are images that don't belong to any of our classes, and you should also be able to handle these correctly. Pay careful attention to how you design the outputs of the network (e.g., what activation to use) and what loss function should be used.

- As the dataset is pretty imbalanced, don't focus too strictly on the outputs being probabilistic. (Meaning that the right threshold for selecting the label might not be 0.5.)

- Image files can be loaded as numpy matrices, for example using `imread` from `matplotlib.pyplot`. Most images are color, but a few are grayscale. You need to handle the grayscale ones somehow, as they have a different number of color channels (depth) than the color ones.

- In the exercises we used e.g. `torchvision.datasets.MNIST` to handle loading the data in suitable batches. Here, you need to handle the data loading yourself. The easiest way is probably to create a custom `Dataset`. [See for example here for a tutorial](https://github.com/utkuozbulak/pytorch-custom-dataset-examples).
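
As a rough illustration of the last two hints, a custom `Dataset` could look something like the sketch below. The class name `ImageProjectDataset`, its arguments, and the `loader` parameter are assumptions made for this example and are not part of the course material. It assumes the reader returns `uint8` arrays of shape `(H, W)` or `(H, W, 3)`, as `imread` does for JPEG files, and that `labels` is a multi-hot matrix with one row per image.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class ImageProjectDataset(Dataset):  # hypothetical name
    """Serves (image, multi-hot label) pairs, one image at a time."""

    def __init__(self, image_paths, labels, loader=None):
        self.image_paths = list(image_paths)
        self.labels = torch.as_tensor(np.asarray(labels), dtype=torch.float32)
        if loader is None:
            # Imported lazily so that another image reader can be swapped in
            from matplotlib.pyplot import imread
            loader = imread
        self.loader = loader

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Assumes uint8 pixel values; scale them to [0, 1]
        image = np.asarray(self.loader(self.image_paths[idx]), dtype=np.float32) / 255.0
        if image.ndim == 2:
            # Grayscale: repeat the single plane so every sample has 3 channels
            image = np.stack([image] * 3, axis=-1)
        # (H, W, 3) -> (3, H, W), the layout nn.Conv2d expects
        return torch.from_numpy(image).permute(2, 0, 1), self.labels[idx]
```

Wrapped in `DataLoader(dataset, batch_size=64, shuffle=True)`, this yields batches of shape `(64, 3, 128, 128)` with float targets of shape `(64, 14)`.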

## Download the data

In [4]:
import os
import torch
import torchvision
from torchvision.datasets.utils import download_url
import zipfile


In [1]:
train_path = 'train'
dl_file = 'dl2018-image-proj.zip'
dl_url = 'https://users.aalto.fi/mvsjober/misc/'

zip_path = os.path.join(train_path, dl_file)
if not os.path.isfile(zip_path):
    download_url(dl_url + dl_file, root=train_path, filename=dl_file, md5=None)

with zipfile.ZipFile(zip_path) as zip_f:
    zip_f.extractall(train_path)
    #os.unlink(zip_path)

Downloading https://users.aalto.fi/mvsjober/misc/dl2018-image-proj.zip to train/dl2018-image-proj.zip


In [175]:
if torch.cuda.is_available():
    print('Using GPU!')
    device = torch.device('cuda')
else:
    print('Using CPU')
    device = torch.device('cpu')

Using GPU!


The download cell above fetched and extracted the data files into the `train` subdirectory.

The images can be found in `train/images`, and are named as `im1.jpg`, `im2.jpg` and so on until `im20000.jpg`.

The class labels, or annotations, can be found in `train/annotations` as `CLASSNAME.txt`, where CLASSNAME is one of the fourteen classes: *baby, bird, car, clouds, dog, female, flower, male, night, people, portrait, river, sea,* and *tree*.

Each annotation file is a simple text file that lists the images that depict that class, one per line. The images are listed with their number, not the full filename. For example `5969` refers to the image `im5969.jpg`.
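
The annotation format described above can be turned into one multi-hot row per image roughly as follows. This is only a sketch: the function name `build_label_matrix` and its arguments are made up for illustration, and it assumes every class has a file and that image numbers start at 1. Images that appear in no file simply keep an all-zero row.

```python
import os
import numpy as np

CLASS_NAMES = ['baby', 'bird', 'car', 'clouds', 'dog', 'female', 'flower',
               'male', 'night', 'people', 'portrait', 'river', 'sea', 'tree']


def build_label_matrix(annotation_dir, n_images, class_names=CLASS_NAMES):
    """Row i describes image im{i+1}.jpg; column j is 1 if class j is present."""
    labels = np.zeros((n_images, len(class_names)), dtype=np.float32)
    for col, name in enumerate(class_names):
        with open(os.path.join(annotation_dir, name + '.txt')) as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    labels[int(line) - 1, col] = 1.0
    return labels
```

For the training set this would be called as `build_label_matrix('train/annotations', 20000)`.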

## Your stuff goes here ...

In [5]:
import os
from os import listdir
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import random as rd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

img_path = "./train/images/"
labels_path = "./train/annotations/"


Get all annotations & images paths

In [6]:
all_img_paths = sorted([img_path + current_img_name  for current_img_name  in listdir(img_path) ])
all_labels_paths = sorted([labels_path + current_label_name  for current_label_name  in listdir(labels_path) ])

In [7]:
all_img_nb = [all_img_paths[i].split("/")[-1].split(".")[0].split("im")[-1] for i in range(len(all_img_paths))]
all_labels_names = [all_labels_paths[i].split("/")[-1].split(".")[0] for i in range(len(all_labels_paths))]

LABELS PREPROCESSING 

ADD labelled Images

In [8]:
img_labels = {}
for current_label, current_label_path in enumerate(all_labels_paths):
    # Annotation files have no header row: read every line as an image number (kept as str)
    current_label_contents = pd.read_csv(current_label_path, header=None, dtype=str)
    current_label_img_nb = sorted(current_label_contents[0].tolist())
    for current_img in current_label_img_nb:
        if current_img not in img_labels:
            # One multi-hot vector per image, one slot per class
            img_labels[current_img] = np.zeros(len(all_labels_names))
        img_labels[current_img][current_label] = 1
        

ADD unlabelled images

In [9]:
def unmatched_set(a, b):
    return [[a_i for a_i in a if a_i not in b], [b_i for b_i in b if b_i not in a]]

Get the two disjoint sets: labelled image numbers that have no image file, and images that have no label.

In [10]:
unmatched_set_images = unmatched_set( list(img_labels.keys()) , np.array(all_img_nb))

In [11]:
for current_unlab_img in unmatched_set_images[1]:
    img_labels[current_unlab_img] = np.zeros(len(all_labels_names))
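
The list-based membership tests above compare every element of one list against the other, which gets slow with 20 000 images. A set-based variant is sketched below; the name `unmatched_sets` is made up for this example, and it assumes the image numbers are hashable values such as strings.

```python
def unmatched_sets(labelled, existing):
    """Return (labelled but missing on disk, on disk but unlabelled), both sorted."""
    labelled, existing = set(labelled), set(existing)
    return sorted(labelled - existing), sorted(existing - labelled)


# Toy example: '9' is labelled but has no file, '3' has a file but no label
missing_files, unlabelled = unmatched_sets(['1', '2', '9'], ['1', '2', '3'])
print(missing_files)  # ['9']
print(unlabelled)     # ['3']
```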

IMAGES PREPROCESSING

There are two kinds of pictures:
  - RGB images, with shape (128, 128, 3)
  - grayscale images, with shape (128, 128)
  

In [12]:
img_shape = {}
for current_img_path in all_img_paths:
    current_img_vector =  plt.imread( current_img_path )
    if(str(current_img_vector.shape) not in img_shape.keys()):
        img_shape[str(current_img_vector.shape)] = []
    img_shape[str(current_img_vector.shape)].append(current_img_path)

In [13]:
img_shape.keys()

dict_keys(['(128, 128, 3)', '(128, 128)'])

RGB images

In [14]:
img_vector_3 = {}
#To process all images, remove the "[:1024]" slice below
for current_img_path in img_shape["(128, 128, 3)"][:1024]:
    I =  plt.imread( current_img_path )
    modified_current_img_vector = np.array([I[:,:,0], I[:,:,1], I[:,:,2]])
    img_vector_3[current_img_path.split("/")[-1].split(".")[0].split("im")[-1]] = modified_current_img_vector


Grey images

In [15]:
img_vector_1 = {}

for current_img_path in img_shape["(128, 128)"]:
    current_img_vector =  plt.imread( current_img_path )
    #Do we have to reshape to (128*128) or (1,128*128) ?
    img_vector_1[current_img_path.split("/")[-1].split(".")[0].split("im")[-1]] = np.reshape(current_img_vector, (1,128,128))
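
Regarding the reshape question in the comment above: if both kinds of image are to go through the same `Conv2d(3, ...)` layer, one option is to bring everything to shape `(3, 128, 128)` by repeating the grayscale plane. The helper name `to_channels_first` is an assumption made for this sketch.

```python
import numpy as np


def to_channels_first(image):
    """Map an (H, W) or (H, W, 3) array to shape (3, H, W)."""
    image = np.asarray(image)
    if image.ndim == 2:
        # Grayscale: add a channel axis and repeat the plane for R, G and B
        return np.repeat(image[np.newaxis, :, :], 3, axis=0)
    # Color: move the channel axis from last to first
    return np.transpose(image, (2, 0, 1))


gray = np.zeros((128, 128), dtype=np.uint8)
color = np.zeros((128, 128, 3), dtype=np.uint8)
print(to_channels_first(gray).shape, to_channels_first(color).shape)
```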


In [16]:
nb_img_vector_3 = len(img_vector_3.keys())

Create the training and test sets from the three-channel (RGB) images

In [46]:

batch_size = 64

img_names = rd.sample(list(img_vector_3.keys()), nb_img_vector_3)
train_img_name = img_names[ :int(nb_img_vector_3 * 0.8)]
test_img_name = img_names[ int(nb_img_vector_3 * 0.8):]


X_Train = torch.from_numpy(np.array([ [ img_vector_3[ train_img_name[id_name*batch_size + id_batch]]  for id_batch in range(batch_size) ] for id_name in range(int(len(train_img_name)/batch_size))]))
y_Train = torch.from_numpy(np.array([ [ img_labels[   train_img_name[id_name*batch_size + id_batch]]  for id_batch in range(batch_size) ]for id_name in range(int(len(train_img_name)/batch_size))]))

X_Test = torch.from_numpy(np.array([ [ img_vector_3[ test_img_name[id_name*batch_size + id_batch]]  for id_batch in range(batch_size) ] for id_name in range(int(len(test_img_name)/batch_size))]))
y_Test = torch.from_numpy(np.array([ [ img_labels[   test_img_name[id_name*batch_size + id_batch]]  for id_batch in range(batch_size) ]for id_name in range(int(len(test_img_name)/batch_size))]))

X_Train = X_Train.type('torch.FloatTensor')
y_Train = y_Train.type('torch.FloatTensor')
X_Test = X_Test.type('torch.FloatTensor')
y_Test = y_Test.type('torch.FloatTensor')


In [47]:
print(X_Train.shape)
print(y_Train.shape)
print(X_Test.shape)
print(y_Test.shape)

torch.Size([12, 64, 3, 128, 128])
torch.Size([12, 64, 14])
torch.Size([3, 64, 3, 128, 128])
torch.Size([3, 64, 14])


In [48]:
X_Test.shape

torch.Size([3, 64, 3, 128, 128])

In [49]:
X_Test[0].shape

torch.Size([64, 3, 128, 128])

In [50]:
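# A bare zip object is exhausted after a single pass and has no len(),
# so materialise the (data, target) batch pairs as lists
train_loader = list(zip(X_Train, y_Train))
test_loader = list(zip(X_Test, y_Test))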
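
An alternative to stacking batches by hand is to wrap unbatched tensors in a `TensorDataset` and let a `DataLoader` do the batching. It reshuffles on every epoch, keeps the final partial batch, and supports both `len(loader)` and `loader.dataset`. The sketch below uses small stand-in tensors; the names `images` and `labels` are assumptions and are not defined elsewhere in this notebook.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data: 10 unbatched images of shape (3, 128, 128) with 14 labels each
images = torch.zeros(10, 3, 128, 128)
labels = torch.zeros(10, 14)

loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)

print(len(loader))          # number of batches
print(len(loader.dataset))  # number of samples
for data, target in loader:
    print(data.shape, target.shape)
```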

In [57]:
for  batch_idx, (data, target) in enumerate(train_loader) :
    print(data.shape)
    print(target.shape)
    print(batch_idx)

torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
0
torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
1
torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
2
torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
3
torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
4
torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
5
torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
6
torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
7
torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
8
torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
9
torch.Size([64, 3, 128, 128])
torch.Size([64, 14])
10


random NN

In [51]:

class Net(nn.Module):
    def __init__(self ):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=3)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3)
        self.mp = nn.MaxPool2d(2)
        self.do = nn.Dropout(p=0.5)
        self.bn1 = nn.BatchNorm1d(num_features=18000)
        self.fc1 = nn.Linear(18000, 240)
        self.fc2 = nn.Linear(240, 120)
        self.fcout = nn.Linear(120, 14)
        
    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))
        x = F.relu(self.mp(self.conv2(x)))
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.do(x)
        x =  F.relu(self.bn1(x))
        x = F.relu((self.fc1(x)))
        x = F.relu((self.fc2(x)))
        x = F.log_softmax(self.fcout(x), dim=1)
        return x
    
    
model = Net( )#.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5) 
criterion = nn.CrossEntropyLoss()  
print(model)

Net(
  (conv1): Conv2d(3, 10, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(3, 3), stride=(1, 1))
  (mp): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (do): Dropout(p=0.5)
  (bn1): BatchNorm1d(18000, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (fc1): Linear(in_features=18000, out_features=240, bias=True)
  (fc2): Linear(in_features=240, out_features=120, bias=True)
  (fcout): Linear(in_features=120, out_features=14, bias=True)
)
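
The `18000` input features of `bn1` and `fc1` follow from the layer sizes printed above: each unpadded 3x3 convolution shrinks the side length by 2, and each `MaxPool2d(2)` halves it, rounding down. The helper names below are made up for this worked calculation.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Output side length of a square convolution."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size):
    """Output side length of max pooling with stride equal to the kernel size."""
    return size // kernel_size


size = 128
size = pool_out(conv_out(size, 3), 2)  # conv1 + pool: 128 -> 126 -> 63
size = pool_out(conv_out(size, 3), 2)  # conv2 + pool: 63 -> 61 -> 30
n_features = 20 * size * size          # 20 output channels from conv2
print(size, n_features)  # 30 18000
```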


Forward

In [52]:
model.forward(X_Train[0])



tensor([[-2.6899, -2.7569, -2.6692, -2.6296, -2.5993, -2.6312, -2.7027, -2.7495,
         -2.5297, -2.7054, -2.6273, -2.6172, -2.5819, -2.4954],
        [-2.6685, -2.7440, -2.6553, -2.5907, -2.5834, -2.6164, -2.6697, -2.7932,
         -2.5452, -2.7378, -2.7034, -2.6131, -2.5453, -2.5245],
        [-2.6803, -2.7571, -2.6869, -2.6151, -2.5898, -2.6008, -2.6823, -2.6790,
         -2.5968, -2.7356, -2.6304, -2.6273, -2.5673, -2.5254],
        [-2.6780, -2.7354, -2.6554, -2.6097, -2.5995, -2.6485, -2.7248, -2.7138,
         -2.5579, -2.7064, -2.6376, -2.6320, -2.5729, -2.5052],
        [-2.6646, -2.8393, -2.6370, -2.5931, -2.5790, -2.5678, -2.6486, -2.7410,
         -2.5439, -2.7338, -2.6833, -2.6081, -2.6648, -2.4941],
        [-2.6623, -2.7782, -2.6019, -2.5803, -2.6188, -2.6427, -2.7015, -2.7456,
         -2.5582, -2.7160, -2.7286, -2.5846, -2.5391, -2.5313],
        [-2.6920, -2.7627, -2.6730, -2.5706, -2.6032, -2.6387, -2.6815, -2.7004,
         -2.5685, -2.7386, -2.6336, -2.6042, -2.5

In [54]:
def train(epoch, trainv, log_interval=100):
    # Set model to training mode
    model.train()
    train_loss = 0
    # X_Train / y_Train are already stacked into batches; zip them afresh on each
    # call, because a zip object has no len() and is exhausted after one pass
    n_batches = len(X_Train)
    n_samples = n_batches * batch_size

    # Loop over each batch from the training set
    for batch_idx, (data, target) in enumerate(zip(X_Train, y_Train)):
        # Copy data to GPU if needed
        #data = data.to(device)
        #target = target.to(device)

        # Zero gradient buffers
        optimizer.zero_grad()

        # Pass data through the network
        output = model(data)

        # Calculate loss
        loss = criterion(output, target)
        train_loss += loss.item()

        # Backpropagate
        loss.backward()

        # Update weights
        optimizer.step()

        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), n_samples,
                100. * batch_idx / n_batches, loss.item()))
    trainv.append(train_loss / n_batches)

In [55]:
def validate(loss_vector, accuracy_vector):
    model.eval()
    val_loss, correct = 0, 0
    # The held-out batches are X_Test / y_Test; no validation_loader is defined
    n_batches = len(X_Test)
    n_samples = n_batches * batch_size
    # Gradients are not needed for evaluation
    with torch.no_grad():
        for data, target in zip(X_Test, y_Test):
            #data = data.to(device)
            #target = target.to(device)

            output = model(data)
            val_loss += criterion(output, target).item()
            pred = output.max(1)[1] # get the index of the max log-probability
            # target is multi-hot, so count a hit when the top-scoring class
            # is one of the image's true labels
            correct += target.gather(1, pred.unsqueeze(1)).sum().item()

    val_loss /= n_batches
    loss_vector.append(val_loss)

    accuracy = 100. * correct / n_samples
    accuracy_vector.append(accuracy)

    print('\nValidation set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        val_loss, int(correct), n_samples, accuracy))

In [56]:
%%time
epochs = 2

lossv, accv, trainv = [], [], []
for epoch in range(1, epochs + 1):
    train(epoch, trainv)
    #validate(lossv, accv)



RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'target'
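
The error comes from `nn.CrossEntropyLoss`, which expects one class index per sample as a `LongTensor`, whereas the targets here are multi-hot float vectors of length 14. For a multi-label problem a common choice is an independent sigmoid per label trained with binary cross-entropy. `nn.BCEWithLogitsLoss` combines the two and takes raw logits, so the network would return `self.fcout(x)` directly rather than a log-softmax. The sketch below shows one training step under that assumption, with a small stand-in model and random data in place of `Net` and the real batches.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the network: any module returning raw logits of shape (batch, 14)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 14))
criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one step
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

data = torch.randn(64, 3, 8, 8)                 # fake image batch
target = torch.randint(0, 2, (64, 14)).float()  # multi-hot float targets

optimizer.zero_grad()
logits = model(data)
loss = criterion(logits, target)  # float targets are what this loss expects
loss.backward()
optimizer.step()

probs = torch.sigmoid(logits)  # independent per-label probabilities
pred = (probs > 0.5).float()   # 0.5 is only a starting point for the threshold
print(loss.item(), pred.shape)
```

Because each label gets its own probability, an image can receive several labels or none at all.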

## Save your model

It might be useful to save your model if you want to continue your work later, or use it for inference later.

In [None]:
torch.save(model.state_dict(), 'model.pkl')

The model file should now be visible in the "Home" screen of the Jupyter notebook interface. There you should be able to select it and press "download". [See more here on how to load the model back](https://github.com/pytorch/pytorch/blob/761d6799beb3afa03657a71776412a2171ee7533/docs/source/notes/serialization.rst) if you want to continue training later.
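
Loading the weights back could look roughly like this. It is a sketch with a stand-in `nn.Linear` model and a temporary file path; in the notebook the model would be rebuilt with `Net()` before calling `load_state_dict`.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for Net()
path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
torch.save(model.state_dict(), path)

# Later: rebuild the same architecture, then load the saved parameters into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location='cpu'))
restored.eval()  # switch dropout / batch norm to inference behaviour
```

Saving only the `state_dict` means the class definition must be available when loading, but the file does not depend on how the model object was pickled.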

## Download test set

The test set will be made available during the last week before the deadline and can be downloaded in the same way as the training set.

## Predict for test set

You should return your predictions for the test set in a plain text file.  The text file contains one row for each test set image.  Each row contains a binary prediction for each label (separated by a single space), 1 if it's present in the image, and 0 if not. The order of the labels is as follows (alphabetic order of the label names):

    baby bird car clouds dog female flower male night people portrait river sea tree

An example row could look like this if your system predicts the presence of a bird and clouds:

    0 1 0 1 0 0 0 0 0 0 0 0 0 0
    
The order of the rows should be according to the numeric order of the image numbers.  In the test set, this means that the first row refers to image `im20001.jpg`, the second to `im20002.jpg`, and so on.

If you have the prediction output matrix prepared in `y`, you can use the following call to save it to a text file.

In [None]:
np.savetxt('results.txt', y, fmt='%d')
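
Getting from per-label probabilities to the binary matrix `y` might look like the sketch below. The helper `to_binary_predictions`, the probabilities, and the per-label thresholds are all made up for illustration. Since the dataset is imbalanced, thresholds other than 0.5 may work better, and they would need to be tuned on held-out data.

```python
import os
import tempfile

import numpy as np


def to_binary_predictions(probs, thresholds=0.5):
    """Return a 0/1 integer matrix; thresholds is a scalar or one value per label."""
    return (np.asarray(probs) >= np.asarray(thresholds)).astype(int)


# Two fake test images with 14 label probabilities each (made-up numbers)
probs = np.zeros((2, 14))
probs[0, [1, 3]] = [0.9, 0.6]  # bird and clouds
probs[1, 13] = 0.3             # weak evidence for tree

per_label_thresholds = np.full(14, 0.5)
per_label_thresholds[13] = 0.25  # hypothetical lower threshold for one label

y = to_binary_predictions(probs, per_label_thresholds)

out_path = os.path.join(tempfile.mkdtemp(), 'results.txt')
np.savetxt(out_path, y, fmt='%d')
with open(out_path) as f:
    print(f.read())
```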