# Download and format dataset notebook

## Grocery products recognition

Hello and welcome to this notebook. 

This notebook prepares the data used to train a YOLOv2 model that localizes grocery products on supermarket shelves.

The following dataset is used to train this CNN:
[Grocery Store dataset](https://www.amazon.de/clouddrive/share/J3OaZMNnhBpKG28mAfs5CqTgreQxFCY8uENGaIk7H3s?_encoding=UTF8&mgh=1&ref_=cd_ph_share_link_copy).

This dataset was annotated in the work by [George, Marian and Floerkemeier](http://vision.disi.unibo.it/index.php?option=com_content&view=article&id=111&catid=78). Only 76 images are annotated.

The YOLO (You Only Look Once) algorithm is presented in the following papers:
* Redmon et al., 2016 (https://arxiv.org/abs/1506.02640)
* Redmon and Farhadi, 2016 (https://arxiv.org/abs/1612.08242)

The YOLO algorithm was originally implemented in [Darknet](https://pjreddie.com/darknet). It has since been reimplemented in Python for several machine-learning frameworks. This work is based on the [YAD2K](https://github.com/allanzelener/YAD2K) implementation for Keras and TensorFlow.

#### This notebook guides you through formatting the data for training

Libraries:

In [1]:
import os
import pandas as pd
import numpy as np
import wget
import zipfile
from matplotlib.pyplot import imshow

from PIL import Image, ImageDraw
%matplotlib inline

Download the annotations from [George, Marian and Floerkemeier](http://vision.disi.unibo.it/index.php?option=com_content&view=article&id=111&catid=78). The cell below also removes an erroneous row from `Dataset/annotations/s4_46.csv`.

In [2]:
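# Download the annotations ("Planogram Dataset") only if they are not already present
my_dataset = "Dataset"

if not os.path.isdir(my_dataset):
    file_path = wget.download('http://vision.deis.unibo.it/joomla/images/research/Product-recognition/PlanogramDataset.zip')
    with zipfile.ZipFile(file_path, "r") as zip_ref:
        zip_ref.extractall()
    os.rename("Planogram Dataset", my_dataset)
    os.remove(file_path)
    # Correct an error in the dataset: remove a faulty annotation row from s4_46.csv.
    # The annotation CSVs have no header row, so read them with header=None;
    # df.index[11] is then the 12th line of the file (the faulty row).
    file_path = 'Dataset/annotations/s4_46.csv'
    df = pd.read_csv(file_path, header=None)
    df.drop(df.index[11], inplace=True)
    # Write it back without a header or an extra index column, like the other files
    df.to_csv(file_path, header=False, index=False)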
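The annotation CSVs have no header row (`corners_to_yolo_boxes` reads them with `header=None`), so the way a file is read changes which row a positional index such as `df.index[10]` refers to. A minimal sketch with three made-up rows, assuming only pandas, shows that the default `header=0` turns the first line into column names and shifts every position by one, and how to write a headerless file back without adding an index column:

```python
import io

import pandas as pd

# Three made-up rows in the headerless layout of the annotation CSVs:
# image name, xmin, ymin, xmax, ymax (values are illustrative only).
csv_text = (
    "s4_46.jpg,10,20,110,220\n"
    "s4_46.jpg,120,20,220,220\n"
    "s4_46.jpg,230,20,330,220\n"
)

# Default header=0: the first line is consumed as column names,
# so only two data rows remain and every position shifts by one.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is data, so position i is line i + 1 of the file.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(len(with_header), len(no_header))  # 2 3
print(with_header.iloc[0].tolist())      # second line of the file
print(no_header.iloc[1].tolist())        # also the second line of the file

# Drop the second line and write the result back with no header and no index,
# which keeps the same headerless layout as the input.
fixed = no_header.drop(no_header.index[1])
out = fixed.to_csv(header=False, index=False)
print(out)
```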

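#### Please download and extract the dataset!

Please download the dataset from [Grocery Store dataset](https://www.amazon.de/clouddrive/share/J3OaZMNnhBpKG28mAfs5CqTgreQxFCY8uENGaIk7H3s?_encoding=UTF8&mgh=1&ref_=cd_ph_share_link_copy) and extract it into a folder named `Grocery_products` in the notebook's working directory. The shelf images are expected under `Grocery_products/Testing/`.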

In [3]:
my_dataset = "Grocery_products"
if not os.path.isdir(my_dataset):
    print('Please download and extract the dataset into the "Grocery_products" folder!')

In [4]:
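def corners_to_yolo_boxes(annotations, im_size):
    """Convert bounding-box corners to YOLO-style boxes.

    annotations: path to a headerless CSV with columns image, xmin, ymin, xmax, ymax (pixels).
    im_size: (height, width) of the image, i.e. the first two entries of a NumPy image shape.
    Returns a DataFrame with columns x_center, y_center, box_width, box_height (normalized
    by the image size) and class (always 0: a single "product" class).
    """
    df = pd.read_csv(annotations, header=None,
                     names=['path_image', 'xmin', 'ymin', 'xmax', 'ymax'])

    # Scale factors: x coordinates are divided by the width, y coordinates by the height
    dw = 1. / im_size[1]
    dh = 1. / im_size[0]

    df['box_width'] = dw * (df.xmax - df.xmin)
    df['box_height'] = dh * (df.ymax - df.ymin)
    df['x_center'] = dw * ((df.xmax + df.xmin) / 2)
    df['y_center'] = dh * ((df.ymax + df.ymin) / 2)
    # Single-class detector: every box gets label 0
    df['class'] = 0

    # Keep only the YOLO columns, in the order x_center, y_center, box_width, box_height, class
    return df[['x_center', 'y_center', 'box_width', 'box_height', 'class']]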
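As a worked example of the conversion, here is a minimal single-box sketch of the same arithmetic. The helper `corner_to_yolo` and the 400 x 300 image are hypothetical, chosen only for illustration. Note that `corners_to_yolo_boxes` takes `im_size` as `(height, width)`, the order of a NumPy image's `shape`, while PIL's `Image.size` is `(width, height)`; the sketch uses explicit keyword arguments to sidestep that ambiguity.

```python
def corner_to_yolo(xmin, ymin, xmax, ymax, img_width, img_height, class_id=0):
    """Convert one pixel-space box to (x_center, y_center, box_width, box_height, class).

    Same arithmetic as corners_to_yolo_boxes, for a single box: x values are
    divided by the image width, y values by the image height.
    """
    x_center = (xmin + xmax) / 2 / img_width
    y_center = (ymin + ymax) / 2 / img_height
    box_width = (xmax - xmin) / img_width
    box_height = (ymax - ymin) / img_height
    return x_center, y_center, box_width, box_height, class_id


# Hypothetical 400 x 300 (width x height) shelf image with one product box
# whose top-left corner is (100, 60) and bottom-right corner is (300, 240).
box = corner_to_yolo(100, 60, 300, 240, img_width=400, img_height=300)
print(box)  # approximately (0.5, 0.5, 0.5, 0.6, 0)
```

The box is centred in the image and covers half its width and 60% of its height, which is exactly what the normalized values say.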

The following function writes one YOLO-format annotation file per image and creates the text files listing the paths to the images and to those annotation files.

In [5]:
def read_annotations_and_create_list():
    """
    Read the annotations and create the list of elements to train on.
    For every annotation CSV, this function writes a YOLO-format annotation file and
    appends the image path and the annotation path to two list files.
    This function is specifically designed for the "Grocery products recognition" project.
    """
    yolo_annotations = 'yolo_annotations'
    path_annotations = 'Dataset/annotations'
    path_dataset = "Grocery_products/Testing/"

    boxes_path = 'model_data/boxes_path.txt'
    images_path = 'model_data/images_path.txt'

    # Create the output folders if they do not exist yet
    os.makedirs(yolo_annotations, exist_ok=True)
    os.makedirs(os.path.dirname(boxes_path), exist_ok=True)

    # Annotation files are named s<store>_<image>.csv, e.g. s4_46.csv
    list_csv = sorted(os.path.join(path_annotations, name)
                      for name in os.listdir(path_annotations)
                      if name.endswith('.csv'))

    with open(boxes_path, 'w') as file_annotations, open(images_path, 'w') as file_images:
        for annotations in list_csv:
            name = os.path.splitext(os.path.basename(annotations))[0]  # e.g. 's4_46'
            parts = name.split('_')
            store, image_id = parts[0], parts[1]                       # e.g. 's4', '46'
            img_path = path_dataset + store.replace('s', 'store') + '/images/' + image_id + '.jpg'

            annotations_path = yolo_annotations + '/' + name + '.txt'

            # PIL reports (width, height) from the file header without decoding the pixels
            with Image.open(img_path) as img:
                width, height = img.size
            df = corners_to_yolo_boxes(annotations, (height, width))

            # One line per box: x_center y_center box_width box_height class
            df.to_csv(annotations_path, header=False, index=False, sep=' ', mode='w')

            file_annotations.write(annotations_path + "\n")
            file_images.write(img_path + "\n")

    print("Done")
    return boxes_path, images_path
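The function derives each image path from the annotation file name. A minimal sketch of that mapping, assuming (as the code above does) that every annotation file is named `s<store>_<image>.csv` and that the images live under `Grocery_products/Testing/store<store>/images/`; the helper names `annotation_to_image_path` and `annotation_to_yolo_path` are hypothetical:

```python
import os


def annotation_to_image_path(csv_path, dataset_root="Grocery_products/Testing"):
    """Map an annotation CSV such as 'Dataset/annotations/s4_46.csv' to its shelf image."""
    stem = os.path.splitext(os.path.basename(csv_path))[0]  # 's4_46'
    parts = stem.split("_")
    store, image_id = parts[0], parts[1]                     # 's4', '46'
    store_dir = store.replace("s", "store")                  # 'store4'
    return "/".join([dataset_root, store_dir, "images", image_id + ".jpg"])


def annotation_to_yolo_path(csv_path, out_dir="yolo_annotations"):
    """Map an annotation CSV to the YOLO text file written for it."""
    stem = os.path.splitext(os.path.basename(csv_path))[0]
    return out_dir + "/" + stem + ".txt"


print(annotation_to_image_path("Dataset/annotations/s4_46.csv"))
print(annotation_to_yolo_path("Dataset/annotations/s4_46.csv"))
```

Keeping the mapping in small pure functions like these makes it easy to check the naming convention on a few file names before running the full conversion over the dataset.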

In [6]:
boxes_path, images_path = read_annotations_and_create_list()

Done
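A useful check before training is to convert the generated YOLO boxes back to pixel corners and draw them on the image. A minimal sketch of that inverse conversion, assuming each line of a file in `yolo_annotations/` has the form `x_center y_center box_width box_height class`, as written by `read_annotations_and_create_list`; `yolo_line_to_corners` and `read_yolo_file` are hypothetical helpers:

```python
def yolo_line_to_corners(line, img_width, img_height):
    """Parse 'x_center y_center box_width box_height class' into pixel corners.

    Returns ((x0, y0, x1, y1), class_id), undoing the normalization by the image size.
    """
    x_c, y_c, w, h, cls = line.split()
    x_c, w = float(x_c) * img_width, float(w) * img_width
    y_c, h = float(y_c) * img_height, float(h) * img_height
    corners = (x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)
    return corners, int(float(cls))


def read_yolo_file(path, img_width, img_height):
    """Read every non-empty line of a YOLO annotation file into pixel corners."""
    with open(path) as f:
        return [yolo_line_to_corners(line, img_width, img_height)
                for line in f if line.strip()]


# Hypothetical line for a 400 x 300 image: a centred box covering half the
# width and 60% of the height.
corners, cls = yolo_line_to_corners("0.5 0.5 0.5 0.6 0", img_width=400, img_height=300)
print(corners, cls)  # approximately (100.0, 60.0, 300.0, 240.0) 0
```

Each `corners` tuple is in the `(x0, y0, x1, y1)` order that Pillow's `ImageDraw.Draw(img).rectangle(...)` accepts, so the boxes can be drawn on the matching image listed in `model_data/images_path.txt` and displayed with `imshow`.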
