# <!-- TITLE --> Loading datasets
<!-- DESC --> Loading images:
- Dataset import (local, AWS, GitHub)
- Cleaning the datasets
- Separation into subsets (Train, Test, Meta)
- Reading the dataset

#### Dataset: https://deepai.org/dataset/cub-200-2011

<!-- DESC -->

**Rename the `images` folder in CUB-200-2011 to `images_orig`, and set `ROOT_DIR` to the correct path of the dataset.**
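The manual rename above can also be scripted. This is a minimal sketch, assuming `dataset_root` is the extracted CUB-200-2011 directory that contains `images.txt` and `train_test_split.txt`. The helper name `prepare_root` is hypothetical and not part of the notebook.

```python
import os


def prepare_root(dataset_root: str) -> str:
    """Rename <dataset_root>/images to <dataset_root>/images_orig (once) and return the root.

    Hypothetical helper: it only automates the manual rename step.
    """
    orig = os.path.join(dataset_root, 'images')
    renamed = os.path.join(dataset_root, 'images_orig')
    # Rename only if it has not been done yet, so re-running the notebook is safe
    # (after the copy step, a new 'images' folder exists next to 'images_orig').
    if os.path.isdir(orig) and not os.path.exists(renamed):
        os.rename(orig, renamed)
    # The loading code expects these two index files at the dataset root.
    for required in ('images.txt', 'train_test_split.txt'):
        if not os.path.isfile(os.path.join(dataset_root, required)):
            raise FileNotFoundError(f'{required} not found in {dataset_root}')
    return dataset_root
```

For example, `ROOT_DIR = prepare_root('/path/to/CUB_200_2011')`, where the path is a placeholder for your local copy.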

#### Using Pylint:

```pylint [options] modules_or_packages```
<!-- DESC -->

##### Source code analysis section:

````MESSAGE_TYPE: LINE_NUM:[OBJECT:] MESSAGE````

Output explanations:
 - [I]nformational messages that Pylint emits (do not contribute to your analysis score)
 - [R]efactor for a "good practice" metric violation
 - [C]onvention for coding standard violation
 - [W]arning for stylistic problems, or minor programming issues
 - [E]rror for important programming issues (i.e. most probably a bug)
 - [F]atal for errors which prevented further processing
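To see how these message types add up in a report, here is a minimal sketch that tallies message lines by their leading type letter. It assumes each message line starts with the one-letter type and the line number, as in the `MESSAGE_TYPE: LINE_NUM:[OBJECT:] MESSAGE` layout shown above. Depending on the Pylint version and the `--msg-template` option, the actual layout may differ, in which case the regular expression needs adjusting. `tally_messages` is a hypothetical helper, and the sample report is made up.

```python
import re
from collections import Counter

# Meaning of each one-letter message type, as listed above.
MESSAGE_TYPES = {
    'I': 'Informational',
    'R': 'Refactor',
    'C': 'Convention',
    'W': 'Warning',
    'E': 'Error',
    'F': 'Fatal',
}

# Matches the start of "MESSAGE_TYPE: LINE_NUM:[OBJECT:] MESSAGE".
MESSAGE_RE = re.compile(r'^([IRCWEF]):\s*(\d+):')


def tally_messages(report: str) -> Counter:
    """Count message lines per category, ignoring lines that are not messages."""
    counts = Counter()
    for line in report.splitlines():
        match = MESSAGE_RE.match(line.strip())
        if match:
            counts[MESSAGE_TYPES[match.group(1)]] += 1
    return counts


# Made-up sample report, for illustration only
sample_report = """C: 1: Missing module docstring
W: 12:copy_images: Unused variable 'tmp'
E: 30:copy_images: Undefined variable 'ROOT'"""
print(tally_messages(sample_report)['Error'])  # → 1
```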





# Import and init

```python
import os
import shutil

import pandas as pd
import tensorflow as tf

# Path to the extracted CUB-200-2011 folder (the one containing images.txt)
ROOT_DIR = ''
```

# Prepare the dataset

#### Load the image file paths with their image IDs (images.txt) and the train_test_split.txt designations into a pandas DataFrame

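```python
orig_images_folder = 'images_orig'
new_images_folder = 'images'

# images.txt: one "<image_id> <relative file path>" pair per line
image_fnames = pd.read_csv(filepath_or_buffer=os.path.join(ROOT_DIR, 'images.txt'),
                           header=None,
                           delimiter=' ',
                           names=['Img ID', 'file path'])

# train_test_split.txt: one "<image_id> <is_training_image>" pair per line (1 = train, 0 = test).
# The column is attached by row position, so both files must list the images in the same order.
image_fnames['is training image?'] = pd.read_csv(filepath_or_buffer=os.path.join(ROOT_DIR, 'train_test_split.txt'),
                                                 header=None, delimiter=' ',
                                                 names=['Img ID', 'is training image?'])['is training image?']
image_fnames.head()
```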


|   | Img ID | file path | is training image? |
|---|---|---|---|
| 0 | 1 | 001.Black_footed_Albatross/Black_Footed_Albatr... | 0 |
| 1 | 2 | 001.Black_footed_Albatross/Black_Footed_Albatr... | 1 |
| 2 | 3 | 001.Black_footed_Albatross/Black_Footed_Albatr... | 0 |
| 3 | 4 | 001.Black_footed_Albatross/Black_Footed_Albatr... | 1 |
| 4 | 5 | 001.Black_footed_Albatross/Black_Footed_Albatr... | 1 |
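The cell above attaches the `is training image?` column by row position, which relies on `images.txt` and `train_test_split.txt` listing the images in the same order. A more defensive alternative is to join the two files on `Img ID` with `pandas.DataFrame.merge`. This is a minimal sketch: `load_index` is a hypothetical name, and `root_dir` plays the role of `ROOT_DIR`.

```python
import os

import pandas as pd


def load_index(root_dir: str) -> pd.DataFrame:
    """Read images.txt and train_test_split.txt and join them on the image ID."""
    images = pd.read_csv(os.path.join(root_dir, 'images.txt'),
                         header=None, delimiter=' ',
                         names=['Img ID', 'file path'])
    split = pd.read_csv(os.path.join(root_dir, 'train_test_split.txt'),
                        header=None, delimiter=' ',
                        names=['Img ID', 'is training image?'])
    # validate='one_to_one' raises if an ID appears more than once in either file.
    # An inner join keeps the row order of `images` but silently drops unmatched IDs,
    # so the row count is checked explicitly.
    merged = images.merge(split, on='Img ID', how='inner', validate='one_to_one')
    if len(merged) != len(images):
        raise ValueError('some images have no train/test designation')
    return merged
```

Unlike the positional assignment, this version still produces the right designations if one of the files is reordered.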


#### Let's restructure the dataset files into the PyTorch ImageFolder layout.
<!-- DESC --> 
Using the designations in train_test_split.txt, each image is copied into its class folder under either the train or the test folder.

```python
data_dir = os.path.join(ROOT_DIR, orig_images_folder)
new_data_dir = os.path.join(ROOT_DIR, new_images_folder)
os.makedirs(os.path.join(new_data_dir, 'train'), exist_ok=True)
os.makedirs(os.path.join(new_data_dir, 'test'), exist_ok=True)

for image_fname, is_train in zip(image_fnames['file path'], image_fnames['is training image?']):
    # File paths look like '<class folder>/<image file>'
    class_name, file_name = image_fname.split('/')
    split = 'train' if is_train else 'test'
    # Copy into <new_data_dir>/<train|test>/<class folder>/<image file>
    new_dir = os.path.join(new_data_dir, split, class_name)
    os.makedirs(new_dir, exist_ok=True)
    shutil.copy(src=os.path.join(data_dir, image_fname), dst=os.path.join(new_dir, file_name))
print('prepare dataset structure done.')
```

prepare dataset structure done.
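After the copy, a quick sanity check is to compare the number of files in each split with the designations in train_test_split.txt. This is a minimal sketch, assuming the `train`/`test` layout created above; `count_split_files` is a hypothetical helper.

```python
import os


def count_split_files(images_root: str) -> dict:
    """Count the files under <images_root>/train and <images_root>/test, across all class folders."""
    counts = {}
    for split in ('train', 'test'):
        split_dir = os.path.join(images_root, split)
        # os.walk yields nothing for a missing folder, so its count is 0.
        counts[split] = sum(len(files) for _, _, files in os.walk(split_dir))
    return counts


# In the notebook, the counts should match the DataFrame built earlier:
# counts = count_split_files(new_data_dir)
# assert counts['train'] == int((image_fnames['is training image?'] == 1).sum())
# assert counts['test'] == int((image_fnames['is training image?'] == 0).sum())
```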


# Parameters

```python
# scale = 0.2
# progress_verbosity = 1

data_dir = os.path.join(ROOT_DIR, 'images')
train_data_dir = os.path.join(data_dir, 'train')
test_data_dir = os.path.join(data_dir, 'test')
```
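With this layout, both PyTorch's `torchvision.datasets.ImageFolder` and TensorFlow's `image_dataset_from_directory` (under `tf.keras.utils` in recent releases) infer labels from the class folder names, sorted alphanumerically by default. To show what that amounts to without either library, here is a minimal pure-Python sketch that lists `(path, label)` pairs for one split. It is not either library's API. `list_samples` is a hypothetical helper, and the accepted extensions are an assumption (CUB-200-2011 ships `.jpg` files).

```python
import os

# Assumed image extensions; CUB-200-2011 ships .jpg files.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def list_samples(split_dir: str):
    """Return (class_names, samples), where samples is a list of (file path, label index)."""
    # Class folders sorted by name, so '001.Black_footed_Albatross' gets label 0.
    class_names = sorted(entry.name for entry in os.scandir(split_dir) if entry.is_dir())
    samples = []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(split_dir, class_name)
        for fname in sorted(os.listdir(class_dir)):
            # Skip anything that does not look like an image file.
            if fname.lower().endswith(IMAGE_EXTENSIONS):
                samples.append((os.path.join(class_dir, fname), label))
    return class_names, samples
```

For example, `class_names, train_samples = list_samples(train_data_dir)` gives the class list and the labelled training files. Because the class folders are prefixed with a zero-padded number, sorted order matches the numeric class IDs.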
