# <div class="alert alert-block alert-info" style="border-width:4px">SBrain DataSet Retry API </div>


### NOTE : This is a sample notebook. Please make a copy of it for yourself and try it out.
This notebook is a follow-up tutorial.
Please make sure to go through the [DataSetManagement-Basic](./DataSetManagement-Basic.ipynb) before trying out this notebook.
<a id='top'></a>
This notebook covers the following:
- [Create DataSet With Faulty Image Iterator](#faulty_image_iterator)
- [DataSetImageClassification Retry API](#retry_api)
- [Using Retry() API To Fix Faulty Image Iterator](#fix_image_iterator)
- [Create DataSet With Faulty Label Iterator](#faulty_label_iterator)
- [Using Retry() API To Fix Faulty Label Iterator](#fix_label_iterator)

This tutorial shows how to resume dataset creation after a failure caused by a faulty image_iterator or label_iterator.


In [None]:
from sbrain.dataset import DataSetImageClassification,DataSetVersion,DataSetSplit
from sbrain.dataset import DataSetStatus,JobStatus,DataSetSplitStatus,DataSetVersionStatus
from sbrain.dataset import Transformation
import numpy as np
import cv2
import uuid
import time
from IPython.display import clear_output

#### **_NOTE_**: The following values should be unique across the SBrain system.
Please set them to unique values before starting the tutorial;
otherwise the calls will fail with a duplicate entry found error.


In [None]:
user_name = "admin"

In [None]:
import time
def unique_id():
    return str(int(time.time()))
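
Because `unique_id()` has one-second resolution, two names generated within the same second would collide. A minimal sketch of an alternative, assuming a random suffix is acceptable in dataset names (the helper name `unique_suffix` is hypothetical and not part of SBrain):

```python
import time
import uuid


def unique_suffix():
    """Return a timestamp plus a short random token (hypothetical helper)."""
    # The timestamp keeps names roughly sortable; the random part avoids
    # collisions when two names are generated within the same second.
    return "{}-{}".format(int(time.time()), uuid.uuid4().hex[:8])


first = unique_suffix()
second = unique_suffix()
print(first, second)
```

Either helper works for this tutorial; the random token only matters when several datasets are created in quick succession.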

<a id='faulty_image_iterator'></a>
# _Create DataSet With Faulty Image Iterator_
<div align="right"><a href="#top">BackToTheTop</a></div>

**_sbrain.dataset.DataSetImageClassification_** is an abstraction which supports creating and handling of image dataset for classification model training. 

The DataSetImageClassification constructor takes the **_name_** of the dataset as input.

**_DataSetImageClassification.create()_** method takes following parameters:

- **source_archive_path** : the path to the directory containing the images and labels
- **classes** : [optional] a dict with the class names in the dataset as keys and the class ids as values
- **collection_date** : the date the data was collected, as a string in the format **_mm-dd-yyyy_**
- **image_iterator** : a function returning an iterator over the paths of the images in the archive
- **label_iterator** : a function returning an iterator in which each element is a tuple (image name, class id)
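
The shape of the two iterator functions can be tried out locally before any SBrain call is made. The sketch below is an illustration only: it assumes file names of the form `<prefix>_<class>.<ext>`, builds a throwaway directory with two empty files, and uses hypothetical function names (`iterate_images`, `iterate_labels`) and illustrative class ids.

```python
import glob
import os
import tempfile


def iterate_images(data_root_path):
    """Yield the path of every file found directly under data_root_path."""
    for path in sorted(glob.glob(os.path.join(data_root_path, "*.*"))):
        yield path


def iterate_labels(data_root_path):
    """Yield (image name, class id) tuples parsed from the file names."""
    classes = {"cat": 3, "dog": 5}  # illustrative class ids
    for path in iterate_images(data_root_path):
        image_name = os.path.basename(path)
        stem = os.path.splitext(image_name)[0]
        class_name = stem.split("_", 1)[1]  # text after the first underscore
        yield image_name, classes[class_name]


# Build a tiny throwaway directory so the sketch runs without the real data.
with tempfile.TemporaryDirectory() as root:
    for name in ("0001_cat.png", "0002_dog.png"):
        open(os.path.join(root, name), "w").close()
    image_names = [os.path.basename(p) for p in iterate_images(root)]
    labels = list(iterate_labels(root))

print(image_names)  # ['0001_cat.png', '0002_dog.png']
print(labels)       # [('0001_cat.png', 3), ('0002_dog.png', 5)]
```

Both functions take a single `data_root_path` argument, matching the way the iterators in this notebook are written.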


In [None]:
# defining classes

classes = {
                'airplane': 0,
                'automobile':1,
                'bird': 2,
                'cat': 3,
                'deer': 4,
                'dog': 5,
                'frog': 6,
                'horse': 7,
                'ship': 8,
                'truck': 9
            }

#### NOTE : In the following example, an exception has been artificially introduced into the image iterator to show dataset creation failing due to a faulty image_iterator function

In [None]:
# defining iterator to get image file paths

def faulty_iterator_images(data_root_path):
    import glob
    files = glob.glob("{}/*.*".format(data_root_path))
    cnt = 0
    for f in files:
        cnt = cnt + 1
        # Deliberately fail on the 50th file to simulate a faulty iterator
        if cnt == 50:
            raise Exception("Sample exception just for testing")
        yield f
    

# defining iterator to get tuples (image_name, class_id) e.g. (xyz.jpeg,1)
def iterator_labels(data_root_path):
    import glob
    files = glob.glob("{}/*.*".format(data_root_path))
    labels = []
    classes = {
                'airplane': 0,
                'automobile':1,
                'bird': 2,
                'cat': 3,
                'deer': 4,
                'dog': 5,
                'frog': 6,
                'horse': 7,
                'ship': 8,
                'truck': 9
            }
    for f in files:
        # The class name sits between the first '_' and the first '.' of the file name
        img_name = f.split('/')[-1]
        lbl_str = img_name[img_name.index('_')+1:img_name.index('.')]
        lbl_id = classes[lbl_str]
        labels.append((img_name, lbl_id))
    return iter(labels)
    
  

### NOTE : After a few minutes the job will fail and you should see a message saying "Please fix your image iterator and use ImageClassification.retry() api"

In [None]:
# creating dataset
dataset_name = "cifar10-small-{}".format(unique_id())

job = DataSetImageClassification(name=dataset_name).create(
    description = "Dataset with subset images from cifar 10 dataset",
    source_archive_path = "shared-dir/sample-notebooks/demo-data/cifar10_small",
    classes=classes,
    collection_date="07-25-2018",
    image_iterator=faulty_iterator_images,
    label_iterator=iterator_labels
)

#Check Job Status

while job.status != JobStatus.COMPLETE.value and job.status != JobStatus.FAILED.value:
    clear_output(wait=True)
    job = job.get_status()
    time.sleep(2)

#### DataSet.create() returns a DataSetExtractionJob object
The job object can be used to track the progress of dataset creation.

job.getdataset() returns a DataSet object that is a handle to the newly created dataset.
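
The polling loops in this notebook run until the job reaches a terminal status, with no upper bound on the wait. A minimal sketch of the same loop with a timeout is shown below. It is not SBrain code: `FakeJob` is a hypothetical stand-in so the sketch runs on its own, and the status strings are placeholders for values such as `JobStatus.COMPLETE.value` and `JobStatus.FAILED.value`.

```python
import time


class FakeJob:
    """Stand-in for a job handle; NOT the SBrain class."""

    def __init__(self, statuses):
        self._statuses = list(statuses)
        self.status = self._statuses.pop(0)

    def get_status(self):
        # Mirror the notebook's usage: get_status() returns a job object.
        if self._statuses:
            self.status = self._statuses.pop(0)
        return self


def wait_for_job(job, terminal_statuses, poll_seconds=2.0, timeout_seconds=600.0):
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while job.status not in terminal_statuses:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "job still {!r} after {}s".format(job.status, timeout_seconds)
            )
        time.sleep(poll_seconds)
        job = job.get_status()
    return job


demo = FakeJob(["Running", "Running", "Failed"])
finished = wait_for_job(demo, {"Complete", "Failed"}, poll_seconds=0.01)
print(finished.status)  # Failed
```

Returning the final job object lets the caller inspect whether the job completed or failed once the wait is over.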

#### Search results will show the dataset and the version "v1" (the default version of any dataset) with status "CreationFailed"

In [None]:
DataSetImageClassification.search(name=dataset_name)
ds = DataSetImageClassification.lookup(dataset_name)
ds.search_versions(version_name="v1")

<a id='retry_api'></a>
# DataSetImageClassification Retry API
<div align="right"><a href="#top">BackToTheTop</a></div>

The **_DataSetImageClassification.retry_create()_** API accepts the following parameters:

- **source_archive_path** : [optional] the path to the directory containing the images and labels
- **image_iterator** : [optional] a function returning an iterator over the paths of the images in the archive
- **label_iterator** : [optional] a function returning an iterator in which each element is a tuple (image name, class id)
- **classes** : [optional] a dict with the class names in the dataset as keys and the class ids as values
- **collection_date** : [optional] the date the data was collected, as a string in the format **_mm-dd-yyyy_**

NOTE :
1. If the "source_archive_path" parameter is not passed to retry_create(), the original path passed to create() is used.

2. If create() failed because of a faulty image iterator, you can call retry_create() with only the "image_iterator" parameter. The other parameters are optional, and the original values passed to create() are used.

3. If create() failed because of a faulty label iterator, you can call retry_create() with only the "label_iterator" parameter. The other parameters are optional, and the original values passed to create() are used.

4. The "collection_date" parameter overrides the date given in the original create() call only if it is passed to retry_create() together with the "source_archive_path" and/or "image_iterator" parameters.
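
The fallback rules above can be modelled with a small pure-Python function. This is only an illustration of the rules as stated here, not SBrain's implementation: `resolve_retry_params` is a hypothetical name, strings stand in for the iterator functions, and silently ignoring a lone "collection_date" is an assumption about how rule 4 behaves.

```python
def resolve_retry_params(original, **overrides):
    """Merge retry overrides into the original create() parameters.

    Illustrates the documented rules only; not SBrain's implementation.
    """
    allowed = {"source_archive_path", "image_iterator", "label_iterator",
               "classes", "collection_date"}
    unknown = set(overrides) - allowed
    if unknown:
        raise TypeError("unexpected parameters: {}".format(sorted(unknown)))

    # Rules 1-3: anything not passed to the retry keeps its original value.
    resolved = dict(original)
    for key, value in overrides.items():
        if value is None or key == "collection_date":
            continue
        resolved[key] = value

    # Rule 4: collection_date only takes effect alongside a new archive path
    # and/or a new image iterator (assumed to be ignored otherwise).
    new_date = overrides.get("collection_date")
    if new_date is not None and (
        overrides.get("source_archive_path") is not None
        or overrides.get("image_iterator") is not None
    ):
        resolved["collection_date"] = new_date
    return resolved


original = {
    "source_archive_path": "archive/v1",
    "image_iterator": "faulty_images",
    "label_iterator": "labels",
    "classes": {"cat": 3},
    "collection_date": "07-25-2018",
}
fixed = resolve_retry_params(original, image_iterator="good_images",
                             collection_date="08-01-2018")
ignored = resolve_retry_params(original, collection_date="08-01-2018")
print(fixed["image_iterator"], fixed["collection_date"])  # good_images 08-01-2018
print(ignored["collection_date"])                         # 07-25-2018
```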


<a id='fix_image_iterator'></a>
## Using Retry() API To Fix Faulty Image Iterator

Let's retry creation using an image_iterator that is now fixed.
<div align="right"><a href="#top">BackToTheTop</a></div>

In [None]:
def good_iterator_images(data_root_path):
    import glob
    files = glob.glob("{}/*.*".format(data_root_path))
    for f in files:
        yield f
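
Before resubmitting, it can help to exhaust an iterator locally and confirm that it no longer raises. A minimal sketch, assuming the iterator function can be run in the notebook against the same path; `dry_run` and the two demo iterators are hypothetical helpers that generate fake paths rather than reading files:

```python
import itertools


def dry_run(iterator_fn, data_root_path, limit=None):
    """Consume iterator_fn locally and return (item count, error or None)."""
    count = 0
    try:
        for _ in itertools.islice(iterator_fn(data_root_path), limit):
            count += 1
    except Exception as exc:  # report the failure instead of propagating it
        return count, exc
    return count, None


def demo_good(root):
    # Fake paths only; nothing is read from disk.
    for i in range(100):
        yield "{}/{}_cat.png".format(root, i)


def demo_faulty(root):
    # Same fault as the tutorial's faulty iterators: raise on the 50th item.
    for i, path in enumerate(demo_good(root), start=1):
        if i == 50:
            raise Exception("Sample exception just for testing")
        yield path


print(dry_run(demo_good, "demo"))
print(dry_run(demo_faulty, "demo"))
```

The `limit` argument caps how many items are consumed, which keeps the check quick on large archives at the cost of missing faults that occur later in the sequence.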
    

In [None]:
job = ds.retry_create(image_iterator=good_iterator_images)

#Check job status
while job.status != JobStatus.COMPLETE.value and job.status != JobStatus.FAILED.value:
    clear_output(wait=True)
    job = job.get_status()
    time.sleep(2)

<a id='faulty_label_iterator'></a>
# Create DataSet With Faulty Label Iterator
<div align="right"><a href="#top">BackToTheTop</a></div>

In [None]:
# defining iterator to get tuples (image_name, class_id) e.g. (xyz.jpeg,1)
def faulty_iterator_labels(data_root_path):
    import glob
    files = glob.glob("{}/*.*".format(data_root_path))
    classes = {
                'airplane': 0,
                'automobile':1,
                'bird': 2,
                'cat': 3,
                'deer': 4,
                'dog': 5,
                'frog': 6,
                'horse': 7,
                'ship': 8,
                'truck': 9
            }
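    cnt = 0
    for f in files:
        cnt = cnt + 1
        # Deliberately fail on the 50th file to simulate a faulty iterator
        if cnt == 50:
            raise Exception("Sample exception just for testing")
        # The class name sits between the first '_' and the first '.' of the file name
        img_name = f.split('/')[-1]
        lbl_str = img_name[img_name.index('_')+1:img_name.index('.')]
        lbl_id = classes[lbl_str]
        yield (img_name, lbl_id)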
   

In [None]:
dataset_name = "cifar10-small-{}".format(unique_id())

job = DataSetImageClassification(name=dataset_name).create(
    description = "Dataset with subset of cifar 10 images",
    source_archive_path = "shared-dir/sample-notebooks/demo-data/cifar10_small",
    classes=classes,
    collection_date="07-25-2018",
    image_iterator=good_iterator_images,
    label_iterator=faulty_iterator_labels
)

#check job status
while job.status != JobStatus.COMPLETE.value and job.status != JobStatus.FAILED.value:
    clear_output(wait=True)
    job = job.get_status()
    time.sleep(2)

<a id='fix_label_iterator'></a>
## Using Retry() API To Fix Faulty Label Iterator

Let's retry creation using a label_iterator that is now fixed.
<div align="right"><a href="#top">BackToTheTop</a></div>

In [None]:
DataSetImageClassification.search(name=dataset_name)
ds = DataSetImageClassification.lookup(dataset_name)
ds.search_versions(version_name="v1")

In [None]:
def good_iterator_labels(data_root_path):
    import glob
    files = glob.glob("{}/*.*".format(data_root_path))
    classes = {
                'airplane': 0,
                'automobile':1,
                'bird': 2,
                'cat': 3,
                'deer': 4,
                'dog': 5,
                'frog': 6,
                'horse': 7,
                'ship': 8,
                'truck': 9
            }
    for f in files:
        # The class name sits between the first '_' and the first '.' of the file name
        img_name = f.split('/')[-1]
        lbl_str = img_name[img_name.index('_')+1:img_name.index('.')]
        lbl_id = classes[lbl_str]
        yield (img_name, lbl_id)
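
A label iterator can also be checked against the image list before it is submitted. The sketch below is a hedged illustration: it assumes each image is expected to have exactly one label and that every class id must appear in the classes dict; `check_labels` and the demo data are hypothetical.

```python
import os


def check_labels(image_paths, labels, classes):
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    image_names = {os.path.basename(p) for p in image_paths}
    valid_ids = set(classes.values())
    seen = set()
    for image_name, class_id in labels:
        if image_name not in image_names:
            problems.append("label for unknown image: {}".format(image_name))
        if class_id not in valid_ids:
            problems.append("unknown class id {} for {}".format(class_id, image_name))
        if image_name in seen:
            problems.append("duplicate label for {}".format(image_name))
        seen.add(image_name)
    # Images that never received a label
    for missing in sorted(image_names - seen):
        problems.append("no label for image: {}".format(missing))
    return problems


demo_classes = {"cat": 3, "dog": 5}
demo_images = ["data/0001_cat.png", "data/0002_dog.png", "data/0003_cat.png"]
demo_labels = [("0001_cat.png", 3), ("0002_dog.png", 7)]
problems = check_labels(demo_images, demo_labels, demo_classes)
for problem in problems:
    print(problem)
```

With the real iterators, the two inputs would be `good_iterator_images(path)` and `good_iterator_labels(path)` for the same archive path.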

In [None]:
job = ds.retry_create(label_iterator=good_iterator_labels)

# check job status
while job.status != JobStatus.COMPLETE.value and job.status != JobStatus.FAILED.value:
    clear_output(wait=True)
    job = job.get_status()
    time.sleep(2)

In [None]:
DataSetImageClassification.search(name=dataset_name)

## **_<font color="green">Congratulations !!! You completed the tutorial successfully.</font>_**