# STARR Project - General Data Collection Routine
Author: George Gorospe

Note: This notebook draws heavily from Nvidia's jetbot code base found at: https://github.com/NVIDIA-AI-IOT/jetbot

Before we can get started we need to ensure that we have the jetbot module installed. The jetbot module provides some useful tools that will help us collect data.

If you encounter an error early in this notebook, you likely need to download and install the jetbot module. To do this, follow these two steps:
1. From a terminal on the Nano, run: `git clone https://github.com/NVIDIA-AI-IOT/jetbot.git`
2. From inside your new jetbot folder, run: `sudo python3 setup.py install`
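
If you'd like to check for the module before running the rest of the notebook, a minimal sketch along these lines should work. The helper name `module_available` is made up for this example and is not part of jetbot.

```python
import importlib.util

def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None

if module_available("jetbot"):
    print("jetbot is installed")
else:
    print("jetbot not found; follow the two install steps above")
```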


### Getting Started

We can use this basic routine for collecting and labeling different types of data.
Start by entering the name of the object you're photographing. The routine will create a new folder inside the `data` directory for the data you collect.

You'll want to take multiple photos of the object all by itself in different orientations, locations, and lighting conditions. A good combination of all of these will help produce a robust AI capable of identifying the object with high accuracy.

If you are collecting location data (e.g. gym, library, classroom), make sure to take photos of all the objects that will always be in the room. This means taking photos of the walls, desks, tables, and trashcans, but not backpacks, coats, or notebooks.

Final Note: This is experimental software. It has been tested in limited environments and may not do everything we expect. Use it, experiment with it, and feel free to change it. If you break the software, you can always download it again from the source and experiment more.

### Display live camera feed

To start, we'll initialize and display the feed from our camera.
The important part here is that we size the image to match what our neural network expects (224x224 pixels in this notebook).
For certain tasks it may be better to collect larger images and then downscale them later.

In [None]:
# First we'll import some tools to help us get the camera started and to make this notebook interactive
import traitlets
import ipywidgets.widgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import Image, display
from jetbot import Camera, bgr8_to_jpeg

camera = Camera.instance(width=224, height=224)

image = widgets.Image(format='jpeg', width=224, height=224)  # this width and height doesn't necessarily have to match the camera

camera_link = traitlets.dlink((camera, 'value'), (image, 'value'), transform=bgr8_to_jpeg)

display(image)

Next, we want to label the data we'll be collecting.
Use the text box to create a label.

In [None]:
import os  # This library allows us to explore and modify the file structure on the Nano

# Creating a function to add a folder to our directory
def folder_function(label):
    # os.makedirs raises FileExistsError if the directory already exists,
    # so we catch that specific error and report it instead of failing
    try:
        os.makedirs('data/' + label)
        print("Created data folder: data/" + label)
    except FileExistsError:
        print('Directory not created because it already exists: data/' + label)

# Creating the interactive text box widget
interactiveTextBox = interactive(folder_function, {'manual': True}, label=widgets.Text(value='structured_string', placeholder='structured_string', description='Enter Label:'))
interactiveTextBox  # calling the interactive element

If you refresh the Jupyter file browser on the left, you should now see a new directory named after your label inside the `data` folder.

Now that we have a folder with our label, we're nearly ready to start saving images inside it.
It is important to understand that the name of the image file does not matter for our training purposes; only the folder name matters.
This means that the images can have names like img1.jpg, tm223.jpg, cat.jpg, etc. But if they are all in a folder titled "dog", then they will be interpreted as dogs.
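
The training code is not part of this notebook, so the following is only a sketch of how a script might derive labels from folder names. The function `list_labeled_images` is hypothetical, and the demo uses a throwaway directory rather than your real `data` folder.

```python
import os
import tempfile

def list_labeled_images(data_dir):
    """Return (path, label) pairs; the label is the name of the containing folder."""
    samples = []
    for label in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, label)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for filename in sorted(os.listdir(class_dir)):
            if filename.lower().endswith(('.jpg', '.jpeg')):
                samples.append((os.path.join(class_dir, filename), label))
    return samples

# Demo: three differently named files in a "dog" folder all get the label "dog"
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, 'dog'))
    for name in ('img1.jpg', 'tm223.jpg', 'cat.jpg'):
        open(os.path.join(demo_dir, 'dog', name), 'wb').close()
    demo_labels = [label for _, label in list_labeled_images(demo_dir)]
    print(demo_labels)
```

Note that the file named cat.jpg is still labeled "dog", because only the folder name is consulted.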

Since we don't want to manually name each image we collect, we'll use the ``uuid`` module in Python, which defines the ``uuid1`` function to generate
a unique identifier. This unique identifier is generated from information like the current time and the machine address.
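
As a quick illustration of what these identifiers look like, the short sketch below builds a few file names the same way. The variable name `example_names` is just for this example.

```python
from uuid import uuid1

# Build three file names; each call to uuid1() returns a new identifier
example_names = [str(uuid1()) + '.jpg' for _ in range(3)]

for name in example_names:
    print(name)  # 36-character identifier followed by ".jpg"
```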


In [None]:
from uuid import uuid1

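# The save_snapshot function collects an image and saves it to a file.
# This is a callback function: it is executed when we press the button to collect images.
def save_snapshot(directory):
    # image and image_count are notebook-level widgets; no 'global' statement is needed
    # because we only read them and set an attribute, never rebind the names
    image_path = os.path.join(directory, str(uuid1()) + '.jpg')
    with open(image_path, 'wb') as f:
        f.write(image.value)
    # Update the counter widget with the number of files now in the directory
    image_count.value = len(os.listdir(directory))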

In [None]:
# Constructing the image counter box and the button to collect images

label = interactiveTextBox.children[0].value # take the label text from the interactive text box module
directory = 'data/' + label
button_layout = widgets.Layout(width='128px', height='64px')
collect_button = widgets.Button(description='Collect Image', button_style='success', layout=button_layout)
image_count = widgets.IntText(layout=button_layout, value=len(os.listdir(directory)))
    
# attach the callbacks, we use a 'lambda' function to ignore the
# parameter that the on_click event would provide to our function
# because we don't need it.
collect_button.on_click(lambda x: save_snapshot(directory))
display(image)
display(widgets.HBox([image_count, collect_button]))

Great! Now that we have an interface for collecting data, we can go out and start collecting high-quality images.

Collect images of things that are the same class, such as cups, shoes, books, or backpacks.
Vary the position and orientation of the object, the lighting, and the ground surface.
Try to limit other things in the background. Walls are fine, but you don't want objects from other classes in the shot.
This means no banana for scale. Just the object we're interested in.

Here are some tips for collecting data:

1. Try different orientations
2. Try different lighting
3. Try varied object / collision types: walls, ledges, objects
4. Try different textured floors / objects: patterned, smooth, glass, etc.

Ultimately, the more data we have of scenarios the robot will encounter in the real world, the better our object classification and navigation (collision avoidance) performance will be. It's important
to get *varied* data (as described by the tips above) and not just a lot of data, but you'll probably need at least 100 images of each class (that's not a science, just a helpful rule of thumb). But don't worry, it goes pretty fast once you get going :)
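
To keep track of how close each class is to that rough target, a small sketch like the one below may help. The helper names and the `MIN_IMAGES_PER_CLASS` constant are made up for this example; it assumes the `data/<label>` layout created earlier and counts every file in each class folder.

```python
import os

MIN_IMAGES_PER_CLASS = 100  # rule of thumb from the text above, not a hard requirement

def count_images_per_class(data_dir):
    """Return {label: number of files} for each class folder in data_dir."""
    counts = {}
    for label in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, label)
        if os.path.isdir(class_dir):
            counts[label] = len(os.listdir(class_dir))
    return counts

def classes_needing_more(counts, minimum=MIN_IMAGES_PER_CLASS):
    """Return {label: images still needed} for classes below the minimum."""
    return {label: minimum - n for label, n in counts.items() if n < minimum}

# Only print a report if the data directory from this notebook exists
if os.path.isdir('data'):
    for label, missing in classes_needing_more(count_images_per_class('data')).items():
        print(f"{label}: collect about {missing} more images")
```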


## Next

Once you've collected lots of images of each class, go back up to the top of the notebook and start again with a new class.