# Explore the dataset


In this notebook, we will perform an EDA (Exploratory Data Analysis) on the processed Waymo dataset (data in the `processed` folder). In the first part, you will create a function to display an image and its bounding boxes.

In [1]:
from utils import get_dataset
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle

In [5]:
dataset = get_dataset("/home/workspace/data/waymo/training_and_validation/*.tfrecord")

INFO:tensorflow:Reading unweighted datasets: ['/home/workspace/data/waymo/training_and_validation/*.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['/home/workspace/data/waymo/training_and_validation/*.tfrecord']
INFO:tensorflow:Number of filenames to read: 97


## Write a function to display an image and the bounding boxes

Implement the `display_instances` function below. This function takes a batch as input and displays an image with its corresponding bounding boxes. The only requirement is that the classes be color coded (e.g., vehicles in red, pedestrians in blue, cyclists in green).

In [3]:
def display_instances(tfrecord):
    '''This function shows an image from the tfrecord with its
       corresponding ground truth bounding boxes, color coded by class.
    '''
    # Unpack the record; .numpy() converts the eager tensors to NumPy values
    name = tfrecord['filename'].numpy()
    img = tfrecord['image'].numpy()
    img_shape = img.shape
    bboxes = tfrecord['groundtruth_boxes'].numpy()
    classes = tfrecord['groundtruth_classes'].numpy()

    # Display the information of the tfrecord
    print('#########################################TFrecord Information#########################################')
    print('Name of the TFrecord: {}'.format(name))
    print('The shape of the image is: {}'.format(img_shape))
    print('There are {} boxes in the image:'.format(len(bboxes)))
    print('There are {} objects in the image:'.format(len(classes)))

    _, ax = plt.subplots(1, figsize=(20, 10))
    # Color mapping of classes: 1 = vehicle (red), 2 = pedestrian (blue), 4 = cyclist (green)
    colormap = {1: [1, 0, 0], 2: [0, 0, 1], 4: [0, 1, 0]}

    # Plot the image first, then draw the bounding boxes on top of it
    ax.imshow(img)
    for cl, bb in zip(classes, bboxes):
        # Boxes are treated as normalized [y1, x1, y2, x2]; scale them to pixels
        y1, x1, y2, x2 = bb
        y1 = y1 * img_shape[0]
        x1 = x1 * img_shape[1]
        y2 = y2 * img_shape[0]
        x2 = x2 * img_shape[1]
        rec = Rectangle((x1, y1), x2 - x1, y2 - y1, facecolor='none', edgecolor=colormap[cl])
        ax.add_patch(rec)

    plt.show()
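
The coordinate scaling inside `display_instances` can be isolated as a small helper, which makes it easy to check by hand. This is a minimal sketch: `to_pixel_box` is a hypothetical name, not part of the project's `utils`, and it assumes the boxes are normalized `[y1, x1, y2, x2]` values, which is how the function above treats them.

```python
def to_pixel_box(box, img_height, img_width):
    """Convert a normalized [y1, x1, y2, x2] box to (x, y, width, height) in pixels.

    Hypothetical helper; the (x, y, width, height) order matches the
    arguments expected by matplotlib.patches.Rectangle.
    """
    y1, x1, y2, x2 = box
    x = x1 * img_width
    y = y1 * img_height
    width = (x2 - x1) * img_width
    height = (y2 - y1) * img_height
    return x, y, width, height


# A box covering the right half of the middle band of a 640x640 image.
print(to_pixel_box([0.25, 0.5, 0.75, 1.0], img_height=640, img_width=640))
```

Note that the vertical coordinates scale with the image height (`img.shape[0]`) and the horizontal ones with the width (`img.shape[1]`); swapping the two goes unnoticed on square images such as the 640x640 frames here.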

## Display 10 images 

Using the dataset created in the second cell and the function you just coded, display 10 random images with the associated bounding boxes. You can use the methods `take` and `shuffle` on the dataset.

In [6]:
# shuffle() returns a new dataset instead of shuffling in place, so chain the calls
for batch in dataset.shuffle(100).take(10):
    display_instances(batch)

#########################################TFrecord Information#########################################
Name of the TFrecord: b'segment-10072140764565668044_4060_000_4080_000_with_camera_labels_80.tfrecord'
The shape of the image is: (640, 640, 3)
There are 51 boxes in the image:
There are 51 objects in the image:
#########################################TFrecord Information#########################################
Name of the TFrecord: b'segment-12174529769287588121_3848_440_3868_440_with_camera_labels_180.tfrecord'
The shape of the image is: (640, 640, 3)
There are 22 boxes in the image:
There are 22 objects in the image:

#########################################TFrecord Information#########################################
Name of the TFrecord: b'segment-10235335145367115211_5420_000_5440_000_with_camera_labels_100.tfrecord'
The shape of the image is: (640, 640, 3)
There are 56 boxes in the image:
There are 56 objects in the image:
#########################################TFrecord Information#########################################
Name of the TFrecord: b'segment-10517728057304349900_3360_000_3380_000_with_camera_labels_150.tfrecord'
The shape of the image is: (640, 640, 3)
There are 0 boxes in the image:
There are 0 objects in the image:
#########################################TFrecord Information#########################################
Name of the TFrecord: b'segment-11004685739714500220_2300_000_2320_000_with_camera_labels_130.tfrecord'
The shape of the image is: (640, 640, 3)
There are 2 boxes in the image:
There are 2 objects in the image:
#########################################TFrecord Information#######

## Additional EDA

In this last part, you are free to perform any additional analysis of the dataset. What else would you like to know about the data?
For example, think about data distribution. So far, you have only looked at a single file...
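
One possible starting point for the data distribution question is counting how often each class appears across many frames. The sketch below is an illustration, not a prescribed solution: `count_classes` and `CLASS_NAMES` are hypothetical names, the id-to-name mapping is inferred from the color coding used earlier (1 = vehicle, 2 = pedestrian, 4 = cyclist), and the toy lists stand in for the `groundtruth_classes` arrays of real records.

```python
from collections import Counter

# Assumed label map, inferred from the colormap in display_instances.
CLASS_NAMES = {1: "vehicle", 2: "pedestrian", 4: "cyclist"}


def count_classes(class_arrays):
    """Count objects per class over an iterable of per-frame class-id sequences."""
    counts = Counter()
    for classes in class_arrays:
        # Ids missing from the assumed label map are grouped under "other".
        counts.update(CLASS_NAMES.get(int(c), "other") for c in classes)
    return counts


# Toy input: three frames, the second one without any labeled object.
toy_frames = [[1, 1, 2], [], [1, 4]]
print(count_classes(toy_frames))
```

On the real data, the same function could presumably be fed a generator such as `(r['groundtruth_classes'].numpy() for r in dataset.take(1000))`, where the sample size of 1000 is an arbitrary choice. The resulting counts can then be drawn as a bar chart to show how balanced the classes are.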