# Data Prep

In this notebook I import and inspect the image and telemetry data recorded in Donkey Sim and organize it for the modeling steps.

* Import telemetry .csv
* Import image data
* Convert image data
* Create dataframe with:
  * steering inputs
  * throttle inputs
  * converted images
* Save as NumPy `.npy` files for modeling

In [1]:
## Imports
import numpy as np
import pandas as pd
import os
import sys
import matplotlib.pyplot as plt
import time

from tensorflow.keras.preprocessing.image import img_to_array, load_img

2021-11-09 16:23:29.988106: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1


## Telemetry Data

Steps:
* load simulation telemetry data csv
* cut unneeded columns
* cut incomplete last lap
* import images as numpy arrays
* create numpy array of image data arrays
* scale image data to range \[0, 1.0\]
* save images as input dataset X
* save steering angle and throttle data in numpy array as target dataset y


In [2]:
## Constants
working_date = '11_08_2021'
working_time = '22_04_06'
local_project_path = '/home/grant/projects/donkeysim-client/data'

In [3]:
## Directories
local_data_directory = f'{local_project_path}/{working_date}/{working_time}'
local_image_directory = f'{local_data_directory}/images'
working_data_directory = f'../data/{working_date}/{working_time}'

## File paths
telemetry_csv = f'{local_data_directory}/data.csv'
input_dataset_file = f'{working_data_directory}/X.npy'
target_dataset_file = f'{working_data_directory}/y.npy'

### Load CSV file as DataFrame

In [4]:
df = pd.read_csv(telemetry_csv)
df.head()

Unnamed: 0,steering_angle,throttle,speed,image,hit,time,accel_x,accel_y,accel_z,gyro_x,...,totalNodes,pos_x,pos_y,pos_z,vel_x,vel_y,vel_z,on_road,progress_on_shortest_path,lap
0,0.0,0.189638,0.009831,2.993828.PNG,none,2.993828,0.004379,-0.046158,0.572317,0.000207,...,307,14.03998,0.565938,-68.18555,-0.000634,0.008615,0.004694,0,0,0
1,0.0,0.319648,0.009307,3.042857.PNG,none,3.042857,0.002191,-0.046821,-0.481078,0.000152,...,307,14.03964,0.566061,-68.18594,-0.000503,0.006545,0.006599,0,0,0
2,0.0,0.319648,0.07756,3.093997.PNG,none,3.093997,-0.000576,-0.058607,1.317935,0.000639,...,307,14.03693,0.565829,-68.1888,-0.000194,0.002986,0.077502,0,0,0
3,0.0,0.71261,0.096726,3.142711.PNG,none,3.142711,0.003198,-0.030822,-0.162779,0.00048,...,307,14.03389,0.565611,-68.19193,-8.3e-05,0.00125,0.096717,0,0,0
4,0.0,0.907136,0.228915,3.193358.PNG,none,3.193358,-0.030274,-0.049059,1.639316,0.001036,...,307,14.02505,0.56497,-68.20108,-0.000137,-0.002383,0.228903,0,0,0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8986 entries, 0 to 8985
Data columns (total 28 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   steering_angle             8986 non-null   float64
 1   throttle                   8986 non-null   float64
 2   speed                      8986 non-null   float64
 3   image                      8986 non-null   object 
 4   hit                        8986 non-null   object 
 5   time                       8986 non-null   float64
 6   accel_x                    8986 non-null   float64
 7   accel_y                    8986 non-null   float64
 8   accel_z                    8986 non-null   float64
 9   gyro_x                     8986 non-null   float64
 10  gyro_y                     8986 non-null   float64
 11  gyro_z                     8986 non-null   float64
 12  gyro_w                     8986 non-null   float64
 13  pitch                      8986 non-null   float

In [6]:
df.drop(columns=['speed','hit', 'time', 'accel_x', 'accel_y', 'accel_z', 
                 'gyro_x', 'gyro_y', 'gyro_z', 'gyro_w', 'pitch', 'yaw', 
                 'roll', 'cte', 'activeNode', 'totalNodes', 'pos_x', 
                 'pos_y', 'pos_z', 'vel_x', 'vel_y', 'vel_z', 'on_road', 
                 'progress_on_shortest_path',], inplace=True)
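
An alternative to dropping unneeded columns after loading is to read only the wanted columns in the first place. The sketch below is a minimal illustration using the `usecols` parameter of `pd.read_csv`; the in-memory CSV and its values are made up and only stand in for `data.csv`.

```python
import io
import pandas as pd

# Hypothetical miniature stand-in for data.csv (values are illustrative only)
sample_csv = io.StringIO(
    "steering_angle,throttle,speed,image,lap\n"
    "0.0,0.19,0.01,2.993828.PNG,0\n"
    "0.1,0.32,0.08,3.042857.PNG,1\n"
)

# Columns the rest of the notebook actually uses
keep_columns = ['steering_angle', 'throttle', 'image', 'lap']

# usecols skips every other column at parse time
df_small = pd.read_csv(sample_csv, usecols=keep_columns)
print(df_small.columns.tolist())
```

Either approach ends with the same four columns; reading a subset simply avoids holding the unused telemetry in memory.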

In [7]:
df.lap.unique()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [8]:
sys.getsizeof(df)

834840

In [9]:
df.dtypes

steering_angle    float64
throttle          float64
image              object
lap                 int64
dtype: object

### Rename Steering Column to Match Throttle

In [10]:
## Only steering_angle changes; the other column names stay as they are
df.rename(columns={'steering_angle': 'steering'}, inplace=True)

### Downcast Datatypes

In [11]:
df['steering'] = pd.to_numeric(df['steering'], downcast='float')
df['throttle'] = pd.to_numeric(df['throttle'], downcast='float')
df['lap'] = pd.to_numeric(df['lap'], downcast='unsigned')
sys.getsizeof(df)

700050

In [12]:
df.dtypes

steering    float32
throttle    float32
image        object
lap           uint8
dtype: object
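
To see where the savings from downcasting come from, it can help to compare per-column memory before and after. The sketch below uses a small made-up frame (not the telemetry data) and `DataFrame.memory_usage(deep=True)`, which reports bytes per column.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the telemetry columns (values are illustrative only)
demo = pd.DataFrame({
    'steering': np.tile([-0.5, 0.0, 0.25, 0.5], 250),
    'throttle': np.tile([0.0, 0.25, 0.5, 1.0], 250),
    'lap': np.arange(1000) % 12,
})

before = demo.memory_usage(deep=True).sum()

# Same downcasting calls as in the cell above
demo['steering'] = pd.to_numeric(demo['steering'], downcast='float')
demo['throttle'] = pd.to_numeric(demo['throttle'], downcast='float')
demo['lap'] = pd.to_numeric(demo['lap'], downcast='unsigned')

after = demo.memory_usage(deep=True).sum()
print(f'bytes before: {before}, bytes after: {after}')
print(demo.dtypes)
```

Keep in mind that `float32` carries roughly seven significant decimal digits, which is generally enough for steering and throttle values in the range [-1, 1].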

### Remove "Extra" Lap

In [13]:
df['lap'].value_counts()

1     1038
3      971
2      898
5      856
4      854
7      852
6      845
8      844
10     830
9      821
11     141
0       36
Name: lap, dtype: int64

In [14]:
## Drop the partial final lap recorded after the end of the training session
df_y = df.loc[df['lap'] < df['lap'].max(), :].drop(columns='lap')
df_y

Unnamed: 0,steering,throttle,image
0,0.0,0.189638,2.993828.PNG
1,0.0,0.319648,3.042857.PNG
2,0.0,0.319648,3.093997.PNG
3,0.0,0.712610,3.142711.PNG
4,0.0,0.907136,3.193358.PNG
...,...,...,...
8840,0.0,1.000000,444.9851.PNG
8841,0.0,1.000000,445.0352.PNG
8842,0.0,1.000000,445.0849.PNG
8843,0.0,1.000000,445.1353.PNG
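
The lap filter can be checked on a toy frame. The sketch below uses made-up values and shows that comparing against `lap.max()` removes only the final lap, so the short lap 0 stretch at the start is kept.

```python
import pandas as pd

# Made-up telemetry rows: lap 3 plays the role of the partial final lap
laps = pd.DataFrame({
    'steering': [0.0, 0.1, -0.2, 0.3, 0.0],
    'throttle': [0.2, 0.5, 0.9, 1.0, 1.0],
    'lap': [0, 1, 1, 2, 3],
})

last_lap = laps['lap'].max()

# Keep every row before the final lap, then drop the helper column
complete = laps.loc[laps['lap'] < last_lap].drop(columns='lap')
print(len(laps), '->', len(complete))
```

Applied to the counts above, this removes the 141 rows of lap 11 and leaves 8986 - 141 = 8845 rows.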


## Image Data

### Verify Files

In [15]:
## Verify Files
os.listdir(local_image_directory)[:5]

['420.181.PNG', '238.385.PNG', '374.6413.PNG', '380.1419.PNG', '78.53777.PNG']
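
Listing a few files confirms the directory is readable, but not that every image named in the telemetry exists. A stricter check could look like the sketch below; `find_missing_images` is a hypothetical helper, and the temporary directory with empty files only stands in for the real image folder.

```python
import os
import tempfile

def find_missing_images(image_names, image_directory):
    """Return the names in image_names that have no file in image_directory."""
    on_disk = set(os.listdir(image_directory))
    return [name for name in image_names if name not in on_disk]

# Demo with a throwaway directory holding two empty placeholder files
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ['1.0.PNG', '2.0.PNG']:
        open(os.path.join(tmp_dir, name), 'wb').close()
    missing = find_missing_images(['1.0.PNG', '2.0.PNG', '3.0.PNG'], tmp_dir)

print(missing)
```

In the notebook this would be called as `find_missing_images(df_y['image'], local_image_directory)`, and an empty list would mean every row has its image.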

### Create Image Array

In [16]:
# test_img = load_img(f"{local_image_directory}/420.181.PNG", color_mode='grayscale')

In [17]:
# test_img.size

In [18]:
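## Using Keras, load each image as a grayscale array and append it to a list
## Note: Keras' target_size is (height, width), so each array is 80 rows by 60 columns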
img_array_list = []
for img in df_y['image']:
    img_array_list.append(img_to_array(load_img(f"{local_image_directory}/{img}", 
                                                color_mode='grayscale', 
                                                target_size=(80, 60))))

In [19]:
## Convert the list of image arrays into a single 4-D NumPy array
X = np.array(img_array_list)

## Create Datasets

### Create Targets

In [20]:
## Target: throttle and steering data
y = df_y.drop(columns=['image']).to_numpy().copy()

## Verify size
print(f'X.shape: {X.shape}')
print(f'y.shape: {y.shape}')

X.shape: (8845, 80, 60, 1)
y.shape: (8845, 2)


### Scale Image Data

In [21]:
X /= 255
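
The in-place division works here because the image array is already floating point (as far as I know, `img_to_array` returns `float32` by default). If the pixels were stored as `uint8`, the same statement would fail. The sketch below uses a tiny made-up array to show the failure and a dtype-safe alternative.

```python
import numpy as np

# Made-up pixel values stored as 8-bit integers
raw = np.array([[0, 128, 255]], dtype=np.uint8)

try:
    raw /= 255          # true division cannot be written back into uint8
    in_place_ok = True
except TypeError:
    in_place_ok = False

# Safe alternative: cast to float32 first, then scale to [0, 1]
scaled = raw.astype(np.float32) / 255.0
print(in_place_ok, scaled.dtype, scaled.min(), scaled.max())
```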

### Save Datasets

In [22]:
## Make sure directories exist
os.makedirs(working_data_directory, exist_ok=True)

## Save as binary NumPy .npy format
with open(input_dataset_file, 'wb') as X_out:
    np.save(file=X_out, arr=X)
with open(target_dataset_file, 'wb') as y_out:
    np.save(file=y_out, arr=y)
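
A quick round trip confirms that the saved files load back unchanged. The sketch below uses random stand-in arrays with the same trailing shapes as `X` and `y`, and a temporary directory in place of the working data directory.

```python
import os
import tempfile
import numpy as np

# Random stand-ins shaped like a few rows of X and y
X_demo = np.random.rand(4, 80, 60, 1).astype(np.float32)
y_demo = np.random.rand(4, 2).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    x_path = os.path.join(tmp_dir, 'X.npy')
    y_path = os.path.join(tmp_dir, 'y.npy')

    # Save, then load back from disk
    np.save(x_path, X_demo)
    np.save(y_path, y_demo)
    X_loaded = np.load(x_path)
    y_loaded = np.load(y_path)

# Shapes, dtypes and values should all survive the round trip
round_trip_ok = (np.array_equal(X_demo, X_loaded)
                 and np.array_equal(y_demo, y_loaded)
                 and X_loaded.dtype == X_demo.dtype)
print(round_trip_ok)
```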