    This notebook takes in a datalog CSV file, cleans it, adds the data we want for our model, and writes the result to a CSV file
    Justin Wasserman - 

## Import and Verify Datalog

In [14]:
import pandas as pd
import numpy as np

In [15]:
datalog_DIR = '../../data/'

In [16]:
datalogFile = datalog_DIR + '02-05-2019_13-21-48.csv'
#Blank fields are read in as NaN by default
#WallId(s) should be the only column that has NaN

df = pd.read_csv(datalogFile, sep=',')
df.head()

Unnamed: 0,Time,ID,X,Y,Yaw,ResetID,checkCorrectness,NumberOfWalls,WallId(s)
0,0 1000000,0,0.0,0.0,0.0,1,1,0,
1,0 2000000,0,0.0,0.0,0.0,1,1,0,
2,0 3000000,0,0.0,0.0,0.0,1,1,0,
3,0 4000000,0,0.0,0.0,0.0,1,1,0,
4,0 5000000,0,0.0,0.0,0.0,1,1,0,


In [17]:
#Drop the last row in df. Sometimes the datalog is stopped while a row is being written,
#which causes NaNs to be inserted, so it is safest to drop the last row.
df.drop(df.tail(1).index,inplace=True) # drop last row

In [18]:
#Verify that only WallId(s) has NaN in it
NaNs = df.isnull().any() #Checks which columns have an NA in it
if NaNs.drop('WallId(s)').any(): #No column other than WallId(s) should have an NA
    print("[cleans_minimal] A column other than WallId(s) has a NaN in it")


## Time

The Time column has the form "seconds fraction", separated by a space. The fraction is the millisecond count (0 to 999) followed by six zeros, which makes it effectively a nanosecond count. So "0 1000000" is 1 millisecond and "0 10000000" is 10 milliseconds. When the fraction is zero it is written as a single 0, so one second appears as "1 0" and not as "1 000000".

In [19]:
for i in df.index:
    # Split "seconds fraction" into its two fields
    (second, millisecond) = df.at[i, 'Time'].split(' ')
    second = float(second)
    if(millisecond != '0'):
        # Drop the six trailing zeros, then convert milliseconds to seconds
        millisecond = float(millisecond[:-6]) / 1000.0
    else:
        millisecond = float(millisecond)
    df.at[i, 'Time'] = second + millisecond
df.head()

Unnamed: 0,Time,ID,X,Y,Yaw,ResetID,checkCorrectness,NumberOfWalls,WallId(s)
0,0.001,0,0.0,0.0,0.0,1,1,0,
1,0.002,0,0.0,0.0,0.0,1,1,0,
2,0.003,0,0.0,0.0,0.0,1,1,0,
3,0.004,0,0.0,0.0,0.0,1,1,0,
4,0.005,0,0.0,0.0,0.0,1,1,0,
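The same conversion can be sketched without a Python loop. This is only an illustration: `parse_time` is a hypothetical helper, the sample strings are made up, and it assumes the second field is always a whole nanosecond count, as the examples above suggest.

```python
import pandas as pd

def parse_time(column):
    """Convert 'seconds nanoseconds' strings to float seconds (hypothetical helper)."""
    # Split each string on the space into two string columns (0 and 1).
    parts = column.str.split(' ', expand=True)
    seconds = parts[0].astype(float)
    # Assumption: the second field is a nanosecond count ("0" when empty).
    nanoseconds = parts[1].astype(float)
    return seconds + nanoseconds / 1e9

# Made-up sample values in the datalog's format.
sample = pd.Series(['0 1000000', '0 10000000', '1 0', '2 999000000'])
times = parse_time(sample)
print(times.tolist())
```

Because the result is a float Series, assigning it back with `df['Time'] = parse_time(df['Time'])` would also leave the column with a numeric dtype.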


## Check Correctness

The Gazebo simulator verifies that the ball is in a hub and that the hubs/weaselballs are within the environment. checkCorrectness is the variable printed to the datalog to indicate that the simulator is running correctly for a given timestep, so any row with checkCorrectness = 0 should be removed.

In [20]:
df = df[df.checkCorrectness != 0]
df.head()

Unnamed: 0,Time,ID,X,Y,Yaw,ResetID,checkCorrectness,NumberOfWalls,WallId(s)
0,0.001,0,0.0,0.0,0.0,1,1,0,
1,0.002,0,0.0,0.0,0.0,1,1,0,
2,0.003,0,0.0,0.0,0.0,1,1,0,
3,0.004,0,0.0,0.0,0.0,1,1,0,
4,0.005,0,0.0,0.0,0.0,1,1,0,


## WallId(s) / NumberOfWalls

Since the Gazebo simulator has the models shoot away after a collision, I add a heuristic: if a wall was touched in the last n ms and there is no collision in the current row, the row is still treated as colliding with that wall.

In [21]:
n = 5 #milliseconds since last collision

In [22]:
rowsSinceLastWall = 0
lastWall = None
lastNumberOfWalls = None
for i in df.index:
    rowNumberOfWalls = df.at[i, 'NumberOfWalls']
    rowWallIds = df.at[i, 'WallId(s)']
    if rowNumberOfWalls > 0:
        # Real collision: remember it and restart the counter
        rowsSinceLastWall = 0
        lastWall = rowWallIds
        lastNumberOfWalls = rowNumberOfWalls
    elif rowsSinceLastWall < n and lastWall is not None:
        # No collision in this row, but one happened fewer than n rows ago
        # (rows are nominally 1 ms apart), so carry it forward
        df.at[i, 'NumberOfWalls'] = lastNumberOfWalls
        df.at[i, 'WallId(s)'] = lastWall
    rowsSinceLastWall += 1
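As a point of comparison, here is a minimal sketch of the same heuristic based on elapsed time rather than row count. It is an alternative, not the method used above: `carry_forward_walls` and `window_ms` are hypothetical names, the sample frame is made up, and it assumes Time has already been converted to seconds.

```python
import numpy as np
import pandas as pd

def carry_forward_walls(frame, window_ms=5):
    """Copy the latest wall contact onto contact-free rows within window_ms of it."""
    out = frame.copy()
    time_s = pd.to_numeric(out['Time'])  # assumes Time is already in seconds
    hit = out['NumberOfWalls'] > 0
    # Time, wall count and wall IDs of the most recent contact at or before each row.
    last_time = time_s.where(hit).ffill()
    last_count = out['NumberOfWalls'].where(hit).ffill()
    last_ids = out['WallId(s)'].where(hit).ffill()
    elapsed_ms = (time_s - last_time) * 1000.0
    # Small tolerance so floating-point noise does not exclude the boundary row.
    fill = ~hit & last_time.notna() & (elapsed_ms <= window_ms + 1e-9)
    out.loc[fill, 'NumberOfWalls'] = last_count[fill].astype(int)
    out.loc[fill, 'WallId(s)'] = last_ids[fill]
    return out

# Made-up sample: a single contact with wall 3 at t = 2 ms, logged every 1 ms.
sample = pd.DataFrame({
    'Time': [0.001 * k for k in range(1, 11)],
    'NumberOfWalls': [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    'WallId(s)': [np.nan, 3.0] + [np.nan] * 8,
})
filled = carry_forward_walls(sample, window_ms=5)
print(filled['NumberOfWalls'].tolist())
```

This version includes the row exactly `window_ms` after the contact, and it measures real time, so its result can differ from the row-counting loop when rows have been removed by the checkCorrectness filter.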

In [23]:
total = df['NumberOfWalls'].sum() # total wall contacts after applying the heuristic
total

15547

## Enclosure Data

Here I import the enclosure data, which gives the position, orientation, and size of each rail of the enclosure.

In [42]:
enclosureFile = datalog_DIR + 'boundaryDescription.txt'
enclosure_df = pd.read_csv(enclosureFile, sep=',')
enclosure_df.head()


Unnamed: 0,name,X,Y,Z,Roll,Pitch,Yaw,sizeX,sizeY,sizeZ
0,rail01,0.56355,0.0,0.03175,0,0,0.0,0.01905,1.12713,0.0889
1,rail02,0.0,0.56356,0.03175,0,0,1.57,0.01905,1.1525,0.0889
2,rail03,-0.56355,0.0,0.03175,0,0,3.14,0.01905,1.12713,0.0889
3,rail04,0.0,-0.56356,0.03175,0,0,-1.57319,0.01905,1.1525,0.0889


Next I will convert each railXX name to its integer ID so that it matches the IDs used in df.

In [43]:
# Strip the "rail" prefix and convert the remaining digits to an integer ID
enclosure_df['name'] = enclosure_df['name'].str.replace('rail', '', regex=False).astype(int)
enclosure_df.head()

Unnamed: 0,name,X,Y,Z,Roll,Pitch,Yaw,sizeX,sizeY,sizeZ
0,1,0.56355,0.0,0.03175,0,0,0.0,0.01905,1.12713,0.0889
1,2,0.0,0.56356,0.03175,0,0,1.57,0.01905,1.1525,0.0889
2,3,-0.56355,0.0,0.03175,0,0,3.14,0.01905,1.12713,0.0889
3,4,0.0,-0.56356,0.03175,0,0,-1.57319,0.01905,1.1525,0.0889


Now I will get a vector to represent each wall. Taking the cross product of this vector with the trajectory of the robot going into or out of a corner gives the angle at which the robot enters or leaves.

In [44]:
#get vector
from numpy import ones,vstack
from numpy.linalg import lstsq
wall_v = {}
for i in enclosure_df.index:
    # Start point: wall centre shifted back by half of sizeX along the yaw direction
    x1 = enclosure_df.at[i, 'X'] - (enclosure_df.at[i,'sizeX'] / 2.0) * np.cos(enclosure_df.at[i,'Yaw'])
    y1 = enclosure_df.at[i, 'Y'] - (enclosure_df.at[i,'sizeX'] / 2.0) * np.sin(enclosure_df.at[i,'Yaw'])
    # End point: start point moved by the wall length (sizeY)
    x2 = x1 + (enclosure_df.at[i,'sizeY']*np.sin(enclosure_df.at[i,'Yaw']))
    y2 = y1 + (enclosure_df.at[i,'sizeY']*np.cos(enclosure_df.at[i,'Yaw']))
    
    v = (x2-x1, y2-y1)
    print(v)
    
    enclosure_df.at[i,'vector_x'] = v[0]
    enclosure_df.at[i,'vector_y'] = v[1]
enclosure_df


(0.0, 1.12713)
(1.1524996345789396, 0.0009177665341201235)
(0.0017951268817597565, -1.1271285704920615)
(-1.1524966982784248, -0.002758705734466793)


Unnamed: 0,name,X,Y,Z,Roll,Pitch,Yaw,sizeX,sizeY,sizeZ,vector_x,vector_y
0,1,0.56355,0.0,0.03175,0,0,0.0,0.01905,1.12713,0.0889,0.0,1.12713
1,2,0.0,0.56356,0.03175,0,0,1.57,0.01905,1.1525,0.0889,1.1525,0.000918
2,3,-0.56355,0.0,0.03175,0,0,3.14,0.01905,1.12713,0.0889,0.001795,-1.127129
3,4,0.0,-0.56356,0.03175,0,0,-1.57319,0.01905,1.1525,0.0889,-1.152497,-0.002759
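Since only the difference between the two end points is stored, the start point cancels out and the vector depends only on sizeY and Yaw. A minimal sketch of that observation follows; `wall_vector` is a hypothetical helper that follows the same sin/cos convention as the cell above, and the inputs are the rail02 values from the table.

```python
import numpy as np

def wall_vector(size_y, yaw):
    """Wall vector using the same convention as the notebook cell (hypothetical helper)."""
    # (x2 - x1, y2 - y1) reduces to sizeY * (sin(yaw), cos(yaw)).
    return (size_y * np.sin(yaw), size_y * np.cos(yaw))

# rail02 from the enclosure table: sizeY = 1.1525, Yaw = 1.57
v = wall_vector(1.1525, 1.57)
print(v)
```

Up to floating-point rounding, this should agree with the vector_x and vector_y values stored for rail02.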


## Bounce angle

To get the bounce angle, two lines are needed. The first is the wall line, which is found above. The second is a line fitted through the point where the wall is hit and the points from the previous k time steps.

In [45]:
MAX_K = 5
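One possible way to turn the two lines into an angle is sketched below. It is not the notebook's implementation: `trajectory_direction` and `angle_to_wall` are hypothetical helpers, the path is made up, and the line fit uses an SVD (total least squares) so that vertical trajectories do not need special handling.

```python
import numpy as np

def trajectory_direction(points):
    """Unit direction of the best-fit line through (x, y) points given in time order."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    # The first right-singular vector is the direction of the best-fit line.
    _, _, vt = np.linalg.svd(centered)
    direction = vt[0]
    # Orient it along the direction of travel (first point to last point).
    if np.dot(direction, pts[-1] - pts[0]) < 0:
        direction = -direction
    return direction

def angle_to_wall(wall_vector, travel_direction):
    """Acute angle in degrees between the wall line and the travel line."""
    w = np.asarray(wall_vector, dtype=float)
    t = np.asarray(travel_direction, dtype=float)
    cross = w[0] * t[1] - w[1] * t[0]  # z component of the 2D cross product
    dot = np.dot(w, t)
    # abs() on both terms makes the result independent of either vector's sign.
    return np.degrees(np.arctan2(abs(cross), abs(dot)))

# Made-up path: the last point is the hit, the others are the previous k = 3 steps.
path = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.2), (0.3, 0.3)]
wall = (0.0, 1.12713)  # vector of rail01 from the enclosure table
bounce_angle = angle_to_wall(wall, trajectory_direction(path))
print(bounce_angle)
```

The fit needs at least two distinct points. With a stationary robot the direction is arbitrary, so such rows would have to be skipped.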

## Output CSV

In [None]:
#df.to_csv(datalog_DIR + "results.csv")

## Debug

In [None]:
#Find rows with more than 1 wall
for i in df.index:
    if df.at[i, 'NumberOfWalls'] > 1:
        print(i)