    This notebook reads a datalog CSV, cleans it, adds the data we want for our model, and writes the result to a new CSV file
    Justin Wasserman - 

## Import and Verify Datalog

In [123]:
import pandas as pd
import numpy as np

In [124]:
datalog_DIR = '../../data/'

In [125]:
datalogFile = datalog_DIR + '02-05-2019_13-21-48.csv'
#read_csv fills empty fields with NaN by default
#Only the WallId(s) column should contain NaN

df = pd.read_csv(datalogFile, sep=',')
df.head()



Unnamed: 0,Time,ID,X,Y,Yaw,ResetID,checkCorrectness,NumberOfWalls,WallId(s)
0,0 1000000,0,0.0,0.0,0.0,1,1,0,
1,0 2000000,0,0.0,0.0,0.0,1,1,0,
2,0 3000000,0,0.0,0.0,0.0,1,1,0,
3,0 4000000,0,0.0,0.0,0.0,1,1,0,
4,0 5000000,0,0.0,0.0,0.0,1,1,0,


In [127]:
#Drop the last row in df: the datalog is sometimes stopped mid-write,
#which leaves NaNs in the final row, so it is safest to always drop it.
df.drop(df.tail(1).index,inplace=True) # drop last row

In [128]:
#Verify that only WallId(s) has NaN in it
NaNs = df.isnull().any() #True for each column that contains at least one NaN
#Exactly one column should contain NaN, and it should be WallId(s)
if NaNs.sum() != 1 or not NaNs['WallId(s)']:
    print("[cleans_minimal] Expected NaNs in WallId(s) only")


## Time

The Time column has the form "seconds milliseconds", separated by a space, where the millisecond value (0 to 999) is followed by six zeros. So "0 1000000" is 1 millisecond and "0 10000000" is 10 milliseconds. A whole number of seconds is the exception: the second field is then a single 0, so one second is written "1 0", not "1 000000".

In [129]:
for i in df.index:
    (second, millisecond) = df['Time'][i].split(' ')
    second = float(second)
    if(millisecond != '0'):
        millisecond = float(millisecond[:-6]) / 1000.0
    else:
        millisecond = float(millisecond)
    df.at[i, 'Time'] = second + millisecond
df.head()

Unnamed: 0,Time,ID,X,Y,Yaw,ResetID,checkCorrectness,NumberOfWalls,WallId(s)
0,0.001,0,0.0,0.0,0.0,1,1,0,
1,0.002,0,0.0,0.0,0.0,1,1,0,
2,0.003,0,0.0,0.0,0.0,1,1,0,
3,0.004,0,0.0,0.0,0.0,1,1,0,
4,0.005,0,0.0,0.0,0.0,1,1,0,
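
One possible reading of the format described above is that the second field is a nanosecond count (1000000 ns = 1 ms, and a bare 0 for whole seconds). Under that assumption, the conversion can be sketched without a Python-level loop. The function name `parse_time_column` and the sample strings are illustrative assumptions. Unlike the loop above, which discards the last six digits, this sketch would keep any sub-millisecond digits.

```python
import pandas as pd

def parse_time_column(raw):
    """Convert 'seconds nanoseconds' strings to float seconds.

    Assumes the second field is a nanosecond count, which matches the
    examples in the text ('0 1000000' -> 1 ms, '1 0' -> 1 s).
    """
    parts = raw.str.split(" ", expand=True)  # column 0: seconds, column 1: nanoseconds
    seconds = parts[0].astype(float)
    nanoseconds = parts[1].astype(float)
    return seconds + nanoseconds / 1e9

# Made-up sample values in the datalog's time format.
toy = pd.Series(["0 1000000", "0 10000000", "1 0", "2 999000000"])
parsed = parse_time_column(toy)
print(parsed.tolist())
```

On the raw datalog, this would be applied as `df['Time'] = parse_time_column(df['Time'])`, which also leaves the column with a float dtype.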


## Check Correctness

The Gazebo simulator verifies that the ball is in a hub and that the hubs/weaselballs are within the environment. checkCorrectness is the variable written to the datalog to indicate whether the simulator was running correctly for a given timestep, so any row with checkCorrectness = 0 should be removed.

In [130]:
df = df[df.checkCorrectness != 0]
df.head()

Unnamed: 0,Time,ID,X,Y,Yaw,ResetID,checkCorrectness,NumberOfWalls,WallId(s)
0,0.001,0,0.0,0.0,0.0,1,1,0,
1,0.002,0,0.0,0.0,0.0,1,1,0,
2,0.003,0,0.0,0.0,0.0,1,1,0,
3,0.004,0,0.0,0.0,0.0,1,1,0,
4,0.005,0,0.0,0.0,0.0,1,1,0,
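
A small sketch of the filter above on made-up data shows two things that are easy to overlook: how many rows were dropped, and that the surviving rows keep their original index labels, so the index has gaps afterwards. The toy values are assumptions for illustration only.

```python
import pandas as pd

# Toy frame: the middle row failed the simulator's correctness check.
toy = pd.DataFrame({
    "Time": [0.001, 0.002, 0.003],
    "checkCorrectness": [1, 0, 1],
})

valid = toy[toy["checkCorrectness"] != 0]
dropped = len(toy) - len(valid)
print(f"Dropped {dropped} of {len(toy)} rows")

# The index is not renumbered: label 1 is simply missing.
print(list(valid.index))
```

Because of these gaps, consecutive rows after filtering are not guaranteed to be one timestep apart.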


## WallId(s) / NumberOfWalls

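Since the Gazebo simulator makes the models shoot away after a collision, I will add a heuristic: if a wall was touched within the last n ms and there is no collision in the current row, the row is treated as still colliding with that wall. The code counts rows rather than milliseconds, which is equivalent as long as consecutive rows are 1 ms apart, as in the rows shown above.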

In [131]:
n = 5 #milliseconds since last collision

In [133]:
rowsSinceLastWall = 0
lastWall = None
lastNumberOfWalls = None
for i in df.index:
    rowNumberOfWalls = df['NumberOfWalls'][i]
    rowWallIds = df['WallId(s)'][i]
    if rowNumberOfWalls > 0:
        rowsSinceLastWall = 0
        lastWall = rowWallIds
        lastNumberOfWalls = rowNumberOfWalls
    elif rowsSinceLastWall < n and lastWall is not None:
        df.at[i, 'NumberOfWalls'] = lastNumberOfWalls
        df.at[i, 'WallId(s)'] = lastWall
    rowsSinceLastWall += 1
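
The loop above measures the window in rows. As a hedged alternative, the same heuristic can be sketched against the Time column, so that the window is a true time span even when rows are missing. The function name `carry_forward_walls`, the parameter `window_s`, and the toy data are assumptions for illustration. The sketch expects Time to be numeric seconds already.

```python
import numpy as np
import pandas as pd

def carry_forward_walls(frame, window_s):
    """Copy the most recent wall contact onto later contact-free rows
    whose Time is at most window_s seconds after that contact."""
    out = frame.copy()
    # Object dtype so wall IDs of any form can be written back.
    out["WallId(s)"] = out["WallId(s)"].astype(object)
    last_time = None
    last_walls = None
    last_count = None
    for i in out.index:
        # Read from the original frame so filled rows never extend the window.
        count = frame.at[i, "NumberOfWalls"]
        if count > 0:
            last_time = frame.at[i, "Time"]
            last_walls = frame.at[i, "WallId(s)"]
            last_count = count
        elif last_time is not None and frame.at[i, "Time"] - last_time <= window_s:
            out.at[i, "NumberOfWalls"] = last_count
            out.at[i, "WallId(s)"] = last_walls
    return out

# Made-up rows: one contact at 2 ms, then a gap in the log before 10 ms.
toy = pd.DataFrame({
    "Time": [0.001, 0.002, 0.003, 0.010],
    "NumberOfWalls": [0, 1, 0, 0],
    "WallId(s)": [np.nan, "7", np.nan, np.nan],
})
result = carry_forward_walls(toy, window_s=0.005)
print(result)
```

Here the row at 3 ms inherits the contact with wall "7", while the row at 10 ms does not, because it lies outside the 5 ms window. The function returns a copy, so the input frame is left unchanged.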

In [134]:
#Total wall contacts across all rows, after applying the heuristic
total = df['NumberOfWalls'].sum()
total

15547

## Output CSV

In [135]:
df.to_csv(datalog_DIR + "results.csv")

## Debug

In [137]:
#Find rows with more than one wall
for i in df.index:
    if df.at[i, 'NumberOfWalls'] > 1:
        print(i)

42986
42987
42988
42989
42990
42991
42992
42993
42994
42995
42996
42997
42998
42999
43000
