# Data Cleaning

## Read File

In [1]:
# Imports
import pandas as pd
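# ETR is whitespace-delimited text; sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2
telemetry = pd.read_csv('./telemetry.etr', sep=r'\s+')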

## Normalise Data
To anonymise the data, we normalise the Easting and Northing values using min-max scaling, which maps each column onto the range $[0, 1]$:
$$
z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}
$$
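As a quick check of the formula, here is a toy column (illustrative values, not telemetry) with minimum 10 and maximum 50. The smallest value maps to 0, the largest to 1, and the rest land in between in proportion to their distance from the minimum.

```python
import pandas as pd

# Toy values standing in for a coordinate column (not real telemetry)
x = pd.Series([10.0, 20.0, 30.0, 50.0])

# z_i = (x_i - min(x)) / (max(x) - min(x))
z = (x - x.min()) / (x.max() - x.min())
print(z.tolist())  # [0.0, 0.25, 0.5, 1.0]
```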

In [2]:
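# Min-max normalisation: rescale a column onto the range [0, 1]
def normalise(x: pd.Series) -> pd.Series:
	# Avoid shadowing the built-in min and max
	x_min = x.min()
	x_max = x.max()
	# A constant column would divide by zero, so fail loudly instead of returning NaN
	if x_max == x_min:
		raise ValueError(f"Cannot normalise constant column {x.name!r}")
	return (x - x_min) / (x_max - x_min)

telemetry['Easting'] = normalise(telemetry['Easting'])
telemetry['Northing'] = normalise(telemetry['Northing'])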

In [23]:
telemetry.head(5)

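|   | Date | Time | Easting | Northing | WaterDepth | Roll | Pitch | Heading | Tide |
|---|------|------|---------|----------|------------|------|-------|---------|------|
| 0 | 20-02-27 | 20:50:47.502 | 0.158187 | 0.410159 | 92.93 | -6.3 | 2.0 | 19.9 | 0.0 |
| 1 | 20-02-27 | 20:50:47.502 | 0.158187 | 0.410159 | 5.86 | -6.3 | 2.0 | 19.9 | 0.0 |
| 2 | 20-02-27 | 20:50:49.416 | 0.534407 | 0.535898 | 5.86 | -5.0 | 3.1 | 22.8 | 0.0 |
| 3 | 20-02-27 | 20:50:49.416 | 0.534311 | 0.535924 | 6.33 | -5.0 | 3.1 | 22.8 | 0.0 |
| 4 | 20-02-27 | 20:50:49.416 | 0.533734 | 0.536254 | 7.02 | -5.0 | 3.1 | 22.8 | 0.0 |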


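Each of Easting and Northing is now rescaled onto $[0, 1]$. The ordering and relative spacing of values along each axis are preserved, but the absolute locations are hidden, since recovering them requires the original minimum and maximum. Because the two axes are scaled by different factors, distances and angles in the normalised plane no longer match those of the original coordinates.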
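A minimal sketch of both properties, using hypothetical coordinates (not taken from `telemetry.etr`): anyone who keeps each column's minimum and maximum can invert the scaling with $x_i = z_i(\max(x) - \min(x)) + \min(x)$, and because Easting spans 400 m here while Northing spans 100 m, the ratio between two distances changes after normalisation. The helper `distance` is illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical raw coordinates in metres (illustrative only, not from telemetry.etr)
raw = pd.DataFrame({
    "Easting": [500000.0, 500100.0, 500400.0],
    "Northing": [6200000.0, 6200050.0, 6200100.0],
})

# Record each column's (min, max): these act as the "key" to the normalisation
params = {col: (raw[col].min(), raw[col].max()) for col in raw.columns}

# Forward min-max normalisation, one column at a time
scaled = raw.copy()
for col, (lo, hi) in params.items():
    scaled[col] = (raw[col] - lo) / (hi - lo)

# Inverse transform x = z * (max - min) + min recovers the original values
restored = scaled.copy()
for col, (lo, hi) in params.items():
    restored[col] = scaled[col] * (hi - lo) + lo

def distance(df: pd.DataFrame, i: int, j: int) -> float:
    """Euclidean distance between rows i and j in the Easting/Northing plane."""
    de = df["Easting"].iloc[i] - df["Easting"].iloc[j]
    dn = df["Northing"].iloc[i] - df["Northing"].iloc[j]
    return float(np.hypot(de, dn))

# The axes are stretched by different factors (400 m vs 100 m ranges),
# so the ratio of two distances is not preserved by the normalisation
ratio_raw = distance(raw, 0, 1) / distance(raw, 0, 2)
ratio_scaled = distance(scaled, 0, 1) / distance(scaled, 0, 2)
print(round(ratio_raw, 3), round(ratio_scaled, 3))
```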

## Constant Columns
At first glance, the Tide column appears to contain only `0.0`. We can confirm this by listing its unique values:

In [3]:
telemetry['Tide'].unique()

array([0.])

As suspected, the only value is `0.0`. A constant column carries no information the model could learn from, so we drop it.
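The `unique()` check inspects one column by hand. A sketch of a more general version, using a hypothetical helper `constant_columns`, flags every column holding at most one distinct value; it is demonstrated on a small toy frame rather than the real telemetry.

```python
import pandas as pd

def constant_columns(df: pd.DataFrame) -> list[str]:
    """Return the names of columns holding at most one distinct value.

    dropna=False counts NaN as a value, so an all-NaN column is flagged,
    but a column mixing NaN with one other value is not.
    """
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

# Toy frame for illustration (values borrowed from the output above)
sample = pd.DataFrame({
    "WaterDepth": [5.86, 6.33, 7.02],
    "Tide": [0.0, 0.0, 0.0],
})

print(constant_columns(sample))  # ['Tide']
```

The returned list could be passed straight to `DataFrame.drop(columns=...)` to remove every constant column in one call.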

In [4]:
# Drop unnecessary fields (drop returns a new DataFrame; telemetry itself is unchanged)
telemetry_select = telemetry.drop(columns=['Tide'])
telemetry_select.head(5)

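|   | Date | Time | Easting | Northing | WaterDepth | Roll | Pitch | Heading |
|---|------|------|---------|----------|------------|------|-------|---------|
| 0 | 20-02-27 | 20:50:47.502 | 0.158187 | 0.410159 | 92.93 | -6.3 | 2.0 | 19.9 |
| 1 | 20-02-27 | 20:50:47.502 | 0.158187 | 0.410159 | 5.86 | -6.3 | 2.0 | 19.9 |
| 2 | 20-02-27 | 20:50:49.416 | 0.534407 | 0.535898 | 5.86 | -5.0 | 3.1 | 22.8 |
| 3 | 20-02-27 | 20:50:49.416 | 0.534311 | 0.535924 | 6.33 | -5.0 | 3.1 | 22.8 |
| 4 | 20-02-27 | 20:50:49.416 | 0.533734 | 0.536254 | 7.02 | -5.0 | 3.1 | 22.8 |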


## Write to CSV
Now that the telemetry has been processed, we write the DataFrame to disk. The source file was in ETR format, which is whitespace-delimited text; we write it out as a comma-separated CSV instead, since that format is more widely supported.

In [5]:
telemetry_select.to_csv('../ProcessView/Data/telemetry.csv', index=False)
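To illustrate the format change, the sketch below reads a short whitespace-delimited snippet and writes it back out with commas. The snippet is hypothetical sample text modelled on the first few telemetry columns, not read from the ETR file, and it assumes a single header row as in the original file.

```python
import io

import pandas as pd

# Hypothetical whitespace-delimited snippet in the same layout as the ETR columns
etr_text = """Date Time Easting Northing WaterDepth
20-02-27 20:50:47.502 0.158187 0.410159 5.86
20-02-27 20:50:49.416 0.534407 0.535898 6.33
"""

# Parse on runs of whitespace, as done for the real file
df = pd.read_csv(io.StringIO(etr_text), sep=r"\s+")

# With no path argument, to_csv returns the comma-separated text as a string
csv_text = df.to_csv(index=False)
print(csv_text)
```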