# Reading an iOS CSV Data File from the Physics Toolbox App
***

On iOS, the Physics Toolbox app creates a file, `sensor.csv`, that has the following form:

<pre>
time,gFx,gFy,gFz,gFTotal
2018-10-03 17:23:55.4640,0.035,-0.517,-0.821,0.971
2018-10-03 17:23:55.4650,0.035,-0.517,-0.821,0.971
2018-10-03 17:23:55.4650,0.035,-0.517,-0.821,0.971
</pre>

As you can see, the time data is a complicated string that includes the year-month-day as well as the time of day down to thousandths of a second.

Luckily, `pandas` has a solution for dealing with this. 

In [1]:
import numpy as np
import pandas as pd

Let's look at our data first!

In [2]:
data = pd.read_csv("../data/sensor_ios.csv")      # Read the sensor data in
data.head(3)                               # Display top 3 rows

Unnamed: 0,time,gFx,gFy,gFz,gFTotal
0,2018-10-02 15:47:05.5170,0.005,0.009,-0.982,0.982
1,2018-10-02 15:47:05.5260,0.005,0.006,-0.973,0.973
2,2018-10-02 15:47:05.5520,0.008,-0.007,-0.977,0.977
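
The `Unnamed: 0` column in this output suggests that the file was saved with its row index as a first, unnamed column. As a minimal sketch under that assumption, passing `index_col=0` to `read_csv` tells pandas to use that column as the index rather than as data. The `csv_text` string below is a hypothetical stand-in for the file on disk, built from the rows shown above.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file: the header starts with an empty name,
# which is what produces the "Unnamed: 0" column when read with default options.
csv_text = """,time,gFx,gFy,gFz,gFTotal
0,2018-10-02 15:47:05.5170,0.005,0.009,-0.982,0.982
1,2018-10-02 15:47:05.5260,0.005,0.006,-0.973,0.973
2,2018-10-02 15:47:05.5520,0.008,-0.007,-0.977,0.977
"""

# index_col=0 uses the first column as the row index
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample.columns.tolist())
```

The rest of this walkthrough keeps the default read, so the `Unnamed: 0` column stays visible in the tables.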


The datetime in the CSV is just a line of text (a *string*), formatted a particular way.
Let's convert this string into a number of seconds.

In [3]:
data.head(3)["time"]  # head(x) returns the first x rows of the dataframe

0    2018-10-02 15:47:05.5170
1    2018-10-02 15:47:05.5260
2    2018-10-02 15:47:05.5520
Name: time, dtype: object

Pandas has a neat function, `to_datetime`, that understands the formatting of *most* "date strings" and converts them to a **datetime object**.

In [4]:
data["time"] = pd.to_datetime(data.time)   # to_datetime converts the date string into the datetime object
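
When the layout of the date string is known, `to_datetime` also accepts an explicit `format` argument instead of guessing. The sketch below assumes the year-month-day layout seen in the rows above; the two-row `raw` series is a made-up sample, not the real file.

```python
import pandas as pd

# Two sample strings in the same layout as the "time" column
raw = pd.Series(["2018-10-02 15:47:05.5170", "2018-10-02 15:47:05.5260"])

# %f matches the fractional seconds (up to six digits)
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S.%f")
print(parsed)
```

Spelling out the format makes the parsing stricter: a row that does not match raises an error instead of being interpreted silently in some other way.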

Let's see how the table looks now.

In [5]:
data.head(3)        

Unnamed: 0,time,gFx,gFy,gFz,gFTotal
0,2018-10-02 15:47:05.517,0.005,0.009,-0.982,0.982
1,2018-10-02 15:47:05.526,0.005,0.006,-0.973,0.973
2,2018-10-02 15:47:05.552,0.008,-0.007,-0.977,0.977


It looks the same! But if we look at the time column alone...

In [6]:
data.head(3)["time"]

0   2018-10-02 15:47:05.517
1   2018-10-02 15:47:05.526
2   2018-10-02 15:47:05.552
Name: time, dtype: datetime64[ns]

Note that the **type** of our updated "time" column is now **datetime64[ns]**. The [ns] in brackets means that each value is *actually stored as a whole number of nanoseconds* (counted from 1970-01-01 00:00:00), and the pandas engine formats it as a readable date on output. We can use this to get time in seconds.
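
To make the "stored in nanoseconds" idea concrete, here is a small sketch using a single `pd.Timestamp` built from the first time value above. Its `value` attribute exposes the underlying integer count of nanoseconds.

```python
import pandas as pd

ts = pd.Timestamp("2018-10-02 15:47:05.517")

# The raw integer behind the pretty date: nanoseconds since 1970-01-01
print(ts.value)

# The last nine digits are the fractional second, expressed in nanoseconds
print(ts.value % 1_000_000_000)

# Feeding the integer back in recovers the same timestamp
print(pd.Timestamp(ts.value) == ts)
```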

First, we don't want to count time from some distant reference date, so we subtract the **initial time** from every entry in the column. The initial time is:

In [7]:
# find the initial time
t0 = data.loc[0, "time"]
print( t0 )

# "loc" (short for location) lets you choose an element in the data frame based on its row and column labels, like so: dataframe.loc[row, column]
# I am using 0 as the row label because I want the first row, and Python starts counting at 0, not 1 like us.
# I am then specifying the column called "time".

2018-10-02 15:47:05.517000


Now, subtract the initial time from all of the time measurements and store the result in a new column, **lab_time (ns)** (short for laboratory time):

In [8]:
data["lab_time (ns)"] = data["time"] - t0
data.head(3)

Unnamed: 0,time,gFx,gFy,gFz,gFTotal,lab_time (ns)
0,2018-10-02 15:47:05.517,0.005,0.009,-0.982,0.982,00:00:00
1,2018-10-02 15:47:05.526,0.005,0.006,-0.973,0.973,00:00:00.009000
2,2018-10-02 15:47:05.552,0.008,-0.007,-0.977,0.977,00:00:00.035000


You can see that now we have a new column that stores the time since I pressed *start*. It is still stored as a time difference (a *timedelta*) rather than a plain number. I can convert it to nanoseconds by simply **forcing pandas to output lab_time as a number**:

In [9]:
data["lab_time (ns)"] = data["lab_time (ns)"].astype(np.int64)    
# astype converts a column from one data type into the one in parentheses (if possible), here from a time difference to an integer (number of nanoseconds)
# I use np.int64 to guarantee a 64-bit integer. A 32-bit integer tops out at 2,147,483,647, which is only about 2.1 seconds' worth of nanoseconds

data.head(3)

Unnamed: 0,time,gFx,gFy,gFz,gFTotal,lab_time (ns)
0,2018-10-02 15:47:05.517,0.005,0.009,-0.982,0.982,0
1,2018-10-02 15:47:05.526,0.005,0.006,-0.973,0.973,9000000
2,2018-10-02 15:47:05.552,0.008,-0.007,-0.977,0.977,35000000
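
As a quick check on why the integer width matters when counting nanoseconds, this sketch compares the largest values that 32-bit and 64-bit signed integers can hold, expressed as a time span.

```python
import numpy as np

int32_max = np.iinfo(np.int32).max   # largest 32-bit signed integer
int64_max = np.iinfo(np.int64).max   # largest 64-bit signed integer

# Longest span each type can hold when counting nanoseconds
seconds_in_int32 = int32_max / 1e9
years_in_int64 = int64_max / 1e9 / (3600 * 24 * 365.25)

print(int32_max, "ns is about", round(seconds_in_int32, 2), "seconds")
print("int64 covers about", round(years_in_int64), "years")
```

A recording longer than a couple of seconds would overflow a 32-bit count, while a 64-bit count has room for centuries.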


Finally, I want to convert my time in nanoseconds to time in seconds:

In [10]:
data["lab_time (s)"] = data["lab_time (ns)"]/1e9
data.head(3)

Unnamed: 0,time,gFx,gFy,gFz,gFTotal,lab_time (ns),lab_time (s)
0,2018-10-02 15:47:05.517,0.005,0.009,-0.982,0.982,0,0.0
1,2018-10-02 15:47:05.526,0.005,0.006,-0.973,0.973,9000000,0.009
2,2018-10-02 15:47:05.552,0.008,-0.007,-0.977,0.977,35000000,0.035


The end result is a DataFrame that has two additional columns, **lab_time (ns)** and **lab_time (s)**. The last column is the elapsed time in seconds and will likely be the most useful one for any further analysis.
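
For reuse, the steps above can be collected into one helper. This is only a sketch: the function name `add_elapsed_seconds` and the small `demo` frame are hypothetical. It also swaps the integer conversion for the `.dt.total_seconds()` accessor, which goes from the time difference straight to seconds without the intermediate nanosecond column.

```python
import pandas as pd

def add_elapsed_seconds(df, time_col="time", out_col="lab_time (s)"):
    """Return a copy of df with a column of seconds elapsed since the first row."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])      # string -> datetime
    elapsed = out[time_col] - out[time_col].iloc[0]    # datetime -> timedelta
    out[out_col] = elapsed.dt.total_seconds()          # timedelta -> float seconds
    return out

# Made-up three-row frame using the time strings shown earlier
demo = pd.DataFrame({"time": ["2018-10-02 15:47:05.5170",
                              "2018-10-02 15:47:05.5260",
                              "2018-10-02 15:47:05.5520"]})
result = add_elapsed_seconds(demo)
print(result)
```

Working on a copy leaves the original DataFrame untouched, so the helper can be rerun safely on the same data.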