<h1> NumMobility for Mobility Data Preprocessing </h1>

<p>
The NumMobility library stores mobility data (trajectories) in a specialised
pandas DataFrame structure called NumPandasTraj. As a result, the following
constraints must be met for data to be stored in a NumPandasTraj.

<ol>
   <li>
        First, for a mobility dataset to work with the NumMobility library, it needs
        to have the following mandatory columns present:
       <ul>
           <li> DateTime </li>
           <li> Trajectory ID </li>
           <li> Latitude </li>
           <li> Longitude </li>
       </ul>
   </li>
   <li>
       Second, NumPandasTraj places a specific constraint on the index of the
       dataframe: the library enforces a multi-index consisting of the
       <b><i> Trajectory ID, DateTime </i></b> columns, because the library's operations
       depend on these 2 columns. It is therefore recommended
       not to change the index and to keep the multi-index of <b><i> Trajectory ID, DateTime </i></b>
       at all times.
   </li>
</ol>
</p>
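
As a rough illustration of the index constraint described above, the following sketch builds a comparable multi-index with plain pandas. It does not use NumPandasTraj itself; the toy data and the input column names `id`, `datetime`, `lat` and `lon` are assumptions made for the example, and the index names `traj_id` and `DateTime` are taken from the outputs shown in this notebook.

```python
import pandas as pd

# Toy data containing the four mandatory kinds of columns.
# The column names and values here are illustrative assumptions.
raw = pd.DataFrame({
    "id": [1, 1, 2],
    "datetime": ["2008-10-23 05:53:11", "2008-10-23 05:53:16", "2009-05-27 14:00:00"],
    "lat": [39.984224, 39.984211, 61.24783],
    "lon": [116.319402, 116.319389, 24.58617],
})

# Parse the timestamps, then index by (trajectory ID, timestamp),
# mirroring the index names that appear in the notebook outputs.
raw["datetime"] = pd.to_datetime(raw["datetime"])
indexed = (
    raw.rename(columns={"id": "traj_id", "datetime": "DateTime"})
       .set_index(["traj_id", "DateTime"])
       .sort_index()
)

print(list(indexed.index.names))
print(indexed)
```

Only the latitude and longitude remain as regular columns here; the trajectory ID and timestamp live in the index.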

<hr>

<p>
This Jupyter notebook contains a gentle introduction to the NumMobility library. <br>
The following functionalities are demonstrated in this Jupyter notebook:

<ol>
   <li>
       Import data and store it in a NumPandasTraj DataFrame.
   </li>
   <li>
       Generate various temporal features, i.e., features related to the DateTime
       (timestamps) present in the DataFrame. Features like Date, Time, Day of the Week,
       Time of Day, etc. are calculated using the library functions and the results
       are appended to the original dataframe.
   </li>
   <li>
       Note that execution times are also displayed for each executed cell
       to show how fast the library is compared to other openly available
       libraries for mobility data preprocessing.
   </li>
</ol>
</p>

<hr>

Here, we are going to work with the following 2 datasets:
<ul>
   <li> <a href="https://github.com/YakshHaranwala/NumMobility/blob/main/examples/data/geolife_sample.csv" target="_blank"> Geolife-Sample </a> </li>
   <li> <a href="https://github.com/YakshHaranwala/NumMobility/blob/main/examples/data/gulls.csv" target="_blank"> Seagulls Dataset </a> </li>
</ul>

Without further ado, let's jump into NumPandasTraj and explore the various
functionalities provided.

<h1> NumMobility Temporal Features </h1>

In [11]:
# First, let's import the DataFrame class, the temporal features
# and the pandas library to read the CSV files.
from core.TrajectoryDF import NumPandasTraj as NumTrajDF
from features.temporal_features import TemporalFeatures as temporal
import pandas as pd

In [12]:
%%time

# First, let's import gulls.csv and convert it into
# a NumPandasTraj.

gulls = pd.read_csv('./data/gulls.csv')
print(f"The data type of dataframe read from CSV: {type(gulls)}")
gulls = NumTrajDF(data_set=gulls,
                 latitude='location-lat',
                 longitude='location-long',
                 datetime='timestamp',
                 traj_id='tag-local-identifier',
                 rest_of_columns=[])
print(f"The data type after converting to NumTrajDF: {type(gulls)}")
gulls.head()

The data type of dataframe read from CSV: <class 'pandas.core.frame.DataFrame'>
The data type after converting to NumTrajDF: <class 'core.TrajectoryDF.NumPandasTraj'>
CPU times: user 301 ms, sys: 12.1 ms, total: 314 ms
Wall time: 312 ms


traj_id,DateTime,event-id,visible,lon,lat,sensor-type,individual-taxon-canonical-name,individual-local-identifier,study-name
91732,2009-05-27 14:00:00,1082620685,True,24.58617,61.24783,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...
91732,2009-05-27 20:00:00,1082620686,True,24.58217,61.23267,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...
91732,2009-05-28 05:00:00,1082620687,True,24.53133,61.18833,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...
91732,2009-05-28 08:00:00,1082620688,True,24.582,61.23283,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...
91732,2009-05-28 14:00:00,1082620689,True,24.5825,61.23267,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...


In [13]:
%%time

# Now let's import geolife_sample.csv and convert it into
# a NumPandasTraj DataFrame.

pdf = pd.read_csv('./data/geolife_sample.csv')
geolife_df = NumTrajDF(data_set=pdf,
                      latitude='lat',
                      longitude='lon',
                      datetime='datetime',
                      traj_id='id')

print(f"The data type of dataframe read from CSV: {type(pdf)}")
print(f"The data type after converting to NumTrajDF: {type(geolife_df)}")
geolife_df.head()

The data type of dataframe read from CSV: <class 'pandas.core.frame.DataFrame'>
The data type after converting to NumTrajDF: <class 'core.TrajectoryDF.NumPandasTraj'>
CPU times: user 542 ms, sys: 20.1 ms, total: 562 ms
Wall time: 561 ms


traj_id,DateTime,lat,lon
1,2008-10-23 05:53:11,39.984224,116.319402
1,2008-10-23 05:53:16,39.984211,116.319389
1,2008-10-23 05:53:21,39.984217,116.319422
1,2008-10-23 05:53:23,39.98471,116.319865
1,2008-10-23 05:53:28,39.984674,116.31981


In [14]:
%%time

# Plot the geolife trajectories on a folium map.
geolife_df.plot_folium_traj(color='blue',
                            opacity=0.5,
                            weight=3)

CPU times: user 616 ms, sys: 7.85 ms, total: 624 ms
Wall time: 623 ms


In [15]:
%%time

# First, we will perform the aforementioned operations on the
# geolife-sample dataset.

# Let's generate the Time column and derive the time of day
# from the timestamp provided in the data.
# New columns will be added to the dataset indicating the time
# at which each point was recorded and the period of the day.


geolife_df = temporal.create_time_column(geolife_df)
geolife_df = temporal.create_time_of_day_column(geolife_df)
geolife_df.head()

CPU times: user 565 ms, sys: 20.4 ms, total: 586 ms
Wall time: 583 ms


traj_id,DateTime,lat,lon,Time,Time_Of_Day
1,2008-10-23 05:53:11,39.984224,116.319402,05:53:11,Early Morning
1,2008-10-23 05:53:16,39.984211,116.319389,05:53:16,Early Morning
1,2008-10-23 05:53:21,39.984217,116.319422,05:53:21,Early Morning
1,2008-10-23 05:53:23,39.98471,116.319865,05:53:23,Early Morning
1,2008-10-23 05:53:28,39.984674,116.31981,05:53:28,Early Morning
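
To make the two derived columns more concrete, here is a minimal plain-pandas sketch of how a Time and a Time_Of_Day column could be computed from a `DateTime` index level. This is not the library's implementation. The 4-hour bins and the labels "Late Night", "Morning" and "Night" are assumptions; only "Early Morning", "Noon" and "Evening" appear in the outputs of this notebook, and the bin edges were chosen so that they agree with those outputs.

```python
import pandas as pd

# Small illustrative frame with the same index layout as the notebook outputs.
idx = pd.MultiIndex.from_arrays(
    [
        [1, 2, 2, 2],
        pd.to_datetime([
            "2008-10-23 05:53:11",
            "2009-05-27 14:00:00",
            "2009-05-27 20:00:00",
            "2009-05-28 08:00:00",
        ]),
    ],
    names=["traj_id", "DateTime"],
)
df = pd.DataFrame({"lat": [39.984224, 61.24783, 61.23267, 61.23283]}, index=idx)

# Pull the timestamps out of the multi-index and keep only the clock time.
stamps = df.index.get_level_values("DateTime")
df["Time"] = stamps.time

# ASSUMPTION: right-inclusive 4-hour bins with hypothetical labels.
# The library may use different boundaries and names.
bins = [-1, 4, 8, 12, 16, 20, 24]
labels = ["Late Night", "Early Morning", "Morning", "Noon", "Evening", "Night"]
df["Time_Of_Day"] = pd.cut(stamps.hour.to_numpy(), bins=bins, labels=labels)

print(df)
```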


In [16]:
%%time

"""
Here, we will create 3 new columns that contain the following features:
    1. The date on which the data point was recorded.
    2. The day of the week on which the data point was recorded.
    3. A boolean column indicating whether the point was recorded
       on a weekend or not.

    All the operations are being performed on the geolife-sample dataset.
"""
geolife_df = temporal.create_date_column(geolife_df)
geolife_df = temporal.create_day_of_week_column(geolife_df)
geolife_df = temporal.create_weekend_indicator_column(geolife_df)
geolife_df.head()


CPU times: user 1.02 s, sys: 16.2 ms, total: 1.04 s
Wall time: 1.04 s


traj_id,DateTime,lat,lon,Time,Time_Of_Day,Date,Day_Of_Week,Weekend
1,2008-10-23 05:53:11,39.984224,116.319402,05:53:11,Early Morning,2008-10-23,Thursday,False
1,2008-10-23 05:53:16,39.984211,116.319389,05:53:16,Early Morning,2008-10-23,Thursday,False
1,2008-10-23 05:53:21,39.984217,116.319422,05:53:21,Early Morning,2008-10-23,Thursday,False
1,2008-10-23 05:53:23,39.98471,116.319865,05:53:23,Early Morning,2008-10-23,Thursday,False
1,2008-10-23 05:53:28,39.984674,116.31981,05:53:28,Early Morning,2008-10-23,Thursday,False
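
For comparison, the same three features can be sketched with plain pandas datetime accessors. This is only an illustration under the assumption that the timestamps sit in an index level named `DateTime`, as in the outputs above; it is not how the library itself computes them, and the toy data is made up for the example.

```python
import pandas as pd

# Illustrative frame: a Thursday, a Saturday and a Wednesday.
idx = pd.MultiIndex.from_arrays(
    [
        [1, 1, 2],
        pd.to_datetime([
            "2008-10-23 05:53:11",
            "2008-10-25 09:00:00",
            "2009-05-27 14:00:00",
        ]),
    ],
    names=["traj_id", "DateTime"],
)
df = pd.DataFrame({"lat": [39.984224, 39.984674, 61.24783]}, index=idx)

stamps = df.index.get_level_values("DateTime")

df["Date"] = stamps.date                # calendar date of each point
df["Day_Of_Week"] = stamps.day_name()   # e.g. "Thursday"
df["Weekend"] = stamps.dayofweek >= 5   # Monday is 0, so 5 and 6 are the weekend

print(df)
```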


In [17]:
%%time

# Now, moving on to the seagulls dataset.

# Let's generate the Time column and derive the time of day
# from the timestamp provided in the data.
# New columns will be added to the dataset indicating the time
# at which each point was recorded and the period of the day.

gulls = temporal.create_time_column(gulls)
gulls = temporal.create_time_of_day_column(gulls)
gulls.head()

CPU times: user 197 ms, sys: 11.6 ms, total: 209 ms
Wall time: 207 ms


traj_id,DateTime,event-id,visible,lon,lat,sensor-type,individual-taxon-canonical-name,individual-local-identifier,study-name,Time,Time_Of_Day
91732,2009-05-27 14:00:00,1082620685,True,24.58617,61.24783,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...,14:00:00,Noon
91732,2009-05-27 20:00:00,1082620686,True,24.58217,61.23267,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...,20:00:00,Evening
91732,2009-05-28 05:00:00,1082620687,True,24.53133,61.18833,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...,05:00:00,Early Morning
91732,2009-05-28 08:00:00,1082620688,True,24.582,61.23283,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...,08:00:00,Early Morning
91732,2009-05-28 14:00:00,1082620689,True,24.5825,61.23267,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...,14:00:00,Noon


In [18]:
%%time

"""
Here, we will create 3 new columns that contain the following features:
    1. The date on which the data point was recorded.
    2. The day of the week on which the data point was recorded.
    3. A boolean column indicating whether the point was recorded
       on a weekend or not.

    All the operations are being performed on the seagulls dataset.
"""
gulls = temporal.create_date_column(gulls)
gulls = temporal.create_day_of_week_column(gulls)
gulls = temporal.create_weekend_indicator_column(gulls)
gulls.head()

CPU times: user 395 ms, sys: 4.24 ms, total: 400 ms
Wall time: 398 ms


traj_id,DateTime,event-id,visible,lon,lat,sensor-type,individual-taxon-canonical-name,individual-local-identifier,study-name,Time,Time_Of_Day,Date,Day_Of_Week,Weekend
91732,2009-05-27 14:00:00,1082620685,True,24.58617,61.24783,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...,14:00:00,Noon,2009-05-27,Wednesday,False
91732,2009-05-27 20:00:00,1082620686,True,24.58217,61.23267,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...,20:00:00,Evening,2009-05-27,Wednesday,False
91732,2009-05-28 05:00:00,1082620687,True,24.53133,61.18833,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...,05:00:00,Early Morning,2009-05-28,Thursday,False
91732,2009-05-28 08:00:00,1082620688,True,24.582,61.23283,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...,08:00:00,Early Morning,2009-05-28,Thursday,False
91732,2009-05-28 14:00:00,1082620689,True,24.5825,61.23267,gps,Larus fuscus,91732A,Navigation experiments in lesser black-backed ...,14:00:00,Noon,2009-05-28,Thursday,False


In [19]:
%%time

"""
Now, another feature provided in the library calculates the
duration of the trajectory present in the dataset. The result
obtained from it depends on 2 conditions:
    1. If the user wants the duration of a particular ID, then
       the library returns the duration of that particular ID only.
    2. However, if no particular ID is mentioned by the user, then
       the library calculates the trajectory duration for each
       unique trajectory ID present in the dataset.
"""
# First, let's get the trajectory durations for all the
# unique trajectories present in the dataset.

durations = temporal.get_traj_duration(gulls)
durations

CPU times: user 614 ms, sys: 87.9 ms, total: 702 ms
Wall time: 1.81 s


traj_id,Traj_Duration
91732,519 days 00:00:00
91733,11 days 15:00:00
91734,140 days 13:55:00
91735,122 days 00:00:00
91737,276 days 02:00:00
...,...
91920,42 days 19:00:00
91921,12 days 00:00:00
91924,6 days 00:00:00
91929,11 days 09:00:00
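
The idea behind a per-trajectory duration can be sketched with a plain pandas group-by: take the latest timestamp of each trajectory and subtract the earliest one. This is a sketch under the assumption that duration means last timestamp minus first timestamp; it is not the library's implementation, and the toy data is made up. Only the column name `Traj_Duration` is borrowed from the output above.

```python
import pandas as pd

# Toy trajectories: ID 1 spans one hour, ID 2 spans one day.
idx = pd.MultiIndex.from_arrays(
    [
        [1, 1, 1, 2, 2],
        pd.to_datetime([
            "2008-10-23 05:53:11",
            "2008-10-23 05:53:16",
            "2008-10-23 06:53:11",
            "2009-05-27 14:00:00",
            "2009-05-28 14:00:00",
        ]),
    ],
    names=["traj_id", "DateTime"],
)
points = pd.DataFrame({"lat": [39.98, 39.98, 39.99, 61.24, 61.23]}, index=idx)

# Move the index levels into ordinary columns so they can be grouped.
stamps = points.index.to_frame(index=False)
grouped = stamps.groupby("traj_id")["DateTime"]

# ASSUMPTION: duration = last timestamp - first timestamp per trajectory.
durations = (grouped.max() - grouped.min()).rename("Traj_Duration").to_frame()
print(durations)

# Duration of a single trajectory, selected by its ID.
print(durations.loc[1, "Traj_Duration"])
```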


In [20]:
%%time

# Now, let's get the trajectory duration for only a single
# trajectory, the one with Trajectory ID 1, present in the
# geolife-sample dataset.

delta_two = temporal.get_traj_duration(geolife_df, traj_id='1')
delta_two

CPU times: user 114 ms, sys: 171 µs, total: 114 ms
Wall time: 116 ms


DateTime   52 days 18:38:07
dtype: timedelta64[ns]