<span>
<img src="https://raw.githubusercontent.com/scikit-mobility/scikit-mobility/master/logo_skmob.png" width="260px" align="right"/>
</span>
<span>
<b>Author:</b> <a href="https://kdd.isti.cnr.it/people/cornacchia-giuliano">Giuliano Cornacchia</a><br/>
<b>Python version:</b> 3.8.5<br/>
<b>Scikit-mobility version:</b>  1.2.2<br/>
<b>Last update:</b> 08/10/2021
</span>

# Scikit-Mobility
### Tutorial - Human Mobility Networks
___

<i>"Human mobility is the discipline that studies the movements of individuals in space and time."</i>
<br><br>
`scikit-mobility` is a Python library for human mobility analysis.

This notebook introduces the main concepts of the library, with a focus on data **preprocessing**, **visualization**, and the creation of a **mobility network**.

**Note:** this notebook is purposely not comprehensive; it only covers the basics you need to get started.

<img src="https://media.springernature.com/m685/springer-static/image/art%3A10.1038%2Fncomms9166/MediaObjects/41467_2015_Article_BFncomms9166_Fig1_HTML.jpg"  width="500px" height="auto">
<br><br>

## Table of Contents

1. [Installing scikit-mobility](#install)
2. [Introduction and Data Structures](#into_ds) 
    1. [Trajectory](#trajectory)
    2. [Spatial Tessellation](#spatial_tess)
    3. [Flow](#flow)
3. [Preprocessing Mobility Data](#preprocess)
    1. [Noise Filtering](#noise)
    2. [Trajectory Compression](#compression)
4. [Hands on tutorial: create a mobility network from real data](#tutorial)
    1. [Dataset loading](#dataloading)
    2. [Dataset preprocessing](#pptutorial)
    3. [Mobility Network creation](#mobnet)
    4. [Export the Mobility Network](#export)
5. [Conclusion](#conclusion)
6. [Exercises](#exercise)





## 1. Installing scikit-mobility<a id='install'></a>

The first step is to install `scikit-mobility` and check that it is working.

The installation can be performed with conda.

`conda install -c conda-forge scikit-mobility`

`scikit-mobility` can be installed on Google Colab using the following commands:

    !apt-get install -qq curl g++ make
    !curl -L http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.5.tar.gz | tar xz
    import os
    os.chdir('spatialindex-src-1.8.5')
    !./configure
    !make
    !make install
    !pip install rtree
    !ldconfig
    !pip install scikit-mobility

To check that `scikit-mobility` is installed, try to import it:

In [3]:
import skmob

The project is available on GitHub at https://github.com/scikit-mobility

If you would like to contribute to the `scikit-mobility` project, feel free to fork it, open an issue, or contact the developers.

<a id="into_ds"></a>
## 2. Introduction and Data Structures

### 2.1 Trajectory <a id="trajectory"></a>

#### Definition
____
The trajectory of an individual is a sequence of records that allows for reconstructing their movements during the period of observation. <br>

A trajectory $T_u$ for an individual $u$ is defined as a **time-ordered sequence** of spatial points, usually **GPS points**:

$T_u=\langle (l_1, t_1), \ldots, (l_n, t_n) \rangle$, where:
- $l_i=(x_i, y_i)$ is a location with coordinates $x_i$ and $y_i$;
- $t_i<t_j$ if $i<j$.
<br><br>
<img src="https://i.ibb.co/crmZmdT/img-def-trajectory.png"  width="500px" height="auto"><br><br>
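The time-ordering condition $t_i<t_j$ only needs to be checked on consecutive pairs, since $<$ is transitive. A minimal pure-Python sketch; the helper `is_time_ordered` and the `(lat, lng, timestamp)` tuple layout are illustrative assumptions, not part of `scikit-mobility`:

```python
from datetime import datetime


def is_time_ordered(trajectory):
    """Return True if timestamps strictly increase (t_i < t_j whenever i < j).

    `trajectory` is a list of (lat, lng, timestamp) tuples (hypothetical layout).
    """
    times = [t for _, _, t in trajectory]
    # comparing consecutive pairs is enough because '<' is transitive
    return all(t_prev < t_next for t_prev, t_next in zip(times, times[1:]))


fmt = "%Y-%m-%d %H:%M:%S"
traj = [
    (52.516091, 13.378148, datetime.strptime("2021-10-11 08:10:05", fmt)),
    (52.516469, 13.377842, datetime.strptime("2021-10-11 08:11:45", fmt)),
    (52.517760, 13.376662, datetime.strptime("2021-10-11 08:11:59", fmt)),
]
print(is_time_ordered(traj))        # True
print(is_time_ordered(traj[::-1]))  # False
```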

#### Trajectory dataframe
___
In `scikit-mobility` a set of trajectories is modeled through a `TrajDataFrame` data structure that extends the pandas `DataFrame`.
<br><br>

Each row of a `TrajDataFrame` describes a point of a trajectory and contains the following columns:

```
    lat: latitude of the point
    lng: longitude of the point
    datetime: timestamp of the point
```
For multi-user datasets, there is an optional column:

    uid: identifier of the user the trajectory belongs to

In short: **WHO**, **WHEN**, and **WHERE**.

Let's create a `TrajDataFrame` from a list of spatio-temporal points describing the movements of two users (ID 24 and ID 11) walking in Berlin to reach the ACAI-2021 conference.
<br><br>
Each element in the list is in the form of: `[user_id, latitude, longitude, timestamp]`.

In [None]:
list_of_points = [[24, 52.516091, 13.378148, '2021-10-11 08:10:05'], # start user 24
                  [24, 52.516469, 13.377842, '2021-10-11 08:11:45'],
                  [24, 52.517760, 13.376662, '2021-10-11 08:11:59'],
                  [24, 52.517745, 13.376325, '2021-10-11 08:12:33'],
                  [24, 52.517775, 13.369990, '2021-10-11 08:22:32'],
                  [24, 52.517321, 13.369051, '2021-10-11 08:33:32'],
                  [24, 52.517658, 13.361681, '2021-10-11 08:35:12'],
                  [24, 52.517116, 13.354575, '2021-10-11 08:44:02'],
                  [24, 52.514408, 13.348412, '2021-10-11 08:50:12'],
                  [24, 52.513306, 13.331626, '2021-10-11 08:51:02'],
                  [24, 52.515921, 13.327897, '2021-10-11 08:58:05'], #end user 24
                  [11, 52.506203, 13.332373, '2021-10-11 08:30:15'], #start user 11
                  [11, 52.513286, 13.322168, '2021-10-11 08:37:24'],
                  [11, 52.514590, 13.322360, '2021-10-11 08:44:02'],      
                  [11, 52.517522, 13.324961, '2021-10-11 08:49:15'],                  
                  [11, 52.515921, 13.328082, '2021-10-11 08:57:07']] #end user 11
              

# indexes of the user ID, latitude, longitude, and datetime fields in each record
tdf = skmob.TrajDataFrame(list_of_points, user_id=0, latitude=1, longitude=2, datetime=3)
print(type(tdf))

In [None]:
# the TrajDataFrame contains the two trajectories;
# sort it by uid and datetime to ensure the continuity of each trajectory
tdf = tdf.sort_by_uid_and_datetime()
tdf

On a `TrajDataFrame` we can perform the same operations as on a pandas DataFrame.

In [None]:
# example 1: filtering
tdf.query("uid==24")

In [None]:
# example 2: group by (average coordinates of each user)
tdf.groupby('uid', as_index=False)[['lat', 'lng']].mean()

Many other pandas DataFrame methods work as well.
<br><br>
`scikit-mobility` can visualize the trajectories of a `TrajDataFrame` on a `Folium` map with the method `plot_trajectory`.

In [None]:
import warnings
warnings.filterwarnings('ignore')
from skmob.utils.plot import *

tdf.plot_trajectory(zoom=13, weight=3, opacity=0.9, start_end_markers=True)

### 2.2 Spatial Tessellation <a id="spatial_tess"></a>

#### Definition
___
A **spatial tessellation** is a discretization of the spatial region into a set of non-overlapping **tiles**. Usually the tiles are squares or hexagons.
Each tile represents a location.
<br><br>
<img src="https://i.ibb.co/k8586pG/img-def-tessellation.png"  width="500px" height="auto"><br><br>

#### Spatial Tessellation
____
In `scikit-mobility` a tessellation is represented through a geopandas `GeoDataFrame`, a data structure that extends the pandas `DataFrame`.
<br><br>

Each row of a `GeoDataFrame` describes a tile and contains the following columns:

    tile_ID: identifier of the tile
    geometry: geometric shape of the tile
    

In `scikit-mobility` we can obtain a tessellation with `tilers.tiler.get`, specifying the shape, the region (`base_shape`), and the granularity (`meters`) of the tessellation.

The available shapes are squared and hexagonal.

In [None]:
from skmob.tessellation import tilers

tess_berlin = tilers.tiler.get("squared", base_shape="Berlin, Germany", meters=2000)
type(tess_berlin)

In [None]:
tess_berlin[:5]

Visualize the tessellation on a map.
`scikit-mobility` can visualize a `GeoDataFrame` on a `Folium` map with the function `plot_gdf`.

In [None]:
# style of the tessellation
tex_style = {'fillColor':'gray', 'color':'black', 'opacity': 0.2}

plot_gdf(tess_berlin, style_func_args=tex_style, zoom=10, popup_features=['tile_ID'])

The spatial tessellation can be used to map GPS points to their corresponding tile.
After the mapping, each point is usually assigned the coordinates of the **centroid** of its tile.<br><br>
The `mapping` method assigns each point of the `TrajDataFrame` to the corresponding tile of a spatial tessellation. If a point does not fall in any tile, its `tile_ID` is the special value `NaN`.

In [None]:
#mapping the trajectories w.r.t. the spatial tessellation
tdf.mapping(tess_berlin)[:5]

<img src="https://i.ibb.co/Nnr4Q9T/mapped.png" width="500px" height="auto">

Important!
- the mapping causes a loss of trajectory details;
- the finer the tessellation, the fewer details are lost.
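To get a feel for this loss of detail, the sketch below snaps a few of user 24's Berlin points to the centroid of a square tile and measures how far each point moves, for several tile sizes. It is only an illustration under stated assumptions: the helper names, the equirectangular projection, and the grid origin are hypothetical choices, and this is not how `scikit-mobility` builds or applies its tessellation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumption)


def to_local_xy(lat, lng, lat0, lng0):
    """Project (lat, lng) to metres east/north of (lat0, lng0).

    Equirectangular approximation: adequate at city scale, not for large regions.
    """
    x = EARTH_RADIUS_M * math.radians(lng - lng0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, y


def snap_to_centroid(x, y, side_m):
    """Return the (column, row) index of the square tile containing (x, y) and its centroid."""
    i, j = math.floor(x / side_m), math.floor(y / side_m)
    return (i, j), ((i + 0.5) * side_m, (j + 0.5) * side_m)


def max_displacement(points, side_m, origin):
    """Largest distance (metres) between a point and the centroid it is snapped to."""
    worst = 0.0
    for lat, lng in points:
        x, y = to_local_xy(lat, lng, *origin)
        _, (cx, cy) = snap_to_centroid(x, y, side_m)
        worst = max(worst, math.hypot(x - cx, y - cy))
    return worst


# four points of user 24; the grid origin is an arbitrary (hypothetical) choice
points = [(52.516091, 13.378148), (52.517775, 13.369990),
          (52.514408, 13.348412), (52.515921, 13.327897)]
origin = (52.50, 13.30)

for side_m in (2000, 500, 100):
    print(f"tile side {side_m:>4} m: max displacement "
          f"{max_displacement(points, side_m, origin):.0f} m")
```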

### 2.3 Flow <a id="flow"></a>

#### Definition
____
A **flow**, in human mobility, describes the **movements (flows)** of individuals **between locations**.
While trajectories refer to movements of single objects, flows refer to aggregated movements of objects between a set of locations.

Formally, a flow is an $n\times m$ matrix $M$, where:
- $n$ is the number of distinct origin locations;
- $m$ is the number of distinct destination locations;
- the element $M_{ij}$ contains the number of individuals moving from location $i$ to location $j$ during the observation period.
<br><br>

A flow represented as a weighted directed graph is a <b>Mobility Network</b>.
<br><br>
<img src="https://i.ibb.co/wSbjdLY/img-def-flow.png"  width="500px" height="auto"><br><br>
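The definition of $M$ can be illustrated in plain Python. The sketch below counts, for each user, the moves between consecutive tiles and then lays the counts out as an $n\times m$ matrix. The input format (a dict from user ID to a time-ordered list of tile IDs) and the function name are assumptions; note that it counts moves, which coincides with the number of individuals only when each individual makes a given move at most once.

```python
from collections import Counter


def od_flows(tile_sequences, self_loops=False):
    """Count moves between consecutive tiles of each user's time-ordered sequence.

    tile_sequences: dict mapping user id -> list of tile IDs (hypothetical format).
    Returns a Counter {(origin, destination): number of moves}.
    """
    flows = Counter()
    for tiles in tile_sequences.values():
        for origin, destination in zip(tiles, tiles[1:]):
            if origin == destination and not self_loops:
                continue  # staying in the same tile is not a move between locations
            flows[(origin, destination)] += 1
    return flows


sequences = {24: ["A", "A", "B", "C"], 11: ["C", "B", "B"]}
flows = od_flows(sequences)

# the same flows as an n x m matrix M: rows are origins, columns are destinations
origins = sorted({o for o, _ in flows})
destinations = sorted({d for _, d in flows})
M = [[flows.get((o, d), 0) for d in destinations] for o in origins]
print(origins, destinations)
print(M)  # [[1, 0], [0, 1], [1, 0]]
```

The `Counter` keyed by `(origin, destination)` is already a weighted edge list; the matrix form is convenient only when the number of tiles is small.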


#### FlowDataFrame
___
In `scikit-mobility` a flow is modeled through a `FlowDataFrame` data structure that extends the pandas `DataFrame`.
<br><br>

Each row of a `FlowDataFrame` describes a flow and contains the following columns:

    origin: ID of the origin tile
    destination: ID of the destination tile
    flow: number of people travelling from origin to destination
    

A `FlowDataFrame` is associated with a spatial tessellation that discretizes the spatial region into a set of tiles.

___

In `scikit-mobility` a `FlowDataFrame` can be obtained from a `TrajDataFrame` with the method `to_flowdataframe`.


Create the `FlowDataFrame` from the `TrajDataFrame` of the two users walking in Berlin, using the squared tessellation.

In [None]:
fdf = tdf.to_flowdataframe(tessellation=tess_berlin, self_loops=False)
type(fdf)

In [None]:
fdf

Visualize the flows on a map.

`scikit-mobility` can visualize the flows of a `FlowDataFrame` on a `Folium` map with the method `plot_flows`.

In [None]:
tex_style = {'fillColor':'gray', 'color':'black', 'opacity': 0.1}

#first plot the spatial tessellation
map_f = fdf.plot_tessellation(style_func_args=tex_style, tiles='CartoDB positron', 
                                   zoom=12)

# then, using map_f as argument, plot the flows
map_f = fdf.plot_flows(map_f=map_f, flow_color='green', tiles='CartoDB positron',
                       opacity=1, flow_weight=2, radius_origin_point=2)

# finally, plot the original trajectories on the same map
tdf.plot_trajectory(map_f=map_f, hex_color='red', weight=3, opacity=0.9,
                    start_end_markers=False)


## 3. Preprocessing Mobility Data <a id="preprocess"></a>

The main pre-processing steps to deal with mobility data are:
- noise filtering;
- trajectory compression.

In [None]:
# create a fake noisy trajectory

noise_traj =  [[7, 52.505377, 13.440478, '2021-10-11 08:50:12'],#East Side gallery
               [7, 52.516561, 13.4461621, '2021-10-11 08:50:13'],#"Noise" point (fast and far)
               [7, 52.506509, 13.4375157 , '2021-10-11 08:55:02'],
               [7, 52.508453, 13.43466914, '2021-10-11 08:58:05'],#Start group of close points
               [7, 52.508370, 13.43471270, '2021-10-11 08:58:11'],
               [7, 52.508560, 13.43507037, '2021-10-11 08:58:15'],#End group of close points
               [7, 52.509067, 13.43552048, '2021-10-11 09:07:17']]

#create the TrajDataFrame
noisy_tdf = skmob.TrajDataFrame(noise_traj, user_id=0, latitude=1, longitude=2, datetime=3)
noisy_tdf

Let's visualize the trajectory

In [None]:
noisy_tdf.plot_trajectory(hex_color="red", zoom=14)

### 3.1 Noise filtering <a id="noise"></a>

`scikit-mobility` provides the `filter` function, which removes the points whose speed from the previous point exceeds `max_speed_kmh` km/h.

In [None]:
from skmob.preprocessing import filtering, compression

max_speed_kmh = 200
tdf_filtered = filtering.filter(noisy_tdf, max_speed_kmh=max_speed_kmh, 
                                    include_loops=False)
print("Filtered "+str(len(noisy_tdf)-len(tdf_filtered))+" GPS point.")

In [None]:
tdf_filtered

Let's verify this by computing the speed between the first two points. The function `getDistanceByHaversine(p0, p1)` returns the distance in km between points `p0` and `p1`.

In [None]:
dt = 1/3600  # 1 second, expressed in hours
p0 = noisy_tdf[['lat','lng']].iloc[0].values
p1 = noisy_tdf[['lat','lng']].iloc[1].values

distance = skmob.utils.gislib.getDistanceByHaversine(p0, p1)

print("Speed: "+str(distance/dt)+" km/h.")
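The same speed computation, applied point by point, gives a simplified version of the noise filter. The sketch below is a plain-Python illustration, not the `scikit-mobility` implementation: the helper names are hypothetical, the Earth radius of 6371 km is an assumption, and each point is compared with the last *kept* point.

```python
import math
from datetime import datetime


def haversine_km(p0, p1):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    lat0, lng0, lat1, lng1 = map(math.radians, (*p0, *p1))
    a = (math.sin((lat1 - lat0) / 2) ** 2
         + math.cos(lat0) * math.cos(lat1) * math.sin((lng1 - lng0) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def speed_filter(points, max_speed_kmh):
    """Keep a point only if it is reachable from the last kept point within max_speed_kmh.

    points: time-ordered list of (lat, lng, datetime) tuples.
    """
    if not points:
        return []
    kept = [points[0]]
    for lat, lng, t in points[1:]:
        prev_lat, prev_lng, prev_t = kept[-1]
        hours = (t - prev_t).total_seconds() / 3600
        km = haversine_km((prev_lat, prev_lng), (lat, lng))
        # a point with the same timestamp as the last kept one has undefined speed: drop it
        if hours > 0 and km / hours <= max_speed_kmh:
            kept.append((lat, lng, t))
    return kept


fmt = "%Y-%m-%d %H:%M:%S"
raw = [(52.505377, 13.440478, "2021-10-11 08:50:12"),
       (52.516561, 13.4461621, "2021-10-11 08:50:13"),  # the "noise" point
       (52.506509, 13.4375157, "2021-10-11 08:55:02"),
       (52.508453, 13.43466914, "2021-10-11 08:58:05")]
points = [(lat, lng, datetime.strptime(t, fmt)) for lat, lng, t in raw]

kept = speed_filter(points, max_speed_kmh=200)
print(f"kept {len(kept)} of {len(points)} points")
```

Comparing with the last kept point, rather than the raw previous point, prevents a single outlier from also causing the next valid point to be rejected.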

### 3.2 Trajectory compression <a id="compression"></a>

`scikit-mobility` provides the `compress` function to reduce the number of points in a trajectory. All points within a radius of `spatial_radius_km` kilometers from a given initial point are compressed into a single point that has the **median coordinates** of all points and the time of the initial point.

<img src="https://i.ibb.co/HrBZqsH/compression.png" width="500px" height="auto">
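The compression rule described above can be sketched in plain Python. This is an illustration under assumptions (hypothetical helper names, a 6371 km Earth radius, points as `(lat, lng, time)` tuples), not the library's code; it uses the last five points of the noisy example and the same 30 m radius.

```python
import math
from statistics import median


def haversine_km(p0, p1):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    lat0, lng0, lat1, lng1 = map(math.radians, (*p0, *p1))
    a = (math.sin((lat1 - lat0) / 2) ** 2
         + math.cos(lat0) * math.cos(lat1) * math.sin((lng1 - lng0) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def merge_run(run):
    """One point with the median lat/lng of the run and the time of its first point."""
    return (median(p[0] for p in run), median(p[1] for p in run), run[0][2])


def compress(points, radius_km):
    """Merge each run of consecutive points lying within radius_km of the run's first point."""
    compressed, run = [], []
    for p in points:
        if run and haversine_km(run[0][:2], p[:2]) > radius_km:
            compressed.append(merge_run(run))
            run = []
        run.append(p)
    if run:
        compressed.append(merge_run(run))
    return compressed


points = [(52.506509, 13.4375157, "2021-10-11 08:55:02"),
          (52.508453, 13.43466914, "2021-10-11 08:58:05"),  # start of the close group
          (52.508370, 13.43471270, "2021-10-11 08:58:11"),
          (52.508560, 13.43507037, "2021-10-11 08:58:15"),  # end of the close group
          (52.509067, 13.43552048, "2021-10-11 09:07:17")]

compressed = compress(points, radius_km=30 / 1000)  # 30 meters
for point in compressed:
    print(point)
```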


In [1]:
spatial_radius_km = 30/1000 # 30 meters
tdf_compressed = compression.compress(tdf_filtered, spatial_radius_km=spatial_radius_km)

print("Compressed "+str(len(tdf_filtered)-len(tdf_compressed))+" GPS points.")


Visualization of the noisy (red) and pre-processed (blue) trajectory.

In [None]:
map_f = noisy_tdf.plot_trajectory(hex_color="red", zoom=14)

tdf_compressed.plot_trajectory(map_f=map_f, hex_color="blue", zoom=14)

## 4. Hands on tutorial: create a mobility network from real data <a id="tutorial"></a>

In this tutorial we create a mobility network of New York City describing about one week of movements.

### 4.1 Dataset loading <a id="dataloading"></a>

The dataset contains long-term (about 10 months) Foursquare check-in data in New York City, from 12 April 2012 to 16 February 2013.
The dataset contains 8 columns, which are:

    1. User ID (anonymized)
    2. Venue ID (Foursquare)
    3. Venue category ID (Foursquare)
    4. Venue category name (Foursquare)
    5. Latitude
    6. Longitude
    7. Timezone offset in minutes (The offset in minutes between when this check-in occurred and the same time in UTC)
    8. UTC time
<br>    
The dataset was collected by Dingqi Yang et al. [1].
<br><br><br>
[1] <i>Dingqi Yang, Daqing Zhang, Vincent W. Zheng, Zhiyong Yu. Modeling User Activity Preference by Leveraging User Spatial Temporal Characteristics in LBSNs. IEEE Trans. on Systems, Man, and Cybernetics: Systems, (TSMC), 45(1), 129-142, 2015.</i>

Download the dataset and read it using pandas.

In [4]:
import pandas as pd

# the url at which the dataset is available
url_fs = 'https://drive.google.com/uc?export=download&id=1idA3yrFUpGlNpa466ZuN5udbeGVRYO_s'

#download the dataset and open it in a pandas DataFrame
df = pd.read_csv(url_fs, sep='\t', 
                 names=['uid','venue_id','venue_category_id','venue_category_name','lat',
                        'lng','tmz','datetime'], encoding="ISO-8859-1", header=None)

df.head()

|   | uid | venue_id | venue_category_id | venue_category_name | lat | lng | tmz | datetime |
|---|---|---|---|---|---|---|---|---|
| 0 | 470 | 49bbd6c0f964a520f4531fe3 | 4bf58dd8d48988d127951735 | Arts & Crafts Store | 40.71981 | -74.002581 | -240 | Tue Apr 03 18:00:09 +0000 2012 |
| 1 | 979 | 4a43c0aef964a520c6a61fe3 | 4bf58dd8d48988d1df941735 | Bridge | 40.6068 | -74.04417 | -240 | Tue Apr 03 18:00:25 +0000 2012 |
| 2 | 69 | 4c5cc7b485a1e21e00d35711 | 4bf58dd8d48988d103941735 | Home (private) | 40.716162 | -73.88307 | -240 | Tue Apr 03 18:02:24 +0000 2012 |
| 3 | 395 | 4bc7086715a7ef3bef9878da | 4bf58dd8d48988d104941735 | Medical Center | 40.745164 | -73.982519 | -240 | Tue Apr 03 18:02:41 +0000 2012 |
| 4 | 87 | 4cf2c5321d18a143951b5cec | 4bf58dd8d48988d1cb941735 | Food Truck | 40.740104 | -73.989658 | -240 | Tue Apr 03 18:03:00 +0000 2012 |


There are many interesting attributes, but let's keep it simple.<br>
We select only the attributes that are necessary to create the `TrajDataFrame`.

In [5]:
#remember: WHO, WHERE, WHEN
df = df[['uid','lat','lng','datetime']]
df.head()

|   | uid | lat | lng | datetime |
|---|---|---|---|---|
| 0 | 470 | 40.71981 | -74.002581 | Tue Apr 03 18:00:09 +0000 2012 |
| 1 | 979 | 40.6068 | -74.04417 | Tue Apr 03 18:00:25 +0000 2012 |
| 2 | 69 | 40.716162 | -73.88307 | Tue Apr 03 18:02:24 +0000 2012 |
| 3 | 395 | 40.745164 | -73.982519 | Tue Apr 03 18:02:41 +0000 2012 |
| 4 | 87 | 40.740104 | -73.989658 | Tue Apr 03 18:03:00 +0000 2012 |


Create the `TrajDataFrame`

In [6]:
tdf = skmob.TrajDataFrame(df, user_id='uid', latitude='lat', longitude='lng', 
                          datetime='datetime')
tdf = tdf.sort_by_uid_and_datetime()
tdf.head()

|   | uid | lat | lng | datetime |
|---|---|---|---|---|
| 2454 | 1 | 40.781558 | -73.975792 | 2012-04-04 23:31:31+00:00 |
| 3660 | 1 | 40.784018 | -73.974524 | 2012-04-07 17:42:24+00:00 |
| 5603 | 1 | 40.739398 | -73.99321 | 2012-04-08 18:20:29+00:00 |
| 5783 | 1 | 40.785677 | -73.976498 | 2012-04-08 20:02:10+00:00 |
| 6696 | 1 | 40.719929 | -74.008532 | 2012-04-09 16:20:52+00:00 |


Note that in the original dataset the datetime is in UTC.
We convert it to the New York City time zone (`US/Eastern`) as follows:

In [7]:
tdf['datetime'] = tdf['datetime'].dt.tz_convert('US/Eastern')
tdf.head()

|   | uid | lat | lng | datetime |
|---|---|---|---|---|
| 2454 | 1 | 40.781558 | -73.975792 | 2012-04-04 19:31:31-04:00 |
| 3660 | 1 | 40.784018 | -73.974524 | 2012-04-07 13:42:24-04:00 |
| 5603 | 1 | 40.739398 | -73.99321 | 2012-04-08 14:20:29-04:00 |
| 5783 | 1 | 40.785677 | -73.976498 | 2012-04-08 16:02:10-04:00 |
| 6696 | 1 | 40.719929 | -74.008532 | 2012-04-09 12:20:52-04:00 |
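The raw dataset also provides, for each check-in, its own offset from UTC in minutes (column 7, `tmz`), which the column selection above discards. As an alternative to `tz_convert`, a standard-library sketch could apply that offset directly. The helper name `local_time` is hypothetical, and the format string assumes the raw `UTC time` values look like those in the first output table.

```python
from datetime import datetime, timedelta, timezone


def local_time(utc_string, tmz_minutes):
    """Parse a raw 'UTC time' value and shift it by the check-in's own offset (tmz column)."""
    utc = datetime.strptime(utc_string, "%a %b %d %H:%M:%S %z %Y")
    return utc.astimezone(timezone(timedelta(minutes=tmz_minutes)))


t = local_time("Tue Apr 03 18:00:09 +0000 2012", -240)
print(t.isoformat())  # 2012-04-03T14:00:09-04:00
```

Using the per-row offset keeps whatever offset was recorded for each check-in, whereas `tz_convert('US/Eastern')` derives it from the time-zone rules.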


Select only the GPS points recorded between 10 May 2012 (inclusive) and 18 May 2012 (exclusive), roughly one week of data. Both bounds are expressed in UTC.

In [9]:
start = pd.to_datetime('2012/05/10 00:00:00',utc=True)
end = pd.to_datetime('2012/05/18 00:00:00',utc=True)

tdf_week = tdf[(tdf['datetime']>=start)&(tdf['datetime']<end)]

#used to sort the TrajDataFrame by uid and time (to ensure the continuity of trajectories)
tdf_week = tdf_week.sort_by_uid_and_datetime()
tdf_week.head()

|   | uid | lat | lng | datetime |
|---|---|---|---|---|
| 60549 | 1 | 40.71969 | -74.008543 | 2012-05-11 12:10:55-04:00 |
| 61756 | 1 | 40.786766 | -73.975734 | 2012-05-11 21:20:06-04:00 |
| 62929 | 1 | 40.771426 | -73.973501 | 2012-05-12 14:40:38-04:00 |
| 63516 | 1 | 40.784101 | -73.977686 | 2012-05-12 18:44:11-04:00 |
| 63680 | 1 | 40.776187 | -73.98242 | 2012-05-12 20:01:58-04:00 |


Print some statistics

In [10]:
#one week statistics

print("There are "+str(len(tdf_week))+" GPS points.")
print("There are "+str(len(tdf_week['uid'].unique()))+" users.")

There are 16061 GPS points.
There are 944 users.


Visualize the trajectories of up to 100 users

In [13]:
tdf_week.plot_trajectory(max_users=100, zoom=9, start_end_markers=False)

From the visualization, it is evident that the dataset contains trajectories made outside New York City (e.g., in New Jersey).
Let's begin the data cleaning!

### 4.2 Preprocessing <a id="pptutorial"></a>

Create and visualize the spatial tessellation relative to New York City.

In [14]:
tess_nyc = tilers.tiler.get("squared", base_shape="New York City, USA", meters=2000)

tex_style = {'fillColor':'gray', 'color':'black', 'opacity': 0.2}
plot_gdf(tess_nyc, style_func_args=tex_style, zoom=10)


Keep only the trajectories that lie entirely inside the spatial tessellation. <br>
Two steps are necessary:
- map each point to a tile;
- discard every trajectory with at least one point outside the region.

Map each point to a tile. <br>
Points that do not have a corresponding tile in the spatial tessellation have `tile_ID=NaN`.

In [None]:
mapped_tdf = tdf_week.mapping(tess_nyc, remove_na=False)
mapped_tdf[2004:2008]

In [None]:
points_outside = mapped_tdf[mapped_tdf['tile_ID'].isna()]
uid_outside = points_outside['uid'].unique()
tdf_week_nyc = mapped_tdf[~mapped_tdf['uid'].isin(uid_outside)]

print(str(len(points_outside))+" GPS points outside New York City")
print(str(len(uid_outside))+" trajectories with at least one GPS point outside New York City")

Visualize the trajectories in New York City

In [None]:
tdf_week_nyc.plot_trajectory(max_users=100, zoom=10, start_end_markers=False)

#### Filtering

In [None]:
max_speed_kmh = 300
tdf_filtered_nyc = filtering.filter(tdf_week_nyc, max_speed_kmh=max_speed_kmh)

print("Filtered "+str(len(tdf_week_nyc)-len(tdf_filtered_nyc))+" GPS points.")

#### Compression

In [None]:
spatial_radius_km = 10/1000 #10 meters
tdf_compressed_nyc = compression.compress(tdf_filtered_nyc, 
                                          spatial_radius_km=spatial_radius_km)

print("Compressed "+str(len(tdf_filtered_nyc)-len(tdf_compressed_nyc))+" GPS points.")

In [None]:
print("Statistics after pre-processing:\t")

print("There are "+str(len(tdf_compressed_nyc))+" GPS points.")
print("There are "+str(len(tdf_compressed_nyc['uid'].unique()))+" users.")

Filter out users with only one GPS point (no mobility can be inferred)

In [None]:
# compute the number of points for each user
tdf_gb = tdf_compressed_nyc.groupby(['uid'],as_index=False).count()
tdf_gb[:3]

In [None]:
# list of users with more than one GPS point
users_to_keep = tdf_gb.query("lat>1")['uid']

# TrajDataFrame of users WITH mobility (>1 GPS points)
tdf_final = tdf_compressed_nyc[tdf_compressed_nyc['uid'].isin(users_to_keep)]

print("# users with no mobility: "
      +str(len(tdf_compressed_nyc['uid'].unique())-len(users_to_keep)))

In [None]:
print("Final statistics:\t")

print("There are "+str(len(tdf_final))+" GPS points.")
print("There are "+str(len(tdf_final['uid'].unique()))+" users.")

## 4.3 Flow <a id="mobnet"></a>

To create the mobility network, we first aggregate the `TrajDataFrame` into flows.

In [None]:
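# drop the tile_ID column added by the earlier mapping step
tdf_final = tdf_final.drop(['tile_ID'], axis=1)
# build the origin-destination flows between tiles, excluding self loops
fdf = tdf_final.to_flowdataframe(tess_nyc, self_loops=False)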

In [None]:
fdf.head()

Visualize the flow/mobility network

In [None]:
tex_style = {'fillColor':'gray', 'color':'black', 'opacity': 0.1}
map_f = fdf.plot_tessellation(style_func_args=tex_style, tiles='CartoDB positron', 
                                   zoom=10)
fdf.plot_flows(map_f=map_f, flow_color='green', tiles='CartoDB positron',
                   opacity=0.3, flow_weight=0.6, radius_origin_point=1)

What are the most "important" nodes?
- Note the high flow in the area of Manhattan;
- The JFK Airport (tile 518) plays an important role too.
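One simple way to quantify the "importance" of a node is its weighted in-degree (in-strength): the total flow arriving at a tile. A plain-Python sketch, using hypothetical tile IDs and flows in the `(origin, destination, flow)` layout of a `FlowDataFrame`:

```python
from collections import defaultdict


def in_strength(edges):
    """Sum of incoming flow for each destination tile (weighted in-degree)."""
    strength = defaultdict(int)
    for _origin, destination, flow in edges:
        strength[destination] += flow
    return dict(strength)


# hypothetical edges: (origin, destination, flow)
edges = [("t1", "t2", 3), ("t3", "t1", 5), ("t2", "t1", 2), ("t3", "t2", 1)]
ranking = sorted(in_strength(edges).items(), key=lambda kv: kv[1], reverse=True)
print(ranking)  # [('t1', 7), ('t2', 4)]
```

On the real data, `fdf.groupby('destination')['flow'].sum().sort_values(ascending=False)` should give the same kind of ranking directly in pandas.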

In [None]:
to_208 = fdf[fdf['destination']=="208"]
others = fdf[fdf['destination']!="208"]

map_f = fdf.plot_tessellation(style_func_args=tex_style, tiles='CartoDB positron',zoom=10)


#others
map_f = others.plot_flows(map_f=map_f, flow_color='green', tiles='CartoDB positron',
                   opacity=0.3, flow_weight=0.6, radius_origin_point=1)
#to 208
map_f = to_208.plot_flows(map_f=map_f, flow_color='blue', tiles='CartoDB positron',
                   opacity=0.7, flow_weight=1.5, radius_origin_point=1)


map_f

## 4.4 Export the Mobility Network <a id="export"></a>

In [None]:
fdf.head()

The `FlowDataFrame` is already a **weighted edge list**, so we can save it as a CSV file and use it later.

In [None]:
fdf.to_csv('mobility_network_nyc_acai2021.csv', sep=",", index=False)
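Since the exported file is a plain weighted edge list, it can be reloaded without `scikit-mobility`. A standard-library sketch, assuming the CSV header is `origin,destination,flow` (the columns described in Section 2.3); the function name is hypothetical and an in-memory sample stands in for the real file:

```python
import csv
import io


def read_edge_list(csv_file):
    """Load an origin,destination,flow CSV into a nested dict {origin: {destination: flow}}."""
    adjacency = {}
    for row in csv.DictReader(csv_file):
        # tile IDs stay strings; float() also accepts flow values written as e.g. "3.0"
        adjacency.setdefault(row["origin"], {})[row["destination"]] = float(row["flow"])
    return adjacency


# hypothetical content; in practice: open('mobility_network_nyc_acai2021.csv', newline='')
sample = io.StringIO("origin,destination,flow\n0,1,3\n2,0,5\n2,1,1\n")
network = read_edge_list(sample)
print(network)  # {'0': {'1': 3.0}, '2': {'0': 5.0, '1': 1.0}}
```

With pandas and networkx installed, `networkx.from_pandas_edgelist(df, 'origin', 'destination', edge_attr='flow', create_using=networkx.DiGraph)` builds the corresponding directed graph from a DataFrame.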

## 5. Conclusions <a id="conclusion"></a>
In this notebook we introduced the basic functionalities of `scikit-mobility`, namely, how to **visualize and represent** trajectories, flows and tessellations, how to **filter and clean raw mobility data** by using standard techniques proposed in the mobility data mining literature, and how to extract a **mobility network** from real data. <br><br>


For any question, please feel free to contact me at giuliano.cornacchia@phd.unipi.it

For issues, suggestions, or bug reports, please contact us on the official GitHub page at https://github.com/scikit-mobility

## 6. Exercises <a id="exercise"></a>

**Exercise 1:**
- Download the dataset containing the GPS traces of 536 taxis operating in San Francisco over 25 days https://crawdad.org/epfl/mobility/20090224/;
- Select only a subset of 5 taxis for this exercise;
- Try to segment the trajectories with respect to each trip (tip: use the attribute `occupancy`);
- Clean and preprocess the dataset using the standard methods;
- Select and visualize a set of trajectories;
- Try to answer the following questions:
    - What are the five most visited tiles of San Francisco?
    - What hour is the peak for departures from the San Francisco airport?
- Create and visualize the mobility network as shown in the tutorial;
- Create and visualize the mobility network by considering only origin and destination points for each trip.

**Exercise 2:**
- Given a trajectory $T$ of $n$ GPS points and a squared spatial tessellation $S$ whose squares have side $x$, what is the upper bound of the "mapping" error caused by mapping $T$ onto $S$? (See Section 2.2 of this tutorial.)