# ICT for transport systems
## Laboratory Activity

### Introduction

The goal of this work is to analyze real data from two free-floating car sharing providers: Enjoy and Car2Go.
The city of Turin is used as the case study. The providers' data are enriched with Google Directions API data, both to improve the accuracy of the metrics and to enable comparisons between public transport and car sharing services.
The collection technique depends on the provider: Enjoy data are gathered with a web scraper, while Car2Go and Google data come from their respective APIs.
The first section of this notebook describes and explores the data, while the second focuses on data analysis.

### Step 1 - Preliminary data analysis


The main entities defined in MongoDB are databases and collections. In the following code cells these entities are wrapped in pymongo objects, which expose all the descriptive information we need about the collected data.

In [1]:
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
client

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

In [2]:
client['CSMS_']

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'CSMS_')

In [20]:
# collection_names() is deprecated since PyMongo 3.7; newer versions use list_collection_names()
client['CSMS_'].collection_names()

[u'fleet', u'parks', u'books', u'snapshots']

In [3]:
client['CSMS_']['books']

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'CSMS_'), u'books')

In [4]:
# count() is deprecated since PyMongo 3.7; newer versions use count_documents({})
client['CSMS_']['books'].count()

133987

In [17]:
first_timestamp = client['CSMS_']['books'].find().sort("start", 1).limit(1).next()["start"]
last_timestamp = client['CSMS_']['books'].find().sort("start", -1).limit(1).next()["start"]
first_timestamp, last_timestamp

(datetime.datetime(2016, 12, 4, 19, 59, 58, 829000),
 datetime.datetime(2017, 1, 10, 20, 5, 26, 171000))
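
The two timestamps above bound the observation window of the `books` collection. As a rough sketch, the length of that window and an average booking rate can be derived with plain `datetime` arithmetic. The values are copied by hand from the outputs of cells In [4] and In [17], and the names `span`, `span_days` and `bookings_per_day` are introduced here only for illustration. The rate mixes both providers and assumes there are no gaps in data collection, which the notebook has not verified at this point.

```python
from datetime import datetime

# Values copied from the cell outputs above (assumption: they are still current).
first_timestamp = datetime(2016, 12, 4, 19, 59, 58, 829000)
last_timestamp = datetime(2017, 1, 10, 20, 5, 26, 171000)
total_bookings = 133987  # size of the 'books' collection, both providers

# Length of the observation window as a timedelta, then in fractional days.
span = last_timestamp - first_timestamp
span_days = span.total_seconds() / 86400.0

# Average number of bookings recorded per day over the whole window.
bookings_per_day = total_bookings / span_days

print(span)
print(round(bookings_per_day, 1))
```

This gives only a first-order figure; a per-provider or per-day breakdown would need a query on the collection itself.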

In [13]:
cursor = client['CSMS_']['books'].find({"provider" : "car2go"}).sort("start", 1).limit(1)
cursor.next()

{u'_id': ObjectId('587b6362f5fd7358e43fd25b'),
 u'arrival_time_google_transit': datetime.datetime(2017, 1, 23, 21, 46, 3),
 u'city': u'torino',
 u'departure_time_google': datetime.datetime(2017, 1, 23, 21, 8, 12),
 u'distance_driving': 6.679,
 u'distance_google_transit': 7.111,
 u'duration_driving': 18.483333333333334,
 u'duration_google_transit': 37.85,
 u'duration_in_traffic_google': 16.866666666666667,
 u'end': datetime.datetime(2016, 12, 5, 21, 24, 8, 397000),
 u'end_fuel': 50.0,
 u'end_lat': 45.06555,
 u'end_lon': 7.63681,
 u'fare_google_transit': 1.5,
 u'plate': u'224/FF078SJ',
 u'provider': u'car2go',
 u'start': datetime.datetime(2016, 12, 5, 21, 4, 31, 10000),
 u'start_fuel': 53.0,
 u'start_lat': 45.07784,
 u'start_lon': 7.69347,
 u'tot_duration_google_transit': 41.5333339}
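
The Google fields in this document make it possible to compare the actual rental with the estimated driving and public transport trips. The sketch below does so for this single record, using values copied from the output above. It assumes that the `duration_*` fields are expressed in minutes, which the data do not state explicitly; the names `booking_minutes` and `transit_to_driving_ratio` are hypothetical.

```python
from datetime import datetime

# Subset of the Car2Go document shown above, copied by hand.
booking = {
    "start": datetime(2016, 12, 5, 21, 4, 31, 10000),
    "end": datetime(2016, 12, 5, 21, 24, 8, 397000),
    "duration_driving": 18.483333333333334,   # assumed to be minutes
    "duration_google_transit": 37.85,         # assumed to be minutes
}

# Actual rental duration, converted to minutes for comparison.
booking_minutes = (booking["end"] - booking["start"]).total_seconds() / 60.0

# How much longer the transit estimate is than the driving estimate.
transit_to_driving_ratio = (
    booking["duration_google_transit"] / booking["duration_driving"]
)

print(round(booking_minutes, 2))
print(round(transit_to_driving_ratio, 2))
```

Under the minutes assumption, the rental lasts slightly longer than the driving estimate, while the transit estimate is roughly twice the driving one for this record.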

In [18]:
cursor = client['CSMS_']['books'].find({"provider" : "enjoy"}).sort("start", 1).limit(1)
cursor.next()

{u'_id': ObjectId('587b3f89f5fd733718c45de9'),
 u'city': u'torino',
 u'end': datetime.datetime(2016, 12, 4, 20, 16, 11, 652000),
 u'end_fuel': 4.0,
 u'end_lat': 45.070843,
 u'end_lon': 7.6936703,
 u'plate': u'EZ118GW',
 u'provider': u'enjoy',
 u'start': datetime.datetime(2016, 12, 4, 19, 59, 58, 829000),
 u'start_fuel': 5.0,
 u'start_lat': 45.026306,
 u'start_lon': 7.676534}
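
Comparing the two sample documents shows that they do not share the same schema. A minimal sketch of that comparison, using only the field names copied from the two outputs above, is given below. The names `car2go_fields`, `enjoy_fields` and `only_in_car2go` are introduced for illustration, and the result describes these two documents only, not necessarily every document of each provider.

```python
# Field names of the first Car2Go document shown above.
car2go_fields = {
    "_id", "arrival_time_google_transit", "city", "departure_time_google",
    "distance_driving", "distance_google_transit", "duration_driving",
    "duration_google_transit", "duration_in_traffic_google", "end",
    "end_fuel", "end_lat", "end_lon", "fare_google_transit", "plate",
    "provider", "start", "start_fuel", "start_lat", "start_lon",
    "tot_duration_google_transit",
}

# Field names of the first Enjoy document shown above.
enjoy_fields = {
    "_id", "city", "end", "end_fuel", "end_lat", "end_lon", "plate",
    "provider", "start", "start_fuel", "start_lat", "start_lon",
}

# Fields present in one sample document but not in the other.
only_in_car2go = sorted(car2go_fields - enjoy_fields)
only_in_enjoy = sorted(enjoy_fields - car2go_fields)

for name in only_in_car2go:
    print(name)
```

In these two samples the extra fields are exactly the Google-derived ones, so any analysis relying on them should first check that they exist in the documents being queried.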