### EOI 13-05-2019 - Using cartoframes & CARTO VL expressions to investigate the NY Citibike system

What does the NY Citibike station system look like right now? Citibike publishes an open feed of station statuses. Let's use `cartoframes` to process this data, send it to your CARTO account, and create some maps.

`cartoframes` lets you use CARTO in a Python environment so that you can do all of your analysis and mapping in, for example, a Jupyter notebook. It gives you access to CARTO's functionality for data analysis, storage, location services like routing and geocoding, and visualization. `cartoframes` is built around working with data in a Pandas dataframe. Pandas is a handy Python library for data analysis (https://pandas.pydata.org/).

Read the `cartoframes` docs here: http://cartoframes.readthedocs.io/en/latest/

This notebook is best viewed on `nbviewer`: <https://nbviewer.jupyter.org/github/CartoDB/cartoframes/blob/master/examples/Citibike%20Example.ipynb>. However,
we recommend downloading the notebook, installing `cartoframes` and its dependencies, and running it on your own computer so you can explore the functionality of `cartoframes` more easily.

To get started, let's load the required packages, and set credentials.

In [115]:
import cartoframes

# For convenience, import Credentials, Layer, BaseMap, and styling directly
from cartoframes import Credentials
from cartoframes import Layer, BaseMap, styling

import pandas as pd
%matplotlib inline

In [116]:
USERNAME = 'dotgis'  # <-- replace with your username 
APIKEY = ''  # <-- your CARTO API key
creds = Credentials(username=USERNAME, 
                    key=APIKEY)
cc = cartoframes.CartoContext(creds=creds)

Citibike system data can be found here: https://www.citibikenyc.com/system-data
We're going to use the real-time data, which comes in General Bikeshare Feed Specification (GBFS) format as a series of JSON files.
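
Each GBFS file wraps its payload in a small envelope: a `last_updated` POSIX timestamp, a `ttl` in seconds, and a `data` object that holds the actual records. The sketch below parses a hand-written sample (not the live feed; only a few fields are included) to show where the station list and the timestamp sit.

```python
import json

# Hand-written sample that mimics the shape of station_information.json.
# Field names follow GBFS; the two stations and their values are a small
# illustrative subset, not the live feed.
sample = '''
{
  "last_updated": 1557605607,
  "ttl": 10,
  "data": {
    "stations": [
      {"station_id": "285", "name": "Broadway & E 14 St", "capacity": 53},
      {"station_id": "312", "name": "Allen St & Stanton St", "capacity": 31}
    ]
  }
}
'''

feed = json.loads(sample)
stations_list = feed["data"]["stations"]  # one dict per station
last_updated = feed["last_updated"]       # POSIX timestamp in seconds

print(len(stations_list), last_updated)
```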

In [117]:
# Use Pandas to read a JSON of Citibike stations and their statuses
stations_data = pd.read_json('https://gbfs.citibikenyc.com/gbfs/en/station_information.json')
stations = pd.DataFrame(stations_data.data[0])
status_data = pd.read_json('https://gbfs.citibikenyc.com/gbfs/en/station_status.json')
status = pd.DataFrame(status_data.data[0])

In [118]:
# Grab the last updated timestamps
timestamp_stations = stations_data.last_updated[0]
timestamp_status = status_data.last_updated[0]
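
The `last_updated` values are POSIX timestamps (seconds since 1970-01-01 UTC). As a small sketch, the standard library can turn one into a readable date; `example_ts` is the timestamp that appears in the table name written to CARTO later in this notebook.

```python
from datetime import datetime, timezone

# Example value: the feed timestamp embedded in the CARTO table name.
example_ts = 1557605607

# Interpret the integer as seconds since the Unix epoch, in UTC.
readable = datetime.fromtimestamp(example_ts, tz=timezone.utc)
print(readable.isoformat())
```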

In [119]:
# Join the station and statuses together by 'station_id'
station_status = pd.merge(stations, status, how='left', on='station_id')
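
With `how='left'`, every row of `stations` is kept, and a station with no matching status row gets missing values in the status columns. The plain-Python sketch below imitates that behaviour on made-up rows. It assumes `station_id` is unique on the status side and uses `None` where Pandas would use `NaN`. The third station is hypothetical.

```python
# Hypothetical rows standing in for the two dataframes.
stations_rows = [
    {"station_id": "285", "name": "Broadway & E 14 St"},
    {"station_id": "312", "name": "Allen St & Stanton St"},
    {"station_id": "999", "name": "Station without status"},  # made up
]
status_rows = [
    {"station_id": "285", "num_bikes_available": 16},
    {"station_id": "312", "num_bikes_available": 20},
]

# Index the right-hand side by the join key.
status_by_id = {row["station_id"]: row for row in status_rows}

joined = []
for station in stations_rows:
    match = status_by_id.get(station["station_id"], {})
    # Keep every station; use None when there is no matching status row.
    joined.append({**station,
                   "num_bikes_available": match.get("num_bikes_available")})

for row in joined:
    print(row)
```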

In [120]:
# Preview the dataframe
station_status.head()

Unnamed: 0,capacity,eightd_has_key_dispenser,eightd_station_services,electric_bike_surcharge_waiver,external_id,has_kiosk,lat,lon,name,region_id,...,eightd_has_available_keys,is_installed,is_renting,is_returning,last_reported,num_bikes_available,num_bikes_disabled,num_docks_available,num_docks_disabled,num_ebikes_available
0,53,False,"[{u'description': u'', u'bikes_availability': ...",False,66db6387-0aca-11e7-82f6-3863bb44ef7c,True,40.734546,-73.990741,Broadway & E 14 St,71,...,False,1,1,1,1557605546,16,1,36,0,0
1,31,False,"[{u'description': u'', u'bikes_availability': ...",False,66db72f1-0aca-11e7-82f6-3863bb44ef7c,True,40.722055,-73.989111,Allen St & Stanton St,71,...,False,1,1,1,1557605591,20,3,8,0,0
2,43,True,[{u'description': u'Citi Bike Station Valet at...,False,66dc8768-0aca-11e7-82f6-3863bb44ef7c,True,40.710451,-73.960876,S 5 Pl & S 5 St,71,...,True,1,1,1,1557605437,7,2,34,0,0
3,27,False,"[{u'description': u'', u'bikes_availability': ...",False,66dd01c5-0aca-11e7-82f6-3863bb44ef7c,True,40.719009,-73.958525,Berry St & N 8 St,71,...,False,1,1,0,1557605078,1,0,26,0,0
4,55,False,,False,66db237e-0aca-11e7-82f6-3863bb44ef7c,True,40.767272,-73.993929,W 52 St & 11 Ave,71,...,False,1,1,1,1557605535,1,3,51,0,0


## `cc.write`

`CartoContext` has several methods for interacting with [CARTO](https://carto.com) in a Python environment. The first one we're using is `cc.write` which will send a Pandas dataframe to your CARTO account.

In [121]:
# Write station status data to CARTO, using string formatting to name the dataset with the timestamp
cc.write(station_status, 'cb_stations_status_{}'.format(timestamp_stations), lnglat=('lon','lat'), overwrite=True)

Table successfully written to CARTO: https://dotgis.carto.com/dataset/cb_stations_status_1557605607
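
Because the table name embeds the feed timestamp, it changes on every run. One option (a sketch, not part of the original notebook) is to build the name once and reuse it wherever the table is referenced, for example inside a SQL string. Here `timestamp_example` stands in for `timestamp_stations`, and `table_name` and `not_empty_sql` are illustrative names.

```python
# Stands in for timestamp_stations from the feed.
timestamp_example = 1557605607

# Build the dataset name once...
table_name = 'cb_stations_status_{}'.format(timestamp_example)

# ...and reuse it instead of hard-coding the timestamp in later cells.
not_empty_sql = '''
    SELECT *
    FROM {table}
    WHERE num_bikes_available > 0
'''.format(table=table_name)

print(table_name)
print(not_empty_sql)
```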


## `cc.map`

Now that we have inspected the data, we can map it to see how the values vary geographically. We can use the `cc.map` method for this purpose.

`cc.map` takes a `layers` argument which specifies the data layers that are to be visualized. They can be imported from `cartoframes` as below.

There are different types of layers:

* `Layer` for visualizing CARTO tables
* `QueryLayer` for visualizing arbitrary queries from tables in user's CARTO account
* `BaseMap` for specifying the base map to be used

Each of the layers has different styling options. `Layer` and `QueryLayer` take the same styling arguments, while `BaseMap` takes a light or dark source and options for label placement.

Maps can be interactive or static. Set the `interactive` argument to `True` or `False`. A static (non-interactive) map is embedded in the notebook as either a `matplotlib` axis or an `IPython.Image`; either way, the image travels with the notebook. An interactive map is embedded as a zoomable, pannable map.

In [122]:
# Bring the data back as a map. Style by number of bikes available at each station
# Replace the name of the table with the correct timestamp!

cc.map(layers=[Layer('cb_stations_status_1557605607',
                      color={'column': 'num_bikes_available',
                            'scheme': styling.geyser(7, bin_method='quantiles')},
                      size=6),
               BaseMap(source='dark')],
       interactive=True)

## `cc.query`

`CartoContext` has several methods for retrieving data from your CARTO account into a Pandas dataframe. In this example, we'll use `cc.query` to pass in a SQL query and return the results.

In [123]:
# set up SQL query to find all stations that are not empty

empty_query = '''
        SELECT *
        FROM cb_stations_status_1557605607
        WHERE num_bikes_available > 0 
        '''
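
To see what the `WHERE` clause keeps and what it drops, here is a small sketch that runs the same filter locally. It uses an in-memory SQLite table as a stand-in for the CARTO (PostgreSQL) table, with two stations from the preview above and one hypothetical empty station. The table name `stations` is made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the CARTO table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (name TEXT, num_bikes_available INTEGER)")
conn.executemany(
    "INSERT INTO stations VALUES (?, ?)",
    [
        ("Broadway & E 14 St", 16),
        ("Berry St & N 8 St", 1),
        ("Hypothetical empty station", 0),  # made up
    ],
)

# Same filter as the query above: stations with at least one bike.
not_empty = conn.execute(
    "SELECT name FROM stations WHERE num_bikes_available > 0 ORDER BY name"
).fetchall()

# The complementary filter: stations with no bikes at all.
empty = conn.execute(
    "SELECT name FROM stations WHERE num_bikes_available = 0"
).fetchall()
conn.close()

print(not_empty)
print(empty)
```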

In [124]:
# Use the cartoframes query method, persist the result as a new table called not_empty_citybikes, and also return the results as a dataframe
new_df = cc.query(empty_query, table_name="not_empty_citybikes")

Table successfully written to CARTO: https://dotgis.carto.com/dataset/not_empty_citybikes


In [112]:
new_df.head()

cartodb_id,capacity,eightd_active_station_services,eightd_has_available_keys,eightd_has_key_dispenser,eightd_station_services,electric_bike_surcharge_waiver,external_id,has_kiosk,is_installed,is_renting,...,num_bikes_disabled,num_docks_available,num_docks_disabled,num_ebikes_available,region_id,rental_methods,rental_url,short_name,station_id,the_geom
286,27,,False,False,,False,66dcbb2c-0aca-11e7-82f6-3863bb44ef7c,True,1,1,...,0,17,0,0,71,"[KEY, CREDITCARD]",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,4237.02,3044,0101000020E6100000454772F90F7C52C0E33785950A57...
1,53,[{u'id': u'0ac0379c-2fe0-4ae6-9b8a-591331fcc63...,False,False,"[{u'description': u'', u'bikes_availability': ...",False,66db6387-0aca-11e7-82f6-3863bb44ef7c,True,1,1,...,1,26,0,0,71,"[KEY, CREDITCARD]",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,5905.12,285,0101000020E6100000546CB34E687F52C0C208AF97055E...
2,31,[{u'id': u'25a32ebc-88b9-4d49-bd0c-0c8a9fe81fc...,False,False,"[{u'description': u'', u'bikes_availability': ...",False,66db72f1-0aca-11e7-82f6-3863bb44ef7c,True,1,1,...,2,26,0,0,71,"[KEY, CREDITCARD]",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,5484.09,312,0101000020E6100000494739984D7F52C0E674594C6C5C...
3,43,[{u'id': u'babca1c5-344c-4bda-a052-329c170ed04...,True,True,[{u'description': u'Citi Bike Station Valet at...,False,66dc8768-0aca-11e7-82f6-3863bb44ef7c,True,1,1,...,3,16,0,0,71,"[KEY, CREDITCARD]",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,5125.03,532,0101000020E6100000B9E00CFE7E7D52C08B34F10EF05A...
4,27,[{u'id': u'31421525-29e6-4af7-802e-dfea42e2771...,False,False,"[{u'description': u'', u'bikes_availability': ...",False,66dd01c5-0aca-11e7-82f6-3863bb44ef7c,True,1,1,...,0,24,0,0,71,"[KEY, CREDITCARD]",http://app.citibikenyc.com/S6Lr/IBV092JufD?sta...,5379.09,3092,0101000020E61000002FE9DE79587D52C0B401D880085C...


## `vector maps`

The `cartoframes.contrib.vector` module lets you create interactive vector maps using CARTO VL. Its API is broadly similar to `CartoContext.map`, except that all styling values are expected to be plain CARTO VL expressions. See examples in the CARTO VL reference: https://carto.com/developers/carto-vl/reference/#cartoexpressions

In [147]:
# Style the stations by capacity: color and (animated) size
from cartoframes.contrib import vector

vector.vmap(
    [vector.Layer(
        'not_empty_citybikes',
        size='($capacity/10000)*scaled(1) * animation($capacity, 5, fade(1, 1))',
        color='opacity(ramp(globalEqIntervals($capacity, 7), sunset), 0.6)',
        strokeColor='ramp(globalEqIntervals($capacity, 7), sunset)',
        strokeWidth=1
    ),
    ],
    context=cc,
    basemap=vector.BaseMaps.darkmatter)
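
The `globalEqIntervals($capacity, 7)` expression classifies stations into seven equal-width capacity buckets before `ramp` assigns a color to each. As a rough illustration of equal-interval classification (a plain-Python sketch, not CARTO VL's implementation), the function below computes the breakpoints for a list of values. `equal_interval_breaks` is a hypothetical helper, and the five capacities come from the preview rows earlier in this notebook, whereas CARTO VL works over the whole dataset.

```python
def equal_interval_breaks(values, n_buckets):
    """Return the n_buckets - 1 breakpoints that split [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    return [lo + width * i for i in range(1, n_buckets)]

# Capacities of the five stations shown in the first dataframe preview.
capacities = [53, 31, 43, 27, 55]

breaks = equal_interval_breaks(capacities, 7)
print(breaks)
```

Seven buckets need six breakpoints; with a minimum of 27 and a maximum of 55, each bucket here is 4 units wide.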