# Using CDP Databases
Methods for retrieving open access data.

A database schema diagram for production instances of CDP may be found [here](https://github.com/CouncilDataProject/cdptools/blob/master/docs/resources/database_diagram.pdf).

### Connecting to the database

CDP Seattle uses Firebase's Cloud Firestore to store our data. However, any properly set up database host with an associated database module _should_ provide the same functionality.
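This notebook relies on only two read calls, so one way to try the rest of it without network access is a small in-memory stand-in that mimics them. The sketch below is hypothetical and is not part of `cdptools`. The class name `InMemoryDatabase`, the `"<table>_id"` primary-key convention (inferred from the `event_id` and `body_id` fields shown in this notebook), and returning `None` for a missing id are all assumptions. It leaves out the `filters` and `order_by` parameters entirely.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class InMemoryDatabase:
    """Hypothetical offline stand-in for a CDP database module.

    Mirrors only the two calls used in this notebook. Assumes each table's
    primary key column is named "<table>_id" (e.g. "event_id", "body_id").
    """

    def __init__(self, tables: Dict[str, List[Row]]):
        self._tables = tables

    def select_row_by_id(self, table: str, id: str) -> Optional[Row]:
        # Linear scan for the row whose "<table>_id" matches the given id
        key = f"{table}_id"
        for row in self._tables.get(table, []):
            if row.get(key) == id:
                return dict(row)
        return None  # assumption: the real module may behave differently

    def select_rows_as_list(self, table: str, limit: Optional[int] = None) -> List[Row]:
        # Return copies so callers cannot mutate the stored rows
        rows = [dict(row) for row in self._tables.get(table, [])]
        return rows if limit is None else rows[:limit]


# Usage with a body row shown later in this notebook
db = InMemoryDatabase({
    "body": [{"body_id": "da09c03b-e820-4002-b9a7-bdf777d9fa1d", "name": "City Council"}],
})
print(db.select_row_by_id(table="body", id="da09c03b-e820-4002-b9a7-bdf777d9fa1d")["name"])
```

Keeping the call names and keyword arguments identical to the ones used below is the design choice here: code written against the stand-in can later point at a real database object unchanged.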

Here is how to connect to the Seattle database for **read only** operations.

**Note:** This notebook connects to the staging instance of Seattle's Firestore database. To use production data, connect to `cdp-seattle`.

```python
from cdptools.databases.cloud_firestore_database import CloudFirestoreDatabase

db = CloudFirestoreDatabase("stg-cdp-seattle")
db
```

```
<CloudFirestoreDatabase [stg-cdp-seattle]>
```

### Retrieving a single item
If you know the id of an item in a table, please use the `select_row_by_id` function provided.

```python
event = db.select_row_by_id(table="event", id="35dd5e95-c233-493c-b830-07c95d84293a")
event
```

```
{'event_id': '35dd5e95-c233-493c-b830-07c95d84293a',
 'source_uri': 'http://www.seattlechannel.org/mayor-and-council/city-council/seattle-park-district-board?videoid=x92993',
 'video_uri': 'http://video.seattle.gov:8080/media/council/parkdist_062518V.mp4',
 'created': datetime.datetime(2019, 5, 31, 6, 25, 53, 975500),
 'event_datetime': datetime.datetime(2018, 6, 25, 0, 0),
 'body_id': 'f6f1dd25-a842-4874-a2eb-8104599b5fc8'}
```
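The row comes back as a plain Python `dict`, and, as the output shows, its timestamp fields are `datetime.datetime` objects, so standard datetime arithmetic works on them directly. A minimal sketch using the values copied from the output above (URL fields omitted):

```python
import datetime

# The row returned above, copied from the output (URL fields omitted)
event = {
    "event_id": "35dd5e95-c233-493c-b830-07c95d84293a",
    "created": datetime.datetime(2019, 5, 31, 6, 25, 53, 975500),
    "event_datetime": datetime.datetime(2018, 6, 25, 0, 0),
    "body_id": "f6f1dd25-a842-4874-a2eb-8104599b5fc8",
}

# How long after the meeting the row was stored in the database
storage_lag = event["created"] - event["event_datetime"]
print(storage_lag.days)  # 340

# Format the meeting date for display
print(event["event_datetime"].strftime("%Y-%m-%d"))  # 2018-06-25
```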

### Retrieving many items from a table

You may not know the ids of the items you are looking for. In that case, use the provided `select_rows_as_list` function.

```python
events = db.select_rows_as_list(table="event")
events[0]
```

```
{'event_id': '35dd5e95-c233-493c-b830-07c95d84293a',
 'video_uri': 'http://video.seattle.gov:8080/media/council/parkdist_062518V.mp4',
 'created': datetime.datetime(2019, 5, 31, 6, 25, 53, 975500),
 'event_datetime': datetime.datetime(2018, 6, 25, 0, 0),
 'body_id': 'f6f1dd25-a842-4874-a2eb-8104599b5fc8',
 'source_uri': 'http://www.seattlechannel.org/mayor-and-council/city-council/seattle-park-district-board?videoid=x92993'}
```
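Each element of the list is a `dict` with the same fields as the single-row result; the key order can differ (compare the two outputs above), so access fields by name. If you will look up many rows by id, one option is to index the list locally rather than issuing a `select_row_by_id` call per row. A minimal sketch using the two event rows shown in this notebook, trimmed to their id fields:

```python
from collections import defaultdict

# Two event rows as returned by select_rows_as_list, trimmed to their id fields
events = [
    {"event_id": "35dd5e95-c233-493c-b830-07c95d84293a",
     "body_id": "f6f1dd25-a842-4874-a2eb-8104599b5fc8"},
    {"event_id": "6893c187-da0d-429d-8116-e0343b7066cf",
     "body_id": "da09c03b-e820-4002-b9a7-bdf777d9fa1d"},
]

# Build an id -> row lookup once; later lookups need no further queries
events_by_id = {row["event_id"]: row for row in events}

# Group event ids by the body that held each event
event_ids_by_body = defaultdict(list)
for row in events:
    event_ids_by_body[row["body_id"]].append(row["event_id"])

print(events_by_id["6893c187-da0d-429d-8116-e0343b7066cf"]["body_id"])
# da09c03b-e820-4002-b9a7-bdf777d9fa1d
```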

### Joining with other tables

In the event results above, notice that each event has a `body_id`. To attach body details to each event, we can use the Python package `pandas` and query the `body` table. First, let's put each set of query results into a `pandas.DataFrame`.

```python
import pandas as pd
```

```python
events = pd.DataFrame(events)
events.head()
```

|   | body_id | created | event_datetime | event_id | source_uri | video_uri |
|---|---|---|---|---|---|---|
| 0 | f6f1dd25-a842-4874-a2eb-8104599b5fc8 | 2019-05-31 06:25:53.975500 | 2018-06-25 | 35dd5e95-c233-493c-b830-07c95d84293a | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/pa... |
| 1 | da09c03b-e820-4002-b9a7-bdf777d9fa1d | 2019-05-31 06:42:07.366939 | 2019-01-22 | 6893c187-da0d-429d-8116-e0343b7066cf | http://www.seattlechannel.org/FullCouncil?vide... | http://video.seattle.gov:8080/media/council/co... |


```python
bodies = db.select_rows_as_list("body")
bodies = pd.DataFrame(bodies)
bodies.head()
```

|   | body_id | created | description | name |
|---|---|---|---|---|
| 0 | 02330e27-2a6e-4de6-bf70-7c05fcb268df | 2019-05-31 05:53:37.328200 |  | Full Council |
| 1 | 42a6b9df-aa42-45c5-9991-9cf445517bb2 | 2019-05-16 18:39:11.235766 |  | Central Waterfront, Seawall, and Alaskan Way V... |
| 2 | 8f23cb96-200c-4fb0-ad5d-4a0213f7f4ac | 2019-05-16 18:34:06.156169 |  | Education and Governance Committee |
| 3 | b7c392fd-7580-41cf-a25b-6b9419ad9266 | 2019-05-16 18:46:27.182217 |  | Planning, Land Use, and Zoning Committee |
| 4 | da09c03b-e820-4002-b9a7-bdf777d9fa1d | 2019-05-31 06:42:07.101987 |  | City Council |


```python
expanded_event_details = events.merge(
    bodies,
    left_on="body_id",
    right_on="body_id",
    suffixes=("_event", "_body"),
)
expanded_event_details.head()
```

|   | body_id | created_event | event_datetime | event_id | source_uri | video_uri | created_body | description | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | f6f1dd25-a842-4874-a2eb-8104599b5fc8 | 2019-05-31 06:25:53.975500 | 2018-06-25 | 35dd5e95-c233-493c-b830-07c95d84293a | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/pa... | 2019-05-31 06:25:53.735517 |  | Seattle Park District Board Meeting |
| 1 | da09c03b-e820-4002-b9a7-bdf777d9fa1d | 2019-05-31 06:42:07.366939 | 2019-01-22 | 6893c187-da0d-429d-8116-e0343b7066cf | http://www.seattlechannel.org/FullCouncil?vide... | http://video.seattle.gov:8080/media/council/co... | 2019-05-31 06:42:07.101987 |  | City Council |


`left_on` refers to the column name in the dataframe calling the operation.
In this case, the column to merge on is `body_id` in the events results.

Similarly, `right_on` refers to the column name in the dataframe passed as the argument to `merge`.
In this case, the column to merge on is `body_id` in the bodies results.

`suffixes` is a tuple of suffixes appended to any columns, other than the merge key, that share a name in both dataframes.
For CDP query results, these are commonly columns such as `created`, which hold a `datetime` value for when that row was stored in the database.

Please refer to the [`pandas.DataFrame.merge` documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge) for more details.
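Two `merge` options are worth knowing for this kind of join. By default `merge` performs an inner join, so an event whose `body_id` has no match in `bodies` is silently dropped; `how="left"` keeps it with empty body columns instead. And because many events can share one body, `validate="many_to_one"` makes pandas raise a `MergeError` if a `body_id` unexpectedly repeats in `bodies`. Since the key column has the same name on both sides, `on="body_id"` is equivalent to the `left_on`/`right_on` pair used above. A minimal sketch on small, hypothetical frames built from rows shown in this notebook (columns trimmed):

```python
import pandas as pd

# Small frames built from rows shown above (columns trimmed)
events = pd.DataFrame({
    "event_id": ["35dd5e95-c233-493c-b830-07c95d84293a",
                 "6893c187-da0d-429d-8116-e0343b7066cf"],
    "body_id": ["f6f1dd25-a842-4874-a2eb-8104599b5fc8",
                "da09c03b-e820-4002-b9a7-bdf777d9fa1d"],
    "created": pd.to_datetime(["2019-05-31 06:25:53.975500",
                               "2019-05-31 06:42:07.366939"]),
})
bodies = pd.DataFrame({
    "body_id": ["f6f1dd25-a842-4874-a2eb-8104599b5fc8",
                "da09c03b-e820-4002-b9a7-bdf777d9fa1d"],
    "name": ["Seattle Park District Board Meeting", "City Council"],
    "created": pd.to_datetime(["2019-05-31 06:25:53.735517",
                               "2019-05-31 06:42:07.101987"]),
})

# how="left" keeps every event even if its body is missing from `bodies`;
# validate="many_to_one" raises MergeError if a body_id repeats in `bodies`
expanded = events.merge(
    bodies,
    on="body_id",
    how="left",
    suffixes=("_event", "_body"),
    validate="many_to_one",
)
print(list(expanded.columns))
# ['event_id', 'body_id', 'created_event', 'name', 'created_body']
```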

### Filtering

You may notice that `select_rows_as_list` also accepts the parameters `filters`, `order_by`, and `limit`. Unfortunately, `filters` is not currently supported for the open access portions of the API: you can pass filters to the function, but they are ignored. You must therefore filter the results yourself. Fortunately, `pandas` handles these operations well.

For common ways to select rows based on column values, see this [Stack Overflow question](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas).
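Beyond a single equality test, two common patterns are matching against a set of values with `Series.isin` and combining conditions with `&` (and) or `|` (or), wrapping each condition in parentheses. A minimal sketch on a hypothetical trimmed copy of the merged results, built from rows shown in this notebook:

```python
import pandas as pd

# A trimmed copy of the merged event details, using rows shown in this notebook
details = pd.DataFrame({
    "event_id": ["35dd5e95-c233-493c-b830-07c95d84293a",
                 "6893c187-da0d-429d-8116-e0343b7066cf"],
    "event_datetime": pd.to_datetime(["2018-06-25", "2019-01-22"]),
    "name": ["Seattle Park District Board Meeting", "City Council"],
})

# Keep rows whose body name is one of several names
council = details[details["name"].isin(["City Council", "Full Council"])]

# Combine conditions with & / |, wrapping each condition in parentheses
in_2019 = details[
    (details["event_datetime"] >= pd.Timestamp("2019-01-01"))
    & (details["event_datetime"] < pd.Timestamp("2020-01-01"))
]

print(council["event_id"].tolist())  # ['6893c187-da0d-429d-8116-e0343b7066cf']
print(in_2019["event_id"].tolist())  # ['6893c187-da0d-429d-8116-e0343b7066cf']
```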

```python
parks = "Seattle Park District Board Meeting"
parks_events = expanded_event_details.loc[expanded_event_details["name"] == parks]
parks_events
```

|   | body_id | created_event | event_datetime | event_id | source_uri | video_uri | created_body | description | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | f6f1dd25-a842-4874-a2eb-8104599b5fc8 | 2019-05-31 06:25:53.975500 | 2018-06-25 | 35dd5e95-c233-493c-b830-07c95d84293a | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/pa... | 2019-05-31 06:25:53.735517 |  | Seattle Park District Board Meeting |
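Ordering and truncation can be done locally in the same way. A minimal sketch, again on a hypothetical trimmed frame built from rows shown in this notebook: `sort_values` followed by `head` keeps the most recent events, and `DataFrame.query` expresses the parks filter above as a string, where `@parks` refers to the local Python variable.

```python
import pandas as pd

# A trimmed copy of the merged event details, using rows shown in this notebook
details = pd.DataFrame({
    "event_id": ["35dd5e95-c233-493c-b830-07c95d84293a",
                 "6893c187-da0d-429d-8116-e0343b7066cf"],
    "event_datetime": pd.to_datetime(["2018-06-25", "2019-01-22"]),
    "name": ["Seattle Park District Board Meeting", "City Council"],
})

# Newest events first, keeping only the most recent one
latest = details.sort_values("event_datetime", ascending=False).head(1)
print(latest["name"].tolist())  # ['City Council']

# Equivalent of the .loc filter above, written with DataFrame.query
parks = "Seattle Park District Board Meeting"
parks_events = details.query("name == @parks")
print(parks_events["event_id"].tolist())  # ['35dd5e95-c233-493c-b830-07c95d84293a']
```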
