# Using CDP Databases
Methods for retrieving open access data.

A database schema diagram for production instances of CDP may be found [here](https://github.com/CouncilDataProject/cdptools/blob/master/docs/resources/database_diagram.pdf).

### Connecting to the database

CDP Seattle uses Firebase's 'Cloud Firestore' to store our data. However, any properly set up database host with an associated database module _should_ offer the same functionality.

Here is how to connect to the Seattle database for **read-only** operations.

**Note:** This notebook connects to the staging instance of Seattle's Firestore database. To use production data, connect to `cdp-seattle`.

In [1]:
from cdptools.databases.cloud_firestore_database import CloudFirestoreDatabase

db = CloudFirestoreDatabase("stg-cdp-seattle")
db

<CloudFirestoreDatabase [stg-cdp-seattle]>

### Retrieving a single item
If you know the id of an item in a table, please use the `select_row_by_id` function provided.

In [2]:
event = db.select_row_by_id(table="event", id="0e3bd59c-3f07-452c-83cf-e9eebeb73af2")
event

{'id': '0e3bd59c-3f07-452c-83cf-e9eebeb73af2',
 'body_id': '6f38a688-2e96-4e33-841c-883738f9f03d',
 'source_uri': 'http://www.seattlechannel.org/mayor-and-council/city-council/2016/2017-gender-equity-safe-communities-and-new-americans-committee?videoid=x78448',
 'video_uri': 'http://video.seattle.gov:8080/media/council/gen_062717V.mp4',
 'created': datetime.datetime(2019, 4, 21, 23, 58, 4, 832481),
 'event_datetime': '2017-06-27T00:00:00'}
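
In the row above, `created` is already a `datetime` object while `event_datetime` is an ISO 8601 string. The following is a minimal sketch of parsing the string with the standard library so the two values can be compared. The `event` dict here is a trimmed stand-in copied from the output above, not a live query result, and `days_until_stored` is a name introduced only for illustration.

```python
from datetime import datetime

# Stand-in for the row returned above (only the fields needed here are kept).
event = {
    "id": "0e3bd59c-3f07-452c-83cf-e9eebeb73af2",
    "created": datetime(2019, 4, 21, 23, 58, 4, 832481),
    "event_datetime": "2017-06-27T00:00:00",
}

# `event_datetime` is a string, so parse it before doing any date arithmetic.
event_dt = datetime.fromisoformat(event["event_datetime"])

# Both values are now datetime objects and can be compared or subtracted.
days_until_stored = (event["created"] - event_dt).days
print(event_dt.date(), days_until_stored)
```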

### Retrieving many items from a table

You may not know the ids of the items you are looking for. In that case, use the `select_rows_as_list` function provided.

In [3]:
events = db.select_rows_as_list(table="event")
events[0]

{'id': '0e3bd59c-3f07-452c-83cf-e9eebeb73af2',
 'created': datetime.datetime(2019, 4, 21, 23, 58, 4, 832481),
 'event_datetime': '2017-06-27T00:00:00',
 'body_id': '6f38a688-2e96-4e33-841c-883738f9f03d',
 'source_uri': 'http://www.seattlechannel.org/mayor-and-council/city-council/2016/2017-gender-equity-safe-communities-and-new-americans-committee?videoid=x78448',
 'video_uri': 'http://video.seattle.gov:8080/media/council/gen_062717V.mp4'}

### Joining with other tables

In the above event results, notice that a `body_id` is returned for each event. To attach body details to each event, we can use the Python package `pandas` and query the `body` table. Let's first put each of the query results into `pandas.DataFrame` objects.

In [4]:
import pandas as pd

In [5]:
events = pd.DataFrame(events)
events

| | body_id | created | event_datetime | id | source_uri | video_uri |
|---|---|---|---|---|---|---|
| 0 | 6f38a688-2e96-4e33-841c-883738f9f03d | 2019-04-21 23:58:04.832481 | 2017-06-27T00:00:00 | 0e3bd59c-3f07-452c-83cf-e9eebeb73af2 | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/ge... |
| 1 | 44a794de-6e1d-43dd-ac9f-317924345bdb | 2019-04-21 23:23:30.958242 | 2017-12-06T00:00:00 | 1ffb5920-3c23-4084-b287-cef74c9c56c8 | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/ed... |
| 2 | c28e1141-60f2-421d-9c17-e629b57e8890 | 2019-04-21 23:31:38.209946 | 2019-04-11T00:00:00 | 226d8033-666c-49aa-831d-37d04d693106 | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/li... |
| 3 | 8309112f-85c6-458a-8ef8-879907068177 | 2019-04-22 06:56:26.878303 | 2016-12-13T00:00:00 | 3807e904-a7f6-44a9-8116-667aac02ec93 | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/ci... |
| 4 | 318d0a2a-93d1-417b-aa26-e37ad61b81e8 | 2019-04-21 23:24:47.698886 | 2015-06-24T00:00:00 | 614c9534-810f-48b7-b375-afc6e14024cd | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/fi... |
| 5 | f993dbf3-47d3-4632-85ea-424852247a4b | 2019-04-21 23:30:06.231758 | 2017-06-09T00:00:00 | bcdff355-e045-45ee-b1f5-477cb518a27e | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/su... |


In [6]:
bodies = db.select_rows_as_list("body")
bodies = pd.DataFrame(bodies)
bodies

| | created | description | id | name |
|---|---|---|---|---|
| 0 | 2019-04-21 23:24:47.435263 | | 318d0a2a-93d1-417b-aa26-e37ad61b81e8 | Finance and Culture Committee |
| 1 | 2019-04-21 23:23:30.575342 | | 44a794de-6e1d-43dd-ac9f-317924345bdb | Education, Equity, and Governance Committee |
| 2 | 2019-04-21 23:58:04.378827 | | 6f38a688-2e96-4e33-841c-883738f9f03d | Gender Equity, Safe Communities & New Americans |
| 3 | 2019-04-22 06:56:26.298234 | | 8309112f-85c6-458a-8ef8-879907068177 | Civil Rights, Utilities, Economic Development ... |
| 4 | 2019-04-21 23:31:37.572810 | | c28e1141-60f2-421d-9c17-e629b57e8890 | Select Committee on the Library Levy |
| 5 | 2019-04-21 23:30:05.890924 | | f993dbf3-47d3-4632-85ea-424852247a4b | Sustainability and Transportation Committee |


In [7]:
expanded_event_details = events.merge(bodies, left_on="body_id", right_on="id", suffixes=("_event", "_body"))
expanded_event_details

| | body_id | created_event | event_datetime | id_event | source_uri | video_uri | created_body | description | id_body | name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6f38a688-2e96-4e33-841c-883738f9f03d | 2019-04-21 23:58:04.832481 | 2017-06-27T00:00:00 | 0e3bd59c-3f07-452c-83cf-e9eebeb73af2 | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/ge... | 2019-04-21 23:58:04.378827 | | 6f38a688-2e96-4e33-841c-883738f9f03d | Gender Equity, Safe Communities & New Americans |
| 1 | 44a794de-6e1d-43dd-ac9f-317924345bdb | 2019-04-21 23:23:30.958242 | 2017-12-06T00:00:00 | 1ffb5920-3c23-4084-b287-cef74c9c56c8 | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/ed... | 2019-04-21 23:23:30.575342 | | 44a794de-6e1d-43dd-ac9f-317924345bdb | Education, Equity, and Governance Committee |
| 2 | c28e1141-60f2-421d-9c17-e629b57e8890 | 2019-04-21 23:31:38.209946 | 2019-04-11T00:00:00 | 226d8033-666c-49aa-831d-37d04d693106 | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/li... | 2019-04-21 23:31:37.572810 | | c28e1141-60f2-421d-9c17-e629b57e8890 | Select Committee on the Library Levy |
| 3 | 8309112f-85c6-458a-8ef8-879907068177 | 2019-04-22 06:56:26.878303 | 2016-12-13T00:00:00 | 3807e904-a7f6-44a9-8116-667aac02ec93 | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/ci... | 2019-04-22 06:56:26.298234 | | 8309112f-85c6-458a-8ef8-879907068177 | Civil Rights, Utilities, Economic Development ... |
| 4 | 318d0a2a-93d1-417b-aa26-e37ad61b81e8 | 2019-04-21 23:24:47.698886 | 2015-06-24T00:00:00 | 614c9534-810f-48b7-b375-afc6e14024cd | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/fi... | 2019-04-21 23:24:47.435263 | | 318d0a2a-93d1-417b-aa26-e37ad61b81e8 | Finance and Culture Committee |
| 5 | f993dbf3-47d3-4632-85ea-424852247a4b | 2019-04-21 23:30:06.231758 | 2017-06-09T00:00:00 | bcdff355-e045-45ee-b1f5-477cb518a27e | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/su... | 2019-04-21 23:30:05.890924 | | f993dbf3-47d3-4632-85ea-424852247a4b | Sustainability and Transportation Committee |


`left_on` refers to the column name in the dataframe calling the operation.
In this case, the column to merge on is `body_id` in the events results.

Similarly, `right_on` refers to the column name in the dataframe to be passed to the operation.
In this case, the column to merge on is `id` in the bodies results.

`suffixes` is a tuple to use for adding suffixes to any columns with the same name between the two dataframes.
Commonly for CDP query results, these are columns such as `created`, which provide a `datetime` value for when that row was stored in the database.

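Notice that some columns are duplicates, for example `body_id` and `id_body`. This is an artifact of the merge operation: the two join keys have different names in the two dataframes, so both columns are kept even though they hold the same values.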

Please refer to the [`pandas.DataFrame.merge` documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge) for more details.
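
As a small illustration of the merge behavior described above, here is a hedged sketch that uses two tiny stand-in dataframes instead of live query results. The names `sample_events`, `sample_bodies`, and `merged`, along with the placeholder ids and dates, are made up for this example; only the column names mirror the CDP tables shown earlier.

```python
import pandas as pd

# Tiny stand-in frames using the same column names as the query results.
sample_events = pd.DataFrame([
    {"id": "event-1", "body_id": "body-a", "created": "2019-04-21"},
    {"id": "event-2", "body_id": "body-b", "created": "2019-04-22"},
])
sample_bodies = pd.DataFrame([
    {"id": "body-a", "name": "Finance and Culture Committee", "created": "2019-04-21"},
    {"id": "body-b", "name": "Select Committee on the Library Levy", "created": "2019-04-21"},
])

# `id` and `created` exist in both frames, so they receive the suffixes.
merged = sample_events.merge(
    sample_bodies, left_on="body_id", right_on="id", suffixes=("_event", "_body")
)

# `id_body` repeats the values of `body_id` (the two join keys), so drop it.
merged = merged.drop(columns=["id_body"])
print(list(merged.columns))
```

Dropping the redundant key is optional; keeping both columns does no harm beyond the extra width.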

### Filtering

You may notice that the `select_rows_as_list` function accepts additional parameters: `filters`, `order_by`, and `limit`. Unfortunately, at this time, `filters` is not available for the open access portions of the API. You can pass filters to the function, but they are not actually applied, so you must do the filtering on your end. Fortunately, `pandas` works well for these types of operations.

See this [Stack Overflow question](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas) on selecting rows from a dataframe based on column values.

In [8]:
gender_eq = "Gender Equity, Safe Communities & New Americans"
gender_eq_events = expanded_event_details.loc[expanded_event_details["name"] == gender_eq]
gender_eq_events

| | body_id | created_event | event_datetime | id_event | source_uri | video_uri | created_body | description | id_body | name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6f38a688-2e96-4e33-841c-883738f9f03d | 2019-04-21 23:58:04.832481 | 2017-06-27T00:00:00 | 0e3bd59c-3f07-452c-83cf-e9eebeb73af2 | http://www.seattlechannel.org/mayor-and-counci... | http://video.seattle.gov:8080/media/council/ge... | 2019-04-21 23:58:04.378827 | | 6f38a688-2e96-4e33-841c-883738f9f03d | Gender Equity, Safe Communities & New Americans |
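
Client-side filtering is not limited to exact matches. The sketch below, which assumes a small stand-in dataframe rather than real query results, converts the `event_datetime` strings to timestamps and keeps only events on or after a cutoff date. The names `sample_events`, `cutoff`, and `recent`, and the placeholder event ids, are illustrative only.

```python
import pandas as pd

# Stand-in rows; the date strings follow the format seen in the results above.
sample_events = pd.DataFrame({
    "id_event": ["event-1", "event-2", "event-3"],
    "event_datetime": [
        "2015-06-24T00:00:00",
        "2017-06-27T00:00:00",
        "2019-04-11T00:00:00",
    ],
})

# Convert the ISO strings to timestamps so they compare as dates, not text.
sample_events["event_datetime"] = pd.to_datetime(sample_events["event_datetime"])

# Keep events on or after the cutoff, newest first.
cutoff = pd.Timestamp("2017-01-01")
recent = sample_events.loc[sample_events["event_datetime"] >= cutoff]
recent = recent.sort_values("event_datetime", ascending=False)
print(recent)
```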
