# Using CDP Databases
Methods for retrieving open access data.

A database schema diagram for production instances of CDP may be found [here](https://github.com/CouncilDataProject/cdptools/blob/master/docs/resources/database_diagram.pdf).

In [1]:
from cdptools import CDPInstance, configs

seattle = CDPInstance(configs.SEATTLE)
seattle.database

<CloudFirestoreDatabase [stg-cdp-seattle]>

### Retrieving a single item
If you know the id of a row in a table, use the `select_row_by_id` function.

In [2]:
event = seattle.database.select_row_by_id(table="event", id="1d57a6d9-965e-4e37-b0fb-a0bcf84fa22e")
event

{'event_id': '1d57a6d9-965e-4e37-b0fb-a0bcf84fa22e',
 'source_uri': 'http://www.seattlechannel.org/mayor-and-council/city-council/2018/2019-governance-equity-and-technology-committee?videoid=x106134',
 'legistar_event_id': 4055,
 'event_datetime': datetime.datetime(2019, 8, 6, 9, 30),
 'agenda_file_uri': 'http://legistar2.granicus.com/seattle/meetings/2019/8/4055_A_Governance%2C_Equity%2C_and_Technology_Committee_19-08-06_Committee_Agenda.pdf',
 'minutes_file_uri': None,
 'video_uri': 'https://video.seattle.gov/media/council/gov_080619_2571925V.mp4',
 'created': datetime.datetime(2019, 8, 7, 4, 25, 27, 506080),
 'body_id': '2d74aeb0-71dd-47bb-a534-df6db760de17',
 'legistar_event_link': 'https://seattle.legistar.com/MeetingDetail.aspx?LEGID=4055&GID=393&G=FFE3B678-CEF6-4197-84AC-5204EA4CFC0C'}

### Retrieving many items from a table

You may not know the ids of the items you are looking for. In that case, use the `select_rows_as_list` function.

In [3]:
events = seattle.database.select_rows_as_list(table="event")
events[0]

{'event_id': '1d57a6d9-965e-4e37-b0fb-a0bcf84fa22e',
 'body_id': '2d74aeb0-71dd-47bb-a534-df6db760de17',
 'legistar_event_link': 'https://seattle.legistar.com/MeetingDetail.aspx?LEGID=4055&GID=393&G=FFE3B678-CEF6-4197-84AC-5204EA4CFC0C',
 'source_uri': 'http://www.seattlechannel.org/mayor-and-council/city-council/2018/2019-governance-equity-and-technology-committee?videoid=x106134',
 'legistar_event_id': 4055,
 'event_datetime': datetime.datetime(2019, 8, 6, 9, 30),
 'agenda_file_uri': 'http://legistar2.granicus.com/seattle/meetings/2019/8/4055_A_Governance%2C_Equity%2C_and_Technology_Committee_19-08-06_Committee_Agenda.pdf',
 'minutes_file_uri': None,
 'video_uri': 'https://video.seattle.gov/media/council/gov_080619_2571925V.mp4',
 'created': datetime.datetime(2019, 8, 7, 4, 25, 27, 506080)}

### Joining with other tables

In the event results above, notice that a `body_id` is returned for each event. To attach body details to each event, we can query the `body` table and combine the two results with the Python package `pandas`. Let's first put each of the query results into a `pandas.DataFrame` object.

In [4]:
import pandas as pd

In [5]:
events = pd.DataFrame(events)
events.head()

Unnamed: 0,event_id,body_id,legistar_event_link,source_uri,legistar_event_id,event_datetime,agenda_file_uri,minutes_file_uri,video_uri,created
0,1d57a6d9-965e-4e37-b0fb-a0bcf84fa22e,2d74aeb0-71dd-47bb-a534-df6db760de17,https://seattle.legistar.com/MeetingDetail.asp...,http://www.seattlechannel.org/mayor-and-counci...,4055,2019-08-06 09:30:00,http://legistar2.granicus.com/seattle/meetings...,,https://video.seattle.gov/media/council/gov_08...,2019-08-07 04:25:27.506080
1,36cbb43b-faf0-48aa-96b3-7f201d51b114,887c08bd-ae3b-455a-85bd-3c17502b3121,https://seattle.legistar.com/MeetingDetail.asp...,http://www.seattlechannel.org/mayor-and-counci...,4056,2019-08-06 14:00:00,http://legistar2.granicus.com/seattle/meetings...,,https://video.seattle.gov/media/council/sus_08...,2019-08-07 05:17:33.183874


In [6]:
bodies = seattle.database.select_rows_as_list("body")
bodies = pd.DataFrame(bodies)
bodies.head()

Unnamed: 0,body_id,name,description,created
0,2d74aeb0-71dd-47bb-a534-df6db760de17,"Governance, Equity, and Technology Committee",,2019-08-07 04:25:27.279863
1,887c08bd-ae3b-455a-85bd-3c17502b3121,Sustainability and Transportation Committee,,2019-08-07 05:17:32.926111


In [7]:
expanded_event_details = events.merge(bodies, left_on="body_id", right_on="body_id", suffixes=("_event", "_body"))
expanded_event_details.head()

Unnamed: 0,event_id,body_id,legistar_event_link,source_uri,legistar_event_id,event_datetime,agenda_file_uri,minutes_file_uri,video_uri,created_event,name,description,created_body
0,1d57a6d9-965e-4e37-b0fb-a0bcf84fa22e,2d74aeb0-71dd-47bb-a534-df6db760de17,https://seattle.legistar.com/MeetingDetail.asp...,http://www.seattlechannel.org/mayor-and-counci...,4055,2019-08-06 09:30:00,http://legistar2.granicus.com/seattle/meetings...,,https://video.seattle.gov/media/council/gov_08...,2019-08-07 04:25:27.506080,"Governance, Equity, and Technology Committee",,2019-08-07 04:25:27.279863
1,36cbb43b-faf0-48aa-96b3-7f201d51b114,887c08bd-ae3b-455a-85bd-3c17502b3121,https://seattle.legistar.com/MeetingDetail.asp...,http://www.seattlechannel.org/mayor-and-counci...,4056,2019-08-06 14:00:00,http://legistar2.granicus.com/seattle/meetings...,,https://video.seattle.gov/media/council/sus_08...,2019-08-07 05:17:33.183874,Sustainability and Transportation Committee,,2019-08-07 05:17:32.926111


`left_on` names the column in the dataframe calling the operation.
In this case, the column to merge on is `body_id` in the events results.

Similarly, `right_on` names the column in the dataframe passed to the operation.
In this case, the column to merge on is also `body_id`, in the bodies results. Because the column name is the same in both dataframes, `on="body_id"` would work equally well.

`suffixes` is a tuple of two strings appended to any non-key column names that appear in both dataframes: the first for the calling dataframe, the second for the one passed in.
For CDP query results, this commonly applies to columns such as `created`, which holds a `datetime` value for when that row was stored in the database.

Please refer to the [`pandas.DataFrame.merge` documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge) for more details.
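The effect of `suffixes` is easiest to see on a small, self-contained example. The sketch below uses two hand-written dataframes as hypothetical stand-ins for the `event` and `body` query results (the short ids and timestamps are made up for illustration), so it runs without a CDP database connection.

```python
import pandas as pd

# Hypothetical stand-ins for the `event` and `body` query results.
# Both tables carry a `created` column, as CDP tables commonly do.
events = pd.DataFrame([
    {"event_id": "e1", "body_id": "b1", "created": "2019-08-07 04:25:27"},
    {"event_id": "e2", "body_id": "b2", "created": "2019-08-07 05:17:33"},
])
bodies = pd.DataFrame([
    {"body_id": "b1", "name": "Governance, Equity, and Technology Committee",
     "created": "2019-08-07 04:25:27"},
    {"body_id": "b2", "name": "Sustainability and Transportation Committee",
     "created": "2019-08-07 05:17:32"},
])

# The key column has the same name on both sides, so `on` is enough.
# The shared non-key column `created` is disambiguated by the suffixes.
merged = events.merge(bodies, on="body_id", suffixes=("_event", "_body"))

print(list(merged.columns))
```

The key column `body_id` appears once in the result, while `created` is split into `created_event` and `created_body`.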

### Filtering

The `select_rows_as_list` function also accepts the optional parameters `filters`, `order_by`, and `limit`. For CDP instances that use a Cloud Firestore database, filtering or ordering by a single field is straightforward.

For example, let's request only events from the "Sustainability and Transportation Committee" (`body_id`: `887c08bd-ae3b-455a-85bd-3c17502b3121`):

In [8]:
from cdptools.databases import WhereOperators

# WhereOperators.eq is shorthand for "equal to"
sustainability_meetings = seattle.database.select_rows_as_list("event", filters=[("body_id", WhereOperators.eq, "887c08bd-ae3b-455a-85bd-3c17502b3121")])
sustainability_meetings = pd.DataFrame(sustainability_meetings)
sustainability_meetings

Unnamed: 0,event_id,legistar_event_id,event_datetime,agenda_file_uri,minutes_file_uri,video_uri,created,body_id,legistar_event_link,source_uri
0,36cbb43b-faf0-48aa-96b3-7f201d51b114,4056,2019-08-06 14:00:00,http://legistar2.granicus.com/seattle/meetings...,,https://video.seattle.gov/media/council/sus_08...,2019-08-07 05:17:33.183874,887c08bd-ae3b-455a-85bd-3c17502b3121,https://seattle.legistar.com/MeetingDetail.asp...,http://www.seattlechannel.org/mayor-and-counci...


Unfortunately, for Cloud Firestore databases, filtering on multiple fields, or filtering and ordering in the same query, *may* result in an error. The error message includes directions for adding an index to the database that makes the query possible. In that case, contact the maintainer of the CDP instance you are using and ask them to add a composite index to the database.

Most operations can be done with simple queries. When they can't, this [stackoverflow post](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas) describes how to filter dataframes with `pandas`, which you can use as a stopgap while your CDP instance maintainer adds the index.
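As a rough sketch of that client-side approach, the example below filters on two fields and then orders the result, all in `pandas`. The dataframe is a hypothetical stand-in for a full `event` table pulled with `select_rows_as_list` (the short ids and dates are made up), and it assumes the table is small enough to load into memory.

```python
import pandas as pd

# Hypothetical stand-in for the full `event` table loaded into a dataframe.
events = pd.DataFrame([
    {"event_id": "e1", "body_id": "b1", "event_datetime": pd.Timestamp("2019-08-06 09:30")},
    {"event_id": "e2", "body_id": "b2", "event_datetime": pd.Timestamp("2019-08-06 14:00")},
    {"event_id": "e3", "body_id": "b2", "event_datetime": pd.Timestamp("2019-08-13 14:00")},
])

# Filter on two fields at once with a boolean mask.
mask = (events["body_id"] == "b2") & (events["event_datetime"] >= pd.Timestamp("2019-08-01"))

# Order newest first and keep at most 10 rows, mirroring what a combined
# filter / order / limit query would return.
recent = events[mask].sort_values("event_datetime", ascending=False).head(10)

print(recent["event_id"].tolist())
```

This trades a larger download for flexibility: every row is fetched first, and the filtering happens locally rather than in the database.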