# Working with CDP Transcripts

Methods for retrieving open access data.

A database schema diagram for production instances of CDP may be found [here](https://github.com/CouncilDataProject/cdptools/blob/master/docs/resources/database_diagram.pdf).

### Connecting to resources

Having access to both the CDP instance's database and its file store makes retrieving and using the transcripts easiest.

For details on database usage, refer to the notebook example on database basics [here](./database.ipynb).

For details on file store usage, refer to the notebook example on file store basics [here](./file_store.ipynb).

**Note:** This notebook connects to the staging instance of Seattle's Firestore database and file store. To use production data, connect to the Cloud Firestore instance: `cdp-seattle`. To use production files, connect to the GCS instance: `cdp-seattle.appspot.com`.

In [2]:
from cdptools.databases.cloud_firestore_database import CloudFirestoreDatabase
from cdptools.file_stores.gcs_file_store import GCSFileStore
import pandas as pd

db = CloudFirestoreDatabase("stg-cdp-seattle")
fs = GCSFileStore("stg-cdp-seattle.appspot.com")
db, fs

(<CloudFirestoreDatabase [stg-cdp-seattle]>,
 <GCSFileStore [stg-cdp-seattle.appspot.com]>)
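
Switching between staging and production only changes the two instance names passed to the constructors. The following is a minimal sketch of that idea, using the names from the note above. `INSTANCES` and `connect` are hypothetical helpers introduced here for illustration and are not part of cdptools.

```python
# Instance names taken from the note above: (Firestore database, GCS bucket).
INSTANCES = {
    "staging": ("stg-cdp-seattle", "stg-cdp-seattle.appspot.com"),
    "production": ("cdp-seattle", "cdp-seattle.appspot.com"),
}


def connect(environment="staging"):
    """Hypothetical helper: return (db, fs) for the chosen environment."""
    # Look up the names first so an unknown environment fails early.
    db_name, bucket_name = INSTANCES[environment]

    # Imported lazily so this sketch can be read and loaded without cdptools.
    from cdptools.databases.cloud_firestore_database import CloudFirestoreDatabase
    from cdptools.file_stores.gcs_file_store import GCSFileStore

    return CloudFirestoreDatabase(db_name), GCSFileStore(bucket_name)


# Example (requires cdptools to be installed):
# db, fs = connect("production")
```

Keeping the names in one mapping means the rest of the notebook does not need to change when moving from staging to production data.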

### Find all transcripts

Simply query the `transcript` table.

In [3]:
transcripts = pd.DataFrame(db.select_rows_as_list("transcript"))
transcripts

|   | confidence | created | event_id | file_id | id |
|---|---|---|---|---|---|
| 0 | 0.947337 | 2019-04-21 23:31:38.809855 | 226d8033-666c-49aa-831d-37d04d693106 | 43b2d231-5a0e-4c5b-876e-51859e86f0da | 658bfe6b-6efc-4efc-b7c9-de53a1a98651 |
| 1 | 0.924026 | 2019-04-21 23:30:06.600107 | bcdff355-e045-45ee-b1f5-477cb518a27e | 0aceb6c8-3f7c-494f-9a97-cf5c319acb77 | a5ae7d7d-3bc2-4c3a-b07a-7829a8abf1c9 |
| 2 | 0.930153 | 2019-04-21 23:58:05.245933 | 0e3bd59c-3f07-452c-83cf-e9eebeb73af2 | ebbd9727-d3ef-41ea-82b5-1cda7d1ca050 | bb31c1eb-021d-4eb4-8c34-ec97e8871828 |
| 3 | 0.933456 | 2019-04-21 23:24:47.975906 | 614c9534-810f-48b7-b375-afc6e14024cd | 480cb0a9-0c5f-4791-8795-ca71583eb785 | d147a3ac-2b08-462f-b2ba-c7c18c182c2d |
| 4 | 0.92734 | 2019-04-21 23:23:31.532067 | 1ffb5920-3c23-4084-b287-cef74c9c56c8 | 0099dfe6-ee18-4a16-a9b5-e7a053d9582d | d4e6bb99-6624-44af-a25f-dfde9f69bf29 |
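
Each transcript row carries a `confidence` value, which can be used to narrow the results before doing any further work. The sketch below filters and sorts plain row dictionaries. It assumes `db.select_rows_as_list("transcript")` returns dictionaries with the keys shown in the output above. The function name `confident_transcripts` and the 0.93 threshold are illustrative choices, not part of cdptools.

```python
# Rows copied from the query output above, keeping only the fields used here.
rows = [
    {"id": "658bfe6b-6efc-4efc-b7c9-de53a1a98651", "confidence": 0.947337},
    {"id": "a5ae7d7d-3bc2-4c3a-b07a-7829a8abf1c9", "confidence": 0.924026},
    {"id": "bb31c1eb-021d-4eb4-8c34-ec97e8871828", "confidence": 0.930153},
    {"id": "d147a3ac-2b08-462f-b2ba-c7c18c182c2d", "confidence": 0.933456},
    {"id": "d4e6bb99-6624-44af-a25f-dfde9f69bf29", "confidence": 0.92734},
]


def confident_transcripts(rows, min_confidence):
    """Return rows at or above min_confidence, highest confidence first."""
    kept = [row for row in rows if row["confidence"] >= min_confidence]
    return sorted(kept, key=lambda row: row["confidence"], reverse=True)


# 0.93 is an arbitrary threshold chosen for illustration.
selected = confident_transcripts(rows, min_confidence=0.93)
for row in selected:
    print(row["id"], row["confidence"])
```

With the five rows shown, this keeps three transcripts. On the DataFrame itself, a comparable result should be available with `transcripts[transcripts["confidence"] >= 0.93].sort_values("confidence", ascending=False)`.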


### Join file, event, and body information

The results above reference events and files only by ID, so they are more useful once merged with information from the other tables.

In [5]:
# Get the other tables
events = pd.DataFrame(db.select_rows_as_list("event"))
bodies = pd.DataFrame(db.select_rows_as_list("body"))
files = pd.DataFrame(db.select_rows_as_list("file"))

# Merge the transcripts