<a href="https://colab.research.google.com/github/ICBI/Data.Bridge.Notebooks/blob/main/Tutorial2_BigQueryAPI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial 2 - Accessing the Georgetown SDOH Datawarehouse through an API

Team members: Peter McGarvey, Adil Alaoui, Yili Zhang, Jia Li Dong, Krithika Bhuvaneshwar, Camelia Bencheqroun

Affiliation: Innovation Center for Biomedical Informatics (Georgetown-ICBI), Georgetown University Medical Center

## Pre-requisites
* Users must have a Google account
* Users must enable cloud access
* Users must generate a JSON file with credentials. Instructions are [here](https://cloud.google.com/bigquery/docs/authentication/service-account-file)
* Access/Credentials: Please contact icbi@georgetown.edu to request read-only access to the Georgetown-SDOH BigQuery database
* NOTE: Please do NOT share this JSON file with anyone else or upload it to any public repository
* Upload this JSON file to your Google Drive "MyDrive" folder. Once Drive is mounted in Colab, its path is `/content/drive/MyDrive/`
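
Before authenticating, it can help to confirm that the uploaded file really is a service-account key. The sketch below is only a sanity check under stated assumptions: the helper name `check_service_account_file` and the `REQUIRED_KEYS` set are hypothetical and not part of any Google library, and the check only looks for a few fields that service-account key files normally contain.

```python
import json
from pathlib import Path

# Fields a service-account key file is expected to contain (a partial list, chosen for this sketch).
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_file(path):
    """Return a list of problems found in a key file (an empty list means none were found)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        info = json.loads(key_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]
    problems = []
    missing = sorted(REQUIRED_KEYS - info.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if info.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


# Example call using the file name from this tutorial; replace it with your own file.
print(check_service_account_file("/content/drive/MyDrive/sdoh-352614-66c1df96e90e.json"))
```

The function never prints the key contents, so the output is safe to share when asking for help.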



## Installation and Authentication

To access BigQuery, you need to authenticate with Google Application Credentials via the BigQuery API. Upload the JSON file to your Drive, then run the following code to mount the Drive in Colab:



In [1]:
from google.colab import drive # Mount Google Drive
drive.mount('/content/drive')

Mounted at /content/drive


Reminder: Upload your JSON credentials file to your Google Drive "MyDrive" folder. Its path in Colab is `/content/drive/MyDrive/`

For this example, the JSON file is at **/content/drive/MyDrive/sdoh-352614-66c1df96e90e.json**. Point the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at that file, then initialize the BigQuery client using this code:

In [2]:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = '/content/drive/MyDrive/sdoh-352614-66c1df96e90e.json'

In [3]:
from google.cloud import bigquery
# Create the client; it reads the key file named in GOOGLE_APPLICATION_CREDENTIALS
client = bigquery.Client()

### Process
* Write the SQL query inside a triple-quoted Python string (`'''` or `"""`), then pass that string to the client's `query` method.
* Use the SQL syntax ```SELECT * FROM project_id.dataset_name.table_name``` **inside** the triple-quoted string passed to the client in order to obtain the data. Wrap the full table reference in backticks when a name contains a hyphen or a space.
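
The `project.dataset.table` pattern above can be sketched as a small helper that assembles the backtick-quoted reference. The function name `table_ref` is hypothetical and used only for illustration; the project, dataset, and table names are the ones used in this tutorial.

```python
def table_ref(project, dataset, table):
    """Return a backtick-quoted `project.dataset.table` reference."""
    for part in (project, dataset, table):
        # A backtick inside a name would end the quoted reference early.
        if "`" in part:
            raise ValueError(f"backtick not allowed in identifier: {part!r}")
    return f"`{project}.{dataset}.{table}`"


query = f"SELECT * FROM {table_ref('sdoh-352614', 'Social_Determinants', 'SVI 2018')}"
print(query)
# SELECT * FROM `sdoh-352614.Social_Determinants.SVI 2018`
```

The resulting string can be passed to `client.query(...)` in the same way as a hand-written query.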






## Example query with sample code
For example, let's say we want to query a BigQuery table named **SVI 2018**, which is inside the BigQuery dataset **Social_Determinants**.
The **Social_Determinants** dataset is inside a BigQuery project named **sdoh-352614**.
We want to save the resulting data table into a new dataframe in Colab called **df1**.
(The executed cell at the end of this section applies the same pattern to another table in the same dataset, **NDI 2013-2017**.)

**Quick Summary of example**
* Query BigQuery table: `SVI 2018`
* Name of dataset: `Social_Determinants`
* Name of BigQuery project: `sdoh-352614`
* Save result into object: `df1`
* This is the syntax of the SELECT query that contains the table name, dataset and project name:
```SELECT * FROM `sdoh-352614.Social_Determinants.SVI 2018` ```

* This is the full syntax, with the query wrapped inside the client statement using three single quotes:
```
df1 = client.query('''
SELECT * FROM `sdoh-352614.Social_Determinants.SVI 2018`
''').to_dataframe()
```
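
`SELECT *` returns every column of the table. Because BigQuery stores data by column, naming only the columns you need generally reduces the amount of data read. The sketch below builds such a query as a string; the helper `preview_query` is hypothetical, and the column names are taken from the `NDI 2013-2017` result shown in this tutorial's executed cell.

```python
def preview_query(table, columns=None, limit=5):
    """Build a SELECT for a fully qualified table, optionally restricted to some columns."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # Quote each column name; fall back to * when no columns are given.
    select_list = ", ".join(f"`{c}`" for c in columns) if columns else "*"
    return f"SELECT {select_list}\nFROM `{table}`\nLIMIT {limit}"


sql = preview_query(
    "sdoh-352614.Social_Determinants.NDI 2013-2017",
    columns=["TractID", "StAbbr", "NDI"],
)
print(sql)
# SELECT `TractID`, `StAbbr`, `NDI`
# FROM `sdoh-352614.Social_Determinants.NDI 2013-2017`
# LIMIT 5
```

With an authenticated client, `client.query(sql).to_dataframe()` would then return at most five rows with those three columns.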

In [5]:
query1 = """
    SELECT *
    FROM `sdoh-352614.Social_Determinants.NDI 2013-2017`

"""
df1 = client.query(query1).to_dataframe()
df1.head()

| Unnamed: 0 | TractID | StCoFIPS | StAbbr | NDI | NDIQuint | MedHHInc | PctRecvIDR | PctPubAsst | MedHomeVal | PctMgmtBusSciArt | PctFemHeadKids | PctOwnerOcc | PctNoPhone | PctNComPlmb | PctEducHSPlus | PctEducBchPlus | PctFamBelowPov | PctUnempl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1101003000 | 1101 | AL | | 9-NDI not avail | 26635 | 2.5 | 40.9 | 62300.0 | | 26.1 | 46.3 | 2.1 | 3.5 | 72.2 | 11.6 | 40.3 | 22.8 |
| 1 | 2016000100 | 2016 | AK | | 9-NDI not avail | 62083 | | 1.0 | 81300.0 | 28.1 | | 60.8 | 4.2 | 2.1 | 92.0 | 13.8 | 12.3 | 3.5 |
| 2 | 2105000200 | 2105 | AK | | 9-NDI not avail | 42500 | 65.0 | | | 39.3 | 3.0 | 75.0 | 6.3 | 9.4 | 83.9 | 16.1 | 5.3 | 12.5 |
| 3 | 2170000101 | 2170 | AK | | 9-NDI not avail | 37222 | 66.9 | | 128100.0 | 22.6 | 6.6 | 82.4 | 6.3 | 22.4 | 87.8 | 11.5 | 22.0 | 15.8 |
| 4 | 2290000100 | 2290 | AK | | 9-NDI not avail | 27222 | 53.4 | | 81100.0 | 38.3 | 14.2 | 71.7 | 7.1 | 56.4 | 77.6 | 8.7 | 34.7 | 23.1 |
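
The first rows of this result contain missing values, and `NDIQuint` reads `9-NDI not avail` wherever `NDI` is empty. A minimal sketch of how one might count missing values and drop those rows with pandas follows. The `sample` frame is a hand-copied subset of the output above, used so the example runs without BigQuery access; treating `9-NDI not avail` as the marker for rows without an index is an assumption based only on these five rows.

```python
import pandas as pd

# Three rows and six columns copied from the output above.
sample = pd.DataFrame(
    {
        "TractID": [1101003000, 2016000100, 2105000200],
        "StAbbr": ["AL", "AK", "AK"],
        "NDI": [float("nan")] * 3,
        "NDIQuint": ["9-NDI not avail"] * 3,
        "MedHHInc": [26635, 62083, 42500],
        "PctRecvIDR": [2.5, float("nan"), 65.0],
    }
)

# Number of missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Keep only rows whose quintile label is not the "not available" code (assumed meaning).
has_ndi = sample[sample["NDIQuint"] != "9-NDI not avail"]
print(len(has_ndi))
```

The same two statements can be applied to `df1` by replacing `sample` with `df1`.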


Congratulations! You are now able to run a query via the BigQuery API!