# Ingest BigQuery data into a local IDE

We have already connected a notebook to BigQuery from Google Colab in [bigquery-colab.ipynb](./biquery-colab.ipynb).
Now we will set up the same connection from a local IDE (VSCode).

**Question #1**: WHY don't we just work in Google Colab, instead of going through all the steps below to set up a local connection?

### Conda environment & Dependencies

First, create the conda environment by following the instructions in [README.md](../README.md). In VSCode, select it as the kernel for your notebook.

> You might notice that we do not need to `pip install` `google.cloud` separately: it is already listed in `requirements.txt`, and all dependencies are installed into the conda environment.

**Question #2**: Why do we need a conda environment?


In [1]:
from google.cloud import bigquery
from google.oauth2 import service_account
import pandas as pd 

### Provide credentials

Ask your GCP admin for a JSON service account key and put it in `credentials/`.

**Question #3**: Why do we put `credentials/*` in `.gitignore`?

In [2]:
%ls ../credentials/ 

demo-bigquery-editor.json


In [3]:
# Set up credentials 
project_id = 'PROJECT-ID-HERE'
credentials = service_account.Credentials.from_service_account_file('../credentials/demo-bigquery-editor.json')
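
If the key file is missing or misnamed, the loader above fails with a fairly generic error. A minimal sketch of a guard, assuming `credentials/` holds exactly one JSON key; the helper name `find_key_file` is hypothetical and not part of any Google library:

```python
from pathlib import Path


def find_key_file(credentials_dir):
    """Return the single *.json key inside credentials_dir, or raise a clear error.

    Hypothetical helper for illustration; it only inspects the local folder.
    """
    directory = Path(credentials_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"Credentials directory not found: {directory}")

    key_files = sorted(directory.glob("*.json"))
    if len(key_files) != 1:
        raise FileNotFoundError(
            f"Expected exactly one JSON key in {directory}, found {len(key_files)}"
        )
    return key_files[0]


# In the notebook, the result could be passed to the same loader as above:
# key_path = find_key_file("../credentials")
# credentials = service_account.Credentials.from_service_account_file(str(key_path))
```
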

In [4]:
# Create the BigQuery client with the project and credentials above
bq_client = bigquery.Client(
    project=project_id,
    credentials=credentials,
)
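
Creating the client object does not by itself prove that the credentials work, so a tiny query is a cheap way to check the connection end to end. A minimal sketch, assuming the service account is allowed to run query jobs in the project; `check_connection` is a hypothetical helper name:

```python
def check_connection(client):
    """Run a trivial query and report whether BigQuery answered as expected.

    `client` is expected to be a bigquery.Client such as `bq_client`.
    """
    # client.query() submits the job; .result() blocks until it has finished.
    rows = list(client.query("SELECT 1 AS ok").result())
    # Rows can be read by column name, like a mapping.
    return len(rows) == 1 and rows[0]["ok"] == 1


# In the notebook: check_connection(bq_client)
```

An authentication or permission problem would surface here as an exception rather than later, in the middle of a real query.
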

### Query data directly from BigQuery

In [6]:
sql_script = '''
SELECT *
FROM `bigquery-public-data.hacker_news.stories`
LIMIT 10
'''

query_job = bq_client.query(sql_script)

In [7]:
query_job.to_dataframe()

| | id | by | score | time | time_ts | title | url | text | deleted | dead | descendants | author |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6988445 | cflick | 0 | 1388454902 | 2013-12-31 01:55:02+00:00 | Appshare | http://chadflick.ws/appshare.html | Did facebook or angrybirds pay you? We will! | | True | | cflick |
| 1 | 2130263 | alikamp | 0 | 1295692873 | 2011-01-22 10:41:13+00:00 | A Handfull of Gold. | | A handful of gold. Im sure we all believe that... | | True | | alikamp |
| 2 | 7410550 | jeassonlens | 0 | 1394994498 | 2014-03-16 18:28:18+00:00 | Fastest Growing Skin Care Supplement for Incre... | http://naturosciences.com/ | Naturo Sciences is a health &amp; beauty speci... | | True | | jeassonlens |
| 3 | 7164302 | annawright010 | 0 | 1391303307 | 2014-02-02 01:08:27+00:00 | R4 3ds sdhc | http://www.r4i3dscards.co.uk | R4i3dscards.co.uk is a reliable online store t... | | True | | annawright010 |
| 4 | 7791964 | limpeseunomebvw | 0 | 1400888288 | 2014-05-23 23:38:08+00:00 | Empréstimo Com Nome Sujo | http://www.emprestimopessoal-bvw.com.br/empres... | limpe seu nome online e pela internet, para se... | | True | | limpeseunomebvw |
| 5 | 4689683 | kogir | 0 | 1401561740 | 2014-05-31 18:42:20+00:00 | Placeholder | | Mind the gap. | | | 0.0 | kogir |
| 6 | 4712731 | kogir | 0 | 1401561740 | 2014-05-31 18:42:20+00:00 | Placeholder | | Mind the gap. | | | 0.0 | kogir |
| 7 | 3067098 | kogir | 0 | 1401561740 | 2014-05-31 18:42:20+00:00 | Placeholder | | Mind the gap. | | | 0.0 | kogir |
| 8 | 4724746 | kogir | 0 | 1401561740 | 2014-05-31 18:42:20+00:00 | Placeholder | | Mind the gap. | | | 0.0 | kogir |
| 9 | 1253117 | kogir | 0 | 1401561740 | 2014-05-31 18:42:20+00:00 | Placeholder | | Mind the gap. | | | 0.0 | kogir |
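
BigQuery stores data by column, so selecting only the columns a metric needs usually scans less data than `SELECT *`. A minimal sketch of that idea, with the column names taken from the output above; `build_stories_query` and `estimate_bytes` are hypothetical helper names, and the dry-run step assumes the service account is allowed to create query jobs:

```python
# Column names as they appear in the query output above.
STORY_COLUMNS = {
    "id", "by", "score", "time", "time_ts", "title",
    "url", "text", "deleted", "dead", "descendants", "author",
}


def build_stories_query(columns, limit=10):
    """Build a SELECT over the stories table that names only the given columns."""
    if not columns:
        raise ValueError("Select at least one column")
    unknown = [c for c in columns if c not in STORY_COLUMNS]
    if unknown:
        raise ValueError(f"Unknown columns: {unknown}")
    # Backticks keep reserved words such as `by` valid as column names.
    selected = ", ".join(f"`{c}`" for c in columns)
    return (
        f"SELECT {selected}\n"
        "FROM `bigquery-public-data.hacker_news.stories`\n"
        f"LIMIT {int(limit)}"
    )


def estimate_bytes(client, sql):
    """Dry-run the query and return the bytes BigQuery estimates it would process."""
    # Imported here so the query builder works even without the library installed.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    dry_run_job = client.query(sql, job_config=job_config)
    return dry_run_job.total_bytes_processed


# In the notebook:
# sql_script = build_stories_query(["id", "by", "score", "time_ts", "title"])
# estimate_bytes(bq_client, sql_script)
# bq_client.query(sql_script).to_dataframe()
```
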


### TODO: Assignment

Based on your understanding of the dataset from [bigquery-colab.ipynb](./biquery-colab.ipynb) and your reading of [lean-analytics-framework.md](./lean-analytics-framework.md):

1. Design a set of metrics for `hacker_news`
2. Write code with `duckdb` or `pandas` in this notebook to process the data and calculate the metrics
3. Create charts or tables to present the insights from those metrics
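
As a starting point for steps 1 and 2, here is a minimal sketch of one possible set of monthly metrics computed with `pandas`. The sample rows are made up for illustration (they are not taken from the dataset), and both the function name `monthly_metrics` and the chosen metrics are assumptions, not the expected answer:

```python
import pandas as pd

# Made-up rows that reuse column names from the stories table.
sample = pd.DataFrame(
    {
        "by": ["alice", "bob", "alice", "alice"],
        "score": [10, 0, 4, 8],
        "descendants": [3, 1, 1, 2],
        "time_ts": pd.to_datetime(
            ["2014-01-05", "2014-01-20", "2014-02-02", "2014-02-15"], utc=True
        ),
    }
)


def monthly_metrics(stories):
    """Aggregate story counts, active authors, and average engagement per month."""
    month = stories["time_ts"].dt.strftime("%Y-%m")
    return (
        stories.assign(month=month)
        .groupby("month")
        .agg(
            stories=("score", "size"),             # number of stories posted
            active_authors=("by", "nunique"),      # distinct posting users
            avg_score=("score", "mean"),           # average points per story
            avg_comments=("descendants", "mean"),  # average comments per story
        )
    )


metrics = monthly_metrics(sample)
print(metrics)
```

The same function could be applied to the DataFrame returned by `query_job.to_dataframe()`, and the resulting table plotted for step 3.
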

In [None]:
# TODO: Your work starts here