## Setting up and accessing Google Cloud Platform (BigQuery) via the Python client

**Installing packages** 

* virtualenv (recommended for installing google-cloud packages). _Virtual environments help avoid conflicts between different packages._
    * https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/
    * https://janakiev.com/blog/jupyter-virtual-envs/ This article explains how to create a virtual environment with Anaconda and how to add it to your Jupyter notebook.

* google-cloud-bigquery 
    * https://cloud.google.com/bigquery/docs/reference/libraries 
    
**Creating a virtual environment with Anaconda and adding it to Jupyter** 

1. Open Anaconda prompt
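2. _conda create -n myenv python_, where myenv is any name you choose for your virtual environment. It is stored in the envs folder in your Anaconda directory. Including _python_ in the command gives the environment its own Python and pip, so the later _pip install_ steps install into this environment.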
3. To start working in it: _conda activate myenv_; 
    * to stop: _conda deactivate_
    * to list all available environments: _conda env list_
    * to remove an environment: _conda env remove -n myenv_
4. After activating the virtual environment, you need to add it to Jupyter. First, install ipykernel, which provides the IPython kernel for Jupyter: _pip install --user ipykernel_
5. This command adds the environment to Jupyter: _python -m ipykernel install --user --name=myenv_
6. You should see output like the following: _Installed kernelspec myenv in /home/user/.local/share/jupyter/kernels/myenv_
7. Now you are able to choose this new environment as a kernel in Jupyter 
    * In an open notebook: Kernel --> Change Kernel --> myenv
    * Once you remove the virtual environment, you can remove the kernel from Jupyter: _jupyter kernelspec uninstall myenv_
8. Again from Anaconda prompt (make sure you are in the virtual environment): _pip install google-cloud-bigquery_
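
To confirm that the notebook is really running in the new environment and that the package from step 8 is visible to it, a small check like the one below can be run in the first cell. This is only a sketch: `describe_environment` is a hypothetical helper name, and it relies solely on the standard library.

```python
import sys
from importlib import metadata


def describe_environment(package="google-cloud-bigquery"):
    """Return the interpreter path and the installed version of `package` (None if absent)."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # the package is not installed for this interpreter
    return {"python": sys.executable, "package": package, "version": version}


info = describe_environment()
print("Interpreter:", info["python"])
print(info["package"], "version:", info["version"])
```

If the kernel was selected correctly, the interpreter path should point inside the envs/myenv folder, and the version should not be `None`.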

**Setting up authentication on Google Cloud** (you can do the same using the command line)

1. In the Cloud Console, select the relevant project and go to the Create service account key page: https://console.cloud.google.com/apis/credentials/serviceaccountkey
2. From the Service account list, select New service account. In the Service account name field, enter a name. From the Role list, select Project > Owner.
3. Save the generated JSON key file.
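
Before pointing the client at the saved file, it can help to check that it looks like a service account key. The sketch below assumes the key file contains at least the fields listed in `EXPECTED_FIELDS`; both that tuple and the helper name `check_key_file` are illustrative, not part of any Google library.

```python
import json

# Assumed subset of the fields found in a service account key file.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in the JSON key file at `path` (empty if none)."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append("field 'type' is not 'service_account'")
    return problems


# Usage (replace the placeholder path with your own file):
# print(check_key_file("YOUR_PATH/file.json"))
```

The function only reports which fields are missing and never prints the key contents, so its output is safe to leave in a notebook.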

In [1]:
# Import the packages
import os
from google.cloud import bigquery
import pandas as pd

In [2]:
# Add path to your .json file with credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath("YOUR_PATH/file.json")

In [None]:
# Construct a BigQuery client object.
client = bigquery.Client()
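
As an alternative to the environment variable, the client can also be built directly from the key file with `bigquery.Client.from_service_account_json`. The wrapper below is a minimal sketch: `make_client` is a hypothetical helper, and the path check is an added convenience so that a wrong path fails with a clear message.

```python
import os


def make_client(key_path):
    """Build a BigQuery client from an explicit key file instead of the environment variable."""
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"Credentials file not found: {key_path}")
    # Imported here so the path check above works even before the package is installed.
    from google.cloud import bigquery
    return bigquery.Client.from_service_account_json(key_path)


# Usage (replace the placeholder path with your own file):
# client = make_client("YOUR_PATH/file.json")
```

This keeps the credentials choice local to the notebook rather than changing the environment of the whole process.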

Optionally, you can configure the settings of your query. The available options are listed here:

https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJobConfig.html

    settings = bigquery.QueryJobConfig(setting1=value, setting2=value, ...)

Note: dry_run = True. This setting is very useful in the beginning: instead of actually running the query, it estimates how many bytes the query would process, and the dry run itself is not billed.
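
As a concrete illustration of such settings, the sketch below prepares two real `QueryJobConfig` options as a plain dict: `use_legacy_sql` and `maximum_bytes_billed`, which makes a query fail instead of running when it would bill more than the given limit. The helper name `query_settings` and the 1 GB default are assumptions chosen for the example.

```python
def query_settings(max_gb_billed=1):
    """Return keyword arguments for bigquery.QueryJobConfig as a plain dict."""
    return {
        "use_legacy_sql": False,  # use standard SQL, as in the example query
        "maximum_bytes_billed": int(max_gb_billed * 10**9),  # safety cap in bytes
    }


# Usage, assuming `bigquery`, `client` and `query` are defined as in the notebook cells:
# settings = bigquery.QueryJobConfig(**query_settings(max_gb_billed=1))
# job = client.query(query, job_config=settings)
```

Building the dict separately makes the chosen limits easy to inspect before any request is sent.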

In [37]:
# Example query
query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 10
"""

In [38]:
# Before actually executing the query, let's check its processing costs
settings = bigquery.QueryJobConfig(dry_run = True) 
job = client.query(query, job_config=settings)
print("Total GB that will be processed: ", job.total_bytes_processed/1000000000)
print("Bytes billed: ", job.total_bytes_billed)

Total GB that will be processed:  0.110355534
Bytes billed:  0
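
The byte count from the dry run can be turned into a rough cost figure. The sketch below assumes on-demand pricing quoted per TiB; the price is passed in by the caller, and the value in the usage line is only a placeholder that should be replaced with the rate from the current BigQuery pricing page.

```python
def estimate_cost(bytes_processed, price_per_tib):
    """Convert a dry-run byte count into an approximate on-demand cost."""
    tib = bytes_processed / 2**40  # 1 TiB = 2**40 bytes
    return tib * price_per_tib


# Usage with the byte count reported above and a placeholder price:
# estimate_cost(110355534, price_per_tib=6.25)
```

The result ignores any free tier or minimum billing increments, so treat it as an upper-level estimate rather than an invoice amount.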


In [39]:
# Now let's run the query and convert the results to a dataframe
job = client.query(query)  # Make an API request
job.to_dataframe()

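|   | name | total_people |
|---|------|--------------|
| 0 | James | 272793 |
| 1 | John | 235139 |
| 2 | Michael | 225320 |
| 3 | Robert | 220399 |
| 4 | David | 219028 |
| 5 | Mary | 209893 |
| 6 | William | 173092 |
| 7 | Jose | 157362 |
| 8 | Christopher | 144196 |
| 9 | Maria | 131056 |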
