# BigQuery Parameterization

Google BigQuery Standard SQL [supports parameterization](https://cloud.google.com/bigquery/querying-data#running_parameterized_queries). This makes it possible to use Python variables defined in the notebook as parameter values in SQL queries.

This notebook shows how to use parameterized queries.

## Data Preview

In [1]:
%load_ext google.datalab.kernel

In [2]:
%%bq query -n logs_query
SELECT * FROM `cloud-datalab-samples.httplogs.logs_20140615`

In [3]:
%bq sample -q logs_query --count 10

timestamp,latency,status,method,endpoint
2014-06-15 08:12:39.711942,256,200,GET,Home
2014-06-15 10:26:53.442199,256,200,GET,Home
2014-06-15 13:04:58.103063,256,200,GET,Home
2014-06-15 13:13:50.615016,256,200,GET,Home
2014-06-15 15:12:02.263743,256,200,GET,Home
2014-06-15 21:17:39.864311,256,200,GET,Home
2014-06-15 22:16:04.044991,256,200,GET,Home
2014-06-15 23:30:32.241880,256,200,GET,Home
2014-06-15 23:55:10.922257,256,200,GET,Home
2014-06-16 04:03:26.749572,256,200,GET,Home


In [4]:
%%bq query
SELECT endpoint FROM `cloud-datalab-samples.httplogs.logs_20140615` GROUP BY endpoint

endpoint
Home
Admin
Other
Create
Recent
Warmup
Popular
Interact1
Interact2
Interact3


# Parameterization within SQL queries

Parameters are declared in a SQL query using the `@name` syntax, and a value for `name` is supplied when the query is executed. Note that the query has to be defined and executed in two separate cells: the shorthand way of running queries (using `%%bq query` without `--name`) gives you little control over the execution of the query.
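
To make the `@name` convention concrete, the following is a minimal sketch in plain Python that lists the named parameters appearing in a query string. The helper `find_named_parameters` and its regular expression are illustrative assumptions, not part of Datalab or BigQuery, and the pattern is deliberately simple: it does not parse SQL, so it may misreport an `@name` that appears inside a comment or string literal.

```python
import re

# Hypothetical pattern: an '@' that is not preceded by a word character or
# another '@', followed by an identifier. This is an approximation, not a
# SQL parser.
_PARAM_PATTERN = re.compile(r'(?<![\w@])@([A-Za-z_]\w*)')


def find_named_parameters(sql):
    """Return the distinct @name parameters in sql, in order of appearance."""
    names = []
    for name in _PARAM_PATTERN.findall(sql):
        if name not in names:
            names.append(name)
    return names


sql = '''
SELECT *
FROM `cloud-datalab-samples.httplogs.logs_20140615`
WHERE endpoint = @endpoint
LIMIT 10
'''

print(find_named_parameters(sql))
```

A check like this can help confirm that every parameter declared in the SQL has a value before the query is executed.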

In [5]:
%%bq query -n endpoint_stats
SELECT *
FROM `cloud-datalab-samples.httplogs.logs_20140615`
WHERE endpoint = @endpoint
LIMIT 10

In [6]:
%%bq execute -q endpoint_stats
parameters:
- name: endpoint
  type: STRING
  value: Interact2

timestamp,latency,status,method,endpoint
2014-06-15 08:13:58.427707,256,200,POST,Interact2
2014-06-15 09:36:24.143390,256,200,POST,Interact2
2014-06-15 18:32:52.298549,256,200,POST,Interact2
2014-06-15 19:38:44.964363,256,200,POST,Interact2
2014-06-15 20:48:31.345635,256,200,POST,Interact2
2014-06-15 23:10:45.617531,256,200,POST,Interact2
2014-06-15 23:21:58.101823,256,200,POST,Interact2
2014-06-16 00:12:58.615643,256,200,POST,Interact2
2014-06-15 07:13:34.071411,257,200,POST,Interact2
2014-06-15 08:56:09.715447,257,200,POST,Interact2


The cells above define a SQL query with a string parameter named `endpoint`, whose value is supplied when the query is executed. The value can also come from a Python variable, defined in a separate cell:

In [7]:
endpoint_val = 'Interact3'

To reference the variable defined above, Google Cloud Datalab offers the `$var` syntax, which can be used inside the magic command:

In [8]:
%%bq execute -q endpoint_stats
parameters:
- name: endpoint
  type: STRING
  value: $endpoint_val

timestamp,latency,status,method,endpoint
2014-06-15 07:00:43.429957,256,200,GET,Interact3
2014-06-15 07:11:51.955861,256,200,GET,Interact3
2014-06-15 07:15:33.557944,256,200,GET,Interact3
2014-06-15 07:29:22.931989,256,200,GET,Interact3
2014-06-15 07:29:59.839051,256,200,GET,Interact3
2014-06-15 07:32:40.753827,256,200,GET,Interact3
2014-06-15 07:34:11.738413,256,200,GET,Interact3
2014-06-15 07:38:24.232697,256,200,GET,Interact3
2014-06-15 07:42:14.313880,256,200,GET,Interact3
2014-06-15 07:46:38.641068,256,200,GET,Interact3


The same result can be achieved with the Python API instead of the magic commands (`%%bq`). The following cell creates and executes a parameterized query using the API:

In [9]:
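import google.datalab.bigquery as bq

# Define the query; @endpoint is a named parameter filled in at execution time.
endpoint_stats2 = bq.Query(sql='''
SELECT *
FROM `cloud-datalab-samples.httplogs.logs_20140615`
WHERE endpoint = @endpoint
LIMIT 10
''')

# Python value to pass in as the parameter.
endpoint_value = 'Interact3'

# Each parameter is a dictionary with a name, a type, and a value.
query_parameters = [
  {
    'name': 'endpoint',
    'parameterType': {'type': 'STRING'},
    'parameterValue': {'value': endpoint_value}
  }
]

# Execute the query with the parameters and display the results.
job = endpoint_stats2.execute(query_params=query_parameters)

job.result()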

timestamp,latency,status,method,endpoint
2014-06-15 07:00:43.429957,256,200,GET,Interact3
2014-06-15 07:11:51.955861,256,200,GET,Interact3
2014-06-15 07:15:33.557944,256,200,GET,Interact3
2014-06-15 07:29:22.931989,256,200,GET,Interact3
2014-06-15 07:29:59.839051,256,200,GET,Interact3
2014-06-15 07:32:40.753827,256,200,GET,Interact3
2014-06-15 07:34:11.738413,256,200,GET,Interact3
2014-06-15 07:38:24.232697,256,200,GET,Interact3
2014-06-15 07:42:14.313880,256,200,GET,Interact3
2014-06-15 07:46:38.641068,256,200,GET,Interact3
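
Writing the parameter dictionaries by hand becomes repetitive when the same query is run with several values. The following is a minimal sketch of a small helper that builds the same structure used in the cell above. The function name `make_query_param` and the `endpoints` list are hypothetical and introduced only for illustration; the code needs no BigQuery connection because it only constructs the dictionaries.

```python
def make_query_param(name, bq_type, value):
    """Build one parameter dictionary in the shape shown in the cell above."""
    return {
        'name': name,
        'parameterType': {'type': bq_type},
        'parameterValue': {'value': value},
    }


# Hypothetical list of endpoint values to query one after another.
endpoints = ['Interact1', 'Interact2', 'Interact3']

# One parameter list per endpoint, keyed by the endpoint name.
params_by_endpoint = {
    ep: [make_query_param('endpoint', 'STRING', ep)] for ep in endpoints
}

# In a Datalab session, each list could then be passed to the query defined
# above, for example:
#   endpoint_stats2.execute(query_params=params_by_endpoint['Interact1']).result()
print(params_by_endpoint['Interact2'])
```

Keeping the type explicit in the helper call avoids guessing how Python types map to BigQuery types.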


# Looking Ahead

Parameterization enables one part of the SQL and Python integration: values defined in Python code in the notebook can be passed in as part of a query when retrieving data from BigQuery.

The next notebook will cover the other part of the SQL and Python integration: retrieving query results into the notebook for use with Python code.