In [1]:
%matplotlib inline

In [2]:
%load_ext google.cloud.bigquery

# BigQuery Query Magic

Jupyter magics are notebook-specific shortcuts that allow you to run commands with minimal syntax. Jupyter notebooks come pre-loaded with many [built-in commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html). The BigQuery client library, `google-cloud-bigquery`, provides a cell magic, `%%bigquery`, which runs a SQL query and returns the results as a Pandas DataFrame.

## Run a query on a public dataset

The following example queries the BigQuery `usa_names` public dataset, which is a Social Security Administration dataset that contains all names from Social Security card applications for births that occurred in the United States after 1879.

The example below shows how to invoke the magic (`%%bigquery`) and pass in a Standard SQL query in the body of the code cell. The results are displayed below the input cell as a [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).

In [3]:
%%bigquery
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT 10

Unnamed: 0,name,count
0,James,5001762
1,John,4875934
2,Robert,4743843
3,Michael,4354622
4,William,3886371
5,Mary,3748377
6,David,3595923
7,Richard,2542659
8,Joseph,2518578
9,Charles,2273860


## Display verbose output

As the query job is running, status messages below the cell update with the query job ID and the amount of time the query has been running. By default, this output is erased and replaced with the results of the query. If you pass the `--verbose` flag, the output will remain below the cell after query completion.

In [4]:
%%bigquery --verbose
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT 10

Executing query with job ID: 791c0804-adf8-432a-8618-dab212848f03
Query executing: 0.48s


Query complete after 0.93s


Unnamed: 0,name,count
0,James,5001762
1,John,4875934
2,Robert,4743843
3,Michael,4354622
4,William,3886371
5,Mary,3748377
6,David,3595923
7,Richard,2542659
8,Joseph,2518578
9,Charles,2273860


## Explicitly specify a project

By default, the `%%bigquery` magic command uses the project associated with your credentials to run the query. You may also explicitly provide a project ID using the `--project` flag. Note that your credentials must have permissions to create query jobs in the project you specify.

In [5]:
project_id = 'my-project-id'

In [6]:
project_id = 'ajhamilton-scratch'

In [7]:
%%bigquery --project $project_id
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT 10

Unnamed: 0,name,count
0,James,5001762
1,John,4875934
2,Robert,4743843
3,Michael,4354622
4,William,3886371
5,Mary,3748377
6,David,3595923
7,Richard,2542659
8,Joseph,2518578
9,Charles,2273860


## Assign the query results to a variable

If you would like to save the results of your query to a variable, provide a variable name as a parameter to `%%bigquery`. The example below saves the results of the query to a variable named `df`. Note that when a variable is provided, the results are not displayed below the cell invoking the magic command.

In [8]:
%%bigquery df
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT 10

Unnamed: 0,name,count
0,James,5001762
1,John,4875934
2,Robert,4743843
3,Michael,4354622
4,William,3886371
5,Mary,3748377
6,David,3595923
7,Richard,2542659
8,Joseph,2518578
9,Charles,2273860


In [9]:
df

Unnamed: 0,name,count
0,James,5001762
1,John,4875934
2,Robert,4743843
3,Michael,4354622
4,William,3886371
5,Mary,3748377
6,David,3595923
7,Richard,2542659
8,Joseph,2518578
9,Charles,2273860


## Run a parameterized query

Parameterized queries are useful if you need to run a query with certain parameters calculated at run time. Note that the value types must be JSON serializable. The example below defines a parameters dictionary and passes it to the `--params` flag. The key of the dictionary is the name of the parameter, and the value of the dictionary is the value of the parameter.

In [10]:
params = {"limit": 10}

In [11]:
%%bigquery --params $params
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY count DESC
LIMIT @limit

Unnamed: 0,name,count
0,James,5001762
1,John,4875934
2,Robert,4743843
3,Michael,4354622
4,William,3886371
5,Mary,3748377
6,David,3595923
7,Richard,2542659
8,Joseph,2518578
9,Charles,2273860
