# Introduction

You've learned how to use these clauses in your queries: 

    SELECT ... 
    FROM ...
    (WHERE) ...
    GROUP BY ...
    (HAVING) ...
    ORDER BY ...

With all this, your queries are getting pretty long, which can make them hard to understand (and debug).

You are about to learn how to use AS and WITH to tidy up your queries and make them easier to read.


# AS

**AS** lets you refer to the columns generated by your queries with different names, which is also known as "aliasing". (This is similar to how Python uses `as` for aliasing in imports like `import pandas as pd` or `import seaborn as sns`.)

To use **AS** in SQL, insert it right after the name of the column you select. Here's an example of a query **without** an **AS** clause:  

        SELECT EXTRACT(DAY FROM column_with_timestamp), data_point_3
        FROM `bigquery-public-data.imaginary_dataset.imaginary_table`

And here's an example of the same query, but with **AS**.

        SELECT EXTRACT(DAY FROM column_with_timestamp) AS day,
               data_point_3 AS data
        FROM `bigquery-public-data.imaginary_dataset.imaginary_table`

Both of these queries return the same data, but in the second query the returned columns are called `day` and `data`, rather than the default names `f0_` and `data_point_3`.
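To try aliasing without a BigQuery account, here is a minimal sketch using Python's built-in `sqlite3` module. It is an illustration under assumptions, not the BigQuery setup used in this tutorial: the in-memory table and its single row are hypothetical, and because SQLite has no `EXTRACT`, `strftime('%d', ...)` stands in for `EXTRACT(DAY FROM ...)`.

```python
import sqlite3

# Hypothetical in-memory table standing in for imaginary_table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE imaginary_table (column_with_timestamp TEXT, data_point_3 INTEGER)"
)
conn.execute("INSERT INTO imaginary_table VALUES ('2018-03-14 09:26:53', 7)")

# strftime('%d', ...) is SQLite's stand-in for EXTRACT(DAY FROM ...).
# It returns the day of the month as a two-character string.
cursor = conn.execute("""
    SELECT strftime('%d', column_with_timestamp) AS day,
           data_point_3 AS data
    FROM imaginary_table
""")

# cursor.description holds one tuple per result column; the name comes first.
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
conn.close()

print(column_names)
print(rows)
```

The column names reported by the cursor are the aliases given after **AS**, not the expressions that produced them.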

# WITH... AS

On its own, **AS** is a convenient way to make your code easier to read and tidy up the data returned by your query. It's even more powerful when combined with **WITH** in what's called a "common table expression" or CTE.

> **Common table expression**: A temporary table that you return within your query. You can write queries against the new table you've created. CTEs only exist inside the query where you create them, so you can't reference them in later queries.

CTEs are helpful for splitting your queries into readable chunks.

We'll revisit the pets table, but now it includes information on the ages of all the different animals. These are in a column called "Years_old":

![](https://i.imgur.com/01s9TwR.png)

You might want to ask questions about older animals in particular. You can start by creating a CTE that contains only the animals more than five years old, like this:

    # note that this is not a complete query on its own, so it won't return anything!
    WITH Seniors AS 
            (
                SELECT ID, Name
                FROM `bigquery-public-data.pet_records.pets`
                WHERE Years_old > 5
            )

This creates the following temporary table, which contains only the ID and Name of the animals that are seniors. You can then refer to it in the rest of your query:

![](https://i.imgur.com/LBippKL.png)

If you wanted additional information about this table, you could write a query under it. The query below creates the CTE shown above, and then returns all the IDs from it (in this case just 2 and 4).

    WITH Seniors AS 
            (
                SELECT ID, Name
                FROM `bigquery-public-data.pet_records.pets`
                WHERE Years_old > 5
            )
    SELECT ID
    FROM Seniors
    
You could do this without a CTE, but if this were the first part of a very long query, removing the CTE would make it much harder to follow.
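The sketch below runs the same `Seniors` CTE against a small local table using Python's built-in `sqlite3` module, so you can experiment without BigQuery. The rows are hypothetical (the names and ages are made up, chosen so that pets 2 and 4 are older than five), and the table keeps only the columns the query needs. It also checks the point made earlier: the CTE is gone once its query has finished.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Years_old INTEGER)")

# Hypothetical rows: only pets 2 and 4 are more than five years old.
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [(1, "Pet A", 2), (2, "Pet B", 8), (3, "Pet C", 3), (4, "Pet D", 7)],
)

query = """
    WITH Seniors AS
        (
            SELECT ID, Name
            FROM pets
            WHERE Years_old > 5
        )
    SELECT ID
    FROM Seniors
    ORDER BY ID
"""
senior_ids = [row[0] for row in conn.execute(query)]
print(senior_ids)

# The CTE only exists inside the query that defines it,
# so a later query cannot see a table called Seniors.
try:
    conn.execute("SELECT ID FROM Seniors")
    cte_persisted = True
except sqlite3.OperationalError:
    cte_persisted = False
print(cte_persisted)

conn.close()
```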

# Example: How many Bitcoin transactions are made per day?

We're going to use a common table expression (CTE) to find out how many Bitcoin transactions were made each day for the entire timespan of a Bitcoin transaction dataset.

We'll investigate the "transactions" table. Here is a view of the first few rows.

    # import package with helper functions
    import bq_helper

    # create a helper object for this dataset
    bitcoin_blockchain = bq_helper.BigQueryHelper(active_project="bigquery-public-data",
                                                  dataset_name="crypto_bitcoin")

    # print the first couple rows of the "transactions" table
    bitcoin_blockchain.head("transactions")

Since the "block_timestamp" column records the date and time of each transaction as a timestamp, we'll convert these values into DATE format using the `DATE()` function.

We do that using a CTE, and then the next part of the query counts the number of transactions for each date and sorts the table so that earlier dates appear first. 

    query = """ WITH time AS 
                (
                    SELECT DATE(block_timestamp) AS trans_date
                    FROM `bigquery-public-data.crypto_bitcoin.transactions`
                )
                SELECT COUNT(1) AS transactions,
                       trans_date
                FROM time
                GROUP BY trans_date
                ORDER BY trans_date
            """

    # note that max_gb_scanned is set to 25, rather than 1, because this is a very large dataset
    transactions_by_date = bitcoin_blockchain.query_to_pandas_safe(query, max_gb_scanned=25)

Since they're returned sorted, we can plot the raw results to show us the number of Bitcoin transactions per day over the whole timespan of this dataset.

    transactions_by_date.set_index('trans_date').plot()

As you can see, common table expressions let you shift a lot of your data cleaning into SQL. That's an especially good thing in the case of BigQuery because it is vastly faster than doing the work in Pandas.
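If you want to experiment with this pattern without scanning the full dataset, the following sketch applies the same CTE, `GROUP BY`, and `ORDER BY` to three hypothetical timestamps in an in-memory SQLite database. The table and its rows are made up for illustration. SQLite's `DATE()` function happens to behave like BigQuery's here, turning a timestamp string into a `YYYY-MM-DD` date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (block_timestamp TEXT)")

# Hypothetical timestamps: two on one day, one on an earlier day.
conn.executemany(
    "INSERT INTO transactions VALUES (?)",
    [
        ("2009-01-12 03:30:25",),
        ("2009-01-12 20:04:20",),
        ("2009-01-09 02:54:25",),
    ],
)

query = """
    WITH time AS
        (
            SELECT DATE(block_timestamp) AS trans_date
            FROM transactions
        )
    SELECT COUNT(1) AS transactions,
           trans_date
    FROM time
    GROUP BY trans_date
    ORDER BY trans_date
"""
# Each row is (number of transactions, date), with earlier dates first.
daily_counts = conn.execute(query).fetchall()
conn.close()

print(daily_counts)
```

The CTE handles the cleaning step (timestamp to date), and the main query only has to count and sort.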

# Your Turn
You now have the tools to stay organized even when writing more complex queries. Now **[put them to use here](#$NEXT_NOTEBOOK_URL$)**.
