# Intro

So far, you've learned how to use the following clauses: 
    
    SELECT ... 
    FROM ...
    (WHERE) ...
    GROUP BY ...
    (HAVING) ...

You also learned how to use aggregations like COUNT().

Now you'll learn how to change the order of your results using the ORDER BY clause. You'll also see how to work with dates in SQL.


### ORDER BY

ORDER BY is usually the last clause you'll put in your query, since you're going to want to use it to sort the results returned by the rest of your query.

Let's see an example on this familiar table. 

![](https://i.imgur.com/QRgb4iL.png)

#### Ordering by a numeric column

We can sort the rows by the ID column with the following query:

    SELECT ID, Name, Animal
    FROM `bigquery-public-data.pet_records.pets`
    ORDER BY ID

The results look like this:

![](https://i.imgur.com/zEXDTKS.png)
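
If you want to try this without BigQuery, here is a minimal sketch that runs the same kind of query against an in-memory SQLite table. The rows are made up for illustration and are not the contents of the real `pets` table; SQLite is only a stand-in here.

```python
import sqlite3

# Made-up rows standing in for the pets table (an assumption for illustration).
rows = [
    (3, "Olive", "Cat"),
    (1, "Biscuit", "Rabbit"),
    (4, "Mango", "Cat"),
    (2, "Pepper", "Dog"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.executemany("INSERT INTO pets VALUES (?, ?, ?)", rows)

# Same shape as the query above, pointed at the local stand-in table.
by_id = conn.execute(
    "SELECT ID, Name, Animal FROM pets ORDER BY ID"
).fetchall()
conn.close()

for row in by_id:
    print(row)
```

The rows were inserted out of order, and they come back sorted from the lowest ID to the highest.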

    
#### Ordering by a text column

You can also order by a text column, in which case the results are sorted alphabetically.

    SELECT ID, Name, Animal
    FROM `bigquery-public-data.pet_records.pets`
    ORDER BY Animal

![](https://i.imgur.com/E7qjnf9.png)

#### Reversing the order

You can reverse the sort order (reverse alphabetical order for text columns, or high to low for numeric columns) using the **DESC** keyword, which is short for 'descending'.

This query sorts the selected columns by the Animal column, but the values that are last in alphabetical order will be returned first.

    SELECT ID, Name, Animal
    FROM `bigquery-public-data.pet_records.pets`
    ORDER BY Animal DESC

![](https://i.imgur.com/DREYNFF.png)
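
To compare the two directions side by side, here is a small sketch using an in-memory SQLite table with made-up rows (again a stand-in, not the real `pets` table). It only looks at the Animal column, because the relative order of rows that share the same Animal value is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        (1, "Biscuit", "Rabbit"),  # hypothetical rows for illustration
        (2, "Pepper", "Dog"),
        (3, "Olive", "Cat"),
        (4, "Mango", "Cat"),
    ],
)

query = "SELECT ID, Name, Animal FROM pets ORDER BY Animal {direction}"
ascending = conn.execute(query.format(direction="ASC")).fetchall()
descending = conn.execute(query.format(direction="DESC")).fetchall()
conn.close()

# Keep only the sorted column for comparison.
animals_asc = [animal for _, _, animal in ascending]
animals_desc = [animal for _, _, animal in descending]

print(animals_asc)   # ['Cat', 'Cat', 'Dog', 'Rabbit']
print(animals_desc)  # ['Rabbit', 'Dog', 'Cat', 'Cat']
```

Ascending order is the default, so `ORDER BY Animal` and `ORDER BY Animal ASC` behave the same way.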
 
### Dates

Finally, let's talk about dates, because they come up very frequently in most databases.

There are two different ways that a date can be stored in BigQuery: as a DATE or as a DATETIME. 

The **DATE** format has the year first, then the month, and then the day. It looks like this:

    YYYY-[M]M-[D]D

* YYYY: Four-digit year
* [M]M: One- or two-digit month
* [D]D: One- or two-digit day


The **DATETIME** format is like the DATE format, but with the time added at the end.
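
To get a feel for the two formats, here is a rough sketch that parses one string of each kind with Python's standard library. The sample values are made up, and the `HH:MM:SS` layout of the time part is an assumption about how the DATETIME string is written.

```python
from datetime import datetime

# A DATE-style string; the month and day may have one or two digits.
date_value = datetime.strptime("2019-4-8", "%Y-%m-%d").date()

# A DATETIME-style string: the same date followed by a time
# (the time layout here is an assumption for illustration).
datetime_value = datetime.strptime("2019-04-08 21:07:13", "%Y-%m-%d %H:%M:%S")

print(date_value.isoformat())              # 2019-04-08
print(datetime_value.isoformat(sep=" "))   # 2019-04-08 21:07:13
```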


Often you'll want to look at part of a date, like the year or the day. You can do this with the EXTRACT function.

This query will return one column with just the day of each date in the column_with_timestamp column: 

            SELECT EXTRACT(DAY FROM column_with_timestamp)
            FROM `bigquery-public-data.imaginary_dataset.imaginary_table`

SQL is very smart about dates, and we can ask for information beyond just extracting part of the cell. For example, this query returns one column with just the week of the year (between 0 and 53) of each date in the column_with_timestamp column:

            SELECT EXTRACT(WEEK FROM column_with_timestamp)
            FROM `bigquery-public-data.imaginary_dataset.imaginary_table`
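
If it helps to see what these two parts look like for a single value, here is a small Python sketch. It is an analogy rather than BigQuery itself: it assumes weeks are counted from Sunday, which is what Python's `%U` format code does, and the timestamp is an arbitrary example.

```python
from datetime import datetime

# An arbitrary example timestamp (Saturday, July 4, 2015).
ts = datetime(2015, 7, 4, 14, 30)

# Rough equivalent of EXTRACT(DAY FROM ...): the day of the month.
day = ts.day

# Rough equivalent of EXTRACT(WEEK FROM ...), assuming weeks start on Sunday.
# %U gives the week of the year with Sunday as the first day of the week.
week = int(ts.strftime("%U"))

print(day, week)
```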

SQL has a lot of power when it comes to dates, and that lets you ask very specific questions using this information. You can find all the functions you can use with dates in BigQuery [on this page](https://cloud.google.com/bigquery/docs/reference/legacy-sql), under "Date and time functions".  

### Example: Which day of the week has the most fatal motor accidents?

Let's use the US Traffic Fatality Records database, which contains information on traffic accidents in the US where at least one person died.

First, we need to get our environment set up.

    # import package with helper functions 
    import bq_helper

    # create a helper object for this dataset
    accidents = bq_helper.BigQueryHelper(active_project="bigquery-public-data",
                                       dataset_name="nhtsa_traffic_fatalities")


Now we'll count the unique IDs (in this table they're called "consecutive_number") for each day of the week on which an accident occurred. Then we'll sort the table so the days with the most accidents are returned first.

    # query to find out the number of accidents which 
    # happen on each day of the week
    query = """SELECT COUNT(consecutive_number) num_accidents, 
                      EXTRACT(DAYOFWEEK FROM timestamp_of_crash)
                FROM `bigquery-public-data.nhtsa_traffic_fatalities.accident_2015`
                GROUP BY EXTRACT(DAYOFWEEK FROM timestamp_of_crash)
                ORDER BY COUNT(consecutive_number) DESC
            """

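If you can't reach BigQuery, the following sketch shows the same group, count, and sort pattern on a tiny in-memory SQLite table. Everything in it is an assumption for illustration: the `accident` table and its six rows are made up, and SQLite's `strftime('%w', ...)` (0 for Sunday through 6 for Saturday) plus 1 stands in for `EXTRACT(DAYOFWEEK ...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accident (consecutive_number INTEGER, timestamp_of_crash TEXT)"
)
conn.executemany(
    "INSERT INTO accident VALUES (?, ?)",
    [
        (1, "2015-01-03 10:00:00"),  # Saturday (made-up rows)
        (2, "2015-01-04 11:00:00"),  # Sunday
        (3, "2015-01-10 23:15:00"),  # Saturday
        (4, "2015-01-06 08:30:00"),  # Tuesday
        (5, "2015-01-17 02:45:00"),  # Saturday
        (6, "2015-01-11 19:20:00"),  # Sunday
    ],
)

# Count accidents per day of the week, most frequent day first.
local_query = """
    SELECT COUNT(consecutive_number) AS num_accidents,
           CAST(strftime('%w', timestamp_of_crash) AS INTEGER) + 1 AS day_of_week
    FROM accident
    GROUP BY day_of_week
    ORDER BY num_accidents DESC
"""
counts = conn.execute(local_query).fetchall()
conn.close()

print(counts)  # [(3, 7), (2, 1), (1, 3)]
```

Each tuple is (number of accidents, day of week), so in this toy data day 7 comes first with three rows.
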
As usual, we run it as follows:

    # the query_to_pandas_safe method will cancel the query if
    # it would use too much of your quota, with the limit set 
    # to 1 GB by default
    accidents_by_day = accidents.query_to_pandas_safe(query)

That gives a pandas DataFrame. If you know matplotlib already, you might plot it as follows:

    # library for plotting
    import matplotlib.pyplot as plt

    # make a plot to show that our data is actually sorted:
    plt.plot(accidents_by_day.num_accidents)
    plt.title("Number of Accidents by Rank of Day \n (Most to least dangerous)")
    plt.show()

    print(accidents_by_day)

To map the numbers returned for the day of the week (the second column) to the actual day, you might consult [the BigQuery documentation on the DAYOFWEEK function](https://cloud.google.com/bigquery/docs/reference/legacy-sql#dayofweek). It says that it returns "an integer between 1 (Sunday) and 7 (Saturday), inclusively". So, in 2015, most fatal motor accidents occurred on Sunday and Saturday, while the fewest happened on Tuesday.
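
If you'd rather not look up the numbers by hand, a small helper can do the mapping. This is a sketch that follows the 1 (Sunday) to 7 (Saturday) convention quoted above; `day_name` is a hypothetical helper name, not part of any library.

```python
# Day names in the order implied by the convention 1 = Sunday ... 7 = Saturday.
DAY_NAMES = [
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
]


def day_name(day_of_week: int) -> str:
    """Translate a DAYOFWEEK number (1-7) into the name of the day."""
    if not 1 <= day_of_week <= 7:
        raise ValueError(f"expected a value from 1 to 7, got {day_of_week}")
    return DAY_NAMES[day_of_week - 1]


print([day_name(n) for n in (1, 7, 3)])  # ['Sunday', 'Saturday', 'Tuesday']
```

You could apply the same function to the second column of the query results to make the table easier to read.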

# Your Turn
ORDER BY can make your results easier to interpret quickly. **[Try it yourself](https://www.kaggle.com/dansbecker/exercise-order-by)**.
