# ORDER BY
ORDER BY is usually the last clause in your query, and it sorts the results returned by the rest of your query.
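As a minimal sketch of how ORDER BY behaves, the snippet below sorts a tiny made-up table. It uses Python's built-in `sqlite3` module rather than BigQuery so that it runs without credentials. The `pets` table and its rows are hypothetical and only serve as an illustration.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (an assumption made
# so the example runs anywhere); the "pets" table is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [(3, "Ripley"), (1, "Moon"), (2, "Tom")],
)

# ORDER BY sorts ascending by default
ascending = conn.execute("SELECT id, name FROM pets ORDER BY id").fetchall()

# Adding DESC reverses the order
descending = conn.execute("SELECT id, name FROM pets ORDER BY id DESC").fetchall()

print(ascending)
print(descending)
conn.close()
```

Without DESC the rows come back from the smallest `id` to the largest; with DESC the order is reversed.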

# Dates
There are two ways that dates can be stored in BigQuery: as a **DATE** or as a **DATETIME**.

The **DATE** format has the year first, then the month, and then the day. It looks like this: YYYY-[M]M-[D]D

The **DATETIME** format is like the **DATE** format, but with the time added at the end.
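To make the two layouts concrete, here is a small sketch using Python's standard `datetime` module. These are Python objects, not BigQuery types, and the date shown is just an example value. The point is only to show what a year-month-day value looks like with and without a time part.

```python
from datetime import date, datetime

# A date only: year, then month, then day
d = date(2015, 12, 4)

# The same date with a time (hours, minutes, seconds) added at the end
dt = datetime(2015, 12, 4, 12, 42, 0)

date_text = d.isoformat()
datetime_text = dt.isoformat(sep=" ")

print(date_text)      # 2015-12-04
print(datetime_text)  # 2015-12-04 12:42:00
```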

# Example: Which day of the week has the most fatal motor accidents?
Let's use the US Traffic Fatality Records database, which contains information on traffic accidents in the US where at least one person died.

We'll investigate the **accident_2015** table.

In [2]:
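from google.cloud import bigquery

# Create a "Client" object
client = bigquery.Client()

# Construct a reference to the "nhtsa_traffic_fatalities" dataset
dataset_ref = client.dataset("nhtsa_traffic_fatalities", project="bigquery-public-data")

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

# Construct a reference to the "accident_2015" table
table_ref = dataset_ref.table("accident_2015")

# API request - fetch the table
table = client.get_table(table_ref)

# Preview the first five rows of the "accident_2015" table
client.list_rows(table, max_results=5).to_dataframe()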

Using Kaggle's public dataset BigQuery integration.



Unnamed: 0,state_number,state_name,consecutive_number,number_of_vehicle_forms_submitted_all,number_of_motor_vehicles_in_transport_mvit,number_of_parked_working_vehicles,number_of_forms_submitted_for_persons_not_in_motor_vehicles,number_of_persons_not_in_motor_vehicles_in_transport_mvit,number_of_persons_in_motor_vehicles_in_transport_mvit,number_of_forms_submitted_for_persons_in_motor_vehicles,...,minute_of_ems_arrival_at_hospital,related_factors_crash_level_1,related_factors_crash_level_1_name,related_factors_crash_level_2,related_factors_crash_level_2_name,related_factors_crash_level_3,related_factors_crash_level_3_name,number_of_fatalities,number_of_drunk_drivers,timestamp_of_crash
0,19,Iowa,190257,1,1,0,0,0,3,3,...,99,0,,0,,0,,1,0,2015-12-04 12:42:00+00:00
1,19,Iowa,190195,1,1,0,0,0,1,1,...,88,0,,0,,0,,1,0,2015-09-14 02:06:00+00:00
2,19,Iowa,190122,1,1,0,0,0,1,1,...,31,0,,0,,0,,1,0,2015-07-12 22:13:00+00:00
3,19,Iowa,190205,2,2,0,0,0,3,3,...,99,0,,0,,0,,1,0,2015-09-26 12:48:00+00:00
4,19,Iowa,190239,1,1,0,1,1,1,1,...,46,19,Recent Previous Crash Scene Nearby (Since 1989),0,,0,,1,0,2015-10-26 23:12:00+00:00


Let's use the table to determine how the number of accidents varies with the day of the week.

In [3]:
query="""
SELECT EXTRACT(DAYOFWEEK FROM timestamp_of_crash) AS day_of_week,
COUNT(consecutive_number) AS num_accidents
FROM `bigquery-public-data.nhtsa_traffic_fatalities.accident_2015`
GROUP BY day_of_week
ORDER BY num_accidents DESC
"""

In [4]:
# Set up the query (cancel the query if it would use too much of 
# your quota, with the limit set to 1 GB)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**9)
query_job = client.query(query, job_config=safe_config)

# API request - run the query, and convert the results to a pandas DataFrame
accidents_by_day = query_job.to_dataframe()

# Print the DataFrame
accidents_by_day


Unnamed: 0,day_of_week,num_accidents
0,7,5659
1,1,5298
2,6,4916
3,5,4460
4,4,4182
5,2,4038
6,3,3985


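The results are sorted by `num_accidents`, so the days with the most accidents appear first. Notice that `DAYOFWEEK` returns "an integer between 1 (Sunday) and 7 (Saturday), inclusively".
So, in 2015, most fatal motor accidents in the US occurred on **Saturday** (day 7) and **Sunday** (day 1), while the fewest happened on Tuesday (day 3).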
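Reading the result is easier once the day numbers are turned into names. The following is a minimal sketch in plain Python that applies the 1 (Sunday) to 7 (Saturday) convention to the counts shown in the output above. The `DAY_NAMES` mapping and the variable names are introduced here for illustration only.

```python
# Mapping for the 1 (Sunday) .. 7 (Saturday) numbering (illustrative helper)
DAY_NAMES = {
    1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
    5: "Thursday", 6: "Friday", 7: "Saturday",
}

# (day_of_week, num_accidents) pairs copied from the query output above
results = [
    (7, 5659), (1, 5298), (6, 4916), (5, 4460),
    (4, 4182), (2, 4038), (3, 3985),
]

# Replace each day number with its name, keeping the sorted order
named = [(DAY_NAMES[day], count) for day, count in results]

for name, count in named:
    print(f"{name}: {count}")
```

The first row printed is the day with the most fatal accidents and the last row is the day with the fewest.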