# Order by

## ORDER BY

`ORDER BY` is usually the last clause in your query, and it sorts the results returned by the rest of your query.

![https://i.imgur.com/6o9LuTA.png](https://i.imgur.com/6o9LuTA.png)

![https://i.imgur.com/ooxuzw3.png](https://i.imgur.com/ooxuzw3.png)

By default, `ORDER BY` sorts in ascending order. You can reverse the order with the `DESC` keyword (short for 'descending').

![https://i.imgur.com/IElLJrR.png](https://i.imgur.com/IElLJrR.png)
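
To make the two sort directions concrete, here is a minimal sketch. It assumes Python's built-in `sqlite3` module as a local stand-in for BigQuery, and the `pets` table and its rows are hypothetical. For a simple query like this one, `ORDER BY` and `DESC` should behave the same way in both systems.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, inserted out of order on purpose.
conn.execute("CREATE TABLE pets (id INTEGER, name TEXT, animal TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [(2, "Moon", "Dog"), (1, "Biscuit", "Rabbit"), (3, "Ripley", "Cat")],
)

# Default sort direction is ascending.
ascending = [row[0] for row in conn.execute("SELECT id FROM pets ORDER BY id")]

# DESC reverses the sort direction.
descending = [row[0] for row in conn.execute("SELECT id FROM pets ORDER BY id DESC")]

# Text columns are sorted alphabetically.
by_animal = [row[0] for row in conn.execute("SELECT animal FROM pets ORDER BY animal")]

print(ascending)   # [1, 2, 3]
print(descending)  # [3, 2, 1]
print(by_animal)   # ['Cat', 'Dog', 'Rabbit']
```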

## DATES

There are two ways that dates can be stored in BigQuery: as a `DATE` or as a `DATETIME`.

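The `DATE` format has the year first, then the month, and then the day. It looks like this: `YYYY-[M]M-[D]D`, where the brackets mark a digit that may be omitted, so the month and the day can each be one or two digits.

The `DATETIME` format is the same as the `DATE` format, but with the time added at the end.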
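
The difference between the two shapes can be sketched with Python's standard `datetime` module. This is only an illustration of the layout, not BigQuery itself, and the values are hypothetical.

```python
from datetime import date, datetime

# Hypothetical example values, not taken from any dataset.
d = date(2015, 9, 11)
dt = datetime(2015, 9, 11, 20, 20, 0)

# A date is written year first, then month, then day.
date_text = d.isoformat()

# A datetime is the same date followed by the time of day.
datetime_text = dt.isoformat(sep=" ")

print(date_text)      # 2015-09-11
print(datetime_text)  # 2015-09-11 20:20:00
```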

## EXTRACT 

Often you'll want to look at part of a date, like the year or the day. You can do this with `EXTRACT`.

![https://i.imgur.com/vhvHIh0.png](https://i.imgur.com/vhvHIh0.png)

![https://i.imgur.com/PhoWBO0.png](https://i.imgur.com/PhoWBO0.png)

![https://i.imgur.com/A5hqGxY.png](https://i.imgur.com/A5hqGxY.png)
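
As a rough sketch of what `EXTRACT` returns, the snippet below computes the same parts with Python's standard `datetime` module on a hypothetical timestamp. It is a stand-in for illustration, not a BigQuery call. The one detail worth noting is that BigQuery's `DAYOFWEEK` counts from 1 (Sunday) to 7 (Saturday), which differs from Python's `isoweekday()`.

```python
from datetime import datetime

# Hypothetical timestamp; 11 September 2015 was a Friday.
ts = datetime(2015, 9, 11, 20, 20)

year = ts.year  # comparable to EXTRACT(YEAR FROM ts)
day = ts.day    # comparable to EXTRACT(DAY FROM ts)

# BigQuery's DAYOFWEEK runs from 1 (Sunday) to 7 (Saturday).
# Python's isoweekday() runs from 1 (Monday) to 7 (Sunday), so shift it.
day_of_week = ts.isoweekday() % 7 + 1

print(year, day, day_of_week)  # 2015 11 6
```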


In [1]:
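from google.cloud import bigquery
# Create a client, then build a reference to the public dataset
client = bigquery.Client()
dataset_ref = client.dataset("nhtsa_traffic_fatalities", project="bigquery-public-data")
# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)
# Build a reference to the "accident_2015" table, then fetch the table
table_ref = dataset_ref.table("accident_2015")
table = client.get_table(table_ref)
# Preview the first five rows of the table
client.list_rows(table, max_results=5).to_dataframe()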

| | state_number | state_name | consecutive_number | number_of_vehicle_forms_submitted_all | number_of_motor_vehicles_in_transport_mvit | number_of_parked_working_vehicles | number_of_forms_submitted_for_persons_not_in_motor_vehicles | number_of_persons_not_in_motor_vehicles_in_transport_mvit | number_of_persons_in_motor_vehicles_in_transport_mvit | number_of_forms_submitted_for_persons_in_motor_vehicles | ... | minute_of_ems_arrival_at_hospital | related_factors_crash_level_1 | related_factors_crash_level_1_name | related_factors_crash_level_2 | related_factors_crash_level_2_name | related_factors_crash_level_3 | related_factors_crash_level_3_name | number_of_fatalities | number_of_drunk_drivers | timestamp_of_crash |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 19 | Iowa | 190204 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | ... | 2 | 0 | | 0 | | 0 | | 1 | 1 | 2015-09-11 20:20:00+00:00 |
| 1 | 19 | Iowa | 190233 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | ... | 88 | 0 | | 0 | | 0 | | 1 | 1 | 2015-11-01 00:30:00+00:00 |
| 2 | 19 | Iowa | 190179 | 1 | 1 | 0 | 0 | 0 | 2 | 2 | ... | 1 | 0 | | 0 | | 0 | | 1 | 0 | 2015-05-04 16:18:00+00:00 |
| 3 | 19 | Iowa | 190248 | 1 | 1 | 0 | 0 | 0 | 4 | 4 | ... | 99 | 0 | | 0 | | 0 | | 2 | 0 | 2015-11-17 12:26:00+00:00 |
| 4 | 19 | Iowa | 190231 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | ... | 88 | 0 | | 0 | | 0 | | 1 | 0 | 2015-10-31 04:49:00+00:00 |


In [2]:
# Query to find out the number of accidents for each day of the week
query = """
        SELECT COUNT(consecutive_number) AS num_accidents, 
               EXTRACT(DAYOFWEEK FROM timestamp_of_crash) AS day_of_week
        FROM `bigquery-public-data.nhtsa_traffic_fatalities.accident_2015`
        GROUP BY day_of_week
        ORDER BY num_accidents DESC
        """

# Set up the query (cancel the query if it would use too much of 
# your quota, with the limit set to 1 GB)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**9)
query_job = client.query(query, job_config=safe_config)

# API request - run the query, and convert the results to a pandas DataFrame
accidents_by_day = query_job.to_dataframe()

# Print the DataFrame
accidents_by_day

| | num_accidents | day_of_week |
| --- | --- | --- |
| 0 | 5659 | 7 |
| 1 | 5298 | 1 |
| 2 | 4916 | 6 |
| 3 | 4460 | 5 |
| 4 | 4182 | 4 |
| 5 | 4038 | 2 |
| 6 | 3985 | 3 |
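
To read this result, it helps to know that BigQuery numbers `DAYOFWEEK` from 1 (Sunday) to 7 (Saturday). The sketch below attaches day names to the counts. The `DAY_NAMES` mapping and the variable names are illustrative, and the numbers are copied from the output above rather than fetched from BigQuery.

```python
# BigQuery's DAYOFWEEK numbering: 1 = Sunday through 7 = Saturday.
DAY_NAMES = {
    1: "Sunday",
    2: "Monday",
    3: "Tuesday",
    4: "Wednesday",
    5: "Thursday",
    6: "Friday",
    7: "Saturday",
}

# (num_accidents, day_of_week) pairs copied from the query output above.
rows = [(5659, 7), (5298, 1), (4916, 6), (4460, 5), (4182, 4), (4038, 2), (3985, 3)]

# Replace each day number with its name, keeping the original sort order.
labeled = [(DAY_NAMES[day], count) for count, day in rows]

for name, count in labeled:
    print(f"{name:<9} {count}")
```

Read this way, the two days with the most fatal accidents in the 2015 data are Saturday and Sunday, and the day with the fewest is Tuesday.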


## Exercise

In [1]:
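from google.cloud import bigquery
# Create a client, then build a reference to the public dataset
client = bigquery.Client()
dataset_ref = client.dataset("world_bank_intl_education", project="bigquery-public-data")
# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)
# Build a reference to the "international_education" table, then fetch the table
table_ref = dataset_ref.table("international_education")
table = client.get_table(table_ref)
# Preview the first five rows of the table
client.list_rows(table, max_results=5).to_dataframe()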

| | country_name | country_code | indicator_name | indicator_code | value | year |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Europe & Central Asia (excluding high income) | ECA | Population, female (% of total) | SP.POP.TOTL.FE.ZS | 52.157917 | 2016 |
| 1 | High income | HIC | Population, female (% of total) | SP.POP.TOTL.FE.ZS | 50.224688 | 2016 |
| 2 | Low income | LIC | Labor force, female (% of total labor force) | SL.TLF.TOTL.FE.ZS | 46.963309 | 2016 |
| 3 | Middle East & North Africa | MEA | Population, female (% of total) | SP.POP.TOTL.FE.ZS | 48.29302 | 2016 |
| 4 | Middle East & North Africa (excluding high inc... | MNA | Population, female (% of total) | SP.POP.TOTL.FE.ZS | 49.637938 | 2016 |


### 1) Government expenditure on education

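Which countries spend the largest fraction of GDP on education? The query below filters on the indicator code `SE.XPD.TOTL.GD.ZS`, averages each country's values for the years 2010 through 2017, and sorts the countries from highest to lowest average.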

In [2]:
country_spend_pct_query = """
                          SELECT country_name, AVG(value) AS avg_ed_spending_pct
                          FROM `bigquery-public-data.world_bank_intl_education.international_education`
                          WHERE indicator_code = 'SE.XPD.TOTL.GD.ZS' AND year BETWEEN 2010 AND 2017
                          GROUP BY country_name
                          ORDER BY avg_ed_spending_pct DESC
                          """
# Set up the query (cancel the query if it would use too much of
# your quota, with the limit set to 10 GB)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
country_spend_pct_query_job = client.query(country_spend_pct_query, job_config=safe_config)
# API request - run the query, and convert the results to a pandas DataFrame
country_spending_results = country_spend_pct_query_job.to_dataframe()
# Print the first five rows
print(country_spending_results.head())

            country_name  avg_ed_spending_pct
0                   Cuba            12.837270
1  Micronesia, Fed. Sts.            12.467750
2        Solomon Islands            10.001080
3                Moldova             8.372153
4                Namibia             8.349610


### 2) Identify interesting codes to explore

Write a query below that selects the indicator code and indicator name for all codes with at least 175 rows in the year 2016.

In [3]:
code_count_query = """
    SELECT indicator_code, indicator_name, COUNT(1) AS num_rows
    FROM `bigquery-public-data.world_bank_intl_education.international_education`
    WHERE year = 2016
    GROUP BY indicator_code, indicator_name
    HAVING num_rows >= 175
    ORDER BY num_rows DESC
"""
# Set up the query (cancel the query if it would use too much of
# your quota, with the limit set to 10 GB)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
code_count_query_job = client.query(code_count_query, job_config=safe_config)
# API request - run the query, and convert the results to a pandas DataFrame
code_count_results = code_count_query_job.to_dataframe()
# Print the first five rows
print(code_count_results.head())

      indicator_code                   indicator_name  num_rows
0        SP.POP.GROW     Population growth (annual %)       232
1        SP.POP.TOTL                Population, total       232
2     IT.NET.USER.P2  Internet users (per 100 people)       223
3  SP.POP.TOTL.FE.ZS  Population, female (% of total)       213
4  SP.POP.TOTL.MA.ZS    Population, male (% of total)       213
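
The order in which the clauses take effect in this query can be sketched on a tiny dataset: `WHERE` filters rows first, `GROUP BY` and `HAVING` then build and filter the groups, and `ORDER BY` sorts whatever is left. The example assumes Python's built-in `sqlite3` module as a local stand-in for BigQuery. The `education` table, its codes, and the threshold of 2 are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE education (indicator_code TEXT, country_code TEXT, year INTEGER)"
)

# Hypothetical rows: CODE.A has 3 rows in 2016, CODE.B has 2, CODE.C has 1.
rows = [
    ("CODE.A", "AAA", 2016), ("CODE.A", "BBB", 2016), ("CODE.A", "CCC", 2016),
    ("CODE.B", "AAA", 2016), ("CODE.B", "BBB", 2016),
    ("CODE.C", "AAA", 2016),
    ("CODE.C", "BBB", 2015),  # wrong year: removed by WHERE before grouping
]
conn.executemany("INSERT INTO education VALUES (?, ?, ?)", rows)

query = """
    SELECT indicator_code, COUNT(1) AS num_rows
    FROM education
    WHERE year = 2016
    GROUP BY indicator_code
    HAVING num_rows >= 2
    ORDER BY num_rows DESC
"""
result = conn.execute(query).fetchall()

print(result)  # [('CODE.A', 3), ('CODE.B', 2)]
```

`CODE.C` is dropped because only one of its rows survives the `WHERE` filter, which is below the `HAVING` threshold.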
