This series of notebooks was forked from the Kaggle SQL Scavenger Hunt challenge.
I've modified them to show my own SQL.

In [29]:
# import package with helper functions 
import bq_helper

# create a helper object for this dataset
open_aq = bq_helper.BigQueryHelper(active_project="bigquery-public-data",
                                   dataset_name="openaq")

# print all the tables in this dataset (there's only one!)
open_aq.list_tables()

['global_air_quality']

I'm going to take a peek at the first couple of rows to help me see what sort of data is in this dataset.

In [30]:
# print the first couple rows of the "global_air_quality" dataset
open_aq.head("global_air_quality")

Unnamed: 0,location,city,country,pollutant,value,timestamp,unit,source_name,latitude,longitude,averaged_over_in_hours
0,Mobile_Cle Elum,037,US,pm25,0.0,2017-09-26 20:00:00+00:00,µg/m³,AirNow,47.19763,-120.95823,1.0
1,Mobile_WhiteSalmon,039,US,pm25,0.0,2017-09-26 20:00:00+00:00,µg/m³,AirNow,45.732414,-121.49233,1.0
2,Mobile_Newport,051,US,pm25,0.0,2017-09-21 18:00:00+00:00,µg/m³,AirNow,48.186485,-117.04916,1.0
3,FR20047,Ain,FR,no2,45.4,2018-02-13 07:00:00+00:00,µg/m³,EEA France,45.823223,4.953958,1.0
4,FR20047,Ain,FR,o3,2.13,2018-02-13 07:00:00+00:00,µg/m³,EEA France,45.823223,4.953958,1.0


Great, everything looks good! Now that I'm set up, I'm going to put together a query. I want to select all the values from the "city" column for the rows where the "country" column is "US" (for "United States").

> **What's up with the triple quotation marks (""")?** These tell Python that everything inside them is a single string, even though we have line breaks in it. The line breaks aren't necessary, but they do make it much easier to read your query.
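
As a small illustration of the point above (the variable names are only for this sketch), a triple-quoted query and the same query written on one line are both ordinary Python strings that differ only in whitespace:

```python
# A triple-quoted string keeps its line breaks and is still one string.
multi_line = """SELECT city
FROM `bigquery-public-data.openaq.global_air_quality`
WHERE country = 'US'
"""

# The same query written on a single line.
single_line = "SELECT city FROM `bigquery-public-data.openaq.global_air_quality` WHERE country = 'US'"

# Collapsing runs of whitespace shows the two hold the same query text.
same_query = " ".join(multi_line.split()) == single_line
print(type(multi_line).__name__, same_query)
```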

In [31]:
# query to select all the items from the "city" column where the
# "country" column is "US"
query = """SELECT city
            FROM `bigquery-public-data.openaq.global_air_quality`
            WHERE country = 'US'
        """

> **Important:**  Note that the argument we pass to FROM is *not* in single or double quotation marks (' or "). It is in backticks (\`). If you use quotation marks instead of backticks, you'll get this error when you try to run the query: `Syntax error: Unexpected string literal` 
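
One way to avoid forgetting the backticks is to build the table reference in one place. The helper below is hypothetical (`table_ref` is not part of bq_helper or BigQuery); it is only a sketch of wrapping a `project.dataset.table` name in backticks before placing it in a query string:

```python
def table_ref(project, dataset, table):
    """Return a backtick-quoted, fully qualified table name for a FROM clause."""
    return f"`{project}.{dataset}.{table}`"


# Build the reference once, then reuse it in the query text.
source = table_ref("bigquery-public-data", "openaq", "global_air_quality")
query = f"""SELECT city
            FROM {source}
            WHERE country = 'US'
        """
print(query)
```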

Now I can use this query to get information from our open_aq dataset. I'm using the `BigQueryHelper.query_to_pandas_safe()` method here because it won't run a query that would scan more than 1 gigabyte of data, which helps me avoid accidentally running a very large query. See the [Scavenger Hunt Handbook](https://www.kaggle.com/rtatman/sql-scavenger-hunt-handbook/) for more details.
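
The idea behind a "safe" query can be sketched without BigQuery at all: estimate the size first, and only run the query when the estimate is under a limit. This is not bq_helper's actual implementation; `run_if_small`, `estimate_gb`, `run`, and the fake sizes below are all hypothetical stand-ins used for illustration:

```python
def run_if_small(query, estimate_gb, run, max_gb=1.0):
    """Run `query` only when its estimated scan size is within `max_gb`.

    `estimate_gb` and `run` are hypothetical callables standing in for a
    size estimator and a query runner; they are not bq_helper functions.
    """
    size_gb = estimate_gb(query)
    if size_gb > max_gb:
        print(f"Query cancelled; estimated {size_gb} GB exceeds the {max_gb} GB limit")
        return None
    return run(query)


# Stand-in estimator and runner so the sketch executes without BigQuery.
fake_sizes = {"small query": 0.2, "huge query": 42.0}
small_result = run_if_small("small query", fake_sizes.get, lambda q: f"ran {q}")
huge_result = run_if_small("huge query", fake_sizes.get, lambda q: f"ran {q}")
print(small_result, huge_result)
```

In bq_helper itself the limit is set through the `max_gb_scanned` argument of `query_to_pandas_safe()`, which defaults to 1.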

In [32]:
# query_to_pandas_safe only runs the query if it would scan less
# than one gigabyte (by default)
us_cities = open_aq.query_to_pandas_safe(query)

Now I've got a dataframe called us_cities, which I can use like I would any other dataframe:

In [33]:
# What five cities have the most measurements taken there?
us_cities.city.value_counts().head()

Phoenix-Mesa-Scottsdale                     85
Houston                                     83
Los Angeles-Long Beach-Santa Ana            60
New York-Northern New Jersey-Long Island    57
Riverside-San Bernardino-Ontario            56
Name: city, dtype: int64

# Scavenger hunt
___

Now it's your turn! Here are the questions I would like you to get the data to answer:

* Which countries use a unit other than ppm to measure any type of pollution? (Hint: to get rows where the value *isn't* something, use "!=")
* Which pollutants have a value of exactly 0?
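
To see how the `!=` hint behaves, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for BigQuery. The table name `air_quality` and its four rows are made up for this example, and SQLite's dialect is not identical to BigQuery's, though `!=` works the same way in both:

```python
import sqlite3

# In-memory toy table; the rows below are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE air_quality (country TEXT, pollutant TEXT, value REAL, unit TEXT)"
)
conn.executemany(
    "INSERT INTO air_quality VALUES (?, ?, ?, ?)",
    [
        ("US", "pm25", 0.0, "µg/m³"),
        ("US", "o3", 0.031, "ppm"),
        ("FR", "no2", 45.4, "µg/m³"),
        ("AU", "co", 0.2, "ppm"),
    ],
)

# != keeps only the rows whose unit is something other than 'ppm'.
rows = conn.execute(
    "SELECT country, unit FROM air_quality WHERE unit != 'ppm' ORDER BY country"
).fetchall()
print(rows)
conn.close()
```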

In order to answer these questions, you can fork this notebook by hitting the blue "Fork Notebook" at the very top of this page (you may have to scroll up).  "Forking" something is making a copy of it that you can edit on your own without changing the original.

In [34]:
# Your code goes here :)
open_aq.list_tables()

['global_air_quality']

In [35]:
open_aq.head('global_air_quality')

Unnamed: 0,location,city,country,pollutant,value,timestamp,unit,source_name,latitude,longitude,averaged_over_in_hours
0,Mobile_Cle Elum,037,US,pm25,0.0,2017-09-26 20:00:00+00:00,µg/m³,AirNow,47.19763,-120.95823,1.0
1,Mobile_WhiteSalmon,039,US,pm25,0.0,2017-09-26 20:00:00+00:00,µg/m³,AirNow,45.732414,-121.49233,1.0
2,Mobile_Newport,051,US,pm25,0.0,2017-09-21 18:00:00+00:00,µg/m³,AirNow,48.186485,-117.04916,1.0
3,FR20047,Ain,FR,no2,45.4,2018-02-13 07:00:00+00:00,µg/m³,EEA France,45.823223,4.953958,1.0
4,FR20047,Ain,FR,o3,2.13,2018-02-13 07:00:00+00:00,µg/m³,EEA France,45.823223,4.953958,1.0


In [36]:
open_aq.table_schema('global_air_quality')

[SchemaField('location', 'string', 'NULLABLE', 'Location where data was measured', ()),
 SchemaField('city', 'string', 'NULLABLE', 'City containing location', ()),
 SchemaField('country', 'string', 'NULLABLE', 'Country containing measurement in 2 letter ISO code', ()),
 SchemaField('pollutant', 'string', 'NULLABLE', 'Name of the Pollutant being measured. Allowed values: PM25, PM10, SO2, NO2, O3, CO, BC', ()),
 SchemaField('value', 'float', 'NULLABLE', 'Latest measured value for the pollutant', ()),
 SchemaField('timestamp', 'timestamp', 'NULLABLE', 'The datetime at which the pollutant was measured, in ISO 8601 format', ()),
 SchemaField('unit', 'string', 'NULLABLE', 'The unit the value was measured in coded by UCUM Code', ()),
 SchemaField('source_name', 'string', 'NULLABLE', 'Name of the source of the data', ()),
 SchemaField('latitude', 'float', 'NULLABLE', 'Latitude in decimal degrees. Precision >3 decimal points.', ()),
 SchemaField('longitude', 'float', 'NULLABLE', 'Longitude in d

In [37]:
open_aq.head("global_air_quality", selected_columns="unit", num_rows=20)

Unnamed: 0,unit
0,µg/m³
1,µg/m³
2,µg/m³
3,µg/m³
4,µg/m³
5,µg/m³
6,µg/m³
7,µg/m³
8,µg/m³
9,µg/m³


In [38]:
query = """SELECT country
            FROM `bigquery-public-data.openaq.global_air_quality`
            WHERE unit != 'ppm'
        """
# estimate how much data the query will scan (in GB) before running it
open_aq.estimate_query_size(query)

In [None]:
countries_not_using_ppm = open_aq.query_to_pandas(query)
countries_not_using_ppm.head()

In [None]:
# each matching row returns its country, so drop the repeats
countries_not_using_ppm = countries_not_using_ppm.drop_duplicates()

In [None]:
countries_not_using_ppm.head()
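
The deduplication can also be done inside the query with `SELECT DISTINCT`, so that fewer rows come back in the first place. The sketch below uses the built-in `sqlite3` module and a made-up `air_quality` table as a stand-in for BigQuery; it is an illustration of the keyword, not a result from the real dataset:

```python
import sqlite3

# In-memory toy table; the rows are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE air_quality (country TEXT, unit TEXT)")
conn.executemany(
    "INSERT INTO air_quality VALUES (?, ?)",
    [("US", "µg/m³"), ("US", "µg/m³"), ("FR", "µg/m³"), ("US", "ppm")],
)

# Without DISTINCT, each matching row yields one result, so countries repeat.
all_rows = conn.execute(
    "SELECT country FROM air_quality WHERE unit != 'ppm'"
).fetchall()

# DISTINCT collapses the repeats inside the query itself.
unique_rows = conn.execute(
    "SELECT DISTINCT country FROM air_quality WHERE unit != 'ppm' ORDER BY country"
).fetchall()
print(len(all_rows), unique_rows)
conn.close()
```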

In [None]:
# query to select the pollutant for every row with a value of exactly 0
query2 = """
SELECT pollutant
FROM `bigquery-public-data.openaq.global_air_quality`
WHERE value = 0
"""
pollutants_with_zero_value = open_aq.query_to_pandas(query2)

In [None]:
pollutants_with_zero_value.describe()

In [None]:
# how many zero-valued measurements does each pollutant have?
pollutants_with_zero_value.pollutant.value_counts()
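
The per-pollutant count can also be computed in SQL rather than in pandas, using `GROUP BY` with `COUNT(*)`. This is a sketch on a made-up in-memory `sqlite3` table, used here as a stand-in for BigQuery, so the numbers say nothing about the real dataset:

```python
import sqlite3

# In-memory toy table; the rows are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE air_quality (pollutant TEXT, value REAL)")
conn.executemany(
    "INSERT INTO air_quality VALUES (?, ?)",
    [("pm25", 0.0), ("pm25", 0.0), ("so2", 0.0), ("no2", 45.4), ("o3", 2.13)],
)

# Count the zero-valued rows per pollutant, most frequent first.
zero_counts = conn.execute(
    """
    SELECT pollutant, COUNT(*) AS n
    FROM air_quality
    WHERE value = 0
    GROUP BY pollutant
    ORDER BY n DESC, pollutant
    """
).fetchall()
print(zero_counts)
conn.close()
```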

Please feel free to ask any questions you have in this notebook or in the [Q&A forums](https://www.kaggle.com/questions-and-answers)! 

Also, if you want to share or get comments on your kernel, remember you need to make it public first! You can change the visibility of your kernel under the "Settings" tab, on the right half of your screen.