# Get Started
Fork this notebook by hitting the blue "Fork Notebook" button at the top of this page.  "Forking" makes a copy that you can edit on your own without changing the original.

After forking this notebook, run the code in the following cell.

In [1]:
# import package with helper functions 
import bq_helper

# create a helper object for this dataset
open_aq = bq_helper.BigQueryHelper(active_project="bigquery-public-data",
                                   dataset_name="openaq")

# print all the tables in this dataset (there's only one!)
open_aq.list_tables()

['global_air_quality']

Then write and run the code to answer the questions below.

# Question

#### 1) Which countries use a unit other than ppm to measure any type of pollution? 
(Hint: to get rows where the value *isn't* something, use "!=")

In [2]:
open_aq.table_schema(open_aq.list_tables()[0])

[SchemaField('location', 'STRING', 'NULLABLE', 'Location where data was measured', ()),
 SchemaField('city', 'STRING', 'NULLABLE', 'City containing location', ()),
 SchemaField('country', 'STRING', 'NULLABLE', 'Country containing measurement in 2 letter ISO code', ()),
 SchemaField('pollutant', 'STRING', 'NULLABLE', 'Name of the Pollutant being measured. Allowed values: PM25, PM10, SO2, NO2, O3, CO, BC', ()),
 SchemaField('value', 'FLOAT', 'NULLABLE', 'Latest measured value for the pollutant', ()),
 SchemaField('timestamp', 'TIMESTAMP', 'NULLABLE', 'The datetime at which the pollutant was measured, in ISO 8601 format', ()),
 SchemaField('unit', 'STRING', 'NULLABLE', 'The unit the value was measured in coded by UCUM Code', ()),
 SchemaField('source_name', 'STRING', 'NULLABLE', 'Name of the source of the data', ()),
 SchemaField('latitude', 'FLOAT', 'NULLABLE', 'Latitude in decimal degrees. Precision >3 decimal points.', ()),
 SchemaField('longitude', 'FLOAT', 'NULLABLE', 'Longitude in d

In [8]:
query="SELECT distinct country FROM `bigquery-public-data.openaq.global_air_quality` WHERE pollutant != 'ppm'"
open_aq.estimate_query_size(query)


0.0001390436664223671

In [9]:
open_aq.query_to_pandas_safe(query, max_gb_scanned=0.1)

Unnamed: 0,country
0,US
1,FR
2,TH
3,IN
4,NL
5,GB
6,CH
7,TR
8,PT
9,ES


#### 2) Which pollutants have a value of exactly 0?

In [11]:
query="SELECT distinct pollutant FROM `bigquery-public-data.openaq.global_air_quality` WHERE value = 0"
open_aq.estimate_query_size(query)

open_aq.query_to_pandas_safe(query, max_gb_scanned=0.1)


Unnamed: 0,pollutant
0,pm25
1,so2
2,o3
3,pm10
4,no2
5,co
6,bc


# Keep Going
After finishing this exercise, click [here](https://www.kaggle.com/dansbecker/group-by-having-count/).  You will learn about the **GROUP BY** command and its extensions.  This is especially valuable in large datasets like what you find in BigQuery.

# Help and Feedback 
Bring any comments or questions to the [Learn Discussion Forum](https://www.kaggle.com/learn-forum).

If you want comments or help on your code, make it "public" first using the "Settings" tab on this page.

---

*This tutorial is part of the [SQL Series](https://www.kaggle.com/learn/sql) on Kaggle Learn.*