# **Ingesting Global Surface Temperature Data into BigQuery**
This procedure loads the global surface temperature data from five .csv files into BigQuery. The files have already been uploaded to a Cloud Storage bucket in GCP; each one is loaded into its own table through the following process. The data was obtained from Kaggle (https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data#GlobalTemperatures.csv).
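
The five load steps in this notebook share one command pattern and differ only in the table name and the CSV file name. As a minimal sketch, the commands could be generated from a single mapping. The names `BUCKET_PREFIX`, `TABLES`, and `build_load_command` are illustrative and not part of the original notebook; the flags, bucket path, and table names are copied from the cells in this notebook.

```python
import shlex

# Bucket path used by every load cell in this notebook.
BUCKET_PREFIX = "gs://global_surface_temperatures/global_surface_temperatures_dataset"

# Table name -> CSV file name, one entry per section of the notebook.
TABLES = {
    "Global_Land_Temperatures_by_City": "GlobalLandTemperaturesByCity.csv",
    "Global_Land_Temperatures_by_Country": "GlobalLandTemperaturesByCountry.csv",
    "Global_Land_Temperatures_by_Major_City": "GlobalLandTemperaturesByMajorCity.csv",
    "Global_Land_Temperatures_by_State": "GlobalLandTemperaturesByState.csv",
    "Global_Temperatures": "GlobalTemperatures.csv",
}


def build_load_command(dataset_id, table, csv_name):
    """Return the bq load command line for one CSV file as a single string."""
    args = [
        "bq", "--location=US", "load", "--autodetect",
        "--skip_leading_rows=1", "--source_format=CSV",
        f"{dataset_id}.{table}",
        f"{BUCKET_PREFIX}/{csv_name}",
    ]
    # shlex.quote only adds quotes when an argument contains unsafe characters.
    return " ".join(shlex.quote(arg) for arg in args)


commands = [
    build_load_command("kaggle_staging", table, csv_name)
    for table, csv_name in TABLES.items()
]
for command in commands:
    print(command)
```

The sketch only prints the commands; in a notebook each string could then be run with `!{command}` once the dataset exists.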


In [1]:
#name the dataset
dataset_id = "kaggle_staging"

In [2]:
#create the actual dataset using bq cli
!bq --location=US mk --dataset {dataset_id}

Dataset 'electric-spark-266716:kaggle_staging' successfully created.


## **Global Land Temperatures by City**

In [3]:
#load the city data and create the table
!bq --location=US load --autodetect --skip_leading_rows=1 \
--source_format=CSV {dataset_id}.Global_Land_Temperatures_by_City \
"gs://global_surface_temperatures/global_surface_temperatures_dataset/GlobalLandTemperaturesByCity.csv"

Waiting on bqjob_r3e827a8b38fe96a1_000001700e2c0c9d_1 ... (65s) Current status: DONE   


### Check the contents of the table by drawing 10 sample records

In [21]:
%%bigquery
SELECT * FROM kaggle_staging.Global_Land_Temperatures_by_City limit 10

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
0,1846-11-01,18.032,2.435,Adelaide,Australia,34.56S,138.16E
1,1853-03-01,18.621,1.895,Adelaide,Australia,34.56S,138.16E
2,1856-02-01,21.789,2.249,Adelaide,Australia,34.56S,138.16E
3,1867-08-01,11.281,1.13,Adelaide,Australia,34.56S,138.16E
4,1874-07-01,8.758,0.616,Adelaide,Australia,34.56S,138.16E
5,1904-04-01,18.426,0.351,Adelaide,Australia,34.56S,138.16E
6,1910-02-01,23.4,0.843,Adelaide,Australia,34.56S,138.16E
7,1928-06-01,10.649,0.228,Adelaide,Australia,34.56S,138.16E
8,1939-02-01,22.783,0.356,Adelaide,Australia,34.56S,138.16E
9,1942-12-01,21.319,0.54,Adelaide,Australia,34.56S,138.16E
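
A `LIMIT 10` query without an `ORDER BY` returns whichever ten rows BigQuery reads first, which is why every row above is from Adelaide. If a more varied sample is wanted, one option is to shuffle the rows with `ORDER BY RAND()` before limiting. The helper below is a hypothetical sketch that only builds the query text; `random_sample_query` is not part of the original notebook.

```python
def random_sample_query(table, limit=10):
    """Build a query that returns `limit` rows picked at random from `table`."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    # ORDER BY RAND() shuffles the rows, so LIMIT keeps a random subset.
    return f"SELECT * FROM {table} ORDER BY RAND() LIMIT {limit}"


query = random_sample_query("kaggle_staging.Global_Land_Temperatures_by_City")
print(query)
```

The printed text can be pasted into a `%%bigquery` cell. Shuffling reads the whole table, so this is likely to be slower than the plain `LIMIT 10` check.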


### Exploratory Query
#### This query computes each city's average temperature over the entire time period for cities in the United States, Canada, and Mexico, ordered from coldest to hottest.

In [6]:
%%bigquery
SELECT City,Country,AVG(AverageTemperature) as TotalAvg
FROM kaggle_staging.Global_Land_Temperatures_by_City
WHERE Country IN ("United States", "Mexico", "Canada")
GROUP BY City,Country
ORDER BY AVG(AverageTemperature)

Unnamed: 0,City,Country,TotalAvg
0,Anchorage,United States,-2.301646
1,Winnipeg,Canada,1.077861
2,Quebec,Canada,1.093973
3,Saskatoon,Canada,1.240279
4,Edmonton,Canada,1.368643
...,...,...,...
365,Campeche,Mexico,26.052384
366,Carmen,Mexico,26.129721
367,Acapulco,Mexico,26.162964
368,Chetumal,Mexico,26.609823


## **Global Land Temperatures by Country**

In [4]:
#load the country data and create the table
!bq --location=US load --autodetect --skip_leading_rows=1 \
--source_format=CSV {dataset_id}.Global_Land_Temperatures_by_Country \
"gs://global_surface_temperatures/global_surface_temperatures_dataset/GlobalLandTemperaturesByCountry.csv"

Waiting on bqjob_r7d1503b383b1b062_000001700e2e861d_1 ... (8s) Current status: DONE   


### Check the contents of the table by drawing 10 sample records

In [20]:
%%bigquery
SELECT * FROM kaggle_staging.Global_Land_Temperatures_by_Country limit 10

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,Country
0,1818-03-01,,,Asia
1,1819-05-01,,,Asia
2,1820-01-01,,,Asia
3,2013-09-01,,,Asia
4,1818-04-01,9.478,4.0,Asia
5,1949-10-01,8.773,0.125,Asia
6,1966-04-01,7.564,0.125,Asia
7,1975-12-01,-5.09,0.125,Asia
8,1978-03-01,1.804,0.125,Asia
9,1979-11-01,0.206,0.125,Asia
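
Four of the ten sample rows have no AverageTemperature value. SQL aggregate functions such as `AVG`, which this notebook uses in the city and state queries, skip NULL values rather than treating them as zero. The sketch below mimics that behaviour in plain Python on the ten sample values shown above; `sql_avg` is an illustrative helper, not a BigQuery API.

```python
# AverageTemperature values from the ten sample rows above; None stands for NULL.
sample_temperatures = [None, None, None, None, 9.478, 8.773, 7.564, -5.09, 1.804, 0.206]


def sql_avg(values):
    """Mimic SQL AVG: ignore NULLs and return None when no values remain."""
    present = [value for value in values if value is not None]
    if not present:
        return None
    return sum(present) / len(present)


average = sql_avg(sample_temperatures)
print(round(average, 3))  # mean of the six non-NULL values
```

The divisor is six, not ten, so rows with missing readings do not pull the average toward zero.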


### Exploratory Queries
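#### This query counts the distinct values in the Country column. The column also holds regional aggregates such as Asia and North America, so the result is not strictly a count of countries.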

In [30]:
%%bigquery
SELECT count(DISTINCT country) as num_countries 
FROM kaggle_staging.Global_Land_Temperatures_by_Country

Unnamed: 0,num_countries
0,243


#### The query below lists the date, country, and average temperature for every record with an average temperature above 15 degrees Celsius, ordered from hottest to coolest.

In [1]:
%%bigquery
SELECT dt, Country, AverageTemperature
FROM kaggle_staging.Global_Land_Temperatures_by_Country
WHERE AverageTemperature > 15
ORDER BY AverageTemperature DESC

Unnamed: 0,dt,Country,AverageTemperature
0,2012-07-01,Kuwait,38.842
1,2000-07-01,Kuwait,38.705
2,2010-07-01,Kuwait,38.495
3,1998-08-01,Kuwait,38.436
4,2000-08-01,Kuwait,38.315
...,...,...,...
352749,1946-08-01,North America,15.001
352750,2005-09-01,Czech Republic,15.001
352751,1853-02-01,Taiwan,15.001
352752,1909-06-01,Czech Republic,15.001


## **Global Land Temperatures by Major City**

In [5]:
#load the major city data and create the table
!bq --location=US load --autodetect --skip_leading_rows=1 \
--source_format=CSV {dataset_id}.Global_Land_Temperatures_by_Major_City \
"gs://global_surface_temperatures/global_surface_temperatures_dataset/GlobalLandTemperaturesByMajorCity.csv"

Waiting on bqjob_r99046d38ac7f8bc_000001700e30285a_1 ... (5s) Current status: DONE   


### Check the contents of the table by drawing 10 sample records

In [22]:
%%bigquery
SELECT * FROM kaggle_staging.Global_Land_Temperatures_by_Major_City limit 10

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
0,1743-11-01,7.541,1.753,London,United Kingdom,52.24N,0.00W
1,1744-04-01,8.296,2.501,London,United Kingdom,52.24N,0.00W
2,1744-05-01,10.966,1.471,London,United Kingdom,52.24N,0.00W
3,1744-06-01,14.522,1.552,London,United Kingdom,52.24N,0.00W
4,1744-07-01,15.964,1.646,London,United Kingdom,52.24N,0.00W
5,1744-09-01,13.064,1.628,London,United Kingdom,52.24N,0.00W
6,1744-10-01,9.597,1.651,London,United Kingdom,52.24N,0.00W
7,1744-11-01,6.55,1.526,London,United Kingdom,52.24N,0.00W
8,1744-12-01,3.752,1.84,London,United Kingdom,52.24N,0.00W
9,1745-01-01,2.548,1.843,London,United Kingdom,52.24N,0.00W
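
The Latitude and Longitude columns hold text such as `52.24N` and `0.00W` rather than signed numbers, so they cannot be compared or plotted directly. As a rough sketch, assuming every value is a decimal number followed by a single hemisphere letter (as in the sample rows of the city tables), a conversion to signed decimal degrees might look like this. `parse_coordinate` is a hypothetical helper, not part of the original notebook.

```python
def parse_coordinate(text):
    """Convert a value such as '34.56S' to signed decimal degrees (-34.56)."""
    text = text.strip()
    if len(text) < 2:
        raise ValueError(f"not a coordinate: {text!r}")
    value = float(text[:-1])
    hemisphere = text[-1].upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere letter in {text!r}")
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


london = (parse_coordinate("52.24N"), parse_coordinate("0.00W"))
adelaide = (parse_coordinate("34.56S"), parse_coordinate("138.16E"))
print(london, adelaide)
```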


### Exploratory Queries
#### This query returns the cities and dates with an average temperature uncertainty greater than 1 degree Celsius, ordered from greatest to least uncertainty.

In [2]:
%%bigquery
SELECT dt, AverageTemperatureUncertainty, City
FROM kaggle_staging.Global_Land_Temperatures_by_Major_City
WHERE AverageTemperatureUncertainty > 1
ORDER BY AverageTemperatureUncertainty DESC

Unnamed: 0,dt,AverageTemperatureUncertainty,City
0,1770-04-01,14.037,Berlin
1,1768-01-01,13.971,Berlin
2,1768-01-01,13.560,London
3,1768-01-01,13.224,Paris
4,1758-10-01,13.170,Berlin
...,...,...,...
76363,1859-08-01,1.001,Nanjing
76364,1875-08-01,1.001,Changchun
76365,1884-03-01,1.001,Changchun
76366,1920-01-01,1.001,Seoul


## **Global Land Temperatures by State**

In [6]:
#load the state data and create the table
!bq --location=US load --autodetect --skip_leading_rows=1 \
--source_format=CSV {dataset_id}.Global_Land_Temperatures_by_State \
"gs://global_surface_temperatures/global_surface_temperatures_dataset/GlobalLandTemperaturesByState.csv"

Waiting on bqjob_r436befc018cab104_000001700e31f39c_1 ... (15s) Current status: DONE   


### Check the contents of the table by drawing 10 sample records

In [23]:
%%bigquery
SELECT * FROM kaggle_staging.Global_Land_Temperatures_by_State limit 10

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,State,Country
0,1841-01-01,0.581,2.745,Anhui,China
1,1841-02-01,2.716,1.832,Anhui,China
2,1841-03-01,6.871,1.852,Anhui,China
3,1841-04-01,13.748,2.2,Anhui,China
4,1841-05-01,19.662,1.58,Anhui,China
5,1841-06-01,23.152,3.565,Anhui,China
6,1841-07-01,27.278,2.381,Anhui,China
7,1841-08-01,26.07,2.379,Anhui,China
8,1841-09-01,21.177,1.802,Anhui,China
9,1841-10-01,15.574,1.906,Anhui,China


### Exploratory Queries
#### This query computes the average temperature uncertainty of each state in the United States (and Washington, D.C.) across the entire time period and orders them from lowest to highest average temperature uncertainty.

In [11]:
%%bigquery
SELECT AVG(AverageTemperatureUncertainty) as TemperatureUncertainty, State, Country
FROM kaggle_staging.Global_Land_Temperatures_by_State
WHERE Country = "United States"
GROUP BY State, Country
ORDER BY AVG(AverageTemperatureUncertainty)

Unnamed: 0,TemperatureUncertainty,State,Country
0,0.403861,Hawaii,United States
1,0.57996,California,United States
2,0.725007,Arizona,United States
3,0.735509,Washington,United States
4,0.761608,Oregon,United States
5,0.794785,Texas,United States
6,0.796113,Nevada,United States
7,0.810993,Utah,United States
8,0.824798,Oklahoma,United States
9,0.825624,New Mexico,United States


#### This query lists the distinct states recorded for the United States, to check that the 50 states and Washington, D.C. are present in the table.

In [10]:
%%bigquery
SELECT DISTINCT State
FROM kaggle_staging.Global_Land_Temperatures_by_State
WHERE Country = "United States"

Unnamed: 0,State
0,Alabama
1,Alaska
2,Arizona
3,Arkansas
4,California
5,Colorado
6,Connecticut
7,Delaware
8,District Of Columbia
9,Florida
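
Reading a list of state names by eye is error-prone, so the check could also be done by comparing the returned names against an expected set. The sketch below is one possible way to do that in plain Python. `EXPECTED_STATES` and `compare_states` are illustrative names; the expected spellings are the standard state names plus `District Of Columbia` as it appears in the output above, and the table may spell some names differently, which is exactly what the comparison would reveal.

```python
# Assumed expected names: the 50 states plus D.C., spelled as in the output above.
EXPECTED_STATES = frozenset([
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine",
    "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi",
    "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey",
    "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia",
    "Washington", "West Virginia", "Wisconsin", "Wyoming",
    "District Of Columbia",
])


def compare_states(found, expected=EXPECTED_STATES):
    """Return (missing, unexpected) name lists for a query result."""
    found = set(found)
    return sorted(expected - found), sorted(found - expected)


# Only the ten names shown above; a real check would pass the full State column.
shown_rows = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "District Of Columbia", "Florida",
]
missing, unexpected = compare_states(shown_rows)
print(len(missing), "expected names not in the input; unexpected names:", unexpected)
```

With the full query result as input, an empty `missing` list would confirm that every expected name is present, and any entry in `unexpected` would point to a spelling difference.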


## **Global Temperatures**

In [7]:
#load the global temperatures data and create the table
!bq --location=US load --autodetect --skip_leading_rows=1 \
--source_format=CSV {dataset_id}.Global_Temperatures \
"gs://global_surface_temperatures/global_surface_temperatures_dataset/GlobalTemperatures.csv"

Waiting on bqjob_r7d52ae5a7127b4ae_000001700e352663_1 ... (3s) Current status: DONE   


### Check the contents of the table by drawing 10 sample records

In [24]:
%%bigquery
SELECT * FROM kaggle_staging.Global_Temperatures limit 10

Unnamed: 0,dt,LandAverageTemperature,LandAverageTemperatureUncertainty,LandMaxTemperature,LandMaxTemperatureUncertainty,LandMinTemperature,LandMinTemperatureUncertainty,LandAndOceanAverageTemperature,LandAndOceanAverageTemperatureUncertainty
0,1750-01-01,3.034,3.574,,,,,,
1,1750-02-01,3.083,3.702,,,,,,
2,1750-03-01,5.626,3.076,,,,,,
3,1750-04-01,8.49,2.451,,,,,,
4,1750-05-01,11.573,2.072,,,,,,
5,1750-06-01,12.937,1.724,,,,,,
6,1750-07-01,15.868,1.911,,,,,,
7,1750-08-01,14.75,2.231,,,,,,
8,1750-09-01,11.413,2.637,,,,,,
9,1750-10-01,6.367,2.668,,,,,,


### Exploratory Queries
#### This query lists the date, LandAverageTemperature, and LandAndOceanAverageTemperature for dates from 1830 onwards, in ascending order of LandAverageTemperature.

In [14]:
%%bigquery
SELECT dt, LandAverageTemperature, LandAndOceanAverageTemperature
FROM kaggle_staging.Global_Temperatures
WHERE dt >= "1830-01-01"
ORDER BY LandAverageTemperature

Unnamed: 0,dt,LandAverageTemperature,LandAndOceanAverageTemperature
0,1838-01-01,-0.557,
1,1834-01-01,0.334,
2,1861-01-01,0.404,12.475
3,1893-01-01,0.500,12.702
4,1848-01-01,0.510,
...,...,...,...
2227,2007-07-01,15.230,17.485
2228,2009-07-01,15.231,17.578
2229,1998-07-01,15.340,17.609
2230,2002-07-01,15.354,17.487000000000002
