![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcurriculum-notebooks&branch=master&subPath=Mathematics/StatisticsProject/AccessingData/climate-monthly.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Monthly Climate Data

## Temperature

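We can get global surface temperature data from 1880 to the present from [NASA GISS (Goddard Institute for Space Studies)](https://data.giss.nasa.gov/gistemp/), averaged by month, season, or year. The values are temperature anomalies in degrees Celsius relative to the 1951-1980 mean, not absolute temperatures.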

In [None]:
global_temperature_url = 'https://data.giss.nasa.gov/gistemp/tabledata_v4/GLB.Ts+dSST.csv'

import pandas as pd
df = pd.read_csv(global_temperature_url, skiprows=1) # skip the title row that sits above the column headers
df
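
Months that have not been reported yet may not parse as numbers. The sketch below assumes such entries are marked with `***` and uses a few made-up rows in place of the real file to show how `na_values` keeps the columns numeric. The sample values, including the year `2099`, are illustrative only.

```python
import io
import pandas as pd

# Made-up stand-in for the downloaded CSV: a title row, a header row, two data rows.
sample_csv = (
    "Land-Ocean: Global Means\n"
    "Year,Jan,Feb,Mar\n"
    "1880,-.18,-.24,-.09\n"
    "2099,.50,***,***\n"
)

# skiprows=1 drops the title row; na_values turns the assumed '***' marker into NaN.
sample = pd.read_csv(io.StringIO(sample_csv), skiprows=1, na_values="***")
print(sample.dtypes)
```

Without `na_values`, a column containing `***` would be read as text rather than as numbers.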

To look at just the data averaged by month we can drop the other columns.

In [None]:
dfm = df.drop(columns=['J-D', 'D-N', 'DJF', 'MAM', 'JJA', 'SON'])
dfm

There are also many other [data sets available](https://data.giss.nasa.gov/gistemp), such as annual average surface temperatures by latitude zone.

In [None]:
zonal_temperatures_url = 'https://data.giss.nasa.gov/gistemp/tabledata_v4/ZonAnn.Ts+dSST.csv'

dfz = pd.read_csv(zonal_temperatures_url) # this one doesn't need skiprows
dfz

## Precipitation

For precipitation we'll look at raw weather station data from [Global Historical Climatology Network monthly (GHCNm)](https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-monthly).

This is a large dataset, so it will take a minute to download and decompress.

In [None]:
import requests
import gzip
import os
r = requests.get('https://www.ncei.noaa.gov/pub/data/ghcn/v2/v2.prcp_adj.gz')
r.raise_for_status() # stop here with an error if the download failed
with open('v2.prcp.gz', 'wb') as file: # save the downloaded file
    file.write(r.content)
with gzip.open('v2.prcp.gz') as file: # decompress
    precipitation = file.read()
with open('v2.prcp', 'wb') as file: # save the decompressed file
    file.write(precipitation)
os.remove('v2.prcp.gz') # delete the downloaded file
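
As a possible alternative to saving and deleting the intermediate `.gz` file, the downloaded bytes can be decompressed in memory with `gzip.decompress`. This is a minimal sketch that uses a made-up byte string in place of `r.content`, so it runs without a network connection.

```python
import gzip

# Made-up stand-in for one line of the real data file.
payload = b"1010000100011950   12  345 8888\n"

# gzip.compress plays the role of the server here; in the notebook
# the compressed bytes would come from r.content instead.
compressed = gzip.compress(payload)

# Decompress directly in memory, with no temporary .gz file on disk.
decompressed = gzip.decompress(compressed)
print(len(compressed), len(decompressed))
```

The decompressed bytes could then be written to `v2.prcp` exactly as in the cell above.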

Next we read that large file into a dataframe and delete the file. Again, this will take a minute.

In [None]:
dfp = pd.read_fwf('v2.prcp', header=None, colspecs=[(0,16), (17,21), (22,26), (27,31), (32,36), (37,41), (42,46), (47,51), (52,56), (57,61), (62,66), (67,71), (72,76)])
os.remove('v2.prcp') # put a  #  in front of this line to avoid deleting the file
dfp
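
To see what `colspecs` does, here is a small sketch that applies the first four column ranges from the cell above to two made-up records. Each record is a 16-character identifier followed by 5-character values. The identifiers and numbers are invented for illustration.

```python
import io
import pandas as pd

def make_record(identifier, values):
    """Build one fixed-width line: identifier, then each value right-aligned in 5 characters."""
    return identifier + "".join(f"{v:5d}" for v in values) + "\n"

# Two made-up records with three values each.
sample = (
    make_record("1010000100011950", [12, 345, 8888])
    + make_record("1010000100011951", [678, 9999, 0])
)

# Each (start, end) pair is a half-open character range, as in the notebook cell.
small = pd.read_fwf(
    io.StringIO(sample),
    header=None,
    colspecs=[(0, 16), (17, 21), (22, 26), (27, 31)],
)
print(small)
```

Because each range starts one character into its 5-character field, only the last four characters of every value are read.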

Column `0` of the dataframe holds the station number and year, as described in the [readme](https://www.ncei.noaa.gov/pub/data/ghcn/v2/v2.prcp.readme). The first three digits are the [country code](https://www.ncei.noaa.gov/pub/data/ghcn/v2/v2.country.codes), the next five are the [World Meteorological Organization Station Identifier](https://community.wmo.int/wigos-station-identifier), the next four are the modifier and duplicate numbers (which we'll ignore), and the last four are the year.

We'll slice that column into individual columns, and also rename columns `1` through `12` to the month names.

In [None]:
dfp.columns = [0, 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
dfp['Country Code'] = dfp[0].astype(str).str.slice(0,3).astype(int)
dfp['WSI'] = dfp[0].astype(str).str.slice(3,8).astype(int)
dfp['Modifier'] = dfp[0].astype(str).str.slice(8,12).astype(int)
dfp['Year'] = dfp[0].astype(str).str.slice(12,16).astype(int)
dfp.drop(columns=[0], inplace=True)
dfp
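
The slice positions used above can be checked on a single value. This sketch splits one made-up 16-character identifier with plain string slicing, using the same boundaries of 3, 5, 4, and 4 characters.

```python
record_id = "1010000100011950"  # made-up identifier, for illustration only

country_code = int(record_id[0:3])   # first three digits
wsi = int(record_id[3:8])            # next five digits
modifier = int(record_id[8:12])      # next four digits (modifier and duplicate numbers)
year = int(record_id[12:16])         # last four digits

print(country_code, wsi, modifier, year)
```

Converting each piece with `int` drops leading zeros, just as `.astype(int)` does in the dataframe version.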

To convert the country code to a name, we'll reference the [Country Codes file](https://www.ncei.noaa.gov/pub/data/ghcn/v2/v2.country.codes).

In [None]:
country_codes = pd.read_fwf('https://www.ncei.noaa.gov/pub/data/ghcn/v2/v2.country.codes', header=None, names=['Country Code','Country'])
precipitation = pd.merge(dfp, country_codes, on='Country Code', how='left')
precipitation
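
A left merge keeps every row of the left dataframe, so a country code with no match ends up with a missing country name. The sketch below uses two tiny made-up dataframes (the codes and the names `Country A` and `Country B` are hypothetical) and the `indicator=True` option of `pd.merge` to find such rows.

```python
import pandas as pd

# Made-up station rows; code 999 has no entry in the lookup table below.
stations = pd.DataFrame({"Country Code": [101, 101, 999], "Year": [1950, 1951, 1950]})
names = pd.DataFrame({"Country Code": [101, 102], "Country": ["Country A", "Country B"]})

# indicator=True adds a '_merge' column saying where each row came from.
merged = pd.merge(stations, names, on="Country Code", how="left", indicator=True)

# Rows marked 'left_only' had no matching country code.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched)
```

Running the same check on the real data would show whether any station rows lack a country name.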

We now have a dataframe called `precipitation`. For some final cleaning we'll replace trace amounts (coded as `8888`) with `0` and remove rows that contain missing values (coded as `9999`).

In [None]:
precipitation.replace(to_replace=8888, value=0, inplace=True)
from numpy import nan # the `NaN` alias was removed in NumPy 2.0
precipitation.replace(to_replace=9999, value=nan, inplace=True)
precipitation.dropna(axis=0, how='any', inplace=True) # drop any rows containing NaN
precipitation
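
The replacement above looks at every column, so an identifier or modifier that happens to equal `8888` or `9999` would be changed too. One possible refinement, sketched here on a made-up dataframe with only two month columns, is to restrict the cleaning to the month columns.

```python
import numpy as np
import pandas as pd

# Made-up rows; the first WSI deliberately equals the trace code.
toy = pd.DataFrame({
    "January": [12, 8888, 9999],
    "February": [30, 45, 60],
    "WSI": [8888, 100, 200],
    "Year": [1950, 1951, 1952],
})
month_columns = ["January", "February"]

cleaned = toy.copy()
# Replace the sentinel codes in the month columns only.
cleaned[month_columns] = cleaned[month_columns].replace({8888: 0, 9999: np.nan})
# Drop rows where any month value is missing.
cleaned = cleaned.dropna(subset=month_columns)
print(cleaned)
```

In the full dataframe, `month_columns` would list all twelve month names.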

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)