# Climate Data Fetch

This notebook downloads climate data from NOAA's FTP server for all stations listed in the stations file, and for the specified years.

## Parameters

**stations_file**: the name of the CSV file listing the meteorological stations to download. The first column holds the station IDs from NOAA's Integrated Surface Database (ISD), in USAF-WBAN form (e.g. `727935-24234`), and is used as the index. See https://www.ncdc.noaa.gov/isd/data-access

**fetch_host**: the domain name of the FTP server where the files are.

**fetch_dir**: a template for the directory that holds each year's files; `{year}` is replaced with the year being fetched.

```python
stations_file = "Salish Sea met stations.csv"
fetch_host = "ftp.ncei.noaa.gov"
fetch_dir = "/pub/data/noaa/isd-lite/{year}/"
```
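To make the `{year}` placeholder concrete, here is a minimal sketch that builds the full remote path of one station-year file. It assumes ISD-Lite filenames follow the `USAF-WBAN-YEAR.gz` pattern (for example `727935-24234-1980.gz`), which is what the 12-character station-ID slice in the download code relies on. `remote_path` is a hypothetical helper, not part of the notebook.

```python
import posixpath

# Same template as fetch_dir above
FETCH_DIR = "/pub/data/noaa/isd-lite/{year}/"

def remote_path(station_id, year, fetch_dir=FETCH_DIR):
    """Build the FTP path of one ISD-Lite file (assumes USAF-WBAN-YEAR.gz naming)."""
    directory = fetch_dir.format(year=year)
    return posixpath.join(directory, f"{station_id}-{year}.gz")

print(remote_path("727935-24234", 1980))
# /pub/data/noaa/isd-lite/1980/727935-24234-1980.gz
```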

`ftplib` is part of the Python 3 standard library; pandas must be installed separately.

```python
import pandas as pd
from ftplib import FTP
```

Read the stations CSV into a pandas DataFrame, indexed by the ISD number (column 0).

```python
stations = pd.read_csv(stations_file, index_col=0)
stations.head()
```

| ISD number | Name |
|---|---|
| 727935-24234 | Boeing Field |
| 727935-99999 | Boeing Field King Co |
| 727976-24217 | Bellingham Airport |
| 720749-24255 | Whidbey Airport |
| 727923-94225 | Hoquiam Airport |
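The download code tests `stationid in stations.index`, so the index must hold the IDs as strings exactly as they appear in filenames. The sketch below uses an in-memory CSV that assumes a header row of `ISD number,Name` (inferred from the output above) to show that check. Passing `dtype=str` is an optional safeguard, added here, so pandas never converts an ID to a number. The name `demo_stations` is used so the notebook's real `stations` is not overwritten.

```python
import io
import pandas as pd

# Hypothetical two-row stations file with the same layout as the real one
csv_text = """ISD number,Name
727935-24234,Boeing Field
727976-24217,Bellingham Airport
"""

demo_stations = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype=str)

# The same membership test file_callback performs for each listed filename
print("727935-24234" in demo_stations.index)       # True
print("727935-99999" in demo_stations.index)       # False
print(demo_stations.loc["727976-24217", "Name"])   # Bellingham Airport
```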


Connect to the FTP server.

Note that the server has an inactivity timeout, so you may need to re-run this cell to reconnect if you get
a timeout message later on.

```python
ftp = FTP(fetch_host)
ftp.login()  # anonymous login
```
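Instead of re-running the connection cell by hand after a timeout, one option is a small helper that probes the session and reconnects if it has dropped. This is a sketch, not part of the original notebook: `ensure_connected` is a hypothetical name, and it assumes an anonymous login as in the cell above. `FTP.voidcmd("NOOP")` sends a no-op command and raises one of `ftplib.all_errors` if the server has closed the session.

```python
import ftplib

def ensure_connected(ftp, host, connect=ftplib.FTP):
    """Return `ftp` if the session still responds, otherwise a fresh anonymous session."""
    try:
        ftp.voidcmd("NOOP")  # cheap round-trip; raises if the server dropped the session
        return ftp
    except ftplib.all_errors:
        ftp.close()          # discard the dead session's socket
        new_ftp = connect(host)
        new_ftp.login()      # anonymous login, as in the cell above
        return new_ftp

# Usage in the notebook (requires network access):
# ftp = ensure_connected(ftp, fetch_host)
```

A reconnected session starts in the server's default directory, which is harmless here because `do_download` calls `ftp.cwd` for every year.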



This is the main code. Each file holds one year of data for one station and is named by station ID and year, so the function iterates over the years and downloads every matching station file from each year's directory. To find the available files, it retrieves an FTP directory listing and passes each line to `file_callback`, which extracts the filename and checks whether its station ID is in the `stations` DataFrame index. Matching filenames are collected in `avail_files`, which is then iterated over to download the files one by one.

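```python
import os

def do_download(ftp, stations, years, save_dir, fetch_dir=fetch_dir):
    """Download every available ISD-Lite file for the given stations and years into save_dir."""
    os.makedirs(save_dir, exist_ok=True)  # open() fails if the directory is missing

    def file_callback(line):
        # The filename is the last whitespace-separated field of each LIST line
        filename = line.split()[-1]
        # Filenames look like 727935-24234-1980.gz; the first 12 characters are the station ID
        stationid = filename[0:12]
        if stationid in stations.index:
            avail_files.append(filename)

    for year in years:
        avail_files = []  # reset per year; file_callback sees the new list through the closure
        ftp.cwd(fetch_dir.format(year=year))
        ftp.retrlines('LIST', callback=file_callback)
        for f in avail_files:
            with open(os.path.join(save_dir, f), 'wb') as fp:
                ftp.retrbinary("RETR {0}".format(f), fp.write)
```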
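The listing-and-filter step can be checked offline by pulling the parsing logic into a pure function. In this sketch `select_station_files` is a hypothetical name, and the sample lines only imitate a Unix-style `LIST` response; a given server's columns may differ, but the filename being the last field is the same assumption `file_callback` makes.

```python
def select_station_files(listing_lines, station_ids):
    """Return filenames from FTP LIST lines whose first 12 characters are a wanted station ID."""
    wanted = set(station_ids)
    selected = []
    for line in listing_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        filename = fields[-1]  # the filename is the last field of a LIST line
        if filename[0:12] in wanted:
            selected.append(filename)
    return selected

# Illustrative LIST lines, made up for this example (not real server output)
sample = [
    "-rw-r--r--   1 ftp ftp  81234 Jan 02  2021 727935-24234-1980.gz",
    "-rw-r--r--   1 ftp ftp  79110 Jan 02  2021 999999-00000-1980.gz",
    "-rw-r--r--   1 ftp ftp  80551 Jan 02  2021 727976-24217-1980.gz",
]
print(select_station_files(sample, ["727935-24234", "727976-24217"]))
# ['727935-24234-1980.gz', '727976-24217-1980.gz']
```

In the notebook, `stations.index` can be passed directly as `station_ids`. An alternative worth knowing is `FTP.nlst()`, which returns bare filenames and avoids parsing `LIST` output altogether.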

Download the data for every station in `stations` for 1980 through 2020 (`range(1980, 2021)` stops before 2021).

This cell can take a while to run.

```python
do_download(ftp, stations, range(1980, 2021), "data/all-since-1980/")
```
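Because a long run can be cut short by the server's inactivity timeout, it may help to skip files that were already downloaded when the cell is re-run. A minimal sketch, with `fetch_one` as a hypothetical helper: it writes to a temporary `.part` file and renames it only after the transfer completes, so any file that has its final name is assumed complete and is skipped.

```python
import os

def fetch_one(ftp, filename, save_dir):
    """Download one file unless it is already present; return True if a download happened."""
    final_path = os.path.join(save_dir, filename)
    if os.path.exists(final_path):
        return False  # completed on an earlier run
    part_path = final_path + ".part"
    with open(part_path, "wb") as fp:
        ftp.retrbinary(f"RETR {filename}", fp.write)
    os.replace(part_path, final_path)  # only complete transfers get the final name
    return True

# Usage: replace the body of the download loop in do_download with
#     for f in avail_files:
#         fetch_one(ftp, f, save_dir)
```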

When the download finishes, all the files are in `save_dir`, still gzip-compressed.
Processing them into a single comprehensive dataset is done in ProcessClimateData.
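For a quick sanity check on one downloaded file before running ProcessClimateData, pandas can read the gzipped text directly. This sketch assumes the published ISD-Lite layout: 12 whitespace-separated fields (year, month, day, hour, air temperature, dew point, sea-level pressure, wind direction, wind speed, sky-cover code, 1-hour and 6-hour precipitation), with -9999 marking missing values and temperature, dew point, pressure, wind speed and precipitation stored in tenths of °C, hPa, m/s and mm. The column names are labels chosen here, not names defined by NOAA, and `read_isd_lite` is a hypothetical helper.

```python
import pandas as pd

# Labels chosen for this sketch; the file itself has no header row
ISD_LITE_COLUMNS = [
    "year", "month", "day", "hour",
    "air_temp", "dew_point", "slp", "wind_dir",
    "wind_speed", "sky_cover", "precip_1h", "precip_6h",
]
# Fields stored in tenths of their unit (degC, hPa, m/s, mm)
SCALED = ["air_temp", "dew_point", "slp", "wind_speed", "precip_1h", "precip_6h"]

def read_isd_lite(path):
    """Read one gzipped ISD-Lite file into a DataFrame in physical units."""
    df = pd.read_csv(path, sep=r"\s+", header=None, names=ISD_LITE_COLUMNS,
                     na_values=[-9999], compression="gzip")
    # Note: trace precipitation is coded -1 in the raw file, so it becomes -0.1 here
    df[SCALED] = df[SCALED] / 10.0
    return df

# Usage (the path is one of the files fetched above):
# read_isd_lite("data/all-since-1980/727935-24234-1980.gz").head()
```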

## Long Run Fetch

Download all available data for the six stations listed in `met_stations_longrun.csv`, starting in 1960 (as far back as we can reasonably go) and running through 2020.

```python
longrun_stations = pd.read_csv('met_stations_longrun.csv', index_col=0)
do_download(ftp, longrun_stations, range(1960, 2021), "data/long-run-1960/")
```
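Not every station reports in every year, so it can be useful to see which station-years actually arrived. A minimal sketch, assuming the saved filenames keep the `USAF-WBAN-YEAR.gz` pattern: `downloaded_years` is a hypothetical helper that groups the files in a save directory by station ID.

```python
import os
from collections import defaultdict

def downloaded_years(save_dir):
    """Map each station ID to the sorted list of years that have a file in save_dir."""
    years = defaultdict(list)
    for name in os.listdir(save_dir):
        if not name.endswith(".gz"):
            continue  # ignore partial downloads and unrelated files
        stem = name[:-len(".gz")]  # e.g. 727935-24234-1980
        station_id, _, year = stem.rpartition("-")
        if year.isdigit():
            years[station_id].append(int(year))
    return {sid: sorted(ys) for sid, ys in years.items()}

# Usage in the notebook:
# coverage = downloaded_years("data/long-run-1960/")
# for sid in longrun_stations.index:
#     ys = coverage.get(sid, [])
#     print(sid, min(ys) if ys else None, len(ys), "years")
```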