Weather Data Project

We'll experiment with historical weather data going back 50 years and sometimes more. We'll load station and temperature data from publicly available text files from the National Oceanic and Atmospheric Administration (NOAA).

We're going to plot temperature data. To do so effectively, we first need to fill in missing data and smooth the series. We will compute the daily records at a given location and, finally, for your challenge, I will ask you to compare the warmest year of a cold location with the coldest year of a warm one.

**Downloading and Parsing data files**

1. Download a file over FTP
2. Parse a space-separated file into a Python dict

In [1]:
import numpy as np
import matplotlib.pyplot as pp
import seaborn

In [2]:
%matplotlib inline

In [5]:
# Download the station list using urllib

import urllib.request

# The second argument is the local name of the downloaded file
urllib.request.urlretrieve('ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt', 'stations.txt')

('stations.txt', <email.message.Message at 0x1a17f895f8>)
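Re-running the download cell fetches the file again every time. A minimal sketch of a wrapper that skips the download when a local copy already exists is shown here; the helper name `download_if_missing` is hypothetical and not part of the original notebook.

```python
import os
import urllib.request

def download_if_missing(url, local_name):
    """Download url to local_name unless that path already exists.

    Returns True if a download happened, False if the local copy was kept.
    (Hypothetical helper, added for illustration only.)
    """
    if os.path.exists(local_name):
        return False
    urllib.request.urlretrieve(url, local_name)
    return True
```

Calling `download_if_missing('ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt', 'stations.txt')` would then download the file only the first time the cell runs.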

In [7]:
# Use readlines() and slicing to show the first ten lines of the file.

open('stations.txt','r').readlines()[:10]

['ACW00011604  17.1167  -61.7833   10.1    ST JOHNS COOLIDGE FLD                       \n',
 'ACW00011647  17.1333  -61.7833   19.2    ST JOHNS                                    \n',
 'AE000041196  25.3330   55.5170   34.0    SHARJAH INTER. AIRP            GSN     41196\n',
 'AEM00041194  25.2550   55.3640   10.4    DUBAI INTL                             41194\n',
 'AEM00041217  24.4330   54.6510   26.8    ABU DHABI INTL                         41217\n',
 'AEM00041218  24.2620   55.6090  264.9    AL AIN INTL                            41218\n',
 'AF000040930  35.3170   69.0170 3366.0    NORTH-SALANG                   GSN     40930\n',
 'AFM00040938  34.2100   62.2280  977.2    HERAT                                  40938\n',
 'AFM00040948  34.5660   69.2120 1791.3    KABUL INTL                             40948\n',
 'AFM00040990  31.5000   65.8500 1010.0    KANDAHAR AIRPORT                       40990\n']
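`readlines()` reads the whole file into memory before the slice is taken. As an alternative, here is a small sketch that stops after the first few lines and closes the file when done; the helper name `head` is an assumption made for this example.

```python
from itertools import islice

def head(path, n=10):
    """Return the first n lines of a text file without reading the rest.

    (Hypothetical helper, added for illustration only.)
    """
    with open(path, 'r') as f:
        # islice stops pulling lines from the file after n items
        return list(islice(f, n))
```

With the station list downloaded, `head('stations.txt')` should give the same ten lines as the cell above.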

Some stations are tagged as GSN, which marks them as part of the GCOS Surface Network.
We'll concentrate on those, so let's gather some data from this text file. We read through all the lines, skip those that do not contain the GSN keyword, and collect the station names in a dictionary indexed by station code.

In [9]:
stations = {}

for line in open('stations.txt','r'):   # iterate over the file, one line (string) at a time
    if 'GSN' in line:
        fields = line.split()   # split the line on whitespace into a Python list of fields

        # Use the first field, the station code, as the key.
        # Join the fifth and following fields with spaces to form the name;
        # this also keeps trailing fields such as the GSN flag and the numeric id.
        stations[fields[0]] = ' '.join(fields[4:])

In [10]:
len(stations)

997
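Splitting on whitespace leaves the GSN flag and the numeric id inside each name. If cleaner fields are wanted, the lines can instead be sliced by column position. The sketch below assumes the column layout described in the GHCN-Daily readme (code in columns 1-11, latitude 13-20, longitude 22-30, elevation 32-37, state 39-40, name 42-71, GSN flag 73-75); treat these positions as an assumption and check them against the readme that ships with the data. The function name `parse_station_line` is hypothetical.

```python
def parse_station_line(line):
    """Split one station record by column position.

    The slice boundaries are an assumption based on the published
    GHCN-Daily layout; verify them before relying on the result.
    """
    return {
        'code': line[0:11],
        'latitude': float(line[12:20]),
        'longitude': float(line[21:30]),
        'elevation': float(line[31:37]),
        'state': line[38:40].strip(),
        'name': line[41:71].strip(),
        'gsn': line[72:75].strip() == 'GSN',
    }

# One of the lines printed earlier in this notebook
sample = 'AE000041196  25.3330   55.5170   34.0    SHARJAH INTER. AIRP            GSN     41196\n'
record = parse_station_line(sample)
print(record['code'], record['name'], record['gsn'])
```

Unlike `'GSN' in line`, the flag test here looks only at the flag columns, so a station whose name happens to contain the letters GSN would not be picked up by mistake.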

That is still a lot of stations, so we should really concentrate on just a few. Let's write a function, findstation(), that lets us look for interesting patterns in the station names. It builds a dictionary, **found**, using a comprehension over the station codes and names, keeping the entries where the pattern we're interested in appears within the name, and then prints it.

In [11]:
def findstation(s):
    found = {code: name for code,name in stations.items() if s in name}
    print(found)

In [12]:
findstation('LIHUE')

{'USW00022536': 'HI LIHUE WSO AP 1020.1 GSN 91165'}


In [13]:
findstation('SAN DIEGO')

{'USW00023188': 'CA SAN DIEGO LINDBERGH FLD GSN 72290'}


In [14]:
findstation('MINNEAPOLIS')

{'USW00014922': 'MN MINNEAPOLIS/ST PAUL AP GSN HCN 72658'}


In [16]:
findstation('IRKUTSK')

{'RSM00030710': 'IRKUTSK GSN 30710'}
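Because findstation() prints its result and reads the global `stations` dictionary, its matches cannot be reused in later cells. A possible variant, sketched here under the hypothetical name `find_stations`, takes the dictionary as an argument, returns the matches, and ignores case. The two sample entries are copied from the outputs above and stand in for the full dictionary.

```python
def find_stations(stations, pattern):
    """Return {code: name} for stations whose name contains pattern, ignoring case.

    (Hypothetical variant of findstation(), added for illustration only.)
    """
    pattern = pattern.upper()
    return {code: name for code, name in stations.items() if pattern in name.upper()}

# Stand-in data: two entries taken from the outputs shown above
sample_stations = {
    'USW00022536': 'HI LIHUE WSO AP 1020.1 GSN 91165',
    'RSM00030710': 'IRKUTSK GSN 30710',
}
matches = find_stations(sample_stations, 'lihue')
print(matches)
```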


Next, we need to load the temperature data:

1. Parsing a fixed-field text file using np.genfromtxt

2. Using ranges of NumPy datetime objects
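As a preview of these two techniques, here is a minimal sketch on made-up data. The three-column record layout (4-character year, 2-character month, 5-character value) is invented for this example and is not the layout of the real temperature files.

```python
import io
import numpy as np

# Hypothetical fixed-width records: 4-char year, 2-char month, 5-char value
text = "2000 1  123\n2000 2 -999\n2000 3   87\n"

# Passing a list of integers as delimiter tells genfromtxt the width of each field
data = np.genfromtxt(io.StringIO(text),
                     delimiter=[4, 2, 5],
                     dtype=[('year', np.int32), ('month', np.int32), ('value', np.int32)],
                     autostrip=True)
print(data['value'])

# np.arange with a datetime64 dtype yields one entry per day; the end date is excluded
days = np.arange('2000-01-01', '2000-02-01', dtype='datetime64[D]')
print(len(days), days[0], days[-1])
```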