# `clean_ireland_stations_metadata.ipynb`

### Author: Anthony Hein

#### Last updated: 10/18/2021

# Overview:

Clean the metadata collected on weather stations in Ireland. The metadata was obtained by manually pruning the dataset at [https://cli.fusio.net/cli/climate_data/webdata/StationDetails.csv](https://cli.fusio.net/cli/climate_data/webdata/StationDetails.csv) to include only those stations with hourly data for the past 30 years.

---

## Setup

In [1]:
import git
import os
from typing import List
from tqdm import tqdm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
BASE_DIR = git.Repo(os.getcwd(), search_parent_directories=True).working_dir
BASE_DIR

'/Users/anthonyhein/Desktop/SML310/project'

---

## Load `ireland_stations_metadata.csv`

In [3]:
ireland_stations_metadata = pd.read_csv(f"{BASE_DIR}/raw/csv/ireland_stations_metadata.csv", low_memory=False) 
ireland_stations_metadata.head()

Unnamed: 0,County,Station Number,name,Height(m),Easting,Northing,Latitude,Longitude,Open Year,Close Year
0,Westmeath,2222,MULLINGAR S.W.S.,111,242700,252700,533120,72120,1943,1974.0
1,Monaghan,2437,CLONES,89,250000,326300,541100,71400,1950,2008.0
2,Galway,2021,GALWAY S.W.S.,20,132700,225600,531634,90034,1978,1990.0
3,Offaly,4919,BIRR,72,207400,204400,530525,75325,1954,2009.0
4,Kilkenny,3613,KILKENNY,65,249400,157400,523955,71610,1957,2008.0


In [4]:
ireland_stations_metadata.shape

(33, 10)

---

## Fix Close Year

First, we replace every `Close Year` that is `NaN` (presumably a station that is still open) with 2022. Since we have no racing data from 2022, this does not change the analysis.

In [5]:
sum(ireland_stations_metadata['Close Year'].isnull())

25

In [6]:
ireland_stations_metadata['Close Year'] = ireland_stations_metadata['Close Year'].fillna(2022)

In [7]:
sum(ireland_stations_metadata['Close Year'].isnull())

0

We also want to convert this column to an integer type, since the `NaN` values forced it to `float64`.

In [8]:
ireland_stations_metadata.dtypes

County             object
Station Number      int64
name               object
Height(m)           int64
Easting             int64
Northing            int64
Latitude            int64
Longitude           int64
Open Year           int64
Close Year        float64
dtype: object

In [9]:
ireland_stations_metadata = ireland_stations_metadata.astype({'Close Year': 'int64'})

In [10]:
ireland_stations_metadata.dtypes

County            object
Station Number     int64
name              object
Height(m)          int64
Easting            int64
Northing           int64
Latitude           int64
Longitude          int64
Open Year          int64
Close Year         int64
dtype: object
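The fill and the cast above can also be done in one step. The following is a minimal sketch on a small illustrative frame (the rows are made up for the example, and `fill_close_year` and `SENTINEL_YEAR` are hypothetical names, not part of this notebook):

```python
import numpy as np
import pandas as pd

SENTINEL_YEAR = 2022  # placeholder year for stations with no recorded close year


def fill_close_year(df: pd.DataFrame, sentinel: int = SENTINEL_YEAR) -> pd.DataFrame:
    """Return a copy with missing 'Close Year' values filled and cast to int64."""
    out = df.copy()
    # fillna removes the NaN values, after which the float column can be cast safely
    out["Close Year"] = out["Close Year"].fillna(sentinel).astype("int64")
    return out


# Illustrative rows only, not the real dataset
toy = pd.DataFrame({
    "Open Year": [1943, 2009, 2010],
    "Close Year": [1974.0, np.nan, np.nan],
})
cleaned = fill_close_year(toy)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare the number of missing values before and after.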

---

## Fix Latitude and Longitude

The latitude and longitude coordinates are stored as integers, without decimal points or sign. We correct this by dividing by 10000 to insert the decimal point (the divisor was determined by manual inspection) and, for longitude, by negating the result, since Ireland lies west of the prime meridian.

In [11]:
ireland_stations_metadata['Latitude'].head()

0    533120
1    541100
2    531634
3    530525
4    523955
Name: Latitude, dtype: int64

In [12]:
ireland_stations_metadata['Latitude'] = ireland_stations_metadata['Latitude'] / 10000

In [13]:
ireland_stations_metadata['Latitude'].head()

0    53.3120
1    54.1100
2    53.1634
3    53.0525
4    52.3955
Name: Latitude, dtype: float64

In [14]:
ireland_stations_metadata['Longitude'].head()

0    72120
1    71400
2    90034
3    75325
4    71610
Name: Longitude, dtype: int64

In [15]:
ireland_stations_metadata['Longitude'] = -ireland_stations_metadata['Longitude'] / 10000

In [16]:
ireland_stations_metadata['Longitude'].head()

0   -7.2120
1   -7.1400
2   -9.0034
3   -7.5325
4   -7.1610
Name: Longitude, dtype: float64
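One caveat worth checking: in every sample value shown, the last two digit pairs are both below 60, which is consistent with a packed degrees-minutes-seconds encoding (`DDMMSS`) rather than scaled decimal degrees. Whether the source file actually uses that encoding is an assumption here, not something this notebook confirms. If it does, dividing by 10000 would not yield decimal degrees, and a conversion might look like the sketch below (`packed_dms_to_decimal` is a hypothetical helper):

```python
import pandas as pd


def packed_dms_to_decimal(value: int) -> float:
    """Interpret an integer as DDMMSS (or DMMSS) and return decimal degrees."""
    degrees, remainder = divmod(int(value), 10000)
    minutes, seconds = divmod(remainder, 100)
    if minutes >= 60 or seconds >= 60:
        # Not a valid minutes/seconds pair, so the DMS assumption fails for this value
        raise ValueError(f"{value} is not a valid packed DMS value")
    return degrees + minutes / 60 + seconds / 3600


# First three raw values from the outputs above
raw = pd.DataFrame({
    "Latitude": [533120, 541100, 531634],
    "Longitude": [72120, 71400, 90034],
})
decimal = pd.DataFrame({
    "Latitude": raw["Latitude"].map(packed_dms_to_decimal),
    # Negated because the stations lie west of the prime meridian
    "Longitude": -raw["Longitude"].map(packed_dms_to_decimal),
})
```

Under that reading, the first row would come out at roughly 53.5222, -7.3556 instead of 53.3120, -7.2120, so comparing a few stations against a map is a quick way to tell which interpretation is right.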

---

## Sanity Check

In [17]:
ireland_stations_metadata

Unnamed: 0,County,Station Number,name,Height(m),Easting,Northing,Latitude,Longitude,Open Year,Close Year
0,Westmeath,2222,MULLINGAR S.W.S.,111,242700,252700,53.312,-7.212,1943,1974
1,Monaghan,2437,CLONES,89,250000,326300,54.11,-7.14,1950,2008
2,Galway,2021,GALWAY S.W.S.,20,132700,225600,53.1634,-9.0034,1978,1990
3,Offaly,4919,BIRR,72,207400,204400,53.0525,-7.5325,1954,2009
4,Kilkenny,3613,KILKENNY,65,249400,157400,52.3955,-7.161,1957,2008
5,Limerick,611,FOYNES AIRPORT,4,124900,151400,52.363,-9.063,1937,1946
6,Cork,704,MIDLETON S.W.S.,11,185900,72500,51.5415,-8.122,1946,1955
7,Wexford,2615,Rosslare,26,313700,112200,52.15,-6.2005,1956,2008
8,Donegal,1575,MALIN HEAD,20,241939,458562,55.2219,-7.2021,2009,2022
9,Donegal,2075,FINNER,33,184300,360635,54.2938,-8.1435,2010,2022


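Beyond eyeballing the table, a few automated checks can flag suspicious rows. This is a rough sketch: the bounding box values are approximate assumptions for the island of Ireland, `find_suspect_rows` is a hypothetical helper, and the sample rows are copied from the table above.

```python
import pandas as pd

# Approximate bounding box for the island of Ireland (assumed, deliberately generous)
LAT_RANGE = (51.0, 56.0)
LON_RANGE = (-11.0, -5.0)


def find_suspect_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose coordinates fall outside the box or whose years are inverted."""
    lat_ok = df["Latitude"].between(*LAT_RANGE)
    lon_ok = df["Longitude"].between(*LON_RANGE)
    years_ok = df["Open Year"] <= df["Close Year"]
    return df[~(lat_ok & lon_ok & years_ok)]


sample = pd.DataFrame({
    "name": ["MULLINGAR S.W.S.", "MALIN HEAD"],
    "Latitude": [53.312, 55.2219],
    "Longitude": [-7.212, -7.2021],
    "Open Year": [1943, 2009],
    "Close Year": [1974, 2022],
})
suspects = find_suspect_rows(sample)
```

A coarse check like this catches sign or scale mistakes, but not smaller offsets within the box.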
---

## Save Dataframe

In [18]:
ireland_stations_metadata.to_csv(f"{BASE_DIR}/data/csv/ireland_stations_metadata.csv", index=False)
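Passing `index=False` keeps the row index out of the file, so reloading it does not produce an extra `Unnamed: 0` column. A minimal round-trip sketch, using an in-memory buffer and a small illustrative frame instead of the real path:

```python
import io

import pandas as pd

frame = pd.DataFrame({
    "County": ["Westmeath", "Donegal"],
    "Station Number": [2222, 1575],
    "Close Year": [1974, 2022],
})

# Write to an in-memory buffer rather than disk, then read it back
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
```

The same check can be run against the saved file to confirm that columns and integer types survive the round trip.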

---