# Dataset Information

This project uses the open-source **Long term (1850-2010) Island of Ireland Precipitation (IIP) network dataset**.

## Dataset Source
The rainfall dataset is publicly available at https://edepositireland.ie/handle/2262/76134

## Citation
If you use this dataset in your own work, please cite it as follows:  

> `S. Noone, C. Murphy, J. Coll, T. Matthews, D. Mullan, R.L. Wilby, S. Walsh, 'Long term (1850-2010) Island of Ireland Precipitation (IIP) network', [dataset], Met Éireann, 2015-09`

## Related article

> https://mural.maynoothuniversity.ie/id/eprint/8729/

## Notes
- The zip file contains data and metadata for the Long term (1850-2010) Island of Ireland Precipitation (IIP) network. 

- This project is for **research demonstration purposes only** and does not represent official analysis by the data provider.  

In [1]:
import os, zipfile
import pandas as pd
from PyPDF2 import PdfReader

In [2]:
# Path to your downloaded zip file
zip_path = "Long-Term-IIP-network 12042016.zip"   # change this to your actual filename

extract_to = "raw_data"

# Make sure the folder exists
os.makedirs(extract_to, exist_ok=True)

# Extract the zip
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_to)

print(f"Extracted {zip_path} to {extract_to}")

Extracted Long-Term-IIP-network 12042016.zip to raw_data


### Checking the Readme file.

In [3]:
!cat raw_data/readme.txt

This zip file contains data and metadata for the Long term (1850-2010) Island of Ireland Precipitation (IIP) network.

There are 25 files, NAME.csv, containing monthly rainfall totals for each station in the network, each of these files also contains information on the station location and altitude.

The file IIP_National series.csv contains the rainfall averaged over the 25 stations.

The file IIP station metadata.pdf contains station metadata and details of how the series were constructed.

### Checking the pdf file.

In [4]:
pdf_path = "raw_data/IIP_station_metadata.pdf"
reader = PdfReader(pdf_path)

# Extract text from all pages
for i, page in enumerate(reader.pages):
    text = page.extract_text()
    print(f"--- Page {i+1} ---")
    print(text[:1000])  # show the first 1000 characters of each page

--- Page 1 ---
1 
 Homogenisation and analysis of an expanded long-term monthly rainfall 
network for the Island of Ireland (1850 -2010)  
Supplementary Information  
SI 1.0 Bridging of discontinuous stations  
Bridging was required for three of the new series and ten stations in the CRU archive  because  of 
station clos ures/moves . Bridging was undertaken  using seasonal regression on overlapping records to 
derive adjustment factors. For each station, details on derived regressions and adjustment factors are 
given in SI Table 1. All regression models were significant at the 0.05 level. For these stations 
appropriate bridging stations could be found in close proximity. The poorest regressions were derived 
for Roches Point, Derry, Belfast and Ardara where a lack of suitable local bridgi ng stations meant 
candidates were derived from further afield. However, with the exception of Enniscorthy , all 
seasonally derived correction factors are <10% but typically much lower  (SI Table 

In [5]:
# Getting a list of all files that are extracted.

all_files = []
for root, dirs, files in os.walk(extract_to):
    for f in files:
        all_files.append(os.path.join(root, f))

len(all_files), all_files

(28,
 ['raw_data/Cork Airport.csv',
  'raw_data/Killarney.csv',
  'raw_data/Waterford.csv',
  'raw_data/Cappoquinn.csv',
  'raw_data/Dublin Airport.csv',
  'raw_data/Phoenix Park.csv',
  'raw_data/Strokestown.csv',
  'raw_data/Athboy.csv',
  'raw_data/Belfast.csv',
  'raw_data/Enniscorthy.csv',
  'raw_data/Shannon Airport.csv',
  'raw_data/Malin Head.csv',
  'raw_data/Derry.csv',
  'raw_data/Birr.csv',
  'raw_data/Mullingar.csv',
  'raw_data/Valentia.csv',
  'raw_data/IIP_station_metadata.pdf',
  'raw_data/Rathdrum.csv',
  'raw_data/Roches Point.csv',
  'raw_data/Markree Castle.csv',
  'raw_data/readme.txt',
  'raw_data/University College Galway.csv',
  'raw_data/IIP_National series.csv',
  'raw_data/Foulkesmills.csv',
  'raw_data/Armagh.csv',
  'raw_data/Portlaw.csv',
  'raw_data/Ardara.csv',
  'raw_data/Drumsna.csv'])

In [6]:
# Saving all file names, to use in different notebooks.

with open("my_list.txt", "w") as f: 
    for item in all_files:
        f.write(item + "\n")
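
A minimal sketch of how another notebook might read the saved list back. The helper name `load_file_list` is hypothetical, and the demo writes a throwaway file standing in for `my_list.txt` so the snippet runs on its own. Several file names contain spaces, so the list is split on line breaks only.

```python
import tempfile
from pathlib import Path

def load_file_list(list_path):
    """Hypothetical helper: read one path per line, skipping blank lines."""
    text = Path(list_path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]

# Demo with a temporary file that stands in for my_list.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "my_list.txt"
    demo.write_text(
        "raw_data/Ardara.csv\nraw_data/IIP_National series.csv\nraw_data/readme.txt\n",
        encoding="utf-8",
    )
    files = load_file_list(demo)

# Keep only the CSV files for later analysis.
csv_files = [f for f in files if f.lower().endswith(".csv")]
print(csv_files)
```

In a real notebook, `load_file_list("my_list.txt")` would replace the temporary-file demo.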

In [7]:
# Example file

csv_path = "raw_data/Ardara.csv"  # match the extracted file name exactly; paths are case-sensitive on Linux

# Load CSV into a DataFrame
df = pd.read_csv(csv_path)

df

Unnamed: 0,Easting,Northings,Latitude,Longitude,elevation (metres),Station,County,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12
0,180787.66,394679.05,54.7996033,-8.2994908,15,Ardara,Donegal,,,,,,
1,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
2,1850,169,162.4,62.2,159.5,101.3,100.4,134.4,156.1,137.6,120,173,117.7
3,1851,236.4,112.9,104.2,78.1,61.1,147.4,160.8,181.2,106,189.2,109.8,103.4
4,1852,249.7,191,50.8,67.7,72.3,246.5,123.2,167.3,85.5,122.1,329.5,339.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...
158,2006,103.2,103.1,146.6,90.3,114.1,58.9,104.7,95.9,157.8,175.9,214.9,325
159,2007,226.5,137.3,132.3,41.3,144,126.2,215.4,142.2,124.6,122.3,134.5,205.6
160,2008,217.6,157.6,219,74.5,31.7,145.3,82.3,174.7,161.6,296.2,184.2,183.2
161,2009,287.6,56.9,114.6,155.2,134,69.5,218.2,249.7,97.6,157.5,329.1,104.7


In [8]:
# NaN value check across all CSV files.

all_files_null_count = {}

for file_name in all_files:
    null_count = {}
    if file_name.endswith(".csv"):
        df = pd.read_csv(file_name)
        for col_name in df.columns:
            null_count[col_name] = df[col_name].isnull().sum()
        all_files_null_count[file_name] = null_count
    
all_files_null_count

{'raw_data/Cork Airport.csv': {'Easting ': np.int64(0),
  'Northings': np.int64(0),
  'Latitude': np.int64(0),
  'Longitude': np.int64(0),
  'elevation (metres)': np.int64(0),
  'Station': np.int64(0),
  'County': np.int64(0),
  'Unnamed: 7': np.int64(1),
  'Unnamed: 8': np.int64(1),
  'Unnamed: 9': np.int64(1),
  'Unnamed: 10': np.int64(1),
  'Unnamed: 11': np.int64(1),
  'Unnamed: 12': np.int64(1)},
 'raw_data/Killarney.csv': {'Easting ': np.int64(0),
  'Northings': np.int64(0),
  'Latitude': np.int64(0),
  'Longitude': np.int64(0),
  'elevation (metres)': np.int64(0),
  'Station': np.int64(0),
  'County': np.int64(0),
  'Unnamed: 7': np.int64(1),
  'Unnamed: 8': np.int64(1),
  'Unnamed: 9': np.int64(1),
  'Unnamed: 10': np.int64(1),
  'Unnamed: 11': np.int64(1),
  'Unnamed: 12': np.int64(1)},
 'raw_data/Waterford.csv': {'Easting ': np.int64(0),
  'Northings': np.int64(0),
  'Latitude': np.int64(0),
  'Longitude': np.int64(0),
  'elevation (metres)': np.int64(0),
  'Station': np.int6

### In each file, the columns whose names start with `Unnamed` show one apparent missing value. This is not missing data but an artifact of the table’s structure, as the example below shows.

In [9]:
all_files_null_count['raw_data/IIP_National series.csv']

{'Island of Ireland precipitation (IIP) National series (averaged monthly precipitation of the 25 station series)': np.int64(1),
 'Unnamed: 1': np.int64(1),
 'Unnamed: 2': np.int64(1),
 'Unnamed: 3': np.int64(1),
 'Unnamed: 4': np.int64(1),
 'Unnamed: 5': np.int64(1),
 'Unnamed: 6': np.int64(1),
 'Unnamed: 7': np.int64(1),
 'Unnamed: 8': np.int64(1),
 'Unnamed: 9': np.int64(1),
 'Unnamed: 10': np.int64(1),
 'Unnamed: 11': np.int64(1),
 'Unnamed: 12': np.int64(1)}
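
One way to check that a single null per column comes from the file layout rather than from a gap in the record is to look at which rows hold the nulls. The sketch below uses a small in-memory stand-in for the national series file. Its raw layout (a title line, an empty line of commas, then the `Year, Jan, ..., Dec` header) is an assumption inferred from the outputs in this notebook, and the two data rows are copied from the displayed values.

```python
import io
import pandas as pd

# In-memory stand-in for IIP_National series.csv (layout assumed, see above).
sample = (
    "Island of Ireland precipitation (IIP) National series "
    "(averaged monthly precipitation of the 25 station series),,,,,,,,,,,,\n"
    ",,,,,,,,,,,,\n"
    "Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec\n"
    "1850,113.5,86.4,38.6,138.3,59.7,54.5,103.4,76.3,70.5,62.7,113.1,104\n"
    "1851,195,55.2,79.9,54.7,45,101.5,90.3,96.6,49.4,113.4,45.4,68.8\n"
)

df = pd.read_csv(io.StringIO(sample))

# Null count per column, as in the check above.
null_per_column = df.isnull().sum()

# Index labels of the rows that contain at least one null.
null_rows = df.index[df.isnull().any(axis=1)].tolist()

print(null_per_column.unique(), null_rows)
```

If every null sits in row 0, above the `Year` header row, the rainfall values themselves have no gaps.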

In [10]:
# Load CSV into a DataFrame
df = pd.read_csv("raw_data/IIP_National series.csv")

df

Unnamed: 0,Island of Ireland precipitation (IIP) National series (averaged monthly precipitation of the 25 station series),Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12
0,,,,,,,,,,,,,
1,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
2,1850,113.5,86.4,38.6,138.3,59.7,54.5,103.4,76.3,70.5,62.7,113.1,104
3,1851,195,55.2,79.9,54.7,45,101.5,90.3,96.6,49.4,113.4,45.4,68.8
4,1852,161.4,92.4,44.6,44.4,70.2,179.4,68.6,110.1,57.3,88.7,223.2,208.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...
158,2006,57.6,54.6,120.3,47.3,126.2,29.8,55.9,70.8,162.6,145.6,142.3,170.5
159,2007,107.9,107.8,80.3,26.9,65.9,141.3,133,100.4,52.8,52.2,75,123.1
160,2008,174.3,58,113.5,45.1,42.3,101.9,123.8,175.9,108.2,152.4,80.4,73.7
161,2009,152.9,36.5,48.7,120.9,90.7,73.2,174.7,148.4,49.4,129.9,244.9,105.4
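
The displays above suggest that each file starts with two lines of metadata, followed by a `Year, Jan, ..., Dec` header and the monthly values. A possible way to load the data cleanly is to read the file in two passes, as sketched below. The helper name `read_iip_file` is hypothetical, the raw layout is inferred from the DataFrame displays rather than confirmed against the files, and the sample values are copied from the Ardara output.

```python
import io
import pandas as pd

# In-memory stand-in for a station file (layout assumed, see above).
sample = (
    "Easting ,Northings,Latitude,Longitude,elevation (metres),Station,County,,,,,,\n"
    "180787.66,394679.05,54.7996033,-8.2994908,15,Ardara,Donegal,,,,,,\n"
    "Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec\n"
    "1850,169,162.4,62.2,159.5,101.3,100.4,134.4,156.1,137.6,120,173,117.7\n"
    "1851,236.4,112.9,104.2,78.1,61.1,147.4,160.8,181.2,106,189.2,109.8,103.4\n"
)

def read_iip_file(source):
    """Hypothetical helper: return (metadata dict, monthly DataFrame)."""
    # Pass 1: the first line names the metadata fields, the second holds them.
    meta_df = pd.read_csv(source, nrows=1)
    meta_df = meta_df.loc[:, ~meta_df.columns.str.startswith("Unnamed")]
    meta_df.columns = meta_df.columns.str.strip()  # 'Easting ' has a trailing space
    metadata = meta_df.iloc[0].to_dict()

    # Rewind file-like objects so the second pass starts from the top.
    if hasattr(source, "seek"):
        source.seek(0)

    # Pass 2: skip the two metadata lines so 'Year, Jan, ..., Dec' is the header.
    monthly = pd.read_csv(source, skiprows=2)
    return metadata, monthly

metadata, monthly = read_iip_file(io.StringIO(sample))
print(metadata["Station"], monthly.shape)
```

With a real file, a path such as `"raw_data/Ardara.csv"` can be passed in place of the `StringIO` object. For the national series, the first pass should return only the title column with an empty value, since that file carries no station metadata.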
