# Fetching Data for the Project

- [Pandas](https://pandas.pydata.org/docs/)
- [Requests: HTTP for Humans™](https://requests.readthedocs.io/en/latest/)
- [io — Core tools for working with streams](https://docs.python.org/3/library/io.html)
- [os — Miscellaneous operating system interfaces](https://docs.python.org/3/library/os.html)

In [1]:
import pandas as pd
import requests
from io import StringIO
import os

***

## Selection of Data for Analysis

After exploring the available data sources, I decided to take a broad approach and analyse the current weather stations in Ireland together with the data available for each of them. Looking across these stations should show how wind speed varies over time and across the country. That gives a foundation for later, more detailed assessments of wind energy potential in Ireland.

The station names and IDs used in the dictionary below were collected manually.

In [2]:
# Dictionary of station names and IDs
stations = {
    "Athenry": 1875,
    "Ballyhaise": 675,
    "Belmullet": 2375,
    "Casement": 3723,
    "Claremorris": 2175,
    "Cork Airport": 3904,
    "Dublin Airport": 532,
    "Dunsany": 1375,
    "Finner": 2075,
    "Gurteen": 1475,
    "Johnstown Castle": 1775,
    "Knock Airport": 4935,
    "Malin Head": 1575,
    "Mace Head": 275,
    "Moore Park": 575,
    "Mount Dillon": 1975,
    "Mullingar": 875,
    "Newport": 1175,
    "Oak Park": 375,
    "Roches Point": 1075,
    "Shannon Airport": 518,
    "Sherkin Island": 775,
    "Valentia Observatory": 2275
}

In [3]:
# Base URL
base_url = "https://cli.fusio.net/cli/climate_data/webdata/dly{}.csv"

# Folder to store data
data_folder = "Data"
os.makedirs(data_folder, exist_ok=True)

# Loop to fetch and save data
for station_name, station_id in stations.items():
    url = base_url.format(station_id)
    print(f"Fetching data from {url} for station: {station_name} (ID: {station_id})...")

    try:
        # Download the file; the timeout keeps a stalled request from hanging the loop
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        # Parse the CSV text, skipping the first 24 rows
        station_data = pd.read_csv(StringIO(response.text), skiprows=24, skipinitialspace=True)
        
        # Save to CSV in the Data folder with station name and ID in the filename
        file_path = os.path.join(data_folder, f"{station_name}_station_{station_id}.csv")
        station_data.to_csv(file_path, index=False)
    
    
    except Exception as e:
        print(f"Error fetching data for station {station_name} (ID: {station_id}): {e}")


Fetching data from https://cli.fusio.net/cli/climate_data/webdata/dly1875.csv for station: Athenry (ID: 1875)...
Fetching data from https://cli.fusio.net/cli/climate_data/webdata/dly675.csv for station: Ballyhaise (ID: 675)...
Fetching data from https://cli.fusio.net/cli/climate_data/webdata/dly2375.csv for station: Belmullet (ID: 2375)...
Fetching data from https://cli.fusio.net/cli/climate_data/webdata/dly3723.csv for station: Casement (ID: 3723)...
Fetching data from https://cli.fusio.net/cli/climate_data/webdata/dly2175.csv for station: Claremorris (ID: 2175)...
Fetching data from https://cli.fusio.net/cli/climate_data/webdata/dly3904.csv for station: Cork Airport (ID: 3904)...
Fetching data from https://cli.fusio.net/cli/climate_data/webdata/dly532.csv for station: Dublin Airport (ID: 532)...
Fetching data from https://cli.fusio.net/cli/climate_data/webdata/dly1375.csv for station: Dunsany (ID: 1375)...
Fetching data from https://cli.fusio.net/cli/climate_data/webdata/dly2075.csv 
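
The download loop above hard-codes `skiprows=24`, which only works while every file carries exactly 24 rows before its column header. As a minimal sketch of an alternative, the header row could be located by searching for it. This assumes the header line starts with a `date` field, as the columns used later in this notebook suggest. The helper name `find_header_row` and the sample text are hypothetical and are not real station data.

```python
from io import StringIO

import pandas as pd


def find_header_row(text, first_column="date"):
    """Return the 0-based index of the first line whose first field is `first_column`."""
    for index, line in enumerate(text.splitlines()):
        if line.split(",")[0].strip().lower() == first_column:
            return index
    raise ValueError(f"No header row starting with {first_column!r} found")


# Made-up sample standing in for a downloaded file (assumption, not real data)
sample_text = (
    "Station Name: EXAMPLE\n"
    "Some descriptive preamble\n"
    "date,maxtp,mintp,wdsp\n"
    "01-jan-2020,10.1,3.2,8.5\n"
    "02-jan-2020,9.4,2.8,12.0\n"
)

# Skip exactly the lines that come before the detected header
header_row = find_header_row(sample_text)
sample_data = pd.read_csv(StringIO(sample_text), skiprows=header_row, skipinitialspace=True)
print(header_row, list(sample_data.columns))
```

If the preamble length ever differs between stations, a lookup like this would keep the parsing step from silently reading the wrong row as the header.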

In [4]:
# Define the columns of interest
columns_of_interest = ['date', 'maxtp', 'mintp', 'rain', 'cbl', 'wdsp', 'hm', 'ddhm', 'hg']
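
Selecting these columns with plain indexing raises a `KeyError` if a station file lacks any one of them, and that station is then skipped entirely. The sketch below shows one possible alternative, `DataFrame.reindex`, which keeps the station and fills absent columns with `NaN`. The `example` frame is made up for illustration, and whether any real station file is missing columns is not something this notebook establishes.

```python
import pandas as pd

columns_of_interest = ['date', 'maxtp', 'mintp', 'rain', 'cbl', 'wdsp', 'hm', 'ddhm', 'hg']

# Made-up frame that lacks several columns and carries one extra column
example = pd.DataFrame({
    "date": ["01-jan-2020", "02-jan-2020"],
    "maxtp": [10.1, 9.4],
    "wdsp": [8.5, 12.0],
    "extra": [1, 2],
})

# Report which expected columns are absent before selecting
missing = [c for c in columns_of_interest if c not in example.columns]

# reindex keeps the requested columns in order, adds NaN for absent ones,
# and drops anything not requested
selected = example.reindex(columns=columns_of_interest)
print(missing)
print(selected.columns.tolist())
```

The trade-off is that missing data becomes explicit `NaN` values in the combined file instead of an error message, so printing the `missing` list is worth keeping.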

In [5]:
# Reverse the dictionary for quick lookup
station_ids_to_names = {v: k for k, v in stations.items()}

# Output file path
output_file = "All_stations_data.csv"

# Initialize an empty list for combined data
combined_data = []

# Loop through all files in the folder, in sorted order so the output is reproducible
for file_name in sorted(os.listdir(data_folder)):
    try:
        # Extract station ID from the file name
        station_id = int(file_name.split('_')[-1].split('.')[0])
        station_name = station_ids_to_names.get(station_id, "Unknown")
        
        # Read the data and add columns
        file_path = os.path.join(data_folder, file_name)
        station_data = pd.read_csv(file_path)[columns_of_interest]
        station_data["Station_ID"] = station_id
        station_data["Station_Name"] = station_name
        
        # Add the data to the combined list
        combined_data.append(station_data)
    except Exception as e:
        print(f"Error processing file {file_name}: {e}")

# Combine and save the final DataFrame
pd.concat(combined_data, ignore_index=True).to_csv(output_file, index=False)
print(f"All data saved to {output_file}")


All data saved to All_stations_data.csv
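
The combining loop recovers each station ID by splitting the file name on `_` and `.`, and relies on the `try`/`except` to reject anything else found in the folder. A stricter variant, sketched here, matches the `<name>_station_<id>.csv` pattern used when the files were saved and returns `None` for files that do not fit. The names `FILENAME_PATTERN` and `parse_station_file` are hypothetical helpers, not part of the original notebook.

```python
import re

# Matches the "<station name>_station_<id>.csv" names written by the download loop
FILENAME_PATTERN = re.compile(r"^(?P<name>.+)_station_(?P<id>\d+)\.csv$")


def parse_station_file(file_name):
    """Return (station_name, station_id), or None if the name does not match."""
    match = FILENAME_PATTERN.match(file_name)
    if match is None:
        return None
    return match.group("name"), int(match.group("id"))


print(parse_station_file("Cork Airport_station_3904.csv"))
print(parse_station_file(".DS_Store"))
```

Because the station name is also encoded in the file name, a parser like this could replace the reverse dictionary lookup, though keeping the lookup is a reasonable cross-check on the manually collected IDs.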


***

# END