#### Author: Faolán Hamilton

Get the data from this link - https://cli.fusio.net/cli/climate_data/webdata/hly4935.csv

## Part 1 - 60%
Plot:

- The temperature
- The mean temperature each day
- The mean temperature for each month


## Part 2 - 40%

Plot:

- The Windspeed (there is data missing from this column)
- The rolling windspeed (say over 24 hours)
- The max windspeed for each day
- The monthly mean of the daily max windspeeds (yes, I am being nasty here)

You do not need to over-comment your code. Marks will be given for how nice the plots are.

-------------------------------------------------------------------------------

In [None]:
# import the key modules to be used
from datetime import datetime as dt
from datetime import date

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Step 1 - Cleaning the data

##### To get the CSV to open, I thought I had to download the data from the link, bring it into the assignment folder and delete the first several rows by hand. All I actually had to do was use the `skiprows` parameter of `pd.read_csv`!

In [None]:
# read in data
df = pd.read_csv("https://cli.fusio.net/cli/climate_data/webdata/hly4935.csv", skiprows=23)
df.head(2)
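A minimal sketch of what `skiprows` does, using a made-up three-line preamble instead of the real file (the station text and values below are hypothetical, chosen only for illustration). The first `skiprows` lines are discarded and the next line is treated as the header row.

```python
import io

import pandas as pd

# Hypothetical miniature of a CSV that starts with descriptive text
# before the real header row (the real file needs skiprows=23).
raw = (
    "Station Name: EXAMPLE\n"
    "Station Height: 0 M\n"
    "date: - Date and Time (utc)\n"
    "date,temp,wdsp\n"
    "10-Apr-1996 14:00,11.5,0\n"
    "31-Jul-1996 08:00,12.7,7\n"
)

# skiprows=3 drops the three preamble lines, so line four becomes the header
sample = pd.read_csv(io.StringIO(raw), skiprows=3)
print(sample.columns.tolist())
print(len(sample))
```

If the number of preamble lines in the source file ever changes, the `skiprows` value has to change with it, so it is worth checking `df.head()` after reading.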

#### Looking at the question, I only need temperature and windspeed, so it is a good idea to remove the unnecessary columns

In [None]:
# see all column names
headers = df.columns.tolist()
headers

In [None]:
# remove the headers that are not relevant
drop_col_list = ['ind','rain','ind.1', 'ind.2', 'wetb', 'dewpt', 'vappr', 'rhum', 'msl', 'ind.3','ind.4', 'wddir', 'ww', 'w','sun','vis','clht','clamt']
df.drop(columns=drop_col_list, inplace=True)
df.head(2)

In [None]:
# I want to see the dtype of each column
df.info()

### I want to clean up the dtypes, starting with the date column

###### Datetime conversion source: https://www.geeksforgeeks.org/pandas/convert-the-column-type-from-string-to-datetime-format-in-pandas-dataframe/

In [None]:
# convert the date column to a datetime format
df['date'] = pd.to_datetime(df['date'], format="%d-%b-%Y %H:%M")
df['date']

In [None]:
# set the date column as the index to search by dates easily
df.set_index('date', inplace=True)
df.head(2)

In [None]:
# convert the wind speed column to numeric; entries that cannot be parsed become NaN
df['wdsp'] = pd.to_numeric(df['wdsp'], errors='coerce')
df.head(2)
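A small sketch of how `errors='coerce'` behaves, on hypothetical values rather than the real column. I am assuming here that a missing reading shows up as a blank string, which is what makes the column load as text in the first place.

```python
import pandas as pd

# Hypothetical raw values: a single space stands in for a missing reading
raw_wdsp = pd.Series(["7", " ", "12", "9"])

# Anything that cannot be parsed as a number is turned into NaN
clean_wdsp = pd.to_numeric(raw_wdsp, errors="coerce")

print(clean_wdsp.dtype)
print(clean_wdsp.isna().sum())
```

Without `errors='coerce'` the same call would raise on the blank entry, so the coerced NaN values are what later show up in the null check.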

### The heading names are not entirely clear, so I want to rename them

###### renaming columns (https://www.geeksforgeeks.org/python/how-to-rename-multiple-column-headers-in-a-pandas-dataframe/)

###### renaming index (https://stackoverflow.com/questions/19851005/rename-pandas-dataframe-index)

In [None]:
# improve the naming convention of the headings

df.columns = df.columns.str.replace('temp', 'Temperature (C)').str.replace('wdsp', 'Wind Speed (km)')
df.index.names = ['Date and Time']
df.head(2)

##### I want to check if there are any null values

In [None]:
# check for nulls
df.isnull().sum()

###### check which rows are nulls in the DataFrame (https://stackoverflow.com/questions/27159189/find-empty-or-nan-entry-in-pandas-dataframe)

In [None]:
# See which rows have empty values to double check later
df[(df['Wind Speed (km)'].isnull())].index

In [None]:
# check what the NaN value looks like
df.loc['1996-08-01 22:00:00']

#### There are 50 null values in Wind Speed, so I need to clean this data up. My chosen method is to replace the null values with 0 to be consistent with the existing data structure

In [None]:
# fill NaN values with the number 0.0 (a string '0.0' would turn the column into object dtype)
df.fillna(value=0.0, inplace=True)
df.head(2)

In [None]:
# check to see if that worked
df.loc['1996-08-01 22:00:00']

In [None]:
# checking again for nulls
df.isnull().sum()

------------------------------------------------------------------------

## Part 1 of the assignment

###### Pandas resampling documentation - https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects


### Plotting Temperature

###### setting yticklabels on sns - https://stackoverflow.com/questions/56605113/how-to-set-x-axis-ticklabels-in-a-seaborn-plot

In [None]:
# Plotting the Temperature
plot = sns.lineplot(df['Temperature (C)'], color="g")
plot.set_title("Temperature from 1996 to 2025 in Celsius")
plot.set_xlabel('Year')
# label whatever ticks matplotlib chooses, rather than hard-coding a list that may not line up
plot.yaxis.set_major_formatter('{x:g}°C')

In [None]:
# Calculating the mean temperature of each day
meandaytemp = df['Temperature (C)'].resample("D").mean()
meandaytemp

#### The data starts on 1996-04-10 but doesn't become consistent until 1996-07-31, so I will drop the NaN values in between

In [None]:
# drop the NaN values (days with no readings)
meandaytemp.dropna(inplace=True)
meandaytemp
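To show why the daily resample produces NaN rows in the first place, here is a sketch on hypothetical hourly data with one whole day missing (the dates and values are made up). `resample("D")` creates a row for every calendar day in the range, including days that have no readings.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings: one full day, a day with no rows, then another full day
one_hour = pd.Timedelta(hours=1)
idx = pd.date_range("1996-04-10", periods=24, freq=one_hour).append(
    pd.date_range("1996-04-12", periods=24, freq=one_hour)
)
temps = pd.Series(np.arange(48, dtype=float), index=idx)

daily = temps.resample("D").mean()  # one row per calendar day; the empty day is NaN
daily_clean = daily.dropna()        # keep only days that had at least one reading

print(daily)
print(daily_clean)
```

Dropping the NaN rows only removes the empty days. It does not change the means of the days that do have data.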

In [None]:
# plotting the mean day Temperature
plot = sns.lineplot(meandaytemp, color="r")
plot.set_title("Average Temperature per day from 1996 to 2025 in Celsius")
plot.set_xlabel('Year')
# label whatever ticks matplotlib chooses, rather than hard-coding a list that may not line up
plot.yaxis.set_major_formatter('{x:g}°C')

In [None]:
# getting the mean temperature per month 
mean_month_temp = df['Temperature (C)'].resample("ME").mean()
mean_month_temp

In [None]:
# drop na values from the mean temperature per month 
mean_month_temp.dropna(inplace=True)
mean_month_temp

##### I did not use these sources in the end but they were great to research

###### plotting the axis based on the index - https://stackoverflow.com/questions/22356881/using-a-pandas-dataframe-index-as-values-for-x-axis-in-matplotlib-plot

###### plotting the axis based on the index - https://duckduckgo.com/?q=matplotlib+madates+formatter+plot+axis+based+on+index&atb=v491-1&kbg=-1&ia=web

In [None]:
# plotting the mean month Temperature

plot = sns.lineplot(mean_month_temp, color="purple")
plot.set_title("Average Temperature per month from 1996 to 2025 in Celsius")
plot.set_xlabel('Year')
# label whatever ticks matplotlib chooses, rather than hard-coding a list that may not line up
plot.yaxis.set_major_formatter('{x:g}°C')

--------------------------------------------------------------------
## Part 2 of the assignment

The Windspeed

In [None]:
plot = sns.lineplot(df['Wind Speed (km)'], color="orange")
plot.set_title("Wind Speed from 1996 to 2025 in Kilometres per Hour")
plot.set_xlabel('Year')
# label whatever ticks matplotlib chooses, rather than hard-coding a list that may not line up
plot.yaxis.set_major_formatter('{x:g}km/hr')

The Rolling Windspeed

In [None]:
# Using the last 25 hourly rows of the dataset to plot the 30th of November
twentyfour_hrs = df['Wind Speed (km)'].tail(25)

In [None]:
# Plotting the data
import matplotlib.dates as mdates

plot = sns.lineplot(twentyfour_hrs, color="orange")
plot.set_title("Wind Speed on the 30th of November 2025 in Kilometres per Hour")
plot.set_xlabel('Hour of day (24-hour clock)')
# show the hour for whatever ticks matplotlib places, instead of hard-coding labels
plot.xaxis.set_major_formatter(mdates.DateFormatter('%H'))
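The brief asks for a rolling windspeed over 24 hours. One possible way to compute that, sketched on hypothetical hourly data rather than the real column, is a rolling mean over a window of 24 rows. This assumes one reading per hour with no missing timestamps. The name `rolling_24h` is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 48 hourly readings over two days (not the real column)
idx = pd.date_range("2025-11-29", periods=48, freq=pd.Timedelta(hours=1))
wind = pd.Series(np.arange(48, dtype=float), index=idx, name="Wind Speed (km)")

# Each value is the mean of the current hour and the 23 hours before it;
# the first 23 rows are NaN because a full 24-row window is not available yet
rolling_24h = wind.rolling(window=24).mean()

print(rolling_24h.tail(3))
```

On the real data the same call could be applied to `df['Wind Speed (km)']` and the result passed to `sns.lineplot` in the same way as the other plots. It smooths the hourly spikes, which the last-25-rows slice does not do.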

The max windspeed for each day
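
A minimal sketch of the daily maximum, again on hypothetical hourly data rather than the real column. It uses the same `resample("D")` approach as the daily mean temperature, with `.max()` in place of `.mean()`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 48 hourly readings over two days (not the real column)
idx = pd.date_range("2025-11-29", periods=48, freq=pd.Timedelta(hours=1))
wind = pd.Series(np.arange(48, dtype=float), index=idx, name="Wind Speed (km)")

# One row per calendar day holding that day's highest hourly reading
daily_max = wind.resample("D").max()

print(daily_max)
```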

The monthly mean of the daily max windspeeds

##### Calculate the daily max windspeed first, then take the monthly mean of those values
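
The two steps can be sketched as follows, on hypothetical hourly data that spans a month boundary (the values are made up). The monthly step groups by calendar month with `to_period("M")`, which I chose because it behaves the same across pandas versions. On pandas 2.2 or later, `daily_max.resample("ME").mean()` should give the same numbers with month-end dates as the index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 96 hourly readings covering 30 Oct to 2 Nov 2025
idx = pd.date_range("2025-10-30", periods=96, freq=pd.Timedelta(hours=1))
wind = pd.Series(np.arange(96, dtype=float), index=idx, name="Wind Speed (km)")

# Step 1: highest hourly reading for each calendar day
daily_max = wind.resample("D").max()

# Step 2: average those daily maxima within each calendar month
monthly_mean_of_max = daily_max.groupby(daily_max.index.to_period("M")).mean()

print(monthly_mean_of_max)
```

Doing it in two steps matters: taking the monthly mean of the raw hourly values, or the monthly max, would answer a different question.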