# Rainfall Trends and Patterns in Dublin (Ringsend)

The final project for Programming for Data Analytics '24-'25

Author: Atacan Buyuktalas

## Introduction

- Objective

    This project analyzes monthly rainfall records from the Dublin (Ringsend) weather station, covering 1941 to August 2024. It aims to uncover long-term trends, seasonal patterns, and significant rainfall events.

- Key Questions

    1. How has total annual rainfall changed over time?
    2. Which months experience the highest and lowest rainfall?
    3. What trends exist in the number of rain days (rd) and wet days (wd)?
    4. How has the greatest daily rainfall (gdf) varied?
    5. Can we predict future annual rainfall based on historical data?

## Loading and Exploring Dataset

The dataset is taken from [Met Eireann](https://www.met.ie/climate/available-data/historical-data).

In [1]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
import numpy as np

In [None]:
# Read the data, skipping the 13 lines that precede the header row
file_path = 'data/dublin_1941_2024.csv'
df = pd.read_csv(file_path, skiprows=13)

# Display the first 5 rows of the data
print(df.head())

# Check non-null counts and data types.
# df.info() prints its report itself and returns None, so print() adds a trailing "None".
print(df.info())

   year  month  ind   rain   gdf  rd  wd
0  1941      1    0  112.8  13.0  18  18
1  1941      2    0   69.5  13.0  22  15
2  1941      3    0  111.0  50.0  21  13
3  1941      4    0   68.6  16.5  15  12
4  1941      5    0   66.4  20.1  13  10
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 971 entries, 0 to 970
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   year    971 non-null    int64 
 1   month   971 non-null    int64 
 2   ind     971 non-null    int64 
 3   rain    971 non-null    object
 4   gdf     971 non-null    object
 5   rd      971 non-null    object
 6   wd      971 non-null    object
dtypes: int64(3), object(4)
memory usage: 53.2+ KB
None
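
The hard-coded `skiprows=13` depends on the exact number of metadata lines at the top of the file. As a minimal sketch, assuming the header row is the first line that starts with `year,`, the offset could be located programmatically instead. The sample text and the `find_header_row` helper below are hypothetical and only stand in for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file: metadata lines, then the table.
sample_text = (
    "Station Name: EXAMPLE\n"
    "Station Height: 0 M\n"
    "Column descriptions follow\n"
    "year,month,ind,rain,gdf,rd,wd\n"
    "1941,1,0,112.8,13.0,18,18\n"
    "1941,2,0,69.5,13.0,22,15\n"
)

def find_header_row(lines, first_column="year"):
    """Return the index of the first line that starts with the header's first column name."""
    for i, line in enumerate(lines):
        if line.strip().lower().startswith(first_column + ","):
            return i
    raise ValueError("header row not found")

# Locate the header, then skip exactly the lines that come before it.
header_row = find_header_row(sample_text.splitlines())
sample_df = pd.read_csv(io.StringIO(sample_text), skiprows=header_row)

print(header_row)
print(sample_df.columns.tolist())
```

With the real file, the same helper could be applied to the lines read from `file_path` before calling `pd.read_csv`.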


In [11]:
# Convert the measurement columns to numeric types.
# errors='coerce' turns entries that cannot be parsed (such as blank strings) into NaN.
for col in ['rain', 'gdf', 'rd', 'wd']:
    df[col] = pd.to_numeric(df[col], errors='coerce')
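
Because `errors='coerce'` silently turns unparseable entries into `NaN`, it can be useful to check which raw values were affected. The sketch below uses a small hypothetical series, not the real data, to show the behaviour and one way to list the coerced entries.

```python
import pandas as pd

# Hypothetical raw column: numbers stored as text, with blanks for missing months.
raw = pd.Series(["112.8", " ", "69.5", "", "50.0"])

# Unparseable entries become NaN instead of raising an error.
numeric = pd.to_numeric(raw, errors="coerce")

# Entries that held text before conversion but are NaN afterwards.
coerced = raw[numeric.isna() & raw.notna()]

print(numeric)
print(coerced.tolist())
```

Running the same comparison on a column before overwriting it would show whether anything other than blanks was lost in the conversion.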

### Handling missing values

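- Use the [`replace()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html#pandas.DataFrame.replace) method to convert blank entries to `NaN`, then count the missing values in each column with [`isnull()`](https://pandas.pydata.org/docs/reference/api/pandas.isnull.html#pandas-isnull) and `sum()`.

- If the columns have already been converted with `pd.to_numeric(..., errors='coerce')`, the blanks are `NaN` by this point, so `replace()` acts only as a safeguard.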

In [13]:
# Replace any remaining blank strings with NaN
df.replace(' ', np.nan, inplace=True)

# Count missing values per column
print(df.isnull().sum())

# Confirm the converted data types
print(df.info())

year       0
month      0
ind        0
rain      50
gdf      104
rd        91
wd        91
dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 971 entries, 0 to 970
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   year    971 non-null    int64  
 1   month   971 non-null    int64  
 2   ind     971 non-null    int64  
 3   rain    921 non-null    float64
 4   gdf     867 non-null    float64
 5   rd      880 non-null    float64
 6   wd      880 non-null    float64
dtypes: float64(4), int64(3)
memory usage: 53.2 KB
None
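
The per-column counts above show how many values are missing, but not what share of the data that is or where the gaps fall. A minimal sketch of both summaries follows, using a hypothetical miniature table (`toy`) rather than the real `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the monthly table, with a few gaps.
toy = pd.DataFrame({
    "year": [1941, 1941, 1942, 1942],
    "month": [1, 2, 1, 2],
    "rain": [112.8, np.nan, 69.5, np.nan],
    "gdf": [13.0, np.nan, np.nan, np.nan],
})

# Share of missing values per column, as a percentage.
missing_pct = toy.isnull().mean().mul(100).round(1)

# Number of months with missing rainfall in each year.
missing_by_year = toy["rain"].isnull().groupby(toy["year"]).sum()

print(missing_pct)
print(missing_by_year)
```

Applied to `df`, the yearly breakdown would indicate whether the gaps cluster in particular periods, which matters when annual totals are computed later.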
