## 01 SINGAPORE'S WEATHER DATA: 1983 - 2019 (SEPT)

The raw folder has:
- 438 CSV files containing daily weather data for Singapore from 1983 - 2019 (June)

- a "monthly_data" sub-folder containing monthly average data for rainfall, maximum and mean temperatures.

In [1]:
import glob
import pandas as pd

## 1. DAILY WEATHER DATA

In [None]:
# Combining the separate CSV files into one
raw = pd.concat(
    [pd.read_csv(f) for f in glob.glob("../raw/*.csv")], ignore_index=True
)
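
`glob.glob` returns paths in an arbitrary, OS-dependent order, and `pd.concat` raises a `ValueError` when handed an empty list (for example, if the relative path is wrong). The sketch below is one possible, more defensive loader. The helper name `load_daily_csvs` and the two tiny demo files are illustrative assumptions, not part of the original notebook.

```python
import glob
import os
import tempfile

import pandas as pd


def load_daily_csvs(pattern):
    """Read every CSV matching `pattern` into one DataFrame (hypothetical helper)."""
    paths = sorted(glob.glob(pattern))  # sorted for a reproducible row order
    if not paths:
        raise FileNotFoundError(f"No CSV files matched {pattern!r}")
    return pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)


# Demo on two made-up files so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"Year": [1983], "Month": [1], "Day": [1]}).to_csv(
        os.path.join(tmp, "a.csv"), index=False
    )
    pd.DataFrame({"Year": [1983], "Month": [2], "Day": [1]}).to_csv(
        os.path.join(tmp, "b.csv"), index=False
    )
    demo = load_daily_csvs(os.path.join(tmp, "*.csv"))

print(demo.shape)
```

With the real data, the call would be `load_daily_csvs("../raw/*.csv")`.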

In [None]:
# Adding a datetime col in the year-month-day format
raw["Date"] = pd.to_datetime(
    raw["Year"].astype(str)
    + "-"
    + raw["Month"].astype(str)
    + "-"
    + raw["Day"].astype(str)
)
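
As an alternative to string concatenation, `pd.to_datetime` can assemble dates directly from a DataFrame with year, month and day columns. A minimal sketch on made-up rows follows; the column names are lowercased first to keep the example explicit.

```python
import pandas as pd

# Made-up rows standing in for the combined daily data
df = pd.DataFrame({"Year": [1983, 2019], "Month": [1, 6], "Day": [1, 30]})

# Build the date from the three component columns
parts = df[["Year", "Month", "Day"]].rename(columns=str.lower)
df["Date"] = pd.to_datetime(parts)

print(df["Date"].dt.strftime("%Y-%m-%d").tolist())
```

This avoids building intermediate strings for every row.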

In [None]:
raw["Month_Name"] = raw["Date"].dt.month_name()
raw["Quarter"] = raw["Date"].dt.quarter

In [None]:
# Converting the Max/Mean Wind Speed cols to a numeric data type; entries that can't be parsed become NaN
raw["Max Wind Speed (km/h)"] = pd.to_numeric(
    raw["Max Wind Speed (km/h)"], errors="coerce"
)
raw["Mean Wind Speed (km/h)"] = pd.to_numeric(
    raw["Mean Wind Speed (km/h)"], errors="coerce"
)
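
With `errors="coerce"`, anything that can't be parsed silently becomes `NaN`, so it can help to count how many values were affected. A small sketch, assuming missing readings are stored as a placeholder string (the `"-"` used here is an assumption for illustration only):

```python
import pandas as pd

# "-" is an assumed placeholder for a missing reading
s = pd.Series(["12.3", "-", "40.1", None])

missing_before = s.isna().sum()
converted = pd.to_numeric(s, errors="coerce")
missing_after = converted.isna().sum()

# Difference = entries that were present but not numeric
print(missing_after - missing_before)
```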

In [None]:
raw.info()

#### Fill the missing entries in the Mean Wind Speed and Max Wind Speed columns with each column's own mean

In [None]:
raw["Max Wind Speed (km/h)"] = raw["Max Wind Speed (km/h)"].fillna(
    raw["Max Wind Speed (km/h)"].mean()
)
raw["Mean Wind Speed (km/h)"] = raw["Mean Wind Speed (km/h)"].fillna(
    raw["Mean Wind Speed (km/h)"].mean()
)
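
The same fill can be written for both columns at once, since `fillna` accepts a Series of per-column values. A minimal sketch on made-up numbers:

```python
import pandas as pd

wind_cols = ["Max Wind Speed (km/h)", "Mean Wind Speed (km/h)"]

# Made-up values, with one gap in each column
df = pd.DataFrame(
    {
        "Max Wind Speed (km/h)": [30.0, None, 50.0],
        "Mean Wind Speed (km/h)": [10.0, 14.0, None],
    }
)

# df[wind_cols].mean() is a Series indexed by column name, so each
# column is filled with its own mean rather than a shared value.
df[wind_cols] = df[wind_cols].fillna(df[wind_cols].mean())

print(df)
```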

In [None]:
# Dropping cols that I won't need for visualisation or modelling
raw = raw.drop(
    columns=[
        "Station",
        "Highest 30 Min Rainfall (mm)",
        "Highest 60 Min Rainfall (mm)",
        "Highest 120 Min Rainfall (mm)",
    ]
)

In [None]:
# Slight rearrangement of cols for clarity
cols = [
    "Date",
    "Year",
    "Month",
    "Month_Name",
    "Quarter",
    "Day",
    "Daily Rainfall Total (mm)",
    "Mean Temperature (°C)",
    "Maximum Temperature (°C)",
    "Minimum Temperature (°C)",
    "Mean Wind Speed (km/h)",
    "Max Wind Speed (km/h)",
]

In [None]:
weather = raw[cols].copy()

In [None]:
weather = weather.sort_values('Date', ascending=False)

In [None]:
weather.info()
# no null values
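
The "no null values" observation can also be checked in code rather than read off the `info()` output. A small sketch, using a made-up stand-in for the `weather` frame:

```python
import pandas as pd

# Made-up stand-in for the `weather` frame
weather_demo = pd.DataFrame(
    {
        "Daily Rainfall Total (mm)": [0.0, 12.4],
        "Mean Temperature (°C)": [27.1, 26.8],
    }
)

null_counts = weather_demo.isna().sum()
print(null_counts[null_counts > 0])  # empty when every column is complete

# Fail loudly if a later change reintroduces missing values
assert not weather_demo.isna().any().any(), "unexpected missing values"
```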

In [None]:
weather.columns

In [None]:
weather.describe()
# The Daily Rainfall col has some obvious outliers. But let's deal with that later, as and when required

In [None]:
weather.head()

In [None]:
# weather.to_csv('../data/weather.csv', index=False)

## 2. MONTHLY DATA
Here, I'll do some light processing of the monthly average data for rainfall, maximum and mean temperatures. They are in the raw folder's "monthly_data" sub-folder.

### 2.1 MONTHLY RAINFALL RECORDS

In [None]:
monthly_rain = pd.read_csv('../raw/monthly_data/monthly_rain.csv')

In [None]:
monthly_rain["month"] = pd.to_datetime(monthly_rain["month"])
monthly_rain["year"] = monthly_rain["month"].dt.year
monthly_rain["month"] = monthly_rain["month"].dt.month
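
The three lines above parse the `month` column once and then overwrite it with the month number. A sketch of the same idea on made-up rows, assuming the column holds `"YYYY-MM"` text (the layout and the rainfall values are assumptions for illustration):

```python
import pandas as pd

# Assumed layout: one row per month, "month" stored as "YYYY-MM" text
demo = pd.DataFrame(
    {"month": ["1982-12", "1983-01"], "total_rainfall": [494.1, 246.0]}
)

# Parse once, then derive both parts from the parsed values
parsed = pd.to_datetime(demo["month"], format="%Y-%m")
demo["year"] = parsed.dt.year
demo["month"] = parsed.dt.month

print(demo)
```

Keeping the parsed values in a separate variable means the order of the two assignments no longer matters.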

In [None]:
monthly_rain = monthly_rain.rename(
    columns={
        "year": "Year",
        "month": "Month",
        "total_rainfall": "Total_Monthly_Rainfall (mm)",
    }
)

In [None]:
# For consistency with the daily records, I'll start with entries from 1983 for the monthly datasets as well 
cols_rain = ["Total_Monthly_Rainfall (mm)", "Year", "Month"]
monthly_rain = monthly_rain[cols_rain].copy()
monthly_rain = monthly_rain[monthly_rain["Year"] >= 1983]

In [None]:
#monthly_rain.to_csv('../data/rain_monthly.csv', index=False)

In [None]:
monthly_rain.tail()

### 2.2 MONTHLY MEAN TEMPERATURES

In [None]:
mean_temp = pd.read_csv('../raw/monthly_data/monthly_temp_mean.csv')

In [None]:
mean_temp["month"] = pd.to_datetime(mean_temp["month"])
mean_temp["year"] = mean_temp["month"].dt.year
mean_temp["month"] = mean_temp["month"].dt.month

In [None]:
mean_temp = mean_temp.rename(
    columns={
        "year": "Year",
        "month": "Month",
        "mean_temp": "Mean_Monthly_Temperature (°C)",
    }
)

In [None]:
cols_temp_mean = ["Mean_Monthly_Temperature (°C)", "Year", "Month"]
mean_temp = mean_temp[cols_temp_mean].copy()
mean_temp = mean_temp[mean_temp["Year"] >= 1983]

In [None]:
#mean_temp.to_csv('../data/mean_temp_monthly.csv', index=False)

### 2.3 MONTHLY MAX TEMPERATURES

In [None]:
max_temp = pd.read_csv('../raw/monthly_data/monthly_temp_max.csv')

In [None]:
max_temp["month"] = pd.to_datetime(max_temp["month"])
max_temp["year"] = max_temp["month"].dt.year
max_temp["month"] = max_temp["month"].dt.month

In [None]:
max_temp = max_temp.rename(
    columns={
        "year": "Year",
        "month": "Month",
        "max_temperature": "Max_Monthly_Temperature (°C)",
    }
)

In [None]:
cols_temp_max = ["Max_Monthly_Temperature (°C)", "Year", "Month"]
max_temp = max_temp[cols_temp_max].copy()
max_temp = max_temp[max_temp["Year"] >= 1983]

In [None]:
#max_temp.to_csv('../data/max_temp_monthly.csv', index=False)
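
Sections 2.1 to 2.3 repeat the same steps for each file, so they could be folded into one function. The sketch below is one way to do that; the helper name `tidy_monthly` and the demo rows are hypothetical, and unlike the cells above it also resets the index after filtering.

```python
import pandas as pd


def tidy_monthly(df, value_col, new_name, start_year=1983):
    """Split "month" into Year/Month and keep rows from start_year onwards.

    Hypothetical helper mirroring the steps used for each monthly file.
    """
    parsed = pd.to_datetime(df["month"])
    out = pd.DataFrame(
        {
            new_name: df[value_col],
            "Year": parsed.dt.year,
            "Month": parsed.dt.month,
        }
    )
    return out[out["Year"] >= start_year].reset_index(drop=True)


# Made-up rows standing in for monthly_rain.csv
demo = pd.DataFrame(
    {"month": ["1982-12", "1983-01"], "total_rainfall": [494.1, 246.0]}
)
tidy = tidy_monthly(demo, "total_rainfall", "Total_Monthly_Rainfall (mm)")
print(tidy)
```

The same call would then cover the other two files by passing `"mean_temp"` or `"max_temperature"` as `value_col`, along with the matching new column name.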