# PART 6A: 2019 - SINGAPORE'S WARMEST YEAR ON RECORD

Temperature records tumbled across the globe in 2019, with scientists declaring the year to be the second-warmest since records began in 1880. [2010 - 2019 was also the warmest decade in modern times](https://public.wmo.int/en/media/press-release/wmo-confirms-2019-second-hottest-year-record).

In Singapore, the annual mean temperature hit 28.44°C in 2019, making it the warmest year on record. Singapore's [National Environment Agency declared 2019 as a "joint warmest year" with 2016](https://www.nea.gov.sg/media/news/news/index/2019-is-singapore-s-joint-warmest-year-on-record) due to a decimal rounding issue. 2010 - 2019 was also the warmest decade on record in Singapore.

If you stick to 2 decimal places in the calculations, the annual mean temperature in 2016 was 28.43°C. Sure, it's just a 0.01°C difference. But to paraphrase Hemingway, disasters happen "gradually, and then suddenly". 
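The rounding point can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the official figures are reported to 1 decimal place; the text above only says there was a rounding issue, so treat that precision as an assumption. The two annual means are the values quoted above.

```python
# Annual mean temperatures quoted in the text, in °C
mean_2019 = 28.44
mean_2016 = 28.43

# Assumption: official figures are reported to 1 decimal place,
# in which case the two years become indistinguishable
rounded_2019 = round(mean_2019, 1)
rounded_2016 = round(mean_2016, 1)
print(rounded_2019, rounded_2016)

# At 2 decimal places the years are separated by 0.01°C
diff = round(mean_2019 - mean_2016, 2)
print(diff)
```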

In this notebook, I'll assemble the raw data.

## FILE ORGANISATION:
The original data files, as downloaded from the [Singapore Met Office](http://www.weather.gov.sg/climate-historical-daily/) and Data.gov.sg, are in the raw folder. The files are mostly clean, save for some missing values for mean and max wind speed. I've lightly processed the files and saved the output to the data folder so that I can call them up easily for future data projects.

You can make a different version of the dataset by concatenating the raw files over a different time frame, or with more elaborate feature engineering.
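One simple way to get a different time frame is to filter on the `Year` column once the files are combined. The sketch below uses a toy DataFrame with made-up values in place of the real data; `daily`, `start_year` and `end_year` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Toy stand-in for the combined daily data (values are made up)
daily = pd.DataFrame(
    {
        "Year": [1983, 1999, 2010, 2019],
        "Mean Temperature (°C)": [26.5, 27.4, 28.0, 28.6],
    }
)

# Hypothetical time frame: keep 2010 - 2019 only (both ends inclusive)
start_year, end_year = 2010, 2019
subset = daily[daily["Year"].between(start_year, end_year)].reset_index(drop=True)
print(subset)
```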

What you'll find in the raw folder:
- 444 CSV files containing daily weather data for Singapore from 1983 - 2019.

The files in the data folder have been processed by the code below.

### NOTE:
- The "monthly_data" sub-folder, which contains monthly average data for rainfall, maximum and mean temperatures, has not been updated. 

In [1]:
import glob
import pandas as pd

# 1. DAILY WEATHER DATA 

In [2]:
# Combining the separate CSV files into one
raw = pd.concat(
    [pd.read_csv(f) for f in glob.glob("../raw/*.csv")], ignore_index=True
)
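`glob.glob` returns paths in no guaranteed order, so the row order of the combined frame can differ between machines. If a reproducible order matters, one option is to sort the paths first. The sketch below demonstrates the pattern on two tiny CSV files written to a temporary folder; the file names and contents are made up for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

# Write two tiny CSV files to a temporary folder to stand in for the raw folder
tmp_dir = tempfile.mkdtemp()
pd.DataFrame({"Year": [1983], "Month": [1], "Day": [1]}).to_csv(
    os.path.join(tmp_dir, "a.csv"), index=False
)
pd.DataFrame({"Year": [1983], "Month": [2], "Day": [1]}).to_csv(
    os.path.join(tmp_dir, "b.csv"), index=False
)

# sorted() makes the file order, and so the row order, reproducible
paths = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))
combined = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)
print(combined)
```

With `ignore_index=True`, the combined frame gets a fresh 0-based index instead of repeating each file's own row numbers.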

In [3]:
# Adding a datetime col in the year-month-day format
raw["Date"] = pd.to_datetime(
    raw["Year"].astype(str)
    + "-"
    + raw["Month"].astype(str)
    + "-"
    + raw["Day"].astype(str)
)
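An alternative to string concatenation is to pass the year, month and day columns to `pd.to_datetime` as a DataFrame. This is a sketch on a toy frame rather than the notebook's data; the columns are lower-cased first so they match the `year`/`month`/`day` names that pandas looks for.

```python
import pandas as pd

# Toy frame using the same column names as the raw files
df = pd.DataFrame({"Year": [1983, 2019], "Month": [1, 12], "Day": [5, 31]})

# pd.to_datetime can assemble dates from columns named year, month and day
parts = df[["Year", "Month", "Day"]].rename(columns=str.lower)
df["Date"] = pd.to_datetime(parts)
print(df["Date"])
```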

In [4]:
raw["Month_Name"] = raw["Date"].dt.month_name()
raw["Quarter"] = raw["Date"].dt.quarter

In [5]:
# Converting the Max/Mean Wind Speed cols to a numeric dtype; non-numeric entries become NaN
raw["Max Wind Speed (km/h)"] = pd.to_numeric(
    raw["Max Wind Speed (km/h)"], errors="coerce"
)
raw["Mean Wind Speed (km/h)"] = pd.to_numeric(
    raw["Mean Wind Speed (km/h)"], errors="coerce"
)
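To see what `errors="coerce"` does, here is a small sketch on a made-up column. The dash is a hypothetical placeholder; the actual marker used for missing values in the raw files may differ.

```python
import pandas as pd

# Hypothetical column: the dash stands in for a missing-value placeholder
wind = pd.Series(["10.3", "-", "12.6"])

# Anything that cannot be parsed as a number is turned into NaN
wind_numeric = pd.to_numeric(wind, errors="coerce")
print(wind_numeric)
print(wind_numeric.isna().sum())
```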

In [6]:
raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13514 entries, 0 to 13513
Data columns (total 16 columns):
Station                          13514 non-null object
Year                             13514 non-null int64
Month                            13514 non-null int64
Day                              13514 non-null int64
Daily Rainfall Total (mm)        13514 non-null float64
Highest 30 Min Rainfall (mm)     13514 non-null object
Highest 60 Min Rainfall (mm)     13514 non-null object
Highest 120 Min Rainfall (mm)    13514 non-null object
Mean Temperature (°C)            13514 non-null float64
Maximum Temperature (°C)         13514 non-null float64
Minimum Temperature (°C)         13514 non-null float64
Mean Wind Speed (km/h)           13504 non-null float64
Max Wind Speed (km/h)            13503 non-null float64
Date                             13514 non-null datetime64[ns]
Month_Name                       13514 non-null object
Quarter                          13514 non-null int64
dtypes: datetime64[ns](1), float64(6), int64(4), object(5)

#### Fill the missing entries in the Mean Wind Speed and Max Wind Speed columns with each column's own mean

In [7]:
raw["Max Wind Speed (km/h)"] = raw["Max Wind Speed (km/h)"].fillna(
    raw["Max Wind Speed (km/h)"].mean()
)
raw["Mean Wind Speed (km/h)"] = raw["Mean Wind Speed (km/h)"].fillna(
    raw["Mean Wind Speed (km/h)"].mean()
)
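The fill works because `Series.mean()` skips NaN by default, so each gap receives the average of the observed values in that column. A minimal sketch on made-up numbers:

```python
import pandas as pd

# Toy wind-speed column with one missing entry (values are made up)
wind = pd.Series([4.0, None, 8.0])

# mean() ignores the NaN, so the fill value is (4.0 + 8.0) / 2
filled = wind.fillna(wind.mean())
print(filled)
```

With only 10 and 11 missing entries out of 13,514 rows in the two columns, a mean fill changes the summary statistics very little; a different imputation method may suit other datasets better.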

In [8]:
# Dropping cols that I won't need for visualisation or modelling
raw = raw.drop(
    columns=[
        "Station",
        "Highest 30 Min Rainfall (mm)",
        "Highest 60 Min Rainfall (mm)",
        "Highest 120 Min Rainfall (mm)",
    ]
)

In [9]:
# Slight rearrangement of cols for clarity
cols = [
    "Date",
    "Year",
    "Month",
    "Month_Name",
    "Quarter",
    "Day",
    "Daily Rainfall Total (mm)",
    "Mean Temperature (°C)",
    "Maximum Temperature (°C)",
    "Minimum Temperature (°C)",
    "Mean Wind Speed (km/h)",
    "Max Wind Speed (km/h)",
]

In [10]:
weather = raw[cols].copy()

In [11]:
weather = weather.sort_values('Date', ascending=False)

In [12]:
weather.info()
# no null values

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13514 entries, 13268 to 4385
Data columns (total 12 columns):
Date                         13514 non-null datetime64[ns]
Year                         13514 non-null int64
Month                        13514 non-null int64
Month_Name                   13514 non-null object
Quarter                      13514 non-null int64
Day                          13514 non-null int64
Daily Rainfall Total (mm)    13514 non-null float64
Mean Temperature (°C)        13514 non-null float64
Maximum Temperature (°C)     13514 non-null float64
Minimum Temperature (°C)     13514 non-null float64
Mean Wind Speed (km/h)       13514 non-null float64
Max Wind Speed (km/h)        13514 non-null float64
dtypes: datetime64[ns](1), float64(6), int64(4), object(1)
memory usage: 1.3+ MB


In [13]:
weather.columns

Index(['Date', 'Year', 'Month', 'Month_Name', 'Quarter', 'Day',
       'Daily Rainfall Total (mm)', 'Mean Temperature (°C)',
       'Maximum Temperature (°C)', 'Minimum Temperature (°C)',
       'Mean Wind Speed (km/h)', 'Max Wind Speed (km/h)'],
      dtype='object')

In [14]:
weather.describe()
# The Daily Rainfall Total col has some obvious outliers (max of 216.2 mm vs a 75th percentile of 4.4 mm). Let's deal with that later, as and when required

| | Year | Month | Quarter | Day | Daily Rainfall Total (mm) | Mean Temperature (°C) | Maximum Temperature (°C) | Minimum Temperature (°C) | Mean Wind Speed (km/h) | Max Wind Speed (km/h) |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 13514.0 | 13514.0 | 13514.0 | 13514.0 | 13514.0 | 13514.0 | 13514.0 | 13514.0 | 13514.0 | 13514.0 |
| mean | 2000.999334 | 6.523013 | 2.508584 | 15.729392 | 5.830132 | 27.666886 | 31.519476 | 24.904077 | 7.450822 | 34.04365 |
| std | 10.677276 | 3.448808 | 1.117115 | 8.800314 | 14.448264 | 1.176403 | 1.57358 | 1.267669 | 3.475324 | 8.027471 |
| min | 1983.0 | 1.0 | 1.0 | 1.0 | 0.0 | 22.8 | 23.6 | 20.2 | 0.2 | 4.7 |
| 25% | 1992.0 | 4.0 | 2.0 | 8.0 | 0.0 | 26.9 | 30.8 | 24.0 | 4.8 | 28.8 |
| 50% | 2001.0 | 7.0 | 3.0 | 16.0 | 0.0 | 27.7 | 31.8 | 24.9 | 6.8 | 33.1 |
| 75% | 2010.0 | 10.0 | 4.0 | 23.0 | 4.4 | 28.6 | 32.5 | 25.8 | 9.7 | 38.2 |
| max | 2019.0 | 12.0 | 4.0 | 31.0 | 216.2 | 30.9 | 36.0 | 29.1 | 22.2 | 90.7 |


In [15]:
weather.tail()

| | Date | Year | Month | Month_Name | Quarter | Day | Daily Rainfall Total (mm) | Mean Temperature (°C) | Maximum Temperature (°C) | Minimum Temperature (°C) | Mean Wind Speed (km/h) | Max Wind Speed (km/h) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4389 | 1983-01-05 | 1983 | 1 | January | 1 | 5 | 0.0 | 27.1 | 31.8 | 23.7 | 10.3 | 34.6 |
| 4388 | 1983-01-04 | 1983 | 1 | January | 1 | 4 | 0.0 | 27.3 | 30.8 | 25.0 | 12.6 | 42.1 |
| 4387 | 1983-01-03 | 1983 | 1 | January | 1 | 3 | 2.9 | 27.0 | 31.3 | 24.5 | 10.7 | 42.8 |
| 4386 | 1983-01-02 | 1983 | 1 | January | 1 | 2 | 0.4 | 26.8 | 30.6 | 24.8 | 9.4 | 43.2 |
| 4385 | 1983-01-01 | 1983 | 1 | January | 1 | 1 | 0.3 | 26.5 | 28.7 | 25.1 | 5.5 | 29.9 |


In [16]:
# weather.to_csv('../data/weather_1983_2019_full.csv', index=False)
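When the saved CSV is loaded again in a later project, the `Date` column comes back as plain strings unless it is parsed explicitly. The sketch below shows the round trip on a two-row toy frame written to a temporary folder; the file name `weather_sample.csv` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Two-row toy frame standing in for the processed weather data
df = pd.DataFrame(
    {"Date": pd.to_datetime(["2019-12-31", "2019-12-30"]), "Year": [2019, 2019]}
)

# Save without the index, as in the commented-out cell above
out_path = os.path.join(tempfile.mkdtemp(), "weather_sample.csv")
df.to_csv(out_path, index=False)

# parse_dates restores the datetime dtype when reading the file back
reloaded = pd.read_csv(out_path, parse_dates=["Date"])
print(reloaded.dtypes)
```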