# Data Preparation

After collecting the data in the MySQL database, we export each table to a CSV file. The messages displayed when running the <code>SELECT</code> statements in SQL already show that the number of records differs from variable to variable.<br>
<img src="Images/sql records.png" width=400 align="left">

Hence, we have to transform the data before merging the variables into a single dataset.

## Step 1: Load the Relevant Libraries

Since the data includes the latitude and longitude of each weather station that NEA takes its readings from, we load the geospatial packages <code>geopandas</code>, <code>fiona</code> and <code>shapely</code>, as well as <code>xmltodict</code> for parsing XML, in addition to standard packages like <code>pandas</code> and <code>numpy</code>.

In [1]:
import pandas as pd
import numpy as np
import math
import geopandas as gpd
import fiona
import xmltodict
from shapely.geometry import Point
from IPython.display import IFrame

## Step 2: Read the Raw csv Files

In [2]:
temperature_raw=pd.read_csv("../Data/temperature_raw.csv")
temperature_raw

Unnamed: 0,timestamp,station_id,latitude,longitude,reading
0,2017-01-01 00:00:00,S06,1.35240,103.90070,26.2
1,2017-01-01 00:00:00,S100,1.41720,103.74855,26.3
2,2017-01-01 00:00:00,S102,1.18900,103.76800,27.2
3,2017-01-01 00:00:00,S104,1.44387,103.78538,25.9
4,2017-01-01 00:00:00,S106,1.41680,103.96730,25.0
...,...,...,...,...,...
658209,2021-12-31 23:00:00,S121,1.37288,103.72244,24.2
658210,2021-12-31 23:00:00,S24,1.36780,103.98260,24.6
658211,2021-12-31 23:00:00,S43,1.33990,103.88780,24.9
658212,2021-12-31 23:00:00,S44,1.34583,103.68166,24.0


In [3]:
humidity_raw=pd.read_csv("../Data/humidity_raw.csv")
humidity_raw

Unnamed: 0,timestamp,station_id,latitude,longitude,reading
0,2017-01-01 00:00:00,S06,1.35240,103.90070,82.0
1,2017-01-01 00:00:00,S100,1.41720,103.74855,85.5
2,2017-01-01 00:00:00,S102,1.18900,103.76800,86.6
3,2017-01-01 00:00:00,S107,1.31350,103.96250,89.8
4,2017-01-01 00:00:00,S108,1.27990,103.87030,87.1
...,...,...,...,...,...
615465,2021-12-31 23:00:00,S121,1.37288,103.72244,94.7
615466,2021-12-31 23:00:00,S24,1.36780,103.98260,92.2
615467,2021-12-31 23:00:00,S43,1.33990,103.88780,91.7
615468,2021-12-31 23:00:00,S44,1.34583,103.68166,90.5


In [4]:
winddirection_raw=pd.read_csv("../Data/winddirection_raw.csv")
winddirection_raw

Unnamed: 0,timestamp,station_id,latitude,longitude,reading
0,2017-01-01 00:00:00,S06,1.35240,103.90070,10
1,2017-01-01 00:00:00,S100,1.41720,103.74855,55
2,2017-01-01 00:00:00,S102,1.18900,103.76800,346
3,2017-01-01 00:00:00,S104,1.44387,103.78538,8
4,2017-01-01 00:00:00,S106,1.41680,103.96730,21
...,...,...,...,...,...
589431,2021-12-31 23:00:00,S117,1.25600,103.67900,69
589432,2021-12-31 23:00:00,S24,1.36780,103.98260,238
589433,2021-12-31 23:00:00,S44,1.34583,103.68166,103
589434,2021-12-31 23:00:00,S50,1.33370,103.77680,96


In [5]:
windspeed_raw=pd.read_csv("../Data/windspeed_raw.csv")
windspeed_raw

Unnamed: 0,timestamp,station_id,latitude,longitude,reading
0,2017-01-01 00:00:00,S06,1.35240,103.90070,1.2
1,2017-01-01 00:00:00,S100,1.41720,103.74855,1.3
2,2017-01-01 00:00:00,S102,1.18900,103.76800,12.0
3,2017-01-01 00:00:00,S104,1.44387,103.78538,3.5
4,2017-01-01 00:00:00,S106,1.41680,103.96730,2.9
...,...,...,...,...,...
592214,2021-12-31 23:00:00,S117,1.25600,103.67900,6.5
592215,2021-12-31 23:00:00,S24,1.36780,103.98260,4.9
592216,2021-12-31 23:00:00,S44,1.34583,103.68166,1.9
592217,2021-12-31 23:00:00,S50,1.33370,103.77680,1.6


In [6]:
rainfall_raw=pd.read_csv("../Data/rainfall_raw.csv")
rainfall_raw

Unnamed: 0,timestamp,station_id,latitude,longitude,reading
0,2017-01-01 00:00:00,S06,1.35240,103.90070,0.0
1,2017-01-01 00:00:00,S07,1.34150,103.83340,0.0
2,2017-01-01 00:00:00,S08,1.37010,103.82710,0.0
3,2017-01-01 00:00:00,S100,1.41720,103.74855,0.0
4,2017-01-01 00:00:00,S101,1.35053,103.71340,0.0
...,...,...,...,...,...
2448358,2021-12-31 23:00:00,S88,1.34270,103.84820,0.0
2448359,2021-12-31 23:00:00,S89,1.31985,103.66162,0.0
2448360,2021-12-31 23:00:00,S90,1.31910,103.81910,0.0
2448361,2021-12-31 23:00:00,S900,1.41284,103.86922,0.0
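
Each CSV was saved together with its DataFrame index, which is why an `Unnamed: 0` column appears in every table above. A minimal sketch, assuming the files keep the column layout shown above, reads that column back as the index and counts the distinct stations behind each variable, which makes the difference in record counts visible. The helper names `load_raw` and `stations_per_variable` are hypothetical, and a tiny in-memory sample stands in for the real files.

```python
import io

import pandas as pd


def load_raw(source):
    """Read one raw weather CSV (a path or a file-like object).

    index_col=0 turns the saved, unnamed index column back into the index
    instead of keeping it as an 'Unnamed: 0' column.
    """
    return pd.read_csv(source, index_col=0)


def stations_per_variable(frames):
    """Return the number of distinct station_id values for each variable."""
    return pd.Series({name: df["station_id"].nunique() for name, df in frames.items()})


# Hypothetical two-row sample in the same layout as the raw files
sample = io.StringIO(
    ",timestamp,station_id,latitude,longitude,reading\n"
    "0,2017-01-01 00:00:00,S06,1.35240,103.90070,26.2\n"
    "1,2017-01-01 00:00:00,S100,1.41720,103.74855,26.3\n"
)
temperature_sample = load_raw(sample)
print(stations_per_variable({"temperature": temperature_sample}))
```

With the real data, `load_raw("../Data/rainfall_raw.csv")` and the other four paths would replace the in-memory sample.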


## Step 3: Labelling Categorical Data

Two simple transformations we can do first are: <br>
   1. Convert the rainfall readings into a binary value, where 1 indicates the presence of rain and 0 indicates its absence.
   2. Convert the wind direction readings from degrees into a cardinal direction.

In [7]:
rainfall_raw["reading"]=rainfall_raw["reading"].apply(lambda x: 1 if x>0 else 0)
rainfall_raw

Unnamed: 0,timestamp,station_id,latitude,longitude,reading
0,2017-01-01 00:00:00,S06,1.35240,103.90070,0
1,2017-01-01 00:00:00,S07,1.34150,103.83340,0
2,2017-01-01 00:00:00,S08,1.37010,103.82710,0
3,2017-01-01 00:00:00,S100,1.41720,103.74855,0
4,2017-01-01 00:00:00,S101,1.35053,103.71340,0
...,...,...,...,...,...
2448358,2021-12-31 23:00:00,S88,1.34270,103.84820,0
2448359,2021-12-31 23:00:00,S89,1.31985,103.66162,0
2448360,2021-12-31 23:00:00,S90,1.31910,103.81910,0
2448361,2021-12-31 23:00:00,S900,1.41284,103.86922,0
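
The `apply` call above runs a Python lambda once per row, which is slow on roughly 2.4 million rainfall records. A vectorised comparison produces the same 0/1 encoding (a missing reading also becomes 0, as with the lambda). A sketch on a small made-up series:

```python
import pandas as pd

# Hypothetical sample of raw rainfall readings in mm
readings = pd.Series([0.0, 0.2, 0.0, 1.4])

# The comparison runs over the whole column at once; astype(int) maps True/False to 1/0
rain_flag = (readings > 0).astype(int)
print(rain_flag.tolist())  # [0, 1, 0, 1]
```

In the notebook this would read `rainfall_raw["reading"] = (rainfall_raw["reading"] > 0).astype(int)`.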


#### What is a cardinal direction?

Wind direction is commonly described using a 16-point compass rose.<br>
<img src="./Images/compass.png" width=400 align="left">

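Thus, we can convert each wind direction reading in degrees into one of the 16 compass points. Each point covers a 22.5° sector centred on its heading, so N spans 348.75° to 11.25°. Adding 11.25° before dividing by 22.5 and rounding down gives the sector index, and the second "N" at the end of the list handles readings between 348.75° and 360°.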

In [8]:
def cardinal_direction(degree):
    directions=["N","NNE","NE","ENE","E","ESE", "SE","SSE","S","SSW","SW","WSW", "W","WNW","NW","NNW","N"]
    return(directions[math.floor((degree+11.25)/22.5)])

In [9]:
winddirection_raw["reading"]=winddirection_raw["reading"].apply(cardinal_direction)
winddirection_raw

Unnamed: 0,timestamp,station_id,latitude,longitude,reading
0,2017-01-01 00:00:00,S06,1.35240,103.90070,N
1,2017-01-01 00:00:00,S100,1.41720,103.74855,NE
2,2017-01-01 00:00:00,S102,1.18900,103.76800,NNW
3,2017-01-01 00:00:00,S104,1.44387,103.78538,N
4,2017-01-01 00:00:00,S106,1.41680,103.96730,NNE
...,...,...,...,...,...
589431,2021-12-31 23:00:00,S117,1.25600,103.67900,ENE
589432,2021-12-31 23:00:00,S24,1.36780,103.98260,WSW
589433,2021-12-31 23:00:00,S44,1.34583,103.68166,ESE
589434,2021-12-31 23:00:00,S50,1.33370,103.77680,E
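
The same sector arithmetic can be applied to the whole column with NumPy instead of calling `cardinal_direction` row by row. In this sketch a modulo 16 wraps the last half-sector back to "N", so the list needs only 16 entries. The function name is hypothetical, and it assumes the column has no missing readings (NaN cannot be converted to an integer index).

```python
import numpy as np
import pandas as pd

DIRECTIONS = np.array(["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                       "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"])


def cardinal_direction_vectorised(degrees):
    """Map wind directions in degrees to 16-point compass labels.

    Each label covers a 22.5-degree sector centred on its heading. Adding 11.25
    aligns sector edges with multiples of 22.5, and % 16 sends 348.75-360
    degrees (index 16) back to "N".
    """
    degrees = np.asarray(degrees, dtype=float)
    index = np.floor((degrees + 11.25) / 22.5).astype(int) % 16
    return DIRECTIONS[index]


sample = pd.Series([10, 55, 346, 8, 21, 360])
print(cardinal_direction_vectorised(sample).tolist())
# ['N', 'NE', 'NNW', 'N', 'NNE', 'N']
```

The first five values are the readings from the head of `winddirection_raw`, and the labels match the output of the `apply` version above.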


## Step 4: Converting Each Coordinate into a Geographical Location

One reason the number of records differs across variables is that each variable is measured by a different set of weather stations across Singapore. For example, the API returns readings from many more rainfall stations than temperature stations.<br><br>

Hence, we can group nearby weather stations together and aggregate their readings. A natural way to do so is to use Singapore's planning areas, which divide the country as shown below:

In [10]:
map_url='https://data.gov.sg/dataset/master-plan-2014-planning-area-boundary-web/resource/f622fcd3-3478-4183-bda8-74779b35fe14/view/248b1708-29c4-40bd-916f-7b0315f9f399'
IFrame(map_url, width=700, height=350)

(As of 7 April 2022, a server error on data.gov.sg prevents the KML map from rendering.)

Therefore, we first read the planning area boundaries from the KML file downloaded from data.gov.sg. <br>
Then, we create a function that takes the latitude and longitude of each reading and returns the planning area it falls in.

In [11]:
fiona.supported_drivers['KML'] = 'rw'  # enable fiona's KML driver; gpd.io.file.fiona is an internal attribute that newer geopandas may not set
df = gpd.read_file("../Data/planning-boundary-area.kml", driver='KML')
df.head()

Unnamed: 0,Name,Description,geometry
0,kml_1,<center><table><tr><th colspan='2' align='cent...,"POLYGON Z ((103.81740 1.29433 0.00000, 103.817..."
1,kml_2,<center><table><tr><th colspan='2' align='cent...,"POLYGON Z ((103.77445 1.39029 0.00000, 103.774..."
2,kml_3,<center><table><tr><th colspan='2' align='cent...,"POLYGON Z ((103.79766 1.34813 0.00000, 103.798..."
3,kml_4,<center><table><tr><th colspan='2' align='cent...,"POLYGON Z ((103.80578 1.41436 0.00000, 103.805..."
4,kml_5,<center><table><tr><th colspan='2' align='cent...,"POLYGON Z ((103.98693 1.39794 0.00000, 103.987..."


While we now have the boundary of each planning area, its name is still embedded in the <code>xml</code> stored in the 'Description' column. Hence, we need to parse this <code>xml</code> to obtain the planning area names.

In [12]:
def convert_xml_description(xml_text):
    parsed_text=xmltodict.parse(xml_text)
    return parsed_text["center"]["table"]["tr"][1]["td"].lower()

In [13]:
df["Description"]=df["Description"].apply(convert_xml_description)
df.head()

Unnamed: 0,Name,Description,geometry
0,kml_1,bukit merah,"POLYGON Z ((103.81740 1.29433 0.00000, 103.817..."
1,kml_2,bukit panjang,"POLYGON Z ((103.77445 1.39029 0.00000, 103.774..."
2,kml_3,bukit timah,"POLYGON Z ((103.79766 1.34813 0.00000, 103.798..."
3,kml_4,central water catchment,"POLYGON Z ((103.80578 1.41436 0.00000, 103.805..."
4,kml_5,changi,"POLYGON Z ((103.98693 1.39794 0.00000, 103.987..."
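
`convert_xml_description` relies on the third-party `xmltodict` package. The same lookup can be sketched with the standard library's `xml.etree.ElementTree`. The sample string below is a simplified, hypothetical stand-in for the truncated `Description` values shown above (its `PLN_AREA_*` labels are illustrative), and the function assumes, like the original, that the name sits in the `<td>` of the table's second row.

```python
import xml.etree.ElementTree as ET


def planning_area_from_description(xml_text):
    """Return the lower-cased text of the <td> in the table's second row.

    The root element is <center>, which contains a single <table>.
    """
    root = ET.fromstring(xml_text)
    rows = root.find("table").findall("tr")
    cell = rows[1].find("td")
    return (cell.text or "").strip().lower()


# Simplified, hypothetical Description value
sample = (
    "<center><table>"
    "<tr><th colspan='2' align='center'><em>Attributes</em></th></tr>"
    "<tr><th>PLN_AREA_N</th><td>BUKIT MERAH</td></tr>"
    "<tr><th>PLN_AREA_C</th><td>BM</td></tr>"
    "</table></center>"
)
print(planning_area_from_description(sample))  # bukit merah
```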


Now we are ready to map each reading's coordinates to a planning area. The function below checks the planning area polygons one by one and returns the name of the planning area that contains the weather station. Because it is applied to every row, this step can take a while: the temperature data alone has over 650,000 records, and the rainfall data has over 2.4 million.

In [14]:
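def get_planning_area(latitude,longitude):
    p=Point(longitude,latitude)  # shapely points are (x, y) = (longitude, latitude)
    for i in range(len(df)):
        if p.within(df["geometry"][i]):
            return df["Description"][i]  # first polygon containing the station; None if there is no match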

In [15]:
temperature_raw["planning area"]=temperature_raw.apply(lambda x:get_planning_area(x.latitude,x.longitude),axis=1)
temperature_raw.head()

Unnamed: 0,timestamp,station_id,latitude,longitude,reading,planning area
0,2017-01-01 00:00:00,S06,1.3524,103.9007,26.2,paya lebar
1,2017-01-01 00:00:00,S100,1.4172,103.74855,26.3,sungei kadut
2,2017-01-01 00:00:00,S102,1.189,103.768,27.2,western islands
3,2017-01-01 00:00:00,S104,1.44387,103.78538,25.9,woodlands
4,2017-01-01 00:00:00,S106,1.4168,103.9673,25.0,north-eastern islands


In [16]:
humidity_raw["planning area"]=humidity_raw.apply(lambda x:get_planning_area(x.latitude,x.longitude),axis=1)
humidity_raw.head()

Unnamed: 0,timestamp,station_id,latitude,longitude,reading,planning area
0,2017-01-01 00:00:00,S06,1.3524,103.9007,82.0,paya lebar
1,2017-01-01 00:00:00,S100,1.4172,103.74855,85.5,sungei kadut
2,2017-01-01 00:00:00,S102,1.189,103.768,86.6,western islands
3,2017-01-01 00:00:00,S107,1.3135,103.9625,89.8,bedok
4,2017-01-01 00:00:00,S108,1.2799,103.8703,87.1,marina south


In [17]:
winddirection_raw["planning area"]=winddirection_raw.apply(lambda x:get_planning_area(x.latitude,x.longitude),axis=1)
winddirection_raw.head()

Unnamed: 0,timestamp,station_id,latitude,longitude,reading,planning area
0,2017-01-01 00:00:00,S06,1.3524,103.9007,N,paya lebar
1,2017-01-01 00:00:00,S100,1.4172,103.74855,NE,sungei kadut
2,2017-01-01 00:00:00,S102,1.189,103.768,NNW,western islands
3,2017-01-01 00:00:00,S104,1.44387,103.78538,N,woodlands
4,2017-01-01 00:00:00,S106,1.4168,103.9673,NNE,north-eastern islands


In [18]:
windspeed_raw["planning area"]=windspeed_raw.apply(lambda x:get_planning_area(x.latitude,x.longitude),axis=1)
windspeed_raw.head()

Unnamed: 0,timestamp,station_id,latitude,longitude,reading,planning area
0,2017-01-01 00:00:00,S06,1.3524,103.9007,1.2,paya lebar
1,2017-01-01 00:00:00,S100,1.4172,103.74855,1.3,sungei kadut
2,2017-01-01 00:00:00,S102,1.189,103.768,12.0,western islands
3,2017-01-01 00:00:00,S104,1.44387,103.78538,3.5,woodlands
4,2017-01-01 00:00:00,S106,1.4168,103.9673,2.9,north-eastern islands


In [19]:
rainfall_raw["planning area"]=rainfall_raw.apply(lambda x:get_planning_area(x.latitude,x.longitude),axis=1)
rainfall_raw.head()

Unnamed: 0,timestamp,station_id,latitude,longitude,reading,planning area
0,2017-01-01 00:00:00,S06,1.3524,103.9007,0,paya lebar
1,2017-01-01 00:00:00,S07,1.3415,103.8334,0,central water catchment
2,2017-01-01 00:00:00,S08,1.3701,103.8271,0,central water catchment
3,2017-01-01 00:00:00,S100,1.4172,103.74855,0,sungei kadut
4,2017-01-01 00:00:00,S101,1.35053,103.7134,0,jurong west
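
The cells above call `get_planning_area` on every row, even though a station has the same coordinates in every hour. A sketch that looks up each distinct station once and merges the result back is shown below; with the notebook's objects, usage would be `temperature_raw = assign_planning_area(temperature_raw, get_planning_area)`. The helper name and the toy lookup are hypothetical. geopandas' spatial join (`geopandas.sjoin`) is another way to avoid the polygon loop, but it is not shown here.

```python
import pandas as pd


def assign_planning_area(readings, lookup):
    """Attach a 'planning area' column, calling lookup once per station.

    readings: DataFrame with station_id, latitude and longitude columns.
    lookup:   function (latitude, longitude) -> planning area name.
    """
    keys = ["station_id", "latitude", "longitude"]
    # Each station repeats across thousands of hourly rows but needs only one lookup
    stations = readings[keys].drop_duplicates().copy()
    stations["planning area"] = [
        lookup(lat, lon) for lat, lon in zip(stations["latitude"], stations["longitude"])
    ]
    # A left merge keeps every reading and preserves the original row order
    return readings.merge(stations, on=keys, how="left")


# Made-up readings and a stand-in lookup that records how often it is called
readings = pd.DataFrame({
    "timestamp": ["2017-01-01 00:00:00", "2017-01-01 00:00:00", "2017-01-01 01:00:00"],
    "station_id": ["S06", "S100", "S06"],
    "latitude": [1.35240, 1.41720, 1.35240],
    "longitude": [103.90070, 103.74855, 103.90070],
    "reading": [26.2, 26.3, 26.0],
})
calls = []


def fake_lookup(lat, lon):
    calls.append((lat, lon))
    return "paya lebar" if lon > 103.8 else "sungei kadut"


result = assign_planning_area(readings, fake_lookup)
print(result["planning area"].tolist())  # ['paya lebar', 'sungei kadut', 'paya lebar']
print(len(calls))  # 2
```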


In [20]:
#temperature_raw.to_csv("temperature_transformed.csv", index=False)
#humidity_raw.to_csv("humidity_transformed.csv", index=False)
#winddirection_raw.to_csv("winddirection_transformed.csv", index=False)
#windspeed_raw.to_csv("windspeed_transformed.csv", index=False)
#rainfall_raw.to_csv("rainfall_transformed.csv", index=False)

Because the previous step can take a while, the optional cell above (uncomment it to run) saves the current progress to CSV files.

## Step 5: Aggregating by Planning Area

The next step is to aggregate each variable by planning area for each timestamp. For numerical variables we use <code>.groupby()</code> with <code>.mean()</code>; for categorical ones we use <code>.groupby()</code> with <code>.value_counts()</code> and keep the most frequent value in each group.

In [22]:
temperature=temperature_raw.groupby(["timestamp","planning area"])["reading"].mean().to_frame().reset_index()
temperature

Unnamed: 0,timestamp,planning area,reading
0,2017-01-01 00:00:00,ang mo kio,26.10
1,2017-01-01 00:00:00,bedok,27.10
2,2017-01-01 00:00:00,changi,26.15
3,2017-01-01 00:00:00,changi bay,26.60
4,2017-01-01 00:00:00,hougang,26.90
...,...,...,...
610305,2021-12-31 23:00:00,queenstown,24.50
610306,2021-12-31 23:00:00,sungei kadut,24.50
610307,2021-12-31 23:00:00,tuas,25.30
610308,2021-12-31 23:00:00,western water catchment,24.10


In [23]:
humidity=humidity_raw.groupby(["timestamp","planning area"])["reading"].mean().to_frame().reset_index()
humidity

Unnamed: 0,timestamp,planning area,reading
0,2017-01-01 00:00:00,ang mo kio,90.5
1,2017-01-01 00:00:00,bedok,89.8
2,2017-01-01 00:00:00,changi,92.9
3,2017-01-01 00:00:00,changi bay,87.1
4,2017-01-01 00:00:00,hougang,88.1
...,...,...,...
577192,2021-12-31 23:00:00,marina south,99.5
577193,2021-12-31 23:00:00,sungei kadut,96.5
577194,2021-12-31 23:00:00,tuas,83.2
577195,2021-12-31 23:00:00,western water catchment,92.6


In [96]:
winddirection=winddirection_raw.groupby(["timestamp","planning area"])["reading"].value_counts().to_frame()
winddirection=winddirection.index.to_frame().reset_index(drop=True)
winddirection=winddirection.groupby(["timestamp","planning area"]).head(1)
winddirection

Unnamed: 0,timestamp,planning area,reading
0,2017-01-01 00:00:00,ang mo kio,NNE
1,2017-01-01 00:00:00,bukit timah,NNW
2,2017-01-01 00:00:00,changi,NNW
3,2017-01-01 00:00:00,changi bay,N
4,2017-01-01 00:00:00,hougang,NE
...,...,...,...
584534,2021-12-31 23:00:00,sungei kadut,SE
584535,2021-12-31 23:00:00,tuas,WSW
584536,2021-12-31 23:00:00,western islands,ENE
584537,2021-12-31 23:00:00,western water catchment,ESE


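To see why this works, the intermediate output of <code>.value_counts()</code> is shown below. Within each <code>(timestamp, planning area)</code> group the counts are sorted in descending order, so the first row of each group holds the most frequent reading.

In [73]: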
winddirection_raw.groupby(["timestamp","planning area"])["reading"].value_counts().to_frame()

timestamp,planning area,reading,count
2017-01-01 00:00:00,ang mo kio,NNE,1
2017-01-01 00:00:00,bukit timah,NNW,1
2017-01-01 00:00:00,changi,NNW,2
2017-01-01 00:00:00,changi bay,N,1
2017-01-01 00:00:00,hougang,NE,1
...,...,...,...
2021-12-31 23:00:00,sungei kadut,SE,1
2021-12-31 23:00:00,tuas,WSW,1
2021-12-31 23:00:00,western islands,ENE,1
2021-12-31 23:00:00,western water catchment,ESE,1
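
An explicit sketch of the same mode-per-group aggregation is given below. Unlike the `value_counts` chain, it fixes the tie-break: `groupby().size()` orders rows by the reading, and a stable descending sort keeps that order among equal counts, so the smallest value wins a tie. That rule is an assumption of this sketch. For the binary rainfall flag it resolves a 0/1 tie to 0; taking the group maximum instead would flag rain whenever any station in the area reported it, which is a different modelling choice. The function name is hypothetical.

```python
import pandas as pd


def most_frequent_per_group(df, keys, column):
    """Most common value of `column` for each combination of `keys`.

    Ties go to the smallest value (alphabetically for labels, 0 before 1).
    """
    counts = df.groupby(keys + [column]).size().reset_index(name="count")
    counts = counts.sort_values("count", ascending=False, kind="stable")
    mode = counts.drop_duplicates(subset=keys)  # first row per group has the highest count
    return mode.drop(columns="count").sort_values(keys).reset_index(drop=True)


sample = pd.DataFrame({
    "timestamp": ["t0", "t0", "t0", "t0", "t1"],
    "planning area": ["changi", "changi", "changi", "tuas", "tuas"],
    "reading": ["NNW", "N", "NNW", "WSW", "E"],
})
print(most_frequent_per_group(sample, ["timestamp", "planning area"], "reading"))
```

In the notebook, `most_frequent_per_group(winddirection_raw, ["timestamp", "planning area"], "reading")` would play the role of In [96].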


In [31]:
windspeed=windspeed_raw.groupby(["timestamp","planning area"])["reading"].mean().to_frame().reset_index()
windspeed

Unnamed: 0,timestamp,planning area,reading
0,2017-01-01 00:00:00,ang mo kio,2.00
1,2017-01-01 00:00:00,bedok,1.40
2,2017-01-01 00:00:00,bukit timah,5.70
3,2017-01-01 00:00:00,changi,1.65
4,2017-01-01 00:00:00,changi bay,2.10
...,...,...,...
577685,2021-12-31 23:00:00,sungei kadut,0.70
577686,2021-12-31 23:00:00,tuas,1.30
577687,2021-12-31 23:00:00,western islands,6.50
577688,2021-12-31 23:00:00,western water catchment,1.90


In [95]:
rainfall=rainfall_raw.groupby(["timestamp","planning area"])["reading"].value_counts().to_frame()
rainfall=rainfall.index.to_frame().reset_index(drop=True)
rainfall=rainfall.groupby(["timestamp","planning area"]).head(1)
rainfall

Unnamed: 0,timestamp,planning area,reading
0,2017-01-01 00:00:00,ang mo kio,0
1,2017-01-01 00:00:00,bedok,0
2,2017-01-01 00:00:00,boon lay,0
3,2017-01-01 00:00:00,bukit panjang,0
4,2017-01-01 00:00:00,bukit timah,0
...,...,...,...
1512492,2021-12-31 23:00:00,toa payoh,0
1512493,2021-12-31 23:00:00,tuas,0
1512494,2021-12-31 23:00:00,western water catchment,0
1512495,2021-12-31 23:00:00,woodlands,0


## Step 6: Merging DataFrames

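We perform an inner join of all 5 DataFrames on the columns <code>["timestamp","planning area"]</code> to merge everything into a single dataset. A row is therefore kept only if all five variables have a reading for that planning area at that hour.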

In [97]:
weather=temperature.merge(humidity, on=["timestamp","planning area"],suffixes=('_temperature', '_humidity'))
weather=weather.merge(winddirection, on=["timestamp","planning area"])
weather=weather.merge(windspeed, on=["timestamp","planning area"],suffixes=('_winddirection', '_windspeed'))
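weather=weather.rename(columns=lambda x: x.removeprefix("reading_"))  # str.removeprefix (Python 3.9+) drops the exact prefix; lstrip("reading_") strips any leading characters from that set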
weather=weather.merge(rainfall, on=["timestamp","planning area"])
weather=weather.rename(columns={"reading":"rainfall"})
weather

Unnamed: 0,timestamp,planning area,temperature,humidity,winddirection,windspeed,rainfall
0,2017-01-01 00:00:00,ang mo kio,26.10,90.5,NNE,2.00,0
1,2017-01-01 00:00:00,changi,26.15,92.9,NNW,1.65,0
2,2017-01-01 00:00:00,changi bay,26.60,87.1,N,2.10,0
3,2017-01-01 00:00:00,hougang,26.90,88.1,NE,2.20,0
4,2017-01-01 00:00:00,marina south,27.00,87.1,N,11.80,0
...,...,...,...,...,...,...,...
512439,2021-12-31 23:00:00,marina south,24.60,99.5,SE,2.10,1
512440,2021-12-31 23:00:00,sungei kadut,24.50,96.5,SE,0.70,0
512441,2021-12-31 23:00:00,tuas,25.30,83.2,WSW,1.30,0
512442,2021-12-31 23:00:00,western water catchment,24.10,92.6,ESE,1.90,0
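
Renaming each `reading` column to its variable name before merging removes the need for suffixes and prefix stripping. A sketch with `functools.reduce`, assuming each input frame has the columns `timestamp`, `planning area` and `reading` as above; the function name is hypothetical and the two toy frames are made up.

```python
from functools import reduce

import pandas as pd

KEYS = ["timestamp", "planning area"]


def merge_variables(frames):
    """Inner-join per-variable frames on KEYS.

    frames: dict mapping a variable name to a DataFrame with KEYS + ["reading"].
    """
    renamed = [df.rename(columns={"reading": name}) for name, df in frames.items()]
    return reduce(lambda left, right: left.merge(right, on=KEYS, how="inner"), renamed)


t = pd.DataFrame({"timestamp": ["t0", "t1"], "planning area": ["tuas", "tuas"], "reading": [25.3, 25.2]})
h = pd.DataFrame({"timestamp": ["t0"], "planning area": ["tuas"], "reading": [83.2]})
weather_sample = merge_variables({"temperature": t, "humidity": h})
print(weather_sample)
```

With the notebook's frames this would be `merge_variables({"temperature": temperature, "humidity": humidity, "winddirection": winddirection, "windspeed": windspeed, "rainfall": rainfall})`, with column names taken from the dictionary keys.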


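Lastly, since we want to predict using HISTORICAL data, we shift the data points by 1h. Before shifting, we check how many rows a complete hourly dataset would have: one row per hour for every planning area.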

In [98]:
runtimes=list(pd.date_range('2017-01-01 00:00:00',
                            '2021-12-31 23:59:59',
                            freq='60min').strftime('%Y-%m-%d %H:%M:%S'))  # '60T' is a deprecated alias in recent pandas
available_areas=list(weather["planning area"].unique())
expected_rows=len(runtimes)*len(available_areas)
expected_rows

745008
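
As a quick check of this figure: 2017 to 2021 covers 1,826 days (2020 is the only leap year), i.e. 43,824 hourly timestamps, so 745,008 expected rows corresponds to 17 planning areas.

```python
import pandas as pd

# Hourly timestamps from 2017-01-01 00:00 to 2021-12-31 23:00 inclusive
hours = len(pd.date_range("2017-01-01 00:00:00", "2021-12-31 23:00:00", freq="60min"))
days = 4 * 365 + 366  # 2017-2021, with 2020 as the only leap year
assert hours == days * 24

expected_rows = 745008
print(hours, expected_rows // hours, expected_rows % hours)  # 43824 17 0
```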

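Although the scraping itself introduced no missing rows, the merged dataset does not have a row for EVERY timestamp in EVERY planning area: it has only about 512,000 rows against the 745,008 expected. Shifting the data as it is could pair a reading with one taken several hours earlier. Hence, we reintroduce the missing combinations as rows of NA values using a left join, so that every planning area has an unbroken hourly series.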

In [99]:
expanded_runtime=pd.DataFrame(runtimes,columns=["timestamp"])
expanded_runtime['key'] = 1

expanded_areas=pd.DataFrame(available_areas,columns=["planning area"])
expanded_areas['key'] = 1

expanded_df=expanded_runtime.merge(expanded_areas, how="outer",on="key").drop(columns="key")  # drop("key", 1) fails in pandas 2.x, where axis is keyword-only
expanded_df

Unnamed: 0,timestamp,planning area
0,2017-01-01 00:00:00,ang mo kio
1,2017-01-01 00:00:00,changi
2,2017-01-01 00:00:00,changi bay
3,2017-01-01 00:00:00,hougang
4,2017-01-01 00:00:00,marina south
...,...,...
745003,2021-12-31 23:00:00,woodlands
745004,2021-12-31 23:00:00,yishun
745005,2021-12-31 23:00:00,western water catchment
745006,2021-12-31 23:00:00,southern islands


In [100]:
weather_full=expanded_df.merge(weather,on=["timestamp","planning area"],how="left")
weather_full

Unnamed: 0,timestamp,planning area,temperature,humidity,winddirection,windspeed,rainfall
0,2017-01-01 00:00:00,ang mo kio,26.10,90.5,NNE,2.00,0.0
1,2017-01-01 00:00:00,changi,26.15,92.9,NNW,1.65,0.0
2,2017-01-01 00:00:00,changi bay,26.60,87.1,N,2.10,0.0
3,2017-01-01 00:00:00,hougang,26.90,88.1,NE,2.20,0.0
4,2017-01-01 00:00:00,marina south,27.00,87.1,N,11.80,0.0
...,...,...,...,...,...,...,...
745003,2021-12-31 23:00:00,woodlands,25.30,93.7,ENE,7.30,0.0
745004,2021-12-31 23:00:00,yishun,,,,,
745005,2021-12-31 23:00:00,western water catchment,24.10,92.6,ESE,1.90,0.0
745006,2021-12-31 23:00:00,southern islands,,,,,
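
The `key = 1` merge above builds the Cartesian product of timestamps and planning areas. Since pandas 1.2, `merge(..., how="cross")` does this directly; another option, sketched below, is `pd.MultiIndex.from_product` followed by `reindex`. This assumes each `(timestamp, planning area)` pair occurs at most once in the input, which holds after the inner joins above. The function name and toy data are hypothetical.

```python
import pandas as pd

KEYS = ["timestamp", "planning area"]


def complete_grid(df, timestamps, areas):
    """Return one row per (timestamp, area) pair; missing pairs are filled with NaN."""
    full_index = pd.MultiIndex.from_product([timestamps, areas], names=KEYS)
    return df.set_index(KEYS).reindex(full_index).reset_index()


weather_sample = pd.DataFrame({
    "timestamp": ["t0", "t0", "t1"],
    "planning area": ["tuas", "yishun", "tuas"],
    "temperature": [25.3, 26.0, 25.2],
})
grid = complete_grid(weather_sample, ["t0", "t1"], ["tuas", "yishun"])
print(grid)  # the (t1, yishun) row is added with a NaN temperature
```

With the notebook's variables, `complete_grid(weather, runtimes, available_areas)` would play the role of In [99] and In [100].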


The <code>.shift()</code> function, applied within each planning area, creates the past weather variables. After concatenating them with the main dataset, we drop every row that contains NA values.

In [101]:
weather_shifted=weather_full.iloc[:,1:].groupby(["planning area"]).shift(1)
weather_shifted=weather_shifted.add_prefix("past_")
weather_shifted

Unnamed: 0,past_temperature,past_humidity,past_winddirection,past_windspeed,past_rainfall
0,,,,,
1,,,,,
2,,,,,
3,,,,,
4,,,,,
...,...,...,...,...,...
745003,24.40,96.7,S,2.4,0.0
745004,,,,,
745005,24.35,93.9,SE,4.1,0.0
745006,,,,,
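
`groupby().shift(1)` takes the previous row within each planning area, which is the previous hour only because the hourly grid was completed first. An alternative sketch that does not depend on row positions joins each row to the row stamped exactly one hour earlier. It converts `timestamp` to datetimes, so the returned frame holds datetimes rather than the strings used in the notebook. The function name and toy data are hypothetical.

```python
import pandas as pd

KEYS = ["timestamp", "planning area"]


def add_lagged(df, lag=pd.Timedelta(hours=1)):
    """Attach the readings from `lag` earlier in the same planning area as past_* columns.

    Rows whose earlier hour is missing get NaN, so gaps can never pair
    readings that are more than `lag` apart.
    """
    df = df.assign(timestamp=pd.to_datetime(df["timestamp"]))
    past = df.copy()
    past["timestamp"] = past["timestamp"] + lag  # the row at t - lag now lines up with t
    past = past.set_index(KEYS).add_prefix("past_").reset_index()
    return df.merge(past, on=KEYS, how="left")


sample = pd.DataFrame({
    "timestamp": ["2021-12-31 21:00:00", "2021-12-31 22:00:00", "2021-12-31 23:00:00",
                  "2021-12-31 21:00:00", "2021-12-31 23:00:00"],
    "planning area": ["tuas", "tuas", "tuas", "yishun", "yishun"],
    "temperature": [25.0, 25.2, 25.3, 26.0, 25.8],
})
lagged = add_lagged(sample)
print(lagged[["timestamp", "planning area", "temperature", "past_temperature"]])
```

For yishun at 23:00, the 22:00 row is missing, so `past_temperature` stays NaN; a plain `shift(1)` on this incomplete frame would instead return the 21:00 reading.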


In [102]:
weather_full=pd.concat([weather_full,weather_shifted],axis=1)
weather_full=weather_full.dropna().reset_index(drop=True)
weather_full

Unnamed: 0,timestamp,planning area,temperature,humidity,winddirection,windspeed,rainfall,past_temperature,past_humidity,past_winddirection,past_windspeed,past_rainfall
0,2017-01-01 01:00:00,ang mo kio,25.9,91.4,NNE,1.5,0.0,26.10,90.5,NNE,2.00,0.0
1,2017-01-01 01:00:00,changi,26.1,93.3,NNW,1.7,0.0,26.15,92.9,NNW,1.65,0.0
2,2017-01-01 01:00:00,changi bay,26.7,87.6,N,2.4,0.0,26.60,87.1,N,2.10,0.0
3,2017-01-01 01:00:00,hougang,27.0,87.4,NE,3.1,0.0,26.90,88.1,NE,2.20,0.0
4,2017-01-01 01:00:00,marina south,27.0,87.7,N,12.1,0.0,27.00,87.1,N,11.80,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
495564,2021-12-31 23:00:00,sungei kadut,24.5,96.5,SE,0.7,0.0,24.60,95.7,SSE,0.40,0.0
495565,2021-12-31 23:00:00,tuas,25.3,83.2,WSW,1.3,0.0,25.20,87.2,ESE,1.00,0.0
495566,2021-12-31 23:00:00,woodlands,25.3,93.7,ENE,7.3,0.0,24.40,96.7,S,2.40,0.0
495567,2021-12-31 23:00:00,western water catchment,24.1,92.6,ESE,1.9,0.0,24.35,93.9,SE,4.10,0.0


In [103]:
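weather_full=weather_full.drop(columns=["temperature","humidity","winddirection","windspeed"])  # keep the past_* predictors and the current rainfall as the target
weather_full=pd.concat([weather_full.iloc[:,:2],weather_full.iloc[:,3:-1],weather_full.iloc[:,2:3]],axis=1)  # order: keys, past_temperature..past_windspeed, rainfall last; 3:-1 leaves out past_rainfall
weather_full.to_csv("../Data/weather_data_1.csv",index=False)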