# **Short-Term Electricity Demand Forecasting Using Weather Data (Delhi, 5-minute resolution)**

In [25]:
import pandas as pd

In [26]:
data = pd.read_csv("/content/powerdemand_5min_2021_to_2024_with weather.csv")

In [27]:
data.shape

(393440, 15)

In [28]:
data.columns

Index(['Unnamed: 0', 'datetime', 'Power demand', 'temp', 'dwpt', 'rhum',
       'wdir', 'wspd', 'pres', 'year', 'month', 'day', 'hour', 'minute',
       'moving_avg_3'],
      dtype='object')

In [29]:
data.dtypes

Unnamed: 0,0
Unnamed: 0,int64
datetime,object
Power demand,float64
temp,float64
dwpt,float64
rhum,float64
wdir,float64
wspd,float64
pres,float64
year,int64


In [30]:
data.head(5)

Unnamed: 0.1,Unnamed: 0,datetime,Power demand,temp,dwpt,rhum,wdir,wspd,pres,year,month,day,hour,minute,moving_avg_3
0,0,2021-01-01 00:30:00,2014.0,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,30,
1,1,2021-01-01 00:35:00,2005.63,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,35,
2,2,2021-01-01 00:40:00,1977.6,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,40,1999.076667
3,3,2021-01-01 00:45:00,1976.44,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,45,1986.556667
4,4,2021-01-01 00:50:00,1954.37,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,50,1969.47


In [31]:
data.isnull().sum()

Unnamed: 0,0
Unnamed: 0,0
datetime,0
Power demand,0
temp,0
dwpt,0
rhum,0
wdir,540
wspd,0
pres,0
year,0


The Power demand column is the target variable.
The weather features are temp, dwpt, rhum, wdir, wspd, and pres:

1. temp is Air Temperature (°C)

2. dwpt is Dew Point Temperature (°C), an indirect measure of the moisture in the air

3. rhum is Relative Humidity (%)

4. wdir is Wind Direction (degrees, 0–360°, covering N, E, S, W)

5. wspd is Wind Speed (m/s or km/h, depending on the source)

6. pres is Air Pressure (hPa / millibars)

year, month, day, hour, minute are extracted time components.
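
As an illustration of how such time components can be obtained, here is a minimal sketch that parses a datetime column (stored as object, i.e. strings, in this dataset) and recomputes the parts with the pandas `.dt` accessor. The `sample` frame is a hypothetical stand-in built from the first two timestamps shown above; it is not how the dataset's own columns were necessarily produced.

```python
import pandas as pd

# Hypothetical stand-in for the real data: only the 'datetime' column is needed.
sample = pd.DataFrame({"datetime": ["2021-01-01 00:30:00", "2021-01-01 00:35:00"]})

# Parse the string column into real timestamps (dtype datetime64[ns]).
sample["datetime"] = pd.to_datetime(sample["datetime"])

# Recompute each calendar component through the .dt accessor.
for part in ["year", "month", "day", "hour", "minute"]:
    sample[part] = getattr(sample["datetime"].dt, part)

print(sample)
```

Parsing the column once also makes later time-based operations (sorting, resampling, indexing by time) available without relying on the pre-extracted integer columns.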


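The wdir column has 540 null values. These rows could be dropped, but they are instead filled by interpolation in the cleaning step below. The "Unnamed: 0" column only duplicates the row index and can be dropped.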
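
The null-count output above is cut off after `year`, so columns further to the right (such as moving_avg_3) are not visible. One way to avoid this, sketched here on a small hypothetical frame named `toy`, is to keep only the columns whose null count is greater than zero.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the real notebook would use `data` instead.
toy = pd.DataFrame({
    "wdir": [0.0, np.nan, 90.0],
    "wspd": [0.0, 1.5, 2.0],
    "moving_avg_3": [np.nan, np.nan, 10.0],
})

# Count nulls per column, then keep only columns that actually have some.
null_counts = toy.isnull().sum()
missing_only = null_counts[null_counts > 0]

print(missing_only)
```

Applied to the full dataset, the same filter would list every column with gaps regardless of how many columns the display truncates.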


**DATA CLEANING**

In [32]:
data = data.drop(columns=["Unnamed: 0"])

**Handling missing values**

Missing values are handled by interpolation and filling. There are 540 missing values in wdir and 2 in the moving_avg_3 column. Linear interpolation is used for the wind direction column, and forward fill (ffill) or backward fill (bfill) is used for the moving average column.

In [37]:
# Interpolate wind direction
data['wdir'] = data['wdir'].interpolate(method='linear')
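
One caveat with linear interpolation here: wind direction is a circular quantity, so a gap between readings on either side of north (for example 350° and 10°) is filled with a value near 180°. The sketch below shows a possible alternative, not the method used in this notebook: interpolate the sine and cosine components and convert back to an angle. The `wdir` series is a hypothetical three-point example.

```python
import numpy as np
import pandas as pd

# Hypothetical example: one missing reading between 350° and 10° (across north).
wdir = pd.Series([350.0, np.nan, 10.0])

# Plain linear interpolation averages the raw numbers and lands on 180° (south).
linear = wdir.interpolate(method="linear")

# Circular alternative: interpolate the unit-vector components instead.
rad = np.deg2rad(wdir)
sin_i = np.sin(rad).interpolate(method="linear")
cos_i = np.cos(rad).interpolate(method="linear")

# Convert the interpolated components back to degrees in the range [0, 360).
circular = np.rad2deg(np.arctan2(sin_i, cos_i)) % 360

print(linear.tolist())
print(circular.round(2).tolist())
```

In this example the circular version fills the gap with a direction at (or within floating-point error of) north, which is the shorter way around the compass. Whether the difference matters for the 540 gaps in this dataset depends on how often they straddle the 0°/360° boundary.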

In [43]:
data[data["moving_avg_3"].isnull()]


Unnamed: 0,datetime,Power demand,temp,dwpt,rhum,wdir,wspd,pres,year,month,day,hour,minute,moving_avg_3
0,2021-01-01 00:30:00,2014.0,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,30,
1,2021-01-01 00:35:00,2005.63,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,35,


No previous values exist at the very start of the series, so forward fill has nothing to propagate into these two rows. Therefore bfill is used to handle the missing values in the moving_avg_3 column.

In [46]:
# Fill missing moving average
data["moving_avg_3"] = data["moving_avg_3"].bfill()
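
The non-null values shown above are consistent with a trailing 3-sample mean of Power demand, for example (2014.0 + 2005.63 + 1977.6) / 3 ≈ 1999.0767. Assuming the column was built that way, the sketch below shows an alternative to backfilling: recomputing the average with `min_periods=1`, so the first rows use only the samples available so far. The `demand` series copies the first five Power demand values from the preview.

```python
import pandas as pd

# First five Power demand readings, copied from the preview above.
demand = pd.Series([2014.0, 2005.63, 1977.6, 1976.44, 1954.37], name="Power demand")

# Strict trailing 3-sample mean: the first two entries are NaN.
strict = demand.rolling(window=3).mean()

# Same window, but allow partial windows at the start instead of NaN.
partial = demand.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"strict": strict, "partial": partial}))
```

The difference from bfill is that the backfilled value at the first two rows already includes later readings, whereas the partial-window average only uses readings up to that row. That distinction can matter if moving_avg_3 is later used as a model feature.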


In [47]:
data.isnull().sum()

Unnamed: 0,0
datetime,0
Power demand,0
temp,0
dwpt,0
rhum,0
wdir,0
wspd,0
pres,0
year,0
month,0


In [48]:
data.head(5)

Unnamed: 0,datetime,Power demand,temp,dwpt,rhum,wdir,wspd,pres,year,month,day,hour,minute,moving_avg_3
0,2021-01-01 00:30:00,2014.0,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,30,1999.076667
1,2021-01-01 00:35:00,2005.63,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,35,1999.076667
2,2021-01-01 00:40:00,1977.6,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,40,1999.076667
3,2021-01-01 00:45:00,1976.44,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,45,1986.556667
4,2021-01-01 00:50:00,1954.37,8.0,6.9,93.0,0.0,0.0,1017.0,2021,1,1,0,50,1969.47
