<a href="https://colab.research.google.com/github/bishram-acharya/hitachi_solution/blob/main/hitachi_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ⚡ Hitachi Technergy: Energy Forecasting
In this notebook, we're going to forecast energy demand and energy prices over a 7-day horizon.

## 1. Problem Statement:
- Predict energy demand given weather/environment data
- Predict energy prices given past hourly price data

## 2. Data:
The data was provided by Hitachi Energy. There are three main datasets:
- `Demand Forecasting Demand Data upto Feb 21.csv`: hourly energy demand in MW.
- `Demand Forecasting Weather Data upto Feb 28.csv`: hourly values of the input features (independent variables).
- `Price Forecasting data upto December 24.csv`: hourly energy prices.
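The weather file runs one week past the demand file (Feb 28 vs. Feb 21), so the final seven days of weather rows are presumably the inputs for the 7-day demand forecast. A minimal sketch of that split on small synthetic frames; the `demand_mw` column name and the year are hypothetical, since the demand file's schema is not shown in this notebook:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins: hourly weather through Feb 28, hourly demand through Feb 21
# (the 'demand_mw' column name and the year 2023 are hypothetical)
weather = pd.DataFrame({
    "datetime": pd.date_range("2023-02-01 00:00", "2023-02-28 23:00", freq="h"),
})
weather["Temperature"] = np.linspace(40.0, 60.0, len(weather))
demand = pd.DataFrame({
    "datetime": pd.date_range("2023-02-01 00:00", "2023-02-21 23:00", freq="h"),
})
demand["demand_mw"] = 100.0

# Hours with a known demand value form the training set ...
train = weather.merge(demand, on="datetime", how="inner")

# ... and weather rows after the last known demand hour are the forecast inputs
cutoff = demand["datetime"].max()
forecast_inputs = weather[weather["datetime"] > cutoff]

print(len(train), len(forecast_inputs))  # 504 168
```

An inner merge on the timestamp keeps only hours present in both files, which also guards against stray gaps in either one.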

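## 3. Features:
The weather dataset contains one row per hour, identified by `Name` (location, e.g. Pokhara) and `datetime`, with these features:
- Temperature and moisture: `Temperature`, `feelslike`, `dewpoint`, `humidity`
- Precipitation: `precipitation`, `precipprob`, `preciptype`, `snow`, `snowdepth`
- Wind: `windgust`, `windspeed`, `winddirection`
- Atmosphere and sky: `sealevelpressure`, `cloudcover`, `visibility`, `solarradiation`, `uvindex`
- Other: `severerisk`, `conditions` (a text label such as Overcast, Partially cloudy or Clear)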

## 4. Expected Output:
A 7-day forecast of energy demand, given the input weather features, and of energy prices, given past price data.
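With hourly data, a 7-day horizon means 7 × 24 = 168 hourly steps. A minimal sketch of generating the timestamps to forecast; the last observed hour below is a hypothetical placeholder that would in practice come from the data:

```python
import pandas as pd

HORIZON_DAYS = 7
STEPS = HORIZON_DAYS * 24  # 168 hourly steps

# Hypothetical last observed hour; in practice e.g. df_demand['datetime'].max()
last_observed = pd.Timestamp("2023-02-21 23:00")

# Start one hour after the last observation and generate STEPS hourly timestamps
future_index = pd.date_range(last_observed + pd.Timedelta(hours=1), periods=STEPS, freq="h")
print(future_index[0], future_index[-1])  # 2023-02-22 00:00:00 2023-02-28 23:00:00
```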






In [5]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

## Exploratory Data Analysis

### Reading the data and first insights

In [13]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [15]:
# Load the hourly weather features (df) and the hourly demand in MW (df_demand)
DATA_DIR = '/content/drive/My Drive/Documents and Data for forecasting LOCUS/Demand Forecasting'
df = pd.read_csv(f'{DATA_DIR}/Demand Forecasting Weather Data upto Feb 28.csv')
df_demand = pd.read_csv(f'{DATA_DIR}/Demand Forecasting Demand Data upto Feb 21.csv')

In [None]:
df.head()

| | Name | datetime | Temperature | feelslike | dewpoint | humidity | precipitation | precipprob | preciptype | snow | ... | visibility | solarradiation | uvindex | severerisk | conditions | Unnamed: 21 | Unnamed: 22 | Unnamed: 23 | Unnamed: 24 | Unnamed: 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pokhara | 1/1/2020 0:00 | 38.0 | 34.4 | 29.0 | 69.76 | 0.0 | 0 |  | 0.0 | ... | 9.9 | 0.0 | 0 |  | Overcast |  |  |  |  |  |
| 1 | Pokhara | 1/1/2020 1:00 | 38.0 | 34.4 | 29.9 | 72.37 | 0.0 | 0 |  | 0.0 | ... | 9.9 | 0.0 | 0 |  | Overcast |  |  |  |  |  |
| 2 | Pokhara | 1/1/2020 2:00 | 38.0 | 35.7 | 30.8 | 75.06 | 0.0 | 0 |  | 0.0 | ... | 9.9 | 0.0 | 0 |  | Overcast |  |  |  |  |  |
| 3 | Pokhara | 1/1/2020 3:00 | 37.1 | 33.4 | 29.9 | 74.97 | 0.0 | 0 |  | 0.0 | ... | 9.9 | 0.0 | 0 |  | Partially cloudy |  |  |  |  |  |
| 4 | Pokhara | 1/1/2020 4:00 | 35.1 | 32.3 | 29.0 | 78.41 | 0.0 | 0 |  | 0.0 | ... | 9.9 | 0.0 | 0 |  | Clear |  |  |  |  |  |


In [None]:
# The CSV came with five headerless trailing columns (Unnamed: 21 to Unnamed: 25); drop them
df.drop(columns=['Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25'], inplace=True)
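Hard-coding the five names works for this file. A slightly more defensive sketch drops only columns whose auto-generated header starts with `Unnamed` and that contain no values at all, so a real column is never discarded by accident (shown on a small illustrative frame named `sample` so it does not overwrite `df`):

```python
import numpy as np
import pandas as pd

# Small frame mimicking the CSV: real columns plus empty, headerless trailing columns
sample = pd.DataFrame({
    "Name": ["Pokhara", "Pokhara"],
    "Temperature": [38.0, 38.0],
    "Unnamed: 21": [np.nan, np.nan],
    "Unnamed: 22": [np.nan, np.nan],
})

# Columns pandas auto-named 'Unnamed: N' that are entirely empty
unnamed_empty = [c for c in sample.columns if c.startswith("Unnamed") and sample[c].isna().all()]
sample = sample.drop(columns=unnamed_empty)
print(list(sample.columns))  # ['Name', 'Temperature']
```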

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27720 entries, 0 to 27719
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Name              27720 non-null  object 
 1   datetime          27720 non-null  object 
 2   Temperature       27720 non-null  float64
 3   feelslike         27720 non-null  float64
 4   dewpoint          27720 non-null  float64
 5   humidity          27720 non-null  float64
 6   precipitation     27720 non-null  float64
 7   precipprob        27720 non-null  int64  
 8   preciptype        2751 non-null   object 
 9   snow              27702 non-null  float64
 10  snowdepth         27702 non-null  float64
 11  windgust          12223 non-null  float64
 12  windspeed         27720 non-null  float64
 13  winddirection     27720 non-null  float64
 14  sealevelpressure  27702 non-null  float64
 15  cloudcover        27720 non-null  float64
 16  visibility        27702 non-null  float6
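The 27,720 rows are exactly 1,155 days × 24 hours, which matches an unbroken hourly series from 1 January 2020 through 28 February 2023 (366 + 365 + 365 + 31 + 28 = 1,155 days). That is only an inference from the row count, so it is worth confirming there are no gaps or duplicated hours. A minimal sketch with a hypothetical helper `check_hourly`, demonstrated on synthetic timestamps:

```python
import pandas as pd

def check_hourly(ts: pd.Series) -> dict:
    """Compare timestamps against a complete hourly range between their min and max."""
    ts = pd.to_datetime(ts)
    expected = pd.date_range(ts.min(), ts.max(), freq="h")
    return {
        "rows": len(ts),
        "expected_rows": len(expected),
        "duplicates": int(ts.duplicated().sum()),
        "missing_hours": len(expected.difference(pd.DatetimeIndex(ts))),
    }

# Synthetic example: two days of hourly stamps with one hour removed
stamps = pd.Series(pd.date_range("2020-01-01 00:00", periods=48, freq="h")).drop(index=5)
print(check_hourly(stamps))
# {'rows': 47, 'expected_rows': 48, 'duplicates': 0, 'missing_hours': 1}
```

On the notebook's data the call would be `check_hourly(df['datetime'])`; if the series is complete, `rows` and `expected_rows` should both be 27,720 with no missing hours.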

In [None]:
# Count the missing (null) values in each column
df.isna().sum()

Name                    0
datetime                0
Temperature             0
feelslike               0
dewpoint                0
humidity                0
precipitation           0
precipprob              0
preciptype          24969
snow                   18
snowdepth              18
windgust            15497
windspeed               0
winddirection           0
sealevelpressure       18
cloudcover              0
visibility             18
solarradiation          0
uvindex                 0
severerisk          17766
conditions              0
dtype: int64
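Raw counts are easier to judge as shares of the 27,720 rows: `preciptype` is missing in about 90% of rows (24,969), `severerisk` in about 64% (17,766) and `windgust` in about 56% (15,497). A short sketch of computing that share per column, shown on a toy frame:

```python
import numpy as np
import pandas as pd

# Toy frame with different amounts of missing data per column
toy = pd.DataFrame({
    "humidity": [69.76, 72.37, 75.06, 74.97],
    "windgust": [np.nan, 12.0, np.nan, np.nan],
    "preciptype": [np.nan, np.nan, "rain", "rain"],
})

# isna().mean() is the fraction of missing values per column; scale it to percent
missing_pct = (toy.isna().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

Applied to the weather data this is simply `df.isna().mean() * 100`.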

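### Parsing dates and times
When working with time series data, we want to extract as much as possible from the date and time component. The first step is making sure pandas treats the `datetime` column as actual datetimes rather than strings: either with the `parse_dates` parameter of `pd.read_csv` while loading, or afterwards with `pd.to_datetime`, which is the approach taken in this notebook.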
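For comparison, the `parse_dates` route converts the column while the CSV is read. A minimal sketch on an in-memory excerpt whose three rows are copied from the weather data's `head()` output:

```python
import io

import pandas as pd

# Three rows from the weather data, held in memory for illustration
csv_text = """Name,datetime,Temperature
Pokhara,1/1/2020 0:00,38.0
Pokhara,1/1/2020 1:00,38.0
Pokhara,1/1/2020 2:00,38.0
"""

# parse_dates asks pandas to convert the listed columns to datetimes while reading
excerpt = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])
print(excerpt.dtypes)
```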

In [None]:
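# Timestamps look like '1/1/2020 0:00' (month/day/year); an explicit format avoids day/month ambiguity
df['datetime'] = pd.to_datetime(df['datetime'], format='%m/%d/%Y %H:%M')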

In [None]:
print(df['datetime'].dtype)

datetime64[ns]


Now the `datetime` column has the dtype `datetime64[ns]`.
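With a real datetime column, the `.dt` accessor can enrich the timestamp into calendar features. A minimal sketch; which of these features actually help the demand or price model is an assumption to be tested, not a given:

```python
import pandas as pd

# Three timestamps in the same month/day/year format as the weather data
frame = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["1/1/2020 0:00", "1/1/2020 13:00", "1/4/2020 8:00"], format="%m/%d/%Y %H:%M"
    )
})

# Derive calendar features from the parsed timestamps
frame["hour"] = frame["datetime"].dt.hour
frame["dayofweek"] = frame["datetime"].dt.dayofweek  # Monday=0, Sunday=6
frame["month"] = frame["datetime"].dt.month
frame["dayofyear"] = frame["datetime"].dt.dayofyear
print(frame)
```

Hour of day and day of week tend to matter for hourly demand because consumption follows daily and weekly routines; month and day of year capture seasonality.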

### Summary Statistics

In [None]:
df.describe()

| | Temperature | feelslike | dewpoint | humidity | precipitation | precipprob | snow | snowdepth | windgust | windspeed | winddirection | sealevelpressure | cloudcover | visibility | solarradiation | uvindex | severerisk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 27720.0 | 27720.0 | 27720.0 | 27720.0 | 27720.0 | 27720.0 | 27702.0 | 27702.0 | 12223.0 | 27720.0 | 27720.0 | 27702.0 | 27720.0 | 27702.0 | 27720.0 | 27720.0 | 9954.0 |
| mean | 56.846847 | 55.505018 | 45.486089 | 68.485757 | 0.005618 | 6.908369 | 0.001205 | 0.044258 | 15.449611 | 6.499069 | 158.488175 | 1018.008913 | 39.796815 | 9.183842 | 163.639946 | 1.61443 | 13.055957 |
| std | 18.73439 | 21.558142 | 18.198644 | 18.189507 | 0.044799 | 25.360098 | 0.041213 | 0.377761 | 8.627404 | 4.76584 | 120.235734 | 6.702345 | 44.949277 | 1.902787 | 265.41778 | 2.662525 | 13.499753 |
| min | -5.9 | -30.1 | -14.9 | 17.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 991.4 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 |
| 25% | 42.2 | 38.0 | 29.9 | 55.1725 | 0.0 | 0.0 | 0.0 | 0.0 | 8.3 | 3.4 | 30.0 | 1013.9 | 0.0 | 9.9 | 0.0 | 0.0 | 10.0 |
| 50% | 56.9 | 56.9 | 46.1 | 70.54 | 0.0 | 0.0 | 0.0 | 0.0 | 14.3 | 5.8 | 180.0 | 1017.5 | 0.0 | 9.9 | 12.0 | 0.0 | 10.0 |
| 75% | 72.0 | 72.0 | 62.1 | 83.96 | 0.0 | 0.0 | 0.0 | 0.0 | 21.9 | 9.2 | 260.0 | 1022.3 | 100.0 | 9.9 | 222.0 | 2.0 | 10.0 |
| max | 99.0 | 110.4 | 79.1 | 100.0 | 2.376 | 100.0 | 4.7 | 5.78 | 57.5 | 38.0 | 360.0 | 1041.5 | 100.0 | 34.9 | 1197.0 | 10.0 | 100.0 |
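By default `describe()` summarizes only numeric columns, so the text columns `Name`, `preciptype` and `conditions` do not appear in the table above. A short sketch of summarizing the non-numeric columns instead, on a toy frame built from values in the `head()` output:

```python
import numpy as np
import pandas as pd

# Toy frame with values taken from the first five rows of the weather data
toy = pd.DataFrame({
    "Name": ["Pokhara"] * 5,
    "conditions": ["Overcast", "Overcast", "Overcast", "Partially cloudy", "Clear"],
    "Temperature": [38.0, 38.0, 38.0, 37.1, 35.1],
})

# exclude=[np.number] summarizes the non-numeric columns: count, unique, top, freq
summary = toy.describe(exclude=[np.number])
print(summary)

# value_counts gives the full frequency table for one categorical column
print(toy["conditions"].value_counts())
```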
