# Dataset Field Information #

The dataset is composed of four data sources:

## Historical Consumption Data ##
This tracks energy usage across 260+ buildings.

Each record contains:
* obs_id: Unique identifier for each measurement

* SiteId: Building identifier that links across all datasets

* ForecastId: Identifier for forecast time series

* Timestamp: Exact time of the energy measurement

* Value: Actual energy consumption measurement
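
As a quick sanity check on these fields, the sketch below summarizes each forecast series by site. The function name `summarize_series` and the toy rows are hypothetical; only the column names come from the field list above.

```python
import pandas as pd

def summarize_series(consumption):
    """Return one row per (SiteId, ForecastId) with basic coverage statistics."""
    df = consumption.copy()
    # Parse timestamps so min/max compare chronologically rather than as strings
    df["Timestamp"] = pd.to_datetime(df["Timestamp"])
    return (
        df.groupby(["SiteId", "ForecastId"])
          .agg(
              n_obs=("Value", "size"),
              start=("Timestamp", "min"),
              end=("Timestamp", "max"),
              mean_value=("Value", "mean"),
          )
          .reset_index()
    )

# Toy data (made up for illustration, not taken from the real files)
consumption = pd.DataFrame({
    "obs_id": [1, 2, 3],
    "SiteId": [1, 1, 2],
    "ForecastId": [10, 10, 20],
    "Timestamp": ["2017-01-01 00:00:00", "2017-01-01 00:15:00", "2017-01-01 00:00:00"],
    "Value": [1.0, 3.0, 5.0],
})
summary = summarize_series(consumption)
print(summary)
```

On the real data, the same call could be applied to `trainData`, assuming its column names match the list above.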

## Building Metadata ##

Contains building-specific information:
* SiteId: Building identifier

* Surface: Building's surface area

* Sampling: Time interval between measurements (in minutes)

* BaseTemperature: Reference temperature for the building

* MondayIsDayOff, TuesdayIsDayOff, ...: One Boolean column per weekday, indicating whether that weekday is a non-working day for the building
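
The loaded metadata exposes the day-off information as per-weekday columns such as `MondayIsDayOff`. A minimal sketch of turning those into a single per-record flag follows. The helper `add_day_off_flag` is hypothetical, and the `SaturdayIsDayOff` and `SundayIsDayOff` columns are assumed to follow the same naming pattern as the weekday columns.

```python
import pandas as pd

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def add_day_off_flag(consumption, metadata):
    """Attach an IsDayOff column by looking up each record's weekday flag."""
    out = consumption.copy()
    out["Timestamp"] = pd.to_datetime(out["Timestamp"])
    # Use only the weekday flag columns that are actually present
    flag_cols = [f"{d}IsDayOff" for d in DAY_NAMES
                 if f"{d}IsDayOff" in metadata.columns]
    # Reshape to long form: one row per (SiteId, weekday)
    long = metadata.melt(id_vars="SiteId", value_vars=flag_cols,
                         var_name="DayName", value_name="IsDayOff")
    long["DayName"] = long["DayName"].str.replace("IsDayOff", "", regex=False)
    # day_name() returns English weekday names by default
    out["DayName"] = out["Timestamp"].dt.day_name()
    out = out.merge(long, on=["SiteId", "DayName"], how="left")
    return out.drop(columns="DayName")

# Toy data (made up): site 1 is closed on weekends only
metadata = pd.DataFrame({
    "SiteId": [1],
    "MondayIsDayOff": [False], "TuesdayIsDayOff": [False],
    "WednesdayIsDayOff": [False], "ThursdayIsDayOff": [False],
    "FridayIsDayOff": [False], "SaturdayIsDayOff": [True],
    "SundayIsDayOff": [True],
})
consumption = pd.DataFrame({
    "SiteId": [1, 1],
    "Timestamp": ["2017-05-26 10:00:00", "2017-05-27 10:00:00"],  # Friday, Saturday
    "Value": [10.0, 3.0],
})
flagged = add_day_off_flag(consumption, metadata)
print(flagged)
```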

## Historical Weather Data ##
Temperature data from nearby weather stations (within 30 km of each building).

Includes:
* SiteId: Building identifier

* Timestamp: Time of temperature measurement

* Temperature: Recorded temperature value

* Distance: How far the weather station is from the building (in km)
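
If a building has readings from more than one station at the same timestamp (an assumption about the data, suggested by the `Distance` field), one simple option is to keep the reading from the closest station. The helper name `nearest_station_temperature` and the toy rows below are hypothetical.

```python
import pandas as pd

def nearest_station_temperature(weather):
    """Keep one temperature per (SiteId, Timestamp): the closest station's reading."""
    w = weather.copy()
    w["Timestamp"] = pd.to_datetime(w["Timestamp"])
    # Sort by distance so the first duplicate kept is the nearest station
    w = w.sort_values("Distance")
    return (
        w.drop_duplicates(subset=["SiteId", "Timestamp"], keep="first")
         .sort_values(["SiteId", "Timestamp"])
         .reset_index(drop=True)
    )

# Toy data (made up): two stations report at midnight, one at 01:00
weather = pd.DataFrame({
    "SiteId": [1, 1, 1],
    "Timestamp": ["2017-01-01 00:00:00", "2017-01-01 00:00:00", "2017-01-01 01:00:00"],
    "Temperature": [5.0, 6.0, 4.5],
    "Distance": [20.0, 8.0, 20.0],
})
nearest = nearest_station_temperature(weather)
print(nearest)
```

Averaging across stations, possibly weighted by inverse distance, is an alternative design; the nearest-station rule is shown only because it is the simplest to verify.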

## Public Holidays Data ##
Holiday information that may affect energy consumption.

Contains:
* SiteId: Building identifier

* Date: Holiday date

* Holiday: Name/description of the holiday
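
One way the holiday table could be used is to flag consumption records that fall on a holiday for the same site. The sketch below is an illustration under that assumption; `add_holiday_flag` and the `IsHoliday` column are hypothetical names, and timestamps are assumed to be timezone-naive.

```python
import pandas as pd

def add_holiday_flag(consumption, holidays):
    """Add a Boolean IsHoliday column by matching on SiteId and calendar date."""
    out = consumption.copy()
    out["Timestamp"] = pd.to_datetime(out["Timestamp"])
    # Truncate each timestamp to midnight so it can be matched against Date
    out["Date"] = out["Timestamp"].dt.normalize()
    hol = holidays[["SiteId", "Date"]].drop_duplicates().copy()
    hol["Date"] = pd.to_datetime(hol["Date"])
    # indicator=True adds a _merge column telling us which rows found a match
    out = out.merge(hol, on=["SiteId", "Date"], how="left", indicator=True)
    out["IsHoliday"] = out["_merge"] == "both"
    return out.drop(columns=["Date", "_merge"])

# Toy data: the holiday row mirrors one entry from the holidays file
holidays = pd.DataFrame({
    "Date": ["2017-05-29"],
    "Holiday": ["Memorial Day"],
    "SiteId": [1],
})
consumption = pd.DataFrame({
    "SiteId": [1, 1],
    "Timestamp": ["2017-05-29 08:00:00", "2017-05-30 08:00:00"],
    "Value": [2.0, 9.0],
})
with_holidays = add_holiday_flag(consumption, holidays)
print(with_holidays)
```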

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn as sk
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.linear_model import Ridge, LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score, confusion_matrix, roc_curve, precision_recall_curve, auc

# Read each semicolon-delimited CSV file into its own DataFrame
holidayData = pd.read_csv('Datasets/power-laws-forecasting-energy-consumption-holidays.csv', delimiter=';')
metaData = pd.read_csv('Datasets/power-laws-forecasting-energy-consumption-metadata.csv', delimiter=';')
submissionForecast = pd.read_csv('Datasets/power-laws-forecasting-energy-consumption-submission-forecast-period.csv', delimiter=';')
testData = pd.read_csv('Datasets/power-laws-forecasting-energy-consumption-test-data.csv', delimiter=';')
trainData = pd.read_csv('Datasets/power-laws-forecasting-energy-consumption-training-data.csv', delimiter=';') 
weatherData = pd.read_csv('Datasets/power-laws-forecasting-energy-consumption-weather.csv', delimiter=';')

# Display the first 5 rows of each DataFrame
print(holidayData.head())
print(metaData.head())
print(submissionForecast.head())
print(testData.head())
print(trainData.head())
print(weatherData.head())


         Date                Holiday  SiteId
0  2016-02-15  Washington's Birthday       1
1  2017-05-29           Memorial Day       1
2  2017-11-23       Thanksgiving Day       1
3  2017-12-29    New Years Eve Shift       1
4  2017-12-31          New Years Eve       1
   SiteId       Surface  Sampling  BaseTemperature  MondayIsDayOff  \
0     207   7964.873347      30.0             18.0           False   
1       7  15168.125971      30.0             18.0           False   
2      74    424.340663      15.0             18.0           False   
3     239   1164.822636      15.0             18.0           False   
4     274   1468.246690       5.0             18.0           False   

   TuesdayIsDayOff  WednesdayIsDayOff  ThursdayIsDayOff  FridayIsDayOff  \
0            False              False             False           False   
1            False              False             False           False   
2            False              False             False           False   
3        