# Introduction 

In this notebook, I merge two cleaned datasets (hourly weather and energy data) into a single file for the exploratory data analysis (EDA) located [here](https://github.com/KishenSharma6/Weather-Energy-Consumption-in-Spain/tree/master/Project%20Codes/02_Exploratory_Data_Analysis).

* Raw data can be found [here](https://github.com/KishenSharma6/Weather-Energy-Consumption-in-Spain/tree/master/Data/01_Raw_Data)
* Cleaned data can be found [here](https://github.com/KishenSharma6/Weather-Energy-Consumption-in-Spain/tree/master/Data/02_Cleaned_Data)

**Read in libraries for notebook**

In [25]:
import numpy as np
import pandas as pd

**Set notebook preferences**

In [26]:
#Set preferences for pandas 
pd.set_option("display.max_columns", 101)

**Read in data**

In [27]:
#Set path to project directory
path = r'C:\Users\kishe\Documents\Data Science\Projects\Python Projects\In Progress\Spain Hourly Energy Demand and Weather'

#Read in cleaned data
weather = pd.read_csv(path + '/Data/02_Cleaned_Data/2020_0505_Cleaned_Weather_Features.csv',
                      parse_dates=['date_time'], index_col='date_time')
energy = pd.read_csv(path + '/Data/02_Cleaned_Data/2020_0505_Cleaned_Energy_Dataset.csv',
                     parse_dates=['date_time'], index_col='date_time')
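
The read step above joins an absolute, machine-specific path with string concatenation. As a minimal sketch of one alternative, the paths could be built with `pathlib`. The names `project_root`, `cleaned_dir`, and `sample_weather.csv` are hypothetical, and a temporary directory stands in for the real project folder so the example runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical project root; a temporary directory stands in for the real one
project_root = Path(tempfile.mkdtemp())
cleaned_dir = project_root / 'Data' / '02_Cleaned_Data'
cleaned_dir.mkdir(parents=True)

# Write a tiny stand-in file so the example runs end to end
sample_file = cleaned_dir / 'sample_weather.csv'
sample_file.write_text(
    'date_time,temp\n'
    '2015-01-01 00:00:00,30.8\n'
    '2015-01-01 01:00:00,30.9\n'
)

# Same read_csv arguments as in the notebook, with a Path instead of a string
weather_toy = pd.read_csv(sample_file, parse_dates=['date_time'],
                          index_col='date_time')
print(weather_toy.shape)
```

Only `project_root` would need to change when the project moves to another machine.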

# Preview Data

**Weather data**

In [28]:
#View data shape and head
print('Weather data shape:', weather.shape)
display(weather.head())

Weather data shape: (35064, 1)


| date_time | temp |
|---|---|
| 2015-01-01 00:00:00 | 30.814633 |
| 2015-01-01 01:00:00 | 30.85286 |
| 2015-01-01 02:00:00 | 30.108448 |
| 2015-01-01 03:00:00 | 30.091044 |
| 2015-01-01 04:00:00 | 30.19262 |


**Energy data**

In [29]:
#View data shape and head
print('Energy data shape:', energy.shape)
display(energy.head())

Energy data shape: (35064, 2)


| date_time | total_load_actual | price_actual |
|---|---|---|
| 2015-01-01 00:00:00 | 25385.0 | 65.41 |
| 2015-01-01 01:00:00 | 24382.0 | 64.92 |
| 2015-01-01 02:00:00 | 22734.0 | 64.48 |
| 2015-01-01 03:00:00 | 21286.0 | 59.32 |
| 2015-01-01 04:00:00 | 20264.0 | 56.04 |


# Merge data

In [30]:
#Merge datasets on index
merged_df = pd.merge(energy, weather, left_index=True, right_index=True)

#Rename columns
merged_df.columns = ['load', 'price', 'temp']

#Sort merged columns alphabetically
merged_df = merged_df.reindex(sorted(merged_df.columns), axis=1)

#Check
print('Merged data frame shape: ', merged_df.shape)
display(merged_df.head())

Merged data frame shape:  (35072, 3)


| date_time | load | price | temp |
|---|---|---|---|
| 2015-01-01 00:00:00 | 25385.0 | 65.41 | 30.814633 |
| 2015-01-01 01:00:00 | 24382.0 | 64.92 | 30.85286 |
| 2015-01-01 02:00:00 | 22734.0 | 64.48 | 30.108448 |
| 2015-01-01 03:00:00 | 21286.0 | 59.32 | 30.091044 |
| 2015-01-01 04:00:00 | 20264.0 | 56.04 | 30.19262 |
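
The merged frame has 35,072 rows, although each input has 35,064. An inner merge on the index can only exceed both input sizes when index values repeat in both frames, so duplicated timestamps are one plausible cause. That is an assumption, as the notebook does not check for it. The sketch below uses small made-up frames (`energy_toy`, `weather_toy`), not the project data, to show how a repeated timestamp multiplies rows and how the `validate` argument of `pd.merge` can flag it.

```python
import pandas as pd

# Hypothetical toy data: the 02:00 timestamp appears twice in each frame
idx = pd.to_datetime(['2015-01-01 01:00', '2015-01-01 02:00', '2015-01-01 02:00'])
energy_toy = pd.DataFrame({'load': [1.0, 2.0, 3.0]}, index=idx)
weather_toy = pd.DataFrame({'temp': [10.0, 11.0, 12.0]}, index=idx)

# Inner merge on the index: the repeated key contributes 2 x 2 = 4 rows
merged_toy = pd.merge(energy_toy, weather_toy, left_index=True, right_index=True)
print('Merged toy shape:', merged_toy.shape)

# Count index entries that repeat an earlier timestamp
n_dupes = energy_toy.index.duplicated().sum()
print('Repeated timestamps in energy_toy:', n_dupes)

# validate='one_to_one' raises MergeError when merge keys are not unique
try:
    pd.merge(energy_toy, weather_toy, left_index=True, right_index=True,
             validate='one_to_one')
    keys_unique = True
except pd.errors.MergeError:
    keys_unique = False
print('Keys unique in both frames:', keys_unique)
```

Running the same `index.duplicated()` check on `weather` and `energy` would show whether this explanation applies here. Whether to drop or aggregate any repeats depends on why they occur.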


In [31]:
#Reset index so date_time is available as a column
merged_df.reset_index(inplace=True)

#Create columns extracting date, time, weekday, month, and year
merged_df['date'] = merged_df.date_time.dt.date
merged_df['time'] = merged_df.date_time.dt.time
merged_df['weekday'] = merged_df.date_time.dt.day_name()
merged_df['month'] = merged_df.date_time.dt.month_name()
merged_df['year'] = merged_df.date_time.dt.year

#Set date_time as the index again
merged_df.set_index('date_time', inplace=True)

#Check
display(merged_df.head())

| date_time | load | price | temp | date | time | weekday | month | year |
|---|---|---|---|---|---|---|---|---|
| 2015-01-01 00:00:00 | 25385.0 | 65.41 | 30.814633 | 2015-01-01 | 00:00:00 | Thursday | January | 2015 |
| 2015-01-01 01:00:00 | 24382.0 | 64.92 | 30.85286 | 2015-01-01 | 01:00:00 | Thursday | January | 2015 |
| 2015-01-01 02:00:00 | 22734.0 | 64.48 | 30.108448 | 2015-01-01 | 02:00:00 | Thursday | January | 2015 |
| 2015-01-01 03:00:00 | 21286.0 | 59.32 | 30.091044 | 2015-01-01 | 03:00:00 | Thursday | January | 2015 |
| 2015-01-01 04:00:00 | 20264.0 | 56.04 | 30.19262 | 2015-01-01 | 04:00:00 | Thursday | January | 2015 |
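
The same date and time columns can also be derived straight from a `DatetimeIndex`, without resetting and restoring the index. This is a minimal sketch of that alternative on a hypothetical toy frame (`toy`), not the notebook's own code.

```python
import pandas as pd

# Hypothetical toy frame with a DatetimeIndex named date_time
idx = pd.to_datetime(['2015-01-01 00:00', '2015-01-01 01:00',
                      '2015-01-01 02:00']).rename('date_time')
toy = pd.DataFrame({'load': [25385.0, 24382.0, 22734.0]}, index=idx)

# DatetimeIndex exposes the same attributes as the .dt accessor on a column
toy['date'] = toy.index.date
toy['time'] = toy.index.time
toy['weekday'] = toy.index.day_name()
toy['month'] = toy.index.month_name()
toy['year'] = toy.index.year

print(toy)
```

The index stays in place throughout, so no second `set_index` call is needed.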


# Write merged data to CSV

In [32]:
#View final shape of merged data
print('Final shape of merged data:', merged_df.shape)

#Write file
merged_df.to_csv(path + '/Data/02_Cleaned_Data/2020_0514_Weather_Energy.csv')

Final shape of merged data: (35072, 8)
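
CSV stores every value as text, so the derived `date` and `time` columns will not come back as date or time objects unless they are parsed when the file is read. The sketch below illustrates this with a hypothetical toy frame and an in-memory buffer standing in for the file on disk.

```python
import io

import pandas as pd

# Hypothetical toy frame mirroring part of the merged layout
idx = pd.to_datetime(['2015-01-01 00:00', '2015-01-01 01:00']).rename('date_time')
toy = pd.DataFrame({'load': [25385.0, 24382.0]}, index=idx)
toy['date'] = toy.index.date

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# Read back, parsing only the index column as datetimes
restored = pd.read_csv(buffer, parse_dates=['date_time'], index_col='date_time')

print(type(restored.index).__name__)
print(type(restored['date'].iloc[0]).__name__)
```

Adding `'date'` to `parse_dates` when reading the file back is one way to restore a datetime type for that column.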
