




# Energy Consumption Data Notebook 1: Preparing Our Data For Analysis



---

### Goals For This Notebook:

1 - Import weather and power data.<br>

2 - Do some Exploratory Data Analysis to see what information the data contains.<br>

3 - Rename, reorganize, and add columns so the data is easier to understand, including splitting timestamp data into separate date, month, and time columns.<br>

4 - Merge the data to create one dataframe.<br>

5 - Save our data into a new csv file.<br>

---

### Table of Contents

1 - [Weather Data](#section1)<br>

2 - [Power Data](#section2)<br>

3 - [Merging Data](#section3)<br>

4 - [Datetime](#section4)<br>

5 - [Saving Data](#section5)<br>

---

In this notebook, you will get to know energy consumption data through Exploratory Data Analysis. In particular, you will look at two datasets: 1) weather data collected during the study and 2) power consumption data for various appliances at the pilot site. You will clean the weather and power consumption datasets and then merge them into one dataset for further analysis in future notebooks.

Let's first get started by importing the libraries we need:

In [0]:
import pandas as pd
import numpy as np

## 1. Weather Data <a id='section1'></a>

Import "weather_data_2021.csv" using `pd.read_csv()` and save it to the variable `weather` so that we can use it throughout the notebook. The file is saved in the folder _data_, so we must include the folder name in the path to tell the computer where to find it: put the folder name before the filename, with a slash (/) in between, e.g. 'data/results.csv'.

In [0]:
# EXERCISE

weather = pd.read_csv("data/...")

Let's do some Exploratory Data Analysis! What do you think the column "Unnamed: 0" represents? Do you think you know what any of the other columns mean? How many rows and columns are there?
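As a hint about where a name like "Unnamed: 0" comes from: pandas gives that name to a column whose header cell is empty in the CSV file. The sketch below uses a tiny made-up CSV held in memory (via `io.StringIO`, so no file is needed); the column names `a` and `b` and the values are hypothetical, not taken from the weather data.

```python
import io

import pandas as pd

# Made-up CSV text: the header line starts with a comma,
# so the first header cell is blank.
csv_text = """,a,b
r1,1,2
r2,3,4
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # ['Unnamed: 0', 'a', 'b']
```

What the values in such a column actually mean depends on the dataset, so look at them in the real `weather` dataframe.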

Remember you can always refer to notebooks 06 and 07 from the intro to Python unit.

In [0]:
#Exploratory Data Analysis 1 - Try to see the first or last few rows

weather....()

In [0]:
#Exploratory Data Analysis 2 - How many rows do we have?

len(...)

In [0]:
#Exploratory Data Analysis 3 - What columns do we have?

weather....

In [0]:
#Exploratory Data Analysis 4 - Your choice!



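If you get stuck on the cells above, here is a sketch of common exploration tools applied to a small hypothetical dataframe named `toy` (the column names and values are made up, not the real weather data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataframe standing in for `weather`.
toy = pd.DataFrame({
    "oat": [45.1, 44.8, np.nan, 46.0],
    "solar": [0.0, 0.0, 12.5, 80.2],
})

print(toy.head(2))      # first 2 rows
print(toy.tail(2))      # last 2 rows
print(len(toy))         # number of rows: 4
print(toy.shape)        # (rows, columns): (4, 2)
print(toy.columns)      # the column labels
print(toy.describe())   # count, mean, std, min, quartiles, max for numeric columns
toy.info()              # column dtypes and non-null counts (prints directly)
```

Notice that `describe()` reports a count of 3 for `oat`: missing values are not counted.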
Looking at the table, some of the column names are hard to understand. Let's rename some of them.

We can rename columns by: `dataframe.rename(columns = {"current column name": "new column name", ....})`
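Here is the same pattern on a small hypothetical dataframe (the columns `x` and `y` and their new names are made up for illustration). `rename` returns a new dataframe, which is why the cells in this notebook assign the result back to the variable.

```python
import pandas as pd

# Hypothetical dataframe used only to show the rename pattern.
df = pd.DataFrame({"Unnamed: 0": ["t1", "t2"], "x": [1, 2], "y": [3, 4]})

# Columns not listed in the dictionary keep their old names.
renamed = df.rename(columns={"x": "first value", "y": "second value"})

print(renamed.columns.tolist())  # ['Unnamed: 0', 'first value', 'second value']
print(df.columns.tolist())       # ['Unnamed: 0', 'x', 'y'] (original unchanged)
```

By default, `rename` silently ignores old names that don't exist, so a typo in an old column name won't cause an error; passing `errors="raise"` makes it raise a `KeyError` instead.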

In the cell below, we rename `Unnamed: 0` to `Timestamp`.

Follow the same pattern to rename `oat` to `outdoor air temperature (F)` and `solar` to `solar irradiance on PV panels (watts/m^2)`.

In [0]:
# EXERCISE - rename your columns!

weather = weather.rename(columns = {"Unnamed: 0": "Timestamp", 
                          "oat": "...",
                         "solar": "..."})
weather.head()

Next, we need to get rid of redundant information. If you look at the dataframe, `solar irradiance on PV panels (watts/m^2)` and `sr` contain the same information.

Being a data scientist requires you to look up code! Look up (on Google) how to delete columns in pandas, then delete the `sr` column. (There are multiple ways to do this!)

***Make sure your group shows the instructor the method you found and that you successfully removed the `sr` column***
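Before deleting a column because it looks like a duplicate, you can check that two columns really hold the same values with `Series.equals`. The sketch below uses a hypothetical two-column dataframe with made-up values; with the real data you would compare the two solar columns of `weather`.

```python
import pandas as pd

# Hypothetical stand-ins for the two solar columns.
df = pd.DataFrame({
    "solar irradiance on PV panels (watts/m^2)": [0.0, 12.5, 80.2],
    "sr": [0.0, 12.5, 80.2],
})

# True only if both columns have the same values (and the same dtype);
# missing values in the same positions count as equal.
same = df["solar irradiance on PV panels (watts/m^2)"].equals(df["sr"])
print(same)  # True
```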

In [0]:
# EXERCISE - Delete the sr column. Make sure to check with an instructor before moving on!

#YOUR CODE HERE

weather.head()

In the cell below, find the number of null values in each of the columns in the weather dataframe.

*Hint: Look at section 1.6 in 07 Pandas DataFrames Notebook*

In [0]:
# EXERCISE - Find the count of null values
weather....().sum()
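To see how null counting works, here is a sketch on a hypothetical dataframe with made-up values. `isna()` (an alias of `isnull()`) marks each missing value as `True`, and summing a column of `True`/`False` values counts the `True`s.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataframe with a few missing values.
toy = pd.DataFrame({
    "oat": [45.1, np.nan, 46.0, np.nan],
    "solar": [0.0, 12.5, np.nan, 80.2],
})

per_column = toy.isna().sum()   # null count in each column
total = toy.isna().sum().sum()  # null count in the whole dataframe

print(per_column)  # oat: 2, solar: 1
print(total)       # 3
```

The second `.sum()` is one way to get a single total, which is what the power data exercise below asks for.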

## 2. Power Data <a id='section2'></a>

Import "power_2021.csv" and save it to the variable `power` so that we can use it throughout the notebook:

In [0]:
# EXERCISE

power = pd....("...")

As with the weather data, do some Exploratory Data Analysis! What do you think the column "Unnamed: 0" represents? Do you think you know what any of the other columns mean? How many rows and columns are there?

In [0]:
#Exploratory Data Analysis 1 - Try to see the first or last few rows



In [0]:
#Exploratory Data Analysis 2 - How many rows do we have?



In [0]:
#Exploratory Data Analysis 3 - What columns do we have?



In [0]:
#Exploratory Data Analysis 4 - Your choice, but try something you haven't used yet!



Looking at the table, some of these column names are also hard to understand. Let's rename them, just as we did for the weather dataframe.

In the cell below, rename `Unnamed: 0` to `Timestamp`. Type the new names exactly as shown (including the capital W in "Watts"), because later cells refer to these column names.

Rename `building` to `building total power consumption (Watts)`

Rename `freezer` to `freezer power consumption (Watts)`

Rename `ref_comp` to `refrigerator power consumption (Watts)`

Rename `ref_fan` to `refrigerator fan power consumption (Watts)`

Rename `hvac_west` to `west air conditioning power consumption (Watts)`

Rename `hvac_east` to `east air conditioning power consumption (Watts)`

In [0]:
# EXERCISE

#rename your columns here
power = power.rename(columns = {"Unnamed: 0": "Timestamp",
                                "building": "building total power consumption (Watts)",
                                "freezer": "...",
                                "ref_comp": "...",
                                "ref_fan": "...",
                                "...": "west air conditioning power consumption (Watts)",
                                "...": "east air conditioning power consumption (Watts)"
                               })
power.head()

Similar to the weather data, find the total number of null values in the power dataframe.

In [0]:
# EXERCISE - Find the count of null data
power....

## 3. Merging Data <a id='section3'></a>

In data science, we rarely work with just one dataframe. We often have multiple datasets that we want to analyze together. To do this, we need to **merge** (join) them into one dataframe, and to merge data we need a column to merge on.

The syntax for merging data tables is: 

`dataframe1.merge(dataframe2, on= "column name that the two dataframes have in common")`
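By default, `merge` performs an *inner* join: only rows whose value in the `on` column appears in both dataframes are kept. The sketch below shows this on two hypothetical dataframes (the `key`, `temp`, and `building` columns and their values are made up), and compares it with an outer join, which keeps every row from both sides.

```python
import pandas as pd

# Hypothetical dataframes that share a "key" column.
left = pd.DataFrame({"key": ["k1", "k2", "k3"], "temp": [45.1, 44.8, 46.0]})
right = pd.DataFrame({"key": ["k2", "k3", "k4"], "building": [900, 950, 1000]})

inner = left.merge(right, on="key")               # default how="inner"
outer = left.merge(right, on="key", how="outer")  # keeps k1 and k4 too, with NaN gaps

print(inner["key"].tolist())  # ['k2', 'k3']
print(len(outer))             # 4
```

Comparing the row count before and after merging is a quick way to spot values that appear in only one of the two datasets.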

Which column do both dataframes have in common? Use that column as the `on` argument when you merge them.

In [0]:
# EXERCISE - Merge the two dataframes together
weather_and_power = weather.merge(..., on="...")
weather_and_power.head()

Add a column in the weather_and_power dataframe called `total power consumption (Watts)` that is the **sum** of all the power consumption columns.  

In [0]:
# EXERCISE - add your columns
weather_and_power["total power consumption (Watts)"] = (weather_and_power["building total power consumption (Watts)"]+
                                                 weather_and_power["freezer power consumption (Watts)"]+ 
                                                 weather_and_power["refrigerator power consumption (Watts)"]+
                                                 weather_and_power["..."]+
                                                 weather_and_power["..."]+
                                                 weather_and_power["..."])
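Adding columns with `+` and summing them with `.sum(axis=1)` behave differently when a value is missing: `+` gives `NaN` for that row, while `.sum(axis=1)` skips missing values by default. The sketch below uses a hypothetical two-column dataframe (column names and values are made up); since you counted null values in the power data earlier, it is worth deciding which behavior you want.

```python
import numpy as np
import pandas as pd

# Hypothetical power columns; the second row has a missing reading.
df = pd.DataFrame({
    "a (Watts)": [100.0, 200.0],
    "b (Watts)": [10.0, np.nan],
})

with_plus = df["a (Watts)"] + df["b (Watts)"]          # NaN + number -> NaN
with_sum = df[["a (Watts)", "b (Watts)"]].sum(axis=1)  # NaN is skipped

print(with_plus.tolist())  # [110.0, nan]
print(with_sum.tolist())   # [110.0, 200.0]
```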

***Run the cell below and double check with an instructor that your `total power consumption` column's numbers are correct.***

In [0]:
weather_and_power.head()

## 4. Datetime <a id='section4'></a>

In this section, we will use datetime methods to clean up our dataframe and make it easier to understand.

In [0]:
# import the datetime module
from datetime import datetime

The values of the Timestamp column represent the dates and times when the data for each section of the building was collected. However, since these values are stored as strings (`str`), it is hard to pull out information such as the date or the hour, which we need to answer questions like which day of the week had the most energy use or which hour had the least energy consumption. So first, we will convert the column to a datetime data type with `pd.to_datetime()`.


In [0]:
weather_and_power["Better Timestamp"] = pd.to_datetime(weather_and_power["Timestamp"])
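Once a column has a datetime type, the `.dt` accessor exposes its parts. The sketch below uses two hypothetical timestamp strings (made up, not from the dataset) to show the kind of information that questions like "which day of the week used the most energy?" rely on.

```python
import pandas as pd

# Hypothetical timestamp strings, like the ones in the Timestamp column.
ts = pd.Series(["2021-03-01 13:30:00", "2021-03-02 00:15:00"])

dt = pd.to_datetime(ts)  # strings -> datetime values

print(dt.dt.day_name().tolist())  # ['Monday', 'Tuesday']
print(dt.dt.hour.tolist())        # [13, 0]
print(dt.dt.month.tolist())       # [3, 3]
```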

In [0]:
# Run this cell to get the date, time, and month into separate columns
weather_and_power["date"] = weather_and_power["Better Timestamp"].dt.date    # datetime.date objects
weather_and_power["time"] = weather_and_power["Better Timestamp"].dt.time    # datetime.time objects
weather_and_power["month"] = weather_and_power["Better Timestamp"].dt.month  # month number, 1-12


#converting 24-hour clock time to 12-hour time, e.g. 13:30 -> "1:30pm" and 00:05 -> "12:05am"
def changeformat(time):
    # time is a datetime.time object from the "time" column
    hour12 = time.hour % 12 or 12              # 0 -> 12, 13 -> 1, 12 stays 12
    suffix = "am" if time.hour < 12 else "pm"  # hours 0-11 are am, 12-23 are pm
    return f"{hour12}:{time.minute:02d}{suffix}"
    
weather_and_power["12-hr-time"] = weather_and_power["time"].apply(changeformat)
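pandas can also build a similar 12-hour string directly from a datetime column with `.dt.strftime`. The sketch below uses hypothetical timestamps; it assumes the default "C" locale, where `%p` produces `AM`/`PM`, and then strips the leading zero and lowercases the result.

```python
import pandas as pd

# Hypothetical timestamps covering afternoon, just after midnight, morning, and noon.
stamps = pd.to_datetime(pd.Series(["2021-03-01 13:30:00",
                                   "2021-03-01 00:05:00",
                                   "2021-03-01 09:00:00",
                                   "2021-03-01 12:45:00"]))

# %I = hour on a 12-hour clock (01-12), %M = minutes, %p = AM/PM
twelve_hr = (stamps.dt.strftime("%I:%M%p")
                   .str.lstrip("0")  # "01:30PM" -> "1:30PM"
                   .str.lower())     # "1:30PM" -> "1:30pm"

print(twelve_hr.tolist())  # ['1:30pm', '12:05am', '9:00am', '12:45pm']
```

Writing the conversion by hand, as in `changeformat`, keeps the logic visible; the `strftime` version is shorter but depends on format codes and locale.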

Let's see what our columns and dataframe look like now.

In [0]:
weather_and_power.columns

In [0]:
weather_and_power.head()

## 5. Saving Data <a id='section5'></a>

Let's save our dataframe. Save it as "weather_and_power.csv".

The syntax for saving a dataframe is `dataframe.to_csv("file name")`. Since we want to save it in our _data_ folder, add the folder name and a slash before the filename, just as we did when importing the data.
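By default, `to_csv` also writes the row index as an extra column with a blank header, and reading that file back produces an "Unnamed: 0" column. The sketch below shows this round trip in memory (via `io.StringIO`) on a hypothetical two-row dataframe; passing `index=False` leaves the index out. Whether to keep the index depends on what the next notebook expects.

```python
import io

import pandas as pd

# Hypothetical dataframe; values are made up.
df = pd.DataFrame({"Timestamp": ["t1", "t2"], "value": [1, 2]})

with_index = io.StringIO()
df.to_csv(with_index)            # default: the row index is written too
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

without_index = io.StringIO()
df.to_csv(without_index, index=False)  # skip the row index
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)     # ['Unnamed: 0', 'Timestamp', 'value']
print(cols_without)  # ['Timestamp', 'value']
```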

We will use this cleaned dataset in our next notebook.

In [0]:
# EXERCISE

weather_and_power.to_csv("...")

Notebook developed by: Rachel McCarty, Kseniya Usovich, Alisa Bettale