# Water and energy use correlation notebook:

### Here we will explore whether there is a relationship between water use and electricity use in our home network.

We've already exported the electricity and water data to a file in our JupyterHub's shared filesystem, which is mounted in your home directory at `~/shared/`. Our methodology for extracting the data is as follows:

To find the correlation between a home's energy use and water use, we used Blucube water data from dataport (`water_and_gas.blucube_water_data`) and 1-minute interval energy data (`electricity.eg_realpower_1min`). Blucube data consists of the cumulative device reading (in gallons), so the water usage for each interval was calculated by subtracting the previous interval's reading from the current interval's reading. After calculating this delta, only time intervals with a delta greater than 0 were kept in the dataset. This data was then joined with the energy data to find how much electricity was used in the same time intervals in which water was used in a home.
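
The extraction steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual export script: the timestamp column `localminute`, the cumulative column `reading`, and the helper names are assumptions made for the example. Only `dataid`, `water_use`, and `elec_use` appear in the exported file.

```python
import pandas as pd

def water_deltas(water: pd.DataFrame) -> pd.DataFrame:
    """Turn cumulative readings into per-interval usage and keep only intervals with use > 0.

    Assumes hypothetical columns: dataid, localminute, reading (cumulative gallons).
    """
    water = water.sort_values(["dataid", "localminute"]).copy()
    # Current reading minus the previous reading, computed separately for each home
    water["water_use"] = water.groupby("dataid")["reading"].diff()
    return water.loc[water["water_use"] > 0, ["dataid", "localminute", "water_use"]]

def join_with_electricity(water_use: pd.DataFrame, elec: pd.DataFrame) -> pd.DataFrame:
    """Keep electricity readings only for the intervals in which water was used."""
    return water_use.merge(elec, on=["dataid", "localminute"], how="inner")

# Toy data for a single home (values are made up for illustration)
minutes = pd.date_range("2020-01-01 00:00", periods=4, freq="min")
water = pd.DataFrame({
    "dataid": [1, 1, 1, 1],
    "localminute": minutes,
    "reading": [100.0, 100.0, 100.5, 102.0],
})
elec = pd.DataFrame({
    "dataid": [1, 1, 1, 1],
    "localminute": minutes,
    "elec_use": [0.3, 0.4, 1.2, 2.0],
})

joined = join_with_electricity(water_deltas(water), elec)
print(joined)
```

The first interval of each home has no previous reading, so its delta is undefined and it drops out along with the zero-use intervals.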


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import csv
import os
import sys
%matplotlib inline
print(sys.version) # prints the python version
print(sys.executable)  # prints the path to the python you're using

In [None]:
# Read the processed data. Blucube water data from dataport has been processed to calculate the delta
# water usage for each 1-minute interval. Only intervals with water usage > 0 are included.
data = pd.read_csv('~/shared/elec_water_data.csv')
homes_list = data.dataid.unique() 
homes_list

In [None]:
# Loop through list of homes and find correlation between water and electricity usage and also plot the datapoints
homes_cor = []
for home in homes_list:
    data_to_process = data.loc[(data['dataid'] == home)]
    x = data_to_process["water_use"]
    y = data_to_process["elec_use"]
    correlation = round(x.corr(y),3)
    homes_cor.append(correlation)
    print(str(home) + ' -> ' + str(correlation))
    plt.scatter(x, y, edgecolors='black')
    plt.title('Correlation for home {}'.format(home))
    plt.xlabel('Water Use')
    plt.ylabel('Energy Use')
    plt.show()

In [None]:
import statistics

# Average the per-home correlation coefficients collected in homes_cor
avg_cor = statistics.mean(homes_cor)
print("Average correlation for all homes - ", avg_cor)

## Conclusion:
### From the plots above and the calculated average correlation, we can say that water and electricity usage are *not* positively correlated.
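
As a cross-check, the per-home Pearson correlations can also be computed without an explicit loop. The sketch below assumes a frame with the same `dataid`, `water_use`, and `elec_use` columns as the exported file; the helper name `per_home_correlation` and the toy values are made up for illustration and are not results from the dataset.

```python
import pandas as pd

def per_home_correlation(data: pd.DataFrame) -> pd.Series:
    """Pearson correlation between water_use and elec_use, one value per dataid."""
    return (
        data.groupby("dataid")[["water_use", "elec_use"]]
        .apply(lambda g: g["water_use"].corr(g["elec_use"]))
        .rename("correlation")
    )

# Toy data: home 1 is perfectly correlated, home 2 is perfectly anti-correlated
toy = pd.DataFrame({
    "dataid": [1, 1, 1, 2, 2, 2],
    "water_use": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "elec_use": [2.0, 4.0, 6.0, 3.0, 2.0, 1.0],
})

cors = per_home_correlation(toy)
print(cors)
print("Average correlation for all homes - ", cors.mean())
```

Looking at the full distribution of per-home values, rather than the mean alone, helps show whether a near-zero average hides homes with strong correlations of opposite sign, as in the toy data here.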