# READ ME
## INSTRUCTIONS
1. Make a copy of this file and add your name to it, e.g. writing_data_to_yourname. (This lets several people run the same notebook against different InfluxDB databases.)
2. Create your database in InfluxDB, along with a bucket for the data to go to (mine is called API Test).
3. Scroll down to the section near the bottom named "Change this to your database" and look for the comments #Replace with your own. Change token, org, bucket, and cloud_url. (You need to generate the API token for your database in InfluxDB.)
4. In the top toolbar, go to Runtime -> Restart session and run all.
5. You will have to allow access to your Drive so the notebook can read the CSV file.
6. Find "prediction_vs_actual.csv" in this same folder and make a copy of it in your Drive.
7. Scroll down to the "load data" header, where the location of the CSV file is set. This path will have to be changed for your Drive.
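Step 3 has you paste the token, org, bucket, and URL directly into the notebook. As an optional alternative, here is a minimal sketch that reads the same four values from environment variables, so the token is not saved in the shared copy. The variable names (`INFLUX_TOKEN`, `INFLUX_ORG`, `INFLUX_BUCKET`, `INFLUX_URL`) and the helper `load_influx_settings` are hypothetical and not part of the original notebook.

```python
import os


def load_influx_settings(env=None):
    """Collect the InfluxDB connection details from environment variables.

    The variable names below are assumptions made for this sketch;
    rename them to whatever your setup uses.
    """
    env = os.environ if env is None else env
    names = {
        "token": "INFLUX_TOKEN",
        "org": "INFLUX_ORG",
        "bucket": "INFLUX_BUCKET",
        "url": "INFLUX_URL",
    }
    # Fail early with a clear message if anything is missing or empty
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Possible usage in the connection cell (assumes the variables were set beforehand):
# settings = load_influx_settings()
# token, org, bucket, cloud_url = (settings[k] for k in ("token", "org", "bucket", "url"))
```

In Colab the variables would have to be set in the session before this runs. Pasting the values into the cell as described in step 3 still works.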


The CSV file used in this code is a spreadsheet of measurements from one of the previous groups. Predictions for each appliance were made with the machine learning model and added as extra columns. In the future, we would not have the columns named Actual/predicted_appliance name. We would measure only the total power and then make predictions for each appliance.

## InfluxDB data handling integration
Instead of using a CSV as the input file, we'll use the InfluxDB API to load the data. Then we can use the machine learning model to disaggregate it and write the new data back into InfluxDB. Reference on the InfluxDB API: https://www.influxdata.com/blog/time-series-forecasting-with-tensorflow-influxdb/

In [1]:
! pip install influxdb-client

Collecting influxdb-client
  Downloading influxdb_client-1.41.0-py3-none-any.whl (744 kB)
Collecting reactivex>=4.0.4 (from influxdb-client)
  Downloading reactivex-4.0.4-py3-none-any.whl (217 kB)
Installing collected packages: reactivex, influxdb-client
Successfully installed influxdb-client-1.41.0 reactivex-4.0.4


Library dependencies

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Flatten
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

load data

In [3]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
# List of CSV files to use for training
#
# READ ME
# Click the folder icon on the left side of your screen. If Drive is not mounted yet, run the previous
# code cell (Mount Google Drive) or click the folder with the Drive icon.
# Navigate to the prediction_vs_actual.csv file, right-click it to copy its path, then replace the path below.
csv_files = ['/content/drive/MyDrive/50_ResidentialPowerDisaggregation_SD_Fall23/1.2 Software/Ralph Test Data/BetaData.csv']  # Add more file names as needed

# Load and concatenate data from multiple CSV files
data_list = []
for csv_file in csv_files:
    data = pd.read_csv(csv_file)
    data_list.append(data)

# Concatenate data from all CSV files
data = pd.concat(data_list, ignore_index=True)

This prints the original data from the CSV file

In [5]:

data

Unnamed: 0,timestamp,Total,Washer,BlowerGH,Lights,BlowerBed,CompGH,CompBed,Dryer,Recs1,Recs2,WaterHeater
0,3/18/2024 22:00,164.1,0.5,4.6,25.7,1.7,21.7,5.8,0.0,7.1,5.6,0.0
1,3/18/2024 22:01,164.3,0.5,4.6,25.6,1.7,21.7,5.8,0.0,7.1,5.6,0.0
2,3/18/2024 22:02,164.1,0.6,4.6,25.8,1.8,21.7,5.8,0.0,7.0,5.5,0.0
3,3/18/2024 22:03,164.0,0.5,4.5,25.6,1.8,21.6,5.8,0.0,7.1,5.6,0.0
4,3/18/2024 22:04,164.2,0.6,4.6,25.7,1.7,21.7,5.8,0.0,7.0,5.5,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
5771,3/22/2024 22:11,7091.1,6.4,8.9,41.7,225.5,25.2,629.4,970.1,10.1,7.3,0.0
5772,3/22/2024 22:12,1931.8,6.0,6.3,43.4,226.8,21.1,628.1,-0.1,7.3,5.8,0.0
5773,3/22/2024 22:13,1943.2,6.0,6.4,43.5,227.9,21.2,631.0,-0.1,7.3,5.8,0.0
5774,3/22/2024 22:14,7393.9,6.4,9.2,41.9,225.9,25.6,628.0,1026.7,10.3,7.5,0.0


The timestamps in the data must fall within the bucket's 30-day retention period. This was already done for the CSV file.
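The notebook does not show how the CSV timestamps were moved into the retention window. One possible approach, sketched here as an assumption rather than the method actually used, is to shift every timestamp by the same offset so that the newest reading lands at the current minute. The helper name `shift_into_retention` is hypothetical.

```python
import pandas as pd


def shift_into_retention(df, time_column="timestamp", now=None):
    """Return a copy of df whose newest timestamp is moved to `now` (UTC).

    All rows are shifted by the same offset, so the spacing between
    readings is preserved. This only helps if the whole recording is
    shorter than the retention period.
    """
    now = pd.Timestamp.now(tz="UTC") if now is None else pd.Timestamp(now)
    if now.tzinfo is None:
        now = now.tz_localize("UTC")  # treat naive input as UTC

    shifted = df.copy()
    times = pd.to_datetime(shifted[time_column], utc=True)
    offset = now.floor("min") - times.max()
    shifted[time_column] = times + offset
    return shifted


# Possible usage before writing to InfluxDB:
# data = shift_into_retention(data)
```

The sample data spans roughly four days, so a uniform shift keeps every row well inside a 30-day window.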

# Change this to your database

In [6]:
# Provide connection details

#Replace with your own
token = "91phEq0fsxrxbWsc4r8oYS5yDXnDbaWoR6-vuK052EW0_zcpINSPpTXEg0xviVEPAQWxVv6iWmZZ2NLfKsOPtQ=="
#Replace with your own
org = "NCSU Senior Design Project 50"
#Replace with your own
bucket = "Alpha Demo Data"

#Replace with your own
# InfluxDB Cloud URL
cloud_url = "https://us-east-1-1.aws.cloud2.influxdata.com/"

# Establish InfluxDB connection
client = InfluxDBClient(url=cloud_url, token=token, org=org)

# Check that the 'timestamp' column is present
if 'timestamp' not in data.columns:
    raise ValueError("Column 'timestamp' is required in the DataFrame.")

# Convert DataFrame to InfluxDB Points
points = data.apply(lambda row: Point("NILM")
                                     .field("Total", row["Total"])
                                     .field("Washer", row["Washer"])
                                     .field("Lights", row["Lights"])
                                     .field("Dryer", row["Dryer"])
                                     .field("Recs1", row["Recs1"])
                                     .field("Recs2", row["Recs2"])
                                     .field("WaterHeater", row["WaterHeater"])
                                     .time(row["timestamp"]), axis=1)

# Create a write API instance
write_api = client.write_api(write_options=SYNCHRONOUS)

# Write Points to InfluxDB
write_api.write(bucket=bucket, record=points)
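The cell above lists each field by hand and leaves out some columns (BlowerGH, BlowerBed, CompGH, CompBed). If you want every numeric column written, one option is to build dictionary-style records in a loop. This is a sketch under assumptions, not the notebook's method: `frame_to_records` is a hypothetical helper, it writes all numeric columns rather than the seven chosen above, and it relies on the client accepting dictionaries with `measurement`, `fields`, and `time` keys as records.

```python
import pandas as pd


def frame_to_records(df, measurement="NILM", time_column="timestamp"):
    """Build one dictionary-style record per row, using every numeric column as a field."""
    # Skip the index column that pandas names "Unnamed: 0" when reading the CSV
    field_columns = [
        col for col in df.select_dtypes(include="number").columns
        if col != time_column and not str(col).startswith("Unnamed")
    ]
    times = pd.to_datetime(df[time_column])

    records = []
    for ts, (_, row) in zip(times, df[field_columns].iterrows()):
        # Cast to float so each field keeps a single type; drop missing values
        fields = {col: float(row[col]) for col in field_columns if pd.notna(row[col])}
        records.append({
            "measurement": measurement,
            "fields": fields,
            "time": ts.to_pydatetime(),
        })
    return records


# Possible usage with the write_api created above:
# write_api.write(bucket=bucket, record=frame_to_records(data))
```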

Read Data

In [7]:
## query data
query_api = client.query_api()
# Query the bucket configured above instead of a hard-coded bucket name
tables = query_api.query(f'from(bucket:"{bucket}") |> range(start: -275y)')

## iterate over queried data
time, power = [], []
for table in tables:
    for row in table.records:
        time.append(row.values.get('_time'))
        power.append(row.values.get('_value'))

## create dataframe
data = pd.DataFrame({'Date':time, 'Power Data': power})
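Because several fields (Total, Washer, Lights, and so on) were written to the same measurement, the query returns one record per field and timestamp, and the 'Power Data' column above mixes them together. The sketch below shows one way to get back a table with one column per field. `records_to_wide` is a hypothetical helper, and it assumes each record's values include the `_time`, `_field`, and `_value` keys.

```python
import pandas as pd


def records_to_wide(rows):
    """Pivot per-field records into one row per timestamp and one column per field."""
    long = pd.DataFrame(
        [{"time": r["_time"], "field": r["_field"], "value": r["_value"]} for r in rows]
    )
    # If a (time, field) pair appears more than once, keep the last value
    wide = long.pivot_table(index="time", columns="field", values="value", aggfunc="last")
    wide.columns.name = None
    return wide.sort_index()


# Possible usage with the query result above:
# rows = (record.values for table in tables for record in table.records)
# data = records_to_wide(rows)
```

The resulting frame has the layout of the original CSV, which is usually what a disaggregation model expects as input.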