# Gathering Data From Home Assistant
First, the data is fetched from the Home Assistant REST API.
* We use the `logbook` API, which returns the logged activity in Home Assistant for a time window: the start time goes in the URL path and the end time in the `end_time` query parameter.
* **NOTE**: By default the logbook only covers the previous 10 days, because the recorder purges older history (`purge_keep_days` defaults to 10). This is a limitation, since we will have less data than foreseen.
    * This makes it all the more important to implement online training and update the model day by day.
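Because of this retention window, one way to build up a longer history for day-by-day training is to merge every fetch into a local file. A minimal sketch, assuming the entries are the dicts returned by the logbook API and that the tuple `(when, entity_id, state)` identifies an entry uniquely; the function name `merge_into_history` and the file name `logbook_history.jsonl` are hypothetical.

```python
import json
from pathlib import Path

def merge_into_history(entries, path="logbook_history.jsonl"):
    """Append logbook entries that are not yet stored in a local JSON Lines file.

    Assumption: (when, entity_id, state) identifies an entry uniquely.
    Returns the number of newly written entries.
    """
    path = Path(path)
    seen = set()
    if path.exists():
        with path.open(encoding="utf-8") as f:
            for line in f:
                stored = json.loads(line)
                seen.add((stored.get("when"), stored.get("entity_id"), stored.get("state")))

    new_count = 0
    with path.open("a", encoding="utf-8") as f:
        for entry in entries:
            key = (entry.get("when"), entry.get("entity_id"), entry.get("state"))
            if key in seen:
                continue  # already stored by an earlier fetch
            seen.add(key)
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            new_count += 1
    return new_count
```

Running this after each daily fetch keeps entries that the recorder later purges, so the training set can grow beyond 10 days.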

## Data Filtering 
Logbook entries cover all logged activity, not only user-triggered actions. Since the goal of this project is to build a recommender for **human-triggered** actions, the entries are filtered on their `context_domain` (the domain of the service call in whose context the entry was logged), keeping only the following Home Assistant domains:
* automation
* scene
* script
* switch
* light

More domains could be added in the future; this is just a start.

In [49]:
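import requests
from datetime import datetime, timezone

HA_URL = 'http://ha.local:8123'
TOKEN = ''  # Home Assistant long-lived access token

def get_all_logbook_entries(start_time=None, end_time=None):
    # Start time goes in the URL path; default to a date far enough back
    # to cover everything the recorder still retains
    start_time = start_time or datetime(2013, 1, 21, tzinfo=timezone.utc)
    end_time = end_time or datetime.now(timezone.utc)
    url = f'{HA_URL}/api/logbook/{start_time.isoformat()}'
    headers = {
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json"
    }
    # Pass end_time via params so requests URL-encodes the "+" of the UTC offset
    response = requests.get(url, headers=headers, params={'end_time': end_time.isoformat()}, timeout=30)
    if not response.ok:
        print(response.text)
    response.raise_for_status()  # fail loudly instead of returning None
    return response.json()

logbook_entries = get_all_logbook_entries()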

In [50]:
# Keep only entries whose originating service call belongs to one of these domains
desired_log_domains = {"automation", "scene", "script", "switch", "light"}
logbook_entries = [entry for entry in logbook_entries if entry.get('context_domain') in desired_log_domains]
logbook_entries

[{'when': '2024-07-08T00:46:25.845410+00:00',
  'state': 'on',
  'entity_id': 'light.emilys_bed_lamp',
  'name': 'Emily’s Bed Lamp',
  'context_user_id': '628cec48df964ae681e7c5a3b7461d59',
  'context_domain': 'scene',
  'context_service': 'turn_on',
  'context_event_type': 'call_service'},
 {'when': '2024-07-08T00:46:25.895722+00:00',
  'state': 'on',
  'entity_id': 'light.adrians_bed_lamp',
  'name': 'Adrian’s Bed Lamp',
  'context_user_id': '628cec48df964ae681e7c5a3b7461d59',
  'context_domain': 'scene',
  'context_service': 'turn_on',
  'context_event_type': 'call_service'},
 {'when': '2024-07-08T01:04:42.374077+00:00',
  'state': 'off',
  'entity_id': 'light.emilys_bed_lamp',
  'name': 'Emily’s Bed Lamp',
  'context_user_id': '628cec48df964ae681e7c5a3b7461d59',
  'context_domain': 'light',
  'context_service': 'turn_on',
  'context_event_type': 'call_service'},
 {'when': '2024-07-08T01:06:50.272158+00:00',
  'state': 'off',
  'entity_id': 'light.adrians_bed_lamp',
  'name': 'Adria

# Data Transformation
We are interested in the following pieces of data from the logbook entries:
1. `when` - the timestamp of the action
2. `entity_id` - the entity being acted upon
3. `context_domain`, `context_service` - what kind of service call triggered the action

The rest of the information can be discarded for now.

## Transformations
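1. Transform the `when` timestamp into minutes since midnight, since the recommender only cares about the time of day (not the date). The timestamps carry a `+00:00` offset, so as computed below this is the time of day in UTC rather than local time.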
2. Encode `entity_id`, `context_domain`, `context_service` using a `OneHotEncoder`
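Because `when` is in UTC, minutes since midnight computed directly from it drift from the household's local clock by the UTC offset. A minimal sketch of computing them in a chosen time zone instead, plus an optional sine/cosine encoding so that 23:59 and 00:00 end up next to each other. The helper names are hypothetical, the fixed UTC+2 offset is only a placeholder, and the cyclical encoding is a swapped-in technique that the cells below do not use.

```python
import math
from datetime import datetime, timedelta, timezone

MINUTES_PER_DAY = 24 * 60

def local_minutes_from_midnight(time_string, tz=None):
    """Minutes since midnight in time zone `tz` (None = this machine's local zone)."""
    d = datetime.fromisoformat(time_string).astimezone(tz)
    return d.hour * 60 + d.minute

def cyclical_time(minutes):
    """Encode minutes since midnight as (sin, cos) on a 24-hour circle."""
    angle = 2 * math.pi * minutes / MINUTES_PER_DAY
    return math.sin(angle), math.cos(angle)

# Placeholder fixed offset; for DST-aware conversion pass e.g. zoneinfo.ZoneInfo("Europe/Berlin")
utc_plus_2 = timezone(timedelta(hours=2))
print(local_minutes_from_midnight("2024-07-08T00:46:25.845410+00:00", utc_plus_2))  # 166
```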

In [51]:
import pandas as pd
from datetime import datetime

df = pd.DataFrame(logbook_entries)

def datetime_to_minutes_from_midnight(time_string):
    # `when` carries a +00:00 offset, so this is the time of day in UTC, not local time
    d = datetime.fromisoformat(time_string)
    return d.hour * 60 + d.minute

time = df['when'].apply(datetime_to_minutes_from_midnight)
time

0       46
1       46
2       64
3       66
4      737
      ... 
157    875
158    948
159    948
160    948
161    966
Name: when, Length: 162, dtype: int64

In [52]:
from sklearn.preprocessing import OneHotEncoder

# sparse_output requires scikit-learn >= 1.2
encoder = OneHotEncoder(sparse_output=False, handle_unknown='error')
columns_to_encode = ['entity_id', 'context_domain', 'context_service']
encoded_features = encoder.fit_transform(df[columns_to_encode])
# Column names of the form "<column>_<category>", in the same order as encoded_features
categorical_columns = list(encoder.get_feature_names_out(columns_to_encode))
categorical_columns

['entity_id_light.adrians_bed_lamp',
 'entity_id_light.emilys_bed_lamp',
 'entity_id_light.kitchen_counter_lights',
 'entity_id_light.tv_backlight',
 'entity_id_scene.cooking_time',
 'entity_id_scene.cozy_bedroom',
 'entity_id_scene.cozy_house',
 'entity_id_script.everything_off',
 'entity_id_script.sleep_homehub_display',
 'entity_id_script.start_coffee_machine',
 'entity_id_switch.coffee_machine',
 'entity_id_switch.sofa_lamp',
 'entity_id_switch.tv_speakers',
 'context_domain_light',
 'context_domain_scene',
 'context_domain_script',
 'context_domain_switch',
 'context_service_turn_off',
 'context_service_turn_on']

In [53]:
one_hot_features = pd.DataFrame(encoded_features, columns=categorical_columns)
one_hot_features


| | entity_id_light.adrians_bed_lamp | entity_id_light.emilys_bed_lamp | entity_id_light.kitchen_counter_lights | entity_id_light.tv_backlight | entity_id_scene.cooking_time | entity_id_scene.cozy_bedroom | entity_id_scene.cozy_house | entity_id_script.everything_off | entity_id_script.sleep_homehub_display | entity_id_script.start_coffee_machine | entity_id_switch.coffee_machine | entity_id_switch.sofa_lamp | entity_id_switch.tv_speakers | context_domain_light | context_domain_scene | context_domain_script | context_domain_switch | context_service_turn_off | context_service_turn_on |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 157 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 158 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 159 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 160 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |


In [54]:
df = pd.concat([time.to_frame('time'), one_hot_features], axis=1)
df.head()

| | time | entity_id_light.adrians_bed_lamp | entity_id_light.emilys_bed_lamp | entity_id_light.kitchen_counter_lights | entity_id_light.tv_backlight | entity_id_scene.cooking_time | entity_id_scene.cozy_bedroom | entity_id_scene.cozy_house | entity_id_script.everything_off | entity_id_script.sleep_homehub_display | entity_id_script.start_coffee_machine | entity_id_switch.coffee_machine | entity_id_switch.sofa_lamp | entity_id_switch.tv_speakers | context_domain_light | context_domain_scene | context_domain_script | context_domain_switch | context_service_turn_off | context_service_turn_on |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 46 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 46 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 64 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 66 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 737 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |


In [55]:
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split the dataset into features (time of day) and labels (one-hot action columns)
X = df[['time']].values
y = df.drop(columns=['time']).values

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize the time column; fit the scaler on the training split only so no test statistics leak in
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)

In [56]:
from torch.utils.data import DataLoader, TensorDataset

# Create TensorDatasets
train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)

# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)


In [57]:
import torch.nn as nn

class TransformerModel(nn.Module):
    def __init__(self, input_dim, output_dim, num_heads, num_layers, dropout=0.1):
        super().__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        # NOTE: d_model = input_dim = 1 here, so every internal LayerNorm outputs only its bias
        self.transformer = nn.Transformer(
            d_model=input_dim,
            nhead=num_heads,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dropout=dropout,
        )
        self.fc_out = nn.Linear(input_dim, output_dim)

    def forward(self, src):
        # (batch, 1) -> (batch, 1, 1). nn.Transformer defaults to batch_first=False, so this is read
        # as (seq_len=batch, batch=1, d_model=1): attention runs across the samples of the batch
        src = src.unsqueeze(1)
        out = self.transformer(src, src)
        out = self.fc_out(out.squeeze(1))  # (batch, 1, 1) -> (batch, 1) -> (batch, output_dim)
        return out

input_dim = X_train.shape[1]
output_dim = y_train.shape[1]
model = TransformerModel(input_dim=input_dim, output_dim=output_dim, num_heads=1, num_layers=2)
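As written, this model cannot use the time at all. With `d_model=1`, each `LayerNorm` inside `nn.Transformer` normalizes a single number; subtracting its own mean always gives 0, so every norm outputs just its learned bias and the prediction is identical for every input. A minimal sketch of one way around this, assuming a learned linear projection of the scaled time into a wider embedding (`d_model=32` and `num_heads=4` are arbitrary choices) and an encoder-only stack with `batch_first=True`. This is a modification for illustration, not the architecture that produced the results below.

```python
import torch
import torch.nn as nn

class TimeTransformer(nn.Module):
    """Encoder-only Transformer over a length-1 sequence holding the scaled time."""

    def __init__(self, input_dim, output_dim, d_model=32, num_heads=4, num_layers=2, dropout=0.1):
        super().__init__()
        # Project the scalar time into a d_model-wide embedding so each LayerNorm sees several values
        self.embed = nn.Linear(input_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=num_heads,
            dim_feedforward=4 * d_model,
            dropout=dropout,
            batch_first=True,  # tensors are (batch, seq_len, d_model)
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.fc_out = nn.Linear(d_model, output_dim)

    def forward(self, x):
        h = self.embed(x).unsqueeze(1)  # (batch, input_dim) -> (batch, 1, d_model)
        h = self.encoder(h)
        return self.fc_out(h.squeeze(1))  # (batch, d_model) -> (batch, output_dim)

torch.manual_seed(0)
model = TimeTransformer(input_dim=1, output_dim=19)  # 19 label columns, as encoded above
model.eval()
with torch.no_grad():
    out = model(torch.tensor([[-1.0], [1.0]]))  # two different scaled times
print(out.shape)  # torch.Size([2, 19])
```

With a single-token sequence, self-attention can only attend to that one token, so this behaves much like a residual MLP; attention would only start to matter if the input were a sequence, for example the last few actions.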



In [58]:
import torch.optim as optim

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 20
model.train()
for epoch in range(num_epochs):
    for X_batch, y_batch in train_loader:
        optimizer.zero_grad()
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()
    
    # Loss of the last mini-batch in this epoch (not an epoch average), hence the noisy values
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')


Epoch [1/20], Loss: 0.4491
Epoch [2/20], Loss: 0.6020
Epoch [3/20], Loss: 0.5942
Epoch [4/20], Loss: 0.5301
Epoch [5/20], Loss: 0.4279
Epoch [6/20], Loss: 0.4354
Epoch [7/20], Loss: 0.4996
Epoch [8/20], Loss: 0.4124
Epoch [9/20], Loss: 0.5578
Epoch [10/20], Loss: 0.4033
Epoch [11/20], Loss: 0.5458
Epoch [12/20], Loss: 0.5949
Epoch [13/20], Loss: 0.3868
Epoch [14/20], Loss: 0.4796
Epoch [15/20], Loss: 0.5219
Epoch [16/20], Loss: 0.5158
Epoch [17/20], Loss: 0.4691
Epoch [18/20], Loss: 0.5035
Epoch [19/20], Loss: 0.3644
Epoch [20/20], Loss: 0.3542
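The targets concatenate three separate one-hot groups (13 `entity_id` columns, 4 `context_domain` columns, 2 `context_service` columns), and `MSELoss` treats them as 19 independent regression targets, so the outputs are unconstrained scores rather than probabilities (note the negative values in the prediction further down). A minimal sketch of an alternative, assuming the model's 19 outputs are read as logits: split them into the three groups, recover class indices from the one-hot targets with `argmax`, and sum one `nn.CrossEntropyLoss` per group. The group sizes come from the encoded columns above; `grouped_cross_entropy` is a hypothetical helper.

```python
import torch
import torch.nn as nn

# Widths of the one-hot groups, in column order: entity_id, context_domain, context_service
GROUP_SIZES = [13, 4, 2]
cross_entropy = nn.CrossEntropyLoss()

def grouped_cross_entropy(logits, one_hot_targets, group_sizes=GROUP_SIZES):
    """Sum of per-group cross-entropy losses for concatenated one-hot targets."""
    loss = logits.new_zeros(())
    for logit_block, target_block in zip(
        torch.split(logits, group_sizes, dim=1),
        torch.split(one_hot_targets, group_sizes, dim=1),
    ):
        # argmax turns each one-hot row back into a class index
        loss = loss + cross_entropy(logit_block, target_block.argmax(dim=1))
    return loss

# Example: a batch of 2 with random logits and valid one-hot targets
logits = torch.randn(2, sum(GROUP_SIZES))
targets = torch.zeros(2, sum(GROUP_SIZES))
targets[0, [1, 13 + 1, 17 + 1]] = 1.0  # emilys_bed_lamp, scene, turn_on
targets[1, [9, 13 + 2, 17 + 1]] = 1.0  # start_coffee_machine, script, turn_on
print(grouped_cross_entropy(logits, targets))
```

In the training loop this would replace `criterion(outputs, y_batch)`; at inference, a softmax over each group gives per-group probabilities.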


In [59]:
model.eval()
with torch.no_grad():
    total_loss = 0
    for X_batch, y_batch in test_loader:
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        total_loss += loss.item()
    
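    # Mean of the per-batch MSE values on the held-out test split (the last batch is smaller,
    # so this is close to, but not exactly, the per-sample MSE)
    avg_loss = total_loss / len(test_loader)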
    print(f'Validation Loss: {avg_loss:.4f}')


Validation Loss: 0.4283
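A useful sanity check for this number is a baseline that ignores the time entirely and always predicts the per-column mean of the training targets. A minimal sketch with NumPy, assuming `y_train` and `y_test` are the one-hot target arrays from the split above; `mean_baseline_mse` is a hypothetical helper. It computes the MSE over the whole test set rather than averaging per-batch values, so the two numbers are comparable but not identical. If the model's validation loss is not clearly below the baseline, that suggests it has not learned anything time-dependent.

```python
import numpy as np

def mean_baseline_mse(y_train, y_test):
    """MSE of always predicting the per-column mean of the training targets."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    prediction = y_train.mean(axis=0)  # one constant score per label column
    return float(np.mean((y_test - prediction) ** 2))  # broadcast over test rows

# Example with toy one-hot targets (3 classes)
y_tr = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_te = np.array([[1, 0, 0], [0, 1, 0]])
print(mean_baseline_mse(y_tr, y_te))
```

With the tensors above it would be called as `mean_baseline_mse(y_train.numpy(), y_test.numpy())`.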


In [60]:
label_columns = df.columns.drop('time')  # every column except the input time

# Example input time: 400 minutes after midnight (06:40 on the same UTC-based clock used for training)
test_time = [[400]]  # Replace with the actual input time you want to test

# Preprocess the input
test_time = scaler.transform(test_time)  # Normalize the input time
test_time_tensor = torch.tensor(test_time, dtype=torch.float32)  # Convert to tensor

# Set the model to evaluation mode
model.eval()

# Disable gradient calculation for inference
with torch.no_grad():
    # Make a prediction
    output = model(test_time_tensor)
    
# Convert the output tensor to numpy array if needed
predicted_labels = output.numpy()

# Map the predicted labels to column names
predicted_labels_dict = {label_columns[i]: predicted_labels[0][i] for i in range(len(label_columns))}

print("Predicted Labels with Column Titles:")
for key, value in predicted_labels_dict.items():
    print(f"{key}: {value}")


Predicted Labels with Column Titles:
entity_id_light.adrians_bed_lamp: -0.07591970264911652
entity_id_light.emilys_bed_lamp: -0.07608328759670258
entity_id_light.kitchen_counter_lights: -0.11741209775209427
entity_id_light.tv_backlight: -0.14501360058784485
entity_id_scene.cooking_time: 0.49363160133361816
entity_id_scene.cozy_bedroom: -0.45383745431900024
entity_id_scene.cozy_house: -0.762355625629425
entity_id_script.everything_off: -0.4139837324619293
entity_id_script.sleep_homehub_display: 0.4847814738750458
entity_id_script.start_coffee_machine: -0.8151819705963135
entity_id_switch.coffee_machine: 0.5204135179519653
entity_id_switch.sofa_lamp: -0.33562326431274414
entity_id_switch.tv_speakers: 0.4947102665901184
context_domain_light: -0.165212482213974
context_domain_scene: 0.4678438901901245
context_domain_script: -0.11785119026899338
context_domain_switch: -0.6373335719108582
context_service_turn_off: -0.7553021311759949
context_service_turn_on: -0.19694578647613525
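These raw outputs are unconstrained scores (several are negative), so they are easier to use as recommendations after restricting them to the `entity_id_` columns and ranking them. A minimal sketch, assuming a dict shaped like `predicted_labels_dict` above; `top_k_entities` is a hypothetical helper and `k=3` an arbitrary default.

```python
def top_k_entities(scores, k=3, prefix="entity_id_"):
    """Return the k entity_ids with the highest predicted score, best first."""
    entity_scores = {
        name[len(prefix):]: float(score)
        for name, score in scores.items()
        if name.startswith(prefix)  # ignore context_domain_/context_service_ columns
    }
    return sorted(entity_scores.items(), key=lambda item: item[1], reverse=True)[:k]

# Example using a few of the scores printed above (rounded)
example = {
    "entity_id_scene.cooking_time": 0.4936,
    "entity_id_switch.coffee_machine": 0.5204,
    "entity_id_switch.tv_speakers": 0.4947,
    "entity_id_script.start_coffee_machine": -0.8152,
    "context_domain_scene": 0.4678,
}
print(top_k_entities(example))
```

Applied to the notebook's output, this would be `top_k_entities(predicted_labels_dict)`.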
