# Exploring your Home Assistant data

This page helps you get familiar with the data in your Home Assistant instance. The page you're reading right now is a Jupyter Notebook: a document that combines instructions with embedded Python code that generates graphs and tables of your data. It's interactive, so at any time you can change the code of any example and press the ▶️ button to rerun it with your changes.

To get started, let's execute all examples on this page: in the menu at the top left, click on "Run" -> "Run All Cells".

In [1]:
#!pip install HASS-data-detective  # Uncomment to install detective

In [2]:
!pip show HASS-data-detective

Name: HASS-data-detective
Version: 2.4
Summary: Tools for studying Home Assistant data.
Home-page: https://github.com/robmarkcole/HASS-data-detective
Author: Robin Cole
Author-email: robmarkcole@gmail.com
License: MIT
Location: /usr/local/lib/python3.9/dist-packages
Requires: pandas, pytz, ruamel.yaml, SQLAlchemy
Required-by: 


In [3]:
import detective.core as detective
import detective.functions as functions
import pandas as pd

db = detective.db_from_hass_config()

YAML tag !include_dir_merge_list is not supported
YAML tag !include_dir_merge_named is not supported
Successfully connected to database sqlite:////config/home-assistant_v2.db
There are 296 entities with data


In [None]:
from collections import Counter, OrderedDict
import json

from detective.time import time_category, sqlalch_datetime, localize, TIME_CATEGORIES

# Prepare a dictionary to track results
results = OrderedDict((time_cat, Counter()) for time_cat in TIME_CATEGORIES)

# We keep track of contexts that we processed so that we will only process
# the first service call in a context, and not subsequent calls.
context_processed = set()

for event in db.perform_query("SELECT * FROM events WHERE event_type = 'call_service' ORDER BY time_fired"):
    entity_ids = None

    # Skip if we have already processed an event that was part of this context
    if event.context_id in context_processed:
        continue

    try:
        event_data = json.loads(event.event_data)
    except ValueError:
        continue

    # Empty event data, skipping (shouldn't happen, but to be safe)
    if not event_data:
        continue

    service_data = event_data.get('service_data')

    # No service data found, skipping
    if not service_data:
        continue

    entity_ids = service_data.get('entity_id')

    # No entity IDs found, skip this event
    if entity_ids is None:
        continue

    if not isinstance(entity_ids, list):
        entity_ids = [entity_ids]

    context_processed.add(event.context_id)

    period = time_category(
        localize(sqlalch_datetime(event.time_fired)))

    for entity_id in entity_ids:
        results[period][entity_id] += 1

print("Most popular entities to interact with:")

RESULTS_TO_SHOW = 5

for period, period_results in results.items():
    print()
    
    entities = [
        ent_id for (ent_id, count)
        in period_results.most_common(RESULTS_TO_SHOW)
    ]
    
    result = ', '.join(entities) if entities else '-'
    print(f"{period.capitalize()}: {result}")
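
The loop above boils down to two ideas: count each entity only for the first service call in a context, and bucket the counts by time of day. The following sketch reproduces that logic on a few made-up events, without a database. The event tuples, the `bucket_for_hour` helper, and its hour boundaries are assumptions for illustration; they are not detective's `time_category` implementation.

```python
from collections import Counter, OrderedDict

# Hypothetical time buckets; detective's TIME_CATEGORIES may use other boundaries.
BUCKETS = ("morning", "daytime", "evening", "night")


def bucket_for_hour(hour):
    """Map an hour (0-23) to an illustrative time-of-day bucket."""
    if 5 <= hour < 9:
        return "morning"
    if 9 <= hour < 17:
        return "daytime"
    if 17 <= hour < 23:
        return "evening"
    return "night"


# Made-up events: (context_id, hour_fired, entity_id or list of entity_ids)
events = [
    ("ctx-1", 7, "light.kitchen"),
    ("ctx-1", 7, "light.kitchen"),  # same context as the previous event: ignored
    ("ctx-2", 19, ["light.lamp", "light.kitchen"]),
    ("ctx-3", 20, "light.lamp"),
    ("ctx-4", 2, None),  # no entity_id: skipped
]

counts = OrderedDict((bucket, Counter()) for bucket in BUCKETS)
seen_contexts = set()

for context_id, hour, entity_ids in events:
    # Skip repeated contexts and events without any entity
    if context_id in seen_contexts or entity_ids is None:
        continue
    if not isinstance(entity_ids, list):
        entity_ids = [entity_ids]
    seen_contexts.add(context_id)
    for entity_id in entity_ids:
        counts[bucket_for_hour(hour)][entity_id] += 1

for bucket, counter in counts.items():
    top = ", ".join(ent for ent, _ in counter.most_common(5)) or "-"
    print(f"{bucket.capitalize()}: {top}")
```

Keeping one `Counter` per bucket means `most_common` can rank the entities of each period independently.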

In [None]:
db.fetch_all_data_of(("person.charles",))

### Next up

Let's now use pandas to show the results as a table.

In [5]:
df = pd.DataFrame.from_dict(results).fillna(0)
df

| | morning | daytime | evening | night |
|---|---|---|---|---|
| light.bedroom_1 | 1553.0 | 2712.0 | 6082.0 | 1729.0 |
| light.bedroom_2 | 1685.0 | 3786.0 | 5623.0 | 1236.0 |
| light.bedroom_3 | 1563.0 | 2717.0 | 5555.0 | 1237.0 |
| light.lamp | 1416.0 | 9588.0 | 8881.0 | 1815.0 |
| scene.all_lights_off | 37.0 | 28.0 | 10.0 | 2.0 |
| light.office_lamp | 1031.0 | 6095.0 | 2721.0 | 575.0 |
| light.sitting_room | 990.0 | 5373.0 | 9578.0 | 1300.0 |
| light.sitting_room_lamp | 1375.0 | 6159.0 | 10034.0 | 1534.0 |
| light.kitchen | 1014.0 | 2991.0 | 3737.0 | 190.0 |
| media_player.living_room_tv | 6.0 | 83.0 | 56.0 | 15.0 |
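
To see why `fillna(0)` is used, consider how pandas builds a DataFrame from a dictionary of counters: each key becomes a column, each counted entity becomes a row, and an entity that is missing from a period gets `NaN`. A small sketch with made-up counts (the `sample_results` name and its values are illustrative only):

```python
from collections import Counter

import pandas as pd

# Made-up counts, shaped like the `results` dictionary built earlier
sample_results = {
    "morning": Counter({"light.kitchen": 3}),
    "evening": Counter({"light.kitchen": 5, "light.lamp": 2}),
}

raw = pd.DataFrame.from_dict(sample_results)
filled = raw.fillna(0)

print(raw)     # light.lamp has NaN in the morning column
print(filled)  # the missing count is replaced by 0
```

The `NaN` values also explain why the counts are displayed as floats rather than integers.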


## View states
Detective makes it easy to view your state data as a pandas DataFrame.

In [None]:
%%time

df = db.fetch_all_data_of(['sensor.dishwasher_energy_power'])

Our data is now in a pandas DataFrame. Let's show its first few rows:

In [None]:
df.head()

The data needs some formatting before we can plot it, and detective provides several functions to help. Familiarise yourself with these functions, then write your own as needed.

In [None]:
df = functions.generate_features(df)
df = functions.format_dataframe(df)
df = df.set_index('last_changed')
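
As a rough idea of what this kind of formatting involves, the sketch below does it with plain pandas on made-up rows. It is not detective's implementation: the sample data is hypothetical, and the column names `state` and `last_changed` are simply taken from the cells in this notebook. Home Assistant stores states as text, so the values are coerced to numbers and rows that cannot be converted (such as `unavailable`) are dropped.

```python
import pandas as pd

# Made-up rows standing in for raw state data
raw = pd.DataFrame({
    "last_changed": [
        "2021-03-01 08:00:00",
        "2021-03-01 08:05:00",
        "2021-03-01 08:10:00",
    ],
    "state": ["0.4", "unavailable", "12.5"],
})

formatted = raw.copy()
formatted["last_changed"] = pd.to_datetime(formatted["last_changed"])
# Non-numeric states become NaN and are then dropped
formatted["state"] = pd.to_numeric(formatted["state"], errors="coerce")
formatted = formatted.dropna(subset=["state"]).set_index("last_changed")

print(formatted)
```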

In [None]:
df.head()

Notice the new feature columns that were added. It is straightforward to create your own features, for example a `day_of_week` column.
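
A `day_of_week` column could, for instance, be derived from a datetime index as sketched below. The sample frame is made up, and the extra `is_weekend` column is only there to show a second derived feature. With the notebook's own data the same lines would apply to `df`, provided its index holds datetimes, which is an assumption about what the formatting step returns.

```python
import pandas as pd

# Made-up readings indexed by timestamp, mimicking the formatted dataframe
sample = pd.DataFrame(
    {"state": [0.4, 12.5, 3.1]},
    index=pd.to_datetime(
        ["2021-03-01 08:00", "2021-03-02 09:30", "2021-03-06 21:15"]
    ),
)
sample.index.name = "last_changed"

# Monday is 0 and Sunday is 6
sample["day_of_week"] = sample.index.dayofweek
sample["is_weekend"] = sample["day_of_week"] >= 5

print(sample)
```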

In [None]:
df['state'].plot()

## Plot some data
First, plot using [Seaborn](https://seaborn.pydata.org/).

In [None]:
#!pip install seaborn # Uncomment to install if required

import seaborn as sns
import matplotlib.pyplot as plt

sns.set()

In [None]:
fig, ax = plt.subplots(1, figsize=(20, 6))
sns.lineplot(data=df['state'], ax=ax)

In [None]:
# !pip install pandas-bokeh

In [None]:
from bokeh.plotting import figure
import pandas_bokeh
pandas_bokeh.output_notebook()

In [None]:
df.plot_bokeh()

In [None]:
df['mode'] = ''

In [None]:
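# Label each reading by power level:
# below 1 is 'off', from 1 up to 10 is 'standby', 10 and above is 'running'
df.loc[df['state'] < 1, 'mode'] = 'off'
df.loc[(1 <= df['state']) & (df['state'] < 10), 'mode'] = 'standby'
df.loc[(df['state'] >= 10), 'mode'] = 'running'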
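
The three threshold assignments above can also be written as a single binning step. A sketch using `pd.cut` on made-up readings; the thresholds 1 and 10 are taken from the cell above, while the `readings` series is hypothetical:

```python
import pandas as pd

# Made-up power readings
readings = pd.Series([0.0, 0.9, 1.0, 5.5, 10.0, 1500.0], name="state")

modes = pd.cut(
    readings,
    bins=[-float("inf"), 1, 10, float("inf")],
    labels=["off", "standby", "running"],
    right=False,  # intervals are [low, high), matching `< 1` and `< 10`
)

print(modes.tolist())
```

Note that `pd.cut` returns a categorical series rather than plain strings, which may or may not suit later steps.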

In [None]:
df.loc[df['mode'] == 'off', 'state'].head()

In [None]:
p = pd.DataFrame({
    'off' : df.loc[df['mode'] == 'off', 'state'],
    'running' : df.loc[df['mode'] == 'running', 'state'],
    'standby' : df.loc[df['mode'] == 'standby', 'state'],
})
p.plot_bokeh()

In [None]:
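# Replace each reading with a fixed level per mode (off=0, standby=1, running=2)
# so the scatter plot shows one horizontal band per mode
p.loc[~p['off'].isna(), 'off'] = 0
p.loc[~p['standby'].isna(), 'standby'] = 1
p.loc[~p['running'].isna(), 'running'] = 2
fig, ax = plt.subplots(1, figsize=(20, 6))
sns.scatterplot(data=p, ax=ax)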