# Synthetic Areas & Devices

## Model Client

In [33]:
import openai

from home_assistant_datasets.secrets import get_secret
from home_assistant_datasets import model_client

MODEL_ID = "gpt-3.5-turbo-0125"

# Use a separate name so the `openai` module is not shadowed by the client instance.
client = openai.OpenAI(api_key=get_secret("openai_key"))
model = model_client.ModelClient(client, MODEL_ID)

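You can iterate on the prompt and the parsing logic without spending API calls by substituting a stub for `model`. The following is a hypothetical sketch. `FakeModelClient` is not part of `home_assistant_datasets`, and it assumes only the call shape used in the generation cell, `model.complete(AREA_DEVICES_PROMPT, batch_yaml)`, which returns a string.

```python
class FakeModelClient:
    """Hypothetical offline stand-in for model_client.ModelClient.

    Assumes only the call shape used in this notebook:
    complete(system_prompt, user_input) -> str.
    """

    def __init__(self, canned_responses):
        # Responses are returned in order, one per call.
        self._responses = list(canned_responses)
        self.calls = []

    def complete(self, prompt: str, user_input: str) -> str:
        # Record each call so a test can inspect what would have been sent.
        self.calls.append((prompt, user_input))
        if not self._responses:
            raise RuntimeError("FakeModelClient ran out of canned responses")
        return self._responses.pop(0)


fake = FakeModelClient(["Output:\n- name: Home1\n", "- name: Home2\n"])
print(fake.complete("system prompt", "---\n- name: Home1\n"))
```

Assigning `model = fake` before running the generation cell exercises the retry and parsing code with no network access.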
In [21]:
AREA_DEVICES_PROMPT = """
You are generating synthetic data that will be used to train models for Home Assistant
and to evaluate tasks such as generating a summary, performing home automation
actions, or generating other synthetic data.

You use your knowledge about the world to generate details about homes that
can be used for synthetic smart home automation data. For example, an apartment
may have a smart thermostat, a house may have a smart garage door opener or
smart lock and camera, and any home may have smart lights, a weather feed,
an air quality sensor, a smart speaker, or a television. The needs of a homeowner
may vary depending on whether they are a single person or a family, and on where
in the world they live. For example, a person living in a high-rise may not have
a backyard. It helps to think step by step when generating the data.

Example input:
---
- name: Home1
  country_code: "US"
  location: "Suburban area in California"
  type: "Single-family house"
  amenities:
    - 3 bedrooms
    - 2 bathrooms
    - Living room, dining room, and kitchen
    - Backyard with a patio
    - Attached garage
    - Home office
- name: Modern City Apartment
  country_code: DE
  location: Urban area in Berlin
  type: Apartment
  amenities:
  - 1 bedroom
  - 1 bathroom
  - Open-concept living room and kitchen
  - Balcony with city views
  - Underground parking
  - Gym in the building

Example output:
---
- name: Home1
  country_code: "US"
  location: "Suburban area in California"
  type: "Single-family house"
  thoughts:
  - There are 3 bedrooms, so there may be multiple people living in the house.
  - The house has a backyard and a patio, so there may be a smart light.
  - The house has a home office, so there may be a smart computer and a cover.
  - A home likely has only a single thermostat, and it may be in an accessible place such as the kitchen.
  area_devices:
    "Kitchen": ["light", "thermostat"]
    "Living Room": ["light", "speaker", "smart_tv"]
    "Office": ["light", "computer", "cover"]
    "Backyard": ["light", "camera"]
    "Garage": ["cover", "light"]
    "Dining Room": ["light"]
    "Master Bedroom": ["light", "smart_tv"]
    "Front yard": ["light"]
  other_devices: ["laptop", "iphone", "iphone 2", "router", "tesla"]
- name: Modern City Apartment
  country_code: DE
  location: Urban area in Berlin
  type: Apartment
  thoughts:
    - The apartment has 1 bedroom, so it may be occupied by a single person or a couple.
    - The balcony with city views suggests a good spot for a smart light to enjoy the views.
    - The tenant probably does not own the building and cannot install smart devices in common areas.
  desc: Apartment in urban area in Berlin, Germany
  area_devices:
    "Living Room": ["light", "smart_tv"]
    "Bedroom": ["light"]
    "Balcony": ["light"]
  other_devices: ["laptop", "smartphone", "tablet", "smartwatch"]
"""

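The example output only loosely constrains what the model returns, so it can help to check each parsed record against the shape the examples imply before writing it to the dataset. This is a sketch: `validate_home` is a hypothetical helper, and the expected keys (`name`, `area_devices` as a mapping from area name to a list of device types, `other_devices` as a list) are inferred from the example output rather than from a formal schema. It operates on already-parsed Python objects, such as the result of `yaml.safe_load`.

```python
def validate_home(record: dict) -> list[str]:
    """Return a list of problems with one generated home record (empty if OK).

    The expected shape is inferred from the example output in AREA_DEVICES_PROMPT.
    """
    problems = []
    if not isinstance(record.get("name"), str):
        problems.append("missing or non-string 'name'")
    area_devices = record.get("area_devices")
    if not isinstance(area_devices, dict) or not area_devices:
        problems.append("'area_devices' must be a non-empty mapping")
    else:
        for area, devices in area_devices.items():
            if not isinstance(area, str):
                problems.append(f"area name {area!r} is not a string")
            # Each area should map to a list of device-type strings.
            if not isinstance(devices, list) or not all(isinstance(d, str) for d in devices):
                problems.append(f"devices for {area!r} must be a list of strings")
    # other_devices is optional, but must be a list when present.
    if not isinstance(record.get("other_devices", []), list):
        problems.append("'other_devices' must be a list")
    return problems


example = {
    "name": "Modern City Apartment",
    "area_devices": {"Living Room": ["light", "smart_tv"], "Balcony": ["light"]},
    "other_devices": ["laptop", "smartphone"],
}
print(validate_home(example))  # []
print(validate_home({"name": "Broken", "area_devices": {"Kitchen": "light"}}))
```

Records with a non-empty problem list could be counted as skipped instead of being written out.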
In [39]:
from tqdm.auto import tqdm
import itertools

import yaml
import pathlib
import random

N_BATCHES = 1
BATCH_SIZE = 5
DATASET_DIR = pathlib.Path("../datasets/")
AREA_DEVICES_YAML = DATASET_DIR / "area-devices.yaml"

def batched(iterable, n):
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while batch := tuple(itertools.islice(it, n)):
        yield batch

with open(DATASET_DIR / "homes.yaml", "r") as f:
    content = f.read()

data = yaml.safe_load(content)
homes = data["homes"]
random.shuffle(homes)

batches = list(batched(homes, BATCH_SIZE))
batches = batches[:N_BATCHES]

skipped = 0
with open(AREA_DEVICES_YAML, "w") as device_output:
    # The last batch may be smaller than BATCH_SIZE, so count actual homes.
    with tqdm(total=sum(len(batch) for batch in batches)) as pbar:
        for batch in batches:
            # Convert the tuple to a list; yaml.dump would otherwise emit a !!python/tuple tag.
            batch_yaml = yaml.dump(list(batch), explicit_start=True, sort_keys=False)
            response_obj = None
            for _ in range(3):  # retry up to 3 times on unparseable output
                response = model.complete(AREA_DEVICES_PROMPT, batch_yaml)
                # Strip an optional "Output:" label before parsing the YAML.
                if response.startswith("Output:"):
                    response = response[len("Output:"):]
                try:
                    response_obj = yaml.safe_load(response)
                    break
                except yaml.YAMLError:
                    continue
            if response_obj is None:
                skipped += len(batch)
            else:
                device_output.write(yaml.dump(response_obj, explicit_start=True, sort_keys=False))
            pbar.set_description(f"Skipped {skipped}")
            pbar.update(len(batch))


Skipped 0: 100%|██████████| 5/5 [00:09<00:00,  1.80s/it]
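The `batched` helper in the cell above mirrors `itertools.batched`, which the standard library provides from Python 3.12 onward. A version-tolerant import might look like the following sketch. Note that the final batch can be shorter than `n`, so progress is more accurately counted with `len(batch)` than with a fixed `BATCH_SIZE`.

```python
import itertools

try:
    # Python 3.12+ ships an equivalent itertools.batched that yields tuples.
    from itertools import batched
except ImportError:
    def batched(iterable, n):
        # batched('ABCDEFG', 3) --> ABC DEF G
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while batch := tuple(itertools.islice(it, n)):
            yield batch


print(list(batched("ABCDEFG", 3)))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```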

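The retry logic in the generation cell can be factored into a small, testable function. This sketch is not from the original notebook. `clean_response` and `complete_with_retries` are hypothetical names, and the fence-stripping step is an assumption about how chat models sometimes wrap their output. The parser and its exception types are passed in so the example runs with the standard library's `json` module. In the notebook you would pass `yaml.safe_load` with `errors=(yaml.YAMLError,)` and `lambda: model.complete(AREA_DEVICES_PROMPT, batch_yaml)`.

```python
import json
from typing import Any, Callable


def clean_response(text: str) -> str:
    """Strip wrappers a chat model may add around structured output (heuristic)."""
    text = text.strip()
    # Drop a leading "Output:" label, mirroring the check in the generation loop.
    if text.startswith("Output:"):
        text = text[len("Output:"):].strip()
    # Drop a surrounding Markdown code fence such as ```yaml ... ```.
    if text.startswith("```"):
        lines = text.splitlines()[1:]  # skip the opening fence and language tag
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        text = "\n".join(lines)
    return text


def complete_with_retries(
    complete: Callable[[], str],
    parse: Callable[[str], Any],
    attempts: int = 3,
    errors: tuple = (ValueError,),
) -> Any:
    """Call `complete` up to `attempts` times; return the first successful parse, else None."""
    for _ in range(attempts):
        try:
            return parse(clean_response(complete()))
        except errors:
            continue  # unparseable output: ask the model again
    return None


responses = iter(["not json", 'Output:\n```json\n{"ok": true}\n```'])
result = complete_with_retries(lambda: next(responses), json.loads)
print(result)  # {'ok': True}
```

Keeping the parser injectable lets the same loop be unit-tested with canned responses, separately from the model call.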

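After parsing, a quick tally of device types shows whether the model leans heavily on a few devices; both examples in this notebook use `light` in nearly every area. This minimal sketch uses hypothetical helper names. It operates on a list of dicts such as `response_obj`, and the sample records are copied from the prompt's example output and the printed model output.

```python
from collections import Counter


def device_counts(homes: list[dict]) -> Counter:
    """Count how often each device type appears across all areas of all homes."""
    counts = Counter()
    for home in homes:
        for devices in (home.get("area_devices") or {}).values():
            counts.update(devices)
    return counts


def area_device_rows(homes: list[dict]):
    """Flatten records into (home name, area, device) tuples."""
    for home in homes:
        for area, devices in (home.get("area_devices") or {}).items():
            for device in devices:
                yield home["name"], area, device


sample = [
    {"name": "Loft Industrial",
     "area_devices": {"Living Space": ["light", "speaker"], "Kitchen": ["light"]}},
    {"name": "Modern City Apartment",
     "area_devices": {"Living Room": ["light", "smart_tv"], "Balcony": ["light"]}},
]
print(device_counts(sample).most_common(2))  # [('light', 4), ('speaker', 1)]
print(list(area_device_rows(sample))[:2])
# [('Loft Industrial', 'Living Space', 'light'), ('Loft Industrial', 'Living Space', 'speaker')]
```

The flattened rows are a convenient shape for loading into a table or for sampling individual (area, device) pairs when building downstream tasks.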
In [41]:
print(yaml.dump(response_obj, explicit_start=True, sort_keys=False))

---
- name: Loft Industrial
  thoughts:
  - The industrial loft with an open space and high ceilings may have a modern and
    minimalist design.
  - The panoramic windows offer plenty of natural light, ideal for a smart lighting
    system.
  - The rooftop terrace could be a great spot for outdoor entertainment, so there
    may be smart speakers or a sound system.
  desc: Industrial loft in Barcelona, Spain
  area_devices:
    Living Space:
    - light
    - speaker
    Kitchen:
    - light
    Bathroom:
    - light
    Terrace:
    - light
    - speaker
  other_devices:
  - laptop
  - smartphone
  - tablet
  - smartwatch
- name: Amalfi Coast Villa
  thoughts:
  - With 7 bedrooms and luxury amenities, this villa may be used for hosting events
    or as a vacation rental.
  - The infinity pool and cliffside terrace are perfect for relaxation, so there may
    be smart climate control for comfort.
  - The home cinema suggests entertainment is a priority, so there might be a smart
    T