# Developing a classifier for houses of multiple or single occupancy

In [None]:
# Load required Python libraries
import sqlite3
import pandas as pd

# Connect to the SQLite database
connection = sqlite3.connect("../data/data.db")
cursor = connection.cursor()

## Import data from the SQLite database into pandas DataFrames
The SQLite database consists of two tables: homes and motion.

### homes

homes records whether each home is a single- or multiple-occupancy house.

- id is the unique house id (hexadecimal)
- multiple_occupancy indicates whether the house is occupied by more than one person (binary)

### motion

motion holds one row per motion detection event, with four fields:

- id is the unique event id (hexadecimal)
- home_id is the unique house id (hexadecimal)
- datetime is the time of the motion detection event (YYYY-MM-DD HH:MM:SS+ss)
- location is the room of the house in which the motion was detected (string)
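
The datetime column is stored as text, so it usually needs converting before any time-based analysis. The sketch below assumes the trailing `+ss` in the documented format is a UTC offset such as `+00`; the sample values and the names `raw` and `parsed` are hypothetical and not taken from the database.

```python
import pandas as pd

# Hypothetical sample values in the documented format (assumption: "+ss" is a UTC offset).
# In the notebook these would come from motion["datetime"].
raw = pd.Series(["2019-03-01 10:00:00+00", "2019-03-01 23:30:00+01"])

# utc=True converts every value to one timezone-aware UTC column,
# even when the individual strings carry different offsets.
parsed = pd.to_datetime(raw, utc=True)

# Once parsed, the .dt accessor exposes components such as the hour of day.
hour_of_day = parsed.dt.hour
print(parsed)
print(hour_of_day.tolist())
```

Normalising to UTC keeps events from different homes comparable, at the cost of discarding the original local offset.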

In [None]:
homes = pd.read_sql_query("SELECT * FROM homes", connection)
# Rename id to home_id to be consistent and specific
homes = homes.rename(columns={"id":"home_id"})

In [None]:
motion = pd.read_sql_query("SELECT * FROM motion", connection)
# Change id to event_id to be consistent and specific
motion = motion.rename(columns={"id":"event_id"})

In [None]:
# Inner join (the pandas default): keeps only homes that have at least one motion event
motion_homes_join = pd.merge(homes, motion, on="home_id")

## Initial data exploration

Checks to make sure there is no missing data:

- **do all home_ids have entries in both tables?** There are only 50 unique homes with motion data.
- do all homes have the same rooms?
- do the events cover the same time period?
- is there a similar number of events per house?

In [None]:
# nunique() returns the count as an integer rather than a shape tuple
print("Number of unique home_ids in homes table: ", homes["home_id"].nunique())
print("Number of unique home_ids in motion table: ", motion["home_id"].nunique())
print("Number of unique home_ids in motion-homes combined: ", motion_homes_join["home_id"].nunique())
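
Comparing counts shows that some homes are missing motion data, but not which ones. One way to list them is an outer join with `indicator=True`, sketched here on small hypothetical tables (`homes_demo`, `motion_demo`, and their values are made up for illustration; in the notebook the same call could be applied to `homes` and `motion`).

```python
import pandas as pd

# Hypothetical toy tables standing in for the real homes and motion DataFrames.
homes_demo = pd.DataFrame(
    {"home_id": ["a1", "b2", "c3"], "multiple_occupancy": [0, 1, 0]}
)
motion_demo = pd.DataFrame(
    {
        "event_id": ["e1", "e2", "e3"],
        "home_id": ["a1", "a1", "b2"],
        "location": ["kitchen", "lounge", "kitchen"],
    }
)

# Outer join on the distinct ids; the _merge column records whether each id
# was found in the left table only, the right table only, or both.
coverage = pd.merge(
    homes_demo[["home_id"]].drop_duplicates(),
    motion_demo[["home_id"]].drop_duplicates(),
    on="home_id",
    how="outer",
    indicator=True,
)

homes_without_motion = coverage.loc[coverage["_merge"] == "left_only", "home_id"].tolist()
motion_without_home = coverage.loc[coverage["_merge"] == "right_only", "home_id"].tolist()

print("Homes with no motion events:", homes_without_motion)
print("Motion home_ids missing from homes:", motion_without_home)
```

Homes flagged as `left_only` are dropped silently by an inner join, so they cannot contribute to any classifier trained on the joined table.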

In [None]:
# Number of non-null entries per column for each home, i.e. the number of events per home
motion.groupby("home_id").count()
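
The remaining checks (rooms per home, time period covered, and events per home) can be gathered into one summary table. This is a minimal sketch on hypothetical data, assuming the datetime column has already been parsed to timezone-aware timestamps; `motion_demo`, `per_home`, and the output column names are illustrative only.

```python
import pandas as pd

# Hypothetical events standing in for the real motion table.
motion_demo = pd.DataFrame(
    {
        "event_id": ["e1", "e2", "e3", "e4"],
        "home_id": ["a1", "a1", "a1", "b2"],
        "datetime": pd.to_datetime(
            [
                "2019-03-01 08:00:00+00:00",
                "2019-03-02 12:00:00+00:00",
                "2019-03-03 09:00:00+00:00",
                "2019-03-01 10:00:00+00:00",
            ],
            utc=True,
        ),
        "location": ["kitchen", "lounge", "kitchen", "bedroom"],
    }
)

# Named aggregation: one row per home, one column per check.
per_home = motion_demo.groupby("home_id").agg(
    n_events=("event_id", "count"),
    n_rooms=("location", "nunique"),
    first_event=("datetime", "min"),
    last_event=("datetime", "max"),
)

# Whole days between the first and last recorded event in each home.
per_home["span_days"] = (per_home["last_event"] - per_home["first_event"]).dt.days

print(per_home)
```

Calling `per_home.describe()` on such a table gives a quick view of how much event counts and observation periods vary between homes.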