# Create Dependency Wheel Data

**Author:** Anthony D. Fontanini

**Date:** May 3rd, 2018

This notebook creates the data files in the `data/` folder that the dependency wheels in `dep_wheels.html` require. The script reads the header of each `.tsv` file in the housing characteristics folder to identify the dependencies. The dependencies are then saved in a data frame in the form of an adjacency matrix. In that matrix, each column lists the dependencies of one housing characteristic down its rows, and each row lists the dependents of one housing characteristic across its columns. The data files for the dependency wheels are then created from the adjacency matrix. This notebook also saves the adjacency matrix.
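
As a minimal sketch of the idea, the example below builds the same kind of adjacency matrix from three made-up headers instead of real `.tsv` files. The `headers` dictionary and the name `toy_df` are assumptions for illustration only; the characteristic names are borrowed from the output further down, but the dependencies between them are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical headers: each key stands in for one .tsv file,
# each list for the tab-separated column names on its first line.
headers = {
    "Vintage": ["Option=A", "Option=B"],
    "Windows": ["Dependency=Vintage", "Option=A", "Option=B"],
    "Window Areas": ["Dependency=Vintage", "Dependency=Windows", "Option=A"],
}

names = list(headers)
adj = np.zeros((len(names), len(names)), dtype=int)

for i, name in enumerate(names):
    for column_name in headers[name]:
        if column_name.startswith("Dependency="):
            # Row i (the characteristic) depends on column j (the dependency)
            j = names.index(column_name.split("=", 1)[1])
            adj[i, j] = 1

# Transpose so that rows hold dependencies and columns hold dependents
toy_df = pd.DataFrame(adj, index=names, columns=names).T
print(toy_df)
```

Reading down the `Windows` column gives its dependencies, and reading across the `Vintage` row gives the characteristics that depend on it.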

## Import Modules

In [7]:
from os import listdir
from os.path import isfile, join
import numpy as np
import pandas as pd
from IPython.display import display

import json

import matplotlib.pyplot as plt 

## Inputs

- `path_HCs`: Path to the ResStock project housing characteristics directory

In [8]:
path_HCs = '../../housing_characteristics'

## Load Housing Characteristic Names Into Memory

In [9]:
# Load the file names into memory, ignoring any hidden files (beginning with ".")
HC_files = [f for f in listdir(path_HCs)
            if isfile(join(path_HCs, f)) and not f.startswith('.')]

# Remove the .tsv from the housing characteristic name
HC_names = [f.split('.')[0] for f in HC_files]

## Create the Adjacency Matrix

In [10]:
# Initialize the adjacency matrix
adj_mat = np.zeros((len(HC_names),len(HC_names)))

print("Shape of the Adjacency Matrix")
print(np.shape(adj_mat))
print()

# For each housing characteristic
for i in range(len(HC_files)):

    ## Read the first line of the housing characteristic file
    with open(path_HCs + '/' + HC_files[i]) as f:
        header_str = f.readline()

    ## For each column in the tsv file (strip the trailing newline first)
    for column_name in header_str.rstrip('\r\n').split('\t'):

        ### Skip anything that is not a dependency column
        if not column_name.startswith('Dependency='):
            # Flag headers that begin with "D" but are not well-formed dependencies
            if column_name.startswith('D'):
                print(HC_files[i],column_name)
            continue

        #### Get the dependency name
        dependency_str = column_name.split('=', 1)[1]

        #### Find in the housing characteristics names
        j = HC_names.index(dependency_str)

        #### Include the dependency in the adjacency matrix
        adj_mat[i,j] = 1

# Convert to Pandas
adj_df = pd.DataFrame(adj_mat,index=HC_names,columns=HC_names).T

display(adj_df.head(5))

Shape of the Adjacency Matrix
(83, 83)



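| | Bathroom Spot Vent Hour | Ceiling Fan | Clothes Dryer | Clothes Washer | Cooking Range | Cooling Setpoint | Days Shifted | Dehumidifier | Dishwasher | Door Area | ... | Range Spot Vent Hour | Refrigerator | Roof Material | Solar Hot Water | Usage Level | Vintage FPL | Vintage | Water Heater | Window Areas | Windows |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bathroom Spot Vent Hour | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Ceiling Fan | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Clothes Dryer | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Clothes Washer | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Cooking Range | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |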
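
Because the preview above shows only zeros, a quick summary can help confirm that dependencies were actually recorded. The sketch below assumes the orientation produced by the transpose (rows are dependencies, columns are dependents) and uses a small hypothetical matrix, `toy_df`, in place of `adj_df`.

```python
import pandas as pd

# Hypothetical 3x3 matrix in the same orientation as adj_df:
# toy_df.loc[a, b] == 1 means that b depends on a.
names = ["Vintage", "Windows", "Window Areas"]
toy_df = pd.DataFrame(
    [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]],
    index=names, columns=names,
)

# Column sums: how many dependencies each characteristic has
n_dependencies = toy_df.sum(axis=0)

# Row sums: how many characteristics depend on each one
n_dependents = toy_df.sum(axis=1)

print(n_dependencies.sort_values(ascending=False))
print(n_dependents.sort_values(ascending=False))
```

The same two `sum` calls should work unchanged on `adj_df`.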


## Convert Adjacency Matrix Into JSON Input

In [11]:
for direction in ['forward','backward']:
    # Initialization (the placeholder entry is removed before the lock file is written)
    composer_lock = {'packages': [{"key": 1}]}

    # Loop over each housing characteristic
    cnt = 0
    for name in HC_names:
        # Construct a data dict
        data = {}
        data['name'] = name

        # If forward or backward
        if direction == 'forward':
            # Look across the column
            idx = np.where(adj_df[name] == 1)[0]
        elif direction == 'backward':
            # Look across the row
            idx = np.where(adj_df.loc[name] == 1)[0]

        # If there is a dependency
        if len(idx) > 0:
            # Create the require key
            data['require'] = dict()

            # Fill the dependencies
            for i in idx:
                data['require'][adj_df.columns[i]] = 1

        # If this is the first housing characteristic
        if cnt == 0:
            if direction == 'backward':
                out_file = 'data/composer_backward.json'
            elif direction == 'forward':
                out_file = 'data/composer_forward.json'
            # Write the main file
            with open(out_file, 'w') as outfile:
                json.dump(data, outfile)
        else:
            # Add housing characteristic as json format
            composer_lock['packages'].append(data)
        cnt += 1

    # Remove the initialization
    composer_lock['packages'] = composer_lock['packages'][1:]

    # Write the lock file
    if direction == 'backward':
        out_file = 'data/composer_backward.lock'
    elif direction == 'forward':
        out_file = 'data/composer_forward.lock'
    with open(out_file, 'w') as outfile:
        json.dump(composer_lock, outfile)
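
To make the output format concrete, the sketch below mirrors the loop above on a hypothetical three-characteristic matrix and prints the JSON instead of writing files. The helper name `build_wheel_data` and the toy data are assumptions, not part of the original notebook.

```python
import json
import pandas as pd

def build_wheel_data(adj_df, direction):
    """Return (main_package, lock) dicts in the layout written by the loop above."""
    entries = []
    for name in adj_df.columns:
        if direction == "forward":
            mask = adj_df[name] == 1        # down the column
        else:
            mask = adj_df.loc[name] == 1    # across the row
        entry = {"name": name}
        linked = list(adj_df.columns[mask.to_numpy()])
        if linked:
            entry["require"] = {other: 1 for other in linked}
        entries.append(entry)
    # The first characteristic goes to the .json file, the rest to the .lock file
    return entries[0], {"packages": entries[1:]}

# Hypothetical matrix: toy_df.loc[a, b] == 1 means that b depends on a
names = ["Vintage", "Windows", "Window Areas"]
toy_df = pd.DataFrame([[0, 1, 1], [0, 0, 1], [0, 0, 0]], index=names, columns=names)

forward_main, forward_lock = build_wheel_data(toy_df, "forward")
backward_main, backward_lock = build_wheel_data(toy_df, "backward")

print(json.dumps(forward_main))
print(json.dumps(forward_lock, indent=2))
```

Collecting every entry in a list and slicing it afterwards avoids the placeholder entry, while keeping the same split between the first characteristic and the remaining ones.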

## Save the Adjacency Matrix

In [12]:
adj_df.to_csv('adjacency matrix.csv')
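
When the saved matrix is read back, the first CSV column holds the row labels. The sketch below shows the round trip on a hypothetical matrix, using an in-memory buffer in place of `adjacency matrix.csv`. Passing `index_col=0` restores the labels; without it, pandas names that column `Unnamed: 0`.

```python
import io
import pandas as pd

# Hypothetical matrix standing in for adj_df
names = ["Vintage", "Windows", "Window Areas"]
toy_df = pd.DataFrame([[0, 1, 1], [0, 0, 1], [0, 0, 0]], index=names, columns=names)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy_df.to_csv(buffer)

# Read back with the first column as the index
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

# Read back without index_col to show the default column name
buffer.seek(0)
raw = pd.read_csv(buffer)

print(restored)
print(raw.columns[0])
```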