<p style="font-size: 48px; text-align: center; color:#009dff;">
              Transition from E+ 24.1.0 to E+ 24.2.0</p>

<hr/>

<p style='font-size: 24px; text-align: center;'>Foreword</p>

<div style='font-size:16px;'><strong>This notebook has two big parts:</strong>

<ul>
  <li><strong>Part 1:</strong> aims to transition all regression tests from one E+ version to the next, and allows you to run each test in both the old and the new version
    <ul>
      <li>These will take quite some time to run (about 1hr to run the tests in the OLD OpenStudio version, transition the IDFs to the new E+ version and run them, and run the tests in the NEW OpenStudio Version, based on almost 200 files currently)</li>
      <li>By default it will just copy over the SQL from the regression test to place in OLD_DIR, but if you want to force rerun the IDF in the old E+ version you can.</li>
      <li>At the end of Part 1, you will have three CSV files, one per version, with the site KBTUs for each test. You will also have an organized tree of VERSION/TEST_NAME/ output directories containing the SQL files used in sections 6+.</li>
    </ul>  
  </li>
  <li><strong>Part 2:</strong> aims to analyze the differences between versions
    <ul>
      <li>Section 3.1 just re-queries all SQL files (or you can reload the three CSV files) to highlight the tests with the biggest site KBTU differences</li>
      <li>Section 3.2 provides a high-level interface: pass it a test name and it queries the relevant SQL files and produces visualizations (tables, grouped bar charts, and heatmaps) to analyze where differences may be coming from</li>
    </ul>
    If you have already run Part 1 successfully once, you only need to run Section 1. and you can jump to Part 2 directly.
  </li>
</ul>
     
</div> 

----

**Note**

* It might be a good idea to monitor your system after each big task to ensure you don't have processes that are still hanging. It happened to me with the intersection test, for example.

In [None]:
# Python 2.x / 3.x compatibility
from __future__ import division, print_function

%matplotlib inline

#Import modules
import pandas as pd
import numpy as np
import os
import json
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import re
import tempfile

#import csv
import glob as gb

from pathlib import Path
from typing import Optional

import datetime
import sqlite3

import shutil
import multiprocessing
import subprocess

#import tqdm
from tqdm.notebook import trange, tqdm

from ipywidgets import HTML
from IPython.display import display

import shlex

# from df2gspread import df2gspread as d2g

mpl.rcParams['figure.figsize'] = (16, 9)
pd.options.display.max_rows = 200

# Args

These should match your actual installation.

In [None]:
IDF_DIR = Path('.').absolute()
IDF_DIR

In [None]:
# Path to the OpenStudio-resources/testruns/ directory
TESTRUNS_DIR = Path('../testruns').resolve()
OS_RES_DIR = Path('..').resolve()

EPLUS_OLD_VERSION = '24.1.0'
OS_OLD_VERSION = '3.8.0'

EPLUS_NEW_VERSION = '24.2.0'
OS_NEW_VERSION = '3.9.0' # This is really an alpha, with 24.2.0 upgrade...

# Careful: We'll cd into the TRANSITION_CLI_DIR to run it, so it must be a writable location...
TRANSITION_CLI_DIR = Path('~/Software/Others/EnergyPlus-build-release-develop/Products').expanduser()
# TRANSITION_CLI_DIR = Path('~/Software/Others/OS-build-release/EnergyPlus-24.1.0-9d7789a3ac-Linux-Ubuntu20.04-x86_64/PreProcess/IDFVersionUpdater').expanduser()

# Force a given number of parallel processes
# (defaults to nproc - 2, leaving one physical core free if you have hyperthreading)
N = None
WEATHER_FILE= (IDF_DIR / '../weatherdata/USA_IL_Chicago-OHare.Intl.AP.725300_TMY3.epw').resolve()

# Path to EnergyPlus application & idd 
OLD_EPLUS_EXE = Path('/usr/local/EnergyPlus-24-1-0/energyplus')

# NEW_EPLUS_EXE = os.path.join(TRANSITION_CLI_DIR, 'energyplus-9.2.0')
NEW_EPLUS_EXE = Path('/usr/local/EnergyPlus-24-2-0/energyplus')
# NEW_EPLUS_EXE = '/home/julien/Software/Others/EnergyPlus-build-release-develop/Products/energyplus'
# NEW_EPLUS_EXE = '/home/julien/Software/Others/OS-build2/EnergyPlus-8.9.0-1c5ba897d1-Linux-x86_64/EnergyPlus-8-9-0/energyplus-8.9.0'
# NEW_EPLUS_EXE = '/home/julien/Downloads/Temp/EnergyPlus-23.1.0-ff86a13c18-Linux-Ubuntu20.04-x86_64/energyplus'

# Path to OpenStudio CLIs
OLD_OS_CLI = Path('/usr/local/openstudio-3.8.0/bin/openstudio')
NEW_OS_CLI = Path('~/Software/Others/OS-build-release/Products/openstudio').expanduser()
# OLD_IDD_FILE = '/usr/local/EnergyPlus-8-8-0/Energy+.idd'
# NEW_IDD_FILE = os.path.join(TRANSITION_DIR, 'Energy+.idd')

# Put None if you want to run all tests
REGRESSION_TEST_FILTER = '(kiva)|(baseline)'
REGRESSION_TEST_FILTER = None

In [None]:
assert TRANSITION_CLI_DIR.is_dir()
assert WEATHER_FILE.is_file()

assert OLD_EPLUS_EXE.exists()
assert NEW_EPLUS_EXE.exists()
assert OLD_OS_CLI.exists()
assert NEW_OS_CLI.exists()

## Setup

In [None]:
ep_old_v_dashed = EPLUS_OLD_VERSION.replace('.', '-')
ep_new_v_dashed = EPLUS_NEW_VERSION.replace('.', '-')

In [None]:
# For the translation to work,
# you'll have to chdir to the Transition CLI's folder
TRANSITION_CLI = TRANSITION_CLI_DIR / f"Transition-V{ep_old_v_dashed}-to-V{ep_new_v_dashed}"
assert TRANSITION_CLI.is_file()

OLD_IDD_FILE = TRANSITION_CLI_DIR / f"V{ep_old_v_dashed}-Energy+.idd"
NEW_IDD_FILE = TRANSITION_CLI_DIR / f"V{ep_new_v_dashed}-Energy+.idd"
assert OLD_IDD_FILE.is_file()
assert NEW_IDD_FILE.is_file()

TRANSITION_REPORT_VAR_PATH = TRANSITION_CLI_DIR / f"Report Variables {ep_old_v_dashed} to {ep_new_v_dashed}.csv"
if TRANSITION_REPORT_VAR_PATH.is_file():
    print(f"Found {TRANSITION_REPORT_VAR_PATH}")

# TODO: Temp override
# TRANSITION_CLI = Path('/home/julien/Software/Others/EnergyPlus-build-release-develop/Products/Transition-V9-5-0-to-V9-6-0')
TRANSITION_CLI

In [None]:
if False:
    import psutil
    physical_cpus = psutil.cpu_count(logical=False)
    multiprocessing.cpu_count() * (physical_cpus - 1) / physical_cpus

In [None]:
# Number of parallel processes
if not N:
    N = multiprocessing.cpu_count() - 2
    print("Defaulting number of processes to {}".format(N))
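
On a machine with two or fewer logical cores, `cpu_count() - 2` is zero or negative, and `multiprocessing.Pool` refuses to start with fewer than one process. A minimal sketch of a guarded default follows; `default_process_count` and its `reserve` parameter are hypothetical names, not part of this notebook.

```python
import multiprocessing
from typing import Optional


def default_process_count(requested: Optional[int] = None, reserve: int = 2) -> int:
    """Return `requested` if set, else cpu_count - reserve, but never less than 1.

    Hypothetical helper: the name and the `reserve` parameter are assumptions.
    """
    if requested:
        return requested
    return max(1, multiprocessing.cpu_count() - reserve)


print("Defaulting number of processes to {}".format(default_process_count()))
```

Using it as `N = default_process_count(N)` would keep the current behaviour on larger machines.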

In [None]:
# Create directories
OLD_OS_DIR = IDF_DIR / f"{OS_OLD_VERSION}-{EPLUS_OLD_VERSION}"

NEW_OS_DIR = IDF_DIR / f"{OS_NEW_VERSION}-{EPLUS_NEW_VERSION}"

TRANSITION_DIR = IDF_DIR / f"Transition-{EPLUS_NEW_VERSION}"

for p in [OLD_OS_DIR, NEW_OS_DIR, TRANSITION_DIR]:
    if not p.is_dir():
        p.mkdir(parents=True)
        print(f"Creating directory: {p}")

In [None]:
# Create dicts to store all info
OLD_OS_INFO = {'OS_VERSION': OS_OLD_VERSION,
               'EPLUS_VERSION': EPLUS_OLD_VERSION,
               'DIR': OLD_OS_DIR}

NEW_OS_INFO = {'OS_VERSION': OS_NEW_VERSION,
               'EPLUS_VERSION': EPLUS_NEW_VERSION,
               'DIR': NEW_OS_DIR}

TRANSITION_INFO = {'OS_VERSION': 'Transition',
                   'EPLUS_VERSION': EPLUS_NEW_VERSION,
                   'DIR': TRANSITION_DIR}

In [None]:
def stringify_cmd(cmd) -> list[str]:
    """Takes a list of arguments that may be Path or str or int and turns into a List[str].
    This makes it suitable to pass to subprocess.
    """
    return list(map(str, cmd))

# PART 1: Transition all regression tests and run them in both E+ versions

**Part 1 is omitted in the HTML because the interesting part is the output and analysis at the end**

## Run in Previous OpenStudio Version based on old E+

This will go into the `TESTRUNS_DIR` (`OpenStudio-resources/testruns`) directory, find all IDF files, and copy them to the `IDF_DIR` directory (typically the directory in which this notebook resides).

<span style="font-size: 18px; color: red;">It goes without saying: you need to have already run all simulation tests with the last OpenStudio version that is based on the old E+ version before running this section.</span>

### Run the tests in the old version

You can also just do that manually... but if you do, please delete the testruns/ folder beforehand just in case.

In [None]:
# Delete all testruns to ensure we don't end up grabbing the idf and sql from another version
if TESTRUNS_DIR.is_dir():
    shutil.rmtree(TESTRUNS_DIR)

In [None]:
# Pass 'CUSTOMTAG=' if you don't want a tag, or 'CUSTOMTAG=sha' for the build sha,
# or any custom string such as 'CUSTOMTAG=Ubuntu_run1'
CUSTOMTAG = 'SHA'
CUSTOMTAG = ''

if REGRESSION_TEST_FILTER is None:
    filt = ''
else:
    filt = "-n /{}/".format(REGRESSION_TEST_FILTER)

command = "env CUSTOMTAG={c} USE_EPLUS_SPACES=true {cli} {m} {filt}".format(c=CUSTOMTAG,
                                                      m=os.path.join(OS_RES_DIR,
                                                                     'model_tests.rb'),
                                                      cli=OLD_OS_CLI,
                                                      filt=filt)
print(command)
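
The printed command is convenient to read, but a filter such as `(kiva)|(baseline)` contains shell metacharacters and would need quoting if pasted into a terminal. One possible alternative, sketched here, is to build the argument list and the environment separately and hand both to `subprocess`. `build_test_command` and `build_test_env` are hypothetical helpers, and the paths in the example call are only illustrative.

```python
import os
from pathlib import Path
from typing import Optional


def build_test_command(cli, model_tests, test_filter: Optional[str] = None) -> list:
    """Return the argument list for running model_tests.rb (hypothetical helper)."""
    args = [str(cli), str(model_tests)]
    if test_filter:
        # Same minitest name filter as the string version: -n /regex/
        args += ['-n', f'/{test_filter}/']
    return args


def build_test_env(custom_tag: str = '') -> dict:
    """Copy the current environment and add the variables otherwise set through `env`."""
    env = os.environ.copy()
    env['CUSTOMTAG'] = custom_tag
    env['USE_EPLUS_SPACES'] = 'true'
    return env


args = build_test_command(Path('/usr/local/openstudio-3.8.0/bin/openstudio'),
                          Path('../model_tests.rb'),
                          '(kiva)|(baseline)')
print(args)
# subprocess.run(args, env=build_test_env('SHA')) would then launch the tests
```

Because no shell is involved, the regex reaches the test runner unchanged.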

## Copy the Previous IDFs

These end up directly in IDF_DIR. They will get copied to the `OLD_OS_DIR` during the Transition
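
The completeness check in the cell that follows shells out to `grep` to count the test definitions in `model_tests.rb`. Where `grep` is not available, the same count can be obtained in pure Python. This is a sketch only: `count_test_definitions` is a hypothetical name and the Ruby snippet is made up for illustration.

```python
import re

# Same pattern as the grep call: optional leading spaces, then "def test_"
TEST_DEF_RE = re.compile(r'^ *def test_', flags=re.MULTILINE)


def count_test_definitions(ruby_source: str) -> int:
    """Count the lines that start a test method (hypothetical helper)."""
    return len(TEST_DEF_RE.findall(ruby_source))


# Made-up excerpt in the style of a minitest file
sample = """class ModelTests < Minitest::Test
  def test_baseline_rb
    result = sim_test('baseline_sys01.rb')
  end

  def test_baseline_osm
    result = sim_test('baseline_sys01.osm')
  end

  # def test_disabled
end
"""

print(count_test_definitions(sample))
```

With the real file, the call would be `count_test_definitions(Path('../model_tests.rb').read_text())`.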

In [None]:
x = len([x for x in gb.iglob(os.path.join(TESTRUNS_DIR, '**/*/in.idf'))])
x_n = !grep -c "^ *def test_" ../model_tests.rb
x_n = int(x_n[0])
"{}/{} {:.2%}".format(x, x_n, x/x_n)

In [None]:
# Cleanup directory
all_files = gb.glob(os.path.join(IDF_DIR, '*.idf'))
all_files += gb.glob(os.path.join(IDF_DIR, '*.idfnew'))
all_files += gb.glob(os.path.join(IDF_DIR, '*.idfold'))
all_files += gb.glob(os.path.join(IDF_DIR, '*.VCpErr'))

for f in all_files:
    os.remove(f)

In [None]:
found_idfs = []
for f in TESTRUNS_DIR.glob('**/*/in.idf'):
    f2 = f.relative_to(TESTRUNS_DIR)
    
    test_name = f2.parts[0]
    #print(test_name)
    dst_path = IDF_DIR / f"{test_name}.idf"
    shutil.copyfile(f, dst_path)
    found_idfs.append(test_name)
found_idfs = set(found_idfs)
len(found_idfs)

## Copy all existing SQL files

In [None]:
found_sqls = []

for f in TESTRUNS_DIR.glob("**/*/*.sql"):
    f2 = f.relative_to(TESTRUNS_DIR)
    
    test_name = f2.parts[0]
    # print(test_name)
    dst_folder = OLD_OS_DIR / test_name
    if not dst_folder.is_dir():
        dst_folder.mkdir(parents=False, exist_ok=False)
        
    dst_path = dst_folder / "eplusout.sql"
    # print(dst_path)
    shutil.copyfile(f, dst_path)
    found_sqls.append(test_name)
found_sqls = set(found_sqls)

In [None]:
len(found_idfs), len(found_sqls)

In [None]:
found_idfs - found_sqls

```
# Rename directories and files
for fn in os.listdir(NEW_DIR):
    os.rename(fn, fn.replace('_8.9.0', ''))
```

## Transition all files

In [None]:
def _prepare_temp_dir_for_transition() -> Path:
    temp_dir = Path(tempfile.mkdtemp(prefix='transition-'))
    shutil.copy(OLD_IDD_FILE, temp_dir)
    shutil.copy(NEW_IDD_FILE, temp_dir)
    if TRANSITION_REPORT_VAR_PATH.is_file():
        shutil.copy(TRANSITION_REPORT_VAR_PATH, temp_dir)
    return temp_dir

def translate_file_parallelizable(idf_path: Path):
    assert idf_path.is_file(), f"Could not find {idf_path}"
    
    temp_dir = _prepare_temp_dir_for_transition()
    cmd = stringify_cmd([TRANSITION_CLI, idf_path])
    r = subprocess.run(
        cmd,
        capture_output=True,
        shell=False,
        encoding='utf-8',
        cwd=temp_dir,
    )
    
    if r.returncode == 0:        
        # Move the resulting IDF into the new dir
        new_file = idf_path.with_suffix('.idfnew')
        assert new_file.is_file(), f"Could not find {new_file}"
        new_dest = TRANSITION_DIR / idf_path.name
        shutil.move(new_file, new_dest)

        # Move the old version into its directory
        old_file = idf_path.with_suffix('.idfold')
        assert old_file.is_file(), f"Could not find {old_file}"
        old_dest = OLD_OS_DIR / idf_path.name
        shutil.move(old_file, old_dest)

        # Delete original file
        idf_path.unlink()

        # print('Done for {}.idf - {}'.format(eplus_file, path))
    else:
        print(f"Error for {idf_path}:")
        print(r.stdout)
        print(r.stderr)

In [None]:
def translate_file_regular_serially(path):
    """
    Runs the file through the transition utility and saves it in the right folder.
    Will move the original file to the subdirectory OLD_DIR (eg `./8.8.0/`)
    and the transitioned one to NEW_DIR (eg: `./8.9.0/`)
    """
    
    eplus_file, ext = os.path.splitext(os.path.split(path)[1])
    
    cmd = stringify_cmd([TRANSITION_CLI, path])
    process = subprocess.Popen(cmd,
                               shell=False,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    # wait for the process to terminate
    out, err = process.communicate()
    errcode = process.returncode
    if errcode == 0:        
        # Move the resulting IDF into the new dir
        new_file = os.path.join(IDF_DIR, "{f}.idfnew".format(f=eplus_file))
        new_dest = os.path.join(TRANSITION_DIR, "{f}.idf".format(f=eplus_file))
        shutil.move(new_file, new_dest)
        
        # Move the old version into its directory
        old_file = os.path.join(IDF_DIR, "{f}.idfold".format(f=eplus_file))
        old_dest = os.path.join(OLD_OS_DIR, "{f}.idf".format(f=eplus_file))
        shutil.move(old_file, old_dest)
        
        # Delete original file
        ori_file = os.path.join(IDF_DIR, "{f}.idf".format(f=eplus_file))
        os.remove(ori_file)
        
        # print('Done for {}.idf - {}'.format(eplus_file, path))
    else:
        print("Error for {}".format(path))
        print(out)
        print(err)

In [None]:
files = list(IDF_DIR.glob('*.idf'))
len(files)

In [None]:
# for x in problems.index[problems[next(x for x in problems.columns if x[1] == 'Transition')].isna()]:
#     shutil.copy(f"{OLD_OS_DIR}/{x}.idf", f"{IDF_DIR}/{x}.idf")

**Transitioning in parallel seems to fail, so do it serially...**

**Edit**: I made it work by using a temporary directory
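
Giving each run its own working directory presumably avoids collisions between files the Transition program writes into the current directory (that explanation is an assumption, not something verified here). The sketch below shows the same isolation idea with `tempfile.TemporaryDirectory`, which also removes the directory afterwards. `run_in_scratch_dir` is a hypothetical helper and the command is a stand-in, not the real Transition CLI.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_scratch_dir(cmd, files_to_stage=()):
    """Run cmd with a fresh temporary directory as cwd, then delete that directory."""
    with tempfile.TemporaryDirectory(prefix='transition-') as tmp:
        tmp_path = Path(tmp)
        # Stage whatever the program expects next to it (e.g. the IDD files)
        for src in files_to_stage:
            shutil.copy(src, tmp_path)
        return subprocess.run([str(c) for c in cmd], cwd=tmp_path,
                              capture_output=True, encoding='utf-8')


# Stand-in command: create a scratch file and report the working directory
r = run_in_scratch_dir([
    sys.executable, '-c',
    "import os; open('scratch.txt', 'w').close(); print(os.getcwd())",
])
scratch_dir = Path(r.stdout.strip())
print(r.returncode, scratch_dir.exists())
```

The scratch file never leaks into `IDF_DIR`, and the temporary directory is gone once the call returns.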

In [None]:
if True:
    # Takes about 10minutes on my machine with 12 threads allocated
    pool = multiprocessing.Pool(processes=N)

    desc = f'<h3>Translation from {EPLUS_OLD_VERSION} to {EPLUS_NEW_VERSION}</h3>'
    label = HTML(desc)
    display(label)
    for _ in tqdm(pool.imap_unordered(translate_file_parallelizable, files), total=len(files)):
        pass

In [None]:
if False:
    os.chdir(TRANSITION_CLI_DIR)

    desc = '<h3>Translation from {} to {}</h3>'.format(EPLUS_OLD_VERSION,
                                                       EPLUS_NEW_VERSION)
    label = HTML(desc)
    for file in tqdm(files):
        translate_file_regular_serially(file)

    os.chdir(IDF_DIR)

In [None]:
# At this point, you shouldn't have any .idf, .idfnew or .idfold files left in the IDF_DIR directory
# If you do, it means that the transition failed for those files
all_files = []
for e in ['idf', 'idfnew', 'idfold']:
    all_files += list(IDF_DIR.glob(f'*.{e}'))

all_files

## Analyze VCpErr files

For 8.8.0 to 8.9.0 I had very few VCpErr files, but for 8.9.0 to 9.0.0 I have a bunch.
A preliminary look shows that it could be just related to the renaming/replacing of `AirTerminal:SingleDuct:Uncontrolled` to `AirTerminal:SingleDuct:ConstantVolume:NoReheat`
but I'd like to make sure

In [None]:
err_files = gb.glob('*.VCpErr')

In [None]:
len(err_files)

In [None]:
all_err_lines = []
all_dfs = []
for err_file in err_files:
    with open(err_file, 'r') as f:
        lines = f.read().splitlines()
        all_err_lines += lines
        all_dfs.append(pd.DataFrame([[err_file.replace('.VCpErr', '')]*len(lines), lines],
             index=['File', 'Line']).T)

In [None]:
df_err = pd.concat(all_dfs)

In [None]:
# Strip out Line numbers
df_err['Line'] = df_err['Line'].str.replace(r'line~\d+', '', regex=True)

In [None]:
# Exclude common messages that aren't actually informative
common_things_to_exclude = [
    'Conversion Completed Successfully',
    'Program Version,Conversion',
    'entered with less than minimum number of fields'
]

In [None]:
len(df_err)

In [None]:
df_err = df_err[~df_err['Line'].apply(lambda x: any(thing in x for thing in common_things_to_exclude))]

In [None]:
len(df_err)

Ok, so indeed, most of the warnings are due to ATU Single Duct Uncontrolled.

In [None]:
df_err['Line'].value_counts().to_frame().style

```
# Strip out the ATU warning
df_err = df_err[~df_err['Line'].str.contains('AirTerminal:SingleDuct')]
len(df_err)
```

In [None]:
# Most problematic files
df_err.groupby('File').count().sort_values(by='Line', ascending=False)

A lot of the warnings are like this one:
   > ** Severe ** Out of range value Numeric Field#10 (Low Speed Standard Design Capacity), value=0.00000, range={>0.0}, in EvaporativeFluidCooler:TwoSpeed=Evaporative Fluid Cooler Two Speed 3
   
In fact, the value for that specific field is `Autosize`
  

In [None]:
df_err.groupby('File')['Line'].value_counts().to_frame()

The real problems are potentially
* asymmetric_interior_constructions.osm: Really this seems only about names that are way too long
* multiple_airloops.rb
* multiple_airloops.osm
* multiple_loops_w_plenums.rb

In [None]:
df_err.set_index('File').loc[[
    #'asymmetric_interior_constructions.osm',
    'multiple_airloops.rb',
    'multiple_airloops.osm',
    'multiple_loops_w_plenums.rb']].reset_index().style

## Run Simulation in E+

In [None]:
GJ_TO_KBTU= 947.8171203133173

SQL_QUERY_TOTAL_SITE_KBTU = "SELECT Value FROM tabulardatawithstrings WHERE \
                              ReportName='AnnualBuildingUtilityPerformanceSummary' AND \
                              ReportForString='Entire Facility' AND \
                              TableName='Site and Source Energy' AND \
                              RowName='Total Site Energy' AND \
                              ColumnName='Total Energy' AND \
                              Units='GJ'"

SQL_QUERY_SIM_INFO = 'SELECT EnergyPlusVersion FROM Simulations'

VERSION_REGEX = re.compile(r'Version (?P<Major>\d+)\.(?P<Minor>\d+)\.'
                           r'(?P<Patch>\d+)-(?P<SHA>\w+),\s+'
                           r'YMD=(?P<datestring>[0-9\.: ]+)')


# Remove all files in the output directory except these
KEEP_EXT = ['.err', '.sql']
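
To see what `VERSION_REGEX` extracts, here is a small standalone check. The sample string is made up to match the shape the regex expects; it is not copied from a real SQL file.

```python
import re

VERSION_REGEX = re.compile(r'Version (?P<Major>\d+)\.(?P<Minor>\d+)\.'
                           r'(?P<Patch>\d+)-(?P<SHA>\w+),\s+'
                           r'YMD=(?P<datestring>[0-9\.: ]+)')

# Made-up value for illustration (assumed shape, fake SHA and date)
sample = 'EnergyPlus, Version 24.2.0-0123456789, YMD=2024.10.01 12:34'

m = VERSION_REGEX.search(sample)
# Rebuild "Major.Minor.Patch-SHA" from the named groups
version_with_sha = '{Major}.{Minor}.{Patch}-{SHA}'.format(**m.groupdict())
print(version_with_sha)
print(m.group('datestring'))
```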

In [None]:
def parse_sql_version_and_sitekbtu(output_directory: Path) -> Optional[pd.Series]:
    """
    This function grabs the EnergyPlusVersion and the total site energy
    from the SQL file.
    
    Args:
    -----
    * output_directory (Path or str): the path where the SQL file should be.
        eg: `./8.8.0/absorption_chillers.osm_8.8.0/`
    
    Returns:
    ---------
    * pd.Series that has the version with SHA and site kbtu
        (or None if it didn't run), which name is the test_name
        (gotten from the name of the output_directory)
    """
    sql_files = list(Path(output_directory).absolute().glob("*.sql"))

    version_with_sha = None
    site_kbtu = None

    if len(sql_files) == 1:
        sql_path = sql_files[0]
        sql_uri = f"{sql_path.as_uri()}?mode=ro"
        with sqlite3.connect(sql_uri, uri=True) as con:
                cursor = con.cursor()
                r = cursor.execute(SQL_QUERY_SIM_INFO).fetchone()
                if r:
                    simulation_info = r[0]
                    m = VERSION_REGEX.search(simulation_info)
                    if m:
                        gpdict = m.groupdict()
                        version_with_sha = "{}.{}.{}-{}".format(gpdict['Major'],
                                                                gpdict['Minor'],
                                                                gpdict['Patch'],
                                                                gpdict['SHA'])
                else:
                    msg = ("Cannot find the EnergyPlusVersion in the SQL file. "
                           "For:\n{}".format(output_directory))
                    #raise ValueError(msg)
                    print(msg)

                # Get Site kBTU
                r = cursor.execute(SQL_QUERY_TOTAL_SITE_KBTU).fetchone()
                if r:
                    site_gj = float(r[0])
                    site_kbtu = site_gj * GJ_TO_KBTU
                else:
                    msg = ("Cannot find the Total Site Energy in the SQL file. "
                           "For:\n{}".format(output_directory))
                    print(msg)
    return pd.Series([version_with_sha, site_kbtu],
                     index=['E+', 'SiteKBTU'],
                     name = os.path.split(output_directory)[1])
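
The site energy query can be tried without any simulation output by running it against an in-memory database. This is a sketch under assumptions: the toy table only has the columns the query touches, while the real E+ output schema is richer. It also wraps the connection in `contextlib.closing`, because `with sqlite3.connect(...) as con:` on its own only commits or rolls back and does not close the connection.

```python
import sqlite3
from contextlib import closing

GJ_TO_KBTU = 947.8171203133173

QUERY = """SELECT Value FROM tabulardatawithstrings
           WHERE ReportName='AnnualBuildingUtilityPerformanceSummary'
             AND ReportForString='Entire Facility'
             AND TableName='Site and Source Energy'
             AND RowName='Total Site Energy'
             AND ColumnName='Total Energy'
             AND Units='GJ'"""

# closing() guarantees con.close() when the block exits
with closing(sqlite3.connect(':memory:')) as con:
    # Toy stand-in for the real table/view (assumed minimal schema)
    con.execute("""CREATE TABLE tabulardatawithstrings (
                       ReportName TEXT, ReportForString TEXT, TableName TEXT,
                       RowName TEXT, ColumnName TEXT, Units TEXT, Value TEXT)""")
    con.execute("INSERT INTO tabulardatawithstrings VALUES (?, ?, ?, ?, ?, ?, ?)",
                ('AnnualBuildingUtilityPerformanceSummary', 'Entire Facility',
                 'Site and Source Energy', 'Total Site Energy', 'Total Energy',
                 'GJ', '1000.00'))
    row = con.execute(QUERY).fetchone()

# Value is inserted as text here; float() handles the conversion
site_kbtu = float(row[0]) * GJ_TO_KBTU if row else None
print(site_kbtu)
```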

In [None]:
def run_eplus_sim(
    eplus_file: Path, ep_cli: Path,
    expand_objects: bool = False,
    verbose: bool = False
) -> pd.Series:
    eplus_file = Path(eplus_file).absolute()
    assert eplus_file.is_file(), f"Could not find {eplus_file}"
    
    output_directory = eplus_file.parent / eplus_file.stem
    # If directory exists, delete it
    if output_directory.is_dir():
        shutil.rmtree(output_directory)
        
    # Recreate it
    output_directory.mkdir(parents=True, exist_ok=False)
    
    cmd = [
        ep_cli
    ]
    if expand_objects:
        cmd.append('-x')
    cmd += [
        # '-i', OLD_IDD_FILE,
        '-w', WEATHER_FILE,
        '-d', output_directory,
       eplus_file
    ]
    cmd = stringify_cmd(cmd)
    if verbose:
        print(" ".join(cmd))
    r = subprocess.run(
        cmd,
        capture_output=True,
        shell=False,
        encoding='utf-8',
    )
    
    if r.returncode == 0:         
        # Clean up output directory
        [
            x.unlink() for x in output_directory.glob('*') 
            if x.suffix not in KEEP_EXT
        ]
        return parse_sql_version_and_sitekbtu(output_directory)
    else:
        print(f"ERROR with code {r.returncode}: {eplus_file}")
        # print(r.stdout)
        # print(r.stderr)
        return pd.Series([None, None],
                     index=['E+', 'SiteKBTU'],
                     name = os.path.split(output_directory)[1])

def run_OLD_eplus_sim(eplus_file: Path, expand_objects: bool = False, verbose: bool = False):
    """
    Runs the simulation with OLD_EPLUS_EXE and calls parse_sql
    """
    return run_eplus_sim(eplus_file=eplus_file, ep_cli=OLD_EPLUS_EXE,
                         expand_objects=expand_objects, verbose=verbose)

    
def run_NEW_eplus_sim(eplus_file: Path, expand_objects: bool = False, verbose: bool = False) -> pd.Series:
    """
    Runs the simulation with NEW_EPLUS_EXE and calls parse_sql
    """
    return run_eplus_sim(eplus_file=eplus_file, ep_cli=NEW_EPLUS_EXE,
                         expand_objects=expand_objects, verbose=verbose)

In [None]:
# Test one sim
if False:    
    eplus_file = next(TRANSITION_DIR.glob('*.idf'))
    run_NEW_eplus_sim(eplus_file)

### Run all files in Old EnergyPlus version (or just load SQL)

#### Rerun in old E+

This should take about 10-15 minutes depending on your machine.

It is currently disabled (as RawNBConvert) because we should have already copied the needed SQL files from the old OpenStudio version that is based on the old EnergyPlus Version. Switch this cell to "Code" if you do want to rerun with your old installed E+ version

In [None]:
if False:
    files = list(OLD_OS_DIR.glob("*.idf"))

    pool = multiprocessing.Pool(processes=N)

    desc = '<h3>Running files for version {}</h3>'.format(EPLUS_OLD_VERSION)
    label = HTML(desc)
    display(label)
    all_results = []
    for result in tqdm(pool.imap_unordered(run_OLD_eplus_sim, files), total=len(files)):
        all_results.append(result)

```python
# Concat in dataframe and save to CSV
old_results = pd.concat(all_results, axis=1).T
old_results.to_csv(os.path.join(IDF_DIR, 'kbtus_8.8.0.csv'))
```

#### Just parse copied SQLs

In [None]:
old_results = pd.concat([parse_sql_version_and_sitekbtu(os.path.join(OLD_OS_DIR, x)) 
                         for x in os.listdir(OLD_OS_DIR)
                         if os.path.isdir(os.path.join(OLD_OS_DIR, x))],
                        axis=1).T

old_results['OS'] = OS_OLD_VERSION

old_results.to_csv(os.path.join(IDF_DIR, 'kbtus_{o}-{e}.csv'.format(e=EPLUS_OLD_VERSION,
                                                                    o=OS_OLD_VERSION)))

In [None]:
old_results.shape

In [None]:
old_results.head()

#### Copy IDF

In case we need to diff or something

In [None]:
found_idfs = []
for f in gb.iglob(os.path.join(TESTRUNS_DIR, '**/*/in.idf')):
    f2 = os.path.relpath(f, TESTRUNS_DIR)
    
    test_name = os.path.split(os.path.split(f2)[0])[0]
    #print(test_name)
    dst_path = os.path.join(OLD_OS_DIR, "{}.idf".format(test_name))
    shutil.copyfile(f, dst_path)
    found_idfs.append(test_name)
found_idfs = set(found_idfs)

In [None]:
found_idfs = [os.path.basename(x) for x in files]

In [None]:
len(found_idfs)

### Run all transitioned files with new EnergyPlus version
This should take about 10-15 minutes depending on your machine.

In [None]:
files = gb.glob(os.path.join(TRANSITION_DIR, '*.idf'))
print("{} Files in total".format(len(files) ))
print("{} processes".format(N))
files[:N]

In [None]:
# About 15-20minutes with 12 threads for all tests
# New: as of 2.7.0, this takes almost 1.5 hour (with a debug build of E+)!
# With 9.2.0 built locally in release, back to 10min
# files = gb.glob(os.path.join(TRANSITION_DIR, '*.idf'))

pool = multiprocessing.Pool(processes=N)

desc = '<h3>Running Transitioned files in E+ {}</h3>'.format(EPLUS_NEW_VERSION)
label = HTML(desc)
display(label)
all_results = []
for result in tqdm(pool.imap_unordered(run_NEW_eplus_sim, files), total=len(files)):
    all_results.append(result)

**NOTE: in 2.7.0 zone_hvac.rb has been hanging for 2 hr 05 min, I'm killing it**
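
A hanging simulation like this one can be cut off automatically by giving `subprocess.run` a `timeout`: the child is killed and `subprocess.TimeoutExpired` is raised. The sketch below uses a sleeping Python child as a stand-in for a stuck E+ run; `run_with_timeout` is a hypothetical helper, and a sensible limit for real simulations is left to the reader.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str],
                     timeout_s: float) -> Optional[subprocess.CompletedProcess]:
    """Run cmd; return None if it runs longer than timeout_s seconds."""
    try:
        return subprocess.run(list(cmd), capture_output=True,
                              encoding='utf-8', timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point
        print(f"Timed out after {timeout_s}s: {' '.join(cmd)}")
        return None


# Stand-in for a hanging simulation: a child that sleeps for 30 s
hung = run_with_timeout([sys.executable, '-c', 'import time; time.sleep(30)'],
                        timeout_s=1)
# Stand-in for a normal simulation
quick = run_with_timeout([sys.executable, '-c', 'print("done")'], timeout_s=30)
print(hung, quick.stdout.strip())
```

In `run_eplus_sim`, a timed-out run could be reported the same way as a non-zero return code.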

In [None]:
# If you ever send a Keyboard Interrupt signal to kill the above cell
# (via Kernel > Interrupt), you need to run this too
pool.terminate()
pool.close()

In [None]:
# Concat in dataframe and save to CSV
transitioned_results = pd.concat(all_results, axis=1).T
transitioned_results['OS'] = 'Transition'
transitioned_results.to_csv(os.path.join(IDF_DIR, 'kbtus_Transition-{e}.csv'.format(e=EPLUS_NEW_VERSION)))

In [None]:
transitioned_results.head()

In [None]:
len(old_results.index), len(transitioned_results.index)

In [None]:
set(old_results.index) - set(transitioned_results.index)

#### Why is it so damn Slow?!

Check section 3.1

## Run Simulation in new OpenStudio based on new EnergyPlus

### Run New OS Version regression tests

In [None]:
# Delete all testruns to ensure we don't end up grabbing the idf and sql from another version
# model_tests.rb only cleans out testruns/testXXX directories for tests we do request
# So if you use a regression test filter, you could have left overs
if TESTRUNS_DIR.is_dir():
    shutil.rmtree(TESTRUNS_DIR)

In [None]:
# Pass 'CUSTOMTAG=' if you don't want a tag, or 'CUSTOMTAG=sha' for the build sha,
# or any custom string such as 'CUSTOMTAG=Ubuntu_run1'
CUSTOMTAG='SHA'

# REGRESSION_TEST_FILTER = 'surface_properties_lwr'
# REGRESSION_TEST_FILTER = 'afn_single_zone_nv'
if REGRESSION_TEST_FILTER is None:
    filt = ''
else:
    filt = "-n /{}/".format(REGRESSION_TEST_FILTER)

command = "env CUSTOMTAG={c} USE_EPLUS_SPACES=true {cli} {m} {filt}".format(c=CUSTOMTAG,
                                                      m=os.path.join(OS_RES_DIR,
                                                                     'model_tests.rb'),
                                                      cli=NEW_OS_CLI,
                                                      filt=filt)
print(command)
c_args = shlex.split(command)
c_args

In [None]:
!/home/julien/Software/Others/OS-build-release/Products/openstudio -e 'puts "#{OpenStudio::energyPlusVersion}-#{OpenStudio::energyPlusBuildSHA}"'

In [None]:
process = subprocess.Popen(c_args, shell=False,
                           stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# wait for the process to terminate
#out, err = process.communicate()
#errcode = process.returncode
lines = []
for line in iter(process.stdout.readline, b''):
    print(line.rstrip().decode())
    lines.append(line)
process.stdout.close()
process.wait()
    
#os.chdir(IDF_DIR)

### Copy IDF

Copy to the `NEW_OS_DIR` directly

In [None]:
TESTRUNS_DIR, NEW_OS_DIR

In [None]:
found_idfs = []
for f in TESTRUNS_DIR.glob('**/*/in.idf'):
    f2 = f.relative_to(TESTRUNS_DIR)
    
    test_name = f2.parts[0]
    #print(test_name)
    dst_path = NEW_OS_DIR / f"{test_name}.idf"
    shutil.copyfile(f, dst_path)
    found_idfs.append(test_name)
found_idfs = set(found_idfs)
len(found_idfs)

In [None]:
found_idfs

### Copy SQL

In [None]:
found_sqls = []

for f in TESTRUNS_DIR.glob("**/*/*.sql"):
    f2 = f.relative_to(TESTRUNS_DIR)
    
    test_name = f2.parts[0]
    # print(test_name)
    dst_folder = NEW_OS_DIR / test_name
    if not dst_folder.is_dir():
        dst_folder.mkdir(parents=False, exist_ok=False)
        
    dst_path = dst_folder / "eplusout.sql"
    # print(dst_path)
    shutil.copyfile(f, dst_path)
    found_sqls.append(test_name)
found_sqls = set(found_sqls)

In [None]:
len(found_idfs), len(found_sqls)

In [None]:
found_idfs - found_sqls

### Parse new SQLs

In [None]:
new_results = pd.concat([parse_sql_version_and_sitekbtu(os.path.join(NEW_OS_DIR, x)) 
                         for x in os.listdir(NEW_OS_DIR)
                         if os.path.isdir(os.path.join(NEW_OS_DIR, x))],
                        axis=1).T

new_results['OS'] = OS_NEW_VERSION

new_results.to_csv(os.path.join(IDF_DIR, 'kbtus_{o}-{e}.csv'.format(e=EPLUS_NEW_VERSION,
                                                                    o=OS_NEW_VERSION)))

In [None]:
new_results.head()

# PART 2: Analyzing differences

## Analyzing E+ runtime

**THIS NO LONGER WORKS, last version where it worked was 9.2.0... I don't know where the runtime went in the SQLFile**

In [None]:
SQL_QUERY_RUNTIME = """SELECT ErrorMessage FROM Errors
                       WHERE ErrorMessage LIKE '%Elapsed%'
"""

TIME_REGEX = re.compile(r'.* Elapsed Time=(?P<hours>\d+)hr *(?P<minutes>\d+)min'
                        r' *(?P<seconds>[\d\.]+)sec')

def parse_sql_runtime(output_directory: Path):
    """
    This function grabs the E+ runtime from the SQL file.
    
    Args:
    -----
    * output_directory (Path or str): the path where the SQL file should be.
        eg: `./8.8.0/absorption_chillers.osm_8.8.0/`
    
    Returns:
    ---------
    * runtime (datetime.time)
    """
    output_directory = Path(output_directory).absolute()
    sql_files = list(output_directory.glob("*.sql"))
    
    runtime = None
    if len(sql_files) == 1:
        sql_path = sql_files[0]
        sql_uri = f'{sql_path.as_uri()}?mode=ro'
        with sqlite3.connect(sql_uri, uri=True) as con:
                cursor = con.cursor()
                r = cursor.execute(SQL_QUERY_RUNTIME).fetchone()
                if r:
                    time_info = r[0]
                    m = TIME_REGEX.search(time_info)
                    if m:
                        gpdict = m.groupdict()
                        # runtime = datetime.time(int(gpdict['hours']),
                        #                         int(gpdict['minutes']),
                        #                         int(float(gpdict['seconds'])))
                        runtime = datetime.timedelta(hours=float(gpdict['hours']),
                                                     minutes=float(gpdict['minutes']),
                                                     seconds=float(gpdict['seconds']))
                    else:
                        msg = ("REGEX failed. "
                               "For:\n{}".format(output_directory))
                        raise ValueError(msg)
                else:
                    msg = ("Cannot find the Runtime in the SQL file. "
                           "For:\n{}".format(output_directory))
                    #raise ValueError(msg)
                    print(msg)
                    
    return [output_directory.name, runtime]
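
As a quick check of `TIME_REGEX`, the snippet below parses a made-up end-of-run message (written in the shape the regex expects, not taken from a real SQL file) and converts it to a `timedelta`.

```python
import datetime
import re

TIME_REGEX = re.compile(r'.* Elapsed Time=(?P<hours>\d+)hr *(?P<minutes>\d+)min'
                        r' *(?P<seconds>[\d\.]+)sec')

# Made-up message for illustration
sample = ('EnergyPlus Completed Successfully-- 3 Warning; 0 Severe Errors;'
          ' Elapsed Time=00hr 02min 30.50sec')

m = TIME_REGEX.search(sample)
runtime = datetime.timedelta(hours=float(m['hours']),
                             minutes=float(m['minutes']),
                             seconds=float(m['seconds']))
# Fractional minutes, the unit used for the comparison columns
print(runtime.total_seconds() / 60)
```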

In [None]:
runtimes_transition = [parse_sql_runtime(os.path.join(TRANSITION_DIR, x)) 
                        for x in next(os.walk(TRANSITION_DIR))[1]]
df_time_transition = pd.DataFrame(runtimes_transition, columns=['test', 'runtime']).set_index('test')
# Store as fractional minutes
df_time_transition['Transition'] = df_time_transition['runtime'].dt.total_seconds() / 60
s_time_transition = df_time_transition['Transition']

runtimes_old = [parse_sql_runtime(os.path.join(OLD_OS_DIR, x)) 
                        for x in next(os.walk(OLD_OS_DIR))[1]]
df_time_old = pd.DataFrame(runtimes_old, columns=['test', 'runtime']).set_index('test')
# Store as fractional minutes
df_time_old['OLD_OS'] = df_time_old['runtime'].dt.total_seconds() / 60
s_time_old = df_time_old['OLD_OS']

runtimes_new = [parse_sql_runtime(os.path.join(NEW_OS_DIR, x)) 
                        for x in next(os.walk(NEW_OS_DIR))[1]]
df_time_new = pd.DataFrame(runtimes_new, columns=['test', 'runtime']).set_index('test')
# Store as fractional minutes
df_time_new['NEW_OS'] = df_time_new['runtime'].dt.total_seconds() / 60
s_time_new = df_time_new['NEW_OS']

In [None]:
df_time = pd.concat([s_time_old, s_time_transition, s_time_new], axis=1, sort=True)

In [None]:
df_time

In [None]:
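# Note: despite the column labels, the differences are computed as (new - old),
# so a positive value means the run got slower compared to OLD_OS
df_time['Abs Diff Min (OLD_OS - Transition)'] = df_time['Transition'] - df_time['OLD_OS']
df_time['Abs Diff Min (OLD_OS - NEW_OS)'] = df_time['NEW_OS'] - df_time['OLD_OS']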

df_time.sort_values('Abs Diff Min (OLD_OS - NEW_OS)', ascending=False, inplace=True)


In [None]:
df_time['% Diff Min (OLD_OS - NEW_OS)'] = 100*df_time['Abs Diff Min (OLD_OS - NEW_OS)'] / df_time['OLD_OS']
df_time['% Diff Min (OLD_OS - Transition)'] = 100*df_time['Abs Diff Min (OLD_OS - Transition)'] / df_time['OLD_OS']

In [None]:
df_time.describe().style

In [None]:
compare_from = 'OLD_OS'
compare_to = 'NEW_OS'

In [None]:
df_slower = df_time[df_time[compare_to] > df_time[compare_to].quantile(.75)]

ax = df_slower[[compare_from, compare_to]].sort_values(compare_to, ascending=True).plot(kind='barh', figsize=(9,18))
ax.set_xlim(0)

ax.axvline(df_time[compare_to].mean(), c='r', linestyle='--', label='{} - Mean of All tests'.format(compare_to))
ax.axvline(df_time[compare_from].mean(), c='orange', linestyle='--', label='{} - Mean of All tests'.format(compare_from))

ax.set_xlabel('Minutes to run')
ax.legend()

## Reparse SQLs

In [None]:
# Alternatively, we could just reparse the SQLs...

old_results = pd.concat([parse_sql_version_and_sitekbtu(os.path.join(OLD_OS_DIR, x)) 
                         for x in next(os.walk(OLD_OS_DIR))[1]],
                        axis=1).T
old_results['OS'] = OS_OLD_VERSION


transitioned_results = pd.concat([parse_sql_version_and_sitekbtu(os.path.join(TRANSITION_DIR, x)) 
                        for x in next(os.walk(TRANSITION_DIR))[1]],
                        axis=1).T
transitioned_results['OS'] = 'Transition'


new_results = pd.concat([parse_sql_version_and_sitekbtu(os.path.join(NEW_OS_DIR, x)) 
                         for x in next(os.walk(NEW_OS_DIR))[1]],
                        axis=1).T
new_results['OS'] = OS_NEW_VERSION

In [None]:
len(old_results.index), len(transitioned_results.index), len(new_results.index)

In [None]:
set(old_results.index) - set(transitioned_results.index)

In [None]:
set(old_results.index) - set(new_results.index)

In [None]:
set(new_results.index) - set(old_results.index) 

In [None]:
old_results.to_csv(os.path.join(IDF_DIR,
                                'kbtus_{o}-{e}.csv'.format(e=EPLUS_OLD_VERSION,
                                                           o=OS_OLD_VERSION)))

transitioned_results.to_csv(os.path.join(IDF_DIR,
                                         'kbtus_Transition-{e}.csv'.format(e=EPLUS_NEW_VERSION)))

new_results.to_csv(os.path.join(IDF_DIR,
                                'kbtus_{o}-{e}.csv'.format(e=EPLUS_NEW_VERSION,
                                                           o=OS_NEW_VERSION)))

## Reload site kbtu csvs

In [None]:
os.chdir(IDF_DIR)

## Concat frames and look at files that failed to run

In [None]:
df = pd.concat([old_results.dropna(how='all'),
                transitioned_results.dropna(how='all'),
                new_results.dropna(how='all')])

df = df.set_index(['E+', 'OS'], append=True).unstack([1,2])['SiteKBTU']

In [None]:
# Throw out those that failed throughout
df = df[~df.isnull().all(axis=1)]
df = df.loc[:, ~df.isnull().all(axis=0)]

In [None]:
# Problems
problems = df[df.isnull().any(axis=1)].copy()

In [None]:
# Strip new stuff
problems = problems[~problems.iloc[:, :-1].isna().all(axis=1)]

In [None]:
def background_colors(val):
    fmt = ''
    s = 'background-color: {}'
    if pd.isnull(val):
        fmt = s.format('#F4C7C3')
    return fmt
print("These are the files where we have some (but not all) failures")
problems.style.map(background_colors)

## Rerun transition files that need expand objects (can't do it in parallel)

In [None]:
files = [f'{TRANSITION_DIR}/{x}.idf' for x
         in problems.index[problems[next(x for x in problems.columns if x[1] == 'Transition')].isna()]]
len(files)

In [None]:
# ERROR: Could not find input data dictionary: /path/to/OpenStudio-resources/update_eplus_compare/Energy+.idd.
shutil.copy(OLD_EPLUS_EXE.parent / 'Energy+.idd', IDF_DIR)

In [None]:
# for f in tqdm(files):
#     run_NEW_eplus_sim(f, expand_objects=True)
    
# Reparse the SQLs above again

In [None]:
# Throwing out these problems for further analysis (they need to be investigated manually)
df = df.dropna(how='any', axis=0)

## First look at where we have deviations

In [None]:
pct_threshold = 0.001
print("Setting % diff threshold to {:.3%}".format(pct_threshold))

In [None]:
# df = df.drop(index='vrf_watercooled.rb')
# df = df.drop(index='autosize_hvac.rb')

In [None]:
df = df[~df.index.str.startswith('sql_')]

### Deviations in Transition and/or new OpenStudio

In [None]:
df_diff = df.pct_change(axis=1).iloc[:, 1:].dropna()
df_diff = df_diff[(df_diff.abs() >= pct_threshold).any(axis=1)]
#df_diff = df_diff.sort_values(by=df_diff.columns[-1],
#                              ascending=True)
# Sort by max absolute diff
df_diff = df_diff.loc[df_diff.abs().max(axis=1).sort_values(ascending=True).index]

In [None]:
fig, (ax0, ax1) = plt.subplots(ncols=2, nrows=1, sharex=True, sharey=True,
                               figsize=(16, len(df_diff)/2))
df_diff.iloc[:,0].plot(kind='barh', ax=ax0)
df_diff.iloc[:,1].plot(kind='barh', ax=ax1)

vals = ax0.get_xticks()
ax0.set_xticklabels(['{:3.2f}%'.format(x*100) for x in vals])

ax0.set_title("% difference between {}\nand {}".format(df.columns[0], df.columns[1]), pad=30)
ax0.set_xlabel("% difference")

ax1.set_title("% difference between {}\nand {}".format(df.columns[1], df.columns[2]), pad=30)
ax1.set_xlabel("% difference")

sns.despine()

ax0.annotate("% diff threshold set to {:.3%}".format(pct_threshold),
            xy=(0,0), xycoords='axes fraction',
            ha='left', va='top',
            xytext=(0, -40), textcoords='offset points', 
            fontsize=14, fontweight='bold',)

#ax0p = ax0.twiny()
#vals = ax0.get_xticks()
#ax0p.set_xticklabels(['{:3.2f}%'.format(x*100) for x in vals])

#ax1p = ax1.twiny()
#vals = ax1.get_xticks()
#ax1.get_xticks
#ax1p.set_xticklabels(['{:3.2f}%'.format(x*100) for x in vals])
#ax1p.set_xticks(vals)

ax0.axvline(0, color='gray')
ax1.axvline(0, color='gray')
#ax0.xaxis.set_ticks_position('both')
#ax0.xaxis.set_ticks_position('both')
ax0.tick_params(labelbottom=True, labeltop=True,
                bottom=True, top=True)
ax1.tick_params(labelbottom=True, labeltop=True,
                bottom=True, top=True)
plt.show()

In [None]:
s_eplus = df_diff.iloc[:, 0]
# Filter first, so that the new index has the same length as the series
s_eplus = s_eplus[s_eplus.abs() >= pct_threshold].copy()
s_eplus.index = pd.MultiIndex.from_tuples(s_eplus.index.str.split('.').tolist())

In [None]:
s_eplus = s_eplus.swaplevel(0, 1).loc['osm']
s_eplus.name = 'Diff from 24.1.0 to 24.2.0'

In [None]:
s_eplus.to_frame().style.format("{:.2%}")

In [None]:
large_eplus_diffs = s_eplus.index.tolist()
large_eplus_diffs

In [None]:
temp_dir = IDF_DIR / 'large_eplus_diffs'
temp_dir.mkdir(parents=True, exist_ok=True)
for f in large_eplus_diffs:
    old_idf = OLD_OS_DIR / f"{f}.osm.idf"
    assert old_idf.is_file()
    trans_idf = TRANSITION_DIR / f"{f}.osm.idf"
    assert trans_idf.is_file()
    shutil.copyfile(old_idf, temp_dir / f"{f}.osm_{EPLUS_OLD_VERSION.replace('.', '_')}.idf")
    copied_new_idf = temp_dir / f"{f}.osm_{EPLUS_NEW_VERSION.replace('.', '_')}.idf"
    shutil.copyfile(trans_idf, copied_new_idf)
    
    subprocess.check_output([
        '/home/julien/Software/Others/OS-build-release/Products/openstudio',
        '-e',
        f"w = OpenStudio::Workspace.load('{copied_new_idf}').get(); w.save('{copied_new_idf}', true)"
    ])
    

In [None]:
EPLUS_NEW_VERSION.replace('.', '_')

In [None]:
temp_dir

### Deviations from Transition to new OS only

In [None]:
new_os_diff = df_diff[df_diff.iloc[:,-1].abs() >= pct_threshold]
print(f"Deviations from Transition to new OS only: Differences above {pct_threshold:.3%}")
new_os_diff.style.format('{:.3%}')

In [None]:
new_os_diff.index.tolist()

In [None]:
if new_os_diff.empty:
    print("No new diffs due to OS")
else:
    fig, (ax0, ax1) = plt.subplots(ncols=2, nrows=1, sharex=True, sharey=True,
                                   figsize=(16, len(new_os_diff)/2))
    new_os_diff.iloc[:,0].plot(kind='barh', ax=ax0)
    new_os_diff.iloc[:,1].plot(kind='barh', ax=ax1)

    vals = ax0.get_xticks()
    ax0.set_xticklabels(['{:3.2f}%'.format(x*100) for x in vals])

    ax0.set_title("% difference between {}\nand {}".format(df.columns[0], df.columns[1]))
    ax0.set_xlabel("% difference")

    ax1.set_title("% difference between {}\nand {}".format(df.columns[1], df.columns[2]))
    ax1.set_xlabel("% difference")

    sns.despine()

    plt.show()

## Grouped bar chart of differences compared to the old OpenStudio

**How to read this chart**:

The percentage differences are calculated compared to the Old OpenStudio results for both the transitioned results and the new OpenStudio results.

**Pay special attention to tests where the Transition vs. Old OS difference does not match the New OS vs. Old OS difference** (meaning the difference comes from OpenStudio, not from E+).
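
As an illustration of that reading rule, here is a minimal sketch in plain Python (no pandas, so the arithmetic stays visible). The test names and site kBTU values are made up for the example, and `pct_diff_from_base` is a hypothetical helper, not a function of this notebook.

```python
def pct_diff_from_base(base, other):
    """Fractional difference of `other` relative to `base`."""
    return (other - base) / base


# Made-up site kBTU values: (Old OS, Transition, New OS)
site_kbtu = {
    "test_a.osm": (1000.0, 1010.0, 1010.0),  # E+ changed the result, OS did not
    "test_b.rb": (2000.0, 2000.0, 2030.0),   # E+ did not, OS did
}

pct_threshold = 0.001

flagged = []
for test_name, (old_os, transition, new_os) in site_kbtu.items():
    d_transition = pct_diff_from_base(old_os, transition)
    d_new_os = pct_diff_from_base(old_os, new_os)
    # Both bars are equal when E+ alone explains the change;
    # a gap between them points at OpenStudio
    if abs(d_new_os - d_transition) >= pct_threshold:
        flagged.append(test_name)
```

Here only `test_b.rb` ends up in `flagged`: its two differences (0% and 1.5%) do not match, whereas `test_a.osm` shows the same 1% in both.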

In [None]:
df.drop(labels='autosize_hvac.rb', errors='ignore', inplace=True, axis=0)

In [None]:
df2 = df.copy()
df2.columns = df2.columns.droplevel(0)

df_diff_from_old_os = df2.iloc[:,1:].subtract(df2.iloc[:,0], axis=0).divide(df2.iloc[:,0], axis=0)

# Keep only over threshold
df_diff_from_old_os = df_diff_from_old_os[(df_diff_from_old_os.abs() >= pct_threshold).any(axis=1)]

# Sort by max absolute diff
df_diff_from_old_os = df_diff_from_old_os.loc[df_diff_from_old_os.abs().max(axis=1).sort_values(ascending=True).index]

fig, ax = plt.subplots(figsize=(16, len(df_diff_from_old_os)/2))
df_diff_from_old_os.plot(kind='barh', stacked=False, ax=ax)

vals = ax.get_xticks()
ax.set_xticklabels(['{:3.2f}%'.format(x*100) for x in vals])

ax.set_title("% difference compared to  {}".format(df.columns[0]))
ax.set_xlabel("% difference")

sns.despine()

plt.show()

# Sort the other way round for the table
(df_diff_from_old_os
    .loc[df_diff_from_old_os.abs().max(axis=1)
                            .sort_values(ascending=False).index]
    .style.format('{:.2%}'))

#html = df_diff_from_old_os.style.format('{:.2%}').render()
#display(HTML(html))

## Stacked bar chart of differences

In [None]:
def find_ymax_bylabel(label, ax, y_is_top=False):
    """
    Given a label, find the y for that bar, and the max x based on two stacked
    bars
    """
    for i, x in enumerate(ax.get_yticklabels()):
        if x.get_text() == label:
            # y = x.get_position()[1]
            
            # Find the max x between the two rects
            rect1 =  ax.patches[i]
            rect_1_xmax = rect1.get_x() + rect1.get_width()
            
            if y_is_top:
                y = rect1.get_y() + rect1.get_height()
            else:
                y = rect1.get_y() + rect1.get_height() / 2.0
                y = x.get_position()[1]
            
            rect2 = ax.patches[int(i+(len(ax.patches) / 2))]
            rect_2_xmax = rect2.get_x() + rect2.get_width()
            
            return y, max(rect_1_xmax, rect_2_xmax)
    return None, None

def plot_stacked_bar_difference_compared_to_base(toplot):
    fig, ax = plt.subplots(figsize=(16, len(toplot)/2))

    # Total % change from old os to new os
    s_tot_change = toplot.sum(axis=1)

    toplot.plot(kind='barh', stacked=True, ax=ax)

    vals = ax.get_xticks()
    ax.set_xticklabels(['{:3.2f}%'.format(x*100) for x in vals])

    ax.set_title("% difference compared to  {}".format(df.columns[0]))
    ax.set_xlabel("% difference")

    # for i, rect in enumerate(ax.patches):
    #     label = ax.get_yticklabels()[int(i % (len(ax.patches) / 2))].get_text()
    #     tot_change = s_tot_change[label]
    #     ax.annotate("{}-{}-{:.3%}".format(i, label, tot_change), 
    #                 xy=(rect.get_x()+rect.get_width(), rect.get_y()))

    # Need to draw first otherwise we can't get the position
    ax.figure.canvas.draw()



    # Add TOTAL % change (sum of both)           
    for label, val in s_tot_change.items():
        ymid, xmax = find_ymax_bylabel(label=label, ax=ax, y_is_top=False)
        ytop, xmax = find_ymax_bylabel(label=label, ax=ax, y_is_top=True)
        if xmax is not None:
            ax.annotate("{:.3%}".format(val),
                        xy=(val, ytop), xycoords='data',
                        ha='center', va='bottom', color='k', fontsize=8,
                        xytext=(0, 4), textcoords='offset points') 
            ax.plot(val, ytop, marker='v', c='#494949', alpha=1, zorder=3)

    # Custom annotations
    #label = 'baseline_sys07.rb'
    #y, xmax = find_ymax_bylabel(label=label, ax=ax, y_is_top=False)
    #if y is not None:
    #    ax.annotate("Slight OpenStudio deviation here", xy=(xmax, y), xycoords='data',
    #                ha='left', va='center',
    #                xytext=(20, 0), textcoords='offset points',
    #                arrowprops=dict(arrowstyle="->",
    #                            connectionstyle="arc3"))

    #label = 'centralheatpumpsystem.rb'
    #y, xmax = find_ymax_bylabel(label=label, ax=ax, y_is_top=False)
    #if y is not None:
    #    ax.annotate("This ruby test is unstable, period", xy=(xmax, y), xycoords='data',
    #                ha='left', va='center',
    #                xytext=(20, 0), textcoords='offset points',
    #                arrowprops=dict(arrowstyle="->",
    #                            connectionstyle="arc3")) 

    sns.despine()



    plt.show()

**How to read this chart**:

The percentage differences are calculated compared to the Old OpenStudio results for both the transitioned results and the new OpenStudio results. Then I do

    % diff New version = % diff New version - % diff Transition

and plot that as a stacked bar chart.
The goal is to more clearly see the differences that are due to OpenStudio by removing the differences due to the new E+.
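
The subtraction can be checked by hand. Below is a minimal sketch with made-up numbers; `split_total_difference` is a hypothetical helper name, not part of this notebook.

```python
def split_total_difference(pct_transition, pct_new_os):
    """
    Both inputs are fractional differences relative to the Old OS results.
    Returns the share attributed to E+ and the share attributed to OpenStudio.
    """
    eplus_share = pct_transition
    os_share = pct_new_os - pct_transition
    return eplus_share, os_share


# Made-up example: +0.30% after transition, +0.35% with the new OpenStudio
eplus_share, os_share = split_total_difference(0.0030, 0.0035)

# The two stacked segments add back up to the total Old OS -> New OS change
total = eplus_share + os_share
```

In this example the OpenStudio segment is 0.05% and the two segments sum to the 0.35% total.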

**What you need to pay special attention to is when you see % differences for the new OS.**

**A cursor along with the total % difference between Old OS and New OS is also plotted**.
Please see the example below to get a better sense of how the graph is constructed.

In [None]:
df_diff_from_old_example = pd.DataFrame([[0.003, 0.0005],
                                         [-0.0004, +0.0003],
                                         [0.0002, 0.0004]],
                                        index=['test1.rb', 'test2.osm', 'test3.rb'],
                                        columns=[TRANSITION_INFO['OS_VERSION'],
                                                "{}-{}".format(NEW_OS_INFO['EPLUS_VERSION'],
                                                               NEW_OS_INFO['OS_VERSION'])]
                                       )
plot_stacked_bar_difference_compared_to_base(df_diff_from_old_example)

**Let's do the actual plotting now:**

In [None]:
df_diff_from_old_os_as_pct = df_diff_from_old_os.copy()
df_diff_from_old_os_as_pct.loc[:,OS_NEW_VERSION] = df_diff_from_old_os_as_pct.loc[:,OS_NEW_VERSION] - df_diff_from_old_os_as_pct.loc[:,'Transition']

plot_stacked_bar_difference_compared_to_base(df_diff_from_old_os_as_pct)

## Inspect biggest differences

In [None]:
def get_annual_energy_by_fuel_and_enduse(sql_path, version_info):
    """
    Queries SQL file and returns the ABUPS' End Uses table

    Parameters
    ----------
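    sql_path (str): path to the sql file
    version_info (dict): must have an 'EPLUS_VERSION' key (eg '24.2.0'),
        used to handle the District Heating split introduced in 23.2.0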

    Returns
    -------
    df_annual: pd.DataFrame
        Annual End Use table
        index = 'EndUse'
        columns = ['FuelType','Units']
    """

    abs_sql_path = Path(sql_path).absolute()
    sql_uri = '{}?mode=ro'.format(abs_sql_path.as_uri())
    
    # RowName = '#{end_use}'
    # ColumnName='#{fuel_type}'
    annual_end_use_query = """SELECT RowName, ColumnName, Units, Value
        FROM TabularDataWithStrings
        WHERE ReportName='AnnualBuildingUtilityPerformanceSummary'
        AND ReportForString='Entire Facility'
        AND TableName='End Uses'
    """

    with sqlite3.connect(sql_uri, uri=True) as con:
        df_annual = pd.read_sql(annual_end_use_query, con=con)

    # Convert Value to Float
    df_annual['Value'] = pd.to_numeric(df_annual['Value'])

    df_annual = df_annual.set_index(['RowName',
                                     'ColumnName',
                                     'Units'])['Value'].unstack([1, 2])
    df_annual.index.name = 'EndUse'
    df_annual.columns.names = ['FuelType', 'Units']
    
    end_use_order = ['Heating', 'Cooling',
                     'Interior Lighting', 'Exterior Lighting',
                     'Interior Equipment', 'Exterior Equipment',
                     'Fans', 'Pumps', 'Heat Rejection', 'Humidification',
                     'Heat Recovery', 'Water Systems',
                     'Refrigeration', 'Generators']

    col_order = ['Electricity', 'Natural Gas', 'Additional Fuel',
                 'District Cooling', 'District Heating', 'Water']

    # 23.2.0: District Heating was split in "District Heating Water" and "District Heating Steam"
    if tuple([int(x) for x in version_info['EPLUS_VERSION'].split('.')]) >= (23, 2, 0):
        df_annual[('District Heating', 'GJ')] = df_annual[[
            ('District Heating Water', 'GJ'), ('District Heating Steam', 'GJ')
        ]].sum(axis=1)
    
    df_annual = df_annual[[x for x in col_order if x in df_annual.columns.get_level_values(0)]].loc[end_use_order]

    return df_annual
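
The query, the read-only URI and the version check can be tried without a real simulation. This is a minimal sketch using only the standard library: the `TabularDataWithStrings` table created here is a synthetic stand-in with made-up rows (an assumption for illustration, not the full E+ schema), and `version_tuple` is a hypothetical helper.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

QUERY = """SELECT RowName, ColumnName, Units, Value
    FROM TabularDataWithStrings
    WHERE ReportName='AnnualBuildingUtilityPerformanceSummary'
    AND ReportForString='Entire Facility'
    AND TableName='End Uses'
"""


def version_tuple(version):
    """'24.2.0' -> (24, 2, 0), so that versions compare numerically."""
    return tuple(int(x) for x in version.split('.'))


# Made-up rows: (RowName, ColumnName, Units, Value), values stored as strings
rows = [("Heating", "Electricity", "GJ", "1.50"),
        ("Heating", "District Heating Water", "GJ", "2.00"),
        ("Heating", "District Heating Steam", "GJ", "0.50")]

with tempfile.TemporaryDirectory() as tmp:
    sql_path = Path(tmp) / "eplusout.sql"
    # Build the synthetic stand-in database
    with closing(sqlite3.connect(sql_path)) as con:
        con.execute("CREATE TABLE TabularDataWithStrings (ReportName TEXT, "
                    "ReportForString TEXT, TableName TEXT, RowName TEXT, "
                    "ColumnName TEXT, Units TEXT, Value TEXT)")
        con.executemany(
            "INSERT INTO TabularDataWithStrings VALUES "
            "('AnnualBuildingUtilityPerformanceSummary', 'Entire Facility', "
            "'End Uses', ?, ?, ?, ?)", rows)
        con.commit()

    # Reopen it read-only, the same way as the function above
    sql_uri = "{}?mode=ro".format(sql_path.absolute().as_uri())
    with closing(sqlite3.connect(sql_uri, uri=True)) as con:
        fetched = con.execute(QUERY).fetchall()

# Index by (end use, fuel) and convert the string values to float
end_uses = {(row, col): float(value) for row, col, units, value in fetched}

# Recombine the two District Heating columns for versions >= 23.2.0
district_heating = None
if version_tuple('24.2.0') >= (23, 2, 0):
    district_heating = (end_uses[("Heating", "District Heating Water")]
                        + end_uses[("Heating", "District Heating Steam")])
```

Comparing tuples of integers rather than raw strings matters here: as strings, `'9.6.0' >= '23.2.0'` is true, while `(9, 6, 0) >= (23, 2, 0)` is not.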

In [None]:
def find_sql_file(test, version_info):
    """
    Find the sql file given a test name and the version_info dict
    
    Args:
    ------
    * test (str): the test name
    * version_info (dict): should have at least one 'DIR' key with the path
    to the directory
    """
    search_path = os.path.join(version_info['DIR'], test, "*.sql")

    sql_files = gb.glob(search_path)

    if len(sql_files) == 0:
        return None
    elif len(sql_files) > 1:
        print("Found more than one sql file for {t} in "
              "{p}".format(t=test, p=search_path))
    return sql_files[0]
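
The lookup relies on the VERSION/TEST_NAME/ directory layout described in the foreword. Here is a minimal sketch of that convention in a throwaway directory; `find_single_sql` is a hypothetical stand-in that mirrors the function above, and the `eplusout.sql` file name is an assumption for the example.

```python
import glob
import os
import tempfile


def find_single_sql(directory, test):
    """Hypothetical stand-in: first *.sql file in directory/test, or None."""
    matches = sorted(glob.glob(os.path.join(directory, test, "*.sql")))
    return matches[0] if matches else None


with tempfile.TemporaryDirectory() as tmp:
    # Only the 'DIR' key is needed for the lookup
    version_info = {'DIR': tmp}

    # Fake one test output directory holding an (empty) SQL file
    test_dir = os.path.join(tmp, 'foundation_kiva.osm')
    os.makedirs(test_dir)
    open(os.path.join(test_dir, 'eplusout.sql'), 'w').close()

    found = find_single_sql(version_info['DIR'], 'foundation_kiva.osm')
    missing = find_single_sql(version_info['DIR'], 'not_a_test.osm')
    found_name = os.path.basename(found)
```

A test without any SQL file yields `None`, which is what `parse_single_end_use` checks for before querying.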

In [None]:
def parse_single_end_use(test, version_info, add_eplus_version=True):
    """
    Helper to load the end use by fuel
    
    Args:
    ------
    * test (str): the test name
    * version_info (dict): should have at least one 'DIR' key with the path
    to the directory
    * add_eplus_version (bool): Whether to construct the name as EP-OS or
    just OS. Pass True if both OLD_OS and NEW_OS are the same OS version
    """
    sql_path = find_sql_file(test, version_info=version_info)
    if add_eplus_version:
        idx = "{}-{}".format(version_info['EPLUS_VERSION'], version_info['OS_VERSION'])
    else:
        idx = version_info['OS_VERSION']
        
    if sql_path is None:
        print("Cannot find the sql file for test '{}' and version "
              "{}".format(test, idx))
        return None

    end_use = get_annual_energy_by_fuel_and_enduse(sql_path=sql_path, version_info=version_info)
    end_use.columns = pd.MultiIndex.from_tuples([(idx,) + x for x in end_use.columns],
                                                names = ['Version'] + end_use.columns.names)
    return end_use

In [None]:
def parse_before_after_enduse(test,
                              old_os=True, transition=True, new_os=True,
                              add_eplus_version=True):
    """
    Given a test name, will parse both the old and the new SQL file to return
    a table that has, for both versions, the end use by fuel values
    
    Args:
    ------
    * test (str): the name of the test. eg 'foundation_kiva.osm'
    * old_os, transition, new_os (bool): whether to include these versions
    Note that it relies on the respective global dictionaries
    OLD_OS_INFO, TRANSITION_INFO, NEW_OS_INFO
    * add_eplus_version (bool): Whether to construct the name as EP-OS or
    just OS. Pass True if both OLD_OS and NEW_OS are the same OS version
    
    Returns:
    --------
    * df_all_end_use (pd.DataFrame): a multiindex dataframe of end uses by fuel
        index: ['EndUse'] ('Heating', 'Cooling', etc)
        columns = ['Version', 'FuelType', 'Units']

    """
    concat_list = []
    
    if (old_os + transition + new_os) < 2:
        print("You should request at least 2 versions to compare them...")
        return False
    
    if old_os:
        concat_list.append(parse_single_end_use(test, OLD_OS_INFO,
                                                add_eplus_version=add_eplus_version))
    if transition:
        concat_list.append(parse_single_end_use(test, TRANSITION_INFO,
                                                add_eplus_version=add_eplus_version))
    if new_os:
        concat_list.append(parse_single_end_use(test, NEW_OS_INFO,
                                                add_eplus_version=add_eplus_version))

        
    df_all_end_use = pd.concat([x for x in concat_list if x is not None],
                               axis=1)

    return df_all_end_use

In [None]:
def plot_end_use_diff(df_all_end_use, test,
                      add_legend=True, fontsize=None,
                      outer_i=None, fig=None,
                      add_eplus_version=True
                      ):
    """
    Plots the difference in end use by fuel between the new and old versions
    Will have subplots by units (water versus energy), a subplot is only shown
    if there is consumption in the said end use (eg if no Water, there will 
    be only one subplot)
    Displays a grouped bar chart by end use, and annotates % difference in the
    given end use
    
    Args:
    ------
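    * df_all_end_use (pd.DataFrame): dataframe from `parse_before_after_enduse`
    
    * test (str): the test name, used in the plot title
    
    * add_legend (bool): whether to display the legend on the first subplot
    
    * fontsize (int): base font size, defaults to 10 if None
    
    * outer_i and fig: if you want to customize the layout yourself. 
        Pass None otherwise (default)
        
    Note that it relies on the respective global dictionaries
    OLD_OS_INFO, TRANSITION_INFO, NEW_OS_INFO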
    
    * add_eplus_version (bool): Whether to construct the name as EP-OS or
    just OS. Pass True if both OLD_OS and NEW_OS are the same OS version
    
    Returns:
    --------
    * None, displays a plot

    """
    
    if fontsize is None:
        fontsize = 10
    
    diff = (df_all_end_use.groupby(level=['Version', 'Units'], axis=1).sum()
                          .replace(0, np.nan)
                          .dropna(how='all', axis=0)
                          .dropna(how='all', axis=1))

    if add_eplus_version:
        l_order = ["{}-{}".format(OLD_OS_INFO['EPLUS_VERSION'],
                                  OLD_OS_INFO['OS_VERSION']),
                   
                   "{}-{}".format(TRANSITION_INFO['EPLUS_VERSION'],
                                  TRANSITION_INFO['OS_VERSION']),
                   
                   "{}-{}".format(NEW_OS_INFO['EPLUS_VERSION'],
                                  NEW_OS_INFO['OS_VERSION']),
                  ]
    else:
        l_order = [OLD_OS_INFO['OS_VERSION'],
                   TRANSITION_INFO['OS_VERSION'],
                   NEW_OS_INFO['OS_VERSION']]
    # Reorder properly
    diff = diff[[x for x in l_order if x in diff.columns]]
    
    grouped = diff.groupby(level='Units', axis=1)

    ncols = min(len(grouped), 2)
    nrows = int(np.ceil(grouped.ngroups/ncols))

    # If you don't supply outer_i, we take care of everything
    if outer_i is None:
        fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(8*ncols,5*nrows))
        if not isinstance(axes, np.ndarray):
            axes = np.array([axes])
    else:
        inner = mpl.gridspec.GridSpecFromSubplotSpec(nrows, ncols,
                    subplot_spec=outer_i, wspace=0.1, hspace=0.1)
        
        #np.array([["{},{}".format(x,y) for y in range(ncols)] for x in range(nrows)])
        axes = np.array([plt.Subplot(fig, inner[j]) for j in range(nrows*ncols)])
        [fig.add_subplot(ax) for ax in axes]

    # Plot each subplot
    first_legend = add_legend
    for (key, ax) in zip(grouped.groups.keys(), axes.flatten()):
        gp = grouped.get_group(key)
        gp.columns= gp.columns.droplevel('Units')
        gp.index.name = ''
        
        # Sort by max absolute difference
        gp = gp.loc[gp.apply(lambda row: max(row) - min(row), axis=1)
                      .sort_values(ascending=False).index]
        gp.plot(kind='bar', ax=ax)
        if key == 'GJ':
            title = "Energy (GJ)"
        elif key == 'm3':
            title = "Water (m3)"
        else:
            # shouldn't happen
            title = key
            
        # Add labels with fontsize
        ax.set_title(title, fontsize=fontsize+2)
        ax.set_ylabel(key, fontsize=fontsize)
        # Set tick size
        # Use Tick.label1 (the older Tick.label alias is gone in recent Matplotlib)
        for tick in ax.xaxis.get_major_ticks():
            tick.label1.set_fontsize(fontsize)
            tick.label1.set_rotation(45)
        for tick in ax.yaxis.get_major_ticks():
            tick.label1.set_fontsize(fontsize)
            
        # Add % difference if any
        for i, x in enumerate(ax.get_xticklabels()):
            # Return 'Heating', 'Cooling', etc
            idx = x.get_text()

            # Loop on each successive versions
            for k in range(len(gp.columns)-1):
                v_old = gp.loc[idx].iloc[k]
                v_new = gp.loc[idx].iloc[k+1]
                if abs(v_new - v_old) > 0:
                    pct = (v_new - v_old) / v_old

                    if v_old > v_new:
                        # Base on the first bar
                        rect = ax.patches[i+k*len(ax.patches) // len(gp.columns)]
                        # Offset needed to be at mid between both bars
                        x_offset = rect.get_width()
                    else:
                        # Based on second bar
                        rect = ax.patches[i+(k+1)*len(ax.patches) // len(gp.columns)]
                        x_offset = 0

                    ax.annotate("{:.2%}".format(pct),
                                xy=(rect.get_x() + x_offset, rect.get_height()+0.05),
                                xytext=(15*(2*k-1),20), textcoords='offset points',
                                ha='center', va='bottom',
                                fontweight='normal',
                                fontsize=fontsize-2, color='k',
                                arrowprops=dict(arrowstyle="->",
                                connectionstyle="arc3"))
        
        # Display legend or not
        if not first_legend:
            ax.legend().set_visible(False)
        else:
            ax.legend()
        first_legend = False
            
    sns.despine()
    
    # Title
    title = "End Use for {}".format(test)
    # fig.suptitle(title)
    axes[0].annotate(title,
                     xy=(0.5*ncols, 1.0), xycoords='axes fraction',
                     xytext=(0, 20), textcoords='offset points',
                     va='bottom', ha='center',
                     fontsize=fontsize+4, fontweight='bold',
                     )
    if outer_i is None:
        plt.show()

In [None]:
# testing
#test = 'centralheatpumpsystem.rb'
#test = 'evaporative_cooling.osm'
#df_all_end_use = parse_before_after_enduse(test)
#plot_end_use_diff(df_all_end_use, test)

In [None]:
def table_absolute_difference(df_all_end_use, is_incremental=True,
                             add_eplus_version=True):
    """
    Computes the absolute difference (in GJ or m3).
    If is_incremental, compares from one version to the next
        eg: returns 'Transition - 2.4.2' and '2.4.3 - Transition'
    If False, compares each version to the oldest one
        eg: returns 'Transition - 2.4.2' and '2.4.3 - 2.4.2'

    Note: the caption relies on the global variables `test` and `styles`.

    Args:
    ------
    * df_all_end_use (pd.DataFrame): dataframe from parse_before_after_enduse
    * is_incremental (bool): compare each version to the previous version
        or to the oldest one
    * add_eplus_version (bool): Whether to construct the name as EP-OS or
    just OS. Pass True if both OLD_OS and NEW_OS are the same OS version
    
    Returns:
    --------
    * abs_diff (pd.DataFrame)
    * html_abs = HTML object
    
    """
    
    if add_eplus_version:
        l_order = ["{}-{}".format(OLD_OS_INFO['EPLUS_VERSION'],
                                  OLD_OS_INFO['OS_VERSION']),
                   
                   "{}-{}".format(TRANSITION_INFO['EPLUS_VERSION'],
                                  TRANSITION_INFO['OS_VERSION']),
                   
                   "{}-{}".format(NEW_OS_INFO['EPLUS_VERSION'],
                                  NEW_OS_INFO['OS_VERSION']),
                  ]
    else:
        l_order = [OLD_OS_INFO['OS_VERSION'],
                   TRANSITION_INFO['OS_VERSION'],
                   NEW_OS_INFO['OS_VERSION']]
        
    # Sort in the right order
    cols_in_order = [x for x in l_order 
                     if x in df_all_end_use.columns]
        
    abs_diff = df_all_end_use.copy()
    for i, col in enumerate(cols_in_order[1:]):
        if is_incremental:
            k = i
        else:
            k = 0
        abs_diff[col] = df_all_end_use[col] - df_all_end_use[cols_in_order[k]]
    abs_diff = abs_diff[cols_in_order[1:]]
    
    abs_diff = (abs_diff.replace(0, np.nan)
                        .dropna(how='all', axis=0)
                        .dropna(how='all', axis=1))
    
    if is_incremental:
        ann = "<strong>Comparing from one version to the next</strong>"
    else:
        ann = "<strong>Comparing each version to {}</strong>".format(cols_in_order[0])
    if abs_diff.empty:
        # print("There are ZERO absolute differences for {}".format(test))
        html = HTML('<p>{}</p>\n<p style="font-size: 18px; text-align: center">'
                    'There are <strong>ZERO</strong> absolute differences '
                    'for {}</p>'.format(ann, test))
    else:
        html = (abs_diff.style.set_table_styles(styles)
                 .set_caption("Absolute diff for {}\n{}".format(test, ann))
                 .format(lambda x: "{:.0f}".format(x) if not np.isnan(x) else '-'))
        # display(html)
    return abs_diff, html

def table_percent_difference_by_end_use_and_fuel(df_all_end_use,
                                                 is_incremental=True,
                                                 add_eplus_version=True):
    """
    Computes the percentage difference between the old and the new for each
    end use and fuel.
    
    eg: Heating Electricity % is calculated as
        ((heating-electricity-new) - (heating-electricity-old))
        / (heating-electricity-old)
    
    Args:
    ------
    * df_all_end_use (pd.DataFrame): dataframe from parse_before_after_enduse
    
    * add_eplus_version (bool): Whether to construct the name as EP-OS or
    just OS. Pass True if both OLD_OS and NEW_OS are the same OS version
    
    Returns:
    --------
    * pct_diff (pd.DataFrame)
    * html_diff = HTML object
    """
    if add_eplus_version:
        l_order = ["{}-{}".format(OLD_OS_INFO['EPLUS_VERSION'],
                                  OLD_OS_INFO['OS_VERSION']),
                   
                   "{}-{}".format(TRANSITION_INFO['EPLUS_VERSION'],
                                  TRANSITION_INFO['OS_VERSION']),
                   
                   "{}-{}".format(NEW_OS_INFO['EPLUS_VERSION'],
                                  NEW_OS_INFO['OS_VERSION']),
                  ]
    else:
        l_order = [OLD_OS_INFO['OS_VERSION'],
                   TRANSITION_INFO['OS_VERSION'],
                   NEW_OS_INFO['OS_VERSION']]
        
    # Sort in the right order
    cols_in_order = [x for x in l_order 
                     if x in df_all_end_use.columns]
        
    pct_diff = df_all_end_use.copy()
    for i, col in enumerate(cols_in_order[1:]):
        if is_incremental:
            k = i
        else:
            k = 0
        pct_diff[col] = (df_all_end_use[col] - df_all_end_use[cols_in_order[k]]) / df_all_end_use[cols_in_order[k]]

    pct_diff = pct_diff[cols_in_order[1:]]
    
    pct_diff = (pct_diff.replace(0, np.nan)
                        .dropna(how='all', axis=0)
                        .dropna(how='all', axis=1))
    
    if is_incremental:
        ann = "<strong>Comparing from one version to the next</strong>"
    else:
        ann = "<strong>Comparing each version to {}</strong>".format(cols_in_order[0])
    
    if pct_diff.empty:
        # print("There are ZERO percentage differences for {}".format(test))
        html = HTML('<p>{}</p><p style="font-size: 18px; text-align: center">'
                    'There are <strong>ZERO</strong> percentage differences '
                    'for {}</p>'.format(ann, test))
    else:
        html = (pct_diff.style.set_table_styles(styles)
                 .set_caption("Relative individual % diff for each end use and"
                              " fuel for '{}'\n{}".format(test, ann))
                 .format(lambda x: "{:.2%}".format(x) if not np.isnan(x) else '-'))
        # display(html)
    return pct_diff, html

def table_percent_difference_of_total(df_all_end_use, is_incremental=True,
                                      add_eplus_version=True):
    """
    Computes the percentage difference between the old and the new for each
    end use, as a share of the old total for each unit type (GJ or m3).
    
    eg: Heating % is calculated as
        (sum(heating-GJ-each-fuel-new) - sum(heating-GJ-each-fuel-old))
        / sum(all-GJ-old)
    
    Args:
    ------
    * df_all_end_use (pd.DataFrame): dataframe from parse_before_after_enduse
    * is_incremental (bool): compare each version to the previous version
        or to the oldest one
    * add_eplus_version (bool): Whether to construct the name as EP-OS or
    just OS. Pass True if both OLD_OS and NEW_OS are the same OS version
    
    Returns:
    --------
    * percentage_of_total (pd.DataFrame)
    * html_tot = HTML object
  
    """
    
    if add_eplus_version:
        l_order = ["{}-{}".format(OLD_OS_INFO['EPLUS_VERSION'],
                                  OLD_OS_INFO['OS_VERSION']),
                   
                   "{}-{}".format(TRANSITION_INFO['EPLUS_VERSION'],
                                  TRANSITION_INFO['OS_VERSION']),
                   
                   "{}-{}".format(NEW_OS_INFO['EPLUS_VERSION'],
                                  NEW_OS_INFO['OS_VERSION']),
                  ]
    else:
        l_order = [OLD_OS_INFO['OS_VERSION'],
                   TRANSITION_INFO['OS_VERSION'],
                   NEW_OS_INFO['OS_VERSION']]
        
    # Sort in the right order
    cols_in_order = [x for x in l_order 
                     if x in df_all_end_use.columns]

    concat_dict = {}
    for i, col in enumerate(cols_in_order[1:]):
        if is_incremental:
            k = i
        else:
            k = 0
        sum_old = (df_all_end_use[cols_in_order[k]]
                   .groupby(level='Units', axis=1).sum().sum())
        abs_diff_end_use = ((df_all_end_use[col] 
                            - df_all_end_use[cols_in_order[k]])
                            .stack(0)
                            .groupby(level='EndUse').sum())

        percentage_of_total = (abs_diff_end_use / sum_old)
        d = {'GJ': 'Energy', 'm3': 'Water'}
        percentage_of_total.columns = pd.MultiIndex.from_tuples([(d[x], x) 
                                                                 for x in percentage_of_total.columns],
                                                                names=['Type', 'Units'])
        concat_dict[col] = percentage_of_total
    
    percentage_of_total = pd.concat(concat_dict, axis=1)[cols_in_order[1:]]
    # Drop end uses where we have nothing
    percentage_of_total = (percentage_of_total.reindex(df_all_end_use.index)
                                               .replace(0, np.nan)
                                               .dropna(how='all', axis=0)
                           )

    # Drop fuel types (units) where none of the versions have a change.
    # The selection has to be assigned back, otherwise it has no effect.
    sum_by_type = percentage_of_total.groupby(level='Units', axis=1).sum().sum()
    percentage_of_total = percentage_of_total.loc[:,
                            (percentage_of_total.columns
                             .get_level_values('Units')
                             .isin(sum_by_type.index[sum_by_type != 0]))]
    if is_incremental:
        ann = "<strong>Comparing from one version to the next</strong>"
    else:
        ann = "<strong>Comparing each version to {}</strong>".format(cols_in_order[0])
    
    if percentage_of_total.empty:
        # print("There are ZERO percentage differences for {}".format(test))
        html = HTML('<p>{}</p><p style="font-size: 18px; text-align: center">'
                    'There are <strong>ZERO</strong> percentage differences '
                    'for {}</p>'.format(ann, test))
    else:
        html = (percentage_of_total.style.set_table_styles(styles)
                 .set_caption("% diff of total GJ/m3 {}\n{}".format(test, ann))
                 .format(lambda x: "{:.2%}".format(x) if not np.isnan(x) else '-'))
        # display(html)
    return percentage_of_total, html
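
The two percentage tables above answer different questions: one relates each change to that end use's own old value, the other relates it to the old grand total. A minimal sketch on made-up numbers may help to see the difference. The `old`/`new` labels and the values are hypothetical and do not come from any regression test, and the real functions work on the multi-level frame from `parse_before_after_enduse`, not on this flat one.

```python
import pandas as pd

# Hypothetical end-use energy in GJ for two versions (made-up numbers)
df = pd.DataFrame(
    {"old": [100.0, 50.0, 50.0], "new": [110.0, 50.0, 45.0]},
    index=["Heating", "Cooling", "Fans"],
)

# Relative individual % difference: each end use against its own old value
pct_individual = (df["new"] - df["old"]) / df["old"]

# % difference of total: each end use's change against the old grand total
pct_of_total = (df["new"] - df["old"]) / df["old"].sum()

print(pct_individual.map("{:.2%}".format))
print(pct_of_total.map("{:.2%}".format))
```

Here Heating changes by 10% of its own old value, but only by 5% of the old total, which is why the heatmap uses a smaller default `vmax` when `as_pct_of_total` is True.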

In [None]:
def report_for_test(test, is_incremental=True,
                    old_os=True, transition=True, new_os=True,
                    plot_if_no_diff=True, add_legend=True,
                    display_tables=False,
                    plot_heatmap=False, heatmap_as_pct_of_total=False,
                    outer_i=None, fig=None, add_eplus_version=True):
    """
    High level method to investigate differences
    
    Args:
    -----
    * test (str): test name, eg 'centralheatpumpsystem.rb'
    * is_incremental (bool): compare each version to the previous version
        or to the oldest one
    * old_os, transition, new_os (bool): whether to include these versions.
        Note that it relies on the respective global dictionaries
        OLD_OS_INFO, TRANSITION_INFO, NEW_OS_INFO
    * plot_if_no_diff (bool): if there are no differences, whether to show
        `plot_end_use_diff` anyway or not
    * add_legend (bool): passed to `plot_end_use_diff`
    * display_tables (bool): if True, shows `table_absolute_difference`,
        `table_percent_difference_by_end_use_and_fuel` and
        `table_percent_difference_of_total` side by side
    * plot_heatmap (bool): if True and there are differences, also calls
        `plot_heatmap_pct_diff`
    * heatmap_as_pct_of_total (bool): whether the heatmap shows the
        percentage of total or the individual percentage difference
    * outer_i, fig: passed to `plot_end_use_diff`. Pass them if you want to
        organize the figures in a given layout (eg a cell of a matplotlib
        GridSpec and its figure), otherwise a new plot is created
    * add_eplus_version (bool): Whether to construct the name as EP-OS or
        just OS. Pass True if both OLD_OS and NEW_OS are the same OS version
    
    Returns:
    --------
    None, displays the requested tables and plots on the fly
    
    """
    has_diff = False
    
    df_all_end_use = parse_before_after_enduse(test,
                                               old_os=old_os,
                                               transition=transition,
                                               new_os=new_os)
    abs_diff, html_abs = table_absolute_difference(df_all_end_use, is_incremental, add_eplus_version=add_eplus_version)
    # display(html_abs)
    if not abs_diff.empty:
        has_diff = True
        if display_tables or plot_heatmap:
            # The heatmap below reuses these tables, so compute them even
            # when the tables themselves are not displayed
            pct_diff, html_pct = table_percent_difference_by_end_use_and_fuel(
                df_all_end_use, is_incremental, add_eplus_version=add_eplus_version)
            percentage_of_total, html_tot = table_percent_difference_of_total(
                df_all_end_use, is_incremental, add_eplus_version=add_eplus_version)

        if display_tables:
            display(HTML("""
            <div style='display: grid; grid-template-columns: 1fr 1fr 1fr; grid-column-gap: 10px;'>
                <div style='align-self: center;'> {html_abs} </div>
                <div style='align-self: center;'> {html_pct} </div>
                <div style='align-self: center;'> {html_tot} </div>
            </div>""".format(html_abs=html_abs.to_html(),
                             html_pct=html_pct.to_html(),
                             html_tot=html_tot.to_html())))
        
    else:
        display(html_abs)
    
    if has_diff or plot_if_no_diff:
        plot_end_use_diff(df_all_end_use=df_all_end_use, test=test,
                          outer_i=outer_i, fig=fig,
                          add_legend=add_legend)
        
    if has_diff and plot_heatmap:
        if heatmap_as_pct_of_total:
            plot_heatmap_pct_diff(test=test, pct_diff=percentage_of_total,
                                  is_incremental=is_incremental,
                                  as_pct_of_total=True)
        else:
            plot_heatmap_pct_diff(test=test, pct_diff=pct_diff,
                                  is_incremental=is_incremental,
                                  as_pct_of_total=False)
        
def plot_heatmap_pct_diff(test, is_incremental=False,
                          pct_diff=None, df_all_end_use=None,
                          ax=None, figsize=None, short_title=False,
                          as_pct_of_total=False, vmax=None):
    """
    Plots a heatmap of percentage difference. It will show the xlabels
    as "Fuel" only if one unit, if more than one it's "Fuel-Units"
    
    Args:
    -----
    * test (str): test name
    * is_incremental (bool): compare each version to the previous version
        or to the oldest one
    * pct_diff (pd.DataFrame): from `table_percent_difference_of_total` or
        `table_percent_difference_by_end_use_and_fuel`.
        If not supplied, it is recomputed
    * as_pct_of_total (bool): if True, calls `table_percent_difference_of_total`
        otherwise calls `table_percent_difference_by_end_use_and_fuel`
    * df_all_end_use (pd.DataFrame): from `parse_before_after_enduse(test)`
        If pct_diff is not supplied, it uses this dataframe to recompute pct_diff
        If also not supplied, it is recomputed
    * ax (matplotlib.axes._subplots.AxesSubplot): The axis on which to plot,
        pass None to create a new figure
    * figsize (tuple of int): force a given figure size
        pass None to autocalculate
    * short_title (bool): display only the test name or also with versions
    * vmax (float, typically between 0 and 1): the maximum for the colorbar
        if None, defaults to 0.25 (25%) if as_pct_of_total is False, and
        0.05 (5%) if as_pct_of_total is True
    Returns:
    --------
    None, plots the heatmap
    """
    if vmax is None:
        if as_pct_of_total:
            # Colorbar goes from 0 (yellow) to 5% (red)
            vmax = 0.05
        else:
            vmax = 0.25
    
    # Modularity in arguments, compute only what's needed
    if pct_diff is None:
        if df_all_end_use is None:
            df_all_end_use = parse_before_after_enduse(test)
        if as_pct_of_total:
            pct_diff, _ = table_percent_difference_of_total(df_all_end_use=df_all_end_use,
                                                            is_incremental=is_incremental)
        else:
            pct_diff, _ = table_percent_difference_by_end_use_and_fuel(df_all_end_use=df_all_end_use,
                                                                       is_incremental=is_incremental)

    show_plot = False
    if ax is None:
        show_plot = True
        if figsize is None:
            w = min(pct_diff.shape[0], 16)
            h = pct_diff.shape[0] * w / (3*pct_diff.shape[1])
        else:
            w = figsize[0]
            h = figsize[1]
        fig, ax = plt.subplots(figsize=(w, h))

    fmt = lambda x,pos: '{:.0%}'.format(x)

    toplot = pct_diff.copy()
    if len(toplot.columns.get_level_values('Units').unique()) == 1:
        toplot.columns = toplot.columns.droplevel('Units')

    sns.heatmap(toplot.abs(),
                ax=ax, cmap='YlOrRd',
                vmin=0, vmax=vmax,
                cbar_kws={'format': mpl.ticker.FuncFormatter(fmt)},
                annot=toplot, fmt='.2%')
    if short_title:
        title = test
    else:
        if as_pct_of_total:
            title = ("Percent difference of total GJ/m3 by End Use for test "
                     "'{}'".format(test))
        else:
            title = ("Relative Individual % diff for each End Use / Fuel for test "
                     "'{}'".format(test))
        if is_incremental:
            title += '\nComparing from one version to the next'
        else:
            title += '\nComparing each to the oldest version ({})'.format(OLD_OS_INFO['OS_VERSION'])
    ax.set_title(title)
    if show_plot:
        plt.show()

In [None]:
def hover(hover_color="#ffff99"):
    return dict(selector="tr:hover",
                props=[("background-color", "%s" % hover_color)])

styles = [
    hover(),
    dict(selector="th", props=[("font-size", "110%"),
                               ("text-align", "center")]),
    dict(selector="td", props=[("text-align", "center")]),
    dict(selector="caption", props=[("caption-side", "bottom"),
                                    ("text-align", "center")])
]

In [None]:
# Make graph centered on page
from IPython.display import HTML
HTML("""
<style>
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
}
</style>
""")

### Single test

In [None]:
# Has differences, but no water
test = 'unitary_vav_bypass_plenum.rb'

#test = 'surfacecontrol_moveableinsulation.rb'
report_for_test(test, is_incremental=False,
                old_os=True, transition=True, new_os=True,
                plot_if_no_diff=True, display_tables=True,
                plot_heatmap=True, heatmap_as_pct_of_total=False)

#### Inspect in more detail

In [None]:
#test = 'lifecyclecostparameters.rb'
#test = 'vrf.osm'
print(f"Investigating the {test=}")
df_all_end_use = parse_before_after_enduse(test,
                                           old_os=True, transition=True,
                                           new_os=True)

In [None]:
# Show absolute values for only fuel/end use that have a value
print(f'Deviations in end uses for {test=}')
df_all_end_use.replace(0, np.nan).dropna(how='all', axis=0).dropna(how='all', axis=1)

In [None]:
pct_diff, html = table_percent_difference_by_end_use_and_fuel(df_all_end_use)
display(html)

In [None]:
# If you don't supply pct_diff, it will be computed again.
# If you also don't supply df_all_end_use, that will be computed again too.
# Do notice the is_incremental keyword again...
plot_heatmap_pct_diff(test, pct_diff=None,
                      is_incremental=True,
                      df_all_end_use=None,
                      figsize=(16,9), short_title=False,
                      # Switch as_pct_of_total to see the difference
                      as_pct_of_total=True, vmax=None)
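
The `is_incremental` keyword changes which column serves as the baseline. A minimal sketch of the same indexing logic on a single made-up end use follows; the version labels `v1`, `v2`, `v3`, the values and the helper name `pct_diff_for` are hypothetical.

```python
import pandas as pd

# Hypothetical energy for one end use across three versions (made-up numbers)
values = pd.Series({"v1": 100.0, "v2": 110.0, "v3": 121.0})
cols = list(values.index)


def pct_diff_for(values, cols, is_incremental):
    """Percent difference of each version against its baseline column."""
    out = {}
    for i, col in enumerate(cols[1:]):
        # cols[i] is the version just before `col`; cols[0] is the oldest one
        k = i if is_incremental else 0
        base = values[cols[k]]
        out[col] = (values[col] - base) / base
    return pd.Series(out)


incremental = pct_diff_for(values, cols, is_incremental=True)
vs_oldest = pct_diff_for(values, cols, is_incremental=False)
print(incremental.map("{:.2%}".format))
print(vs_oldest.map("{:.2%}".format))
```

With these numbers `v3` is 10% above `v2` when comparing from one version to the next, and 21% above `v1` when comparing each version to the oldest.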

### Multiple Tests - one per row

In [None]:
# We only really care about differences between Transition and new OS
s_diff = df_diff.iloc[:, -1]
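
The cells below pick the tests to report with `s_diff.abs().nlargest(n).index`. A minimal sketch of that selection on a made-up Series (the test names and values are hypothetical) shows that ranking is done on the magnitude, while the sign is still available from the original Series:

```python
import pandas as pd

# Hypothetical per-test differences (made-up test names and values)
s = pd.Series({"a.rb": -5.0, "b.osm": 0.2, "c.rb": 3.0, "d.osm": -0.1})

# Rank on the absolute value so large negative changes are not missed
top2 = s.abs().nlargest(2).index.tolist()
print(top2)

# Look the signed values back up in the original Series
print(s.loc[top2])
```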

In [None]:
report_for_largest_n = 6
print("Reporting for {} largest differences in Transition to New OS".format(report_for_largest_n))

add_legend = True
for test in s_diff.abs().nlargest(report_for_largest_n).index:
    report_for_test(test, plot_if_no_diff=True, display_tables=False,
                    add_legend=add_legend)
    # Only the first plot gets a legend
    add_legend = False

### Multiple Tests - in a grid

In [None]:
report_for_largest_n = 6
print("Reporting for {} largest differences in Transition to New OS".format(report_for_largest_n))

outer_ncols = 2
outer_nrows = int(np.ceil(report_for_largest_n/outer_ncols))

fig = plt.figure(figsize=(16, 4*outer_nrows))
outer = mpl.gridspec.GridSpec(outer_nrows, outer_ncols, wspace=0.1, hspace=0.5)

add_legend = True
for i, test in enumerate(s_diff.abs().nlargest(report_for_largest_n).index):
    df_all_end_use = parse_before_after_enduse(test)
    plot_end_use_diff(df_all_end_use=df_all_end_use, test=test, 
                      outer_i=outer[i], 
                      fig=fig, fontsize=7, add_legend=add_legend)
    add_legend = False

# fig.tight_layout()
plt.show()

### Multiple Tests - heatmaps

In [None]:
report_for_largest_n = 10
print("Reporting for {} largest differences in Transition to New OS".format(report_for_largest_n))

ncols = 2
nrows = int(np.ceil(report_for_largest_n/ncols))

fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(16, 4*nrows))

add_legend = True
for test, ax in zip(s_diff.abs().nlargest(report_for_largest_n).index, axes.flatten()):
    # Switch as_pct_of_total, force vmax (max of colorbar) if you want
    plot_heatmap_pct_diff(test, ax=ax,
                          is_incremental=True,
                          short_title=True,
                          as_pct_of_total=True, vmax=None)
    

fig.tight_layout()
plt.show()

# Compare IDFs

In [None]:
import openstudio
openstudio.openStudioLongVersion()

In [None]:
def check_num_objects(test):
    old_idf_path = os.path.join(OLD_OS_DIR, f'{test}.idf')
    transitioned_idf_path = os.path.join(TRANSITION_DIR, f'{test}.idf')
    new_idf_path = os.path.join(NEW_OS_DIR, f'{test}.idf')
    
    old_idf = openstudio.Workspace.load(openstudio.toPath(old_idf_path)).get()
    transitioned_idf = openstudio.Workspace.load(openstudio.toPath(transitioned_idf_path)).get()
    new_idf = openstudio.Workspace.load(openstudio.toPath(new_idf_path)).get()
    
    old_n = len(old_idf.objects(True))
    trans_n = len(transitioned_idf.objects(True))
    new_n = len(new_idf.objects(True))
        
    return test, old_n, trans_n, new_n

In [None]:
tests = [x for x in next(os.walk(OLD_OS_DIR))[1]]

desc = '<h3>Checking number of objects in IDF files</h3>'
label = HTML(desc)
display(label)
all_results = []
# The context manager shuts the pool down, so no worker processes are left hanging
with multiprocessing.Pool(processes=N) as pool:
    for result in tqdm(pool.imap_unordered(check_num_objects, tests), total=len(tests)):
        all_results.append(result)

In [None]:
df_num_objects = pd.DataFrame(all_results, columns=['Test', OLD_OS_INFO['OS_VERSION'], TRANSITION_INFO['OS_VERSION'], NEW_OS_INFO['OS_VERSION']])
df_num_objects.set_index('Test', inplace=True)

In [None]:
print("Any difference in number of objects")
df_num_objects[(df_num_objects.diff(axis=1).abs() > 0).any(axis=1)]

In [None]:
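# Use the same column labels as when df_num_objects was built
old_col = OLD_OS_INFO['OS_VERSION']
transition_col = TRANSITION_INFO['OS_VERSION']
print(f"Tests where the {old_col} is different from {transition_col}:")
df_num_objects[df_num_objects[old_col] != df_num_objects[transition_col]]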

In [None]:
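# Use the same column labels as when df_num_objects was built
old_col = OLD_OS_INFO['OS_VERSION']
new_col = NEW_OS_INFO['OS_VERSION']
print(f"Tests where the difference between {old_col} and {new_col} is strictly greater than one")
df_num_objects[(df_num_objects[new_col] - df_num_objects[old_col]).abs() > 1]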

 **Analysis: there are no problems:**
 
 * From 3.4.0 to 3.5.0, we wrote transition rules for PTACs/PTHPs
     * cf https://github.com/NREL/OpenStudio/blob/develop/developer/doc/ReleaseNotes/OpenStudio_Release_Notes_3_5_0_20221110.md
     
     > There are unusual `VersionTranslator` Rules for Packaged Systems (PTAC or PTHP) that use a `FanConstantVolume` and that do not have a `Supply Air Fan Operating Mode Schedule`. In 22.1.0 this would effectively, and mistakenly, function as a cycling fan, but this is now disallowed in E+ 22.2.0. In order to retain a similar functionality and energy usage, the `FanConstantVolume` will be replaced by a `FanSystemModel` with an Always Off Schedule (=cycling fan, similar to a `Fan:OnOff`), mapping inputs such as pressure rise and efficiency appropriately.
     * The extra 1 object is the Always Off Discrete schedule.

* There are also CommentOnly difference when the 

In [None]:
test = 'unitary_vav_bypass_plenum.rb'

old_idf_path = os.path.join(OLD_OS_DIR, f'{test}.idf')
transitioned_idf_path = os.path.join(TRANSITION_DIR, f'{test}.idf')
new_idf_path = os.path.join(NEW_OS_DIR, f'{test}.idf')

old_idf = openstudio.Workspace.load(openstudio.toPath(old_idf_path)).get()
transitioned_idf = openstudio.Workspace.load(openstudio.toPath(transitioned_idf_path)).get()
new_idf = openstudio.Workspace.load(openstudio.toPath(new_idf_path)).get()

len(old_idf.objects(True)), len(transitioned_idf.objects(True)), len(new_idf.objects(True))

df_transitioned = pd.DataFrame([(obj.iddObject().name(), obj.nameString()) for obj in transitioned_idf.objects()],
                               columns=['Type', 'Name'])

df_new = pd.DataFrame([(obj.iddObject().name(), obj.nameString()) for obj in new_idf.objects()],
                               columns=['Type', 'Name'])

df_obj_type_diff = pd.concat(
    [df_transitioned['Type'].value_counts(),
     df_new['Type'].value_counts()],
    axis=1, keys=['Transitioned', 'New'])

df_obj_type_diff[df_obj_type_diff.diff(axis=1)['New'].abs() != 0]

# transitioned_idf.save(transitioned_idf_path, True)
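
The per-type comparison above relies on `value_counts` plus a column-wise `diff`. A minimal sketch on made-up object type lists (hypothetical, not read from any IDF) shows which rows the filter keeps, including types that exist on one side only:

```python
import pandas as pd

# Hypothetical object types found in two workspaces (made-up lists)
types_transitioned = ["Zone", "Zone", "Schedule:Constant", "Fan:ConstantVolume"]
types_new = ["Zone", "Zone", "Schedule:Constant", "Schedule:Constant",
             "Fan:SystemModel"]

counts = pd.concat(
    [pd.Series(types_transitioned).value_counts(),
     pd.Series(types_new).value_counts()],
    axis=1, keys=["Transitioned", "New"])

# A type missing on one side has a NaN count; NaN != 0 is True, so those
# rows are kept together with the rows whose count actually changed
changed = counts[counts.diff(axis=1)["New"].abs() != 0]
print(changed)
```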

# Strip input cell and warnings

In [None]:
# Local import
import os
import glob as gb
from bs4 import BeautifulSoup


#filelist = gb.glob('*.html')
# filelist = ['AutomateVersionTranslation.html']
filelist = ['Analysis_from_3.7.0(23.2.0)_to_3.8.0-rc2(24.1.0).html']

## Older jupyter install

In [None]:
for s_path in filelist:
    print(s_path)
    
    save_path = "{}_stripped.html".format(os.path.splitext(s_path)[0])
    print("Deleting input cells and warnings")
    # Read-only access is enough, the result is written to save_path below
    with open(s_path, "r") as htmlDoc:
        soup = BeautifulSoup(htmlDoc, "lxml")
        # Get input divs
        tg = soup.find_all(attrs={"class": "input"})
        # Add input stderr (warnings and errors)
        tg += soup.find_all(attrs={"class": "output_stderr"})
        # Replace with nothing
        for tag in tg:
            tag.replace_with("")

    # Prettify
    html = soup.prettify("utf-8")
    #exportpath = os.path.splitext(htmlpath)[0]+'-noinput'+os.path.splitext(htmlpath)[1]

    # Write
    with open(save_path, "wb") as file:
        file.write(html)

## Newer jupyter install

In [None]:
for s_path in filelist:
    print(s_path)
    
    save_path = "{}_stripped.html".format(os.path.splitext(s_path)[0].replace('_ori', ''))
    with open(s_path,"r+") as htmlDoc:
        soup = BeautifulSoup(htmlDoc, "lxml")
    
    # Remove Raw Nb Convert
    [x.decompose() for x in soup.select('body > p')]
    # Get input divs
    [x.decompose() for x in soup.select('.jp-CodeCell .jp-Cell-inputWrapper')]
    
    print("Deleting sections")
    in_del_block = True
    for i, div in enumerate(soup.find_all('div', attrs={'class': 'jp-Cell'})):
        if i == 0:
            in_del_block = True
            continue
        if h1 := div.find('h1'):
            if h1.attrs['id'].startswith('PART-1'):
                print('PART-1 found')
                in_del_block = True
                continue
            elif h1.attrs['id'].startswith('PART-2'):
                print('PART-2 found')
                in_del_block = False
            elif h1.attrs['id'] == 'Compare-IDFs':
                print('Compare-IDFS found')
                in_del_block = True
                
        if h2 := div.find('h2'):
            if h2.attrs['id'] == 'Analyzing-E+-runtime':
                print('Analyzing-E+-runtime found')
                in_del_block = True
                continue
            elif h2.attrs['id'].startswith('Concat-frames'):
                print('Concat-frames found')
                in_del_block = False

        if in_del_block:
            div.decompose()
    
    print("Deleting input cells and warnings")
    

    
    tg = soup.find_all(attrs={"class": "jp-InputArea-editor"}) # "jp-InputArea"})
    # Add input stderr (warnings and errors)
    tg += soup.find_all(attrs={"data-mime-type": "application/vnd.jupyter.stderr"})
    # tg += soup.find_all(attrs={"class": "jp-OutputArea-executeResult"})
    # Replace with nothing
    for i in range(len(tg)):
        tg[i].replace_with("")

    # Prettify
    html = soup.prettify("utf-8")
    #exportpath = os.path.splitext(htmlpath)[0]+'-noinput'+os.path.splitext(htmlpath)[1]

    # Write
    with open(save_path, "wb") as file:
        file.write(html)

In [None]:
def parse_sql_version_and_sitekbtu_for_sqlfile(sql_path):
    """
    This function grabs the EnergyPlusVersion and the total site energy
    from the SQL file.
    
    Args:
    -----
    * sql_path (str): the path to the SQL file.
        eg: `../tmp/os330_sizingsystem/eplusout.sql`
    
    Returns:
    ---------
    * pd.Series that has the version with SHA and site kbtu
        (each is None if it could not be found), which name is the
        file name taken from sql_path
    """
    # Default to None so the return below works even if a query finds nothing
    version_with_sha = None
    site_kbtu = None

    abs_sql_path = Path(sql_path).absolute()
    sql_uri = '{}?mode=ro'.format(abs_sql_path.as_uri())
    with sqlite3.connect(sql_uri, uri=True) as con:
        cursor = con.cursor()
        r = cursor.execute(SQL_QUERY_SIM_INFO).fetchone()
        m = VERSION_REGEX.search(r[0]) if r else None
        if m:
            gpdict = m.groupdict()
            version_with_sha = "{}.{}.{}-{}".format(gpdict['Major'],
                                                    gpdict['Minor'],
                                                    gpdict['Patch'],
                                                    gpdict['SHA'])
        else:
            msg = ("Cannot find the EnergyPlusVersion in the SQL file. "
                   "For:\n{}".format(sql_path))
            #raise ValueError(msg)
            print(msg)

        # Get Site kBTU
        r = cursor.execute(SQL_QUERY_TOTAL_SITE_KBTU).fetchone()
        if r:
            site_gj = float(r[0])
            site_kbtu = site_gj * GJ_TO_KBTU
        else:
            msg = ("Cannot find the Total Site Energy in the SQL file. "
                   "For:\n{}".format(sql_path))
            print(msg)
    return pd.Series([version_with_sha, site_kbtu],
                     index=['E+', 'SiteKBTU'],
                     name=os.path.split(sql_path)[1])
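
The function above opens the SQL file through a `file:` URI with `?mode=ro`, so the regression output cannot be modified by accident. A minimal, self-contained sketch of that pattern on a throwaway database follows; the `Results` table, its columns and the value are hypothetical and unrelated to the actual EnergyPlus SQL schema.

```python
import sqlite3
import tempfile
from pathlib import Path

# Build a throwaway database; the table and value are hypothetical
db_path = Path(tempfile.mkdtemp()) / "example.sql"
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE Results (Name TEXT, Value REAL)")
con.execute("INSERT INTO Results VALUES ('Total Site Energy', 12.5)")
con.commit()
con.close()

# Re-open it read-only through a file: URI, as the function above does
uri = "{}?mode=ro".format(db_path.absolute().as_uri())
con = sqlite3.connect(uri, uri=True)
try:
    row = con.execute(
        "SELECT Value FROM Results WHERE Name = 'Total Site Energy'"
    ).fetchone()
    # fetchone() returns None when nothing matches, so guard before indexing
    value = float(row[0]) if row else None

    # Any write attempt on a read-only connection is rejected
    try:
        con.execute("INSERT INTO Results VALUES ('x', 1.0)")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True
finally:
    con.close()

print(value, write_blocked)
```

Note that using a sqlite3 connection as a context manager only commits or rolls back the transaction; it does not close the connection, hence the explicit `close()` in this sketch.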

In [None]:
parse_sql_version_and_sitekbtu_for_sqlfile('../tmp/os321.sql')

In [None]:
parse_sql_version_and_sitekbtu_for_sqlfile('../tmp/transition.sql')

In [None]:
parse_sql_version_and_sitekbtu_for_sqlfile('../tmp/os330.sql')

* Run period year changed from 2009 (Thursday) to 2006 (Sunday)?
* Sizing:System: 100% Outdoor Air in Cooling/Heating was changed from 'Yes' to 'No'


In [None]:
parse_sql_version_and_sitekbtu_for_sqlfile('../tmp/os330_sizingsystem/eplusout.sql')

In [None]:
parse_sql_version_and_sitekbtu_for_sqlfile('../tmp/os330_runperiod/eplusout.sql')

In [None]:
parse_sql_version_and_sitekbtu_for_sqlfile('../tmp/os330_sizingsystem_runperiod/eplusout.sql')

In [None]:
print(f"{4109270.60329/4122900.21348 - 1:.2%}")
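
The same arithmetic can be wrapped in a small helper so the two values can be swapped in easily. This is only a sketch: the name `relative_change` and its argument names are hypothetical, and the two numbers are the ones from the cell above.

```python
def relative_change(value, reference):
    """Return (value - reference) / reference, which equals value / reference - 1."""
    return (value - reference) / reference


change = relative_change(4109270.60329, 4122900.21348)
formatted = f"{change:.2%}"
print(formatted)
```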