# Findings

### Flow of information
A common observation from the workshops was that climate information typically passes through a chain of users within an organization. The chain begins with initial users, often from water resources or meteorology teams, who collect climate data and know its limitations and biases well. These users process or reconstruct the data before passing it to other teams, such as those responsible for load forecasting or outage planning. Critical information about the data’s limitations is often lost along the way, because each team tends to assume that the data it receives has already been validated and corrected for uncertainties.

Another category of climate information users includes those who rely on standards, such as design values published by organizations like the CSA, or on internal norms. These users may not be familiar with the origins or limitations of the underlying data but trust that the authoritative body behind the standard ensures its reliability.

<div style="text-align: center;">
<img src="image/info_flow.png" alt="Workshop" width="800">
</div>

### Overview of decision-making challenges in the workshops

The Sankey diagram below provides a preliminary overview of the applications discussed by participants during the workshops (across all organizations).
Note: This is an early draft that includes only a limited set of applications.

The first set of nodes (leftmost column) represents the sectors involved in the decision-making challenges. The second column provides a brief description of the decision context. The third column highlights the climate variables associated with these challenges, and the final column outlines key characteristics identified by participants as essential for informed decision-making.

Note that the value shown for a node may exceed the number of visible incoming or outgoing links. This happens when participants cited the same link in several decision-making contexts, so that link carries a weight greater than one.
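The plotting code below reads its links from `sankey_nodes_links.yml`, which is not reproduced here. As a minimal sketch, assuming the structure implied by how that code reads the file (top-level `sectors`, `variables`, and `types` lists, where each item has a `source` and a list of `targets`, and a target is either a bare label or a `[label, count]` pair), the example below shows how repeated citations become link weights. The dictionary stands in for what `yaml.safe_load` would return, and every label and count in it is hypothetical.

```python
# Hypothetical stand-in for the parsed YAML file; labels and counts are invented.
example = {
    "sectors": [
        {"source": "Hydropower", "targets": ["Reservoir management"]},
    ],
    "variables": [
        {
            "source": "Reservoir management",
            "targets": [
                ["Precipitation", 3],  # cited in three decision contexts
                "Temperature",         # bare label: counted once
            ],
        },
    ],
    "types": [
        {"source": "Precipitation", "targets": ["Daily resolution"]},
    ],
}


def extract_links(data, section):
    """Flatten one section into (source, target, value) tuples."""
    links = []
    for item in data.get(section, []):
        for entry in item.get("targets", []):
            # A list entry carries an explicit weight; a bare label weighs 1.
            target, value = entry if isinstance(entry, list) else (entry, 1)
            links.append((item["source"], target, value))
    return links


variable_links = extract_links(example, "variables")
print(variable_links)

# Total weight leaving "Reservoir management": 3 + 1 = 4, from only two links.
total_out = sum(value for _, _, value in variable_links)
print(total_out)
```

In this toy case the node would display a value of 4 while only two outgoing links are drawn, which is the effect described above.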

In [1]:
#| echo: false

import yaml
import pandas as pd
from functools import reduce
from typing import Tuple
import numpy as np
import plotly.graph_objects as go
import plotly.io as pio
from IPython.display import display, HTML

pio.renderers.default = 'notebook_connected'

# --- Load YAML ---
with open("sankey_nodes_links.yml", "r") as f:
    data = yaml.safe_load(f)

# --- Extract all links into a DataFrame ---
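def extract_links(section):
    """Return the links of one YAML section as a list of dicts.

    Each target is either a bare label (weight 1) or a [label, weight] pair.
    Reads from the module-level `data` loaded above.
    """
    links = []
    for item in data.get(section, []):
        source = item["source"]
        for entry in item.get("targets", []):
            if isinstance(entry, list):
                target, value = entry
            else:
                target, value = entry, 1
            links.append({"source": source, "target": target, "value": value})
    return links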

all_links = (
    extract_links("sectors") +
    extract_links("variables") +
    extract_links("types")
)

df = pd.DataFrame(all_links)

def infer_node_groups_from_yaml(data: dict) -> list:
    # Mapping of YAML sections to their inferred layer index
    sections = ['sectors', 'variables', 'types']
    num_layers = len(sections) + 1  # Because we can get one more from final targets
    layers = [set() for _ in range(num_layers)]

    # Populate known sources in layers based on section order
    for i, section in enumerate(sections):
        for entry in data.get(section, []):
            layers[i].add(entry['source'])
            for target in entry.get('targets', []):
                target_label = target[0] if isinstance(target, list) else target
                layers[i + 1].add(target_label)

    # If a node appears in several layers, keep it only in its rightmost layer
    for i in reversed(range(1, len(layers))):
        for node in layers[i]:
            for j in range(i):
                layers[j].discard(node)

    # Convert to sorted lists for consistency
    node_groups = [sorted(layer) for layer in layers if layer]
    return node_groups


def calc_node_positions(nodes: list, node_groups: list, final_group_extra_gap: float = 0.05, y_sep=0.3) -> Tuple[list, list]:
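    # Columns are evenly spaced; the last one is shifted right by final_group_extra_gap.
    # max(..., 1) avoids a ZeroDivisionError when there is only one group.
    normal_gap = (1-final_group_extra_gap) / max(len(node_groups)-1, 1)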
    x_pos = [
        round(group_idx * normal_gap, 2)
        if group_idx < len(node_groups) - 1
        else round(group_idx * normal_gap + final_group_extra_gap, 2)
        for group_idx, group in enumerate(node_groups)
        for node_idx, node in enumerate(group)
    ]
    y_pos = [
        node
        for group in node_groups
        for node in np.cumsum(np.full(shape=len(group), fill_value=(y_sep/len(group)))).round(4).tolist()
    ]
    return x_pos, y_pos


def make_sankey_params_v3(df: pd.DataFrame, node_groups: list) -> Tuple[dict, dict]:
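    # Select columns by name so the result does not depend on column order
    sources, targets, values = (df[col].tolist() for col in ("source", "target", "value"))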
    expected_labels = reduce(lambda x, y: x + y, node_groups)
    assert set(sources) | set(targets) == set(expected_labels), \
        'Mismatch between node_groups and source/target values in YAML.'

    source_idx = list(map(expected_labels.index, sources))
    target_idx = list(map(expected_labels.index, targets))
    x_coords, y_coords = calc_node_positions(expected_labels, node_groups)

    nodes_dict = {
        'label': expected_labels,
        'x': x_coords,
        'y': y_coords
    }
    links_dict = {
        'source': source_idx,
        'target': target_idx,
        'value': values
    }
    return nodes_dict, links_dict

node_groups = infer_node_groups_from_yaml(data)

nodes, links = make_sankey_params_v3(df, node_groups)

fig = go.Figure(
    go.Sankey(
        node=nodes,
        link=links,
        arrangement="snap"
    ),
    layout_title_text="Workshops Sankey Diagram",
)

fig.update_layout(
    height=800
)

# Create HTML with horizontal scroll
html = pio.to_html(fig, include_plotlyjs='cdn', full_html=False)
scrollable = f'''
<div style="overflow-x: auto;">
  <div style="min-width:1200px">{html}</div>
</div>
'''
display(HTML(scrollable))
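
The horizontal placement used by `calc_node_positions` can be checked by hand. The sketch below reproduces only the x-coordinate arithmetic for one node per column; `column_x_positions` is a hypothetical helper written for this illustration, not part of the code above.

```python
def column_x_positions(n_columns, final_gap=0.05):
    """Return one x-coordinate per column, mirroring calc_node_positions."""
    if n_columns < 2:
        # A single column has no spacing to compute.
        return [0.0] * n_columns
    step = (1 - final_gap) / (n_columns - 1)
    # All columns except the last are evenly spaced by `step`.
    xs = [round(i * step, 2) for i in range(n_columns - 1)]
    # The last column gets the extra gap on top of its regular offset.
    xs.append(round((n_columns - 1) * step + final_gap, 2))
    return xs


positions = column_x_positions(4)
print(positions)  # [0.0, 0.32, 0.63, 1.0]
```

With four columns and the default extra gap of 0.05, the regular spacing is 0.95 / 3 ≈ 0.317, so the last column sits slightly farther from the third than the other columns do from each other.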