# Social outcomes and SDGs prototype
This prototype Sankey diagram examines the relationship between social outcomes and the UN Sustainable Development Goals (SDGs) across a range of INDIGO projects.

## INDIGO database API endpoint

Set up the INDIGO database API endpoint and a helper method for getting individual items from the API. The helper can be used with the project, fund, organisation and assessment_resource endpoints.

In [None]:
import requests

import plotly.graph_objects as go

from natsort import natsorted


INDIGO_DATABASE_API = 'https://golab-indigo-data-store.herokuapp.com/app/api1'


def api_get_item(endpoint, public_id=None):
    """
    Get individual item details from the API

    E.g. 
    item = api_get_item('project', 'INDIGO-POJ-0158')
    """
    try:
        if public_id:
            url = f'{INDIGO_DATABASE_API}/{endpoint}/{public_id}'
        else:
            url = f'{INDIGO_DATABASE_API}/{endpoint}'
        # Time out rather than hang, and treat HTTP error statuses as failures
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return response.json()
    except Exception as e:
        print(f'\nFailed to retrieve {endpoint} "{public_id}".\nError: {e}')
        return False
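
The URL that `api_get_item` requests can be illustrated without touching the network. The sketch below mirrors the two branches of the helper; `build_item_url` is a hypothetical name introduced here for illustration and is not part of the notebook.

```python
INDIGO_DATABASE_API = 'https://golab-indigo-data-store.herokuapp.com/app/api1'


def build_item_url(endpoint, public_id=None):
    """Hypothetical helper: return the URL api_get_item would request."""
    if public_id:
        return f'{INDIGO_DATABASE_API}/{endpoint}/{public_id}'
    return f'{INDIGO_DATABASE_API}/{endpoint}'


# A single project versus the full project listing
print(build_item_url('project', 'INDIGO-POJ-0158'))
print(build_item_url('project'))
```

Because `api_get_item` returns `False` on failure rather than raising, callers may want to check the result before indexing into it.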

## Setup
Set up some parameters and the `SDG_GOAL_MAP`, which supplies the UN Sustainable Development Goal colours and titles. The direction of the Sankey plots can also be switched here.

In [None]:
# Choose the direction of the plot
LEFT_TO_RIGHT = True

OUTCOME_METRIC_PRIMARY = 'primary'
OUTCOME_METRIC_SECONDARY = 'secondary'

OUTCOME_NUMBER_COUNT = 'number_count'
OUTCOME_SERVICE_USERS = 'service_users'

PROJECT_COLOUR = '#888888'
DEFAULT_COLOUR = '#AAAAAA'

SDG_GOAL_MAP = {
    '1': {
        'name': '1. No poverty',
        'node_colour': '#e5243b',
        'link_colour': 'rgba(229, 36, 59, 0.5)',
    },
    '2': {
        'name': '2. Zero hunger',
        'node_colour': '#DDA63A',
        'link_colour': 'rgba(221, 166, 58, 0.5)',
    },
    '3': {
        'name': '3. Good health and wellbeing',
        'node_colour': '#4C9F38',
        'link_colour': 'rgba(76, 159, 56, 0.5)',
    },
    '4': {
        'name': '4. Quality education',
        'node_colour': '#C5192D',
        'link_colour': 'rgba(197, 25, 45, 0.5)',
    },
    '5': {
        'name': '5. Gender equality',
        'node_colour': '#FF3A21',
        'link_colour': 'rgba(255, 58, 33, 0.5)',
    },
    '6': {
        'name': '6. Clean water and sanitation',
        'node_colour': '#26BDE2',
        'link_colour': 'rgba(38, 189, 226, 0.5)',
    },
    '7': {
        'name': '7. Affordable and clean energy',
        'node_colour': '#FCC30B',
        'link_colour': 'rgba(252, 195, 11, 0.5)',
    },
    '8': {
        'name': '8. Decent work and economic growth',
        'node_colour': '#A21942',
        'link_colour': 'rgba(162, 25, 66, 0.5)',
    },
    '9': {
        'name': '9. Industry, innovation and infrastructure',
        'node_colour': '#FD6925',
        'link_colour': 'rgba(253, 105, 37, 0.5)',
    },
    '10': {
        'name': '10. Reduced inequalities',
        'node_colour': '#DD1367',
        'link_colour': 'rgba(221, 19, 103, 0.5)',
    },
    '11': {
        'name': '11. Sustainable cities and communities',
        'node_colour': '#FD9D24',
        'link_colour': 'rgba(253, 157, 36, 0.5)',
    },
    '12': {
        'name': '12. Responsible consumption and production',
        'node_colour': '#BF8B2E',
        'link_colour': 'rgba(191, 139, 46, 0.5)',
    },
    '13': {
        'name': '13. Climate action',
        'node_colour': '#3F7E44',
        'link_colour': 'rgba(63, 126, 68, 0.5)',
    },
    '14': {
        'name': '14. Life below water',
        'node_colour': '#0A97D9',
        'link_colour': 'rgba(10, 151, 217, 0.5)',
    },
    '15': {
        'name': '15. Life on land',
        'node_colour': '#56C02B',
        'link_colour': 'rgba(86, 192, 43, 0.5)',
    },
    '16': {
        'name': '16. Peace, justice and strong institutions',
        'node_colour': '#00689D',
        'link_colour': 'rgba(0, 104, 157, 0.5)',
    },
    '17': {
        'name': '17. Partnerships for the goals',
        'node_colour': '#19486A',
        'link_colour': 'rgba(25, 72, 106, 0.5)',
    },
}
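
Each `link_colour` in the map appears to be the goal's `node_colour` at 50% opacity. As a rough sketch, the conversion could be automated as below; `hex_to_rgba` is a hypothetical helper, not something the notebook defines.

```python
def hex_to_rgba(hex_colour, alpha=0.5):
    """Hypothetical helper: convert '#rrggbb' to an 'rgba(r, g, b, a)' string."""
    hex_colour = hex_colour.lstrip('#')
    # Read the red, green and blue channels as two hex digits each
    red, green, blue = (int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))
    return f'rgba({red}, {green}, {blue}, {alpha})'


print(hex_to_rgba('#e5243b'))  # rgba(229, 36, 59, 0.5)
```

Deriving the link colour this way would keep the two values in step if a node colour is ever changed.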

## Helper methods
Helper methods for building the nodes and links in the plot. Note that, despite its name, `add_node` accumulates link values keyed by source and target.

In [None]:
def add_node(nodes, source, target, value):
    """
    Add a link from source to target, summing the values of repeated pairs
    """
    if source and target:
        key = f'{source}_{target}'

        if key in nodes:
            nodes[key]['value'] += value
        else:
            nodes[key] = {
                'source': source,
                'target': target,
                'value': value,
            }


def make_node_data(labels):
    """
    Make the sankey diagram node data
    """
    names = []
    node_colours = []

    for label in labels:

        # Set the node names
        if label.startswith('INDIGO'):
            names.append(label)
        elif label in SDG_GOAL_MAP:
            names.append(SDG_GOAL_MAP[label]['name'])
        else:
            names.append(f'Target {label}')

        # Set the node colours. Try the whole label first and then reduce the number
        # of characters to see if we get a match, filter out None values, and take the
        # first value from the filtered data.
        if label.startswith('INDIGO'):
            node_color = PROJECT_COLOUR
        else:
            colours = [SDG_GOAL_MAP.get(label[:i], {}).get('node_colour') for i in (None, 2, 1)]
            node_color = next(filter(None, colours), None) or DEFAULT_COLOUR

        node_colours.append(node_color)

    return {
        'label': names,
        'color': node_colours,
    }


def make_link_data(nodes, labels):
    """
    Make the sankey diagram link data
    """
    source = []
    target = []
    value = []
    link_colours = []

    for node in nodes.values():
        node_source = node['source']
        node_target = node['target']
        node_value = node['value']

        source.append(labels.index(node_source))
        target.append(labels.index(node_target))
        value.append(node_value)

        link_colour = None

        # Try to find the colour for the link in the SDG_GOAL_MAP
        for key in [node_source, node_target]:

            # Don't bother with project ids
            if key.startswith('INDIGO'):
                continue

            colours = [SDG_GOAL_MAP.get(key[:i], {}).get('link_colour') for i in (None, 2, 1)]
            link_colour = next(filter(None, colours), None)

            if link_colour:
                break

        link_colours.append(link_colour or DEFAULT_COLOUR)

    return {
        'source': source,
        'target': target,
        'value': value,
        'color': link_colours,
    }
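
The colour lookup in both helpers tries the whole label, then its first two characters, then its first character. The sketch below shows that fallback on a cut-down stand-in for `SDG_GOAL_MAP`. It assumes target labels look like `10.2` or `1.4` (goal number, a dot, then the target), which the notebook does not state explicitly; `GOAL_COLOURS` and `lookup_colour` are illustrative names.

```python
# Cut-down stand-in for SDG_GOAL_MAP, for illustration only
GOAL_COLOURS = {
    '1': {'node_colour': '#e5243b'},
    '10': {'node_colour': '#DD1367'},
}
DEFAULT_COLOUR = '#AAAAAA'


def lookup_colour(label):
    """Try the whole label, then its first two characters, then its first one."""
    candidates = [GOAL_COLOURS.get(label[:i], {}).get('node_colour') for i in (None, 2, 1)]
    # Take the first match, falling back to the default colour
    return next(filter(None, candidates), None) or DEFAULT_COLOUR


for label in ('10', '10.2', '1.4', '9.1'):
    print(label, lookup_colour(label))
```

Trying two characters before one matters: a label such as `10.2` should resolve to goal 10 rather than goal 1.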

## Sankey visualisation
The `sankey_vis` method can be called with different sets of parameters to produce different output plots.

In [None]:
def sankey_vis(public_ids, data, outcome_metric_types, outcome_metric_value):
    """
    SDG sankey diagram visualisation

    Args:
        public_ids (list): Project public IDs
        data (dict): Project data
        outcome_metric_types (list): Outcome metric types to be plotted
        outcome_metric_value (str): Outcome metric value to use for plots

    Returns:
        obj: Plot figure, or the NOT_AVAILABLE_MSG string if no SDG data is found
    """
    NOT_AVAILABLE_MSG = 'SDG data is not available'

    nodes = {}
    project_labels = []
    target_labels = []
    goal_labels = []

    for public_id in public_ids:
        
        project_data = data[public_id]['project']['data']
        outcome_metrics = project_data['outcome_metrics']

        for m in outcome_metrics:

            if outcome_metric_value == OUTCOME_SERVICE_USERS:
                value = m.get('targeted_number_of_service_users_or_beneficiaries_total', {}).get('value')
                if not value:
                    continue
            else:
                # OUTCOME_NUMBER_COUNT, just increment the number of target/goal combo
                value = 1

            if OUTCOME_METRIC_PRIMARY in outcome_metric_types:

                primary_sdg_target = m.get('primary_sdg_target', {}).get('value')
                primary_sdg_goal = m.get('primary_sdg_goal', {}).get('value')

                if primary_sdg_target and primary_sdg_goal:
                    project_labels.append(public_id)
                    target_labels.append(primary_sdg_target)
                    goal_labels.append(primary_sdg_goal)

                    if LEFT_TO_RIGHT:
                        add_node(nodes, public_id, primary_sdg_target, value)
                        add_node(nodes, primary_sdg_target, primary_sdg_goal, value)
                    else:
                        add_node(nodes, primary_sdg_goal, primary_sdg_target, value)
                        add_node(nodes, primary_sdg_target, public_id, value)

            if OUTCOME_METRIC_SECONDARY in outcome_metric_types:

                secondary_sdg_targets = m.get('secondary_sdg_targets', {}).get('value')
                secondary_sdg_goals = m.get('secondary_sdg_goals', {}).get('value')

                if secondary_sdg_targets and secondary_sdg_goals:
                    project_labels.append(public_id)
                    target_labels.append(secondary_sdg_targets)
                    goal_labels.append(secondary_sdg_goals)

                    if LEFT_TO_RIGHT:
                        add_node(nodes, public_id, secondary_sdg_targets, value)
                        add_node(nodes, secondary_sdg_targets, secondary_sdg_goals, value)
                    else:
                        add_node(nodes, secondary_sdg_targets, public_id, value)
                        add_node(nodes, secondary_sdg_goals, secondary_sdg_targets, value)

    goal_labels = natsorted(set(goal_labels))
    target_labels = natsorted(set(target_labels))
    project_labels = sorted(set(project_labels))

    labels = project_labels + target_labels + goal_labels
    num_items = max([len(project_labels), len(target_labels), len(goal_labels)])

    if num_items == 0:
        return NOT_AVAILABLE_MSG

    # Make the node data
    node_data = make_node_data(labels)

    node_data.update({
        'pad': 20,
        'thickness': 15,
        'line': {
            'color': 'white',
            'width': 1,
        },
    })

    # Make the link data dictionary
    link_data = make_link_data(nodes, labels)

    """
    link_data.update({
        'line': {
            'color': 'white',
            'width': 0.5,
        }
    })
    """

    data = [
        go.Sankey(
            node=node_data,
            link=link_data,
            arrangement='perpendicular',
        ),
    ]

    fig = go.Figure(
        data=data,
    )

    fig.update_layout(
        height=800,
    )

    return fig
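
To make the data flow concrete, here is a minimal sketch of how repeated source/target pairs are summed and then turned into the index lists that `go.Sankey` expects. `add_link` is an illustrative stand-in for the notebook's `add_node`, and the project ID and target label are made-up placeholders.

```python
def add_link(links, source, target, value):
    """Same accumulation idea as add_node: one entry per source/target pair."""
    key = f'{source}_{target}'
    if key in links:
        links[key]['value'] += value
    else:
        links[key] = {'source': source, 'target': target, 'value': value}


links = {}
# Two outcome metrics from a made-up project map to the same target and goal
for _ in range(2):
    add_link(links, 'INDIGO-POJ-0001', '1.4', 1)
    add_link(links, '1.4', '1', 1)

# Links refer to nodes by their position in the combined label list
labels = ['INDIGO-POJ-0001', '1.4', '1']
source = [labels.index(link['source']) for link in links.values()]
target = [labels.index(link['target']) for link in links.values()]
value = [link['value'] for link in links.values()]

print(source, target, value)  # [0, 1] [1, 2] [2, 2]
```

With `OUTCOME_NUMBER_COUNT` each metric contributes 1, so the link value is a count; with `OUTCOME_SERVICE_USERS` the same accumulation sums the targeted service user numbers instead.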

## Get project data
Call the INDIGO API 'project' endpoint and retrieve the data used for the plot.

By default this gets data for all public projects, but you can pass a list of project IDs to fetch only those. See the comments in the code.

In [None]:
# Call the API and pull down the data for each project
# and store in a dictionary for use later.
# 
# You can set public_ids to a subset of projects,
# e.g.:
# public_ids = ['INDIGO-POJ-0008', 'INDIGO-POJ-0011', 'INDIGO-POJ-0171', 'INDIGO-POJ-0172', 'INDIGO-POJ-0173', ]
# or leave the list empty, in which case it will get data for all public projects,
# e.g.:
# public_ids = []

public_ids = []
endpoint = 'project'

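if not public_ids:
    listing = api_get_item(endpoint)
    # api_get_item returns False on failure, so fall back to an empty listing
    for project_data in (listing or {}).get('projects', []):
        if project_data.get('public'):
            public_ids.append(project_data.get('id'))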

data = {}

for public_id in public_ids:
    print("Getting Project " + public_id)
    data[public_id] = api_get_item(endpoint, public_id)
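
The filtering step can be tried offline. The sketch below uses a made-up listing in the shape the code above reads (a `projects` list whose items carry `id` and `public` keys); the IDs are placeholders, not real API output.

```python
# Made-up listing in the shape the notebook reads. The IDs are placeholders.
listing = {
    'projects': [
        {'id': 'INDIGO-POJ-0001', 'public': True},
        {'id': 'INDIGO-POJ-0002', 'public': False},
        {'id': 'INDIGO-POJ-0003', 'public': True},
    ]
}

# Keep only the projects flagged as public
public_ids = [p.get('id') for p in listing.get('projects', []) if p.get('public')]
print(public_ids)  # ['INDIGO-POJ-0001', 'INDIGO-POJ-0003']
```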

## Build the figures
Create the plots for the projects, starting with the SDG primary outcomes. The thickness of each link is based on the number of outcome metrics.

In [None]:
fig = sankey_vis(public_ids, data, [OUTCOME_METRIC_PRIMARY], OUTCOME_NUMBER_COUNT)
fig.update_layout(
    title='SDG Primary Outcomes (metric counts)',
    title_x=0.5,
)
fig.show()

The thickness of each link in the following plot is based on the targeted number of service users recorded for each outcome metric. Metrics without a value are skipped.

In [None]:
# Plot the number of service users, instead of the total number of outcome metrics
fig = sankey_vis(public_ids, data, [OUTCOME_METRIC_PRIMARY], OUTCOME_SERVICE_USERS)
fig.update_layout(
    title='SDG Primary Outcomes (service users)',
    title_x=0.5,
)
fig.show()

# Some more plots to try...
# Plot the secondary outcome metrics
#fig = sankey_vis(public_ids, data, [OUTCOME_METRIC_SECONDARY], OUTCOME_NUMBER_COUNT)

# Combine primary and secondary outcomes on the same plot
#fig = sankey_vis(public_ids, data, [OUTCOME_METRIC_PRIMARY, OUTCOME_METRIC_SECONDARY], OUTCOME_NUMBER_COUNT)

# Plot the number of service users for the secondary outcomes, instead of the total number of outcome metrics
#fig = sankey_vis(public_ids, data, [OUTCOME_METRIC_SECONDARY], OUTCOME_SERVICE_USERS)
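
`sankey_vis` returns a message string rather than a figure when no SDG data is found, so calling `update_layout` on the result can fail. One possible guard is sketched below; `show_or_report` is a hypothetical wrapper, not part of the notebook.

```python
def show_or_report(result, title):
    """Hypothetical wrapper: show a figure, or print the message if there is none."""
    if isinstance(result, str):
        # sankey_vis returned its 'not available' message instead of a figure
        print(result)
        return False
    result.update_layout(title=title, title_x=0.5)
    result.show()
    return True


# With no figure to draw, the message is printed and nothing is shown
show_or_report('SDG data is not available', 'SDG Primary Outcomes (metric counts)')
```

In the notebook this could be called as `show_or_report(sankey_vis(...), 'some title')` in place of the separate `update_layout` and `show` calls.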

## Important Notice and Disclaimer on INDIGO Data
<sub><sup>
INDIGO data are shared for research and policy analysis purposes. INDIGO data can be used to support a range of insights, for example, to understand the social outcomes that projects aim to improve, the network of organisations across projects, trends, scales, timelines and summary information. The collaborative system by which we collect, process, and share data is designed to advance data-sharing norms, harmonise data definitions and improve data use. These data are NOT shared for auditing, investment, or legal purposes. Please independently verify any data that you might use in decision making. We provide no guarantees or assurances as to the quality of these data. Data may be inaccurate, incomplete, inconsistent, and/or not current for various reasons: INDIGO is a collaborative and iterative initiative that mostly relies on projects all over the world volunteering to share their data. We have a system for processing information and try to attribute data to named sources, but we do not audit, cross-check, or verify all information provided to us. It takes time and resources to share data, which may not have been included in a project’s budget. Many of the projects are ongoing and timely updates may not be available. Different people may have different interpretations of data items and definitions. Even when data are high quality, interpretation or generalisation to different contexts may not be possible and/or requires additional information and/or expertise. Help us improve our data quality: email us at indigo@bsg.ox.ac.uk if you have data on new projects, changes or performance updates on current projects, clarifications or corrections on our data, and/or confidentiality or sensitivity notices. Please also give input via the INDIGO Data Definitions Improvement Tool and INDIGO Feedback Questionnaire.
</sup></sub>