<a href="https://colab.research.google.com/github/Resource-Efficiency-Collective/coding-tutorials/blob/main/energy_consumption.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Energy consumption

This notebook is split into 3 tasks:


1.   US Energy Sankey example
2.   Create your own Sankey using IEA data
3.   Automate Sankey generation for any country by writing a Python function

Run the first two cells to set up the notebook.





In [None]:
%%capture
"""Installation and downloads"""
# Install floweaver and display widget packages
%pip install floweaver ipysankeywidget openpyxl --upgrade

# Import necessary packages
from floweaver import *
import gdown, os
from google.colab import files

# Import and unzip files -> You can then view them in the left files panel
folder, zip_path = 'example_data', 'example_data.zip'
if not os.path.exists(folder): 
  gdown.download('https://drive.google.com/uc?id=1qriY29v7eKJIs07UxAw5RlJirfwuLnyP', zip_path ,quiet=True)
  ! unzip $zip_path -d 'example_data'
  ! rm $zip_path

In [None]:
"""Display setup"""
# Enable widget display for Sankeys in Colab
from google.colab import output
output.enable_custom_widget_manager()

## Task 1 - US example 

Step through this section to see an example for the US based on the [Sankey diagrams of US energy consumption from the Lawrence Livermore National Laboratory](https://flowcharts.llnl.gov/) (thanks to John Muth for the suggestion and transcribing the data).

In [None]:
"""Load the dataset"""
dataset = Dataset.from_csv('example_data/us-energy-consumption.csv',
                           dim_process_filename='example_data/us-energy-consumption-processes.csv')

In [None]:
"""Define the order the nodes appear in"""
sources = ['Solar', 'Nuclear', 'Hydro', 'Wind', 'Geothermal',
           'Natural_Gas', 'Coal', 'Biomass', 'Petroleum']

uses = ['Residential', 'Commercial', 'Industrial', 'Transportation']

In [None]:
"""define the Sankey diagram definition"""
nodes = {
    'sources': ProcessGroup('type == "source"', Partition.Simple('process', sources), title='Sources'),
    'imports': ProcessGroup(['Net_Electricity_Import'], title='Net electricity imports'),
    'electricity': ProcessGroup(['Electricity_Generation'], title='Electricity Generation'),
    'uses': ProcessGroup('type == "use"', partition=Partition.Simple('process', uses)),
    
    'energy_services': ProcessGroup(['Energy_Services'], title='Energy services'),
    'rejected': ProcessGroup(['Rejected_Energy'], title='Rejected energy'),
    
    'direct_use': Waypoint(Partition.Simple('source', [
        # This is a hack to hide the labels of the partition, there should be a better way...
        (' '*i, [k]) for i, k in enumerate(sources)
    ])),
}

ordering = [
    [[], ['sources'], []],
    [['imports'], ['electricity', 'direct_use'], []],
    [[], ['uses'], []],
    [[], ['rejected', 'energy_services'], []]
]

bundles = [
    Bundle('sources', 'electricity'),
    Bundle('sources', 'uses', waypoints=['direct_use']),
    Bundle('electricity', 'uses'),
    Bundle('imports', 'uses'),
    Bundle('uses', 'energy_services'),
    Bundle('uses', 'rejected'),
    Bundle('electricity', 'rejected'),
]

In [None]:
"""Define the colours to roughly imitate the original Sankey diagram"""
palette = {
    'Solar': 'gold',
    'Nuclear': 'red',
    'Hydro': 'blue',
    'Wind': 'purple',
    'Geothermal': 'brown',
    'Natural_Gas': 'steelblue',
    'Coal': 'black',
    'Biomass': 'lightgreen',
    'Petroleum': 'green',
    'Electricity': 'orange',
    'Rejected energy': 'lightgrey',
    'Energy services': 'dimgrey',
}

And here's the result!

In [None]:
sdd = SankeyDefinition(nodes, bundles, ordering,
                       flow_partition=dataset.partition('type'))
weave(sdd, dataset, palette=palette) \
    .to_widget(width=700, height=450, margins=dict(left=100, right=120), debugging=True)

You can save a copy of the Sankey by adding `.auto_save_png('filename.png')` or `.auto_save_svg('filename.svg')` to the end of the `weave` call in the previous box.

## Task 2 - Create your own

Follow the steps below to create an equivalent Sankey for a different country. Fill in each of the lines with a `##Complete here##` entry. If you get completely lost then feel free to consult the [solutions sheet](https://colab.research.google.com/github/resource-efficiency-collective/coding-tutorials/blob/main/energy-consumption-solutions.ipynb) for inspiration, but please do have a go first.

  1. Find and download the IEA World Energy Balances Highlights spreadsheet, from the webpage: https://www.iea.org/data-and-statistics/data-product/world-energy-statistics-and-balances. Then upload it to Colab using the `upload` button in the left panel.

  2. In the next cell import the Excel sheet to a pandas DataFrame. To find appropriate functions for the next steps either have a look at the [pandas documentation](https://pandas.pydata.org/docs/reference/index.html), or remember [your best friend](https://www.google.com/) when writing code.



In [None]:
"""Read in an Excel file"""
import pandas as pd
fileName = 'WorldEnergyBalancesHighlights2021.xlsx'
sheetName = 'TimeSeries_1971-2020'
data = ##Complete here##

3. Filter the DataFrame to contain only the desired country data.
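As a rough illustration of row filtering, here is a minimal sketch on a toy table rather than the IEA spreadsheet. The column name `Country` and all values are assumptions made for the example; check the real headers in your sheet with `data.columns` before adapting it.

```python
import pandas as pd

# Toy data standing in for the spreadsheet (column names are assumed)
toy = pd.DataFrame({
    'Country': ['France', 'France', 'Japan'],
    'Product': ['Coal', 'Natural gas', 'Coal'],
    2020: [10.0, 20.0, 30.0],
})

# A boolean mask passed to .loc keeps only the rows where the mask is True
mask = toy['Country'] == 'France'
toy_country = toy.loc[mask]
print(toy_country)
```

The same pattern should carry over to the real data once you know which column holds the country names.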

In [None]:
"""Get desired country"""
country = ##Complete here##
countryData = data.loc[##Complete here##]

4. Filter the DataFrame to only contain 'Product', 'Flow' and value for the latest full year. To get the latest year, find the maximum integer value in the column headers.
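The idea of picking the latest year from mixed column headers can be sketched on a toy table. The values below are invented, and the sketch assumes the year headers are stored as integers, as the hint above suggests.

```python
import pandas as pd

# Toy table with string and integer column labels (values are invented)
toy = pd.DataFrame({
    'Product': ['Coal', 'Natural gas'],
    'Flow': ['Imports', 'Industry'],
    2019: [1.0, 2.0],
    2020: [3.0, 4.0],
})

# Keep only the integer labels, then take the largest one
year_columns = [col for col in toy.columns if isinstance(col, int)]
latest_year = max(year_columns)

# Selecting with a list of labels returns a DataFrame with just those columns
toy_latest = toy[['Product', 'Flow', latest_year]]
print(toy_latest)
```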

In [None]:
"""Get values for latest year"""
lastYear = max([colName for colName in ##Complete here## if isinstance(colName, int)])
filterData = countryData[##Complete here##]

# Display data
display(filterData)

5. Filter out rows containing summaries (i.e. Total, Production), different units (GWh) or non-numeric values.
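One possible way to do this kind of filtering is sketched below on invented data. It uses `Series.str.contains` with a regular expression and `pd.to_numeric(errors='coerce')`; the second is an alternative to the type check in the exercise cell, not necessarily what the solution sheet uses.

```python
import pandas as pd

# Invented rows: one summary product, one GWh row, one summary flow,
# and one row whose value is a placeholder string
toy = pd.DataFrame({
    'Product': ['Coal', 'Total', 'Electricity (GWh)', 'Oil products', 'Natural gas'],
    'Flow': ['Industry', 'Industry', 'Industry', 'Production', 'Transport'],
    'value': [5.0, 9.0, 3.0, 7.0, '..'],
})

# 'A|B|C' is a regular expression that matches any of the three words
pattern = '|'.join(['Production', 'Total', 'GWh'])

# ~ negates the mask, so rows that match the pattern are dropped
keep = ~toy['Product'].str.contains(pattern) & ~toy['Flow'].str.contains(pattern)
toy = toy[keep].copy()

# Coerce to numbers; anything that cannot be parsed becomes NaN and is dropped
toy['value'] = pd.to_numeric(toy['value'], errors='coerce')
toy = toy.dropna(subset=['value'])
print(toy)
```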

In [None]:
"""Filter out Totals and bad characters"""
remove = '|'.join(['Production','Total','GWh'])
filterData = filterData[~filterData['Product'].str.##Complete here##]
filterData = filterData[~filterData['Flow'].str.##Complete here##]
filterData = filterData[[type(i) is not str for i in ##Complete here##]]

6. Let's match the format in the files for the US example that you can find in the 'example_data' folder.
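The reshaping in this step can be sketched on a two-row toy table. The mapping of `Product` to `source` and `Flow` to `target` is an assumption based on how the exercise cell uses the `target` column, and the numbers are invented.

```python
import pandas as pd

# Invented data: one import row and one consumption row
toy = pd.DataFrame({
    'Product': ['Coal', 'Coal'],
    'Flow': ['Imports', 'Industry'],
    2020: [12.0, -4.0],
})

# rename returns a new DataFrame when inplace is not used
flows = toy.rename(columns={'Product': 'source', 'Flow': 'target', 2020: 'value'})

# Flow widths must be positive, so take absolute values
flows['value'] = flows['value'].abs()

# Swap source and target on import rows with a boolean mask and .loc,
# which avoids chained assignment such as df['source'].iloc[rows] = ...
is_import = flows['target'].str.contains('Imports')
swapped = flows.loc[is_import, ['target', 'source']].to_numpy()
flows.loc[is_import, ['source', 'target']] = swapped
print(flows)
```

Converting the right-hand side with `.to_numpy()` matters here: assigning a DataFrame would align on column names and undo the swap.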

In [None]:
"""Create dataset table"""
# Rename the columns to define source, target and value
filterData.rename(##Complete here##, inplace=True)

# Create type column
filterData['type'] = ##Complete here##

# Get absolute values to display exports
filterData['value'] = ##Complete here##

# Create groupings - Assigns every row whose target contains the right element to the group named by the left element
groups = [['Electricity','Electricity'],['Oil products','Oil refineries']]
for g in groups:
  filterData['target'] = [g[0] if g[1] in i['target'] else i['target'] for i in filterData.iloc]

# Order data so that imports are considered a source and not a target
import numpy as np
orderData = filterData.copy()
importRows = np.where(['Imports' in i for i in filterData['target']])[0]
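# Assign through .loc on the DataFrame itself; chained assignment such as
# orderData['source'].iloc[...] = ... may silently fail in recent pandas versions
orderData.loc[orderData.index[importRows], 'source'] = ##Complete here##
orderData.loc[orderData.index[importRows], 'target'] = ##Complete here##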

display(orderData)

7. Let's display all the individual sources and targets and attribute them to process groups for our Sankey diagram.

In [None]:
"""Display individual sources and targets"""
display(orderData['source'].unique(), orderData['target'].unique())

In [None]:
"""Attribute to process groups"""
sources = ##Complete here##
uses = ##Complete here##
imports = ##Complete here##
exports = ##Complete here##
electricity = ##Complete here##
refining = ##Complete here##

8. Create process table as in us-energy-consumption-processes.csv

In [None]:
"""Get all unique types of sources and targets listed in products and flows respectively"""
idColumn = np.concatenate((sources,uses))
typeColumn = ['source']*len(sources)+['use']*len(uses)
processes = pd.DataFrame(np.array([idColumn,typeColumn]).transpose(), columns=['id','type'])

We now have the same tables as used in the US example. Time to build our own Sankey!

In [None]:
"""Load the dataset"""
dataset = Dataset(orderData, dim_process=processes.set_index('id'))

9. Fetch the Sankey definition for the US energy consumption example from the `"""define the Sankey diagram definition"""` box and adapt it to fit the new process groups you defined in step 7. Adapting the flows can be quite fiddly: you need to think carefully about the order in which the flows pass through the process groups. If you've spent over 30 minutes trying to work this out without success, consult the solution sheet, but it's worth getting a bit frustrated first to understand how these flows hold together.

In [None]:
"""Define the Sankey diagram definition"""
##Complete here##

In [None]:
"""Define the colours to roughly imitate the original Sankey diagram"""
palette = {
    ##Complete here##
}

In [None]:
"""Draw our Sankey"""
sdd = SankeyDefinition(nodes, bundles, ordering,
                       flow_partition=dataset.partition('type'))
weave(sdd, dataset, palette=palette) \
    .to_widget(width=900, height=500, margins=dict(left=100, right=200)) \
    .auto_save_svg(country+'Sankey.svg')

## Task 3 - Let's automate this procedure for any country with just one click

Define a function that incorporates all of the previous steps, while keeping its parameters adjustable from the outside. The `%%writefile` command writes the contents of the cell to a Python file when the cell is run.
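One common way to keep a function adjustable from the outside is a dictionary of defaults that the caller can partly override. The sketch below is a hypothetical stand-in, not the solution: the function name `describe_sankey` and the parameter keys are invented for illustration.

```python
def describe_sankey(country, overrides=None):
    """Hypothetical stand-in for a plotting function with overridable defaults."""
    # Defaults used when the caller does not ask for anything different
    params = {'width': 900, 'height': 500}

    # Later keys win in a dict merge, so entries in overrides replace defaults
    params = {**params, **(overrides or {})}

    return f"{country}: {params['width']}x{params['height']}"


print(describe_sankey('France'))
print(describe_sankey('France', {'width': 700}))
```

Using `overrides=None` rather than `overrides={}` avoids sharing one mutable default dictionary between calls.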

In [None]:
%%writefile draw_sankey.py
import numpy as np
import pandas as pd
from floweaver import *

def draw_Country_Sankey()##Complete here##:
    """This function creates a Sankey diagram for the data contained in the specified
    .xlsx file according to the specified parameters"""
    
    # Function definitions for each processing stage
    def get_country_data()##Complete here##:
        """Extract data for desired country from Excel sheet to pandas dataFrame"""
        ##Complete here##

    def filter_data()##Complete here##:
        """Filter dataFrame according to year and removing rows with unused data"""
        ##Complete here##

    def format_data()##Complete here##:
        """Format dataFrame to be in correct format for floweaver"""
        ##Complete here##

    def group_processes()##Complete here##:
        """Group inputs and outputs that represent same process"""
        ##Complete here##

    def reorder_data()##Complete here##:
        """Order data so that imports are considered a source and not a target"""
        ##Complete here##

    def create_process_df()##Complete here##:
        """Get all unique types of sources and targets listed in products and flows respectively"""
        ##Complete here##

    # Define default parameters
    params={
        ##Complete here##
    }

    diagramParams={
        ##Complete here##
    } 

    # Update default parameters if other parameters are passed to the function
    ##Complete here##

    # Data processing
    countryData = get_country_data(##Complete here##)
    filterData, year = filter_data(##Complete here##)
    formattedData = format_data(##Complete here##)
    groupData = group_processes(##Complete here##)
    orderedData = reorder_data(##Complete here##)

    # Create processes
    processes = create_process_df(##Complete here##)

    # Create Sankey Dataset
    dataset = Dataset(##Complete here##)

    # Return sankey diagram
    sdd = SankeyDefinition(##Complete here##, flow_partition=dataset.partition('type'))
    return weave(##Complete here##).to_widget(width=900, height=500, margins=dict(left=100, right=200)) \
    .auto_save_svg(##Complete here##)


Now import the function you've created from the written file. This could be done from any Jupyter notebook or Python script.
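To see why reloading matters after you edit the file, here is a small self-contained sketch. It writes a throwaway module called `toy_module` (a made-up name) to a temporary folder, imports it, changes it, and reloads it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny throwaway module to a temporary folder
module_dir = Path(tempfile.mkdtemp())
module_path = module_dir / 'toy_module.py'
module_path.write_text('def greet():\n    return "first version"\n')

# Make the folder importable, then import the module as usual
sys.path.insert(0, str(module_dir))
import toy_module
first = toy_module.greet()

# Edit the file; the already imported module does not change by itself
module_path.write_text('def greet():\n    return "second version, now updated"\n')
stale = toy_module.greet()

# reload re-executes the file so the session picks up the new code
importlib.invalidate_caches()
importlib.reload(toy_module)
second = toy_module.greet()

print(first, '|', stale, '|', second)
```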

In [None]:
# Import function
import draw_sankey as ds

# Reloads function in case you've already imported it to this notebook
from importlib import reload
reload(ds)

In [None]:
# Define variables
fileName = 'WorldEnergyBalancesHighlights2021.xlsx'
sheetName = 'TimeSeries_1971-2020'
country = 'France'

# Call function
diagram = ds.draw_Country_Sankey(fileName, sheetName, {'countryName':country})

# Display Sankey
display(diagram)

If this has worked then have a play around and make sure it works for any country by simply modifying the `country` variable.

The solution example follows the principles of functional programming, but it could also be written in an object-oriented style by defining the Sankey diagram as a class. If you are already a Python expert and feeling bored, why not have a go at rewriting the draw_sankey.py file as a class definition? Alternatively, you could make the function more robust by adding error messages that catch bad inputs, or even create a widget that lets you pick the country from a list rather than typing it in.
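As a starting point for catching bad inputs, the sketch below checks a country name against a list of known names and suggests close matches. The helper name `check_country` and the example list are hypothetical; in practice the list would come from the country column of your spreadsheet.

```python
import difflib


def check_country(country, available):
    """Return `country` if it is in `available`, otherwise raise a helpful error."""
    if not isinstance(country, str):
        raise TypeError(f'country must be a string, got {type(country).__name__}')
    if country not in available:
        # Suggest up to three similarly spelled names to help with typos
        close = difflib.get_close_matches(country, available, n=3)
        hint = f' Did you mean: {", ".join(close)}?' if close else ''
        raise ValueError(f'Unknown country {country!r}.{hint}')
    return country


known = ['France', 'Japan', 'Germany']  # invented list for the example
print(check_country('France', known))

try:
    check_country('Frnce', known)
except ValueError as err:
    print(err)
```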