# First steps with Python and Ocelot

In [1]:
import ocelot

Import a `spold` (ecospold2) file. Normally we would import a directory with many datasets, but we start slowly and look at just one file for now.

In [2]:
filename = "heat-cogeneration-glo.spold"
datafile = ocelot.io.extract_ecospold2.generic_extractor(filename)

However, the ecospold2 format is a tricky beast. One implementation detail says that it is OK to have multiple datasets within a file. We first check to make sure that we have only one dataset, and then select it by index.

Python is *zero-indexed*. That means if I have a list:

    a = ['cow', 'chicken', 'pig']
    
The first element of that list has index `0`:

    In[0]: a[0]
    Out[0]: 'cow'

In [3]:
len(datafile)

1

In [4]:
dataset = datafile[0]
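The check and the selection can also be combined so that the code stops with a clear message if a file ever holds more than one dataset. This is a minimal sketch: the list below is a stand-in for what the extractor returns, not real Ocelot output.

```python
# Stand-in for the list returned by the extractor (assumption for illustration).
datafile = ["stand-in for a dataset"]

# Stop with an error message unless the file holds exactly one dataset.
assert len(datafile) == 1, "Expected exactly one dataset in this file"

# Safe to select the first (and only) element by index.
dataset = datafile[0]
print(dataset)
```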

Before we can do anything sensible with this dataset, we need to know what to expect. So please take a look through the [Ocelot data format](https://docs.ocelot.space/data_format.html) documentation (go ahead, I'll wait; I have nothing but time). This will probably raise more questions than it answers, but it is at least a start.

The first thing to notice is that the data format is what Python calls a `dictionary`. Other languages call this a hash table. A dictionary has keys and values (like a word and its definition), and looks like this:

    {
        "a key": "some value"
    }

And we look up values using keys, the same way that we looked up indices in lists:

    In[1]: our_dictionary = {"a key": "some value"}
    In[2]: our_dictionary["a key"]
    Out[2]: "some value"
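One difference from lists is worth seeing early: asking for a key that is not in the dictionary raises a `KeyError`. The sketch below uses the same small dictionary; the key `"another key"` is made up for illustration, and `try`/`except` is new syntax that simply catches the error so the example keeps running.

```python
our_dictionary = {"a key": "some value"}

# `in` tests whether a key is present, without raising an error.
has_key = "a key" in our_dictionary
has_other_key = "another key" in our_dictionary
print(has_key, has_other_key)

# Looking up a missing key with square brackets raises a KeyError.
try:
    our_dictionary["another key"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print("Lookup failed:", lookup_failed)
```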

We don't have to use strings; values can be any type of data, and keys can be pretty much any type that Python can hash (which rules out lists and other dictionaries):

    {
        True: False
    }

Speaking of types, you can use the `type()` function to get the type of an argument in Python.
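As a quick illustration of `type()`, here is a small sketch that checks a few made-up values (they are not taken from the dataset):

```python
# A few example values of different types (made up for illustration).
examples = ["heat", 4.1, True, [1, 2, 3], {"a key": "some value"}]

found_types = []
for example in examples:
    # type() returns the class of the value it is given.
    found_types.append(type(example))
    print(example, type(example))
```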

So let's look at the `keys` in our dataset dictionary, and get the `type` of the `values`.

In [6]:
for key in dataset:
    print(key, type(dataset[key]))

end date <class 'str'>
economic scenario <class 'str'>
location <class 'str'>
start date <class 'str'>
access restricted <class 'str'>
name <class 'str'>
filepath <class 'str'>
type <class 'str'>
combined production <class 'bool'>
parameters <class 'list'>
technology level <class 'str'>
exchanges <class 'list'>
id <class 'str'>


* `str` is a string, e.g. `"this is an example"`. You can use `"double"` or `'single'` quotes.
* `bool` is a boolean, i.e. `True` or `False`.
* `list` is a list, e.g. `[1, 2, 3, "look", "at", "me"]`. Lists can have more than one type of data inside them.

## Question 1: Getting data attributes by key

* How can you get the name of the dataset?
* How can you get the location of the dataset?

In [7]:
# Delete this comment and try your own code here

## Question 2: Getting the number of exchanges

The function `len()` will return the length of something, e.g. the length of a list:

    In[1]: len([1,2,3])
    Out[1]: 3
    
Or the length of a string:

    In[1]: len("milk")
    Out[1]: 4

In [None]:
# Delete this comment and try your own code here

Our activity dataset has inputs and outputs. All flows for the dataset are given in the list `exchanges`. There are two types of exchanges: [technosphere exchanges](https://docs.ocelot.space/data_format.html#technosphere-exchanges-activity-exchange-schema), which are product or service flows in the supply chain, and [biosphere exchanges](https://docs.ocelot.space/data_format.html#biosphere-exchanges-elementary-exchange-schema), which are interactions with the natural world, either emissions or consumption of resources.
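To get a feel for the mix of exchanges, one option is to count them by their `type` key. The sketch below runs on toy exchanges rather than the real dataset. Only the value `'from technosphere'` for the gas power plant appears in the output shown in this tutorial; the other `type` values are assumptions for illustration, so check the data format documentation for the actual labels.

```python
from collections import Counter

# Toy exchanges with only the keys needed here.
# The 'type' values for the last two entries are assumed, not taken from the dataset.
toy_exchanges = [
    {"name": "gas power plant, 100MW electrical", "type": "from technosphere"},
    {"name": "natural gas, high pressure", "type": "from technosphere"},
    {"name": "Carbon dioxide, fossil", "type": "to environment"},
]

# Counter tallies how often each 'type' value occurs.
type_counts = Counter(exchange["type"] for exchange in toy_exchanges)

for exchange_type, count in type_counts.items():
    print(exchange_type, count)
```

With the real data, the same loop would run over `dataset['exchanges']` instead of `toy_exchanges`.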

Exchanges have a lot of data:

In [8]:
dataset['exchanges']

[{'amount': 6.47766990291262e-10,
  'byproduct classification': 'allocatable product',
  'conditional exchange': False,
  'formula': 'factor_MJ_kWh*0.0000000000556',
  'id': '1c0c37a4-2cdb-41b1-98ba-e6862aa79543',
  'name': 'gas power plant, 100MW electrical',
  'properties': [],
  'tag': 'intermediateExchange',
  'type': 'from technosphere',
  'uncertainty': {'mean': 6.47766990291262e-10,
   'mu': -21.0589383934,
   'pedigree matrix': {'completeness': 3,
    'further technology correlation': 3,
    'geographical correlation': 3,
    'reliability': 3,
    'temporal correlation': 3},
   'type': 'lognormal',
   'variance': 0.12,
   'variance with pedigree uncertainty': 0.1327},
  'unit': 'unit',
  'variable': 'gas_power_plant'},
 {'amount': 4.1,
  'byproduct classification': 'allocatable product',
  'conditional exchange': False,
  'id': '8dced74b-0677-4388-9215-bb6d30b8b084',
  'name': 'heat, district or industrial, natural gas',
  'production volume': {'amount': 4439289063000.0,
   'fo

Sometimes exchanges have `properties`. For example, a property could be the dry or wet mass of a flow. In our dataset, the exchange for `heat, district or industrial, natural gas` has a property for `price`. Let's look at this exchange in more detail to see this more clearly. To do this, we will iterate over the exchanges, and pull out one based on its name:

In [9]:
heat_process_name = 'heat, district or industrial, natural gas'

for exchange in dataset['exchanges']:
    if exchange['name'] == heat_process_name:
        selected_exchange = exchange
        
selected_exchange

{'amount': 4.1,
 'byproduct classification': 'allocatable product',
 'conditional exchange': False,
 'id': '8dced74b-0677-4388-9215-bb6d30b8b084',
 'name': 'heat, district or industrial, natural gas',
 'production volume': {'amount': 4439289063000.0,
  'formula': 'heat/electricity*electricity_apv',
  'variable': 'heat_apv'},
 'properties': [{'amount': 0.0106,
   'id': '38f94dd1-d5aa-41b8-b182-c0c42985d9dc',
   'name': 'price',
   'uncertainty': {'maximum': 0.0106,
    'minimum': 0.0106,
    'pedigree matrix': {'completeness': 1,
     'further technology correlation': 1,
     'geographical correlation': 1,
     'reliability': 1,
     'temporal correlation': 1},
    'standard deviation 95%': 0.0,
    'type': 'undefined'},
   'unit': 'EUR2005'},
  {'amount': 0.184213553594,
   'id': '7a3978ea-3e26-4329-bc8b-0915d58a7e6f',
   'name': 'true value relation',
   'uncertainty': {'maximum': 0.184213553594,
    'minimum': 0.184213553594,
    'pedigree matrix': {'completeness': 1,
     'further t

## Question 3: Find the numeric value (`amount`) of the property `price` for our selected exchange.

In the code above, you saw some new Python syntax: `if` statements and tests for equality with `==`. You can also compare with `<`, `>`, and `!=` (not equal). You should be able to answer this question using these syntax elements!
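Here is a small sketch of these comparisons on the animal list from earlier, so it does not give away the answer. The `elif` and `else` branches are extra syntax not used above: `elif` is tried only when the earlier test was false, and `else` catches everything left over.

```python
animals = ['cow', 'chicken', 'pig']

messages = []
for animal in animals:
    if animal == 'cow':
        # Equality test on strings
        messages.append(animal + " says moo")
    elif len(animal) > 3:
        # Greater-than test on the length of the name
        messages.append(animal + " has a long name")
    else:
        messages.append(animal + " has a short name")

for message in messages:
    print(message)
```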

In [10]:
# You know what to do here...

If you need a hint, start with something like this:

    for prop in selected_exchange['properties']:
        if prop['name'] == 'something':
            print(prop['amount'])

We can also write functions that do what we have already been doing. For example, here is a function that returns a list:

In [11]:
def my_function():
    return [1,2,3]

I can call my function:

In [12]:
my_function()

[1, 2, 3]

Functions can have inputs:

In [14]:
def my_function_with_inputs(input_1, input_2):
    print("Input 1 was:", input_1)
    print("Input 2 was:", input_2)

In [16]:
my_function_with_inputs("The cow says (wait for it...)", "MOOOOOOOOOOO!!!")

Input 1 was: The cow says (wait for it...)
Input 2 was: MOOOOOOOOOOO!!!
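Inputs and return values can be combined. As a minimal sketch, the hypothetical function below builds a string from its two inputs and returns it instead of printing it, so the result can be stored in a variable and reused:

```python
def describe_animal(name, sound):
    # Build a sentence from the two inputs and hand it back to the caller.
    return "The {} says {}".format(name, sound)

message = describe_animal("cow", "moo")
print(message)
```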


We can create a function to generate some statistics on this dataset:

In [19]:
def some_statistics(ds):  # `ds` is the dataset dictionary passed in
    print("Number of exchanges:", len(ds['exchanges']))
    print("The sum of the exchanges:", sum(exchange['amount'] for exchange in ds['exchanges']))
    for index, exchange in enumerate(ds['exchanges']):
        # We can use `.get(key, default_value)` to get a key if present, or get the default value otherwise
        print("Exchange {}: {} has {} properties".format(index, exchange['name'], len(exchange.get('properties', []))))

In [20]:
some_statistics(dataset)

Number of exchanges: 5
The sum of the exchanges: 6.020866318795638
Exchange 0: gas power plant, 100MW electrical has 0 properties
Exchange 1: heat, district or industrial, natural gas has 2 properties
Exchange 2: electricity, high voltage has 2 properties
Exchange 3: natural gas, high pressure has 0 properties
Exchange 4: Carbon dioxide, fossil has 0 properties
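The `.get(key, default_value)` call in `some_statistics` is what keeps the function from failing on exchanges that lack a `properties` key. The sketch below shows the behaviour on two toy exchanges that are made up for illustration, not copied from the dataset:

```python
# Two toy exchanges: one with a 'properties' key, one without.
with_properties = {"name": "toy heat", "properties": [{"name": "price"}]}
without_properties = {"name": "toy emission"}

property_counts = []
for exchange in (with_properties, without_properties):
    # Fall back to an empty list when the key is missing, so len() still works.
    properties = exchange.get("properties", [])
    property_counts.append(len(properties))
    print(exchange["name"], "has", len(properties), "properties")
```

Using `exchange["properties"]` on the second toy exchange would raise a `KeyError` instead.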
