Our data generator will have three components. First, we define the link between the "magic" integer ids and the object attributes (e.g. the name and location of activities in EXIOBASE). This link is stored in a pandas DataFrame.

In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame([
    {'index': 0, 'name': 'foo', 'location': 'CH'},
    {'index': 1, 'name': 'foo', 'location': 'FR'},
    {'index': 2, 'name': 'bar', 'location': 'DE'},
]).set_index('index')

In [3]:
df

      name location
index
0      foo       CH
1      foo       FR
2      bar       DE


Then we have the generator function itself. This can generate an infinite series, but we need to define ahead of time the number of rows (i.e. the number of exchanges that will be modified).

We will have two rows in this example, so each iteration of our generator yields a 1-d NumPy array of length 2.

In [6]:
import numpy as np

In [7]:
def gen_func():
    while True:
        yield np.random.random(size=(2,))

In [8]:
for x, y in zip(range(10), gen_func()):
    print(x, y)

0 [0.80917835 0.18277058]
1 [0.99828613 0.94558437]
2 [0.14162607 0.23392832]
3 [0.02861499 0.92182092]
4 [0.47642805 0.96394575]
5 [0.81737114 0.57888545]
6 [0.10742288 0.76787843]
7 [0.18578976 0.41947305]
8 [0.8404466 0.4388719]
9 [0.21330664 0.1809946 ]
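
For reproducible runs, the same pattern can be written with a seeded random number generator. The following is a minimal sketch; the name `seeded_gen_func` and the `n_rows` and `seed` parameters are illustrative assumptions, not part of `bw_processing`.

```python
import itertools

import numpy as np


def seeded_gen_func(n_rows=2, seed=42):
    """Yield an endless stream of 1-d arrays, one value per modified exchange."""
    rng = np.random.default_rng(seed)  # local generator, so results are repeatable
    while True:
        yield rng.random(size=(n_rows,))


# Take a finite slice of the infinite generator instead of zipping with range().
samples = list(itertools.islice(seeded_gen_func(), 3))
for i, sample in enumerate(samples):
    print(i, sample)
```

Because every argument has a default, `seeded_gen_func` can still be called without arguments, just like `gen_func`.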


Finally, we need to define which exchange each of the two elements produced by our data generator refers to. Here, we again use our "magic" integer ids.

In [11]:
exchanges = [
    {"row": 0, "col": 1},
    {"row": 1, "col": 2},
]
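
To see what these ids stand for, they can be looked up in the metadata table. This is only an illustrative sketch; the `describe` helper is hypothetical and not part of `bw_processing`.

```python
import pandas as pd

# Same metadata table and exchange list as above.
df = pd.DataFrame([
    {'index': 0, 'name': 'foo', 'location': 'CH'},
    {'index': 1, 'name': 'foo', 'location': 'FR'},
    {'index': 2, 'name': 'bar', 'location': 'DE'},
]).set_index('index')

exchanges = [
    {"row": 0, "col": 1},
    {"row": 1, "col": 2},
]


def describe(exchange):
    """Return (row label, col label) for one exchange, using the metadata table."""
    row = df.loc[exchange["row"]]
    col = df.loc[exchange["col"]]
    # Use item access: `row.name` would return the index label, not the 'name' column.
    return (f"{row['name']} ({row['location']})", f"{col['name']} ({col['location']})")


labels = [describe(exc) for exc in exchanges]
for row_label, col_label in labels:
    print(f"row: {row_label}, col: {col_label}")
```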

We can now add all three components to a processed data package.

In [1]:
from bw_processing import *

In [None]:
dirname = "generator_metadata"

The data package will be written to the directory `dirname`, created in the working directory of this notebook. You can also pass in a string (or `pathlib.Path` instance) for any other path.

If you run this twice, you will need to change `dirname` or delete the created directory, or you will get an error.
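
One way to make the notebook safe to re-run is to delete a leftover directory first. This sketch uses only the standard library; note that it removes everything inside `dirname`, so use it with care.

```python
from pathlib import Path
import shutil

dirname = "generator_metadata"

# Remove output left over from a previous run so the directory can be created again.
# Caution: this deletes everything inside `dirname`.
target = Path(dirname)
if target.is_dir():
    shutil.rmtree(target)
```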

In [8]:
dp = Datapackage.create(dirname)

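The order in which resources are added matters, and each resource must state how it connects to the others: the indices array names its data array via `data_array`, and the metadata names the indices it describes via `valid_for`.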

In [9]:
dp.add_presamples_data_array(
    gen_func,
    matrix_label="technosphere", 
    name="infinite-data", 
    is_interface=True # The magic that allows for infinite sequences
)

In [12]:
dp.add_presamples_indices_array(
    # indices_wrapper allows us to use a list of dicts
    # instead of building the array ourselves
    indices_wrapper(exchanges), 
    data_array="infinite-data",
    name="infinite-data-indices",
    nrows=2
)
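
For readers curious about the array that a list of dicts gets turned into, here is a rough sketch. The field names, the `int32` type and the value 2147483647 are copied from the array printed when the package is loaded later in this notebook; `to_indices_array` is a hypothetical helper and should not be read as the actual implementation of `indices_wrapper`.

```python
import numpy as np

# Field layout copied from the array printed when the package is loaded later on.
INDICES_DTYPE = [
    ("row_value", np.int32),
    ("col_value", np.int32),
    ("row_index", np.int32),
    ("col_index", np.int32),
]
# Value seen in that output for the two index fields; presumably a placeholder.
MAX_INT_32 = np.iinfo(np.int32).max  # 2147483647


def to_indices_array(exchanges):
    """Hypothetical helper: pack [{'row': ..., 'col': ...}] into a structured array."""
    array = np.zeros(len(exchanges), dtype=INDICES_DTYPE)
    for i, exc in enumerate(exchanges):
        array[i] = (exc["row"], exc["col"], MAX_INT_32, MAX_INT_32)
    return array


indices = to_indices_array([{"row": 0, "col": 1}, {"row": 1, "col": 2}])
print(indices)
```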

In [13]:
dp.add_csv_metadata(
    df, 
    valid_for=[("infinite-data-indices", "rows"), ("infinite-data-indices", "cols")], 
    name="infinite-data-meta"
)

In [14]:
dp.metadata

{'profile': 'data-package',
 'name': 'a227c871c02648f78fe75df905bf8d4b',
 'id': '04e4ea83757c4e938fbb9ca06c79461e',
 'licenses': [{'name': 'ODC-PDDL-1.0',
   'path': 'http://opendatacommons.org/licenses/pddl/',
   'title': 'Open Data Commons Public Domain Dedication and License v1.0'}],
 'resources': [{'profile': 'interface',
   'format': 'npy',
   'mediatype': 'application/octet-stream',
   'name': 'infinite-data',
   'matrix': 'technosphere'},
  {'profile': 'data-resource',
   'format': 'npy',
   'mediatype': 'application/octet-stream',
   'name': 'infinite-data-indices',
   'path': 'infinite-data-indices.npy',
   'data_array': 'infinite-data'},
  {'profile': 'data-resource',
   'mediatype': 'text/csv',
   'path': 'infinite-data-meta.csv',
   'name': 'infinite-data-meta',
   'valid_for': [('infinite-data-indices', 'rows'),
    ('infinite-data-indices', 'cols')]}],
 'created': '2020-09-25T07:08:23.284964Z'}

In [17]:
dp.data

[<function __main__.gen_func()>,
 A deferred function that will read data only when needed,
       name location
 index              
 0      foo       CH
 1      foo       FR
 2      bar       DE]

In [18]:
dp.finalize()

In [20]:
dp.data

[A deferred function that will read data only when needed,
       name location
 index              
 0      foo       CH
 1      foo       FR
 2      bar       DE]

In [22]:
dp.resources

[{'profile': 'data-resource',
  'format': 'npy',
  'mediatype': 'application/octet-stream',
  'name': 'infinite-data-indices',
  'path': 'infinite-data-indices.npy',
  'data_array': 'infinite-data'},
 {'profile': 'data-resource',
  'mediatype': 'text/csv',
  'path': 'infinite-data-meta.csv',
  'name': 'infinite-data-meta',
  'valid_for': [('infinite-data-indices', 'rows'),
   ('infinite-data-indices', 'cols')]}]

In [19]:
ls generator_metadata/

datapackage.json           infinite-data-meta.csv
infinite-data-indices.npy


Next, we load the package again to check that everything was stored as expected.

In [1]:
from bw_processing import *

In [2]:
dp = Datapackage.load("generator_metadata")

In [3]:
dp.data

[array([(0, 1, 2147483647, 2147483647), (1, 2, 2147483647, 2147483647)],
       dtype=[('row_value', '<i4'), ('col_value', '<i4'), ('row_index', '<i4'), ('col_index', '<i4')]),
    index name location
 0      0  foo       CH
 1      1  foo       FR
 2      2  bar       DE]
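
Note that `index` comes back as an ordinary column rather than as the DataFrame index. This is consistent with a plain CSV round trip in pandas, sketched below with an in-memory buffer. How `bw_processing` reads the file internally is not shown in this notebook, so treat this as an illustration only.

```python
import io

import pandas as pd

df = pd.DataFrame([
    {'index': 0, 'name': 'foo', 'location': 'CH'},
    {'index': 1, 'name': 'foo', 'location': 'FR'},
    {'index': 2, 'name': 'bar', 'location': 'DE'},
]).set_index('index')

# Write to an in-memory CSV, then read it back without naming an index column.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
loaded = pd.read_csv(buffer)
print(loaded.columns.tolist())

# Restoring the original index is a one-liner.
restored = loaded.set_index('index')
print(restored)
```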

In [5]:
dp.metadata

{'profile': 'data-package',
 'name': 'a227c871c02648f78fe75df905bf8d4b',
 'id': '04e4ea83757c4e938fbb9ca06c79461e',
 'licenses': [{'name': 'ODC-PDDL-1.0',
   'path': 'http://opendatacommons.org/licenses/pddl/',
   'title': 'Open Data Commons Public Domain Dedication and License v1.0'}],
 'resources': [{'profile': 'data-resource',
   'format': 'npy',
   'mediatype': 'application/octet-stream',
   'name': 'infinite-data-indices',
   'path': 'infinite-data-indices.npy',
   'data_array': 'infinite-data'},
  {'profile': 'data-resource',
   'mediatype': 'text/csv',
   'path': 'infinite-data-meta.csv',
   'name': 'infinite-data-meta',
   'valid_for': [['infinite-data-indices', 'rows'],
    ['infinite-data-indices', 'cols']]}],
 'created': '2020-09-25T07:08:23.284964Z'}
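
The `valid_for` entries were tuples before saving and are lists after loading. The metadata is stored in `datapackage.json`, and JSON has no tuple type, so this is expected. A short standard-library sketch of the same effect:

```python
import json

valid_for = [("infinite-data-indices", "rows"), ("infinite-data-indices", "cols")]

# JSON has arrays but no tuple type, so tuples are written as arrays
# and come back as lists.
round_tripped = json.loads(json.dumps(valid_for))
print(round_tripped)

# Convert back if tuples are needed, e.g. for use as dictionary keys.
as_tuples = [tuple(pair) for pair in round_tripped]
```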

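The generator function is not written to disk, so the loaded package only contains the indices and the metadata. To make it fully functional, we just need to add the generator as a data resource again, using the same name (`infinite-data`) that the indices array refers to.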

In [9]:
dp.add_presamples_data_array(
    gen_func,
    matrix_label="technosphere", 
    name="infinite-data", 
    is_interface=True
)

In [10]:
dp.metadata

{'profile': 'data-package',
 'name': 'a227c871c02648f78fe75df905bf8d4b',
 'id': '04e4ea83757c4e938fbb9ca06c79461e',
 'licenses': [{'name': 'ODC-PDDL-1.0',
   'path': 'http://opendatacommons.org/licenses/pddl/',
   'title': 'Open Data Commons Public Domain Dedication and License v1.0'}],
 'resources': [{'profile': 'data-resource',
   'format': 'npy',
   'mediatype': 'application/octet-stream',
   'name': 'infinite-data-indices',
   'path': 'infinite-data-indices.npy',
   'data_array': 'infinite-data'},
  {'profile': 'data-resource',
   'mediatype': 'text/csv',
   'path': 'infinite-data-meta.csv',
   'name': 'infinite-data-meta',
   'valid_for': [['infinite-data-indices', 'rows'],
    ['infinite-data-indices', 'cols']]},
  {'profile': 'interface',
   'format': 'npy',
   'mediatype': 'application/octet-stream',
   'name': 'infinite-data',
   'matrix': 'technosphere'}],
 'created': '2020-09-25T07:08:23.284964Z'}
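
To illustrate how the generator and the indices relate, the sketch below pairs one sample with the exchange ids to fill a small dense matrix. This is not how `bw_processing` or any downstream library builds matrices; the 3x3 shape and the direct use of ids as matrix positions are assumptions made for this example.

```python
import numpy as np


def gen_func():
    while True:
        yield np.random.random(size=(2,))


exchanges = [
    {"row": 0, "col": 1},
    {"row": 1, "col": 2},
]

rows = np.array([exc["row"] for exc in exchanges])
cols = np.array([exc["col"] for exc in exchanges])

# Assumption: three activities (ids 0, 1, 2), and ids double as matrix positions.
matrix = np.zeros((3, 3))

# Each draw from the generator supplies one value per exchange, in order.
sample = next(gen_func())
matrix[rows, cols] = sample
print(matrix)
```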