# Generating ecoSpold files for WWT

Pascal Lesage notes for ICRA

## 1) Introduction

### 1.1) Contents of this Notebook  
The WWT LCI tool must generate importable ecoSpold2 files.  
An example ecoSpold file already in ecoinvent can be found in the repo.  
This document shows:  
  - how to fill in all the required fields  
  - how to generate the ecoSpold files  

Note that **two** ecoSpold files may need to be generated for a given situation: one for WW treatment, and one covering both WW discharged to the environment without treatment and the particulate content and dissolved substances flushed during hydraulic overload episodes (**see emails on the subject**).  

### Some terminology
Exchange: input or output of the dataset (product flow, emission, etc.)

### 1.2) Note on averaging   
How to average (across regions and technologies) has been discussed extensively in emails (search for the subject "[ecoinvent WWT project] IMPORTANT: Input from experts required").  
Averaging is **not** considered here - it is expected that it will be done in the tool itself.

### 1.3) Note on uncertainty  
The characterization of the uncertainty of exchanges in ecoinvent is very important.  
By default, the uncertainty is described by a [lognormal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution).  
The parameters used to define this distribution are those of the underlying normal distribution:  
  - $\mu = \ln(\text{amount})$  
  - total variance $\sigma^2$ = variance of the underlying normal distribution  

The total variance, called $varianceWithPedigreeUncertainty$ in ecoSpold, is a function of: 
  - the basic uncertainty (simply called $variance$ in ecoSpold), for which we have default values based on the type of exchange we are dealing with, and
  - additional uncertainty based on a ranking of the quality of the data, the so-called Pedigree matrix approach.  

The quality ranking is based on the following table:
  ![Pedigree scores](pix/pedigree_scores.PNG)  
The scores are then converted to additional variance using the following factors:  
  ![Pedigree scores to variance](pix/pedigree_scores_to_variance.PNG)  
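The total uncertainty (called $varianceWithPedigreeUncertainty$ in ecoSpold) is then calculated as the sum of the basic uncertainty and the additional uncertainties, all expressed as variances of the underlying normal: $\sigma^2_{total} = \sigma^2_{basic} + \sum_{i=1}^{5} \sigma^2_{i}$, where $\sigma^2_{i}$ is the additional variance for pedigree indicator $i$.  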
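
As a minimal sketch of this sum, the hypothetical helper below (not part of the GB code) adds the basic variance and the five pedigree variances. The factor values are an assumption: they are the default pedigree variance factors of the ecoinvent v3 Data Quality Guidelines, which should match the table in the image above, but check them before reuse.

```python
# Additional variances (of the underlying normal) per pedigree indicator, for scores 1..5.
# ASSUMPTION: default factors from the ecoinvent v3 Data Quality Guidelines;
# check them against the pedigree_scores_to_variance table before relying on them.
PEDIGREE_VARIANCE_FACTORS = [
    [0.0, 0.0006, 0.002, 0.008, 0.04],       # reliability
    [0.0, 0.0001, 0.0006, 0.002, 0.008],     # completeness
    [0.0, 0.0002, 0.002, 0.008, 0.04],       # temporal correlation
    [0.0, 0.000025, 0.0001, 0.0006, 0.002],  # geographical correlation
    [0.0, 0.0006, 0.008, 0.04, 0.12],        # further technological correlation
]


def variance_with_pedigree(variance, pedigree_matrix):
    """Return basic variance + pedigree variances (all variances of ln(amount))."""
    if len(pedigree_matrix) != 5:
        raise ValueError("pedigreeMatrix must contain five scores")
    additional = 0.0
    for factors, score in zip(PEDIGREE_VARIANCE_FACTORS, pedigree_matrix):
        if not 1 <= score <= 5:
            raise ValueError("pedigree scores must be between 1 and 5")
        additional += factors[score - 1]
    return variance + additional


# Example with the placeholder uncertainty used later in this notebook
print(variance_with_pedigree(0.01, [2, 4, 3, 2, 4]))
```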

The difficulty arises when an exchange amount is itself calculated from uncertain parameters. The uncertainty of these underlying parameters then needs to be **propagated** to the exchange amount. Different approaches could be taken here:  
  1. Simplest: estimate the uncertainty at the exchange level, without formal consideration of the uncertainty of the underlying parameters. Lowest-quality approach.  
  2. Calculate the total uncertainty in the tool, and enter the exchange with this total uncertainty. The pedigree scores would all be ones, and the uncertainty comment would say something to the effect of "uncertainty calculated elsewhere". This is the approach taken in the old tool.  
  3. Enter the parameters as ecoSpold parameters, with uncertainty, and calculate the exchanges as mathematical relations in ecoSpold. This is the most elegant option, but also much more complicated on my end.  
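
Option 2 could, for instance, be implemented by Monte Carlo simulation in the tool. The sketch below is hypothetical (the `propagate` helper and the example parameters are illustrative, not project values): it samples each lognormal parameter, evaluates the exchange amount, and returns the variance of ln(amount), which would then be entered as the basic variance with all pedigree scores set to one.

```python
import math
import random
import statistics


def propagate(model, params, n=20000, seed=1):
    """Return the variance of ln(model(...)) when each parameter is lognormal.

    params: {name: (amount, variance)}, with mu = ln(amount) and
            variance = variance of the underlying normal distribution.
    model:  function of the named parameters returning a strictly positive amount.
    """
    rng = random.Random(seed)  # seeded for reproducible results
    log_results = []
    for _ in range(n):
        sample = {
            name: rng.lognormvariate(math.log(amount), math.sqrt(variance))
            for name, (amount, variance) in params.items()
        }
        log_results.append(math.log(model(**sample)))
    return statistics.pvariance(log_results)


# Illustrative exchange: pollutant load = wastewater volume x concentration
params = {'volume': (1.0, 0.01), 'concentration': (0.34, 0.04)}
print(propagate(lambda volume, concentration: volume * concentration, params))
```

For a pure product of independent lognormal parameters, the result should approach the sum of the input variances (here about 0.05); the simulation is really meant for formulas where no such shortcut exists.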
  
Which approach to take is still not determined, but ecoinvent seems to prefer option 2.  

For now, I will enter dummy uncertainty values with the comment "PLACEHOLDER ESTIMATED UNCERTAINTY" where uncertainty should be _estimated_ (by ICRA, by the user) but isn't yet.  
For now, I will enter dummy uncertainty values with the comment "PLACEHOLDER CALCULATED UNCERTAINTY" where uncertainty should be _calculated_ but isn't yet.  

The uncertainty of a parameter is defined as a dictionary with the following structure: `{'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4], 'comment': "Some comment"}` 
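
Since several functions rely on this structure, the tool could check it before use. A minimal sketch, with a hypothetical `check_uncertainty` helper; requiring a strictly positive variance is an assumption.

```python
REQUIRED_UNCERTAINTY_KEYS = {'variance', 'pedigreeMatrix', 'comment'}


def check_uncertainty(uncertainty):
    """Raise an error if an uncertainty dictionary does not follow the expected structure."""
    missing = REQUIRED_UNCERTAINTY_KEYS - set(uncertainty)
    if missing:
        raise KeyError("missing keys: {}".format(sorted(missing)))
    # ASSUMPTION: a lognormal basic variance must be strictly positive
    if not uncertainty['variance'] > 0:
        raise ValueError("variance must be strictly positive")
    scores = list(uncertainty['pedigreeMatrix'])
    if len(scores) != 5 or any(score not in (1, 2, 3, 4, 5) for score in scores):
        raise ValueError("pedigreeMatrix must contain five scores between 1 and 5")
    return uncertainty


check_uncertainty({'variance': 0.01, 'pedigreeMatrix': [2, 4, 3, 2, 4], 'comment': "Some comment"})
```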

## 2) Standard inputs

In [1]:
import os
import pandas as pd
import pickle # Used temporarily to access a MasterData dictionary - check if still useful at the end of the project.
from lxml import objectify #Convert XML to dict
import time

In [2]:
# Due to some pickle files having been generated with an older version of Pandas
#import pandas.core.indexes
import sys
#sys.modules['pandas.indexes'] = pandas.core.indexes

## 3) Guillaume Bourgault (GB) code and adaptations/additions

This document relies heavily on the code prepared by GB and distributed in July (`spold2_writer_use.py`).  

Some slight modifications were made to make it easier to use with the WWT tool ==> see `spold2_writer_functions.py`

To run the .py file from the Notebooks, one can use the [%run](http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-run) magic command

In [3]:
os.chdir(os.path.join(os.getcwd(), 'waste_water_tool'))
%run spold2_writer_functions.py

## 4) Master data and the generation of the `MD` dictionary

The ecoinvent database contains master data for the following entities: Activity Names, Classifications, Companies, Compartments, Exchanges (Elementary and Intermediate), Geographies, Languages, Market Models, Parameters, Persons, Properties, Scenarios, Sources, Tags and Units. 

There are discussions underway to have the tool access the master data on the ecoinvent/IFU server. However, this has not yet been resolved, and many at the ecoinvent Center do not consider it a priority: the amount of master data used for the WWT datasets is small, and the master data could be stored on the server that will host the WWT tool and easily be updated regularly.

For now, I will use the master data that is downloaded on my computer via the [ecoEditor](http://www.ecoinvent.org/data-provider/data-provider-toolkit/ecoeditor/ecoeditor.html).  
Guillaume of the ecoinvent Center (henceforth GB) has written the following code to help **find the master data**:

In [4]:
master_data_folder = find_current_MD_path()
master_data_folder

'C://Users\\Pascal Lesage\\Documents\\ecoinvent\\EcoEditor\\xml\\MasterData\\Production'

Here are the **contents of the master data directory**:

In [5]:
os.listdir(master_data_folder)

['ActivityIndex.xml',
 'ActivityNames.xml',
 'Classifications.xml',
 'Companies.xml',
 'Compartments.xml',
 'Context.xml',
 'DeletedMasterData.xml',
 'ElementaryExchanges.xml',
 'ExchangeActivityIndex.xml',
 'Geographies.xml',
 'IntermediateExchanges.xml',
 'Languages.xml',
 'MacroEconomicScenarios.xml',
 'Parameters.xml',
 'Persons.xml',
 'Properties.xml',
 'Sources.xml',
 'SystemModels.xml',
 'Tags.xml',
 'UnitConversions.xml',
 'Units.xml',
 'user']

The py file includes code to **assemble all the master data in one dictionary**, **`MD`**, where:  
  - the keys of the dictionary are the names of the files above (`ActivityIndex`, `ActivityNames`, etc.)  
  - the values are the contents of the master data xml assembled as **pandas dataframes**.  

Here are some details:  

`get_current_MD(master_data_folder=None, pkl_folder=None, return_MD=False)`:   
  - Arguments:  
    - `master_data_folder` = dir of master data. If `None`, `find_current_MD_path` is used  
    - `pkl_folder` = directory of the previously built master data dictionary. If `None`, the function will look where it expects it to be, i.e. `os.path.join(os.path.dirname(os.path.realpath(__file__)),'pkl')`  
    - `return_MD` = if `False`, the function returns `None`, else it returns the MD  
  - Compares the age of the existing master data dictionary MD (if it exists) with that of the actual master data to determine whether the dictionary can be used as-is or whether it needs to be created/updated.  
  - If it needs to be created, the function `build_MD` is called.
  - Returns MD if `return_MD` is `True`.
  
`build_MD(md_fields_xls=None, master_data_folder=None, pickle_dump_folder=None, xls_dump_folder=None)`:
  - Called from `get_current_MD`, if needed. 
  - Arguments:  
    - `md_fields_xls`: path to the file `MasterData_fields.xlsx`, by default in `root_dir/documentation`. Default used if argument not passed.  
    - `master_data_folder` = dir of master data. If `None`, `find_current_MD_path` is used  
    - `pickle_dump_folder` = Directory where the pickled **`MD`** should be stored. If `None`, the MD pickle is not stored.
    - `xls_dump_folder` = Directory where the xls version of the master data should be stored. If `None`, the xls is not generated.
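
How `get_current_MD` compares ages is not detailed here. One plausible approach, sketched below under the assumption that file modification times are used, is to rebuild whenever the pickle is missing or older than any master data XML file. `md_pickle_is_stale` is a hypothetical name.

```python
import os


def md_pickle_is_stale(pkl_path, master_data_folder):
    """Return True if the pickled MD is missing or older than any master data XML file."""
    if not os.path.exists(pkl_path):
        return True
    pkl_time = os.path.getmtime(pkl_path)
    xml_times = [
        os.path.getmtime(os.path.join(master_data_folder, name))
        for name in os.listdir(master_data_folder)
        if name.lower().endswith('.xml')
    ]
    # Stale as soon as one XML file was modified after the pickle was written
    return any(xml_time > pkl_time for xml_time in xml_times)
```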

**Use in our case**: the `MD.pkl` dictionary is required later, and so needs to be generated.

In [6]:
MD = get_current_MD(return_MD=True)

"MasterData.xlsx" ready in C:\mypy\code\wastewater_treatment_tool\waste_water_tool\documentation

setting MD indices


In [7]:
# Names of dataframes:
[*MD.keys()]

['ActivityIndex',
 'ActivityNames',
 'Classifications',
 'Companies',
 'Compartments',
 'ElementaryExchanges',
 'Geographies',
 'IntermediateExchanges',
 'Languages',
 'MacroEconomicScenarios',
 'Parameters',
 'Persons',
 'Properties',
 'Sources',
 'SystemModels',
 'Tags',
 'UnitConversions',
 'Units',
 'ExchangeActivityIndex',
 'IntermediateExchanges prop.',
 'ElementaryExchanges prop.']

Here is a sample DF, for geographies.

In [8]:
MD['Geographies'].head()

| shortname | id | latitude | longitude | name | uNCode | uNRegionCode | uNSubregionCode |
|---|---|---|---|---|---|---|---|
| AD | 10033c48-7d7e-11de-9ae2-0019e336be3a | 42.549 | 1.576 | Andorra | 0 | 0 | 0 |
| AE | 14f9e5d0-7d7e-11de-9ae2-0019e336be3a | 23.549 | 54.163 | United Arab Emirates | 0 | 0 | 0 |
| AF | 0c608726-7d7e-11de-9ae2-0019e336be3a | 33.677 | 65.216 | Afghanistan | 0 | 0 | 0 |
| AG | 09581fbc-7d7e-11de-9ae2-0019e336be3a | 17.078 | -61.783 | Antigua and Barbuda | 0 | 0 | 0 |
| AI | 0fefde28-7d7e-11de-9ae2-0019e336be3a | 18.237 | -63.032 | Anguilla | 0 | 0 | 0 |


## 5) High level parameters needed to generate the ecoSpold files

### 5.1) Wastewater properties  
  - BOD5, metal content, etc. 
  - For now, I have stored these in an Excel file (WW_properties.xlsx, in the documentation folder), with random values. 
  - The tool will need to collect this information (ICRA). 
  - The Excel file also contains (random) data on the **dissolved and particulate fractions**, used later in calculations. 
  - I store this information in a pandas dataframe: I suggest the tool does the same thing, since many functions below use this object.

**Some notes**  

**C**: the original tool says the data can be entered as any combination of the following, with a preference for BOD5:  
  - Chemical Oxygen Demand COD as O2  
  - Biological Oxygen Demand BOD5 as O2  
  - Dissolved organic carbon DOC as C  
  - Total organic carbon TOC as C  
  
What is preferable for this tool?  


**N**: the original tool says the data can be entered as one of the following combinations:    
  - NH4 & NO3 & NO2 & N part. & N org. solv.
  - SKN & NO3 & NO2 & N-part.  
  - TKN & NO3 & NO2  
  - N-tot.  
  
How should we deal with this in the tool? Should we ensure that at least one of these combinations is respected?  


**P**: the original tool says the data can be entered as one of the following combinations:    
  - PO4-P & P-part. 
  - P-tot.  

How should we deal with this in the tool? Should we ensure that at least one of these combinations is respected?  


**S**: the original tool says the data can be entered as one of the following combinations:  
  - SO4 & S-part  
  - S-tot.  

How should we deal with this in the tool? Should we ensure that at least one of these combinations is respected?  

There are four properties in the old tool that are accounted for but for which no property exists in the ecoinvent Master Data:  
  - Particulate P-part. as P  
  - Total P-tot. as P  
  - Soluble Kjeldahl SKN as N  
  - Total Kjeldahl TKN as N  
  
These should be included in the ecoinvent Master Data and used in our tool.  
I included code that automatically adds these to the Master Data when adding the property to the wastewater, and used the following names, based on ecoinvent nomenclature:  
  - mass concentration, particulate phosphorus as P  
  - mass concentration, total phosphorus as P  
  - mass concentration, soluble Kjeldahl SKN as N  
  - mass concentration, total Kjeldahl TKN as N  

Please validate these are OK names.

In [9]:
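def get_WW_properties(xls=None):
    """Read the wastewater properties (for now, from Excel) into a DataFrame indexed by property name."""
    # root_dir is expected to come from spold2_writer_functions.py (loaded with %run above)
    if xls is None:
        xls = os.path.join(root_dir, 'documentation', 'WW_properties.xlsx')
    return pd.read_excel(xls, sheet_name='WW_props', index_col=1)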

In [10]:
WW_prop_df = get_WW_properties()
WW_prop_df.head().T                   #Transposed for easier viewing. NaN means the cell was empty in the Excel file. 

| name | BOD5, mass per volume | COD, mass per volume | mass concentration, DOC | mass concentration, TOC | mass concentration, dissolved ammonia NH4 as N |
|---|---|---|---|---|---|
| id | dd13a45c-ddd8-414d-821f-dfe31c7d2868 | 3f469e9e-267a-4100-9f43-4297441dc726 | efe22a60-b1a3-4b33-a5ba-4bf575e0a889 | a547f885-601d-4d52-9bf9-60f0cef06269 | f7fa53fa-ee5f-4a97-bcd8-1b0851afe9a6 |
| unitName | kg/m3 | kg/m3 | kg/m3 | kg/m3 | kg/m3 |
| comment | Biological Oxygen Demand BOD5, as O2 | Chemical Oxygen Demand as O2 | Mass concentration of Dissolved Organic Carbon | Mass concentration of Total Organic Carbon | Mass concentration of dissolved ammonia NH4 (C... |
| Amount | 0.33673 | 0.242895 | 0.151075 | 0.416505 | 0.425506 |
| amount_variance | 0.00439758 | 0.00520738 | 0.000782943 | 0.00976428 | 0.00345681 |
| amount_pedigree1 | 2 | 4 | 2 | 3 | 4 |
| amount_pedigree2 | 4 | 4 | 5 | 2 | 1 |
| amount_pedigree3 | 3 | 1 | 5 | 2 | 2 |
| amount_pedigree4 | 5 | 5 | 3 | 4 | 3 |
| amount_pedigree5 | 4 | 4 | 4 | 5 | 5 |
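
Each row of this dataframe holds an amount and its uncertainty. A minimal sketch of converting a row into the uncertainty dictionary structure defined in section 1.3 follows; `row_to_uncertainty` is hypothetical, and the column names are taken from the output above. It accepts a pandas row (Series) or any mapping with the same keys.

```python
def row_to_uncertainty(row, comment="PLACEHOLDER ESTIMATED UNCERTAINTY"):
    """Convert one WW property row into the uncertainty dictionary used in this notebook."""
    return {
        'variance': float(row['amount_variance']),
        # amount_pedigree1 ... amount_pedigree5, in pedigree matrix order
        'pedigreeMatrix': [int(row['amount_pedigree{}'.format(i)]) for i in range(1, 6)],
        'comment': comment,
    }
```

Applied to the whole dataframe, this could be `{name: row_to_uncertainty(row) for name, row in WW_prop_df.iterrows()}`. Rows with empty pedigree cells (NaN) raise a `ValueError` in `int()`, so these would need default scores first.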


### 5.2) Fraction of wastewater discharged to sewer but not treated
This is the unconnected fraction (see emails on the subject).  
It is assumed here that the (ideally weighted) average of countries is OK to use in larger geographies.

In [11]:
# Example value
untreated_fraction = 0.3  
untreated_fraction_uncertainty = {'variance':0.01,
                                  'pedigreeMatrix':[2,4,3,2,4],
                                  'comment': "PLACEHOLDER ESTIMATED UNCERTAINTY",
                                 }

### 5.3) Fraction of particulates and dissolved substances flushed due to hydraulic overload  

I propose here dummy values to calculate the amount lost to hydraulic overload. They are based on the factors used in the existing ecoinvent WWT tool: a 2% loss for the dissolved fraction, and 1% for particulates. This will probably be better calculated in the tool.

In [12]:
overload_loss_fraction_particulate = 0.01        # Value from ecoinvent v2.2 tool
overload_loss_fraction_dissolved = 0.02   # Value from ecoinvent v2.2 tool
overload_loss_fraction_particulate_uncertainty = {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4], 'comment':"PLACEHOLDER ESTIMATED UNCERTAINTY"}      # Fake numbers for now.
overload_loss_fraction_dissolved_uncertainty = {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4], 'comment':"PLACEHOLDER ESTIMATED UNCERTAINTY"}        # Fake numbers for now.

## 6) Generating the ecoSpold file - treatment dataset

This section documents the data that the tool must generate. It does not discuss *how* this data is generated (underlying models, averaging, etc.) 

The data/documentation/naming requirements are taken from several sources:  
- The [Data Quality Guidelines](http://www.ecoinvent.org/files/dataqualityguideline_ecoinvent_3_20130506.pdf)  
- Notes found in the [ecoEditor](http://www.ecoinvent.org/data-provider/data-provider-toolkit/ecoeditor/ecoeditor.html) itself
- A [dataset documentation](http://www.ecoinvent.org/files/dataset_documentation_ecoinvent_3.pdf) document

### 6.1) Generate an empty dataset

Until it is rendered (see below), the dataset will be a dictionary.  
Use the function `create_empty_dataset`  
Some fields which will remain constant are pre-filled.  
One modification remains to be done in the tool: the default author needs to be changed (I put myself as a placeholder). I recommend putting the tool developer here.

In [13]:
treatment_dataset = create_empty_dataset()

### 6.2) Add ActivityIndex
Collection of many things that need to be generated from user input.  
There are a series of functions to run (see below), and then the final function to put it all together is `generate_activityIndex(dataset)`

#### 6.2.1) Activity name

From ecoEditor:
>Activity Name  
>The name describes the activity that is represented by this dataset. The activity name can only be edited when a new dataset is created. If you want to use this dataset under a new activity name, you need to create a new dataset with the desired name, using the current dataset as a template (menu File..., New..., FromExistingDataset).  
>Length: 120  
>Required: Yes  
>EcoSpold02 FieldId: 100  

From DQG:
>Activity names are spelled with lower case starting letter, i.e. “lime production”, not “Lime production”. 

In the case of the WWT datasets, the name will depend on the situation: 

- CASE 1: treatment of the wastewater from a specific source in an "average" WWTP  
- CASE 2: treatment of the wastewater from a specific source in a "specific" WWTP technology/capacity  
- CASE 3: treatment of average wastewater in an "average" WWTP  
- CASE 4: treatment of average wastewater in a "specific" WWTP technology/capacity  

I have written a function `create_WWT_activity_name` that generates a valid name based on three arguments that the tool will need to get from the user:  
- `WW_type` = two choices only: average, or "from x" (e.g. "from steel production", "from residence")
- `technology`: TBD  
- `capacity` = two choices only: 'average', or a number (int or float) representing the yearly capacity in l/year. A check on type should be done.

This function is later used by a second function, `generate_WWT_activity_name`, see below.

In [14]:
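import numbers

def create_WWT_activity_name(WW_type, technology, capacity):
    """Return an ecoinvent-style activity name for a WWT dataset."""
    # Wastewater origin: 'average' or a "from x" string
    if WW_type == 'average':
        WW_type_str = ", average"
    else:
        WW_type_str = " {}".format(WW_type)
    
    # Technology: left out of the name when it is an average
    if technology == 'average':
        technology_str = ""
    else:
        technology_str = "{}, ".format(technology)
    
    # Capacity: 'average' or a number in l/year, written e.g. as 1E9 or 1.1E9
    if capacity == 'average':
        capacity_str = "average capacity"
    elif isinstance(capacity, numbers.Real):
        capacity_str = "capacity {:.1E}l/year".format(capacity).replace('+', '').replace('E0', 'E').replace('.0', '')
    else:
        raise TypeError("capacity must be 'average' or a number of l/year, got {!r}".format(capacity))
    
    return "treatment of wastewater{}, {}{}".format(WW_type_str, technology_str, capacity_str)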

Examples:

In [15]:
print(create_WWT_activity_name("average", "technology A", 1e9))
print(create_WWT_activity_name("from steel production", "technology A", 1e9))
print(create_WWT_activity_name("average", "average", 1e9))
print(create_WWT_activity_name("from steel production", "average", 1.1e9))
print(create_WWT_activity_name("average", "average", "average"))
print(create_WWT_activity_name("from steel production", "average", "average"))

treatment of wastewater, average, technology A, capacity 1E9l/year
treatment of wastewater from steel production, technology A, capacity 1E9l/year
treatment of wastewater, average, capacity 1E9l/year
treatment of wastewater from steel production, capacity 1.1E9l/year
treatment of wastewater, average, average capacity
treatment of wastewater from steel production, average capacity


Tests

In [16]:
existing_names = list(MD['ActivityNames'].index)

In [17]:
[create_WWT_activity_name("from black chrome coating", "average", 1.1e10) in existing_names,
create_WWT_activity_name("from lorry production", "average", 4.7e10) in existing_names,
create_WWT_activity_name("average", "average", 1.6e8) in existing_names]

[True, True, True]

`generate_WWT_activity_name` has four arguments:  
- `dataset`: the dataset (dictionary) that the name should be added to  
- Same three arguments as the `create_WWT_activity_name`  

It adds the name to the dataset.  
It also adds the WW_type to the dataset (because it is needed later in the creation of the reference flow name).

In [18]:
treatment_dataset = generate_WWT_activity_name(treatment_dataset, 'from ceramic production', 'average', 5e9)

In [19]:
treatment_dataset['WW_type']

'from ceramic production'

#### 6.2.2) ActivityNameID
Use `ActivityNameID` if the activity name already exists, else generate a new `ActivityNameID`.

To check if the activity name already exists in the master data, we:  
  1. Get the `ActivityIndex` dataframe from `MD`  
  2. Make the name the index of the dataframe  
  3. Check if our name is in the dataframe.  

If the name does not exist, we:  
  1. Generate a UUID  
  2. Add a "generic object" to the dataset.
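
A minimal sketch of this logic, assuming the existing names and IDs have been collected in a plain dictionary (for example from the master data dataframe); `get_or_create_activity_name_id` is hypothetical and does not build the "generic object" itself.

```python
import uuid


def get_or_create_activity_name_id(name, existing_ids):
    """Return (activity_name_id, is_new) for an activity name.

    existing_ids: dict mapping existing activity names to their UUIDs
    (hypothetical structure, e.g. built from the master data).
    """
    if name in existing_ids:
        return existing_ids[name], False
    # New name: random UUID; the name must then also be added to the dataset as new master data
    return str(uuid.uuid4()), True
```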
   
I created the function `generate_activityNameId` that takes as argument the `dataset` and the `MD` dictionary:

In [20]:
generate_activityNameId(treatment_dataset, MD)

#### 6.2.3) Geography  
The geography chosen should correspond to a geography already in the master data. It would probably therefore be a good idea to have a drop-down list to choose the geography, and a message saying that the data supplier should communicate with ecoinvent if the geography they want is not in the master data (email address: data@ecoinvent.org).  
I have written a function `generate_geography` that takes as argument the dataset and the geography **shortname**.  
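
The drop-down could be fed with `sorted(MD['Geographies'].index)`, since the dataframe is indexed by shortname (see the output below). A minimal validation sketch with a hypothetical `check_geography` helper:

```python
def check_geography(shortname, valid_shortnames):
    """Raise an error pointing to ecoinvent if the geography is not in the master data."""
    if shortname not in valid_shortnames:
        raise ValueError(
            "Geography '{}' is not in the ecoinvent master data; "
            "please contact data@ecoinvent.org".format(shortname))
    return shortname


# In the notebook: check_geography('GLO', set(MD['Geographies'].index))
```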

In [21]:
MD['Geographies'].head()

| shortname | id | latitude | longitude | name | uNCode | uNRegionCode | uNSubregionCode |
|---|---|---|---|---|---|---|---|
| AD | 10033c48-7d7e-11de-9ae2-0019e336be3a | 42.549 | 1.576 | Andorra | 0 | 0 | 0 |
| AE | 14f9e5d0-7d7e-11de-9ae2-0019e336be3a | 23.549 | 54.163 | United Arab Emirates | 0 | 0 | 0 |
| AF | 0c608726-7d7e-11de-9ae2-0019e336be3a | 33.677 | 65.216 | Afghanistan | 0 | 0 | 0 |
| AG | 09581fbc-7d7e-11de-9ae2-0019e336be3a | 17.078 | -61.783 | Antigua and Barbuda | 0 | 0 | 0 |
| AI | 0fefde28-7d7e-11de-9ae2-0019e336be3a | 18.237 | -63.032 | Anguilla | 0 | 0 | 0 |


In [22]:
generate_geography(treatment_dataset, MD, 'GLO')

#### 6.2.4) TimePeriod (Start and end date)
Period for which the dataset is meant to be valid, supplied by the user. It would be useful to supply default values.    

Format: 'YYYY-MM-DD'. The function does not validate the format yet.

I have written a function `generate_time_period` that takes as argument the dataset, a start date and an end date. 
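
Since no validation is done yet, here is a minimal sketch of what it could look like (`check_time_period` is hypothetical). It checks the 'YYYY-MM-DD' pattern, that the dates exist, and that the start is not after the end.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2}')


def check_time_period(start, end):
    """Validate 'YYYY-MM-DD' start and end dates and return them unchanged."""
    dates = []
    for value in (start, end):
        if not DATE_PATTERN.fullmatch(value):
            raise ValueError("{!r} is not in 'YYYY-MM-DD' format".format(value))
        # strptime also rejects impossible dates such as '2019-02-30'
        dates.append(datetime.strptime(value, '%Y-%m-%d').date())
    if dates[0] > dates[1]:
        raise ValueError("start date {} is after end date {}".format(start, end))
    return start, end


check_time_period('1995-01-31', '2020-12-31')
```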


In [23]:
generate_time_period(treatment_dataset, start='1995-01-31', end='2020-12-31')

#### 6.2.5) Dataset ID
The dataset UUID is generated from the dataset['activityName'], dataset['geography'], dataset['startDate'], dataset['endDate']
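
How `generate_dataset_id` turns these four fields into a UUID is not shown here. A common way to get the same ID for the same inputs is a name-based UUID (version 5, from Python's `uuid` module); the sketch below assumes that approach and an arbitrary namespace, so its IDs will not match those produced by the GB code. `make_dataset_id` is hypothetical.

```python
import uuid


def make_dataset_id(activity_name, geography, start_date, end_date):
    """Return a reproducible UUID string from the four identifying fields (illustrative only)."""
    key = "|".join([activity_name, geography, start_date, end_date])
    # uuid5 hashes the namespace and key with SHA-1: same inputs always give the same UUID
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


print(make_dataset_id('treatment of wastewater from ceramic production, capacity 5E9l/year',
                      'GLO', '1995-01-31', '2020-12-31'))
```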

In [24]:
generate_dataset_id(treatment_dataset)

#### Putting it all together

In [25]:
generate_activityIndex(treatment_dataset)

### 6.3) Activity description

#### 6.3.1) includedActivitiesStart, includedActivitiesEnd

Two text fields that describe the boundaries of the unit process.

##### Included Activities Start
Suggested text for industrial wastewater:

In [26]:
includedActivitiesStartText = "From the discharge of wastewater {} to the sewer grid.".format(treatment_dataset['WW_type'])
includedActivitiesStartText

'From the discharge of wastewater from ceramic production to the sewer grid.'

Suggested text for municipal wastewater.

In [27]:
includedActivitiesStartText = "From the discharge of municipal wastewater to the sewer grid."

##### Included Activities End  
Based on the [dataset documentation](http://www.ecoinvent.org/files/dataset_documentation_ecoinvent_3.pdf) document, this section has three parts:  
>(i) what is the last activity covered resp. what is the point of delivery of this dataset?  
>(ii) what activities are included (and not obvious from the name of the activity)  
>(iii) what activities are intentionally excluded from this activity (Among other things, if the activity is a service like e.g. spinning of bast fibres, that does not include the product used in the process (i.e. the bast fibres), this information will be included here).  

Each part has a specific mandatory wording. I suggest providing default text for the three sections separately and allowing the user to modify as required.  

**Note** The suggested text below includes many specifications about how the tool should work based on discussions with ecoinvent and from reviewing the existing tool. Please read this carefully. 

In [28]:
includedActivitiesEndText_last = (
    "This activity ends with the discharge of treated wastewater to the natural environment."
)

includedActivitiesEndText_included = (
    "This activity includes the transportation of wastewater via the sewer grid, "
    "and the treatment of the wastewater in the wastewater treatment plant. "  # trailing space separates sentences
    "The amounts of infrastructure and consumables are also included as inputs to the activity."
)

includedActivitiesEndText_excluded = (
    "By definition, wastewater not sent to the sewer grid is excluded. "
    "The fraction of wastewater discharged to the sewer grid but ultimately not treated because the sewer is "
    "unconnected and direct emissions due to hydraulic overload are also excluded. "
    "These are included in another dataset specifically covering the discharge of untreated wastewater. "
    "The production of sludge is included, but its treatment is covered by another treatment activity."
)

In [29]:
# The three parts are joined with single spaces
"{} {} {}".format(
    includedActivitiesEndText_last,
    includedActivitiesEndText_included,
    includedActivitiesEndText_excluded
)

'This activity ends with the discharge of treated wastewater to the natural environment. This activity includes the transportation of wastewater via the sewer grid, and the treatment of the wastewater in the wastewater treatment plant. The amounts of infrastructure and consumables are also included as inputs to the activity. By definition, wastewater not sent to the sewer grid is excluded. The fraction of wastewater discharged to the sewer grid but ultimately not treated because the sewer is unconnected and direct emissions due to hydraulic overload are also excluded. These are included in another dataset specifically covering the discharge of untreated wastewater. The production of sludge is included, but its treatment is covered by another treatment activity.'

##### Adding the includedActivitiesStart, includedActivitiesEnd

In [30]:
generate_activity_boundary_text(treatment_dataset,
                               includedActivitiesStartText,
                               includedActivitiesEndText_last,
                               includedActivitiesEndText_included,
                               includedActivitiesEndText_excluded)

#### 6.3.2) Technology level

From the Data Quality Guidelines:

>The technology level of a transforming activity is classified in one of these five classes:  
0=Undefined. For market activities that do not have a technology level.  
1=New. For a technology assumed to be on some aspects technically superior to modern technology, but not yet the most commonly installed when investment is based on purely economic considerations.  
2=Modern. For a technology currently used when installing new capacity, when investment is based on purely economic considerations (most competitive technology).  
3=Current (default). For a technology in between modern and old.  
4=Old. For a technology that is currently taken out of use, when decommissioning is based on purely economic considerations (least competitive technology).  
5=Outdated. For a technology no longer in use.  
The terms used does not necessarily reflect the age of the technologies.   
A modern technology can be a century old, if it is still the most competitive technology, and an old technology can be relatively young, if it is one that has quickly become superseded by other more competitive ones. The technology level is relative to the year for which the data are valid, as given under Time Period. In a time series, the same technology can move between different technology levels over time. The same technology can also be given different technology levels in different geographical locations, even in the same year.  

The user should be able to choose the most relevant technology level from a drop-down menu, have access to descriptions and have the tool default to an appropriate value (current).
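
A minimal sketch of the drop-down content, with the codes taken from the Data Quality Guidelines excerpt above and 'Current' as the default. Whether `generate_technology_level` expects the label or the code internally is not shown here; `technology_level_code` is hypothetical.

```python
# Labels and codes from the Data Quality Guidelines excerpt above
TECHNOLOGY_LEVELS = {
    'Undefined': 0,
    'New': 1,
    'Modern': 2,
    'Current': 3,
    'Old': 4,
    'Outdated': 5,
}


def technology_level_code(label='Current'):
    """Return the technology level code for a drop-down label (default: 'Current')."""
    if label not in TECHNOLOGY_LEVELS:
        raise ValueError("unknown technology level {!r}; choose one of {}".format(
            label, list(TECHNOLOGY_LEVELS)))
    return TECHNOLOGY_LEVELS[label]
```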

In [31]:
generate_technology_level(treatment_dataset, 'Current')

#### 6.3.3) Activity-level comment fields

This is a list of text cells. (Images are also legal, but I'm not sure how we could handle that, nor if they are useful)  

We should supply default text and allow the users to change the text if necessary.  

We should also determine how many cells we want to provide (different cells are listed one after another in ecoEditor, and this separation is used to split different subjects).

I have written a generic function to add comment fields, `generate_comment` that takes the following arguments:  
- the `dataset`  
- the `comment_type`. Valid types are 'allocationComment','generalComment','geographyComment','technologyComment' and 'timePeriodComment'.  
- the list of text comments (one per cell). 
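
To illustrate what these cells could become in the file, here is a minimal sketch with Python's standard `xml.etree.ElementTree`. It assumes the ecoSpold2 layout of one `<text>` element per cell, each with `xml:lang` and `index` attributes; the ecoSpold2 namespace is omitted, and dropping empty cells (such as the `['']` passed in 6.3.3.3) is a choice of this sketch, not necessarily of `generate_comment`. `comment_element` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark notation for the xml:lang attribute; ElementTree serializes it as xml:lang
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'


def comment_element(comment_type, texts, lang='en'):
    """Build a comment element with one indexed <text> child per non-empty cell."""
    element = ET.Element(comment_type)
    index = 0
    for text in texts:
        if not text:
            continue  # skip empty cells such as ['']
        cell = ET.SubElement(element, 'text', {XML_LANG: lang, 'index': str(index)})
        cell.text = text
        index += 1
    return element


print(ET.tostring(comment_element('technologyComment', ['Cell one', 'Cell two']), encoding='unicode'))
```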

##### 6.3.3.1) Technology description
>Text (_and image, but I'm not sure how we'd handle that_) field to describe the technology of the activity. The text should cover information necessary to identify the properties and particularities of the technology(ies) underlying the activity data. Describes the technological properties of the unit process. If the activity comprises several subactivities, the corresponding technologies should be reported as well. Professional nomenclature should be used for the description.  

**We should discuss how default text can be generated here based on the modelled technologies/regions. Some of this is still up in the air as of today (August 18), see emails**  

In [32]:
tech_comment_1 = 'The technologies modelled are x and y'
tech_comment_2 = 'They were averaged based on z'
tech_comment_3 = 'These technologies rock'

generate_comment(treatment_dataset, 'technologyComment', [tech_comment_1, tech_comment_2, tech_comment_3])

##### 6.3.3.2) generalComment
From the [dataset documentation](http://www.ecoinvent.org/files/dataset_documentation_ecoinvent_3.pdf) document:  
>Information that concerns the construction of the inventory (details about the Functional Unit, background, etc.) shall be entered in the General Comment field. Actually, this field can be compared to the abstract of a scientific article – i.e. the field should offer to the user a first, rough overview of the dataset.  
>Please start the text always with "This dataset represents [the production]/ [the service of] ...."  

We should provide default text and allow the user to modify. Again, multiple cells are possible. 

In [33]:
general_comment_1 = 'This dataset represents the treatment of wastewater discharged to the sewer grid {}'.format(
                            treatment_dataset['WW_type'])
general_comment_2 = 'It includes the transportation of the wastewater to the wastewater treatment plant and the actual treatment.'
general_comment_3 = 'It was modelled using XYZ'

generate_comment(treatment_dataset, 'generalComment', [general_comment_1, general_comment_2, general_comment_3])

##### 6.3.3.3) 'geographyComment', 'timePeriodComment'
The user _might_ want to include a comment.  
>**'timePeriodComment'** Text and image field for additional explanations concerning the temporal validity of the flow data reported. It may e.g. include information about:
>- how strong the temporal correlation is for the unit process at issue (e.g., are four year old data still adequate for the activity operated today?)  

> **'geographyComment'**  Especially for area descriptions, the nature of the geographical delimitation may be given, especially when this is not an administrative area.  

Let's suppose no such comment now.

In [34]:
generate_comment(treatment_dataset, 'timePeriodComment', [''])
generate_comment(treatment_dataset, 'geographyComment', [''])

##### 6.3.3.4) 'allocationComment'
I would leave this out - I don't see the user needing this.

### 6.4) 'modellingAndValidation', Representativeness
There are multiple fields that are filled in by default (by the function). The tool/user should provide, insofar as possible, information on three specific items:  

  - 'samplingProcedure'  
>Text describing the sampling and calculation procedures applied for quantifying the exchanges. Reports whether the sampling procedure for particular elementary and intermediate exchanges differ from the general procedure. Mentions possible problems in combining different sampling procedures.  

I will let ICRA generate default text for this field.

 - 'extrapolations'

> Describes extrapolations of data from another time period, another geographical area or another technology and the way these extrapolations have been carried out. It should be reported whether different extrapolations have been done on the level of individual exchanges. If data representative for an activity operated in one country is used for another country's activity, its original representativity can be indicated here. Changes in mean values due to extrapolations may also be reported here.

We should discuss the text to include here.

 - 'percent'
> Percent of data sampled out of the total that the activity is intended to represent (as given by the fields geography, technology and time period).

Perhaps blank by default (allowed), with the option to add information if available. 

Putting it all together in a function `generate_representativeness(dataset, samplingProcedure_text, extrapolations_text, percent)`

In [35]:
samplingProcedure_text = 'This is a description of the sampling procedure, and it should be changed by ICRA'
extrapolation_text = 'This is a description of the extrapolations, and it should be changed by ICRA'
percent = 80

generate_representativeness(treatment_dataset, samplingProcedure_text, extrapolation_text, percent)
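Since `percent` may be left blank, one way to handle it is to include the field only when a value is supplied. This is a hypothetical sketch (the name `representativeness_fields` and the dict layout are illustrative) and does not reproduce what `generate_representativeness` actually does.

```python
def representativeness_fields(sampling_procedure, extrapolations, percent=None):
    """Hypothetical helper: collect the representativeness fields, leaving 'percent' out if blank."""
    fields = {"samplingProcedure": sampling_procedure,
              "extrapolations": extrapolations}
    if percent is not None:
        # 'percent' is a share of the total the activity is meant to represent
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        fields["percent"] = percent
    return fields
```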

### 6.5) Data entry section:  
For now, this is filled in with dummy data ([Current User]); it gets populated by the ecoEditor. I moved the whole section to `create_empty_dataset`.

### 6.6) DataGeneratorAndPublication
This section should essentially reference the tool itself. We will need to include the following information:
  - Author (called dataGenerator)
  - PublishedSource (reference, if a report or paper comes out of this work). If we don't publish, this is empty and `dataPublishedIn` is set to 0.  
  - pageNumber  
  - ... (see ecoEditor)

For now, I'll assume no publication and put myself as author (this will need to change).

The users will have the possibility to change all this in the ecoEditor, which is probably easier than doing it in the tool.

### 6.7) Reference exchange (reference flow)
This is the amount of wastewater treated in the WWTP. 
Here are some key things to know about this exchange:  

#### 6.7.1) Exchange itself
  - It needs to be expressed in m3  
  - Its amount is -1: 1 because it is the common denominator for the whole dataset, and negative because, by ecoinvent convention, a negative reference amount identifies the exchange being treated.  
  - The name is auto-generated by the `generate_reference_exchange` function.  
  - The uncertainty of the reference exchange is zero (it is the only exchange without uncertainty). 
  - The reference exchange shall also be accompanied by a comment. I propose a default comment below.  

In [36]:
ref_exchange_comment = "Refers to the amount of wastewater treated in the wastewater treatment plant."

#### 6.7.2) Production volume

  - A production volume needs to be defined.  
  - It is equal to the total amount of the WW of interest sent to the sewer in the regional scope of the dataset minus the amount discharged to the environment due to the unconnected fraction.  
  - It is expressed in m3/year.  
  - For "average" WW, we can possibly find default numbers.  
  - For industrial WW, we should provide guidance on how to generate this value: $PV_{treated} = PV_{product} \times V_{WW,unit} \times (1 - f_{untreated})$, i.e. the total production volume of the producing dataset, times the amount of WW generated per unit produced, times (1 - untreated fraction).  
  - The production volume needs to be accompanied by a comment. We can determine an appropriate default comment once we have settled how the default production volume will be calculated.  
  - The uncertainty of the production volume *should* be calculated as a function of the uncertainty of two parameters:  
    - (Industrial WW) The uncertainty of the production volume of WW (found in the producing activity dataset; it should be entered manually).  
    - The uncertainty of the untreated_fraction.  
  - For now, I'll just add a dummy function that calculates this: `dummy_calculate_uncertainty_treatment_PV`    
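The production volume formula for industrial WW given above can be sketched as a small function. The name `industrial_ww_production_volume` and the example numbers are hypothetical.

```python
def industrial_ww_production_volume(product_pv, ww_per_unit, untreated_fraction):
    """Yearly volume of industrial wastewater treated (m3/year).

    product_pv: yearly production volume of the producing activity (units/year)
    ww_per_unit: wastewater generated per unit of product (m3/unit)
    untreated_fraction: share of the wastewater discharged untreated (0-1)
    """
    if not 0 <= untreated_fraction <= 1:
        raise ValueError("untreated_fraction must be between 0 and 1")
    return product_pv * ww_per_unit * (1 - untreated_fraction)


# Hypothetical numbers: 2e6 units/year, 0.5 m3 of wastewater per unit, 10% untreated
pv = industrial_ww_production_volume(2e6, 0.5, 0.1)  # about 900,000 m3/year
```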

In [37]:
total_PV = 1000000 # Total wastewater generated. Not all of this volume gets treated
total_PV_uncertainty = {'variance': 0.01,
                        'pedigreeMatrix': [2, 4, 3, 2, 4],
                        'comment': "PLACEHOLDER ESTIMATED UNCERTAINTY"}
PV_comment = "Yearly volume of wastewater treated."
if untreated_fraction != 0:
    PV_comment += " Excludes the fraction that is discharged directly to the environment ({:.0f}%).".format(
        untreated_fraction*100)
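`dummy_calculate_uncertainty_treatment_PV` is a placeholder. One possible replacement is a Monte Carlo estimate: sample the total volume and the untreated fraction, compute the treated volume, and fit the lognormal parameters ($mu$, variance of the underlying normal) to the result. The sketch below assumes both inputs are lognormal with the ecoinvent parameterisation described in section 1.3, and clips sampled fractions at 1. The function name and example inputs are hypothetical.

```python
import math
import random
import statistics


def mc_treated_pv_uncertainty(total_pv, total_pv_variance,
                              untreated_fraction, untreated_fraction_variance,
                              n=10000, seed=42):
    """Monte Carlo estimate of (mu, variance) of ln(treated production volume).

    Assumes total_pv and untreated_fraction are both lognormal, parameterised as in
    ecoinvent: mu = ln(amount), variance = variance of the underlying normal.
    """
    if untreated_fraction == 0:
        # Nothing is diverted: the treated volume carries the total volume's uncertainty
        return math.log(total_pv), total_pv_variance
    rng = random.Random(seed)  # seeded for reproducible results
    log_samples = []
    for _ in range(n):
        pv = rng.lognormvariate(math.log(total_pv), math.sqrt(total_pv_variance))
        f = rng.lognormvariate(math.log(untreated_fraction),
                               math.sqrt(untreated_fraction_variance))
        treated = pv * (1 - min(f, 1.0))  # a fraction cannot exceed 1
        if treated > 0:  # drop draws where everything was diverted (ln undefined)
            log_samples.append(math.log(treated))
    return statistics.mean(log_samples), statistics.variance(log_samples)


# Hypothetical inputs: 1e6 m3/year (variance 0.01) and 10% untreated (variance 0.04)
mu, variance = mc_treated_pv_uncertainty(1e6, 0.01, 0.1, 0.04)
```

Comparing the returned variance with the 0.01 of the total volume alone shows how much the untreated fraction adds to the uncertainty.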

#### 6.7.3) Properties  
##### 6.7.3.1) Dissolved substances, particulates
  - The reference exchange also needs to be associated with its properties.  
  - Note that the properties of the water sent to the WWTP are **NOT** the same as those of the WW discharged to the sewer due to some losses associated with hydraulic overload.  
  - The uncertainty of the property amounts *should ideally* be based on the following parameters:  
    - The uncertainty of the initial WW property (as discharged), accounted for in the "amount_variance" and the "amount_pedigreeX" columns of the WW table.  
    - The uncertainty of the split between particulates and solubles, accounted for in the "dissolved_fraction_variance" and the "dissolved_fraction_pedigreeX" columns of the WW table.  
    - Uncertainty of overload_loss_fraction_particulate or overload_loss_fraction_dissolved  
  - **However**, the contribution of the lost amount will be so small that we can safely ignore its contribution to uncertainty. This should be documented.

To show what happens behind the scenes:

In [38]:
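# Calculating the property amounts lost due to hydraulic overload
def calc_overflow_losses_dict(WW_prop_df, overload_loss_fraction_particulate, overload_loss_fraction_dissolved):
    """Return {property name: amount lost to hydraulic overload}, in the unit of the property amount.

    The loss is the particulate share times the particulate loss fraction,
    plus the dissolved share times the dissolved loss fraction.
    """
    return {
        prop: WW_prop_df.loc[prop, 'Amount'] * (
            WW_prop_df.loc[prop, 'Particulate_fraction'] * overload_loss_fraction_particulate
            + WW_prop_df.loc[prop, 'Dissolved_fraction'] * overload_loss_fraction_dissolved
        )
        for prop in WW_prop_df.index
    }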

In [39]:
overflow_losses_dict = calc_overflow_losses_dict(WW_prop_df,
                                                 overload_loss_fraction_particulate,
                                                 overload_loss_fraction_dissolved
                                                )

In [40]:
# Example item from dict:
overflow_losses_dict['BOD5, mass per volume']

0.0044177556482391626

The function `generate_properties_list` generates a list of property tuples to append to the reference exchange.  
The tuples are defined as (property_name, amount, unit, comment, uncertainty).  
The amounts are corrected for losses due to hydraulic overflow.

In [41]:
properties_list = generate_properties_list(WW_prop_df, overflow_losses_dict)

In [42]:
# Sample property:
properties_list[0]

('BOD5, mass per volume',
 0.33231269251205015,
 'kg/m3',
 'Biological Oxygen Demand BOD5, as O2. Accounts for mass lost in sewer due to hydraulic overloads.',
 {'comment': 'Accounts for the uncertainty of the property amount only, assuming the contribution to uncertainty of losses due to hydraulic overflow are negligible',
  'pedigreeMatrix': [2, 4, 3, 5, 4],
  'variance': 0.0043975778353143513})
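The correction for overflow losses can be sketched with plain dictionaries. This assumes the correction is a simple subtraction of the lost amount and that the dissolved share is the remainder of the particulate share; `generate_properties_list` may do this differently, and the helper name and numbers are hypothetical.

```python
def corrected_property_amounts(amounts, particulate_fractions,
                               loss_fraction_particulate, loss_fraction_dissolved):
    """Hypothetical sketch: property amounts reaching the WWTP after hydraulic-overload losses.

    amounts: {property name: amount in the wastewater as discharged}
    particulate_fractions: {property name: particulate share, 0-1}; the dissolved
        share is assumed to be 1 - particulate share.
    """
    corrected = {}
    for prop, amount in amounts.items():
        particulate = particulate_fractions[prop]
        lost = amount * (particulate * loss_fraction_particulate
                         + (1 - particulate) * loss_fraction_dissolved)
        corrected[prop] = amount - lost  # assumes a simple subtraction of the lost mass
    return corrected


example = corrected_property_amounts({'BOD5, mass per volume': 0.3},
                                     {'BOD5, mass per volume': 0.5},
                                     loss_fraction_particulate=0.02,
                                     loss_fraction_dissolved=0.01)
```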

##### 6.7.3.2) Other obligatory properties
There is also a set of obligatory properties that need to be added:  
  - wet mass (should probably be 1)  
  - water in wet mass (wet mass - dry mass)  
  - water content (water mass/dry mass)  
  - dry mass  
  - carbon content, fossil  
  - carbon content, non-fossil  
These need to be appended to the other properties: 

In [43]:
WW_obligatory_properties = [
    ('carbon content, fossil',
     0.01,
     'dimensionless',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':""}
    ),
    ('carbon content, non-fossil',
     0.01,
     'dimensionless',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"ICRA comment"}
    ),
    ('dry mass',
     0.01,
     'kg',
     'ICRA comment',
     {'variance':0.6, 'pedigreeMatrix':[2,4,3,2,4],'comment':"ICRA comment"}
    ),
    ('water content',
     0.01,
     'dimensionless',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"ICRA comment"}
    ),
    ('water in wet mass',
     999.99,
     'kg',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"ICRA comment"}
    ),
    ('wet mass',
     1000,
     'kg',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"ICRA comment"}
    )    
]
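Before appending, the mass-related values can be checked against the definitions listed above (water in wet mass = wet mass - dry mass; water content = water mass / dry mass). A minimal sketch, with a hypothetical helper name:

```python
def check_mass_properties(properties, rel_tol=1e-6):
    """Check the mass relations; return a list of problems (empty if consistent).

    properties: tuples of (property_name, amount, unit, comment, uncertainty).
    """
    amounts = {prop[0]: prop[1] for prop in properties}
    wet = amounts["wet mass"]
    dry = amounts["dry mass"]
    water = amounts["water in wet mass"]
    problems = []
    # water in wet mass = wet mass - dry mass
    if abs(water - (wet - dry)) > rel_tol * wet:
        problems.append("water in wet mass != wet mass - dry mass")
    # water content = water mass / dry mass
    if dry > 0:
        expected_content = water / dry
        if abs(amounts["water content"] - expected_content) > rel_tol * expected_content:
            problems.append("water content != water in wet mass / dry mass")
    return problems
```

Run on `WW_obligatory_properties` as written, the first relation holds (1000 - 0.01 = 999.99), but the second is flagged: 999.99 / 0.01 is about 99,999, not the placeholder 0.01.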

Add them to the other properties:

In [44]:
properties_list += WW_obligatory_properties

**The reference exchange is ultimately generated simply using the `generate_reference_exchange` function.**  
Only the function's arguments need to be generated by the tool or taken from the user.

In [45]:
treatment_dataset, MD = generate_reference_exchange(treatment_dataset,
                                                    ref_exchange_comment,
                                                    total_PV,
                                                    total_PV_uncertainty,
                                                    untreated_fraction,
                                                    untreated_fraction_uncertainty,
                                                    PV_comment,
                                                    WW_prop_df,
                                                    WW_obligatory_properties,
                                                    overload_loss_fraction_particulate,
                                                    overload_loss_fraction_dissolved,
                                                    MD)

### 6.8) Byproducts/wastes  
There are two types of wastes/byproducts to be considered:  
1) Sludge  
2) Grit  

#### 6.8.1) Sludge
Sludge treatment is outside the scope of this project. 
However, the tool must generate a "sludge exchange" that indicates how much sludge is generated (per reference exchange) and what its composition is (via its "properties").  
The sludge should have the following types of properties, all expressed per kg of sludge:  
  - wet mass (should probably be 1)  
  - water in wet mass (wet mass - dry mass)  
  - water content (water mass/dry mass)  
  - dry mass  
  - carbon content, fossil  
  - carbon content, non-fossil  
  - Part of the content of substances originally in the WW  
  - New substances, from treatment (if relevant)  

For the properties stemming from the initial contents of the WW, the sludge properties will be dictated by the transfer coefficients the tool will calculate.   
There are three approaches here: 
- Enter the transfer coefficients in the dataset, and let the sludge composition be calculated within the ecoSpold file itself, by ecoinvent (better for transparency and uncertainty propagation)  
- Calculate the sludge properties and enter them as static values, but include the transfer coefficients in the comments (good for transparency).  
- Calculate the sludge properties and enter them as static values with generic comments (worst for transparency)  

I'll assume for now that option 2 is chosen. 
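Under option 2, each sludge property is a static value, with the transfer coefficient documented in the comment. A minimal sketch, assuming the property is expressed per kg of sludge as (substance in the WW reaching the WWTP × transfer coefficient to sludge) / (kg sludge per m3 treated). The helper name, the comment wording and the numbers are hypothetical.

```python
def sludge_property_from_tc(name, ww_amount, transfer_coefficient, sludge_per_m3, uncertainty):
    """Hypothetical helper: build a sludge property tuple with the TC documented in the comment.

    ww_amount: kg of the substance per m3 of wastewater entering the WWTP
    transfer_coefficient: fraction of that substance ending up in the sludge (0-1)
    sludge_per_m3: kg of sludge generated per m3 of wastewater treated
    """
    amount = ww_amount * transfer_coefficient / sludge_per_m3  # kg substance per kg sludge
    comment = ("Calculated from the wastewater content ({:g} kg/m3) and a transfer "
               "coefficient to sludge of {:g}, for {:g} kg sludge per m3 treated.").format(
                   ww_amount, transfer_coefficient, sludge_per_m3)
    return (name, amount, 'dimensionless', comment, uncertainty)


iron = sludge_property_from_tc('iron content', 0.002, 0.9, 0.5,
                               {'variance': 0.01, 'pedigreeMatrix': [2, 4, 3, 2, 4], 'comment': ""})
```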

The properties will need to have the same structure as those of the reference exchange:  
(property_name, amount, unit, comment, uncertainty)  

Unless absolutely necessary, use property names from the existing properties, listed in `MD['Properties'].index`:

In [46]:
MD['Properties'].index

Index(['BOD5, mass per volume', 'COD, mass per volume',
       'Corresponding fuel use',
       'Corresponding fuel use, transport, freight train', 'Crop factor',
       'EcoSpold01Allocation_other_1', 'EcoSpold01Allocation_other_10',
       'EcoSpold01Allocation_other_100', 'EcoSpold01Allocation_other_101',
       'EcoSpold01Allocation_other_102',
       ...
       'width, internal', 'working time', 'xenon content', 'yearly output',
       'yearly_distributed_amount', 'yield', 'ytterbium content',
       'yttrium content', 'zinc content', 'zirconium content'],
      dtype='object', name='name', length=913)

Let's assume we have the following sludge properties:

In [47]:
sludge_properties = [
    ('carbon content, fossil',
     0.01,
     'dimensionless',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}
    ),
    ('carbon content, non-fossil',
     0.01,
     'dimensionless',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}
    ),
    ('dry mass',
     0.01,
     'kg',
     'ICRA comment',
     {'variance':0.6, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}
    ),
    ('water content',
     0.01,
     'dimensionless',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}
    ),
    ('water in wet mass',
     0.01,
     'kg',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}
    ),
    ('wet mass',
     0.01,
     'kg',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}
    ),
    ('iron content',
     0.01,
     'dimensionless',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}
    ),
    ('copper content',
     0.01,
     'dimensionless',
     'ICRA comment',
     {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}
    ),
]
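To follow the recommendation to reuse existing property names, the names can be checked against the known list before the exchange is generated. A minimal sketch with a hypothetical helper name; in the notebook, `known_names` would presumably be `MD['Properties'].index`.

```python
def unknown_property_names(properties, known_names):
    """Return the property names not found among the existing ecoinvent properties."""
    known = set(known_names)
    return [prop[0] for prop in properties if prop[0] not in known]
```

For example, `unknown_property_names(sludge_properties, MD['Properties'].index)` should return an empty list if every name already exists.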


The name of the sludge exchange needs to refer to the name of the actual WW treatment dataset:

In [48]:
"sludge, from the {}".format(treatment_dataset['activityName'])

'sludge, from the treatment of wastewater from ceramic production, capacity 5E9l/year'

Let's assume the following amount and uncertainty: 

In [49]:
sludge_amount = 300
sludge_uncertainty = {'variance':0.01, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER ESTIMATED UNCERTAINTY"}
sludge_comment = "ICRA COMMENT"

Because it is a byproduct, it needs to be associated with a production volume.

In [50]:
sludge_PV_comment = "Calculated on the basis of the total amount of wastewater treated and the amount of sludge per m3 treated."
sludge_PV_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}
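Following the production volume comment above, the sludge production volume is the amount of sludge per m3 treated times the yearly volume actually treated. A minimal sketch, assuming `generate_sludge` derives the treated volume from `total_PV` and `untreated_fraction` the same way as for the reference flow. The helper name and the numbers are hypothetical.

```python
def byproduct_production_volume(amount_per_m3, total_pv, untreated_fraction):
    """Yearly production volume of a by-product (e.g. sludge) from the volume treated."""
    treated_volume = total_pv * (1 - untreated_fraction)  # m3/year actually treated
    return amount_per_m3 * treated_volume


# Hypothetical: 2 kg sludge per m3, 1e6 m3/year generated, 25% untreated
sludge_pv = byproduct_production_volume(2, 1e6, 0.25)  # 1.5e6 kg/year
```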

Use the `generate_sludge` function:

In [51]:
treatment_dataset = generate_sludge(treatment_dataset,
                                    sludge_amount,
                                    sludge_uncertainty,
                                    sludge_comment, 
                                    sludge_properties,
                                    total_PV, # Total WW generated (see reference flow)
                                    untreated_fraction,
                                    sludge_PV_uncertainty,
                                    sludge_PV_comment,
                                    MD)

#### 6.8.2) Grit
I will assume here, like in ecoinvent, that there are two types of grit: 
  - plastics  
  - biomass, modelled as paper  
  
The tool should provide default values for amounts of grit removed, as well as the uncertainty for these.  
The default values in ecoinvent v2.2 are 15.5 g/m3 of each, and the basic uncertainty is 0.0006.  
If we use these values, the pedigree scores should be: [1,3,5,5,1]  
It would be MUCH better to use other data for this.

In [52]:
grit_default_total_amount = 0.031 #kg/m3 in WWTP - ideally this value would be updated, and in any case the user can override it
grit_default_plastic_ratio = 0.5
grit_default_biomass_ratio = 0.5
total_grit_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[1,3,5,5,1], 'comment': "PLACEHOLDER ESTIMATED UNCERTAINTY"}
fraction_grit_biomass_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[1,3,5,5,1]}
fraction_grit_plastics_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[1,3,5,5,1]}

grit_plastics_comment_default = "Amount of plastics removed from wastewater. Based on an assumed {} kg/m3 of grit removed, "\
                                "and an assumed {:.0f}% of the grit that is plastics".format(grit_default_total_amount,
                                                                                 grit_default_plastic_ratio*100)
grit_biomass_comment_default = "Amount of biomass removed from wastewater. Based on an assumed {} kg/m3 of grit removed, "\
                               "and an assumed {:.0f}% of the grit that is biomass. "\
                               "Biomass waste management modelled as paper waste management".format(
                                   grit_default_total_amount,
                                   grit_default_biomass_ratio*100)
grit_plastics_PV_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}
grit_biomass_PV_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}

In [53]:
grit_plastics_comment_default

'Amount of plastics removed from wastewater. Based on an assumed 0.031 kg/m3 of grit removed, and an assumed 50% of the grit that is plastics'

In [54]:
grit_biomass_comment_default

'Amount of biomass removed from wastewater. Based on an assumed 0.031 kg/m3 of grit removed, and an assumed 50% of the grit that is biomass. Biomass waste management modelled as paper waste management'

Everything is included in the `add_grit` wrapper function.
Note that the total uncertainty is, for now, calculated using a **temporary dummy function**.

In [55]:
treatment_dataset =  add_grit(treatment_dataset,
                              grit_default_total_amount,
                              grit_default_plastic_ratio,
                              grit_default_biomass_ratio,
                              total_grit_uncertainty,
                              fraction_grit_plastics_uncertainty, 
                              fraction_grit_biomass_uncertainty,
                              grit_plastics_comment_default,
                              grit_biomass_comment_default,
                              total_PV, # Total WW generated (see reference flow)
                              untreated_fraction,
                              MD)
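The split of the total grit into plastics and biomass, and the corresponding production volumes, can be sketched as follows. This assumes `add_grit` multiplies the per-m3 amounts by the volume actually treated, as for sludge; the helper name is hypothetical. With the defaults above, each fraction is 0.031 × 0.5 = 0.0155 kg/m3, i.e. the 15.5 g/m3 from ecoinvent v2.2.

```python
def split_grit(total_grit, plastic_ratio, biomass_ratio, total_pv, untreated_fraction):
    """Hypothetical sketch: per-m3 amounts and yearly volumes of plastic and biomass grit."""
    if abs(plastic_ratio + biomass_ratio - 1) > 1e-9:
        raise ValueError("plastic and biomass ratios must add up to 1")
    treated_volume = total_pv * (1 - untreated_fraction)  # m3/year actually treated
    plastics = total_grit * plastic_ratio  # kg per m3 treated
    biomass = total_grit * biomass_ratio   # kg per m3 treated
    return {'plastics': (plastics, plastics * treated_volume),
            'biomass': (biomass, biomass * treated_volume)}


grit = split_grit(0.031, 0.5, 0.5, 1e6, 0.0)
```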

### 6.9) Inputs from the technosphere  
Inputs from the technosphere correspond to consumables, energy and infrastructure inputs. Here is the list in the current WWT datasets:  
  - Consumables:  
    - iron (III) chloride, without water, in 40% solution state  
  - Energy:  
    - heat, district or industrial, natural gas  
    - heat, central or small-scale, other than natural gas  
    - electricity, low voltage  
  - Infrastructure (not yet resolved, see email):  
    - wastewater treatment facility, capacity XXl/year  
    - sewer grid, XXl/year, YY km    

#### 6.9.1) Consumables:

From Yves' list: 
  - aluminium sulfate, powder  
  - aluminium sulfate, without water, in 4.33% aluminium solution state  
  - lime  
  - iron(III) chloride, without water, in 14% iron solution state  
  - polyelectrolytes (TBD, waiting on experts to supply a list)

Are there others you would like to include?  

Would you like to allow the user to add inputs themselves? If so, they should be based on the known list of products, see [this excel file](http://www.ecoinvent.org/files/activity_overview_for_users_3.3_undefined_1.xlsx), tab "intermediate exchanges".

**Mandatory inputs**: amount per treated m3 only. Calculated by the tool. It can possibly be overridden, but there should then be a comment that indicates why this decision was taken.

**Other elective inputs to override default values**: comment (*mandatory if default amount not used*), uncertainty.  

**Units used should be the default units**, see `MD['IntermediateExchanges'].loc[exchange_name, 'unitName']`

In [56]:
consumable_example_exchange_name = 'lime'
consumable_example_amount = 0.42 # amount per m3 treated, in the exchange's default unit
consumable_example_uncertainty = {'variance': 0.0006, 'pedigreeMatrix': [2, 4, 3, 3, 1], 'comment':"PLACEHOLDER ESTIMATED UNCERTAINTY"}
consumable_example_comment = "Calcium hydroxide (Ca(OH)2) used for alkalinity addition and pH adjustment for metals removal. Amount calculated based on technology mix and wastewater properties."

In [57]:
treatment_dataset = generate_consumables(treatment_dataset,
                                         consumable_example_exchange_name,
                                         consumable_example_amount,
                                         consumable_example_uncertainty,
                                         consumable_example_comment,
                                         MD)

#### 6.9.2) Energy inputs

##### 6.9.2.1) Heat
Heat inputs are entered as MJ of heat provided by a combustion process, and **NOT** as MJ of fuel input or physical quantity of fuel.  
The ecoinvent database distinguishes between heat from natural gas and heat from other sources. If this is not known, a fraction coming from each must be estimated.  
Heat from the onsite combustion of sludge should be modelled directly in the tool and should not be considered here.  
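Splitting the total heat demand (MJ of heat delivered) between natural gas and other sources can be sketched as below. The keys reuse the two heat products listed at the start of 6.9; treating them as the exchanges the tool will use is an assumption, as is the helper name.

```python
def split_heat(total_heat_mj, fraction_natural_gas):
    """Hypothetical sketch: MJ of heat per m3 treated, by source."""
    if not 0 <= fraction_natural_gas <= 1:
        raise ValueError("fraction_natural_gas must be between 0 and 1")
    return {
        "heat, district or industrial, natural gas": total_heat_mj * fraction_natural_gas,
        "heat, central or small-scale, other than natural gas":
            total_heat_mj * (1 - fraction_natural_gas),
    }


heat = split_heat(10, 0.8)  # roughly 8 MJ from natural gas, 2 MJ from other sources
```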

In [58]:
example_total_heat = 10
example_fraction_from_natural_gas = 0.8
example_heat_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER ESTIMATED UNCERTAINTY"}
example_heat_from_natural_gas_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER ESTIMATED UNCERTAINTY"}
example_heat_NG_comment = "default" # Automatically generate comment, can be overridden
example_heat_other_comment = "default" # Automatically generate comment, can be overridden

The following function uses a dummy function to calculate the total uncertainty.

In [59]:
treatment_dataset = generate_heat_inputs(treatment_dataset,
                                         example_total_heat,
                                         example_fraction_from_natural_gas,
                                         example_heat_uncertainty,
                                         example_heat_from_natural_gas_uncertainty,
                                         example_heat_NG_comment,
                                         example_heat_other_comment,
                                         MD)

##### 6.9.2.2) Electricity
In kWh. Low voltage assumed. 

In [60]:
example_electricity_amount = 2
example_electricity_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER ESTIMATED UNCERTAINTY"}
example_electricity_comment="Electricity consumed by the wastewater treatment plant"

In [61]:
treatment_dataset = generate_electricity_input(dataset=treatment_dataset,
                                               amount=example_electricity_amount,
                                               uncertainty=example_electricity_uncertainty,
                                               comment=example_electricity_comment,
                                               MD=MD)

#### 6.9.3) Infrastructure

##### 6.9.3.1) Sewer grids: 
- There are currently five sewer grid construction/repair/end-of-life datasets in ecoinvent, each associated with different WWTP capacities (smaller grids have smaller diameters, are longer per capita and transport less WW over their lifetime).  
- Modelling of this infrastructure not yet determined, **discussion needed**. 

##### 6.9.3.2) WWTP
See email questions to experts. **Discussion needed**

### 6.10) Elementary flows (i.e. emissions to water, air or soil)

#### 6.10.1) Direct emissions to water  
These are the direct emissions associated with the share of substances not removed by the treatment before the treated water is released.  
The calculation is based on four parameters:  
  - The amount of a substance in the WW as discharged, as found in the properties    
  - The amount lost to hydraulic overload, as found in the calculated `overflow_losses_dict`  
  - A correspondence (dict) between the property and the emission  
  - A removal factor, **calculated by the tool**  
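The four parameters above can be combined as (amount as discharged - amount lost to overload) × (1 - removal factor), per m3 treated. This is a hypothetical sketch: it assumes each elementary flow corresponds to a single property (the table below maps several nitrogen species to `Nitrate`, which would need a conversion step), and the function name is illustrative.

```python
def direct_water_emissions(ww_amounts, overflow_losses, removal_factors, correspondence):
    """Hypothetical sketch: kg emitted to surface water per m3 of wastewater treated.

    ww_amounts: {property: amount in the wastewater as discharged (kg/m3)}
    overflow_losses: {property: amount lost to hydraulic overload (kg/m3)}
    removal_factors: {property: fraction removed by treatment, 0-1}
    correspondence: {property: name of the corresponding elementary flow}
    """
    emissions = {}
    for prop, flow in correspondence.items():
        if flow in emissions:
            # Several properties mapped to one flow would need a unit conversion first
            raise ValueError("{} is mapped from more than one property".format(flow))
        removal = removal_factors[prop]
        if not 0 <= removal <= 1:
            raise ValueError("removal factor for {} must be between 0 and 1".format(prop))
        reaching_wwtp = ww_amounts[prop] - overflow_losses.get(prop, 0.0)
        emissions[flow] = reaching_wwtp * (1 - removal)
    return emissions
```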
  
Here is a list of elementary flows that will (possibly) need to be accounted for (ultimate list depends on tool):

In [62]:
WW_prop_df.loc[:, ['Corresp. direct water emission', 'Corresp. direct water emission comment for ICRA']]

| name | Corresp. direct water emission | Corresp. direct water emission comment for ICRA |
|---|---|---|
| BOD5, mass per volume | COD, Chemical Oxygen Demand | |
| COD, mass per volume | BOD5, Biological Oxygen Demand | |
| mass concentration, DOC | DOC, Dissolved Organic Carbon | |
| mass concentration, TOC | TOC, Total Organic Carbon | |
| mass concentration, dissolved ammonia NH4 as N | Nitrate | Other species for N? |
| mass concentration, dissolved nitrate NO3 as N | Nitrate | Other species for N? |
| mass concentration, dissolved nitrite NO2 as N | Nitrate | Other species for N? |
| mass concentration, particulate nitrogen | Nitrate | Other species for N? |
| mass concentration, dissolved organic nitrogen as N | Nitrate | Other species for N? |
| mass concentration, nitrogen | Nitrate | Other species for N? |


I will assume the tool will be able to calculate the emission to water.  

Use the function `add_elementary_flow` with compartment = 'water' and subcompartment='surface water':

Example use:

In [63]:
example_BOD5_emission_name = 'BOD5, Biological Oxygen Demand'
example_BOD5_emission_amount = 0.1 # kg per m3 of wastewater treated (reference flow), calculated by tool
example_BOD5_emission_comment = "Calculated via XYZ" # ICRA to supply comment
example_BOD5_emission_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER CALCULATED UNCERTAINTY"}

NOTE: make sure you use the correct unit, as found in  
`MD['ElementaryExchanges'].loc[(name, compartment, subcompartment), 'unitName']`

In [64]:
treatment_dataset = add_elementary_flow(dataset=treatment_dataset,
                                        elementary_flow=example_BOD5_emission_name,
                                        compartment='water',
                                        subcompartment='surface water',
                                        amount=example_BOD5_emission_amount,
                                        uncertainty=example_BOD5_emission_uncertainty,
                                        comment=example_BOD5_emission_comment,
                                        MD=MD)

#### 6.10.2) Water flows  
It is important to account for output flows of water: 
  - Volume of treated water discharged to surface water (name='Water', compartment='water', subcompartment='surface water', unit='m3')  
  - Volume of water lost to evaporation during treatment (name='Water', compartment='air', subcompartment='unspecified', unit='m3')  
  
Reuse the `add_elementary_flow` function

#### 6.10.3) Others  
Examples:  
  - Emissions from the onsite combustion of sludge (compartment='air', subcompartment='urban air close to ground') 
  - Emissions from digester gas (compartment='air', subcompartment='urban air close to ground')  
  - Water in sludge (sludge property)  

Reuse the `add_elementary_flow` function

In [65]:
treated_water_discharge = 0.8
evaporated_water = 0.05
treated_water_discharge_comment = "Total amount of treated water discharged to surface water"
evaporated_water_comment = "Water evaporated during treatment"
treated_water_discharge_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER ESTIMATED UNCERTAINTY"}
evaporated_water_uncertainty = {'variance':0.0006, 'pedigreeMatrix':[2,4,3,2,4],'comment':"PLACEHOLDER ESTIMATED UNCERTAINTY"}

In [66]:
treatment_dataset = add_elementary_flow(treatment_dataset,
                                        elementary_flow="Water",
                                        compartment='water',
                                        subcompartment='surface water',
                                        amount=treated_water_discharge,
                                        uncertainty=treated_water_discharge_uncertainty,
                                        comment=treated_water_discharge_comment,
                                        MD=MD)
treatment_dataset = add_elementary_flow(treatment_dataset,
                                        elementary_flow="Water",
                                        compartment='air',
                                        subcompartment='unspecified',
                                        amount=evaporated_water,
                                        uncertainty=evaporated_water_uncertainty,
                                        comment=evaporated_water_comment,
                                        MD=MD)
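Because the reference flow is 1 m3 of wastewater, a simple water balance can show how much water is not yet accounted for by the outputs (e.g. water in sludge, see 6.10.3). This is a simplified sketch that treats the incoming wastewater as 1 m3 of water and ignores density differences; the helper name and output labels are hypothetical.

```python
def water_balance_residual(inflow_m3, outflows_m3):
    """Water entering minus the water outputs accounted for (m3 per reference flow)."""
    return inflow_m3 - sum(outflows_m3.values())


residual = water_balance_residual(1.0, {"Water, to surface water": 0.8,
                                        "Water, to air": 0.05})
```

With the example amounts above, about 0.15 m3 per m3 treated remains to be attributed to another output, such as water in sludge.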

### 6.11) Actual generation of the treatment dataset

In [67]:
generate_ecoSpold2(treatment_dataset,
                   os.path.join(root_dir, 'templates'),
                   'test_{}.spold'.format(time.ctime().replace(":", "")),
                   os.path.join(root_dir, 'result_folder'))

file "test_Tue Aug 22 112839 2017.spold" successfully created in folder C:\mypy\code\wastewater_treatment_tool\waste_water_tool\result_folder
