In [1]:
# # Run this and then restart the kernel at the start of each session to install
# # 'teotil3' in development mode
# !pip install -e /home/jovyan/projects/teotil3/

In [2]:
import nivapy3 as nivapy
import pandas as pd
import teotil3 as teo

# Task 2.8: Improve workflow for wastewater treatment and industry

From the proposal text:

> **Task 2.8: Improve the workflow for wastewater treatment and industry**
>
> A literature review will be undertaken to identify typical proportions of DIN, TON, TDP and TPP in discharges from different types of industry and wastewater treatment plant. Typical ratios between BOF and TOC for different plants will also be identified and used to convert reported BOF discharges to TOC (for consistency with other sources).
>
> The TEOTIL code will be updated to make new model parameters clearly visible and easy to update (for example, within a single parameter file or Excel workbook).
>
> Note: This task will require Miljødirektoratet to supply raw data from the databases for industry and treatment plants. SSB's workflow for processing data from wastewater plants must also be updated to include SS and BOF. This is not included in the time estimate given here.

This notebook provides an overview of the new method, which is implemented by functions in `teo.preprocessing`. For an example of how these functions are used, see [notebook 2.1e](https://nbviewer.org/github/NIVANorge/teotil3/blob/main/notebooks/development/T2-1e_annual_data_upload.ipynb), which illustrates the annual workflow to update the TEOTIL3 database.

## 1. Wastewater treatment

### 1.1. Raw data from SSB

The wastewater dataset comes from SSB and is split into two parts: discharges from "large" sites (>50 p.e.) and discharges from "small" sites (≤50 p.e.).

#### 1.1.1. Data for "small" sites

The data for small sites (often called the "spredt dataset") is aggregated to kommune level and includes estimates of TOTN and TOTP from 14 different types of small treatment plant:

 * Direkte utslipp
 * Slamavskiller
 * Infiltrasjonsanlegg
 * Sandfilteranlegg
 * Biologisk
 * Kjemisk
 * Biologisk og kjemisk
 * Tett tank (for alt avløpsvann)
 * Tett tank for svartvann
 * Biologisk toalett
 * Konstruert våtmark
 * Tett tank for svartvann, gråvannsfilter
 * Biologisk toalett, gråvannsfilter
 * Annen løsning
 
**SS and organic matter estimates are not considered in SSB's workflow and therefore cannot be included in TEOTIL at this time** (see the note in the proposal text above). However, the model structure should be flexible enough to allow SS and organic matter for "spredt" to be added easily, if Miljødirektoratet decide to fund additional processing with SSB.
 
#### 1.1.2. Data for "large" sites

The basic dataset for large sites is often called the "miljøgifter" dataset and it includes all monitored discharges from large wastewater plants. For TOTN and TOTP, SSB use statistical interpolation to patch reporting gaps in this dataset to create the "store anlegg" dataset. There is also usually a third file, named `RID_Totalpopulasjon_{year}.csv`, that includes the treatment type used by each plant. The following categories are used:

 * Urenset
 * Mekanisk - slamavskiller
 * Mekanisk - sil, rist
 * Mekanisk
 * Kjemisk
 * Biologisk
 * Kjemisk-biologisk
 * Naturbasert
 * Annen rensing

### 1.2. Literature review

Christian Vogelsang has undertaken a literature review to identify typical subfractions of N and P, and the relationships between BOF, KOF and TOC, in the discharges from different types of wastewater treatment plant. These factors will be used to subdivide the TOTN and TOTP values in the "store anlegg" and "små anlegg" datasets from SSB, and to estimate TOC discharges from the "large" treatment plants where possible (i.e. where BOF or KOF are reported). Measured SS fluxes from "large" sites will also be included where available. The new model will therefore consider:

 * TOTN, DIN and TON, plus TOTP, TDP and TPP, for both large and small wastewater treatment sites
 * TOC from large treatment sites where BOF or KOF are reported. If both BOF and KOF are available, BOF will be used in preference to KOF
 * SS from large treatment sites, where reported
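
The preference rule in the list above (use BOF where it is reported, otherwise fall back to KOF) can be sketched as follows. This is a minimal illustration only: the site IDs, column names and values are hypothetical and are not taken from the SSB files or from `teo.preprocessing`.

```python
import pandas as pd

nan = float("nan")  # marks a parameter that was not reported

# Hypothetical discharges for four large sites (arbitrary units)
sites = pd.DataFrame(
    {
        "site_id": ["A", "B", "C", "D"],
        "BOF": [12.0, nan, 5.0, nan],
        "KOF": [30.0, 40.0, nan, nan],
    }
)

# Record which measurement a TOC estimate would be based on.
# KOF is assigned first so that BOF overrides it where both are reported.
sites["toc_source"] = "none"
sites.loc[sites["KOF"].notna(), "toc_source"] = "KOF"
sites.loc[sites["BOF"].notna(), "toc_source"] = "BOF"

print(sites)
```

Sites with neither parameter reported (site "D" here) are left without a TOC estimate.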
 
**The model will not consider TOC or SS inputs from small/spredt sites**, unless these are incorporated into SSB's workflow in the future.

Christian's literature review is available online and key information is summarised in the table below (hosted on GitHub [here](https://github.com/NIVANorge/teotil3/blob/main/data/point_source_treatment_types.csv)).

For each class of site ("large wastewater", "small wastewater", "industry" etc.) and each treatment type ("mechanical", "biological", "chemical" etc.), the table gives typical proportions of subfractions of N and P, plus parameters for estimating TOC from BOF and KOF. For the TOC calculations, Christian's equations take the following form:

$$TOC = k_1 \, KOF^{k_2} + k_3 \quad \text{and} \quad TOC = b_1 \, BOF^{b_2} + b_3$$
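
As a rough sketch of how these equations might be evaluated, the helper below applies the same three-parameter form to either a KOF or a BOF value. The function name `estimate_toc` and the input values are hypothetical; the parameters are those listed for "Kjemisk" large wastewater sites in the treatment-type table. The units expected by the equations are not stated here, so the inputs are purely illustrative.

```python
def estimate_toc(value, c1, c2, c3):
    """Evaluate TOC = c1 * value**c2 + c3.

    (c1, c2, c3) are either (k1, k2, k3) for a KOF input or
    (b1, b2, b3) for a BOF input. Some treatment types have a
    negative exponent, so 'value' is assumed to be strictly positive.
    """
    return c1 * value**c2 + c3


# Parameters for 'Large wastewater', 'Kjemisk' from the treatment-type table
k1, k2, k3 = 0.36, 1.0, -2.7
b1, b2, b3 = 0.65, 1.0, 6.7

# Hypothetical reported values
toc_from_kof = estimate_toc(50.0, k1, k2, k3)
toc_from_bof = estimate_toc(20.0, b1, b2, b3)

print(f"TOC from KOF: {toc_from_kof:.1f}")
print(f"TOC from BOF: {toc_from_bof:.1f}")
```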

In [9]:
url = r"https://raw.githubusercontent.com/NIVANorge/teotil3/main/data/point_source_treatment_types.csv"
df = pd.read_csv(url)
df

| Unnamed: 0 | site_type | treatment_type | prop_din | prop_ton | prop_tpp | prop_tdp | k1 | k2 | k3 | b1 | b2 | b3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Large wastewater | Urenset | 0.65 | 0.35 | 0.67 | 0.33 | 0.23 | 1.0 | 0.0 | 0.6 | 1.0 | 0.0 |
| 1 | Large wastewater | Mekanisk - slamavskiller | 0.68 | 0.32 | 0.35 | 0.65 | 0.23 | 1.0 | 0.0 | 0.6 | 1.0 | 0.0 |
| 2 | Large wastewater | Mekanisk - sil, rist | 0.68 | 0.32 | 0.55 | 0.45 | 0.23 | 1.0 | 0.0 | 0.6 | 1.0 | 0.0 |
| 3 | Large wastewater | Mekanisk | 0.68 | 0.32 | 0.45 | 0.55 | 0.23 | 1.0 | 0.0 | 0.6 | 1.0 | 0.0 |
| 4 | Large wastewater | Kjemisk | 0.72 | 0.28 | 0.87 | 0.13 | 0.36 | 1.0 | -2.7 | 0.65 | 1.0 | 6.7 |
| 5 | Large wastewater | Biologisk | 0.87 | 0.13 | 0.56 | 0.44 | 0.3 | 1.0 | 0.0 | 12.351 | -0.857 | 0.0 |
| 6 | Large wastewater | Kjemisk-biologisk | 0.9 | 0.1 | 0.87 | 0.13 | 5.1445 | -0.76 | 0.0 | 7.5368 | -0.754 | 0.0 |
| 7 | Large wastewater | Kjemisk-biologisk m/N-fjerning | 0.76 | 0.24 | 0.55 | 0.45 | 5.1445 | -0.76 | 0.0 | 7.5368 | -0.754 | 0.0 |
| 8 | Large wastewater | Naturbasert | 0.91 | 0.09 | 0.66 | 0.34 | 5.1445 | -0.76 | 0.0 | 7.5368 | -0.754 | 0.0 |
| 9 | Large wastewater | Annen rensing | 0.76 | 0.24 | 0.61 | 0.39 | 0.23 | 1.0 | 0.0 | 0.6 | 1.0 | 0.0 |
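
To illustrate how the proportions in this table could be used to subdivide reported totals, the sketch below joins a hypothetical set of site totals to three rows copied from the table and multiplies through. The site IDs, totals and the column names `TOTN` and `TOTP` are assumptions made for the example; the actual implementation lives in `teo.preprocessing` and may differ.

```python
import pandas as pd

# Three rows copied from the treatment-type table above
props = pd.DataFrame(
    {
        "treatment_type": ["Urenset", "Kjemisk", "Biologisk"],
        "prop_din": [0.65, 0.72, 0.87],
        "prop_ton": [0.35, 0.28, 0.13],
        "prop_tpp": [0.67, 0.87, 0.56],
        "prop_tdp": [0.33, 0.13, 0.44],
    }
)

# Hypothetical site totals (arbitrary units)
sites = pd.DataFrame(
    {
        "site_id": ["A", "B"],
        "treatment_type": ["Kjemisk", "Biologisk"],
        "TOTN": [100.0, 50.0],
        "TOTP": [10.0, 4.0],
    }
)

# Attach the proportions for each site's treatment type.
# 'validate' raises an error if a treatment type appears twice in 'props'.
merged = sites.merge(
    props, on="treatment_type", how="left", validate="many_to_one"
)

# Split the totals into subfractions
merged["DIN"] = merged["TOTN"] * merged["prop_din"]
merged["TON"] = merged["TOTN"] * merged["prop_ton"]
merged["TPP"] = merged["TOTP"] * merged["prop_tpp"]
merged["TDP"] = merged["TOTP"] * merged["prop_tdp"]

print(merged[["site_id", "TOTN", "DIN", "TON", "TOTP", "TDP", "TPP"]])
```

Because each pair of proportions in the table sums to one, the subfractions add back up to the reported totals.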


## 2. Industry