In [5]:
import yaml
import numpy as np
import pandas as pd

from pyThermoML.core import DataReport
from pyThermoML.tools import list_possible_properties, list_possible_constraints, list_possible_variables

## Create a new DataReport instance

In [6]:
# Initialize the dataset
datareport = DataReport()

### 1. Add citation data

In [7]:
# Add citation data
datareport.citation.s_author = ["Max Mustermann"]
datareport.citation.s_title = "Example file"

### 2. Add compounds of the system

In [8]:
# Add compounds to the data report based on their SMILES strings
compounds = {
  "choline": "C[N+](C)(C)CCO",
  "chloride": "[Cl-]",
  "glycerol": "C(C(CO)O)O",
  "water": "O",
  "carbon dioxide": "C(=O)=O",
  "oxygen": "O=O",
  "methane": "C",
  "nitrogen": "N#N"
}

datareport.add_compounds_via_smiles( compounds )


Add component: 'choline' with SMILES: 'C[N+](C)(C)CCO'
Success!

Add component: 'chloride' with SMILES: '[Cl-]'
Success!

Add component: 'glycerol' with SMILES: 'C(C(CO)O)O'
Success!

Add component: 'water' with SMILES: 'O'
Success!

Add component: 'carbon dioxide' with SMILES: 'C(=O)=O'
Success!

Add component: 'oxygen' with SMILES: 'O=O'
Success!

Add component: 'methane' with SMILES: 'C'
Success!

Add component: 'nitrogen' with SMILES: 'N#N'
Success!


### 3. Add properties

To add a property, certain information must be provided. It can be set manually or loaded from a YAML file.

#### 3.1 List all implemented properties, constraints, and variables

The property name must match one of the property names listed below. A substring is sufficient, e.g.:

```
'Excess molar enthalpy (molar enthalpy of mixing), kJ/mol' can be matched by specifying only 'Excess molar enthalpy'
```

The variables and constraints must match the names listed below exactly.

> **Note:** Capitalization and spaces are important.
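The matching rule described above can be sketched in a few lines. This is only an illustration: `match_property` is a hypothetical helper and not part of pyThermoML, and the real library may match differently. It assumes a plain case-sensitive substring test.

```python
# Hypothetical helper, not part of pyThermoML: illustrates case-sensitive substring matching.
def match_property(query, implemented):
    """Return every implemented property name that contains `query`."""
    return [name for name in implemented if query in name]

# A few of the property names listed by list_possible_properties()
implemented = [
    "Excess molar enthalpy (molar enthalpy of mixing), kJ/mol",
    "Molar enthalpy, kJ/mol",
    "Mass density, kg/m3",
]

print(match_property("Excess molar enthalpy", implemented))
print(match_property("Molar enthalpy", implemented))
print(match_property("excess molar enthalpy", implemented))  # wrong capitalization
```

Under this assumption, a query with different capitalization finds no match, which is why the exact spelling matters.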

In [9]:
list_possible_properties(  )

Implemented properties for each group:

Group: activity_fugacity_osmotic_prop
Activity; Activity coefficient; Osmotic pressure, kPa; Osmotic coefficient

Group: composition_at_phase_equilibrium
Henry's Law constant (mole fraction scale), kPa; Henry's Law constant (molality scale), kPakg/mol; Henry's Law constant (amount concentration scale), kPadm3/mol; Amount per mass of solution, mol/kg; Molality, mol/kg; Amount concentration (molarity), mol/dm3

Group: excess_partial_apparent_energy_prop
Excess molar enthalpy (molar enthalpy of mixing), kJ/mol

Group: heat_capacity_and_derived_prop
Molar enthalpy, kJ/mol

Group: transport_prop
Self diffusion coefficient, m2/s; Viscosity, Pas; Kinematic Viscosity m2/s

Group: volumetric_prop
Mass density, kg/m3; Specific volume, m3/kg; Amount density. mol/m3; Molar volume, m3/mol; Compressibility factor; Adiabatic compressibility, 1/kPa; Isothermal compressibility, 1/kPa; Isobaric coefficient of expansion, 1/K; Excess molar volume, m3/mol; Partial mo

In [10]:
list_possible_constraints( )

Implemented constraints are:
  mass_fraction; mole_fraction; pressure; temperature


In [11]:
list_possible_variables( )

Implemented variables are:
  pressure; temperature


#### 3.2 Define property information

##### Property

Each property needs certain information, which is passed via a property dictionary (`property_dict`).

<u>Example:</u> 

```python
property_dict = {
"prediction_type": "Experimental",
"method_name": "",
"method_ref_doi": "https://doi.org/10.1016/j.jct.2016.10.002",
"method_description": "" ,
"confidence_interval": 0,
"uncertanty_method": "",
"name": "Excess molar enthalpy",
"component_identifier": ""
}
```

If the data is read in from **json** files, the dictionary should contain the following keys:

"paths": One list of paths per variable, with one path for each value of that variable. In this example, temperature is the only variable, and 5 temperatures were studied (313.15 K, 323.15 K, 333.15 K, 343.15 K, 353.15 K).

"keys": A list of keys that is followed, in order, into the nested structure of the json file. The entry reached by the last key is expected to be a dictionary that contains "mean" and "std" as keys. The json file is read in using the 'get_num_values_from_json' function, which multiplies the standard deviation by 2 to obtain the 95% confidence interval.

<u>Example:</u> 

```python
property_dict["paths"] =  [ ['/home/st/st_st/st_ac137577/workspace/software/ThermoML-Specifications/examples/files/results_313.json',
                             '/home/st/st_st/st_ac137577/workspace/software/ThermoML-Specifications/examples/files/results_323.json',
                             '/home/st/st_st/st_ac137577/workspace/software/ThermoML-Specifications/examples/files/results_333.json',
                             '/home/st/st_st/st_ac137577/workspace/software/ThermoML-Specifications/examples/files/results_343.json',
                             '/home/st/st_st/st_ac137577/workspace/software/ThermoML-Specifications/examples/files/results_353.json'] 
                         ]

property_dict["keys"] = ['03_npt_production', 'data', 'average', 'Enthalpy']
```
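As a rough sketch of what following a key list into a json file looks like, consider the snippet below. It is not the implementation of 'get_num_values_from_json'; `read_mean_and_ci` is a hypothetical stand-in, and the numbers in `demo` are made up purely for illustration.

```python
import json
import os
import tempfile

# Hypothetical stand-in, not the pyThermoML function.
def read_mean_and_ci(path, keys):
    """Follow `keys` into the json file and return (mean, 2 * std)."""
    with open(path) as f:
        node = json.load(f)
    for key in keys:
        node = node[key]  # descend one level per key
    return node["mean"], 2 * node["std"]

# Made-up file content with the nesting implied by the key list above
demo = {"03_npt_production": {"data": {"average": {"Enthalpy": {"mean": -41.5, "std": 0.25}}}}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results_demo.json")
    with open(path, "w") as f:
        json.dump(demo, f)
    mean, ci95 = read_mean_and_ci(path, ["03_npt_production", "data", "average", "Enthalpy"])

print(mean, ci95)
```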

If the data is provided within the **notebook**, the dictionary should contain the following key:

"property_results": One list of dictionaries per variable, with one dictionary for each value of that variable. In this example, temperature is the only variable, and 2 temperatures were studied (313.15 K, 323.15 K).

```python
property_dict['property_results'] = [ [ {'value': -113162.513, 'uncertainty': 16.404},
                                        {'value': -110550.96845, 'uncertainty': 7.78308},
                                      ]
                                    ]

```


##### Constraints

Constraints are defined via a dictionary. Depending on the constraint, the value of each key is either another dictionary (mass/mole fraction) or a numerical value (temperature/pressure).

```python
constraints = {
    "mole_fraction": {
        "water": 0.5,
        "glycerol": 0.334,
        "choline": 0.083,
        "chloride": 0.083,
    },
    "pressure": 100 #kPa
    }
```
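Before passing such a dictionary on, it can be useful to confirm that the mole fractions add up to one. The check below is a small sketch; `mole_fractions_sum_to_one` is a hypothetical helper, and it is not known whether pyThermoML performs a similar check itself.

```python
import math

# Hypothetical helper, not part of pyThermoML.
def mole_fractions_sum_to_one(constraints, tol=1e-6):
    """Return True if the mole fractions in `constraints` sum to 1 within `tol`."""
    total = sum(constraints["mole_fraction"].values())
    return math.isclose(total, 1.0, abs_tol=tol)

constraints = {
    "mole_fraction": {
        "water": 0.5,
        "glycerol": 0.334,
        "choline": 0.083,
        "chloride": 0.083,
    },
    "pressure": 100,  # kPa
}

print(mole_fractions_sum_to_one(constraints))
```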



##### Variables

Variables are defined via a dictionary. For each variable, the value of the key is a list of numerical values.

```python
variables = { "temperature": [313.15,323.15] #K
            }
```
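The variables and the "property_results" entry described earlier have to line up: one result list per variable, and one result per variable value. A minimal consistency check could look as follows; `shapes_match` is a hypothetical helper introduced only for this sketch.

```python
# Hypothetical helper, not part of pyThermoML.
def shapes_match(variables, property_results):
    """True if there is one result list per variable, each as long as that variable's value list."""
    if len(property_results) != len(variables):
        return False
    return all(
        len(results) == len(values)
        for values, results in zip(variables.values(), property_results)
    )

variables = {"temperature": [313.15, 323.15]}  # K
property_results = [[
    {"value": -113162.513, "uncertainty": 16.404},
    {"value": -110550.96845, "uncertainty": 7.78308},
]]

print(shapes_match(variables, property_results))
```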

In [12]:
# Read in property information file
with open("input/property_info.yaml") as f:
    data_property = yaml.safe_load(f)

# Example for the property 'Mass density'
data_property["Mass density"]

{'prediction_type': 'Molecular dynamics',
 'method_name': '',
 'method_ref_doi': '',
 'method_description': 'Simulations in NPT ensemble. TIP4P/2005 model (https://doi.org/10.1063/1.2121687).',
 'confidence_interval': 95,
 'uncertanty_method': 'Standard deviation over 3 copies of the same system with different initial velocities.'}

#### 3.3 Read in via json and manifest file

In [13]:
# Add pure or mixture data
with open("input/manifest.yaml") as f:
    data_manifest = yaml.safe_load(f)


for pomd in data_manifest["pomd"]:
    for prop in pomd["properties"]:
        # Merge into a new dictionary so the entry in data_property is not modified
        property_dict = { **data_property[prop["name"]], "paths": pomd["paths"], **prop }
        
        datareport.create_pure_or_mixture_data( components = pomd["components"], 
                                                phase = pomd["phase"], constraints = pomd["constraint"],
                                                variables = pomd["variable"], property_dict = property_dict
                                                )


Extract property values via json file!
Matched 'Mass density' with ThermoML property 'Mass density, kg/m3' in following category: 'volumetric_prop'

Extract property values via json file!
Matched 'Molar enthalpy' with ThermoML property 'Molar enthalpy, kJ/mol' in following category: 'heat_capacity_and_derived_prop'
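The manifest file itself is not shown here. Judging only from the keys the loop accesses, the loaded dictionary might look roughly like the sketch below. The layout is an assumption, the paths and the key chain are placeholders, and `data_property` is a shortened stand-in for the content of `input/property_info.yaml`.

```python
# Shortened stand-in for the dictionary loaded from input/property_info.yaml
data_property = {
    "Mass density": {
        "prediction_type": "Molecular dynamics",
        "confidence_interval": 95,
    }
}

# Assumed manifest layout, inferred from the keys the loop accesses
data_manifest = {
    "pomd": [
        {
            "components": ["water"],
            "phase": "Liquid",
            "constraint": {"mole_fraction": {"water": 1.0}, "pressure": 100},  # kPa
            "variable": {"temperature": [313.15, 323.15]},  # K
            "paths": [["results_313.json", "results_323.json"]],  # placeholder paths
            "properties": [
                {"name": "Mass density", "keys": ["placeholder", "key", "chain"]},
            ],
        }
    ]
}

for pomd in data_manifest["pomd"]:
    for prop in pomd["properties"]:
        # Merging into a fresh dictionary leaves the loaded template unchanged
        property_dict = {**data_property[prop["name"]], "paths": pomd["paths"], **prop}
        print(sorted(property_dict))
```

Each entry under "properties" carries at least a "name", which selects the matching template from the property information file.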



In [14]:
# Extract saved data
all_data = datareport.get_all_data(  )

All data saved in the datareport:

  "Mass density, kg/m3" of system                     variable               constraint                          
                             mean 95_confidence Temperature, K "Mole fraction" of water "Pressure, kPa" of system
0                       989.26692      0.132553         313.15                      1.0                     100.0
1                       984.23857      0.029222         323.15                      1.0                     100.0
2                       978.73098      0.470517         333.15                      1.0                     100.0
3                       972.22267      0.368699         343.15                      1.0                     100.0
4                       965.13213      0.251989         353.15                      1.0                     100.0 

  "Molar enthalpy, kJ/mol" of system                     variable               constraint                          
                                mean 95_confiden

#### 3.4 Direct parsing from notebook

In [15]:
# Example for experimental values
exp_df = pd.read_csv('files/experimental_excess_enthalpy.csv')
print(exp_df,"\n")

# Define property information

property_dict = {
"prediction_type": "Experimental",
"method_name": "",
"method_ref_doi": "https://doi.org/10.1016/j.jct.2016.10.002",
"method_description": "" ,
"confidence_interval": 0,
"uncertanty_method": "",
"name": "Excess molar enthalpy",
"component_identifier": ""
}

# Write for every composition the property, with variable temperature and constant pressure
for _,df in exp_df.groupby("composition"):
    property_dict["property_results"] = [ [ {"value": val, "uncertainty": 0 } for val in df["mean"] ] ]

    components = [ "water", "choline", "chloride", "glycerol" ]

    constraints = {
    "mole_fraction": {
        "water": df["composition"].iloc[0],
        "glycerol": np.round((1- df["composition"].iloc[0])*2/3,3),
        "choline": np.round((1- df["composition"].iloc[0])*1/6,3)
    },
    "pressure": 100 #kPa
    }

    constraints["mole_fraction"]["chloride"] = 1 - sum( constraints["mole_fraction"].values() )

    datareport.create_pure_or_mixture_data( components = components, 
                                            phase = "Liquid", constraints = constraints,
                                            variables = {"temperature": df["temperature"].tolist()},
                                            property_dict = property_dict 
                                         ) 


      mean  composition  temperature
0     0.00       0.0000       298.15
1     0.00       0.0000       308.15
2  -557.64       0.2439       308.15
3  -477.69       0.2492       298.15
4  -818.22       0.3982       308.15
5  -727.53       0.3998       298.15
6  -867.63       0.4928       308.15
7  -896.93       0.5137       298.15
8  -939.17       0.6605       308.15
9  -935.48       0.6813       298.15
10 -846.59       0.7614       298.15
11 -778.52       0.7948       308.15
12 -747.38       0.8093       298.15
13 -511.68       0.8524       308.15
14 -483.09       0.8642       298.15
15 -301.59       0.9203       308.15
16 -194.33       0.9497       298.15
17 -157.52       0.9582       308.15
18 -146.46       0.9618       298.15
19    0.00       1.0000       308.15
20    0.00       1.0000       298.15 

Matched 'Excess molar enthalpy' with ThermoML property 'Excess molar enthalpy (molar enthalpy of mixing), kJ/mol' in following category: 'excess_partial_apparent_energy_prop'

Matched 
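The composition arithmetic used in the loop above can be isolated for a single water mole fraction. The sketch below mirrors it with Python's built-in `round` instead of `np.round`; `composition_from_water` is a hypothetical name chosen for this illustration.

```python
import math

# Hypothetical helper mirroring the arithmetic of the loop above.
def composition_from_water(x_water):
    """Split the non-water remainder 2/3 to glycerol and 1/6 to choline; chloride takes what is left."""
    rest = 1 - x_water
    fractions = {
        "water": x_water,
        "glycerol": round(rest * 2 / 3, 3),
        "choline": round(rest * 1 / 6, 3),
    }
    # Chloride closes the balance, so it absorbs the rounding of the other entries
    fractions["chloride"] = 1 - sum(fractions.values())
    return fractions

fractions = composition_from_water(0.5)
print(fractions)
print(math.isclose(sum(fractions.values()), 1.0))
```

Because chloride is computed as the remainder, it can differ slightly from the rounded choline value and may carry floating-point residue.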

In [16]:
# Extract saved data
all_data = datareport.get_all_data(  )

All data saved in the datareport:

  "Mass density, kg/m3" of system                     variable               constraint                          
                             mean 95_confidence Temperature, K "Mole fraction" of water "Pressure, kPa" of system
0                       989.26692      0.132553         313.15                      1.0                     100.0
1                       984.23857      0.029222         323.15                      1.0                     100.0
2                       978.73098      0.470517         333.15                      1.0                     100.0
3                       972.22267      0.368699         343.15                      1.0                     100.0
4                       965.13213      0.251989         353.15                      1.0                     100.0 

  "Molar enthalpy, kJ/mol" of system                     variable               constraint                          
                                mean 95_confiden

### 4. Save as an XML file

In [17]:
# Write as xml file
with open("files/test.xml","w") as f:
    f.write(datareport.xml())
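A quick way to confirm that a written file is well-formed XML is to parse it again with the standard library. The sketch below uses a placeholder string instead of real pyThermoML output; `root_tag_of` is a hypothetical helper, and the root tag of an actual file may include a namespace prefix.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical helper, not part of pyThermoML.
def root_tag_of(path):
    """Parse the XML file and return its root tag (raises ET.ParseError if malformed)."""
    return ET.parse(path).getroot().tag

xml_text = "<DataReport><Citation /></DataReport>"  # placeholder, not real pyThermoML output

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.xml")
    with open(path, "w") as f:
        f.write(xml_text)
    tag = root_tag_of(path)

print(tag)
```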