<font size="+3">Toolkit</font>

# Overview

A toolkit is a collection of procedures for working with a specific type of data, such as urban GIS, topography, or simulations. Toolkits facilitate data management by offering procedures for retrieving and modifying data. They also include procedures for analyzing and processing the data, and for presenting it graphically.

Some toolkits require external data. For example, a toolkit that handles demographic data requires a demography database.
These external data sources are referred to as [datasources](#datasource); how to manage them is described below.



# The structure and function of toolkits

A toolkit handles one type of data within a project. Therefore,
a project name must be supplied when a toolkit is initialized. The functions
of the toolkit then operate on the data that exists in that project.

A toolkit is initialized through the hera [toolkitHome](#toolkitHome) interface, which is
detailed below.

A toolkit is designed as a [3-tier application](https://en.wikipedia.org/wiki/Multitier_architecture#Three-tier_architecture).
That is, it consists of the following layers:

<table>
    <tr>
        <td><b>Layer name</b></td>
        <td><b>Description</b></td>
    </tr>
    <tr>
        <td>Data layer</td>
        <td>Manages loading and parsing data</td>
    </tr>
    <tr>
        <td>Analysis layer</td>
        <td>Manages the analysis and computational functions (also called the logic layer)</td>
    </tr>
    <tr>
        <td>Presentation layer</td>
        <td>Manages the graphs and other types of presentation</td>
    </tr>    
</table>

The data layer inherits from the [Project](project) class, offering users an interface to add and remove data items (measurements, simulations, or caches) from the database associated with the project on which the toolkit was initialized.
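
As a rough illustration of this layering, the sketch below wires three small classes together. All class and method names in it (`DemoToolkit`, `DemoAnalysis`, `DemoPresentation`, `add_item`) are hypothetical stand-ins chosen for this example and are not hera's actual classes. In particular, the real data layer inherits from `Project` and stores its items in the project database rather than in a Python list.

```python
class DemoAnalysis:
    """Analysis (logic) layer: computations on the data held by the toolkit."""

    def __init__(self, datalayer):
        self.datalayer = datalayer

    def mean(self):
        values = self.datalayer.items
        return sum(values) / len(values)


class DemoPresentation:
    """Presentation layer: turns analysis results into something displayable."""

    def __init__(self, datalayer, analysis):
        self.datalayer = datalayer
        self.analysis = analysis

    def summary(self):
        return f"{self.datalayer.project_name}: mean={self.analysis.mean():.2f}"


class DemoToolkit:
    """Data layer: owns the data of one project and exposes the other layers."""

    def __init__(self, project_name):
        self.project_name = project_name
        self.items = []  # hypothetical stand-in for the project database
        self.analysis = DemoAnalysis(self)
        self.presentation = DemoPresentation(self, self.analysis)

    def add_item(self, value):
        self.items.append(value)


toolkit = DemoToolkit("The-Project-Name")
for value in (1.0, 2.0, 3.0):
    toolkit.add_item(value)
print(toolkit.presentation.summary())
```

The point of the split is that the analysis and presentation layers only reach the data through the data layer, so the storage can change without touching them.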

**Caching analysis results** The analysis layer processes the data and computes results, which might require lengthy computation.
Caching the results reduces the computation time the next time the same set of parameters is used.
To cache results, use the hera mechanism that links the parameters to the computation results:
add a cache document with the parameters as its metadata.

The cached results should be saved as files in a directory. The `filesDirectory` property of the toolkit provides a directory for these files.
The value of `filesDirectory` is given in the constructor of the toolkit. If `filesDirectory` is not supplied,
it defaults to the current working directory.
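
The idea of linking parameters to saved results can be sketched without a database. The example below is an assumption-laden stand-in and does not use hera's cache documents: `DemoCache`, `get_or_compute`, and `slow_sum` are hypothetical names, the "documents" are plain dictionaries kept in memory, and the results are written as JSON files. It only mirrors the two rules described above: results are looked up by their parameters, and files go to a files directory that defaults to the current working directory.

```python
import hashlib
import json
import os
import tempfile


class DemoCache:
    """In-memory stand-in for cache documents that link parameters to result files."""

    def __init__(self, files_directory=None):
        # Default to the current working directory when no directory is supplied.
        self.files_directory = os.path.abspath(files_directory or os.getcwd())
        self.documents = []  # each document: {"desc": parameters, "resource": file path}

    def get_or_compute(self, parameters, compute):
        # Reuse a stored result when the same parameters were used before.
        for document in self.documents:
            if document["desc"] == parameters:
                with open(document["resource"]) as result_file:
                    return json.load(result_file), True

        result = compute(**parameters)
        # Derive a stable file name from the parameters.
        encoded = json.dumps(parameters, sort_keys=True).encode()
        digest = hashlib.sha256(encoded).hexdigest()[:12]
        resource = os.path.join(self.files_directory, f"cache_{digest}.json")
        with open(resource, "w") as result_file:
            json.dump(result, result_file)
        self.documents.append({"desc": dict(parameters), "resource": resource})
        return result, False


def slow_sum(start, stop):
    """Placeholder for a lengthy computation."""
    return sum(range(start, stop))


with tempfile.TemporaryDirectory() as directory:
    cache = DemoCache(files_directory=directory)
    first, first_was_cached = cache.get_or_compute({"start": 0, "stop": 10}, slow_sum)
    second, second_was_cached = cache.get_or_compute({"start": 0, "stop": 10}, slow_sum)

print(first, first_was_cached)
print(second, second_was_cached)
```

The second call finds a document with the same parameters and reads the stored file instead of recomputing.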

**datatypes** The `datatypes` property holds all the available data formats (for example, `tk.datatypes.STRING`, which is used below).

<a id="toolkitHome"></a>

# Initializing a toolkit

Toolkits are accessed through the hera `toolkitHome` interface.

For example, the code below loads the GIS_DEMOGRAPHY toolkit.
A list of the available toolkits is given [here](#toolkitList).

The `filesDirectory` parameter is not given, and therefore the files directory is
the current working directory.

In [1]:
from hera import toolkitHome

toolkitName = toolkitHome.GIS_DEMOGRAPHY
projectName = "The-Project-Name"
toolkit_specific_parameters = dict()  # empty for this presentation
tk = toolkitHome.getToolkit(toolkitName=toolkitName,
                            projectName=projectName,
                            **toolkit_specific_parameters)


<a id="datasource"></a>

# Datasource

A datasource is external data that a toolkit needs.

The data is usually loaded through a data repository. Note that the
loaded data is specific to a **project**. Hence, it must be
loaded separately for each project.

Internally, a datasource is implemented as a data item saved as a measurement document. The data format and the resource are
determined by the user who added the datasource.

## Adding a datasource

A datasource is added to a toolkit with the `addDataSource` method.

The parameters are: 

- **dataSourceName** : str<br/>
  The name of the datasource.
- **resource** : str<br/>
  The path to the data file of the datasource.
- **dataFormat** : str<br/>
  The format of the data. Should be one of the [data formats](dataformats).
- **version** : tuple, default (0,0,1)<br/>
  The version of the datasource. This allows you to add different versions of the datasource
  and access the correct one.
- **overwrite** : bool, default False<br/>
  If True, overwrite the existing datasource (of the input version).
  If False, raise an exception if the datasource exists (with the input version).
- **Additional parameters**<br/>
  Additional parameter names and their values, saved with the datasource (for example, `key="value"` below).

In [2]:
tk.addDataSource(dataSourceName="thedata",
                 resource="path-to-data",
                 dataFormat=tk.datatypes.STRING,
                 version=(0,0,1),overwrite=True)

<Measurements: Measurements object>

Adding another version of the datasource, this time with an additional parameter (`key`):

In [16]:
tk.addDataSource(dataSourceName="thedata",
                 resource="path-to-data-2",
                 dataFormat=tk.datatypes.STRING,
                 overwrite=True,
                 version=(0,0,2),key="value")

<Measurements: Measurements object>
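
The interplay of `version` and `overwrite` described above can be sketched with a small in-memory registry. This is only an illustration of the stated rules under simplifying assumptions, not hera's implementation: `DemoDataSources` and its `add` method are hypothetical, the documents are plain dictionaries, and raising `ValueError` is an assumption (the text above only says that an exception is raised).

```python
class DemoDataSources:
    """In-memory stand-in for the datasource documents of one project."""

    def __init__(self):
        self.documents = {}  # (name, version) -> document

    def add(self, name, resource, data_format, version=(0, 0, 1),
            overwrite=False, **metadata):
        identifier = (name, tuple(version))
        if identifier in self.documents and not overwrite:
            # The exception type is an assumption made for this sketch.
            raise ValueError(f"datasource {name} with version {version} already exists")
        self.documents[identifier] = {
            "resource": resource,
            "dataFormat": data_format,
            "desc": {"datasourceName": name, "version": list(version), **metadata},
        }
        return self.documents[identifier]


sources = DemoDataSources()
sources.add("thedata", "path-to-data", "string")
sources.add("thedata", "path-to-data-2", "string", version=(0, 0, 2), key="value")

try:
    # Same name and version as the first call, and overwrite is False.
    sources.add("thedata", "another-path", "string")
except ValueError as error:
    print("rejected:", error)

# With overwrite=True the existing version (0, 0, 1) is replaced.
sources.add("thedata", "another-path", "string", overwrite=True)
print(sorted(version for _, version in sources.documents))
```

Two versions of the same datasource coexist because the version is part of the identity of a document, and `overwrite` only matters when both the name and the version match.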

## Listing the datasources

The datasources that were added to the project are listed with `getDataSourceTable`:

In [17]:
tk.getDataSourceTable()

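<table>
    <tr><td></td><td><b>dataFormat</b></td><td><b>resource</b></td><td><b>toolkit</b></td><td><b>datasourceName</b></td><td><b>version</b></td><td><b>key</b></td></tr>
    <tr><td>0</td><td>string</td><td>path-to-data</td><td>Demography</td><td>thedata</td><td>[0, 0, 1]</td><td></td></tr>
    <tr><td>1</td><td>string</td><td>path-to-data-2</td><td>Demography</td><td>thedata</td><td>[0, 0, 2]</td><td>value</td></tr>
</table>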


The datasources can be filtered by the key/value pairs of their additional parameters:

In [27]:
tk.getDataSourceTable(key="value")

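<table>
    <tr><td></td><td><b>dataFormat</b></td><td><b>resource</b></td><td><b>key</b></td><td><b>toolkit</b></td><td><b>datasourceName</b></td><td><b>version</b></td></tr>
    <tr><td>0</td><td>string</td><td>path-to-data-2</td><td>value</td><td>Demography</td><td>thedata</td><td>[0, 0, 2]</td></tr>
</table>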


Alternatively, the datasources can be retrieved as a list of dictionaries:

In [19]:
tk.getDataSourceMap()

[{'dataFormat': 'string',
  'resource': 'path-to-data',
  'toolkit': 'Demography',
  'datasourceName': 'thedata',
  'version': [0, 0, 1]},
 {'dataFormat': 'string',
  'resource': 'path-to-data-2',
  'key': 'value',
  'toolkit': 'Demography',
  'datasourceName': 'thedata',
  'version': [0, 0, 2]}]
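
The list-of-dictionaries form is convenient for further processing in plain Python. As a minimal sketch, the helper below (the name `filter_datasources` is hypothetical and not part of hera) selects entries by key/value pairs, similar in spirit to the filtered table shown earlier. The literal list is copied from the output above so that the example runs without a database.

```python
datasource_map = [
    {"dataFormat": "string", "resource": "path-to-data", "toolkit": "Demography",
     "datasourceName": "thedata", "version": [0, 0, 1]},
    {"dataFormat": "string", "resource": "path-to-data-2", "key": "value",
     "toolkit": "Demography", "datasourceName": "thedata", "version": [0, 0, 2]},
]


def filter_datasources(entries, **filters):
    """Keep the entries whose fields match every key/value pair in filters."""
    return [
        entry for entry in entries
        if all(entry.get(name) == value for name, value in filters.items())
    ]


matching = filter_datasources(datasource_map, key="value")
print([entry["resource"] for entry in matching])
```

Entries that lack the requested field, such as the first one here, are simply skipped, because `dict.get` returns `None` for a missing key.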

## Getting the datasource 

It is possible to retrieve either the metadata document (datasource document) or the data itself. 

To get the datasource document, use `getDatasourceDocument`:

In [20]:
import json

datasourceName = "thedata"
doc = tk.getDatasourceDocument(datasourceName=datasourceName)
print(json.dumps(doc.desc, indent=4))

{
    "key": "value",
    "toolkit": "Demography",
    "datasourceName": "thedata",
    "version": [
        0,
        0,
        2
    ]
}


If the version is not specified, the function returns
the datasource with the highest version.

To get a specific version of the datasource, pass the `version` argument:

In [21]:
doc = tk.getDatasourceDocument(datasourceName=datasourceName,version=(0,0,1))
print(json.dumps(doc.desc,indent=4))

{
    "toolkit": "Demography",
    "datasourceName": "thedata",
    "version": [
        0,
        0,
        1
    ]
}
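
What "highest version" means for version tuples can be illustrated in plain Python. The sketch below assumes that versions are compared element by element, which is how Python orders tuples. Whether hera resolves the highest version in exactly this way is an assumption. The `latest` helper and the sample documents (including version `[0, 0, 10]`) are hypothetical.

```python
documents = [
    {"datasourceName": "thedata", "version": [0, 0, 2], "resource": "path-to-data-2"},
    {"datasourceName": "thedata", "version": [0, 0, 10], "resource": "path-to-data-10"},
]


def latest(documents, name):
    """Return the document with the highest version among those called name."""
    candidates = [doc for doc in documents if doc["datasourceName"] == name]
    # Tuples compare element by element, so (0, 0, 10) is higher than (0, 0, 2).
    return max(candidates, key=lambda doc: tuple(doc["version"]))


print(latest(documents, "thedata")["version"])

# Comparing dotted strings instead orders them character by character,
# which ranks "0.0.2" above "0.0.10".
print(max("0.0.2", "0.0.10"))
```

This is one reason to keep versions as tuples of integers, as in the `version` parameter above, rather than as strings.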


The data itself is retrieved with the `getData` method of the document:

In [24]:
doc.getData()

'path-to-data'

A list of all the datasource documents is obtained with `getDatasourceDocumentsList`:

In [25]:
tk.getDatasourceDocumentsList()

[<Measurements: Measurements object>, <Measurements: Measurements object>]

It is also possible to get the datasource data directly. Since no version is given here, the data of the highest version is returned:

In [26]:
tk.getDatasourceData(datasourceName=datasourceName)

'path-to-data-2'

## Deleting a datasource

A datasource is deleted with `deleteDataSourceDocuments`, which returns the deleted documents:

In [28]:
tk.deleteDataSourceDocuments(datasourceName=datasourceName,version=(0,0,1))

[{'_id': {'$oid': '65a2dffb292b74d6310ca28d'},
  '_cls': 'Metadata.Measurements',
  'projectName': 'The-Project-Name',
  'desc': {'toolkit': 'Demography',
   'datasourceName': 'thedata',
   'version': [0, 0, 1]},
  'type': 'ToolkitDataSource',
  'resource': 'path-to-data',
  'dataFormat': 'string'}]

In [29]:
tk.getDataSourceTable()

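<table>
    <tr><td></td><td><b>dataFormat</b></td><td><b>resource</b></td><td><b>key</b></td><td><b>toolkit</b></td><td><b>datasourceName</b></td><td><b>version</b></td></tr>
    <tr><td>0</td><td>string</td><td>path-to-data-2</td><td>value</td><td>Demography</td><td>thedata</td><td>[0, 0, 2]</td></tr>
</table>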


<a id="toolkitList"></a>

# Available toolkits

The available toolkits, grouped by domain, are:

<table>
    <tr>
        <td colspan="2" align="center"><font size="+1">GIS</font></td>
    </tr>
    <tr>
        <td>Topography</td>
        <td>Manages topography</td>
    </tr>
    <tr>
        <td colspan="2" align="center"><font size="+1">Simulations</font></td>
    </tr>
    <tr>
        <td>LSM</td>
        <td>Stochastic Lagrangian simulation (Fortran)</td>
    </tr>
    <tr>
        <td>Hermes workflow</td>
        <td>The Hermes workflow toolkit</td>
    </tr>
    <tr>
        <td>OpenFOAM</td>
        <td>The OpenFOAM toolkit</td>
    </tr>    
    <tr>
        <td colspan="2" align="center"><font size="+1">Risk assessment</font></td>
    </tr>
    <tr>
        <td>Risk assessment</td>
        <td>Estimating the effects of dispersion</td>
    </tr>    
</table>
