Technical Documentation
The PROPTI scripts are written in Python 3.x. Emphasis has been put on compatibility with different operating systems (Mac OS, Linux and Windows) by using the os package. On the most basic level, the scripts provide definitions of data structures and elementary functions that are used throughout the package.
The scripts are open source, and participation to maintain the code or add further functionality is very welcome. If you work on the scripts, keep in mind that the PROPTI framework follows some code style conventions. In general, the style guide for Python code, PEP 8, is to be followed.
The PROPTI scripts are the most basic scripts used within the PROPTI framework. These are:
data_structures.py
basic_functions.py
spotpy_wrapper.py
propti_analyse.py
[[TOC]]
class propti.OptimiserProperties
propti.OptimiserProperties
is used to store information that needs to be provided to the optimiser itself. This would be information like what kind of optimisation algorithm to use, how many repetitions to perform or how to name the file the results are saved into.
The constructor of the OptimiserProperties
class accepts the following input:
propti.OptimiserProperties(algorithm: str = 'sceua',
repetitions: int = 1,
backup_every: int = 100,
ngs: int = None,
db_name: str = 'propti_db',
db_type: str = 'csv',
db_precision=np.float64,
num_subprocesses: int = 1,
mpi: bool = False)
-
algorithm
Defines the optimisation algorithm to be used. Default is set to SCEUA from SPOTPY. -
repetitions
Number of sampling repetitions to be performed by the algorithm. Note: The actual number of repetitions may differ slightly from the value entered here, depending on the algorithm. For now limited to the usage of SCEUA.
-
backup_every
For algorithms that support this functionality, this value defines when breakpoints are to be written. These breakpoints are used to restart the simulations in the event of a crash, or when the computing time token at a computing cluster has expired and the process is stopped.
-
ngs
Number of complexes that are to be used by the optimisation algorithm, if necessary. For now limited to the usage of SCEUA.
-
db_name
File name of the data base file that will be created during the optimisation process. -
db_type
File type of the data base. For now only comma separated value files (csv) are implemented.
-
db_precision
Defines the precision of the output data, stored in the data base. NumPy floats are used, with a default precision of 64 bit. -
num_subprocesses
Defines the number of sub-processes that are to be used for each repetition during the inverse modelling process. This is used when a parameter set is to be tested in multiple different simulation setups simultaneously. For example, two different irradiance levels in the Cone Calorimeter would lead to num_subprocesses=2.
-
mpi
This parameter is provided to SPOTPY to leverage the power of multiple computing cores via MPI, and thus speed up the execution of the optimisation algorithm. When set to False, sequential execution is performed on one core alone. For now limited to the usage of SCEUA.
The OptimiserProperties
class has the following public methods:
-
upgrade
This method upgrades legacy object instances with missing default values. Used to ensure compatibility of legacy inverse modelling projects with newer PROPTI versions. Use with care!
-
__str__
When called, it provides a human-readable output of the OptimiserProperties, based on predefined definitions coded into this method (thus not exposed to the user).
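The idea behind upgrade can be illustrated with a minimal sketch. It is not the PROPTI implementation: LegacyProperties is a hypothetical stand-in for an object that was stored by an older version, and the default values are simply taken from the constructor signature above.

```python
class LegacyProperties:
    """Hypothetical stand-in for an object stored by an older version."""

    # Defaults for attributes that are assumed to be missing in old objects.
    _DEFAULTS = {"backup_every": 100, "num_subprocesses": 1, "mpi": False}

    def __init__(self, algorithm="sceua", repetitions=1):
        self.algorithm = algorithm
        self.repetitions = repetitions

    def upgrade(self):
        """Add each missing attribute with its default; return the names added."""
        added = []
        for attr, default in self._DEFAULTS.items():
            if not hasattr(self, attr):
                setattr(self, attr, default)
                added.append(attr)
        return added


legacy = LegacyProperties(repetitions=500)
added = legacy.upgrade()
print(added)
print(legacy.repetitions, legacy.backup_every)
```

Existing values are left untouched, only missing attributes are filled in. This is also why care is needed: a silently added default may not match what the legacy project actually used.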
class propti.Parameter
propti.Parameter
is used to store general parameter information. This class is used for parameters that are worked on by the optimisation algorithm, as well as general information (meta data) that describes environmental conditions.
The constructor of the Parameter
class accepts the following input:
propti.Parameter(name: str,
units: str = None,
place_holder: str = None,
value: Union[float, int, str] = None,
distribution: str = 'uniform',
min_value: float = None,
max_value: float = None,
max_increment: float = None)
-
name
Character string that describes the parameter. Used for internal reference that is human readable. -
units
Character string that describes the measurement units. -
place_holder
Character string that is used in the template to mark the location where the parameter is to be written. If no value is provided, name is chosen.
-
value
Holds the current parameter value, typically as a float. Can be initialised with a specific value for optimisation algorithms that need a guess vector to start from. Multiple types are accepted, since the parameter could provide numerical values for computations or strings, e.g. for the naming of files.
-
distribution
Specifies a distribution by which the algorithm shall sample individual parameters during the inverse modelling process (IMP), if needed by the algorithm. Possible values: 'uniform'.
-
min_value
Lower limit of the range in which the algorithm is allowed to sample parameter values. -
max_value
Upper limit of the range in which the algorithm is allowed to sample parameter values. -
max_increment
Increment by which the algorithm is allowed to change the parameter value between simulations. This option is needed for some algorithms.
The Parameter
class has the following public methods:
-
create_spotpy_parameter
No functionality right now - WIP. -
upgrade
This method upgrades legacy object instances with missing default values. Used to ensure compatibility of legacy inverse modelling projects with newer PROPTI versions. Use with care!
-
__str__
When called, it provides a human-readable output of the parameter, based on predefined definitions coded into this method (thus not exposed to the user).
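As a minimal sketch of two of the documented behaviours, the place_holder fallback and sampling between min_value and max_value, consider the stand-in below. ParameterSketch and sample_uniform are hypothetical names introduced for illustration. In PROPTI the sampling itself is left to the optimisation algorithm.

```python
import random
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ParameterSketch:
    """Hypothetical stand-in that mirrors a few of the documented arguments."""

    name: str
    place_holder: Optional[str] = None
    value: Union[float, int, str, None] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def __post_init__(self):
        # Documented behaviour: fall back to the name if no place holder is given.
        if self.place_holder is None:
            self.place_holder = self.name

    def sample_uniform(self, rng: random.Random) -> float:
        """Draw a value between min_value and max_value (illustration only)."""
        return rng.uniform(self.min_value, self.max_value)


density = ParameterSketch(name="density", min_value=500.0, max_value=1500.0)
density.value = density.sample_uniform(random.Random(1))
print(density.place_holder, density.value)
```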
class propti.ParameterSet
propti.ParameterSet
is a container for the parameters used by the optimisation algorithm. All the meta data needs to be collected into a separate ParameterSet as well.
The constructor of the ParameterSet
class accepts the following input:
propti.ParameterSet(name: str = None,
params: list[Parameter] = None)
-
name
Optional label for the parameter set. Encouraged to be used for human-readable internal referencing and easier error tracking. -
params
Initial list of the parameters (deep copy).
The ParameterSet
class has the following public methods:
-
upgrade
This method upgrades legacy object instances with missing default values. Used to ensure compatibility of legacy inverse modelling projects with newer PROPTI versions. CAREFUL! Since lists, like params, will be initialised as [], it may cause unrecognised consequences.
-
update
Updates the already existing parameter set with a new one (other).
-
__len__
Returns the length (number of parameters) of the parameter set. -
append
Appends a new Parameter to the ParameterSet (as a deep copy).
-
__getitem__
Returns the Parameter at a given index of the ParameterSet.
-
__str__
When called, it provides a human-readable output of the ParameterSet, based on predefined definitions coded into this method (thus not exposed to the user).
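The deep-copy behaviour of append can be sketched as follows. ParameterSetSketch is a hypothetical stand-in, and a plain dictionary takes the place of a Parameter. The point is only that later changes to the original object do not reach the copy stored in the set.

```python
import copy


class ParameterSetSketch:
    """Hypothetical stand-in showing the deep-copy semantics of a container."""

    def __init__(self, name=None, params=None):
        self.name = name
        # Copy the initial list so later changes by the caller do not leak in.
        self.parameters = copy.deepcopy(params) if params else []

    def append(self, p):
        self.parameters.append(copy.deepcopy(p))

    def __len__(self):
        return len(self.parameters)

    def __getitem__(self, item):
        return self.parameters[item]


original = {"name": "density", "value": 1000.0}  # dict as parameter stand-in
ps = ParameterSetSketch(name="material")
ps.append(original)
original["value"] = -1.0  # does not affect the stored copy
print(len(ps), ps[0]["value"])
```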
class propti.DataSource
The class
propti.DataSource
is used as a container for experimental and simulation data.
Meta data that identifies the desired data is stored, namely the file name and the labels needed to read a data series, as well as the data itself. The pandas library works in the back end to extract the information, thus the labels follow the pandas conventions.
The constructor of the DataSource
class accepts the following input:
propti.DataSource(file_name: str = None,
header_line: int = None,
label_x: str = None,
label_y: str = None,
column_x: int = None,
column_y: int = None,
x_values: list = None,
y_values: list = None,
factor: float = None,
offset: float = None)
-
file_name
Name of the file which contains the desired data, simulation or experiment. -
header_line
Row that contains the column labels. Follows the conventions of pandas data frames (which are working in the back end). -
label_x
Label of the column which contains the information of the x-axis (pandas data frames). -
label_y
Label of the column which contains the information of the y-axis (pandas data frames). -
column_x
Index of the column containing the data series of the x_values (not functional, yet). -
column_y
Index of the column containing the data series of the y_values (not functional, yet). -
x_values
Data of the x-axis (based on label_x or column_x).
-
y_values
Data of the y-axis (based on label_y or column_y).
-
factor
Factor to scale the data on-the-fly. -
offset
Offset to shift the data on-the-fly.
The DataSource
class has the following public methods:
-
__str__
When called, it provides a human-readable output of the DataSource, based on predefined definitions coded into this method (thus not exposed to the user).
class propti.Relation
propti.Relation
utilises the propti.DataSource
to connect specific experimental data with simulation results. Later, this created relationship is referred to when the fitness of a parameter set is to be evaluated. Multiple relations can be assigned, for example to account for different repetitions under the same experimental conditions.
The constructor of the Relation
class accepts the following input:
propti.Relation(model: DataSource = None,
experiment: DataSource = None,
fitness_method: FitnessMethodInterface = None,
weight: float = 1.0)
-
model
Data series that was produced by the model (simulation), which is to be compared to the experimental data. Will be initialised as an empty propti.DataSource, if no input is provided.
-
experiment
Data series of the experimental (target) data, which is to be compared to the model data. Will be initialised as an empty propti.DataSource, if no input is provided.
-
fitness_method
Fitness method used to compare experimental and model data. Available methods:
-
FitnessMethodRMSE(n_points = None, x_def_range = None, scale_fitness = True)
-
FitnessMethodThreshold(threshold_type, threshold_value = None, threshold_range = None, scale_fitness = True)
-
-
weight
Weight factor for the total fitness calculation.
The Relation
class has the following public methods:
-
read_data
Reads the specified experimental or model data and stores it in the Relation object.
-
map_to_def
Maps the data series to the definition range. Takes factor and offset, defined in DataSource, into account. Furthermore, it performs a linear interpolation for the mapping.
-
__str__
When called, it provides a human-readable output of the Relation, based on predefined definitions coded into this method (thus not exposed to the user).
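What map_to_def does can be sketched with the standard library alone. This is an illustration, not the PROPTI code: the function names are hypothetical, the order of operations (value times factor, plus offset) is an assumption, and values outside the data range are simply clamped to the first or last data point.

```python
from bisect import bisect_right


def interp_linear(x, xs, ys):
    """Linearly interpolate at x; xs must be sorted in ascending order."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def map_to_definition_range(xs, ys, x_def, factor=1.0, offset=0.0):
    """Scale and shift ys (assumed: y * factor + offset), then resample on x_def."""
    scaled = [y * factor + offset for y in ys]
    return [interp_linear(x, xs, scaled) for x in x_def]


time = [0.0, 10.0, 20.0]
mass = [1.0, 0.5, 0.0]
x_def = [0.0, 5.0, 10.0, 15.0, 20.0]
print(map_to_definition_range(time, mass, x_def, factor=100.0))
# prints [100.0, 75.0, 50.0, 25.0, 0.0]
```

Mapping both the model and the experimental data series onto the same definition range is what makes a point-by-point comparison by the fitness method possible.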
class propti.SimulationSetup
A propti.SimulationSetup
is a specific set of information that completely describes an intended simulation within an inverse modelling run. It draws upon the classes that have been described above, and further meta data is added as well. In general, it merges information on where the simulation is to be executed (working directory), what simulation software, template and data shall be used, as well as where to store the results.
One could regard it as an experimental setup, with information on a sample and the conditions it is to be tested in. The same material, tested in the same apparatus but at different conditions, would require different SimulationSetups.
The constructor of the SimulationSetup
class accepts the following input:
propti.SimulationSetup(name: str,
work_dir: os.path = os.path.join('.'),
model_template: os.path = None,
model_input_file: os.path = 'model_input.file',
model_parameter: ParameterSet = None,
model_executable: os.path = None,
execution_dir: os.path = None,
execution_dir_prefix: os.path = None,
best_dir: os.path = 'best_para',
analyser_input_file: os.path = 'input_analyser.py',
relations: List[Relation] = None)
-
name
Human-readable identifier of the SimulationSetup.
-
work_dir
The working directory, where all the data connected to this SimulationSetup is stored.
-
model_template
Points to the simulation input file template. -
model_input_file
Name of the simulation input file. It will be created from model_template and the model_parameter.
-
model_parameter
The ParameterSet that contains the parameters that are to be worked on by the optimisation algorithm. Note: parameters that describe the environment (experimental conditions), like heat flux, are provided elsewhere.
-
model_executable
The simulation software to be used to perform the simulation. This argument will be provided to the command line down the line. -
execution_dir
Location where the model execution (simulation) is performed. In this directory PROPTI creates temporary directories where the model execution is performed. Each of these directories gets a randomised name, to avoid conflicts during the parallel execution of multiple simulations. When the simulation is finished, the information specified in the Relation will be extracted and the remaining files deleted automatically. The extracted information is stored in the data base file.
-
execution_dir_prefix
Identifier for the working sub-directories, prepended to the random part of the name. Allows easier error tracking. -
best_dir
Directory for performing simulation(s) with the best parameter set, after the conclusion of the overall inverse modelling process. WIP - not functional yet.
-
analyser_input_file
File name used as input for the analyser. WIP - not functional yet. Likely to be deprecated in the future, since a different approach is being pursued for now.
-
relations
List of the relations between experimental and model data.
The SimulationSetup
class has the following public methods:
-
upgrade
This method upgrades legacy object instances with missing default values. Used to ensure compatibility of legacy inverse modelling projects with newer PROPTI versions. Use with care!
-
__str__
When called, it provides a human-readable output of the SimulationSetup, based on predefined definitions coded into this method (thus not exposed to the user).
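The interplay of execution_dir and execution_dir_prefix can be sketched with the tempfile module from the standard library. This is an assumption about how such directories could be handled, not a description of the PROPTI internals. The function name and the prefix "cone_50kW_" are made up for illustration.

```python
import os
import shutil
import tempfile


def run_in_temporary_dir(execution_dir, prefix, input_text):
    """Create a uniquely named sub-directory, use it, then remove it."""
    os.makedirs(execution_dir, exist_ok=True)
    # mkdtemp appends a random part to the prefix, so parallel runs cannot collide.
    run_dir = tempfile.mkdtemp(prefix=prefix, dir=execution_dir)
    try:
        input_path = os.path.join(run_dir, "model_input.file")
        with open(input_path, "w") as f:
            f.write(input_text)
        # A real run would start the simulation here and read its results.
        with open(input_path) as f:
            extracted = f.read()
    finally:
        shutil.rmtree(run_dir)  # delete the remaining files
    return os.path.basename(run_dir), extracted


base = tempfile.mkdtemp()  # stands in for execution_dir
dir_name, result = run_in_temporary_dir(base, "cone_50kW_", "DENSITY=1000.0")
print(dir_name.startswith("cone_50kW_"), result)
shutil.rmtree(base)
```

The prefix makes it easy to see which setup a leftover directory belongs to, should a run crash before the clean-up step.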
class propti.SimulationSetupSet
The propti.SimulationSetupSet
class merges all the different SimulationSetup
s. Thus, a complete set of information, defining the overall inverse modelling process, is obtained.
The constructor of the SimulationSetupSet
class accepts the following input:
propti.SimulationSetupSet(name: str,
setups: List[SimulationSetup] = None)
-
name
Human-readable identifier of the SimulationSetupSet.
-
setups
List of the different SimulationSetups.
The SimulationSetupSet
class has the following public methods:
-
upgrade
This method upgrades legacy object instances with missing default values. Used to ensure compatibility of legacy inverse modelling projects with newer PROPTI versions. CAREFUL! Since lists, like params, will be initialised as [], it may cause unrecognised consequences.
-
__len__
Computes and returns the length of the set (number of elements). -
append
Appends a deep copy of the SimulationSetup to the set.
-
__getitem__
Returns a specific SimulationSetup.
-
__str__
When called, it provides a human-readable output of the SimulationSetupSet, based on predefined definitions coded into this method (thus not exposed to the user).
class propti.Version
The propti.Version
class is used to provide information on the current PROPTI version, as well as on the utilised simulation software. The user does not need to interact with it directly. When propti_prepare.py is executed, propti.Version will be called automatically. The different version numbers are collected and stored in the pickle.init file. From there the version information can be extracted. This makes it possible to keep track of the programme versions a specific inverse modelling process was performed with. This information may be of interest for tracking down errors. It is also beneficial to expose the software versions for publications. For example, plots can get a label containing information on what software was used to create the data for this specific plot.
Note: This class is heavy WIP!
The constructor of the Version
class contains the following parameters:
propti.Version(flag_propti = 0,
flag_exec = 0,
ver_propti = self.propti_versionCall(),
ver_exec = self.exec_versionCall(),
ver_spotpy = spotpy.__version__)
-
flag_propti
-
flag_exec
-
ver_propti
Stores version information of PROPTI. When initialised, propti_version_call is called.
-
ver_exec
Stores version information of the simulation software executable. When initialised, exec_version_call is called.
-
ver_spotpy
Stores version information of SPOTPY. When initialised, the built-in attribute spotpy.__version__ is read.
The Version
class has the following public methods:
-
__str__
When called, it provides a human-readable output of the Version, based on predefined definitions coded into this method (thus not exposed to the user).
The Version
class has the following private methods:
-
propti_version_call
Reads the PROPTI version text file and extracts the version number. -
exec_version_call
Determines the version of the simulation software executable. Note: Right now it is pretty much tailored to FDS.
-
__repr__
propti.create_input_file()
Creates input files for the simulation software. Information on how to construct the file is taken from a simulation setup. The input file will be written to the working directory.
-
setup
The SimulationSetup from which the input file is to be created.
-
work_dir
Flag to determine if the regular inverse modelling process is to be performed, or if the best parameter set is to be simulated, range: ['execution', 'best'].
-
return
Writes an input file to the directory specified by the SimulationSetup.
propti.write_input_file()
Writes data to a file. The input is expected to be a string. The file is created in a specified directory.
-
content
Information to be written to a file, expected to be a string.
-
file_path
File name and path of file to be created. -
return
File written to specified location.
propti.fill_place_holder()
Takes a string that contains specific markers, or place holders. The string gets parsed and the markers are compared with information provided by a ParameterSet. When a matching place holder is found, it is replaced by the parameter value needed for the simulation.
By convention, the markers are to be encapsulated by the '#' character, for example: '#example_marker#'. The fill_place_holder function adds this encapsulation automatically when it searches for a place holder. Note: The user needs to take care of the encapsulation in the template file when it is prepared!
-
template_content
A string containing markers that are encapsulated by the '#' character. -
paras
A ParameterSet from which the parameter information is taken and inserted into the template.
-
return
A string where placeholders have been exchanged by parameter values.
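The marker replacement can be sketched in a few lines. This is a simplified illustration and not the PROPTI implementation: fill_place_holder_sketch is a hypothetical name, a plain dictionary stands in for the ParameterSet, and the template line is only an example of what a simulation input file might contain.

```python
def fill_place_holder_sketch(template_content, paras):
    """Replace every '#place_holder#' marker by the matching parameter value."""
    result = template_content
    for place_holder, value in paras.items():
        # The '#' characters are added here, so the names in paras stay bare.
        result = result.replace("#" + place_holder + "#", str(value))
    return result


template = "&MATL ID='sample', DENSITY=#density#, CONDUCTIVITY=#conductivity# /"
paras = {"density": 1000.0, "conductivity": 0.2}  # place holder -> value
print(fill_place_holder_sketch(template, paras))
# prints &MATL ID='sample', DENSITY=1000.0, CONDUCTIVITY=0.2 /
```

Markers without a matching parameter are left untouched in this sketch, which makes forgotten parameters easy to spot in the written input file.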
propti.read_template()
Reads a specified text file. Returns the content as a string.
-
filename
Name of the file to be read. -
return
File content as string.
propti.run_simulations()
Takes information from the SimulationSetupSet to find the necessary simulation input files and start the simulation(s). Multiple simulation setups can be processed at once, which is targeted at cases where the user wants to test the behaviour of a parameter set in different experimental conditions at once.
Distributes the information to either run_simulation_serial or run_simulation_mp, which actually handle the simulation execution.
-
setups
Set of simulation setups, describes what simulations need to be executed. -
num_subprocesses
Defines how many sub-processes are to be used per repetition. -
best_para_run
Flag to switch between an inverse modelling run or simulation of best parameter set. WIP - not functional, yet.
-
return
None.
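The distribution between serial and parallel execution could look roughly like the sketch below. The condition used for the switch (a single worker means serial execution) is an assumption, the function names are hypothetical, and threads from concurrent.futures stand in for the sub-processes to keep the example portable.

```python
from concurrent.futures import ThreadPoolExecutor


def run_one(setup_name):
    """Placeholder for running a single simulation setup."""
    return "finished " + setup_name


def run_all(setup_names, num_subprocesses=1):
    """Run setups one by one, or concurrently if more workers are requested."""
    if num_subprocesses == 1:
        return [run_one(name) for name in setup_names]
    # Threads stand in for sub-processes here (assumption for portability).
    with ThreadPoolExecutor(max_workers=num_subprocesses) as pool:
        return list(pool.map(run_one, setup_names))


setups = ["cone_25kW", "cone_50kW"]
print(run_all(setups, num_subprocesses=1))
print(run_all(setups, num_subprocesses=2))
```

Both calls return the results in the order of the setups, so the code that collects the results does not need to know which path was taken.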
propti.run_simulation_serial()
Executes the simulation of a single simulation setup. Takes information from the SimulationSetup
to find the necessary simulation input files and start the simulation.
-
setup
Simulation setup, describes what simulation needs to be executed. -
best_para_run
Flag to switch between an inverse modelling run or simulation of best parameter set. WIP - not functional, yet.
-
return
None.
propti.run_simulation_mp()
Executes the simulation of multiple simulation setups in parallel. Takes information from the SimulationSetupSet to find the necessary simulation input files and start the simulation(s). Multiple simulation setups can be processed at once, which is targeted at cases where the user wants to test the behaviour of a parameter set in different experimental conditions at once.
-
setups
Set of simulation setups, describes what simulations need to be executed. -
num_threads
Defines how many sub-processes are to be used per repetition. -
return
None.
propti.extract_simulation_data()
WIP - thus no further information is provided here at this point.
propti.map_data()
WIP - thus no further information is provided here at this point.
class propti.SpotpySetup
propti.SpotpySetup
is used to handle the communication with SPOTPY. It is responsible for extracting the proper information from SimulationSetupSets and transferring it to SPOTPY, as well as transferring guess vectors of new parameter sets back to PROPTI. For now it is mainly tailored to the SCEUA algorithm that is implemented in SPOTPY.
The constructor of the SpotpySetup
class accepts the following input:
propti.SpotpySetup(params: ParameterSet,
setups: SimulationSetupSet,
optimiser: OptimiserProperties)
-
params
Parameter sets that provide information on the parameters which shall be optimised. -
setups
Information that describes the inverse modelling run. -
optimiser
Parameters passed to the optimiser, like the algorithm to be used or the distribution for guess vector sampling.
The SpotpySetup
class has the following public methods:
-
parameters
Provides a guess vector (parameter set) from SPOTPY. -
simulation
Starts the simulation of an individual parameter set that has been generated by SPOTPY, based on information provided by the SimulationSetupSet. Creates input files from templates, fills in parameter values from SPOTPY, executes the simulation, collects the simulation results, deletes the remaining files and gives the simulation results back. Needs a parameter list (vector) as input.
-
evaluation
Reads the target (experimental) data used for the evaluation of the performance of the parameter set against the target data. It returns an array of the target data. -
objectivefunction
Compares the simulation results with the target (experimental) data, by utilising the root mean square error (RMSE) implemented in SPOTPY.
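For reference, the root mean square error itself can be written down in a few lines. The sketch below is a hand-written version for illustration, not the SPOTPY implementation. It assumes both data series have already been mapped to the same definition range, so that they have the same length.

```python
import math


def rmse(evaluation, simulation):
    """Root mean square error between target data and simulation results."""
    if len(evaluation) != len(simulation):
        raise ValueError("Both data series need the same number of points.")
    squared = [(e - s) ** 2 for e, s in zip(evaluation, simulation)]
    return math.sqrt(sum(squared) / len(squared))


target = [1.0, 2.0, 3.0]
model = [1.0, 2.0, 5.0]
print(rmse(target, model))
```

A value of zero means that model and target data coincide at every point, and larger values indicate a poorer fit.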
The script propti_analyse.py
is a collection of some basic tools to aid the user in accessing their data more easily. This collection is not meant to be the solution to all issues the user might come across. It is rather a starting point for new users, to develop tools they need for themselves.
As a basic concept, the tools are set up so that they query the pickle file to get the necessary information. This could be the labels for the different parameters, the number of individuals within a generation of a SCEUA run, and similar information. This ability to fall back on the pickle files makes the tools quite dynamic.
To interact with this script, the user uses the command line and provides a parameter that calls the desired functionality. For example:
python3 path/to/propti/propti_analyse.py --dump_plots .
This calls Python 3 with propti_analyse.py, and the --dump_plots parameter creates predefined plots of the inverse modelling process (see below). Note the period at the end: it tells propti_analyse.py where to look for the pickle file. In this example it is thus assumed that the user starts in the main directory of the IMP run.
Note: This script is very much work in progress and may change a lot in the future. Here, only the most robust tools are described, which may also be the most useful to new users.
The --dump_plots parameter creates a number of plots, focused on the SCEUA. The following plots are created:
- Scatter plot of the fitness values over the whole IMP run
- Scatter plots of the single parameter values during the IMP, one for each parameter
- Scatter plots for the "best" parameter value per generation, for each parameter
- Boxplots of the fitness values, each generation represented as a boxplot
When using the restart functionality, where the batch script writes markers into the database file (csv), this function cleans the database file. It is focussed on the SCEUA. It will create two files: one with only the markers removed, and a second one with the markers removed as well as the partly completed generations that might be left over after a restart following a system crash.
It is rather simple, in that it just counts the lines between the restart markers, keeps only the first n generations times m individuals, and stops when no full generation is left.
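The counting described above can be sketched as follows. This is an illustration under assumptions, not the actual tool: the marker text "#Restart#" and the function name are made up, the header line of the csv file is assumed to be handled separately, and the rows are shortened to simple labels.

```python
def clean_database(lines, marker, individuals_per_generation):
    """Return (lines without markers, lines without markers and partial generations)."""
    no_markers = [line for line in lines if line != marker]

    full_generations = []
    segment = []
    for line in lines + [marker]:  # the trailing marker flushes the last segment
        if line != marker:
            segment.append(line)
            continue
        # Keep only as many lines as fit into complete generations.
        n_full = len(segment) // individuals_per_generation
        full_generations.extend(segment[: n_full * individuals_per_generation])
        segment = []
    return no_markers, full_generations


rows = ["r1", "r2", "r3", "#Restart#", "r4", "r5"]
without_markers, only_full = clean_database(rows, "#Restart#", 2)
print(without_markers)
print(only_full)
```

With two individuals per generation, the row "r3" belongs to a generation that was interrupted by the restart, so it is dropped from the second result.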