Commit 07534a5 (parent d03505e): updated output documentation and some bugfixes.
Showing 5 changed files with 41 additions and 88 deletions.

Output
=========================

The model can produce a range of output files. All output is stored in the output folder as specified in the configurations-file (cfg-file).

.. note::

    In addition to the output files listed below, the model settings file (cfg-file) is automatically copied to the output folder.

.. important::

    Not all model types provide the output mentioned below. If the 'leave-one-out' or 'single variable' model is selected, only the metrics are stored to a csv-file.

.. important::

    Most of the output can only be produced when running a reference model, i.e. when comparing the predictions against observations.
    If running a prediction model, only the chance of conflict per polygon is stored to file.

Selected polygons
------------------
A shp-file named ``selected_polygons.shp`` contains all polygons remaining after the selection procedure.

Selected conflicts
-------------------
The shp-file ``selected_conflicts.shp`` contains all conflict data points remaining after the selection procedure.

Sampled variable and conflict data
-----------------------------------
During model execution, data is sampled per polygon and time step.
This data contains the geometry and ID of each polygon as well as unscaled variable values (X) and a boolean identifier indicating whether conflict took place or not (Y).
If the model is re-run without changing the data or how it is sampled, the resulting XY-array is stored to ``XY.npy``. This file can be loaded again with ``np.load()``.

If making projections, the Y-part is not available. The remaining X-data is still written to a file ``X.npy``.

.. note::

    ``np.load()`` returns an array. This can be further processed with e.g. pandas.
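
A minimal sketch of loading such a file and wrapping it in a pandas dataframe; the array values and column names here are illustrative stand-ins, not the model's actual layout:

```python
import numpy as np
import pandas as pd

# Stand-in for the sampled XY-array; in a real run this file is written
# by the model during execution (column layout assumed for illustration).
XY = np.array([[1.0, 0.3, 0.7, 1.0],
               [2.0, 0.5, 0.1, 0.0],
               [3.0, 0.9, 0.4, 1.0]])
np.save("XY.npy", XY)

# np.load() returns a plain NumPy array ...
arr = np.load("XY.npy")

# ... which can be further processed with e.g. pandas.
df = pd.DataFrame(arr, columns=["poly_ID", "var_1", "var_2", "conflict"])
n_conflicts = int(df["conflict"].sum())
```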

ML classifier
--------------
At the end of a reference run, the chosen classifier is fitted with all available XY-data.
To be able to re-use the classifier (e.g. to make predictions), it is pickled to ``clf.pkl``.
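
The pickled classifier can later be restored with ``pickle.load()``. A minimal sketch of the round-trip, using a simple stand-in object instead of an actual fitted estimator:

```python
import pickle

# Stand-in for the fitted classifier; in the model this would be the
# estimator fitted with all available XY-data.
clf = {"classifier": "stand-in", "n_features": 2}

# The classifier is pickled to clf.pkl ...
with open("clf.pkl", "wb") as f:
    pickle.dump(clf, f)

# ... and can be loaded again, e.g. at the start of a projection run.
with open("clf.pkl", "rb") as f:
    clf_restored = pickle.load(f)
```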

All predictions
------------------
Per model run, a fraction of the total XY-data is used to make a prediction.
To be able to analyse model output, all predictions (stored as pandas dataframes) made per run are appended to a main output-dataframe.
This dataframe is the basis of all further analyses.
When stored to file, it can become rather large.
Therefore, the dataframe is converted to a npy-file (``raw_output_data.npy``). This file can be loaded again with ``np.load()``.

.. note::

    ``np.load()`` returns an array. This can be further processed with e.g. pandas.

Evaluation metrics
-----------------------
Per model run, a range of metrics are computed to evaluate the predictions made.
They are all appended to a dictionary and saved to the file ``evaluation_metrics.csv``.
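
The csv-file can be read back with pandas for further analysis. A sketch with made-up metric values; the actual columns depend on the metrics computed per run:

```python
import io

import pandas as pd

# Made-up snippet standing in for evaluation_metrics.csv: one row of
# metrics per model run.
csv_text = """Accuracy,Precision,Recall
0.81,0.75,0.70
0.79,0.72,0.68
"""

metrics = pd.read_csv(io.StringIO(csv_text))

# Average each metric over all runs.
mean_accuracy = metrics["Accuracy"].mean()
```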

ROC-AUC
--------
To be able to determine the mean ROC-AUC score and its standard deviation, the required data is stored to csv-files.
``ROC_data_tprs.csv`` contains the true positive rates per evaluation, and ``ROC_data_aucs.csv`` the area-under-curve values per run.
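
The mean and standard deviation of the ROC-AUC score can then be computed from the stored values. A sketch with hypothetical AUC values:

```python
import numpy as np

# Hypothetical area-under-curve values, one per model run, as would be
# stored in ROC_data_aucs.csv.
aucs = np.array([0.82, 0.78, 0.85, 0.80])

# Mean and standard deviation of the ROC-AUC score over all runs.
mean_auc = aucs.mean()
std_auc = aucs.std()
```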

Overview of output files
-------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 45 35

   * - File name
     - Description
     - Note
   * - ``selected_polygons.shp``
     - Shapefile containing all remaining polygons after the selection procedure
     - n/a
   * - ``selected_conflicts.shp``
     - Shapefile containing all remaining conflict points after the selection procedure
     - n/a
   * - ``XY.npy``
     - NumPy-array containing geometry, ID, and scaled sample data (X) and target data (Y)
     - can be provided in cfg-file to save time in a next run; file can be loaded with ``numpy.load()``
   * - ``X.npy``
     - NumPy-array containing geometry, ID, and scaled sample data (X)
     - only written in a projection run; file can be loaded with ``numpy.load()``
   * - ``clf.pkl``
     - Pickled classifier fitted with the entirety of the XY-data
     - needed to perform a projection run; file can be loaded with ``pickle.load()``
   * - ``raw_output_data.npy``
     - NumPy-array containing every single prediction made in the reference run
     - contains multiple predictions per polygon; file can be loaded with ``numpy.load()``
   * - ``evaluation_metrics.csv``
     - Various evaluation metrics determined per repetition of the split-sample test
     - file can e.g. be loaded with ``pandas.read_csv()``
   * - ``ROC_data_tprs.csv``
     - True positive rates per repetition of the split-sample test
     - file can e.g. be loaded with ``pandas.read_csv()``; data can be used to plot the ROC-curve
   * - ``ROC_data_aucs.csv``
     - Area-under-curve values per repetition of the split-sample test
     - file can e.g. be loaded with ``pandas.read_csv()``; data can be used to plot the ROC-curve
   * - ``output_per_polygon.shp``
     - Shapefile containing resulting conflict risk estimates per polygon
     - for further explanation, see below

Conflict risk per polygon
---------------------------
At the end of all model repetitions, the resulting output data frame contains multiple predictions for each polygon.
By aggregating results per polygon, it is possible to assess model output spatially.

The following output metrics are calculated per polygon and saved to ``output_per_polygon.shp``:

all data
^^^^^^^^^
1. The number of predictions made per polygon;
2. The number of observed conflicts per polygon;
3. The number of predicted conflicts per polygon;
4. The fraction of correct predictions (*FOP*), defined as the ratio of the number of correct predictions over the total number of predictions made;
5. The chance of conflict (*COC*), defined as the ratio of the number of conflict predictions over the total number of predictions made.

These metrics are determined based on the entire data set at the end of the run, i.e. without splitting it into chunks.
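
The aggregation can be sketched with pandas ``groupby``, using a toy prediction data frame; the column names here are illustrative, not the model's actual ones:

```python
import pandas as pd

# Toy version of the main output data: one row per prediction, with the
# polygon ID, the observed conflict (y_test) and the prediction (y_pred).
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2],
    "y_test": [1, 0, 1, 0, 0, 1],
    "y_pred": [1, 0, 0, 0, 1, 1],
})
df["correct"] = (df["y_test"] == df["y_pred"]).astype(int)

per_poly = df.groupby("ID").agg(
    n_predictions=("y_pred", "size"),   # 1. predictions made per polygon
    n_observed=("y_test", "sum"),       # 2. observed conflicts
    n_predicted=("y_pred", "sum"),      # 3. predicted conflicts
    FOP=("correct", "mean"),            # 4. fraction of correct predictions
)
# 5. chance of conflict: conflict predictions over all predictions made.
per_poly["COC"] = per_poly["n_predicted"] / per_poly["n_predictions"]
```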

k-fold analysis
^^^^^^^^^^^^^^^^
The model is repeated several times to eliminate the influence of how the data is split into training and test samples.
As such, the accuracy per run and polygon will differ.

To account for that, the resulting data set containing all predictions at the end of the run is split into k chunks.
Subsequently, the mean, median, and standard deviation of the fraction of correct predictions are determined from the k chunks.

The resulting shp-file is named ``output_kFoldAnalysis_per_polygon.shp``.
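
The chunked evaluation can be sketched with ``numpy.array_split``; the per-prediction correctness values below are made up:

```python
import numpy as np

# Made-up per-prediction correctness values (1 = correct) for one polygon.
correct = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# Split the predictions into k chunks ...
k = 5
chunks = np.array_split(correct, k)

# ... and compute the fraction of correct predictions per chunk, plus its
# mean, median, and standard deviation over the k chunks.
fop_per_chunk = np.array([chunk.mean() for chunk in chunks])
fop_mean = fop_per_chunk.mean()
fop_median = np.median(fop_per_chunk)
fop_std = fop_per_chunk.std()
```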

.. important::

    For projection runs, only the COC can be determined, as no conflict observations are used or available.

In addition to these shp-files, various plots can be stored by using the provided plot functions. The plots are stored in the output directory too.
Note that the plot settings cannot yet be fully controlled via those functions, i.e. they are intended mainly for debugging.
To create custom-made plots, rather use the shp-files and csv-files.