r.edm.eval: port shell/R script to Python (#1001)

The addon is rewritten in Python. This includes some changes in parameter names and functionality to anticipate some additional functionality. Added option to compute stats for multiple prediction layers. Added example to illustrate this option.
OSGeo · Jan 12, 2024 · 73bd437 · 73bd437
1 parent e978f60
commit 73bd437
Show file tree

Hide file tree

Showing 5 changed files with 793 additions and 502 deletions.
diff --git a/src/raster/r.edm.eval/r.edm.eval.html b/src/raster/r.edm.eval/r.edm.eval.html
@@ -1,36 +1,248 @@
 <h2>DESCRIPTION</h2>
 
-<p><em>r.edm.eval</em> is an environmental distribution model evaluation script. This addon is a GRASS GIS wrapper for an R script. The function returns a number of evaluation statistics (on the command line and as a text file with user defined name), including the AUC, maximum sum of specificity and sensitivity (spec. + sens), the maximum true skill statistic (TSS - which is just spec + sens - 1), and the maximum kappa. For all three statistics, the corresponding probability threshold values are also given.
+The <em>r.roc</em> addon returns a few evaluation statistics, 
+including the area under the curve (AUC), the maximum true skill 
+statistic (TSS), and the maximum kappa. For all three statistics, the 
+corresponding probability threshold values are also given. Optionally, 
+it draws the receiver operating characteristic (ROC) curve.
+
+<p>
+The addon takes as input a raster layer with 
+observations(<em>observations</em>) and one or more 
+<em>predictions</em> layers. The <em>observations</em> layer 
+should be encoded with 1 (presence/cases) and 0 (absences/controls). 
+The (<em>predictions</em> layer gives the predicted values (e.g., the 
+outcome of a model). This typically are probability or suitability 
+scores between 0 and 1, but it will work on other scales too. With two 
+or more <em>predictions</em> layers, statistics are calculated for each 
+prediction layer separately. This allows one to compare predictions of 
+different models.
+
+<p>
+By default, the prediction layer is divided in 200 bins (the user can 
+change this number). For each bin, the module computes the cumulative 
+true and false positives (TP, FP), true and false negatives (TN, FN), the 
+true and false positive rate (TPR, FPR), the true negative rate (TNR), 
+Cohen's kappa and the true skill statistic. 
+
+<div class="code"><pre>
+TPR = TP / (TP + FN)
+</pre>
+</div>
+<div class="code"><pre>
+FPR = FP / (FP + TN)
+</pre>
+</div>
+<div class="code"><pre>
+TNR = TN / (TN + FP)
+</pre>
+</div>
+<div class="code"><pre>
+TSS = TPR - FPR
+</pre>
+</div>
+<div class="code"><pre>
+Kappa = 2 * (TP * TN - FN * FP) / ((TP + FP) * (FP + TN) + (TP + FN) * (FN + TN)))
+</pre>
+</div>
+
+<p>
+These values can be saved to a CSV file, together with the 
+corresponding upper and lower boundaries of the bins values. These can 
+be used to calculate additional statistics.
+
+<p>
+With the <em>-b</em> flag, the module will use the fraction of 
+points that is predicted to be presence/true instead of the false 
+positive rate to create the ROC curve and calculate the AUC. Use this 
+when the predictions are created using a model that works with 
+background points instead of (pseudo-)absence points (like Maxent).
+
+<p>
+A problem with the AUC is that it varies with the spatial extent used 
+to select background points (Lobo et al. 2008, Jiménez-Valverde 2012). 
+Generally, the larger that extent, the higher the AUC value. Therefore, 
+AUC values are generally biased and cannot be directly compared. An 
+admittedly simple way to evaluate this issue of changing AUC values 
+with changing extents is to only use (pseudo-)absence or background 
+points within certain distances of the presence points (raster cells). 
+To do so, the user can use the <em>buffer</em> parameter to limit the 
+absence/background points that are used to calculate the various 
+statistics to those within a certain distance from the presence 
+locations/areas. 
+
+<p>
+The user can set the required ratio of presence and total points to be 
+used in the evaluation (<em>preval</em>; value between 0 an 1). This 
+allows the user to test how sensitive the evaluation statistics are for 
+the prevalence of presence points. 
 
-<p>In addition, a csv file (with user-defined name) is created with the number of true and false positives (TP,FP), true and false negatives (TN,FN), the true and false positive rate (TPR,FPR), the true negative rate (TNR) and the three statistics mentioned above are given. The user can use this to create for example the receiver operating characteristic (ROC) curve and other evaluation statistics in her/his favorite software.
-
-<p>The observed distribution map should be a binary map, with 1 for presence and 0 for absence. The modeled distribution normally contains probability scores between 0 and 1, but it will work on other scales too.
+<h2>NOTES</h2>
 
-<p>It is fairly common to use background rather than (pseudo-)absence points in species distribution modeling. To evaluate models that are built using presence and background points, there is the option to add the presence points to the absence points (flag b). If selected, the combined presence-absence points will be used instead of the absence points.
+The module expects an <em>reference</em> raster layer, encoded with 1 
+(presence/cases) and 0 (absences/controls). Optionally, the user can 
+define other values using the <em>absence</em> and <em>presence</em> 
+parameters. If the layer contains more than 2 unique values, or if the 
+layer is not of the type CELL (integer), the module will terminate with 
+a warning.
+
+<p>
+The selection of the best evaluation statistic depends on your data and 
+what you want to test. This module provides a few only, but the user can 
+calculate his/her own using the table in the CSV file. See the 
+References list for a few useful references in that respect.
+
+<p>
+This module depends on the Python <a 
+href="https://pandas.pydata.org/">Pandas library.</a>
+
+<h2>EXAMPLES</h2>
+
+Run this example in the "landsat" mapset of the North Carolina sample 
+data set location. First, create a binary map with herbaceous land cover 
+(1) and other (0). Next, create a raster layer with training pixels 
+using <em>r.learn.train</em> and <em>r.learn.predict</em> from the <a 
+href="https://grass.osgeo.org/grass-stable/manuals/addons/r.learn.ml2.html"></a>r.learn.ml2</a> 
+addon. Note, <em>r.learn.train</em> can compute its own accuracy 
+measures as well.
+
+<div class="code"><pre>
+g.region raster=landclass96
+r.mapcalc "herbaceous = if(landclass96 == 3, 1, 0)"
+r.random input=herbaceous npoints=1000 raster=training_pixels seed=10
+</pre>
+</div>
+
+<p>
+Then we can use these training pixels to perform a classification on 
+the more recently obtained landsat 7 image using <em>r.learn.train</em>.
+
+<div class="code"><pre>
+# train a random forest regression model using r.learn.train
+r.learn.train group=lsat7_2000 training_map=training_pixels \
+	model_name=RandomForestRegressor n_estimators=500 \
+	save_model=rf_model.gz cv=2
+
+# perform prediction using r.learn.predict
+r.learn.predict group=lsat7_2000 load_model=rf_model.gz \
+   output=rf_regressor
+</pre>
+</div>
+
+<p>
+Now we can see how well the model performed using the 
+<em>r.edm.eval</em>.
+
+<div class="code"><pre>
+r.edm.eval -p reference=herbaceous prediction=rf_regressor
+</pre>
+</div>
+
+<p>
+The accuracy measures are printed on screen.
+
+<div class="code"><pre>
+AUC = 0.8362
+maximum TSS = 0.5289
+maximum kappa = 0.4438
+treshold maximum TSS = 0.1118
+treshold maximum kappa = 0.2936
+treshold minimum distance to (0,1) = 0.0979
+</pre>
+</div>
+
+<p>
+And the accompanying ROC curve is shown below (the figure can 
+optionally be saved to file).
+
+<p>
+<div align="center" style="margin: 10px"> <a 
+href="r_edm_eval_01.png"> <img src="r_edm_eval_01.png" alt=" 
+ROC curve" border="0"> </a></div>
+
+<p>
+Let's try how well a logistic regression performs, and compare this 
+with the previous results.
+
+<div class="code"><pre>
+# train a logistic regression model using r.learn.train
+r.learn.train group=lsat7_2000 training_map=training_pixels \
+	model_name=LogisticRegression save_model=lr_model.gz cv=2
+
+# perform prediction using r.learn.predict -p
+r.learn.predict group=lsat7_2000 load_model=lr_model.gz \
+   output=lr_regressor
+</pre>
+</div>
+
+<div class="code"><pre>
+r.edm.eval -p reference=herbaceous prediction=rf_regressor,lr_regressor_1
+</pre>
+</div>
+
+<p>
+The accuracy measures are printed on screen.
+
+<div class="code"><pre>
+AUC = 0.8362
+maximum TSS = 0.5289
+maximum kappa = 0.4438
+treshold maximum TSS = 0.1118
+treshold maximum kappa = 0.2936
+treshold minimum distance to (0,1) = 0.0979
+
+--- performance lr_regressor_1 ---
+AUC = 0.8384
+maximum TSS = 0.5621
+maximum kappa = 0.4669
+treshold maximum TSS = 0.1047
+treshold maximum kappa = 0.1995
+treshold minimum distance to (0,1) = 0.0848
+</pre>
+</div>
+
+And the accompanying ROC curve is shown below (the figure can optionally be saved to file). 
+
+<p>
+<div align="center" style="margin: 10px"> <a href="r_edm_eval_02.png"> 
+<img src="r_edm_eval_02.png" alt=" ROC curve of random forest 
+regression model and logistic regression model." border="0"> </a></div>
 
-<p>A problem with the AUC is that it varies with the spatial extent used to select background points (Lobo et al. 2008, Jim&eacute;nez-Valverde 2012). Generally, the larger that extent, the higher the AUC value. Therefore, AUC values are generally biased and cannot be directly compared. One possible way to address this problem is to use a pairwise distance sampling for presence and absence testing points to remove this bias (Hijmans 2012). Perhaps a simpler way to evaluate this problem of changing AUC values with changing extents is to only sample absence/background points within certain distances of the presence points.
 
-<p>The addon offers the option to limit the area over which the evaluation statistics are calculated. The user can define a buffer (in kilometers) around the absence and presence raster cells (the options <b>buffer_abs</b> and <b>buffer_pres</b> respectively). The inside buffer is obviously only useful when your presence data consists of areas (polygons) rather than point data. Note that currently the script does not check if there are a sufficient number of absence or background points within the given distance of the presence points. The default value is 0, which means no buffer is used.
+<h2>REFERENCES</h2>
 
-<p>To test the sensitivity of the evaluation statistics for the number of evaluation points, it is possible to set the number of presence and absence points to be used (they will be randomly selected) using the <b>num_pres</b> and <b>num_abs</b> parameters (the default, 0, means all points are included). If the number of presence or absence set by the user is larger than the actual number of presence or absence points, this step is silently skipped. Note, however, that the number of points used to calculate the statistics are given in the summary.
+Jiménez-Valverde, A. 2012. Insights into the area under the receiver 
+operating characteristic curve (AUC) as a discrimination measure in 
+species distribution modelling. Global Ecology and Biogeography 21: 
+498-507
 
-<p>The user can set the required ratio of presence and total points to be used in the evaluation (0&lt;<b>preval</b>&lt;1, any value outside this range will be ignored). This allows the user to test how sensitive the evaluation statistics are for the prevalence of presence points. Note that if <b>preval</b> > 0, the num_pre and num_abs parameters will be ignored. Note that if you use the flag to use background points rather than absence points, this <b>preval</b> option will probably not make sense (check the reported prevalence in the summary output).
+<p>
+Lobo, J. M., Jim&eacute;nez-Valverde, A., & Real, R. 2008. AUC: a 
+misleading measure of the performance of predictive distribution 
+models. Global Ecology and Biogeography 17: 145-151.
 
-<h2>NOTES</h2>
+<p>
+Hijmans, R. J. 2012. Cross-validation of species distribution models: 
+removing spatial sorting bias and calibration with a null model. 
+Ecology 93: 679-688.
 
-<p>The selection of the best evaluation statistic depends on your data and what you want to test. This add-on provides a few, but the user can calculate his/her own using the table in the csv file. See the REFERENCES list for a few useful references in that respect.
+<p>
+Allouche, O., Tsoar, A., & Kadmon, R. 2006. Assessing the accuracy of 
+species distribution models: prevalence, kappa and the true skill 
+statistic (TSS). Journal of Applied Ecology 43: 1223-1232.
 
-<p>Besides R, the script uses the R packages <em>rgrass7</em>, <em>reshape</em> and <em>caTools</em>. If they are not installed yet, the script will attempt to do so. This may take some time, so if you are planning to use the script more often, you are advised to install the packages yourself first.
+<p>
+McPherson, J. M., Jetz, W., & Rogers, D. J. 2004. The effects of 
+species' range sizes on the accuracy of distribution models: ecological 
+phenomenon or statistical artefact? Journal of Applied Ecology 41: 
+811-823.
 
+<h2>SEE ALSO</h2>
 
-<h2>REFERENCES</h2>
-<p>Jim&eacute;nez-Valverde, A. 2012. Insights into the area under the receiver operating characteristic curve (AUC) as a discrimination measure in species distribution modelling. Global Ecology and Biogeography 21: 498-507
-<p>Lobo, J. M., Jim&eacute;nez-Valverde, A., & Real, R. 2008. AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17: 145-151.
-<p>Hijmans, R. J. 2012. Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model. Ecology 93: 679-688.
-<P>Allouche, O., Tsoar, A., & Kadmon, R. 2006. Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology 43: 1223-1232.
-<p>McPHERSON, J. M., Jetz, W., & Rogers, D. J. 2004. The effects of species' range sizes on the accuracy of distribution models: ecological phenomenon or statistical artefact? Journal of Applied Ecology 41: 811-823.
+<em><a href="https://grass.osgeo.org/grass-stable/manuals/addons/r.confusionmatrix.html">r.confusionmatrix</a></em>
 
 
 <h2>AUTHOR</h2>
 
-Paulo van Breugel <a href="https://ecodiv.earth/contact.html">ecodiv.earth/contact.html</a>
+Paulo van Breugel
+