# 3 The HBV model

## 3.1 Justification of the Hydrological Model

Similar studies have shown that the Hydrologiska Byråns Vattenbalansavdelning (HBV) model is suitable for the Zambezi River basin and for regions with a similar climate (Hamududu & Killingtveit, 2016; Shagega et al., 2019). The HBV model is also available on the eWaterCycle platform, which makes it a practical choice for this research.

<figure>
  <img src="Figures/hbv_model_label.png">
  <figcaption>
      Figure 3.1 - HBV model structure (M. Hrachowitz, n.d.)
  </figcaption>
</figure>

The HBV model has a simple bucket structure and requires few parameters (Lindström et al., 1997). A schematic of the model is shown in Figure 3.1. The model requires the catchment area, a set of climate input data and nine parameters that must be estimated. These nine parameters are listed in Table 1.

<figure>
  <figcaption>
      Table 1: Parameters to be estimated for the HBV model
  </figcaption>
  <img src="Figures/Tabel1.jpg">
</figure>

These parameters are optimized to best fit the observed data. The model is calibrated and validated with an approximately 80/20 split of the record. Given the time range of the available ERA5 dataset, the model is calibrated over the period 1 January 1986 to 31 December 2012 and validated over the period 1 January 2013 to 31 December 2019.
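The split can be checked with a short sketch that uses only the dates stated above and Python's standard library. The helper names `n_days` and `split_daily` are hypothetical and not part of eWaterCycle; the sketch also shows that the stated periods correspond to roughly a 79/21 split of the daily record.

```python
from datetime import date
from typing import List, Sequence, Tuple

# Calibration and validation periods as stated in the text (both bounds inclusive)
CAL_START, CAL_END = date(1986, 1, 1), date(2012, 12, 31)
VAL_START, VAL_END = date(2013, 1, 1), date(2019, 12, 31)


def n_days(start: date, end: date) -> int:
    """Number of daily time steps from start to end, both ends inclusive."""
    return (end - start).days + 1


def split_daily(dates: Sequence[date], values: Sequence[float],
                cal_end: date = CAL_END) -> Tuple[List[float], List[float]]:
    """Split a daily series into calibration (date <= cal_end) and validation (date > cal_end) values."""
    cal = [v for d, v in zip(dates, values) if d <= cal_end]
    val = [v for d, v in zip(dates, values) if d > cal_end]
    return cal, val


n_cal = n_days(CAL_START, CAL_END)
n_val = n_days(VAL_START, VAL_END)
print(f"calibration: {n_cal} days, validation: {n_val} days, "
      f"calibration share: {n_cal / (n_cal + n_val):.3f}")
```

The calibration share comes out slightly below 0.8, so "80/20" should be read as an approximate description of the split.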

## 3.2 Calibrating the HBV Model

The HBV model is calibrated and validated against daily discharge observations from the Katima Mulilo measuring station, provided by the GRDC data portal (2023). This station is located upstream of Lake Kariba, as marked in Figure 2.2. The full calibration procedure is described in Appendix A.

The HBV model requires inputs which are retrieved from the ERA5 dataset. ERA5 is a global climate reanalysis dataset made available by the European Centre for Medium-Range Weather Forecasts (ECMWF). It provides estimates of a range of climate-related variables, including those needed for the HBV model, from 1940 to the present (Hersbach et al., 2020). The HBV model uses the ERA5 precipitation, surface air temperature and shortwave radiation for the catchment area. The potential evapotranspiration is calculated by applying the Makkink equation to these ERA5 forcings. The ERA5 data appear to contain corrupted values in the period from 1988 to 1992. To prevent these values from causing large variations in the calibrated HBV model, they are replaced by linear interpolation. The original ERA5 forcings and the interpolated results are shown in Figures 3.2 and 3.3, respectively.
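The Makkink step can be illustrated with a minimal sketch. It assumes the commonly used form $E_p = C \frac{\Delta}{\Delta + \gamma} \frac{R_s}{\lambda}$ with $C = 0.65$, a constant psychrometric constant $\gamma$ and the FAO-56 expression for the slope $\Delta$ of the saturation vapour pressure curve. These constants and the function names are assumptions, and the exact formulation applied to the ERA5 forcings in this study may differ. Inputs are taken as daily mean air temperature in °C and daily mean incoming shortwave radiation in W m$^{-2}$.

```python
import math

# Assumed constants for this sketch (not taken from the study)
LAMBDA = 2.45e6    # latent heat of vaporisation [J kg-1], treated as constant
GAMMA = 0.066      # psychrometric constant [kPa degC-1], sea-level approximation
C_MAKKINK = 0.65   # empirical Makkink coefficient


def sat_vapour_slope(t_c: float) -> float:
    """Slope of the saturation vapour pressure curve [kPa degC-1] (FAO-56 form)."""
    es = 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))  # saturation vapour pressure [kPa]
    return 4098.0 * es / (t_c + 237.3) ** 2


def makkink_pet(t_c: float, rs_wm2: float) -> float:
    """Potential evapotranspiration [mm day-1] from daily mean temperature [degC]
    and daily mean incoming shortwave radiation [W m-2]."""
    delta = sat_vapour_slope(t_c)
    # W m-2 -> J m-2 day-1; dividing by lambda gives kg m-2 day-1, i.e. mm day-1 of water
    rs_daily = rs_wm2 * 86400.0
    return C_MAKKINK * delta / (delta + GAMMA) * rs_daily / LAMBDA


print(round(makkink_pet(20.0, 200.0), 2))
```

With these assumptions, a day at 20 °C with 200 W m$^{-2}$ of shortwave radiation gives roughly 3.1 mm of potential evapotranspiration; the radiation term dominates, while temperature only enters through the weighting $\Delta / (\Delta + \gamma)$.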

<figure>
  <img src="Figures/Initial_ERA5_generated_forcings.jpg">
  <figcaption>
      Figure 3.2 - Initial ERA5 generated forcings
  </figcaption>
</figure>

<figure>
  <img src="Figures/Linearly_interpolated_ERA5_Forcings.png">
  <figcaption>
      Figure 3.3 - Linearly interpolated ERA5 generated forcings
  </figcaption>
</figure>
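The gap-filling step can be sketched as follows, assuming the corrupted ERA5 values have already been flagged as missing (`None`). How the corrupted values were identified is not covered here, and the helper `interpolate_gaps` is hypothetical; data libraries such as pandas offer equivalent behaviour through `Series.interpolate`.

```python
import bisect
from typing import List, Optional, Sequence


def interpolate_gaps(values: Sequence[Optional[float]]) -> List[float]:
    """Fill flagged (None) entries by linear interpolation between the nearest valid values.

    Gaps at the very start or end of the series have only one neighbour and are
    filled with the nearest valid value (an assumption for this sketch).
    """
    valid = [i for i, v in enumerate(values) if v is not None]
    if not valid:
        raise ValueError("series contains no valid values")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        k = bisect.bisect_left(valid, i)  # first valid index after i
        left = valid[k - 1] if k > 0 else None
        right = valid[k] if k < len(valid) else None
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            # Weight by the relative position of day i between its valid neighbours
            w = (i - left) / (right - left)
            filled.append(values[left] + w * (values[right] - values[left]))
    return filled


# Example: two flagged days between valid values of 1.0 and 4.0 mm/day
print(interpolate_gaps([1.0, None, None, 4.0, None]))
```

Linear interpolation keeps the filled values within the range of their neighbours, so it cannot introduce new extremes, but over a long gap it also removes any real variability in that period.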

Using the interpolated ERA5 forcings, two objective functions are tested to calibrate the HBV model: the root mean square error (RMSE) and the Kling-Gupta efficiency (KGE). For both, N = 2000 parameter sets are evaluated, with each parameter drawn randomly within realistic bounds. The parameter set that best simulates the observed discharge over the calibration period is then validated over the validation period.
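The random-search calibration described above can be sketched as below. To keep the example self-contained, the HBV model is replaced by a toy linear reservoir (`linear_reservoir`, hypothetical); the parameter name `k`, its bounds and the synthetic data are assumptions for illustration only. Any objective, such as the RMSE of Equation 2 (minimised) or the KGE of Equation 3 (maximised), can be passed in.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Params = Dict[str, float]


def random_search(model: Callable[[Params], List[float]],
                  observed: Sequence[float],
                  objective: Callable[[Sequence[float], Sequence[float]], float],
                  bounds: Dict[str, Tuple[float, float]],
                  n: int = 2000,
                  maximize: bool = True,
                  seed: int = 1) -> Tuple[Optional[Params], float]:
    """Evaluate n parameter sets drawn uniformly within bounds and return the best one."""
    rng = random.Random(seed)
    best: Optional[Params] = None
    best_score = -math.inf if maximize else math.inf
    for _ in range(n):
        # Draw each parameter independently and uniformly within its bounds
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        score = objective(observed, model(params))
        if (score > best_score) if maximize else (score < best_score):
            best, best_score = params, score
    return best, best_score


def linear_reservoir(params: Params, precip: Sequence[float]) -> List[float]:
    """Toy stand-in for HBV: each day a fraction k of the storage flows out."""
    storage, q = 0.0, []
    for p in precip:
        storage += p
        out = params["k"] * storage
        storage -= out
        q.append(out)
    return q


def rmse(o: Sequence[float], s: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(o, s)) / len(o))


# Synthetic "observations" generated with k = 0.3, then recovered by random search
precip = [5.0 if day % 7 == 0 else 0.0 for day in range(200)]
observed = linear_reservoir({"k": 0.3}, precip)
best, score = random_search(lambda p: linear_reservoir(p, precip), observed, rmse,
                            bounds={"k": (0.01, 0.99)}, n=2000, maximize=False)
print(best, score)
```

Because each draw is independent, random search scales poorly with the number of parameters: with nine HBV parameters, 2000 samples cover the parameter space far more sparsely than in this one-parameter toy case.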

### 3.2.1 Root Mean Square Error 

The root mean square error (RMSE) is the square root of the mean squared difference between the simulated and observed discharge, as given by Equation 2. It is calculated for each of the N parameter sets. The RMSE ranges from 0 to ∞, where 0 represents a perfect fit, and it has the same units as the discharge. The parameter set with the smallest RMSE is saved as the best parameter set found.

$${RMSE} = \sqrt{\dfrac{\sum_{i=0}^{n-1}(O_{i}-S_{i})^2}{n}}$$

<p style='text-align: right;'> [Equation 2]</p>


Where: <br>
$O_{i}$ &nbsp;&nbsp; - &nbsp;&nbsp; Observed discharge for data point i <br>
$S_{i}$ &nbsp;&nbsp; - &nbsp;&nbsp; Simulated discharge for data point i <br>
$n$ &nbsp;&nbsp; - &nbsp;&nbsp; Number of data points <br>
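Equation 2 translates directly into code. A minimal sketch in plain Python, with the function name `rmse` chosen for illustration:

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean square error following Equation 2; 0 is a perfect fit."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    n = len(observed)
    squared_errors = ((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(sum(squared_errors) / n)


print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Because the errors are squared before averaging, the RMSE is dominated by the largest deviations, which in a discharge series usually occur at the peaks.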

### 3.2.2 Kling-Gupta Efficiency (KGE) 

The Kling-Gupta efficiency (KGE) is a goodness-of-fit measure used in the hydrological sciences, partly because it is easy to compute (Kling et al., 2012). It combines the correlation, the bias ratio and the variability ratio between the simulated and observed discharge, as given by Equation 3. Its values range from -∞ to 1, where 1 is a perfect fit. The KGE is likewise computed for all N = 2000 parameter sets, and the set with the highest KGE is saved as the best parameter set found.

$${KGE} = 1 - \sqrt{(r-1)^2 + (\frac{\mu_{sim}}{\mu_{obs}} - 1)^2 + (\frac{\sigma_{sim}/\mu_{sim}}{\sigma_{obs}/\mu_{obs}} - 1)^2}$$

<p style='text-align: right;'> [Equation 3]</p>


Where: <br>
$r$ &nbsp;&nbsp; - &nbsp;&nbsp; Pearson correlation coefficient between the simulated and observed discharge <br>
$\mu_{sim/obs}$ &nbsp;&nbsp; - &nbsp;&nbsp; Mean of the simulated/observed discharge <br>
$\sigma_{sim/obs}$ &nbsp;&nbsp; - &nbsp;&nbsp; Standard deviation of the simulated/observed discharge <br>
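Equation 3 can be sketched in the same way. The function below is an illustrative implementation, not the code used in this study. It uses population statistics, which does not affect the result because the normalisation cancels in both the correlation and the ratio of coefficients of variation.

```python
import math
from typing import Sequence


def kge(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Kling-Gupta efficiency following Equation 3 (1 is a perfect fit)."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(observed)
    mu_obs = sum(observed) / n
    mu_sim = sum(simulated) / n
    sd_obs = math.sqrt(sum((o - mu_obs) ** 2 for o in observed) / n)
    sd_sim = math.sqrt(sum((s - mu_sim) ** 2 for s in simulated) / n)
    if sd_obs == 0 or sd_sim == 0 or mu_obs == 0 or mu_sim == 0:
        raise ValueError("KGE is undefined for constant or zero-mean series")
    cov = sum((o - mu_obs) * (s - mu_sim) for o, s in zip(observed, simulated)) / n
    r = cov / (sd_obs * sd_sim)                       # correlation term
    bias_ratio = mu_sim / mu_obs                      # mean (bias) term
    cv_ratio = (sd_sim / mu_sim) / (sd_obs / mu_obs)  # variability term
    return 1.0 - math.sqrt((r - 1) ** 2 + (bias_ratio - 1) ** 2 + (cv_ratio - 1) ** 2)


obs = [1.0, 3.0, 2.0, 5.0, 4.0]
print(kge(obs, obs))                   # perfect fit
print(kge(obs, [2 * q for q in obs]))  # same shape, doubled volume
```

Doubling every simulated value leaves the correlation and the coefficient-of-variation ratio at 1 but sets the bias ratio to 2, so the KGE drops from 1 to 0. This shows how the metric penalises volume errors separately from timing and variability errors.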

## 3.3 Calibration and Validation Results

When the HBV model was run with both optimized parameter sets, the parameters obtained with the KGE fit simulated the observed data better. The parameters found in the KGE calibration are shown in Table 2. With these parameters, the KGE of the simulated discharge relative to the observed data, calculated using Equation 3, is 0.71. This is indicative of a strong model fit.

<figure>
  <figcaption>
      Table 2: Calibrated parameter values for the HBV model
  </figcaption>
  <img src="Figures/Tabel2.jpg">
</figure>

Using this parameter set, the HBV model is run for the calibration and validation periods. The initial results are shown in Figure 3.4. The figure shows an unrealistic discharge peak around 1990, together with distorted model output in the surrounding period. As mentioned in Section 3.2, this indicates a possible fault in the ERA5 forcing data that disproportionately influences the HBV model. To give a clearer picture of the model accuracy, the period from 9 January 1987 to 9 May 1992 is removed from the data. The short record before 1987 is retained because it provides additional data points for determining the return periods of droughts in the result analysis. Figure 3.5 shows the simulated and observed discharge with this period excluded. The model slightly underestimates peak discharges. However, the dry periods, which are critical for the research question, are simulated accurately. Overall, the calibrated model also simulates the discharge well over the validation period.

<figure>
  <img src="Figures/Initial simulated discharge.jpg">
  <figcaption>
      Figure 3.4 - Initial simulated discharge
  </figcaption>
</figure>

<figure>
  <img src="Figures/Simulated discharge outliers exlcuded.jpg">
  <figcaption>
      Figure 3.5 - Simulated discharge, outliers excluded
  </figcaption>
</figure>
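Removing the affected period before evaluating the fit can be sketched as a simple date mask. The sketch assumes that both 9 January 1987 and 9 May 1992 are excluded (inclusive bounds) and that the observed and simulated discharge are aligned day by day with a list of dates; the helper `drop_period` is hypothetical.

```python
from datetime import date
from typing import List, Sequence, Tuple

# Period removed from the analysis (bounds assumed inclusive)
EXCLUDE_START = date(1987, 1, 9)
EXCLUDE_END = date(1992, 5, 9)


def drop_period(dates: Sequence[date], *series: Sequence[float],
                start: date = EXCLUDE_START,
                end: date = EXCLUDE_END) -> Tuple[List[date], List[List[float]]]:
    """Remove all time steps with start <= date <= end from the dates and each aligned series."""
    keep = [i for i, d in enumerate(dates) if not (start <= d <= end)]
    kept_dates = [dates[i] for i in keep]
    kept_series = [[s[i] for i in keep] for s in series]
    return kept_dates, kept_series


# Example with a synthetic daily record from 1 January 1986 to 31 December 1993
first = date(1986, 1, 1).toordinal()
dates = [date.fromordinal(first + k) for k in range(date(1993, 12, 31).toordinal() - first + 1)]
obs = [float(k) for k in range(len(dates))]
sim = [0.9 * q for q in obs]
kept_dates, (obs_kept, sim_kept) = drop_period(dates, obs, sim)
print(len(dates), len(kept_dates), kept_dates[0], kept_dates[-1])
```

The days before 9 January 1987 stay in the record, consistent with the choice to keep them for the drought return-period analysis. The filtered series can then be passed to an objective function such as Equation 3.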