# Lesson 4 - Compare with P.2108 Clutter Model

In this lesson you'll compare the performance of your clutter propagation model with a model from Recommendation ITU-R P.2108 _Prediction of Clutter Loss_.

### About P.2108
 * __The complete ITU-R Recommendation P.2108__ for the prediction of clutter loss can be found at [Recommendation ITU-R P.2108-1](https://www.itu.int/dms_pubrec/itu-r/rec/p/R-REC-P.2108-1-202109-I!!PDF-E.pdf).
 * __The code repository__ for the U.S. reference implementation of P.2108 can be found at the [NTIA/p2108](https://github.com/NTIA/p2108) GitHub repository.

P.2108 provides three methods (as of version P.2108-1) for estimating signal loss through clutter. The appropriate model should be selected depending on the frequency, environment around terminals, and path type. The three models and their frequency ranges are given below.

1. Height Gain Terminal Correction Model, for 0.03 to 3 GHz
2. Terrestrial Statistical Model, for 0.5 to 67 GHz
3. Aeronautical Statistical Model, for 10 to 100 GHz

In this lesson we will use the __Terrestrial Statistical Model__ because the measurements were made between two ground-based terminals at a frequency of 3.5 GHz. A full description of the Terrestrial Statistical Model can be found in section 3.2 of Recommendation ITU-R P.2108-1 linked above.
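As a rough illustration of the frequency part of this selection, the sketch below lists which of the three models cover a given frequency. The helper name `applicable_models` is hypothetical and not part of the NTIA package. The ranges are copied from this lesson, with the terrestrial lower bound taken as the 0.5 GHz quoted for P.2108-1 in the "How to use P.2108" section. Frequency is only one criterion, so the environment and path type still need to be checked separately.

```python
# Frequency coverage (GHz) of the three P.2108 models, as quoted in this lesson.
MODEL_RANGES__GHZ = {
    "Height Gain Terminal Correction Model": (0.03, 3.0),
    "Terrestrial Statistical Model": (0.5, 67.0),
    "Aeronautical Statistical Model": (10.0, 100.0),
}


def applicable_models(f__ghz):
    """Return the names of the models whose frequency range covers f__ghz."""
    return [
        name
        for name, (low, high) in MODEL_RANGES__GHZ.items()
        if low <= f__ghz <= high
    ]


# At the 3.5 GHz used in this lesson, only one range applies.
print(applicable_models(3.5))
```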

### Import the P.2108 code library
The [NTIA code repository for P.2108](https://github.com/NTIA/p2108) contains the U.S. Reference Implementation for all three P.2108 clutter loss prediction methods listed above. For this tutorial we have provided an installable Python package at the following path: **`course-materials/packages/p2108-1.0.0-py3-none-any.whl`**. This package is provided for the purposes of this tutorial __only__ and is not yet otherwise published. If you are interested, click "watch" on the [NTIA/p2108](https://github.com/NTIA/p2108) repository to be notified when the Python wrapper is published in the near future.

Execute the following cell to install the P.2108 package in your JupyterLab environment.

In [None]:
! pip install packages/p2108-1.0.0-py3-none-any.whl

### Import the necessary Python libraries

In [None]:
from ITS.ITU.PSeries import P2108
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### How to use P.2108

We will use the __Terrestrial Statistical Model__ from P.2108. This model is a statistical clutter loss model for terrestrial propagation paths. It is valid for urban and suburban clutter environments. The predicted clutter loss is a correction factor for a single terminal within the clutter. The correction can be applied to both terminals if both are within the clutter (which we will not do since the transmitter is located above the clutter). The model, when applied at only one end of the path, is valid for frequencies between 0.5 and 67 GHz and path lengths of at least 0.25 km.
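The validity limits above can be checked before calling the model. The sketch below is a hypothetical helper (`check_terrestrial_inputs` is not part of the P2108 package, which may perform its own validation). It uses the frequency and path length limits quoted above and assumes that the percentage of locations must lie strictly between 0 and 100.

```python
def check_terrestrial_inputs(f__ghz, d__km, p):
    """Raise ValueError if inputs fall outside the limits quoted in this lesson."""
    if not 0.5 <= f__ghz <= 67:
        raise ValueError(f"frequency {f__ghz} GHz is outside 0.5 to 67 GHz")
    if d__km < 0.25:
        raise ValueError(f"path length {d__km} km is shorter than 0.25 km")
    if not 0 < p < 100:  # assumption: p is a percentage in the open interval (0, 100)
        raise ValueError(f"percentage of locations {p} must be between 0 and 100")


# Inputs within the limits pass silently.
check_terrestrial_inputs(3.5, 0.95, 50)

# A path shorter than 0.25 km is rejected.
try:
    check_terrestrial_inputs(3.5, 0.10, 50)
except ValueError as err:
    print(f"Rejected: {err}")
```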

Let's look at the documentation for the Python function we'll use:

In [None]:
help(P2108.TerrestrialStatisticalModel)

### Examples of calling P.2108

To get a sense for how to use this model, let's predict clutter loss for a variety of percentages of locations at 3.5 GHz and a path length of 0.95 km.

In [None]:
f__ghz = 3.5
d__km = 0.95

clutter_loss_10pct__dB = P2108.TerrestrialStatisticalModel(f__ghz, d__km, p=10)
clutter_loss_50pct__dB = P2108.TerrestrialStatisticalModel(f__ghz, d__km, p=50)
clutter_loss_90pct__dB = P2108.TerrestrialStatisticalModel(f__ghz, d__km, p=90)

print("P.2108 terrestrial statistical model predictions for 3.5 GHz, 0.95 km path:")
print(f"Clutter loss not exceeded for 10% of locations = {clutter_loss_10pct__dB:.1f} dB")
print(f"Clutter loss not exceeded for 50% of locations = {clutter_loss_50pct__dB:.1f} dB")
print(f"Clutter loss not exceeded for 90% of locations = {clutter_loss_90pct__dB:.1f} dB")

The $p$ input lets us explore the statistical bounds of the model. For the propagation of a 3.5 GHz signal, across 0.95 km, with one terminal within the clutter, P.2108 predicts that the median loss ($p = 50\%$) will be 30.0 dB. P.2108 also predicts that 80% (from 10th to 90th percentile) of clutter losses observed will be between 24.4 and 35.5 dB.
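The spread quoted above can be reduced to two kinds of number: the width of the 10% to 90% interval, and how far each bound sits from the median. This small sketch only reuses the three predictions quoted in the paragraph above.

```python
# Predictions quoted above for 3.5 GHz and a 0.95 km path.
loss_10pct__db = 24.4
loss_50pct__db = 30.0
loss_90pct__db = 35.5

interval_width__db = loss_90pct__db - loss_10pct__db
below_median__db = loss_50pct__db - loss_10pct__db
above_median__db = loss_90pct__db - loss_50pct__db

print(f"Width of the 10% to 90% interval: {interval_width__db:.1f} dB")
print(f"Median minus 10% bound: {below_median__db:.1f} dB")
print(f"90% bound minus median: {above_median__db:.1f} dB")
```

The two offsets from the median are nearly equal, so for these inputs the predicted distribution looks roughly symmetric about the median.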

### Make P.2108 predictions using Martin Acres dataset path distances

1. Start by loading the Martin Acres measurement data. This is the same data that was introduced in lesson 3, except that the measurements with path distances shorter than 0.25 km have been removed (they're not supported by P.2108). Additionally, the High-TX and Low-TX datasets have been combined into this file.
2. Perform P.2108 predictions with a few different values of $p$

In [None]:
## load the Martin Acres dataset
df = pd.read_csv("data/MartinAcres_Lesson4.csv")

## do P.2108 predictions and create a new column for each value of p
df["p2108_10pct__dB"] = df.apply(lambda row: P2108.TerrestrialStatisticalModel(row.f__mhz/1000, row.d__km, 10), axis=1)
df["p2108_50pct__dB"] = df.apply(lambda row: P2108.TerrestrialStatisticalModel(row.f__mhz/1000, row.d__km, 50), axis=1)
df["p2108_90pct__dB"] = df.apply(lambda row: P2108.TerrestrialStatisticalModel(row.f__mhz/1000, row.d__km, 90), axis=1)

## quick peek at the loaded dataframe and P.2108 results
df.head()

### Plot the P.2108 predictions with the Martin Acres measurement data

In [None]:
fig, ax = plt.subplots(figsize=(11, 6))

## plot the measurement data
data_label = "Measured Loss in Excess of Free Space "
df[df["tx_location"] == "low"].plot.scatter("d__km", "L_excess__db", ax=ax, label=data_label+"(Low TX)")
df[df["tx_location"] == "high"].plot.scatter("d__km", "L_excess__db", c="tab:orange", ax=ax, label=data_label+"(High TX)")

## plot the P.2108 predictions
# for plotting, let's sort the predictions by path distance
sorted_df = df.sort_values("d__km")
sorted_df.plot(ax=ax, x="d__km", y="p2108_90pct__dB", label=r'P.2108 ($p=90\%$)', c='tab:red', ls="--", linewidth=3.0)
sorted_df.plot(ax=ax, x="d__km", y="p2108_50pct__dB", label=r'P.2108 ($p=50\%$)', c='k', linewidth=3.0)
sorted_df.plot(ax=ax, x="d__km", y="p2108_10pct__dB", label=r'P.2108 ($p=10\%$)', c='tab:green', ls="--", linewidth=3.0)

## plot styling
ax.set_xlabel('Path Distance (km)')
ax.set_ylabel('Clutter Loss (dB)')
ax.set_title('Path Distance vs Clutter Loss\nMartin Acres')
ax.legend()
ax.grid(True, which="both", axis="both", alpha=0.3)
ax.minorticks_on()
plt.show()

### How did P.2108 do?
For this dataset, P.2108 appears to capture clutter loss accurately at shorter distances (less than 1 km). At about 1.5 km, however, there is a cluster of measurements for which P.2108 overpredicts the clutter loss. If you remember back to lesson 3, this cluster comes from the High TX location. With the High TX, the signal suffers less clutter loss because it passes over most buildings and trees unobstructed.

### Compare your model (Lesson 3) to P.2108

Let's take a look at the median predictions from the P.2108 terrestrial statistical model and the model we developed in lesson 3, compared to the measured data.

In [None]:
fig, ax = plt.subplots(figsize=(11, 6))

## plot the measurement data
data_label = "Measured Loss in Excess of Free Space "
df[df["tx_location"] == "low"].plot.scatter("d__km", "L_excess__db", ax=ax, label=data_label+"(Low TX)")
df[df["tx_location"] == "high"].plot.scatter("d__km", "L_excess__db", c="tab:orange", ax=ax, label=data_label+"(High TX)")

## plot the P.2108 predictions with p=50
sorted_df.plot(ax=ax, x="d__km", y="p2108_50pct__dB", label=r'P.2108 ($p=50\%$)', c='k', linewidth=3.0)

## plot your model from lesson 3
df.plot.scatter(ax=ax, x="d__km", y="pred_loss", label='Lesson 3 Model (Median)', s=10, c='tab:green')

ax.set_xlabel('Path Distance (km)')
ax.set_ylabel('Clutter Loss (dB)')
ax.set_title('Comparing the Lesson 3 Model to P.2108\nMartin Acres')
ax.legend()
ax.grid(True, which="both", axis="both", alpha=0.3)
ax.minorticks_on()
plt.show()

We can see that our lesson 3 model seems to better predict clutter loss for these paths. Specifically, we see that our model is better at handling the high-TX data than P.2108. This is not surprising, since the model is based on geometry which accounts for how the elevation angle between terminals impacts the degree of path obstruction by clutter. Additionally, we should not be too surprised since our lesson 3 model was produced by fitting to this exact dataset. We'll explore additional datasets in lesson 5. For now, let's consider how we might quantify differences in performance between multiple models.

You may notice in the plot above that the lesson 3 model doesn't produce a smooth curve. Recall that your model uses _3D Clutter Distance_, the distance that the signal travels before it exits out of the clutter, as its independent variable. The plot above uses _Path Distance_ as its independent variable. Since the two models are differently parameterized, it can be tricky to try and perform a comparison like this. The choice of independent variable when plotting may produce misleading images; the lesson 3 model is a simple and continuous logarithmic function, but the plot above makes its behavior seem erratic and unintuitive.
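To see why a smooth model can look erratic on this plot, consider the sketch below. It evaluates the lesson 3 model (using the slope and intercept defined later in this lesson) for a few paths whose path distances and 3D clutter distances are made-up values chosen only for illustration.

```python
import math

slope = 13.71
y_int = -9.8


def clutter_model(clutter_distance__m):
    """Lesson 3 model: clutter loss (dB) versus 3D clutter distance (m)."""
    return slope * math.log10(clutter_distance__m) + y_int


# Hypothetical paths: (path distance in km, 3D clutter distance in m).
paths = [(0.5, 400.0), (0.9, 150.0), (1.2, 700.0), (1.5, 60.0)]

# Print the predictions in order of increasing path distance.
for d__km, clutter_d__m in sorted(paths):
    loss__db = clutter_model(clutter_d__m)
    print(f"{d__km:.1f} km path, {clutter_d__m:.0f} m in clutter: {loss__db:.1f} dB")
```

Ordered by path distance, the predictions jump up and down. Ordered by 3D clutter distance, the same predictions increase steadily, which is the smooth logarithmic behavior the model actually has.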

Let's now look at a few other ways we can quantify model performance.

### Quantify Model Performance by $\mathrm{RMSE}$

One important metric for quantifying how well a model matches measurement data is the root mean square error ($\mathrm{RMSE}$): the square root of the mean of the squared residuals, where a residual is the difference between a measured and a predicted value. When the residuals have zero mean, the $\mathrm{RMSE}$ equals their standard deviation; otherwise it also includes the model's average bias. A smaller $\mathrm{RMSE}$ indicates that the model is a better fit to the measured data. The $\mathrm{RMSE}$ is especially useful because it provides a single value for each tested dataset, allowing for simple comparisons of prediction performance across models and across datasets.

In [None]:
## residuals from our lesson 3 model were loaded from the CSV already
rmse_lesson3 = np.sqrt(np.mean(df["error"] ** 2))

## get residuals for P.2108 median predictions
residuals_p2108 = df["L_excess__db"] - df["p2108_50pct__dB"]
rmse_p2108 = np.sqrt(np.mean(residuals_p2108 ** 2))

print(f"For this dataset, the lesson 3 model has RMSE = {rmse_lesson3:.1f} dB")
print(f"For this dataset, the P.2108 model has RMSE = {rmse_p2108:.1f} dB")
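
A small worked example may help show how the $\mathrm{RMSE}$ relates to the standard deviation of the residuals. The residual values below are made up for illustration. The sketch computes the mean error (bias), the standard deviation, and the $\mathrm{RMSE}$ of the same residuals.

```python
import math
import statistics

# Hypothetical residuals (measured minus predicted), in dB.
residuals__db = [3.0, 5.0, 4.0, 6.0, 2.0]

bias__db = statistics.fmean(residuals__db)   # mean error
std__db = statistics.pstdev(residuals__db)   # population standard deviation
rmse__db = math.sqrt(statistics.fmean([r ** 2 for r in residuals__db]))

print(f"bias = {bias__db:.2f} dB, std = {std__db:.2f} dB, RMSE = {rmse__db:.2f} dB")
```

In this example the residuals scatter by only about 1.4 dB, but the 4 dB mean offset raises the $\mathrm{RMSE}$ to about 4.2 dB, because $\mathrm{RMSE}^2$ is the sum of the squared bias and the variance. The standard deviation alone would hide a model's systematic over- or underprediction.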

### Quantify Performance with Cumulative Distribution Functions (CDFs)

Another way to understand these models is to look at the distributions of their clutter loss predictions over the Path Distances (or 3D Clutter Distances) in the dataset. To do this, plot the CDF of (a) the measurement data, (b) the P.2108 predictions, and (c) your clutter model's predictions.
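As a minimal sketch of what an empirical CDF is, the function below sorts a sample and pairs the value of rank i with the probability i/N, which mirrors the convention used in the plotting cell later in this lesson. The function name and the sample losses are hypothetical.

```python
def empirical_cdf(samples):
    """Return (sorted values, cumulative probabilities i/N for i = 0..N-1)."""
    xs = sorted(samples)
    n = len(xs)
    ps = [i / n for i in range(n)]
    return xs, ps


# Hypothetical clutter losses, in dB.
losses__db = [31.2, 18.5, 25.0, 22.7]

xs, ps = empirical_cdf(losses__db)
for x, p in zip(xs, ps):
    print(f"{x:.1f} dB: cumulative probability {p:.2f}")
```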

Start by defining the clutter model you made in lesson 3. 

In [None]:
## Define your model from Lesson 3
slope = 13.71
y_int = -9.8
model_std = 0 ## standard deviation of the model (lesson 3)

## define the model as a function which takes the 3D Clutter Distance (in meters) as input
def clutter_model(clutter_distance__m):
    return slope * np.log10(clutter_distance__m) + y_int

Next, find the distribution of clutter loss predictions from your model. Ensure that the predicted sample has the same __3D Clutter Distance__ distribution as the measurement data. 

In [None]:
## find the distribution of 3D Clutter Distances in the Martin Acres dataset
clutter_distances_array = np.sort(df["clutter_d__meter"])

## Predict the clutter loss using your lesson 3 clutter model
model_cdf_distri = []
for d in clutter_distances_array:
    model_cdf_distri.append(np.random.normal(clutter_model(d), model_std))

Next, find the distribution of clutter loss predictions from P.2108. Ensure that the predicted sample has the same __Path Distance__ distribution as the measurement data. 

In [None]:
## find the distribution of Path Distances in the Martin Acres dataset
distances_array = np.sort(df["d__km"])

## Predict the median (p=50) clutter loss using P.2108
f__ghz = df["f__mhz"].iloc[0] / 1000  # MHz to GHz; the first row's frequency is used for every path
p2108_cdf_distri = []
for d in distances_array:
    # p2108_cdf_distri.append(P2108.TerrestrialStatisticalModel(f__ghz, d, np.random.randint(1,100)))
    p2108_cdf_distri.append(P2108.TerrestrialStatisticalModel(f__ghz, d, 50))

### Plot the CDFs

In [None]:
fig, ax = plt.subplots(figsize=(11, 6))

meas_N = len(df)
meas_x = np.sort(df["L_excess__db"])
meas_y = np.arange(meas_N) / float(meas_N)
## plot the CDF
ax.plot(meas_x, meas_y, label='Measured Loss in Excess of Free Space', linewidth=3.0)

p2108_x = np.sort(np.array(p2108_cdf_distri))
p2108_y = np.arange(meas_N) / float(meas_N)
## plot the CDF
ax.plot(p2108_x, p2108_y, label='P.2108', linewidth=3.0)

model_x = np.sort(np.array(model_cdf_distri))
model_y = np.arange(meas_N) / float(meas_N)
## plot the CDF
ax.plot(model_x, model_y, label='Lesson 3 Model', linewidth=3.0)

ax.set_xlabel('Clutter Loss (dB)')
ax.set_ylabel('Probability')
ax.set_title('Clutter Loss CDF\nMartin Acres')
ax.grid(True, which="both", axis="both", alpha=0.3)
ax.minorticks_on()
ax.legend(fontsize=14)
plt.show()

The distribution of clutter losses predicted by your lesson 3 model is close to the actual distribution of measured losses in excess of free space in the Martin Acres neighborhood. P.2108 appears to generally overpredict the clutter loss by 3-5 dB. Keeping in mind the same caveats as before about _why_ our model outperforms P.2108 for this dataset, we can see the utility of CDFs for evaluating model performance. The CDF provides a far more detailed view of how our model compares to measurements than the single-valued $\mathrm{RMSE}$ metric.
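One way to put a number on the horizontal gap between two CDFs is to compare the sorted samples rank by rank. The sketch below does this with made-up loss values (not the Martin Acres data). It assumes both samples have the same size, as they do in this lesson, so values of equal rank correspond to the same cumulative probability.

```python
import statistics

# Hypothetical samples of equal size, in dB.
measured__db = [12.0, 25.5, 18.0, 21.0, 15.5]
predicted__db = [16.0, 29.0, 22.5, 24.0, 20.0]

# Pair values of equal rank; each difference is the horizontal CDF offset there.
offsets__db = [
    pred - meas
    for meas, pred in zip(sorted(measured__db), sorted(predicted__db))
]

print("Offset at each rank (dB):", offsets__db)
print(f"Mean offset: {statistics.fmean(offsets__db):.1f} dB")
```

Positive offsets mean the predicted CDF sits to the right of the measured one, which corresponds to overprediction.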

### One last thing

Before ending this lesson, it's worth discussing why P.2108 doesn't perform well with the High TX data. P.2108's Terrestrial Statistical Model assumes that all propagation paths are completely horizontal through clutter (a 0-degree RX elevation angle). It also assumes that propagation paths that are _near_ horizontal will suffer the same loss as a _completely_ horizontal path. This turns out not to be true. In the Martin Acres measurements, the Low TX data has an average RX elevation angle of 1 degree and the High TX data has an average RX elevation angle of 4 degrees. Both are near zero and might be assumed to suffer the same clutter loss as a completely horizontal path. Yet the measured losses differ substantially from the predictions, especially for the High TX dataset. In short, a 4-degree RX elevation angle is substantial enough to break P.2108's underlying assumptions and disrupt its predictive power.
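For a sense of scale, the elevation angle can be estimated from the height difference between the terminals and the path length. The sketch below assumes flat-earth geometry and uses made-up height differences and distances (not the Martin Acres geometry), chosen to give angles near the 1-degree and 4-degree averages quoted above.

```python
import math


def elevation_angle__deg(height_diff__m, d__km):
    """Elevation angle from the RX up to the TX, assuming a flat earth."""
    return math.degrees(math.atan2(height_diff__m, d__km * 1000.0))


# Hypothetical geometries: (label, TX height above RX in m, path distance in km).
geometries = [("low TX", 17.0, 1.0), ("high TX", 105.0, 1.5)]

for label, dh__m, d__km in geometries:
    print(f"{label}: {elevation_angle__deg(dh__m, d__km):.1f} degrees")
```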

Well done, you've compared your statistical clutter model to the P.2108 clutter model. In the next lesson you'll see how well (or poorly) your clutter model performs with other datasets.

End of Lesson 4.