# Comparing measured and modeled data

In the previous exercise, you probably noticed that the different modeled datasets did not exactly agree on the amount of irradiance, even for the same place and time period.

The way to assess the accuracy of modeled irradiance data is to compare it with high-quality ground measurements, which is the aim of this exercise.

In [None]:
# Install pvlib on Google Colab as this is not a standard package.
!pip install pvlib

In [1]:
import pvlib  # library for PV and solar calculations
import pandas as pd  # library for data analysis
import matplotlib.pyplot as plt  # library for plotting
import numpy as np  # library for math and linear algebra

## Step 1: Retrieve measured data

For this exercise, we will use measurements from the Cabauw BSRN station (`station='CAB'`). This station was selected because its data are of high quality and have already been quality-controlled.

**Credentials for the BSRN FTP server are available on Learn.**

In [14]:
# Write your code here
data_measured, meta_measured = 

## Step 2: Resample measurement data to hourly
Since the satellite data we will be working with is hourly, we need to resample the 1-minute measurement data to hourly.

To do this, you can use the pandas method `data_measured.resample('1h').mean()`. Make sure the resulting index is left-labeled (e.g., the period 00:00 to 00:59 is labeled 00:00).
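As a minimal sketch of the labeling convention, the example below resamples a synthetic 1-minute series rather than the real measurements. The names `example_1min` and `example_hourly` and the column `ghi` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute series standing in for the measured data
# (assumption: the real data has a similar time-zone-aware DatetimeIndex).
index = pd.date_range("2020-06-01 00:00", periods=180, freq="1min", tz="UTC")
example_1min = pd.DataFrame({"ghi": np.arange(180, dtype=float)}, index=index)

# closed='left' puts 00:00 to 00:59 in one bin; label='left' stamps that bin 00:00.
example_hourly = example_1min.resample("1h", closed="left", label="left").mean()
print(example_hourly)
```

For hourly resampling, pandas should already default to left-closed, left-labeled bins, but passing both arguments explicitly makes the convention visible to the reader.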

In [15]:
# Write your code here

## Step 3: Retrieve modeled data

For the comparison, we will only use one modeled dataset, namely CAMS Radiation (see the previous exercise on how to retrieve this).

Retrieve CAMS irradiance data for the same location and period:

In [18]:
# Write your code here
data_model, meta_model = 

Unnamed: 0,Observation period,ghi_extra,ghi_clear,bhi_clear,dhi_clear,dni_clear,ghi,bhi,dhi,dni,Reliability
2020-01-01 00:00:00+00:00,2020-01-01T00:00:00.0/2020-01-01T01:00:00.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2020-01-01 01:00:00+00:00,2020-01-01T01:00:00.0/2020-01-01T02:00:00.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2020-01-01 02:00:00+00:00,2020-01-01T02:00:00.0/2020-01-01T03:00:00.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2020-01-01 03:00:00+00:00,2020-01-01T03:00:00.0/2020-01-01T04:00:00.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2020-01-01 04:00:00+00:00,2020-01-01T04:00:00.0/2020-01-01T05:00:00.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


## Step 4: Visual comparison

As a first assessment, it is always useful to plot the data.

There are many useful plots to make, but as a first inspection, a scatter plot works nicely.

What other useful plots can you think of?
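One possible layout for such a scatter plot is sketched below using synthetic data. The arrays `ghi_measured` and `ghi_modeled` are made-up stand-ins, not the actual datasets, and the 1:1 reference line is a common convention rather than a requirement.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for hourly measured and modeled GHI (W/m^2), not real data.
rng = np.random.default_rng(0)
ghi_measured = rng.uniform(0, 1000, size=200)
ghi_modeled = ghi_measured + rng.normal(0, 50, size=200)

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(ghi_measured, ghi_modeled, s=5, alpha=0.5)
# 1:1 line: points lying on it indicate perfect agreement.
ax.plot([0, 1000], [0, 1000], color="black", linewidth=1)
ax.set_xlabel("Measured GHI [W/m$^2$]")
ax.set_ylabel("Modeled GHI [W/m$^2$]")
ax.set_aspect("equal")
```

With the real data, the two series would first need to share the same hourly index so that each point pairs a measurement with the modeled value for the same hour.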

In [None]:
# Write your code here

## Step 5: Calculate deviation metrics

While the graphical comparison is a good starting point, it doesn't give any quantitative metrics.

To quantitatively compare two datasets, it is common practice to calculate the following deviation metrics:
- Root mean square deviation
- Mean absolute deviation
- Mean deviation (bias)

Calculate these deviation metrics for solar elevation angles greater than 10 degrees (to avoid using measurements that are affected by shading).
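The three metrics can be sketched as follows. The helper `deviation_metrics` is hypothetical (not part of pvlib), the sign convention "modeled minus measured" is an assumption, and the input values are invented for illustration.

```python
import numpy as np
import pandas as pd


def deviation_metrics(measured, modeled, elevation, min_elevation=10.0):
    """Return RMSD, MAD and MBD (modeled minus measured) above an elevation limit."""
    df = pd.DataFrame(
        {"measured": measured, "modeled": modeled, "elevation": elevation}
    )
    # Keep only time steps with valid data and a sufficiently high sun.
    df = df.dropna()
    df = df[df["elevation"] > min_elevation]
    deviation = df["modeled"] - df["measured"]
    return {
        "rmsd": float(np.sqrt((deviation ** 2).mean())),
        "mad": float(deviation.abs().mean()),
        "mbd": float(deviation.mean()),
    }


# Tiny made-up example (irradiance in W/m^2, elevation in degrees; not real data).
measured = pd.Series([100.0, 200.0, 300.0, 50.0])
modeled = pd.Series([110.0, 190.0, 330.0, 80.0])
elevation = pd.Series([15.0, 30.0, 45.0, 5.0])
metrics = deviation_metrics(measured, modeled, elevation)
print(metrics)
```

In practice, the solar elevation could be obtained from `pvlib.solarposition.get_solarposition`, which returns an `elevation` column. Since the data are hourly averages, evaluating the solar position at the middle of each hour is one reasonable choice.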

In [None]:
# Write your code here

## Step 6: Compare distribution functions

The deviation metrics gave high-level insight into the deviation between the datasets, but did not indicate how and why the data deviated.

To investigate this, it is useful to compare the distribution of observed irradiances between the two datasets.

In this step, plot the distribution function of GHI for the two datasets.
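One way to read "distribution function" is as the empirical cumulative distribution function (a histogram would be another valid interpretation). The sketch below computes it with a hypothetical helper, `empirical_cdf`, applied to made-up GHI values.

```python
import numpy as np


def empirical_cdf(values):
    """Return sorted values and their cumulative probabilities (NaNs dropped)."""
    values = np.asarray(values, dtype=float)
    values = np.sort(values[~np.isnan(values)])
    probabilities = np.arange(1, values.size + 1) / values.size
    return values, probabilities


# Made-up GHI samples in W/m^2, for illustration only.
ghi_example = [0.0, 250.0, np.nan, 100.0, 400.0]
x, p = empirical_cdf(ghi_example)
print(x, p)
```

Calling the helper once per dataset and drawing both results on the same axes, for example with `plt.step(x, p, where="post")`, shows at which irradiance levels the modeled and measured distributions diverge.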

In [19]:
# Write your code here