Script structure:

  0. Parameters: set back-up values for parameters, for when the notebook is run manually (_not_ via pipeline)
  1. Setup:
        * Paths
        * Utils functions
  2. Load Data
        * **Routine data** (DHIS2) already formatted & aggregated (output of pipeline XXX)
        * **Reporting** (DHIS2) pre-computed, already formatted & aggregated (output of pipeline ???)
        * **Shapes** (DHIS2) for plotting (this could be removed if we move the plots to "report/EDA" nb)
  3. Calculate **Reporting Rate (RR)**
        * "**Dataset**": using pre-computed reportings from DHIS2/SNIS (was: "DHIS2")
        * "**Data Element**": using calculated expected nr of reports (nr of active facilities) (was: "CONF")
  4. **Export** reporting rate data to `.../data/dhis2/reporting_rate/` as .parquet (and .csv) files for **either**:
        * dataset: "XXX_reporting_rate_**dataset**.parquet" **or**
        * dataelement: "XXX_reporting_rate_**dataelement**.parquet"

-------------------
**Naming harmonization to improve code readability**:

**Reporting Rate** data frames, based on different **methods**:
* follow this structure: `reporting_rate_<method>`. So:
    * **Dataset** (using pre-computed reporting) : `reporting_rate_dataset`
    * **Data Element** (Diallo 2025) : `reporting_rate_dataelement`

--------------------

### To Do:
* For `DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES"`: **add code** to count OPEN facilities for **countries with "normal" pyramids** (i.e., no mixing of facilities and admin levels). At the moment there is only code for Niger, which runs only if `COUNTRY_CODE == "NER"`. Similar (but simpler) code should be added for the other countries (i.e., `COUNTRY_CODE != "NER"`)
* Check why Data Element **Denominator** `routine_active_facilities` is **calculated at `YEAR` (aggregated) instead of `MONTH`** ... possibly fix this to match granularity of other alternatives for denominator (which are calculated at MONTH level)
* Modify **report notebook** and/or pipeline.py code so that it does not make the **pipeline FAIL** if `reporting_rate_dataset` or `reporting_rate_dataelement` is **not found** (which is now always the case since we only output 1 file at each run!!)

----------------

## Parameters

Set Default values **if _not_ provided by pipeline**<br>
This makes the execution flexible and "safe": nb can be run manually from here or be executed via pipeline, without having to change anything in the code!
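
The `if (!exists(...))` pattern used in the next cell can also be wrapped in a small helper. The snippet is a minimal sketch in base R, assuming a hypothetical function `set_default()` that is not part of `snt_utils.r`; it uses a throwaway environment (`demo_env`) so that running it does not touch the real parameters.

```r
# Hypothetical helper (not part of the notebook or of snt_utils.r):
# assign `value` to `name` only if the variable does not exist yet.
set_default <- function(name, value, envir = globalenv()) {
  if (!exists(name, envir = envir, inherits = FALSE)) {
    assign(name, value, envir = envir)
  }
  invisible(get(name, envir = envir))
}

# Demo in an isolated environment
demo_env <- new.env()

# Simulate a value injected by the pipeline
assign("REPORTING_RATE_METHOD", "DATASET", envir = demo_env)

# Already defined: the pipeline value is kept
set_default("REPORTING_RATE_METHOD", "DATAELEMENT", envir = demo_env)

# Not defined: the back-up value is assigned
set_default("DATAELEMENT_METHOD_DENOMINATOR", "ROUTINE_ACTIVE_FACILITIES", envir = demo_env)
```

The explicit `if (!exists(...))` blocks remain easier to read in a notebook; a helper like this mainly pays off when the list of parameters grows.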

In [None]:
# Set BACKUP VALUE: root path - NEVER CHANGE THIS!
if (!exists("SNT_ROOT_PATH")) {
  SNT_ROOT_PATH <- "/home/hexa/workspace" 
}


# Choose to run either DataSet OR DataElement method
if (!exists("REPORTING_RATE_METHOD")) {
  # REPORTING_RATE_METHOD <- "DATASET"  
  REPORTING_RATE_METHOD <- "DATAELEMENT"
}


# Data Element method: choice of which INDICATORS to use to count the nr of reporting facilities 
# CONF
if (!exists("DATAELEMENT_METHOD_NUMERATOR_CONF")) {
  DATAELEMENT_METHOD_NUMERATOR_CONF <- TRUE # FALSE
}

# SUSP
if (!exists("DATAELEMENT_METHOD_NUMERATOR_SUSP")) {
  DATAELEMENT_METHOD_NUMERATOR_SUSP <- TRUE # FALSE
}

# TEST
if (!exists("DATAELEMENT_METHOD_NUMERATOR_TEST")) {
  DATAELEMENT_METHOD_NUMERATOR_TEST <- TRUE # FALSE
}



# Data Element RR. Choice: which df to use for nr of `EXPECTED_REPORTS` (DENOMINATOR) 
if (!exists("DATAELEMENT_METHOD_DENOMINATOR")) {
  DATAELEMENT_METHOD_DENOMINATOR <- "ROUTINE_ACTIVE_FACILITIES"   
  # DATAELEMENT_METHOD_DENOMINATOR <- "PYRAMID_OPEN_FACILITIES" 
  # DATAELEMENT_METHOD_DENOMINATOR <- "DHIS2_EXPECTED_REPORTS" # ⚠️ only if `REPORTING_RATE_METHOD == "DATASET"` && DataSet is available!! ⚠️
} 


## 1. Setup

### 1.1. Paths

In [None]:
# PROJECT PATHS
CODE_PATH <- file.path(SNT_ROOT_PATH, 'code') # this is where we store snt_utils.r
CONFIG_PATH <- file.path(SNT_ROOT_PATH, 'configuration') # .json config file
DATA_PATH <- file.path(SNT_ROOT_PATH, 'data', 'dhis2')  

### 1.2. Utils functions

In [None]:
source(file.path(CODE_PATH, "snt_utils.r"))

### 1.3. Packages

In [None]:
# List required pcks  ---------------->  check  what are the really required libraries
required_packages <- c("arrow", # for .parquet
                       "tidyverse",
                       "stringi", 
                       "jsonlite", 
                       "httr", 
                       "reticulate")

# Execute function
install_and_load(required_packages)

### 1.3.1. OpenHEXA-specific settings

#### For 📦{sf}, tell OH where to find stuff ...

In [None]:
Sys.setenv(PROJ_LIB = "/opt/conda/share/proj")
Sys.setenv(GDAL_DATA = "/opt/conda/share/gdal")

#### Set environment to load openhexa.sdk from the right path

In [None]:
# Set environment to load openhexa.sdk from the right path
Sys.setenv(RETICULATE_PYTHON = "/opt/conda/bin/python")
reticulate::py_config()$python
openhexa <- import("openhexa.sdk")

### 1.4. Load and check `config` file

In [None]:
# Load SNT config

config_file_name <- "SNT_config.json" 
config_json <- tryCatch({
        jsonlite::fromJSON(file.path(CONFIG_PATH, config_file_name)) 
    },
    error = function(e) {
        msg <- paste0("Error while loading configuration: ", conditionMessage(e))  
        cat(msg)   
        stop(msg) 
    })

msg <- paste0("SNT configuration loaded from : ", file.path(CONFIG_PATH, config_file_name))
log_msg(msg)

**Save config fields as variables**

In [None]:
# Generic
COUNTRY_CODE <- config_json$SNT_CONFIG$COUNTRY_CODE
ADMIN_1 <- toupper(config_json$SNT_CONFIG$DHIS2_ADMINISTRATION_1)
ADMIN_2 <- toupper(config_json$SNT_CONFIG$DHIS2_ADMINISTRATION_2)

# How to treat 0 values (in this case: "SET_0_TO_NA" converts 0 to NAs)
NA_TREATMENT <- config_json$SNT_CONFIG$NA_TREATMENT

# Which (aggregated) indicators to use to evaluate "activity" of an HF - for Reporting Rate method "Ousmane"
DHIS2_INDICATORS <- names(config_json$DHIS2_DATA_DEFINITIONS$DHIS2_INDICATOR_DEFINITIONS)

# Which reporting rate PRODUCT_UID to use (note that this is a dataset in COD, but 2 dataElements in BFA!)
REPORTING_RATE_PRODUCT_ID <- config_json$SNT_CONFIG$REPORTING_RATE_PRODUCT_UID

In [None]:
# DHIS2_INDICATORS
log_msg(paste("Expecting the following DHIS2 (aggregated) indicators : ", paste(DHIS2_INDICATORS, collapse=", ")))

In [None]:
# Fixed  cols for routine data formatting 
fixed_cols <- c('OU_ID','PERIOD', 'YEAR', 'MONTH', 'ADM1_ID', 'ADM2_ID') # (OU_NAME has homonymous values!)
# print(paste("Fixed routine data (`dhis2_routine`) columns (always expected): ", paste(fixed_cols, collapse=", ")))
log_msg(paste("Expecting the following columns from routine data (`dhis2_routine`) : ", paste(fixed_cols, collapse=", ")))

In [None]:
# Fixed cols for exporting RR tables: to export output tables with consistent structure
fixed_cols_rr <- c('YEAR', 'MONTH', 'ADM2_ID', 'REPORTING_RATE') 

### 1.5. 🔍 Check: at least 1 indicator must be selected
The user can toggle each of the indicators on or off, so we need to make sure at least one is ON. <br>
Alternatively, `CONF` could be made mandatory, but I think it looks better if they're all displayed in the Run pipeline view (more intuitive).
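
The same check can be expressed on a named logical vector, which also yields the names of the selected indicators in one step. This is a minimal sketch under the assumption of a hypothetical helper `select_indicators()`; the notebook itself uses three separate `DATAELEMENT_METHOD_NUMERATOR_*` flags.

```r
# Hypothetical helper: return the names of the indicators switched ON,
# and stop with an error if none is selected.
select_indicators <- function(flags) {
  selected <- names(flags)[unname(flags)]
  if (length(selected) == 0) {
    stop("No indicator selected: switch on at least one (e.g. `CONF`).")
  }
  selected
}

# Example: SUSP is switched off
selected_demo <- select_indicators(c(CONF = TRUE, SUSP = FALSE, TEST = TRUE))

# Example: nothing selected, the error is caught for demonstration purposes
all_off <- tryCatch(
  select_indicators(c(CONF = FALSE, SUSP = FALSE, TEST = FALSE)),
  error = function(e) "error raised"
)
```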

In [None]:
nr_of_indicators_selected <- sum(DATAELEMENT_METHOD_NUMERATOR_CONF, DATAELEMENT_METHOD_NUMERATOR_SUSP, DATAELEMENT_METHOD_NUMERATOR_TEST)

if (nr_of_indicators_selected == 0) {
    msg <- "[ERROR] Error: no indicator selected, cannot perform calculation of reporting rate method 'Data Element'! Select at least one (e.g., `CONF`)."
    cat(msg)   
    stop(msg)
}

## 2. Load Data

### 2.1. **Routine** data (DHIS2) 
already formatted & aggregated (output of pipeline XXX)

In [None]:
# DHIS2 Dataset extract identifier
dataset_name <- config_json$SNT_DATASET_IDENTIFIERS$DHIS2_DATASET_FORMATTED

# Load file from dataset
dhis2_routine <- tryCatch({ get_latest_dataset_file_in_memory(dataset_name, paste0(COUNTRY_CODE, "_routine.parquet")) }, 
                  error = function(e) {
                      msg <- paste("Error while loading DHIS2 routine data file for: " , COUNTRY_CODE, conditionMessage(e))  # log error message
                      cat(msg)
                      stop(msg)
})

msg <- paste0("DHIS2 routine data loaded from dataset : ", dataset_name, " dataframe dimensions: ", paste(dim(dhis2_routine), collapse=", "))
log_msg(msg)

In [None]:
# Ensure correct data type for numerical columns 
dhis2_routine <- dhis2_routine %>%
    mutate(across(c(PERIOD, YEAR, MONTH), as.numeric))

In [None]:
head(dhis2_routine, 3)

#### 🔍 Check expected cols for method **Data Element**, numerator using multiple indicators.
Based on which indicator(s) are selected (if any)
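
A compact way to name exactly which selected indicators are absent from the routine data is `setdiff()`. The column names below are toy values chosen for illustration, not taken from a real extract.

```r
# Toy column names standing in for names(dhis2_routine)
routine_cols <- c("OU_ID", "PERIOD", "YEAR", "MONTH", "ADM1_ID", "ADM2_ID", "CONF", "TEST")

# Indicators the user asked for
indicators_demo <- c("CONF", "SUSP", "TEST")

# Selected indicators that have no matching column
missing_cols <- setdiff(indicators_demo, routine_cols)

if (length(missing_cols) > 0) {
  message("Missing indicator column(s): ", paste(missing_cols, collapse = ", "))
}
```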

In [None]:
# Initialize empty vector
indicators_selected = c()

# Add elements based on user selection(s)
if (DATAELEMENT_METHOD_NUMERATOR_CONF) {
    indicators_selected = append(indicators_selected, "CONF")
}

if (DATAELEMENT_METHOD_NUMERATOR_SUSP) {
    indicators_selected = append(indicators_selected, "SUSP")
}

if (DATAELEMENT_METHOD_NUMERATOR_TEST) {
    indicators_selected = append(indicators_selected, "TEST")
}

print(paste0("Selected indicators: ", paste(indicators_selected, collapse = ", ")))

In [None]:
# This is kinda useless now but KEEP in case we ADD MORE CHOICES OF INDICATORS!! 
if (DATAELEMENT_METHOD_NUMERATOR_CONF | DATAELEMENT_METHOD_NUMERATOR_SUSP | DATAELEMENT_METHOD_NUMERATOR_TEST) {
    log_msg(paste0("Indicator(s) ", paste(indicators_selected, collapse = ", ") , " selected for calculation of numerator for method `Data Element`." ))
    
    # Warn if any selected indicator column is absent from the routine data
    missing_cols <- setdiff(indicators_selected, names(dhis2_routine))
    if (length(missing_cols) > 0) {
    log_msg(paste0("🚨 Warning: the following column(s) are missing from `dhis2_routine`: ", paste(missing_cols, collapse = ", "), "."), "warning")
    } 

}

### 2.2. **Reporting** pre-computed from DHIS2 
Data granularity:
* **ADM2**
* **MONTH** (PERIOD)

Note: data comes from different datasets (`PRODUCT_NAME`): `A SERVICES DE BASE`, `B SERVICES SECONDAIRES`, `D SERVICE HOPITAL` 

The col `PRODUCT_METRIC` indicates whether the `VALUE` is `EXPECTED_REPORTS` or `ACTUAL_REPORTS`

In [None]:
# REPORTING_RATE_METHOD <- "DATAELEMENT"  # "DATASET"
REPORTING_RATE_METHOD

In [None]:
if (REPORTING_RATE_METHOD == "DATASET" | DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {
    # DHIS2 Dataset extract identifier
    dataset_name <- config_json$SNT_DATASET_IDENTIFIERS$DHIS2_DATASET_FORMATTED
    file_name <- paste0(COUNTRY_CODE, "_reporting.parquet")
    
    # Load file from dataset
    dhis2_reporting <- tryCatch({ get_latest_dataset_file_in_memory(dataset_name, file_name) }, 
                      error = function(e) {
                          msg <- paste("Error while loading DHIS2 pre-computed REPORTING data file for: " , COUNTRY_CODE, conditionMessage(e))  # log error message
                          cat(msg)
                          stop(msg)
    })
    
    msg <- paste0("DHIS2 pre-computed REPORTING data loaded from file `", file_name, "` (from dataset : `", dataset_name, "`). Dataframe dimensions: ", 
                  paste(dim(dhis2_reporting), collapse=", "))
    log_msg(msg)
}

In [None]:
if (REPORTING_RATE_METHOD == "DATASET" | DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {
    # Convert VALUE col to <dbl> - should not be needed but keep as safety measure 
    dhis2_reporting <- dhis2_reporting |>
    mutate(across(c(PERIOD, YEAR, MONTH, VALUE), as.numeric))

    head(dhis2_reporting, 3)
    }

In [None]:
# # Convert VALUE col to <dbl> - should not be needed but keep as safety measure 
# dhis2_reporting <- dhis2_reporting |>
# mutate(across(c(PERIOD, YEAR, MONTH, VALUE), as.numeric))

In [None]:
# head(dhis2_reporting, 3)

#### 2.2.1. **Filter** to keep only values for `PRODUCT_UID` defined in config.json

In [None]:
REPORTING_RATE_PRODUCT_ID

In [None]:
if (REPORTING_RATE_METHOD == "DATASET" | DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {

     # Handle problems with incorrect configuration - to be improved 🚧
     if (is.null(REPORTING_RATE_PRODUCT_ID)) {
         log_msg("🛑 Problem with definition of REPORTING_RATE_PRODUCT_ID, check `SNT_config.json` file!")
     } else {
         # Both statements belong to the `else` branch: they need a non-NULL product id
         product_name <- dhis2_reporting |> filter(PRODUCT_UID %in% REPORTING_RATE_PRODUCT_ID) |> pull(PRODUCT_NAME) |> unique()
         log_msg(glue::glue("Using REPORTING_RATE_PRODUCT_ID == `{REPORTING_RATE_PRODUCT_ID}`, corresponding to DHIS2 Product name : `{product_name}`."))
     }

    }

In [None]:
if (REPORTING_RATE_METHOD == "DATASET" | DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {

    dhis2_reporting_filtered <- dhis2_reporting |>
    filter(PRODUCT_UID %in% REPORTING_RATE_PRODUCT_ID) |>
    select(-PRODUCT_UID, -PRODUCT_NAME) # useless cols now
    
    print(dim(dhis2_reporting_filtered))
    head(dhis2_reporting_filtered)
    
}

#### 2.2.2. Format to produce `dhis2_reporting_expected`
🚨 Note: Use `dhis2_reporting_expected$EXPECTED_REPORTS` as new denominator for REPORTING_RATE calculations (methods dataset and dataelement)

In [None]:
if (REPORTING_RATE_METHOD == "DATASET" | DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {
    
    dhis2_reporting_wide <- dhis2_reporting_filtered |> 
    pivot_wider(
        names_from = PRODUCT_METRIC, 
        values_from = VALUE
    )
    
    print(dim(dhis2_reporting_wide))
    head(dhis2_reporting_wide)
    
}

In [None]:
# Use `dhis2_reporting_expected$EXPECTED_REPORTS` as new denominator for RR calculations (methods ANY and CONF)

if (REPORTING_RATE_METHOD == "DATASET" | DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {
    
    dhis2_reporting_expected <- dhis2_reporting_wide |> 
    select(-ACTUAL_REPORTS)
    
    print(dim(dhis2_reporting_expected))
    head(dhis2_reporting_expected)
}

#### 2.2.3. **Checks** on data completeness: _do **periods match** with routine data?_
Lack of perfect overlap in periods between routine data and reporting rate data might create headaches downstream!<br>
Specifically, **incidence** calculations will show **N2 smaller than N1** due to **aggregation by YEAR when NA** values are present!
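
The year-then-month comparison in the next cell can be reduced to a single comparison on `PERIOD` (format `YYYYMM`). A minimal sketch, assuming a hypothetical helper `compare_periods()` and toy period values:

```r
# Hypothetical helper: list the periods found in only one of the two inputs
compare_periods <- function(routine_periods, reporting_periods) {
  list(
    only_in_routine   = sort(setdiff(routine_periods, reporting_periods)),
    only_in_reporting = sort(setdiff(reporting_periods, routine_periods))
  )
}

# Toy periods: the two sources overlap only on 202302 and 202303
routine_periods_demo   <- c(202301, 202302, 202303)
reporting_periods_demo <- c(202302, 202303, 202304)

period_check <- compare_periods(routine_periods_demo, reporting_periods_demo)
```

Both elements are empty vectors when the periods match perfectly, so `length()` on each is enough to decide whether to log a warning.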

In [None]:
if (REPORTING_RATE_METHOD == "DATASET" | DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {
    
    # --- Check Year Compatibility ---
    routine_years <- sort(unique(as.integer(dhis2_routine$YEAR))) # as.integer
    expected_years <- sort(unique(as.integer(dhis2_reporting_expected$YEAR))) # as.integer
    
    if (!setequal(routine_years, expected_years)) {
      missing_in_routine <- setdiff(expected_years, routine_years)
      missing_in_expected <- setdiff(routine_years, expected_years)
    
      if (length(missing_in_routine) > 0) {
        log_msg(paste0("🚨 Warning: YEAR value(s) present in 'dhis2_reporting_expected' but not in 'dhis2_routine': ",
                       paste(missing_in_routine, collapse = ", ")))
      }
      if (length(missing_in_expected) > 0) {
        log_msg(paste0("🚨 Warning: YEAR value(s) present in 'dhis2_routine' but not in 'dhis2_reporting_expected': ",
                       paste(missing_in_expected, collapse = ", ")))
      }
    } else {
      log_msg("✅ YEAR values are consistent across 'dhis2_routine' and 'dhis2_reporting_expected'.")
    
      # --- Check Month Compatibility (if years are consistent) ---
      all_years <- unique(routine_years) # Or expected_years, they are the same now
    
      for (year_val in all_years) {
        routine_months_for_year <- dhis2_routine %>%
          filter(YEAR == year_val) %>%
          pull(MONTH) %>%
          unique() %>%
          sort()
    
        expected_months_for_year <- dhis2_reporting_expected %>%
          filter(YEAR == year_val) %>%
          pull(MONTH) %>%
          unique() %>%
          sort()
    
        if (!setequal(routine_months_for_year, expected_months_for_year)) {
          missing_in_routine_months <- setdiff(expected_months_for_year, routine_months_for_year)
          missing_in_expected_months <- setdiff(routine_months_for_year, expected_months_for_year)
    
          if (length(missing_in_routine_months) > 0) {
            log_msg(paste0("🚨 Warning: for YEAR ", year_val, ", MONTH value(s) '", paste(missing_in_routine_months, collapse = ", "),
                           "' present in 'dhis2_reporting_expected' but not in 'dhis2_routine'!"
                           ))
          }
          if (length(missing_in_expected_months) > 0) {
            log_msg(paste0("🚨 Warning: for YEAR ", year_val, ", MONTH value(s) '", paste(missing_in_expected_months, collapse = ", "), 
                           "' present in 'dhis2_routine' but not in 'dhis2_reporting_expected'!"
                           ))
          }
        } else {
          log_msg(paste0("✅ For year ", year_val, ", months are consistent across both data frames."))
        }
      }
    }

}

### 2.3. **Pyramid** to count OPEN facilities (denominator)
Table (and column) needed for denominator of "Data Element" reporting rate if choice == `PYRAMID_OPEN_FACILITIES`

In [None]:
# DATAELEMENT_METHOD_DENOMINATOR <- "PYRAMID_OPEN_FACILITIES"
DATAELEMENT_METHOD_DENOMINATOR

#### **Raw** pyramid

In [None]:
if (REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
    # DHIS2 Dataset extract identifier
    dataset_name <- config_json$SNT_DATASET_IDENTIFIERS$DHIS2_DATASET_EXTRACTS
    
    # Load file from dataset
    dhis2_pyramid_raw <- tryCatch({ get_latest_dataset_file_in_memory(dataset_name, paste0(COUNTRY_CODE, "_dhis2_raw_pyramid.parquet")) }, 
                      error = function(e) {
                          msg <- paste("Error while loading DHIS2 pyramid RAW data file for: " , COUNTRY_CODE, conditionMessage(e))  # log error message
                          cat(msg)
                          stop(msg)
    })
    
    msg <- paste0("DHIS2 pyramid data loaded from dataset : `", dataset_name, "`. Dataframe dimensions: ", paste(dim(dhis2_pyramid_raw), collapse=", "))
    log_msg(msg)
    
    head(dhis2_pyramid_raw)
    
}

## 3. Calculate **Reporting Rate** (RR)
We compute it using 2 approaches; the user can decide later on which one to use for incidence adjustment.

## 3.1. "**Dataset**" reporting rate: pre-computed, from **DHIS2**
Extracted from DHIS2 and formatted. 

Straightforward: `ACTUAL_REPORTS` / `EXPECTED_REPORTS` (just pivot `PRODUCT_METRIC` and divide)
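
The pivot-and-divide step can be illustrated on toy data. The sketch below uses base R (`merge()` instead of `pivot_wider()`) so it runs without extra packages. The guard against a zero denominator is an assumption added for illustration; the notebook cell itself divides directly.

```r
# Toy long-format reporting data: one row per district, month and metric
reporting_long <- data.frame(
  ADM2_ID        = c("d1", "d1", "d2", "d2"),
  YEAR           = 2023,
  MONTH          = 1,
  PRODUCT_METRIC = c("ACTUAL_REPORTS", "EXPECTED_REPORTS",
                     "ACTUAL_REPORTS", "EXPECTED_REPORTS"),
  VALUE          = c(8, 10, 5, 0)
)

keys <- c("ADM2_ID", "YEAR", "MONTH")

# Split by metric and rename VALUE to the metric name
actual <- reporting_long[reporting_long$PRODUCT_METRIC == "ACTUAL_REPORTS", c(keys, "VALUE")]
names(actual)[names(actual) == "VALUE"] <- "ACTUAL_REPORTS"

expected <- reporting_long[reporting_long$PRODUCT_METRIC == "EXPECTED_REPORTS", c(keys, "VALUE")]
names(expected)[names(expected) == "VALUE"] <- "EXPECTED_REPORTS"

# Join the two metrics side by side (equivalent to pivoting wider)
reporting_wide_demo <- merge(actual, expected, by = keys)

# Assumption: return NA rather than Inf/NaN when no report is expected
reporting_wide_demo$REPORTING_RATE <- ifelse(
  reporting_wide_demo$EXPECTED_REPORTS > 0,
  reporting_wide_demo$ACTUAL_REPORTS / reporting_wide_demo$EXPECTED_REPORTS,
  NA_real_
)
```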

In [None]:
if (REPORTING_RATE_METHOD == "DATASET" | DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {

    reporting_rate_dataset <- dhis2_reporting_wide |> 
    mutate(REPORTING_RATE = ACTUAL_REPORTS / EXPECTED_REPORTS)
    
    print(dim(reporting_rate_dataset))
    head(reporting_rate_dataset, 3)
}

#### Quick data quality check 🔍

In [None]:
# --- Define function ---------------------------
inspect_reporting_rate <- function(data_tibble) {

  # Dynamically get the name of the tibble passed to the function
  # Extract the literal name of the variable passed (e.g., "reporting_rate_dataset")
  tibble_name_full <- deparse(substitute(data_tibble))

  # Extract the 'method' part from the tibble name
  method <- stringr::str_extract(tibble_name_full, "(?<=reporting_rate_).*") # "(?<=reporting_rate_).*?(?=_month)"

  # Calculations for proportion of values > 1
  values_greater_than_1 <- sum(data_tibble$REPORTING_RATE > 1, na.rm = TRUE)
  total_values <- length(data_tibble$REPORTING_RATE)

  if (total_values > 0) {
    proportion <- values_greater_than_1 / total_values * 100
    min_rate <- min(data_tibble$REPORTING_RATE, na.rm = TRUE)
    max_rate <- max(data_tibble$REPORTING_RATE, na.rm = TRUE)
  } else {
    proportion <- 0
    min_rate <- NA # Set to NA if no values to calculate min/max
    max_rate <- NA # Set to NA if no values to calculate min/max
  }

  if (proportion == 0) {
      clarification = NULL
  } else {
      clarification = " (there are more reports than expected)"
  }

  # Print the formatted result
  log_msg(
    paste0(
      "🔍 For reporting rate method : `", method, "`, the values of REPORTING_RATE range from ", round(min_rate, 2),
      " to ", round(max_rate, 2),
      ", and ", round(proportion, 2), " % of values are >1", clarification, "."
    )
  )

  # Histogram
  hist(data_tibble$REPORTING_RATE, 
     breaks = 50)
}

In [None]:
if (REPORTING_RATE_METHOD == "DATASET" | DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {
    inspect_reporting_rate(reporting_rate_dataset)
    }

#### Subset cols

In [None]:
if (REPORTING_RATE_METHOD == "DATASET" | DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {
    reporting_rate_dataset <- reporting_rate_dataset |> 
    select(all_of(fixed_cols_rr))
    
    dim(reporting_rate_dataset)
    head(reporting_rate_dataset, 3)
}

#### Plot by MONTH (heatmap)

In [None]:
if (REPORTING_RATE_METHOD == "DATASET") {
    
    # Plot reporting rate heatmap
    options(repr.plot.width = 20, repr.plot.height = 10) 
    
    # reporting_rate_conf_month %>%
    reporting_rate_dataset %>%
    mutate(
        DATE = as.Date(paste0(YEAR, "-", MONTH, "-01"))
        ) %>%
    ggplot(., aes(x = DATE,  
                  y = factor(ADM2_ID), 
                  fill = REPORTING_RATE * 100)
          ) + 
      geom_tile() +
      scale_fill_viridis_c(
        option = "C",
        direction = 1,  # blue = low, yellow = high
        limits = c(0, 100),
        name = "Reporting rate (%)"
      ) +
      labs(
        title = "Monthly Reporting Rate by Health District -  Method 'DataSet'",
        subtitle = "Each tile represents the reporting completeness per district per month",
        x = "Month",
        y = "Health District"
      ) +
      theme_minimal(base_size = 13) +
      theme(
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 9),
        axis.text.y = element_text(size = 9),
        plot.title = element_text(face = "bold", hjust = 0.5, size = 14),
        plot.subtitle = element_text(hjust = 0.5, size = 12),
        legend.position = "right",
        panel.grid = element_blank()
      )
}

----------------------------

## 3.2. **Data Element** reporting rate: based on reporting of one or more indicators
**_Partially_ following methods by WHO and as per Diallo (2025) paper**

To accurately measure data completeness, we calculate the **monthly** reporting rate per **ADM2**, as the **proportion** of **facilities** (HF or `OU_ID`) that in a given month submitted data for either a single indicator (i.e., **confirmed** malaria case as `CONF`) or for _any_ of the chosen indicators (i.e., `CONF`, `SUSP`, `TEST`). 

Basically, "Data Element" reporting rate is the number of facilities reporting on 1 or more given indicators, over the total number of facilities.<br>

For this method the user is allowed to **choose** how to calculate both the **numerator** and the **denominator**.<br> 
Specifically:
* **Numerator**: is the number of **facilities that _actually reported_** data, and it is estimated based on whether a facility (FoSa, or HF, or `OU_ID`) **submitted data** for **_any_** of the following **indicators**:
    * `CONF`: confirmed malaria cases and/or
    * `SUSP`: suspected malaria cases and/or
    * `TEST`: tested malaria cases <br>
    Note: we **recommend** always including `CONF` because it is a core indicator consistently tracked across the dataset. This choice ensures alignment with the structure of the incidence calculation, which is also mainly based on confirmed cases.

    <br>
      
* **Denominator**: is the number of **facilities _expected_ to report**. This number can be obtained in three different ways:
    * `"DHIS2_EXPECTED_REPORTS"`: uses the col `EXPECTED_REPORTS` from the df `dhis2_reporting_expected`.<br>
      This is obtained directly from DHIS2, and is the same denominator used to calculate the "Dataset" reporting rate.
    * `"ROUTINE_ACTIVE_FACILITIES"`: uses the col `EXPECTED_REPORTS` from the df `routine_active_facilities`.<br>
      This is calculated as the number of "**active**" facilities (`OU_ID`), defined as those that submitted _any_ data **at least once in a given year**, across ***all*** indicators extracted in `dhis2_routine` (namely: all aggregated indicators as defined in the SNT_config.json file, see: `config_json$DHIS2_DATA_DEFINITIONS$DHIS2_INDICATOR_DEFINITIONS`)
    * `"PYRAMID_OPEN_FACILITIES"`: uses the number of facilities that are open in each month according to the opening and closing dates in the DHIS2 pyramid (currently implemented for Niger only).

<br>

This method improves over simple binary completeness flags by accounting for both spatial (facility coverage) and temporal (monthly timeliness) dimensions. <br>
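
The whole calculation can be followed on a toy example. The sketch below uses base R and invented values; the denominator is given directly as a small table, whereas the notebook derives it from one of the options listed above.

```r
# Toy routine data: 3 facilities in district d1, 2 in district d2, one month
routine_demo <- data.frame(
  ADM2_ID = c("d1", "d1", "d1", "d2", "d2"),
  OU_ID   = c("a", "b", "c", "x", "y"),
  YEAR    = 2023,
  MONTH   = 1,
  CONF    = c(5, NA, NA, 2, NA),
  SUSP    = c(NA, 3, NA, NA, NA),
  TEST    = c(NA, NA, NA, 1, NA)
)
indicators_demo <- c("CONF", "SUSP", "TEST")

# A facility counts as reporting if ANY selected indicator is non-missing
routine_demo$ACTIVE <- as.integer(
  rowSums(!is.na(routine_demo[, indicators_demo, drop = FALSE])) > 0
)

# Numerator: reporting facilities per district and month
submitted_demo <- aggregate(
  ACTIVE ~ ADM2_ID + YEAR + MONTH,
  data = routine_demo[, c("ADM2_ID", "YEAR", "MONTH", "ACTIVE")],
  FUN = sum
)
names(submitted_demo)[names(submitted_demo) == "ACTIVE"] <- "SUBMITTED_REPORTS"

# Denominator: facilities expected to report (toy values)
expected_demo <- data.frame(
  ADM2_ID          = c("d1", "d2"),
  YEAR             = 2023,
  EXPECTED_REPORTS = c(3, 2)
)

# Reporting rate = numerator / denominator
rr_demo <- merge(submitted_demo, expected_demo, by = c("ADM2_ID", "YEAR"))
rr_demo$REPORTING_RATE <- rr_demo$SUBMITTED_REPORTS / rr_demo$EXPECTED_REPORTS
```

In district `d1` two of three facilities reported on at least one indicator, so the rate is 2/3; in `d2` one of two did, so the rate is 0.5.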

### Calculate the **numerator**

**Note**: the col `ACTIVE` keeps the same name regardless of which indicators are selected (`DATAELEMENT_METHOD_NUMERATOR_*`), 
so the code needs to be parametrized only once (here).


In [None]:
if (REPORTING_RATE_METHOD == "DATAELEMENT") {

dhis2_routine_active <- dhis2_routine %>%
    mutate(
        # if_any() returns TRUE if the condition is met for any of the selected columns
        ACTIVE = if_else(if_any(all_of(indicators_selected), ~ !is.na(.x)), 1, 0)
    )

log_msg(paste0("Evaluating reporting facilities based on indicators: ", paste(indicators_selected, collapse = ", "), "."))

dim(dhis2_routine_active)
head(dhis2_routine_active, 3)

}

In [None]:
# --- 1.  Calculate `SUBMITTED_REPORTS` as the nr of ACTIVE facilities (that REPORTED, each month) ------------------------

if (REPORTING_RATE_METHOD == "DATAELEMENT") {

dhis2_routine_submitted <- dhis2_routine_active %>% # OLD: dhis2_routine_reporting_month <- dhis2_routine_reporting %>%
  group_by(ADM2_ID, YEAR, MONTH) %>% 
  summarise(
    SUBMITTED_REPORTS = sum(ACTIVE, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  ungroup() %>%  
    mutate(YEAR = as.integer(YEAR),
           MONTH = as.integer(MONTH)
          ) 

print(dim(dhis2_routine_submitted))
head(dhis2_routine_submitted, 3)

}

### Calculate the **denominator**

#### Option: `ROUTINE_ACTIVE_FACILITIES`
This is to be used **only when** `DATAELEMENT_METHOD_DENOMINATOR ==`**`ROUTINE_ACTIVE_FACILITIES`** 
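
As an illustration of this option, the sketch below counts, per year and district, the distinct facilities that submitted at least one value for any indicator. It uses base R and toy data; `PRES` is a made-up indicator name standing in for any other configured indicator.

```r
# Toy routine data: facility "a" reports twice, "b" once, "c" never
routine_toy <- data.frame(
  YEAR    = 2023,
  MONTH   = c(1, 2, 1, 2, 1, 1),
  ADM2_ID = c("d1", "d1", "d1", "d1", "d1", "d2"),
  OU_ID   = c("a", "a", "b", "b", "c", "x"),
  CONF    = c(4, 6, NA, NA, NA, 2),
  PRES    = c(NA, NA, NA, 1, NA, NA)   # hypothetical indicator
)
indicator_cols <- c("CONF", "PRES")

# Keep rows where at least one indicator has a non-missing value
has_data <- rowSums(!is.na(routine_toy[, indicator_cols, drop = FALSE])) > 0

# One row per facility and year, so repeated reports are counted once
active_toy <- unique(routine_toy[has_data, c("YEAR", "ADM2_ID", "OU_ID")])

# Number of active facilities per year and district
expected_toy <- aggregate(OU_ID ~ YEAR + ADM2_ID, data = active_toy, FUN = length)
names(expected_toy)[names(expected_toy) == "OU_ID"] <- "EXPECTED_REPORTS"
```

Facility `c` never submits anything and is therefore excluded from the denominator, which is the defining property of this option.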

In [None]:
# DATAELEMENT_METHOD_DENOMINATOR <- "DHIS2_EXPECTED_REPORTS" 
# DATAELEMENT_METHOD_DENOMINATOR <- "ROUTINE_ACTIVE_FACILITIES"
DATAELEMENT_METHOD_DENOMINATOR

In [None]:
# Calculate the tot nr of facilities (distinct OU_ID) based on all HF that appear in the routine data (each YEAR)
# meaning: regardless of what indicators they submit data for, as long as they have submitted something

if (REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "ROUTINE_ACTIVE_FACILITIES") {
    routine_active_facilities <- dhis2_routine %>%
    # Keep only rows where at least one indicator has non-NA value
    filter(if_any(any_of(DHIS2_INDICATORS), ~ !is.na(.))) %>%
    group_by(YEAR, ADM2_ID) %>%
    summarize(
      EXPECTED_REPORTS = n_distinct(OU_ID),
      .groups = "drop" # remove grouping 
    )

    nr_of_rows <- nrow(routine_active_facilities)
    log_msg(glue::glue("Produced df `routine_active_facilities`, with column `EXPECTED_REPORTS` calculated from DHIS2 routine data. Dataframe `routine_active_facilities` has {nr_of_rows} rows."))

    head(routine_active_facilities, 3)
    
} 


#### Option: `PYRAMID_OPEN_FACILITIES`
This is to be used **only when** `DATAELEMENT_METHOD_DENOMINATOR ==`**`PYRAMID_OPEN_FACILITIES`** 

⚠️⚠️⚠️ **To be ADDED**: method for **countries with "normal" pyramids** (i.e., when no mixing of facilities and admin levels ... !) ⚠️⚠️⚠️

In [None]:
# ⚠️⚠️⚠️ Add here code that counts the nr of open facilities (as below) BUT without the step of separating hospitals and other FoSa's 
# before flagging open and closed facilities per month ...

🚨 Specific to **Niger EnDoP**: Pre-processing needed to separate facilities from adm levels!! 🚨

Specifically:
* **Hospital**s (HD a Hopital District): at **level 4** together with Aires de Sante
* All other **FoSa**s: at **level 6**, also mixed with the hospital units

Therefore, to assign closed/open status, it is necessary to attach the closing and opening date columns to each individual facility. 
To do this: 
1) extract list of facilities and id across the 2 levels (4 and 6) and
2) calculate the nr of open facilities per MONTH (PERIOD) per ADM2, ending up with a df with cols: `ADM2_ID`, `YEAR`, `MONTH`, `OPEN_FACILITIES_COUNT` = `EXPECTED_REPORTS`
3) add this to the df with the **numerator** (`dhis2_routine_submitted`)
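
Step 2 can be sketched on toy data as follows. The example uses base R only (the notebook cells use `lubridate` and `tidyr`), and all facilities and dates are invented. A facility counts as open in a month if it opened on or before the last day of that month and either never closed or closed on or after the first day.

```r
# Toy facility list with opening and closing dates
facilities_toy <- data.frame(
  ADM2_ID     = c("d1", "d1", "d2"),
  OU_ID       = c("a", "b", "c"),
  DATE_OPENED = as.Date(c("2020-01-15", "2022-02-10", "2019-06-01")),
  DATE_CLOSED = as.Date(c(NA, NA, "2022-01-20"))
)

# First and last day of each month in the study period
month_start <- seq(as.Date("2022-01-01"), as.Date("2022-03-01"), by = "month")
month_end   <- seq(month_start[1], by = "month", length.out = length(month_start) + 1)[-1] - 1
months_toy  <- data.frame(
  PERIOD = as.integer(format(month_start, "%Y%m")),
  month_start,
  month_end
)

# Cartesian product: every facility paired with every month
grid_toy <- merge(facilities_toy, months_toy, by = NULL)

# Open = opened by month end AND (never closed OR closed on/after month start)
grid_toy$OPEN <- grid_toy$DATE_OPENED <= grid_toy$month_end &
  (is.na(grid_toy$DATE_CLOSED) | grid_toy$DATE_CLOSED >= grid_toy$month_start)

# Summing the flag (instead of filtering, then counting) keeps months with 0 open facilities
open_toy <- aggregate(
  OPEN ~ ADM2_ID + PERIOD,
  data = grid_toy[, c("ADM2_ID", "PERIOD", "OPEN")],
  FUN = sum
)
names(open_toy)[names(open_toy) == "OPEN"] <- "EXPECTED_REPORTS"
```

Summing a logical flag avoids the separate join back to a complete grid, because district-months with no open facility are retained with a count of 0.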

In [None]:
if (REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
# names(dhis2_pyramid_raw)
dim(dhis2_pyramid_raw)
head(dhis2_pyramid_raw, 3)
    
}

#### 1. Create list of facilities 
Separate "facilities" (of any type, from hospitals to CSI, Infermieres, etc.) from admin levels and hospital units (wards, depts...)

In [None]:
# Helpers to detect Aires and Hospitals:

if (COUNTRY_CODE ==  "NER" && REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
is_aire_l5     <- function(x) str_detect(x, regex("^\\s*aire[^a-zA-Z]?", ignore_case = TRUE))
is_hospital_l4 <- function(x) str_detect(x, regex("^(hd|chr|chu|hgr)", ignore_case = TRUE))

}

In [None]:
# List of all FoSa (from Aires → Level 6)

if (COUNTRY_CODE ==  "NER" && REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
fosa_master <- dhis2_pyramid_raw %>%
  filter(is_aire_l5(LEVEL_5_NAME)) %>%
  distinct(
    OU_ID   = LEVEL_6_ID,
    OU_NAME = LEVEL_6_NAME,
    region        = LEVEL_2_NAME,
    district      = LEVEL_3_NAME,
    ADM2_ID = LEVEL_3_ID,
    DATE_OPENED = OPENING_DATE, 
    DATE_CLOSED = CLOSED_DATE
  ) %>%
  mutate(OU_TYPE = "FoSa")

dim(fosa_master)
head(fosa_master)
}

In [None]:
# List of all Hospitals (from Level 4, aggregate dates across children)

if (COUNTRY_CODE ==  "NER" && REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
hosp_master <- dhis2_pyramid_raw %>%
filter(is_hospital_l4(LEVEL_4_NAME)) %>%
group_by(LEVEL_4_ID, LEVEL_4_NAME, LEVEL_2_NAME, LEVEL_3_NAME, LEVEL_3_ID) %>%
summarise(
    OPENING_DATE = suppressWarnings(min(OPENING_DATE, na.rm = TRUE)),
    CLOSED_DATE = suppressWarnings(max(CLOSED_DATE, na.rm = TRUE)),
    .groups = "drop"
) %>%
mutate(
    DATE_OPENED = ifelse(is.infinite(OPENING_DATE), NA, OPENING_DATE) |> as_datetime(),
    DATE_CLOSED = ifelse(is.infinite(CLOSED_DATE), NA, CLOSED_DATE) |> as_datetime()
    ) %>%
distinct(
 OU_ID = LEVEL_4_ID, 
 OU_NAME = LEVEL_4_NAME,
 region=LEVEL_2_NAME,
 district=LEVEL_3_NAME,
 ADM2_ID=LEVEL_3_ID,
 DATE_OPENED,
 DATE_CLOSED
) %>%
mutate(
 OU_TYPE = "Hospital"
 )

dim(hosp_master)
head(hosp_master)

}

In [None]:
# Merge both

if (COUNTRY_CODE ==  "NER" && REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
facility_master <- bind_rows(fosa_master, hosp_master) %>% 
  select(ADM2_ID, 
         OU_ID, 
         DATE_OPENED, 
         DATE_CLOSED)

dim(facility_master)
head(facility_master, 3)

}

#### 2. Calculate nr of facilities open each MONTH per ADM2

In [None]:
# Define start and end period based on routine data 

if (COUNTRY_CODE ==  "NER" && REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
PERIOD_START <- dhis2_routine$PERIOD |> min()
PERIOD_END <- dhis2_routine$PERIOD |> max()

print(paste0("Start period: ", PERIOD_START))
print(paste0("End period: ", PERIOD_END))

}

In [None]:
## Create a "complete" grid of every month and year for the period range  ---------------------------------------

if (COUNTRY_CODE ==  "NER" && REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
months_grid <- tibble(
  month_date = seq(
    ymd(paste0(PERIOD_START, "01")), # Converts 202201 to "20220101" and then to a date
    ymd(paste0(PERIOD_END, "01")),   # same
    by = "months"
  )
) %>%
  mutate(
    YEAR = year(month_date),
    MONTH = month(month_date)
  )

dim(months_grid)    
head(months_grid, 3)

}

In [None]:
## Create a "complete" grid of every ADM2_ID for every month  ---------------------------------------

if (COUNTRY_CODE ==  "NER" && REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
# This ensures that even if an ADM2_ID has zero open facilities in a month,
# it will still appear in the final result with a count of 0.
complete_grid <- expand_grid(
  ADM2_ID = unique(facility_master$ADM2_ID),
  month_date = months_grid$month_date
) %>%
  mutate(
    YEAR = year(month_date),
    MONTH = month(month_date)
  )

head(complete_grid, 3)

}

In [None]:
## Calculate the number of open facilities  ---------------------------------------

# # The facility must have opened on or before the last day of the current month. 
# # To calculate the last day: add one month and subtract one day from the first day.
# complete_grid$month_date[1]                       # "2022-01-01"
# complete_grid$month_date[1] + months(1) - days(1) # "2022-01-31"
# # The facility must either still be open (DATE_CLOSED is NA) OR it must have closed on or after the first day of that month.

if (COUNTRY_CODE ==  "NER" && REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
open_facilities_count <- facility_master %>%
  # Create a row for every possible combination of facility and month
  crossing(months_grid) %>%
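  # A facility is "open" in a month if it opened ON OR BEFORE the last day of that month
  # AND it either never closed (NA) or closed ON OR AFTER the first day of that month.
  # as_date() keeps the comparison Date-vs-Date (the dates may be stored as date-times).
  filter(
    as_date(DATE_OPENED) <= month_date + months(1) - days(1) & # opened on or before the last day of the current month
      (is.na(DATE_CLOSED) | as_date(DATE_CLOSED) >= month_date) # never closed, or closed on or after the first day of the current month
  ) %>%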
  # Count the number of open facilities for each area and month
  count(ADM2_ID, YEAR, MONTH, name = "OPEN_FACILITIES_COUNT")

head(open_facilities_count, 3)
}

In [None]:
## Join the counts back to the complete grid to include zeros --------------------------------------

if (COUNTRY_CODE ==  "NER" && REPORTING_RATE_METHOD == "DATAELEMENT" && DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    
pyramid_open_facilities <- complete_grid %>%
  left_join(open_facilities_count, by = c("ADM2_ID", "YEAR", "MONTH")) %>%
  # If a month had no open facilities, the count will be NA. Change it to 0.
  mutate(OPEN_FACILITIES_COUNT = replace_na(OPEN_FACILITIES_COUNT, 0)) %>%
  # DENOMINATOR: rename to `EXPECTED_REPORTS`, the same col name used by the other methods
  select(ADM2_ID, YEAR, MONTH, 
         EXPECTED_REPORTS = OPEN_FACILITIES_COUNT) %>%
  arrange(ADM2_ID, YEAR, MONTH)

print(dim(pyramid_open_facilities))
head(pyramid_open_facilities, 3)

}
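The counting logic above can be checked on a tiny example. The following is a minimal sketch on made-up data (all `toy_*` objects, facility IDs and dates are assumptions for illustration, not values from the Niger pyramid); it reproduces the same steps: cross facilities with months, keep the rows where the facility was open, count per ADM2, then fill missing combinations with 0.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical toy pyramid: 2 districts, 3 facilities (dates are made up)
toy_facilities <- tibble(
  ADM2_ID     = c("D1", "D1", "D2"),
  OU_ID       = c("f1", "f2", "f3"),
  DATE_OPENED = as.Date(c("2021-06-15", "2022-02-10", "2022-03-01")),
  DATE_CLOSED = as.Date(c("2022-02-20", NA, NA))
)

# Months to evaluate: January to March 2022
toy_months <- tibble(
  month_date = seq(as.Date("2022-01-01"), as.Date("2022-03-01"), by = "months")
) %>%
  mutate(YEAR = year(month_date), MONTH = month(month_date))

# Count facilities open at any point during each month
toy_counts <- toy_facilities %>%
  crossing(toy_months) %>%
  filter(
    DATE_OPENED <= month_date + months(1) - days(1) &   # opened by month end
      (is.na(DATE_CLOSED) | DATE_CLOSED >= month_date)  # not closed before month start
  ) %>%
  count(ADM2_ID, YEAR, MONTH, name = "OPEN_FACILITIES_COUNT")

# Complete ADM2 x month grid, so that months with no open facility get a 0
toy_expected <- expand_grid(
  ADM2_ID = unique(toy_facilities$ADM2_ID),
  month_date = toy_months$month_date
) %>%
  mutate(YEAR = year(month_date), MONTH = month(month_date)) %>%
  left_join(toy_counts, by = c("ADM2_ID", "YEAR", "MONTH")) %>%
  mutate(OPEN_FACILITIES_COUNT = replace_na(OPEN_FACILITIES_COUNT, 0L)) %>%
  select(ADM2_ID, YEAR, MONTH, EXPECTED_REPORTS = OPEN_FACILITIES_COUNT) %>%
  arrange(ADM2_ID, YEAR, MONTH)

toy_expected
```

In this toy case `D1` goes from 1 to 2 to 1 open facilities (one opens in February, one closes in February), while `D2` has 0 open facilities until its only facility opens in March.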

### Calculate **Reporting Rate** 

**Join df for Denominator**

**Note**<br>
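In all denominator df's (`dhis2_reporting_expected`, `routine_active_facilities` or `pyramid_open_facilities`) the denominator col has the same name, `EXPECTED_REPORTS`, to simplify parametrization: the only differences between the options are the df to be joined (right element in `left_join()`) and its join keys (`routine_active_facilities` is at `YEAR` level, the others at `MONTH` level).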
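As an illustration of this parametrization, here is a minimal sketch that picks the denominator df from a named list and derives the join keys from the columns it actually contains. The helper `toy_join_denominator()` and all `toy_*` data are hypothetical and not part of the pipeline; unlike an `if / else if` chain without a final `else`, this version stops with an explicit error when the parameter has an unexpected value.

```r
library(dplyr)

# Hypothetical toy numerator (values are made up)
toy_submitted <- tibble(
  ADM2_ID = "D1", YEAR = 2022L, MONTH = 1:2, SUBMITTED_REPORTS = c(8, 9)
)

# Hypothetical toy denominators, one per option; all expose `EXPECTED_REPORTS`
toy_denominators <- list(
  DHIS2_EXPECTED_REPORTS    = tibble(ADM2_ID = "D1", YEAR = 2022L, MONTH = 1:2, EXPECTED_REPORTS = c(10, 10)),
  ROUTINE_ACTIVE_FACILITIES = tibble(ADM2_ID = "D1", YEAR = 2022L, EXPECTED_REPORTS = 12), # YEAR level only
  PYRAMID_OPEN_FACILITIES   = tibble(ADM2_ID = "D1", YEAR = 2022L, MONTH = 1:2, EXPECTED_REPORTS = c(11, 12))
)

toy_join_denominator <- function(submitted, denominators, method) {
  if (!method %in% names(denominators)) {
    stop("Unknown denominator method: ", method)
  }
  denominator <- denominators[[method]]
  # Join on MONTH only if the chosen denominator has that column
  join_cols <- intersect(c("ADM2_ID", "YEAR", "MONTH"), names(denominator))
  left_join(submitted, denominator, by = join_cols)
}

toy_monthly <- toy_join_denominator(toy_submitted, toy_denominators, "PYRAMID_OPEN_FACILITIES")
toy_yearly  <- toy_join_denominator(toy_submitted, toy_denominators, "ROUTINE_ACTIVE_FACILITIES")
```

With the yearly denominator, the same `EXPECTED_REPORTS` value is repeated for every month of that year.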

In [None]:
DATAELEMENT_METHOD_DENOMINATOR

In [None]:
# --- 2. Join `dhis2_reporting_expected`, `routine_active_facilities` OR `pyramid_open_facilities` to add `EXPECTED_REPORTS` ------------------------------------------------

if (REPORTING_RATE_METHOD == "DATAELEMENT") {

# Parametrized based on DATAELEMENT_METHOD_DENOMINATOR: left_join() the respective df
if (DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {
    # Add df of rep rate extracted directly from DHIS2
    dhis2_routine_submitted_expected <- left_join(
    dhis2_routine_submitted, 
    dhis2_reporting_expected |> select(ADM2_ID, YEAR, MONTH, EXPECTED_REPORTS), # `dhis2_reporting_expected`
    by = join_by(ADM2_ID, YEAR, MONTH)
    ) 
    log_msg("Calculating `Data Element` reporting rate, using as denominator `EXPECTED_REPORTS` from DHIS2.")
    
} else if (DATAELEMENT_METHOD_DENOMINATOR == "ROUTINE_ACTIVE_FACILITIES") {
    # Add df of rep rate CALCULATED based on submissions in dhis2 routine data ("active" facilities)
    dhis2_routine_submitted_expected <- left_join(
    dhis2_routine_submitted, 
    routine_active_facilities, # has only cols: `YEAR`, `ADM2_ID`, `EXPECTED_REPORTS`
    by = join_by(ADM2_ID, YEAR) #, MONTH)
    ) 
    log_msg(paste0("Calculating `Data Element` reporting rate, using as denominator `EXPECTED_REPORTS` as CALCULATED from routine data: ",
                   "the number of ACTIVE facilities (reporting on any of the extracted indicators at least once per year)."))
    
}  else if (DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
   # Add df of rep rate CALCULATED based on OPEN facilities as per PYRAMID RAW
    dhis2_routine_submitted_expected <- left_join(
    dhis2_routine_submitted, 
    pyramid_open_facilities, 
    by = join_by(ADM2_ID, YEAR, MONTH)
    ) 
    log_msg(paste0("Calculating `Data Element` reporting rate, using as denominator `EXPECTED_REPORTS` as CALCULATED from DHIS2 pyramid. ",
                   "This method counts the nr of OPEN facilities (Hospitals + FoSa) for each ADM2 per MONTH."))
}

# Safety measure: make sure YEAR and MONTH are integers
dhis2_routine_submitted_expected <- dhis2_routine_submitted_expected |>
  # ungroup() %>%  
  mutate(YEAR = as.integer(YEAR),
         MONTH = as.integer(MONTH)
          ) 


print(dim(dhis2_routine_submitted_expected))
head(dhis2_routine_submitted_expected, 3)

}

In [None]:
# --- 3. Calculate `REPORTING_RATE` ------------------------------------------------

if (REPORTING_RATE_METHOD == "DATAELEMENT") {
    
reporting_rate_dataelement <- dhis2_routine_submitted_expected |>
mutate(
    REPORTING_RATE = SUBMITTED_REPORTS / EXPECTED_REPORTS
  ) 

print(dim(reporting_rate_dataelement)) # print() needed: inside `if {}` only the last expression is displayed
head(reporting_rate_dataelement, 3)

}
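Note that a plain division returns `Inf` or `NaN` when `EXPECTED_REPORTS` is 0 (possible with `PYRAMID_OPEN_FACILITIES`, where months without open facilities are set to 0) and `NA` when the join found no denominator. The sketch below shows one possible guard on made-up data; whether such cases should become `NA` (rather than, say, 0) is an assumption and not a decision taken in this notebook.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
toy_rr <- tibble(
  ADM2_ID           = c("D1", "D2", "D3"),
  SUBMITTED_REPORTS = c(8, 0, 5),
  EXPECTED_REPORTS  = c(10, 0, NA)
) %>%
  mutate(
    # Only divide when the denominator is known and strictly positive
    REPORTING_RATE = if_else(
      !is.na(EXPECTED_REPORTS) & EXPECTED_REPORTS > 0,
      SUBMITTED_REPORTS / EXPECTED_REPORTS,
      NA_real_
    )
  )

toy_rr
```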

#### Quick data quality check 🔍

In [None]:
if (REPORTING_RATE_METHOD == "DATAELEMENT") {
    inspect_reporting_rate(reporting_rate_dataelement)
}


#### Subset cols

In [None]:
if (REPORTING_RATE_METHOD == "DATAELEMENT") {

    reporting_rate_dataelement <- reporting_rate_dataelement |> 
    select(all_of(fixed_cols_rr))
    
    head(reporting_rate_dataelement, 3)
}

#### Plot by MONTH (heatmap)

In [None]:
if (REPORTING_RATE_METHOD == "DATAELEMENT") {

# Plot reporting rate heatmap
options(repr.plot.width = 20, repr.plot.height = 10) 

# reporting_rate_conf_month %>%
reporting_rate_dataelement %>%
mutate(
    DATE = as.Date(paste0(YEAR, "-", MONTH, "-01"))
    ) %>%
ggplot(., aes(x = DATE,  
              y = factor(ADM2_ID), 
              fill = REPORTING_RATE * 100)
      ) + 
  geom_tile() +
  scale_fill_viridis_c(
    option = "C",
    direction = 1,  # blue = low, yellow = high
    limits = c(0, 100),
    name = "Reporting rate (%)"
  ) +
  labs(
    title = "Monthly Reporting Rate by Health District - Method 'DataElement'",
    subtitle = "Each tile represents the reporting completeness per district per month",
    x = "Month",
    y = "Health District"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 9),
    axis.text.y = element_text(size = 9),
    plot.title = element_text(face = "bold", hjust = 0.5, size = 14),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    legend.position = "right",
    panel.grid = element_blank()
  )

}

# 4. Export 📁 /data/ folder

### 🧹 Clear output directory
This is needed to ensure that only 2 files are written to the new version of the Dataset:
* **Data Set** reporting rate (only one way to calculate it, not parametrized as nothing to "decide" here)
* **Data Element** reporting rate: here there are 7 possible combinations of numerator times 3 possible combinations of denominator.<br>
    These are too many options to give to the incidence pipeline (the step that ingests this data), where they would need to be hardcoded in the pipeline module. When running the incidence pipeline, the user simply chooses whether to use `"dataset"` or `"dataelement"`, and therefore there must be only one file for each option.<br>
  <s>However, we want to **preserve the info** on the choice of **numerator** and **denominator** in the **filename**. The import function used in incidence therefore only looks for the fixed pattern in the filename, and ignores the tags for numerator and denominator (e.g., "n-conf-susp-test", "d-dexrep").</s>

In [None]:
# Cleanup
path_to_clear <- file.path(DATA_PATH, "reporting_rate")
files_to_delete <- list.files(path_to_clear, full.names = TRUE, recursive = TRUE)
unlink(files_to_delete, recursive = TRUE)
log_msg(glue::glue("🧹 Deleting all existing files from `{path_to_clear}`. Output of current pipeline run will replace output of previous run."))
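To see what this cleanup does without touching real outputs, the same `list.files()` + `unlink()` calls can be tried on a throwaway directory. This is a minimal sketch; the directory and file names below are made up.

```r
# Hypothetical scratch directory standing in for `<DATA_PATH>/reporting_rate`
toy_dir <- file.path(tempdir(), "reporting_rate_demo")
unlink(toy_dir, recursive = TRUE) # start from a clean slate
dir.create(toy_dir, recursive = TRUE)

# Pretend these are the outputs of a previous run
file.create(file.path(
  toy_dir,
  c("XXX_reporting_rate_dataset.csv", "XXX_reporting_rate_dataelement.parquet")
))

toy_stale <- list.files(toy_dir, full.names = TRUE, recursive = TRUE)
toy_n_before <- length(toy_stale)

# Delete the files; the directory itself is kept
unlink(toy_stale, recursive = TRUE)
toy_n_after <- length(list.files(toy_dir))
```

Only the files are removed; the folder remains, so the export functions can write into it straight away.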

### CSV

#### Build up file name for **Data Element** method

In [None]:
# 🚨 Currently not in use! Keeping for future update to method 🚨 (GP 2025-08-29)

# Abbreviation for Data Element chosen NUMERATOR
method_num <- tolower(paste0("n-", paste(indicators_selected, collapse = "-")))
method_num


# Abbreviation for Data Element chosen DENOMINATOR
if (DATAELEMENT_METHOD_DENOMINATOR == "DHIS2_EXPECTED_REPORTS") {
    method_den <- "d-dexrep" # "d1"
} else if (DATAELEMENT_METHOD_DENOMINATOR == "ROUTINE_ACTIVE_FACILITIES") {
    method_den <- "d-actfac" # "d2"
} else if (DATAELEMENT_METHOD_DENOMINATOR == "PYRAMID_OPEN_FACILITIES") {
    method_den <- "d-opnfcl" # "d3"
}

method_den
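For reference, a minimal sketch of the tags these abbreviations produce, using a made-up selection of indicators and a made-up denominator choice (the `toy_*` names are assumptions for illustration). It uses `switch()` with a final `stop()` so that an unexpected parameter value fails loudly instead of leaving the tag undefined.

```r
# Hypothetical parameter values (for illustration only)
toy_indicators  <- c("CONF", "SUSP", "TEST")
toy_denominator <- "DHIS2_EXPECTED_REPORTS"

# Numerator tag: lower-case indicator names joined by "-"
toy_method_num <- tolower(paste0("n-", paste(toy_indicators, collapse = "-")))

# Denominator tag: one fixed abbreviation per option
toy_method_den <- switch(
  toy_denominator,
  DHIS2_EXPECTED_REPORTS    = "d-dexrep",
  ROUTINE_ACTIVE_FACILITIES = "d-actfac",
  PYRAMID_OPEN_FACILITIES   = "d-opnfcl",
  stop("Unknown denominator option: ", toy_denominator)
)

toy_method_num # "n-conf-susp-test"
toy_method_den # "d-dexrep"
```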

#### Write function to assemble path based on method - for .**csv**

In [None]:
# write function
snt_write_csv <- function(x, output_data_path, method) {
  
  full_directory_path <- file.path(output_data_path, "reporting_rate")
  
  if (!dir.exists(full_directory_path)) {
    dir.create(full_directory_path, recursive = TRUE)
  }

  file_path <- file.path(full_directory_path, paste0(COUNTRY_CODE, "_reporting_rate_", method, ".csv")) 
  
  write_csv(x, file_path)

  log_msg(paste0("Exported : ", file_path))
}

#### Use function to export .csv files

In [None]:
# Method "Dataset"

if (REPORTING_RATE_METHOD == "DATASET") {
    snt_write_csv(x = reporting_rate_dataset, 
              output_data_path = DATA_PATH, 
              method = "dataset") 
}

In [None]:
# Method "Data Element"

if (REPORTING_RATE_METHOD == "DATAELEMENT") {
       snt_write_csv(x = reporting_rate_dataelement,
                     output_data_path = DATA_PATH, 
                     method = "dataelement")
}

### parquet

#### Write function to assemble path based on method - for .**parquet**

In [None]:
# write function
snt_write_parquet <- function(x, output_data_path, method) {
  
  full_directory_path <- file.path(output_data_path, "reporting_rate")
  
  if (!dir.exists(full_directory_path)) {
    dir.create(full_directory_path, recursive = TRUE)
  }

  file_path <- file.path(full_directory_path, paste0(COUNTRY_CODE, "_reporting_rate_", method, ".parquet")) 
  
  arrow::write_parquet(x, file_path)

  log_msg(paste0("Exported : ", file_path))
}

#### Use function to export .parquet files

In [None]:
# Method "Dataset"

if (REPORTING_RATE_METHOD == "DATASET") {
    snt_write_parquet(x = reporting_rate_dataset,
                  output_data_path = DATA_PATH,
                  method = "dataset"
                 ) 
}

In [None]:
# Method "Data Element"

if (REPORTING_RATE_METHOD == "DATAELEMENT") {
       snt_write_parquet(x = reporting_rate_dataelement,
                  output_data_path = DATA_PATH,
                  method = "dataelement"
                 )
}
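`snt_write_csv()` and `snt_write_parquet()` differ only in the file extension and the writer they call. As a possible simplification, the shared path logic could live in one small helper. The sketch below is only a suggestion: `toy_reporting_rate_path()` is a hypothetical name, and it takes the country code as an argument instead of reading the global `COUNTRY_CODE`.

```r
# Hypothetical helper: builds the output path used by both export functions
toy_reporting_rate_path <- function(output_data_path,
                                    country_code,
                                    method = c("dataset", "dataelement"),
                                    ext = c("csv", "parquet")) {
  method <- match.arg(method) # errors on anything other than the two methods
  ext <- match.arg(ext)       # errors on anything other than the two formats
  file.path(
    output_data_path,
    "reporting_rate",
    paste0(country_code, "_reporting_rate_", method, ".", ext)
  )
}

toy_csv_path     <- toy_reporting_rate_path("data", "NER", "dataset", "csv")
toy_parquet_path <- toy_reporting_rate_path("data", "NER", "dataelement", "parquet")
```

Each writer would then only need to create the directory, call its own write function (`write_csv()` or `arrow::write_parquet()`) on the returned path, and log the result.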