-------------
🤌🏼 Points to discuss:
* **What do we want to plot here?** <br>
    Plot only what is produced by the pipeline (hence reflecting the choice of parameters from the pipeline run) OR all the possible options (all output produced by all pipeline runs so far, meaning whatever is written to the most recent version of the Dataset)?
* **How to handle missing files?** Namely, situations in which files have not yet been produced. In this reporting rate case, if the user only runs the pipeline to produce the "Dataset" reporting rate file, then we cannot plot anything for the "Data Element" reporting rate as there is no file yet ...
  At the moment this is handled with `if` logic, but it should be made more elegant to avoid repeating the same code twice (for dataset and for data element).
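
One possible way to avoid repeating the loading logic twice is a small wrapper that takes the loading call as a function and falls back to an empty table when it fails. The sketch below is only an illustration in base R: `safe_load()`, `empty_rr` and the always-failing `missing_file()` are hypothetical names that are not part of `snt_utils.r`, and `message()` stands in for `log_msg()`.

```r
# Hypothetical helper (not part of snt_utils.r): run `loader()` and, if it
# fails, report the problem through `on_warning` and return `empty` instead.
safe_load <- function(loader, empty, on_warning = message) {
  tryCatch(
    loader(),
    error = function(e) {
      on_warning(paste(
        "Could not load file, proceeding with empty data. Error:",
        conditionMessage(e)
      ))
      empty
    }
  )
}

# Empty table with the columns the plotting cells expect
empty_rr <- data.frame(
  YEAR = double(),
  MONTH = double(),
  ADM2_ID = character(),
  REPORTING_RATE = double()
)

# Stand-in for a file that has not been produced yet (assumption for the demo)
missing_file <- function() stop("file not found")

rr <- safe_load(missing_file, empty_rr)
nrow(rr)  # 0 rows, so the `if (nrow(...) != 0)` guards skip the plots
```

In this notebook the wrapper could presumably be called once per file, passing `function() get_latest_dataset_file_in_memory(DATASET_NAME, paste0(COUNTRY_CODE, "_reporting_rate_dataset.parquet"))` as `loader` and `function(m) log_msg(m, level = "warning")` as `on_warning`.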

-------------

🚧 To do:
* **Plots should be wrapped as functions (DRY code)**! Could save them in an .R file in this same location and `source()` it only here (as the plots are specific to this notebook, no need to save them in snt_utils.R)
* **Display _real_ data**: do **_not_ cap** reporting rate values at 1 (100%)!! It's important to visualize the real full range if we want to qualitatively assess and compare different methods!
* <s>**fix object names**: `routine_data` is NOT routine data ... !!</s>
* When importing `reporting_rate_data`, try if possible to avoid using `tryCatch`, and use `log_msg(..., "warning")` instead (should simplify code and logic ... ). The idea is to **log a meaningful warning without making the pipeline fail** just because a file in the report notebook is missing ... !
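
As a starting point for the first two items, the data preparation shared by both heatmaps could live in one function. This is a minimal sketch in base R, assuming the input has a `REPORTING_RATE` column expressed as a proportion. `add_reporting_category()` and the `demo` data are hypothetical, and the values are made up for illustration.

```r
# Hypothetical helper (not part of snt_utils.r): shared prep for both heatmaps.
add_reporting_category <- function(df) {
  # Keep the real range: no pmin() capping at 100%
  df$reporting_pct <- df$REPORTING_RATE * 100
  df$category <- cut(
    df$reporting_pct,
    breaks = c(-Inf, 50, 80, 90, 100, Inf),
    labels = c("<50", "50-80", "80-90", "90-100", ">100"),
    right = TRUE  # right-closed intervals: exactly 100% falls in "90-100"
  )
  df
}

# Made-up proportions, including one value above 1
demo <- data.frame(REPORTING_RATE = c(0.30, 0.85, 1.00, 1.20))
demo <- add_reporting_category(demo)
as.character(demo$category)
```

Calling the same function in the dataset and data element sections would keep the breaks and labels in one place.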

-------------

# Taux de Rapportage des Formations Sanitaires - Health Facility Reporting Rates

In [None]:
# Set SNT Paths
SNT_ROOT_PATH  <- "~/workspace"
CODE_PATH      <- file.path(SNT_ROOT_PATH, "code")
CONFIG_PATH    <- file.path(SNT_ROOT_PATH, "configuration")

# load util functions
source(file.path(CODE_PATH, "snt_utils.r"))

# List required packages 
required_packages <- c("dplyr", "tidyr", "terra", "ggplot2", "stringr", "lubridate", "viridis", "patchwork", "zoo", "purrr", "arrow", "sf", "reticulate", "leaflet")

# Execute function
install_and_load(required_packages)

# Set environment to load openhexa.sdk from the right environment
Sys.setenv(RETICULATE_PYTHON = "/opt/conda/bin/python")
reticulate::py_config()$python
openhexa <- import("openhexa.sdk")

# Load SNT config
config_json <- tryCatch({ jsonlite::fromJSON(file.path(CONFIG_PATH, "SNT_config.json"))},
    error = function(e) {
        msg <- paste0("Error while loading configuration: ", conditionMessage(e))
        cat(msg)   
        stop(msg) 
    })

# Required environment for the sf packages
Sys.setenv(PROJ_LIB = "/opt/conda/share/proj")
Sys.setenv(GDAL_DATA = "/opt/conda/share/gdal")

In [None]:
# Configuration variables
DATASET_NAME <- config_json$SNT_DATASET_IDENTIFIERS$DHIS2_REPORTING_RATE
COUNTRY_CODE <- config_json$SNT_CONFIG$COUNTRY_CODE
ADM_2 <- toupper(config_json$SNT_CONFIG$DHIS2_ADMINISTRATION_2)

In [None]:
# print function
printdim <- function(df, name = deparse(substitute(df))) {
  cat("Dimensions of", name, ":", nrow(df), "rows x", ncol(df), "columns\n\n")
}

In [None]:
# import DHIS2 shapes data
DATASET_DHIS2 <- config_json$SNT_DATASET_IDENTIFIERS$DHIS2_DATASET_FORMATTED
shapes_data <- tryCatch({ get_latest_dataset_file_in_memory(DATASET_DHIS2, paste0(COUNTRY_CODE, "_shapes.geojson")) }, 
                  error = function(e) {
                      msg <- paste("Error while loading DHIS2 Shapes data for: " , COUNTRY_CODE, conditionMessage(e))
                      cat(msg)
                      stop(msg)
                      })

pyramid_data <- tryCatch({ get_latest_dataset_file_in_memory(DATASET_DHIS2, paste0(COUNTRY_CODE, "_pyramid.parquet")) }, 
                  error = function(e) {
                      msg <- paste("Error while loading DHIS2 Pyramid data for: " , COUNTRY_CODE, conditionMessage(e))
                      cat(msg)
                      stop(msg)
                      })
pyramid_data <- pyramid_data %>%
  select(LEVEL_3_ID, LEVEL_3_NAME, LEVEL_2_NAME, LEVEL_1_NAME) %>%
  distinct()

In [None]:
head(pyramid_data)

## A) Taux de Soumission des Rapports / Dataset Reporting Rate

**[FR]**
Cette section analyse le **taux de soumission des rapports**, tel que calculé dans le Système National d’Information Sanitaire (SNIS). Ce taux est défini comme le nombre de rapports effectivement reçus (rapports actuels) divisé par le nombre de rapports attendus (rapports attendus) sur une période donnée. Les rapports attendus correspondent au nombre de formations sanitaires qui, selon les paramètres du SNIS, devaient soumettre un rapport. Cet indicateur permet d’évaluer si les structures ont transmis les rapports requis, sans tenir compte du contenu ou de l’exhaustivité des données saisies.

**[EN]**
This section analyzes the **dataset reporting rate**, as calculated in the Health Management Information System (HMIS). The rate is defined as the number of reports actually submitted (actual reports) divided by the number of reports expected (expected reports) over a given period. Expected reports refer to the number of health facilities that were required to report according to the HMIS configuration. This indicator helps assess whether health facilities submitted their required reports, regardless of the content or completeness of the data within those reports.
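
To make the definition concrete, the sketch below computes the rate for three made-up districts. The column names `ACTUAL_REPORTS` and `EXPECTED_REPORTS` and all values are assumptions for illustration, not the actual HMIS export.

```r
# Made-up monthly counts for three districts (illustration only)
reports <- data.frame(
  ADM2_ID = c("D1", "D2", "D3"),
  ACTUAL_REPORTS = c(18, 25, 12),
  EXPECTED_REPORTS = c(20, 25, 10)
)

# Reporting rate = reports actually submitted / reports expected
reports$REPORTING_RATE <- reports$ACTUAL_REPORTS / reports$EXPECTED_REPORTS
reports$REPORTING_RATE  # 0.9 1.0 1.2
```

The third district has a rate above 1 because more reports were received than expected, which is why the heatmaps in this notebook keep a separate `>100` category.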

In [None]:
# # import data

# # OLD: "routine_data "
# reporting_rate_data <- tryCatch({ get_latest_dataset_file_in_memory(DATASET_NAME, paste0(COUNTRY_CODE, "_reporting_rate_dataset.parquet")) }, 
#                   error = function(e) {
#                       msg <- paste("Error while loading seasonality file for: " , COUNTRY_CODE, conditionMessage(e))
#                       cat(msg)
#                       stop(msg)
#                       })

# ADM_2_ID <- gsub("_NAME", "_ID", ADM_2)
# reporting_rate_data <- reporting_rate_data %>%
#   left_join(pyramid_data, by = setNames(ADM_2_ID, "ADM2_ID"))

# printdim(reporting_rate_data)

In [None]:
# head(reporting_rate_data, 3)

In [None]:
# Import from Dataset

reporting_rate_data <- tryCatch({
    # Attempt to load the dataset
    get_latest_dataset_file_in_memory(
        DATASET_NAME, 
        paste0(COUNTRY_CODE, "_reporting_rate_dataset.parquet")
    )
}, 
error = function(e) {
    # If an error occurs, log a warning
    msg <- paste0("[WARNING] Could not load dataset reporting rate file for: ", COUNTRY_CODE, ". Proceeding with empty data. Error: ", conditionMessage(e))
    log_msg(msg, level = "warning")
    
    # IMPORTANT: Return an empty tibble with the correct structure SO PIPELINE DOES NOT FAIL
    return(
        tibble(
            YEAR = double(),
            MONTH = double(),
            ADM2_ID = character(),
            REPORTING_RATE = double()
        )
    )
})

# Add _NAME cols from pyramid
if (nrow(reporting_rate_data) != 0) {

    ADMIN_2 <- toupper(config_json$SNT_CONFIG$DHIS2_ADMINISTRATION_2)
    ADMIN_2_LEVEL <- str_replace(ADMIN_2, "NAME", "ID")
    
    reporting_rate_data <- reporting_rate_data %>%
    # left_join(pyramid_data, by = c("ADM2_ID" = "LEVEL_3_ID")) # old
    left_join(pyramid_data, by = c("ADM2_ID" = ADMIN_2_LEVEL))
}

### Plot: Heatmap

In [None]:
if (nrow(reporting_rate_data) != 0) {

# Prepare date column + category
reporting_rate_data <- reporting_rate_data %>%
  mutate(
    date = as.Date(paste0(YEAR, "-", MONTH, "-01")),
    ADM2_ID = factor(ADM2_ID),
    # reporting_pct = pmin(REPORTING_RATE, 1) * 100, # `pmin()` caps to 100%
    reporting_pct = REPORTING_RATE * 100,
    category = cut(
      reporting_pct,
      # breaks = c(-Inf, 50, 80, 90, Inf),
      # labels = c("<50", "50–80", "80–90", "≥90"),
      # GP 2025-08-07 added this, but double check (seems too many >100!!)
      breaks = c(-Inf, 50, 80, 90, 100, Inf),
      labels = c("<50", "50-80", "80-90", "90-100", ">100"),
      right = TRUE # intervals are right-closed: the upper bound is included, so exactly 100% falls in "90-100"
    )
  )

# Define color scale
reporting_colors <- c(
  "<50" = "#d7191c",     # red
  "50-80" = "#fdae61",   # orange
  "80-90" = "#ffffbf",   # yellow
  "90-100" = "#1a9641",  # green
  ">100" = "darkgreen"
)

# Plot heatmap
options(repr.plot.width = 18, repr.plot.height = 15)
ggplot(reporting_rate_data, aes(x = date, y = LEVEL_3_NAME, fill = category)) +
  geom_tile() +
  scale_fill_manual(
    values = reporting_colors,
    name = "Taux de soumission (%)"
  ) +
  labs(
    title = "Taux de soumission des rapports mensuels par district sanitaire",
    subtitle = "Monthly Dataset Reporting Rate by Health District",
    x = "Mois - Month",
    y = "District Sanitaire - Health District"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 16),
    axis.text.y = element_text(size = 9),
    plot.title = element_text(face = "bold", hjust = 0.5, size = 20),
    plot.subtitle = element_text(hjust = 0.5, size = 16),
    # legend.position = "right",
    legend.position = "top",
    panel.grid = element_blank()
  )

}

### Plot: boxplot

In [None]:
if (nrow(reporting_rate_data) != 0) {

# Prepare the data
reporting_rate_data_box <- reporting_rate_data %>%
  mutate(
    MONTH = as.integer(MONTH),
    YEAR = as.factor(YEAR),
    # reporting_pct = pmin(REPORTING_RATE, 1) * 100
    reporting_pct = REPORTING_RATE * 100
  )

# Month labels in French
month_labels_fr <- c(
  "Janv", "Févr", "Mars", "Avril", "Mai", "Juin",
  "Juil", "Août", "Sept", "Oct", "Nov", "Déc"
)

# Plot
options(repr.plot.width = 18, repr.plot.height = 15)
ggplot(reporting_rate_data_box, aes(x = factor(MONTH), y = reporting_pct, fill = YEAR)) +
  geom_boxplot(outlier.size = 0.8, outlier.alpha = 0.4) +
  scale_x_discrete(labels = month_labels_fr) +
  # scale_y_continuous(name = "Taux de soumission (%)", limits = c(0, 100)) +
  scale_y_continuous(name = "Taux de soumission (%)") +
  labs(
    title = "Distribution mensuelle du taux de soumission des rapports",
    subtitle = "Monthly Distribution of Dataset Reporting Rate by Health District (2021–2024)",
    x = "Mois",
    fill = "Année"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    plot.title = element_text(face = "bold", size = 20),
    plot.subtitle = element_text(size = 16),
    legend.position = "bottom"
  )

}

### Plot: choropleth

In [None]:
if (nrow(reporting_rate_data) != 0) {

# Step 1: Aggregate to annual reporting rate per district
annual_data <- reporting_rate_data %>%
  group_by(YEAR, ADM2_ID) %>%
  summarise(reporting_rate = mean(REPORTING_RATE, na.rm = TRUE)) %>%
  ungroup()

# Step 2: Join with spatial data (assuming 'shapes_data' contains geometry and ADM2_ID)
map_data <- shapes_data %>%
  left_join(annual_data, by = "ADM2_ID")

# Step 3: Bin the reporting rate into categories
map_data <- map_data %>%
  mutate(
    reporting_cat = case_when(
      reporting_rate < 0.5  ~ "<50",
      reporting_rate < 0.8  ~ "50-79", # "50-80"
      reporting_rate < 0.9  ~ "80-89", # "80-90"
      reporting_rate >= 0.9 ~ ">=90",
      TRUE ~ NA_character_
    ),
    reporting_cat = factor(reporting_cat, levels = c("<50", "50-79", "80-89", ">=90")) #  levels = c("<50", "50-80", "80-90", ">=90")
  )

# Step 4: Define colors
reporting_colors <- c(
  "<50"   = "#d7191c",
  "50-79" = "#fdae61",
  "80-89" = "#ffffbf",
  ">=90"  = "#1a9641"
)

# Step 5: Plot
options(repr.plot.width = 18, repr.plot.height = 10)
ggplot(map_data) +
  geom_sf(aes(fill = reporting_cat), color = "white", size = 0.2) +
  facet_wrap(~ YEAR) +
  scale_fill_manual(values = reporting_colors, name = "Taux de soumission (%)") +
  labs(
    title = "Taux de soumission des rapports annuels par district sanitaire",
    subtitle = "Annual Dataset Reporting Completeness by Health District"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    legend.position = "right",
    strip.text = element_text(face = "bold", size = 16),
    plot.title = element_text(face = "bold")
  )

}

### Plot: choropleth 2

In [None]:
if (nrow(reporting_rate_data) != 0) {

# Step 1: Compute mean reporting rate per ADM2_ID over all years
mean_reporting_stats <- map_data %>%
  group_by(ADM2_ID) %>%
  summarise(
    reporting_rate = mean(reporting_rate, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    reporting_cat = case_when(
      reporting_rate < 0.5  ~ "<50",
      reporting_rate < 0.8  ~ "50-80",
      reporting_rate < 0.9  ~ "80-90",
      reporting_rate >= 0.9 ~ ">=90",
      TRUE ~ NA_character_
    )
  )

# Set correct factor levels to match legend
mean_reporting_stats$reporting_cat <- factor(
  mean_reporting_stats$reporting_cat,
  levels = c("<50", "50-80", "80-90", ">=90")
)

# Step 2: Join with shapes (drop geometry to avoid spatial join conflict)
mean_reporting_map <- shapes_data %>%
  left_join(st_drop_geometry(mean_reporting_stats), by = "ADM2_ID") %>%
  st_as_sf()

# Step 3: Define custom color scale
reporting_colors <- c(
  "<50"   = "#d7191c",   # red
  "50-80" = "#fdae61",   # orange
  "80-90" = "#ffffbf",   # yellow
  ">=90"  = "#1a9641"    # green
)

# Step 4: Plot
options(repr.plot.width = 20, repr.plot.height = 10)
ggplot(mean_reporting_map) +
  geom_sf(aes(fill = reporting_cat), color = "white", size = 0.2) +
  scale_fill_manual(
    values = reporting_colors,
    name = "Taux de soumission (%)",
    drop = FALSE
  ) +
  labs(
    title = "Taux moyen de soumission des rapports (toutes années confondues)",
    subtitle = "Mean Annual Dataset Reporting Rate (All Years Combined)"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text()
  )

}

## B) Taux de rapportage des éléments de données: cas confirmés / Data Element Reporting Rate: confirmed cases


In [None]:
# # import data
# # was: routine_data
# reporting_rate_data <- tryCatch({ get_latest_dataset_file_in_memory(DATASET_NAME, paste0(COUNTRY_CODE, "_reporting_rate_dataelement.parquet")) }, 
#                   error = function(e) {
#                       msg <- paste("Error while loading seasonality file for: " , COUNTRY_CODE, conditionMessage(e))
#                       # cat(msg)
#                       log_msg(msg, level = "warning") # GP 20250908
#                       # stop(msg) # GP 20250908
#                       })

# reporting_rate_data <- reporting_rate_data %>%
#   left_join(pyramid_data, by = c("ADM2_ID" = "LEVEL_3_ID"))

# printdim(reporting_rate_data)

### Import and format data

In [None]:
# config_json$SNT_CONFIG$DHIS2_ADMINISTRATION_2

In [None]:
# ADMIN_2 <- toupper(config_json$SNT_CONFIG$DHIS2_ADMINISTRATION_2)
# ADMIN_2

In [None]:
# ADMIN_2_LEVEL <- str_replace(ADMIN_2, "NAME", "ID")
# ADMIN_2_LEVEL

In [None]:
# Import from Dataset

reporting_rate_data <- tryCatch({
    # Attempt to load the dataset
    get_latest_dataset_file_in_memory(
        DATASET_NAME, 
        paste0(COUNTRY_CODE, "_reporting_rate_dataelement.parquet")
    )
}, 
error = function(e) {
    # If an error occurs, log a warning
    msg <- paste0("[WARNING] Could not load data element reporting rate file for: ", COUNTRY_CODE, ". Proceeding with empty data. Error: ", conditionMessage(e))
    log_msg(msg, level = "warning")
    
    # IMPORTANT: Return an empty tibble with the correct structure SO PIPELINE DOES NOT FAIL
    return(
        tibble(
            YEAR = double(),
            MONTH = double(),
            ADM2_ID = character(),
            REPORTING_RATE = double()
        )
    )
})

# Add _NAME cols from pyramid
if (nrow(reporting_rate_data) != 0) {

    ADMIN_2 <- toupper(config_json$SNT_CONFIG$DHIS2_ADMINISTRATION_2)
    ADMIN_2_LEVEL <- str_replace(ADMIN_2, "NAME", "ID")
    
    reporting_rate_data <- reporting_rate_data %>%
    # left_join(pyramid_data, by = c("ADM2_ID" = "LEVEL_3_ID")) # old
    left_join(pyramid_data, by = c("ADM2_ID" = ADMIN_2_LEVEL))
}

In [None]:
# reporting_rate_data

In [None]:
# if (nrow(reporting_rate_data) != 0) {
#     reporting_rate_data <- reporting_rate_data %>%
#     left_join(pyramid_data, by = c("ADM2_ID" = "LEVEL_3_ID"))
# }

### Plot: Heatmap

In [None]:
# Prepare date column + category

if (nrow(reporting_rate_data) != 0) {

reporting_rate_data <- reporting_rate_data %>%
  mutate(
    date = as.Date(paste0(YEAR, "-", MONTH, "-01")),
    ADM2_ID = factor(ADM2_ID),
    # reporting_pct = pmin(REPORTING_RATE, 1) * 100,
    reporting_pct = REPORTING_RATE * 100,
    category = cut(
      reporting_pct,
      # breaks = c(-Inf, 50, 80, 90, Inf),
      # labels = c("<50", "50–80", "80–90", "≥90"),
      # right = FALSE
      breaks = c(-Inf, 50, 80, 90, 100, Inf),
      labels = c("<50", "50-80", "80-90", "90-100", ">100"),
      right = TRUE # intervals are right-closed: exactly 100% falls in "90-100"
    )
  )

# # Define color scale
# reporting_colors <- c(
#   "<50" = "#d7191c",     # red
#   "50–80" = "#fdae61",   # orange
#   "80–90" = "#ffffbf",   # yellow
#   "≥90" = "#1a9641"      # green
# )

# Define color scale
reporting_colors <- c(
  "<50" = "#d7191c",     # red
  "50-80" = "#fdae61",   # orange
  "80-90" = "#ffffbf",   # yellow
  "90-100" = "#1a9641",  # green
  ">100" = "darkgreen"
)

# Plot heatmap
options(repr.plot.width = 18, repr.plot.height = 15)
ggplot(reporting_rate_data, aes(x = date, y = LEVEL_3_NAME, fill = category)) +
  geom_tile() +
  scale_fill_manual(
    values = reporting_colors,
    name = "Taux de soumission (%)"
  ) +
  labs(
    title = "Taux de rapportage mensuel par district sanitaire",
    subtitle = "Monthly Data Element Reporting Rate by Health District",
    x = "Mois - Month",
    y = "District Sanitaire - Health District"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 16),
    axis.text.y = element_text(size = 9),
    plot.title = element_text(face = "bold", hjust = 0.5, size = 20),
    plot.subtitle = element_text(hjust = 0.5, size = 16),
    legend.position = "top", # "right"
    panel.grid = element_blank()
  )

}

### Plot: boxplot

In [None]:
# Prepare the data

if (nrow(reporting_rate_data) != 0) {
    
reporting_rate_data_box <- reporting_rate_data %>%
  mutate(
    MONTH = as.integer(MONTH),
    YEAR = as.factor(YEAR),
    # reporting_pct = pmin(REPORTING_RATE, 1) * 100 # `pmin()` caps values to 1 (then, 100%)
    reporting_pct = REPORTING_RATE * 100
  )

# Month labels in French
month_labels_fr <- c(
  "Janv", "Févr", "Mars", "Avril", "Mai", "Juin",
  "Juil", "Août", "Sept", "Oct", "Nov", "Déc"
)

}

In [None]:
if (nrow(reporting_rate_data) != 0) {

# Plot
options(repr.plot.width = 18, repr.plot.height = 15)
ggplot(reporting_rate_data_box, aes(x = factor(MONTH), y = reporting_pct, fill = YEAR)) +
  geom_boxplot(outlier.size = 0.8, outlier.alpha = 0.4) +
  scale_x_discrete(labels = month_labels_fr) +
  # scale_y_continuous(name = "Taux de soumission (%)", limits = c(0, 100)) +
  scale_y_continuous(name = "Taux de soumission (%)") +
  labs(
    title = "Distribution mensuelle du taux de rapportage",
    subtitle = "Monthly Distribution of Data Element Reporting Rate by Health District (2021–2024)",
    x = "Mois",
    fill = "Année"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    plot.title = element_text(face = "bold", size = 20),
    plot.subtitle = element_text(size = 16),
    legend.position = "bottom"
  )

}

### Plot: choropleth

In [None]:
if (nrow(reporting_rate_data) != 0) {

# Step 1: Aggregate to annual reporting rate per district   
annual_data <- reporting_rate_data %>%
  group_by(YEAR, ADM2_ID) %>%
  summarise(reporting_rate = mean(REPORTING_RATE, na.rm = TRUE)) %>%
  ungroup()

# Step 2: Join with spatial data (assuming 'shapes_data' contains geometry and ADM2_ID)
map_data <- shapes_data %>%
  left_join(annual_data, by = "ADM2_ID")

# Step 3: Bin the reporting rate into categories
map_data <- map_data %>%
  mutate(
    reporting_cat = case_when(
      reporting_rate < 0.5  ~ "<50",
      reporting_rate < 0.8  ~ "50-80",
      reporting_rate < 0.9  ~ "80-90",
      reporting_rate >= 0.9 ~ ">=90",
      TRUE ~ NA_character_
    ),
    reporting_cat = factor(reporting_cat, levels = c("<50", "50-80", "80-90", ">=90"))
  )

# Step 4: Define colors
reporting_colors <- c(
  "<50"   = "#d7191c",
  "50-80" = "#fdae61",
  "80-90" = "#ffffbf",
  ">=90"  = "#1a9641"
)

}

In [None]:
if (nrow(reporting_rate_data) != 0) {

# Step 5: Plot
options(repr.plot.width = 18, repr.plot.height = 10)
ggplot(map_data) +
  geom_sf(aes(fill = reporting_cat), color = "white", size = 0.2) +
  facet_wrap(~ YEAR) +
  scale_fill_manual(values = reporting_colors, name = "Taux de soumission (%)") +
  labs(
    title = "Taux de rapportage annuel des éléments de données par district sanitaire",
    subtitle = "Annual Data Element Reporting Completeness by Health District"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    legend.position = "right",
    strip.text = element_text(face = "bold", size = 16),
    plot.title = element_text(face = "bold")
  )

}

### Plot: choropleth 2

In [None]:
if (nrow(reporting_rate_data) != 0) {

# Step 1: Compute mean reporting rate per ADM2_ID over all years
mean_reporting_stats <- map_data %>%
  group_by(ADM2_ID) %>%
  summarise(
    reporting_rate = mean(reporting_rate, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    reporting_cat = case_when(
      reporting_rate < 0.5  ~ "<50",
      reporting_rate < 0.8  ~ "50-80",
      reporting_rate < 0.9  ~ "80-90",
      reporting_rate >= 0.9 ~ ">=90",
      TRUE ~ NA_character_
    )
  )

# Set correct factor levels to match legend
mean_reporting_stats$reporting_cat <- factor(
  mean_reporting_stats$reporting_cat,
  levels = c("<50", "50-80", "80-90", ">=90")
)

# Step 2: Join with shapes (drop geometry to avoid spatial join conflict)
mean_reporting_map <- shapes_data %>%
  left_join(st_drop_geometry(mean_reporting_stats), by = "ADM2_ID") %>%
  st_as_sf()

# Step 3: Define custom color scale
reporting_colors <- c(
  "<50"   = "#d7191c",   # red
  "50-80" = "#fdae61",   # orange
  "80-90" = "#ffffbf",   # yellow
  ">=90"  = "#1a9641"    # green
)

}

In [None]:
if (nrow(reporting_rate_data) != 0) {

# Step 4: Plot
options(repr.plot.width = 20, repr.plot.height = 10)
ggplot(mean_reporting_map) +
  geom_sf(aes(fill = reporting_cat), color = "white", size = 0.2) +
  scale_fill_manual(
    values = reporting_colors,
    name = "Taux de soumission (%)",
    drop = FALSE
  ) +
  labs(
    title = "Taux moyen de rapportage (toutes années confondues)",
    subtitle = "Mean Annual Data Element Reporting Rate (All Years Combined)"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text()
  )

}