### **01 - Incidence of MLTC**
#### **01F - Manuscript outputs - progression rate by initial condition count**

**Imports**

In [1]:
# required imports

# requires blank line after last import


In [2]:
%%sparkr

if (!requireNamespace("PHEindicatormethods", quietly = TRUE)) {
  install.packages("PHEindicatormethods")
}
library(PHEindicatormethods)

**Parameter cell**

In [8]:
# parameter cell
incidence_schema = ""  # "mltc_incidence_outputs_v40_20230331"
segmentation_schema = ""  # "obh_segmentation_v40_20230331"
analysis_year = ""  # "2022/23"

# optional, can be blank


In [9]:
# Set parameters in Spark configuration with 'param.' prefix (for use in SQL cells)
spark.conf.set("param.incidence_schema", incidence_schema)
spark.conf.set("param.segmentation_schema", segmentation_schema)
spark.conf.set("param.analysis_year", analysis_year)
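Because the parameters default to empty strings, a blank value would only surface later as a confusing SQL error once `${param.…}` is substituted. The sketch below shows one way to fail early. It is illustrative only: `validate_parameters` is a hypothetical helper that is not part of this notebook, and it assumes all three parameters are required and that `analysis_year` follows the `2022/23` shape shown in the parameter cell comment.

```python
import re


def validate_parameters(incidence_schema: str,
                        segmentation_schema: str,
                        analysis_year: str) -> None:
    """Raise ValueError if a parameter is blank or the year is not shaped like '2022/23'.

    Hypothetical helper: assumes all three parameters are mandatory.
    """
    named_values = {
        "incidence_schema": incidence_schema,
        "segmentation_schema": segmentation_schema,
        "analysis_year": analysis_year,
    }
    for name, value in named_values.items():
        if not value.strip():
            raise ValueError(f"Parameter '{name}' must not be blank")

    # Assumed format: four-digit start year, slash, two-digit end year.
    if not re.fullmatch(r"\d{4}/\d{2}", analysis_year):
        raise ValueError(
            f"analysis_year '{analysis_year}' does not look like '2022/23'"
        )
```

If used, it could be called with the three parameter variables immediately before the `spark.conf.set` calls.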


---

#### **01F - Manuscript outputs - progression rate by initial condition count**


This notebook extracts the rate of progression, expressed per 100 person-years, split by initial condition count.

**a - Create denominator person time temporary view**

In [5]:
%%sql

CREATE    OR REPLACE TEMPORARY VIEW headline_condition_count_transitions_denominator AS
SELECT    condition_count,
          SUM(person_years) AS person_years
FROM      ${param.incidence_schema}.mm_incidence_person_time m
WHERE     financial_year = '${param.analysis_year}'
GROUP BY  condition_count

**b - Create numerator temporary view, including the age percentiles at which transitions occur**

In [6]:
%%sql

CREATE    OR REPLACE TEMPORARY VIEW headline_condition_count_transitions_numerator AS
SELECT    previous_condition_count,
          perc_05,
          perc_25,
          perc_50,
          perc_75,
          perc_95,
          COUNT(*) AS incidence
FROM      (
          SELECT    previous_condition_count,
                    PERCENTILE_CONT (0.05) within GROUP (
                    ORDER BY  age * 1.0
                    ) OVER (
                    PARTITION BY previous_condition_count
                    ) AS perc_05,
                    PERCENTILE_CONT (0.25) within GROUP (
                    ORDER BY  age * 1.0
                    ) OVER (
                    PARTITION BY previous_condition_count
                    ) AS perc_25,
                    PERCENTILE_CONT (0.5) within GROUP (
                    ORDER BY  age * 1.0
                    ) OVER (
                    PARTITION BY previous_condition_count
                    ) AS perc_50,
                    PERCENTILE_CONT (0.75) within GROUP (
                    ORDER BY  age * 1.0
                    ) OVER (
                    PARTITION BY previous_condition_count
                    ) AS perc_75,
                    PERCENTILE_CONT (0.95) within GROUP (
                    ORDER BY  age * 1.0
                    ) OVER (
                    PARTITION BY previous_condition_count
                    ) AS perc_95
          FROM      ${param.incidence_schema}.mm_incidence_transitions_age
          WHERE     financial_year = '${param.analysis_year}'

          ) x
GROUP BY  previous_condition_count,
          perc_05,
          perc_25,
          perc_50,
          perc_75,
          perc_95
ORDER BY  previous_condition_count,
          perc_05,
          perc_25,
          perc_50,
          perc_75,
          perc_95
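The view above computes each percentile as a window value that is constant within a `previous_condition_count` partition, then groups on those values to collapse the result to one row per count. `PERCENTILE_CONT` interpolates linearly between the two nearest ordered ages. The sketch below illustrates that interpolation in plain Python; `percentile_cont` is a hypothetical name used for illustration, not the Spark implementation.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between neighbours.

    Illustrative sketch of the standard PERCENTILE_CONT definition.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")

    # Zero-based fractional position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower_index = math.floor(position)
    upper_index = math.ceil(position)

    if lower_index == upper_index:
        return float(ordered[lower_index])

    # Weight the gap between the two neighbouring values.
    weight = position - lower_index
    return ordered[lower_index] + weight * (ordered[upper_index] - ordered[lower_index])
```

For example, with ages 40, 50, 60 and 70 the median falls halfway between 50 and 60.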

**c - Create combined output**

In [6]:
%%sql

CREATE    OR REPLACE TEMPORARY VIEW combined_output AS

SELECT    n.previous_condition_count,
          incidence,
          person_years,
          (incidence * 1.0) / (person_years * 1.0) * 100 AS progression_rate,
          perc_05,
          perc_25,
          perc_50,
          perc_75,
          perc_95
FROM      headline_condition_count_transitions_denominator d
INNER     JOIN headline_condition_count_transitions_numerator n ON d.condition_count = n.previous_condition_count

**d - Calculate confidence intervals using Byar's method**

This section uses the `phe_rate` function from the `PHEindicatormethods` package.

Convert to a SparkR DataFrame, then collect to a local R data frame

In [7]:
%%sparkr

df_combined_r <- sql("SELECT * FROM combined_output")

In [8]:
%%sparkr

r_combined <- collect(df_combined_r)

Calculate crude rate and confidence intervals

In [11]:
%%sparkr

# Calculate crude rates per 100 person-years with 95% confidence intervals
# (phe_rate appends value, lowercl and uppercl columns)
r_crude_rate_output <- phe_rate(
  data = r_combined,
  x = incidence,
  n = person_years,
  multiplier = 100,
  confidence = 0.95
)

# Convert the R DataFrame to a SparkR DataFrame
df_r_crude_rate_output <- createDataFrame(r_crude_rate_output)

# Save as a temporary view
createOrReplaceTempView(df_r_crude_rate_output, "r_crude_rate_output_view")
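For readers who want to see what Byar's method does to a count, the sketch below applies the usual Byar's approximation to the Poisson limits of the numerator and then scales by person-years. It is a plain Python illustration, not the `PHEindicatormethods` code: `byars_rate_ci` is a hypothetical name, and the package may handle small counts differently (for example by switching to an exact method), which this sketch does not reproduce.

```python
from math import sqrt
from statistics import NormalDist


def byars_rate_ci(count, person_years, multiplier=100, confidence=0.95):
    """Crude rate with Byar's approximate confidence limits.

    Illustrative sketch; returns (rate, lower limit, upper limit),
    all scaled by `multiplier` (100 gives a rate per 100 person-years).
    """
    if count <= 0 or person_years <= 0:
        raise ValueError("count and person_years must be positive")

    # Two-sided standard normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # Byar's approximation to the Poisson limits of the observed count.
    lower_count = count * (1 - 1 / (9 * count) - z / (3 * sqrt(count))) ** 3
    upper_count = (count + 1) * (
        1 - 1 / (9 * (count + 1)) + z / (3 * sqrt(count + 1))
    ) ** 3

    scale = multiplier / person_years
    return count * scale, lower_count * scale, upper_count * scale
```

With 100 transitions over 1,000 person-years this gives a rate of 10 per 100 person-years with limits of roughly 8.14 and 12.16.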


**e - Apply small number suppression and export final table**

<blockquote style="color: #D8000C; background-color: #FFD2D2; padding: 10px; border-left: 6px solid #D8000C;">
  <strong>⚠️ Warning:</strong> DROP TABLE is currently commented out, as this table does not need to be recreated each time the incidence analysis is run.
</blockquote>

In [None]:
%%sql

--DROP TABLE IF EXISTS ${param.incidence_schema}.output_01F_incidence_results_by_initial_condition_count

In [12]:
%%sql

CREATE    TABLE ${param.incidence_schema}.output_01F_incidence_results_by_initial_condition_count USING PARQUET AS
SELECT    previous_condition_count,
          CASE
                    WHEN person_years BETWEEN 1 AND 7  THEN '***'
                    ELSE CAST(person_years AS STRING)
          END AS person_years,
          CASE
                    WHEN incidence BETWEEN 1 AND 7  THEN '***'
                    ELSE CAST(incidence AS STRING)
          END AS incidence,
          value AS progression_rate,
          lowercl AS lower_cl,
          uppercl AS upper_cl,
          perc_05,
          perc_25,
          perc_50,
          perc_75,
          perc_95
FROM      r_crude_rate_output_view
ORDER BY  previous_condition_count
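The suppression rule in the `CASE` expressions above can be stated as a small function: values from 1 to 7 inclusive are replaced with `***`, while zero and values of 8 or more are kept as text. The Python sketch below mirrors that rule for illustration; `suppress_small_number` is a hypothetical name, and the bounds are taken from the SQL.

```python
def suppress_small_number(value, lower=1, upper=7, mask="***"):
    """Return the mask for values in [lower, upper], else the value as a string.

    Illustrative sketch of the BETWEEN 1 AND 7 rule used in the SQL.
    """
    if lower <= value <= upper:
        return mask
    return str(value)
```

Note that a value of zero is left visible under this rule, exactly as in the SQL.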

In [13]:
%%sql

SELECT    *
FROM      ${param.incidence_schema}.output_01F_incidence_results_by_initial_condition_count

Five-year age-band breakdown of numerator transitions (to reference in Results)

In [11]:
%%sql
SELECT    previous_condition_count,
          CASE
                    WHEN mm.age >= 90 THEN '90+'
                    ELSE quinary
          END AS age_band,
          CASE
                    WHEN COUNT(*) BETWEEN 1 AND 7  THEN '***'
                    ELSE CAST(COUNT(*) AS STRING)
          END AS incidence
FROM      ${param.incidence_schema}.mm_incidence_transitions_age mm
INNER     JOIN ${param.segmentation_schema}.dim_age a ON a.age = mm.age
WHERE     financial_year = '${param.analysis_year}'
GROUP BY  previous_condition_count,
          CASE
                    WHEN mm.age >= 90 THEN '90+'
                    ELSE quinary
          END
ORDER BY  previous_condition_count,
          CASE
                    WHEN mm.age >= 90 THEN '90+'
                    ELSE quinary
          END
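The banding logic in the query above takes the five-year band from `dim_age.quinary` and folds every age of 90 or over into a single `90+` band. The sketch below reproduces that logic in Python for illustration. The label format (for example `65-69`) is an assumption, since the actual contents of the `quinary` column are not shown in this notebook, and `age_band` is a hypothetical name.

```python
def age_band(age, top_band_start=90, width=5):
    """Map an age in years to a five-year band, with an open-ended top band.

    Illustrative sketch; the 'start-end' label format is assumed and may
    differ from the values stored in dim_age.quinary.
    """
    if age < 0:
        raise ValueError("age must not be negative")
    if age >= top_band_start:
        return f"{top_band_start}+"
    band_start = (age // width) * width
    return f"{band_start}-{band_start + width - 1}"
```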