### **01 - Incidence of MLTC**
#### **01E - Manuscript outputs - trend**

**Imports**

In [1]:
# required imports

# requires blank line after last import


In [2]:
%%sparkr

if (!requireNamespace("PHEindicatormethods", quietly = TRUE)) {
  install.packages("PHEindicatormethods")
}
library(PHEindicatormethods)

**Parameter cell**

In [3]:
# parameter cell
incidence_schema = ""  # "mltc_incidence_outputs_v40_20230331"

# optional, can be blank


In [4]:
# Set parameters in Spark configuration with 'param.' prefix (for use in SQL cells)
spark.conf.set("param.incidence_schema", incidence_schema)
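Because `incidence_schema` defaults to an empty string, the SQL cells would resolve `${param.incidence_schema}.mm_incidence_person_time` to an invalid table name if the parameter were left unset. Below is a minimal sketch of a guard that could run before `spark.conf.set`. It assumes schema names contain only letters, digits and underscores, and `validate_schema_name` is a hypothetical helper, not part of the notebook.

```python
import re


def validate_schema_name(schema_name: str) -> str:
    """Return the schema name unchanged, or raise if it is blank or malformed.

    Assumption: valid schema names contain only letters, digits and underscores.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", schema_name or ""):
        raise ValueError(
            f"incidence_schema must be a non-empty identifier, got {schema_name!r}"
        )
    return schema_name


# Example using the schema name shown in the parameter cell comment
checked_schema = validate_schema_name("mltc_incidence_outputs_v40_20230331")
```

Failing early here gives a clearer error than the table-not-found message the first SQL cell would otherwise produce.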


---

This notebook calculates the rate of progression from 1 to 2+ conditions by financial year, expressed per 100 person-years.

**a - Create denominator person-time temporary view**

In [5]:
%%sql

-- Denominator: person-years spent with exactly one condition, per financial year
CREATE    OR REPLACE TEMPORARY VIEW overall_trend_denominator_combined AS
SELECT    financial_year,
          SUM(
          CASE
                    WHEN condition_count = 1 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_1
FROM      ${param.incidence_schema}.mm_incidence_person_time
GROUP BY  financial_year

**b - Create numerator temporary view, including the age percentiles at which the transition occurs**

In [6]:
%%sql

CREATE    OR REPLACE TEMPORARY VIEW overall_trend_numerator_combined AS
SELECT    financial_year,
          perc_05,
          perc_25,
          perc_50,
          perc_75,
          perc_95,
          COUNT(*) AS incidence_1_2_plus
FROM      (
          SELECT    financial_year,
                    PERCENTILE_CONT (0.05) within GROUP (
                    ORDER BY  age * 1.0
                    ) OVER (
                    PARTITION BY financial_year
                    ) AS perc_05,
                    PERCENTILE_CONT (0.25) within GROUP (
                    ORDER BY  age * 1.0
                    ) OVER (
                    PARTITION BY financial_year
                    ) AS perc_25,
                    PERCENTILE_CONT (0.5) within GROUP (
                    ORDER BY  age * 1.0
                    ) OVER (
                    PARTITION BY financial_year
                    ) AS perc_50,
                    PERCENTILE_CONT (0.75) within GROUP (
                    ORDER BY  age * 1.0
                    ) OVER (
                    PARTITION BY financial_year
                    ) AS perc_75,
                    PERCENTILE_CONT (0.95) within GROUP (
                    ORDER BY  age * 1.0
                    ) OVER (
                    PARTITION BY financial_year
                    ) AS perc_95
          FROM      ${param.incidence_schema}.mm_incidence_transitions_age
          WHERE     previous_condition_count = 1
          AND       condition_count >= 2
          ) x
GROUP BY  financial_year,
          perc_05,
          perc_25,
          perc_50,
          perc_75,
          perc_95
ORDER BY  financial_year,
          perc_05,
          perc_25,
          perc_50,
          perc_75,
          perc_95
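`PERCENTILE_CONT` returns an interpolated value, so a reported percentile age need not be an age that appears in the data. Below is a minimal Python sketch of the usual linear-interpolation definition. The ages are hypothetical, and the function illustrates the idea rather than Spark's implementation.

```python
def percentile_cont(values, fraction):
    """Continuous percentile by linear interpolation between ordered values."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the ordered list
    position = fraction * (len(ordered) - 1)
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(ordered) - 1)
    weight = position - lower_index
    return ordered[lower_index] + (ordered[upper_index] - ordered[lower_index]) * weight


# Hypothetical ages at transition within one financial year
ages = [40, 50, 60, 70]
median_age = percentile_cont(ages, 0.5)
lower_quartile_age = percentile_cont(ages, 0.25)
```

With four ages the median falls halfway between the second and third values, so it is not one of the observed ages.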

**c - Create combined output**

In [7]:
%%sql

CREATE    OR REPLACE TEMPORARY VIEW combined_output AS

SELECT    d.financial_year,
          incidence_1_2_plus,
          person_years_1,
          (incidence_1_2_plus * 1.0) / (person_years_1 * 1.0) * 100 AS progression_rate_1_2_plus,
          perc_05,
          perc_25,
          perc_50,
          perc_75,
          perc_95
FROM      overall_trend_numerator_combined n
INNER     JOIN overall_trend_denominator_combined d ON d.financial_year = n.financial_year
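The rate in the view above is the number of transitions divided by the person-years at risk, scaled to 100 person-years. The sketch below shows that arithmetic with hypothetical numbers; `crude_rate_per_100` is an illustrative name and not part of the notebook.

```python
def crude_rate_per_100(transitions, person_years):
    """Crude progression rate per 100 person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return transitions / person_years * 100


# Hypothetical financial year: 250 transitions over 5,000 person-years
example_rate = crude_rate_per_100(250, 5000)
```

Because the view uses an inner join, a financial year appears in the output only if it has both a numerator row and a denominator row.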

**d - Calculate confidence intervals using Byar's method**

This section uses the `phe_rate` function from the `PHEindicatormethods` package.

Read the combined output into a SparkR DataFrame, then collect it into a local R data frame.

In [8]:
%%sparkr

df_combined_r <- sql("SELECT * FROM combined_output")

In [9]:
%%sparkr

r_combined <- collect(df_combined_r)

Calculate crude rate and confidence intervals

In [10]:
%%sparkr

# Calculate rates with confidence intervals
r_crude_rate_output <- phe_rate(
  data = r_combined,
  x = incidence_1_2_plus,
  n = person_years_1,
  multiplier = 100,
  confidence = 0.95
)

# Convert the R DataFrame to a SparkR DataFrame
df_r_crude_rate_output <- createDataFrame(r_crude_rate_output)

# Save as a temporary view
createOrReplaceTempView(df_r_crude_rate_output, "r_crude_rate_output_view")
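The interval calculation delegated to `phe_rate` above can be illustrated with Byar's approximation to the Poisson confidence limits of a count, which are then scaled by the same multiplier and denominator as the rate. Below is a minimal Python sketch with hypothetical numbers. It applies Byar's formula to every count and does not reproduce any special handling of small counts that the package may apply.

```python
from math import sqrt
from statistics import NormalDist


def byars_rate_ci(events, person_years, multiplier=100.0, confidence=0.95):
    """Crude rate with Byar's approximate confidence limits.

    Returns (rate, lower_limit, upper_limit), all scaled by `multiplier`.
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    # Two-sided standard normal quantile, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower_count = events * (1 - 1 / (9 * events) - z / (3 * sqrt(events))) ** 3
    upper_events = events + 1
    upper_count = upper_events * (
        1 - 1 / (9 * upper_events) + z / (3 * sqrt(upper_events))
    ) ** 3
    scale = multiplier / person_years
    return events * scale, lower_count * scale, upper_count * scale


# Hypothetical financial year: 100 transitions over 1,000 person-years
rate, lower_cl, upper_cl = byars_rate_ci(100, 1000)
```

The limits are asymmetric around the rate, which reflects the skew of the Poisson distribution for a count.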


**e - Apply small number suppression and export final table**

<blockquote style="color: #D8000C; background-color: #FFD2D2; padding: 10px; border-left: 6px solid #D8000C;">
  <strong>⚠️ Warning:</strong> DROP TABLE is currently commented out, as this table does not need to be recreated each time the incidence analysis is run.
</blockquote>

In [None]:
%%sql

--DROP TABLE IF EXISTS ${param.incidence_schema}.output_01E_incidence_results_trend

In [11]:
%%sql

CREATE    TABLE ${param.incidence_schema}.output_01E_incidence_results_trend USING PARQUET AS
SELECT    financial_year,
          CASE
                    WHEN person_years_1 BETWEEN 1 AND 7 THEN '***'
                    ELSE CAST(person_years_1 AS STRING)
          END AS person_years_1,
          CASE
                    WHEN incidence_1_2_plus BETWEEN 1 AND 7 THEN '***'
                    ELSE CAST(incidence_1_2_plus AS STRING)
          END AS incidence_1_2_plus,
          value AS progression_rate_1_2_plus,
          lowercl AS lower_cl_1_2,
          uppercl AS upper_cl_1_2,
          perc_05,
          perc_25,
          perc_50,
          perc_75,
          perc_95
FROM      r_crude_rate_output_view
ORDER BY  financial_year
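The suppression rule in the `CASE` expressions above masks any value from 1 to 7 inclusive and casts everything else to a string. Below is a minimal Python sketch of the same rule; `suppress_small_number` is an illustrative name, and the inputs are hypothetical.

```python
def suppress_small_number(value, low=1, high=7, mask="***"):
    """Mask values in the inclusive range [low, high]; otherwise return a string.

    None is passed through, mirroring how a SQL NULL falls to the ELSE branch.
    """
    if value is None:
        return None
    if low <= value <= high:
        return mask
    return str(value)


# Hypothetical counts: 0 and 8 are published, 1 to 7 are masked
suppressed = [suppress_small_number(count) for count in [0, 1, 7, 8, 1250]]
```

Zero falls outside the range and is published as `"0"`. The rate and confidence limit columns are not masked by this rule.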

In [12]:
%%sql

SELECT    *
FROM      ${param.incidence_schema}.output_01E_incidence_results_trend