### **01 - Incidence of MLTC**
#### **01D01 - Manuscript outputs - age standardisation inputs**

**Imports**

In [1]:
# required imports

# requires blank line after last import


**Parameter cell**

In [2]:
# parameter cell
incidence_schema = ""  # "mltc_incidence_outputs_v40_20230331"
analysis_year = ""  # "2022/23"

# optional, can be blank


In [3]:
# Set parameters in Spark configuration with 'param.' prefix (for use in SQL cells)
spark.conf.set("param.incidence_schema", incidence_schema)
spark.conf.set("param.analysis_year", analysis_year)
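
Both parameters default to empty strings, so a blank value would only surface later as a malformed table name or an empty result inside the SQL cells. A minimal sketch of a guard that could run before the `spark.conf.set` calls is shown below. The helper name `check_parameters` is hypothetical, and the `YYYY/YY` financial-year format is an assumption inferred from the example value `"2022/23"` in the parameter cell.

```python
import re


def check_parameters(incidence_schema: str, analysis_year: str) -> None:
    """Raise ValueError if either notebook parameter is blank or malformed.

    Hypothetical helper, not part of the original notebook.
    """
    if not incidence_schema.strip():
        raise ValueError("incidence_schema is blank")
    # Assumed format: four-digit start year, slash, two-digit end year (e.g. 2022/23)
    match = re.fullmatch(r"(\d{4})/(\d{2})", analysis_year)
    if match is None:
        raise ValueError(f"analysis_year {analysis_year!r} is not in YYYY/YY form")
    start, end = int(match.group(1)), int(match.group(2))
    # The two-digit part should be the year after the start year
    if (start + 1) % 100 != end:
        raise ValueError(f"analysis_year {analysis_year!r} does not span consecutive years")


# Passes silently for the example values from the parameter cell
check_parameters("mltc_incidence_outputs_v40_20230331", "2022/23")
```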


---

#### **01D01 - Manuscript outputs - age standardisation inputs**

This section creates an input table for age standardisation of gender, IMD and ethnicity outputs.

**a - Aggregate numerator and denominator**

This first query creates a temporary view aggregating the denominator and numerator by:
- Gender
- Age band (10 year bandings)
- IMD quintile
- Ethnicity (Census 2011)
- Base condition count

This is restricted to base condition counts from 0 to 2, as these are the only breakdowns required for the main results table.
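
The aggregation and join logic can be illustrated on toy data in plain Python. This is only a sketch of the query's shape, not the Spark execution: the rows and the age band label are invented for illustration, and the `stratum` helper is hypothetical. It shows that missing IMD or ethnicity values are folded into `'Unknown'` on both sides before joining, and that strata with person-years but no transitions keep a missing numerator (NULL in SQL) rather than being dropped.

```python
from collections import defaultdict

# Toy rows (invented): gender, age_band, imd_quintile, ethnic_group,
# condition_count, person_years
person_time = [
    ("Female", "40-49", "1", "White", 0, 10.0),
    ("Female", "40-49", "1", "White", 0, 5.5),
    ("Female", "40-49", None, "White", 1, 2.0),
]
# Toy rows (invented), one per transition: gender, age_band, imd_quintile,
# ethnic_group, previous_condition_count
transitions = [
    ("Female", "40-49", "1", "White", 0),
    ("Female", "40-49", "1", "White", 0),
]


def stratum(gender, age_band, imd, ethnic, count):
    """Build the grouping key, mirroring COALESCE(..., 'Unknown')."""
    return (
        gender,
        age_band,
        "Unknown" if imd is None else imd,
        "Unknown" if ethnic is None else ethnic,
        count,
    )


# Denominator: SUM(person_years) per stratum
denominator = defaultdict(float)
for *keys, person_years in person_time:
    denominator[stratum(*keys)] += person_years

# Numerator: COUNT(*) of transitions per stratum
numerator = defaultdict(int)
for keys in transitions:
    numerator[stratum(*keys)] += 1

# LEFT OUTER JOIN: every denominator stratum is kept; a stratum with no
# transitions gets None, the analogue of NULL
base = {key: (py, numerator.get(key)) for key, py in denominator.items()}
```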

In [4]:
%%sql

CREATE    OR REPLACE TEMPORARY VIEW age_standardisation_incidence_base AS

SELECT    py.*,
          i.progression_incidence
FROM      (
          SELECT    gender_description,
                    age_band,
                    COALESCE(imd_quintile,'Unknown') as imd_quintile,
                    COALESCE(census_2011_ethnic_group,'Unknown') as census_2011_ethnic_group,
                    condition_count,
                    SUM(person_years) AS person_years
          FROM      ${param.incidence_schema}.mm_incidence_person_time m
          WHERE     financial_year = '${param.analysis_year}'
          AND       condition_count <= 2
          GROUP BY  gender_description,
                    age_band,
                    COALESCE(imd_quintile,'Unknown'),
                    COALESCE(census_2011_ethnic_group,'Unknown'),
                    condition_count
          ) py
LEFT      OUTER JOIN (
          SELECT    gender_description,
                    age_band,
                    COALESCE(imd_quintile,'Unknown') as imd_quintile,
                    COALESCE(census_2011_ethnic_group,'Unknown') as census_2011_ethnic_group,
                    previous_condition_count,
                    COUNT(*) AS progression_incidence
          FROM      ${param.incidence_schema}.mm_incidence_transitions_age m
          WHERE     financial_year = '${param.analysis_year}'
          AND       previous_condition_count <= 2
          GROUP BY  gender_description,
                    age_band,
                    COALESCE(imd_quintile,'Unknown'),
                    COALESCE(census_2011_ethnic_group,'Unknown'),
                    previous_condition_count
          ) i ON i.gender_description = py.gender_description
AND       i.age_band = py.age_band
AND       i.imd_quintile = py.imd_quintile
AND       i.census_2011_ethnic_group = py.census_2011_ethnic_group
AND       i.previous_condition_count = py.condition_count


**b - Create standard population**

This section creates the standard population by 10 year age bandings, to be used for age standardisation.

In this case, the selected standard population is people with 0 conditions (in person years, for consistency with the rest of the analysis).

In [5]:
%%sql

CREATE    OR REPLACE TEMPORARY VIEW age_standardisation_standard_population AS
SELECT    age_band,
          SUM(person_years) AS standard_pop_person_years
FROM      ${param.incidence_schema}.mm_incidence_person_time m
WHERE     financial_year = '${param.analysis_year}'
AND       condition_count = 0
GROUP BY  age_band
ORDER BY  age_band

**c - Calculate incidence rates**

This section calculates age band-specific incidence rates by base condition count (from 0 to 2) for:
- Whole population
- Gender breakdown
- Gender and age
- Gender and ethnicity
- Gender and IMD
- IMD/ethnicity combined
- Gender, IMD and ethnicity combined
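
Each breakdown follows the same pattern: sum the events and the person-years within the group, then divide the two sums (a ratio of sums, not an average of stratum-level rates). A small Python sketch of that pattern follows. The rows are invented and the `aggregate` helper is hypothetical; it treats a missing numerator the way SQL `SUM` treats NULL, by skipping it.

```python
from collections import defaultdict

# Toy base rows (invented): gender, age_band, condition_count, events, person_years.
# events is None where the left join found no transitions for the stratum.
base_rows = [
    ("Female", "40-49", 0, 3, 100.0),
    ("Female", "40-49", 0, None, 50.0),
    ("Male", "40-49", 0, 2, 50.0),
]


def aggregate(rows, key):
    """Sum events and person-years per key, then divide the sums."""
    totals = defaultdict(lambda: [None, 0.0])
    for row in rows:
        k = key(row)
        events, person_years = row[3], row[4]
        if events is not None:  # SQL SUM skips NULLs
            totals[k][0] = (totals[k][0] or 0) + events
        totals[k][1] += person_years
    # The rate stays None when no events were recorded at all, as in SQL
    return {
        k: (ev, py, None if ev is None else ev / py)
        for k, (ev, py) in totals.items()
    }


# Gender breakdown: group by gender, age band and condition count
by_gender = aggregate(base_rows, key=lambda r: (r[0], r[1], r[2]))
# Whole population: gender replaced by the constant 'NA'
whole_population = aggregate(base_rows, key=lambda r: ("NA", r[1], r[2]))
```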

In [6]:
%%sql

CREATE    OR REPLACE TEMPORARY VIEW age_standardisation_incidence_aggregate AS
-- whole population
SELECT    'NA' AS gender_description,
          'NA' AS breakdown_type,
          'NA' AS socio_demographic_breakdown,
          b.age_band,
          b.condition_count,
          SUM(b.progression_incidence) AS progression_incidence,
          SUM(b.person_years) AS person_years,
          SUM(b.progression_incidence) / SUM(b.person_years) AS incidence_rate
FROM      age_standardisation_incidence_base b
GROUP BY  b.age_band,
          b.condition_count
UNION ALL
-- gender breakdown
SELECT    b.gender_description,
          'NA' AS breakdown_type,
          'NA' AS socio_demographic_breakdown,
          b.age_band,
          b.condition_count,
          SUM(b.progression_incidence) AS progression_incidence,
          SUM(b.person_years) AS person_years,
          SUM(b.progression_incidence) / SUM(b.person_years) AS incidence_rate
FROM      age_standardisation_incidence_base b
WHERE     gender_description NOT IN ('NOT KNOWN', 'NOT SPECIFIED')
GROUP BY  b.gender_description,
          b.age_band,
          b.condition_count
UNION ALL
-- gender and age breakdown
SELECT    b.gender_description,
          'Age' AS breakdown_type,
          age_band AS socio_demographic_breakdown, -- age_band used for breakdown
          b.age_band, -- age_band used for standardisation (same as above in this case)
          b.condition_count,
          SUM(b.progression_incidence) AS progression_incidence,
          SUM(b.person_years) AS person_years,
          SUM(b.progression_incidence) / SUM(b.person_years) AS incidence_rate
FROM      age_standardisation_incidence_base b
WHERE     gender_description NOT IN ('NOT KNOWN', 'NOT SPECIFIED')
GROUP BY  b.gender_description,
          b.age_band,
          b.condition_count
UNION ALL
-- gender and ethnicity breakdown
SELECT    b.gender_description,
          'Ethnicity' AS breakdown_type,
          census_2011_ethnic_group AS socio_demographic_breakdown,
          b.age_band,
          b.condition_count,
          SUM(b.progression_incidence) AS progression_incidence,
          SUM(b.person_years) AS person_years,
          SUM(b.progression_incidence) / SUM(b.person_years) AS incidence_rate
FROM      age_standardisation_incidence_base b
WHERE     gender_description NOT IN ('NOT KNOWN', 'NOT SPECIFIED')
GROUP BY  b.gender_description,
          b.census_2011_ethnic_group,
          b.age_band,
          b.condition_count
UNION ALL
-- gender and IMD breakdown
SELECT    b.gender_description,
          'IMD' AS breakdown_type,
          imd_quintile AS socio_demographic_breakdown,
          b.age_band,
          b.condition_count,
          SUM(b.progression_incidence) AS progression_incidence,
          SUM(b.person_years) AS person_years,
          SUM(b.progression_incidence) / SUM(b.person_years) AS incidence_rate
FROM      age_standardisation_incidence_base b
WHERE     gender_description NOT IN ('NOT KNOWN', 'NOT SPECIFIED')
GROUP BY  b.gender_description,
          b.imd_quintile,
          b.age_band,
          b.condition_count
UNION ALL
-- IMD/ethnicity breakdown
SELECT    'NA' AS gender_description,
          'IMD and Ethnicity' AS breakdown_type,
          CONCAT(b.imd_quintile, ' / ', b.census_2011_ethnic_group) AS socio_demographic_breakdown,
          b.age_band,
          b.condition_count,
          SUM(b.progression_incidence) AS progression_incidence,
          SUM(b.person_years) AS person_years,
          SUM(b.progression_incidence) / SUM(b.person_years) AS incidence_rate
FROM      age_standardisation_incidence_base b
WHERE     gender_description NOT IN ('NOT KNOWN', 'NOT SPECIFIED')
GROUP BY  CONCAT(b.imd_quintile, ' / ', b.census_2011_ethnic_group),
          b.age_band,
          b.condition_count
UNION ALL
-- Gender, IMD/ethnicity breakdown
SELECT    b.gender_description AS gender_description,
          'Gender, IMD and Ethnicity' AS breakdown_type,
          CONCAT(b.imd_quintile, ' / ', b.census_2011_ethnic_group) AS socio_demographic_breakdown,
          b.age_band,
          b.condition_count,
          SUM(b.progression_incidence) AS progression_incidence,
          SUM(b.person_years) AS person_years,
          SUM(b.progression_incidence) / SUM(b.person_years) AS incidence_rate
FROM      age_standardisation_incidence_base b
WHERE     gender_description NOT IN ('NOT KNOWN', 'NOT SPECIFIED')
GROUP BY  b.gender_description,
          CONCAT(b.imd_quintile, ' / ', b.census_2011_ethnic_group),
          b.age_band,
          b.condition_count

**d - Create output table with standard population applied**

This section applies the age-specific standard population to each sociodemographic breakdown of data.

This will be fed into an RSpark notebook to calculate standardised rates and confidence intervals.
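
The downstream R notebook is not shown here, so the following is only a sketch of the textbook direct standardisation formula that the `incidence_rate` and `standard_pop_person_years` columns are shaped for: the standardised rate is the weighted mean of the age band rates, with the standard population as weights. The function name and the example figures are hypothetical, and confidence intervals are not covered.

```python
def directly_standardised_rate(rows):
    """Weighted mean of age-specific rates.

    rows: iterable of (incidence_rate, standard_pop_person_years),
    one pair per age band, for a single breakdown and condition count.
    """
    rows = list(rows)
    total_weight = sum(weight for _, weight in rows)
    # Events expected if the standard population experienced these rates
    expected_events = sum(rate * weight for rate, weight in rows)
    return expected_events / total_weight


# Invented age band rows: (rate per person-year, standard person-years)
example = [
    (0.01, 6000.0),
    (0.03, 3000.0),
    (0.08, 1000.0),
]
# 60 + 90 + 80 = 230 expected events over 10,000 standard person-years
dsr = directly_standardised_rate(example)
```

Because every breakdown is weighted by the same standard population, the resulting rates are comparable across gender, IMD and ethnicity groups even when their age structures differ.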

<blockquote style="color: #D8000C; background-color: #FFD2D2; padding: 10px; border-left: 6px solid #D8000C;">
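  <strong>⚠️ Warning:</strong> DROP TABLE is currently commented out, as this table does not need to be recreated each time the incidence analysis is run. If the table already exists, the CREATE TABLE cell below will fail; uncomment the DROP TABLE line to rebuild it.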
</blockquote>

In [7]:
%%sql

-- DROP TABLE IF EXISTS ${param.incidence_schema}.mm_incidence_age_standardisation_input

In [8]:
%%sql

CREATE TABLE ${param.incidence_schema}.mm_incidence_age_standardisation_input USING PARQUET AS

SELECT    a.gender_description,
          a.breakdown_type,
          a.socio_demographic_breakdown,
          a.age_band,
          a.condition_count as initial_condition_count,
          a.progression_incidence,
          a.person_years,
          a.incidence_rate,
          s.standard_pop_person_years
FROM      age_standardisation_incidence_aggregate a
INNER     JOIN age_standardisation_standard_population s ON s.age_band = a.age_band
-- 'NA' rows sort first; the string literal '0' keeps both CASE branches the same type
ORDER BY  CASE
                    WHEN gender_description = 'NA' THEN '0'
                    ELSE gender_description
          END,
          CASE
                    WHEN breakdown_type = 'NA' THEN '0'
                    ELSE breakdown_type
          END,
          CASE
                    WHEN socio_demographic_breakdown = 'NA' THEN '0'
                    ELSE socio_demographic_breakdown
          END,
          age_band,
          initial_condition_count