### **01 - Incidence of MLTC**
#### **01C01 - Manuscript outputs - main table progression rates**

This notebook creates the inputs for notebook 01C02, which calculates Progression Rate Ratios (PRRs) and applies confidence intervals to both the rates and the PRRs.

**Imports**

In [1]:
# required imports

# requires blank line after last import


**Parameter cell**

In [2]:
# parameter cell
incidence_schema = ""  # "mltc_incidence_outputs_v40_20230331"
analysis_year = ""  # "2022/23"
segmentation_schema = ""  # "obh_segmentation_v40_20230331"

# optional, can be blank


In [3]:
# Set parameters in Spark configuration with 'param.' prefix (for use in SQL cells)
spark.conf.set("param.incidence_schema", incidence_schema)
spark.conf.set("param.analysis_year", analysis_year)
spark.conf.set("param.segmentation_schema", segmentation_schema)
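
The parameter cell defaults to empty strings, and the SQL cells substitute these values directly into table names and filters (for example `${param.incidence_schema}.mm_incidence_transitions_age`). A blank value would therefore only surface later as a confusing SQL error. The sketch below shows one possible pre-flight check. `validate_parameters` is a hypothetical helper that is not part of the original notebook, and the expected formats are assumptions taken from the example values in the parameter cell comments: a non-blank schema name, and a financial year such as `"2022/23"`.

```python
import re

# Hypothetical helper: checks notebook parameters before they are pushed into
# the Spark configuration. Expected formats are assumptions based on the
# example values shown in the parameter cell comments.
FY_PATTERN = re.compile(r"(\d{4})/(\d{2})")


def validate_parameters(incidence_schema: str, analysis_year: str,
                        segmentation_schema: str) -> None:
    """Raise ValueError if any parameter is blank or malformed."""
    for name, value in [("incidence_schema", incidence_schema),
                        ("segmentation_schema", segmentation_schema)]:
        if not value.strip():
            raise ValueError(f"{name} must not be blank")

    match = FY_PATTERN.fullmatch(analysis_year)
    if match is None:
        raise ValueError(f"analysis_year must look like '2022/23', got {analysis_year!r}")

    # The two-digit suffix should be the year after the four-digit start year.
    start, end = int(match.group(1)), int(match.group(2))
    if (start + 1) % 100 != end:
        raise ValueError(f"analysis_year {analysis_year!r} is not a consecutive year pair")


# Usage, placed before the spark.conf.set calls:
# validate_parameters(incidence_schema, analysis_year, segmentation_schema)
```

Failing fast here keeps parameter errors in the Python cell, where the message is readable, rather than inside a long `UNION ALL` query.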


---


**2022/23 main descriptive table - incidence of 1+, 2+, 3+ conditions overall and by socio-demographic breakdowns**

Separate queries are included below for each combination of breakdowns required for the main manuscript results table, and their results are combined with `UNION ALL`:
- Overall (no breakdowns)
- Gender
- Age and gender
- Gender and ethnicity
- Gender and IMD
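
The pattern of stacking one grouped query per breakdown can also be shown outside SQL. The following is a minimal Python sketch on hypothetical toy records; only the column names come from the SQL. It keys every result by the same three label columns and uses `'NA'` where a dimension is not broken down. Unlike the SQL, it counts all rows rather than specific transitions, so that the example stays focused on the stacking structure.

```python
from collections import Counter

# One spec per SELECT in the UNION ALL: (group by gender?, breakdown_type label,
# column used as socio_demographic_breakdown, or None for no second dimension).
BREAKDOWN_SPECS = [
    (False, "NA", None),                              # Overall
    (True, "NA", None),                               # Gender only
    (True, "Age", "age_band"),                        # Age and gender
    (True, "Ethnicity", "census_2011_ethnic_group"),  # Gender and ethnicity
    (True, "IMD", "imd_quintile"),                    # Gender and IMD
]


def stacked_counts(rows: list) -> Counter:
    """Count rows per breakdown and stack all results, like a UNION ALL of GROUP BYs."""
    counts = Counter()
    for by_gender, breakdown_type, column in BREAKDOWN_SPECS:
        for row in rows:
            gender = row["gender_description"] if by_gender else "NA"
            value = row[column] if column is not None else "NA"
            counts[(gender, breakdown_type, value)] += 1
    return counts


# Hypothetical toy records
rows = [
    {"gender_description": "Female", "age_band": "20-29",
     "census_2011_ethnic_group": "White", "imd_quintile": "1"},
    {"gender_description": "Male", "age_band": "20-29",
     "census_2011_ethnic_group": None, "imd_quintile": "5"},
]
counts = stacked_counts(rows)
print(counts[("NA", "NA", "NA")])           # 2 (overall row)
print(counts[("Male", "Ethnicity", None)])  # 1 (NULL ethnicity forms its own group)
```

Each record contributes once to every breakdown. The counts within any single breakdown should therefore sum to the overall count, which makes a simple sanity check on the real outputs.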

Two temporary views are created:
- The first, `overall_numerator_combined`, is the numerator: it counts transitions at each MLTC progression level
- The second, `overall_denominator_combined`, is the denominator: it sums `person_years` at each base MLTC progression state (0, 1 or 2 conditions)
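
The conditional aggregation used in the two views can be sketched in Python as follows. This is an illustrative re-implementation on hypothetical in-memory rows, not the production logic, although the column names mirror those in the SQL. There is one deliberate difference: Spark SQL's `SUM` returns `NULL` when no rows match the `CASE` condition, whereas Python's `sum` returns 0.

```python
# Illustrative re-implementation of the conditional aggregation in the two views.
# Rows are hypothetical dictionaries using the SQL column names.


def numerator(transitions: list) -> dict:
    """Count transitions from exactly k conditions to k + 1 or more, for k = 0, 1, 2."""
    return {
        f"incidence_{k}_{k + 1}_plus": sum(
            1 for t in transitions
            if t["previous_condition_count"] == k and t["condition_count"] >= k + 1
        )
        for k in range(3)
    }


def denominator(person_time: list) -> dict:
    """Sum person_years spent at each base condition count (0, 1 and 2).

    Unlike SQL SUM over no matching rows (which returns NULL), this returns 0.
    """
    return {
        f"person_years_{k}": sum(
            p["person_years"] for p in person_time if p["condition_count"] == k
        )
        for k in range(3)
    }


transitions = [
    {"previous_condition_count": 0, "condition_count": 1},
    {"previous_condition_count": 0, "condition_count": 3},  # a jump still counts as 0 -> 1+
    {"previous_condition_count": 1, "condition_count": 2},
]
print(numerator(transitions))
# {'incidence_0_1_plus': 2, 'incidence_1_2_plus': 1, 'incidence_2_3_plus': 0}
```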

These views are then joined, together with the period population table created below, in a final query that extracts the crude outputs for the main results table. That table is the input for the next notebook (01C02, which calculates PRRs and confidence intervals).

Numerator temporary view:

In [5]:
%%sql
CREATE    OR REPLACE TEMPORARY VIEW overall_numerator_combined AS

-- Overall output
SELECT    'NA' AS gender_description,
          'NA' AS breakdown_type,
          'NA' AS socio_demographic_breakdown,
          COUNT(
          CASE
                    WHEN previous_condition_count = 0
                    AND       condition_count >= 1 THEN 1
                              ELSE NULL
          END
          ) AS incidence_0_1_plus,
          COUNT(
          CASE
                    WHEN previous_condition_count = 1
                    AND       condition_count >= 2 THEN 1
                              ELSE NULL
          END
          ) AS incidence_1_2_plus,
          COUNT(
          CASE
                    WHEN previous_condition_count = 2
                    AND       condition_count >= 3 THEN 1
                              ELSE NULL
          END
          ) AS incidence_2_3_plus
FROM      ${param.incidence_schema}.mm_incidence_transitions_age m
WHERE     financial_year = '${param.analysis_year}'
UNION ALL

-- Gender breakdown only
SELECT    gender_description,
          'NA' AS breakdown_type,
          'NA' AS socio_demographic_breakdown,
          COUNT(
          CASE
                    WHEN previous_condition_count = 0
                    AND       condition_count >= 1 THEN 1
                              ELSE NULL
          END
          ) AS incidence_0_1_plus,
          COUNT(
          CASE
                    WHEN previous_condition_count = 1
                    AND       condition_count >= 2 THEN 1
                              ELSE NULL
          END
          ) AS incidence_1_2_plus,
          COUNT(
          CASE
                    WHEN previous_condition_count = 2
                    AND       condition_count >= 3 THEN 1
                              ELSE NULL
          END
          ) AS incidence_2_3_plus
FROM      ${param.incidence_schema}.mm_incidence_transitions_age m
WHERE     financial_year = '${param.analysis_year}'
GROUP BY  gender_description
UNION ALL

-- Age and gender breakdown
SELECT    gender_description,
          'Age' AS breakdown_type,
          age_band AS socio_demographic_breakdown,
          COUNT(
          CASE
                    WHEN previous_condition_count = 0
                    AND       condition_count >= 1 THEN 1
                              ELSE NULL
          END
          ) AS incidence_0_1_plus,
          COUNT(
          CASE
                    WHEN previous_condition_count = 1
                    AND       condition_count >= 2 THEN 1
                              ELSE NULL
          END
          ) AS incidence_1_2_plus,
          COUNT(
          CASE
                    WHEN previous_condition_count = 2
                    AND       condition_count >= 3 THEN 1
                              ELSE NULL
          END
          ) AS incidence_2_3_plus
FROM      ${param.incidence_schema}.mm_incidence_transitions_age m
WHERE     financial_year = '${param.analysis_year}'
GROUP BY  gender_description,
          age_band
UNION ALL

-- Gender and ethnicity breakdown
SELECT    gender_description,
          'Ethnicity' AS breakdown_type,
          census_2011_ethnic_group AS socio_demographic_breakdown,
          COUNT(
          CASE
                    WHEN previous_condition_count = 0
                    AND       condition_count >= 1 THEN 1
                              ELSE NULL
          END
          ) AS incidence_0_1_plus,
          COUNT(
          CASE
                    WHEN previous_condition_count = 1
                    AND       condition_count >= 2 THEN 1
                              ELSE NULL
          END
          ) AS incidence_1_2_plus,
          COUNT(
          CASE
                    WHEN previous_condition_count = 2
                    AND       condition_count >= 3 THEN 1
                              ELSE NULL
          END
          ) AS incidence_2_3_plus
FROM      ${param.incidence_schema}.mm_incidence_transitions_age m
WHERE     financial_year = '${param.analysis_year}'
GROUP BY  gender_description,
          census_2011_ethnic_group
UNION ALL

-- Gender and IMD breakdown
SELECT    gender_description,
          'IMD' AS breakdown_type,
          imd_quintile AS socio_demographic_breakdown,
          COUNT(
          CASE
                    WHEN previous_condition_count = 0
                    AND       condition_count >= 1 THEN 1
                              ELSE NULL
          END
          ) AS incidence_0_1_plus,
          COUNT(
          CASE
                    WHEN previous_condition_count = 1
                    AND       condition_count >= 2 THEN 1
                              ELSE NULL
          END
          ) AS incidence_1_2_plus,
          COUNT(
          CASE
                    WHEN previous_condition_count = 2
                    AND       condition_count >= 3 THEN 1
                              ELSE NULL
          END
          ) AS incidence_2_3_plus
FROM      ${param.incidence_schema}.mm_incidence_transitions_age m
WHERE     financial_year = '${param.analysis_year}'
GROUP BY  gender_description,
          imd_quintile

Denominator temporary view:

In [6]:
%%sql

CREATE    OR REPLACE TEMPORARY VIEW overall_denominator_combined AS

-- Overall output
SELECT    'NA' AS gender_description,
          'NA' AS breakdown_type,
          'NA' AS socio_demographic_breakdown,
          SUM(unique_people_fy_end) AS unique_people_fy_end,
          SUM(
          CASE
                    WHEN condition_count = 0 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_0,
          SUM(
          CASE
                    WHEN condition_count = 1 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_1,
          SUM(
          CASE
                    WHEN condition_count = 2 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_2
FROM      ${param.incidence_schema}.mm_incidence_person_time m
WHERE     financial_year = '${param.analysis_year}'
UNION ALL

-- Gender breakdown only
SELECT    gender_description,
          'NA' AS breakdown_type,
          'NA' AS socio_demographic_breakdown,
          SUM(unique_people_fy_end) AS unique_people_fy_end,
          SUM(
          CASE
                    WHEN condition_count = 0 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_0,
          SUM(
          CASE
                    WHEN condition_count = 1 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_1,
          SUM(
          CASE
                    WHEN condition_count = 2 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_2
FROM      ${param.incidence_schema}.mm_incidence_person_time m
WHERE     financial_year = '${param.analysis_year}'
GROUP BY  gender_description
UNION ALL

-- Age and gender breakdown
SELECT    gender_description,
          'Age' AS breakdown_type,
          age_band AS socio_demographic_breakdown,
          SUM(unique_people_fy_end) AS unique_people_fy_end,
          SUM(
          CASE
                    WHEN condition_count = 0 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_0,
          SUM(
          CASE
                    WHEN condition_count = 1 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_1,
          SUM(
          CASE
                    WHEN condition_count = 2 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_2
FROM      ${param.incidence_schema}.mm_incidence_person_time m
WHERE     financial_year = '${param.analysis_year}'
GROUP BY  gender_description,
          age_band
UNION ALL

-- Gender and ethnicity breakdown
SELECT    gender_description,
          'Ethnicity' AS breakdown_type,
          census_2011_ethnic_group AS socio_demographic_breakdown,
          SUM(unique_people_fy_end) AS unique_people_fy_end,
          SUM(
          CASE
                    WHEN condition_count = 0 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_0,
          SUM(
          CASE
                    WHEN condition_count = 1 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_1,
          SUM(
          CASE
                    WHEN condition_count = 2 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_2
FROM      ${param.incidence_schema}.mm_incidence_person_time m
WHERE     financial_year = '${param.analysis_year}'
GROUP BY  gender_description,
          census_2011_ethnic_group
UNION ALL

-- Gender and IMD breakdown
SELECT    gender_description,
          'IMD' AS breakdown_type,
          imd_quintile AS socio_demographic_breakdown,
          SUM(unique_people_fy_end) AS unique_people_fy_end,
          SUM(
          CASE
                    WHEN condition_count = 0 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_0,
          SUM(
          CASE
                    WHEN condition_count = 1 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_1,
          SUM(
          CASE
                    WHEN condition_count = 2 THEN person_years
                    ELSE NULL
          END
          ) AS person_years_2
FROM      ${param.incidence_schema}.mm_incidence_person_time m
WHERE     financial_year = '${param.analysis_year}'
GROUP BY  gender_description,
          imd_quintile

**Create count of unique period population**

Note: the period population cannot be derived from the `mm_incidence_person_time` table used above. A person can move between condition count cohorts during the period, so summing the per-cohort counts would count that person more than once. The period population is therefore calculated separately here, directly from the segmentation data.
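
A toy example shows why summing per-cohort counts overstates the period population. The sketch below uses hypothetical `(person_id, condition_count)` pairs, in which person `A` moves from 0 to 1 condition during the year. The function names are illustrative only.

```python
# Hypothetical (person_id, condition_count) pairs: one entry per cohort a person
# spent time in during the financial year.


def sum_of_cohort_counts(rows: list) -> int:
    """Sum distinct people per cohort: double counts people who change cohort."""
    cohorts = {}
    for person, cohort in rows:
        cohorts.setdefault(cohort, set()).add(person)
    return sum(len(people) for people in cohorts.values())


def period_population(rows: list) -> int:
    """Distinct people across the whole period, as COUNT(DISTINCT ...) does."""
    return len({person for person, _ in rows})


rows = [("A", 0), ("A", 1), ("B", 0), ("C", 2)]
print(sum_of_cohort_counts(rows))  # 4: person A is counted in two cohorts
print(period_population(rows))     # 3
```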

**Run time**: ~6 mins

In [None]:
%%sql

--DROP TABLE IF EXISTS ${param.incidence_schema}.mm_incidence_period_population

In [7]:
%%sql

CREATE     TABLE ${param.incidence_schema}.mm_incidence_period_population USING PARQUET AS

-- Overall output
SELECT    'NA' AS gender_description,
          'NA' AS breakdown_type,
          'NA' AS socio_demographic_breakdown,
          COUNT(DISTINCT p.pseudo_nhs_number) AS people
FROM      ${param.segmentation_schema}.fact_model f
INNER     JOIN ${param.incidence_schema}.dim_subsegment_combinations_config ssc ON ssc.old_subsegment_combination_id = f.subsegment_combination_id
INNER     JOIN ${param.segmentation_schema}.dim_date d ON d.date_id = f.date_id
INNER     JOIN ${param.segmentation_schema}.dim_person p ON p.person_id = f.person_id
WHERE     f.gp_id IS NOT NULL
AND       f.age_id >= 20
AND       d.financial_year = '${param.analysis_year}'
UNION ALL

-- Gender breakdown only
SELECT    gender_description,
          'NA' AS breakdown_type,
          'NA' AS socio_demographic_breakdown,
          COUNT(DISTINCT p.pseudo_nhs_number) AS people
FROM      ${param.segmentation_schema}.fact_model f
INNER     JOIN ${param.incidence_schema}.dim_subsegment_combinations_config ssc ON ssc.old_subsegment_combination_id = f.subsegment_combination_id
INNER     JOIN ${param.segmentation_schema}.dim_date d ON d.date_id = f.date_id
INNER     JOIN ${param.segmentation_schema}.dim_person p ON p.person_id = f.person_id
WHERE     f.gp_id IS NOT NULL
AND       f.age_id >= 20
AND       d.financial_year = '${param.analysis_year}'
GROUP     BY gender_description
UNION ALL

-- Age and gender breakdown
SELECT    gender_description,
          'Age' AS breakdown_type,
          CASE WHEN a.ten_year IN ('90-99','100-109','110-119') THEN '90+' ELSE a.ten_year END AS socio_demographic_breakdown,
          COUNT(DISTINCT p.pseudo_nhs_number) AS people
FROM      ${param.segmentation_schema}.fact_model f
INNER     JOIN ${param.incidence_schema}.dim_subsegment_combinations_config ssc ON ssc.old_subsegment_combination_id = f.subsegment_combination_id
INNER     JOIN ${param.segmentation_schema}.dim_date d ON d.date_id = f.date_id
INNER     JOIN ${param.segmentation_schema}.dim_person p ON p.person_id = f.person_id
INNER     JOIN ${param.segmentation_schema}.dim_age a ON a.age_id = f.age_id
WHERE     f.gp_id IS NOT NULL
AND       f.age_id >= 20
AND       d.financial_year = '${param.analysis_year}'
GROUP     BY gender_description,
          CASE WHEN a.ten_year IN ('90-99','100-109','110-119') THEN '90+' ELSE a.ten_year END
UNION ALL

-- Gender and ethnicity breakdown
SELECT    gender_description,
          'Ethnicity' AS breakdown_type,
          census_2011_ethnic_group AS socio_demographic_breakdown,
          COUNT(DISTINCT p.pseudo_nhs_number) AS people
FROM      ${param.segmentation_schema}.fact_model f
INNER     JOIN ${param.incidence_schema}.dim_subsegment_combinations_config ssc ON ssc.old_subsegment_combination_id = f.subsegment_combination_id
INNER     JOIN ${param.segmentation_schema}.dim_date d ON d.date_id = f.date_id
INNER     JOIN ${param.segmentation_schema}.dim_person p ON p.person_id = f.person_id
INNER     JOIN ${param.segmentation_schema}.dim_ethnicity e ON e.ethnicity_id = p.ethnicity_id
WHERE     f.gp_id IS NOT NULL
AND       f.age_id >= 20
AND       d.financial_year = '${param.analysis_year}'
GROUP     BY gender_description,
          census_2011_ethnic_group
UNION ALL

-- Gender and IMD breakdown
SELECT    gender_description,
          'IMD' AS breakdown_type,
          imd_quintile AS socio_demographic_breakdown,
          COUNT(DISTINCT p.pseudo_nhs_number) AS people
FROM      ${param.segmentation_schema}.fact_model f
INNER     JOIN ${param.incidence_schema}.dim_subsegment_combinations_config ssc ON ssc.old_subsegment_combination_id = f.subsegment_combination_id
INNER     JOIN ${param.segmentation_schema}.dim_date d ON d.date_id = f.date_id
INNER     JOIN ${param.segmentation_schema}.dim_person p ON p.person_id = f.person_id
WHERE     f.gp_id IS NOT NULL
AND       f.age_id >= 20
AND       d.financial_year = '${param.analysis_year}'
GROUP     BY gender_description,
          imd_quintile

Calculate the unique count of people over the whole study period.

**Note**: 2016/17 data is excluded for consistency with the trend analysis outputs (because of a potential register effect). See 01E_incidence_results_trend_chart_01 for more detail.
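
The date cut-off works because financial years are assumed here to run from 1 April to 31 March. This is consistent with labels such as `2022/23` and with the `2017-04-01` boundary in the query below. The sketch that follows shows that mapping using hypothetical helper names. The SQL filter has no upper bound, so the window runs to the latest date available in `fact_model`.

```python
from datetime import date

# Hypothetical helpers; assumes an April-to-March financial year.
STUDY_START = date(2017, 4, 1)  # first day of 2017/18, matching the SQL filter


def financial_year(d: date) -> str:
    """Label a date with its financial year, e.g. 2022-04-01 -> '2022/23'."""
    start = d.year if d.month >= 4 else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def in_study_window(d: date) -> bool:
    """Equivalent of d.date >= '2017-04-01': no upper bound is applied."""
    return d >= STUDY_START


print(financial_year(date(2017, 3, 31)), in_study_window(date(2017, 3, 31)))  # 2016/17 False
print(financial_year(date(2017, 4, 1)), in_study_window(date(2017, 4, 1)))    # 2017/18 True
```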

In [None]:
%%sql

--DROP TABLE IF EXISTS ${param.incidence_schema}.mm_incidence_period_population_6_years

In [5]:
%%sql

CREATE     TABLE ${param.incidence_schema}.mm_incidence_period_population_6_years USING PARQUET AS

SELECT    'NA' AS gender_description,
          'NA' AS breakdown_type,
          'NA' AS socio_demographic_breakdown,
          COUNT(DISTINCT p.pseudo_nhs_number) AS people
FROM      ${param.segmentation_schema}.fact_model f
INNER     JOIN ${param.incidence_schema}.dim_subsegment_combinations_config ssc ON ssc.old_subsegment_combination_id = f.subsegment_combination_id
INNER     JOIN ${param.segmentation_schema}.dim_date d ON d.date_id = f.date_id
INNER     JOIN ${param.segmentation_schema}.dim_person p ON p.person_id = f.person_id
WHERE     f.gp_id IS NOT NULL
AND       f.age_id >= 20
AND       d.date >= '2017-04-01' -- for consistency with trend output - see 01E_incidence_results_trend_chart_01

**Main results table**

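This section joins the numerator and denominator views with the period population table to extract all crude results required for the main results table.

**Note**:
- `NULL` ethnicity and IMD values are set to "Unknown" to improve the readability of the outputs
- Records with a gender of `NOT KNOWN` or `NOT SPECIFIED` are excluded
- Population figures are extracted as:
   - Period population (`unique_people_fy_period`): all people in the population at any point in the given financial year
   - Point population, or snapshot (`unique_people_fy_end`): all people in the population at the end of the financial year
   - Person time (`person_years_0`, `person_years_1`, `person_years_2`): time spent in the population within the financial year, expressed as person years
- Progression rates are crude rates per 100 person years: transitions divided by person years at the base state, multiplied by 100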
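
The row-level logic of the main results query can be sketched in Python. This is a hypothetical re-implementation for illustration only, and the period population join is omitted for brevity. Rows are plain dictionaries with the same column names as the views. Missing breakdowns are keyed as `'Unknown'` (mirroring the `COALESCE` in the join), unknown genders are dropped, and each rate is transitions divided by person years at the base state, multiplied by 100. A missing value or a zero denominator yields `None`. This roughly matches Spark SQL, which returns `NULL` for division by zero when `spark.sql.ansi.enabled` is false.

```python
from typing import Optional

EXCLUDED_GENDERS = {"NOT KNOWN", "NOT SPECIFIED"}


def breakdown_key(row: dict) -> tuple:
    """Join key: NULL breakdown values are treated as 'Unknown' (as COALESCE does)."""
    breakdown = row["socio_demographic_breakdown"]
    return (row["gender_description"], row["breakdown_type"],
            "Unknown" if breakdown is None else breakdown)


def progression_rate(incidence: Optional[int],
                     person_years: Optional[float]) -> Optional[float]:
    """Crude progression rate per 100 person years; None if missing or zero."""
    if incidence is None or not person_years:
        return None
    return incidence / person_years * 100


def main_results(denominators: list, numerators: list) -> list:
    """Left join numerator rows onto denominator rows and add crude rates."""
    numerators_by_key = {breakdown_key(n): n for n in numerators}
    results = []
    for d in denominators:
        if d["gender_description"] in EXCLUDED_GENDERS:
            continue
        key = breakdown_key(d)
        n = numerators_by_key.get(key, {})  # LEFT JOIN: missing numerator -> None
        row = {**d, "socio_demographic_breakdown": key[2]}
        for k in range(3):
            incidence = n.get(f"incidence_{k}_{k + 1}_plus")
            row[f"incidence_{k}_{k + 1}_plus"] = incidence
            row[f"progression_rate_{k}_{k + 1}_plus"] = progression_rate(
                incidence, d.get(f"person_years_{k}"))
        results.append(row)
    return results
```

For example, 1 transition over 4.0 person years gives a rate of 25.0 per 100 person years.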

In [8]:
%%sql

--DROP TABLE IF EXISTS ${param.incidence_schema}.mm_incidence_transitions_main_results

In [10]:
%%sql

CREATE    TABLE ${param.incidence_schema}.mm_incidence_transitions_main_results USING PARQUET AS
SELECT    d.gender_description,
          d.breakdown_type,
          COALESCE(d.socio_demographic_breakdown, 'Unknown') AS socio_demographic_breakdown,
          p.people AS unique_people_fy_period,
          d.unique_people_fy_end AS unique_people_fy_end,
          d.person_years_0 AS person_years_0,
          n.incidence_0_1_plus AS incidence_0_1_plus,
          (n.incidence_0_1_plus * 1.0) / (d.person_years_0 * 1.0) * 100 AS progression_rate_0_1_plus,
          d.person_years_1 AS person_years_1,
          n.incidence_1_2_plus AS incidence_1_2_plus,
          (n.incidence_1_2_plus * 1.0) / (d.person_years_1 * 1.0) * 100 AS progression_rate_1_2_plus,
          d.person_years_2 AS person_years_2,
          n.incidence_2_3_plus AS incidence_2_3_plus,
          (n.incidence_2_3_plus * 1.0) / (d.person_years_2 * 1.0) * 100 AS progression_rate_2_3_plus
FROM      overall_denominator_combined d
LEFT      OUTER JOIN overall_numerator_combined n ON n.gender_description = d.gender_description
AND       n.breakdown_type = d.breakdown_type
AND       COALESCE(n.socio_demographic_breakdown, 'Unknown') = COALESCE(d.socio_demographic_breakdown, 'Unknown')
LEFT      OUTER JOIN ${param.incidence_schema}.mm_incidence_period_population p ON p.gender_description = d.gender_description
AND       p.breakdown_type = d.breakdown_type
AND       COALESCE(p.socio_demographic_breakdown, 'Unknown') = COALESCE(d.socio_demographic_breakdown, 'Unknown')
WHERE     d.gender_description NOT IN ('NOT KNOWN', 'NOT SPECIFIED')
ORDER BY  d.gender_description,
          d.breakdown_type,
          COALESCE(d.socio_demographic_breakdown, 'Unknown')