Multimorbidity in households and health care costs
Status: In Progress

Project Description

As the number of people with multiple long-term conditions grows, meeting their needs will be one of the biggest challenges facing the NHS. Patients with two or more conditions account for over half of primary and secondary care activity and costs.

The responsibility for managing these conditions falls mostly on the individuals themselves and on their informal carers, but people who are less able to manage their conditions effectively need more care from the NHS. It is important to understand the wider context of people living with multiple conditions and any impact this has on health care activity and costs.

This project examines whether the health care activity and costs of a person with multiple conditions depend on their household health context.

Data source

We used data from the Clinical Practice Research Datalink (CPRD) linked to Hospital Episode Statistics (HES), ONS mortality and Indices of Multiple Deprivation. ISAC protocol number 17_150RMn2.

From an original random sample, we selected a subsample of two-person households. We counted the number of conditions each person had at baseline (1st April 2014) and calculated their health care activity and costs over two years of follow-up.
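As a rough illustration of the baseline count, the sketch below counts how many conditions each patient had on or before 1st April 2014. It is written in Python purely for illustration and is not taken from the repository's scripts. The patient IDs, condition names, data layout and the rule that a diagnosis on the baseline date itself counts are all assumptions.

```python
from datetime import date

BASELINE = date(2014, 4, 1)

# Hypothetical first-diagnosis dates per patient (illustrative names and values only).
first_diagnosis = {
    101: {"diabetes": date(2009, 5, 2), "copd": date(2015, 1, 10)},
    102: {"hypertension": date(2001, 7, 1), "asthma": date(2014, 4, 1)},
    103: {},
}


def conditions_at_baseline(diagnoses, baseline=BASELINE):
    """Count conditions first diagnosed on or before the baseline date (assumed rule)."""
    return sum(1 for diagnosed_on in diagnoses.values() if diagnosed_on <= baseline)


# Number of conditions at baseline for each patient.
counts = {patid: conditions_at_baseline(dx) for patid, dx in first_diagnosis.items()}
print(counts)
```

Conditions diagnosed after baseline are ignored here, so a patient's count stays fixed over follow-up.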

How does it work?

As the data used for this analysis are not publicly available, the code cannot be used to replicate the analysis on this dataset. However, with modifications the code could be used on other patient-level CPRD extracts.


These scripts were written in SAS Enterprise Guide Version 7.12, RStudio Version 1.1.383 and Stata MP15. The following R packages are used:

Getting started

The SAS folder contains:

  • Derive_twopersonhouseholds - Uses the denominator file from CPRD to select patients in two-person households. We limited the sample to people aged 50+ who registered at their practice within 1 year of each other. The resulting list of patient IDs can then be merged with other variables for analysis. We merged it with data on multiple conditions and primary and secondary care activity and costs.
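The selection logic described above can be sketched as follows. This is a minimal Python illustration, not the SAS program itself. The field names (`patid`, `household_id`, `birth_year`, `reg_date`) and the records are hypothetical, age is approximated from year of birth at the 2014 baseline, and the sketch assumes that both household members must be aged 50+.

```python
from collections import defaultdict
from datetime import date

# Hypothetical denominator records; field names are illustrative, not CPRD's own.
patients = [
    {"patid": 1, "household_id": "A", "birth_year": 1950, "reg_date": date(2001, 3, 1)},
    {"patid": 2, "household_id": "A", "birth_year": 1955, "reg_date": date(2001, 9, 15)},
    {"patid": 3, "household_id": "B", "birth_year": 1940, "reg_date": date(1995, 1, 1)},
    {"patid": 4, "household_id": "B", "birth_year": 1990, "reg_date": date(1995, 6, 1)},
    {"patid": 5, "household_id": "C", "birth_year": 1948, "reg_date": date(2000, 1, 1)},
    {"patid": 6, "household_id": "C", "birth_year": 1949, "reg_date": date(2005, 1, 1)},
    {"patid": 7, "household_id": "D", "birth_year": 1945, "reg_date": date(2010, 1, 1)},
]

BASELINE_YEAR = 2014     # baseline is 1st April 2014
MIN_AGE = 50             # both members assumed to need to be 50+
MAX_REG_GAP_DAYS = 365   # registered within 1 year of each other


def two_person_households(records):
    """Return sorted patient IDs from households that pass all three filters."""
    by_household = defaultdict(list)
    for rec in records:
        by_household[rec["household_id"]].append(rec)

    selected = []
    for members in by_household.values():
        if len(members) != 2:
            continue  # keep two-person households only
        if any(BASELINE_YEAR - m["birth_year"] < MIN_AGE for m in members):
            continue  # drop households with a member under 50
        gap_days = abs((members[0]["reg_date"] - members[1]["reg_date"]).days)
        if gap_days > MAX_REG_GAP_DAYS:
            continue  # registration dates too far apart
        selected.extend(m["patid"] for m in members)
    return sorted(selected)


selected_ids = two_person_households(patients)
print(selected_ids)
```

In this toy data only household A passes all three filters. The resulting ID list plays the role of the patient list that is later merged with condition, activity and cost data.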

The R folder contains:

  • 01_Derive_variables_for_regression - Prepare variables for the two-part regression models

  • 02_Two_part_regression_models - Combines a logistic model for whether a patient has any non-zero cost with a gamma model for the cost where it is non-zero. Predicted values are estimated for each level of household multimorbidity. One option for obtaining confidence intervals is to use bootstrapping. We opted instead to switch to Stata and use the twopm command.
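To make the two-part idea concrete, here is a minimal Python sketch with no covariates other than the household multimorbidity level. It is an illustration under stated assumptions, not the repository's R code. In this simplified setting, the logistic part reduces to the observed proportion with non-zero cost and the gamma part reduces to the mean of the non-zero costs within each level. The level names and cost values are hypothetical, and the percentile bootstrap shown is the alternative mentioned above, not the twopm approach that was ultimately used.

```python
import random
from statistics import mean

# Hypothetical costs per household multimorbidity level (illustrative values only).
costs_by_level = {
    "level_0": [0.0, 0.0, 120.0, 0.0, 80.0],
    "level_1": [0.0, 300.0, 450.0, 150.0, 0.0, 600.0],
}


def two_part_mean(costs):
    """Expected cost = P(cost > 0) * E[cost | cost > 0]."""
    positive = [c for c in costs if c > 0]
    if not positive:
        return 0.0
    p_nonzero = len(positive) / len(costs)   # part 1: any cost at all?
    return p_nonzero * mean(positive)        # part 2: mean cost among those with cost


def bootstrap_ci(costs, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the two-part mean (resampling patients)."""
    rng = random.Random(seed)
    n = len(costs)
    estimates = sorted(
        two_part_mean([costs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


for level, costs in costs_by_level.items():
    estimate = two_part_mean(costs)
    lower, upper = bootstrap_ci(costs)
    print(f"{level}: mean cost {estimate:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

Without covariates the product of the two parts equals the plain arithmetic mean of the costs. The two regression parts only start to matter once further covariates are added, at which point each part needs a fitted model (logistic and gamma) in place of these group summaries.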

The Stata folder contains:

  • twopm_costs - Runs two-part models and obtains estimated mean costs, with standard errors, across both parts of the model.

Authors - please feel free to get in touch


This project is licensed under the MIT License.
