# Methodology

This chapter describes the methodology we use to calibrate income distributions across constituencies. The approach combines several data sources and statistical techniques so that the calibrated data represent earnings accurately across geographic areas and demographic groups. The process has six steps, each building on the previous one.

## Data Source Selection and Acquisition

The calibration rests on three authoritative data sources. Nomis is our primary source of earnings data: it provides income deciles split by sex and by employment status (full-time/part-time), which allows earnings patterns to be analyzed separately for each of these groups. HMRC income data serve as the baseline for calibrating the EFRS. The Annual Survey of Hours and Earnings (ASHE) provides further income targets; its reliability comes from being drawn from HMRC PAYE records. Together, these sources form the foundation for the analysis and calibration steps that follow.

## Earnings Distribution Integration

The second step integrates the earnings distributions from these sources into a single, consistent framework for calibration. We first align local-area statistics with national benchmarks so that figures agree across geographic levels, taking into account how income bands relate to one another and how they are distributed across regions. This step matters because the calibration targets built from it must reflect both local variation and national patterns in income distribution.
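As a minimal sketch of the alignment step, assume local figures are held in an array with one row per area and one column per income band, and that "aligning with national benchmarks" means scaling each band proportionally so that the local areas sum to the national total. That scaling rule is an assumption for illustration; the exact benchmarking rule is not specified here, and `align_to_national` is a hypothetical name.

```python
import numpy as np


def align_to_national(local, national):
    """Scale local-area values so each band's column sums to its national benchmark.

    Hypothetical sketch: proportional scaling is an assumed alignment rule.
    local:    (n_areas, n_bands) array, e.g. earner counts per income band
    national: (n_bands,) array of national benchmark totals per band
    """
    local = np.asarray(local, dtype=float)
    national = np.asarray(national, dtype=float)
    column_totals = local.sum(axis=0)
    # One factor per band; bands with no local data keep a factor of 1
    # instead of dividing by zero.
    factors = np.divide(
        national,
        column_totals,
        out=np.ones_like(national),
        where=column_totals > 0,
    )
    # Broadcasting multiplies every row (area) by the per-band factors.
    return local * factors
```

Proportional scaling keeps each area's share of a band unchanged while forcing the band totals to match the benchmark, which is the simplest way to make local and national figures agree.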

## Missing Data Imputation

The third step addresses missing data, which is common in large-scale geographic analyses. We impute missing values with a nearest-neighbor approach: each missing value is estimated from areas with similar total income profiles. This fills gaps in coverage across all regions while preserving the underlying patterns in the data. The imputation considers both geographic proximity and demographic similarity, so the estimates remain consistent with the overall distribution.
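The sketch below illustrates nearest-neighbor imputation using total income alone. It is a simplification under stated assumptions: `impute_nearest_by_total` is a hypothetical function, it borrows each missing value from a single donor area, and it ignores the geographic and demographic similarity that the full method also takes into account.

```python
import numpy as np


def impute_nearest_by_total(values, totals):
    """Fill NaN cells using the area whose total income is closest.

    Hypothetical sketch (single donor, total income only).
    values: (n_areas, n_cols) array with NaN marking missing cells
    totals: (n_areas,) total income per area, assumed fully observed
    """
    original = np.asarray(values, dtype=float)
    totals = np.asarray(totals, dtype=float)
    filled = original.copy()
    for col in range(original.shape[1]):
        observed = ~np.isnan(original[:, col])
        donors = np.flatnonzero(observed)
        if donors.size == 0:
            continue  # nothing to borrow from; leave the column missing
        for row in np.flatnonzero(~observed):
            # Nearest neighbor in one dimension: smallest gap in total income.
            nearest = donors[np.argmin(np.abs(totals[donors] - totals[row]))]
            # Donors are taken from the original data, so imputed values never chain.
            filled[row, col] = original[nearest, col]
    return filled
```

For example, with totals of 100, 110 and 1000, a missing value in the second area is copied from the first area (gap of 10) rather than the third (gap of 890).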

## Constituency Mapping Framework

The fourth step handles changes to constituency boundaries. We built mapping matrices that relate 2010 constituencies to 2024 constituencies, so that data published on the old boundaries can be carried over to the new ones and compared across time periods. The mapping accounts for how boundary changes affect income distribution patterns, and the calibration is adjusted accordingly.
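A mapping matrix of this kind can be applied with a single matrix product. The sketch assumes a hypothetical matrix in which entry (i, j) is the share of 2010 constituency i assigned to 2024 constituency j, with each row summing to 1 so that totals are preserved. How those shares are derived (for example from population overlap) is not covered here, and `map_constituencies` is an illustrative name.

```python
import numpy as np


def map_constituencies(targets_2010, mapping):
    """Re-express 2010-constituency targets on 2024 boundaries.

    Hypothetical sketch.
    targets_2010: (n_2010, n_targets) array of totals per 2010 constituency
    mapping:      (n_2010, n_2024) array; mapping[i, j] is the share of
                  2010 constituency i that falls in 2024 constituency j
    """
    targets_2010 = np.asarray(targets_2010, dtype=float)
    mapping = np.asarray(mapping, dtype=float)
    # Rows summing to 1 guarantee nothing is lost or double-counted.
    if not np.allclose(mapping.sum(axis=1), 1.0):
        raise ValueError("each 2010 constituency's shares must sum to 1")
    # (n_2024, n_2010) @ (n_2010, n_targets) -> (n_2024, n_targets)
    return mapping.T @ targets_2010
```

Because each row of the matrix sums to 1, the grand total of every target is identical before and after remapping, which gives a simple consistency check on the mapping itself.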

## Income Percentile Calibration

In the fifth step, we convert raw percentile data into earnings bands. Starting from the percentile points published by Nomis, we fit splines to obtain continuous distribution curves, then use these curves to estimate the number of people and their total earnings within each income band (for example £20k-£30k and £30k-£40k). This conversion produces practical calibration targets that match common income brackets while remaining consistent with the original percentile data.
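A minimal sketch of the percentile-to-band conversion follows. It assumes SciPy's `PchipInterpolator` (a monotone cubic spline) as the spline, fitted to the quantile function, since the spline type used in practice is not stated. It also assumes zero earnings at the 0th percentile and a caller-supplied `top_earnings` at the 100th, because published percentiles do not cover the tails. The function `band_counts_and_totals` and its parameters are hypothetical.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def band_counts_and_totals(percentiles, earnings, population, bands,
                           top_earnings, grid_size=10_000):
    """Estimate earner count and total earnings per band from percentile points.

    Hypothetical sketch.
    percentiles:  strictly increasing cumulative shares in (0, 1), e.g. 0.1 ... 0.9
    earnings:     non-decreasing earnings at those percentiles
    population:   number of earners in the area
    bands:        list of (lower, upper) earnings bounds, half-open [lower, upper)
    top_earnings: ASSUMED earnings at the 100th percentile (not in the source data)
    """
    # Anchor the tails: 0 earnings at p=0 and an assumed maximum at p=1.
    p = np.concatenate(([0.0], np.asarray(percentiles, float), [1.0]))
    q = np.concatenate(([0.0], np.asarray(earnings, float), [float(top_earnings)]))
    quantile_fn = PchipInterpolator(p, q)  # monotone spline for Q(p)

    # Midpoint grid of cumulative shares: each point is an equal population slice.
    grid = (np.arange(grid_size) + 0.5) / grid_size
    sampled = quantile_fn(grid)
    weight = population / grid_size

    results = []
    for lower, upper in bands:
        in_band = (sampled >= lower) & (sampled < upper)
        count = in_band.sum() * weight
        total = sampled[in_band].sum() * weight
        results.append((count, total))
    return results
```

Sampling the fitted quantile function on an evenly spaced grid treats each grid point as an equal slice of the population, so counts and totals for any band come from the same sample. Monotone interpolation matters here: an unconstrained cubic spline can overshoot between percentile points and produce a quantile function that decreases.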

## Quality Control and Validation

The final step in our methodology involves quality control and validation procedures.