## Health Condition using 'hcond' and 'hcondnew'
### By Gavin Qu - May 23rd 2024
#### Data Extraction 

-	Encode the hcond and new health-condition variables correctly.
-	Individuals are asked about pre-existing health conditions at their first UKHLS interview: these are the hcond(i) variables, where i codes the different conditions. In subsequent interviews they are asked whether they have developed new conditions: these are the hcondn(i) variables in waves 1-9 and the hcondnew(i) variables from wave 10 onwards.
-	In short: hcond covers wave 1 and new entrants in succeeding waves, hcondn covers waves 1-9, and hcondnew covers waves 10-13.
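To make the encoding step concrete, here is a minimal sketch that collapses the three families for one condition into a single "reported in this wave" indicator. It rests on assumptions not stated above: that the columns hold numeric codes (for example, read with `convert_categoricals=False`), that 1 means the condition was mentioned and 0 that it was not, and that negative values are missing-data codes. The helper name `encode_condition` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def encode_condition(df, i):
    """Collapse hcond{i}, hcondn{i} and hcondnew{i} into one indicator per row."""
    stems = [f"hcond{i}", f"hcondn{i}", f"hcondnew{i}"]
    present = [c for c in stems if c in df.columns]
    # Coerce to numbers; anything non-numeric becomes NaN
    cleaned = df[present].apply(pd.to_numeric, errors="coerce")
    # ASSUMPTION: negative values are missing-data codes, so blank them out
    cleaned = cleaned.where(cleaned >= 0)
    # Row-wise max: 1 if any family reports the condition, 0 if only zeros,
    # NaN if nothing valid was recorded for that row
    return cleaned.max(axis=1)

# Toy rows (not real UKHLS data): one family is filled per row
toy = pd.DataFrame({
    "hcond1":    [1,      -8,     np.nan, np.nan],
    "hcondn1":   [np.nan, np.nan, 0,      np.nan],
    "hcondnew1": [np.nan, np.nan, np.nan, 1],
})
encoded = encode_condition(toy, 1)
print(encoded)
```

Taking the row-wise maximum avoids hard-coding which family applies in which wave, since a given person-wave row is expected to carry at most one of the three.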

For example, hcond1-19 appear in waves 1 and 3-13 and are asked of new interviewees only. hcondn1-19 appear in waves 2-9 and ask existing interviewees about newly developed conditions, and hcondnew1-19 ask the same questions in waves 9-13.
hcond21 and hcondnew21 exist only in waves 10-13, and hcondnew22 likewise exists only in waves 10-13.
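The wave coverage described above can be checked against the data rather than taken on trust. The sketch below assumes a stacked DataFrame with a `wave` column and one column per health-condition variable; the helper name `wave_coverage` and the toy rows are hypothetical.

```python
import pandas as pd

def wave_coverage(df, wave_col="wave"):
    """Map each variable to the sorted list of waves where it has any non-missing value."""
    value_cols = [c for c in df.columns if c != wave_col]
    # True where a variable has at least one non-missing value in a wave
    has_data = df[value_cols].notna().groupby(df[wave_col]).any()
    return {
        col: has_data.index[has_data[col].to_numpy()].tolist()
        for col in value_cols
    }

# Toy rows (not real UKHLS data) standing in for the stacked waves
toy = pd.DataFrame({
    "wave":      ["a", "a", "b", "b", "j"],
    "hcond1":    [1, 0, None, None, None],
    "hcondn1":   [None, None, 0, 1, None],
    "hcondnew1": [None, None, None, None, 1],
})
coverage = wave_coverage(toy)
print(coverage)
```

Running this on the real combined data would show directly which waves each variable covers.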

'dcsedfl_dv' is the death data, but it's only 50% accurate when it comes to health mortality.

In [None]:
import pandas as pd

# List of variables to extract
variables = [
    'hcond1', 'hcond2', 'hcond3', 'hcond4', 'hcond5', 'hcond6', 'hcond7',
    'hcond8', 'hcond9', 'hcond10', 'hcond11', 'hcond12', 'hcond13', 'hcond14',
    'hcond15', 'hcond16', 'hcond17', 'hcond18', 'hcond19', 'hcond21', 'hcond22',
    'hcondn1', 'hcondn2', 'hcondn3', 'hcondn4', 'hcondn5', 'hcondn6', 'hcondn7',
    'hcondn8', 'hcondn9', 'hcondn10', 'hcondn11', 'hcondn12', 'hcondn13', 'hcondn14',
    'hcondn15', 'hcondn16', 'hcondn17', 'hcondn18', 'hcondn19', 'hcondnew1', 'hcondnew2',
    'hcondnew3', 'hcondnew4', 'hcondnew5', 'hcondnew6', 'hcondnew7', 'hcondnew8',
    'hcondnew10', 'hcondnew11', 'hcondnew12', 'hcondnew13', 'hcondnew14', 'hcondnew15',
    'hcondnew16', 'hcondnew19', 'hcondnew21', 'hcondnew22'
]

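# Function to load and filter wave data
def load_wave_data(wave_prefix):
    # Assuming each wave's data is in a separate .dta file named '<wave_prefix>_data.dta'
    file_path = f'path_to_your_data/{wave_prefix}_data.dta'
    # Not every variable exists in every wave, and read_stata raises an error
    # when asked for a missing column, so look up what this file contains first
    with pd.read_stata(file_path, iterator=True) as reader:
        available = set(reader.variable_labels())
    # Accept bare names ('hcond1') or wave-prefixed names ('a_hcond1'),
    # and rename to the bare form so that waves stack on common columns
    rename = {}
    for var in variables:
        if var in available:
            rename[var] = var
        elif f'{wave_prefix}_{var}' in available:
            rename[f'{wave_prefix}_{var}'] = var
    wave_data = pd.read_stata(file_path, columns=list(rename)).rename(columns=rename)
    wave_data['wave'] = wave_prefix
    return wave_data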

# List to store data from each wave
all_waves_data = []

# Wave prefixes from 'a' to 'm'
wave_prefixes = [chr(i) for i in range(ord('a'), ord('n'))]

# Loop through wave prefixes
for prefix in wave_prefixes:
    wave_data = load_wave_data(prefix)
    all_waves_data.append(wave_data)

# Combine all waves into a single DataFrame
combined_data = pd.concat(all_waves_data, ignore_index=True)

# Save the combined data to a CSV file
combined_data.to_csv('combined_ukhls_data.csv', index=False)

# Display the first few rows of the combined data
print(combined_data.head())