## Health Condition using 'hcond' and 'hcondnew'
### By Gavin Qu - May 23rd 2024
#### Data Extraction 

- Encode the hcond and new health-condition variables correctly.
- In the UKHLS, individuals are asked about pre-existing health conditions at their first interview. These are recorded in the hcond(i) variables, where i indexes the condition. In subsequent interviews they are asked whether they have developed new conditions, recorded in hcondn(i) in waves 2-9 and hcondnew(i) from wave 10 onwards.
- In short: hcond covers wave 1 and new entrants in succeeding waves, hcondn covers waves 2-9, and hcondnew covers waves 10-13.

For example, hcond1-19 appear in waves 1 and 3-13 and are asked of new interviewees only. hcondn1-19 appear in waves 2-9 and ask existing interviewees about newly developed conditions, while hcondnew1-19 ask the same questions in waves 10-13.
hcond21, hcondnew21, and hcondnew22 exist only in waves 10-13.
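
The availability pattern above can be written down as a small lookup, which makes it easier to check which wave-prefixed columns to expect. This is only a sketch: the wave ranges are copied from these notes (taking hcondnew to start at wave 10, as in the bullet list) and have not been checked against the UKHLS documentation, and `AVAILABILITY` and `expected_columns` are hypothetical names.

```python
import string

# Waves 1-13 map to the prefixes 'a'..'m' seen in the variable names (a_hcond1, b_hcondn1, ...).
WAVE_LETTERS = string.ascii_lowercase[:13]

# Assumed availability, taken from the notes above rather than from the UKHLS documentation.
AVAILABILITY = {
    "hcond": [1] + list(range(3, 14)),   # wave 1 and waves 3-13
    "hcondn": list(range(2, 10)),        # waves 2-9
    "hcondnew": list(range(10, 14)),     # waves 10-13
}

def expected_columns(stem, items=range(1, 20)):
    """Return wave-prefixed names such as 'a_hcond1' for every wave in which `stem` is asked."""
    return [
        f"{WAVE_LETTERS[wave - 1]}_{stem}{item}"
        for wave in AVAILABILITY[stem]
        for item in items
    ]

cols = expected_columns("hcondn")
print(cols[:3], len(cols))
```

Items 21 and 22 are left out of the default `items` range because they only exist in waves 10-13; they can be passed explicitly, for example `expected_columns("hcondnew", items=[21, 22])`.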

'dcsedfl_dv' is the death data, but it is only 50% accurate when it comes to health mortality.

To check that the long panel dataset contains all the expected values, including special codes such as missing, proxy, and refusal, use pandas to display the unique values of each variable and confirm that each expected code is present.

**Here's a script that:**
- Loads the long panel dataset.
- Displays the unique values for each variable.
- Checks for the presence of the specified special codes.

```python
import pandas as pd

# Load the long panel data from the Stata file
long_panel_data_path = '/Users/gavinqu/Desktop/School/Dissertation/EssexDissertation/Data/long_panel_ukhls_hcond_data.dta'
long_panel_data = pd.read_stata(long_panel_data_path)

# Special codes to check for (code name -> value)
special_codes = {
    'missing': -9,
    'proxy': -7,
    'refusal': -2,
    'don\'t know': -1,
    'not mentioned': 0,
    'mentioned': 1
}

# Identifier columns: their unique values are printed but not checked for special codes
id_columns = ['pidp', 'wave', 'variable']

def check_special_codes(df, special_codes):
    """Print each column's unique values and, for non-identifier columns, whether each special code occurs."""
    for column in df.columns:
        unique_values = df[column].unique()
        print(f"Unique values in column '{column}': {unique_values}")
        if column in id_columns:
            continue
        for code_name, code_value in special_codes.items():
            status = "is present" if code_value in unique_values else "is NOT present"
            print(f"  {code_name} ({code_value}) {status} in column '{column}'")

# Check for special codes in the long panel dataset
check_special_codes(long_panel_data, special_codes)
```

```text
Unique values in column 'pidp': [  68001367   68004087   68006127 ... 1644552890 1644675410 1649095330]
Unique values in column 'wave': ['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm']
Unique values in column 'variable': ['a_hcond11' 'a_hcond13' 'a_hcond5' 'a_hcond7' 'a_hcond1' 'a_hcond15'
 'a_hcond17' 'a_hcond9' 'a_hcond10' 'a_hcond14' 'a_hcond16' 'a_hcond12'
 'a_hcond8' 'a_hcond3' 'a_hcond6' 'a_hcond2' 'a_hcond4' 'b_hcondn9'
 'b_hcondn1' 'b_hcondn6' 'b_hcondn11' 'b_hcondn10' 'b_hcondn8' 'b_hcondn4'
 'b_hcondn7' 'b_hcondn17' 'b_hcondn14' 'b_hcondn2' 'b_hcondn3' 'b_hcondn5'
 'b_hcondn15' 'b_hcondn16' 'b_hcondn13' 'b_hcondn12' 'c_hcondn17'
 'c_hcondn10' 'c_hcondn1' 'c_hcond11' 'c_hcond14' 'c_hcond3' 'c_hcondn13'
 'c_hcondn15' 'c_hcond17' 'c_hcondn4' 'c_hcond1' 'c_hcond7' 'c_hcondn3'
 'c_hcondn11' 'c_hcondn14' 'c_hcondn5' 'c_hcond5' 'c_hcond4' 'c_hcond9'
 'c_hcondn12' 'c_hcondn2' 'c_hcond16' 'c_hcondn16' 'c_hcond6' 'c_hcond13'
 'c_hcond15' 'c_hcond8' 'c_hcond2' 'c_hcondn9' 'c_hcond1
```
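
Once the special codes have been located, one possible next step is to mask the negative ones so they are not mistaken for real responses. The sketch below uses toy rows rather than the real file. The column name `value` is an assumption about how the long panel stores responses, and treating every negative code as missing is a modelling choice, not something these notes prescribe.

```python
import numpy as np
import pandas as pd

# Toy long-format rows; 'value' is a hypothetical name for the response column.
toy = pd.DataFrame({
    "pidp": [1, 1, 2, 2, 3],
    "variable": ["a_hcond1", "a_hcond2", "a_hcond1", "a_hcond2", "a_hcond1"],
    "value": [1, 0, -9, -1, -7],
})

# Negative codes (missing, proxy, refusal, don't know) say nothing about the condition,
# so replace them with NaN; 0 (not mentioned) and 1 (mentioned) are kept as they are.
toy["value_clean"] = toy["value"].where(toy["value"] >= 0, np.nan)
print(toy)
```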

### New approach: include disdif and hcond along with age and death values in the long panel format
1. Load the xhhrel.dta file to get the death information.
2. Merge the death information with the main dataset.
3. Load the disdif variables from each wave and combine them with the existing health condition data.
4. Calculate the frailty index using the combined dataset.
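
The four steps above can be sketched on toy data as follows. The small data frames stand in for the real files, and every column name other than `pidp`, `wave`, and `dcsedfl_dv` is a placeholder. The frailty index is computed here as the share of deficits present among those observed, which is a common definition but an assumption as far as these notes go.

```python
import pandas as pd

# Hypothetical stand-ins for the real files (toy values only).
health = pd.DataFrame({
    "pidp": [1, 1, 2, 2],
    "wave": ["a", "b", "a", "b"],
    "age": [70, 71, 64, 65],
    "hcond1": [1, 1, 0, 0],
    "hcond2": [0, 1, 0, 1],
})
disdif = pd.DataFrame({
    "pidp": [1, 1, 2, 2],
    "wave": ["a", "b", "a", "b"],
    "disdif1": [1, 1, 0, 0],
    "disdif2": [0, 0, 0, 1],
})
death = pd.DataFrame({"pidp": [1, 2], "dcsedfl_dv": [1, 0]})

# Steps 1-2: attach the person-level death flag to every person-wave row.
panel = health.merge(death, on="pidp", how="left")

# Step 3: add the disdif items, matched on person and wave.
panel = panel.merge(disdif, on=["pidp", "wave"], how="left")

# Step 4: frailty index = mean of the 0/1 deficit columns in each row
# (DataFrame.mean skips NaN, so unobserved deficits are ignored).
deficits = [c for c in panel.columns if c.startswith(("hcond", "disdif"))]
panel["frailty"] = panel[deficits].mean(axis=1)
print(panel[["pidp", "wave", "dcsedfl_dv", "frailty"]])
```

Left merges keep every person-wave row of the health data even when no death or disdif record matches, so unmatched rows show up as missing values rather than disappearing.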