## NDI Mortality Data for 1999 - 2018 NHANES

PUBLIC-USE LINKED MORTALITY FOLLOW-UP THROUGH DECEMBER 31, 2019. 

The public-use Linked Mortality Files (LMF) for 1999-2018 NHANES can be downloaded from [this site](https://ftp.cdc.gov/pub/Health_Statistics/NCHS/datalinkage/linked_mortality/). For more information on the data files and data dictionaries, visit [this site](https://www.cdc.gov/nchs/data-linkage/mortality-public.htm).
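The notebook expects the `.dat` files in a local folder named `Mort data`. The sketch below is one way to fetch them with the standard library. It assumes each file sits directly under the CDC FTP directory linked above and follows the naming pattern that appears in the directory listing later in this notebook; the helpers `lmf_filenames` and `download_all` are hypothetical names, not part of any CDC tooling.

```python
import os
import urllib.request

# Base URL from the CDC link above (assumed to host every cycle's file directly)
BASE_URL = "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/datalinkage/linked_mortality/"
OUT_DIR = "Mort data"


def lmf_filenames(first_start=1999, last_start=2017):
    """Hypothetical helper: file names for each 2-year NHANES cycle,
    e.g. NHANES_1999_2000_MORT_2019_PUBLIC.dat (pattern assumed from the listing)."""
    return [
        f"NHANES_{year}_{year + 1}_MORT_2019_PUBLIC.dat"
        for year in range(first_start, last_start + 1, 2)
    ]


def download_all(out_dir=OUT_DIR):
    """Hypothetical helper: download each file into out_dir, skipping files already present."""
    os.makedirs(out_dir, exist_ok=True)
    for name in lmf_filenames():
        target = os.path.join(out_dir, name)
        if not os.path.exists(target):
            urllib.request.urlretrieve(BASE_URL + name, target)


# download_all()  # uncomment to fetch the files (requires network access)
```

Calling `download_all()` once before the cells below should populate `Mort data`; files that already exist are left alone.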

The following Python code reads the fixed-width ASCII public-use Linked Mortality Files (LMFs) from a local directory into a pandas DataFrame and saves the combined data as a CSV file.

##### Import Libraries

```python
import os
import pandas as pd
import numpy as np
```

##### Define Column Positions

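```python
# 0-based, end-exclusive colspecs for SEQN, ELIGSTAT, MORTSTAT, UCOD_LEADING, DIABETES, HYPERTEN, PERMTH_INT, PERMTH_EXM
nhanes_cols = [(0, 6), (14, 15), (15, 16), (16, 19), (19, 20), (20, 21), (42, 45), (45, 48)]
```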
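`pd.read_fwf` expects `colspecs` as 0-based, end-exclusive `(start, end)` pairs, the same convention as Python slicing, while file layouts are often written as 1-based, inclusive positions. The sketch below converts between the two; the 1-based positions shown are derived from the `nhanes_cols` values in this notebook (so SEQN appears as 1-6), and `to_colspecs` is a hypothetical helper.

```python
# 1-based, inclusive (start, end) positions, derived from this notebook's colspecs
layout_positions = {
    "SEQN": (1, 6),
    "ELIGSTAT": (15, 15),
    "MORTSTAT": (16, 16),
    "UCOD_LEADING": (17, 19),
    "DIABETES": (20, 20),
    "HYPERTEN": (21, 21),
    "PERMTH_INT": (43, 45),
    "PERMTH_EXM": (46, 48),
}


def to_colspecs(positions):
    """Convert 1-based inclusive (start, end) pairs to 0-based, end-exclusive colspecs."""
    return [(start - 1, end) for start, end in positions]


colspecs = to_colspecs(layout_positions.values())
widths = [end - start for start, end in colspecs]
print(colspecs)  # [(0, 6), (14, 15), (15, 16), (16, 19), (19, 20), (20, 21), (42, 45), (45, 48)]
print(widths)    # [6, 1, 1, 3, 1, 1, 3, 3]
```

Subtracting 1 from the start only is the whole trick: an inclusive 1-based end position equals an exclusive 0-based end position.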

##### Define Variable Labels and Value Formats

```python
# Note: The data files do not include variable labels or value formats; the ones below follow the data dictionary.

variable_labels = {
    "ELIGSTAT": "Eligibility Status for Mortality Follow-up",
    "MORTSTAT": "Final Mortality Status",
    "UCOD_LEADING": "Underlying Cause of Death: Recode",
    "DIABETES": "Diabetes Flag from Multiple Cause of Death (MCOD)",
    "HYPERTEN": "Hypertension Flag from Multiple Cause of Death (MCOD)",
    "PERMTH_INT": "Number of Person-Months of Follow-up from NHANES Interview Date",
    "PERMTH_EXM": "Number of Person-Months of Follow-up from NHANES Mobile Examination Center (MEC) Date"
}
value_formats = {
    "ELIGSTAT": {1: "Eligible", 2: "Under age 18, not available for public release", 3: "Ineligible"},
    "MORTSTAT": {0: "Assumed alive", 1: "Assumed deceased"},
    "UCOD_LEADING": {1: "Diseases of heart", 2: "Malignant neoplasms", 3: "Chronic lower respiratory diseases",
                     4: "Accidents", 5: "Cerebrovascular diseases", 6: "Alzheimer's disease", 7: "Diabetes mellitus",
                     8: "Influenza and pneumonia", 9: "Nephritis, nephrotic syndrome and nephrosis", 10: "All other causes"},
    "DIABETES": {0: "No", 1: "Yes"},
    "HYPERTEN": {0: "No", 1: "Yes"}
}
```
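The dictionaries above are keyed by the data-dictionary names (`MORTSTAT`, `DIABETES`, ...), whereas the DataFrame columns built later use snake_case names (`mort_stat`, `diabetes`, ...). A minimal sketch of attaching readable labels, assuming that name correspondence; `add_value_labels`, `column_for`, and `demo_formats` are hypothetical, and the rows are synthetic, not real NHANES records.

```python
import pandas as pd

# Two entries copied from value_formats (renamed so the full dict is not overwritten)
demo_formats = {
    "MORTSTAT": {0: "Assumed alive", 1: "Assumed deceased"},
    "DIABETES": {0: "No", 1: "Yes"},
}
# Hypothetical mapping from data-dictionary names to this notebook's column names
column_for = {"MORTSTAT": "mort_stat", "DIABETES": "diabetes"}


def add_value_labels(df, formats, column_map):
    """Return a copy of df with a '<column>_label' column for each formatted variable."""
    out = df.copy()
    for variable, mapping in formats.items():
        column = column_map[variable]
        # Series.map yields NaN where the code is missing or absent from the mapping
        out[column + "_label"] = out[column].map(mapping)
    return out


# Synthetic rows; floats mimic columns that contain NaN after parsing
demo = pd.DataFrame({"mort_stat": [0.0, 1.0, None], "diabetes": [None, 0.0, 1.0]})
labeled = add_value_labels(demo, demo_formats, column_for)
print(labeled)
```

In the notebook, `add_value_labels(combined_data, value_formats, {...})` would apply the full set once `column_for` is extended to every key.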


##### Function to Extract Start and End Years from Filename

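```python
def extract_years(filename):
    """Return the (start_year, end_year) of the NHANES cycle encoded in an LMF file name,
    e.g. NHANES_1999_2000_MORT_2019_PUBLIC.dat -> (1999, 2000)."""
    parts = filename.split("_")
    start_year, end_year = parts[1], parts[2]
    return int(start_year), int(end_year)
```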
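`extract_years` relies on the position of underscores, so an unrelated `.dat` file in the folder would produce an `IndexError` or `ValueError` with an unclear message. A stricter variant, sketched under the assumption that every LMF name matches the pattern in the directory listing, validates the whole name first; `extract_years_strict` is a hypothetical name.

```python
import re

# Assumed pattern, based on the file names listed in this notebook
LMF_NAME = re.compile(r"^NHANES_(\d{4})_(\d{4})_MORT_\d{4}_PUBLIC\.dat$")


def extract_years_strict(filename):
    """Like extract_years, but raise ValueError for names that do not match the pattern."""
    match = LMF_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unexpected LMF file name: {filename}")
    return int(match.group(1)), int(match.group(2))


print(extract_years_strict("NHANES_2017_2018_MORT_2019_PUBLIC.dat"))  # (2017, 2018)
```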


##### Retrieve all .dat Files

```python
# Sort the listing: os.listdir returns entries in arbitrary order
dat_files = sorted(file for file in os.listdir("Mort data") if file.endswith(".dat"))
print(f"Files in directory: \n\n{dat_files}\n\n with length {len(dat_files)}")
```

```
Files in directory: 

['NHANES_1999_2000_MORT_2019_PUBLIC.dat', 'NHANES_2001_2002_MORT_2019_PUBLIC.dat', 'NHANES_2003_2004_MORT_2019_PUBLIC.dat', 'NHANES_2005_2006_MORT_2019_PUBLIC.dat', 'NHANES_2007_2008_MORT_2019_PUBLIC.dat', 'NHANES_2009_2010_MORT_2019_PUBLIC.dat', 'NHANES_2011_2012_MORT_2019_PUBLIC.dat', 'NHANES_2013_2014_MORT_2019_PUBLIC.dat', 'NHANES_2015_2016_MORT_2019_PUBLIC.dat', 'NHANES_2017_2018_MORT_2019_PUBLIC.dat']

 with length 10
```
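The listing should contain one file per two-year cycle, 1999-2000 through 2017-2018. Rather than eyeballing the count, a small check can report which cycles are missing. This is a sketch that assumes the `NHANES_<start>_<end>_...` naming seen above; `missing_cycles` is a hypothetical helper.

```python
def missing_cycles(filenames, first_start=1999, last_start=2017):
    """Return start years of expected 2-year cycles that have no matching file.

    Assumes names shaped like NHANES_<start>_<end>_MORT_2019_PUBLIC.dat.
    """
    found = {int(name.split("_")[1]) for name in filenames}
    return [year for year in range(first_start, last_start + 1, 2) if year not in found]


# Possible use in the notebook:
# assert not missing_cycles(dat_files), missing_cycles(dat_files)
```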


##### Read Each Fixed-Width File into a DataFrame

```python
column_names = ["respondent_sequence_number", "elig_stat", "mort_stat", "ucod_leading",
                "diabetes", "hyperten", "permth_int", "permth_exm"]
all_dataframes = []

for filename in dat_files:
    # na_values="." marks any "." placeholder as missing at parse time, so numeric columns stay numeric
    nhanes_data = pd.read_fwf(os.path.join("Mort data", filename), colspecs=nhanes_cols,
                              header=None, names=column_names, na_values=".")

    # Tag each record with the NHANES cycle parsed from the file name
    start_year, end_year = extract_years(filename)
    nhanes_data["start_year"] = start_year
    nhanes_data["end_year"] = end_year

    # Store SEQN as a zero-padded string (6-digit IDs from later cycles are unchanged by zfill(5))
    nhanes_data["respondent_sequence_number"] = nhanes_data["respondent_sequence_number"].astype(str).str.zfill(5)

    all_dataframes.append(nhanes_data)
```
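If the files mark missing fields with ".", passing `na_values="."` to `pd.read_fwf` turns them into NaN while parsing, so the affected columns come back as floats instead of text that still needs cleaning. The sketch below checks this on two synthetic records built to fit the same colspecs; the records and the `make_record` helper are hypothetical and not taken from the real files.

```python
import io

import pandas as pd

COLSPECS = [(0, 6), (14, 15), (15, 16), (16, 19), (19, 20), (20, 21), (42, 45), (45, 48)]
NAMES = ["respondent_sequence_number", "elig_stat", "mort_stat", "ucod_leading",
         "diabetes", "hyperten", "permth_int", "permth_exm"]


def make_record(values):
    """Build one synthetic fixed-width line, right-aligning each value in its colspec slot."""
    chars = [" "] * COLSPECS[-1][1]
    for (start, end), value in zip(COLSPECS, values):
        chars[start:end] = str(value).rjust(end - start)
    return "".join(chars)


# Synthetic records: an eligible decedent, and an under-18 record with "." in every follow-up field
raw = "\n".join([
    make_record([2, 1, 1, 6, 0, 0, 177, 177]),
    make_record([3, 2, ".", ".", ".", ".", ".", "."]),
])
parsed = pd.read_fwf(io.StringIO(raw), colspecs=COLSPECS, header=None, names=NAMES, na_values=".")
print(parsed)
print(parsed.dtypes)
```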


##### Concatenate All DataFrames

```python
combined_data = pd.concat(all_dataframes, ignore_index=True)
combined_data.head(10)
```

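|   | respondent_sequence_number | elig_stat | mort_stat | ucod_leading | diabetes | hyperten | permth_int | permth_exm | start_year | end_year |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | 1999 | 2000 |
| 1 | 2 | 1 | 1.0 | 6.0 | 0.0 | 0.0 | 177.0 | 177.0 | 1999 | 2000 |
| 2 | 3 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | 1999 | 2000 |
| 3 | 4 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | 1999 | 2000 |
| 4 | 5 | 1 | 0.0 | NaN | NaN | NaN | 244.0 | 244.0 | 1999 | 2000 |
| 5 | 6 | 1 | 0.0 | NaN | NaN | NaN | 246.0 | 245.0 | 1999 | 2000 |
| 6 | 7 | 1 | 0.0 | NaN | NaN | NaN | 237.0 | 236.0 | 1999 | 2000 |
| 7 | 8 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | 1999 | 2000 |
| 8 | 9 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | 1999 | 2000 |
| 9 | 10 | 1 | 1.0 | 1.0 | 0.0 | 0.0 | 231.0 | 231.0 | 1999 | 2000 |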
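After concatenation, a per-cycle summary is a quick way to confirm that every file contributed rows. The sketch below counts records and assumed deaths (`mort_stat == 1`) per cycle and checks that SEQN is unique, which assumes SEQN values are not reused across cycles. `summarize_by_cycle` is a hypothetical helper and the rows are synthetic.

```python
import pandas as pd


def summarize_by_cycle(df):
    """Count records and assumed deaths (mort_stat == 1) per NHANES cycle."""
    return (
        df.groupby(["start_year", "end_year"])
        .agg(records=("respondent_sequence_number", "count"),
             deceased=("mort_stat", lambda s: int((s == 1).sum())))
        .reset_index()
    )


# Synthetic rows (not real NHANES records) in the layout of combined_data
demo = pd.DataFrame({
    "respondent_sequence_number": ["00001", "00002", "00003", "10001"],
    "mort_stat": [None, 1.0, 0.0, 1.0],
    "start_year": [1999, 1999, 1999, 2001],
    "end_year": [2000, 2000, 2000, 2002],
})
print(summarize_by_cycle(demo))
print(demo["respondent_sequence_number"].is_unique)  # True
```

On the real data, `summarize_by_cycle(combined_data)` should show ten rows, one per cycle.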


##### Convert the DataFrame to CSV

```python
combined_data.to_csv("NHANES_MORT_1999_2018_PUBLIC.csv", index=False)
```
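The CSV stores the zero-padded IDs as text such as `00001`, but `pd.read_csv` infers that column as integers by default and drops the padding. Passing a `dtype` for the ID column keeps it as a string. The sketch below demonstrates the round trip on an in-memory buffer with a hypothetical two-row frame standing in for `combined_data`.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for combined_data
frame = pd.DataFrame({"respondent_sequence_number": ["00001", "00002"], "mort_stat": [None, 1.0]})
buffer = io.StringIO()
frame.to_csv(buffer, index=False)

buffer.seek(0)
default_read = pd.read_csv(buffer)  # IDs inferred as integers, leading zeros lost
buffer.seek(0)
typed_read = pd.read_csv(buffer, dtype={"respondent_sequence_number": str})  # padding kept

print(default_read["respondent_sequence_number"].tolist())  # [1, 2]
print(typed_read["respondent_sequence_number"].tolist())    # ['00001', '00002']
```

For the saved file, the same idea would be `pd.read_csv("NHANES_MORT_1999_2018_PUBLIC.csv", dtype={"respondent_sequence_number": str})`.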