### Analysis
Which factors best predict an early death?  
Train a model on the dataset and cross-validate it.  
What will the model try to predict? Age at death.  
Which predictors will it use?
* Race
* Education
* Marital status
* Sex
* Manner of death (consider restricting this to natural deaths, excluding self-inflicted and crime-related ones)
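
The train-and-cross-validate loop could be sketched as follows. This is only a baseline under stated assumptions: the "model" is just the mean age at death for each combination of predictor values, standing in for whatever real model is chosen later. The `age` field name, the function names, and the synthetic records are all hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

PREDICTORS = ("race", "education", "marital_status", "sex")  # from the list above

def fit_group_means(rows):
    """Baseline 'model': mean age at death per combination of predictor values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[p] for p in PREDICTORS)].append(row["age"])
    overall = mean(row["age"] for row in rows)
    return {key: mean(ages) for key, ages in groups.items()}, overall

def predict(model, row):
    group_means, overall = model
    # Fall back to the overall mean for combinations unseen in training.
    return group_means.get(tuple(row[p] for p in PREDICTORS), overall)

def cross_validate(rows, k=5, seed=0):
    """Return the mean absolute error (in years) for each of k folds."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    errors = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit_group_means(train)
        errors.append(mean(abs(predict(model, r) - r["age"]) for r in held_out))
    return errors

# Tiny synthetic records (made up, for illustration only).
rng = random.Random(1)
records = [
    {"race": rng.choice(["01", "02"]), "education": rng.randint(1, 8),
     "marital_status": rng.choice("SMWD"), "sex": rng.choice("MF"),
     "age": rng.randint(40, 95)}
    for _ in range(200)
]
fold_errors = cross_validate(records, k=5)
print([round(e, 1) for e in fold_errors])
```

Comparing the per-fold errors of a real model against this baseline would show whether the predictors add anything beyond simple group averages.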

Train additional models that predict how likely you are to die of a particular cause, and at what age:
* Alzheimer's
* Cancer
* Heart disease
* Cardiovascular disease
* Diabetes
* Etc.

### Visualization methodology
Histogram the data first, then use the count in each bin as the weight for a Gaussian kernel centered on that bin. Sum the weighted kernels to get a kernel density estimate (KDE), and plot the result.
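
A minimal sketch of that histogram-then-KDE procedure, using only the standard library. The bin width, bandwidth, sample ages, and function names are illustrative assumptions, not values taken from the dataset.

```python
import math

def histogram(values, bin_width):
    """Count values per bin; returns {bin_center: count}."""
    counts = {}
    for v in values:
        center = (math.floor(v / bin_width) + 0.5) * bin_width
        counts[center] = counts.get(center, 0) + 1
    return counts

def weighted_kde(counts, bandwidth):
    """Return a density function: a count-weighted sum of Gaussian kernels,
    one kernel per histogram bin, centered on that bin."""
    total = sum(counts.values())
    norm = 1.0 / (total * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            n * math.exp(-0.5 * ((x - c) / bandwidth) ** 2)
            for c, n in counts.items()
        )

    return density

ages = [62, 67, 71, 71, 74, 78, 79, 81, 83, 84, 84, 88, 90, 93]  # made-up ages
counts = histogram(ages, bin_width=5)
density = weighted_kde(counts, bandwidth=4.0)

step = 0.5
grid = [i * step for i in range(301)]   # ages 0 to 150
curve = [density(x) for x in grid]      # values to plot against grid
area = sum(curve) * step                # should be close to 1
```

Working from bin counts rather than raw records keeps the cost proportional to the number of bins, which matters when there are millions of death records. The `grid` and `curve` lists can be handed to any plotting library.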


### For health data, try to get published info about life expectancy
* Have users enter their weight and height. Calculate BMI. This has a large effect on life expectancy (see Lancet article on BMI and obesity).
* BMI interacts with other variables, such as exercise. See PLOS Medicine article on leisure exercise.
* Smoking
* Get county data from the CDC. Once you have it, you can link it to the census data at: https://www.census.gov/support/USACdataDownloads.html. Unfortunately, that data set was not updated after about 2010, but there is enough in it to explore some interesting relationships:
    * Crime
    * Income
    * Education
    * Employment
    * Population (which you can cross reference with County geographical area to estimate population density)
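
The two calculations mentioned in this list, BMI and county population density, are simple enough to sketch directly. The function names and units are assumptions; BMI is the standard weight in kilograms divided by height in meters squared.

```python
KG_PER_LB = 0.45359237   # exact by definition
M_PER_INCH = 0.0254      # exact by definition

def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2

def bmi_from_imperial(weight_lb, height_in):
    """Same calculation for users who enter pounds and inches."""
    return bmi(weight_lb * KG_PER_LB, height_in * M_PER_INCH)

def population_density(population, area):
    """People per unit of area (whatever unit the county area is given in)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return population / area

print(round(bmi(70, 1.75), 1))
```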

### General Rule:
* Unless published data show that risk factors interact, don't assume they do. Instead, assume they are independent and that there is a race between them: whichever would kill you first sets your life expectancy. For example, if smoking takes 6 years off your life and a BMI of 40 takes 3 years off, we shouldn't assume that someone with both loses 9 years. Rather, we assume that smoking would probably kill them before their obesity did. This establishes a hierarchy of things to change about your life. Given all the information about you, there are things you can change and things you can't. Of the things you can change, which is most likely to kill you first? Let's work on that.
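
The race rule can be sketched in a few lines. The years-lost figures are the illustrative ones from the rule above, and the function names and factor labels are hypothetical.

```python
def years_lost(factors):
    """Under the race assumption, the combined loss is the largest single
    loss (the factor that 'wins the race'), not the sum of all losses."""
    return max(factors.values(), default=0)

def priorities(factors, modifiable):
    """Modifiable factors, ordered from most to least years lost."""
    return sorted(
        (name for name in factors if name in modifiable),
        key=lambda name: factors[name],
        reverse=True,
    )

factors = {"smoking": 6, "bmi_40": 3}
print(years_lost(factors))                       # 6, not 9
print(priorities(factors, {"smoking", "bmi_40"}))  # ['smoking', 'bmi_40']
```

If a published study does report an interaction between two factors, that pair would need its own combined estimate instead of the `max` used here.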

### Main narrative
There are a handful of apps and websites that try, with varying accuracy, to predict the day you will die. None of them tells you much about how it arrives at its prediction.

### This app will serve two main purposes:
* Help you grapple with the meaning of your life. It can be a therapist.
    * How will it do this? 
        * Show you how much time you (probably) have left to live, which hopefully will motivate you to focus on what is most important to you.
* Help you gain control over your health. It can be a personal trainer.
    * How will it do this? 
        * Show you how much control you have over how long you (probably) have left to live. 
        * Suggest the changes that will most likely result in a longer life.
        * Educate. Back up the claims with publicly available data and peer-reviewed studies. Make the data analysis easy to understand and replicate for those interested (with Jupyter notebooks), and link to all the studies used to inform the predictions.
        
### Marketability?
This app could be integrated as a simplified widget in time management and efficiency apps, as a way of motivating users and reminding them how much time they have left. It could also be integrated into biometric tracking apps.



## Some useful ID codes

'education_2003_revision'
* 1:8th grade or less
* 2:9th-12th grade, no diploma
* 3:high school graduate or GED completed
* 4:some college credit, but no degree
* 5:Associate degree
* 6:Bachelor’s degree
* 7:Master’s degree
* 8:Doctorate or professional degree
* 9:Unknown

'education_reporting_flag'
* 0:1989 revision of education item on certificate
* 1:2003 revision of education item on certificate
* 2:no education item on certificate

'sex'
* M:Male
* F:Female

'marital_status'
* S:Never married, single
* M:Married
* W:Widowed
* D:Divorced
* U:Marital Status unknown

'manner_of_death'
* 1:Accident
* 2:Suicide
* 3:Homicide
* 4:Pending investigation
* 5:Could not determine
* 6:Self-Inflicted
* 7:Natural
* Blank:Not specified
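
One way these codes might be used is to decode `manner_of_death` and keep only natural deaths, as suggested in the Analysis section. This is a sketch that assumes each record is a dict keyed by the column names above; the helper names are hypothetical.

```python
# Code table copied from the list above; "" stands for a blank field.
MANNER_OF_DEATH = {
    "1": "Accident",
    "2": "Suicide",
    "3": "Homicide",
    "4": "Pending investigation",
    "5": "Could not determine",
    "6": "Self-Inflicted",
    "7": "Natural",
    "": "Not specified",
}

def _code(record):
    # Normalize ints, padded strings, and missing values to a plain string.
    value = record.get("manner_of_death")
    return "" if value is None else str(value).strip()

def manner_label(record):
    """Human-readable manner of death for one record."""
    return MANNER_OF_DEATH.get(_code(record), "Unknown code")

def natural_only(records):
    """Keep only records whose manner of death is Natural (code 7)."""
    return [r for r in records if _code(r) == "7"]

sample = [{"manner_of_death": 7}, {"manner_of_death": "2"}, {"manner_of_death": None}]
print([manner_label(r) for r in sample])
```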

'39_cause_recode'
* 001:Tuberculosis (A16-A19)
* 002:Syphilis (A50-A53)
* 003:Human immunodeficiency virus (HIV) disease (B20-B24)
* 004:Malignant neoplasms (C00-C97)
* 005:Malignant neoplasm of stomach (C16)
* 006:Malignant neoplasms of colon, rectum and anus (C18-C21)
* 007:Malignant neoplasm of pancreas (C25)
* 008:Malignant neoplasms of trachea, bronchus and lung (C33-C34)
* 009:Malignant neoplasm of breast (C50)
* 010:Malignant neoplasms of cervix uteri, corpus uteri and ovary (C53-C56)
* 011:Malignant neoplasm of prostate (C61)
* 012:Malignant neoplasms of urinary tract (C64-C68)
* 013:Non-Hodgkin's lymphoma (C82-C85)
* 014:Leukemia (C91-C95)
* 015:Other malignant neoplasms (C00-C15,C17,C22-C24,C26-C32,C37-C49,C51-C52, C57-C60,C62-C63,C69-C81,C88,C90,C96-C97)
* 016:Diabetes mellitus (E10-E14)
* 017:Alzheimer's disease (G30)
* 018:Major cardiovascular diseases (I00-I78)
* 019:Diseases of heart (I00-I09,I11,I13,I20-I51)
* 020:Hypertensive heart disease with or without renal disease (I11,I13)
* 021:Ischemic heart diseases (I20-I25)
* 022:Other diseases of heart (I00-I09,I26-I51)
* 023:Essential (primary) hypertension and hypertensive renal disease (I10,I12)
* 024:Cerebrovascular diseases (I60-I69)
* 025:Atherosclerosis (I70)
* 026:Other diseases of circulatory system (I71-I78)
* 027:Influenza and pneumonia (J10-J18)
* 028:Chronic lower respiratory diseases (J40-J47)
* 029:Peptic ulcer (K25-K28)
* 030:Chronic liver disease and cirrhosis (K70,K73-K74)
* 031:Nephritis, nephrotic syndrome, and nephrosis (N00-N07,N17-N19,N25-N27)
* 032:Pregnancy, childbirth and the puerperium (O00-O99)
* 033:Certain conditions originating in the perinatal period (P00-P96)
* 034:Congenital malformations, deformations and chromosomal abnormalities (Q00-Q99)
* 035:Sudden infant death syndrome (R95)
* 036:Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (excluding Sudden infant death syndrome) (R00-R94,R96-R99)
* 037:All other diseases (Residual) (A00-A09,A20-A49,A54-B19,B25-B99,D00-E07, E15-G25,G31-H93,I80-J06,J20-J39,J60-K22,K29-K66,K71-K72, K75-M99,N10-N15,N20-N23,N28-N98)
* 038:Motor vehicle accidents (V02-V04,V09.0,V12-V14,V19.0-V19.2,V19.4-V19.6, V20-V79,V80.3-V80.5,V81.0-V81.1,V82.0-V82.1,V83-V86,V87.0-V87.8, V88.0-V88.8,V89.0,V89.2)
* 039:All other and unspecified accidents and adverse effects (V01,V05-V06,V09.1,V09.3-V09.9,V10-V11,V15-V18,V19.3,V19.8-V19.9, V80.0-V80.2,V80.6-V80.9,V81.2-V81.9,V82.2-V82.9,V87.9,V88.9,V89.1, V89.3,V89.9,V90-X59,Y40-Y86,Y88)
* 040:Intentional self-harm (suicide) (*U03,X60-X84,Y87.0)
* 041:Assault (homicide) (*U01-*U02,X85-Y09,Y87.1)
* 042:All other external causes (Y10-Y36,Y87.2,Y89)
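
Some entries in this recode are subtotals of others, as the ICD ranges show: 004 spans 005-015, 018 spans 019-026, and 019 spans 020-022. The sketch below groups codes into the causes named in the Analysis section, so that each group can serve as a yes/no target for its own model. The group names and the `cause_labels` helper are hypothetical, and the grouping is inferred from the ICD ranges listed above.

```python
# Each group holds the subtotal code plus the more specific codes it covers.
CAUSE_GROUPS = {
    "alzheimers": {17},
    "cancer": set(range(4, 16)),                   # 004 and 005-015
    "heart_disease": set(range(19, 23)),           # 019 and 020-022
    "cardiovascular_disease": set(range(18, 27)),  # 018 and 019-026
    "diabetes": {16},
}

def cause_labels(recode):
    """Map a '39_cause_recode' value (e.g. '021' or 21) to one boolean per group."""
    number = int(recode)
    return {name: number in codes for name, codes in CAUSE_GROUPS.items()}

print(cause_labels("021"))
```

Note that heart disease is a subset of cardiovascular disease, so a single record can be positive for both targets.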

'race'
* 00:Other races
* 01:White
* 02:Black
* 03:American Indian (includes Aleuts and Eskimos)
* 04:Chinese
* 05:Japanese
* 06:Hawaiian (includes Part-Hawaiian)
* 07:Filipino
* 08:Other Asian or Pacific Islander
* 18:Asian Indian
* 28:Korean
* 38:Samoan
* 48:Vietnamese
* 58:Guamanian
* 68:Other Asian or Pacific Islander in areas reporting codes 18-58
* 78:Combined other Asian or Pacific Islander, includes codes 18-68

'hispanic_originrace_recode'
* 1:Mexican
* 2:Puerto Rican
* 3:Cuban
* 4:Central or South American
* 5:Other or unknown Hispanic
* 6:Non-Hispanic white
* 7:Non-Hispanic black
* 8:Non-Hispanic other races
* 9:Hispanic origin unknown