As medicine advances, quality of life keeps improving in countries around the world. To make sure these advances reach everyone, we first need to ask which factors contribute most to life expectancy in the modern era. The goal of this project was to see which features in our data had the strongest relationship with life expectancy.
The data was provided by the WHO and contains the following features:
- Country
- Year (Between 2000 and 2015)
- Status (Developed or developing/1 or 0)
- Life Expectancy
- Adult Mortality (deaths between 15 and 60)
- Infant Deaths (per 1000 population)
- Alcohol (consumption per capita in liters)
- Percentage Expenditure (on healthcare)
- Hepatitis B (immunization)
- Measles (cases per 1000)
- BMI
- Under-Five Deaths (per 1000)
- Polio (immunization)
- Total Expenditure (on healthcare)
- Diphtheria (immunization)
- HIV/AIDS (deaths per 1000)
- GDP
- Population
- Thinness 1-19 years
- Thinness 5-9 years
- Income Composition Of Resources (index between 0 and 1)
- Schooling (years)
Data cleaning consisted of standardizing the column types and imputing the missing data. The mean was used for numerical columns and the mode for categorical ones.
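The imputation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual cleaning code: the helper name `impute_missing` and the tiny sample frame are hypothetical, and pandas is assumed as the data library.

```python
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with the column mean and other gaps with the column mode.

    Hypothetical helper for illustration; the original cleaning code is not shown
    in the write-up.
    """
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            modes = out[col].mode(dropna=True)
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    return out


# Made-up rows, not the WHO data.
sample = pd.DataFrame({
    "Status": ["Developing", None, "Developing", "Developed"],
    "GDP": [100.0, None, 300.0, 200.0],
})
cleaned = impute_missing(sample)
print(cleaned)
```

One design point worth keeping in mind: this fills gaps with a single global mean per column, so a country with no GDP records receives the worldwide average rather than a value informed by its region or by its own neighboring years.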
On average, BMI has only gone up between 2000 and 2015.
On average, alcohol consumption decreased for a period but started to increase again towards the middle of the decade.
Deaths among the young are also steadily decreasing across multiple metrics. Adult mortality (ages 15 to 60) shows two spikes, in 2004 and 2008. These years coincide with active periods of conflict during the Iraq and Afghanistan wars, but more analysis is needed before drawing any connection.
Regardless, life expectancy has been consistently increasing between 2000 and 2015.
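The yearly trends above come down to averaging each feature across countries for every year. A minimal sketch, assuming pandas and hypothetical column names `Year` and `Life expectancy` (the real dataset's headers may differ):

```python
import pandas as pd


def yearly_mean(df: pd.DataFrame, column: str, year_col: str = "Year") -> pd.Series:
    """Average `column` across all countries for each year, ordered by year."""
    return df.groupby(year_col)[column].mean().sort_index()


# Made-up rows, not the WHO data: two countries observed in two years.
sample = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "Life expectancy": [60.0, 70.0, 62.0, 72.0],
})
trend = yearly_mean(sample, "Life expectancy")
is_increasing = trend.is_monotonic_increasing  # True when no year falls below the previous one
print(trend)
print(is_increasing)
```

A plain mean weights every country equally regardless of population, so a trend computed this way describes the average country rather than the average person.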
BMI has a moderately strong relationship with life expectancy, as seen above: the higher the BMI, the higher the life expectancy. This is a little counterintuitive, since you might expect a lower BMI to go with a higher life expectancy. The correlation coefficient (which you can loosely think of as "percent correlated") comes out to 0.56, so just a little over 50% correlated.
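For readers who want to reproduce a figure like this, here is a minimal sketch of the calculation. The write-up does not say which coefficient was used, so Pearson's r is an assumption here, and the helper name, column names, and sample values are all hypothetical.

```python
import pandas as pd


def correlation(df: pd.DataFrame, feature: str, target: str = "Life expectancy") -> float:
    """Pearson correlation between one feature and the target (pandas default method)."""
    return df[feature].corr(df[target])


# Made-up rows, not the WHO data: a perfectly linear relationship.
sample = pd.DataFrame({
    "BMI": [18.0, 22.0, 26.0, 30.0],
    "Life expectancy": [55.0, 63.0, 71.0, 79.0],
})
r = correlation(sample, "BMI")
r_squared = r ** 2  # share of variance explained by a straight-line fit
print(round(r, 3), round(r_squared, 3))
```

The coefficient runs from -1 to 1 and is not literally a percentage. Its square is the share of variance a straight-line fit explains, so a value of 0.56 corresponds to roughly 31% of the variance.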
Our next visual shows a weak relationship between GDP and life expectancy. The correlation coefficient (from now on called CC) is 43%, so while there is a relationship, it doesn't appear to be a strong one.
The hepatitis B vaccine appears to have a weak relationship with life expectancy. The data points far from the red line appear to be outliers, meaning they are not representative of the rest of the population. The CC came out to 20%.
However, the polio vaccine seems to have a positive, moderate-to-strong relationship: the more people are vaccinated, the higher the life expectancy. The CC calculated was 46%.
Adult mortality (15-60) appears to have a strong negative relationship: higher adult mortality goes with lower life expectancy. The CC calculated was 70% in the negative direction.
The relationship between under-five deaths and life expectancy is very weak, with a CC of 20% in the negative direction.
Similarly, infant deaths seem to have a very weak relationship with life expectancy, at 20% in the negative direction.
Population doesn't appear to be a strong factor, seeing how the trend line is flat. The CC calculation was only 20%.
For the chart above, we filtered all countries with a life expectancy under 65 and checked whether there was a relationship between life expectancy and the percentage spent on healthcare. As shown, there doesn't appear to be a strong relationship, with a CC of 1%.
Alcohol has a weak relationship with life expectancy, with only a 39% correlation.
Schooling appears to have a very strong relationship with life expectancy. You can clearly see on the graph above that more schooling means a higher life expectancy. The CC calculated was 70%.
We ran a statistical analysis on all these features, splitting the data into two samples at the life expectancy median, and found a statistically significant difference between the samples for every feature. However, there were some limitations that we will expand upon later.
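A median-split comparison like the one described can be sketched as below. The write-up does not name the test that was used, so Welch's two-sample t-test from SciPy is an assumption, and the helper name, column names, and sample values are hypothetical.

```python
import pandas as pd
from scipy import stats


def median_split_test(df: pd.DataFrame, feature: str, target: str = "Life expectancy"):
    """Split rows at the target's median and compare the feature between the halves.

    Uses Welch's t-test (unequal variances); this choice is an assumption, since
    the original test is not stated.
    """
    median = df[target].median()
    low = df.loc[df[target] <= median, feature].dropna()
    high = df.loc[df[target] > median, feature].dropna()
    result = stats.ttest_ind(high, low, equal_var=False)
    return result.statistic, result.pvalue


# Made-up rows, not the WHO data.
sample = pd.DataFrame({
    "Life expectancy": [50, 52, 54, 56, 70, 72, 74, 76],
    "Schooling": [5.0, 6.0, 5.5, 6.5, 12.0, 13.0, 12.5, 13.5],
})
t_stat, p_value = median_split_test(sample, "Schooling")
print(t_stat, p_value)
```

Because the groups are defined by the target itself, any feature that is even mildly correlated with life expectancy will tend to differ between the halves, and with a large sample that difference will usually test as significant.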
After experimenting with a few different types of models, we settled on the Random Forest Regressor. This is an ensemble of decision trees that works well with numerical data. Think of asking a bunch of friends for advice on a decision and combining all their advice into a refined choice; this is how the random forest works. We split our data with a standard train/test split and used the default parameters, which gave us a model that predicts life expectancy to within about 1-3 years of error. Tuning the parameters increased the error, so we opted to leave the defaults alone.
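The modeling setup can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the project's code: the data is synthetic, the 80/20 split and the use of mean absolute error are guesses at what "standard" and "years of error" mean here, and `random_state` is set only so the run is repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the WHO data: two features with a made-up linear effect plus noise.
rng = np.random.default_rng(0)
n = 400
schooling = rng.uniform(2, 18, n)
adult_mortality = rng.uniform(50, 500, n)
X = np.column_stack([schooling, adult_mortality])
y = 55 + 1.5 * schooling - 0.03 * adult_mortality + rng.normal(0, 1.0, n)

# Assumed 80/20 split; the write-up only says "standard".
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Default hyperparameters apart from the seed.
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Mean absolute error, in the same units as the target (years).
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE: {mae:.2f} years")
print(dict(zip(["schooling", "adult_mortality"], model.feature_importances_)))
```

The fitted model's `feature_importances_` gives a second view of which features matter, alongside the correlation coefficients. The importances sum to 1 and can be diluted when features are strongly correlated with each other.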
The two strongest factors for life expectancy are schooling and adult mortality. However, I am not sure schooling is directly associated with life expectancy. More available schooling could also mean more public works and therefore higher standards of sanitation, which could also be contributing. With adult mortality, however, it does appear there are enough young deaths to affect the average life expectancy.
As for moderate relationships, we have BMI, GDP, and the polio vaccine, though these three may also be connected. Notice earlier how higher BMIs were associated with higher life expectancy. A higher BMI could also mean greater access to food and doctors (polio vaccine).
With no subject matter expert on hand, the sample groups for the statistical analysis were all split at the median. This was probably not the correct move, since every feature came back statistically significant. In addition, this dataset treats all countries the same, and social differences such as diet should be considered.
If we want to increase the average life expectancy, we need to lower adult mortality. More research needs to be done on adult mortality on its own to discover its leading causes and mitigate them.
While this project gives us some glimpses into what affects life expectancy, more research needs to be done on these features individually to come up with a clear path to a healthier human race.
If possible, we need to look at these findings in a more cultural context. Ideally, the countries would be separated or marked by region and continent to better see how culture impacts these numbers. In addition, a separate study into adult mortality is needed, since it significantly affects worldwide averages.