<a href="https://colab.research.google.com/github/davidsadovy/data690_fall2022/blob/main/data690_world_dev/ipynb/wdx_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Does Knowledge Affect Quality of Life?
- Dave Sadovy
- 11/13/2022
- Individual Project Part B

## Motivation
Many people believe in knowledge for its own sake.  That is, there is value in learning about the world around us, even if the knowledge gained does not produce any practical results.  Knowledge is not simply a means to an end - it is an end in itself!  Those who embrace this philosophy are often life-long learners.  Pursuing education provides them with both enjoyment and a sense of fulfillment.  But beyond personal satisfaction, does knowledge impact quality of life?

This study will examine possible relationships between knowledge and quality of life at a national level.  Specifically, we will try to determine whether knowledge is associated with either health or wealth.  The results may inform strategies for improving quality of life through the advancement and promotion of education.  Knowledge will be approximated by the average years of total schooling completed by a nation's citizens aged 60-64.  Quality of life will be represented by two factors: life expectancy will serve as a proxy for health, while GDP per capita will stand in for wealth.

## Data Source and Selected Indicators
The source of the data is [The World Development Explorer](https://www.worlddev.xyz).  

Definitions of the indicators used for this study:
1.  **Barro-Lee: Average years of total schooling, age 60-64, total.**  The average number of years of education completed among people aged 60-64.
2.  **Life expectancy at birth, total (years).**  Life expectancy at birth indicates the number of years a newborn infant would live if prevailing patterns of mortality at the time of its birth were to stay the same throughout its life.
3.  **GDP per capita, PPP (current international $).**  This indicator provides per capita values for gross domestic product (GDP) expressed in current international dollars converted by the purchasing power parity (PPP) conversion factor.  GDP is the sum of gross value added by all resident producers in the country plus any product taxes and minus any subsidies not included in the value of the products.  The PPP conversion factor is a spatial price deflator and currency converter that controls for price level differences between countries.  Total population is a mid-year population based on the de facto definition of population, which counts all residents regardless of legal status or citizenship.

Since the more prosperous countries of the world have the luxury to apply more resources towards the pursuit of knowledge, this study will look specifically at the G7 nations: Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States.  (Note: An initial examination was done of the G8, which includes Russia along with the G7.  However, Russia proved to be an outlier in several important metrics, so it has been excluded from this investigation.)  This selection of nations has the additional advantage of representing three of the world's seven regions: East Asia & Pacific, Europe & Central Asia, and North America.  Data for the selected indicators for the G7 is available for the period from 1990 to 2010, so the research will be limited to those years.

In [1]:
# import required libraries
import pandas as pd
import plotly.express as px

In [2]:
# set URL to G7 data file in GitHub
DATA_SOURCE = "https://raw.githubusercontent.com/davidsadovy/data690_fall2022/main/data690_world_dev/ipynb/wdx_wide.csv"

# read data file into data frame
df = pd.read_csv(DATA_SOURCE)

# examine first 3 rows of data
df.head(3)

Unnamed: 0.1,Unnamed: 0,Year,Country Code,Country Name,Region,Income Group,Lending Type,BAR.SCHL.6064,NY.GDP.PCAP.PP.CD,SP.DYN.LE00.IN
0,0,1990,CAN,Canada,North America,High income,Not classified,8.65,,77.421951
1,1,1990,DEU,Germany,Europe & Central Asia,High income,Not classified,8.7,19452.5734,75.227756
2,2,1990,FRA,France,Europe & Central Asia,High income,Not classified,5.38,17642.07214,76.6


In [3]:
# reduce data frame to keep only columns of interest and rename columns from Indicator ID to Indicator Name
# (chaining rename returns a new frame instead of modifying a slice in place)
df = df[["Year", "Country Name", "BAR.SCHL.6064", "NY.GDP.PCAP.PP.CD", "SP.DYN.LE00.IN"]].rename(
    columns={"BAR.SCHL.6064": "Barro-Lee: Average years of total schooling, age 60-64, total",
             "NY.GDP.PCAP.PP.CD": "GDP per capita, PPP (current international $)",
             "SP.DYN.LE00.IN": "Life expectancy at birth, total (years)"})

df.head(3)

Unnamed: 0,Year,Country Name,"Barro-Lee: Average years of total schooling, age 60-64, total","GDP per capita, PPP (current international $)","Life expectancy at birth, total (years)"
0,1990,Canada,8.65,,77.421951
1,1990,Germany,8.7,19452.5734,75.227756
2,1990,France,5.38,17642.07214,76.6


## Indicators by G7 Nations in 2010

In [19]:
# create new data frame for 2010 only
df_2010 = df.query('Year == 2010')

df_2010

Unnamed: 0,Year,Country Name,"Barro-Lee: Average years of total schooling, age 60-64, total","GDP per capita, PPP (current international $)","Life expectancy at birth, total (years)"
140,2010,Canada,12.43,40099.44824,81.246341
141,2010,Germany,12.2,38952.6946,79.987805
142,2010,France,10.01,35902.90311,81.663415
143,2010,United Kingdom,11.3,36576.58654,80.402439
144,2010,Italy,9.11,35158.44184,82.036585
145,2010,Japan,11.22,35335.37351,82.842683
146,2010,United States,13.53,48650.64313,78.541463


In [22]:
df_2010.describe()

Unnamed: 0,Year,"Barro-Lee: Average years of total schooling, age 60-64, total","GDP per capita, PPP (current international $)","Life expectancy at birth, total (years)"
count,7.0,7.0,7.0,7.0
mean,2010.0,11.4,38668.012996,80.960105
std,0.0,1.497576,4782.370887,1.43676
min,2010.0,9.11,35158.44184,78.541463
25%,2010.0,10.615,35619.13831,80.195122
50%,2010.0,11.3,36576.58654,81.246341
75%,2010.0,12.315,39526.07142,81.85
max,2010.0,13.53,48650.64313,82.842683


### Average Years of Schooling

In [5]:
fig = px.bar(
    df_2010, 
    x="Country Name", 
    y="Barro-Lee: Average years of total schooling, age 60-64, total", 
    title="Average Years of Schooling, G7, 2010", 
    color="Country Name", 
    template="plotly_dark"
    )

fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})

fig.show()

The data ranges from a low of 9.11 years of schooling in Italy to a high of 13.53 years in the United States, with an average of 11.4.  This range of 4.42 years is relatively large, nearly 40% of the average.  Note also that the top two nations are in North America.

### Life Expectancy

In [6]:
fig = px.bar(
    df_2010, 
    x="Country Name", 
    y="Life expectancy at birth, total (years)", 
    title="Life Expectancy, G7, 2010", 
    color="Country Name", 
    template="plotly_dark"
    )

fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})

fig.show()

Life expectancies range from a low of 78.54 years in the United States to a high of 82.84 years in Japan.  The average across the G7 is 80.96 years.  Life expectancies are fairly consistent: the range of 4.3 years is slightly less than the range in schooling years in absolute terms, and only about 5% of the average.
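The two ranges compared above can be checked with a few lines of plain Python. This is a minimal sketch that hard-codes the 2010 values from the `df_2010` table so it runs on its own; the helper name `value_range` is introduced here for illustration and is not part of the notebook.

```python
# 2010 G7 values copied from the df_2010 table above
schooling = {
    "Canada": 12.43, "Germany": 12.2, "France": 10.01, "United Kingdom": 11.3,
    "Italy": 9.11, "Japan": 11.22, "United States": 13.53,
}
life_expectancy = {
    "Canada": 81.246341, "Germany": 79.987805, "France": 81.663415,
    "United Kingdom": 80.402439, "Italy": 82.036585, "Japan": 82.842683,
    "United States": 78.541463,
}


def value_range(values):
    """Return the spread (max minus min) of a dict's values."""
    return max(values.values()) - min(values.values())


schooling_range = value_range(schooling)
life_range = value_range(life_expectancy)

print(f"Schooling range: {schooling_range:.2f} years")      # Schooling range: 4.42 years
print(f"Life expectancy range: {life_range:.2f} years")     # Life expectancy range: 4.30 years
```

With the notebook's data frame loaded, the same spreads should also be recoverable from the `min` and `max` rows of `df_2010.describe()`.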

### GDP Per Capita

In [7]:
fig = px.bar(
    df_2010, 
    x="Country Name", 
    y="GDP per capita, PPP (current international $)", 
    title="GDP Per Capita, G7, 2010", 
    color="Country Name", 
    template="plotly_dark"
    )

fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})

fig.show()

Italy has the lowest GDP per capita at \$35,158, while the United States has the highest at \$48,651.  The average is \$38,668.  Once again the top two are the North American nations.  The United States' GDP per capita is nearly \$10,000 above the G7 average.

Examining these three bar graphs, it is interesting to note that France, Italy, and Japan are the bottom three in both Average Years of Schooling and GDP Per Capita, while they are the top three in Life Expectancy.  
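The ranking observation above can be verified directly from the 2010 table. The sketch below hard-codes those values so it is self-contained; the dictionary `g7_2010` and the helper `ranked` are names introduced here for illustration only.

```python
# 2010 G7 values copied from the df_2010 table above:
# country -> (schooling years, GDP per capita PPP, life expectancy)
g7_2010 = {
    "Canada": (12.43, 40099.44824, 81.246341),
    "Germany": (12.2, 38952.6946, 79.987805),
    "France": (10.01, 35902.90311, 81.663415),
    "United Kingdom": (11.3, 36576.58654, 80.402439),
    "Italy": (9.11, 35158.44184, 82.036585),
    "Japan": (11.22, 35335.37351, 82.842683),
    "United States": (13.53, 48650.64313, 78.541463),
}


def ranked(column):
    """Return country names ordered from lowest to highest on one column index."""
    return sorted(g7_2010, key=lambda country: g7_2010[country][column])


bottom_schooling = set(ranked(0)[:3])   # three lowest in schooling
bottom_gdp = set(ranked(1)[:3])         # three lowest in GDP per capita
top_life = set(ranked(2)[-3:])          # three highest in life expectancy

print(sorted(bottom_schooling))
print(bottom_schooling == bottom_gdp == top_life)
```

Comparing the groups as sets ignores the order within each group, which is all the observation claims.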

## Indicators by G7 Nations from 1990-2010

### Average Years of Schooling

In [8]:
# schooling data was only collected every 5 years
# create a new data frame, dropping the NaN rows for years in which data wasn't collected
# (calling dropna() on the selection returns a new frame and avoids SettingWithCopyWarning)
df3 = df[["Year", "Country Name", "Barro-Lee: Average years of total schooling, age 60-64, total"]].dropna()



In [9]:
fig = px.line(
    df3, 
    x="Year", 
    y="Barro-Lee: Average years of total schooling, age 60-64, total", 
    title="Average Years of Schooling, G7, 1990-2010", 
    color="Country Name", 
    template="plotly_dark"
    )

fig.show()

Here we see that the United States has maintained a significant advantage over the entire period, while France and Italy have remained at the bottom.  The slope of each line is relatively constant, with two notable exceptions.  Italy's numbers declined from 1990 to 1995 before beginning a steady rise.  The only other decrease in schooling was in the United Kingdom from 1995 to 2000, which was also followed by a steady rise.

### Life Expectancy

In [10]:
fig = px.line(
    df, 
    x="Year", 
    y="Life expectancy at birth, total (years)", 
    title="Life Expectancy, G7, 1990-2010", 
    color="Country Name", 
    template="plotly_dark"
    )

fig.show()

Japan's life expectancy has exceeded the rest of the G7 for the entire time period under study.  The United States has consistently held the bottom spot, with the gap widening.  

### GDP Per Capita

In [11]:
fig = px.line(
    df, 
    x="Year", 
    y="GDP per capita, PPP (current international $)", 
    title="GDP Per Capita, G7, 1990-2010", 
    color="Country Name", 
    template="plotly_dark"
    )

fig.show()

The United States has held a strong lead in GDP per capita for the entirety of these two decades.  All seven nations seem to be steadily increasing at roughly the same pace, with the notable exception of a dip around the Global Financial Crisis of 2008.

## The Impact of Knowledge on Quality of Life

In [12]:
# set URL to data file for all nations in GitHub
DATA_SOURCE_2 = "https://raw.githubusercontent.com/davidsadovy/data690_fall2022/main/data690_world_dev/ipynb/wdx_wide_all.csv"

# read data file into data frame
df2 = pd.read_csv(DATA_SOURCE_2)

# examine first 3 rows of data
df2.head(3)

Unnamed: 0.1,Unnamed: 0,Year,Country Code,Country Name,Region,Income Group,Lending Type,BAR.SCHL.6064,NY.GDP.PCAP.PP.CD,SP.DYN.LE00.IN
0,0,1990,ABW,Aruba,Latin America & Caribbean,High income,Not classified,,21942.26749,73.468
1,1,1990,AFG,Afghanistan,South Asia,Low income,IDA,0.37,,50.331
2,2,1990,AGO,Angola,Sub-Saharan Africa,Lower middle income,IBRD,,3263.33435,45.306


In [13]:
# reduce data frame to keep only columns of interest and rename columns from Indicator ID to Indicator Name
# (chaining rename returns a new frame instead of modifying a slice in place)
df2 = df2[["Year", "Country Name", "Region", "BAR.SCHL.6064", "NY.GDP.PCAP.PP.CD", "SP.DYN.LE00.IN"]].rename(
    columns={"BAR.SCHL.6064": "Barro-Lee: Average years of total schooling, age 60-64, total",
             "NY.GDP.PCAP.PP.CD": "GDP per capita, PPP (current international $)",
             "SP.DYN.LE00.IN": "Life expectancy at birth, total (years)"})

df2.head(3)

Unnamed: 0,Year,Country Name,Region,"Barro-Lee: Average years of total schooling, age 60-64, total","GDP per capita, PPP (current international $)","Life expectancy at birth, total (years)"
0,1990,Aruba,Latin America & Caribbean,,21942.26749,73.468
1,1990,Afghanistan,South Asia,0.37,,50.331
2,1990,Angola,Sub-Saharan Africa,,3263.33435,45.306


### Average Years of Schooling vs. Life Expectancy

In [14]:
fig = px.scatter(
    df2.query('Year == 2010'), 
    x="Barro-Lee: Average years of total schooling, age 60-64, total",
    y="Life expectancy at birth, total (years)", 
    title="Average Years of Schooling vs. Life Expectancy, All Nations, 2010", 
    color="Region", 
    template="plotly_dark"
    )

fig.show()

This scatter plot comparing average years of total schooling with life expectancy for all nations (not just the G7) in 2010 suggests that nations with more schooling tend to have longer life expectancies, though the correlation does not seem very strong.

### Regression - Average Years of Schooling vs. Life Expectancy

In [15]:
fig = px.scatter(
    df2.query('Year == 2010'), 
    x="Barro-Lee: Average years of total schooling, age 60-64, total",
    y="Life expectancy at birth, total (years)", 
    title="Average Years of Schooling vs. Life Expectancy, All Nations, 2010", 
    template="plotly_dark",
    trendline="ols"
    )

fig.show()

This regression shows the correlation between average years of total schooling and life expectancy.  A very rough estimate is that life expectancy increases about 5 years for every additional 3 years of schooling, though there is a great deal of variation.  
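To make the rough estimate above concrete, the trendline can be written as a straight line. The notation is introduced here for illustration ($S$ for average years of schooling, $\widehat{LE}$ for predicted life expectancy, $\beta_0$ and $\beta_1$ for intercept and slope), and the slope is only the figure read off the plot by eye, not a reported fit coefficient:

$$
\widehat{LE} = \beta_0 + \beta_1 S, \qquad \beta_1 \approx \frac{5 \text{ years of life expectancy}}{3 \text{ years of schooling}} \approx 1.7
$$

In other words, each additional year of schooling is associated with very roughly 1.7 additional years of life expectancy.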

### Average Years of Schooling vs. GDP Per Capita

In [16]:
fig = px.scatter(
    df2.query('Year == 2010'), 
    x="Barro-Lee: Average years of total schooling, age 60-64, total",
    y="GDP per capita, PPP (current international $)", 
    title="Average Years of Schooling vs. GDP Per Capita, All Nations, 2010", 
    color="Region", 
    template="plotly_dark"
    )

fig.show()

Again looking at all nations, this scatter plot shows a relationship between schooling and GDP per capita.  More educated nations tend to be wealthier.

### Regression - Average Years of Schooling vs. GDP Per Capita

In [17]:
fig = px.scatter(
    df2.query('Year == 2010'), 
    x="Barro-Lee: Average years of total schooling, age 60-64, total",
    y="GDP per capita, PPP (current international $)", 
    title="Average Years of Schooling vs. GDP Per Capita, All Nations, 2010", 
    template="plotly_dark",
    trendline="ols"
    )

fig.show()

It appears that each additional year of schooling is associated with roughly \$3,000 more in GDP per capita.  There is much less variation in this scatter plot, indicating a stronger correlation.

## Conclusion
It seems clear from the scatter plots and regression plots for the 142 nations with data available that knowledge *is* correlated with both health and wealth.  The data on the G7 supports the correlation between knowledge and wealth, but not the correlation between knowledge and health.  It seems likely that this is due to the small sample size of the G7, along with the fact that these seven wealthy nations are near the top in both education and life expectancy.  
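The G7 part of this claim can be illustrated with a Pearson correlation computed on the seven 2010 rows. This is a minimal sketch, not part of the original analysis: it hard-codes the `df_2010` values so it runs without the data file, and the helper `pearson` is a name introduced here. With only seven points, the coefficients should be read as descriptive rather than conclusive.

```python
from math import sqrt

# 2010 G7 values copied from the df_2010 table above:
# country -> (schooling years, GDP per capita PPP, life expectancy)
g7_2010 = {
    "Canada": (12.43, 40099.44824, 81.246341),
    "Germany": (12.2, 38952.6946, 79.987805),
    "France": (10.01, 35902.90311, 81.663415),
    "United Kingdom": (11.3, 36576.58654, 80.402439),
    "Italy": (9.11, 35158.44184, 82.036585),
    "Japan": (11.22, 35335.37351, 82.842683),
    "United States": (13.53, 48650.64313, 78.541463),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# split the tuples into one list per indicator
schooling, gdp, life = (list(column) for column in zip(*g7_2010.values()))

r_wealth = pearson(schooling, gdp)    # schooling vs. GDP per capita
r_health = pearson(schooling, life)   # schooling vs. life expectancy

print(f"schooling vs. GDP per capita:  r = {r_wealth:.2f}")
print(f"schooling vs. life expectancy: r = {r_health:.2f}")
```

On these seven rows the schooling-wealth coefficient comes out strongly positive (about 0.83), while the schooling-health coefficient is negative (about -0.73), which matches the statement that the G7 data supports the first relationship but not the second.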

While we can be fairly confident that knowledge is associated with greater health and wealth, we can say nothing about causation without further study.  It may be tempting to conclude, for example, that more educated people are able to earn more money, so knowledge creates wealth.  But perhaps wealth provides people the freedom to continue their education for a longer period of time, in which case we would conclude that wealth creates knowledge.  It does seem possible that more educated people may better understand how to live healthier, suggesting that knowledge increases health.  However, it seems more likely that there is a lurking variable (perhaps wealth?) that is a significant contributor to both education and health.  Clearly these three indicators have a complex interrelationship, and are undoubtedly influenced by additional factors.

In this study we may also rely on the anecdotal evidence from our life experiences to support our conclusions.  People learn that smoking causes cancer, so they quit smoking.  Diabetics learn how to manage their blood sugar, so they avoid health complications.  Studies show that seatbelts save lives, so seatbelt laws are passed and traffic fatalities decrease.  In these and many other cases, increased knowledge causes increased health.  Similarly, higher levels of education are well-known to open doors to higher-paying jobs.  Notwithstanding extreme outliers like Elon Musk and other billionaires, workers with PhDs or MDs earn more than those with Master's degrees, who earn more than those with Bachelor's degrees, who earn more than those with high school diplomas.  So we can see how increased knowledge causes increased wealth.  Again, the exact causal relationships between knowledge, health, and wealth are undoubtedly complex, but experience provides strong support to the conclusions of our study.

Based on the results of our investigation, we may reasonably conclude that a life-long pursuit of knowledge may contribute to a higher quality of life!

In [18]:
fig = px.choropleth(
    df_2010,
    locations="Country Name",
    locationmode="country names",
    color="Barro-Lee: Average years of total schooling, age 60-64, total",
    color_continuous_scale="geyser",
    title="2010 Barro-Lee: Average years of total schooling, age 60-64, total", 
    template="plotly_dark"
    )

fig.show()

*"Early to bed and early to rise, makes a man healthy, wealthy, and wise."* - Benjamin Franklin