<h1> Analyzing Education's Effect on Capital Gains </h1>

<h3> Summary: </h3>
Our group aims to determine how strongly the number of years an individual spends in school correlates with their capital gains. We perform a basic linear regression analysis to obtain our results.

<h3> Introduction: </h3>
Education is often seen as a gateway to a greater understanding of the world; it is also seen as a path to prosperity and wealth. Capital gains, in turn, can be a common form of passive income, and those who do well from them are often considered wise. We aim to determine the extent to which this holds and how education ultimately affects the amount of capital gains an individual makes.

For this project, we use the "Census Income" dataset from the UC Irvine Machine Learning Repository.

<h3> Methods & Results: </h3>

--------------------

<h2> Code: </h2>

<h4> Imports </h4>

```python
# requires numpy, pandas and matplotlib (e.g. `pip install matplotlib` if the import fails)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```


<h4> Data </h4>

Data from https://archive-beta.ics.uci.edu/ml/datasets/census+income

```python
# load data: the raw file has no header row, and every value follows ", ",
# so skipinitialspace=True strips the leading space from each field
data = pd.read_csv('data/adult.data', header=None, skipinitialspace=True)

# prepare column names
names = ['age',
        'workclass',
        'fnlwgt',
        'education',
        'education-num',
        'marital-status',
        'occupation',
        'relationship',
        'race',
        'sex',
        'capital-gain',
        'capital-loss',
        'hours-per-week',
        'native-country',
        'income']

# assign column names
data.columns = names

# encode income as 0/1 to make analyses easier: 1 if income is <=50K, 0 otherwise
data['income_bool'] = (data['income'] == '<=50K').astype(int)

# view examples in dataset
data.head()
```

| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | income | income_bool |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K | 1 |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K | 1 |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K | 1 |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K | 1 |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K | 1 |
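Before fitting a line, it can help to check how `capital-gain` is distributed, since a least-squares line is sensitive to a few very large values and to a large block of zeros. The sketch below uses a hypothetical helper, `describe_gains` (our own name, not a pandas function), that reports the share of zero values and a few summary statistics. The toy DataFrame is made up for illustration; in the notebook you would call `describe_gains(data)` instead.

```python
import pandas as pd


def describe_gains(df: pd.DataFrame, column: str = 'capital-gain') -> dict:
    """Summarize how a capital-gain column is distributed (hypothetical helper)."""
    values = df[column]
    return {
        'n': int(values.size),                          # number of records
        'share_zero': float((values == 0).mean()),      # fraction reporting no gain
        'median': float(values.median()),
        'p99': float(values.quantile(0.99)),            # 99th percentile
        'max': float(values.max()),
    }


# toy example with made-up values; in the notebook, call describe_gains(data)
toy = pd.DataFrame({'capital-gain': [0, 0, 0, 2174, 0, 15024, 0, 0]})
summary = describe_gains(toy)
print(summary)
```

If the share of zeros turns out to be high on the real data, the fitted slope mostly reflects the relatively few people with nonzero gains, which is worth keeping in mind when reading the plot.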


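```python
# isolate variables to analyze
ed = data['education-num']
gains = data['capital-gain']

# fit a degree-1 polynomial, i.e. an ordinary least-squares line: gains ≈ m*ed + b
m, b = np.polyfit(ed, gains, 1)

# scatter the raw data and overlay the fitted line
plt.scatter(ed, gains, s=5, alpha=0.3)
plt.plot(ed, m * ed + b, color='red')
plt.xlabel('education-num (education level)')
plt.ylabel('capital-gain')
plt.show()
```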

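`np.polyfit(ed, gains, 1)` returns only the slope `m` and the intercept `b`; it does not report the correlation the Summary refers to. As a sketch, both can be computed from their textbook formulas: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x), and Pearson's r = cov(x, y) / (sd(x) * sd(y)), then checked against `np.polyfit` and `np.corrcoef`. The function name `fit_line` and the toy arrays are our own illustrative choices, not census values; in the notebook you would pass `ed` and `gains`.

```python
import numpy as np


def fit_line(x, y):
    """Return (slope, intercept, pearson_r) for the least-squares line y ≈ slope*x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    slope = (dx * dy).sum() / (dx ** 2).sum()        # cov(x, y) / var(x)
    intercept = y.mean() - slope * x.mean()          # line passes through the point of means
    r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    return slope, intercept, r


# toy data, made up for illustration (not from the census file)
x = np.array([9, 10, 13, 13, 14, 16])
y = np.array([0, 0, 2174, 0, 5178, 15024])

slope, intercept, r = fit_line(x, y)
m, b = np.polyfit(x, y, 1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
print(np.isclose(slope, m), np.isclose(intercept, b), np.isclose(r, np.corrcoef(x, y)[0, 1]))
```

Reporting r alongside the slope separates two questions: how much capital gain changes per education level (slope) and how tightly the points follow the line (r).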

<h3> Discussion: </h3>

Looking at our graph above, we can see that there is a positive correlation between education level and capital gains. There are several possible explanations for this. The most basic is that a higher level of education leads to a higher income, so more money can be invested, which leads to higher capital gains. However, Cook and Fennell, in their publication "Capital Gains: Surviving in an Increasingly For-Profit World", suggest that the positive correlation we see above may arise because people with more money can afford more education. In that case, education level would not be causing higher capital gains; rather, the greater wealth reflected in higher capital gains makes school more affordable and hence increases education level. The positive correlation we found therefore does not by itself establish a causal relationship. <br>

Overall, this result was expected: further education is predicted to bring a higher income that can be invested to produce capital gains, or, conversely, larger capital gains usually mean more money, which makes more education affordable. Both explanations are consistent with a positive correlation. <br>

Another point we can see from our graph is the importance of early education. At low education levels there are few capital gains compared with middle and higher education levels. Beyond a certain threshold, however, the increase in capital gains per additional level of education (the slope of the relationship) appears much smaller than in the earlier years. This pattern comes from inspecting the scatter plot, since a single fitted line assumes one constant slope across all education levels. <br>
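The observation that capital gains rise steeply at low education levels and more slowly past some threshold cannot be read off a single straight line. A minimal way to check it, sketched below under stated assumptions: average `capital-gain` within each `education-num` level, then fit separate least-squares slopes below and at-or-above a cutoff. The cutoff is hypothetical (the analysis does not identify where the threshold lies, so the value 9 here is only a placeholder to vary), and the helper name `slopes_around` and the toy data are ours.

```python
import numpy as np
import pandas as pd


def slopes_around(df, threshold, x_col='education-num', y_col='capital-gain'):
    """Fit separate least-squares slopes below and at/above a hypothetical threshold."""
    # mean capital gain at each education level (one point per level, sorted by level)
    means = df.groupby(x_col)[y_col].mean()
    levels = means.index.to_numpy(dtype=float)
    values = means.to_numpy(dtype=float)

    low = levels < threshold
    high = ~low
    # np.polyfit needs at least two distinct levels on each side of the cutoff
    slope_low = np.polyfit(levels[low], values[low], 1)[0]
    slope_high = np.polyfit(levels[high], values[high], 1)[0]
    return means, slope_low, slope_high


# toy data, made up for illustration: gains rise quickly, then level off
toy = pd.DataFrame({
    'education-num': [5, 5, 7, 7, 9, 9, 11, 11, 13, 13],
    'capital-gain':  [0, 0, 500, 700, 1500, 1700, 1800, 2000, 2000, 2200],
})
means, slope_low, slope_high = slopes_around(toy, threshold=9)
print(means)
print(slope_low, slope_high)
```

Fitting on level means gives each education level equal weight regardless of how many people it contains; fitting on the raw rows instead would let the most common levels dominate.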

To reduce bias in our analysis, a next step would be to look for a causal effect in our data: does higher education cause growth in capital gains, or is there only a positive association, with the causation due to something else (perhaps a third, confounding variable)? Even so, demonstrating the positive correlation gives our project a foundation for investigating causation: we now know that people with higher education levels tend to have larger capital gains, and we can build on that to find out why. <br>
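To illustrate how a third variable can produce a positive correlation without any causal effect of education, here is a small simulation on made-up data (not the census file). A hypothetical confounder labeled `wealth` drives both education and capital gains, while education has no direct effect on gains by construction. The simple correlation and the naive slope come out clearly positive, but once `wealth` is included in a multiple regression the coefficient on education is close to zero. This only shows the logic: the census data has no wealth column, so a real analysis would need other variables or methods.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 5_000

# hypothetical confounder: family wealth (standardized units)
wealth = rng.normal(size=n)
# wealth raises education; education itself has NO effect on gains below
education = 10 + 2 * wealth + rng.normal(size=n)
gains = 1000 * wealth + rng.normal(scale=500, size=n)

# 1) simple correlation and naive slope of gains on education: clearly positive
r = np.corrcoef(education, gains)[0, 1]
naive_slope = np.polyfit(education, gains, 1)[0]

# 2) multiple regression gains ~ intercept + education + wealth via least squares
X = np.column_stack([np.ones(n), education, wealth])
coef, *_ = np.linalg.lstsq(X, gains, rcond=None)
edu_coef = coef[1]

print(f"corr(education, gains) = {r:.2f}")
print(f"naive slope on education = {naive_slope:.1f}")
print(f"education coefficient controlling for wealth = {edu_coef:.1f}")
```

With these settings the true correlation is 0.8 even though the direct effect of education is zero, which is exactly the situation the correlation-versus-causation caveat warns about.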

<h3> References: </h3>

Cook, Catherine R., and Marylouise Fennell. "Capital Gains: Surviving in an Increasingly For-Profit World." Presidency 4.1 (2001): 28-33.

Fagereng, Andreas, et al. "Saving Behavior across the Wealth Distribution: The Importance of Capital Gains." Working Paper No. w26588. National Bureau of Economic Research, 2019.

Heckman, James J. "The Economics of Inequality: The Value of Early Childhood Education." American Educator 35.1 (2011): 31.

Wind, Barend, and Lina Hedman. "The Uneven Distribution of Capital Gains in Times of Socio-Spatial Inequality: Evidence from Swedish Housing Pathways between 1995 and 2010." Urban Studies 55.12 (2018): 2721-2742.
