# Exploring Economic Wellbeing and Income Inequality in the US Since 2005

In [142]:
import marimo as mo
import altair as alt
import polars as pl
import pandas as pd
from pathlib import Path
import geopandas as gpd
from vega_datasets import data

In [143]:
pctl = pd.read_csv('../Data/IDDA/pctl_of_inc_all_data.csv')

# calculating some inequality measures
pctl['IQR'] = pctl['pctl75'] - pctl['pctl25']
pctl['90/10'] = pctl['pctl90'] / pctl['pctl10']
pctl['50/10'] = pctl['pctl50'] / pctl['pctl10']
pctl['90/50'] = pctl['pctl90'] / pctl['pctl50']

# individual-level income data only (level == 'pik')
idv = pctl[pctl['level'] == 'pik']

# getting all US data from 2005 - 2019
idv_us = idv.query("geo_var == 'usst' and group_var == 'xall' and inc_var == 'TC' and samp == 'all_w2_pik'")

Median income over time

In [None]:
alt.Chart(idv_us).mark_line(color = 'green').encode(
    alt.X('year:O', title= 'Year'),
    alt.Y('pctl50:Q', title= 'Income (USD)')
).properties(
    title = 'Individual Median Income Over Time',
    width = 1000,
    height = 400,
    background = '#ebf7de'
).configure_axis(
    gridColor ='#3f64ca3d'
)

Before getting into the intricacies of income inequality in the United States, I thought it would be useful to first look at how the typical (median, or 50th percentile) American has been doing over this period. What we see is a general increase in median individual income in the US over the 14 years. This seems good, but how does this group's trend compare with those of other percentiles over the same 14 years? Does every income level see increases, and are these increases comparable?

In [None]:
alt.Chart(idv_us).transform_fold(['pctl10', 'pctl25', 'pctl50', 'pctl75', 'pctl90', 'pctl95'],
    as_=['Percentile', 'Income']
).mark_line().encode(
    alt.X('year:O', title= 'Year'),
    alt.Y('Income:Q', title= 'Income (USD)'),
    alt.Color('Percentile:O', title = 'Income Percentiles', scale = alt.Scale(scheme = 'bluegreen'), legend = alt.Legend(
        orient='right', 
        labelExpr="{'pctl10': '10th Percentile', 'pctl25': '25th Percentile', 'pctl50': 'Median', 'pctl75': '75th Percentile', 'pctl90': '90th Percentile', 'pctl95': '95th Percentile'}[datum.value]"
        ))
).properties(
    title = 'Individual Income Over Time for Various Percentiles',
    width = 1000,
    height = 400,
    background = '#ebf7de'
).configure_axis(
    gridColor ='#3f64ca3d'
)

Adding more percentiles shows that, while median income is generally increasing, its gains are marginal compared to those of the 75th, 90th, and 95th percentiles, which appear to realize much larger gains over the 14-year period. So, although it is good to see the median American doing better over time, these gains look small next to those at the top of the distribution.

However, the median American's gains look substantial when compared to those of the 10th and 25th percentiles, whose incomes appear to have grown very little over the 14-year period.

Although the changes in income seem to be most significant for high earners, what do these changes look like in terms of percent changes? In other words, relative to their original income, what kinds of gains are Americans seeing?

In an ideal world, I would expect low-income earners to see the largest percentage changes: their incomes start from a much lower level, so even a modest dollar increase is large in relative terms. I would expect middle-income earners to see somewhat smaller percentage increases, since they start with more money, and top earners to see the smallest percentage changes, since they start with far higher incomes. Let's do some calculations and take a look:

In [None]:
idv_us_year_filt = idv_us.query("year == 2005 or year == 2019")
cols_to_piv = ['pctl10', 'pctl25', 'pctl50', 'pctl75', 'pctl90', 'pctl95']
idv_us_pivot = idv_us_year_filt.pivot(index = 'year', columns= 'geo_abb', values= cols_to_piv)

idv_us_change = (idv_us_pivot.loc[2019] - idv_us_pivot.loc[2005]) / idv_us_pivot.loc[2005] * 100

us_change = idv_us_change.reset_index()

final_us_change = us_change.rename(columns={
    'level_0': 'Percentile', 0: 'Percentage_Change'
})

pct_chg_points = alt.Chart(final_us_change).mark_bar().encode(
    alt.X('Percentile:O', axis = alt.Axis(
        title = 'Percentile',
        labelAngle= 0,
        labelExpr="{'pctl10': '10th Percentile', 'pctl25': '25th Percentile', 'pctl50': 'Median', 'pctl75': '75th Percentile', 'pctl90': '90th Percentile', 'pctl95': '95th Percentile'}[datum.value]"
    )),
    alt.Y('Percentage_Change:Q', title= 'Percentage Change (2005 to 2019)'),
    alt.Color('Percentile:O', legend = None, scale = alt.Scale(scheme = 'bluegreen'))
).properties(
    title = 'Percent Changes in Individual Income from 2005 to 2019',
    width = 1100,
    height = 400,
    background = '#ebf7de'
).configure_axis(
    gridColor ='#3f64ca3d'
)

pct_chg_points

The differences in these percentage changes are not as large as I had hoped. As mentioned, lower-income earners would ideally see a much larger percentage change in income than higher-income earners, since they start with much less money. However, the percentage changes for the 10th and 95th percentiles differ by only about 10 percentage points. This matters when you consider how much more money the top earners had to gain to realize a 50% increase in income, i.e., increases in the hundreds of thousands of dollars versus the hundreds or thousands.
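As a cross-check on the bar chart, here is a minimal sketch that reports both the dollar change and the percent change between two years for each percentile column, without the pivot/reset_index/rename steps. It assumes a frame shaped like `idv_us` (exactly one row per year, a `year` column, one column per percentile); the helper name `change_between` and the toy numbers are hypothetical, not values from the data.

```python
import pandas as pd


def change_between(df: pd.DataFrame, cols: list[str], start: int, end: int) -> pd.DataFrame:
    """Dollar and percent change in each column from `start` to `end`.

    Assumes `df` has exactly one row per year and a 'year' column.
    """
    by_year = df.set_index("year")[cols]
    first, last = by_year.loc[start], by_year.loc[end]
    return pd.DataFrame({
        "dollar_change": last - first,
        "pct_change": (last - first) / first * 100,
    })


# Hypothetical toy values (not from the IDDA data): both percentiles grow by 10%.
toy = pd.DataFrame({
    "year": [2005, 2019],
    "pctl10": [4_000, 4_400],
    "pctl95": [150_000, 165_000],
})
changes = change_between(toy, ["pctl10", "pctl95"], 2005, 2019)
print(changes)
```

On the real data this could be called as `change_between(idv_us, cols_to_piv, 2005, 2019)`. In the toy case both rows show the same 10% change, yet the 95th percentile gains $15,000 while the 10th gains $400, which is the gap between relative and absolute gains described above.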

Another important note is that middle-income earners see lower percentage increases in their incomes than both lower- and higher-income earners. There has been much discussion over the past decade or so about a shrinking middle class, and these smaller percentage changes are consistent with that.

Bringing in more data, we can look at how these trends compare when we account for inflation (adjusting incomes to 2019 price levels for each year). Do the increases seem to outpace the ever-increasing price levels?
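The `_adj` columns come pre-computed in the CSV, and the price index behind them is not stated here. A common way to express incomes in 2019 dollars is to rescale each year's nominal value by a price index P, such as the CPI: real_t = nominal_t × P_2019 / P_t. A minimal sketch, with a hypothetical helper name and made-up index values (not official CPI figures):

```python
import pandas as pd


def to_base_year_dollars(nominal: pd.Series, price_index: pd.Series, base_year: int) -> pd.Series:
    """Convert nominal amounts (indexed by year) into `base_year` dollars.

    real_t = nominal_t * P_base / P_t, where P is a price index such as the CPI.
    """
    p_t = price_index.reindex(nominal.index)  # price level for each year in `nominal`
    return nominal * price_index.loc[base_year] / p_t


# Made-up price index and incomes, for illustration only.
price_index = pd.Series({2005: 80.0, 2019: 100.0})
median_nominal = pd.Series({2005: 30_000.0, 2019: 36_000.0})
median_real = to_base_year_dollars(median_nominal, price_index, base_year=2019)
print(median_real)
```

With these made-up numbers, a 20% nominal raise becomes a 4% decline once 2005 income is expressed in 2019 dollars ($37,500 vs. $36,000).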

In [None]:
# Hard-coded color scale: defined as an experiment but not used (the chart below sticks with 'bluegreen').
color_scale = alt.Scale(domain=['pctl10', 'pctl10_adj', 'pctl25', 'pctl25_adj', 
                                 'pctl50', 'pctl50_adj', 'pctl75', 'pctl75_adj', 
                                 'pctl90', 'pctl90_adj', 'pctl95', 'pctl95_adj'],
                        range=['#cff7cf', '#b4f3b4', '#91ee91', '#6ee86e', 
                               '#4ae24a', '#27dd27', '#1ebd1e', '#189a18', 
                               '#137713', '#0d540d', '#083008', '#020d02'])

alt.Chart(idv_us).transform_fold(
    ['pctl10', 'pctl10_adj', 'pctl25', 'pctl25_adj', 'pctl50', 'pctl50_adj', 'pctl75', 'pctl75_adj', 'pctl90', 'pctl90_adj', 'pctl95', 'pctl95_adj'],
    as_=['Percentile', 'Income']
).mark_line().encode(
    alt.X('year:O', title='Year'),
    alt.Y('Income:Q', title='Income (USD)'),
    alt.Color('Percentile:O', title='Income Percentiles', scale=alt.Scale(scheme = 'bluegreen'), legend=alt.Legend(
        orient='right',
        labelExpr="{'pctl10': '10th Percentile', 'pctl10_adj': '10th Percentile (Adj)', "
                  "'pctl25': '25th Percentile', 'pctl25_adj': '25th Percentile (Adj)', "
                  "'pctl50': 'Median', 'pctl50_adj': 'Median (Adj)', "
                  "'pctl75': '75th Percentile', 'pctl75_adj': '75th Percentile (Adj)', "
                  "'pctl90': '90th Percentile', 'pctl90_adj': '90th Percentile (Adj)', "
                  "'pctl95': '95th Percentile', 'pctl95_adj': '95th Percentile (Adj)'}[datum.value]"
    ))
).properties(
    title='Individual Income Over Time for Various Percentiles (Adjusted & Unadjusted)',
    width=1000,
    height=400,
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

Once we adjust for inflation, we see that, for most percentiles, the increases in income mentioned above have diminished drastically; median income has not grown as much as we initially thought, and this is even more the case for the lower percentiles. We still see significant increases in income for the top percentiles, which is evidence that only the most fortunate earners have been able to keep up with inflation, while the majority are struggling to keep pace.

I hard-coded a color scale above for this graph in an attempt to get more of a gradient (lighter lights and darker darks), but I do not love the result (it's very neon). I will keep it in the notebook, but stick to the bluegreen scheme I have been using.

Again, what does this look like in terms of percentage changes?

In [None]:
idv_us_year_filt_adj = idv_us.query("year == 2005 or year == 2019")
cols_to_piv_adj = ['pctl10', 'pctl25', 'pctl50', 'pctl75', 'pctl90', 'pctl95', 'pctl10_adj', 'pctl25_adj', 'pctl50_adj', 'pctl75_adj', 'pctl90_adj', 'pctl95_adj']
idv_us_pivot_adj = idv_us_year_filt_adj.pivot(index = 'year', columns= 'geo_abb', values= cols_to_piv_adj)

idv_us_change_adj = (idv_us_pivot_adj.loc[2019] - idv_us_pivot_adj.loc[2005]) / idv_us_pivot_adj.loc[2005] * 100

us_change_adj = idv_us_change_adj.reset_index()

final_us_change_adj = us_change_adj.rename(columns={
    'level_0': 'Percentile', 0: 'Percentage_Change'
})

pct_chg_points_adj = alt.Chart(final_us_change_adj).mark_bar().encode(
    alt.X('Percentile:O', axis = alt.Axis(
        title = 'Percentile',
        labelAngle= 0,
        labelExpr="{'pctl10': '10th Percentile', 'pctl10_adj': '10th Percentile (Adj)', "
                  "'pctl25': '25th Percentile', 'pctl25_adj': '25th Percentile (Adj)', "
                  "'pctl50': 'Median', 'pctl50_adj': 'Median (Adj)', "
                  "'pctl75': '75th Percentile', 'pctl75_adj': '75th Percentile (Adj)', "
                  "'pctl90': '90th Percentile', 'pctl90_adj': '90th Percentile (Adj)', "
                  "'pctl95': '95th Percentile', 'pctl95_adj': '95th Percentile (Adj)'}[datum.value]"
    )),
    alt.Y('Percentage_Change:Q', title= 'Percentage Change (2005 to 2019)'),
    alt.Color('Percentile:O', legend = None, scale = alt.Scale(scheme = 'bluegreen'))
).properties(
    title = 'Percent Changes in Individual Income from 2005 to 2019 (Adjusted & Unadjusted)',
    width = 1100,
    height = 400,
    background = '#ebf7de'
).configure_axis(
    gridColor ='#3f64ca3d'
)

pct_chg_points_adj

Once we adjust for inflation, it is clear that these percentage increases in income are much smaller than they first appeared. Not only are the changes even more marginal, but the same issue remains: top earners see a percentage increase comparable to that of low-income earners, and middle-income earners still see the smallest improvement.

The discrepancies between the top and the bottom are apparent. Income is rapidly expanding at the top, while those at the bottom are seeing very little gain at all.

We have seen that, in the United States as a whole, incomes are rising much more rapidly for high-income earners than for low-income earners. But inequality can be investigated in ways other than comparing percentiles. How do median incomes differ from state to state?

In [None]:
geo_us_states = gpd.read_file(data.us_10m.url, driver='TopoJSON', layer='states')
idv_state_all = idv.query("geo_var == 'state' and group_var == 'xall'")

In [None]:
idv_2005_state_all = idv_state_all.query("year == 2005")

alt.Chart(geo_us_states).mark_geoshape().transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=idv_2005_state_all, key='geo_var_val', fields=['pctl50'])
).encode(
    alt.Color('pctl50:Q', title='Median Income', scale = alt.Scale(scheme= 'bluegreen'))
).project(
    type='albersUsa'
).properties(
    title = 'Median Individual Income by State in 2005',
    background = '#ebf7de'
)

At the start of the period we are observing (2005), there is already clear inequality across states, with places like NJ, MA, CT, and DC having relatively higher median incomes than places like Montana, Mississippi, and West Virginia. Has this discrepancy been alleviated over time, i.e., have the latter states been able to catch up to the former? Or have these differences stayed the same, or even worsened?
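To compare a 2005 map with a 2019 map on equal terms, one option is to give both the same color domain. A minimal sketch, assuming frames shaped like `idv_2005_state_all` and `idv_2019_state_all`; the helper name `shared_domain` and the toy values are hypothetical:

```python
import pandas as pd


def shared_domain(frames: list[pd.DataFrame], col: str) -> list[float]:
    """Min and max of `col` across several frames, for use as a common color domain."""
    values = pd.concat([f[col] for f in frames])
    return [float(values.min()), float(values.max())]


# Hypothetical toy values, for illustration only.
y2005 = pd.DataFrame({"pctl50": [25_000, 31_000, 38_000]})
y2019 = pd.DataFrame({"pctl50": [29_000, 36_000, 47_000]})
shared = shared_domain([y2005, y2019], "pctl50")
print(shared)
```

The resulting list could then be passed to both maps as `alt.Scale(scheme='bluegreen', domain=shared)` inside `alt.Color`, so that a given shade means the same dollar amount in both years.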

In [None]:
idv_2019_state_all = idv_state_all.query("year == 2019")

alt.Chart(geo_us_states).mark_geoshape().transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=idv_2019_state_all, key='geo_var_val', fields=['pctl50'])
).encode(
    alt.Color('pctl50:Q', title='Median Income', scale = alt.Scale(scheme= 'bluegreen'))
).project(
    type='albersUsa'
).properties(
    title = 'Median Individual Income by State in 2019',
    background = '#ebf7de'
)

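This map is not clear at all; if anything, the states seem to have become more similar in level. Part of the difficulty is that each map fits its color scale to that year's own data, so the same shade does not mean the same dollar amount in 2005 and 2019. I am not sure I want to include this map, because it suggests inequality has improved when it hasn't. Let's look at a box plot of state median incomes over the 14-year period instead: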

In [None]:
alt.Chart(idv_state_all).mark_boxplot().encode(
    alt.X('pctl50:Q', title = "Median Income"),
    alt.Y('year:O', title = 'Year'),
    alt.Color('year:O', legend = None, scale = alt.Scale(scheme = 'greens'))
).properties(
    title= "Boxplot of States' Median Incomes by Year",
    width=1000,
    height=400,
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

As the years progress, state median incomes skew more and more toward the high end, which is evidence that these discrepancies are worsening. Higher-earning states are realizing increases in median income, whereas lower-earning states have not seen significant increases. The trend is pronounced enough to produce outliers on the plot.

I am also using the "greens" color scheme here as a test to see whether it looks better than the bluegreen scheme.
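To put numbers on the widening spread in the boxplot above, here is a minimal sketch that summarizes the state-level values of one percentile column by year: the quartiles, their IQR, and the ratio of the highest to the lowest state. It assumes a frame shaped like `idv_state_all` (one row per state and year); the helper name `spread_by_year` and the toy values are hypothetical.

```python
import pandas as pd


def spread_by_year(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Summarize how spread out the state-level values of `col` are in each year."""
    g = df.groupby("year")[col]
    out = pd.DataFrame({
        "q25": g.quantile(0.25),
        "median": g.median(),
        "q75": g.quantile(0.75),
        "max_over_min": g.max() / g.min(),  # highest state relative to lowest state
    })
    out["iqr"] = out["q75"] - out["q25"]
    return out


# Hypothetical toy values for four states in two years, for illustration only.
toy = pd.DataFrame({
    "year": [2005] * 4 + [2019] * 4,
    "pctl50": [20_000, 22_000, 24_000, 26_000, 22_000, 25_000, 30_000, 40_000],
})
summary = spread_by_year(toy, "pctl50")
print(summary)
```

On the real data, `spread_by_year(idv_state_all, 'pctl50')` would give one row per year; a growing `iqr` or `max_over_min` would back up what the boxplots suggest, and the same call works for `pctl10` and `pctl90`.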

While it is useful to look at these discrepancies in median income across states, it is also important to look for discrepancies among low-income earners across states:

In [None]:
alt.Chart(geo_us_states).mark_geoshape().transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=idv_2005_state_all, key='geo_var_val', fields=['pctl10'])
).encode(
    alt.Color('pctl10:Q', title='10th Percentile Income', scale = alt.Scale(scheme= 'bluegreen'))
).project(
    type='albersUsa'
).properties(
    title = '10th Percentile Individual Income by State in 2005',
    background = '#ebf7de'
)

At the start of the period, we definitely see discrepancies across states, for example between places like Nevada and Massachusetts and places like Michigan and Louisiana. We did not see an improvement in this inequality for median earners, but do we see one for lower-income earners? The depressed median incomes would be less problematic if lower earners had caught up over the 14 years. Let's look at a box plot over the years:

In [None]:
alt.Chart(idv_state_all).mark_boxplot().encode(
    alt.X('pctl10:Q', title = "10th Percentile Income"),
    alt.Y('year:O', title = 'Year'),
    alt.Color('year:O', legend = None, scale = alt.Scale(scheme = 'greens'))
).properties(
    title= "Boxplot of States' 10th Percentile Incomes by Year",
    width=1000,
    height=400,
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

It was likely too hopeful to believe that this inequality between states might have improved for low-income earners over the years. As with median income, the discrepancies between states have worsened for the 10th percentile as well, with states like Washington and Massachusetts pulling away from the rest.

What about the top 10% of earners per state? Do we see discrepancies between them, or are they doing about the same state by state? Let's take a look at how they were doing in 2005:

In [None]:
alt.Chart(geo_us_states).mark_geoshape().transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=idv_2005_state_all, key='geo_var_val', fields=['pctl90'])
).encode(
    alt.Color('pctl90:Q', title='90th Percentile Income', scale = alt.Scale(scheme= 'bluegreen'))
).project(
    type='albersUsa'
).properties(
    title = '90th Percentile Individual Income by State in 2005',
    background = '#ebf7de'
)

We again see significant discrepancies between states as it pertains to the top 10% of earners. States like California and New Jersey appear to have much higher incomes in the top 10% as compared to places like South Dakota and Arkansas. Again, it is important to observe if these differences have improved, stayed the same, or worsened over the years:

In [None]:
alt.Chart(idv_state_all).mark_boxplot().encode(
    alt.X('pctl90:Q', title = "90th Percentile Income"),
    alt.Y('year:O', title = 'Year'),
    alt.Color('year:O', legend = None, scale = alt.Scale(scheme = 'greens'))
).properties(
    title= "Boxplot of States' 90th Percentile Incomes by Year",
    width=1000,
    height=400,
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

A cross-state comparison shows evidence of income inequality as well. In any given year, there are disparities between states in terms of median income, 10th percentile income, and 90th percentile income. Moreover, this gap between states has expanded as time has gone on.

Are there disparities within states as well? Probably, but let's see how severe they are. First, let's look at how these percentiles evolve within each state over the 14-year period.

The important question to ask is: in states where incomes at all percentiles are particularly high (NJ, DC, NY, MA, etc.), are the people at the bottom still doing alright in the grand scheme of things? Judging from the maps, it might seem that low earners in a place like California are still doing alright, but let's check whether this is the case:

In [None]:
alt.Chart(idv_state_all).transform_fold(
    ['pctl10', 'pctl25', 'pctl50', 'pctl75', 'pctl90', 'pctl95', 'pctl98'],
    as_=['Percentile', 'Value']
).mark_rect().encode(
    alt.X('year:O', axis=alt.Axis(labels=False, title="'05 - '19")),
    alt.Y('Percentile:O', title='Income Percentiles'),
    alt.Color('Value:Q', scale=alt.Scale(scheme='bluegreen'), title='Income Value')
).properties(
    width=40,
    height=80
).facet(
    'geo_abb:N',
    columns=17
).properties(
    # title and background are top-level properties, so they are set on the faceted chart, not the inner one
    title='Income Distribution by Percentile Over Years',
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

Looking at how these percentiles evolve state by state, we can see that some states are pulling far ahead of others. Some states also show much higher internal inequality, such as CT, DC, and NJ, which is visible in the dark green bottom-right corners of their panels (the highest percentiles in the latest years). By contrast, states that are much lighter overall (like KY and WV) have much less internal inequality.

There are various measures of inequality:

1. IQR (75th percentile - 25th percentile)
2. 90/10 (90th percentile / 10th percentile)
3. 90/50 (90th percentile / 50th percentile)
4. 50/10 (50th percentile / 10th percentile)
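As a worked example of these four measures (the same ones computed in the first cell), here is a minimal sketch on a single set of hypothetical percentile incomes. The IQR is in dollars while the three ratios are unitless, and the 90/10 ratio always equals the product of the 90/50 and 50/10 ratios, which helps show whether inequality comes mostly from the top half or the bottom half of the distribution. The helper name `inequality_measures` is illustrative.

```python
def inequality_measures(p10: float, p25: float, p50: float, p75: float, p90: float) -> dict[str, float]:
    """Return the IQR (in dollars) and the 90/10, 90/50, 50/10 ratios (unitless)."""
    return {
        "IQR": p75 - p25,
        "90/10": p90 / p10,
        "90/50": p90 / p50,
        "50/10": p50 / p10,
    }


# Hypothetical percentile incomes, for illustration only.
m = inequality_measures(p10=5_000, p25=15_000, p50=35_000, p75=60_000, p90=100_000)
print(m)

# The 90/10 ratio decomposes into the upper-half and lower-half ratios.
print(m["90/50"] * m["50/10"])
```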

Below are these measures of inequality within each state in 2019, starting with the IQR:

In [None]:
alt.Chart(geo_us_states).mark_geoshape().transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=idv_2019_state_all, key='geo_var_val', fields=['IQR'])
).encode(
    alt.Color('IQR:Q', title='Interquartile Range (IQR)', scale=alt.Scale(scheme='bluegreen'))
).project(
    type='albersUsa'
).properties(
    title='Interquartile Range of Individual Income by State in 2019',
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

The map above shows the income spread of the middle 50% of earners in each state. This spread is greatest in places like California, which we could also see in the heat map above. However, ratios of income between percentiles may be more intuitive to interpret.

First is the ratio of the 90th percentile income to the 10th percentile income, or how much more the top 10% earn compared with the bottom 10%.

In [None]:
alt.Chart(geo_us_states).mark_geoshape().transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=idv_2019_state_all, key='geo_var_val', fields=['90/10'])
).encode(
    alt.Color('90/10:Q', title='90/10 Ratio', scale=alt.Scale(scheme='bluegreen'))
).project(
    type='albersUsa'
).properties(
    title='90/10 Percentile Ratio of Individual Income by State in 2019',
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

Above, we can see that there are numerous states where the top 10% of income-earners are earning incomes that are 30x higher than the bottom 10% of income earners. Even for the states with the smallest discrepancies within, we still see top-earners making around 15-20x what the low-earners are. Clearly, we have inequality both across as well as within states, with some states experiencing much more inequality within than others.

How does the median income earner compare to the 10th percentile?

In [None]:
alt.Chart(geo_us_states).mark_geoshape().transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=idv_2019_state_all, key='geo_var_val', fields=['50/10'])
).encode(
    alt.Color('50/10:Q', title='50/10 Ratio', scale=alt.Scale(scheme='bluegreen'))
).project(
    type='albersUsa'
).properties(
    title='50/10 Percentile Ratio of Individual Income by State in 2019',
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

Again, when looking at the ratio of incomes between the median and 10th percentiles, we see that several states have significant inequality even when we set aside the highest earners. For example, in a state like NJ, the median earner makes about 10 times what the 10th percentile earner makes, which is likely a difference of around $50K to $75K.
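A ratio alone does not say how many dollars separate two groups. If H is the higher percentile's income and r = H / L is the ratio, the gap is H - L = H(1 - 1/r). A minimal sketch with hypothetical incomes (the helper name `dollar_gap` is illustrative, and the inputs are not taken from the data):

```python
def dollar_gap(higher: float, ratio: float) -> float:
    """Dollar gap between two percentiles, given the higher income and ratio = higher / lower."""
    return higher * (1 - 1 / ratio)


# Hypothetical inputs, for illustration only.
gap_50_10 = dollar_gap(higher=60_000, ratio=10)    # median of $60,000 with a 50/10 ratio of 10
gap_90_50 = dollar_gap(higher=140_000, ratio=3.5)  # 90th percentile of $140,000 with a 90/50 ratio of 3.5
print(gap_50_10, gap_90_50)
```

With these made-up numbers the gaps come to about $54,000 and $100,000; the same formula applies to the 90/50 ratio discussed next.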

Lastly, how do the top-earners compare to the median earners?

In [None]:
alt.Chart(geo_us_states).mark_geoshape().transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=idv_2019_state_all, key='geo_var_val', fields=['90/50'])
).encode(
    alt.Color('90/50:Q', title='90/50 Ratio', scale=alt.Scale(scheme='bluegreen'))
).project(
    type='albersUsa'
).properties(
    title='90/50 Percentile Ratio of Individual Income by State in 2019',
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

Again, we see that some states have larger discrepancies between their earners than others. Unsurprisingly, this difference appears largest in California, likely due to the tech industry. A ratio of around 3.5x can still be a matter of hundreds of thousands of dollars, which is a significant difference. 

Now that we have seen the various ways income is unequally distributed across the states, let's look at who is benefiting from these discrepancies. In other words, what are the demographics of these high earners, and are these demographics consistent across states?

In [None]:
prop = pd.read_csv('../Data/IDDA/prop_share_all_data.csv')
prop

In [163]:
prop_all = prop.query("group_var == 'xrea' and percentile != 0.0 and inc_var == 'TC'")

In [None]:
prop_all_2019 = prop_all.query("year == 2019")
prop_all_2019

In [165]:
color_scale_eth = alt.Scale(domain=['Hispanic', 'NH_AIAN', 'NH_Asian', 'NH_Black', 
                                 'NH_NHOPI', 'NH_White'],
                        range=['#cff7cf', '#91ee91', 
                               '#4ae24a', '#1ebd1e', 
                               '#137713', '#083008'])

In [None]:
alt.Chart(prop_all_2019).mark_bar().encode(
    alt.X('sum(proportion):Q', stack='normalize'),
    alt.Y('geo_abb:N', title="State"),
    alt.Color('group_var_val:N', title="Ethnic Group", scale=color_scale_eth)
).properties(
    width=1000,
    height=500
).facet(
    'percentile:O',
    columns=1
).properties(
    # The title goes on the faceted (top-level) chart; set on the inner chart before faceting, it did not appear
    title="Proportion of Top-Percentile Income Earners by State and Ethnic Group in 2019"
).configure_axis(
    gridColor='#3f64ca3d'
)

What we get from this is the ethnic makeup of the 90th, 95th, and 98th percentiles by state. The goal of this visual was to compare both the states' makeups and how those makeups change across percentiles, hence the faceting by percentile. However, I do not think it does a great job of comparing percentiles, since the changes are marginal, so it may be best to focus on one percentile and look at its makeup. Regardless, it is clear that the top percentiles, save perhaps a few states, are disproportionately made up of white people. This is certainly due to centuries of structural racism in the US, and I think it is evidence that the US can do a better job of removing barriers for ethnic groups.

Again, I hard-coded another scheme here to make the distinctions between groups easier to see. It is a variation of the "greens" scheme, but with starker differences. The differences in brightness may also add to the effect of one group being so disproportionately large.

Thinking about it again before final submission, I don't love the theme on this graph. Unfortunately, the bluegreen theme is not doing it for me here either.

In [167]:
prop_sex = prop.query("group_var == 'xsex' and percentile != 0.0 and inc_var == 'TC' and geo_abb != 'US'")

In [168]:
prop_sex_2019 = prop_sex.query("year == 2019 and percentile == 95.0")

In [None]:
alt.Chart(prop_sex_2019).mark_bar().encode(
    alt.X('sum(proportion):Q', stack='normalize'),
    alt.Y('geo_abb:N', title="State"),
    alt.Color('group_var_val:N', title="Sex", scale=alt.Scale(scheme='bluegreen'))
).properties(
    title="Proportion of 95th Percentile Income Earners by State and Sex in 2019",
    width=1000,
    height=500,
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

This is a similar comparison to the one above, but instead of ethnic groups we look at the makeup by sex. Again, we see an unequal distribution in every state when focusing on the 95th percentile. In other words, men are disproportionately benefiting from high incomes, especially considering that they make up less than half the population. This is certainly due to years and years of structural sexism in the US, yet, as with the issue above, I would have hoped to see these discrepancies somewhat alleviated by this point.

In [170]:
prop_fb = prop.query("group_var == 'xfb' and percentile == 95.0 and inc_var == 'TC' and geo_abb != 'US'")

In [171]:
prop_fb_2019 = prop_fb.query("year == 2019")

In [None]:
alt.Chart(prop_fb_2019).mark_bar().encode(
    alt.X('sum(proportion):Q', stack='normalize'),
    alt.Y('geo_abb:N', title="State"),
    alt.Color('group_var_val:N', title="Place of Birth", scale=alt.Scale(scheme='bluegreen'))
).properties(
    title="Proportion of 95th Percentile Income Earners by State and Place of Birth in 2019",
    width=1000,
    height=500,
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

We are again doing a similar comparison, focusing on just the 95th percentile of income earners in each state. Again, we see that US-born individuals make up a very large portion of this group compared with foreign-born individuals. I am unsure whether I want to include this in the final product; I was just curious what the results would look like. However, since I cannot confirm whether these numbers are disproportionate to the actual population, I do not know how relevant it is.
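One way to answer that question is a representation ratio: a group's share among 95th-percentile earners divided by its share of the population, where values above 1 indicate over-representation and values below 1 under-representation. The population shares would have to come from a separate source, such as Census population estimates. A minimal sketch, in which the helper name, group labels, and shares are all hypothetical:

```python
import pandas as pd


def representation_ratio(top_share: pd.Series, pop_share: pd.Series) -> pd.Series:
    """Share of each group among top earners divided by its share of the population.

    Values above 1 mean over-representation; below 1, under-representation.
    """
    return (top_share / pop_share).rename("representation_ratio")


# Hypothetical shares for one state (labels and values made up for illustration).
top = pd.Series({"Foreign-born": 0.10, "US-born": 0.90})
pop = pd.Series({"Foreign-born": 0.20, "US-born": 0.80})
ratios = representation_ratio(top, pop)
print(ratios)
```

The same calculation would apply to the ethnic-group and sex charts above, given population shares for each state.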

In [173]:
prop_age = prop.query("group_var == 'xaged' and percentile == 95.0 and inc_var == 'TC' and geo_abb != 'US'")

In [174]:
prop_age_2019 = prop_age.query("year == 2019")

In [None]:
alt.Chart(prop_age_2019).mark_bar().encode(
    x=alt.X('sum(proportion):Q', stack = 'normalize'),
    y=alt.Y('geo_abb:N', title="State"),
    color=alt.Color('group_var_val:N', title="Age Group", scale=alt.Scale(scheme='bluegreen'))
).properties(
    title="Proportion of 95th Percentile Income Earners by State and Age in 2019",
    width=1000,
    height=500,
    background='#ebf7de'
).configure_axis(
    gridColor='#3f64ca3d'
)

This is the final visualization I have produced. As with the last one, I do not think it is all that relevant, since it makes sense that high-income earners would be distributed this way across age groups. However, I am also using it as a guide for the final color scheme I will decide on. I was hoping for something resembling money, since this is about income, and I think these tones do a good job of this.