This R
project looks at the US economy from a statistical mechanical framework.
The project constructs a statistical ensemble using the income distribution of the United States.
The data set comprises income data from 1996-2019.
The model data comes from three different sources.
- Internal Revenue Service Income Tax Data
- Energy Information Agency Open Data API
- Federal Reserve Bank of St. Louis FRED API
To avoid having to obtain the API keys, the data has already been downloaded and resides in the data/raw/
directory.
To avoid having to process
the IRS data, the downloaded *.xls
files have already been parsed into suitable R
data structures in data/processed/irs/
.
The model is developed using R-Studio
and can be installed from that link.
Once R-Studio
is installed you will need to install the necessary CRAN
libraries.
To install the libraries, from the R console run:
install.packages("renv")
Once renv
is installed execute the following from the R console:
renv::restore()
The income distribution model regression uses HMC( Hamiltonian Monte Carlo) with NUTS (No U-Turn Sampling) in a Bayesian framework to estimate the model's hyper-parameters.
This is done using rstan
which is an implementation of `Stan.
There are two different distributions that can be selected:
The Gibbs model is taken directly from Banerjee and Yakovenko (2010) derivation. Their derivation uses the Gibbs distribution as the basis of the thermal portion of the income distribution and the Pareto distribution for the epithermal portion,
The Maxwell model is derived from the Maxwell distribution in the same manner that the Gibbs distribution was using the stationary Fokker-Plank equation to include the Pareto portion of the distribution.
The Gibbs distribution provides the better fit of the Adjusted Gross Income (AGI), while the Maxwell distribution provides the better fit of the Taxable Income (TI).
By default, income_model_regression.R
is configured to follow the same method and dataset of Banerjee and Yakoveno (2010).
The regression can be configured to use the other distribution or other datasets, by modifying the two variables, fitModel
and fitData
,
# Select Model and Load Data
fitModel <- "gibbs_model.stan"
# fitModel <- "maxwell_model.stan"
#
# Select the appropriate table either "11_agi", "11_ti", or "21"
table <- "11_agi"
Once configured, the R script can be run from the console using,
source("src/income_model_regression.R")
This will generate an output file in data/processed/stan/[gibbs_model.stan | maxwell_model.stan]
with a name based on the ensemble's data set depending on the configuration.
By default, the regression will run 4 simultaneous chains that are used to test for convergence.
This can be adjusted by setting nChains
to the desired number of chains.
To examine the Stan output, you will need to install shinystan
The remaining ensemble parameters are estimated using aggregate_data.R
.
This file is configured similarly to income_model_regression.R
,
# Configure the analysis
fitModel <- "gibbs_model.stan"
# fitModel <- "maxwell_model.stan"
# Select the appropriate table either "11_agi", "11_ti", or "21"
table <- "11_agi"
Once configured, the R script can be run from the console using,
source("src/aggregate_data.R")
This model does a three regressions to evaluate the parameters of the model,
Where the regression coefficients are: initial utility
The first regression is to estimate
We recognize
We compute
While money is important, it can be easily debased; we need to look at shifting to an energy basis to provide an absolute metric of value. Since value is related to the Hamiltonian, we will use the exergy input into the economy. We observe that the value of the dollar for a given year is fixed, so the distribution of value will be identical to the distribution of income, thus
We can define the economic temperature,
Using the familiar ideal gas equation where we replace volume with money and pressure with equation (6), we have
Similarly for energy we have
We then use equation (7) to determine the coefficient
Call:
lm(formula = M ~ 0 + Tstar, data = specEcon)
Residuals:
Min 1Q Median 3Q Max
-1020.91 -398.36 3.77 350.50 810.06
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Tstar 1.137651 0.002032 559.9 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 504.2 on 23 degrees of freedom
Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
F-statistic: 3.135e+05 on 1 and 23 DF, p-value: < 2.2e-16
Using equation (8) and our knowledge of
Call:
lm(formula = E ~ 0 + T, data = specEcon)
Residuals:
Min 1Q Median 3Q Max
-7.4854 -1.7528 0.7924 2.2513 5.4146
Coefficients:
Estimate Std. Error t value Pr(>|t|)
T 1.1353 0.0023 493.6 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.514 on 23 degrees of freedom
Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
F-statistic: 2.437e+05 on 1 and 23 DF, p-value: < 2.2e-16
The parameters
print(params)
$R
[1] 1.137651
$c
[1] 0.9979318
$s_0
[1] 11.50636
$gamma
[1] 2.140008
The two remaining economic parameters are calculated using the following relationships:
-
$P$ : Equation (3) -
$\mu$ :$\mu = T\left[(c + 1) , R - \bar{s}\right]$
To see all of the ensemble's parameters run,
view(specEcon)
Looking at the relationship between
The regression used
Call:
lm(formula = P ~ M, data = data_poly)
Residuals:
Min 1Q Median 3Q Max
-0.054344 -0.017852 -0.003016 0.016943 0.053135
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.22581 0.12346 58.53 <2e-16 ***
M -1.33864 0.03068 -43.63 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0284 on 22 degrees of freedom
Multiple R-squared: 0.9886, Adjusted R-squared: 0.9881
F-statistic: 1904 on 1 and 22 DF, p-value: < 2.2e-16
with the proportionality constant and the polytropic index,
print(params_poly$C)
print(params_poly$n)
# C
# 1374.455
# n
# 1.338639
With these relationships, we can fully describe the system and evaluate changes in policy that effect the regression parameters, the regression inputs, and/or the thermodynamic path (monetary and energy policy).
- Banerjee, A., & Yakovenko, V. M. (2010). Universal patterns of inequality. New Journal of Physics, 12, 1-25. doi:10.1088/1367-2630/12/7/075032
- Callen, H. B. (1985). Thermodynamics and an Introduction to Thermostatistics (2nd ed.). New York: John Wiley & Sons.
- Energy Information Agency (2022). Open Data. https://www.eia.gov/opendata/index.php.
- Federal Reserve Bank of St. Louis. (2022). FRED Economic Data API. https://fred.stlouisfed.org/docs/api/fred/.
- Internal Revenue Service (2022). SOI Tax Stats - Individual Statistical Tables by Size of Adjusted Gross Income. https://www.irs.gov/statistics/soi-tax-stats-individual-statistical-tables-by-size-of-adjusted-gross-income.
- Lawrence Livermore National Laboratory (2022). Estimated U.S. Energy Consumption in 2021. LLNL-MI-410527.
- Matsoukas, T. (2018). Generalized Statistical Thermodynamics: Thermodynamics of Probability Distributions and Stochastic Processes. Switzerland: Springer.