Add Phase 1 data examination results #17

martinholmer · 2024-02-27T13:40:09Z

This PR adds the examination/results.md document and provides a link to it in the high-level README.md document.

donboyd5

results.md:

Would it be possible to add the output-file variable names for the two current-law runs:
- CY2023 Payroll Tax Liability ($ billion) (federal employee plus employer share) -- this looks like payroll tax
- CY2023 Individual Income Tax Liability ($ billion) (federal individual income tax) -- it looks to me like this is iitax
Those are some big differences vs. CBO in current law liability (especially IIT), for both taxdata and pe! We will need to dig into that.
As a result of this, I have added the following variables to the ad hoc analysis: c05800, taxbc, othertaxes, iitax, payrolltax. (See this, at bottom of table: https://boyd-psl-adhoc.netlify.app/analysis.html#comparison-of-weighted-sums-for-selected-variables)
You can see that my ad hoc run shows, at 2023 levels, for baseline 2023 law (I think), the following.
For iitax, I get $2,154.3 for taxdata, which about matches your $2,154.4 for taxdata. However, you get $2,012.9 for the pe phase 1 dataset but I get $1,540.4.
For payrolltax, I match your taxdata number and am somewhat closer for pe than I am with iitax: you have $1,696.7 but I have $1,630.1, which is far too large a gap to be a rounding difference.
For PE, this suggests that we are doing something different, using different data, or comparing different results. I am using the Feb 20 version of the PE file (see top line here: https://boyd-psl-adhoc.netlify.app/prelims.html).
Maybe we can discuss when we talk tomorrow.
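
To put the gaps quoted above on a common scale, here is a small sketch that computes the absolute and percent differences between the ad hoc figures and the results.md figures. It uses only the dollar amounts quoted in this comment (in $ billions); the dictionary layout and the `pct_diff` helper are illustrative assumptions, not part of either analysis.

```python
# Figures quoted in the comment above, in $ billions (CY2023, baseline law).
# The payrolltax taxdata figure is not quoted in the comment, so it is omitted.
results_md = {
    "iitax": {"taxdata": 2154.4, "pe": 2012.9},
    "payrolltax": {"pe": 1696.7},
}
adhoc = {
    "iitax": {"taxdata": 2154.3, "pe": 1540.4},
    "payrolltax": {"pe": 1630.1},
}


def pct_diff(adhoc_value, reference_value):
    """Percent difference of the ad hoc figure relative to the results.md figure."""
    return 100.0 * (adhoc_value - reference_value) / reference_value


for var, by_dataset in results_md.items():
    for dataset, ref in by_dataset.items():
        mine = adhoc[var][dataset]
        print(f"{var:10s} {dataset:8s} {mine - ref:+8.1f} ({pct_diff(mine, ref):+.2f}%)")
```

The taxdata iitax figures differ by $0.1 billion, which is consistent with rounding, while the pe figures differ by hundreds of billions for iitax and tens of billions for payrolltax.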

donboyd5 · 2024-02-27T14:51:50Z

@martinholmer @nikhilwoodruff @MaxGhenis
Perhaps this is the cause of the difference for PE?

- I stack 3 files: (1) pe, (2) taxdata as grown by you to 2023 and then run through tax-calculator, I believe, with all variables, and (3) taxdata as... with only the variables that are in pe.
- This means that the variables in the pe and td same-variables-as-pe files that are not in the grown-td all-variables file are missing.
- I set missing values to zero. I did this because I think/thought tax-calculator could not handle missing values.
- I then run the stacked file through tax-calculator.

I am guessing that as a result I have some important input variables in my stacked file that I set to zero, and they are affecting tax-calculator results, giving results that are not what I intended.
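
A minimal sketch of how zero-filling a stacked file could change calculated results. Everything here is hypothetical: `toy_tax` is a stand-in and not tax-calculator, the variable names `wages` and `exemptions` are invented, and the non-zero default for an absent variable is an assumption made only to illustrate the mechanism. The thread does not say which variables were actually affected.

```python
def stack(*files):
    """Row-bind lists of record dicts; variables a record lacks become None."""
    columns = sorted({name for f in files for rec in f for name in rec})
    return [{c: rec.get(c) for c in columns} for f in files for rec in f]


def zero_fill(records):
    """Replace every missing (None) value with zero."""
    return [{k: (0.0 if v is None else v) for k, v in rec.items()} for rec in records]


def toy_tax(rec):
    """Hypothetical calculator: an absent variable falls back to a default."""
    wages = rec.get("wages", 0.0)
    exemptions = rec.get("exemptions", 1.0)  # assumed non-zero default
    return max(0.0, 0.10 * (wages - 10_000.0 * exemptions))


pe_like = [{"wages": 50_000.0}]                     # lacks "exemptions"
td_all = [{"wages": 50_000.0, "exemptions": 2.0}]   # has every variable

alone = toy_tax(pe_like[0])                  # run on its own: default applies
stacked = zero_fill(stack(pe_like, td_all))  # stacked: "exemptions" forced to 0
after_stack = toy_tax(stacked[0])

print("pe-like record run alone:       ", alone)
print("same record after stack + zeros:", after_stack)
```

In this toy case the same record yields a different liability once it has been stacked and zero-filled, because an explicit zero replaces whatever the calculator would otherwise have assumed.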

It seems like the fix is for me to do tax calculations in two steps:

- Create a stacked file of pe and td same-variables-as-pe and run this through tax-calculator.
- Run the td all-variables file through tax-calculator.

Then, take the two resulting files and stack them. The stacked file will have missing values for the variables that are in td but not in pe, both for the pe records and for the td same-variables-as-pe records. Leave them missing. Calculate comparisons on this file.

I think this should fix it. I had not thought through the implications of setting missing to zero in the stacked file. I'll try to do this now and will report back.
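
The comparison step of the plan above might be sketched as follows: stack the outputs of the two separate runs, leave missing values as missing, and skip them when computing weighted sums. The `source` labels, the `weight` column name, the variable `td_only_var`, and all numbers are made up for illustration; `iitax` is the only name taken from this thread.

```python
def stack(*tables):
    """Row-bind lists of row dicts; variables a row lacks stay missing (None)."""
    columns = sorted({name for t in tables for row in t for name in row})
    return [{c: row.get(c) for c in columns} for t in tables for row in t]


def weighted_sum(rows, var, source, weight="weight"):
    """Weighted sum of var for one source, or None if var is missing there."""
    pairs = [(r[var], r[weight]) for r in rows
             if r["source"] == source and r[var] is not None]
    return sum(v * w for v, w in pairs) if pairs else None


# Output of run 1: pe and td same-variables-as-pe, calculated together.
run_a = [
    {"source": "pe", "weight": 2, "iitax": 100},
    {"source": "td_same", "weight": 3, "iitax": 120},
]
# Output of run 2: the td all-variables file, calculated on its own.
run_b = [
    {"source": "td_all", "weight": 3, "iitax": 110, "td_only_var": 40},
]

combined = stack(run_a, run_b)
for source in ("pe", "td_same", "td_all"):
    for var in ("iitax", "td_only_var"):
        print(source, var, weighted_sum(combined, var, source))
```

Returning `None` rather than zero keeps "not available for this file" distinct from "sums to zero" in the comparison table.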

donboyd5 · 2024-02-27T15:34:22Z

That fixes it. The 2023 baseline-law results in the ad hoc analysis now match the results from @martinholmer. The updated ad hoc analysis is here. The revised R code is here.

The results generally are much closer now between pe and taxdata. @nikhilwoodruff and @MaxGhenis I'm sorry for any grief or head-scratching this caused you. Still plenty of questions to investigate, but not the massive differences my erroneous earlier ad hoc analysis gave.

martinholmer · 2024-02-27T18:25:09Z

@donboyd5 asked in the discussion of PR #17:

Would it be possible to add the output-file variable names for the two current-law runs:

CY2023 Payroll Tax Liability ($ billion) (federal employee plus employer share) -- this looks like payroll tax

CY2023 Individual Income Tax Liability ($ billion) (federal individual income tax) -- it looks to me like this is iitax

The "output-file variable names" are at the top of the td23.res-expect and pe23.res-expect files on the second line. I have added a mention of these two files in the data examination methods document.
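
For anyone who wants to pull those names out programmatically, a small sketch that returns the second line of a file. The helper name `second_line` is hypothetical, and the exact layout of the `.res-expect` files beyond "names are on the second line" is not described here, so the result is returned as raw text.

```python
from itertools import islice


def second_line(lines):
    """Return the second line of an iterable of lines, without the newline."""
    return next(islice(lines, 1, 2), "").rstrip("\n")


# Usage, assuming the file is in the current directory (its location in the
# repository is not stated in this comment):
#
#     with open("td23.res-expect") as f:
#         print(second_line(f))
```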

martinholmer added 3 commits February 26, 2024 15:55

Add examination/results.md document and associated files and links

0c865e5

Minor changes to the results.md document

f00e1b3

Minor edits to methods.md and results.md documents

22e535d

martinholmer requested a review from donboyd5 February 27, 2024 13:40

martinholmer changed the title ~~Add Phase 1 examination results~~ Add Phase 1 data examination results Feb 27, 2024

donboyd5 reviewed Feb 27, 2024

donboyd5 mentioned this pull request Feb 27, 2024

Discussion: How to analyze flattened Policy Engine file versus taxdata #12

Closed

Add file location of results from the td23 and pe23 runs to methods.md

2a170a0

Merge in recent changes on master branch

a74bd48

martinholmer merged commit bf28cef into PSLmodels:master Feb 27, 2024
2 checks passed

martinholmer deleted the examination-results branch February 27, 2024 18:44

donboyd5 pushed a commit to donboyd5/tax-microdata-benchmarking that referenced this pull request Nov 6, 2024

Merge pull request PSLmodels#17 from martinholmer/examination-results

d7c9e64

Add Phase 1 data examination results
