Missing variables in CPS file #119
Comments
I believe this is an important issue for making the CPS file available on PolicyBrain, and it appears important for other B-Tax users as well. My understanding was that we have imputations for these variables from John O'Hare, but it appears that may be incorrect (@andersonfrailey, could you chime in?). If so, we may need to impute these variables onto the CPS file from the PUF. My view is that a simple imputation would be sufficient for now. Adding to cc: @jdebacker @hdoupe.
We won't be able to impute the variables @codykallen mentioned using just the CPS data. We impute a few other variables in the PUF off of the CPS so we could expand that process to include the missing variables we need for B-Tax. |
@andersonfrailey said:
We may need to do the opposite: impute to the CPS off of the PUF rather than off of the CPS to the PUF. |
@MattHJensen said:
You're right. My comment was poorly worded. We could use the same routine that we use for the deductions that are imputed onto the CPS: use the PUF to get beta coefficients that are used in the imputation on the CPS.
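To make the "beta coefficients" idea concrete, here is a minimal sketch: fit a regression on PUF records, then apply the fitted coefficients to CPS records. The single regressor, the toy numbers, and the helper name `fit_ols` are assumptions for illustration only; the actual TaxData routine is not shown in this thread and may use different regressors and functional forms.

```python
def fit_ols(x, y):
    """Fit y = beta0 + beta1 * x by ordinary least squares (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Hypothetical PUF sample: a predictor and the variable missing from the CPS.
puf_income = [10.0, 20.0, 30.0, 40.0]
puf_target = [1.0, 3.0, 5.0, 7.0]

# Step 1: estimate the coefficients on the PUF.
beta0, beta1 = fit_ols(puf_income, puf_target)

# Step 2: apply them to CPS records, which have the predictor but not the target.
cps_income = [15.0, 35.0]
cps_imputed = [beta0 + beta1 * xi for xi in cps_income]
```

Because the fit here is in levels rather than logs, the predicted values can be negative, which matters for variables that record both gains and losses.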
This paper looks like it could be helpful https://www.irs.gov/pub/irs-soi/06ohara.pdf |
A Tax-Calculator and TaxData user asked:
cc @andersonfrailey @codykallen @Amy-Xu @martinholmer @evtedeschi3 |
@MattHJensen, I was under the impression that the CPS file was simply missing Conceivably, one could apply an estimate based on the passive share of total business income, reallocate this percentage of each filing unit's |
Looking through the code, What we've done in the past for variables in the CPS that were the sum of multiple variables from the PUF is used the ratio of the two to split the CPS variable. If we're confident that the current |
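The ratio approach described above can be sketched as follows. The function name `split_by_ratio`, its arguments, and the numbers are hypothetical; this only illustrates splitting one combined CPS amount using the share of each component in PUF aggregates.

```python
def split_by_ratio(cps_total, puf_part_a_sum, puf_part_b_sum):
    """Split one CPS amount into two parts using aggregate PUF shares."""
    puf_total = puf_part_a_sum + puf_part_b_sum
    if puf_total == 0:
        raise ValueError("PUF aggregate is zero; the share is undefined")
    share_a = puf_part_a_sum / puf_total
    part_a = cps_total * share_a
    # Assign the remainder to the second part so the two always sum to the total.
    part_b = cps_total - part_a
    return part_a, part_b


# Hypothetical aggregates: component A is 300 and component B is 100 in the PUF.
part_a, part_b = split_by_ratio(1000.0, 300.0, 100.0)
```

One caveat with this design: when the components can be negative (losses), the aggregates may partly cancel, so the resulting share can be unstable or fall outside the 0 to 1 range.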
@andersonfrailey, be careful when splitting |
Good point @codykallen. So would it be better if we split |
@andersonfrailey, I would recommend splitting |
+1 |
Thanks for the breakdown, @codykallen. I'm working on a TaxData PR now. |
As seen in PR #127, splitting up

Here's a rough outline of a method I'm considering and would like some feedback on.

First step is to split the IRS PUF into bins based on income and filing type. Then determine the probability of a return having a non-zero value for the variable within each bin. Then either run a regression on those who have a non-zero value or just find the mean and standard deviation.

I’ll then split the CPS tax units into the same bins (using only the ones determined to be filers). Using the probabilities from the PUF, I’ll randomly assign tax units to have a non-zero value for the variable. Among those assigned a non-zero value, I’ll either use the regression parameters to predict a value, or randomly assign one based on the standard deviation and the mean for that bin, depending on which route I take in the first part.

Would love to hear what y'all think of this or if you have a different approach.
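The outline above can be sketched roughly as below, taking the mean and standard deviation route rather than the regression route. Everything here is an assumption made for illustration: the bin labels, the toy values, the function names `fit_bins` and `impute`, and in particular the choice of a normal distribution for the random draw, which the outline does not specify.

```python
import random
import statistics
from collections import defaultdict


def fit_bins(puf_rows):
    """Per bin: share of non-zero values, plus mean and stdev of the non-zero ones."""
    grouped = defaultdict(list)
    for bin_key, value in puf_rows:
        grouped[bin_key].append(value)
    stats = {}
    for bin_key, values in grouped.items():
        nonzero = [v for v in values if v != 0]
        stats[bin_key] = {
            "p_nonzero": len(nonzero) / len(values),
            "mean": statistics.mean(nonzero) if nonzero else 0.0,
            "stdev": statistics.stdev(nonzero) if len(nonzero) > 1 else 0.0,
        }
    return stats


def impute(cps_bins, stats, rng):
    """Assign each CPS unit zero or a random draw, using its bin's PUF statistics."""
    imputed = []
    for bin_key in cps_bins:
        s = stats.get(bin_key)
        # Units in bins unseen in the PUF, or not selected, get a zero value.
        if s is None or rng.random() >= s["p_nonzero"]:
            imputed.append(0.0)
        else:
            # Assumed normal draw; it can be negative, so losses are possible.
            imputed.append(rng.gauss(s["mean"], s["stdev"]))
    return imputed


# Hypothetical PUF records: ((income bin, filing type), value of the variable).
puf_rows = [
    (("low", "single"), 0.0),
    (("low", "single"), 0.0),
    (("low", "single"), -500.0),
    (("low", "single"), 1500.0),
    (("high", "joint"), 4000.0),
    (("high", "joint"), 6000.0),
]
stats = fit_bins(puf_rows)

# Hypothetical CPS filers, already placed in the same bins.
cps_bins = [("low", "single"), ("high", "joint"), ("mid", "single")]
imputed = impute(cps_bins, stats, random.Random(0))
```

A seeded `random.Random` instance is passed in so that a run can be reproduced. The normal draw is a convenience; a regression on the non-zero PUF records could replace it without changing the binning or the assignment step.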
@andersonfrailey, could you explain how this deals with negative / positive values? |
Sure. The issue with using the same methods used when we impute the various deductions is that method uses the log of all deductions. That doesn't work with Thinking about this with a more clear head than I was yesterday, I suppose I could just tweak our current method by not using the log of |
@codykallen, @andersonfrailey, @MattHJensen, What's the status of taxdata issue #119? |
This is something I want to work on after this UBI project is finished. I'd like to leave this issue open so that it's easy to find and won't fall off my radar. |
To allow people outside the OSPC to use B-Tax (beyond the limited capabilities of the webapp), they would need to be able to use the CPS instead. It is simple to modify B-Tax to call Records.cps_constructor instead of Records(), but the CPS is missing several critical variables for B-Tax.

B-Tax requires marginal tax rates on 10 different variables from Tax-Calculator, but the CPS file is missing 4 of them:
e02000: Total Sch E income or loss
e26270: Income or loss from a partnership or S corporation
p22250: Short-term capital gain or loss
p23250: Long-term capital gain or loss

The missing variables e02000 and e26270 would cause incorrect calculations for METRs, METTRs and cost of capital. The missing variables p22250 and p23250 cause errors that prevent B-Tax from running (unless test_run = True, in which case the preset hardcoded values are used).

Would it be reasonable for someone to impute these variables for the CPS file?
@MattHJensen @andersonfrailey @Amy-Xu @martinholmer