Missing variables in CPS file #119

Open
codykallen opened this issue Oct 24, 2017 · 18 comments

Comments

@codykallen

To allow people outside the OSPC to use B-Tax (beyond the limited capabilities of the webapp), they would need to be able to use the CPS instead. It is simple to modify B-Tax to call Records.cps_constructor instead of Records(), but the CPS is missing several critical variables for B-Tax.

B-Tax requires marginal tax rates on 10 different variables from Tax-Calculator, but the CPS file is missing 4 of them:
e02000: Total Sch E income or loss
e26270: Income or loss from a partnership or S corporation
p22250: Short-term capital gain or loss
p23250: Long-term capital gain or loss

The missing variables e02000 and e26270 would cause incorrect calculations for METRs, METTRs and cost of capital. The missing variables p22250 and p23250 cause errors that prevent B-Tax from running (unless test_run = True, in which case the preset hardcoded values are used).
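
A quick way to confirm which of these variables a given input file lacks is a set-membership check against its column names. This is only a sketch: `missing_variables` and the sample column list are hypothetical and are not part of B-Tax or Tax-Calculator; only the four variable names come from the list above.

```python
# The four variables this issue reports as absent from the CPS file.
REQUIRED_FOR_BTAX = ["e02000", "e26270", "p22250", "p23250"]


def missing_variables(available_columns, required=REQUIRED_FOR_BTAX):
    """Return the required variable names not found in available_columns."""
    available = set(available_columns)
    return [name for name in required if name not in available]


# Hypothetical column list standing in for the header of an input file.
example_columns = ["e00200", "e00900", "e00300"]
print(missing_variables(example_columns))
```

Running the same check against the real CPS header before calling B-Tax would surface the problem earlier than a failure inside the marginal tax rate calculation.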

Would it be reasonable for someone to impute these variables for the CPS file?

@MattHJensen @andersonfrailey @Amy-Xu @martinholmer

@MattHJensen
Contributor

MattHJensen commented Oct 26, 2017

I believe this is an important issue for making the CPS file available on PolicyBrain, and it appears important for other B-Tax users as well.

My understanding was that we have imputations for these variables from John O'Hare, but it appears that may be incorrect (@andersonfrailey, could you chime in?)

If so, we may need to impute these variables onto the CPS file from the PUF. My view is that simple imputations would be sufficient for now.

adding to cc @jdebacker @hdoupe.

@andersonfrailey
Collaborator

andersonfrailey commented Oct 26, 2017

We won't be able to impute the variables @codykallen mentioned using just the CPS data. We impute a few other variables in the PUF off of the CPS so we could expand that process to include the missing variables we need for B-Tax.

@MattHJensen
Contributor

@andersonfrailey said:

> We won't be able to impute the variables @codykallen mentioned using just the CPS data. We impute a few other variables in the PUF off of the CPS so we could expand that process to include the missing variables we need for B-Tax.

We may need to do the opposite: impute to the CPS off of the PUF rather than off of the CPS to the PUF.

@andersonfrailey
Collaborator

@MattHJensen said:

> We may need to do the opposite: impute to the CPS off of the PUF rather than off of the CPS to the PUF.

You're right. My comment was poorly worded. We could use the same routine that we use for the deductions that are imputed onto the CPS: use the PUF to get beta coefficients that are then used in the imputation on the CPS.

@MattHJensen
Contributor

This paper looks like it could be helpful https://www.irs.gov/pub/irs-soi/06ohara.pdf

@MattHJensen
Contributor

A Tax-Calculator and TaxData user asked:

> am I right that the CPS file codes all passthrough income as active (e00900) not passive (e02000)? The sum of e00900 matches (more or less) IRS data for the sum of business income and s-corp/partnership income. This obviously has big implications in the House bill. So curious a) if I'm understanding this correctly, and b) if you have any advice on how to handle if I am.

  • Is it true that rather than missing e02000, e02000 is lumped together with e00900?
  • Does anyone have a suggestion for a 'quick fix' that either TaxData contributors or the user could implement in the next 24-48 hours?

cc @andersonfrailey @codykallen @Amy-Xu @martinholmer @evtedeschi3

@codykallen
Author

@MattHJensen, I was under the impression that the CPS file was simply missing e02000 and e26270. If e02000 and e26270 have been misclassified as e00900, then this does have serious implications in the House bill, as the bill would make 30% of active business income (e00900 + e26270) eligible for the 25% top rate but it would make 100% of passive business income (e02000 - e26270) eligible for that rate.
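
To make the arithmetic in that comment concrete, here is a small sketch of the income that would qualify for the 25% rate under the definitions given above. The function name is hypothetical, the 30% and 100% shares are taken from the comment rather than from the bill text, and passing negative values straight through is an assumption.

```python
def income_eligible_for_top_rate(e00900, e02000, e26270, active_share=0.30):
    """Income eligible for the 25% rate, per the definitions in this thread.

    active  = e00900 + e26270  (30% eligible, per the comment above)
    passive = e02000 - e26270  (100% eligible)
    """
    active = e00900 + e26270
    passive = e02000 - e26270
    return active_share * active + passive


# Made-up amounts for a single filing unit.
print(income_eligible_for_top_rate(e00900=100.0, e02000=80.0, e26270=50.0))
```

If all passthrough income were recorded as e00900, the passive term would be zero for every unit, which is why the classification matters for scoring.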

Conceivably, one could apply an estimate based on the passive share of total business income, reallocate this percentage of each filing unit's e00900 and e00900p to e02000, and recalculate. This would come closer to capturing the overall score and distributional effect, but it would miss the degree to which some individuals may pay more or less under the bill.
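
A minimal sketch of that reallocation for one filing unit follows. The function name and the 25% share in the example are hypothetical; the passive share itself would have to be estimated from outside data. As noted above, applying one share to every unit cannot reproduce unit-level variation.

```python
def reallocate_passive(e00900, e00900p, passive_share):
    """Move passive_share of e00900 (and of e00900p) into e02000.

    passive_share is an assumed aggregate estimate between 0 and 1.
    The unit's total business income is left unchanged.
    """
    if not 0.0 <= passive_share <= 1.0:
        raise ValueError("passive_share must be between 0 and 1")
    moved = passive_share * e00900
    return {
        "e00900": e00900 - moved,
        "e00900p": e00900p * (1.0 - passive_share),
        "e02000": moved,
    }


# Made-up unit with a hypothetical 25% passive share.
print(reallocate_passive(e00900=1000.0, e00900p=600.0, passive_share=0.25))
```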

@andersonfrailey
Collaborator

Looking through the code, e00900 in the CPS is the sum of the semp_val (Own business self-employment earnings, total value) variable in the CPS files. Unfortunately the CPS documentation doesn't specify if that means both active and passive income.

What we've done in the past for variables in the CPS that were the sum of multiple variables from the PUF is use the ratio of the two to split the CPS variable. If we're confident that the current e00900 variable in the CPS is actually e00900 + e02000, we could do the same pretty quickly.

@codykallen
Author

@andersonfrailey, be careful when splitting e00900 into e00900 and e02000. If you increase e02000 but not e26270, then you classify all of that as passive business income, whereas technically e02000 also includes some active business income (from partnerships and S corporations), e26270.

@andersonfrailey
Collaborator

Good point @codykallen. So would it be better if we split e00900 into e00900 and e02000 and then used the new e02000 variable to get at e26270?

@codykallen
Author

@andersonfrailey, I would recommend splitting e00900 into e00900 (sole proprietorship income or loss), e26270 (partnership and S corporation income or loss), and e02000 - e26270 (passive business income or loss). IRS table 1.4 from the Individual Complete Report has a useful breakdown between: "Business or Profession" (technically sole proprietorship income, e00900); partnership and S corporation income (e26270); and rent, royalty, estate and trust income or loss (passive Sch E income, e02000 - e26270).
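
The three-way split could be sketched as below. The function name, the share keys, and the example shares are all hypothetical; real shares would be derived from the IRS table mentioned above. Note that e02000 is rebuilt as e26270 plus the passive piece, so that e02000 - e26270 recovers passive income.

```python
def split_business_income(total, shares):
    """Split a combined CPS business income amount into three pieces.

    shares maps 'sole_prop', 'partnership_scorp' and 'passive' to
    fractions that must sum to 1 (assumed aggregate shares).
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    sole_prop = total * shares["sole_prop"]
    partnership_scorp = total * shares["partnership_scorp"]
    passive = total * shares["passive"]
    return {
        "e00900": sole_prop,
        "e26270": partnership_scorp,
        # Total Sch E income includes partnership and S corporation income.
        "e02000": partnership_scorp + passive,
    }


# Hypothetical shares, for illustration only.
example_shares = {"sole_prop": 0.5, "partnership_scorp": 0.3, "passive": 0.2}
print(split_business_income(1000.0, example_shares))
```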

@MattHJensen
Contributor

> @andersonfrailey, I would recommend splitting e00900 into e00900 (sole proprietorship income or loss), e26270 (partnership and S corporation income or loss), and e02000 - e26270 (passive business income or loss). IRS table 1.4 from the Individual Complete Report has a useful breakdown between: "Business or Profession" (technically sole proprietorship income, e00900); partnership and S corporation income (e26270); and rent, royalty, estate and trust income or loss (passive Sch E income, e02000 - e26270).

+1

@andersonfrailey
Collaborator

Thanks for the breakdown, @codykallen. I'm working on a TaxData PR now.

@andersonfrailey
Collaborator

As seen in PR #127, splitting up e00900 isn't an effective method for getting e02000 and e26270. I'm instead going to try and impute them.

Here's a rough outline of a method I'm considering and would like some feedback on.

First step is to split the IRS PUF into bins based on income and filing type. Then determine the probability of a return having a non-zero value for the variable within each bin. Then either run a regression on those who have a non-zero value or just find the mean and standard deviation.

I’ll then split the CPS tax units into the same bins (using only the ones determined to be filers). Using the probabilities from the PUF, I’ll randomly assign tax units to have a non-zero value for the variable. Among those assigned a non-zero value, I’ll either use the regression parameters to predict a value, or randomly assign one based on the standard deviation and the mean for that bin, depending on which route I take in the first part.
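
As a rough illustration of the mean and standard deviation variant of this outline, the sketch below uses only the standard library. Several things here are assumptions and not part of TaxData: the function names, the `bin` key on each record, the normal draw, and the decision to ignore sampling weights.

```python
import random
import statistics
from collections import defaultdict


def fit_bins(puf_records, variable):
    """For each bin, estimate the share of non-zero values and the
    mean and standard deviation of the non-zero values."""
    by_bin = defaultdict(list)
    for rec in puf_records:
        by_bin[rec["bin"]].append(rec[variable])
    params = {}
    for label, values in by_bin.items():
        nonzero = [v for v in values if v != 0]
        params[label] = {
            "p_nonzero": len(nonzero) / len(values),
            "mean": statistics.mean(nonzero) if nonzero else 0.0,
            "std": statistics.stdev(nonzero) if len(nonzero) > 1 else 0.0,
        }
    return params


def impute(cps_records, params, variable, seed=0):
    """Randomly flag units as non-zero using the bin probability, then
    draw a value from a normal distribution (an assumed choice)."""
    rng = random.Random(seed)
    imputed = []
    for rec in cps_records:
        p = params.get(rec["bin"])
        value = 0.0
        if p is not None and rng.random() < p["p_nonzero"]:
            value = rng.gauss(p["mean"], p["std"])
        imputed.append({**rec, variable: value})
    return imputed
```

Because the draw is symmetric around the bin mean, negative values arise naturally, which bears on the question raised in the next comment. A unit whose bin does not appear in the PUF parameters is left at zero in this sketch.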

Would love to hear what y'all think of this or if you have a different approach.

cc @codykallen @Amy-Xu

@MattHJensen
Contributor

@andersonfrailey, could you explain how this deals with negative / positive values?

@andersonfrailey
Collaborator

Sure. The issue with using the same method we use to impute the various deductions is that it uses the log of all deductions. That doesn't work with e02000 and e26270 because they could be negative.

Thinking about this with a more clear head than I was yesterday, I suppose I could just tweak our current method by not using the log of e02000 and e26270. This would essentially consist of me running a logit model to determine who has a non-zero value for the variable, then an OLS for those determined by the logit to have a non-zero value. This is more or less what we do already for various deductions and expenses.
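
A toy version of this two-stage idea, fit on levels so that negative amounts are allowed, might look like the following. It is a sketch under stated assumptions and not the TaxData routine: it uses a single predictor `x` (assumed to be scaled to a small range so the gradient steps stay stable), a hand-rolled logit fit, and a random draw against the predicted probability to decide who receives a non-zero value.

```python
import math
import random


def fit_logit(x, y, steps=5000, lr=0.1):
    """Fit P(y=1) = sigmoid(a + b*x) by gradient ascent on the
    average log-likelihood. Assumes x is scaled to a small range."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += yi - p
            grad_b += (yi - p) * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def fit_ols(x, y):
    """Closed-form simple regression y = c + d*x on levels (no log)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    d = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - d * mx, d


def two_stage_impute(puf_x, puf_v, cps_x, seed=0):
    """Stage 1: logit for having a non-zero value. Stage 2: OLS on the
    non-zero PUF records, applied to CPS units flagged as non-zero."""
    has_value = [1 if v != 0 else 0 for v in puf_v]
    a, b = fit_logit(puf_x, has_value)
    nonzero = [(xi, vi) for xi, vi in zip(puf_x, puf_v) if vi != 0]
    c, d = fit_ols([xi for xi, _ in nonzero], [vi for _, vi in nonzero])
    rng = random.Random(seed)
    imputed = []
    for xi in cps_x:
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        imputed.append(c + d * xi if rng.random() < p else 0.0)
    return imputed
```

A real implementation would use several predictors, sampling weights, and an established statistics library; this version only shows how the two stages fit together.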

@martinholmer
Contributor

@codykallen, @andersonfrailey, @MattHJensen, What's the status of taxdata issue #119?
There's been no discussion of that issue since November 17, 2017.

@andersonfrailey
Collaborator

This is something I want to work on after this UBI project is finished. I'd like to leave this issue open so that it's easy to find and won't fall off my radar.


4 participants