Add n65 field for number of people age 65 or older #243

Closed
MaxGhenis opened this issue Jul 7, 2018 · 18 comments


@MaxGhenis
Contributor

Low-priority, but I've found myself needing to use the number of people age 65+ in tax units a few times. This is straightforward to calculate when using a dataframe, e.g. I've used this function:

def n65(df):
    # Count the head and the spouse if age 65+, plus elderly dependents.
    return ((df.age_head >= 65).astype(int) +
            (df.age_spouse >= 65).astype(int) +
            df.elderly_dependent)

But this might not be obvious to all users, so having it in the data might be helpful. Alternatively, this n65 function could be made available in taxcalc.

@andersonfrailey
Collaborator

@MaxGhenis I think adding an n65 variable would be pretty straightforward. We have a potential new contributor looking for an easy entry project so I'll probably pass this on to her, unless someone can think of a reason against adding n65.

cc @martinholmer @hdoupe

@martinholmer
Contributor

@andersonfrailey said:

I think adding an n65 variable would be pretty straightforward. We have a potential new contributor looking for an easy entry project so I'll probably pass this on to her, unless someone can think of a reason against adding n65.

I see the benefit of starting out a new contributor with an easy project, but I'm not sure this is the most efficient way to accomplish the task. Adding an n65 variable means taxdata work on both the CPS and PUF files, slightly larger data files, extra taxdata testing, and more Tax-Calculator input data documentation.

The other approach would be to simply provide @MaxGhenis' code as part of Tax-Calculator and add a test of that new code. That second approach would avoid all the costs of adding a "derived" variable to both CPS and PUF input data.
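A minimal sketch of that second approach, assuming the helper lives in a plain module and is exercised by a pytest-style test. The test name and toy data are hypothetical; the formula is the one @MaxGhenis posted above, and treating age_spouse == 0 as "no spouse" is an assumption.

```python
import pandas as pd


def n65(df):
    """Count people age 65+ in each tax unit (head, spouse, elderly dependents)."""
    return ((df.age_head >= 65).astype(int) +
            (df.age_spouse >= 65).astype(int) +
            df.elderly_dependent)


def test_n65():
    # Hypothetical units: elderly head only; elderly couple plus an elderly
    # dependent; nobody 65+. age_spouse=0 is assumed to mean "no spouse".
    df = pd.DataFrame({
        'age_head': [70, 66, 40],
        'age_spouse': [0, 65, 38],
        'elderly_dependent': [0, 1, 0],
    })
    assert n65(df).tolist() == [1, 3, 0]


test_n65()
```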

I'd be happy to implement the second approach, but perhaps it is easy enough to assign to the new contributor. What do you think, @andersonfrailey?

@andersonfrailey
Collaborator

Good points, @martinholmer. I think it's easy enough to assign to the new contributor. Do you think it would be best to add the function as a method of the Calculator class or to put it in utils.py?

@martinholmer
Contributor

@MaxGhenis said in Tax-Calculator issue #243:

Low-priority, but I've found myself needing to use the number of people age 65+ in tax units a few times.
This is straightforward to calculate when using a dataframe. I've used this function:

def n65(df):
    return ((df.age_head >= 65).astype(int) +
            (df.age_spouse >= 65).astype(int) +
            df.elderly_dependent)

But this might not be obvious to all users, so having it in the data might be helpful. Alternatively, this n65 function could be made available in Tax-Calculator.

@MaxGhenis, Thanks for the suggestion/request, but it seems a little more complicated than I first thought.
Here is the documentation entry for the elderly_dependent input variable:

Input Variable Name: elderly_dependent
Description: 1 if filing unit has a dependent age 65+; otherwise 0
Datatype: int
Availability: taxdata_puf, taxdata_cps
IRS Form Location:
2013-2016: imputed from CPS data; not used in tax law

So, this variable indicates the presence of one (or more) elderly dependents. It is not a count of the number of elderly dependents. That means the formula @MaxGhenis uses (see above) is the best he can do with existing data, but it is not correct in all cases because of the data limitation. Perhaps this limitation is OK for the work @MaxGhenis is doing, but I'm reluctant to add this "approximate" code to Tax-Calculator.
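To make the limitation concrete, here is a toy comparison using hypothetical units (not taken from either data file). When a unit has two elderly dependents, a 0/1 indicator contributes only 1, so the formula undercounts that unit by one.

```python
import pandas as pd

# Hypothetical units: the true number of dependents age 65+ in each unit.
true_elderly_deps = pd.Series([0, 1, 2])

# An indicator (as elderly_dependent is documented) caps the count at 1.
indicator = (true_elderly_deps > 0).astype(int)

undercount = true_elderly_deps - indicator
print(undercount.tolist())  # [0, 0, 1]
```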

@andersonfrailey, would it be possible to construct an n65 variable for both the PUF and CPS that is correct? How much work would that involve? @MaxGhenis, how are you using the results of your n65() function? How important is it to get the count of 65+ people in a filing unit exactly right?

@MaxGhenis
Contributor Author

elderly_dependent appears to be misdocumented, at least with CPS data, where the field's maximum value is 3:

calc.dataframe(['elderly_dependent']).groupby('elderly_dependent').size()
# 0    454345
# 1      2075
# 2        44
# 3         1
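One way to catch this kind of mismatch between the documentation and the data is to check that a variable documented as 0/1 takes no other values. A minimal sketch; the helper name is hypothetical, and the made-up series stands in for the real calc.dataframe() output.

```python
import pandas as pd


def undocumented_values(series, allowed=(0, 1)):
    """Return value counts for any values outside the documented set."""
    counts = series.value_counts().sort_index()
    return counts[~counts.index.isin(allowed)]


# Made-up values standing in for the CPS elderly_dependent column.
sample = pd.Series([0, 0, 1, 2, 3, 0])
print(undocumented_values(sample).index.tolist())  # [2, 3]
```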

@andersonfrailey
Collaborator

@MaxGhenis, the bug was on the TaxData side. In PR #194 I updated the elderly_dependent variable to be a 0 or 1 in the CPS.

@martinholmer, it wouldn't be too much effort to construct an n65 variable. Before PR #194 elderly_dependent effectively functioned as n65 so I can base any changes off that.

@martinholmer
Contributor

Here is what I get for the values of elderly_dependent in both the PUF and CPS data sets:

$ ./csv_vars.sh puf.csv | grep elderly
55 elderly_dependent

$ awk -F, 'NR>1{t++;n[$55]++}END{print t;for(i in n)print i,n[i]}' puf.csv
239002
0 237784
1 1218

$ gunzip -k taxcalc/cps.csv.gz 

$ ./csv_vars.sh taxcalc/cps.csv | grep elderly
61 elderly_dependent

$ awk -F, 'NR>1{t++;n[$61]++}END{print t;for(i in n)print i,n[i]}' taxcalc/cps.csv
456465
0 454345
1 2075
2 44
3 1

So, @MaxGhenis, you're right about the CPS data having more detail (that is, the exact count of elderly dependents in the filing unit). But it appears that the PUF data do not have that kind of detail. So, adding @MaxGhenis' n65() function to Tax-Calculator would still not be exactly correct for PUF data.
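For readers who prefer Python to awk, here is a rough equivalent of the tallies above using only the standard library. It looks the column up by name rather than by position, so it does not depend on the column numbers reported by csv_vars.sh. The function name is hypothetical, and values are counted as strings, just as awk sees them.

```python
import csv
from collections import Counter


def tally_column(lines, column):
    """Return (number of data rows, Counter of values) for one CSV column.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    """
    counts = Counter()
    total = 0
    for row in csv.DictReader(lines):
        total += 1
        counts[row[column]] += 1
    return total, counts


# Example usage (path assumed; unzip cps.csv.gz first):
# with open('taxcalc/cps.csv', newline='') as f:
#     total, counts = tally_column(f, 'elderly_dependent')
```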

@MaxGhenis, why not just use your n65() function in your CPS work? What's the advantage of adding either n65 data or a function to Tax-Calculator? I know there is some advantage, but we have to weigh that advantage against the cost of adding the feature.

@andersonfrailey

@MaxGhenis
Copy link
Contributor Author

Would it be OK to implement n65 as a CPS-only feature in this case? It'd be awesome if this could get in before switching elderly_dependent to 0/1, since that will make my current function incorrect.

@martinholmer
Copy link
Contributor

@andersonfrailey said:

it wouldn't be too much effort to construct an n65 variable. Before taxdata PR #194 elderly_dependent effectively functioned as n65 so I can base any changes off that.

@andersonfrailey, are you saying you can construct an accurate n65 variable for the PUF data?

If not, then I don't think we should create any n65 variables.

If so, then why not change the name of elderly_dependent to elderly_dependents making it a count variable in both the CPS and the PUF data?

@MaxGhenis

@martinholmer
Contributor

@MaxGhenis said:

Would it be OK to implement n65 as a CPS-only feature in this case?

No, we are not going to get in the business of including things in Tax-Calculator that work only with one or another of the input data sets.

@andersonfrailey
Collaborator

@martinholmer I can create an n65 variable in the PUF that would be created exactly like the other age variables in the PUF (nu18, n1820, etc.) and rename elderly_dependent to n65 in the CPS. Neither would take much effort.

@MaxGhenis
Contributor Author

rename elderly_dependent to n65 in the CPS

n65 would need to be defined as per my initial comment, not renamed. For example, RECID 3 has age_head=80 yet elderly_dependent=0.
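The difference shows up with a single unit shaped like the RECID 3 example (head age 80, elderly_dependent 0). The spouse age below is a made-up placeholder, assumed to mean "no spouse". A simple rename would report 0 people age 65+, while the formula counts the head.

```python
import pandas as pd

unit = pd.DataFrame({
    'RECID': [3],
    'age_head': [80],
    'age_spouse': [0],          # placeholder, assumed: no spouse
    'elderly_dependent': [0],
})

renamed = unit['elderly_dependent']           # what a plain rename would give
computed = ((unit.age_head >= 65).astype(int) +
            (unit.age_spouse >= 65).astype(int) +
            unit.elderly_dependent)           # formula from the first comment
print(renamed.tolist(), computed.tolist())    # [0] [1]
```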

@martinholmer
Contributor

@andersonfrailey and @MaxGhenis, Thanks for all the input on this issue.

It seems like we should backtrack on #194 and make a new elderly_dependents (note the trailing s) variable in the CPS like elderly_dependent was before #194, and make the new PUF elderly_dependents variable a count variable (just like the new CPS elderly_dependents variable). Then we can drop the old elderly_dependent indicator variable from both the CPS and the PUF file. That would be all the taxdata work involved.

Then in Tax-Calculator, I'll change the documentation to reflect the new elderly_dependents count variable and add an n65() function or method somewhere in the code (and a test of that new code).
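Under this plan, n65 would add the head and spouse age tests to the new elderly_dependents count. The sketch below operates on NumPy arrays or pandas columns. The thread leaves open whether the function becomes a Calculator method or lives in utils.py, so this plain function and its signature are only illustrative.

```python
import numpy as np


def n65(age_head, age_spouse, elderly_dependents):
    """Number of people age 65+ in each filing unit.

    elderly_dependents is assumed to be the new count variable
    (0, 1, 2, ...), not the old 0/1 indicator.
    """
    age_head = np.asarray(age_head)
    age_spouse = np.asarray(age_spouse)
    elderly_dependents = np.asarray(elderly_dependents)
    return ((age_head >= 65).astype(int) +
            (age_spouse >= 65).astype(int) +
            elderly_dependents)


print(n65([80, 50, 67], [0, 45, 70], [0, 2, 1]).tolist())  # [1, 2, 3]
```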

Does that make sense as a work plan on this issue?

@MaxGhenis
Contributor Author

Sounds good. Based on this line it seems like the dependent care above-the-line deduction may have been miscomputed using CPS data so far, is that right? Will the elderly_dependent -> elderly_dependents change also change that line to use something like elderly_dependents.clip(upper=1)?
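If the deduction logic needs only the presence of an elderly dependent, capping the count at 1 recovers the old indicator. Here is an example with pandas' Series.clip; for plain NumPy arrays, np.minimum(x, 1) does the same thing.

```python
import pandas as pd

elderly_dependents = pd.Series([0, 1, 2, 3])
has_elderly_dependent = elderly_dependents.clip(upper=1)
print(has_elderly_dependent.tolist())  # [0, 1, 1, 1]
```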

@martinholmer
Contributor

@MaxGhenis said in taxdata issue #243:

Sounds good. Based on this line it seems like the dependent care above-the-line deduction may have been miscomputed using CPS data so far, is that right? Will the elderly_dependent -> elderly_dependents change also change that line to use something like elderly_dependents.clip(upper=1)?

@codykallen, my guess is that you added the elderly-dependent-care deduction logic to Tax-Calculator because it was part of the early Trump tax reform proposals. In those proposals, was the maximum deduction amount (_ALD_Dependents_Elder_c) expressed as "per elderly dependent"? Or was the maximum deduction the same amount no matter how many elderly dependents were in the tax filing unit? What was the tax reform proposal?

Here is the relevant JSON in Trump2016.json:

        "_ALD_Dependents_thd":
            {"2017": [[250000, 500000, 250000, 500000, 500000]]},
        "_ALD_Dependents_Elder_c": 
            {"2017": [5000]},
        "_ALD_Dependents_Child_c":
            {"2017": [7156]},

The logic of the non-elderly-care deduction uses n13, so the $7156 is a "per young child" amount.
Is the $5000 elderly-care deduction amount also "per elderly dependent"?
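The answer matters only for units with two or more elderly dependents. Here is a toy comparison of the two readings of the $5000 cap. The variable names are illustrative and this is not Tax-Calculator's actual deduction code.

```python
import numpy as np

ELDER_CAP = 5000  # _ALD_Dependents_Elder_c for 2017 in Trump2016.json

elderly_dependents = np.array([0, 1, 2])

# Reading 1: the cap applies per elderly dependent (like $7156 per child via n13).
cap_per_dependent = ELDER_CAP * elderly_dependents

# Reading 2: one cap per filing unit, however many elderly dependents it has.
cap_per_unit = ELDER_CAP * np.minimum(elderly_dependents, 1)

print(cap_per_dependent.tolist(), cap_per_unit.tolist())
# [0, 5000, 10000] [0, 5000, 5000]
```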

@MattHJensen

@codykallen

@martinholmer, although you have correctly identified which proposal prompted the addition of the _ALD_Dependents_* parameters (specifically, the Trump 2016 campaign proposal), I was not actually the one to model it. It was prepared by @MattHJensen for the first of the campaign tax reform JSONs, Trump2016.json.

As with many campaign reforms, there is substantial uncertainty regarding the precise details intended. I wrote a memo on the dependent care provisions back in 2016, and I'll summarize what we knew then.

Mr. Trump proposes to exclude the costs of child and elder care, up to certain limits, from income. This deduction would apply to children under 13, with up to four children per family, and capped [per child] at the average cost of childcare in the state. These benefits would also apply for those using stay-at-home parents and grandparents for child care. Families could deduct necessary eldercare costs, up to $5,000 per year.

The deduction would be phased out for individuals earning more than $250,000 or couples earning more than $500,000. The most detailed information can be found in their child care plan fact sheet: https://assets.donaldjtrump.com/CHILD_CARE_FACT_SHEET.pdf

Generally, I think it is a reasonable assumption that the $5000 elderly care deduction amount is also per dependent, although I don't expect that there are many filers with multiple elderly dependents.

@martinholmer
Contributor

@codykallen, Sorry about misidentifying you as the author of Trump2016.json.
Thanks so much for the very helpful discussion of what we know about that reform.

@martinholmer
Contributor

The elderly_dependent dummy variable has been changed to the elderly_dependents count variable in both the CPS and PUF data.
