Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize spline tax functions to 2D #839

Merged
merged 25 commits into from
Apr 22, 2023
Merged

Conversation

prrathi
Copy link
Contributor

@prrathi prrathi commented Dec 27, 2022

This PR addresses issue #828 through pyGAM which enables monotonic splines for any number of dimensions. It creates a new method within monotone_spline function alongside new arguments to specify the method and necessary inputs. Also adds a new function avg_by_bin_2d for binning data with any number of x dimensions (maybe the function name should be changed) that's used in the 2d case for the tax data example created in test_txfunc.py. @jdebacker @rickecon

@prrathi
Copy link
Contributor Author

prrathi commented Dec 27, 2022

Error is "ModuleNotFoundError: No module named 'pygam'", do new modules/packages need to be added elsewhere?

@rickecon
Copy link
Member

rickecon commented Dec 28, 2022

@prrathi. This looks really good. For the tests to run on the GitHub Action, you need to add the pygam package to the environment.yml and to setup.py. Here is my suggested changes.

Add the following three lines at the end of environment.yml:

- pip
- pip:
  - pygam

Add the following two lines after "requests", in the install_requires={ section of setup.py:

    "pip",
    "pygam",

After you make these changes, you'll want to update your ogcore-dev environment by doing the following steps:

  1. Delete your current ogcore-dev environment by deactivating it conda deactivate then deleting it conda remove --name ogcore-dev --all.
  2. Create an updated version of the ogcore-dev conda environment by navigating to the main OG-Core directory of your repository and creating the conda environment conda env create -f environment.yml. Then add the ogcore package to the environment by activating the environment conda activate ogcore-dev and typing pip install -e ..

Make sure your test_txfunc.py tests pass locally on your machine in this environment. Once you commit the changes above to your PR, the new test should pass on the GitHub Action CI as well.

@prrathi
Copy link
Contributor Author

prrathi commented Dec 28, 2022

@rickecon thanks for the help, think previous issues should be fixed but its giving an error with test_SS_fsolve

@rickecon
Copy link
Member

@prrathi. You need to do two things to this PR.

  1. Use the black package to update your files format so they will pass the "Check Black formatting" CI test. Make sure you have updated your ogcore-dev conda package to include the changes you made in this PR. Then make sure you are in the master branch of your fork. In your terminal in the OG-Core repository on your machine, type black . -l 79. This will update the files that need to be formatted. You will be able to see which files these are by typing git status. Then add these files git add -A and commit them git commit -m "Black formatted files" and push them to your remote branch git push origin master. This will update your PR and the files will be formatted.
  2. PR Update build_and_test.yml GitHub Action #836 should fix the test_SS_fsolve test that is failing in this PR in test_SS.py. Once PR Update build_and_test.yml GitHub Action #836 is merged, you should update your branch by doing the following steps.
    • Make sure you have all your changes that you want in your local OG-Core master branch added and committed and pushed.
    • Go to your master branch and git fetch upstream the changes to the PSLmodels/OG-Core repo's master branch.
    • Merge those changes into your master branch by typing git merge upstream/master.
    • You will likely have merge conflicts that you need to resolve. If so, you'll see which files have conflicts by typing git status and/or by looking at the output after your git merge command from the previous step. Go to those files, choose which changes to keep, add them, commit them, and push them.

After following these steps (1) and (2), I think it is likely that all the CI tests will pass.

@codecov-commenter
Copy link

codecov-commenter commented Jan 3, 2023

Codecov Report

Merging #839 (fc2e081) into master (fdff510) will decrease coverage by 2.04%.
The diff coverage is 3.57%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #839      +/-   ##
==========================================
- Coverage   81.96%   79.93%   -2.04%     
==========================================
  Files          18       18              
  Lines        4037     4146     +109     
==========================================
+ Hits         3309     3314       +5     
- Misses        728      832     +104     
Flag Coverage Δ
unittests 79.93% <3.57%> (-2.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
ogcore/txfunc.py 29.52% <2.99%> (-4.60%) ⬇️
ogcore/__init__.py 100.00% <100.00%> (ø)

@rickecon
Copy link
Member

rickecon commented Jan 3, 2023

@prrathi. Perfect. All tests are passing. Now I just need to review this. I will try to get this done by the end of this week.

@prrathi
Copy link
Contributor Author

prrathi commented Jan 3, 2023

@rickecon sounds good, one thing that is somewhat missing is test cases for the actual interpolated against expected interpolated values analogous to your test case for the other methodology. I wasn't sure how to obtain that expected value so didn't write any cases for that

@rickecon rickecon linked an issue Jan 12, 2023 that may be closed by this pull request
@prrathi
Copy link
Contributor Author

prrathi commented Jan 14, 2023

@rickecon @jdebacker following up if there were any updates on reviewing this, I also should be able to make this Monday's meeting if there is one so can discuss more there

@prrathi
Copy link
Contributor Author

prrathi commented Jan 21, 2023

@rickecon I think the only reference here is https://pygam.readthedocs.io/en/latest/. I had experimented with adding a tensor term in addition to the current spline terms for each dimension but with this specific dataset hadn't seen improvement. There are also some other model classes I didn't explore

@rickecon
Copy link
Member

@prrathi. Can you merge the most recent updates to OG-Core into your branch? That will be a better foundation for me to work off of.

@prrathi
Copy link
Contributor Author

prrathi commented Jan 21, 2023

@rickecon merged

@rickecon rickecon mentioned this pull request Jan 21, 2023
# X, y, weights, lam=100, incl_uncstr=True, show_plot=True, method='pygam'
# )

with open("test_io_data/micro_data_dict_for_tests.pkl", "rb") as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use os.path.join(CUR_PATH, "test_io_data", "micro_data_dict_for_tests.pkl") for the files name so we can open it from other directories besides tests (e.g., this test fails to open the pickle file when I run pytest from OG-Core/

Also, should we make use of the utils.safe_read_pickle as we do elsewhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prrathi I see you removed your local path. That's helpful. But also note the suggested changes to specify paths from the CUR_PATH.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdebacker @rickecon added the change. Also from testing locally I think this needs numpy version < 1.24, because the pygam package has code that doesn't work with numpy >= 1.24, specifically they call numpy.int which seems to be deprecated.

@jdebacker
Copy link
Member

@prrathi A couple other things on this PR:

  1. Can you add to the txfunc.get_tax_rates() function a case for your tax functions? e.g.,:
elif tax_func_type == "2Dmono":
  1. Add the name of your case to the list of "tax_func_type" accepted values in default_parameters.json

cc @rickecon

@prrathi
Copy link
Contributor Author

prrathi commented Mar 4, 2023

@jdebacker @rickecon did those, also added the option to txfunc.txfunc_est(). Following up on above, not sure if requiring numpy version < 1.24 will be needed in the environment.yml. Also for the given test case the 1d version of this approach seems to have better loss than Eilers, so think it could be used effectively in (1d) mono setting as well

@jdebacker
Copy link
Member

@prrathi

See my PR to your branch with a few changes necessary to make things run.

One thing I wasn't clear on in your changes: can one estimate mono2D with age_specific=True and age_specific=False?

ogcore/txfunc.py Outdated
if not np.isscalar(bins):
err_msg = "monotone_spline2 ERROR: bins value is not type scalar"
"""
New args:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this just "Args" as in other docstrings?

Can you can probably remove references to "new" and "old" and just refer to the method as "eilers" and "pygam"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

ogcore/txfunc.py Outdated
elif tax_func_type == "mono2D":
mono_interp, _, wsse_cstr, _, _ = monotone_spline(
df[["total_labinc", "total_capinc"]].values,
df["etr"].values,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case should use txrates rather than df["etr"] since lines above adjust txrates for the specific rate chosen (i.e., ETR, MTRx, MTRy).

Also, for consistency with other cases, will want to use X and Y rather than columns straight from df here - same for the weights.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think your PR to my fork fixed this

@prrathi
Copy link
Contributor Author

prrathi commented Mar 16, 2023

@rickecon @jdebacker Following up on the last meeting and the comments above I was looking more into the code integrating mono and mono2D tax function options. If the goal is to be able to run either of these in the code, I found what I think are a few issues:

  • in the main TPI loop in run_TPI function of TPI.py the ETR, MTRx, and MTRy are all tiled numpy arrays, regardless of p.tax_func_type, and become the etr_params, mtr_params arguments for ETR_income, MTR_income functions in tax.py. These in turn become the params parameter in calls to the get_tax_rates of txfunc.py where the logic for different tax_func_type values is now included. But for the mono and mono2D (being added) options, this params is an array instead of a list whose first entry is function that we want, so the current code wouldn't be compatible and would need changes in TPI.py.
  • in tax_func_estimate, beginning here we have logic based on age_specific and tax_func_type. Currently for the mono case , if S is the right value etrparam_list_adj, mtrxparam_list_adj, mtryparam_list_adj are used as replacement values. But these variables are only declared in the not mono case, so I don't think the code would run if tax_func_type = mono (and by extension mono2D once added). We would need a method to deal with outlier non-parametric tax functions.

I think I should be able to add in the necessary logic for the first issue and maybe some idea like an interpolation of functions can work for the second issue, but wanted to make sure these are actual issues that would need fixing.

@jdebacker
Copy link
Member

jdebacker commented Mar 19, 2023

@prrathi Thank you for looking through the code to find these issues.

Re the TPI.py file, you are correct that those uses of arrays to store the tax parameters in the TPI_params_to_use arrays won't work with the mono and mono2D functions that must be stored in lists (since Numpy arrays can't be arrays of functions).

Re the age_specific case, I think the code will execute with mono, although in that place outliers are not replaced (see here). It maybe that all that needed is to set this to:

if age_specific and tax_func_type not in ["mono", "mono2D"]:

cc @rickecon

@prrathi
Copy link
Contributor Author

prrathi commented Mar 20, 2023

One thing I wasn't clear on in your changes: can one estimate mono2D with age_specific=True and age_specific=False?

Yes, it only relies on the labor income, capital income, and weight columns of the data so would work with both options (regardless of filters on the age column).

Also realized the TPI_params_to_use arrays issue goes back to the paramtools discussion you were having last time. Maybe one solution is to compute the mono function at the end of update_specifications under a if self.tax_func_type in ['mono', 'mono2D'] condition?

@rickecon @jdebacker

@jdebacker
Copy link
Member

@prrathi Now that PR #861 is merged into the master branch, can you sync that here? I think those changes might suggest some changes to the mono2D base in txfunc.get_tax_rates().

Once you sync and make those changes, ping me and I'll run this branch in OG-USA to make sure we can not just estimate the mono2D functions, but pass them through OG model.

We're getting close! Took a few more changes I didn't anticipate when you set out to do this!

@prrathi
Copy link
Contributor Author

prrathi commented Apr 9, 2023

@jdebacker sorry for the delay on this, would the best way to do this be create a branch with my current updates, sync my master with this master, and then work through the two?

@jdebacker
Copy link
Member

@jdebacker sorry for the delay on this, would the best way to do this be create a branch with my current updates, sync my master with this master, and then work through the two?

@prrathi I think you should git fetch upstream and merge upstream/master in to this branch (which I guess is your master branch). Then sort out any conflicts and then make sure the tests pass as expected.

Let me know if you have questions about any merge conflicts and I can advise.

@prrathi
Copy link
Contributor Author

prrathi commented Apr 16, 2023

@rickecon @jdebacker the above changes should be set, just adapting the mono changes to mono2D

@jdebacker
Copy link
Member

@prrathi Thanks for the updates. I just added two comments. With those changes I was able to solve the SS of the OG-USA models using the mono2D functions.

@prrathi
Copy link
Contributor Author

prrathi commented Apr 21, 2023

@jdebacker do you mean a review for the code? I don't see anything from my end

ogcore/txfunc.py Outdated
]
for t in range(income.shape[0])
]
txrates = np.array(txrates)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this line:

txrates = np.squeeze(np.array(txrates))

Else the dimension is (S,1) not (S), which causes issues when using the functions in the model.

ogcore/txfunc.py Outdated
splines=[100, 100],
)
wsse = wsse_cstr
params = mono_interp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this line:

params = [mono_interp]

This will make sure the mono function is in a list and thus be consistent with how the parametric tax functions have their parameters in lists.

@jdebacker
Copy link
Member

@prrathi Sorry about that - I forgot to click to submit my review. Please see comments now.

@rickecon
Copy link
Member

@prrathi. I just submitted a PR to your branch that makes the changes that @jdebacker requested in txfunc.py and updates the OG-Core version number in setup.py and ogcore/__init__.py. Once you merge those changes into your branch, everything in this PR to the main OG-Core branch should be ready to merge.

Update version number and jdebacker review comments
@prrathi
Copy link
Contributor Author

prrathi commented Apr 22, 2023

@rickecon @jdebacker sounds good, just merged

@rickecon
Copy link
Member

@prrathi. Thank you for this excellent PR. And thanks @jdebacker for all the maintenance and testing of this. I think this is ready to merge.

@rickecon rickecon merged commit 5a770b5 into PSLmodels:master Apr 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generalize spline tax functions to 2D
4 participants