Updates to tax function estimation #1029

jdebacker · 2025-04-16T02:53:27Z

This PR updates txfunc.py, in particular for the estimation of the GS and HSV functions. Specifically:

Sampling weights are added to the estimation of tax functions of theHSV form (addressing Issue HSV tax function estimation does not account for sampling weights #1027).
The estimation of functions of the GS form are improved by (1) changing the initial guesses to more reasonable value and (2) setting stricter bounds so the parameters do not get out of reasonable ranges in the minimizer.

Updated GS functions (2025 current law, TMD data):

Updated HSV functions (2025 current law, TMD data):

jdebacker · 2025-04-21T17:16:40Z

@rickecon This PR is ready for review.

We'll address issues and testing with the DEP functions in a subsequent PR.

Note the change to environment.yml to pin the version of marshmallow. At version 4.0 and above, there are errors reading in parameters objects that were created with a prior version of marshallow and then pickled. We can update those pickle files in a subsequent PR (or look at addressing Issue #1025).

jdebacker · 2025-04-21T22:56:17Z

Wanted to remove the local test only marker from test_taxfunc.py.test_tax_func_estimate, but test was failing on GH and couldn't tell why. Passes on my machine, so @rickecon, make sure you check this one test in particular.

codecov-commenter · 2025-04-21T23:20:31Z

Codecov Report

Attention: Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 72.67%. Comparing base (b753779) to head (c58f3a0).
Report is 19 commits behind head on master.

Files with missing lines	Patch %	Lines
ogcore/TPI.py	0.00%	2 Missing ⚠️
ogcore/txfunc.py	90.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1029      +/-   ##
==========================================
+ Coverage   70.17%   72.67%   +2.49%     
==========================================
  Files          20       20              
  Lines        5073     5068       -5     
==========================================
+ Hits         3560     3683     +123     
+ Misses       1513     1385     -128

Flag	Coverage Δ
unittests	`72.67% <85.00%> (+2.49%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
ogcore/SS.py	`72.46% <100.00%> (+0.42%)`	⬆️
ogcore/__init__.py	`100.00% <100.00%> (ø)`
ogcore/txfunc.py	`46.41% <90.00%> (+16.59%)`	⬆️
ogcore/TPI.py	`37.19% <0.00%> (-0.13%)`	⬇️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

rickecon · 2025-04-22T17:39:23Z

@jdebacker. All test_txfunc.py tests passed on my machine with your branch and a new ogcore-dev conda environment installed from your branch. Now I am running the full set of tests.

tests/test_txfunc.py ..............................                      [100%]

================= 30 passed, 20 warnings in 445.13s (0:07:25) ==================

rickecon · 2025-04-22T19:37:10Z

@jdebacker. I think this PR is ready to merge. I just submitted a PR to your branch that updates the version number to 0.14.3 in setup.py, __init__.py, and CHANGELOG.md. I ran the tests in test_txfunc.py which all passed. I am currently running the full battery of OG-Core tests, but that will likely take more hours. So I think this PR is ready to merge as soon as you evaluate my PR to your branch.

Updated version in setup.py, __init__.py, and CHANGELOG.md

jdebacker · 2025-04-23T00:37:42Z

@rickecon - thanks for your additions. I've merged them and believe this PR is ready to go.

rickecon · 2025-04-23T14:56:51Z

@jdebacker. When I ran the full set of local and non local tests of OG-Core on my machine, I got the following clean slate.

(ogcore-dev) richardevans@Richards-MacBook-Pro-5 OG-Core % pytest
============================= test session starts ==============================
platform darwin -- Python 3.12.10, pytest-8.3.5, pluggy-1.5.0
rootdir: /Users/richardevans/Docs/Economics/OSE/OG/OG-Core
configfile: pytest.ini
testpaths: ./tests
plugins: anyio-4.9.0, cov-6.1.1, xdist-3.6.1
collected 582 items                                                            

tests/test_SS.py ...................................                     [  6%]
tests/test_TPI.py ...........................                            [ 10%]
tests/test_aggregates.py ........................................        [ 17%]
tests/test_basic.py ....                                                 [ 18%]
tests/test_demographics.py ................                              [ 20%]
tests/test_elliptical_u_est.py .......                                   [ 22%]
tests/test_execute.py .                                                  [ 22%]
tests/test_firm.py ..................................................... [ 31%]
................                                                         [ 34%]
tests/test_fiscal.py ......................                              [ 37%]
tests/test_household.py ................................................ [ 46%]
..                                                                       [ 46%]
tests/test_output_plots.py ............................................. [ 54%]
..                                                                       [ 54%]
tests/test_output_tables.py ..............                               [ 57%]
tests/test_parameter_plots.py ........................................   [ 63%]
tests/test_parameter_tables.py .......                                   [ 65%]
tests/test_parameters.py ..............                                  [ 67%]
tests/test_pensions.py ...............................                   [ 72%]
tests/test_run_example.py ..                                             [ 73%]
tests/test_run_ogcore.py .                                               [ 73%]
tests/test_tax.py .................................                      [ 79%]
tests/test_txfunc.py ..............................                      [ 84%]
tests/test_user_inputs.py .........                                      [ 85%]
tests/test_utils.py .................................................... [ 94%]
...............................                                          [100%]
============= 582 passed, 226397 warnings in 50594.23s (14:03:14) ==============

But the GH Actions CI here has a weird set of 11 failures, all the same error, and all in the test_SS.py script:

============================= test session starts ==============================
platform darwin -- Python 3.11.12, pytest-[8](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:9).3.5, pluggy-1.5.0
rootdir: /Users/runner/work/OG-Core/OG-Core
configfile: pytest.ini
testpaths: ./tests
plugins: anyio-4.[9](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:10).0, cov-6.1.1, xdist-3.6.1
collected 582 items / 62 deselected / 520 selected

tests/test_SS.py .......FFFFFFFFFFF..                                    [  3%]
tests/test_TPI.py .......                                                [  5%]
tests/test_aggregates.py ........................................        [ 12%]
tests/test_demographics.py ........                                      [ 14%]
tests/test_elliptical_u_est.py .......                                   [ 15%]
tests/test_firm.py ..................................................... [ 25%]
................                                                         [ 29%]
tests/test_fiscal.py ......................                              [ 33%]
tests/test_household.py ................................................ [ 42%]
..                                                                       [ 42%]
tests/test_output_plots.py ............................................. [ 51%]
..                                                                       [ 51%]
tests/test_output_tables.py ..............                               [ 54%]
tests/test_parameter_plots.py ........................................   [ 62%]
tests/test_parameter_tables.py .......                                   [ 63%]
tests/test_parameters.py ..............                                  [ 66%]
tests/test_pensions.py ...............................                   [ 72%]
tests/test_tax.py .................................                      [ 78%]
tests/test_txfunc.py ............................                        [ 84%]
tests/test_utils.py .................................................... [ 94%]
...............................                                          [[10](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:11)0%]

=================================== FAILURES ===================================
___________________________ test_SS_solver[Baseline] ___________________________

baseline = True, param_updates = {}, filename = 'SS_solver_outputs_baseline.pkl'
dask_client = <Client: 'tcp://127.0.0.1:49234' processes=3 threads=6, memory=7.00 GiB>

    @pytest.mark.parametrize(
        "baseline,param_updates,filename",
        [
            (True, param_updates1, filename1),
            (True, param_updates2, filename2),
            (False, param_updates3, filename3),
            (True, param_updates4, filename4),
        ],
        ids=[
            "Baseline",
            "Baseline, budget balance",
            "Reform, baseline spending=True",
            "Baseline, small open",
        ],
    )
    def test_SS_solver(baseline, param_updates, filename, dask_client):
        # Test SS.SS_solver function.  Provide inputs to function and
        # ensure that output returned matches what it has been before.
        p = Specifications(baseline=baseline, num_workers=NUM_WORKERS)
        p.update_specifications(param_updates)
        p.frac_tax_payroll = np.zeros(p.frac_tax_payroll.shape)
        p.output_base = CUR_PATH
        b_guess = np.ones((p.S, p.J)) * 0.07
        n_guess = np.ones((p.S, p.J)) * 0.35 * p.ltilde
        if p.zeta_K[-1] == 1.0:
            rguess = p.world_int_rate[-1]
        else:
            rguess = 0.06483431412921253
        r_p_guess = rguess
        wguess = firm.get_w_from_r(rguess, p, "SS")
        TRguess = 0.05738932081035772
        factorguess = 1[39](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:40)355.15473[40](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:41)256
        BQguess = aggregates.get_BQ(rguess, b_guess, None, p, "SS", False)
        Yguess = 0.6376591201150815
        p_m_guess = np.ones(p.M)
    
        # for new Ig_baseline arg
        if p.baseline_spending:
            Ig_baseline = 0.0  # tests only have zero infrastructure
        else:
            Ig_baseline = None
    
>       test_dict = SS.SS_solver(
            b_guess,
            n_guess,
            r_p_guess,
            rguess,
            wguess,
            p_m_guess,
            Yguess,
            BQguess,
            TRguess,
            Ig_baseline,
            factorguess,
            p,
            dask_client,
            False,
        )

tests/test_SS.py:[41](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:42)1: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ogcore/SS.py:652: in SS_solver
    ) = inner_loop(outer_loop_vars, p, client)
ogcore/SS.py:275: in inner_loop
    futures = client.compute(lazy_values, num_workers=p.num_workers)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Client: 'tcp://127.0.0.1:49234' processes=3 threads=6, memory=7.00 GiB>
collections = (Delayed('root-efb09f34-a45e-409d-ab20-f20a1906aa9d'), Delayed('root-31bfc869-faab-473c-8752-1d9a5133d84f'), Delayed('...618'), Delayed('root-b4a04dee-b122-40d0-8320-1eda5890a264'), Delayed('root-1ca0c2b0-c2ce-4b47-93e8-62efa6b9f4f8'), ...)
sync = False, optimize_graph = True, workers = None, allow_other_workers = False
resources = None, retries = 0, priority = 0, fifo_timeout = '60s', actors = None
traverse = True, kwargs = {'num_workers': 3}, singleton = False
variables = [Delayed('root-efb09f34-a45e-409d-ab20-f20a1906aa9d'), Delayed('root-31bfc869-faab-473c-8752-1d9a5133d84f'), Delayed('...618'), Delayed('root-b4a04dee-b122-40d0-8320-1eda5890a264'), Delayed('root-1ca0c2b0-c2ce-4b47-93e8-62efa6b9f4f8'), ...]
metadata = {'collections': [{'type': 'Delayed'}, {'type': 'Delayed'}, {'type': 'Delayed'}, {'type': 'Delayed'}, {'type': 'Delayed'}, {'type': 'Delayed'}, ...]}
futures_dict = {}

    def compute(
        self,
        collections,
        sync=False,
        optimize_graph=True,
        workers=None,
        allow_other_workers=False,
        resources=None,
        retries=0,
        priority=0,
        fifo_timeout="60s",
        actors=None,
        traverse=True,
        **kwargs,
    ):
        """Compute dask collections on cluster
    
        Parameters
        ----------
        collections : iterable of dask objects or single dask object
            Collections like dask.array or dataframe or dask.value objects
        sync : bool (optional)
            Returns Futures if False (default) or concrete values if True
        optimize_graph : bool
            Whether or not to optimize the underlying graphs
        workers : string or iterable of strings
            A set of worker hostnames on which computations may be performed.
            Leave empty to default to all workers (common case)
        allow_other_workers : bool (defaults to False)
            Used with `workers`. Indicates whether or not the computations
            may be performed on workers that are not in the `workers` set(s).
        retries : int (default to 0)
            Number of allowed automatic retries if computing a result fails
        priority : Number
            Optional prioritization of task.  Zero is default.
            Higher priorities take precedence
        fifo_timeout : timedelta str (defaults to '60s')
            Allowed amount of time between calls to consider the same priority
        traverse : bool (defaults to True)
            By default dask traverses builtin python collections looking for
            dask objects passed to ``compute``. For large collections this can
            be expensive. If none of the arguments contain any dask objects,
            set ``traverse=False`` to avoid doing this traversal.
        resources : dict (defaults to {})
            Defines the `resources` each instance of this mapped task requires
            on the worker; e.g. ``{'GPU': 2}``.
            See :doc:`worker resources <resources>` for details on defining
            resources.
        actors : bool or dict (default None)
            Whether these tasks should exist on the worker as stateful actors.
            Specified on a global (True/False) or per-task (``{'x': True,
            'y': False}``) basis. See :doc:`actors` for additional details.
        **kwargs
            Options to pass to the graph optimize calls
    
        Returns
        -------
        List of Futures if input is a sequence, or a single future otherwise
    
        Examples
        --------
        >>> from dask import delayed
        >>> from operator import add
        >>> x = delayed(add)(1, 2)
        >>> y = delayed(add)(x, x)
        >>> xx, yy = client.compute([x, y])  # doctest: +SKIP
        >>> xx  # doctest: +SKIP
        <Future: status: finished, key: add-8f6e709[44](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:45)6674bad78ea8aeecfee188e>
        >>> xx.result()  # doctest: +SKIP
        3
        >>> yy.result()  # doctest: +SKIP
        6
    
        Also support single arguments
    
        >>> xx = client.compute(x)  # doctest: +SKIP
    
        See Also
        --------
        Client.get : Normal synchronous dask.get function
        """
        if isinstance(collections, (list, tuple, set, frozenset)):
            singleton = False
        else:
            collections = [collections]
            singleton = True
    
        if traverse:
            collections = tuple(
                (
                    dask.delayed(a)
                    if isinstance(a, (list, set, tuple, dict, Iterator))
                    else a
                )
                for a in collections
            )
    
        variables = [a for a in collections if dask.is_dask_collection(a)]
        metadata = SpanMetadata(
            collections=[get_collections_metadata(v) for v in variables]
        )
        futures_dict = {}
        if variables:
>           expr = collections_to_expr(variables, optimize_graph, **kwargs)
E           TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'

../../../miniconda3/envs/ogcore-dev/lib/python3.11/site-packages/distributed/client.py:36[59](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:60): TypeError

...

=========================== short test summary info ============================
FAILED tests/test_SS.py::test_SS_solver[Baseline] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_SS_solver[Baseline, budget balance] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_SS_solver[Reform, baseline spending=True] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_SS_solver[Baseline, small open] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Baseline, Small Open] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Baseline, Balanced Budget] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Baseline] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Reform] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Reform, baseline spending] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Reform, M>1] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Reform, I!=>M] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
== 11 failed, 509 passed, 62 deselected, 83742 warnings in 366.72s (0:06:06) ===

jdebacker · 2025-04-24T12:46:25Z

A recent move from dask 2025.3.x to dask 2025.4.0 has broken OG-Core. I've removed the num_workers arguments from calls to client.compute since that has been removed in dask (I believe the number of workers is now inferred from the client configuration.

However, I'm now running into some errors when we try to scatter the parameters object, p with the p_scatter global variable.

cc @rickecon

rickecon · 2025-04-25T14:39:35Z

@jdebacker. Really great work figuring out those updates with dask and distributed, especially the scattered commands. I just updated the CHANGELOG.md date and time and added a line about the dask and distributed changes. Will merge as soon as tests pass.

jdebacker added 6 commits April 15, 2025 22:15

update initial values and bounds for GS functions

47999ae

add sampling weights to HSV estimation

6fc372c

test that recover params with estimation

9b4fdbc

remove comments

d3b40e9

test with HSV funcs

33f0ae3

test with HSV funcs

d9d3779

jdebacker marked this pull request as ready for review April 21, 2025 16:58

jdebacker added 2 commits April 21, 2025 12:58

format

eb07081

pin marshmallow

a93e6eb

jdebacker added 3 commits April 21, 2025 17:28

remove unused tests

e6c91b8

format

3388a04

make test local only

38afaba

rickecon approved these changes Apr 22, 2025

View reviewed changes

Updated version in setup.py, __init__.py, and CHANGELOG.md

7656435

Merge pull request #37 from rickecon/jdebacker-taxfuncs

54a0359

Updated version in setup.py, __init__.py, and CHANGELOG.md

remove num_workers from compute

c995cf2

jdebacker added 3 commits April 24, 2025 15:21

update scattering

d2e23f1

scatter in tpi

b05deb8

format

3a8dccd

rickecon approved these changes Apr 25, 2025

View reviewed changes

Update CHANGELOG.md

c58f3a0

rickecon merged commit e35cfd5 into PSLmodels:master Apr 25, 2025
8 checks passed

jdebacker mentioned this pull request Apr 30, 2025

HSV tax function estimation does not account for sampling weights #1027

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Updates to tax function estimation #1029

Updates to tax function estimation #1029

Uh oh!

jdebacker commented Apr 16, 2025

Uh oh!

jdebacker commented Apr 21, 2025

Uh oh!

jdebacker commented Apr 21, 2025

Uh oh!

codecov-commenter commented Apr 21, 2025 •

edited

Loading

Uh oh!

rickecon commented Apr 22, 2025

Uh oh!

rickecon commented Apr 22, 2025

Uh oh!

jdebacker commented Apr 23, 2025

Uh oh!

rickecon commented Apr 23, 2025

Uh oh!

jdebacker commented Apr 24, 2025 •

edited

Loading

Uh oh!

rickecon commented Apr 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Updates to tax function estimation #1029

Updates to tax function estimation #1029

Uh oh!

Conversation

jdebacker commented Apr 16, 2025

Uh oh!

jdebacker commented Apr 21, 2025

Uh oh!

jdebacker commented Apr 21, 2025

Uh oh!

codecov-commenter commented Apr 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

rickecon commented Apr 22, 2025

Uh oh!

rickecon commented Apr 22, 2025

Uh oh!

jdebacker commented Apr 23, 2025

Uh oh!

rickecon commented Apr 23, 2025

Uh oh!

jdebacker commented Apr 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rickecon commented Apr 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov-commenter commented Apr 21, 2025 •

edited

Loading

jdebacker commented Apr 24, 2025 •

edited

Loading