Skip to content

Conversation

@jdebacker
Copy link
Member

This PR updates txfunc.py, in particular for the estimation of the GS and HSV functions. Specifically:

  • Sampling weights are added to the estimation of tax functions of theHSV form (addressing Issue HSV tax function estimation does not account for sampling weights #1027).
  • The estimation of functions of the GS form are improved by (1) changing the initial guesses to more reasonable value and (2) setting stricter bounds so the parameters do not get out of reasonable ranges in the minimizer.

Updated GS functions (2025 current law, TMD data):
gs_etr_new

Updated HSV functions (2025 current law, TMD data):
hsv_etr_new

@jdebacker jdebacker marked this pull request as ready for review April 21, 2025 16:58
@jdebacker
Copy link
Member Author

@rickecon This PR is ready for review.

We'll address issues and testing with the DEP functions in a subsequent PR.

Note the change to environment.yml to pin the version of marshmallow. At version 4.0 and above, there are errors reading in parameters objects that were created with a prior version of marshallow and then pickled. We can update those pickle files in a subsequent PR (or look at addressing Issue #1025).

@jdebacker
Copy link
Member Author

Wanted to remove the local test only marker from test_taxfunc.py.test_tax_func_estimate, but test was failing on GH and couldn't tell why. Passes on my machine, so @rickecon, make sure you check this one test in particular.

@codecov-commenter
Copy link

codecov-commenter commented Apr 21, 2025

Codecov Report

Attention: Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 72.67%. Comparing base (b753779) to head (c58f3a0).
Report is 19 commits behind head on master.

Files with missing lines Patch % Lines
ogcore/TPI.py 0.00% 2 Missing ⚠️
ogcore/txfunc.py 90.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1029      +/-   ##
==========================================
+ Coverage   70.17%   72.67%   +2.49%     
==========================================
  Files          20       20              
  Lines        5073     5068       -5     
==========================================
+ Hits         3560     3683     +123     
+ Misses       1513     1385     -128     
Flag Coverage Δ
unittests 72.67% <85.00%> (+2.49%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
ogcore/SS.py 72.46% <100.00%> (+0.42%) ⬆️
ogcore/__init__.py 100.00% <100.00%> (ø)
ogcore/txfunc.py 46.41% <90.00%> (+16.59%) ⬆️
ogcore/TPI.py 37.19% <0.00%> (-0.13%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rickecon
Copy link
Member

@jdebacker. All test_txfunc.py tests passed on my machine with your branch and a new ogcore-dev conda environment installed from your branch. Now I am running the full set of tests.

tests/test_txfunc.py ..............................                      [100%]

================= 30 passed, 20 warnings in 445.13s (0:07:25) ==================

@rickecon
Copy link
Member

@jdebacker. I think this PR is ready to merge. I just submitted a PR to your branch that updates the version number to 0.14.3 in setup.py, __init__.py, and CHANGELOG.md. I ran the tests in test_txfunc.py which all passed. I am currently running the full battery of OG-Core tests, but that will likely take more hours. So I think this PR is ready to merge as soon as you evaluate my PR to your branch.

Updated version in setup.py, __init__.py, and CHANGELOG.md
@jdebacker
Copy link
Member Author

@rickecon - thanks for your additions. I've merged them and believe this PR is ready to go.

@rickecon
Copy link
Member

@jdebacker. When I ran the full set of local and non local tests of OG-Core on my machine, I got the following clean slate.

(ogcore-dev) richardevans@Richards-MacBook-Pro-5 OG-Core % pytest
============================= test session starts ==============================
platform darwin -- Python 3.12.10, pytest-8.3.5, pluggy-1.5.0
rootdir: /Users/richardevans/Docs/Economics/OSE/OG/OG-Core
configfile: pytest.ini
testpaths: ./tests
plugins: anyio-4.9.0, cov-6.1.1, xdist-3.6.1
collected 582 items                                                            

tests/test_SS.py ...................................                     [  6%]
tests/test_TPI.py ...........................                            [ 10%]
tests/test_aggregates.py ........................................        [ 17%]
tests/test_basic.py ....                                                 [ 18%]
tests/test_demographics.py ................                              [ 20%]
tests/test_elliptical_u_est.py .......                                   [ 22%]
tests/test_execute.py .                                                  [ 22%]
tests/test_firm.py ..................................................... [ 31%]
................                                                         [ 34%]
tests/test_fiscal.py ......................                              [ 37%]
tests/test_household.py ................................................ [ 46%]
..                                                                       [ 46%]
tests/test_output_plots.py ............................................. [ 54%]
..                                                                       [ 54%]
tests/test_output_tables.py ..............                               [ 57%]
tests/test_parameter_plots.py ........................................   [ 63%]
tests/test_parameter_tables.py .......                                   [ 65%]
tests/test_parameters.py ..............                                  [ 67%]
tests/test_pensions.py ...............................                   [ 72%]
tests/test_run_example.py ..                                             [ 73%]
tests/test_run_ogcore.py .                                               [ 73%]
tests/test_tax.py .................................                      [ 79%]
tests/test_txfunc.py ..............................                      [ 84%]
tests/test_user_inputs.py .........                                      [ 85%]
tests/test_utils.py .................................................... [ 94%]
...............................                                          [100%]
============= 582 passed, 226397 warnings in 50594.23s (14:03:14) ==============

But the GH Actions CI here has a weird set of 11 failures, all the same error, and all in the test_SS.py script:

============================= test session starts ==============================
platform darwin -- Python 3.11.12, pytest-[8](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:9).3.5, pluggy-1.5.0
rootdir: /Users/runner/work/OG-Core/OG-Core
configfile: pytest.ini
testpaths: ./tests
plugins: anyio-4.[9](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:10).0, cov-6.1.1, xdist-3.6.1
collected 582 items / 62 deselected / 520 selected

tests/test_SS.py .......FFFFFFFFFFF..                                    [  3%]
tests/test_TPI.py .......                                                [  5%]
tests/test_aggregates.py ........................................        [ 12%]
tests/test_demographics.py ........                                      [ 14%]
tests/test_elliptical_u_est.py .......                                   [ 15%]
tests/test_firm.py ..................................................... [ 25%]
................                                                         [ 29%]
tests/test_fiscal.py ......................                              [ 33%]
tests/test_household.py ................................................ [ 42%]
..                                                                       [ 42%]
tests/test_output_plots.py ............................................. [ 51%]
..                                                                       [ 51%]
tests/test_output_tables.py ..............                               [ 54%]
tests/test_parameter_plots.py ........................................   [ 62%]
tests/test_parameter_tables.py .......                                   [ 63%]
tests/test_parameters.py ..............                                  [ 66%]
tests/test_pensions.py ...............................                   [ 72%]
tests/test_tax.py .................................                      [ 78%]
tests/test_txfunc.py ............................                        [ 84%]
tests/test_utils.py .................................................... [ 94%]
...............................                                          [[10](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:11)0%]

=================================== FAILURES ===================================
___________________________ test_SS_solver[Baseline] ___________________________

baseline = True, param_updates = {}, filename = 'SS_solver_outputs_baseline.pkl'
dask_client = <Client: 'tcp://127.0.0.1:49234' processes=3 threads=6, memory=7.00 GiB>

    @pytest.mark.parametrize(
        "baseline,param_updates,filename",
        [
            (True, param_updates1, filename1),
            (True, param_updates2, filename2),
            (False, param_updates3, filename3),
            (True, param_updates4, filename4),
        ],
        ids=[
            "Baseline",
            "Baseline, budget balance",
            "Reform, baseline spending=True",
            "Baseline, small open",
        ],
    )
    def test_SS_solver(baseline, param_updates, filename, dask_client):
        # Test SS.SS_solver function.  Provide inputs to function and
        # ensure that output returned matches what it has been before.
        p = Specifications(baseline=baseline, num_workers=NUM_WORKERS)
        p.update_specifications(param_updates)
        p.frac_tax_payroll = np.zeros(p.frac_tax_payroll.shape)
        p.output_base = CUR_PATH
        b_guess = np.ones((p.S, p.J)) * 0.07
        n_guess = np.ones((p.S, p.J)) * 0.35 * p.ltilde
        if p.zeta_K[-1] == 1.0:
            rguess = p.world_int_rate[-1]
        else:
            rguess = 0.06483431412921253
        r_p_guess = rguess
        wguess = firm.get_w_from_r(rguess, p, "SS")
        TRguess = 0.05738932081035772
        factorguess = 1[39](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:40)355.15473[40](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:41)256
        BQguess = aggregates.get_BQ(rguess, b_guess, None, p, "SS", False)
        Yguess = 0.6376591201150815
        p_m_guess = np.ones(p.M)
    
        # for new Ig_baseline arg
        if p.baseline_spending:
            Ig_baseline = 0.0  # tests only have zero infrastructure
        else:
            Ig_baseline = None
    
>       test_dict = SS.SS_solver(
            b_guess,
            n_guess,
            r_p_guess,
            rguess,
            wguess,
            p_m_guess,
            Yguess,
            BQguess,
            TRguess,
            Ig_baseline,
            factorguess,
            p,
            dask_client,
            False,
        )

tests/test_SS.py:[41](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:42)1: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ogcore/SS.py:652: in SS_solver
    ) = inner_loop(outer_loop_vars, p, client)
ogcore/SS.py:275: in inner_loop
    futures = client.compute(lazy_values, num_workers=p.num_workers)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Client: 'tcp://127.0.0.1:49234' processes=3 threads=6, memory=7.00 GiB>
collections = (Delayed('root-efb09f34-a45e-409d-ab20-f20a1906aa9d'), Delayed('root-31bfc869-faab-473c-8752-1d9a5133d84f'), Delayed('...618'), Delayed('root-b4a04dee-b122-40d0-8320-1eda5890a264'), Delayed('root-1ca0c2b0-c2ce-4b47-93e8-62efa6b9f4f8'), ...)
sync = False, optimize_graph = True, workers = None, allow_other_workers = False
resources = None, retries = 0, priority = 0, fifo_timeout = '60s', actors = None
traverse = True, kwargs = {'num_workers': 3}, singleton = False
variables = [Delayed('root-efb09f34-a45e-409d-ab20-f20a1906aa9d'), Delayed('root-31bfc869-faab-473c-8752-1d9a5133d84f'), Delayed('...618'), Delayed('root-b4a04dee-b122-40d0-8320-1eda5890a264'), Delayed('root-1ca0c2b0-c2ce-4b47-93e8-62efa6b9f4f8'), ...]
metadata = {'collections': [{'type': 'Delayed'}, {'type': 'Delayed'}, {'type': 'Delayed'}, {'type': 'Delayed'}, {'type': 'Delayed'}, {'type': 'Delayed'}, ...]}
futures_dict = {}

    def compute(
        self,
        collections,
        sync=False,
        optimize_graph=True,
        workers=None,
        allow_other_workers=False,
        resources=None,
        retries=0,
        priority=0,
        fifo_timeout="60s",
        actors=None,
        traverse=True,
        **kwargs,
    ):
        """Compute dask collections on cluster
    
        Parameters
        ----------
        collections : iterable of dask objects or single dask object
            Collections like dask.array or dataframe or dask.value objects
        sync : bool (optional)
            Returns Futures if False (default) or concrete values if True
        optimize_graph : bool
            Whether or not to optimize the underlying graphs
        workers : string or iterable of strings
            A set of worker hostnames on which computations may be performed.
            Leave empty to default to all workers (common case)
        allow_other_workers : bool (defaults to False)
            Used with `workers`. Indicates whether or not the computations
            may be performed on workers that are not in the `workers` set(s).
        retries : int (default to 0)
            Number of allowed automatic retries if computing a result fails
        priority : Number
            Optional prioritization of task.  Zero is default.
            Higher priorities take precedence
        fifo_timeout : timedelta str (defaults to '60s')
            Allowed amount of time between calls to consider the same priority
        traverse : bool (defaults to True)
            By default dask traverses builtin python collections looking for
            dask objects passed to ``compute``. For large collections this can
            be expensive. If none of the arguments contain any dask objects,
            set ``traverse=False`` to avoid doing this traversal.
        resources : dict (defaults to {})
            Defines the `resources` each instance of this mapped task requires
            on the worker; e.g. ``{'GPU': 2}``.
            See :doc:`worker resources <resources>` for details on defining
            resources.
        actors : bool or dict (default None)
            Whether these tasks should exist on the worker as stateful actors.
            Specified on a global (True/False) or per-task (``{'x': True,
            'y': False}``) basis. See :doc:`actors` for additional details.
        **kwargs
            Options to pass to the graph optimize calls
    
        Returns
        -------
        List of Futures if input is a sequence, or a single future otherwise
    
        Examples
        --------
        >>> from dask import delayed
        >>> from operator import add
        >>> x = delayed(add)(1, 2)
        >>> y = delayed(add)(x, x)
        >>> xx, yy = client.compute([x, y])  # doctest: +SKIP
        >>> xx  # doctest: +SKIP
        <Future: status: finished, key: add-8f6e709[44](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:45)6674bad78ea8aeecfee188e>
        >>> xx.result()  # doctest: +SKIP
        3
        >>> yy.result()  # doctest: +SKIP
        6
    
        Also support single arguments
    
        >>> xx = client.compute(x)  # doctest: +SKIP
    
        See Also
        --------
        Client.get : Normal synchronous dask.get function
        """
        if isinstance(collections, (list, tuple, set, frozenset)):
            singleton = False
        else:
            collections = [collections]
            singleton = True
    
        if traverse:
            collections = tuple(
                (
                    dask.delayed(a)
                    if isinstance(a, (list, set, tuple, dict, Iterator))
                    else a
                )
                for a in collections
            )
    
        variables = [a for a in collections if dask.is_dask_collection(a)]
        metadata = SpanMetadata(
            collections=[get_collections_metadata(v) for v in variables]
        )
        futures_dict = {}
        if variables:
>           expr = collections_to_expr(variables, optimize_graph, **kwargs)
E           TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'

../../../miniconda3/envs/ogcore-dev/lib/python3.11/site-packages/distributed/client.py:36[59](https://github.com/PSLmodels/OG-Core/actions/runs/14607341442/job/40978909541?pr=1029#step:5:60): TypeError

...

=========================== short test summary info ============================
FAILED tests/test_SS.py::test_SS_solver[Baseline] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_SS_solver[Baseline, budget balance] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_SS_solver[Reform, baseline spending=True] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_SS_solver[Baseline, small open] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Baseline, Small Open] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Baseline, Balanced Budget] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Baseline] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Reform] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Reform, baseline spending] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Reform, M>1] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
FAILED tests/test_SS.py::test_inner_loop[Reform, I!=>M] - TypeError: collections_to_expr() got an unexpected keyword argument 'num_workers'
== 11 failed, 509 passed, 62 deselected, 83742 warnings in 366.72s (0:06:06) ===

@jdebacker
Copy link
Member Author

jdebacker commented Apr 24, 2025

A recent move from dask 2025.3.x to dask 2025.4.0 has broken OG-Core. I've removed the num_workers arguments from calls to client.compute since that has been removed in dask (I believe the number of workers is now inferred from the client configuration.

However, I'm now running into some errors when we try to scatter the parameters object, p with the p_scatter global variable.

cc @rickecon

@rickecon
Copy link
Member

@jdebacker. Really great work figuring out those updates with dask and distributed, especially the scattered commands. I just updated the CHANGELOG.md date and time and added a line about the dask and distributed changes. Will merge as soon as tests pass.

@rickecon rickecon merged commit e35cfd5 into PSLmodels:master Apr 25, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants