feat: add 'dak.backend' and dak.to_list and verify that they overload the ak.* versions #498

Merged
merged 4 commits into main from jpivarski/add-backend-function on Apr 12, 2024

jpivarski
Collaborator

The tests need type annotations! That's too much, man!

src/dask_awkward/lib/describe.py (resolved)
tests/test_inspect.py (outdated, resolved)
@martindurant
Collaborator

Some floating point rounding error?

b          = <Array [[7.21, 5.83, 5.83], [], ..., [5, 4.47, 7]] type='15 * var * float64'>
b_comp     = <Array [[7.21, 5.83, 5.83], [], ..., [5, 4.47, 7]] type='15 * var * float64'>

@jpivarski
Collaborator Author

Either that or radically different contents somewhere in b[2:-1] vs b_comp[2:-1].

Only on Python 3.12 on macOS... On a comparison that was already approximate... Maybe some binary change nudged it over the threshold.

This is also marked as xfail with BAD_NP_AK_MIXIN_VERSIONING. I don't know why it's being marked as a failure if it's xfailed.
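
On the xfail question: `pytest.mark.xfail` takes an optional condition as its first positional argument, and the mark only takes effect when that condition is true. The following is a minimal sketch, assuming the flag evaluated to False on that runner (the thread does not show its value). `FLAG_IS_FALSE_HERE` and `test_example` are hypothetical names, not part of dask-awkward.

```python
import pytest

# Hypothetical stand-in for BAD_NP_AK_MIXIN_VERSIONING on a runner
# where the problematic version combination is NOT present.
FLAG_IS_FALSE_HERE = False


@pytest.mark.xfail(
    FLAG_IS_FALSE_HERE,
    reason="only expected to fail when the flag is True",
)
def test_example() -> None:
    # With a False condition the xfail mark is inert: pytest treats this
    # as an ordinary test, so a failing assertion here would be reported
    # as FAILED, not XFAIL.
    assert 1 + 1 == 2
```

Under that assumption, the failure would be reported normally even though the test carries an xfail decorator.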

@jpivarski jpivarski changed the title from "feat: add 'dak.backend' and verify that it overloads 'ak.backend'" to "feat: add 'dak.backend' and dak.to_list and verify that they overload the ak.* versions" on Apr 12, 2024
@jpivarski
Collaborator Author

Since the test failure wasn't reproduced when I simply ran it again, here's its output, which we can revisit if we see that failure again:

=================================== FAILURES ===================================
____________________________ test_distance_behavior ____________________________

daa_p1 = dask.awkward<from-json-files, npartitions=3>
daa_p2 = dask.awkward<from-json-files, npartitions=3>
caa_p1 = <Array [{points: [{...}, ...]}, ..., {...}] type='15 * {points: var * {x: i...'>
caa_p2 = <Array [{points: [{...}, ...]}, ..., {...}] type='15 * {points: var * {x: i...'>

    @pytest.mark.xfail(
        BAD_NP_AK_MIXIN_VERSIONING,
        reason="NumPy 1.25 mixin __slots__ change",
    )
    def test_distance_behavior(
        daa_p1: dak.Array,
        daa_p2: dak.Array,
        caa_p1: ak.Array,
        caa_p2: ak.Array,
    ) -> None:
        daa1 = dak.with_name(daa_p1.points, name="Point", behavior=behaviors)
        daa2 = dak.with_name(daa_p2.points, name="Point", behavior=behaviors)
        caa1 = ak.Array(caa_p1.points, with_name="Point", behavior=behaviors)
        caa2 = ak.Array(caa_p2.points)
>       assert_eq(daa1.distance(daa2), caa1.distance(caa2))

caa1       = <PointArray [[{x: 6, y: 5}, {...}, {...}], ...] type='15 * var * Point[x: i...'>
            a_comp_form = a_comp.layout.form
            b_comp_form = b_comp.layout.form
            assert a_comp_form == a_form
            assert a_comp_form == b_form
            assert b_comp_form == a_comp_form
    
        if check_divisions:
            # check divisions if both collections
            if a_is_coll and b_is_coll:
                if a.known_divisions and b.known_divisions:
                    assert a.divisions == b.divisions
                else:
                    assert a.npartitions == b.npartitions
    
        # finally check the values
        if isclose_equal_nan:
            assert ak.all(ak.isclose(a_comp, b_comp, equal_nan=True))
        else:
            if convert_to_lists:
                assert a_comp.tolist() == b_comp.tolist()
            else:
>               assert ak.almost_equal(a_comp, b_comp, dtype_exact=True)
E               AssertionError

a          = dask.awkward<distance, npartitions=3>
a_comp     = <Array [[0, 0, 0], [], [0, ...], ..., [0], [0, 0, 0]] type='15 * var * float64'>
a_is_coll  = True
a_tt       = <Array-typetracer [...] type='## * var * float64'>
b          = <Array [[7.21, 5.83, 5.83], [], ..., [5, 4.47, 7]] type='15 * var * float64'>
b_comp     = <Array [[7.21, 5.83, 5.83], [], ..., [5, 4.47, 7]] type='15 * var * float64'>
b_is_coll  = False
b_tt       = <Array-typetracer [...] type='## * var * float64'>
check_divisions = True
check_forms = False
convert_to_lists = False
isclose_equal_nan = False
scheduler  = 'sync'

/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/dask_awkward/lib/testutils.py:99: AssertionError
------------------------------ Captured log setup ------------------------------
DEBUG    fsspec.local:local.py:314 open file: /private/var/folders/n2/pt_35rc53tdgkld9531s2tfh0000gn/T/pytest-of-runner/pytest-0/data0/points_ndjson1.json
DEBUG    fsspec.local:local.py:314 open file: /private/var/folders/n2/pt_35rc53tdgkld9531s2tfh0000gn/T/pytest-of-runner/pytest-0/data0/points_ndjson1.json
DEBUG    fsspec.local:local.py:314 open file: /private/var/folders/n2/pt_35rc53tdgkld9531s2tfh0000gn/T/pytest-of-runner/pytest-0/data1/points_ndjson2.json
DEBUG    fsspec.local:local.py:314 open file: /private/var/folders/n2/pt_35rc53tdgkld9531s2tfh0000gn/T/pytest-of-runner/pytest-0/data1/points_ndjson2.json
------------------------------ Captured log call -------------------------------
DEBUG    fsspec.local:local.py:314 open file: /private/var/folders/n2/pt_35rc53tdgkld9531s2tfh0000gn/T/pytest-of-runner/pytest-0/data0/points_ndjson1.json
DEBUG    dask_awkward.lib.io.json:json.py:122 columns read from disk: ['points.x', 'points.y']
DEBUG    fsspec.local:local.py:314 open file: /private/var/folders/n2/pt_35rc53tdgkld9531s2tfh0000gn/T/pytest-of-runner/pytest-0/data0/points_ndjson1.json
DEBUG    dask_awkward.lib.io.json:json.py:122 columns read from disk: ['points.x', 'points.y']
DEBUG    fsspec.local:local.py:314 open file: /private/var/folders/n2/pt_35rc53tdgkld9531s2tfh0000gn/T/pytest-of-runner/pytest-0/data0/points_ndjson1.json
DEBUG    dask_awkward.lib.io.json:json.py:122 columns read from disk: ['points.x', 'points.y']

</details>

Collaborator Author

@jpivarski jpivarski left a comment

The dak.to_list implementation looks good! I can't check ✔️ approved! because I started this PR.

However, I'd say it's ready to be merged.

src/dask_awkward/lib/describe.py (resolved)
Comment on lines 69 to 70
print(f"{ak.to_list(daa1.distance(daa2)) = }")
print(f"{ak.to_list(caa1.distance(caa2)) = }")
Collaborator Author

This was here to diagnose the test failure.

But now it looks like the test passed just by being run a second time. (Usually not a good thing!)

@martindurant martindurant merged commit ff7968f into main Apr 12, 2024
26 checks passed
@martindurant martindurant deleted the jpivarski/add-backend-function branch April 12, 2024 15:08
@martindurant
Collaborator

(we may come back to check on that array comparison, maybe floating point inaccuracy)
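
If the comparison is revisited, one way to separate the two explanations raised in this thread (rounding error versus radically different contents) is to check how far apart the values are. The reprs in the captured output show `a_comp` as zeros and `b_comp` with values such as 7.21. The sketch below uses only the standard library. `nested_close` is a hypothetical helper, and its tolerances are illustrative, not the ones `ak.almost_equal` uses.

```python
import math


def nested_close(a, b, rel_tol=1e-7, abs_tol=1e-8):
    """Recursively compare nested lists of floats within a tolerance."""
    if isinstance(a, list) or isinstance(b, list):
        return (
            isinstance(a, list)
            and isinstance(b, list)
            and len(a) == len(b)
            and all(nested_close(x, y, rel_tol, abs_tol) for x, y in zip(a, b))
        )
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


# Illustrative values in the style of the b_comp repr above.
expected = [[7.21, 5.83, 5.83], [], [5.0, 4.47, 7.0]]

# A last-bit rounding difference stays well inside the tolerance.
nudged = [[7.21 + 1e-15, 5.83, 5.83], [], [5.0, 4.47, 7.0]]
rounding_ok = nested_close(nudged, expected)

# All-zero values, in the style of the a_comp repr, do not.
zeros = [[0.0, 0.0, 0.0], [], [0.0, 0.0, 0.0]]
zeros_ok = nested_close(zeros, expected)

print(rounding_ok, zeros_ok)
```

A mismatch of the second kind would point toward different contents, not floating-point inaccuracy.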


Unlike most functions, this one requires a compute() of the data.
"""
return array.compute().to_list()
Collaborator

@jpivarski I wonder, should this be lazy and return a bag of lists?

Collaborator

We chatted in the meeting and thought not - this is only likely to be used in testing for small data.

Collaborator Author

Yeah, if I called this function, I wouldn't be interested in lazy data. The use-case that came up here was that I wanted to debug a test failure.
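
The eager design settled on above can be sketched without any dependencies. `LazyStandIn` and `EagerStandIn` are hypothetical classes that only play the roles of a lazy collection and an in-memory array. They are not the dask-awkward or Awkward Array types, and only the body of `to_list` mirrors the line under review.

```python
from typing import Any


class EagerStandIn:
    """Hypothetical in-memory array (plays the role of an ak.Array)."""

    def __init__(self, data: list[Any]) -> None:
        self._data = data

    def to_list(self) -> list[Any]:
        return list(self._data)


class LazyStandIn:
    """Hypothetical lazy collection (plays the role of a dak.Array)."""

    def __init__(self, partitions: list[list[Any]]) -> None:
        self._partitions = partitions
        self.compute_calls = 0

    def compute(self) -> EagerStandIn:
        # Materialize every partition into one in-memory object.
        self.compute_calls += 1
        flat = [item for part in self._partitions for item in part]
        return EagerStandIn(flat)


def to_list(array: LazyStandIn) -> list[Any]:
    # Same shape as the reviewed line: compute first, then convert.
    return array.compute().to_list()


lazy = LazyStandIn([[[7.21, 5.83]], [[]], [[5.0, 4.47, 7.0]]])
result = to_list(lazy)
print(result)
```

The caller gets a plain Python list back immediately, which suits the debugging use case described above, at the cost of materializing all partitions.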

Development

Successfully merging this pull request may close these issues.

Implement dak.backend as an overload of ak.backend
3 participants