Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: New features to PackedSelection #797

Merged
merged 43 commits into from
Jun 27, 2023
Merged

Conversation

ikrommyd
Copy link
Contributor

Starting a pull request to merge the new features of PackedSelection intto the master branch.
This isn't 100% complete yet as slightly more testing and actual writing of some more unittests is required

@ikrommyd ikrommyd changed the title Cutflow New features to PackedSelection Apr 20, 2023
@lgray
Copy link
Collaborator

lgray commented Apr 20, 2023

Small comment - for better granularity can you make the tests a bit shorter and more focused? Right now one test will fail and we won't know what's wrong without digging in!

@ikrommyd
Copy link
Contributor Author

Small comment - for better granularity can you make the tests a bit shorter and more focused? Right now one test will fail and we won't know what's wrong without digging in!

In bd37f9f, I split the tests into "basic", "nminusone" and "cutflow" tests.

@lgray
Copy link
Collaborator

lgray commented May 6, 2023

@iasonkrom just so the tests for your stuff are really exercising a realistic workflow (i.e. reading from a file), as we have found that to be a sticking point. Could you please add some tests that do so, so we aren't blindsided in the future? It's OK if the tests fail for, you can mark them with a pytest.xfail or test for the exception/failure to happen.

@ikrommyd
Copy link
Contributor Author

ikrommyd commented May 6, 2023

@lgray the "nminusone" and "cutflow" tests do use the nano_dy samples which I read once at the top of the test file to avoid reading in every test function. However for the "dak" version of the tests I currently use from_awkward because of the issues in dask-contrib/dask-awkward#256. I could convert those to xfail tests if you want.

@lgray
Copy link
Collaborator

lgray commented May 6, 2023

Yes, please do - the from_awkward to get your data is quite unrealistic as we discussed.

@lgray lgray changed the title New features to PackedSelection feat: New features to PackedSelection May 14, 2023
@lgray
Copy link
Collaborator

lgray commented Jun 3, 2023

@iasonkrom can you rebase this to master?

@ikrommyd ikrommyd closed this Jun 3, 2023
@ikrommyd ikrommyd force-pushed the cutflow branch 2 times, most recently from 9d8b14b to 3aaddc5 Compare June 3, 2023 23:53
@lgray
Copy link
Collaborator

lgray commented Jun 4, 2023

You are having a bad problem and will not go to space today.

@ikrommyd
Copy link
Contributor Author

ikrommyd commented Jun 4, 2023

@lgray Yeah I accidentally pressed "discard commits" on github and this closed the PR automatically because this synced my branch with master. Is it possible to reopen it?

The bin labels of the y axis of the histograms.
"""
for name, var in vars.items():
if len(var) != len(self._masksonecut[0]):
Copy link
Collaborator

@lgray lgray Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto about partitions as opposed to checking length of arrays.

src/coffea/analysis_tools.py Outdated Show resolved Hide resolved
@lgray
Copy link
Collaborator

lgray commented Jun 26, 2023

The tutorial notebook is comprehensive, nice! :-)

@ikrommyd
Copy link
Contributor Author

@lgray Any idea how to get raising incompatible partitions to work since compatible_partitions doesn't detect incompatiability if var is sliced as I told you in slack dms?

        if self._delayed_mode:
            for name, var in vars.items():
                if not compatible_partitions(var, self._masks[0]):
                    raise IncompatiblePartitions("plot_vars", var, self._masks[0])

@ikrommyd
Copy link
Contributor Author

And I can't find I way to get rid of this last len call that I want to avoid.

            h.fill(
                dask_awkward.from_awkward(awkward.Array([0]), 1),
                weight=dask_awkward.from_awkward(awkward.Array([len(weight)]), 1),
            )

I basically just want to fill a delayed histogram with zeros, len(weight) times but the layouts just don't work if I just do dak.count

@lgray
Copy link
Collaborator

lgray commented Jun 27, 2023

what's the error you get if you try to do dak.count ?

@lgray
Copy link
Collaborator

lgray commented Jun 27, 2023

Also why doesn't h.fill(dak.zeros_like(weight), weight=1.0) work?

@lgray
Copy link
Collaborator

lgray commented Jun 27, 2023

for incompatible partitions can you print out the dask arrays you're getting at that stage, since we know it fails later when trying to apply a mask for exactly the same reason.

@ikrommyd
Copy link
Contributor Author

h.fill(dak.zeros_like(weight), weight=1.0)

yup this works. I'm just stupid and forgot about it.

@ikrommyd
Copy link
Contributor Author

ikrommyd commented Jun 27, 2023

for incompatible partitions can you print out the dask arrays you're getting at that stage, since we know it fails later when trying to apply a mask for exactly the same reason.

not anymore, because this was just replaced by the h.fill(dak.zeros_like(weight), weight=1.0). This was the line that was raising the error at _getitem_outer_bool_or_int_lazy_array

@lgray
Copy link
Collaborator

lgray commented Jun 27, 2023

oh, fun.

@lgray
Copy link
Collaborator

lgray commented Jun 27, 2023

that cannot possibly be that line does not fill anything with a mask?

@ikrommyd
Copy link
Contributor Author

wait maybe I'm just missing something. It's just that the only test failure was "did not raise incmpatible error" after I changed this line.

@lgray
Copy link
Collaborator

lgray commented Jun 27, 2023

Honestly for this I'm willing to save it for a follow-up PR to give some time to think about it correctly.

@lgray
Copy link
Collaborator

lgray commented Jun 27, 2023

Oh, and I mean for the case that should raise, when it is testing for partition compatibility can you just print out the dask_awkward.Array's and we take a look at them?

@ikrommyd
Copy link
Contributor Author

Oh you mean to find the bug? If yes then I'll create a reproducer.
What do I do with pytests though that now fail due to not raising incompatible partitions? Just skip?

@lgray
Copy link
Collaborator

lgray commented Jun 27, 2023

yeah just comment out that part of the test for now and let's take our time to do it the right way.

@ikrommyd
Copy link
Contributor Author

I think I finalized this for now. I believe you can just check today's changes and just merge.

@lgray lgray enabled auto-merge June 27, 2023 01:02
@lgray lgray merged commit 30f4619 into CoffeaTeam:master Jun 27, 2023
@ikrommyd ikrommyd deleted the cutflow branch September 6, 2023 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants