Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Awkward v2 transition #736

Merged
merged 97 commits into from Apr 6, 2023
Merged

Awkward v2 transition #736

merged 97 commits into from Apr 6, 2023

Conversation

lgray
Copy link
Collaborator

@lgray lgray commented Oct 25, 2022

  • Get all tests working again regardless of laziness in nanoevents (I should break this out into sub-bullets)

  • base schema passes tests

  • nanoaod schema passes tests

  • protodune schema passes tests

  • physlite schema passes tests

  • delphes schema passes tests

  • treemaker schema passes tests

  • local executor tests pass

  • dask executor passes tests

  • parsl executor passes tests

  • spark executor passes tests (requires spark upgrade and new data ingestion step!)

  • lookup tools passes tests

  • jetmet tools - individual correctors

  • jetmet tools - CorrectedJetsFactory (heavily lazy)

  • the above, again, except with laziness implemented in dask_awkward

@lgray lgray force-pushed the awkward2_dev branch 2 times, most recently from f6d6d30 to 18765f6 Compare November 4, 2022 21:59
@lgray
Copy link
Collaborator Author

lgray commented Nov 6, 2022

@kratsg hey - what does a form key that looks like:

Gettting ( ad4cd5ec-123e-11ec-92f6-93e3aac0beef/%2FDelphes%3B1/0-25/data/GenJet%2FGenJet.SoftDroppedJet%2C%21load%2C%21content%2C%40instance_version%2C%21item ) : ad4cd5ec-123e-11ec-92f6-93e3aac0beef /Delphes;1 0 25 ['GenJet/GenJet.SoftDroppedJet', '!load', '!content', '@instance_version', '!item']

mean? In particular the @instance_version token? The internal structure of RecordArray forms changed quite a bit in awkwardv2 and nanoevents makes quite a few assumptions.

@kratsg
Copy link
Contributor

kratsg commented Nov 7, 2022

In particular the @instance_version token? The internal structure of RecordArray forms changed quite a bit in awkwardv2 and nanoevents makes quite a few assumptions

Checking, but I don't recall an instance_version being added...

@lgray
Copy link
Collaborator Author

lgray commented Nov 7, 2022

Gotcha - things broke in interesting ways - there's only so much of this code nick/myself actually own and know how to maintain! Thanks for looking.

@lgray lgray force-pushed the awkward2_dev branch 2 times, most recently from 3147db7 to 3927dd3 Compare December 7, 2022 20:33
@lgray
Copy link
Collaborator Author

lgray commented Dec 7, 2022

@gordonwatts You should be aware that things have changed a lot with forms under the hood in awkward2. I have gotten your tests to pass again for the auto schema but there is not much confirmation that they work beyond that. @'ing you since it would be worth your time to work the rest of this out.

I'm not sure if the servicex tests are awkward2 compatible yet either.

@lgray
Copy link
Collaborator Author

lgray commented Dec 7, 2022

@tejinc awkward2 brings quite some changes to the inner workings of coffea. I have gotten your nanoevents schema (pdune) to pass tests but you may want to check it to ensure proper functionality is kept.

@lgray
Copy link
Collaborator Author

lgray commented Dec 7, 2022

@kpedro88 awkward2 brings quite some changes to the inner workings of coffea. I have gotten your nanoevents schema (treemaker) to pass your tests but you may want to check it to ensure proper functionality is kept.

@kpedro88
Copy link
Contributor

kpedro88 commented Dec 7, 2022

@kpedro88 awkward2 brings quite some changes to the inner workings of coffea. I have gotten your nanoevents schema (treemaker) to pass your tests but you may want to check it to ensure proper functionality is kept.

@yimuchen @cmadrid1 @Keane-Tan FYI

@lgray
Copy link
Collaborator Author

lgray commented Dec 8, 2022

@nikoladze awkward2 brings quite some changes to the inner workings of coffea. I have gotten your nanoevents schema (physlite) to pass your tests but you may want to check it to ensure proper functionality is kept.

@lgray
Copy link
Collaborator Author

lgray commented Dec 8, 2022

@kratsg awkward2 brings quite some changes to the inner workings of coffea. I have gotten your nanoevents schema (delphes) to pass your tests but you may want to check it to ensure proper functionality is kept.

@kratsg
Copy link
Contributor

kratsg commented Dec 8, 2022

@kratsg awkward2 brings quite some changes to the inner workings of coffea. I have gotten your nanoevents schema (delphes) to pass your tests but you may want to check it to ensure proper functionality is kept.

Checked it and it seemed ok out of the box? It at least works for me for the quick tests I ran so I think it's ok.

EDIT: works on some other older Delphes files I had too that crashed with coffea before... so I think the changes are going to give us better support(?!). Thanks for all the work!

@yimuchen
Copy link
Contributor

TreeMakerSchema looks good. But found another bug when loading in a large number events:

events = NanoEventsFactory.from_root("file:mytestfile.root", entry_start=0, entry_stop=100, schemaclass=TreeMakerSchema, treepath='TreeMaker2/PreSelection').events() # This works fine
events = NanoEventsFactory.from_root("file:mytestfile.root", entry_start=0, entry_stop=1000, schemaclass=TreeMakerSchema, treepath='TreeMaker2/PreSelection').events() # This errors

Traceback dump:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "coffeaenv/lib/python3.8/site-packages/coffea/nanoevents/factory.py", line 399, in events
    events = awkward.from_buffers(
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 76, in from_buffers
    return _impl(
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 121, in _impl
    out = reconstitute(form, length, container, getkey, backend, simplify)
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 296, in reconstitute
    contents = [
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 297, in <listcomp>
    reconstitute(content, length, container, getkey, backend, simplify)
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 274, in reconstitute
    content = reconstitute(
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 296, in reconstitute
    contents = [
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 297, in <listcomp>
    reconstitute(content, length, container, getkey, backend, simplify)
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 274, in reconstitute
    content = reconstitute(
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 296, in reconstitute
    contents = [
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 297, in <listcomp>
    reconstitute(content, length, container, getkey, backend, simplify)
  File "coffeaenv/lib/python3.8/site-packages/awkward/operations/ak_from_buffers.py", line 148, in reconstitute
    data = backend.nplike.frombuffer(raw_array, dtype=dtype, count=real_length)
  File "coffeaenv/lib/python3.8/site-packages/awkward/_nplikes.py", line 115, in frombuffer
    return self._module.frombuffer(*args, **kwargs)
ValueError: buffer is smaller than requested size

@lgray
Copy link
Collaborator Author

lgray commented Dec 12, 2022

@yimuchen thanks! what's the size of the file you're testing against?

@yimuchen
Copy link
Contributor

yimuchen commented Dec 12, 2022

@yimuchen thanks! what's the size of the file you're testing against?

This is a 2GB file containing ~60K events. So about 35KB (?) per event.

coffea/nanoevents/methods/candidate.py Outdated Show resolved Hide resolved
coffea/lookup_tools/lookup_base.py Outdated Show resolved Hide resolved
coffea/lookup_tools/lookup_base.py Outdated Show resolved Hide resolved
@lgray
Copy link
Collaborator Author

lgray commented Dec 12, 2022

@yimuchen can you try seeing if it's a particular event that causes problems, or if it's just really the size of the event range you want to pick up?

@lgray lgray force-pushed the awkward2_dev branch 3 times, most recently from 2103840 to 2bde16c Compare December 12, 2022 21:52
@lgray
Copy link
Collaborator Author

lgray commented Dec 12, 2022

Note to self: python 3.11 is a limitation of at least numba

@lgray lgray force-pushed the awkward2_dev branch 2 times, most recently from f347ada to 38dd522 Compare December 13, 2022 01:02
@yimuchen
Copy link
Contributor

@yimuchen can you try seeing if it's a particular event that causes problems, or if it's just really the size of the event range you want to pick up?

Tried a few points, it looks like if the event size fails for some slice size (seems fail at about ~400 events in this case), it would consistently fail. So it isn't a case of a single very large event.

@lgray
Copy link
Collaborator Author

lgray commented Dec 13, 2022

@yimuchen so it's really just the absolute size of the slice? or where you are in the file? or both? It may be useful to scan through the file 100 events at a time to make sure there's not some pattern that's bugging out.

form["contents"][field], prefix + f",{field},!item"
)
elif field.startswith("@"):
# workaround uproot5 bug
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug is fixed in scikit-hep/uproot5#808

@lgray
Copy link
Collaborator Author

lgray commented Dec 28, 2022

Tests will be broken for a bit as I get things daskified.

@lgray
Copy link
Collaborator Author

lgray commented Mar 16, 2023

@nikoladze I could use your help in daskifying PHYSLITESchema. Something is going wrong in zipping the forms but everything looks more or less OK.

I think if that can be fixed up it'll more or less work. You'll need to use uproot main to get it to run in the first place.

@lgray
Copy link
Collaborator Author

lgray commented Mar 16, 2023

@kratsg I think I did this correctly. Can you please check the usage of NanoEvents Delphes with dask_awkward? I have no idea what correct use really looks like.

@nikoladze
Copy link
Contributor

@nikoladze I could use your help in daskifying PHYSLITESchema. Something is going wrong in zipping the forms but everything looks more or less OK.

I think if that can be fixed up it'll more or less work. You'll need to use uproot main to get it to run in the first place.

Thanks a lot for putting all this effort into this! See #776 for a fix for the form zipping.

@lgray
Copy link
Collaborator Author

lgray commented Mar 19, 2023

@nikoladze there's still some problems in the rendered arrays with the trackParticleLinks - look at the test failures. :-/

Ah - I see it's passing around some dask_awkward stuff within all your offsets functions. I'll get to it in time but it's a fair amount of rather specific code, it would be best for you to figure out where to put in the map_partitions calls and such.

Basically the big change with dask_awkward is that we can only use the factory-reference-through-behavior trick on the client side, so your arrays have to render to something that doesn't require reindexing later.

@lgray
Copy link
Collaborator Author

lgray commented Mar 25, 2023

These latest commits require to be-released features in dask-awkward.

@lgray
Copy link
Collaborator Author

lgray commented Apr 6, 2023

OK so what we're gonna do is merge this puppy and start release candidates. There is a very short list of things that need doing to get a full release on the coffea side.

@nikoladze @kratsg I will need your help to fix your schemas.

Parquet nanoevents will have to wait for a better implementation of form-remapping in dask-awkward (which should also tidy up some code in uproot). @martindurant @douglasdavis

I will reformulate the task list from this PR into the remaining tasks that need doing, and release an RC for people to start exploring what's new.

@lgray
Copy link
Collaborator Author

lgray commented Apr 6, 2023

@btovar @dthain we should talk about workqueue + dask = taskvine? as well, times are a-changin'.

@lgray lgray merged commit 966c94d into master Apr 6, 2023
13 checks passed
@lgray lgray deleted the awkward2_dev branch June 26, 2023 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants