TailCut tree no longer accessible if make_minitrees false #133

Closed
coderdj opened this issue Aug 16, 2017 · 2 comments

Comments

@coderdj
Contributor

coderdj commented Aug 16, 2017

This is a more general problem affecting all never_store=True minitrees. If you have make_minitrees = False, which many people do because it lets them use only the default set of trees, hax will never load such a minitree.

My first reaction was to fix this line https://github.com/XENON1T/hax/blob/v1.6.1/hax/minitrees.py#L317 by adding logic that checks whether the tree has never_store == True and, if so, makes it even when make_minitrees == False.

But this didn't actually work because TailCut depends on other minitrees. If they don't exist and make_minitrees is false, it will simply crash hard.

Any ideas? My initial thought is to change the line as I had planned and then somehow make the crash more graceful if the dependency trees aren't available.

@JelleAalbers
Contributor

If make_minitrees=False, load_single_minitree will raise NoMinitreeAvailable if it misses a minitree, which load_single_dataset will catch and return an empty dataframe instead (and finally the merging in load() ensures no events from this dataset end up in the final result).
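The flow described above can be sketched roughly as follows. This is a minimal stand-alone model, not the hax implementation: the function names and `NoMinitreeAvailable` come from the comment, but their bodies, the `CACHE` dictionary, and the use of plain lists of dicts in place of dataframes are assumptions made for illustration.

```python
class NoMinitreeAvailable(Exception):
    """Stand-in for the hax exception of the same name."""


# Hypothetical minitree cache: (dataset, treemaker) -> rows.
# A list of dicts stands in for a dataframe here.
CACHE = {
    ("run_a", "Basics"): [{"event": 0}, {"event": 1}],
}


def load_single_minitree(dataset, treemaker, make_minitrees=False):
    """Return the cached minitree, or raise if it is missing and may not be built."""
    try:
        return CACHE[(dataset, treemaker)]
    except KeyError:
        if not make_minitrees:
            raise NoMinitreeAvailable(f"{treemaker} missing for {dataset}")
        # Placeholder for actually building the minitree (not modelled here).
        rows = [{"event": 0}]
        CACHE[(dataset, treemaker)] = rows
        return rows


def load_single_dataset(dataset, treemakers, make_minitrees=False):
    """Load all requested minitrees for one dataset; empty result if any is missing."""
    try:
        trees = [load_single_minitree(dataset, t, make_minitrees) for t in treemakers]
    except NoMinitreeAvailable:
        return []  # the "empty dataframe" case
    merged = []
    for rows in zip(*trees):  # combine the per-event rows of each minitree
        combined = {}
        for row in rows:
            combined.update(row)
        merged.append(combined)
    return merged


def load(datasets, treemakers, make_minitrees=False):
    """Concatenate datasets; a dataset with no rows contributes nothing."""
    result = []
    for dataset in datasets:
        rows = load_single_dataset(dataset, treemakers, make_minitrees)
        result.extend({"dataset": dataset, **row} for row in rows)
    return result
```

In this model, `load(["run_a", "run_b"], ["Basics"])` returns only the `run_a` events: `run_b` has no cached minitree, so it drops out without any error reaching the caller.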

I'm guessing the tail cut code (which uses load_single_dataset) isn't robust to an empty dataframe, so it will crash in a funny way. Is this what you're seeing?

We could change load_single_dataset to load_single_minitree (letting the exception bubble up to load_single_dataset that loaded TailCut, which knows what to do), or check for an empty dataframe and reraise the exception. Probably the first solution is best, since there might be other pathologies that could cause an empty dataframe (e.g. an empty run).
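The difference between the two options can be illustrated with a small sketch. Everything here is hypothetical: the `CACHE` contents, the `s1` field, and the `passes` cut are placeholders, and the loaders are simplified stand-ins rather than the hax functions. The point is only how each option treats a run that is legitimately empty.

```python
class NoMinitreeAvailable(Exception):
    """Stand-in for the hax exception of the same name."""


# Hypothetical cache; "empty_run" exists but contains no events.
CACHE = {
    ("run_a", "Basics"): [{"event": 0, "s1": 10.0}],
    ("empty_run", "Basics"): [],
}


def load_single_minitree(dataset, treemaker):
    """Raise if the minitree is missing (the make_minitrees=False behaviour)."""
    if (dataset, treemaker) not in CACHE:
        raise NoMinitreeAvailable(f"{treemaker} missing for {dataset}")
    return CACHE[(dataset, treemaker)]


def load_single_dataset(dataset, treemaker):
    """Swallow the exception and return an empty result instead."""
    try:
        return load_single_minitree(dataset, treemaker)
    except NoMinitreeAvailable:
        return []


def tail_cut_option_1(dataset):
    """Option 1: load the dependency directly and let the exception bubble up."""
    basics = load_single_minitree(dataset, "Basics")
    # Placeholder computation; the real tail cut is not modelled.
    return [{"event": r["event"], "passes": r["s1"] > 5.0} for r in basics]


def tail_cut_option_2(dataset):
    """Option 2: detect an empty result and re-raise."""
    basics = load_single_dataset(dataset, "Basics")
    if not basics:
        # Cannot distinguish a missing minitree from a genuinely empty run.
        raise NoMinitreeAvailable(f"Basics empty or missing for {dataset}")
    return [{"event": r["event"], "passes": r["s1"] > 5.0} for r in basics]
```

Under these assumptions both options raise for a dataset with no cached minitree, but only option 2 also raises for `empty_run`, which is the pathology mentioned above.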

(By the way, running with make_minitrees=False is a bit dangerous, as it will silently skip datasets (if you have debug messages disabled), so live-time and rate calculations can be affected.)

@coderdj
Contributor Author

coderdj commented Aug 17, 2017

In my update (#134) I changed the TailCut code to return an empty dataframe if it's unable to access the dependent minitrees. I also added a single line that allows building 'never_store' trees even when make_minitrees is False. Is this reasonable?
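A rough sketch of the two changes described here, under stated assumptions: the class names, `may_build`, and `build_tail_cut` are hypothetical and do not reproduce the code in #134, and a list of dicts again stands in for a dataframe.

```python
class NoMinitreeAvailable(Exception):
    """Stand-in for the hax exception of the same name."""


class TreeMaker:
    """Hypothetical base class; ordinary minitrees are stored on disk."""
    never_store = False


class TailCutLike(TreeMaker):
    """Hypothetical treemaker whose output is always recomputed, never stored."""
    never_store = True


def may_build(treemaker, make_minitrees):
    """Allow building if building is enabled globally or the tree is never stored."""
    return make_minitrees or treemaker.never_store


def build_tail_cut(load_dependency, dataset):
    """Return an empty result when the dependency minitree cannot be loaded."""
    try:
        basics = load_dependency(dataset)
    except NoMinitreeAvailable:
        return []  # the "empty dataframe" fallback
    # Placeholder for the real computation.
    return [{"event": row["event"]} for row in basics]
```

With `make_minitrees=False`, `may_build(TailCutLike, False)` is true while `may_build(TreeMaker, False)` is not, so only the never-stored tree is built, and it degrades to an empty result if its dependencies are missing.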

I agree running with make_minitrees=False is sometimes dangerous. But I think most people use it. The reason is that we have this full default set of minitrees on midway and a few fail due to corrupt files, missing raw data, etc., and people don't want to deal with that every time they want to start an analysis.
