
Howto: Synchronization

Andreas edited this page Nov 13, 2019 · 1 revision

What is synchronization?

Synchronization here refers to the systematic comparison between different implementations of what is nominally the same physics analysis.

Why do we need it?

The main motivation is usually that multiple groups are working together and need to agree on consistent results. Beyond this practical point, one can argue that synchronization between independent implementations is the most effective way of avoiding bugs (although it is clearly not fail-safe).

How to do it?

In principle it's simple:

  1. Run your analysis over an agreed-upon sample.

  2. For each cut you apply, record how many events pass it.

  3. Make a cut flow table showing these numbers.

In this framework, remember that analysis regions are defined as a list of cuts. Therefore, step 2 can easily be implemented by looping over the individual cuts, as done in the hinv processor and identically in the monojet processor. For step 3, there exists a simple script to translate the cutflows saved in the coffea output file into a nice table.
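Steps 2 and 3 can be sketched in a few lines. This is a minimal, framework-independent illustration and not the code used in the processors: the event fields, the cut names, and the helpers `cutflow` and `format_table` are all hypothetical, and events are plain dictionaries rather than coffea arrays.

```python
def cutflow(events, cuts):
    """Count the events surviving each cut, applied cumulatively in list order.

    events: list of dicts, one per event (hypothetical toy format).
    cuts:   list of (name, predicate) pairs defining the region.
    """
    counts = {"all": len(events)}
    surviving = events
    for name, passes in cuts:
        surviving = [ev for ev in surviving if passes(ev)]
        counts[name] = len(surviving)
    return counts


def format_table(counts):
    """Render the cut flow as a plain-text table, one row per cut."""
    width = max(len(name) for name in counts)
    lines = [f"{'cut':<{width}}  events"]
    for name, n in counts.items():
        lines.append(f"{name:<{width}}  {n:>6d}")
    return "\n".join(lines)


# Toy events and a toy region definition (assumed values, for illustration only)
events = [
    {"met": 300.0, "njet": 1, "trigger": True},
    {"met": 120.0, "njet": 2, "trigger": True},
    {"met": 260.0, "njet": 0, "trigger": True},
    {"met": 400.0, "njet": 3, "trigger": False},
]
region = [
    ("trigger", lambda ev: ev["trigger"]),
    ("met>250", lambda ev: ev["met"] > 250.0),
    ("njet>=1", lambda ev: ev["njet"] >= 1),
]

counts = cutflow(events, region)
print(format_table(counts))
```

Because each row counts the events that pass all cuts up to and including that one, two groups can compare their tables row by row and see at which cut their numbers start to diverge.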

Common issues and how to deal with them

The cut flow I get has the cuts in the wrong order

The cuts are printed in the same order as they are defined in the region definition. Therefore, you can simply change the region definition order (here or here) to fix this.
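The effect of the ordering can be seen in a small toy example. It is only a sketch under assumed names (`cutflow`, the event fields, and the two cuts are hypothetical); the point is that reordering the list changes the intermediate rows of a cumulative cut flow, while the final count stays the same.

```python
def cutflow(events, cuts):
    """Cumulative cut flow as a list of (cut name, surviving events)."""
    rows, surviving = [], events
    for name, passes in cuts:
        surviving = [ev for ev in surviving if passes(ev)]
        rows.append((name, len(surviving)))
    return rows


# Toy events (assumed values, for illustration only)
events = [
    {"met": 300.0, "njet": 1},
    {"met": 120.0, "njet": 2},
    {"met": 260.0, "njet": 0},
    {"met": 100.0, "njet": 3},
]
cut_met = ("met>250", lambda ev: ev["met"] > 250.0)
cut_jet = ("njet>=1", lambda ev: ev["njet"] >= 1)

# Same cuts, two different orders in the region definition
flow_a = cutflow(events, [cut_met, cut_jet])
flow_b = cutflow(events, [cut_jet, cut_met])

for row_a, row_b in zip(flow_a, flow_b):
    print(row_a, row_b)
```

When comparing with another group, it is therefore worth agreeing on the cut order first; otherwise only the last row of the two tables is directly comparable.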

I want to do something special for this round of synchronization, like removing the trigger selection

There are two ways of doing this. The first is to temporarily make the change in your working copy of the repository, run the processor, and afterwards either discard the changes or store them in a separate branch. This works, but it becomes cumbersome if you need to redo the sync repeatedly over a period of time. The more efficient and more traceable option is to add a configuration setting that activates a synchronization mode, and to implement the desired change in the Python code so that it only takes effect when the sync flag is set. This is already done e.g. here, where no triggers are checked if the sync flag is on.
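As a rough sketch of the configuration-based approach, the trigger requirement can be bypassed when a sync flag is set. The names below (`trigger_mask`, the `sync` key, the `fired_trigger` field) are assumptions made for this illustration; the actual configuration layout and processor code may look different.

```python
def trigger_mask(events, cfg):
    """Return one boolean per event: does it pass the trigger requirement?

    cfg is a hypothetical configuration dict. With cfg["sync"] set, the
    trigger is not checked and every event passes.
    """
    if cfg.get("sync", False):
        return [True] * len(events)
    return [bool(ev["fired_trigger"]) for ev in events]


# Toy events (assumed values, for illustration only)
events = [
    {"fired_trigger": True},
    {"fired_trigger": False},
    {"fired_trigger": True},
]

nominal = trigger_mask(events, {"sync": False})
sync = trigger_mask(events, {"sync": True})
print(sum(nominal), sum(sync))
```

Keeping the trigger as a cut that lets everything through, rather than removing it from the region definition, means the cut flow keeps the same rows in both modes, which makes tables from different rounds easier to compare.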

I want to avoid any prefiltering

The NanoAOD files we use as input to the processors are preprocessed using nanoaod-tools. In this preprocessing, events are kept only if they fire one of our triggers of interest and have some minimal interesting event content (leptons, high MET, or something similar). Therefore, if you run the synchronization on top of these files, your selection efficiencies and cut flow results will differ from those you would get on the same dataset without prefiltering. To deal with this, we also keep preprocessed copies of the synchronization datasets without prefiltering. The available datasets without prefiltering are listed in the table below.

Skim version   Location   EOS path
21Aug19        lxplus     /eos/cms/store/group/phys_exotica/monojet/aalbert/nanopost/sync_21Aug19/
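To illustrate why the unfiltered copies matter, the toy example below runs the same cut flow on a full sample and on a prefiltered copy of it. Everything in it is hypothetical (the prefilter condition, the thresholds, the event fields); it is not the selection applied by nanoaod-tools.

```python
def cutflow(events, cuts):
    """Cumulative cut flow as a list of (cut name, surviving events)."""
    rows, surviving = [("all", len(events))], events
    for name, passes in cuts:
        surviving = [ev for ev in surviving if passes(ev)]
        rows.append((name, len(surviving)))
    return rows


# Toy sample (assumed values, for illustration only)
events = [
    {"met": 300.0, "trigger": True},
    {"met": 50.0, "trigger": True},
    {"met": 260.0, "trigger": False},
    {"met": 150.0, "trigger": True},
    {"met": 80.0, "trigger": False},
]
region = [
    ("trigger", lambda ev: ev["trigger"]),
    ("met>250", lambda ev: ev["met"] > 250.0),
]

# Hypothetical prefilter: trigger fired and some moderate MET
skimmed = [ev for ev in events if ev["trigger"] and ev["met"] > 100.0]

full = cutflow(events, region)
skim = cutflow(skimmed, region)

# Overall efficiency = events after the last cut / events before any cut
eff_full = full[-1][1] / full[0][1]
eff_skim = skim[-1][1] / skim[0][1]
print(full, eff_full)
print(skim, eff_skim)
```

In this toy case the final event count agrees, because the prefilter is looser than the full selection, but the starting count, the intermediate rows, and the overall efficiency do not. Two groups comparing cut flows on different inputs would see disagreement that has nothing to do with their analysis code.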