Material for unfolding tutorial #862

Merged
merged 30 commits into cms-analysis:main on Feb 7, 2024

Conversation

anigamova

Hands-on session in the afternoon, should fit in 90-120 minutes


@kcormi kcormi left a comment


Thanks Aliya! This is really nice.

I wonder if it's worth, at the beginning, suggesting that newer users might also want to check out the original long exercise to get a better idea of basic Combine usage?

I haven't worked through all of the commands yet (on the train), so I may have some more comments later as well.

I've added a bunch for now, but they are mostly small things; I think this will be really helpful to have.

I wonder if at the end we might also want a small section mentioning regularization, noting that details on regularization can be found in some of the other documentation pages if necessary, though in general the recommendation is to avoid it where possible?

Thanks a lot!


When constructing the reco-level for any differential analysis the main goal is to match the gen-level bins as closely as possible. In the simplest case it can be done with the cut-based approach, i.e. applying the selection on the corresponding reco-level variables: $p_{T}(Z)$ and $n_{\text{add. jets}}$. Due to the good lepton $p_{T}$ resolution we can follow the original STXS scheme quite closely with the reco-level selection, with one exception: it is not possible to access the very-low transverse-momentum bin $p_{T}(Z)<75$ GeV.

In the `counting/regions` directory you can find the datacards with for five reco-level categories, each targeting a corresponding gen-level bin. Below you can find an example of the datacard for the reco-level bin with $p_{T}(Z)>400$ GeV.

"with for five" --> "with five"


In the repo, I see the directory counting/ but it doesn't have a regions subdirectory, only a combined_ratesOnly.txt datacard.



Maybe it would be good to have them create a workspace from this datacard and run a simple fit (perhaps, e.g., a FitDiagnostics fit to get the signal strength) on a single region, with the very simple datacard.
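For instance, a minimal sketch of what that could look like (the single-region datacard name below is only illustrative, not an actual file in the repository):

```shell
# build a workspace from one single-bin datacard (file name is hypothetical)
text2workspace.py -m 125 counting/regions/vhbb_Zmm_gt400_13TeV.txt -o ws_single.root
# run a simple signal-strength fit on the Asimov dataset
combine -M FitDiagnostics ws_single.root -m 125 -t -1 --expectSignal 1 -n .SingleRegion
```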

Then maybe add another heading before moving on to the full datacard and the migration matrix?


One of the most important stages in the analysis design is to make sure that the reco-level categories are pure in the corresponding gen-level processes.

To explicitly check it, one can plot the contributions of gen-level bins in all of the reco-level bins. We propose to use the script provided on the tutorial GitLab page.

Maybe we could add a couple of words about what this script does. e.g.

"This script uses combine harvester to loop over detector level bins, and get the rate at which each of the signal processes (generator-level bins) contributes to that detector-level bin; which is then used to plot the migration matrix."

```shell
text2workspace.py -m 125 counting/* -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO verbose --PO 'map=.*/.*ZH_lep_PTV_75_150_hbb:r_zh_75_150[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_0J_hbb:r_zh_150_250noj[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_GE1J_hbb:r_zh_150_250wj[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_250_400_hbb:r_zh_250_400[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_GT400_hbb:r_zh_gt400[1,-5,5]' -o ws_counting.root
```
where we use `--PO 'map=bin/process:poi[init, min, max]'`. So in the example above a signal POI is assigned to each gen-level bin independent on reco-level bin. This allows to take into account the non-trivial acceptance effects. One can also perform bin-by-bin unfolding using the mapping to the bin names rather that processes, e.g. `'map= vhbb_Zmm_gt400_13TeV/.*:r_reco_zh_gt400[1,-5,5]'`, but this method is not recommended and can be used for tests.

I wonder if it is maybe better not to mention bin-by-bin at all here? Unless it is used as a useful test? I haven't used it that way, but maybe there is a use case I'm not aware of.


Perhaps a couple of words about multiSignalModel would also be a good idea?


To extract the measurement let's run the initial fit first:

Maybe a few more words about MultiDimFit, e.g.

"We will use the MultiDimFit algorithm to simultaneously extract best-fit values and uncertainties on multiple parameters."


And maybe a short word about the initial values "we use --setParameters <param>=<value>,... to ... "
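For reference, a sketch of what such an initial fit command might look like (the POI names follow the `multiSignalModel` mapping above; the exact options used in the tutorial may differ):

```shell
# Asimov fit (-t -1) with all signal POIs initialised to their SM values via --setParameters
combine -M MultiDimFit ws_counting.root -m 125 -t -1 -n .BestFit --saveWorkspace \
    --setParameters r_zh_75_150=1,r_zh_150_250noj=1,r_zh_150_250wj=1,r_zh_250_400=1,r_zh_gt400=1
```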


The datacards for this part of the exercise are located in `full_model_datacards/`, where you can find a separate datacard for each region within the `full_model_datacards/regions` directory and also a combined datacard `full_model_datacards/comb_full_model.txt`.

As you will find the datacards also contain several background processes. To control them properly we will add the regions enriched in the respective backgrounds. Then we can define a common set rate parameters for signal and control regions to scale the rates or other parameters affecting their shape.
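As an illustration of the kind of rate parameter meant here, a datacard line of the following form (the parameter and process names are hypothetical, not taken from the tutorial datacards) introduces a freely floating scale factor for a background in every region where that process appears:

```
# hypothetical example: a floating normalisation for the TT background, shared across all bins
SF_TT  rateParam  *  TT  1.0  [0.1,5.0]
```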

"define a common set rate parameters" --> "define a common set of rate parameters"

Now we can create the workspace using the same `multiSignalModel`:

```shell
text2workspace.py -m 125 full_model_datacards/comb_full_model.txt -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO verbose --PO 'map=.*/.*ZH_lep_PTV_75_150_hbb:r_zh_75_150[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_0J_hbb:r_zh_150_250noj[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_GE1J_hbb:r_zh_150_250wj[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_250_400_hbb:r_zh_250_400[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_GT400_hbb:r_zh_gt400[1,-5,5]' --for-fits --no-wrappers --X-pack-asympows --optimize-si.png-constraints=cms --use-histsum -o ws_full.root
```

I think -simpdf- has morphed into -si.png- in the --optimize-simpdf-constraints flag. It might also be good to explain somewhere a little what some of these flags do? Perhaps we could add a collapsible info box, or similar?


> Following the instructions given earlier, create the workspace and run the initial fit with `-t -1` and set the name `-n .BestFit`.

Since this time the datacards include shape uncertainties, as well as additional categories to improve the background description, the fit might take much longer, but we can submit condor jobs and have results ready to look at in a few minutes.

This is nice, showing how to scale up some aspects. But maybe we should also advertise that it will only work on systems with condor (and perhaps provide an alternative for people running in places without?)
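For example, one common pattern for dispatching Combine fits to condor is the impacts workflow with `combineTool.py` (a sketch; it assumes an HTCondor scheduler is available, and `--job-mode interactive` would run the same fits locally instead):

```shell
# initial fit, then one job per nuisance parameter submitted to condor
combineTool.py -M Impacts -d ws_full.root -m 125 -t -1 --robustFit 1 --doInitialFit
combineTool.py -M Impacts -d ws_full.root -m 125 -t -1 --robustFit 1 --doFits --job-mode condor --task-name zh_impacts
# once the jobs have finished, collect the results and plot them
combineTool.py -M Impacts -d ws_full.root -m 125 -t -1 -o impacts_zh.json
plotImpacts.py -i impacts_zh.json -o impacts_zh
```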

## Unfolded measurements

Now that we studied the NP impacts for each POI, we can finally extract the measurements.
Note, that in this exercise we are skipping couple of checks that have to be done before the unblinding. Namely the goodness of fit test and the post-fit plots of folded observables. Both of these checks were detailed in the previous exercises, you can find the description under the following links.

"skipping couple of checks" --> Maybe something like "we are skipping further checks and validation that you should do on your analysis for the purposes of the tutorial."

```shell
python scripts/make_XSplot.py summary_zh_stxs.json
```
![](figures/stxs_zh.png)

Is there a reason I am not seeing observed values for two of the bins? Maybe it would be good to comment on this briefly.

@anigamova

Thanks @kcormi, I will address your comments soon. But what I've realised, actually, is that it's not completely OK to have some of these plots on the Combine public docs pages, even if the analysis is already public. I'm using only partial Run 2 datacards to be able to fit them within 1-2 hours, so these results are not public or going to be public. I will move these plots to the cms-analysis GitLab.

@kcormi

kcormi commented Sep 26, 2023


Hi @anigamova, what if you generate some pseudodata from the model, and then use that as the input data histogram? Then I think we could make everything public, right? There's no real data being used in that case, but it should work more or less as if it were.

@anigamova

Added the labels for the plots; it should be clear now that they are only intended for this particular tutorial.


There's a set of combine (.txt) datacards which will help you get through the various parts of the exercise. The exercises should help you become familiar with the structure of fitting datacards.

Note that the general recomendation on unfolding in `Combine` are available [here](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/regularisation/), which also includes recommendations on regularisation techniques and when to use it, which is completely is not discussed in this tutorial at all.

"which is completely is not discussed" --> "is not discussed"

In the example given above a signal POI is assigned to each gen-level bin independent on reco-level bin. This allows to take into account the non-trivial acceptance effects. One can also perform bin-by-bin unfolding using the mapping to the bin names rather that processes, e.g. `'map= vhbb_Zmm_gt400_13TeV/.*:r_reco_zh_gt400[1,-5,5]'`, but this method is not recommended and can be used only for tests as another way to ensure that the migration matrix is close to diagonal.

I think you mean "to take into account migrations" rather than "acceptance effects" here, and that will maybe be a bit more clear to the reader.


## Shape analysis with control regions

One of the advantages of the maximum likelihood unfolding is the flexibility to choose the analysis observable and include more information on the event kinematics, consequently improving the analysis sensitivity. This analysis benefits from the shape information of the DNN output trained to differentiate the VH(bb) signal from the SM backgrounds.

The datacards for this part of the exercise are located in `full_model_datacards/`, where you can find a separate datacard for each region within the `full_model_datacards/regions` directory and also a combined datacard `full_model_datacards/comb_full_model.txt`.

As you will find the datacards also contain several background processes. To control them properly we will add the regions enriched in the respective backgrounds. Then we can define a common set of rate parameters for signal and control regions to scale the rates or other parameters affecting their shape.

For the shape datacards one has to specify the mapping of histograms and channels/processes as described below:
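For example, a typical `shapes` line has the following structure (the file name and histogram naming pattern here are placeholders, not the ones used in the tutorial inputs):

```
# shapes  <process>  <channel>  <file>  <nominal histogram pattern>  <systematics histogram pattern>
shapes  *  *  vhbb_shapes.root  $CHANNEL/$PROCESS  $CHANNEL/$PROCESS_$SYSTEMATIC
```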

we could also link to the part of the documentation that describes this, for further information.

Then the `shape` nuisance parameters can be defined in the systematics block in the datacard.

In a realistic CMS analysis there are hundreds of nuisance parameters corresponding to various sources of systematic uncertainty.

When we unfold to the gen-level observable we should remove the nuisances affecting the rate of the gen-level bins, i.e. the `lnN` NPs: `THU_ZH_mig*, THU_ZH_inc` and keep only the acceptance `shape` uncertainties: `THU_ZH_acc` and `THU_ggZH_acc`, which do not scale the inclusive cross sections by construction.
This can be achieved by freezing the respective nuisance parameters with the option `--freezeParameters par_name1,par_name2`. Alternatively you can create a group following the syntax given below at the end of the combined datacard, and freeze the parameters with the `--freezeNuisanceGroups group_name` option.
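A sketch of the two options (the regex form assumes a Combine version that supports `rgx{}` selections; the group name is hypothetical and would have to match the one defined in the datacard):

```shell
# Option 1: freeze the relevant parameters directly
combine -M MultiDimFit ws_full.root -m 125 -t -1 -n .FreezeTHU \
    --freezeParameters 'rgx{THU_ZH_mig.*},THU_ZH_inc'

# Option 2: freeze a nuisance group, assuming a line like
#   thu_rate group = THU_ZH_mig... THU_ZH_inc
# has been added at the end of the combined datacard
combine -M MultiDimFit ws_full.root -m 125 -t -1 -n .FreezeTHU --freezeNuisanceGroups thu_rate
```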

Maybe before this it would be good to point out explicitly that for this analysis you have already done the work of separating out the normalization and shape components by normalizing the theoretical variations to the nominal generator-level cross sections for each bin?


![](figures/scan_plot_r_zh_75_150.png)

Repeat the same command for other POIs to fill the `summary_zh_stxs.json`, which can then be used to create the cross section plot as shown below.
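For reference, the per-POI scan being repeated here might look roughly as follows (a sketch; the number of points, ranges and plotting options in the tutorial may differ):

```shell
# scan one POI while profiling the others
combine -M MultiDimFit ws_full.root -m 125 --algo grid --points 50 \
    -P r_zh_75_150 --floatOtherPOIs 1 -n .Scan.r_zh_75_150
# plot the resulting likelihood curve
plot1DScan.py higgsCombine.Scan.r_zh_75_150.MultiDimFit.mH125.root \
    --POI r_zh_75_150 -o scan_plot_r_zh_75_150
```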

Maybe it would also be good to add a few pedagogical lines here to the effect of:

"In order to plot our differential cross sections measurements at generator-level, we need to multiply the fitted parameter of interest values by their original cross sections which are stored in .... "


@kcormi kcormi left a comment


Some more comments about the presentation, mostly suggesting making the writing and exposition a bit more pedagogical and trying to make sure that the people following the tutorial understand how each of the pieces fits together.

# Combine unfolding exercise

## Getting started
By now you should have a working setup of Combine v9 from the pre-tutorial exercise. If so, then move on to cloning the parametric fitting exercise GitLab repo below. If not, then you need to set up a CMSSW area and check out the combine package:
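A minimal sketch of such a setup (the exact CMSSW release and Combine tag should be taken from the official installation instructions; the versions below are only indicative for Combine v9):

```shell
cmsrel CMSSW_11_3_4
cd CMSSW_11_3_4/src
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
git checkout v9.1.0   # use the recommended v9 tag from the documentation
scramv1 b clean
scramv1 b -j 4
```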

I wonder if for this part it actually makes more sense to link directly to the installation instructions and tell them to use one of those setups. That way we avoid repeating information (and lower the risk of it going out of date or changing)?


![](figures/simplifiedXS_VH_1_2.png)

In this tutorial we will focus on the ZH production, with Z boson decaying to charged leptons, and Higgs boson reconstructed with the resolved $b\bar{b}$ pair. We will also use only a part of the Run 2 categories, so the analysis sensitivity is not going to be achieved. Note that ggZH and ZH production modes are combined in the fit, since it is not possible to resolve them at this stage of the analysis. The STXS categories are defined independently of the Higgs decay channel, to streamline the combinations of the cross section measurement.

"the analysis sensitivity is not going to be achieved" --> maybe "we will not achieve the same sensitivity as the full analysis."


## Simple datacards, one-bin measurement

When constructing the reco-level for any differential analysis the main goal is to match the gen-level bins as closely as possible. In the simplest case it can be done with the cut-based approach, i.e. applying the selection on the corresponding reco-level variables: $p_{T}(Z)$ and $n_{\text{add. jets}}$.

I think a word is missing in the first sentence here?

Maybe e.g. "When determining the reco-level binning for any differential analysis" (it may also be good to use "detector-level" rather than "reco-level", but I'm not sure; consistency with either choice is probably the most important thing)

I wonder if in general spelling out "generator level" and "detector level" is a bit more understandable and good for the tutorial.


One of the most important stages in the analysis design is to make sure that the reco-level categories are pure in the corresponding gen-level processes.

To explicitly check it, one can plot the contributions of gen-level bins in all of the reco-level bins. We propose to use the script provided on the tutorial GitLab page. This script uses `CombineHarvester` to loop over detector-level bins, and get the rate at which each of the signal processes (generator-level bins) contributes to that detector-level bin, which is then used to plot the migration matrix.

Maybe "we propose to use the script" --> "You can use the script"

```shell
text2workspace.py -m 125 counting/combined_ratesOnly.txt -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO verbose --PO 'map=.*/.*ZH_lep_PTV_75_150_hbb:r_zh_75_150[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_0J_hbb:r_zh_150_250noj[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_150_250_GE1J_hbb:r_zh_150_250wj[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_250_400_hbb:r_zh_250_400[1,-5,5]' --PO 'map=.*/.*ZH_lep_PTV_GT400_hbb:r_zh_gt400[1,-5,5]' -o ws_counting.root
```
In the example given above a signal POI is assigned to each gen-level bin independent on reco-level bin. This allows to take into account the non-trivial to take into account migration. One can also perform bin-by-bin unfolding using the mapping to the bin names rather that processes, e.g. `'map= vhbb_Zmm_gt400_13TeV/.*:r_reco_zh_gt400[1,-5,5]'`, but this method is not recommended and can be used only for tests as another way to ensure that the migration matrix is close to diagonal.

"Independent on" --> "Independent of"

"This allows to take into account the non-trivial to take into account migration" --> "This allows us to take into account migrations from a given generator level bin into various detector level bins."

![](figures/simplifiedXS_VH_1_2.png)

In this tutorial we will focus on the ZH production, with Z boson decaying to charged leptons, and Higgs boson reconstructed with the resolved $b\bar{b}$ pair. We will also use only a part of the Run 2 categories, so the analysis sensitivity is not going to be achieved. Note that ggZH and ZH production modes are combined in the fit, since it is not possible to resolve them at this stage of the analysis. The STXS categories are defined independently of the Higgs decay channel, to streamline the combinations of the cross section measurement.


Maybe we could also add a few words on the structure of the tutorial here.

e.g.

"In the first part of the tutorial, we will setup a relatively simple unfolding, where there is a single detector-level bin for every generator-level bin we are trying to measure. We will then perform a blind analysis using this setup to see the expected sensitivity.

In the second part of the tutorial we will perform the same measurement with a more advanced setup, making use of differential distributions per generator-level bin we are trying to measure, as well as control regions. By providing this additional information to the fit, we are able to achieve a better and more robust unfolding result. After checking the expected sensitivity, we will take a look at the impacts and pulls of the nuisance parameters. Then we will unblind and look at the results of the measurement, produce generator-level plots and provide the correlation matrix for our measured observables."


The hands-on exercise is split into seven parts:

1) Counting experiment

I wonder if it might be clearer to change this name from "counting experiment" to something like "simplified analysis strategy" or "simplified unfolding"?

And similarly, maybe changing "Shape analysis with control regions" to something like "Extending the analysis with other variables and control regions"


![](figures/impacts_zh_75_150.png)
* Do you observe differences in the impacts plots for different POIs? Do these differences make sense to you?

Because of the layout, I think people might miss this question if it is after the plot; it might be easier to see if it comes just before the figure.

## Unfolded measurements

Now that we studied the NP impacts for each POI, we can finally extract the measurements.
Note, that in this exercise we are skipping further checks and validation that you should do on your analysis for the purposes of the tutorial. Namely the goodness of fit test and the post-fit plots of folded observables. Both of these checks were detailed in the previous exercises, you can find the description under the following links.

Sorry, it was my suggestion earlier, but maybe it's better to remove "for the purposes of the tutorial" here, to make sure no one thinks we are saying that they should do the checks for the purposes of the tutorial. Alternatively we could write it as "Note, that in this exercise, for the purposes of the tutorial, we are skipping ... "

(and I think we are missing the links referred to by "the following links")


## POIs correlations

In addition to the cross-section measurements it is very important to publish correlation matrix of measured cross sections.
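One possible way to obtain such a matrix (a sketch; the tutorial may instead use a dedicated plotting script):

```shell
# save the full fit result, then read the POI correlations from the RooFitResult it contains
combine -M MultiDimFit ws_full.root -m 125 --algo singles --saveFitResult -n .Corr
# the output file multidimfit.Corr.root holds a RooFitResult named "fit_mdf";
# its correlationMatrix() / correlationHist() methods give the correlations between the POIs
```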

"to publish the correlation matrix" (missing the) "of the measured cross sections" (again missing the)

@anigamova

Thanks for the updates @kcormi, all good from my side.

@kcormi kcormi merged commit f5d8df9 into cms-analysis:main Feb 7, 2024