# Xbb Tagger Calibration Package Tutorial 

## Introduction

### What is an Xbb tagger?

Think of the tagger this way. After a series of collisions at the LHC, we collect the data and reconstruct it into many jets (we assume you are already familiar with the concept of a jet from the jet tutorial). Among all these jets, we are particularly interested in bb dijets. The Xbb tagger is a tool that picks the bb jets out of this vast ocean of jets, like a robot putting a "this is a bb jet" tag on each of them, just as its name implies.

"So why do we care about bb jets?", you may ask. Theory predicts that some new, unknown heavy particles (which we are eagerly looking for) have a Higgs boson in their decay chains, and previous studies show that the Higgs in turn decays into a b and a b-bar. Hence, to spot the new heavy particle upstream, we first need to isolate the bb jets downstream; this is why the tagger cares exclusively about bb jets. However, since H->bb decays are not easy to find, we use g->bb decays to examine the tagger instead. (In principle, the Xbb tagger should work for any particle decaying into bb; this is why we call it **X**bb, with **X** acting as a variable.) To identify g->bb events, we use the fact that in a g->bb decay, one of the b jets often contains a muon. This is why, as you go through the package, you will see names like **Muon Inclusive** or **Muon Filtered**.




<img src="img/gbb_decay_diagram.png" alt="drawing" width="600"/>

### What is scale factor and how do we calculate it?

From the jet tutorial, you learned how we use Monte Carlo (MC for short) simulation in high energy physics. The scale factor (SF for short) is basically a ratio comparing the tagger's efficiency on the measured data sample with its efficiency on the MC simulation. A quick graphical definition is shown below:


<img src="img/SF_def.png" alt="drawing" width="500"/>

Here $\epsilon$ represents the tagger efficiency, and its subscript denotes which of the two samples (MC or data) it refers to. $N_{BB, total}$ corresponds to the number of bb jets before tagging, and $N_{BB, double-b-tagged}$ is the number of bb jets after tagging. The scale factor $\kappa_{SF}$ is simply the quotient of the two efficiencies.
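Written out as equations, and assuming the figure follows the text, the definitions read as below. The superscripts *MC* and *data* mark which sample the counts come from, matching the notation used in the procedure that follows:

$$
\epsilon_{MC} = \frac{N^{MC}_{BB, double-b-tagged}}{N^{MC}_{BB, total}}, \qquad
\epsilon_{data} = \frac{N^{data}_{BB, double-b-tagged}}{N^{data}_{BB, total}}, \qquad
\kappa_{SF} = \frac{\epsilon_{data}}{\epsilon_{MC}}
$$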

So now we know the definition of the SF, but how do we actually calculate it from the samples we have in hand? Here is a brief description of the procedure:

**1. Get the number of BB events in MC to obtain $N^{MC}_{BB, total}$**

From our MC simulation, the total number of BB events is already known (dark blue area). (Notice that we have classified the events into bins according to their $s_{d0}$ to get a more precise result.)

<img src="img/MC_pretag.png" alt="drawing" width="300"/>

**2. Apply double b tagging on MC to get $N^{MC}_{BB, double-b-tagged}$**

After applying the tagger, we can see that most of the non-BB events (LL, CL, CC, BL) have been removed. The number of BB events is still known (dark blue area).

<img src="img/MC_posttag.png" alt="drawing" width="300"/>

At this point, we can already calculate $\epsilon_{MC}$ by taking the ratio of the dark blue area after tagging to the dark blue area before tagging.

**3. Do fitting on (1.) to get $N^{data}_{BB, total}$**

In the graphs above, notice the black dots labelled *Data*: those are the total numbers of events we **measured** at the LHC. In each bin, we don't know how many BB events we have; all we know is **the sum** of all events (i.e. LL, CL, CC, BL, BB). Here we make an important assumption: **"After the MC is fitted to the data, the proportion of BB events in the MC is the same as in the measured data"**. So, by scaling (in other words, fitting) the MC event numbers in each bin (including all event types, not just BB) to match the data event numbers, we obtain the number of BB events in data.

<img src="img/data_total.png" alt="drawing" width="300"/>

**4. Do fitting on (2.) to get $N^{data}_{BB, double-b-tagged}$**

<img src="img/data_bb.png" alt="drawing" width="300"/>

By taking the ratio of the dark blue areas again, this time for the fitted graphs, you obtain $\epsilon_{data}$.
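The fits in steps 3 and 4 can be sketched as a template fit. The snippet below is a minimal illustration under stated assumptions, not the package's actual fitting code. It uses an unweighted least-squares fit with one free scale per flavour category, whereas a real calibration would more likely use a binned likelihood fit. Every name and number in it (`fit_template_scales`, `solve_linear`, the toy histograms) is hypothetical. It is written in plain Python so that it runs without extra packages.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_template_scales(templates, data):
    """Least-squares scale per flavour so that the summed templates match data."""
    names = list(templates)
    bins = range(len(data))
    # Normal equations (T^T T) s = T^T d of the least-squares problem.
    a = [[sum(templates[p][i] * templates[q][i] for i in bins) for q in names]
         for p in names]
    b = [sum(templates[p][i] * data[i] for i in bins) for p in names]
    return dict(zip(names, solve_linear(a, b)))


# Toy MC templates: events per s_d0 bin for each flavour category (made-up numbers).
mc_templates = {
    "LL": [40, 15, 8, 4, 2, 1],
    "CL": [8, 30, 10, 5, 3, 2],
    "CC": [4, 8, 25, 7, 3, 2],
    "BL": [2, 4, 7, 22, 6, 3],
    "BB": [1, 2, 4, 8, 20, 10],
}

# Toy "data": the same templates with different (hidden) normalisations.
hidden_scales = {"LL": 1.1, "CL": 0.9, "CC": 1.0, "BL": 1.2, "BB": 0.8}
toy_data = [sum(hidden_scales[f] * mc_templates[f][i] for f in mc_templates)
            for i in range(6)]

fitted_scales = fit_template_scales(mc_templates, toy_data)

# Under the assumption stated in step 3, the BB yield in data is the
# fitted BB scale times the BB yield in MC.
n_bb_data = fitted_scales["BB"] * sum(mc_templates["BB"])
print("fitted scales:", fitted_scales)
print("estimated BB events in data:", n_bb_data)
```

The same function would be applied once to the pre-tag histograms and once to the post-tag histograms; the ratio of the two fitted BB yields then gives $\epsilon_{data}$.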

**5. Calculate $\kappa_{SF} = \frac{\epsilon_{data}}{\epsilon_{MC}}$**



<img src="img/efficiencies.png" alt="drawing" width="800"/>

The graphs above show $\epsilon_{MC}$ and $\epsilon_{data}$, binned by the transverse momentum of each jet (i.e. the non-$\mu$ jet and the $\mu$ jet).
Dividing the number in the right graph by the number in the left graph, region by region, gives the SF shown below.

<img src="img/SF.png" alt="drawing" width="600"/>


So what is the scale factor used for? One use is to calculate $\epsilon_{data}$: this is easy, since $\epsilon_{MC}$ and $\kappa_{SF}$ are both known numbers (once the scale factor calculation has been done, of course). Another use is to check the reliability of our MC simulation. If our MC model simulates real events faithfully, $\epsilon_{MC}$ and $\epsilon_{data}$ should be about the same; that is, the scale factor $\kappa_{SF} = \frac{\epsilon_{data}}{\epsilon_{MC}}$ should be close to 1.
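Once the four BB yields are in hand, the last step of the procedure is plain arithmetic. The sketch below shows it for two regions. The region names and all yields are hypothetical toy numbers chosen for illustration, and the helper functions are not part of the calibration package.

```python
def efficiency(n_tagged, n_total):
    """Tagging efficiency: BB yield after tagging over BB yield before tagging."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_tagged / n_total


def scale_factor(eff_data, eff_mc):
    """kappa_SF = eps_data / eps_MC."""
    if eff_mc <= 0:
        raise ValueError("eff_mc must be positive")
    return eff_data / eff_mc


# Hypothetical BB yields per pT region (toy numbers, not package output).
bb_yields = {
    "region_1": {"mc_total": 1000.0, "mc_tagged": 600.0,
                 "data_total": 1100.0, "data_tagged": 594.0},
    "region_2": {"mc_total": 500.0, "mc_tagged": 250.0,
                 "data_total": 480.0, "data_tagged": 252.0},
}

results = {}
for region, n in bb_yields.items():
    eff_mc = efficiency(n["mc_tagged"], n["mc_total"])
    eff_data = efficiency(n["data_tagged"], n["data_total"])
    results[region] = {
        "eff_mc": eff_mc,
        "eff_data": eff_data,
        "sf": scale_factor(eff_data, eff_mc),
    }

for region, r in results.items():
    print(f"{region}: eff_MC={r['eff_mc']:.3f} "
          f"eff_data={r['eff_data']:.3f} SF={r['sf']:.3f}")
```

Going the other way, multiplying a region's MC efficiency by its SF returns the data efficiency, which is the first use of the scale factor described above.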

## Setting up

### First time set up

#### Applying for a NERSC account
First of all, you will need a **NERSC** account. Please contact your supervisor to apply for one. NERSC is a scientific computing facility for the Office of Science in the U.S. Department of Energy. Through your account, you can log into the supercomputers at NERSC from your local machine and do all your work there. Be aware that NERSC has several systems available; most of our group members use **Cori**.

After you have your NERSC account, try logging in using:

<span style="color:red">$ ssh -Y **myusername**@</span><span style="color:red">cori.nersc.gov </span>

where **myusername** should be replaced by your NERSC login ID.  
You will then be prompted for your NERSC password and one-time password (OTP). Note: **DO NOT** put any space between the two when typing them into your terminal.


#### Set up NoMachine
NoMachine provides a virtual desktop to work in, so when your laptop disconnects from the internet, you won't be logged out of Cori and all your previous work will still be there. Moreover, you only need to enter your password once a day, when you first connect to your virtual desktop. If you have set up Cori successfully, you know by now how cumbersome it is to log in every time you open a new tab.

Here is the link to a NERSC page explaining how to set up NoMachine: 
<a href="https://docs.nersc.gov/connect/nx/" target="_blank">NoMachine setup</a>.

You can read through it and follow the instructions there if you wish, but here are the steps to set up NoMachine:

**1**. First you need to install the NoMachine client on your local machine. Go to the download page for your operating system: <a href="https://www.nomachine.com/download/download&id=15" target="_blank">Mac</a>, <a href="https://www.nomachine.com/download/linux&id=4" target="_blank">Linux</a>, <a href="https://www.nomachine.com/download/download&id=16" target="_blank">Windows</a>.


**2**. On your local machine, download the bash client sshproxy.sh from NERSC (the file is at <span style="color:blue">/project/projectdirs/mfa/NERSC-MFA/sshproxy.sh</span>):  
<span style="color:red">\$ cd</span>  
<span style="color:red">\$ scp **myusername**</span><span style="color:red">@dtn01.nersc.gov:/project/projectdirs/mfa/NERSC-MFA/sshproxy.sh . </span>


**3**. Now run sshproxy.sh:  
<span style="color:red">\$ ./sshproxy.sh -u **myusername**</span>

You will then be prompted for your password + OTP. If everything has been done correctly, you will see output like the following:  
<span style="color:blue">Successfully obtained ssh key /Users/**myusername**/.ssh/nersc  
Key /Users/**myusername**/.ssh/nersc is valid: from 2018-08-30T12:24:00 to 2018-08-31T12:25:52</span>

Now, in your <span style="color:blue">~/.ssh</span> folder, you should see three newly generated files: <span style="color:blue">nersc</span>, <span style="color:blue">nersc-cert.pub</span>, and <span style="color:blue">nersc.pub</span>.


**4**. Open the NoMachine client and click the "New" box in the upper right corner of the menu.


**5**. Select Protocol to be "SSH" and click Continue.


**6**. Type in "nxcloud01.nersc.gov" for Host (leave the port set to 22) and click Continue.


**7**. Choose "Private key" and click Continue.


**8**. Fill in the path of the key you generated in step 3, <span style="color:blue">~/.ssh/nersc </span>, and click Continue.


**9**. Select "Don't use a proxy" and click Continue.


**10**. Type in a name for your connection. An easily understandable name would be something like "Connection to cori.nersc.gov". After naming your connection, click Done. You should now see the new connection in your NoMachine interface.  


**11**. Quit NoMachine completely (i.e. make sure it is no longer running). Open the file <span style="color:blue">~/.nx/config/player.cfg</span> and change the value of the following key from <span style="color:blue">library</span> to <span style="color:blue">native</span>:  

<span style="color:blue">&lt;option key="SSH client mode" value="library" /&gt;</span>


**12**. Finally, open NoMachine and connect to Cori by clicking the connection you created in step 10. Enter your NERSC password + OTP if prompted, then click "Create a new virtual desktop". You have now successfully created your virtual desktop; everything that follows in this tutorial will be done there. **NOTE: the ssh key you obtained using sshproxy only has a lifetime of 24 hours.** So if you get an "authentication fail" error while connecting to Cori through NoMachine, your key has expired. Run step 3 again and you will be able to connect to Cori.



#### Obtaining the gbb calibration package
Clone the package with 


#### Environment setup
In order to use Cori on most of the ATLAS sites, you will need to add a missing library located at <span style="color:blue">/global/project/projectdirs/atlas/scripts/extra_libs_180822</span> to <span style="color:blue">LD_LIBRARY_PATH</span>.

### Every time set up