# SMEFiT Tutorial

To run this notebook remotely in Google Colab, click the button below:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LHCfitNikhef/smefit_release/blob/tutorial/tutorial/tutorial.ipynb)

### What is SMEFiT? 
SMEFiT is a Python package for global analyses of particle physics data in the framework of the Standard Model Effective Field Theory (SMEFT). The SMEFT is a powerful model-independent framework to constrain, identify, and parametrize potential deviations with respect to the predictions of the Standard Model (SM). A particularly attractive feature of the SMEFT is its capability to systematically correlate deviations from the SM between different processes. Fully exploiting the potential of the SMEFT for indirect New Physics searches in precision measurements requires combining the information from the broadest possible dataset, that is, carrying out extensive global analyses, which is the main purpose of SMEFiT.

For a recap of the basic ideas underlying the SMEFT, see the [SMEFT theory overview](https://lhcfitnikhef.github.io/smefit_release/theory/SMEFT.html) in the SMEFiT documentation.

SMEFiT has been used in the following publications:

- *A Monte Carlo global analysis of the Standard Model Effective Field Theory: the top quark sector*, N. P. Hartland, F. Maltoni, E. R. Nocera, J. Rojo, E. Slade, E. Vryonidou, C. Zhang.
- *Constraining the SMEFT with Bayesian reweighting*, S. van Beek, E. R. Nocera, J. Rojo, and E. Slade.
- *SMEFT analysis of vector boson scattering and diboson data from the LHC Run II*, J. Ethier, R. Gomez-Ambrosio, G. Magni, J. Rojo.
- *Combined SMEFT interpretation of Higgs, diboson, and top quark data from the LHC*, J. Ethier, G. Magni, F. Maltoni, L. Mantani, E. R. Nocera, J. Rojo, E. Slade, E. Vryonidou, C. Zhang.
- *The automation of SMEFT-assisted constraints on UV-complete models*, J. ter Hoeve, G. Magni, J. Rojo, A. N. Rossia, E. Vryonidou.
- *Mapping the SMEFT at High-Energy Colliders: from LEP and the (HL-)LHC to the FCC-ee*, E. Celada, T. Giani, J. ter Hoeve, L. Mantani, J. Rojo, A. N. Rossia, M. O. A. Thomas, E. Vryonidou.

### Exercise 0 - Installing SMEFiT

First things first, let us install SMEFiT:

In [5]:
!pip install smefit


Note to Google Colab users: you may ignore the pandas-related error if it shows up.

For the purpose of this tutorial, we also need the following additional packages:

In [3]:
!pip install wget
import sys
import os
import wget
import subprocess
import pathlib
import yaml
from IPython.display import Image



In [7]:
import smefit
import smefit.runner
from smefit.analyze.report import Report

smefit.log.setup_console(None)

Next, download the SMEFiT database (datasets and theory tables) and the runcards:

In [None]:
def file_downloader(url, download_dir='./downloads'):
    """Download `url` into `download_dir` (created if missing) and return the local file path."""
    os.makedirs(download_dir, exist_ok=True)
    return wget.download(url, out=download_dir)

# SMEFiT database (experimental data and theory tables) plus the tutorial runcards
smefit_database = file_downloader('https://github.com/LHCfitNikhef/smefit_database/archive/refs/heads/main.zip')
runcard_fit = file_downloader('https://raw.githubusercontent.com/LHCfitNikhef/smefit_release/tutorial/tutorial/runcard_fit.yaml')
runcard_report = file_downloader('https://raw.githubusercontent.com/LHCfitNikhef/smefit_release/tutorial/tutorial/runcard_report.yaml')

# unpack the database into ./downloads (-o: overwrite without prompting on re-runs)
subprocess.run(["unzip", "-o", smefit_database, "-d", "./downloads"], check=True)
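Before running any fit, it can help to confirm that the archive was unpacked where the runcards expect it. The sketch below assumes the layout used by the `data_path` and `theory_path` entries of the example runcard in Exercise 1 (`./downloads/smefit_database-main/commondata` and `./downloads/smefit_database-main/theory`); the helper `check_database` is hypothetical and not part of SMEFiT.

```python
import pathlib


def check_database(root="./downloads/smefit_database-main"):
    """Return the expected subfolders that are missing under `root` (empty list if all present).

    Hypothetical helper: the expected names are taken from the `data_path`
    and `theory_path` entries of the tutorial runcard, not from SMEFiT itself.
    """
    root = pathlib.Path(root)
    expected = ["commondata", "theory"]
    return [name for name in expected if not (root / name).is_dir()]


missing = check_database()
if missing:
    print(f"Missing folders in the SMEFiT database: {missing}")
else:
    print("SMEFiT database found.")
```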

### Exercise 1 - Our first fit with SMEFiT

In this first exercise, we will study the relative impact of various datasets on a two-dimensional SMEFT parameter space. Consider the four-fermion Wilson coefficients $c_{Qq}^{1,8}$ and $c_{Qq}^{3,8}$, defined as

$$
\begin{align}
c_{Qq}^{1,8} &= c_{qq}^{1(i33i)} + 3 c_{qq}^{3(i33i)} \\
c_{Qq}^{3,8} &= c_{qq}^{1(i33i)} - c_{qq}^{3(i33i)} \, ,
\end{align}
$$
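The map between the two bases is linear and invertible: solving the two definitions above for the original coefficients gives $c_{qq}^{3(i33i)} = (c_{Qq}^{1,8} - c_{Qq}^{3,8})/4$ and $c_{qq}^{1(i33i)} = (c_{Qq}^{1,8} + 3\,c_{Qq}^{3,8})/4$. A minimal sketch of both directions follows; the function names are illustrative and not part of SMEFiT.

```python
def to_top_basis(c1_qq, c3_qq):
    """Map (c_qq^{1(i33i)}, c_qq^{3(i33i)}) to (c_Qq^{1,8}, c_Qq^{3,8}) using the definitions above."""
    c18 = c1_qq + 3 * c3_qq
    c38 = c1_qq - c3_qq
    return c18, c38


def from_top_basis(c18, c38):
    """Inverse map: c_qq^3 = (c18 - c38) / 4 and c_qq^1 = (c18 + 3 c38) / 4."""
    c3_qq = (c18 - c38) / 4
    c1_qq = (c18 + 3 * c38) / 4
    return c1_qq, c3_qq


print(to_top_basis(1.0, 0.0))    # (1.0, 1.0)
print(from_top_basis(1.0, 1.0))  # (1.0, 0.0)
```

For example, switching on only $c_{qq}^{1(i33i)}$ populates both $c_{Qq}^{1,8}$ and $c_{Qq}^{3,8}$ equally.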

with the corresponding operators on the RHS given by

$$
\begin{align}
\mathcal{O}_{qq}^{1(i33i)} &= \left(\bar{q}_i\gamma^\mu Q \right)\left(\bar{Q} \gamma_\mu q_i\right) \\
\mathcal{O}_{qq}^{3(i33i)} &= \left(\bar{q}_i\gamma^\mu \tau^I Q \right)\left(\bar{Q} \gamma_\mu \tau^I q_i\right) \, .
\end{align}
$$

Here $q_i$ and $Q$ denote the left-handed $SU(2)_L$ quark doublets of the light generations ($i=1,2$) and of the third (heavy) generation, respectively, while $\tau^I$ are the Pauli matrices.

**Question 1A**
- Do $\mathcal{O}_{qq}^{1(ijkl)}$ and $\mathcal{O}_{qq}^{3(ijkl)}$ define valid SMEFT operators at dimension six? If so, why?

These operators modify SM processes measured at the LHC and hence provide a probe of possible new physics beyond the SM. In the rest of this exercise, we set bounds on the corresponding Wilson coefficients and check whether current data are compatible with the SM.
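To make "setting bounds" concrete: if the predictions are linear in the Wilson coefficients and the uncertainties are Gaussian, the $\chi^2$ is quadratic in the coefficients, so the best fit and its covariance (the inverse of the Fisher matrix) follow in closed form. The toy sketch below does this for two coefficients with uncorrelated uncertainties; it is not SMEFiT's implementation (which also handles correlated experimental and theory covariance matrices), the function name is hypothetical, and all numbers are invented.

```python
def linear_fit_2d(data, sm, lin, sigma):
    """Closed-form chi^2 minimum for T_k(c) = sm_k + lin_k[0] * c1 + lin_k[1] * c2.

    data, sm, sigma: measurements, SM predictions and uncorrelated errors (length n)
    lin: n pairs with the linear EFT coefficients of each observable
    Returns the best-fit point and its 2x2 covariance (inverse Fisher matrix).
    """
    # Fisher matrix F_ij = sum_k L_ki L_kj / sigma_k^2 and b_i = sum_k L_ki (D_k - SM_k) / sigma_k^2
    f = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for d, t_sm, row, s in zip(data, sm, lin, sigma):
        w = 1.0 / s**2
        for i in range(2):
            b[i] += w * row[i] * (d - t_sm)
            for j in range(2):
                f[i][j] += w * row[i] * row[j]
    det = f[0][0] * f[1][1] - f[0][1] * f[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular Fisher matrix: the data do not constrain both coefficients")
    # invert the 2x2 Fisher matrix and solve F c = b
    cov = [[f[1][1] / det, -f[0][1] / det], [-f[1][0] / det, f[0][0] / det]]
    best = [cov[0][0] * b[0] + cov[0][1] * b[1], cov[1][0] * b[0] + cov[1][1] * b[1]]
    return best, cov


# Toy inputs (invented): three observables with SM predictions normalised to 1
data = [1.10, 0.95, 1.02]
sm = [1.0, 1.0, 1.0]
lin = [(0.5, 0.1), (0.2, -0.4), (0.3, 0.3)]
sigma = [0.1, 0.1, 0.1]

best, cov = linear_fit_2d(data, sm, lin, sigma)
errors = [cov[0][0] ** 0.5, cov[1][1] ** 0.5]
print("best fit:", best, "1-sigma errors:", errors)
```

The data are compatible with the SM at a given confidence level when $c = 0$ lies inside the corresponding error ellipse around the best fit.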

**Question 1B**
- Consider top-quark pair production in association with a $Z$ or $W$ boson. Convince yourself, by drawing a representative Feynman diagram, that the operators defined above modify $ttV$ ($V=W,Z$) production.

Let us now perform a fit to actual $ttV$ data. Fits can be run in SMEFiT with the following syntax:

```bash
smefit A <path/to/smefit_runcard.yaml>
```

An example runcard is given below:

```yaml
# smefit_runcard.yaml

# name of the fit
result_ID: ttV

# path where results are stored
result_path: ./results

# path to data
data_path: ./downloads/smefit_database-main/commondata

# path to theory tables
theory_path: ./downloads/smefit_database-main/theory

# perturbative QCD order (LO or NLO)
order: NLO

# include theory uncertainties
use_theory_covmat: True
# use the t0 prescription for the experimental covariance matrix
use_t0: True

# SMEFT expansion order (NHO = Lambda^-2, HO = Lambda^-4)
use_quad: False

# number of samples
n_samples: 20000


# Datasets to include
datasets:

  - CMS_ttZ_13TeV
  - CMS_ttZ_13TeV_pTZ
  - CMS_ttZ_8TeV
  - ATLAS_ttZ_13TeV
  - ATLAS_ttZ_13TeV_2016
  - ATLAS_ttZ_13TeV_pTZ
  - ATLAS_ttZ_8TeV
  - CMS_ttW_13TeV
  - CMS_ttW_8TeV
  - ATLAS_ttW_13TeV
  - ATLAS_ttW_13TeV_2016
  - ATLAS_ttW_8TeV


# Coefficients to fit
coefficients:

  O81qq: { 'min': -2, 'max': 2 }
  O83qq: { 'min': -2, 'max': 2 }
```
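The `use_quad` flag selects the order of the SMEFT expansion of each prediction: with `use_quad: False` only the linear interference terms $\mathcal{O}(\Lambda^{-2})$ are kept, while `True` also adds the quadratic $\mathcal{O}(\Lambda^{-4})$ terms. The toy sketch below illustrates the two options for a single observable; it assumes the coefficients are given with $\Lambda$ absorbed (e.g. $\Lambda = 1$ TeV), the function is hypothetical, and the numerical inputs are invented rather than read from the SMEFiT theory tables.

```python
def eft_prediction(sm, linear, quadratic, coeffs, use_quad=False):
    """Toy SMEFT prediction for one observable.

    sm:        SM prediction
    linear:    {name: L_i}, O(Lambda^-2) correction per unit coefficient
    quadratic: {(name_i, name_j): Q_ij} with each pair listed once, O(Lambda^-4) corrections
    coeffs:    {name: c_i}, Wilson coefficients (Lambda absorbed, e.g. Lambda = 1 TeV)
    """
    # linear (interference) terms: always included
    pred = sm + sum(linear.get(name, 0.0) * c for name, c in coeffs.items())
    if use_quad:
        # quadratic terms c_i c_j Q_ij, only when the O(Lambda^-4) expansion is requested
        pred += sum(q * coeffs.get(i, 0.0) * coeffs.get(j, 0.0) for (i, j), q in quadratic.items())
    return pred


linear = {"O81qq": 0.02, "O83qq": -0.05}
quadratic = {("O81qq", "O81qq"): 0.01, ("O81qq", "O83qq"): 0.003, ("O83qq", "O83qq"): 0.02}
coeffs = {"O81qq": 1.0, "O83qq": 0.5}
print(eft_prediction(1.0, linear, quadratic, coeffs))
print(eft_prediction(1.0, linear, quadratic, coeffs, use_quad=True))
```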

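The `datasets` entry lists the measurements included in the fit, while the `coefficients` entry lists the Wilson coefficients to be fitted, here the two four-fermion coefficients defined above (`O81qq` and `O83qq`), each within the range given by `min` and `max`. We are now ready to run our first fit.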
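When editing runcards by hand, a quick sanity check of these two entries can catch typos before a fit is launched. The sketch below is a hypothetical convenience helper, not part of SMEFiT: it only checks for duplicated dataset names and for coefficients whose `min` is not smaller than `max`, following the structure of the example runcard. In practice the dictionary would come from `yaml.safe_load`, as in the `make_report` helper used later in this notebook.

```python
def check_runcard(config):
    """Return a list of problems in the `datasets` and `coefficients` entries (empty if none).

    Hypothetical helper: coefficients without both `min` and `max` are skipped.
    """
    problems = []
    datasets = config.get("datasets", [])
    duplicates = sorted({d for d in datasets if datasets.count(d) > 1})
    if duplicates:
        problems.append(f"duplicate datasets: {duplicates}")
    for name, bounds in config.get("coefficients", {}).items():
        if "min" in bounds and "max" in bounds and bounds["min"] >= bounds["max"]:
            problems.append(f"{name}: 'min' ({bounds['min']}) must be smaller than 'max' ({bounds['max']})")
    return problems


config = {
    "datasets": ["CMS_ttZ_13TeV", "CMS_ttZ_13TeV"],
    "coefficients": {"O81qq": {"min": -2, "max": 2}, "O83qq": {"min": 2, "max": -2}},
}
for problem in check_runcard(config):
    print(problem)
```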

In [9]:
!smefit A ./downloads/runcard_ttV.yaml


Once the fit has finished, the results can be analysed by producing a fit report:

In [11]:
!smefit R ./downloads/runcard_report_ttV.yaml


In [None]:
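def make_report(runcard_report):
    """Produce the coefficient plots (e.g. the 2D contours) requested in a report runcard."""
    # load the report runcard
    with open(runcard_report, encoding="utf-8") as f:
        report_config = yaml.safe_load(f)

    # reports are written to <report_path>/<name>
    report_name = report_config["name"]
    report_path = pathlib.Path(report_config["report_path"]).absolute()
    report_folder = report_path.joinpath(f"{report_name}")
    report_folder.mkdir(exist_ok=True, parents=True)

    # build the report from the fit results and make the coefficient plots
    report = Report(report_path, report_config["result_path"], report_config)
    report.coefficients(**report_config["coefficients_plots"])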

In [None]:
make_report("./downloads/runcard_report_asy.yaml")

# show the exclusion contour
Image(filename='./reports/report_asy/contours_2d.png') 
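Besides the contours, the posterior of each coefficient can be summarised by a central credible interval, which also gives a quick one-dimensional check of whether the SM value $c=0$ is excluded. The sketch below assumes the posterior samples of a single coefficient are available as a list of floats (how to read them from the files in `result_path` is not shown here); the function names are hypothetical and the toy samples are invented.

```python
import random


def central_interval(samples, cl=0.95):
    """Central credible interval containing a fraction `cl` of the samples.

    Quantiles use linear interpolation between order statistics
    (the same convention as numpy.percentile's default).
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")

    def quantile(q):
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    tail = (1.0 - cl) / 2.0
    return quantile(tail), quantile(1.0 - tail)


def sm_inside(samples, cl=0.95):
    """True if the SM value c = 0 lies within the central `cl` interval."""
    low, high = central_interval(samples, cl)
    return low <= 0.0 <= high


random.seed(1)
toy_samples = [random.gauss(0.5, 0.2) for _ in range(20000)]  # invented toy posterior
print(central_interval(toy_samples, 0.95), sm_inside(toy_samples, 0.95))
```

Note that one-dimensional marginal intervals ignore the correlation between the two coefficients; the 2D contour remains the relevant object for judging the joint compatibility with the SM.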

**Questions**

- What special behaviour do you observe in the relation between the two coefficients?
- What options do we have to further constrain this two-dimensional parameter space? Name at least three.

### Exercise 2 A - Adding more measurements

The two operators from Exercise 1 affect more observables than just the charge asymmetries $A_C$. Here we add further measurements, in particular of top-quark processes for which $A_C = 0$.

In [None]:
runner_exc_2a = smefit.runner.Runner.from_file(pathlib.Path("./downloads/runcard_sym.yaml"))
runner_exc_2a.global_analysis("A")

The report can be produced again with:

In [None]:
make_report("./downloads/runcard_report_sym.yaml")

# show the exclusion contour
Image(filename='./reports/report_sym/contours_2d.png') 

**Questions**

- Can we exclude the SM this time?
- What special behaviour do you observe in the relation between the two coefficients, and how does it compare to Exercise 1?

### Exercise 2 B - Combined fit

The same operators modify several classes of measurements, and there is no a priori reason to prefer one over another, so all of them should be included. Here we carry out such a combined fit, including the measurements from both Exercise 1 and Exercise 2A.
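Why combining helps can be seen in a linear toy model: for independent (uncorrelated) datasets the Fisher matrices $F_{ij} = \sum_k L_{ki} L_{kj} / \sigma_k^2$ simply add, so two datasets that are each blind to one direction in the $(c_1, c_2)$ plane can jointly constrain both coefficients if they probe different combinations. The sketch below uses invented sensitivities and does not reflect the actual measurements of this tutorial; the function names are hypothetical.

```python
def fisher_2d(lin, sigma):
    """Fisher matrix F_ij = sum_k L_ki L_kj / sigma_k^2 of a dataset whose
    observables depend linearly on two coefficients (uncorrelated errors)."""
    f = [[0.0, 0.0], [0.0, 0.0]]
    for row, s in zip(lin, sigma):
        for i in range(2):
            for j in range(2):
                f[i][j] += row[i] * row[j] / s**2
    return f


def det2(m):
    """Determinant of a 2x2 matrix; zero means one direction is unconstrained."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def add_matrices(a, b):
    """Element-wise sum of two 2x2 matrices (Fisher matrices of independent datasets add)."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


# Toy dataset A (invented): every observable depends only on c1 + c2
fisher_a = fisher_2d([(1.0, 1.0), (2.0, 2.0)], [0.1, 0.2])
# Toy dataset B (invented): every observable depends only on c1 - c2
fisher_b = fisher_2d([(1.0, -1.0)], [0.1])
fisher_ab = add_matrices(fisher_a, fisher_b)

print("det A:", det2(fisher_a), "det B:", det2(fisher_b), "det A+B:", det2(fisher_ab))
```

Each toy dataset alone has a vanishing determinant (a flat direction), while their sum is invertible.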

In [None]:
runner_combined = smefit.runner.Runner.from_file(pathlib.Path("./downloads/runcard_combined.yaml"))
runner_combined.global_analysis("A")

In [None]:
make_report("./downloads/runcard_report_combined.yaml")

In [None]:
# show the exclusion contour
Image(filename='./reports/report_combined/contours_2d.png') 

**Questions**

- Comment on the interplay between the two classes of measurements.
- What lesson do you take from this?

### Exercise 3