# ServiceX endpoints for ATLAS



| Type | Collaboration | Input data format | Location | Endpoint | Purpose |
| :----: | :----: | :-----------------: | :--------: | :--------: | :-- |
| Stand-alone | ATLAS | ROOT Ntuple | UC Analysis Facility | https://uproot-atlas.servicex.af.uchicago.edu/ | Production |
| Stand-alone | ATLAS | xAOD | UC Analysis Facility | https://xaod.servicex.af.uchicago.edu/ | Production |
| Stand-alone | ATLAS | ROOT Ntuple | SSL-River | https://uproot-atlas.servicex.ssl-hep.org/ | Development |
| Stand-alone | ATLAS | xAOD | SSL-River | https://xaod.servicex.ssl-hep.org/ | Development |


<br>
<br>

# Example analysis workflows with ServiceX

- Uproot ServiceX + ROOT-based Analysis
- Uproot ServiceX + coffea analysis
- Uproot ServiceX + TRExFitter

## Uproot ServiceX + ROOT-based Analysis

- Story
    - Analysis ROOT ntuples on the grid - too large to store locally
    - Analysis group prefers to use the existing framework
- Workflow with ServiceX
    - ServiceX DataBinder to deliver branches used in the subsequent analysis
    - Preselection can be applied to further reduce the delivered ROOT ntuples
    - Only modified or added requests will be submitted to ServiceX
    - Output directory is dynamically synchronized with the configuration file
    - A ROOT file contains requested trees and branches
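
The "only modified or added requests" idea can be illustrated with a small sketch. This is not the DataBinder implementation: the helper names `request_fingerprint` and `requests_to_submit` and the sample dictionaries are hypothetical. The sketch assumes that a request is identified by a hash of its settings.

```python
import hashlib
import json


def request_fingerprint(sample: dict) -> str:
    """Return a stable hash of one sample request (hypothetical helper)."""
    # Sort keys so that the same request always serializes identically.
    payload = json.dumps(sample, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def requests_to_submit(samples: list, seen: set) -> list:
    """Keep only the requests whose fingerprint has not been seen before."""
    return [s for s in samples if request_fingerprint(s) not in seen]


# Requests from a previous run (illustrative values only).
previous = [
    {"Name": "ttH", "Tree": "nominal", "Columns": "jet_e, jet_pt"},
    {"Name": "ttW", "Tree": "nominal", "Columns": "jet_e, jet_pt"},
]
seen = {request_fingerprint(s) for s in previous}

# In the new configuration, ttW asks for one more branch and ttZ is new.
current = [
    {"Name": "ttH", "Tree": "nominal", "Columns": "jet_e, jet_pt"},
    {"Name": "ttW", "Tree": "nominal", "Columns": "jet_e, jet_pt, met_met"},
    {"Name": "ttZ", "Tree": "nominal", "Columns": "jet_e"},
]
pending = requests_to_submit(current, seen)
print([s["Name"] for s in pending])  # ['ttW', 'ttZ']
```

The unchanged `ttH` request is skipped, while the modified `ttW` and the new `ttZ` requests would be submitted.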

<br>

### Hands-on

In [None]:
%%writefile config_databinder_uproot.yaml
General:
  ServiceXBackendName: uproot
  OutputDirectory: ServiceXData_uproot
  OutputFormat: root
  ZipROOTColumns: True
  WriteOutputDict: out_uproot
  
Sample:
  - Name: ttH
    RucioDID: user.mgeyik:user.mgeyik.mc16_13TeV.346344.PhPy8EG_ttH125_1l.SGTOP1.e7148_a875_r9364_p4346.ll.b_out.root, 
             user.mgeyik:user.mgeyik.mc16_13TeV.346344.PhPy8EG_ttH125_1l.SGTOP1.e7148_a875_r10201_p4346.ll.d_out.root
    Tree: nominal
    FuncADL: "Select(lambda event: {'jet_e': event.jet_e, 'jet_pt': event.jet_pt, 'met_met': event.met_met})"
  - Name: ttH
    RucioDID: user.mgeyik:user.mgeyik.mc16_13TeV.346344.PhPy8EG_ttH125_1l.SGTOP1.e7148_a875_r9364_p4346.ll.b_out.root, 
             user.mgeyik:user.mgeyik.mc16_13TeV.346344.PhPy8EG_ttH125_1l.SGTOP1.e7148_a875_r10201_p4346.ll.d_out.root
    Tree: sumWeights
    FuncADL: "Select(lambda event: {'dsid': event.dsid, 'totalEventsWeighted': event.totalEventsWeighted})"
  - Name: ttW
    RucioDID: user.mgeyik:user.mgeyik.mc16_13TeV.410155.aMCPy8EG_ttW.SGTOP1.e5070_s3126_r10724_p4346.ll.b_out.root
    Tree: nominal
    Filter: ""
    Columns: jet_e, jet_pt

In [None]:
from servicex_databinder import DataBinder
sx_db = DataBinder('config_databinder_uproot.yaml')

In [None]:
out = sx_db.deliver()

In [None]:
out

In [None]:
import uproot
out_file = uproot.open(out['ttH'][0])
print(f"Trees: {out_file.keys()}")
print(f"Branches in the tree 'nominal': {out_file['nominal'].keys()}")

<br>
<br>

## Uproot ServiceX + coffea analysis

- Story
    - Analysis ROOT ntuples on the grid - too large to store locally
    - Analysis team wants to do the analysis in the Python ecosystem, but still use a familiar statistical tool such as TRExFitter
    
- Workflow with ServiceX
    - ServiceX DataBinder to deliver branches used in the subsequent analysis
    - The same procedure, but outputs are delivered in Parquet format, which coffea can consume

- ServiceX from coffea
    - ServiceX can also be accessed directly from coffea, as shown by Alex Held ([notebook](https://github.com/iris-hep/analysis-grand-challenge/blob/main/workshops/agctools2021/HZZ_analysis_pipeline/HZZ_analysis_pipeline.ipynb))
    - More efficient since both ServiceX data delivery and coffea analysis are done asynchronously
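
Why asynchronous delivery and analysis help can be sketched with the standard library alone. This is a toy model, not the ServiceX or coffea API: `deliver`, `analyze`, and `pipeline` are hypothetical stand-ins, and the delays only simulate waiting for a transformation.

```python
import asyncio


async def deliver(sample: str, delay: float) -> str:
    """Stand-in for a data delivery request (hypothetical, no network)."""
    await asyncio.sleep(delay)  # simulates waiting for the transformation
    return f"{sample}.parquet"


async def analyze(path: str) -> str:
    """Stand-in for an analysis step on one delivered file."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return f"processed {path}"


async def pipeline(samples: dict) -> list:
    """Start all deliveries at once and analyze each file as it arrives."""
    tasks = [
        asyncio.create_task(deliver(name, delay))
        for name, delay in samples.items()
    ]
    results = []
    for finished in asyncio.as_completed(tasks):
        path = await finished
        results.append(await analyze(path))
    return results


results = asyncio.run(pipeline({"ttH": 0.02, "ttW": 0.01}))
print(results)
```

Analysis of the first delivered file starts while the other delivery is still pending. Inside a notebook, where an event loop is already running, use `results = await pipeline(...)` instead of `asyncio.run(...)`.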

<br>

### Hands-on

In [None]:
%%writefile config_databinder_uproot.yaml
General:
  ServiceXBackendName: uproot
  OutputDirectory: ServiceXData_uproot
  OutputFormat: parquet
  WriteOutputDict: out_uproot
  IgnoreServiceXCache: False
  
Sample:
  - Name: ttH
    RucioDID: user.mgeyik:user.mgeyik.mc16_13TeV.346344.PhPy8EG_ttH125_1l.SGTOP1.e7148_a875_r9364_p4346.ll.b_out.root, 
             user.mgeyik:user.mgeyik.mc16_13TeV.346344.PhPy8EG_ttH125_1l.SGTOP1.e7148_a875_r10201_p4346.ll.d_out.root
    Tree: nominal
    FuncADL: "Select(lambda event: {'jet_e': event.jet_e, 'jet_pt': event.jet_pt, 'met_met': event.met_met})"
  - Name: ttH
    RucioDID: user.mgeyik:user.mgeyik.mc16_13TeV.346344.PhPy8EG_ttH125_1l.SGTOP1.e7148_a875_r9364_p4346.ll.b_out.root, 
             user.mgeyik:user.mgeyik.mc16_13TeV.346344.PhPy8EG_ttH125_1l.SGTOP1.e7148_a875_r10201_p4346.ll.d_out.root
    Tree: sumWeights
    FuncADL: "Select(lambda event: {'dsid': event.dsid, 'totalEventsWeighted': event.totalEventsWeighted})"
  - Name: ttW
    RucioDID: user.mgeyik:user.mgeyik.mc16_13TeV.410155.aMCPy8EG_ttW.SGTOP1.e5070_s3126_r10724_p4346.ll.b_out.root
    Tree: nominal
    Filter: ""
    Columns: jet_e, jet_pt

In [None]:
from servicex_databinder import DataBinder
sx_db = DataBinder('config_databinder_uproot.yaml')

In [None]:
out = sx_db.deliver()

<br>
<br>

## Uproot ServiceX + TRExFitter

- Story
    - Analysis ROOT ntuples on the grid - too large to store locally
    - ROOT ntuples are prepared to be directly read by TRExFitter

- TRExFitter
    - Popular tool for statistical inference via profile likelihood fits
    - User provides 
        - inputs in the format of ROOT ntuples or ROOT histograms 
        - a configuration file to steer the framework
    
- Workflow with `servicex-for-trexfitter`
    - The package analyzes your TRExFitter configuration file and delivers only the branches and entries needed to produce all histograms defined in it.
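
A rough sketch of how the needed branches could be collected from expressions in a configuration file is shown below. It is not the package's own parser: `branches_in`, `KNOWN_FUNCTIONS`, and the `settings` dictionary are hypothetical, and the sketch assumes that every identifier-like token other than a known function is a branch name.

```python
import re

# Names that look like identifiers but are functions, not branches.
# This set is an assumption for the sketch, not the package's own list.
KNOWN_FUNCTIONS = {"sqrt"}


def branches_in(expression: str) -> set:
    """Return identifier-like tokens in a TTreeFormula-style expression."""
    # \b keeps the exponent of numbers such as 1e3 from being matched.
    tokens = re.findall(r"\b[A-Za-z_]\w*", expression)
    return {t for t in tokens if t not in KNOWN_FUNCTIONS}


# Illustrative expressions for a variable, a selection and a weight.
settings = {
    "variable": "jet_pt/1e3",
    "selection": "n_jet >= 4 && sqrt(met_met) > 5",
    "weight": "weight_mc * weight_pileup",
}

needed = sorted(set().union(*(branches_in(e) for e in settings.values())))
print(needed)
# ['jet_pt', 'met_met', 'n_jet', 'weight_mc', 'weight_pileup']
```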


### Installation

```bash
pip install servicex-for-trexfitter
```

### Prepare `TRExFitter` configuration file

The following settings are needed for the workflow using `servicex-for-trexfitter`:

##### `Job` block settings

- `NtuplePaths: <PATH>` 
    - The path where input `ROOT` files are stored.
    - Write permission is required, as ServiceX delivers `ROOT` ntuples to the `servicex` subdirectory of this path.

##### `Sample` block settings

- `GridDID: <Rucio DID>`
    - Add the `GridDID` option to each `Sample` that uses ServiceX for delivery.
    - Specify both the scope and the name of the DID, e.g., `user.kchoi:user.kchoi.WZana_WZ`.
    - A `Sample` can have multiple DIDs, e.g., `user.kchoi:user.kchoi.WZana_WZ_mc16a`, `user.kchoi:user.kchoi.WZana_WZ_mc16d`, `user.kchoi:user.kchoi.WZana_WZ_mc16e`
    - A `Sample` without the `GridDID` option is treated as a regular `Sample`, which reads ntuple files from the local path.
- `NtupleFile: servicex/<SAMPLE NAME>`    
    - This option is required only for a `Sample` with the `GridDID` option. Other Samples can use any option valid for ntuple (`NTUP`) inputs.
    - `servicex-for-trexfitter` delivers one `ROOT` file per `Sample` with the same name as the `Sample` name.
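
Before requesting a delivery, the settings above can be checked with a short pre-flight sketch. The helpers `delivery_target` and `can_deliver_to` are hypothetical and not part of `servicex-for-trexfitter`. The `.root` extension is an assumption based on the file name used in the hands-on section, and a temporary directory stands in for the real `NtuplePaths`.

```python
import os
import tempfile
from pathlib import Path


def delivery_target(ntuple_paths: str, sample_name: str) -> Path:
    """Expected location of the delivered file for one Sample."""
    # Assumption: ".root" is appended to the Sample name.
    return Path(ntuple_paths) / "servicex" / f"{sample_name}.root"


def can_deliver_to(ntuple_paths: str) -> bool:
    """Check that the NtuplePaths directory exists and is writable."""
    return os.path.isdir(ntuple_paths) and os.access(ntuple_paths, os.W_OK)


# A temporary directory stands in for the NtuplePaths setting.
with tempfile.TemporaryDirectory() as ntuple_paths:
    writable = can_deliver_to(ntuple_paths)
    target = delivery_target(ntuple_paths, "ttW")
    print(writable, target.relative_to(ntuple_paths))
```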


Here is a side-by-side comparison of example configuration files:

`servicex-for-trexfitter` | Default
:--------:|:------:
![](img/config_servicex_2.png) | ![](img/config_trexfitter_2.png)


### Usage

#### Deliver `ROOT` ntuples using `servicex-for-trexfitter`

```python
from servicex_for_trexfitter import ServiceXTRExFitter
sx_trex = ServiceXTRExFitter("<TRExFitter configuration file>")
sx_trex.get_ntuples()
```

After importing the package, create an instance with the path to your `TRExFitter` configuration file as its argument.
You can then call `get_ntuples()` to request delivery of the `ROOT` ntuples.
This initiates `ServiceX` transformation(s) based on your `TRExFitter` configuration and delivers the `ROOT` ntuples to the `servicex` subdirectory of the path you specified in `Job/NtuplePaths`.

#### Local data cache

`ServiceX` caches your queries and the delivered data in a local temporary directory.
Therefore, whenever you change the `TRExFitter` configuration file, `servicex-for-trexfitter` creates data delivery requests only for the updated parts.

#### Compatible TRExFitter framework

To run the subsequent steps of `TRExFitter` with the `ROOT` ntuples that `servicex-for-trexfitter` delivered, you need to check out the `feat/servicex-integration` branch of the `TRExFitter` framework.
Otherwise, `TRExFitter` will complain about unknown options.
The feature branch will be merged into master in the near future.

Compatible `servicex-for-trexfitter` versions:

|TRExFitter branch/commit | `servicex-for-trexfitter` version |
|:--------:|:------:|
| [`feat/servicex-integration` / `d1f57d8e`](https://gitlab.cern.ch/TRExStats/TRExFitter/-/tree/d1f57d8ecb1b0c0be0b3aaf1d6c81b6ff50f22d9) | [v0.10.0](https://github.com/kyungeonchoi/ServiceXforTRExFitter/releases/tag/v0.10.0)  |
| [`feat/servicex-integration` / `abfe0cc3`](https://gitlab.cern.ch/TRExStats/TRExFitter/-/tree/abfe0cc360bc43c49c1155380d14024a7f64c76f) | [v0.9.1](https://github.com/kyungeonchoi/ServiceXforTRExFitter/releases/tag/v0.9.1)   |

### Caveats

#### Limited support for `Selection` expression

`ServiceX` uses [`func-adl`](https://github.com/iris-hep/func_adl), a Python-based declarative analysis description language, to filter events and request branches from the input `ROOT` ntuple.
Since `TRExFitter` uses `TTreeFormula` for `TTree` selections, the Python package [`tcut-to-qastle`](https://github.com/ssl-hep/TCutToQastleWrapper) was written to translate `TTreeFormula` expressions into `func-adl`.

Supported expressions:

- Arithmetic operators: `+, -, *, /`
- Logical operators: `!, &&, ||`
- Relational and comparison operators: `==, !=, >, <, >=, <=`
- Mathematical function: `sqrt`
- Ternary operator: `(A?B:C)` - has to be enclosed in parentheses

Unsupported expressions:

- Special `ROOT` functions such as `Entry$, Sum$(formula)`
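
A `Selection` string can be screened for unsupported constructs before a delivery is requested. The sketch below is not how `tcut-to-qastle` works: `unsupported_tokens` and `logical_to_python` are hypothetical helpers. It assumes that unsupported `ROOT` special functions can be recognized by a trailing `$`. The second helper rewrites only the logical operators and does not handle the ternary operator.

```python
import re


def unsupported_tokens(selection: str) -> list:
    """List ROOT special functions or variables (names ending in '$')."""
    return re.findall(r"[A-Za-z_]\w*\$", selection)


def logical_to_python(selection: str) -> str:
    """Rewrite &&, || and ! as Python's and, or and not."""
    out = selection.replace("&&", " and ").replace("||", " or ")
    out = re.sub(r"!(?!=)", " not ", out)  # leave '!=' untouched
    return " ".join(out.split())  # collapse repeated spaces


print(unsupported_tokens("Sum$(jet_pt > 25) >= 2 && Entry$ < 100"))
# ['Sum$', 'Entry$']

selection = "n_jet >= 4 && !(met_met < 20) || lep_charge != 0"
print(logical_to_python(selection))
# n_jet >= 4 and not (met_met < 20) or lep_charge != 0
```

An empty list from `unsupported_tokens` means only that no `$` functions were found, not that the whole expression is supported.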

### Documentation
[ServiceX for TRExFitter](https://trexfitter-docs.web.cern.ch/trexfitter-docs/interfacing_tools/servicex/)

### Hands-on

In [None]:
from servicex_for_trexfitter import ServiceXTRExFitter

In [None]:
!cat config/example.config

In [None]:
sx_trex = ServiceXTRExFitter("config/example.config")

In [None]:
sx_trex.get_ntuples()

In [None]:
!ls example/servicex

In [None]:
import uproot
out_file = uproot.open('example/servicex/ttW.root')
print(f"Trees in ttW sample: \n{out_file.keys()}\n")
print(f"Branches in nominal tree: \n{out_file['nominal'].keys()}")