# What is ServiceX?

<img src="img/logo_servicex.png" width=150 height=150 />

ServiceX is a <span style="color:red">scalable</span> <span style="color:blue">HEP event data</span> <span style="color:purple">extraction</span>, <span style="color:orange">transformation</span> and <span style="color:green">delivery</span> system
- <span style="color:blue"> HEP event data</span>: supports various input data formats - ROOT Ntuple (CMS nanoAOD), ATLAS xAOD, future data formats
- <span style="color:purple"> Extraction</span>: user-selected column(s) with filtering
- <span style="color:orange"> Transformation</span>: transform into various formats - Awkward arrays, Apache Parquet, ROOT Ntuple
- <span style="color:green"> Delivery </span>: on-demand delivery to a user, or streaming into an analysis system, from a remote site via Rucio or XRootD
- <span style="color:red">Scalable</span>: runs on any Kubernetes cluster, scales up workers if necessary

<br>
<br>

### Example ServiceX workflow

<img src="img/ServiceX_workflow_2.png" width=800 />

1. A user submits a ServiceX delivery request from a Jupyter notebook via a REST interface
1. The ServiceX backend looks up the input datasets and retrieves the input file list
1. Transformation code is generated based on the input data format, the func-adl query, and so on
1. Transformer pods (workers) are launched to process each file (10 pods at first, scaling up if necessary)
1. Outputs are streamed into the object store inside the Kubernetes cluster
1. The user downloads the outputs asynchronously
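
The steps above can be pictured with a small, self-contained simulation. It is only a sketch: `transform`, `run_request`, and the file names are hypothetical stand-ins rather than ServiceX code, and a fixed pool of 10 concurrent slots replaces the real pod autoscaling.

```python
import asyncio

INITIAL_WORKERS = 10  # the workflow above starts with 10 transformer pods


async def transform(file_name: str, slots: asyncio.Semaphore) -> str:
    """Hypothetical stand-in for one transformer pod handling one file."""
    async with slots:
        await asyncio.sleep(0.01)  # placeholder for the real transformation
        return file_name + ".out"


async def run_request(file_list):
    """Process every file and collect outputs in completion order."""
    slots = asyncio.Semaphore(INITIAL_WORKERS)
    tasks = [asyncio.create_task(transform(f, slots)) for f in file_list]
    outputs = []
    for finished in asyncio.as_completed(tasks):
        # Each output is picked up as soon as its file is done.
        outputs.append(await finished)
    return outputs


files = [f"file_{i}.root" for i in range(25)]
outputs = asyncio.run(run_request(files))
print(len(outputs), "outputs collected")
```

The semaphore caps how many files are processed at once, while `asyncio.as_completed` mirrors the idea that outputs can be fetched before the whole request has finished.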

<br>
<br>

### Where is ServiceX?

- ServiceX is deployed on a Kubernetes cluster
    - Needs enough resources to scale pods
    - Preferably co-located with a data center for high network bandwidth
- Types
    - Stand-alone: Secured by its own authentication system. Web API is accessible from anywhere.
    - Integrated into coffea-casa: Secured by the CERN authentication system. Only accessible from inside a coffea-casa instance.
- Input data format
    - A dedicated ServiceX deployment serves each input data format: ROOT ntuple, ATLAS xAOD, CMS Run-1 AOD
    - A single deployment for all input data types is currently under development
- Available ServiceX endpoints

| Type | Input data format | Location | Endpoint |
| :----: | :-----------------: | :--------: | :--------: |
| Stand-alone | ATLAS ROOT Ntuple | SSL-River | https://uproot-atlas.servicex.ssl-hep.org/ |
| Stand-alone | ATLAS xAOD | SSL-River | https://xaod.servicex.ssl-hep.org/ |
| Stand-alone | ATLAS OpenData | SSL-River | https://atlasopendata.servicex.ssl-hep.org/ |
| Stand-alone | ATLAS ROOT Ntuple | UC Analysis Facility | https://uproot-atlas.servicex.af.uchicago.edu/ |
| Stand-alone | ATLAS xAOD | UC Analysis Facility | https://xaod.servicex.af.uchicago.edu/ |
| Coffea-casa | CMS ROOT Ntuple | UNL | https://coffea.casa/ |
| Coffea-casa | ATLAS OpenData | UC Analysis Facility | http://coffea.af.uchicago.edu/ |

<p style="text-align: right;"> *Other experimental endpoints also exist </p>

<br>
<br>

# ServiceX Client Library

The base library for interacting with ServiceX:

- Makes a request to a ServiceX backend
- Monitors the progress of the transformation
- Downloads files as soon as they are ready
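
The "monitor, then download as soon as ready" behaviour can be sketched as a polling loop. This is an illustration only: `watch_transform` and the canned status snapshots are hypothetical and do not reproduce the client library's real API.

```python
import time
from typing import Callable, Iterator, List


def watch_transform(get_ready_files: Callable[[], List[str]],
                    total_files: int,
                    poll_seconds: float = 0.0) -> Iterator[str]:
    """Yield each output file once, as soon as a status poll reports it."""
    seen = set()
    while len(seen) < total_files:
        for name in get_ready_files():
            if name not in seen:
                seen.add(name)
                yield name
        if len(seen) < total_files:
            time.sleep(poll_seconds)  # wait before asking again


# Hypothetical status source: each poll reports one more finished file.
_snapshots = iter([
    ["a.parquet"],
    ["a.parquet", "b.parquet"],
    ["a.parquet", "b.parquet", "c.parquet"],
])
downloaded = list(watch_transform(lambda: next(_snapshots), total_files=3))
print(downloaded)
```

Because the function is a generator, the caller can start working on the first file while later files are still being transformed.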


### Prerequisites

- Python 3.6 or higher
- A `ServiceX` endpoint


### Configuration file

- Contains the endpoint access information
- Other settings (such as `cache_path`) can optionally be included

```
api_endpoints:
  - name: <your-endpoint-name>
    endpoint: <your-endpoint>
    token: <api-token>
    type: uproot
```
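
When several endpoints are in use, the file can be generated instead of written by hand. The sketch below is a minimal, standard-library-only helper; `render_servicex_yaml` is a hypothetical name, and placing `cache_path` at the top level of the file is an assumption to check against the client documentation.

```python
from typing import Dict, List, Optional


def render_servicex_yaml(endpoints: List[Dict[str, str]],
                         cache_path: Optional[str] = None) -> str:
    """Build the text of a servicex.yaml file from endpoint entries."""
    lines = ["api_endpoints:"]
    for entry in endpoints:
        lines.append(f"  - name: {entry['name']}")
        lines.append(f"    endpoint: {entry['endpoint']}")
        if "token" in entry:  # written only when a token is supplied
            lines.append(f"    token: {entry['token']}")
        lines.append(f"    type: {entry['type']}")
    if cache_path is not None:
        # Assumption: cache_path is a top-level key of the file.
        lines.append(f"cache_path: {cache_path}")
    return "\n".join(lines) + "\n"


config_text = render_servicex_yaml([
    {"name": "opendata_uproot",
     "endpoint": "https://atlasopendata.servicex.ssl-hep.org/",
     "type": "uproot"},
])
print(config_text)
```

Writing `config_text` to a file named `servicex.yaml` gives the same kind of file as the template above.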


### Local data cache

- ServiceX requests and returned data are stored in a local temporary directory by default
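
One way to picture such a cache: derive a directory name from the request, so that repeating an identical request finds the data already on disk. The sketch below illustrates that idea only. `cache_dir_for` and the directory name `servicex_cache_sketch` are hypothetical, and the real client may key and lay out its cache differently.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_dir_for(dataset: str, query: str, root: Path) -> Path:
    """Map a (dataset, query) pair to a stable cache directory."""
    digest = hashlib.sha256(f"{dataset}\n{query}".encode("utf-8")).hexdigest()
    return root / digest[:16]


# Hypothetical cache root inside the system temporary directory.
cache_root = Path(tempfile.gettempdir()) / "servicex_cache_sketch"

first = cache_dir_for("dataset_a", "query_1", cache_root)
again = cache_dir_for("dataset_a", "query_1", cache_root)
other = cache_dir_for("dataset_a", "query_2", cache_root)
print(first == again, first == other)
```

Identical requests map to the same directory, while any change to the dataset or the query produces a new one.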


In [20]:
%%writefile servicex.yaml
api_endpoints:
  - name: opendata_uproot
    endpoint: https://atlasopendata.servicex.ssl-hep.org/
    type: uproot

Overwriting servicex.yaml


### ATLAS OpenData

In [35]:
from servicex import ServiceXDataset
from func_adl_servicex import ServiceXSourceUpROOT

In [36]:
dataset_opendata = "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root"
uproot_transformer_image = "sslhep/servicex_func_adl_uproot_transformer:develop"

In [37]:
sx_dataset = ServiceXDataset(dataset_opendata, image=uproot_transformer_image, backend_name='opendata_uproot')

In [38]:
ds = ServiceXSourceUpROOT(sx_dataset, "mini")
# ds.return_qastle = True

In [39]:
data = ds.Select("lambda event: {'lep_pt': event.lep_pt, 'lep_eta': event.lep_eta}") \
    .AsPandasDF('lep_pt').value()

In [6]:
data

Unnamed: 0,lep_pt,lep_eta
0,"[51905.457, 41248.57, 16397.67, 7471.2275]","[-0.9257092, -0.8236952, -0.48641676, 0.26671788]"
1,"[41430.645, 40307.168, 16133.789, 7481.8574]","[-1.2331822, -0.396434, -0.5415077, -0.3021792]"
2,"[33646.71, 27313.271, 20035.95, 16472.64]","[-0.03232379, -0.044152576, 0.06701253, 1.8595..."
3,"[77118.56, 27845.74, 17726.541, 14714.521]","[0.51476425, 0.8453112, 2.1891582, 0.17971124]"
4,"[161909.22, 53367.754, 25596.69, 18864.479]","[-1.0373538, -0.8217277, -1.2618828, 0.12619522]"
...,...,...
164711,"[32143.482, 24158.068, 17203.547, 14358.152]","[-1.0038322, 0.60944843, 0.87633955, 1.0397455]"
164712,"[39488.273, 33694.094, 32709.998, 14797.52]","[0.1847904, 0.7994406, -0.4549886, -1.1673094]"
164713,"[63284.21, 22707.84, 15635.994, 14873.25]","[0.93559146, 0.18448293, 0.17450815, 2.1288664]"
164714,"[52538.805, 40321.457, 25766.85, 19381.92]","[0.8802495, 1.2056149, 1.7011378, 0.85303867]"
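
Each cell of the result holds one list per event, so per-event quantities take one extra step. The sketch below uses the first two events shown above as plain Python lists and picks the leading (highest-pT) lepton. It assumes `lep_pt` is stored in MeV, and `leading_lepton` is a hypothetical helper.

```python
# First two events, copied from the output above.
rows = [
    {"lep_pt": [51905.457, 41248.57, 16397.67, 7471.2275],
     "lep_eta": [-0.9257092, -0.8236952, -0.48641676, 0.26671788]},
    {"lep_pt": [41430.645, 40307.168, 16133.789, 7481.8574],
     "lep_eta": [-1.2331822, -0.396434, -0.5415077, -0.3021792]},
]

MEV_PER_GEV = 1000.0  # assumption: lep_pt values are in MeV


def leading_lepton(event):
    """Return (pT in GeV, eta) of the highest-pT lepton in one event."""
    pt, eta = max(zip(event["lep_pt"], event["lep_eta"]))
    return pt / MEV_PER_GEV, eta


leading = [leading_lepton(event) for event in rows]
for pt_gev, eta in leading:
    print(f"pT = {pt_gev:.1f} GeV, eta = {eta:.2f}")
```

For a full dataset, a columnar library such as Awkward Array would be the more natural tool; plain lists are used here only to keep the example dependency-free.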


# TCut library

The [TCut library](https://github.com/ssl-hep/TCutToQastleWrapper) translates a TCut-style selection into a qastle query that can be sent to ServiceX.

In [8]:
import tcut_to_qastle

In [13]:
query = tcut_to_qastle.translate("mini", "lep_pt", "lep_eta<1")

In [14]:
print(query)

(Select (Where (call EventDataset 'ServiceXDatasetSource' 'mini') (lambda (list event) (< (attr event 'lep_eta') 1))) (lambda (list event) (dict (list 'lep_pt') (list (attr event 'lep_pt')))))
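
To see how the pieces of the printed query fit together, the sketch below rebuilds the same string by hand. `build_qastle` is a hypothetical helper written for this illustration. It is not how `tcut_to_qastle` works internally, it handles only a single comparison cut, and its layout for more than one selected column is an assumption.

```python
from typing import List


def build_qastle(tree: str, columns: List[str],
                 cut_branch: str, op: str, value: str) -> str:
    """Assemble a Select-over-Where qastle string for one simple cut."""
    source = f"(call EventDataset 'ServiceXDatasetSource' '{tree}')"
    # Where: keep events for which "<cut_branch> <op> <value>" holds.
    where = (f"(Where {source} (lambda (list event) "
             f"({op} (attr event '{cut_branch}') {value})))")
    # Select: build a dict of the requested columns.
    keys = " ".join(f"'{c}'" for c in columns)
    attrs = " ".join(f"(attr event '{c}')" for c in columns)
    return (f"(Select {where} (lambda (list event) "
            f"(dict (list {keys}) (list {attrs}))))")


rebuilt = build_qastle("mini", ["lep_pt"], "lep_eta", "<", "1")
print(rebuilt)
```

Reading the string from the inside out: the dataset source is filtered by `Where`, and `Select` then keeps only the requested branch.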


In [None]:
r = sx_dataset.get_data_pandas_df(query)

In [12]:
r

Unnamed: 0,lep_pt
0,"[51905.457, 41248.57, 16397.67, 7471.2275]"
1,"[41430.645, 40307.168, 16133.789, 7481.8574]"
2,"[33646.71, 27313.271, 20035.95, 16472.64]"
3,"[77118.56, 27845.74, 17726.541, 14714.521]"
4,"[161909.22, 53367.754, 25596.69, 18864.479]"
...,...
164711,"[32143.482, 24158.068, 17203.547, 14358.152]"
164712,"[39488.273, 33694.094, 32709.998, 14797.52]"
164713,"[63284.21, 22707.84, 15635.994, 14873.25]"
164714,"[52538.805, 40321.457, 25766.85, 19381.92]"


# DataBinder

In [21]:
%%writefile config_databinder_opendata.yaml
General:
  ServiceXBackendName: opendata_uproot
  OutputDirectory: ServiceXData_atlasopendata
  OutputFormat: root
  ZipROOTColumns: True
  WriteOutputDict: out_atlasopendata  

Sample:
  - Name: data
    XRootDFiles: root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/Data/data_A.4lep.root,
            root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/Data/data_B.4lep.root,
            root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/Data/data_C.4lep.root,
            root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/Data/data_D.4lep.root
    Tree: mini
    FuncADL: "Select(lambda event: {'lep_pt': event.lep_pt, 'lep_eta': event.lep_eta})"
  - Name: ggH125_ZZ4lep
    XRootDFiles: root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root
    Tree: mini
    FuncADL: "Select(lambda event: {'lep_pt': event.lep_pt, 'lep_eta': event.lep_eta})"

Overwriting config_databinder_opendata.yaml
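
The `XRootDFiles` entry for the `data` sample is a single comma-separated string spread over several lines. A minimal sketch of turning such a string into a list of URLs follows; `split_file_list` is a hypothetical helper, and how DataBinder itself parses the field is not shown here.

```python
from typing import List


def split_file_list(value: str) -> List[str]:
    """Split a comma-separated file string into clean, non-empty URLs."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Two of the entries from the configuration above, joined as one string.
raw = (
    "root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/"
    "2020-01-22/4lep/Data/data_A.4lep.root,\n"
    "        root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets/"
    "2020-01-22/4lep/Data/data_B.4lep.root"
)
file_urls = split_file_list(raw)
print(len(file_urls), "files")
```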


In [1]:
from servicex_databinder import DataBinder
sx_db = DataBinder('config_databinder_opendata.yaml')

INFO - opening config file: config_databinder_opendata.yaml


In [2]:
out = sx_db.deliver()

INFO - retrieving data via opendata_uproot ServiceX..
INFO - complete ServiceX data delivery..
INFO - post-processing..
INFO - done.


In [3]:
out

{'data': {'mini': ['/Users/kchoi/Work/UTAustin/Computing/ServiceX/ServiceX-at-IRIS-HEP-ACG-workshop-2021/ServiceXData_atlasopendata/data/mini/root___eospublic.cern.ch__eos_opendata_atlas_OutreachDatasets_2020-01-22_4lep_Data_data_C.4lep.root.parquet',
   '/Users/kchoi/Work/UTAustin/Computing/ServiceX/ServiceX-at-IRIS-HEP-ACG-workshop-2021/ServiceXData_atlasopendata/data/mini/root___eospublic.cern.ch__eos_opendata_atlas_OutreachDatasets_2020-01-22_4lep_Data_data_B.4lep.root.parquet',
   '/Users/kchoi/Work/UTAustin/Computing/ServiceX/ServiceX-at-IRIS-HEP-ACG-workshop-2021/ServiceXData_atlasopendata/data/mini/root___eospublic.cern.ch__eos_opendata_atlas_OutreachDatasets_2020-01-22_4lep_Data_data_D.4lep.root.parquet',
   '/Users/kchoi/Work/UTAustin/Computing/ServiceX/ServiceX-at-IRIS-HEP-ACG-workshop-2021/ServiceXData_atlasopendata/data/mini/root___eospublic.cern.ch__eos_opendata_atlas_OutreachDatasets_2020-01-22_4lep_Data_data_A.4lep.root.parquet']},
 'ggH125_ZZ4lep': {'mini': ['/Users/kc
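
The returned dictionary is nested as sample name, then tree name, then a list of local file paths. The sketch below walks a structure of that shape. The paths in `out_example` are short hypothetical placeholders, not the real output, and `files_per_sample` is a helper written for this illustration.

```python
# Hypothetical stand-in with the same nesting as the output above:
# sample name -> tree name -> list of local file paths.
out_example = {
    "data": {"mini": ["/tmp/example/data/mini/data_A.parquet",
                      "/tmp/example/data/mini/data_B.parquet"]},
    "ggH125_ZZ4lep": {"mini": ["/tmp/example/ggH125_ZZ4lep/mini/mc.parquet"]},
}


def files_per_sample(delivered):
    """Count delivered files for every (sample, tree) pair."""
    return {(sample, tree): len(paths)
            for sample, trees in delivered.items()
            for tree, paths in trees.items()}


counts = files_per_sample(out_example)
for (sample, tree), n_files in counts.items():
    print(f"{sample}/{tree}: {n_files} file(s)")
```

The same loop structure applies to the real `out` dictionary, for example to hand each list of paths to a reader for the delivered file format.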

# Coffea interface

# Useful links

- [ServiceX documentation on Read the Docs](https://servicex.readthedocs.io/en/latest/)
- [GitHub - ServiceX Frontend](https://github.com/ssl-hep/ServiceX_frontend)