# DQM Playground Data Engineering Test Client

A sample API client for accessing DQM Playground's data. 

## Install the auto-generated API client
Because the API is described by an OpenAPI specification, we can automatically generate a Python client using:
- The API schema generated by DQM Playground [here](https://ml4dqm-playground.web.cern.ch/openapi?format=openapi-json)
- The Swagger client generator [here](https://editor.swagger.io/)

We download the generated client and install it like so:

In [1]:
!pip install ./client --user
!pip install -r requirements.txt

Processing ./client
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: swagger-client
  Building wheel for swagger-client (setup.py) ... done
  Created wheel for swagger-client: filename=swagger_client-1.0.0-py3-none-any.whl size=91671 sha256=6f16366c38e8189395f51ed8c4bcf116b89eb23c9bf28459f65fe8007052d592
  Stored in directory: /tmp/pip-ephem-wheel-cache-s2cra5r1/wheels/35/6c/2c/fe2976cd10438467ecebb623a3c52b68b6041f09c09ccf69d8
Successfully built swagger-client
Installing collected packages: swagger-client
  Attempting uninstall: swagger-client
    Found existing installation: swagger-client 1.0.0
    Uninstalling swagger-client-1.0.0:
      Successfully uninstalled swagger-client-1.0.0
Successfully installed swagger-client-1.0.0
Defaulting to user installation because normal site-packages is not writeable


In [2]:
import swagger_client
from swagger_client.rest import ApiException
from pprint import pprint

## Configure the client
Replace the value of `API_TOKEN` in the cell below with your own API token.

In [3]:
# Configure and create an API client
# using an API token
API_TOKEN = None  # replace None with your API token string
MLP_URL = "https://ml4dqm-playground.web.cern.ch"

configuration = swagger_client.Configuration()
configuration.host = MLP_URL
client = swagger_client.ApiClient(configuration)
client.set_default_header(header_name="Content-Type", header_value="application/json")
client.set_default_header(header_name="Authorization", header_value=f"Token {API_TOKEN}")

api_instance = swagger_client.ApiApi(client)
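
Hard-coding a token in a notebook makes it easy to leak, and leaving `API_TOKEN = None` sends the literal header `Token None`. A minimal sketch of one alternative, assuming the token is stored in an environment variable; the variable name `DQM_PLAYGROUND_TOKEN` and the helper `auth_header_value` are hypothetical and not part of the generated client:

```python
import os


def auth_header_value(env_var="DQM_PLAYGROUND_TOKEN", environ=None):
    """Return the Authorization header value, or raise if no token is set.

    `env_var` is a hypothetical variable name chosen for this sketch;
    `environ` can be passed explicitly to make the helper easy to test.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    return f"Token {token}"
```

It could then replace the f-string above, for example `client.set_default_header(header_name="Authorization", header_value=auth_header_value())`.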

## Fetch data

<div class="alert alert-block alert-info">
    <b>Note:</b> Results are paginated in pages of 50, so each response contains
    at most 50 results. To fetch further results, pass the <code>page</code> parameter.
</div>
<div class="alert alert-block alert-warning">
    <b>Warning:</b> The response should normally contain <b>next</b>, <b>previous</b> and <b>count</b> keys,
    with <b>count</b> giving the total number of results available. This does not currently work with
    this automatically generated client: <b>count</b> is <code>None</code> in the outputs below.
</div>
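
Since the total count is not available, one way to collect everything is to keep requesting pages until `next` is empty. A minimal sketch, assuming each response exposes `.results` and `.next` attributes (matching the keys printed in the outputs of this notebook); the helper name `iter_all_results` and the `max_pages` safety limit are hypothetical additions:

```python
def iter_all_results(list_fn, max_pages=100, **filters):
    """Yield results page by page until the response has no `next` link.

    `list_fn` is any list endpoint accepting `page` plus filter keywords.
    `max_pages` is a safety limit so that a faulty `next` cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        response = list_fn(page=page, **filters)
        yield from response.results
        if not response.next:
            break
```

For example, `list(iter_all_results(api_instance.list_lumisections, run=315257))` would gather the lumisections of one run across pages. Keep in mind that histogram endpoints can return a lot of data per page.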

### Run information

Available parameters for filtering Runs are:


- `page = 56` # int | A page number within the paginated result set. (optional)
- `run_number = 'run_number_example'` # str | run_number (optional)
- `run_date = 'run_date_example'` # str | run_date (optional)
- `year = 'year_example'` # str | year (optional)
- `period = 'period_example'` # str | period (optional)
- `_date = '_date_example'` # str | date (optional)
- `oms_fill = 'oms_fill_example'` # str | oms_fill (optional)
- `oms_lumisections = 'oms_lumisections_example'` # str | oms_lumisections (optional)
- `oms_initial_lumi = 'oms_initial_lumi_example'` # str | oms_initial_lumi (optional)
- `oms_end_lumi = 'oms_end_lumi_example'` # str | oms_end_lumi (optional)

In [4]:
run_number = 315741  # documented as str, but an int also works
run = api_instance.list_runs(run_number=run_number)
pprint(run)

{'count': None,
 'next': None,
 'previous': None,
 'results': [{'_date': datetime.datetime(2022, 4, 29, 19, 3, 29, 654340, tzinfo=tzutc()),
              'run_number': 315741}]}
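
`ApiException` is imported above but not used in these cells. A minimal sketch of wrapping a call so that a failed request is reported instead of raising; the helper `list_or_none` is hypothetical, and the exception type is passed in as an argument so the sketch does not depend on the generated client:

```python
def list_or_none(list_fn, error_type=Exception, **filters):
    """Call a list endpoint; print the error and return None if it fails.

    Only exceptions of `error_type` are caught; anything else propagates.
    """
    try:
        return list_fn(**filters)
    except error_type as exc:
        print(f"API call failed: {exc}")
        return None
```

With the client configured above, this could be used as `run = list_or_none(api_instance.list_runs, ApiException, run_number=315741)`.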


### Lumisections information

<div class="alert alert-block alert-info">
    <b>Note:</b> Currently not much info on Lumisections is available through the API
</div>

Available parameters for filtering Lumisections are:

- `page = 56` # int | A page number within the paginated result set. (optional)
- `run = 'run_example'` # str | run (optional)
- `ls_number = 'ls_number_example'` # str | ls_number (optional)
- `_date = '_date_example'` # str | date (optional)
- `oms_zerobias_rate = 'oms_zerobias_rate_example'` # str | oms_zerobias_rate (optional)

In [5]:
# Pass a run number as `run` to fetch all lumisections for a specific run
run_number = 315257
ls = api_instance.list_lumisections(run=run_number)
pprint(ls)

{'count': None,
 'next': '/api/lumisections/?page=2&run=315257',
 'previous': None,
 'results': [{'_date': datetime.datetime(2022, 7, 15, 15, 58, 37, 115504, tzinfo=tzutc()),
              'ls_number': 1,
              'run': '315257'},
             {'_date': datetime.datetime(2022, 7, 15, 15, 58, 37, 178947, tzinfo=tzutc()),
              'ls_number': 2,
              'run': '315257'},
             {'_date': datetime.datetime(2022, 7, 15, 15, 58, 37, 184986, tzinfo=tzutc()),
              'ls_number': 3,
              'run': '315257'},
             {'_date': datetime.datetime(2022, 7, 15, 15, 58, 37, 190822, tzinfo=tzutc()),
              'ls_number': 4,
              'run': '315257'},
             {'_date': datetime.datetime(2022, 7, 15, 15, 58, 37, 197316, tzinfo=tzutc()),
              'ls_number': 5,
              'run': '315257'},
             {'_date': datetime.datetime(2022, 7, 15, 15, 58, 37, 203367, tzinfo=tzutc()),
              'ls_number': 6,
              'run': '315257'}

### Lumisection Histogram 1D

Available parameters for filtering 1D Lumisection Histograms are:
- `page = 56` # int | A page number within the paginated result set. (optional)
- `lumisection__run__run_number = 'lumisection__run__run_number_example'` # str | lumisection__run__run_number (optional)
- `lumisection__run__run_number__gte = 'lumisection__run__run_number__gte_example'` # str | lumisection__run__run_number__gte (optional)
- `lumisection__run__run_number__lte = 'lumisection__run__run_number__lte_example'` # str | lumisection__run__run_number__lte (optional)
- `lumisection__ls_number = 'lumisection__ls_number_example'` # str | lumisection__ls_number (optional)
- `lumisection__ls_number__gte = 'lumisection__ls_number__gte_example'` # str | lumisection__ls_number__gte (optional)
- `lumisection__ls_number__lte = 'lumisection__ls_number__lte_example'` # str | lumisection__ls_number__lte (optional)
- `entries__gte = 'entries__gte_example'` # str | entries__gte (optional)
- `entries__lte = 'entries__lte_example'` # str | entries__lte (optional)
- `title = 'title_example'` # str | title (optional)
- `lumisection__ls_number__in = 'lumisection__ls_number__in_example'` # str | lumisection__ls_number__in (optional)
- `lumisection__run__run_number__in = 'lumisection__run__run_number__in_example'` # str | lumisection__run__run_number__in (optional)


In [6]:
# By title
title = "Summary_ClusterStoNCorr__OnTrack__TEC__MINUS__wheel__7"
lh1d = api_instance.list_lumisection_histogram1_ds(title=title) 
print(f"Got {len(lh1d.results)} results!\n")
pprint(lh1d)

Got 50 results!

{'count': None,
 'next': '/api/lumisection_histograms_1d/?page=2&title=Summary_ClusterStoNCorr__OnTrack__TEC__MINUS__wheel__7',
 'previous': None,
 'results': [{'data': [0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       2.0,
                       0.0,
                       2.0,
                       18.0,
                       43.0,
                       148.0,
                       386.0,
                       665.0,
                       1124.0,
                       1672.0,
                       2157.0,
                       2443.0,
                       2616.0,
                       2786.0,
                       2799.0,
                       2766.0,
                       2604.0,
                       2389.0,
                       2212.0,
                       1961.0,
                       1726.0,
     

In [7]:
# By run number
run_num = 315267
lh1d = api_instance.list_lumisection_histogram1_ds(
    lumisection__run__run_number=run_num
) 
print(f"Got {len(lh1d.results)} results!\n")
pprint(lh1d)

Got 50 results!

{'count': None,
 'next': '/api/lumisection_histograms_1d/?lumisection__run__run_number=315267&page=2',
 'previous': None,
 'results': [{'data': [0.0,
                       0.0,
                       3835.0,
                       16561.0,
                       17598.0,
                       1527.0,
                       401.0,
                       222.0,
                       98.0,
                       65.0,
                       39.0,
                       21.0,
                       15.0,
                       13.0,
                       10.0,
                       5.0,
                       3.0,
                       5.0,
                       3.0,
                       3.0,
                       5.0,
                       0.0,
                       2.0,
                       1.0,
                       0.0,
                       2.0,
                       0.0,
                       0.0,
                       0.0,
                       1

### Lumisection Histogram 2D
The same filters as for 1D histograms apply.

In [11]:
# By run number
run_num = 297656
lh2d = api_instance.list_lumisection_histogram2_ds(
    lumisection__run__run_number=run_num
) 
print(f"Got {len(lh2d.results)} results!\n")
# Warning -- this contains a LOT of data
#pprint(lh2d)

Got 50 results!



### Working with the data received
Each response returned by the client so far holds, in its `results` attribute, a **list of instances of custom classes** that mirror the records stored in the database behind the API.
These classes have a `to_dict()` method that converts them to dictionaries, which you can then load into whatever data format you require.
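
As one illustration of going from `to_dict()` to another format, here is a minimal sketch that serializes results to JSON. The helper `results_to_json` is hypothetical; `default=str` is used because fields such as `_date` hold `datetime` objects, which `json` cannot encode natively:

```python
import json


def results_to_json(results):
    """Serialize objects exposing `to_dict()` to a JSON string.

    Values that json cannot encode natively (such as datetimes)
    are converted with `str`.
    """
    records = [result.to_dict() for result in results]
    return json.dumps(records, default=str, indent=2)
```

For example, `results_to_json(run.results)` would produce a JSON array with one object per run.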

#### Convert to pandas DataFrame
An example for 1D lumisection histograms: each result is converted with `to_dict()` and the resulting list is loaded with `pd.DataFrame.from_dict()`.

In [9]:
import pandas as pd

lh1d_df = pd.DataFrame.from_dict([result.to_dict() for result in lh1d.results])

print(lh1d_df.head())

   id     run  lumisection           title  entries  \
0  41  315267           25  size_PXDisk_+1    40450   
1  42  315267           25  size_PXDisk_+2    71063   
2  43  315267           25  size_PXDisk_+3    59147   
3  44  315267           25  size_PXDisk_-1    88853   
4  45  315267           25  size_PXDisk_-2    84309   

                                                data x_min x_max x_bin  \
0  [0.0, 0.0, 3835.0, 16561.0, 17598.0, 1527.0, 4...  None  None  None   
1  [0.0, 0.0, 8453.0, 31112.0, 28307.0, 1779.0, 5...  None  None  None   
2  [0.0, 0.0, 7363.0, 27059.0, 21525.0, 1645.0, 5...  None  None  None   
3  [0.0, 0.0, 9164.0, 35896.0, 38397.0, 3173.0, 9...  None  None  None   
4  [0.0, 0.0, 10433.0, 37838.0, 32247.0, 2218.0, ...  None  None  None   

   source_data_file  
0            175119  
1            175119  
2            175119  
3            175119  
4            175119  
