### Input data

Per-lumisection storage of selected monitoring elements in the regular DQMIO files has been enabled since the 2023C era of data taking. All information on per-lumisection DQMIO can be found on this twiki: https://twiki.cern.ch/twiki/bin/viewauth/CMS/PerLsDQMIO.

Since the input data are just regular DQMIO files, they can be accessed by anyone via DAS. Taking the ZeroBias dataset as an example (others are available just as well), the relevant dataset names are:
- `/ZeroBias/Run2023C-PromptReco-v1/DQMIO`
- `/ZeroBias/Run2023C-PromptReco-v2/DQMIO`
- `/ZeroBias/Run2023C-PromptReco-v3/DQMIO`
- `/ZeroBias/Run2023C-PromptReco-v4/DQMIO`
- `/ZeroBias/Run2023D-PromptReco-v1/DQMIO`
- `/ZeroBias/Run2023D-PromptReco-v2/DQMIO`
- `/ZeroBias/Run2023E-PromptReco-v1/DQMIO`
- `/ZeroBias/Run2023F-PromptReco-v1/DQMIO`

To start from scratch, one can access the data via DAS (by remote file reading, by copying files to a local area, or however you would usually access a file via DAS). Here, however, we use files that have already been copied to an `/eos` area in the context of the MLPlayground project. They are located at: `/eos/project-m/mlplayground/public/DQMIO/nanodqmio_from_das/`

### Reading per-lumisection DQMIO

A dedicated piece of code is needed to read these files and extract the per-lumisection monitoring elements (MEs); it is copied here in the `DQMIOReader.py` file. For more extensive studies, the easiest workflow is probably to call this ad-hoc code only once, store the extracted MEs in an easier format (e.g. as numpy arrays in parquet files), and work with central python packages for reading and processing from there on. For the small examples here, we skip this step and work with the `DQMIOReader` directly.
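
As a rough illustration of the "extract once, store, reload" idea, the sketch below caches histograms together with their run and lumisection numbers. It is only a sketch under stated assumptions: it uses numpy's `.npz` format as a simple stand-in for parquet (which would need an extra package such as pyarrow), the helper names `save_mes` and `load_mes` are hypothetical, and the arrays are synthetic placeholders rather than real DQMIO content.

```python
import os
import tempfile
import numpy as np

def save_mes(path, hists, runs, lumis):
    """Store extracted histograms with their run and lumisection numbers."""
    np.savez_compressed(path, hists=hists, runs=runs, lumis=lumis)

def load_mes(path):
    """Reload the arrays written by save_mes."""
    with np.load(path) as data:
        return data['hists'], data['runs'], data['lumis']

# synthetic stand-in for extracted data: 4 lumisections of a 3 x 5 histogram
rng = np.random.default_rng(0)
hists = rng.poisson(lam=10, size=(4, 3, 5)).astype(np.float32)
runs = np.full(4, 1)      # placeholder run number
lumis = np.arange(1, 5)   # placeholder lumisection numbers

# write the cache once, then reload it without touching the original files
cache_file = os.path.join(tempfile.mkdtemp(), 'mes_cache.npz')
save_mes(cache_file, hists, runs, lumis)
hists_back, runs_back, lumis_back = load_mes(cache_file)
print(hists_back.shape)
```

In a real study, the synthetic arrays would be replaced by the output of the reader, and later sessions would only call the loading step.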

In [None]:
# import the DQMIOReader class
import os
from DQMIOReader import DQMIOReader

# define input file
inputdir = '/eos/project-m/mlplayground/public/DQMIO/nanodqmio_from_das/'
inputfile = 'store_data_Run2023C_ZeroBias_DQMIO_PromptReco-v1_000_367_231_00000_5C5AD0A2-40CB-4364-BAC8-0B168732DF43.root'
inputfile = os.path.join(inputdir, inputfile)

# make a DQMIOReader and open the file
reader = DQMIOReader(*[inputfile])

# retrieve available lumisections in the file
runsls = sorted(reader.listLumis())
print('Available lumisections: ({})'.format(len(runsls)))
for runls in runsls: print('  - Run {}, LS {}'.format(runls[0],runls[1]))
    
# retrieve available monitoring elements in the file
menames = sorted(reader.listMEs())
print('Available monitoring elements: ({})'.format(len(menames)))
# as the number of monitoring elements is quite large, we will print only a subset that is relevant here
menames = [mename for mename in menames if mename.startswith('PixelPhase1/Tracks')]
print('Selected monitoring elements: ({})'.format(len(menames)))
for mename in menames: print('  - {}'.format(mename))

Extracting a single monitoring element can be done as follows:

In [None]:
# define name of monitoring element we want to extract
mename = 'PixelPhase1/Tracks/PXForward/clusterposition_xy_ontrack_PXDisk_+1'

# extract the monitoring element
mes = reader.getSingleMEs(mename, callback=None)

# the output of getSingleMEs is a list of namedtuples (something like a dictionary)
# the actual histogram is stored in the 'data' field of the namedtuple as a THx object
print('Type of mes: {}'.format(type(mes)))
print('mes[0]: {}'.format(mes[0]))
print('Type of mes[0].data: {}'.format(type(mes[0].data)))

Using the code above, one can extract the monitoring elements as THx objects if that is convenient for your purposes. For the plotting examples below, however, we would like to have them as numpy arrays. This can be achieved with the `getSingleMEsToDataFrame` function. In essence, this function simply loops over the list of MEs as extracted above, loops over the bins of each ME, and fills a numpy array with the bin contents. This could potentially be optimized, but it is good enough for now.
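
To make the bin loop concrete, here is a minimal sketch of the idea for a single 2D histogram. It is not the actual implementation of `getSingleMEsToDataFrame`: the function `th2_to_array` and the class `FakeTH2` are hypothetical names introduced for illustration. `FakeTH2` mimics the ROOT `TH2` methods `GetNbinsX`, `GetNbinsY` and `GetBinContent` so that the example runs without ROOT, following the ROOT convention that bin 0 is the underflow bin and bin `nbins+1` the overflow bin.

```python
import numpy as np

class FakeTH2:
    """Hypothetical stand-in for a ROOT TH2, so the sketch runs without ROOT."""
    def __init__(self, contents):
        # contents has shape (nbinsx+2, nbinsy+2), under/overflow included
        self._contents = np.asarray(contents, dtype=float)
    def GetNbinsX(self):
        return self._contents.shape[0] - 2
    def GetNbinsY(self):
        return self._contents.shape[1] - 2
    def GetBinContent(self, ix, iy):
        return self._contents[ix, iy]

def th2_to_array(hist):
    """Copy all bin contents, under/overflow included, into a 2D numpy array."""
    nx, ny = hist.GetNbinsX(), hist.GetNbinsY()
    arr = np.zeros((nx + 2, ny + 2))
    for ix in range(nx + 2):
        for iy in range(ny + 2):
            arr[ix, iy] = hist.GetBinContent(ix, iy)
    return arr

# 2 x 3 regular bins plus one under- and overflow bin per axis
contents = np.arange(20).reshape(4, 5)
arr = th2_to_array(FakeTH2(contents))
print(arr.shape)
```

The explicit Python double loop is easy to read but slow for large histograms, which is the kind of cost the optimization remark above refers to.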

In [None]:
import numpy as np

# extract the monitoring elements as a pandas DataFrame
df = reader.getSingleMEsToDataFrame(mename)
print(df.head())

# the actual ME data have to be stored as a flattened list in the DataFrame,
# so one more extra step is needed to get 2D numpy arrays
nhists = len(df)
xbins = df['Xbins'][0]
ybins = df['Ybins'][0]
hists = np.array([np.array(df['histo'][i]).reshape(xbins+2,ybins+2).T for i in range(nhists)])
runs = np.array(df['fromrun'])
lumis = np.array(df['fromlumi'])
entries = np.array(df['entries'])
print('Shape of hists array: {}'.format(hists.shape))
print('Runs: {}'.format(runs))
print('Lumis: {}'.format(lumis))
print('Entries: {}'.format(entries))
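
The `+2` in the reshape above accounts for one underflow and one overflow bin per axis. If only the regular bins are wanted, they can be sliced away afterwards. The sketch below shows this on a synthetic array; the names `hists_full`, `hists_core` and `outside` are introduced here for illustration only, and the array is a placeholder for the `hists` array built above.

```python
import numpy as np

# synthetic stand-in for the `hists` array above:
# 3 lumisections, 4 x 6 regular bins, plus one under- and overflow bin per axis
nhists, nbins_a, nbins_b = 3, 4, 6
hists_full = np.ones((nhists, nbins_a + 2, nbins_b + 2))

# drop the first and last bin along both histogram axes
hists_core = hists_full[:, 1:-1, 1:-1]

# summed content of the under/overflow bins, per lumisection
outside = hists_full.sum(axis=(1, 2)) - hists_core.sum(axis=(1, 2))
print(hists_core.shape)
print(outside)
```

Because the slicing is symmetric in both histogram axes, it works regardless of whether the arrays have been transposed.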

And now for some plotting:

In [None]:
import plot_utils as pu

indices = np.random.choice(len(lumis), size=3, replace=False)
for i in indices:
    title = 'Run {}, LS {}'.format(runs[i], lumis[i])
    pu.plot_hist_2d(hists[i], title=title, titlesize=15)

The cell below makes a GIF image of the consecutive lumisections, so that the time evolution is easier to see.

In [None]:
titles = ['Run {}, LS {}'.format(runs[i], lumis[i]) for i in range(len(hists))]
figname = 'temp_gif.gif' # do not change or display below will not work
caxrange = (0.01,60)

pu.plot_hists_2d_gif(hists, titles=titles, figname=figname, caxrange=caxrange,
                       duration=300, mode='imageio')

In [None]:
import IPython
from IPython.display import Image
Image(filename=figname)