`TTbarResCoffeaOutputs`: notebook that produces Coffea output files for an all-hadronic $t\bar{t}$ analysis.  The outputs will be found in the corresponding **CoffeaOutputs** directory.

In [None]:
import time
import copy
import scipy.stats as ss
from coffea import hist
from coffea.analysis_objects import JaggedCandidateArray
import coffea.processor as processor
from coffea import util
from awkward import JaggedArray
import numpy as np
import glob
import itertools
import pandas as pd
from numpy.random import RandomState

In [None]:
#from columnservice.client import ColumnClient
#cc = ColumnClient("coffea-dask.fnal.gov")
#client = cc.get_dask()

#from distributed import Client
#client = Client('coffea-dask.fnal.gov:8786')

In [None]:
#from columnservice.client import FileManager
#FileManager.open_file('TTbarResProcessor.py')

As of 2/1/21, I haven't found a way to import the other modules defined in this directory while running the uproot job with `processor.dask_executor`.  Any attempt to do so fails: the Coffea processor does not recognize the module(s) as properly imported, because of how `cloudpickle` is currently implemented.  A solution (or a workaround) is being sought, but in the meantime, `processor.futures_executor` works just fine!

One possible fix is an import mechanism provided by the `columnservice.client` tools.  This needs a deeper look...

If time is of the essence, one can copy and paste the contents of these modules into cells in place of the `import` statements below to run with dask. Otherwise, run this notebook as is and grab some popcorn while Coffea works its magic :)
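
The import problem described above can be reproduced without dask or Coffea. The sketch below assumes the failure is the usual pickle-by-reference behavior: an object whose class lives in an importable module is serialized as a pointer to that module, so it cannot be rebuilt where the module is missing. The module name `local_processor_demo` and the class `DemoProcessor` are hypothetical stand-ins for a local file such as `TTbarResProcessor.py`, and the standard-library `pickle` is used here in place of `cloudpickle`.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical stand-in for a local module such as TTbarResProcessor.py.
MODULE_NAME = "local_processor_demo"
MODULE_SOURCE = (
    "class DemoProcessor:\n"
    "    def process(self, x):\n"
    "        return 2 * x\n"
)


def pickle_roundtrip_without_module():
    """Pickle an object from a local module, then unpickle it where the
    module can no longer be imported (as on a remote worker)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        with open(os.path.join(tmpdir, MODULE_NAME + ".py"), "w") as handle:
            handle.write(MODULE_SOURCE)
        sys.path.insert(0, tmpdir)
        try:
            module = importlib.import_module(MODULE_NAME)
            # The payload stores only "local_processor_demo.DemoProcessor",
            # not the class definition itself.
            payload = pickle.dumps(module.DemoProcessor())
        finally:
            # Mimic a worker that never had the module available.
            sys.path.remove(tmpdir)
            sys.modules.pop(MODULE_NAME, None)
    importlib.invalidate_caches()
    try:
        pickle.loads(payload)
    except ImportError as err:
        return type(err).__name__
    return "ok"


result = pickle_roundtrip_without_module()
print(result)
```

If this is indeed the cause, code defined directly in notebook cells is serialized by value, which would explain why pasting the module contents into cells works. Shipping the file to the workers, for example with `Client.upload_file` from `distributed`, may be another option, but that has not been verified for this setup.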

In [None]:
from TTbarResProcessor import TTbarResProcessor

In [None]:
from Filesets import filesets

In [None]:
tstart = time.time()

outputs_unweighted = {}

seed = 1234577890
prng = RandomState(seed)
Chunk = [100000, 10] # [chunksize, maxchunks]: at most 100000 * 10 = 1,000,000 events per dataset

for name, files in filesets.items():
    print(name)
    output = processor.run_uproot_job({name:files},
                                      treename='Events',
                                      processor_instance=TTbarResProcessor(UseLookUpTables=False,
                                                                           ModMass=False,
                                                                           RandomDebugMode=True,
                                                                           prng=prng),
                                      #executor=processor.dask_executor,
                                      #executor=processor.iterative_executor,
                                      executor=processor.futures_executor,
                                      executor_args={
                                          #'client': client, 
                                          'nano':False, 
                                          'flatten':True, 
                                          'skipbadfiles':False,
                                          'workers': 2},
                                      chunksize=Chunk[0], maxchunks=Chunk[1]
                                     )

    elapsed = time.time() - tstart
    outputs_unweighted[name] = output
    print(output)
    #util.save(output, 'CoffeaOutputs/UnweightedOutputs/TTbarResCoffea_' + name + '_unweighted_output_partial_2021_dask_run.coffea')

In [None]:
print('Elapsed time = ', elapsed, ' sec.')
print('Elapsed time = ', elapsed/60., ' min.')
print('Elapsed time = ', elapsed/3600., ' hrs.') 

In [None]:
for name,output in outputs_unweighted.items(): 
    print("-------Unweighted " + name + "--------")
    for i,j in output['cutflow'].items():        
        print( '%20s : %12d' % (i,j) )

First, run the `TTbarResLookUpTables` module by simply importing it.  If it works, it will print various pandas DataFrames with information about the mistag rates and finally print the `luts` multi-dictionary.

In [None]:
import TTbarResLookUpTables

Next, import the multi-dictionary `luts`, which the processor needs to create the output files.  These new output files will have the necessary datasets weighted by their corresponding mistag rate.
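
To make the idea of a mistag-rate lookup concrete, here is a minimal sketch of how a nested dictionary could map a jet momentum to a per-event weight. The real layout of `luts` is defined in `TTbarResLookUpTables` and is not shown in this notebook, so everything here is an assumption: the nesting `dataset -> category -> table`, the keys `edges` and `rates`, the category name `category_A`, and all numbers are purely illustrative.

```python
from bisect import bisect_right

# Hypothetical layout, NOT taken from TTbarResLookUpTables:
# luts[dataset][category] -> momentum bin edges and one mistag rate per bin.
example_luts = {
    "QCD": {
        "category_A": {
            "edges": [400.0, 600.0, 800.0, 1000.0],  # illustrative bin edges
            "rates": [0.02, 0.03, 0.05],             # illustrative mistag rates
        },
    },
}


def mistag_weight(luts, dataset, category, momentum):
    """Return the mistag rate of the momentum bin containing `momentum`.

    Values below the first edge or above the last edge are clamped to the
    first or last bin, respectively.
    """
    table = luts[dataset][category]
    edges, rates = table["edges"], table["rates"]
    index = bisect_right(edges, momentum) - 1
    index = min(max(index, 0), len(rates) - 1)
    return rates[index]


weights = [
    mistag_weight(example_luts, "QCD", "category_A", p)
    for p in (450.0, 700.0, 2500.0)
]
print(weights)
```

In the actual analysis the lookup happens inside `TTbarResProcessor` when `UseLookUpTables=True`; this sketch only illustrates the general pattern of binning a quantity and reading off a rate.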

In [None]:
from TTbarResLookUpTables import luts

In [None]:
from Filesets import filesets_forweights

Ensure that the necessary files have been included in the `TTbarResLookUpTables` process before running the next processor, as the mistag procedure is found within that module.  For details about the categories used to write the mistag procedure, refer to the `TTbarResProcessor` module.
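
A quick consistency check along these lines could be sketched as follows. It assumes, hypothetically, that `luts` is keyed at the top level by the same dataset names used in `filesets_forweights`; if the real dictionary is organized differently, the membership test would need to be adapted. The dataset and file names below are placeholders.

```python
def missing_lookup_entries(dataset_names, lookup_tables):
    """Return the dataset names that have no entry in the lookup tables."""
    return sorted(name for name in dataset_names if name not in lookup_tables)


# Placeholder inputs standing in for `filesets_forweights` and `luts`.
example_filesets = {
    "QCD": ["qcd_1.root"],
    "TTbar": ["ttbar_1.root"],
    "JetHT": ["data_1.root"],
}
example_luts = {"QCD": {}, "TTbar": {}}

missing = missing_lookup_entries(example_filesets, example_luts)
print(missing)
```

In the notebook itself, the call would be `missing_lookup_entries(filesets_forweights, luts)`, and a non-empty result would indicate datasets to add to the `TTbarResLookUpTables` process before running the weighted job.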

In [None]:
""" Runs Processor, Weights Datasets with Corresponding Mistag Weight, Implements Mass Modification Procedure """

tstart = time.time()

seed = 1234577890
outputs_weighted = {}
prng = RandomState(seed)
Chunk = [100000, 10] # [chunksize, maxchunks]: at most 100000 * 10 = 1,000,000 events per dataset

for name, files in filesets_forweights.items():
    print(name)
    output = processor.run_uproot_job({name:files},
                                      treename='Events',
                                      processor_instance=TTbarResProcessor(UseLookUpTables=True,
                                                                           ModMass=True,
                                                                           RandomDebugMode=False,
                                                                           lu=luts,
                                                                           prng=prng),
                                      #executor=processor.dask_executor,
                                      #executor=processor.iterative_executor,
                                      executor=processor.futures_executor,
                                      executor_args={
                                          #'client': client, 
                                          'nano':False, 
                                          'flatten':True, 
                                          'skipbadfiles':False,
                                          'workers': 2},
                                      chunksize=Chunk[0], maxchunks=Chunk[1]
                                     )

    elapsed = time.time() - tstart
    outputs_weighted[name] = output
    print(output)
    #util.save(output, 'CoffeaOutputs/WeightedModMassOutputs/TTbarResCoffea_' + name + '_ModMass_weighted_output_partial_2021_dask_run.coffea')

In [None]:
print('Elapsed time = ', elapsed, ' sec.')
print('Elapsed time = ', elapsed/60., ' min.')
print('Elapsed time = ', elapsed/3600., ' hrs.') 

In [None]:
for name,output in outputs_weighted.items(): 
    print("-------Weighted " + name + "--------")
    for i,j in output['cutflow'].items():        
        print( '%20s : %12d' % (i,j) )