# ctapipe provenance

Let's run a ctapipe tool to generate provenance files. First, we ask ctapipe which tools are available.

In [1]:
!ctapipe-info --tools


*** ctapipe tools ***

the following can be executed by typing ctapipe-<toolname>:

INFO - NumExpr defaulting to 8 threads.
INFO - Downloading http://cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.3/optics.fits.gz to /Users/jer/.cache/ctapipe/cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.3/optics.fits.gz
INFO - Downloading http://cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.3/optics.fits to /Users/jer/.cache/ctapipe/cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.3/optics.fits
INFO - Downloading http://cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.3/optics.ecsv to /Users/jer/.cache/ctapipe/cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.3/optics.ecsv
ctapipe-camdemo                 -  Example tool, displaying fake events in a
                                   camera.  the animation should remain
                                   interactive, so try zooming in when it is
                                   running.

ctapipe-display-dl1             -  Calibrate dl0 

In [2]:
!ctapipe-stage1 --help

Process data from lower-data levels up to DL1, including both image
extraction and optinally image parameterization
 This currently writes v1.1.0 DL1 data

Options
The options below are convenience aliases to configurable class-options,
as listed in the "Equivalent to" description-line of the aliases.
To see all configurable class-options for some <cmd>, use:
    <cmd> --help-all

--write-images
    store DL1/Event/Telescope images in output
    Equivalent to: [--DL1Writer.write_images=True]
--write-parameters
    store DL1/Event/Telescope parameters in output
    Equivalent to: [--DL1Writer.write_parameters=True]
--write-index-tables
    generate PyTables index tables for the parameter and image datasets
    Equivalent to: [--DL1Writer.write_index_tables=True]
--overwrite
    Overwrite output file if it exists
    Equivalent to: [--DL1Writer.overwrite=True]
--progress
    show a progress bar during event processing
    Equivalent to: [--Stage1Tool.progress_bar=True]
-q, --quiet
    Di

OK, let's copy a sample events file into the working directory so that we can run the `ctapipe-stage1` tool on it, and then remove any log files left over from previous runs.

In [3]:
from ctapipe import utils
import shutil

shutil.copy(utils.get_dataset_path("gamma_test.simtel.gz"), "gamma_test.simtel.gz")

'gamma_test.simtel.gz'

In [4]:
!rm -f *.log
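
The same clean-up can also be done from Python, where an empty folder is not an error. The sketch below is only an illustration: `remove_logs` is a hypothetical helper, not part of ctapipe, and the demonstration runs in a temporary directory with made-up file contents so that no real file is deleted.

```python
import tempfile
from pathlib import Path


def remove_logs(directory="."):
    """Delete every *.log file in `directory` and return the removed names."""
    removed = []
    # Materialise the matches first, then delete them one by one.
    for log_file in list(Path(directory).glob("*.log")):
        log_file.unlink()
        removed.append(log_file.name)
    return sorted(removed)


# Demonstration in a throwaway directory, so no real log file is touched.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "old.log").write_text("stale entry\n")
    (Path(tmp) / "gamma_test.simtel.gz").write_text("not a log\n")
    first_pass = remove_logs(tmp)    # removes old.log
    second_pass = remove_logs(tmp)   # nothing left to remove, and no error
    survivors = sorted(p.name for p in Path(tmp).iterdir())
```

In the notebook itself, calling `remove_logs()` with no argument would act on the current working directory.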

Now we run `ctapipe-stage1` with the `--log-file` option, which sets the name of the timestamped log file.

In [5]:
!ctapipe-stage1 --input gamma_test.simtel.gz --progress --overwrite --log-file mylog.log

SimTelEventSource: 0ev [00:00, ?ev/s]OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
SimTelEventSource: 9ev [00:04,  1.97ev/s]

The run produces two files: a JSON-formatted provenance file, whose name contains the name of the executed tool, and a classic timestamped log file.

In [6]:
!ls *.log

ctapipe-stage1.provenance.log mylog.log


In [7]:
!head ctapipe-stage1.provenance.log 

[
   {
      "activity_name": "ctapipe-stage1",
      "activity_uuid": "491e433c-06d1-4199-b714-f81b93846076",
      "start": {
         "time_utc": "2022-01-17T12:46:20.036"
      },
      "stop": {
         "time_utc": "2022-01-17T12:46:26.265"
      },


In [8]:
!cat mylog.log

2022-01-17 13:46:20,035 INFO [ctapipe.ctapipe-stage1] (tool.initialize): ctapipe version 0.11.0
2022-01-17 13:46:20,035 INFO [ctapipe.ctapipe-stage1] (tool.run): Starting: ctapipe-stage1
2022-01-17 13:46:20,060 INFO [ctapipe.ctapipe-stage1.SimTelEventSource] (eventsource.__init__): INPUT PATH = /Users/jer/Desktop/curso/notebooks/gamma_test.simtel.gz
2022-01-17 13:46:21,562 INFO [ctapipe.ctapipe-stage1] (subarray.info): Subarray : MonteCarloArray
2022-01-17 13:46:21,562 INFO [ctapipe.ctapipe-stage1] (subarray.info): Num Tels : 126
2022-01-17 13:46:21,563 INFO [ctapipe.ctapipe-stage1] (subarray.info): Footprint: 7.32 km2
2022-01-17 13:46:21,563 INFO [ctapipe.ctapipe-stage1] (subarray.info): 
2022-01-17 13:46:21,600 INFO [ctapipe.ctapipe-stage1] (subarray.info):        Type       Count Tel IDs
2022-01-17 13:46:21,600 INFO [ctapipe.ctapipe-stage1] (subarray.info): ----------------- ----- -------
2022-01-17 13:46:21,600 INFO [ctapipe.ctapipe-stage1] (subarray.info):    SST_ASTRI_CHEC    72 

OK, now let's load the content of the provenance file into a Python object so that we can explore it more easily.

In [9]:
import json
with open('ctapipe-stage1.provenance.log') as json_file:
    provdata = json.load(json_file)

In [10]:
provdata

[{'activity_name': 'ctapipe-stage1',
  'activity_uuid': '491e433c-06d1-4199-b714-f81b93846076',
  'start': {'time_utc': '2022-01-17T12:46:20.036'},
  'stop': {'time_utc': '2022-01-17T12:46:26.265'},
  'system': {'ctapipe_version': '0.11.0',
   'ctapipe_resources_version': 'not installed',
   'eventio_version': '1.5.1.post1',
   'ctapipe_svc_path': '/Users/jer/git/cta-observatory/ctapipe-extra/ctapipe_resources',
   'executable': '/opt/miniconda3/envs/lst-school-2022-01/bin/python',
   'platform': {'architecture_bits': '64bit',
    'architecture_linkage': '',
    'machine': 'x86_64',
    'processor': 'i386',
    'node': 'Joses-MacBook-Pro.local',
    'version': 'Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:54 PST 2021; root:xnu-8019.61.5~1/RELEASE_X86_64',
    'system': 'Darwin',
    'release': '21.2.0',
    'libcver': ['', ''],
    'num_cpus': 8,
    'boot_time': '2021-12-14T21:43:28.000'},
   'python': {'version_string': '3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:

In [11]:
type(provdata)

list

In [12]:
len(provdata)

1

In [13]:
type(provdata[0])

dict

We can access the provenance information by browsing the dictionary.

In [14]:
provdata[0]['system'].keys()

dict_keys(['ctapipe_version', 'ctapipe_resources_version', 'eventio_version', 'ctapipe_svc_path', 'executable', 'platform', 'python', 'environment', 'arguments', 'start_time_utc'])

In [15]:
provdata[0]['system']['python']['version'] # add and/or remove dict keys to browse the dict 

['3', '8', '12']
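
When the nesting gets deep, it can help to list every leaf together with its full path instead of probing keys one at a time. The following is a minimal sketch: `dotted_keys` is a hypothetical helper, not a ctapipe function, and `record` is a trimmed copy of the output above so that the block runs on its own.

```python
def dotted_keys(node, prefix=""):
    """Yield every leaf of a nested dict as a (dotted path, value) pair."""
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into sub-dictionaries, extending the path.
            yield from dotted_keys(value, path)
        else:
            yield path, value


# Trimmed copy of the record shown above; the real one has many more keys.
record = {
    "activity_name": "ctapipe-stage1",
    "system": {
        "ctapipe_version": "0.11.0",
        "platform": {"machine": "x86_64", "num_cpus": 8},
        "python": {"version": ["3", "8", "12"]},
    },
}

leaves = dict(dotted_keys(record))
for path, value in leaves.items():
    print(path, "=", value)
```

In the notebook, `dict(dotted_keys(provdata[0]))` would give the same kind of flat view of the full record.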

We can run the same command again and reprocess the data, possibly with different parameters.

In [16]:
!ctapipe-stage1 --input gamma_test.simtel.gz --progress --overwrite --log-file mylog.log --log-level DEBUG

2022-01-17 13:46:30,620 INFO [ctapipe.ctapipe-stage1] (tool.initialize): ctapipe version 0.11.0
2022-01-17 13:46:30,620 INFO [ctapipe.ctapipe-stage1] (tool.run): Starting: ctapipe-stage1
2022-01-17 13:46:30,643 DEBUG [ctapipe.core.provenance] (provenance.start_activity): started activity: ctapipe-stage1
2022-01-17 13:46:30,646 INFO [ctapipe.ctapipe-stage1.SimTelEventSource] (eventsource.__init__): INPUT PATH = /Users/jer/Desktop/curso/notebooks/gamma_test.simtel.gz
2022-01-17 13:46:30,646 DEBUG [ctapipe.core.provenance] (provenance.add_input_file): added input entity '/Users/jer/Desktop/curso/notebooks/gamma_test.simtel.gz' to activity: 'ctapipe-stage1'
2022-01-17 13:46:31,444 DEBUG [ctapipe.ctapipe-stage1.SimTelEventSource] (simteleventsource.__init__): Using gain selector <ctapipe.calib.camera.gainselection.ThresholdGainSelector object at 0x7fa72c8b15e0>
2022-01-17 13:46:31,507 DEBUG [ctapipe.core.traits] (t

We still have the same log files

In [17]:
!ls *.log

ctapipe-stage1.provenance.log mylog.log


The log content produced by this second execution has been **appended** to both log files.

In [18]:
!cat mylog.log

2022-01-17 13:46:20,035 INFO [ctapipe.ctapipe-stage1] (tool.initialize): ctapipe version 0.11.0
2022-01-17 13:46:20,035 INFO [ctapipe.ctapipe-stage1] (tool.run): Starting: ctapipe-stage1
2022-01-17 13:46:20,060 INFO [ctapipe.ctapipe-stage1.SimTelEventSource] (eventsource.__init__): INPUT PATH = /Users/jer/Desktop/curso/notebooks/gamma_test.simtel.gz
2022-01-17 13:46:21,562 INFO [ctapipe.ctapipe-stage1] (subarray.info): Subarray : MonteCarloArray
2022-01-17 13:46:21,562 INFO [ctapipe.ctapipe-stage1] (subarray.info): Num Tels : 126
2022-01-17 13:46:21,563 INFO [ctapipe.ctapipe-stage1] (subarray.info): Footprint: 7.32 km2
2022-01-17 13:46:21,563 INFO [ctapipe.ctapipe-stage1] (subarray.info): 
2022-01-17 13:46:21,600 INFO [ctapipe.ctapipe-stage1] (subarray.info):        Type       Count Tel IDs
2022-01-17 13:46:21,600 INFO [ctapipe.ctapipe-stage1] (subarray.info): ----------------- ----- -------
2022-01-17 13:46:21,600 INFO [ctapipe.ctapipe-stage1] (subarray.info):    SST_ASTRI_CHEC    72 

While the classic timestamped log file `mylog.log` is still easy to read, `ctapipe-stage1.provenance.log` now contains two JSON arrays written back to back, one for each execution of the `ctapipe-stage1` tool. Taken as a whole, the file is no longer a single valid JSON document.
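
Such a concatenation cannot be read with a single `json.loads` call. A minimal sketch of the problem, using a made-up two-run string in place of the real file (the activity names `run-1` and `run-2` are placeholders):

```python
import json

# Made-up stand-in for a provenance file written by two consecutive runs.
two_runs = '[\n   {"activity_name": "run-1"}\n][\n   {"activity_name": "run-2"}\n]'

try:
    json.loads(two_runs)
    parsed_in_one_go = True
except json.JSONDecodeError as err:
    # The decoder reads the first array and then finds unexpected text after it.
    parsed_in_one_go = False
    reason = err.msg

print(parsed_in_one_go, reason)
```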

In [19]:
!cat ctapipe-stage1.provenance.log

[
   {
      "activity_name": "ctapipe-stage1",
      "activity_uuid": "491e433c-06d1-4199-b714-f81b93846076",
      "start": {
         "time_utc": "2022-01-17T12:46:20.036"
      },
      "stop": {
         "time_utc": "2022-01-17T12:46:26.265"
      },
      "system": {
         "ctapipe_version": "0.11.0",
         "ctapipe_resources_version": "not installed",
         "eventio_version": "1.5.1.post1",
         "ctapipe_svc_path": "/Users/jer/git/cta-observatory/ctapipe-extra/ctapipe_resources",
         "executable": "/opt/miniconda3/envs/lst-school-2022-01/bin/python",
         "platform": {
            "architecture_bits": "64bit",
            "architecture_linkage": "",
            "machine": "x86_64",
            "processor": "i386",
            "node": "Joses-MacBook-Pro.local",
            "version": "Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:54 PST 2021; root:xnu-8019.61.5~1/RELEASE_X86_64",
            "system": "Darwin",
            "release": "21.2.0",
            "

To load the content of this provenance file (concatenated JSON arrays) into a Python list, we can split the text at the boundary between runs and parse each piece separately.

In [20]:
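# Each run appends its own JSON array, so consecutive runs meet at "][".
with open('ctapipe-stage1.provenance.log') as json_file:
    rawprov = json_file.read().split('][')

prefix = "[\n"
suffix = "\n]"

# Strip the opening bracket of the first chunk and the closing bracket of the last one.
for i, chunk in enumerate(rawprov):
    if chunk.startswith(prefix):
        chunk = chunk[len(prefix):]
    if chunk.endswith(suffix):
        chunk = chunk[:-len(suffix)]
    rawprov[i] = chunk

# Each remaining chunk is expected to hold one JSON object, i.e. one execution.
executions = [json.loads(chunk) for chunk in rawprov]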

In [21]:
len(executions)

2
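
Splitting on `][` depends on the exact layout of the file. As a possible alternative, the standard library's `json.JSONDecoder.raw_decode` can read one JSON document at a time and report where it stopped. The sketch below is not the method used by ctapipe: `load_concatenated_json` is a hypothetical helper and `sample` is a made-up stand-in for the provenance file.

```python
import json


def load_concatenated_json(text):
    """Return the list of top-level JSON documents found back to back in `text`."""
    decoder = json.JSONDecoder()
    documents = []
    position = 0
    while position < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        if text[position].isspace():
            position += 1
            continue
        document, position = decoder.raw_decode(text, position)
        documents.append(document)
    return documents


# Made-up stand-in for a provenance file written by two runs.
sample = '[\n   {"activity_name": "run-1"}\n][\n   {"activity_name": "run-2"}\n]\n'

documents = load_concatenated_json(sample)
# Each document is a list of activities; flatten them into one list of records.
runs = [activity for document in documents for activity in document]
names = [activity["activity_name"] for activity in runs]
```

With this approach a trailing newline or a different indentation does not matter, and a run that recorded several activities would still be parsed.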

Let's see the provenance produced in the second execution of the `ctapipe-stage1` tool

In [22]:
executions[1]

{'activity_name': 'ctapipe-stage1',
 'activity_uuid': '99f18a96-dc2e-47c4-89bd-aa50acaff48a',
 'start': {'time_utc': '2022-01-17T12:46:30.621'},
 'stop': {'time_utc': '2022-01-17T12:46:37.331'},
 'system': {'ctapipe_version': '0.11.0',
  'ctapipe_resources_version': 'not installed',
  'eventio_version': '1.5.1.post1',
  'ctapipe_svc_path': '/Users/jer/git/cta-observatory/ctapipe-extra/ctapipe_resources',
  'executable': '/opt/miniconda3/envs/lst-school-2022-01/bin/python',
  'platform': {'architecture_bits': '64bit',
   'architecture_linkage': '',
   'machine': 'x86_64',
   'processor': 'i386',
   'node': 'Joses-MacBook-Pro.local',
   'version': 'Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:54 PST 2021; root:xnu-8019.61.5~1/RELEASE_X86_64',
   'system': 'Darwin',
   'release': '21.2.0',
   'libcver': ['', ''],
   'num_cpus': 8,
   'boot_time': '2021-12-14T21:43:28.000'},
  'python': {'version_string': '3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:50:56) \n[Clang 11.1.

# Exercise

Once the provenance information from all the executions of a tool is stored in a list, we can easily access a specific item (e.g. `duration_min`) across the whole history of executions and do some provenance analysis.
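
One possible starting point is sketched below. It recomputes the duration of each run from the `start` and `stop` timestamps shown earlier rather than reading a stored value, so it relies only on fields visible in this notebook. `duration_seconds` is a hypothetical helper, and the `executions` list is a trimmed copy of the two records so that the block runs on its own.

```python
from datetime import datetime


def duration_seconds(activity):
    """Elapsed time of one provenance record, from its start/stop UTC stamps."""
    start = datetime.fromisoformat(activity["start"]["time_utc"])
    stop = datetime.fromisoformat(activity["stop"]["time_utc"])
    return (stop - start).total_seconds()


# Only the fields needed here, copied from the two runs shown in this notebook.
executions = [
    {"activity_uuid": "491e433c-06d1-4199-b714-f81b93846076",
     "start": {"time_utc": "2022-01-17T12:46:20.036"},
     "stop": {"time_utc": "2022-01-17T12:46:26.265"}},
    {"activity_uuid": "99f18a96-dc2e-47c4-89bd-aa50acaff48a",
     "start": {"time_utc": "2022-01-17T12:46:30.621"},
     "stop": {"time_utc": "2022-01-17T12:46:37.331"}},
]

# Map each run to its duration, then pick the run that took longest.
durations = {run["activity_uuid"]: duration_seconds(run) for run in executions}
slowest = max(durations, key=durations.get)

for uuid, seconds in durations.items():
    print(f"{uuid}: {seconds:.3f} s")
```

The same loop works on the full `executions` list built above, and any other key of the records can be collected in the same way.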