Refactor of name for container of waveforms #608

watsonjj · 2017-12-13T15:41:47Z

I would like to propose a refactor of the name for the container of waveforms. Currently we have:

event.r0.tel[1].adc_samples
event.r1.tel[1].pe_samples
event.dl0.tel[1].pe_samples
event.dl1.tel[1].image

It is potentially misleading to call them "adc_samples" and "pe_samples" as these assume units that might not be the case. Therefore I would propose to rename them to "samples". This would also make the access of waveforms more consistent across data levels.

This would be a huge refactor that would presumably affect every ctapipe user's code. Therefore it should probably be delayed until we intend a huge refactor of the code. This could be the same time we remove the "tel" containers for the R0 -> DL1 containers (I presume this is an intended change once we move to single telescopes per file).

calispac · 2017-12-13T17:40:01Z

In this case should :

pe_samples.shape == (n_channels, n_pixels, n_samples) ? i.e. some kind of waveform where the max is the reconstructed p.e. charge
pe_samples.shape == (n_channels, n_pixels) ?

In case of 2. I think it is more intuitive to have them named pe_image. Or not ?

watsonjj · 2017-12-13T17:44:30Z

event.r0.tel[1].adc_samples --> event.r0.tel[1].samples
event.r1.tel[1].pe_samples --> event.r1.tel[1].samples
event.dl0.tel[1].pe_samples --> event.dl0.tel[1].samples

All of these are waveforms, so have shape (n_channels, n_pixels, n_samples)

The units depend on the low-level R1 calibration each camera applies on them. I'm not sure if there is a requirement to have the waveforms in p.e. at the dl0 level (they will at least be something proportional to p.e. i presume, such that the online analysis can use them)

calispac · 2017-12-13T17:53:43Z

Ok I did not get your point first.
It seems more user friendly to have the "main data" being named similarly across data levels. As the data level indicates some transformation of samples.

kosack · 2018-01-15T09:14:45Z

I'm happy with the suggested names, though there will need to be some small change to support the fact that in the final DL1 we have only samples for a subset and images for the rest (so we need both DL0.image and DL0.samples.

In fact, I'm working on standardizing the definitions for all of CTA. In my current doc I have "image" = a single image for a full camera, and "traces" = the time-evolution of a subset of pixels, but "samples" is perhaps better (any preference or better name?).

kosack · 2018-01-15T09:15:59Z

In case of 2. I think it is more intuitive to have them named pe_image. Or not ?

Yes, I think if there is no sample dimension, it should be "image" (for example, we may have the intensity image, peak_time_image, etc...)

watsonjj · 2018-01-15T09:34:01Z

I'm happy with the suggested names, though there will need to be some small change to support the fact that in the final DL1 we have only samples for a subset and images for the rest (so we need both DL0.image and DL0.samples.

I'm a little confused. I didn't think there is a image at DL0, we are still in samples at that data level, no charge extraction has been performed. whereas at DL1 we no longer have samples and only have image, as charge extraction has been performed.

Correct me if I'm wrong, but it should be changed to:

event.r0.tel[1].samples
event.r1.tel[1].samples
event.dl0.tel[1].samples
event.dl1.tel[1].image

watsonjj · 2018-01-15T10:28:15Z

Did you mean to only specify DL0, and were talking about that some pixels (the important ones) will have all the samples, but other pixels will be reduced to single samples as a form of data reduction to save storage space?

If so then yeah your suggestion may be appropriate...but it might be useful to have a proper discussion of how to handle this.

Some immediate question I have:

Will the "important" pixels always have the same number of samples?
Is there a way when we load the data that is could still be stored in a (npixels, nsamples) array?
Is there some special array object that we could define to properly handle the reduced pixels properly? (therefore requiring minimum changes to other code)

kosack · 2018-01-15T12:44:01Z

Did you mean to only specify DL0, and were talking about that some pixels (the important ones) >will have all the samples, but other pixels will be reduced to single samples as a form of data >reduction to save storage space?

Yes, that's only DL0 - for R0, it's much simpler, just as we have now

Some immediate question I have:

Will the "important" pixels always have the same number of samples?

I think we can assume yes

Is there a way when we load the data that is could still be stored in a (npixels, nsamples) array?
Is there some special array object that we could define to properly handle the reduced pixels >properly? (therefore requiring minimum changes to other code)

I think that is basically a sparse array (which numpy supports), or we could just use a masked array (wasting a bit of memory). We can assume that even if the data are stored in a more sparse way, they can be "unpacked" into this sort of in-memory structure, for convenience. So I think to answer the question, we can just assume it looks like it is now, but some elements may be not filled in, or all zeros.

The way I've defined it so far in the doc for DL0 is a bit different, but can be converted to this way of thinking. It is stored as a K_pix (only for pixels with waveforms) by N_samp array, and there is a third array that maps one to the other: it is basically a mask of which pixels have waveforms. So we could also store it that way, which s more compact. Then you'd have

samples[K_pix, N_samp]
image[N_pix]
sample_mask[N_pix] (Maps K_pix to N_pix)

kosack · 2018-01-15T15:39:16Z

I should note in that previous comment, K_pix is the number of "important" pixels (so it's a variable quantity per event).

kosack · 2018-01-16T10:36:51Z

Another option would be dl0.tel[x].waveform for waveform data? That maybe is less jargon than "samples" (though of course the waveform is sampled at regular intervals, techincally the integrated pixel is also a sample, just over a larger time-base).

There doesn't seem to be a good name for a sequence of images in time (other than "movie", but that doesn't sound very scientific). In any case, since we will only have a subset of the image with waveforms.

watsonjj · 2018-01-16T10:55:36Z

I have no preference over samples/traces/waveform. The only input I have is that with ASTRI, its a little bit of a misnomer to describe their data as a waveform or a trace. Whereas it is more accurate to describe it as a sample (because as you stated "the integrated pixel is also a sample, just over a larger time-base")

kosack · 2018-01-16T12:43:00Z

Ok, I've updated the DL0 interface doc to use "waveform" everywhere where there are sampled data, and "image" where it is integrated. I like that the best so far. In the case of ASTRI, there is only the image, and no waveform data (it's not really a single sample anyhow, since they use the pulse shape and time-over-threshold to predict the integral)

kosack · 2018-01-16T12:52:26Z

So far (not finalized) the DL0 fields are called:

For the pipeline, this is a bit too low-level, since we can unpack the pix_status bitfield (contains the gain channel used and the waveform mask), and create a single Time field, so it would be more simply:

 dl0.tel[x].trig_time # telescope trigger time (base time of waveform)
 dl0.tel[x].trig_type # what kind of trigger (cherenkov, FF flasher, muon, etc)
 dl0.tel[x].image  # integrated value for all pixels
 dl0.tel[x].waveform  # generally a sparse or masked array, since not all pix have values
 dl0.tel[x].channel_info # 0=pixel off, 1=low-gain used, 2=high-gain used, 3=saturated low-gain

dneise · 2018-01-16T12:57:20Z

Could you put a link to that document please?

kosack · 2018-01-16T13:01:35Z

For R1, it's nearly identical, except there is no image field, and waveform is not sparse (there are entries for all pixels).

kosack · 2018-01-16T13:05:24Z

Could you put a link to that document please?

No, that document is not yet complete or publicly available - I'm working on it now with a few other experts, and there will be a meeting with the project office to validate it in a month or so. The camera groups will need to review and accept it, in particular, before it is official, so it will take some time. I'm just putting this info here, since I think it will be closer to this format than what we have now, and it helps me if we can prototype it as I write the requirements and specs. If we discover something missing, it helps a lot. However, we can't add much more information to DL0.TEL.EVT info without saturating the off-site link.

maxnoe · 2018-01-16T15:54:41Z

@kosack I know it's personal taste, but i'm not a huge fan of unneeded abbreviations or truncations. Why not trigger_type, trigger_time, pixel_status.

Just out of curiosity: why quarter nano seconds?

kosack · 2018-01-30T10:56:20Z

@kosack I know it's personal taste, but i'm not a huge fan of unneeded abbreviations or truncations. Why not trigger_type, trigger_time, pixel_status.

Agreed. In fact, this is even explicitly stated in the coding guidelines document (which I wrote, and apparently ignored :-) I'll remove the abbreviations.

Just out of curiosity: why quarter nano seconds?

Mainly so it fits in a 32-bit integer, and also since we don't need better precision than that and 250ps is close to the precision of the underlying time distribution system, so smaller would be useless, and larger would throw away some info.

- rename - change run_id -> obs_id

* rename containers and fields according to #608 - rename - change run_id -> obs_id - adc_sums -> image - adc_samples and pe_samples -> waveform everywhere

watsonjj added API change discussion labels Dec 13, 2017

kosack added a commit to kosack/ctapipe that referenced this issue Feb 1, 2018

rename containers and fields according to cta-observatory#608

dc71f33

- rename - change run_id -> obs_id

kosack mentioned this issue Feb 1, 2018

Refactor containers #638

Merged

kosack added a commit that referenced this issue Feb 5, 2018

Refactor containers (#638)

e994664

* rename containers and fields according to #608 - rename - change run_id -> obs_id - adc_sums -> image - adc_samples and pe_samples -> waveform everywhere

kosack closed this as completed Feb 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor of name for container of waveforms #608

Refactor of name for container of waveforms #608

watsonjj commented Dec 13, 2017

calispac commented Dec 13, 2017

watsonjj commented Dec 13, 2017

calispac commented Dec 13, 2017 •

edited

Loading

kosack commented Jan 15, 2018

kosack commented Jan 15, 2018

watsonjj commented Jan 15, 2018 •

edited

Loading

watsonjj commented Jan 15, 2018 •

edited

Loading

kosack commented Jan 15, 2018 •

edited

Loading

kosack commented Jan 15, 2018

kosack commented Jan 16, 2018 •

edited

Loading

watsonjj commented Jan 16, 2018

kosack commented Jan 16, 2018

kosack commented Jan 16, 2018

dneise commented Jan 16, 2018

kosack commented Jan 16, 2018

kosack commented Jan 16, 2018

maxnoe commented Jan 16, 2018

kosack commented Jan 30, 2018

Refactor of name for container of waveforms #608

Refactor of name for container of waveforms #608

Comments

watsonjj commented Dec 13, 2017

calispac commented Dec 13, 2017

watsonjj commented Dec 13, 2017

calispac commented Dec 13, 2017 • edited Loading

kosack commented Jan 15, 2018

kosack commented Jan 15, 2018

watsonjj commented Jan 15, 2018 • edited Loading

watsonjj commented Jan 15, 2018 • edited Loading

kosack commented Jan 15, 2018 • edited Loading

kosack commented Jan 15, 2018

kosack commented Jan 16, 2018 • edited Loading

watsonjj commented Jan 16, 2018

kosack commented Jan 16, 2018

kosack commented Jan 16, 2018

dneise commented Jan 16, 2018

kosack commented Jan 16, 2018

kosack commented Jan 16, 2018

maxnoe commented Jan 16, 2018

kosack commented Jan 30, 2018

calispac commented Dec 13, 2017 •

edited

Loading

watsonjj commented Jan 15, 2018 •

edited

Loading

watsonjj commented Jan 15, 2018 •

edited

Loading

kosack commented Jan 15, 2018 •

edited

Loading

kosack commented Jan 16, 2018 •

edited

Loading