Corrections Management Tool #303

ahiguera-mx · 2020-08-11T17:39:53Z

Corrections are an important part of the event building process, applying these corrections help us to remove detectors effects so we can have a “correct” measurement e.g corrected S2 area [PE]. Having a common tool that maintains and does bookkeeping can be beneficial. Thus, we are proposing a corrections management tool (CMT). Corrections are placed in DB, the format is a panda DataFrame with DatetimeIndex. Currently examples of electron lifetime and pmt gains corrections are implemented via cmt_services.py.
This PR should be transparent, core changes to strax should be minor
example_corrections.pdf

tunnell · 2020-08-11T17:44:15Z

Reimplementation of #301 due to Git issue. (See comments therein)

JoranAngevaare · 2020-08-12T12:58:04Z

Hi Aaron, thanks again for your work (and resubmitting the PR). I just send this on slack but might as well post it here for bookkeeping.

Every TPC needs some corrections machinery. This would go into strax. This code would know nothing what they mean, but just be 'read correction from DB', or 'interpolate correction', or helper routines to evaluate correction X at datetime Y

That is true, my main suggestion to move it to straxen is that it seems to be build up to read from our database(s):

strax/strax/corrections/cmt_DB_interface.py

Line 15 in 31884ec

CLIENT = pymongo.MongoClient(host='xenon1t-daq.lngs.infn.it',

strax/strax/corrections/cmt_DB_interface.py

Line 122 in 31884ec

    
           collection = pymongo.MongoClient('mongodb://pax:@xenon1t-daq.lngs.infn.it:27017/run',password=os.environ['DB_password'])['xenonnt']['run/runs']

And our kind of corrections:

strax/strax/corrections/cmt_services.py

Line 12 in 31884ec

if xenon1t:

But I do see the point that these kind of corrections are needed in each experiment. I guess most of the general stuff is actually in the database interactions right? Like: https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/corrections/cmt_DB_interface.py. Here you add some functions that are not (sufficiently?) in e.g.
https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/storage/mongo.py but it looks like there are quite some similarities. In straxen we only specify the mongo urls here https://github.com/XENONnT/straxen/blob/master/straxen/rundb.py (because there xenon specific).

Perhaps a similar approach might make sense for the framework you build Aaron? As in make a general class out of this one https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/corrections/cmt_DB_interface.py initialise it in straxen and use that class to exploit your functions https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/corrections/cmt_services.py. What do you think?

… of corrections management tool

ahiguera-mx · 2020-08-13T04:04:52Z

Hi Jorana
Thanks for the useful feedback, I think I understand what you suggested, based on that I made some modifications to cmt_DB_interface.py
To have a complete picture of the changes please look at this fork https://github.com/ahiguera-mx/straxen
Where you can find the implementation into straxen, xenon specifics are implemented there.
See https://github.com/ahiguera-mx/straxen/blob/master/straxen/corrections_services.py
and for implementation within the plugins you can see
https://github.com/ahiguera-mx/straxen/blob/master/straxen/plugins/peaklet_processing.py
https://github.com/ahiguera-mx/straxen/blob/master/straxen/plugins/event_processing.py

WenzDaniel

Thanks Aaron, I think this was already a first step in the right direction. But I still think this is already way to specify for strax. Could you do it like it is done in case of our storage system? You define a very generic CTM class which has a read and write function defined like

strax/strax/storage/common.py

Line 503 in 8578b21

def _read_chunk(self, backend_key, chunk_info, dtype, compressor):

Which is then later defined in a subclass called e.g. XENONnTCTM(CTM) like here

strax/strax/storage/mongo.py

Line 32 in 8578b21

def _read_chunk(self, backend_key, chunk_info, dtype, compressor):

. I think your interpolate function can be already part of the CTM class itself. Further I found some minor things here and there.

strax/corrections/cmt_DB_interface.py

WenzDaniel · 2020-08-13T06:40:21Z

You should also add pdmongo to https://github.com/AxFoundation/strax/blob/master/requirements.txt so that travis is not failing

JoranAngevaare

Thanks Aaron, things are progressing, hereby some ideas on the current iteration

strax/corrections/cmt_DB_interface.py

ahiguera-mx · 2020-08-14T16:46:19Z

Hi guys,

Thanks again for the feedback, I have made the corresponding changes to the code, still trying to figure out what you guys mean by following what is done in the storage package, but If I can say, we want to be the corrections class self consistent, in that sense the write and read functions are defined within the class

tunnell · 2020-08-14T18:38:49Z

@ahiguera-mx and I are going to do a code review together on Monday, if others are interested.

tunnell

@ahiguera-mx nice work! I think this is sorely needed. There are a few changes I request to make it more clearly generic across experiments (removing XENON specific things). I think we also should PEP8 things and more clearly document the input variables, since these functions will be called a lot by analysts. After those changes, I think it's good to add if others agree. A lot of the style things are just hazing for every new person to strax :)

Thanks Aaron, I think this was already a first step in the right direction. But I still think this is already way to specify for strax. Could you do it like it is done in case of our storage system? You define a very generic CTM class which has a read and write function defined like

strax/strax/storage/common.py

Line 503 in 8578b21

def _read_chunk(self, backend_key, chunk_info, dtype, compressor):

Which is then later defined in a subclass called e.g. XENONnTCTM(CTM) like here

strax/strax/storage/mongo.py

Line 32 in 8578b21

def _read_chunk(self, backend_key, chunk_info, dtype, compressor):

. I think your interpolate function can be already part of the CTM class itself. Further I found some minor things here and there.

@WenzDaniel I think this extra abstraction is a good idea if we have multiple correction systems (like storage backends). However, I don't agree that this should be done now. Every analyst will come across the corrections and probably look here, but almost no analysts are diving into how chunks are loaded from disk. The goal of this code should be ultimate transparency as the only part of strax that many people will look at. Therefore, I think the abstraction obfuscates the code by having two levels of abstraction for one implementation, especially for novice.

At the moment, I think @ahiguera-mx could make more clear that they can overload read/write at the moment then it should work. If somebody wants a file backend, I think doing what you suggest is the right way to go since the abstraction will help. Until then, I think it might obfuscate things.... what do you think?

strax/corrections/README.md

strax/corrections/cmt_DB_interface.py

ahiguera-mx · 2020-08-21T02:51:53Z

Hi @jorana @WenzDaniel @tunnell
I've implemented all the changes requested, there is one last thing wether I should move out this module from the corrections directory, I would say yes, since there is only one file, and probably I would rename it as corrections.py
Please let me know if I can proceed to this last change

WenzDaniel · 2020-08-24T10:12:26Z

@WenzDaniel I think this extra abstraction is a good idea if we have multiple correction systems (like storage backends). However, I don't agree that this should be done now. Every analyst will come across the corrections and probably look here, but almost no analysts are diving into how chunks are loaded from disk. The goal of this code should be ultimate transparency as the only part of strax that many people will look at. Therefore, I think the abstraction obfuscates the code by having two levels of abstraction for one implementation, especially for novice.

At the moment, I think @ahiguera-mx could make more clear that they can overload read/write at the moment then it should work. If somebody wants a file backend, I think doing what you suggest is the right way to go since the abstraction will help. Until then, I think it might obfuscate things.... what do you think?

@tunnell fine with me and I totally see you point of readability and agree. But, I was also thinking more in a direction, that strax should be usable by any LXe TPC experiment and therefore as little explicit as possible. Other experiments may have a different type of database. I thought this was the idea of having this code in strax in the first place.

tunnell · 2020-08-24T11:33:41Z

@WenzDaniel I think this extra abstraction is a good idea if we have multiple correction systems (like storage backends). However, I don't agree that this should be done now. Every analyst will come across the corrections and probably look here, but almost no analysts are diving into how chunks are loaded from disk. The goal of this code should be ultimate transparency as the only part of strax that many people will look at. Therefore, I think the abstraction obfuscates the code by having two levels of abstraction for one implementation, especially for novice.

At the moment, I think @ahiguera-mx could make more clear that they can overload read/write at the moment then it should work. If somebody wants a file backend, I think doing what you suggest is the right way to go since the abstraction will help. Until then, I think it might obfuscate things.... what do you think?

@tunnell fine with me and I totally see you point of readability and agree. But, I was also thinking more in a direction, that strax should be usable by any LXe TPC experiment and therefore as little explicit as possible. Other experiments may have a different type of database. I thought this was the idea of having this code in strax in the first place.

If other experiments don't want to use MongoDB for storing configuration (like we do for RunDB), then I'd add it. I see your point, but am just hesitant to introduce a layer of abstraction for a use case that might not occur. It's just 100 lines of code, so better to just abstract when the other use case comes since it might introduce some other requirements in my view.

ahiguera-mx · 2020-08-27T15:33:57Z

Hi @jorana @WenzDaniel @tunnell
As mentioned before, I have moved out the corrections module and remove the directory, let me know if you have any other suggestion or comment

strax/corrections.py

WenzDaniel · 2020-08-28T08:57:09Z

Hej @ahiguera-mx it looks very nice thanks a lot for your hard work. For the code itself I just have some minor style comments. I did not look in the logic of your code. But given your nice presentation in one of the Team-A telecons and that Chris reviewed your code I guess it is not necessary.

One bigger addition which still is required are some automated tests. A high percentage of the strax code is tested automatically and it would be nice if we could keep this coverage (Also one of our automated test will prevent us from merging your code if the coverage drops by too much.). I am not an expert on how to write tests involving mongo, but mabye @tunnell knows?

JoranAngevaare · 2020-08-31T10:39:49Z

Hi @ahiguera-mx , can you ping me when you've had time to address Daniels comments. I might be able to also look a little into the syntax of your mongo logic but I'd rather wait until you've had time to address his comments if that is okay with you.

tunnell · 2020-08-31T15:42:58Z

Aren't most Mongo interfaces untested at the moment? I think tests are good to add, but we should also not go too crazy. I'd do a "read in, read out" test.

Within TravisCI, you can turn on MongoDB locally. There is also mongomock that can be used.

I think there is pressure growing to finalize the corrections to avoid processing chaos when we get more data...

ahiguera-mx · 2020-08-31T16:29:12Z

Hi @ahiguera-mx , can you ping me when you've had time to address Daniels comments. I might be able to also look a little into the syntax of your mongo logic but I'd rather wait until you've had time to address his comments if that is okay with you.

Hi @jorana I have addressed @WenzDaniel comments, please have a look. Thanks

JoranAngevaare

Hi @ahiguera-mx , thanks for the effort you put in here. Things have been progressing a lot and apart from some changes mostly related to adding comments and tests that people are using this tool the way you intended it to be used.

Please let me know if you have any questions, I think we should be able to merge it soon.

Finally I have one additional request. Could you edit your PR-message above to include a limited example of how the data-frames should minimally look like. This will help bookkeeping if we ever need to revise this PR and should be quite easy for you to make a screenshot of.

strax/corrections.py

JoranAngevaare · 2020-09-02T07:58:40Z

strax/corrections.py

+        :param correction: corrections' name (str type)
+        :param df: pandas DataFrame.
+        '''
+        if 'ONLINE' not in df.columns:


This needs to be documented better, I cannot find the difference between 'online' and 'v1' in this code. Neither can I understand why we have online.

Finally I would assume that writing it like this is easier:

def write(self, correction, df, required_columns = ('ONLINE', 'v1'): """ ... :param required_columns: <please explain> ... """ for req in required_columns: if req not in df.columns: raise ValueError(f'Muls specify {req} in dataframe') ...

Information on online and v1 is specify at docstring of the class, I have added required_columns

Hej Aaron could you please elaborate where you specified this in the class doc string? I cannot find anything which says "Has to have the column "online" which means..." (same for v1). Did you maybe forget to push your solution?

Update:

Sorry not sure what exactly happened, but github showed me an older version of your code missing the corresponding lines.

strax/corrections.py

WenzDaniel

Hej Aaron, thanks a lot. ~~I think the one comment I unresolved must be fixed first since it will cause errors~~. Beside that, frankly speaking I am still not happy with your doc strings. Many of them are not useful at all. E.g. line 61: ":param what: date" can basically mean anything. It could be a string like this "2020-02-14 8:13:30" or "14.02.2020" or a datetime object or a timestamp in unix time in µs or ns..... I raised this concern now more then once, but if @jorana and @tunnell do not agree on this point I am fine to approve this PR as soon as you fixed this **kwargs thing.

Update: Just noticed it says data not date sorry, nvm.

WenzDaniel · 2020-09-09T07:20:46Z

strax/corrections.py

+        :param correction: corrections' name (str type)
+        :param df: pandas DataFrame.
+        '''
+        if 'ONLINE' not in df.columns:


Hej Aaron could you please elaborate where you specified this in the class doc string? I cannot find anything which says "Has to have the column "online" which means..." (same for v1). Did you maybe forget to push your solution?

Update:

Sorry not sure what exactly happened, but github showed me an older version of your code missing the corresponding lines.

JoranAngevaare · 2020-09-09T08:23:09Z

Aaron, thanks for providing the example in de description. As there is a strong push to merge this soon, I think we could go ahead with a merge, also as this PR doesn't interfere with any other strax logic we could merge it and finetune it later if needed.

However, I tend to agree with Daniel, it's little effort to add some comments and docstrings that actually help the user understand. Writing it in the document class is not sufficient as when you do:

??CorrectionsInterface.write

you will see the functions docstring but not the one of the class, the fact that df is a dataframe does not help the user.

Furthermore, I'm really worried that people will input the wrong kind of objects for time stamps with all kinds of time zones. I don't think there are checks in place to catch this sufficiently.

Without @tunnell s approval we cannot merge at the moment so he will have the final verdict.

JoranAngevaare · 2020-09-09T08:25:20Z

strax/corrections.py

+    is available, meaning a unique set of correction maps.
+    '''
+
+    def __init__(self, host='127.0 0.1', username=None, password=None,


Shouldn't there be a . here, i.e. 127.0.0.1?

ahiguera-mx · 2020-09-09T14:36:11Z

Aaron, thanks for providing the example in de description. As there is a strong push to merge this soon, I think we could go ahead with a merge, also as this PR doesn't interfere with any other strax logic we could merge it and finetune it later if needed.

However, I tend to agree with Daniel, it's little effort to add some comments and docstrings that actually help the user understand. Writing it in the document class is not sufficient as when you do:
??CorrectionsInterface.write
you will see the functions docstring but not the one of the class, the fact that df is a dataframe does not help the user.

Furthermore, I'm really worried that people will input the wrong kind of objects for time stamps with all kinds of time zones. I don't think there are checks in place to catch this sufficiently.

Without @tunnell s approval we cannot merge at the moment so he will have the final verdict.

Hi @jorana, regarding the time stamps and time zones, that info is experiment specific, if this is meant to be a general tool that should be open to the user to save the time stamps into the DB as he/she wish

…nd checking that

strax/corrections.py

JoranAngevaare · 2020-09-10T07:31:14Z

strax/corrections.py

        return df.set_index('time')

-    def interpolate(self, what, when, how='interpolate', **kwargs):
-        '''Interpolate values of a given correction
+    @staticmethod


Sorry Aaron, this suggestion is no longer valid of you need self.check_timezone. You can remove this line and use self as a first argument in the function

JoranAngevaare · 2020-09-10T07:31:41Z

strax/corrections.py

+        :param how: Interpolation method, can be either how='interpolate' or how='fill'
+        :param kwargs: are forward to the interpolation
+        """
+        CorrectionsInterface.check_timezone(when)


self.check_timezone should work if you don't have a staticmethod anymore

JoranAngevaare · 2020-09-10T07:31:51Z

strax/corrections.py

        :param global_config: a map of corrections
        :param global_version: global configuration's version
-        '''
+        """
+        CorrectionsInterface.check_timezone(when)


see comment above

JoranAngevaare · 2020-09-10T07:32:25Z

strax/corrections.py

        df_global = self.read(global_config)

        context_config = {}
        # loop over corrections and versions to get a global context
        for correction, version in df_global.iloc[-1][global_version].items():
            df = self.read(correction)
-            df = self.interpolate(df, when)
+            df = CorrectionsInterface.interpolate(df, when)


see comment above

JoranAngevaare · 2020-09-10T07:35:56Z

strax/corrections.py

+        if date.tzinfo == pytz.utc:
+            return date
+        else:
+            raise ValueError(f'{date} must be specify in UTC timezone')


There is a typo here. Perhaps add that the tz info should be in the datetime object:

ValueError(f'{date} must have UTC timezone. Insert datetime object like "datetime.datetime.now(tz=datetime.timezone.utc)"')```

https://github.com/ahiguera-mx/strax/pull/1

Add corrections package

31884ec

Based on the suggestion from this PR (#303) a modified implementation…

2a114a5

… of corrections management tool

remove cmt_services.py

fbf7993

WenzDaniel reviewed Aug 13, 2020

View reviewed changes

JoranAngevaare requested changes Aug 14, 2020

View reviewed changes

ahiguera-mx added 2 commits August 14, 2020 11:35

improvements on the code, add pdmongo to requirements

7a3792d

Merge https://github.com/ahiguera-mx/strax

b73b50a

ahiguera-mx added 2 commits August 14, 2020 13:53

small fix

2338ba5

improve syntaxis as recomended

37db1c7

tunnell requested changes Aug 17, 2020

View reviewed changes

tunnell and others added 4 commits August 17, 2020 17:14

Merge branch 'master' into master

5b51e17

Update README.md

deefff4

updates based on the following request #303 (review) and #303 (review)

700baff

Merge branch 'master' of https://github.com/ahiguera-mx/strax

a1230dd

tunnell and others added 5 commits August 24, 2020 13:35

Merge branch 'master' into master

b9d30d8

Removing corrections directory based on the suggestions by @tunnell

4f2baee

Adding corrections.py

2becd4a

Merge branch 'master' of https://github.com/ahiguera-mx/strax

d14d478

Adding corrections.py

e13956d

WenzDaniel reviewed Aug 28, 2020

View reviewed changes

strax/corrections.py Outdated Show resolved Hide resolved

strax/corrections.py Outdated Show resolved Hide resolved

strax/corrections.py Show resolved Hide resolved

strax/corrections.py Outdated Show resolved Hide resolved

strax/corrections.py Show resolved Hide resolved

Improve on doc-str info

0ff6a1f

JoranAngevaare approved these changes Sep 2, 2020

View reviewed changes

JoranAngevaare and others added 2 commits September 2, 2020 10:11

Merge branch 'master' into master

0464407

small additions, more user friendly

a45ce44

WenzDaniel requested changes Sep 9, 2020

View reviewed changes

WenzDaniel approved these changes Sep 9, 2020

View reviewed changes

JoranAngevaare reviewed Sep 9, 2020

View reviewed changes

Add useful info to docstring, require that DatetimeIndex are in UTC a…

6cc321b

…nd checking that

WenzDaniel reviewed Sep 10, 2020

View reviewed changes

strax/corrections.py Show resolved Hide resolved

JoranAngevaare reviewed Sep 10, 2020

View reviewed changes

JoranAngevaare added 2 commits September 10, 2020 13:28

Small editorial changes to the PR

c7bd6e8

https://github.com/ahiguera-mx/strax/pull/1

Merge branch 'master' into master

90b8535

tunnell approved these changes Sep 10, 2020

View reviewed changes

JoranAngevaare merged commit fd165f7 into AxFoundation:master Sep 10, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Corrections Management Tool #303

Corrections Management Tool #303

ahiguera-mx commented Aug 11, 2020 •

edited

tunnell commented Aug 11, 2020

JoranAngevaare commented Aug 12, 2020

ahiguera-mx commented Aug 13, 2020 •

edited

WenzDaniel left a comment

WenzDaniel commented Aug 13, 2020

JoranAngevaare left a comment

ahiguera-mx commented Aug 14, 2020

tunnell commented Aug 14, 2020

tunnell left a comment •

edited

ahiguera-mx commented Aug 21, 2020

WenzDaniel commented Aug 24, 2020

tunnell commented Aug 24, 2020

ahiguera-mx commented Aug 27, 2020

WenzDaniel commented Aug 28, 2020

JoranAngevaare commented Aug 31, 2020

tunnell commented Aug 31, 2020

ahiguera-mx commented Aug 31, 2020

JoranAngevaare left a comment •

edited

JoranAngevaare Sep 2, 2020

ahiguera-mx Sep 8, 2020

WenzDaniel Sep 9, 2020 •

edited

WenzDaniel left a comment •

edited

WenzDaniel Sep 9, 2020 •

edited

JoranAngevaare commented Sep 9, 2020 •

edited

JoranAngevaare Sep 9, 2020 •

edited

ahiguera-mx commented Sep 9, 2020

JoranAngevaare Sep 10, 2020

JoranAngevaare Sep 10, 2020

JoranAngevaare Sep 10, 2020

JoranAngevaare Sep 10, 2020

JoranAngevaare Sep 10, 2020

Corrections Management Tool #303

Corrections Management Tool #303

Conversation

ahiguera-mx commented Aug 11, 2020 • edited

tunnell commented Aug 11, 2020

JoranAngevaare commented Aug 12, 2020

ahiguera-mx commented Aug 13, 2020 • edited

WenzDaniel left a comment

Choose a reason for hiding this comment

WenzDaniel commented Aug 13, 2020

JoranAngevaare left a comment

Choose a reason for hiding this comment

ahiguera-mx commented Aug 14, 2020

tunnell commented Aug 14, 2020

tunnell left a comment • edited

Choose a reason for hiding this comment

ahiguera-mx commented Aug 21, 2020

WenzDaniel commented Aug 24, 2020

tunnell commented Aug 24, 2020

ahiguera-mx commented Aug 27, 2020

WenzDaniel commented Aug 28, 2020

JoranAngevaare commented Aug 31, 2020

tunnell commented Aug 31, 2020

ahiguera-mx commented Aug 31, 2020

JoranAngevaare left a comment • edited

Choose a reason for hiding this comment

JoranAngevaare Sep 2, 2020

Choose a reason for hiding this comment

ahiguera-mx Sep 8, 2020

Choose a reason for hiding this comment

WenzDaniel Sep 9, 2020 • edited

Choose a reason for hiding this comment

WenzDaniel left a comment • edited

Choose a reason for hiding this comment

WenzDaniel Sep 9, 2020 • edited

Choose a reason for hiding this comment

JoranAngevaare commented Sep 9, 2020 • edited

JoranAngevaare Sep 9, 2020 • edited

Choose a reason for hiding this comment

ahiguera-mx commented Sep 9, 2020

JoranAngevaare Sep 10, 2020

Choose a reason for hiding this comment

JoranAngevaare Sep 10, 2020

Choose a reason for hiding this comment

JoranAngevaare Sep 10, 2020

Choose a reason for hiding this comment

JoranAngevaare Sep 10, 2020

Choose a reason for hiding this comment

JoranAngevaare Sep 10, 2020

Choose a reason for hiding this comment

ahiguera-mx commented Aug 11, 2020 •

edited

ahiguera-mx commented Aug 13, 2020 •

edited

tunnell left a comment •

edited

JoranAngevaare left a comment •

edited

WenzDaniel Sep 9, 2020 •

edited

WenzDaniel left a comment •

edited

WenzDaniel Sep 9, 2020 •

edited

JoranAngevaare commented Sep 9, 2020 •

edited

JoranAngevaare Sep 9, 2020 •

edited