Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrections Management Tool #303

Merged
merged 22 commits into from Sep 10, 2020
Merged

Corrections Management Tool #303

merged 22 commits into from Sep 10, 2020

Conversation

ahiguera-mx
Copy link
Contributor

@ahiguera-mx ahiguera-mx commented Aug 11, 2020

Corrections are an important part of the event building process, applying these corrections help us to remove detectors effects so we can have a “correct” measurement e.g corrected S2 area [PE]. Having a common tool that maintains and does bookkeeping can be beneficial. Thus, we are proposing a corrections management tool (CMT). Corrections are placed in DB, the format is a panda DataFrame with DatetimeIndex. Currently examples of electron lifetime and pmt gains corrections are implemented via cmt_services.py.
This PR should be transparent, core changes to strax should be minor
example_corrections.pdf

@tunnell
Copy link
Member

tunnell commented Aug 11, 2020

Reimplementation of #301 due to Git issue. (See comments therein)

@JoranAngevaare
Copy link
Member

Hi Aaron, thanks again for your work (and resubmitting the PR). I just send this on slack but might as well post it here for bookkeeping.

Every TPC needs some corrections machinery. This would go into strax. This code would know nothing what they mean, but just be 'read correction from DB', or 'interpolate correction', or helper routines to evaluate correction X at datetime Y

That is true, my main suggestion to move it to straxen is that it seems to be build up to read from our database(s):

CLIENT = pymongo.MongoClient(host='xenon1t-daq.lngs.infn.it',

collection = pymongo.MongoClient('mongodb://pax:@xenon1t-daq.lngs.infn.it:27017/run',password=os.environ['DB_password'])['xenonnt']['run/runs']

And our kind of corrections:

But I do see the point that these kind of corrections are needed in each experiment. I guess most of the general stuff is actually in the database interactions right? Like: https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/corrections/cmt_DB_interface.py. Here you add some functions that are not (sufficiently?) in e.g.
https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/storage/mongo.py but it looks like there are quite some similarities. In straxen we only specify the mongo urls here https://github.com/XENONnT/straxen/blob/master/straxen/rundb.py (because there xenon specific).

Perhaps a similar approach might make sense for the framework you build Aaron? As in make a general class out of this one https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/corrections/cmt_DB_interface.py initialise it in straxen and use that class to exploit your functions https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/corrections/cmt_services.py. What do you think?

@ahiguera-mx
Copy link
Contributor Author

ahiguera-mx commented Aug 13, 2020

Hi Jorana
Thanks for the useful feedback, I think I understand what you suggested, based on that I made some modifications to cmt_DB_interface.py
To have a complete picture of the changes please look at this fork https://github.com/ahiguera-mx/straxen
Where you can find the implementation into straxen, xenon specifics are implemented there.
See https://github.com/ahiguera-mx/straxen/blob/master/straxen/corrections_services.py
and for implementation within the plugins you can see
https://github.com/ahiguera-mx/straxen/blob/master/straxen/plugins/peaklet_processing.py
https://github.com/ahiguera-mx/straxen/blob/master/straxen/plugins/event_processing.py

Copy link
Collaborator

@WenzDaniel WenzDaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Aaron, I think this was already a first step in the right direction. But I still think this is already way to specify for strax. Could you do it like it is done in case of our storage system? You define a very generic CTM class which has a read and write function defined like

def _read_chunk(self, backend_key, chunk_info, dtype, compressor):
Which is then later defined in a subclass called e.g. XENONnTCTM(CTM) like here
def _read_chunk(self, backend_key, chunk_info, dtype, compressor):
. I think your interpolate function can be already part of the CTM class itself. Further I found some minor things here and there.

strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
@WenzDaniel
Copy link
Collaborator

You should also add pdmongo to https://github.com/AxFoundation/strax/blob/master/requirements.txt so that travis is not failing

Copy link
Member

@JoranAngevaare JoranAngevaare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Aaron, things are progressing, hereby some ideas on the current iteration

strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
@ahiguera-mx
Copy link
Contributor Author

Hi guys,

Thanks again for the feedback, I have made the corresponding changes to the code, still trying to figure out what you guys mean by following what is done in the storage package, but If I can say, we want to be the corrections class self consistent, in that sense the write and read functions are defined within the class

@tunnell
Copy link
Member

tunnell commented Aug 14, 2020

@ahiguera-mx and I are going to do a code review together on Monday, if others are interested.

Copy link
Member

@tunnell tunnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahiguera-mx nice work! I think this is sorely needed. There are a few changes I request to make it more clearly generic across experiments (removing XENON specific things). I think we also should PEP8 things and more clearly document the input variables, since these functions will be called a lot by analysts. After those changes, I think it's good to add if others agree. A lot of the style things are just hazing for every new person to strax :)

Thanks Aaron, I think this was already a first step in the right direction. But I still think this is already way to specify for strax. Could you do it like it is done in case of our storage system? You define a very generic CTM class which has a read and write function defined like

def _read_chunk(self, backend_key, chunk_info, dtype, compressor):

Which is then later defined in a subclass called e.g. XENONnTCTM(CTM) like here

def _read_chunk(self, backend_key, chunk_info, dtype, compressor):

. I think your interpolate function can be already part of the CTM class itself. Further I found some minor things here and there.

@WenzDaniel I think this extra abstraction is a good idea if we have multiple correction systems (like storage backends). However, I don't agree that this should be done now. Every analyst will come across the corrections and probably look here, but almost no analysts are diving into how chunks are loaded from disk. The goal of this code should be ultimate transparency as the only part of strax that many people will look at. Therefore, I think the abstraction obfuscates the code by having two levels of abstraction for one implementation, especially for novice.

At the moment, I think @ahiguera-mx could make more clear that they can overload read/write at the moment then it should work. If somebody wants a file backend, I think doing what you suggest is the right way to go since the abstraction will help. Until then, I think it might obfuscate things.... what do you think?

strax/corrections/README.md Outdated Show resolved Hide resolved
strax/corrections/README.md Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
strax/corrections/cmt_DB_interface.py Outdated Show resolved Hide resolved
@ahiguera-mx
Copy link
Contributor Author

Hi @jorana @WenzDaniel @tunnell
I've implemented all the changes requested, there is one last thing wether I should move out this module from the corrections directory, I would say yes, since there is only one file, and probably I would rename it as corrections.py
Please let me know if I can proceed to this last change

@WenzDaniel
Copy link
Collaborator

@WenzDaniel I think this extra abstraction is a good idea if we have multiple correction systems (like storage backends). However, I don't agree that this should be done now. Every analyst will come across the corrections and probably look here, but almost no analysts are diving into how chunks are loaded from disk. The goal of this code should be ultimate transparency as the only part of strax that many people will look at. Therefore, I think the abstraction obfuscates the code by having two levels of abstraction for one implementation, especially for novice.

At the moment, I think @ahiguera-mx could make more clear that they can overload read/write at the moment then it should work. If somebody wants a file backend, I think doing what you suggest is the right way to go since the abstraction will help. Until then, I think it might obfuscate things.... what do you think?

@tunnell fine with me and I totally see you point of readability and agree. But, I was also thinking more in a direction, that strax should be usable by any LXe TPC experiment and therefore as little explicit as possible. Other experiments may have a different type of database. I thought this was the idea of having this code in strax in the first place.

@tunnell
Copy link
Member

tunnell commented Aug 24, 2020

@WenzDaniel I think this extra abstraction is a good idea if we have multiple correction systems (like storage backends). However, I don't agree that this should be done now. Every analyst will come across the corrections and probably look here, but almost no analysts are diving into how chunks are loaded from disk. The goal of this code should be ultimate transparency as the only part of strax that many people will look at. Therefore, I think the abstraction obfuscates the code by having two levels of abstraction for one implementation, especially for novice.

At the moment, I think @ahiguera-mx could make more clear that they can overload read/write at the moment then it should work. If somebody wants a file backend, I think doing what you suggest is the right way to go since the abstraction will help. Until then, I think it might obfuscate things.... what do you think?

@tunnell fine with me and I totally see you point of readability and agree. But, I was also thinking more in a direction, that strax should be usable by any LXe TPC experiment and therefore as little explicit as possible. Other experiments may have a different type of database. I thought this was the idea of having this code in strax in the first place.

If other experiments don't want to use MongoDB for storing configuration (like we do for RunDB), then I'd add it. I see your point, but am just hesitant to introduce a layer of abstraction for a use case that might not occur. It's just 100 lines of code, so better to just abstract when the other use case comes since it might introduce some other requirements in my view.

@ahiguera-mx
Copy link
Contributor Author

Hi @jorana @WenzDaniel @tunnell
As mentioned before, I have moved out the corrections module and remove the directory, let me know if you have any other suggestion or comment

strax/corrections.py Outdated Show resolved Hide resolved
strax/corrections.py Outdated Show resolved Hide resolved
strax/corrections.py Show resolved Hide resolved
strax/corrections.py Outdated Show resolved Hide resolved
strax/corrections.py Show resolved Hide resolved
@WenzDaniel
Copy link
Collaborator

Hej @ahiguera-mx it looks very nice thanks a lot for your hard work. For the code itself I just have some minor style comments. I did not look in the logic of your code. But given your nice presentation in one of the Team-A telecons and that Chris reviewed your code I guess it is not necessary.

One bigger addition which still is required are some automated tests. A high percentage of the strax code is tested automatically and it would be nice if we could keep this coverage (Also one of our automated test will prevent us from merging your code if the coverage drops by too much.). I am not an expert on how to write tests involving mongo, but mabye @tunnell knows?

@JoranAngevaare
Copy link
Member

Hi @ahiguera-mx , can you ping me when you've had time to address Daniels comments. I might be able to also look a little into the syntax of your mongo logic but I'd rather wait until you've had time to address his comments if that is okay with you.

@tunnell
Copy link
Member

tunnell commented Aug 31, 2020

Aren't most Mongo interfaces untested at the moment? I think tests are good to add, but we should also not go too crazy. I'd do a "read in, read out" test.

Within TravisCI, you can turn on MongoDB locally. There is also mongomock that can be used.

I think there is pressure growing to finalize the corrections to avoid processing chaos when we get more data...

@ahiguera-mx
Copy link
Contributor Author

Hi @ahiguera-mx , can you ping me when you've had time to address Daniels comments. I might be able to also look a little into the syntax of your mongo logic but I'd rather wait until you've had time to address his comments if that is okay with you.

Hi @jorana I have addressed @WenzDaniel comments, please have a look. Thanks

Copy link
Member

@JoranAngevaare JoranAngevaare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ahiguera-mx , thanks for the effort you put in here. Things have been progressing a lot and apart from some changes mostly related to adding comments and tests that people are using this tool the way you intended it to be used.

Please let me know if you have any questions, I think we should be able to merge it soon.

Finally I have one additional request. Could you edit your PR-message above to include a limited example of how the data-frames should minimally look like. This will help bookkeeping if we ever need to revise this PR and should be quite easy for you to make a screenshot of.

strax/corrections.py Outdated Show resolved Hide resolved
strax/corrections.py Show resolved Hide resolved
strax/corrections.py Outdated Show resolved Hide resolved
strax/corrections.py Show resolved Hide resolved
strax/corrections.py Show resolved Hide resolved
strax/corrections.py Outdated Show resolved Hide resolved
:param correction: corrections' name (str type)
:param df: pandas DataFrame.
'''
if 'ONLINE' not in df.columns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be documented better, I cannot find the difference between 'online' and 'v1' in this code. Neither can I understand why we have online.

Finally I would assume that writing it like this is easier:

def write(self, correction, df, required_columns = ('ONLINE', 'v1'):
"""
...
:param required_columns: <please explain>
...
"""
for req in required_columns:
   if req not in df.columns:
       raise ValueError(f'Muls specify {req} in dataframe')
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Information on online and v1 is specify at docstring of the class, I have added required_columns

Copy link
Collaborator

@WenzDaniel WenzDaniel Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hej Aaron could you please elaborate where you specified this in the class doc string? I cannot find anything which says "Has to have the column "online" which means..." (same for v1). Did you maybe forget to push your solution?

Update:

Sorry not sure what exactly happened, but github showed me an older version of your code missing the corresponding lines.

strax/corrections.py Show resolved Hide resolved
Copy link
Collaborator

@WenzDaniel WenzDaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hej Aaron, thanks a lot. I think the one comment I unresolved must be fixed first since it will cause errors. Beside that, frankly speaking I am still not happy with your doc strings. Many of them are not useful at all. E.g. line 61: ":param what: date" can basically mean anything. It could be a string like this "2020-02-14 8:13:30" or "14.02.2020" or a datetime object or a timestamp in unix time in µs or ns..... I raised this concern now more then once, but if @jorana and @tunnell do not agree on this point I am fine to approve this PR as soon as you fixed this **kwargs thing.

Update: Just noticed it says data not date sorry, nvm.

:param correction: corrections' name (str type)
:param df: pandas DataFrame.
'''
if 'ONLINE' not in df.columns:
Copy link
Collaborator

@WenzDaniel WenzDaniel Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hej Aaron could you please elaborate where you specified this in the class doc string? I cannot find anything which says "Has to have the column "online" which means..." (same for v1). Did you maybe forget to push your solution?

Update:

Sorry not sure what exactly happened, but github showed me an older version of your code missing the corresponding lines.

@JoranAngevaare
Copy link
Member

JoranAngevaare commented Sep 9, 2020

Aaron, thanks for providing the example in de description. As there is a strong push to merge this soon, I think we could go ahead with a merge, also as this PR doesn't interfere with any other strax logic we could merge it and finetune it later if needed.

However, I tend to agree with Daniel, it's little effort to add some comments and docstrings that actually help the user understand. Writing it in the document class is not sufficient as when you do:

??CorrectionsInterface.write

you will see the functions docstring but not the one of the class, the fact that df is a dataframe does not help the user.

Furthermore, I'm really worried that people will input the wrong kind of objects for time stamps with all kinds of time zones. I don't think there are checks in place to catch this sufficiently.

Without @tunnell s approval we cannot merge at the moment so he will have the final verdict.

is available, meaning a unique set of correction maps.
'''

def __init__(self, host='127.0 0.1', username=None, password=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be a . here, i.e. 127.0.0.1?

@ahiguera-mx
Copy link
Contributor Author

Aaron, thanks for providing the example in de description. As there is a strong push to merge this soon, I think we could go ahead with a merge, also as this PR doesn't interfere with any other strax logic we could merge it and finetune it later if needed.

However, I tend to agree with Daniel, it's little effort to add some comments and docstrings that actually help the user understand. Writing it in the document class is not sufficient as when you do:

??CorrectionsInterface.write

you will see the functions docstring but not the one of the class, the fact that df is a dataframe does not help the user.

Furthermore, I'm really worried that people will input the wrong kind of objects for time stamps with all kinds of time zones. I don't think there are checks in place to catch this sufficiently.

Without @tunnell s approval we cannot merge at the moment so he will have the final verdict.

Hi @jorana, regarding the time stamps and time zones, that info is experiment specific, if this is meant to be a general tool that should be open to the user to save the time stamps into the DB as he/she wish

strax/corrections.py Show resolved Hide resolved
return df.set_index('time')

def interpolate(self, what, when, how='interpolate', **kwargs):
'''Interpolate values of a given correction
@staticmethod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry Aaron, this suggestion is no longer valid of you need self.check_timezone. You can remove this line and use self as a first argument in the function

:param how: Interpolation method, can be either how='interpolate' or how='fill'
:param kwargs: are forward to the interpolation
"""
CorrectionsInterface.check_timezone(when)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.check_timezone should work if you don't have a staticmethod anymore

:param global_config: a map of corrections
:param global_version: global configuration's version
'''
"""
CorrectionsInterface.check_timezone(when)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above

df_global = self.read(global_config)

context_config = {}
# loop over corrections and versions to get a global context
for correction, version in df_global.iloc[-1][global_version].items():
df = self.read(correction)
df = self.interpolate(df, when)
df = CorrectionsInterface.interpolate(df, when)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above

if date.tzinfo == pytz.utc:
return date
else:
raise ValueError(f'{date} must be specify in UTC timezone')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a typo here. Perhaps add that the tz info should be in the datetime object:

ValueError(f'{date} must have UTC timezone. Insert datetime object like "datetime.datetime.now(tz=datetime.timezone.utc)"')``` 

@JoranAngevaare JoranAngevaare merged commit fd165f7 into AxFoundation:master Sep 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants