New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Corrections Management Tool #303
Conversation
Reimplementation of #301 due to Git issue. (See comments therein) |
Hi Aaron, thanks again for your work (and resubmitting the PR). I just send this on slack but might as well post it here for bookkeeping.
That is true, my main suggestion to move it to straxen is that it seems to be build up to read from our database(s):
strax/strax/corrections/cmt_DB_interface.py Line 122 in 31884ec
And our kind of corrections: strax/strax/corrections/cmt_services.py Line 12 in 31884ec
But I do see the point that these kind of corrections are needed in each experiment. I guess most of the general stuff is actually in the database interactions right? Like: https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/corrections/cmt_DB_interface.py. Here you add some functions that are not (sufficiently?) in e.g. Perhaps a similar approach might make sense for the framework you build Aaron? As in make a general class out of this one https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/corrections/cmt_DB_interface.py initialise it in straxen and use that class to exploit your functions https://github.com/AxFoundation/strax/blob/31884ecf47b48814b1047c52d1ebd45a8980860e/strax/corrections/cmt_services.py. What do you think? |
… of corrections management tool
Hi Jorana |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Aaron, I think this was already a first step in the right direction. But I still think this is already way to specify for strax. Could you do it like it is done in case of our storage system? You define a very generic CTM class which has a read and write function defined like
Line 503 in 8578b21
def _read_chunk(self, backend_key, chunk_info, dtype, compressor): |
Line 32 in 8578b21
def _read_chunk(self, backend_key, chunk_info, dtype, compressor): |
You should also add pdmongo to https://github.com/AxFoundation/strax/blob/master/requirements.txt so that travis is not failing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Aaron, things are progressing, hereby some ideas on the current iteration
Hi guys, Thanks again for the feedback, I have made the corresponding changes to the code, still trying to figure out what you guys mean by following what is done in the storage package, but If I can say, we want to be the corrections class self consistent, in that sense the write and read functions are defined within the class |
@ahiguera-mx and I are going to do a code review together on Monday, if others are interested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahiguera-mx nice work! I think this is sorely needed. There are a few changes I request to make it more clearly generic across experiments (removing XENON specific things). I think we also should PEP8 things and more clearly document the input variables, since these functions will be called a lot by analysts. After those changes, I think it's good to add if others agree. A lot of the style things are just hazing for every new person to strax :)
Thanks Aaron, I think this was already a first step in the right direction. But I still think this is already way to specify for strax. Could you do it like it is done in case of our storage system? You define a very generic CTM class which has a read and write function defined like
Line 503 in 8578b21
def _read_chunk(self, backend_key, chunk_info, dtype, compressor): Which is then later defined in a subclass called e.g. XENONnTCTM(CTM) like here
Line 32 in 8578b21
def _read_chunk(self, backend_key, chunk_info, dtype, compressor): . I think your interpolate function can be already part of the CTM class itself. Further I found some minor things here and there.
@WenzDaniel I think this extra abstraction is a good idea if we have multiple correction systems (like storage backends). However, I don't agree that this should be done now. Every analyst will come across the corrections and probably look here, but almost no analysts are diving into how chunks are loaded from disk. The goal of this code should be ultimate transparency as the only part of strax that many people will look at. Therefore, I think the abstraction obfuscates the code by having two levels of abstraction for one implementation, especially for novice.
At the moment, I think @ahiguera-mx could make more clear that they can overload read/write at the moment then it should work. If somebody wants a file backend, I think doing what you suggest is the right way to go since the abstraction will help. Until then, I think it might obfuscate things.... what do you think?
Hi @jorana @WenzDaniel @tunnell |
@tunnell fine with me and I totally see you point of readability and agree. But, I was also thinking more in a direction, that strax should be usable by any LXe TPC experiment and therefore as little explicit as possible. Other experiments may have a different type of database. I thought this was the idea of having this code in strax in the first place. |
If other experiments don't want to use MongoDB for storing configuration (like we do for RunDB), then I'd add it. I see your point, but am just hesitant to introduce a layer of abstraction for a use case that might not occur. It's just 100 lines of code, so better to just abstract when the other use case comes since it might introduce some other requirements in my view. |
Hi @jorana @WenzDaniel @tunnell |
Hej @ahiguera-mx it looks very nice thanks a lot for your hard work. For the code itself I just have some minor style comments. I did not look in the logic of your code. But given your nice presentation in one of the Team-A telecons and that Chris reviewed your code I guess it is not necessary. One bigger addition which still is required are some automated tests. A high percentage of the strax code is tested automatically and it would be nice if we could keep this coverage (Also one of our automated test will prevent us from merging your code if the coverage drops by too much.). I am not an expert on how to write tests involving mongo, but mabye @tunnell knows? |
Hi @ahiguera-mx , can you ping me when you've had time to address Daniels comments. I might be able to also look a little into the syntax of your mongo logic but I'd rather wait until you've had time to address his comments if that is okay with you. |
Aren't most Mongo interfaces untested at the moment? I think tests are good to add, but we should also not go too crazy. I'd do a "read in, read out" test. Within TravisCI, you can turn on MongoDB locally. There is also mongomock that can be used. I think there is pressure growing to finalize the corrections to avoid processing chaos when we get more data... |
Hi @jorana I have addressed @WenzDaniel comments, please have a look. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ahiguera-mx , thanks for the effort you put in here. Things have been progressing a lot and apart from some changes mostly related to adding comments and tests that people are using this tool the way you intended it to be used.
Please let me know if you have any questions, I think we should be able to merge it soon.
Finally I have one additional request. Could you edit your PR-message above to include a limited example of how the data-frames should minimally look like. This will help bookkeeping if we ever need to revise this PR and should be quite easy for you to make a screenshot of.
strax/corrections.py
Outdated
:param correction: corrections' name (str type) | ||
:param df: pandas DataFrame. | ||
''' | ||
if 'ONLINE' not in df.columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be documented better, I cannot find the difference between 'online' and 'v1' in this code. Neither can I understand why we have online.
Finally I would assume that writing it like this is easier:
def write(self, correction, df, required_columns = ('ONLINE', 'v1'):
"""
...
:param required_columns: <please explain>
...
"""
for req in required_columns:
if req not in df.columns:
raise ValueError(f'Muls specify {req} in dataframe')
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Information on online and v1 is specify at docstring of the class, I have added required_columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hej Aaron could you please elaborate where you specified this in the class doc string? I cannot find anything which says "Has to have the column "online" which means..." (same for v1). Did you maybe forget to push your solution?
Update:
Sorry not sure what exactly happened, but github showed me an older version of your code missing the corresponding lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hej Aaron, thanks a lot. I think the one comment I unresolved must be fixed first since it will cause errors. Beside that, frankly speaking I am still not happy with your doc strings. Many of them are not useful at all. E.g. line 61: ":param what: date" can basically mean anything. It could be a string like this "2020-02-14 8:13:30" or "14.02.2020" or a datetime object or a timestamp in unix time in µs or ns..... I raised this concern now more then once, but if @jorana and @tunnell do not agree on this point I am fine to approve this PR as soon as you fixed this **kwargs thing.
Update: Just noticed it says data not date sorry, nvm.
strax/corrections.py
Outdated
:param correction: corrections' name (str type) | ||
:param df: pandas DataFrame. | ||
''' | ||
if 'ONLINE' not in df.columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hej Aaron could you please elaborate where you specified this in the class doc string? I cannot find anything which says "Has to have the column "online" which means..." (same for v1). Did you maybe forget to push your solution?
Update:
Sorry not sure what exactly happened, but github showed me an older version of your code missing the corresponding lines.
Aaron, thanks for providing the example in de description. As there is a strong push to merge this soon, I think we could go ahead with a merge, also as this PR doesn't interfere with any other strax logic we could merge it and finetune it later if needed. However, I tend to agree with Daniel, it's little effort to add some comments and docstrings that actually help the user understand. Writing it in the document class is not sufficient as when you do: ??CorrectionsInterface.write you will see the functions docstring but not the one of the class, the fact that Furthermore, I'm really worried that people will input the wrong kind of objects for time stamps with all kinds of time zones. I don't think there are checks in place to catch this sufficiently. Without @tunnell s approval we cannot merge at the moment so he will have the final verdict. |
strax/corrections.py
Outdated
is available, meaning a unique set of correction maps. | ||
''' | ||
|
||
def __init__(self, host='127.0 0.1', username=None, password=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't there be a .
here, i.e. 127.0.0.1?
Hi @jorana, regarding the time stamps and time zones, that info is experiment specific, if this is meant to be a general tool that should be open to the user to save the time stamps into the DB as he/she wish |
strax/corrections.py
Outdated
return df.set_index('time') | ||
|
||
def interpolate(self, what, when, how='interpolate', **kwargs): | ||
'''Interpolate values of a given correction | ||
@staticmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry Aaron, this suggestion is no longer valid of you need self.check_timezone
. You can remove this line and use self as a first argument in the function
strax/corrections.py
Outdated
:param how: Interpolation method, can be either how='interpolate' or how='fill' | ||
:param kwargs: are forward to the interpolation | ||
""" | ||
CorrectionsInterface.check_timezone(when) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.check_timezone should work if you don't have a staticmethod anymore
strax/corrections.py
Outdated
:param global_config: a map of corrections | ||
:param global_version: global configuration's version | ||
''' | ||
""" | ||
CorrectionsInterface.check_timezone(when) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above
strax/corrections.py
Outdated
df_global = self.read(global_config) | ||
|
||
context_config = {} | ||
# loop over corrections and versions to get a global context | ||
for correction, version in df_global.iloc[-1][global_version].items(): | ||
df = self.read(correction) | ||
df = self.interpolate(df, when) | ||
df = CorrectionsInterface.interpolate(df, when) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above
strax/corrections.py
Outdated
if date.tzinfo == pytz.utc: | ||
return date | ||
else: | ||
raise ValueError(f'{date} must be specify in UTC timezone') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a typo here. Perhaps add that the tz info should be in the datetime object:
ValueError(f'{date} must have UTC timezone. Insert datetime object like "datetime.datetime.now(tz=datetime.timezone.utc)"')```
Corrections are an important part of the event building process, applying these corrections help us to remove detectors effects so we can have a “correct” measurement e.g corrected S2 area [PE]. Having a common tool that maintains and does bookkeeping can be beneficial. Thus, we are proposing a corrections management tool (CMT). Corrections are placed in DB, the format is a panda DataFrame with DatetimeIndex. Currently examples of electron lifetime and pmt gains corrections are implemented via cmt_services.py.
This PR should be transparent, core changes to strax should be minor
example_corrections.pdf