Add read by index plus some extra checks #529

ahiguera-mx · 2021-09-01T21:43:14Z

What is the problem / what does the code in this PR do
Adds a function to read corrections by index (the collections in the DB include an index).
Can you briefly describe how it works?
Without indexes, MongoDB has to scan every document in a collection to find the relevant ones. Using an index greatly reduces the number of documents MongoDB needs to scan.
Can you give a minimal working example (or illustrate with a figure)?
db.collection.find({'document': 'value'}).explain()['executionStats'] should show an IXSCAN stage (index scan), which is faster than a COLLSCAN (full collection scan).
Please include the following if applicable:

Update the docstring(s)
Update the documentation
Tests to check the (new) code is working as desired.
Does it solve one of the open issues on github?

Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).
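
As an illustration of the check described in the PR description above, the sketch below inspects the dictionary returned by explain() and reports whether the winning plan uses an index. The helper names (plan_stages, uses_index) and the two sample plans are made up for this example and are not part of strax. It assumes the classic explain layout, a queryPlanner.winningPlan tree with stage / inputStage keys; newer server versions may nest the plan differently.

```python
def plan_stages(plan):
    """Return the stage names of an explain() plan tree, outermost first."""
    stages = [plan["stage"]]
    # A stage has either a single child ("inputStage") or several ("inputStages").
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.insert(0, plan["inputStage"])
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def uses_index(explain_output):
    """True if the winning plan contains an IXSCAN and no COLLSCAN."""
    stages = plan_stages(explain_output["queryPlanner"]["winningPlan"])
    return "IXSCAN" in stages and "COLLSCAN" not in stages


# Hand-written sample outputs (illustrative only, heavily trimmed).
indexed = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "time_1"},
        }
    }
}
unindexed = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN", "direction": "forward"}}
}

print(uses_index(indexed), uses_index(unindexed))
```

With pymongo, the dictionary passed to uses_index would come from collection.find(query).explain().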

jmosbacher

still some work to do in updating values

strax/corrections.py

ahiguera-mx · 2021-09-02T22:13:56Z

Hi Yossi,
Thanks for the very useful comments. I believe the current logic covers all use cases.

jmosbacher

I think this is safe to merge, but it would really help to have a few test cases added at some point to ensure the updating logic remains consistent when things change.

WenzDaniel · 2021-09-06T05:26:30Z

Hej guys, short question: is this PR relevant for reprocessing? If not, I would prefer to first add the tests mentioned by Yossi. I know it is annoying, but from experience I also know that if it is not done right away we tend to forget.

ahiguera-mx · 2021-09-07T14:44:42Z

I think it is needed for reprocessing, but @ershockley should confirm. We already have some automated tests, but those are in straxen, here: https://github.com/XENONnT/straxen/blob/master/tests/test_cmt.py

ershockley · 2021-09-09T17:48:30Z

Yes we need this for reprocessing.

WenzDaniel · 2021-09-10T05:52:39Z

> We already have some automated test but those are in straxen,

Yes, but this is strax and not straxen. Strax is a standalone package, so we should also test it here. As I said, I know it is annoying work, but it pays off later.

WenzDaniel

In your write method you have ":param required_columns: DataFrame must include an online and v1 columns". Should this not be a property of the class? Can the same class instance have different requirements on the minimal shape of the corrections? Or is this an option to load specific columns, e.g. could I also specify loading ONLINE up to version v4? In that case I think the docstring is a bit misleading.

I am sorry, but I cannot tell whether the code will make the desired change. Could you give some additional information and draw a sketch?
In addition, I think we desperately need much more documentation here. I am sorry, but to be honest I would not approve this PR.

strax/corrections.py

ahiguera-mx · 2021-09-22T17:17:24Z

Hi @WenzDaniel
Thanks a lot for the useful comments. I've tried to resolve all the issues raised to the best of my ability; please let me know if I missed something.

ahiguera-mx · 2021-09-22T20:43:43Z

Hi @WenzDaniel,
In the latest commit I've added a test for the corrections part, so with this I think I have addressed all comments. Thanks a lot again for the very useful review.

requirements.txt

jmosbacher · 2021-09-23T05:56:23Z

I would like to see some more test cases, e.g. changes to existing dates with and without NaNs, both in the past and in the future, and for both the online and offline cases. Also, what about inserting new dates in between existing ones?

ahiguera-mx · 2021-09-23T11:14:58Z

Hi @jmosbacher
I've added more tests. Some of the tests you suggest, such as changing non-NaN values in the past, would throw an error and break the automated test run.

jmosbacher · 2021-09-23T11:20:10Z

@ahiguera-mx tests should be separated into different functions, the CMT setup should be a fixture, and pytest.raises(Exception) should be used to test that errors are raised correctly.

WenzDaniel · 2021-09-23T11:57:30Z

> and pytest.raises(Exception) should be used for testing correct behavior of errors.

Yes, there are several different ways to test whether a function or method raises the correct exception. unittest also offers such options.
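
A minimal sketch of the unittest option mentioned above: assertRaises and assertRaisesRegex check that a forbidden change is rejected. The function checked_update and its rule (a non-NaN value dated in the past may not change) are a toy stand-in written for this example, loosely based on the rule discussed in this thread. It is not the strax write() implementation.

```python
import math
import unittest
from datetime import datetime

NOW = datetime(2021, 9, 23)   # fixed "current time" so the test is deterministic
PAST = datetime(2021, 1, 1)


def checked_update(stored, new, now=NOW):
    """Return a merged copy of `stored`; refuse to alter non-NaN past values (toy rule)."""
    for when, value in new.items():
        old = stored.get(when, math.nan)
        if when < now and not math.isnan(old) and old != value:
            raise ValueError(f"cannot change non-NaN value in the past: {when}")
    return {**stored, **new}


class TestPastIsProtected(unittest.TestCase):
    def test_changing_past_value_raises(self):
        # The test passes only if ValueError is raised inside the block.
        with self.assertRaises(ValueError):
            checked_update({PAST: 1.0}, {PAST: 2.0})

    def test_error_message_names_the_rule(self):
        with self.assertRaisesRegex(ValueError, "cannot change"):
            checked_update({PAST: 1.0}, {PAST: 2.0})

    def test_filling_past_nan_is_allowed(self):
        merged = checked_update({PAST: math.nan}, {PAST: 2.0})
        self.assertEqual(merged[PAST], 2.0)


# Run the suite explicitly so the module can be executed as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPastIsProtected)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The pytest equivalent of the first test is a plain function using "with pytest.raises(ValueError):" around the same call.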

WenzDaniel · 2021-09-24T14:30:50Z

Hej Aaron, I have a general comment about your tests. Maybe I am missing something, but I think you are not really testing whether the outcome is correct, are you? Take test_modify_nan for example: you modify df2 and write the result, but you are not testing whether the result is written correctly, are you?
I would expect another read and a check.

ahiguera-mx · 2021-09-24T14:41:21Z

Hey @WenzDaniel
If there were any issues, write() would throw an error, as in the test_change_past example.
A general test of read and write is done in test_db().

jmosbacher · 2021-09-24T14:53:10Z

@ahiguera-mx this is also the general comment I gave during the meeting yesterday: a test should not only check that the function runs without throwing an error, but also that it has performed the correct task, i.e. that the resulting dataframe is consistent with our rules of what can and cannot be changed.

ahiguera-mx · 2021-09-24T14:58:14Z

@jmosbacher @WenzDaniel
What I meant by "function throwing an error" is that the result is consistent with our rules of what can and cannot be changed. I can add another read() and assert that the dataframe is identical to what we modified, but it seems redundant to me. However, if you think it is necessary, I'm happy to add it.

WenzDaniel · 2021-09-25T12:16:50Z

Hej Aaron, thanks for the clarification. I overlooked your test_db. I think things are already looking quite nice. I have one last suggestion and a request.

Your tests have quite a lot of duplicated code. You may want to look into this test for example. You can see this set-up method. It is always called before any other test of this sub-class runs. In this way you could, for example, initialize your db and your dataframes. This reduces the code duplication, and you do not have to change all tests again in case we modify some CMT function.

Regarding your write method: could we quickly meet on Monday after the analysis meeting to check whether we can simplify the logic? I still do not understand your code. It is quite nested, but maybe I also have the wrong shape of the corrections in mind.
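
The set-up suggestion above, together with the earlier request to read back and check what was written, can be sketched as follows. ToyStore is a hypothetical in-memory stand-in invented for this example. It is not the strax corrections interface, and the dates and values are arbitrary.

```python
import math
import unittest
from datetime import datetime


class ToyStore:
    """Hypothetical in-memory stand-in for the corrections DB (not the strax API)."""

    def __init__(self):
        self._rows = {}

    def write(self, rows):
        self._rows.update(rows)

    def read(self):
        return dict(self._rows)  # copy, so callers cannot mutate the store


class TestToyStore(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test starts from the same state
        # and the initialisation code is written only once.
        self.past = datetime(2021, 1, 1)
        self.future = datetime(2030, 1, 1)
        self.store = ToyStore()
        self.store.write({self.past: 1.0, self.future: math.nan})

    def test_filled_nan_is_stored(self):
        self.store.write({self.future: 3.0})
        # Read back and compare instead of relying on write() not raising.
        self.assertEqual(self.store.read()[self.future], 3.0)

    def test_other_rows_are_untouched(self):
        before = self.store.read()
        self.store.write({self.future: 3.0})
        after = self.store.read()
        self.assertEqual(after[self.past], before[self.past])
        self.assertEqual(len(after), len(before))


# Run the suite explicitly so the module can be executed as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestToyStore)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If a CMT function changes, only setUp needs updating; the individual tests keep asserting on what read() returns.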

ahiguera-mx and others added 7 commits August 4, 2021 18:56

A few improvements on handling ONLINE vs OFFLINE corrections

eb29e0f

comply with codeFactor suggestion

22d9cc4

Merge branch 'master' into master

931aa78

Merge branch 'AxFoundation:master' into master

354a379

Merge branch 'AxFoundation:master' into master

a06d9fc

Merge branch 'AxFoundation:master' into master

7a3086d

Add query by index, it is should be faster

aedba43

ahiguera-mx mentioned this pull request Sep 1, 2021

Use read by index and check for NaNs XENONnT/straxen#661

Merged

ahiguera-mx requested a review from jmosbacher September 1, 2021 21:53

jmosbacher requested changes Sep 2, 2021

ahiguera-mx added 2 commits September 2, 2021 11:32

few changes based on review

51c185e

few tweaks

7f264a2

ahiguera-mx requested a review from ershockley September 2, 2021 22:12

jmosbacher approved these changes Sep 3, 2021

ershockley approved these changes Sep 3, 2021

WenzDaniel reviewed Sep 10, 2021

a1exndr reviewed Sep 10, 2021

ahiguera-mx and others added 4 commits September 22, 2021 10:35

Merge branch 'AxFoundation:master' into master

fd7d557

update based on review

9ad7564

Add developer documentation for corrections class

72fd4e5

remove extra line

6b2d321

ahiguera-mx and others added 3 commits September 22, 2021 15:18

update format

92ed18c

Add test for corrections interface, include mongomock in requirements

7cb8b56

Merge branch 'master' of https://github.com/ahiguera-mx/strax

15f0542

ahiguera-mx added 2 commits September 22, 2021 15:31

Add mongomock to extra_requirements

58da277

add mongomock client as an option for testing

b299ab8

jmosbacher reviewed Sep 23, 2021

Add more test cases

eef4409

ahiguera-mx added 7 commits September 23, 2021 12:35

Improvements on testing

36fcc03

fix issue with unittest

0658443

try another fix with unittest

13ba7b2

fix

b7c56ba

since pytest cannot test classes with a __init__ constructor, replace class with simple functions

ec5db13

fix

06fc2b1

extra line

86f05f2

WenzDaniel merged commit 5466cb4 into AxFoundation:master Sep 27, 2021
