Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add read by index plus some extra checks #529

Merged
merged 26 commits into from Sep 27, 2021

Conversation

ahiguera-mx
Copy link
Contributor

What is the problem / what does the code in this PR do
Add function to read by index (collections include index in the db)
Can you briefly describe how it works?
Without indexes, a MongoDB database has to scan every document in a collection to look for the relevant documents. However, using indexes greatly reduces the number of documents MongoDB needs to scan
Can you give a minimal working example (or illustrate with a figure)?
db.collection.find({'document':'value'}).explain()['executionStats']) should return IXSCAN which is faster than COLLSCAN
Please include the following if applicable:

  • Update the docstring(s)
  • Update the documentation
  • Tests to check the (new) code is working as desired.
  • Does it solve one of the open issues on github?

Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).

Copy link
Contributor

@jmosbacher jmosbacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still some work to do in updating values

strax/corrections.py Outdated Show resolved Hide resolved
strax/corrections.py Show resolved Hide resolved
strax/corrections.py Outdated Show resolved Hide resolved
strax/corrections.py Show resolved Hide resolved
strax/corrections.py Outdated Show resolved Hide resolved
@ahiguera-mx
Copy link
Contributor Author

Hi Yossi,
Thanks for the very useful comments, I believe the current logic covers all usage cases

Copy link
Contributor

@jmosbacher jmosbacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is safe to merge but I think it would really help to have a few test cases added at some point to ensure the updating logic remains consistent when things change

@WenzDaniel
Copy link
Collaborator

Hej guys short question. Is this PR relevant for reprocessing? If not I would prefer to add those tests first, mentioned by Yossi. I know it is annoying, but based on experience I also know if it is not done right away we tend to forget.

@ahiguera-mx
Copy link
Contributor Author

Hej guys short question. Is this PR relevant for reprocessing? If not I would prefer to add those tests first, mentioned by Yossi. I know it is annoying, but based on experience I also know if it is not done right away we tend to forget.

I think is needed for reprocessing, but @ershockley should confirm. We already have some automated test but those are in straxen, here https://github.com/XENONnT/straxen/blob/master/tests/test_cmt.py

@ershockley
Copy link
Contributor

Yes we need this for reprocessing.

@WenzDaniel
Copy link
Collaborator

We already have some automated test but those are in straxen,

Yes but this is strax and not straxen. Strax is a stand a lone thing. Hence we should test it also here. As I said I know it is an annoying work, but it pays off later.

Copy link
Collaborator

@WenzDaniel WenzDaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your write method: :param required_columns: DataFrame must include an online and v1 columns should not this be a property of the class. Can you have the same class instance but with different requirements on the minimal corrections shape? Or is this an option load specific columns. E.g. I can also specify to load ONLINE till version v4? In this case I think the doc-string is a bit misleading.

I am sorry, but for me it is not possible to tell whether the code will do the desired change. Could you give some additional information and draw a sketch?
In addition I think we need desperately much more documentation here. I am sorry, but to be honest I would not approve this PR.

strax/corrections.py Outdated Show resolved Hide resolved
strax/corrections.py Show resolved Hide resolved
strax/corrections.py Show resolved Hide resolved
strax/corrections.py Outdated Show resolved Hide resolved
strax/corrections.py Show resolved Hide resolved
@ahiguera-mx
Copy link
Contributor Author

Hi @WenzDaniel
Thanks a lot for the useful comments, I've tried to resolve all the issues raised to the best of my ability, please let me know if I missed something

@ahiguera-mx
Copy link
Contributor Author

Hi @WenzDaniel,
In the latest commit I've added a test for the corrections part, so with this I think I addressed all comments. Thanks a lot again for the very useful review

requirements.txt Outdated Show resolved Hide resolved
@jmosbacher
Copy link
Contributor

I would like to see some more test cases. e.g. changes to existing dates with/without nans both in the past and future and for online and offline case. Also what about inserting new dates in-between existing ones?

@ahiguera-mx
Copy link
Contributor Author

I would like to see some more test cases. e.g. changes to existing dates with/without nans both in the past and future and for online and offline case. Also what about inserting new dates in-between existing ones?

Hi @jmosbacher
I've added more test, some of the test you suggest would throw an error which will break the automated test like changed non-nan values in the past

@jmosbacher
Copy link
Contributor

I would like to see some more test cases. e.g. changes to existing dates with/without nans both in the past and future and for online and offline case. Also what about inserting new dates in-between existing ones?

Hi @jmosbacher
I've added more test, some of the test you suggest would throw an error which will break the automated test like changed non-nan values in the past

@ahiguera-mx tests should be separated into different functions, the cmt setup should be a fixture and pytest.raises(Exception) should be used for testing correct behavior of errors.

@WenzDaniel
Copy link
Collaborator

and pytest.raises(Exception) should be used for testing correct behavior of errors.

Yes there are several different methods which you can use to test whether a function or method returns the correct exception. unittest also offers such options.

@WenzDaniel
Copy link
Collaborator

Hej Aaron I have a general comment about your tests. Maybe I am missing something, but I think you are not really testing if the outcome is correct, is not it? E.g. take for example test_modify_nan. You modify df2 and write the result, but you are not testing if the result is written correctly is not it?
I would expect another read and check,

@ahiguera-mx
Copy link
Contributor Author

Hej Aaron I have a general comment about your tests. Maybe I am missing something, but I think you are not really testing if the outcome is correct, is not it? E.g. take for example test_modify_nan. You modify df2 and write the result, but you are not testing if the result is written correctly is not it?
I would expect another read and check,

Hey @WenzDaniel
If there were any issues the write() will throw an error, as in the example test_change_past
For read and write a general test is done at test_db()

@jmosbacher
Copy link
Contributor

Hej Aaron I have a general comment about your tests. Maybe I am missing something, but I think you are not really testing if the outcome is correct, is not it? E.g. take for example test_modify_nan. You modify df2 and write the result, but you are not testing if the result is written correctly is not it?
I would expect another read and check,

Hey @WenzDaniel
If there were any issues the write() will throw an error, as in the example test_change_past
For read and write a general test is done at test_db()

@ahiguera-mx this is also the general comment I gave during the meeting yesterday, a test should not only check that the function doesn't throw an error when it runs to but also check that it has performed the correct task ie the resulting dataframe is consistent with our rules of what can and cannot be changed.

@ahiguera-mx
Copy link
Contributor Author

Hej Aaron I have a general comment about your tests. Maybe I am missing something, but I think you are not really testing if the outcome is correct, is not it? E.g. take for example test_modify_nan. You modify df2 and write the result, but you are not testing if the result is written correctly is not it?
I would expect another read and check,

Hey @WenzDaniel
If there were any issues the write() will throw an error, as in the example test_change_past
For read and write a general test is done at test_db()

@ahiguera-mx this is also the general comment I gave during the meeting yesterday, a test should not only check that the function doesn't throw an error when it runs to but also check that it has performed the correct task ie the resulting dataframe is consistent with our rules of what can and cannot be changed.
@jmosbacher @WenzDaniel
What I meant by "function throwing an error" is that the result is consistent with our rules of what can and cannot changed, I can add another read() and assert that dataframe is identical to what we modify, but it seems redundant to me. However if you guys think is necessary I'm happy to add it

@WenzDaniel
Copy link
Collaborator

Hej Aaron, thanks for the clarification. I overlooked your test_db. I think things are already looking quite nice. I have one last suggestion and a request.

Your tests have quite some duplicated code. You may want to look into this test for example. You can see this set-up method. It is called always before you run any other test of this sub-class. In this way you could for example initialize your db and your dataframes. In this way we reduce the code duplication and you do not have to change all tests again in case we modify some CMT function.

Regarding your write-method. Could we quickly meet Monday after the analysis meeting such that we can check if we can simplify the logic. I still do not understand your code. It is quite nested, but maybe I also have a wrong shape of the corrections in my mind.

@WenzDaniel WenzDaniel merged commit 5466cb4 into AxFoundation:master Sep 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants