New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add read by index plus some extra checks #529
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still some work to do in updating values
Hi Yossi, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is safe to merge but I think it would really help to have a few test cases added at some point to ensure the updating logic remains consistent when things change
Hej guys short question. Is this PR relevant for reprocessing? If not I would prefer to add those tests first, mentioned by Yossi. I know it is annoying, but based on experience I also know if it is not done right away we tend to forget. |
I think is needed for reprocessing, but @ershockley should confirm. We already have some automated test but those are in straxen, here https://github.com/XENONnT/straxen/blob/master/tests/test_cmt.py |
Yes we need this for reprocessing. |
Yes but this is strax and not straxen. Strax is a stand a lone thing. Hence we should test it also here. As I said I know it is an annoying work, but it pays off later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your write method: :param required_columns: DataFrame must include an online and v1 columns
should not this be a property of the class. Can you have the same class instance but with different requirements on the minimal corrections shape? Or is this an option load specific columns. E.g. I can also specify to load ONLINE till version v4? In this case I think the doc-string is a bit misleading.
I am sorry, but for me it is not possible to tell whether the code will do the desired change. Could you give some additional information and draw a sketch?
In addition I think we need desperately much more documentation here. I am sorry, but to be honest I would not approve this PR.
Hi @WenzDaniel |
Hi @WenzDaniel, |
I would like to see some more test cases. e.g. changes to existing dates with/without nans both in the past and future and for online and offline case. Also what about inserting new dates in-between existing ones? |
Hi @jmosbacher |
@ahiguera-mx tests should be separated into different functions, the cmt setup should be a fixture and pytest.raises(Exception) should be used for testing correct behavior of errors. |
Yes there are several different methods which you can use to test whether a function or method returns the correct exception. unittest also offers such options. |
Hej Aaron I have a general comment about your tests. Maybe I am missing something, but I think you are not really testing if the outcome is correct, is not it? E.g. take for example |
Hey @WenzDaniel |
@ahiguera-mx this is also the general comment I gave during the meeting yesterday, a test should not only check that the function doesn't throw an error when it runs to but also check that it has performed the correct task ie the resulting dataframe is consistent with our rules of what can and cannot be changed. |
|
Hej Aaron, thanks for the clarification. I overlooked your Your tests have quite some duplicated code. You may want to look into this test for example. You can see this set-up method. It is called always before you run any other test of this sub-class. In this way you could for example initialize your db and your dataframes. In this way we reduce the code duplication and you do not have to change all tests again in case we modify some CMT function. Regarding your |
What is the problem / what does the code in this PR do
Add function to read by index (collections include index in the db)
Can you briefly describe how it works?
Without indexes, a MongoDB database has to scan every document in a collection to look for the relevant documents. However, using indexes greatly reduces the number of documents MongoDB needs to scan
Can you give a minimal working example (or illustrate with a figure)?
db.collection.find({'document':'value'}).explain()['executionStats'])
should returnIXSCAN
which is faster thanCOLLSCAN
Please include the following if applicable:
Please make sure that all automated tests have passed before asking for a review (you can save the PR as a draft otherwise).