New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not load entire database for all commands #269
Do not load entire database for all commands #269
Conversation
I was just thinking about this, and I am thinking that it might make more sense for being/end to walk backwards through the inclusions. That would eliminate most of the use of rbegin/rend and put them in "natural" order since @1 is always the most recent, etc.. |
You mean instead of adding an extra set of iterators ( |
The pull request already defines two sets of iterators begin/end and rbegin/rend. I was just thinking that the semantics of the two sets should be reversed, which would allow more of code to use the ranged based for loops. But ok, I'll update the pull request with that change. |
b1da348
to
dcdeda1
Compare
I've updated this branch so that range based for loops on the database start with the most recent interval (and renamed the lastLine to firstLine to match). Also as part of working to remove the users of getAllInclusions, I added an interface to the Database in order to get the set of tags directly from the tags database. This leaves only a single user of getAllInclusions left. |
Ubuntu 18.04.3 does not have the -v option to the date utility.
a3bd5b4
to
0a4c629
Compare
Since the performance appears to be close to O(1) now, I do not intend to do any more work on this branch pending review from @lauft |
Since the switch to python3, there is another method that starts with "test" higher up in the stack which produces unhelpful file / line information on a failed test like: ERROR: CommandError on file /usr/lib/python3.6/unittest/case.py line 59 in testPartExecutor: 'yield': This change restores the previous behavior from before the switch to python 3.
Not only does this eliminate the need to copy the stings to the caller, it will also eliminate the need for any iterators over the entries in the Database from having to hold a copy of the lines from the Datafile. Related to GothenburgBitFactory#245.
This allows the database to be treated as a single collection of strings, but can be used to avoid loading the entire database when only interested in recent entries. Related to issue GothenburgBitFactory#245.
Now that we have the iterators, we can standardize on their use. Related to issue GothenburgBitFactory#245.
intervalSummarize is called at the end of most commands. The cost of parsing all the lines in the database can be significant as the size of the database grows. Related to issue GothenburgBitFactory#245.
This does not appear to be necessary anymore given that the database lines are generated from intervals and are all well formed. Any open interval *should* be at the end of the database. Related to issue GothenburgBitFactory#245.
We can eliminate the need to parse the entire database if we only look for overlaps based on the latest interval. Related to issue GothenburgBitFactory#245.
outerRange is no longer used, since the filter was simply started based on the first line in the database now.
The Database class itself can now be used in range-based for loops for iterating over all the lines.
The inclusion database for the user always starts with the most recent entry. It is now the same way in the code as well.
The database class now separatly tracks tag information. So for the one place where all the inclusions were iterated over in order to build up a tag set, we now instead ask the database for this set directly. Related to issue GothenburgBitFactory#245
The getUntracked, called as part of the `timew gaps` command, is normally looking at a relatively recent interval. We do not want to take the performance hit of loading the entire database into memory when processing this command. Related to issue GothenburgBitFactory#245
All locations in the code that was creating Intervals for all entries in the database have been removed. This function can now be removed as well.
getIntervalsByIds will be used by commands that are loading complete database currently when they really want a few intervals that the user specified by ID. Related to issue GothenburgBitFactory#245
This change eliminates the call to getTracked with an empty filter, which causes the entire database to be parsed. Related to issue GothenburgBitFactory#245
Related to issue GothenburgBitFactory#245
Related to issue GothenburgBitFactory#245
Related to issue GothenburgBitFactory#245
Related to issue GothenburgBitFactory#245
Related to issue GothenburgBitFactory#245
Related to issue GothenburgBitFactory#245
…c intervals This test updates one of the existing tests to make sure that an non-synthetic interval in addition with the synthetic intervals can be moved properly.
Related to issue GothenburgBitFactory#245
Related to issue GothenburgBitFactory#245
Related to issue GothenburgBitFactory#245
Related to issue GothenburgBitFactory#245
0a4c629
to
45cf400
Compare
I pushed another update. I realized that the original getIntervalsByIds would consider the latest interval twice if there were synthetic intervals, and this condition was not caught in any tests. I updated two of the tests, annotate.t and move.t, to include a mix of non-synthetic and synthetic intervals and fixed getIntervalsByIds. |
This fixes an issue in the modify command since it was first added. This will allow modify to work in the presence of synthetic intervals.
After updating most of the tests in this pull request, I realized that when I first added support for the modify command I did not handle synthetic intervals. This change really isn't specifically related to performance improvements, but it just seemed easier to add it to this series since the easiest fix uses the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some small issues but I give a general thumbs up. 👍
@sruffell: Because this PR has now waited quite a while to get reviewed I would do some final tests on my machine and then merge it as soon as possible. Please address the review comments in a separate PR after the merge.
timwarrior coding standard is for there to be curly braces around all code blocks. See GothenburgBitFactory#269 (comment)
firstLine is ambiguous (the first line that was added in time? The first line that will be returned when iterating the database?) See GothenburgBitFactory#269 (comment)
timwarrior coding standard is for there to be curly braces around all code blocks. See #269 (comment)
firstLine is ambiguous (the first line that was added in time? The first line that will be returned when iterating the database?) See #269 (comment)
This merge request improves the performance when there are many entries in the database by not loading or parsing all the lines in many cases where it is not necessary.
Related to issue #245