Do not load entire database for all commands #269

sruffell · 2019-12-23T21:35:38Z

This merge request improves the performance when there are many entries in the database by not loading or parsing all the lines in many cases where it is not necessary.

Related to issue #245

sruffell · 2019-12-24T21:45:36Z

I was just thinking about this, and I think it might make more sense for begin/end to walk backwards through the inclusions. That would eliminate most uses of rbegin/rend and put the entries in "natural" order, since @1 is always the most recent, etc.

lauft · 2019-12-27T16:27:20Z

You mean instead of adding an extra set of iterators (rbegin/rend), to redefine the existing ones? That's fine with me. As you already mentioned, the natural order for Timewarrior is going backwards in time through the intervals.

sruffell · 2020-01-02T14:21:42Z

> You mean instead of adding an extra set of iterators (rbegin/rend), to redefine the existing ones?

The pull request already defines two sets of iterators, begin/end and rbegin/rend. I was just thinking that the semantics of the two sets should be reversed, which would allow more of the code to use range-based for loops.

But ok, I'll update the pull request with that change.

sruffell · 2020-01-04T04:45:39Z

I've updated this branch so that range-based for loops on the database start with the most recent interval (and renamed lastLine to firstLine to match).
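A minimal sketch of the reversed iterator semantics discussed above, assuming lines are stored oldest-first. `LineStore` and `inNaturalOrder` are hypothetical names used only for illustration; the real Database class is not shown in this thread and may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Database class: lines are stored
// oldest-first, in the order they would be appended to a data file.
class LineStore
{
public:
  using Storage = std::vector<std::string>;

  void append (const std::string& line) { _lines.push_back (line); }

  // "Natural" order: begin()/end() walk from the most recent line back
  // to the oldest, so a range-based for loop sees @1 first.
  Storage::const_reverse_iterator begin () const { return _lines.crbegin (); }
  Storage::const_reverse_iterator end () const   { return _lines.crend (); }

  // Chronological order stays available through rbegin()/rend().
  Storage::const_iterator rbegin () const { return _lines.cbegin (); }
  Storage::const_iterator rend () const   { return _lines.cend (); }

private:
  Storage _lines;
};

// Collects the lines in the order a range-based for loop visits them.
std::vector<std::string> inNaturalOrder (const LineStore& store)
{
  std::vector<std::string> result;
  for (const auto& line : store)
  {
    result.push_back (line);
  }
  return result;
}
```

With this arrangement, callers that only care about recent entries can use a plain range-based for loop and break out early.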

Also, as part of working to remove the users of getAllInclusions, I added an interface to the Database to get the set of tags directly from the tags database. This leaves only a single user of getAllInclusions.
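A rough sketch of the idea of asking the database for the tag set directly, assuming the tags are recorded whenever an interval is added. `TaggedStore`, its `addInterval` signature, and `tags` are hypothetical; how the real tags database is stored and queried is not shown here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical store that tracks the set of known tags separately from the
// interval lines, so the tag set is available without parsing any interval.
class TaggedStore
{
public:
  void addInterval (const std::string& line, const std::vector<std::string>& tags)
  {
    _lines.push_back (line);
    _tags.insert (tags.begin (), tags.end ());
  }

  // Returns every tag seen so far; no interval line is parsed here.
  const std::set<std::string>& tags () const { return _tags; }

private:
  std::vector<std::string> _lines;
  std::set<std::string> _tags;
};
```

The trade-off in this sketch is a small amount of bookkeeping on every write in exchange for a cheap read of the tag set.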

sruffell · 2020-01-06T04:06:25Z

Since the performance appears to be close to O(1) now, I do not intend to do any more work on this branch pending review from @lauft

Since the switch to Python 3, there is another method that starts with "test" higher up in the stack, which produces unhelpful file and line information on a failed test, like:

ERROR: CommandError on file /usr/lib/python3.6/unittest/case.py line 59 in testPartExecutor: 'yield':

This change restores the behavior from before the switch to Python 3.

Not only does this eliminate the need to copy the strings to the caller, it also means that iterators over the entries in the Database no longer have to hold a copy of the lines from the Datafile. Related to GothenburgBitFactory#245.

This does not appear to be necessary anymore given that the database lines are generated from intervals and are all well formed. Any open interval *should* be at the end of the database. Related to issue GothenburgBitFactory#245.

The database class now tracks tag information separately. So for the one place where all the inclusions were iterated over in order to build up a tag set, we now ask the database for this set directly. Related to issue GothenburgBitFactory#245

getUntracked, called as part of the `timew gaps` command, normally looks at a relatively recent interval. We do not want to take the performance hit of loading the entire database into memory when processing this command. Related to issue GothenburgBitFactory#245

sruffell · 2020-01-06T07:29:40Z

I pushed another update. I realized that the original getIntervalsByIds would consider the latest interval twice if there were synthetic intervals, and this condition was not caught in any tests.

I updated two of the tests, annotate.t and move.t, to include a mix of non-synthetic and synthetic intervals and fixed getIntervalsByIds.

sruffell · 2020-01-08T04:57:25Z

After updating most of the tests in this pull request, I realized that when I first added support for the modify command I did not handle synthetic intervals. This change really isn't specifically related to performance improvements, but it just seemed easier to add it to this series since the easiest fix uses the flattenDatabase helper added as part of it.

lauft

There are some small issues but I give a general thumbs up. 👍

@sruffell: Because this PR has now waited quite a while to get reviewed, I will do some final tests on my machine and then merge it as soon as possible. Please address the review comments in a separate PR after the merge.

test/move.t

test/modify.t

src/commands/CmdUntag.cpp

test/modify.t

test/annotate.t

src/Database.h

src/commands/CmdTag.cpp

src/commands/CmdResize.cpp

src/data.cpp

sruffell force-pushed the do-not-load-entire-database branch from b1da348 to dcdeda1 on January 4, 2020 04:43

sruffell mentioned this pull request Jan 4, 2020

Performance Issues #245

Closed

test: Use faketime instead of date -v for relative dates

e3f9989

Ubuntu 18.04.3 does not have the -v option to the date utility.

sruffell force-pushed the do-not-load-entire-database branch 2 times, most recently from a3bd5b4 to 0a4c629 on January 6, 2020 04:04

sruffell added 20 commits January 6, 2020 01:22

Database: Add forward/reverse iterator

76c7d0f

This allows the database to be treated as a single collection of strings, but can be used to avoid loading the entire database when only interested in recent entries. Related to issue GothenburgBitFactory#245.
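A sketch of how iterating newest-first can avoid loading older data, assuming the database is made up of several data files that are read only on first access. `LazyFile` and `latestLines` are hypothetical names, and the "disk" content is simulated in memory; this is an illustration of the idea, not the iterator added in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical data file: the "on disk" content is simulated in memory and a
// flag records whether the file has been loaded yet.
class LazyFile
{
public:
  explicit LazyFile (std::vector<std::string> onDisk) : _onDisk (std::move (onDisk)) {}

  // Loads the file on first access only.
  const std::vector<std::string>& lines ()
  {
    if (! _loaded)
    {
      _lines = _onDisk;   // stands in for reading and splitting the file
      _loaded = true;
    }
    return _lines;
  }

  bool loaded () const { return _loaded; }

private:
  std::vector<std::string> _onDisk;
  std::vector<std::string> _lines;
  bool _loaded = false;
};

// Returns up to `count` of the most recent lines, newest first. Files are
// ordered oldest-first and each file stores its lines oldest-first.
std::vector<std::string> latestLines (std::vector<LazyFile>& files, std::size_t count)
{
  std::vector<std::string> result;
  for (auto file = files.rbegin (); file != files.rend () && result.size () < count; ++file)
  {
    const auto& lines = file->lines ();
    for (auto line = lines.rbegin (); line != lines.rend () && result.size () < count; ++line)
    {
      result.push_back (*line);
    }
  }
  return result;
}
```

In this sketch, asking for the single latest line touches only the newest file; older files stay unloaded until the walk actually reaches them.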

Database: Use reverse iterator in lastLine

09857d8

Now that we have the iterators, we can standardize on their use. Related to issue GothenburgBitFactory#245.

intervalSummarize should not load the entire database

e3f754d

intervalSummarize is called at the end of most commands. The cost of parsing all the lines in the database can be significant as the size of the database grows. Related to issue GothenburgBitFactory#245.

Remove call to getAllInclusions when initializing tag database

43102a0

getTracked does not need to read in entire database

189f473

Related to issue GothenburgBitFactory#245.

getOverlaps should use non-empty range filter

af920ce

We can eliminate the need to parse the entire database if we only look for overlaps based on the latest interval. Related to issue GothenburgBitFactory#245.
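A sketch of why a non-empty range filter lets the walk stop early, assuming intervals are visited newest-first, are sorted by start time, and do not overlap each other. `Span` and `overlapsWith` are hypothetical and use plain integers for timestamps; the real getOverlaps is not shown here and open intervals are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical closed interval with integer timestamps; `end` is exclusive.
struct Span
{
  long start;
  long end;
};

// Returns the spans that overlap `filter`, walking newest-first. `parsed`
// reports how many spans were examined before the walk stopped.
std::vector<Span> overlapsWith (const std::vector<Span>& newestFirst, const Span& filter, std::size_t& parsed)
{
  assert (filter.start <= filter.end);

  std::vector<Span> result;
  parsed = 0;
  for (const auto& span : newestFirst)
  {
    ++parsed;
    if (span.end <= filter.start)
    {
      // Under the stated assumptions every older span ends even earlier,
      // so nothing further back can overlap the filter.
      break;
    }
    if (span.start < filter.end)
    {
      result.push_back (span);
    }
  }
  return result;
}
```

With an empty (unbounded) filter there is no such stopping point, which is why every entry would have to be parsed.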

Remove unused outerRange

11720cd

outerRange is no longer used, since the filter is now simply started based on the first line in the database.

Database: Remove Database::allLines()

013d5a9

The Database class itself can now be used in range-based for loops for iterating over all the lines.

Database: Switch the natural order from newest inclusion to oldest

33473f8

From the user's point of view, the inclusion database always starts with the most recent entry. The code now uses the same order.

Remove getAllInclusions helper function

7932daf

All locations in the code that were creating Intervals for all entries in the database have been removed. This function can now be removed as well.

Add helpers flattenDatabase and getIntervalsByIds

a7d2aba

getIntervalsByIds will be used by commands that currently load the complete database when they really only want a few intervals that the user specified by ID. Related to issue GothenburgBitFactory#245
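A simplified sketch of the lookup-by-ID idea, assuming IDs are 1-based with @1 as the most recent entry and that entries are visited newest-first. `linesByIds` is a hypothetical helper that works on plain strings and ignores synthetic intervals, which the real getIntervalsByIds has to handle through flattenDatabase.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns the entries whose 1-based position (newest first) is listed in
// `ids`. `visited` reports how many entries were looked at.
std::vector<std::string> linesByIds (const std::vector<std::string>& newestFirst, const std::set<std::size_t>& ids, std::size_t& visited)
{
  // IDs start at 1; an ID of 0 would never match.
  assert (ids.empty () || *ids.begin () >= 1);

  std::vector<std::string> result;
  visited = 0;
  if (ids.empty ())
  {
    return result;
  }

  const std::size_t highest = *ids.rbegin ();
  for (const auto& line : newestFirst)
  {
    ++visited;
    if (ids.count (visited) != 0)
    {
      result.push_back (line);
    }
    if (visited >= highest)
    {
      // Nothing older than the highest requested ID is needed.
      break;
    }
  }
  return result;
}
```

The walk ends at the highest requested ID, so a command that touches @1 and @3 never looks past the third entry.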

Database: Add method empty

4cca869

Database: add assert in addInterval if start is greater than end

18e80ea

CmdContinue: Do not load entire database

4cc94f9

This change eliminates the call to getTracked with an empty filter, which causes the entire database to be parsed. Related to issue GothenburgBitFactory#245

CmdTag: Do not load entire database

3bfa466

Related to issue GothenburgBitFactory#245

sruffell added 12 commits January 6, 2020 01:22

CmdModify: Do not load entire database

e52f88b

Related to issue GothenburgBitFactory#245

CmdUntag: Do not load entire database when untagging intervals

0e5cc58

Related to issue GothenburgBitFactory#245

test/annotate.t: Annotate a mix of synthetic / non-synthetic intervals

7f62b00

CmdAnnotate: Do not load entire database

9ff25c7

Related to issue GothenburgBitFactory#245

CmdDelete: Do not load entire database

1ff12df

Related to issue GothenburgBitFactory#245

CmdJoin: Do not load entire database

ff6cfdb

Related to issue GothenburgBitFactory#245

CmdLenghten: Do not load entire database

812e208

Related to issue GothenburgBitFactory#245

test/move.t: Make sure move handles mix of synthetic and non-synthetic intervals

1ca1967

This test updates one of the existing tests to make sure that a non-synthetic interval, in addition to the synthetic intervals, can be moved properly.

CmdMove: Do not load entire database

69e45e6

Related to issue GothenburgBitFactory#245

CmdResize: Do not load entire database

9f697b3

Related to issue GothenburgBitFactory#245

CmdShorten: Do not load entire database

8f48599

Related to issue GothenburgBitFactory#245

CmdSplit: Do not load entire database

45cf400

Related to issue GothenburgBitFactory#245

sruffell force-pushed the do-not-load-entire-database branch from 0a4c629 to 45cf400 on January 6, 2020 07:24

CmdModify: Allow modification of synthetic intervals

86bf7f9

This fixes an issue in the modify command since it was first added. This will allow modify to work in the presence of synthetic intervals.

lauft self-requested a review January 12, 2020 21:10

lauft approved these changes Jan 17, 2020

lauft added this to the 1.2.1 milestone Jan 17, 2020

lauft merged commit d9480b5 into GothenburgBitFactory:dev Jan 17, 2020

sruffell mentioned this pull request Jan 18, 2020

Do not load entire database cleanup #277

Merged

lauft pushed a commit that referenced this pull request Jan 26, 2020

trivial:coding-style: Add curly braces around blocks modified recently

2fcca6f

The Timewarrior coding standard is for there to be curly braces around all code blocks. See #269 (comment)

lauft pushed a commit that referenced this pull request Jan 26, 2020

Database: firstLine -> getLatestEntry

1abf6e9

firstLine is ambiguous (the first line that was added in time? The first line that will be returned when iterating the database?) See #269 (comment)

sruffell mentioned this pull request Feb 11, 2020

285: Pass interval id to extensions #286

Merged

sruffell deleted the do-not-load-entire-database branch February 23, 2020 19:51
