SQLite datastore #57

Open · johan-bjareholt wants to merge 22 commits into master from dev/sqlite

Conversation

johan-bjareholt (Member) commented May 13, 2018

Includes peewee_v2 -> sqlite_v1 migration

johan-bjareholt (Member) commented May 13, 2018

The sqlite storage seems to fail completely on Windows in AppVeyor.

johan-bjareholt (Member) commented May 13, 2018

No events seem to be inserted on Windows and I'm not sure why; I need help debugging.
Things seem to be pretty stable otherwise.

ErikBjare (Member) commented May 14, 2018

Seems to work fine on 64-bit Windows, but not on 32-bit Windows? Weird indeed.

johan-bjareholt (Member) commented May 14, 2018

@ErikBjare Duh! I'm using sys.maxsize, so it shouldn't work on 32-bit Linux/macOS either.

Since SQLite doesn't have a datetime type, I used milliseconds since 1970. I did the math on how much would fit inside an 8-byte integer (64 bits, the maximum integer size in SQLite), which reaches well beyond the 50th century. For a 4-byte/32-bit integer, however, this is not the case.
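
A quick back-of-the-envelope check of that arithmetic (a standalone sketch, not code from this PR):

# How far past 1970 can a signed integer of a given width reach,
# if we store ticks-since-epoch at a given resolution?
def max_year(bits: int, ticks_per_second: int = 1000) -> float:
    max_ticks = 2 ** (bits - 1) - 1  # largest signed integer
    return 1970 + max_ticks / ticks_per_second / (365.25 * 24 * 3600)

print(max_year(64))             # ~292 million years of headroom with milliseconds
print(max_year(64, 1_000_000))  # still ~year 294,000 with microseconds, as the diff uses
print(max_year(32))             # ~1970.07, i.e. overflows about 25 days after the epoch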

johan-bjareholt force-pushed the dev/sqlite branch from 302d3bb to 6393abf on May 14, 2018

ActivityWatch deleted a comment from codecov-io on May 14, 2018

codecov-io commented May 14, 2018

Codecov Report

Merging #57 into master will decrease coverage by 0.12%.
The diff coverage is 93.58%.


@@            Coverage Diff             @@
##           master      #57      +/-   ##
==========================================
- Coverage   94.16%   94.04%   -0.13%     
==========================================
  Files          31       33       +2     
  Lines        1372     1527     +155     
  Branches      222      230       +8     
==========================================
+ Hits         1292     1436     +144     
- Misses         41       47       +6     
- Partials       39       44       +5
Impacted Files Coverage Δ
aw_datastore/datastore.py 95.83% <ø> (-0.42%) ⬇️
aw_datastore/storages/memory.py 90.9% <100%> (-2.85%) ⬇️
aw_datastore/storages/mongodb.py 93.4% <100%> (-2.2%) ⬇️
aw_datastore/storages/__init__.py 100% <100%> (ø) ⬆️
aw_datastore/__init__.py 100% <100%> (ø) ⬆️
aw_datastore/storages/peewee.py 99.2% <100%> (+3.43%) ⬆️
aw_datastore/storages/abstract.py 96.87% <50%> (-3.13%) ⬇️
aw_datastore/migration.py 73.52% <73.52%> (ø)
aw_datastore/storages/sqlite.py 98.57% <98.57%> (ø)
... and 3 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 37d850e...3531010.

johan-bjareholt (Member) commented May 14, 2018

It works! 🎉

johan-bjareholt force-pushed the dev/sqlite branch from 6393abf to c9cd5de on May 24, 2018

ErikBjare (Member) commented Jun 15, 2018

As I wrote in chat: Consider switching to one table per bucket.

ErikBjare referenced this pull request on Jun 15, 2018: Improved coverage #62 (Open)

ahnlabb left a comment

Really good work. There are some things we need to discuss, and some things I think should change before this is merged, but even now it is great.

c = self.conn.cursor()
res = c.execute("SELECT * FROM buckets WHERE id = ?", [bucket_id])
row = res.fetchone()
bucket = {

ahnlabb Jun 20, 2018

This should be equivalent to:

bucket = dict(zip(row.keys(), row))

I understand if you prefer what you've written, but I thought I'd mention it.
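
For context, row.keys() implies the connection uses sqlite3.Row as its row factory (plain tuple rows have no keys()). A minimal standalone example:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support .keys() and name lookup
conn.execute("CREATE TABLE buckets (id TEXT, name TEXT)")
conn.execute("INSERT INTO buckets VALUES ('b1', 'test')")
row = conn.execute("SELECT * FROM buckets WHERE id = ?", ["b1"]).fetchone()
bucket = dict(zip(row.keys(), row))  # {'id': 'b1', 'name': 'test'}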

"""

INDEX_EVENTS_TABLE = """
CREATE INDEX IF NOT EXISTS event_index ON events(bucket, starttime, endtime);

ahnlabb Jun 20, 2018

This does not look like the correct index for our queries; we should discuss this in more detail.
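
One way to check what an index actually buys a given query is SQLite's EXPLAIN QUERY PLAN; a standalone sketch using the schema and count query quoted in this review:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         bucket TEXT NOT NULL,
                         starttime INTEGER NOT NULL,
                         endtime INTEGER NOT NULL,
                         datastr TEXT NOT NULL);
    CREATE INDEX event_index ON events(bucket, starttime, endtime);
""")
# Prints the plan, showing whether the period query can use event_index:
for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT count(*) FROM events "
        "WHERE bucket = ? AND endtime >= ? AND starttime <= ?",
        ["test", 0, 10**15]):
    print(row)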

c = self.conn.cursor()
starttime = event.timestamp.timestamp() * 1000000
endtime = starttime + (event.duration.total_seconds() * 1000000)
datastr = json.dumps(event.data)

ahnlabb Jun 20, 2018

Consider doing sqlite3.register_adapter(dict, json.dumps); then you can remove all these conversions.
If you want to be more declarative when you read as well, you can do:

self.conn = sqlite3.connect(filepath, detect_types=sqlite3.PARSE_DECLTYPES)  # row 59
"datastr dict NOT NULL, "  # row 38
sqlite3.register_converter('dict', json.loads)  # beginning of __init__
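
A self-contained round trip of this suggestion (converters receive bytes, which json.loads accepts on Python 3.6+):

import json
import sqlite3

sqlite3.register_adapter(dict, json.dumps)      # dict -> TEXT on write
sqlite3.register_converter("dict", json.loads)  # TEXT -> dict on read

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE events (datastr dict NOT NULL)")
conn.execute("INSERT INTO events VALUES (?)", [{"app": "urxvt"}])
print(conn.execute("SELECT datastr FROM events").fetchone()[0])  # {'app': 'urxvt'}
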
def insert_one(self, bucket_id: str, event: Event) -> Event:
c = self.conn.cursor()
starttime = event.timestamp.timestamp() * 1000000
endtime = starttime + (event.duration.total_seconds() * 1000000)

ahnlabb Jun 20, 2018

It would be nice if this could be simplified:

endtime = event.timestamp + event.duration

then, like I describe below, register an adapter:

sqlite3.register_adapter(datetime.datetime, lambda ts: ts.timestamp())  # multiply by 1000000 if you want, or just save as REAL
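
A minimal sketch of that idea, storing timestamps as Unix seconds (REAL) under a hypothetical 'unixtime' declared type; the type name and storage choice are illustrative, not this PR's implementation:

import datetime
import sqlite3

sqlite3.register_adapter(datetime.datetime, lambda dt: dt.timestamp())
sqlite3.register_converter(
    "unixtime", lambda b: datetime.datetime.fromtimestamp(float(b)))

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE events (starttime unixtime NOT NULL)")
conn.execute("INSERT INTO events VALUES (?)", [datetime.datetime.now()])
print(conn.execute("SELECT starttime FROM events").fetchone()[0])  # datetime comes back out
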
query = "INSERT INTO EVENTS(bucket, starttime, endtime, datastr) " + \
"VALUES (?, ?, ?, ?)"
self.conn.executemany(query, event_rows)
if len(event_rows) > 50:

ahnlabb Jun 20, 2018

When will this finally commit? What guarantees do we provide?
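
For illustration, one shape such a guarantee could take: a hypothetical helper that commits once enough rows or enough seconds have accumulated. The names are illustrative; this is not necessarily how the PR's conditional_commit works.

import time

class ConditionalCommitter:
    """Commit once enough uncommitted rows or enough time has accumulated."""
    def __init__(self, conn, max_rows: int = 50, max_age: float = 10.0):
        self.conn, self.max_rows, self.max_age = conn, max_rows, max_age
        self.pending = 0
        self.last_commit = time.monotonic()

    def note_writes(self, n: int) -> None:
        self.pending += n
        age = time.monotonic() - self.last_commit
        if self.pending >= self.max_rows or age >= self.max_age:
            self.conn.commit()
            self.pending = 0
            self.last_commit = time.monotonic()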

datastr = json.dumps(event.data)
query = "UPDATE events " + \
"SET starttime = ?, endtime = ?, datastr = ? " + \
"WHERE endtime = (SELECT max(endtime) FROM events WHERE bucket = ?) AND bucket = ?"

ahnlabb Jun 20, 2018

This might not be optimal; bucket first?

endtime_i = endtime.timestamp()*1000000 if endtime else MAX_TIMESTAMP
query = "SELECT count(*) " + \
"FROM events " + \
"WHERE bucket = ? AND endtime >= ? AND starttime <= ?"

ahnlabb Jun 20, 2018

Reminder to discuss this when we discuss indexing.

ErikBjare (Member) commented Jul 2, 2018

What are the action points for this PR?

I just talked briefly to @ahnlabb; here's what I gathered:

  • Enable WAL (done in ebd3a5e)
  • Fix the indexes (done in ebd3a5e)
  • Disable the lazy commit? Or at least make it toggleable with a flag/parameter (probably off by default)
    • Also possibly have a thread running that auto-commits every n seconds to have some guarantees
    • Edit: I saw you fixed this with conditional_commit, looks acceptable to me.

And my own thought:

  • Should we change the bucket_name column such that it uses the bucket rowid instead of the string name/key? (Would reduce database size a fair bit; see the sketch below.)

Some of these things don't need a DB migration and could wait until after this PR.
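
A hypothetical sketch of the rowid idea (column names are illustrative): events would store the bucket's integer key instead of repeating the bucket name string on every row.

CREATE_BUCKETS_TABLE = """
    CREATE TABLE IF NOT EXISTS buckets (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- alias of the rowid
        key TEXT UNIQUE NOT NULL               -- e.g. 'aw-watcher-afk_myhostname'
    )"""

CREATE_EVENTS_TABLE = """
    CREATE TABLE IF NOT EXISTS events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        bucket INTEGER NOT NULL REFERENCES buckets(id),
        starttime INTEGER NOT NULL,
        endtime INTEGER NOT NULL,
        datastr TEXT NOT NULL
    )"""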

johan-bjareholt (Member) commented Jul 2, 2018

@ErikBjare What's left is:

  • sqlite type converter/adapter for timestamps (I started implementing this, but it's not working and not pushed yet)
  • change the bucket_name column such that it uses the bucket rowid instead of the whole string (awesome idea)

ErikBjare (Member) commented Jul 2, 2018

Also, I configured codecov.io to not complain so much about coverage (@johan-bjareholt should be able to view the config here: https://codecov.io/account/gh/ActivityWatch/yaml). Hopefully the next commit will be green 👍

ErikBjare (Member) commented Sep 4, 2018

@johan-bjareholt Could I add a small request to this PR? I'd like buckets to have a data property as well, for arbitrary bucket-level metadata (JSON, just as for events); it would be needed for my syncing MVP. I figure it's easier to add in this PR than to write a migration for the old Peewee store.
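
A minimal sketch of what that could look like in the sqlite store; the column name here is illustrative:

# Hypothetical: bucket-level metadata stored as JSON text, like event data.
conn.execute("ALTER TABLE buckets ADD COLUMN datastr TEXT NOT NULL DEFAULT '{}'")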

ErikBjare (Member) commented Sep 22, 2018

@johan-bjareholt I'd really like to merge this soon. It doesn't have to be set as the default yet (or be perfect), but getting rid of large PRs in aw-core is a blocker for ActivityWatch/activitywatch#227.

johan-bjareholt (Member) commented Sep 22, 2018

@ErikBjare Well, everything works well (including the importing), but it is not set as the default yet.
The only thing missing is the type converter/adapter for timestamps, but that is not really necessary.

CREATE_EVENTS_TABLE = """
CREATE TABLE IF NOT EXISTS events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
bucket TEXT NOT NULL,

ErikBjare (Member) Sep 22, 2018

This is still TEXT though

ErikBjare (Member) Sep 22, 2018

Doesn't seem like the change was made, despite you checking the box above (I unchecked it).

johan-bjareholt (Member) Sep 23, 2018

Ah, good point. Fixed

johan-bjareholt (Member) commented Nov 2, 2018

I'm starting to regularly get "Database locked" errors in the peewee storage on my desktop (my home folder is on an HDD and I have 237k window events and 84k web events, so it's quite slow).

This isn't an issue for aw-watcher-afk and aw-watcher-window, but it is for aw-watcher-web: there the event queue exists only on the master branch and not in the published version, so the events are dropped.

I will add the "data" field to buckets as you suggested in #57 (comment), but can someone else also test this first so we can then merge? We seriously need to get this merged!

johan-bjareholt (Member) commented Nov 2, 2018

I hadn't tested the code in a while, but did now: it's completely broken, since the transaction and connection need to be owned by a single thread, but apparently some dependency (Flask?) has been updated so that it now uses multiple threads.

sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. 
The object was created in thread id 140137293559296 and this is thread id 140137240098560.
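
Two standard ways around this with the sqlite3 module, sketched for reference; neither is necessarily what this PR ends up doing:

import sqlite3
import threading

# 1) Share one connection across threads, but serialize access yourself:
conn = sqlite3.connect("test.db", check_same_thread=False)
conn_lock = threading.Lock()

def execute(query, params=()):
    with conn_lock:
        return conn.execute(query, params)

# 2) Or keep one connection per thread:
local = threading.local()

def get_conn():
    if not hasattr(local, "conn"):
        local.conn = sqlite3.connect("test.db")
    return local.conn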

I also had a corrupt peewee database where the timestamp sometimes was an int instead of a string (which is odd, since peewee is supposed to guarantee the types of fields).
I had to patch the peewee get_event function like this to discard the corrupt events:

+events = []
+for e in list(map(EventModel.json, q.execute())):
+    if type(e["timestamp"]) == int:
+        print(e)
+        break
+    events.append(Event(**e))
+return events
-return [Event(**e) for e in list(map(EventModel.json, q.execute()))]

Here's a log of the migration printing the corrupt entries:

2018-11-02 17:28:43 [INFO ]: Migrating database from peewee v2 to sqlite v1  (aw_datastore.migration:31)
2018-11-02 17:28:43 [INFO ]: Using database file: /home/johan/.local/share/activitywatch/aw-server/peewee-sqlite-testing.v2.db  (aw_datastore.storages.peewee:89)
2018-11-02 17:28:43 [INFO ]: Migrating bucket aw-watcher-afk_johan-desktop  (aw_datastore.migration:38)
{'id': 254479, 'timestamp': 1530639430153000, 'duration': 494.927, 'data': {'status': 'not-afk'}}
2018-11-02 17:28:45 [INFO ]: Migrating bucket aw-watcher-window_johan-desktop  (aw_datastore.migration:38)
{'id': 255096, 'timestamp': 1530639892986000, 'duration': 30.744, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan/Programming/activitywatch/aw-core'}}
2018-11-02 17:29:18 [INFO ]: Migrating bucket aw-watcher-web-firefox  (aw_datastore.migration:38)
2018-11-02 17:29:30 [INFO ]: Migrating bucket aw-watcher-web-chrome  (aw_datastore.migration:38)
2018-11-02 17:29:30 [INFO ]: Migrating bucket aw-watcher-vim_johan-desktop  (aw_datastore.migration:38)
2018-11-02 17:29:31 [INFO ]: Migrating bucket aw-watcher-vscode_johan-desktop  (aw_datastore.migration:38)
2018-11-02 17:29:31 [INFO ]: Migration of peewee v2 to sqlite v1 finished  (aw_datastore.migration:50)

So apparently there were only two corrupt entries (one in aw-watcher-afk and one in aw-watcher-window).

ErikBjare (Member) commented Nov 3, 2018

So apparently there were only two corrupt entries

No, there weren't: you should do a continue in the discard-invalid-timestamp loop, not a break.
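
That is, the filter from the earlier comment with the fix applied:

events = []
for e in map(EventModel.json, q.execute()):
    if type(e["timestamp"]) == int:
        print(e)
        continue  # skip the corrupt event instead of aborting the scan
    events.append(Event(**e))
return events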

johan-bjareholt (Member) commented Nov 3, 2018

Ah, good point. Here are all of them:

2018-11-03 19:25:48 [INFO ]: Migrating bucket aw-watcher-afk_johan-desktop  (aw_datastore.migration:38)
{'id': 254479, 'timestamp': 1530639430153000, 'duration': 494.927, 'data': {'status': 'not-afk'}}
{'id': 254481, 'timestamp': 1530639430153000, 'duration': 489.862, 'data': {'status': 'not-afk'}}
{'id': 254482, 'timestamp': 1530639430153000, 'duration': 479.773, 'data': {'status': 'not-afk'}}
{'id': 254484, 'timestamp': 1530639430153000, 'duration': 484.802, 'data': {'status': 'not-afk'}}
{'id': 254491, 'timestamp': 1530639430153000, 'duration': 474.745, 'data': {'status': 'not-afk'}}
{'id': 254495, 'timestamp': 1530639430153000, 'duration': 469.702, 'data': {'status': 'not-afk'}}
{'id': 254496, 'timestamp': 1530639430153000, 'duration': 461.605, 'data': {'status': 'not-afk'}}
{'id': 254502, 'timestamp': 1530639430153000, 'duration': 456.559, 'data': {'status': 'not-afk'}}
{'id': 254516, 'timestamp': 1530639430153000, 'duration': 448.687, 'data': {'status': 'not-afk'}}
{'id': 254519, 'timestamp': 1530639430153000, 'duration': 383.183, 'data': {'status': 'not-afk'}}
{'id': 255016, 'timestamp': 1530639430153000, 'duration': 377.333, 'data': {'status': 'not-afk'}}
{'id': 255017, 'timestamp': 1530639430153000, 'duration': 370.878, 'data': {'status': 'not-afk'}}
{'id': 255045, 'timestamp': 1530639430153000, 'duration': 364.423, 'data': {'status': 'not-afk'}}
{'id': 255046, 'timestamp': 1530639430153000, 'duration': 358.17, 'data': {'status': 'not-afk'}}
{'id': 255055, 'timestamp': 1530639430153000, 'duration': 351.715, 'data': {'status': 'not-afk'}}
2018-11-03 19:25:50 [INFO ]: Migrating bucket aw-watcher-window_johan-desktop  (aw_datastore.migration:38)
{'id': 255096, 'timestamp': 1530639892986000, 'duration': 30.744, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan/Programming/activitywatch/aw-core'}}
{'id': 255097, 'timestamp': 1530639892986000, 'duration': 25.673, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan/Programming/activitywatch/aw-core'}}
{'id': 255098, 'timestamp': 1530639892986000, 'duration': 20.633, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan/Programming/activitywatch/aw-core'}}
{'id': 255099, 'timestamp': 1530639892986000, 'duration': 15.556, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan/Programming/activitywatch/aw-core'}}
{'id': 255100, 'timestamp': 1530639892986000, 'duration': 10.46, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan/Programming/activitywatch/aw-core'}}
{'id': 255114, 'timestamp': 1530639892986000, 'duration': 5.388, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan/Programming/activitywatch/aw-core'}}
{'id': 255101, 'timestamp': 1530639883900000, 'duration': 5.109, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan/Programming/activitywatch/aw-core'}}
{'id': 255113, 'timestamp': 1530639883900000, 'duration': 5.109, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan/Programming/activitywatch/aw-core'}}
{'id': 255112, 'timestamp': 1530639880210000, 'duration': 0.0, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan/Programming/activitywatch/aw-core'}}
{'id': 255102, 'timestamp': 1530639871969000, 'duration': 7.239, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan'}}
{'id': 255111, 'timestamp': 1530639871969000, 'duration': 5.011, 'data': {'app': 'URxvt', 'title': 'johan@johan-desktop: /home/johan'}}
{'id': 255110, 'timestamp': 1530639793166000, 'duration': 0.0, 'data': {'app': 'VirtualBox', 'title': 'Create Virtual Hard Disk'}}
{'id': 255109, 'timestamp': 1530639785299000, 'duration': 0.0, 'data': {'app': 'VirtualBox', 'title': 'Create Virtual Hard Disk'}}
{'id': 255108, 'timestamp': 1530639783286000, 'duration': 0.0, 'data': {'app': 'VirtualBox', 'title': 'Create Virtual Hard Disk'}}
{'id': 255107, 'timestamp': 1530639780658000, 'duration': 0.0, 'data': {'app': 'VirtualBox', 'title': 'Create Virtual Hard Disk'}}
{'id': 255106, 'timestamp': 1530639778641000, 'duration': 0.0, 'data': {'app': 'VirtualBox', 'title': 'Create Virtual Hard Disk'}}
{'id': 255105, 'timestamp': 1530639776019000, 'duration': 0.0, 'data': {'app': 'VirtualBox', 'title': 'Create Virtual Hard Disk'}}
{'id': 255104, 'timestamp': 1530639774002000, 'duration': 0.0, 'data': {'app': 'VirtualBox', 'title': 'Create Virtual Hard Disk'}}
{'id': 255103, 'timestamp': 1530639771783000, 'duration': 0.0, 'data': {'app': 'VirtualBox', 'title': 'Create Virtual Hard Disk'}}

johan-bjareholt (Member) commented Jan 23, 2019

So, what's left to do in this review?

I'm pretty sure those bad events were from when I was messing around with aw-server (all of them were from a 1-hour time period a long time ago), so that really shouldn't happen in a real scenario. I have removed those events manually from my bucket, and the migration works fine now.

We could merge this, leave it disabled by default, and then have more people test the migration code.
