Syncing #35

Open
ErikBjare opened this issue Mar 18, 2017 · 88 comments
Assignees
Labels
bounty, improves: ux, size: large, type: enhancement
Projects
Milestone

Comments

@ErikBjare
Member

ErikBjare commented Mar 18, 2017

Vote on this issue on the forum!


There are two usage issues with ActivityWatch at the moment to which syncing is a solution:

  • If you use more than one device, you need to check every device individually, or run one centralized instance of aw-server (not recommended!)
  • If a machine is lost, so is the data (the user could have exported it, but data stored after the export would still be lost). While ActivityWatch cannot replace a proper backup system, syncing could help by storing copies of the data across devices.

I know of two interesting solutions to this problem:

  • Centralized server which stores all data encrypted (the server is unable to decrypt)
  • P2P synchronization (encrypted, possibly including relays)
    • Done by @syncthing very well, perhaps we could use it in some way. Also: MPL2 licensed and written in Go.
      • Downside: Clients must be online at the same time for sync.
      • They have the ability to set some folders to "read only", useful when you want to ensure the data stays intact in its source.
    • Implementing it ourselves would be an enormous effort, I assume.

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

@ErikBjare
Member Author

ErikBjare commented Mar 28, 2017

@calmh might know a thing or two about using Syncthing in an application-specific context like this. I haven't seen it done before so we might want to check with him before we start.

I've taken a look at the arguments to Syncthing and found -home which can be used to set a custom configuration directory. So pretty promising.

@ErikBjare
Member Author

ErikBjare commented Mar 29, 2017

I've started prototyping something small here: https://github.com/ActivityWatch/aw-syncthing/

Could be made to work both with standalone Syncthing and bundled Syncthing, but standalone would probably be preferred due to the dependency on the Python package syncthing which targets a specific version (right now targets 0.14.24 and the latest is 0.14.25).

What it does:

  • Moves the database file to a specific location
  • Creates a symlink from the new to the old location (so aw-server will just follow the symlink to the file)
  • Starts Syncthing with a custom configuration directory
  • Configures Syncthing via the REST API to add the new database folder as a synced folder
  • Add another device to sync the folder with
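
The move-and-symlink steps above could look roughly like this in Python. This is a minimal sketch, not the code in aw-syncthing: the function name `relocate_with_symlink` and the paths are hypothetical, and creating symlinks on Windows may require extra privileges.

```python
import shutil
from pathlib import Path


def relocate_with_symlink(db_path: Path, sync_dir: Path) -> Path:
    """Move db_path into sync_dir and leave a symlink at the old location."""
    sync_dir.mkdir(parents=True, exist_ok=True)
    new_path = (sync_dir / db_path.name).resolve()
    # Move the real file into the folder that the sync tool watches.
    shutil.move(str(db_path), str(new_path))
    # Point the old location at the new file so existing readers still find it.
    db_path.symlink_to(new_path)
    return new_path
```

A process that opens the old path afterwards follows the symlink and reads or writes the file inside the synced folder.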

@calmh

calmh commented Mar 29, 2017 via email

@ErikBjare
Member Author

@calmh: Awesome! I'll let you know when we have a working release.

@ErikBjare
Member Author

ErikBjare commented Jun 15, 2017

I've started using Standard Notes recently (finally getting off Evernote) and have been impressed by the architecture. They have designed a neat data format/server called Standard File that defines how data should be encrypted and stored both client-side and server-side. Definitely something to check out.

Edit: It's interesting, but I'd rather have it distributed than just decentralized.

@ErikBjare
Member Author

ErikBjare commented Aug 24, 2017

I've been thinking about this a bit more.

My current idea is to simply configure a folder as a synced databases-folder. Basically aw-server would copy local data to this folder on a regular basis.

This folder could then be synced with Syncthing, Dropbox, or Gdrive (we should probably explicitly recommend Syncthing). The synced database files would not be allowed to be modified from any host other than the one that owns them, since such changes could cause syncing conflicts.

Potential problems:

  • It would be nice to have the synced databases encrypted
  • Compressing them would also lead to huge storage savings
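
As a rough illustration of the "copy local data to a synced folder" idea with compression, here is a minimal sketch. The function name `export_compressed`, the per-host directory layout, and the `.gz` naming are assumptions made for this example, not an existing aw-server feature. Encryption is not covered.

```python
import gzip
import shutil
import socket
from pathlib import Path


def export_compressed(db_path: Path, sync_dir: Path, hostname: str = "") -> Path:
    """Write a gzip-compressed copy of db_path to <sync_dir>/<hostname>/."""
    host = hostname or socket.gethostname()
    target_dir = sync_dir / host
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / (db_path.name + ".gz")
    tmp = target.with_name(target.name + ".tmp")
    # Write to a temporary name first, then rename, so that the final
    # file name never refers to a half-written export.
    with open(db_path, "rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    tmp.replace(target)
    return target
```

Because each host only ever writes inside its own subdirectory, two hosts never modify the same file.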

@ErikBjare
Member Author

ErikBjare commented Sep 13, 2017

Reddittracker turned up this today. Makes it pretty clear that sync is a vital feature for most users.

The best part is that you can put it on all computers (home and work) and on a smartphone. It'll track the software and sites you use on all of them and aggregate it to one account.

@hippylover

It would be nice if this were implemented in a way that doesn't add to the system requirements for running the program, so that people who don't need the functionality, or who would rather just set up a cron job to copy the data to a remote server manually, can still disable the feature.

@ErikBjare
Member Author

@hippylover Noted! Thanks for the feedback.

@brizzbane

I googled activitywatch + backup. Trying to locate where the data is stored. Would be really nice to be able to 'set' where the data is stored.

The backup solution I use is to put important stuff I'm working on under Dropbox or MEGA. I'm on Linux... and I actually add a home directory, so that I... guess it makes me more 'aware' that it's Dropbox data.

I just read through the above comments. Supposedly MEGA is end-to-end encrypted. I started using it because of the larger free storage, but it has the bonus of not having to mess with an encryption solution if you want to store the data encrypted.

@1000i100

1000i100 commented Apr 6, 2018

Your sync looks like auto-backup to me (or I've misunderstood).

How do you merge activity from multiple devices?

If I were in charge, I would probably use git as the sync/merge tool, provided the data is stored in plain text files.
But I haven't explored your code base enough to judge whether that's a good approach for this project.

@johan-bjareholt
Member

@1000i100 The difference between sync and auto-backup would be that auto-backup defines a producer and a consumer while sync does not, and by that definition what we describe might actually be auto-backup, yes.

Merging activity from multiple devices is not an issue as long as the device you are requesting data from has the data for all the devices you want to view. Data is separated by activity type and by host into what we call buckets.
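
To make the bucket idea concrete, here is a small sketch of grouping buckets by host. The bucket IDs and the `type` and `hostname` field names are illustrative assumptions for this example, not a statement of the exact aw-server schema.

```python
from collections import defaultdict

# Hypothetical bucket metadata: one bucket per activity type per host.
buckets = [
    {"id": "aw-watcher-window_laptop", "type": "currentwindow", "hostname": "laptop"},
    {"id": "aw-watcher-afk_laptop", "type": "afkstatus", "hostname": "laptop"},
    {"id": "aw-watcher-window_desktop", "type": "currentwindow", "hostname": "desktop"},
]


def group_by_host(bucket_list):
    """Return a mapping of hostname to the IDs of that host's buckets."""
    grouped = defaultdict(list)
    for bucket in bucket_list:
        grouped[bucket["hostname"]].append(bucket["id"])
    return dict(grouped)


print(group_by_host(buckets))
```

Since every bucket belongs to exactly one host, data from several devices can sit side by side on one server without being merged.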

Plaintext is simply not scalable and therefore git is out of the question. If we have 500MB of data and convert it back and forth between a database and plaintext file it would be incredibly slow.

@ErikBjare
Member Author

Started working on something small as an experiment: ActivityWatch/aw-server#50

@madumlao

raises hand

Just wondering - isn't the storage a database? syncthing doesn't handle database syncing.

@johan-bjareholt
Member

@madumlao I don't get that either. Syncthing syncs file by file, and it is nearly impossible to do a diff of a binary sqlite file. The database can easily grow past 100MB and it's not viable to sync such a large file frequently.

@ErikBjare
Member Author

@madumlao Correct, but the database is stored in a file, which can be synced.

@johan-bjareholt Syncthing is smart enough to not sync the entire file if only parts of it have changed, see: https://forum.syncthing.net/t/noob-question-incremental-sync/1030/17
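
The general idea behind block-level incremental sync can be sketched as follows. This is a simplified illustration, not Syncthing's implementation: the fixed 128 KiB block size, the use of SHA-256, and the function names are assumptions made for this example.

```python
import hashlib
from typing import List

BLOCK_SIZE = 128 * 1024  # assumed block size for this sketch


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each fixed-size block of data separately."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the indices of blocks in new that differ from old or are new."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]
```

Only the blocks whose indices are returned would need to be transferred, which is why a small change in a large file does not require re-sending the whole file.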

@johan-bjareholt
Member

@ErikBjare Oh nice. Googled a bit on the sqlite database files and they seem to be paged so that should be fine then. I just assumed that it was as bad as git when comparing binaries but apparently they have solved that issue.

@johan-bjareholt
Member

Would syncing with syncthing also mean that we will have multiple database files? In that case we might need a lot of refactoring.

@madumlao

madumlao commented Nov 28, 2018

@ErikBjare I'm not convinced that an SQLite db will survive Syncthing. In the best case you'll lose transactions done on one side; in the worst case you'll have a mispaired hot journal which will corrupt the whole db. Effectively, if an aw-server process is running on two machines there's going to be contention.

https://www.sqlite.org/howtocorrupt.html

The only way that syncthing, rsync, or similar process is going to be "safe" is if each transaction is a separate file, but I guarantee that that's going to be bad. You really need to implement some kind of peer to peer syncing db, such as for example, a multi-master LDAP.
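
One way to avoid handing a sync tool a database file that is in the middle of a write is to sync a snapshot instead of the live file. A minimal sketch using the online backup API in Python's `sqlite3` module (available since Python 3.7) is shown below. The function name and paths are hypothetical, and this does not address two machines writing to the same database.

```python
import sqlite3
from pathlib import Path


def snapshot(db_path: Path, snapshot_path: Path) -> None:
    """Copy a consistent snapshot of db_path to snapshot_path."""
    src = sqlite3.connect(str(db_path))
    dst = sqlite3.connect(str(snapshot_path))
    try:
        # Connection.backup copies the database page by page through SQLite
        # itself, so the result is a consistent database file.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

The snapshot file, rather than the live database and its journal, would then be the thing placed in the synced folder.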

@ErikBjare
Member Author

ErikBjare commented Nov 28, 2018

@johan-bjareholt Yes, each instance would write to its own file in the synced folder(s) (there are some benefits to having one Syncthing-folder per instance, as Syncthing can enforce "master copies" preventing accidental deletion/corruption on other machines). An instance would therefore have read-only access to database files from remote machines. I don't think this requires any major refactoring.

@madumlao I am aware, I'm not proposing we sync a single sqlite database file.

I thought I had mentioned it in the issue before, but I realize now that I hadn't. Hopefully this should clear things up: I'm not proposing two-way sync in the sense that you can edit remote buckets, only read them (and create copies, which you could in turn modify).
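
The read-only access to remote database files described above can be sketched with SQLite's URI filenames. This is an illustration under assumptions, not the sync implementation: the function name `open_read_only` is hypothetical.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open a database file so that any write attempt fails."""
    # mode=ro asks SQLite to open the file read-only.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With such a connection, a query against a remote host's file works as usual, while an `INSERT` or `UPDATE` raises `sqlite3.OperationalError` instead of changing the synced copy.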

@madumlao

madumlao commented Nov 29, 2018

I see. A full-on p2p system would be very much appreciated. I have a case where I have multiple laptops / devices that all move around. Unless I set up a single server and configured all clients (including Firefox extensions etc.) to talk to that server, my activity watchers will all have gaps in activity tracking, defeating the purpose of review.

Ideally a user who has multiple devices can transfer in between devices with little setup, and the tracking will follow them throughout.

Maybe the laziest / easiest way to do this without major rearchitecting is to use periodic "sync checkpoints", which would basically:

  1. generate periodic sqlite dumps into some shared Syncthing folder
  2. upon startup (or periodically), check the shared syncthing folder for all sqlite dumps made by other nodes and import any transaction later than the "last remote transaction synced"
  3. write down the "last remote transaction synced" somewhere for tracking

Could be implemented as a separate watcher-like process.

(My assumption is that tracking events are largely just additive transactions, there is little editing done)
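
The checkpoint steps above could be sketched like this. Everything concrete here is an assumption for illustration: the `events` table layout, one `<node>.db` dump per node in the shared folder, ISO 8601 UTC timestamps stored as text (so they compare correctly as strings), and a JSON file for the "last remote transaction synced" marker.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical schema shared by the local database and the dumps.
SCHEMA = "CREATE TABLE IF NOT EXISTS events (node TEXT, timestamp TEXT, data TEXT)"


def import_new_events(local_db: Path, sync_dir: Path, state_file: Path, own_node: str) -> int:
    """Import events newer than the last synced timestamp of each remote node."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    imported = 0
    local = sqlite3.connect(str(local_db))
    try:
        local.execute(SCHEMA)
        for dump in sorted(sync_dir.glob("*.db")):
            node = dump.stem
            if node == own_node:
                continue  # never import our own dump
            last = state.get(node, "")
            remote = sqlite3.connect(str(dump))
            try:
                rows = remote.execute(
                    "SELECT timestamp, data FROM events "
                    "WHERE timestamp > ? ORDER BY timestamp",
                    (last,),
                ).fetchall()
            finally:
                remote.close()
            if rows:
                local.executemany(
                    "INSERT INTO events (node, timestamp, data) VALUES (?, ?, ?)",
                    [(node, ts, data) for ts, data in rows],
                )
                state[node] = rows[-1][0]  # newest timestamp imported
                imported += len(rows)
        local.commit()
    finally:
        local.close()
    state_file.write_text(json.dumps(state))
    return imported
```

Running the function twice in a row imports nothing the second time, which is what makes the approach safe to repeat periodically. One limitation of using a timestamp as the marker is that a later event carrying exactly the same timestamp as the last imported one would be skipped, so a monotonically increasing ID may be a better marker in practice.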

By the way, I have no idea where the sqlite database is saved. Any pointers?

@ErikBjare
Member Author

@madumlao That's almost the exact design I had in mind for the MVP, nice to see we arrived at the solution independently!

We use appdirs to manage files like the database, caches, and logs. So check /home/<USER>/.local/share/activitywatch/aw-server if you're on Linux, or the appdirs documentation for user_data_dir otherwise.
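
For reference, the Linux path above follows the XDG base directory convention. The standard-library sketch below mirrors that rule for Linux only. The function name `linux_user_data_dir` is hypothetical, and on other platforms appdirs itself should be consulted.

```python
import os
from pathlib import Path


def linux_user_data_dir(appname: str) -> Path:
    """Return $XDG_DATA_HOME/<appname>, defaulting to ~/.local/share/<appname>."""
    base = os.environ.get("XDG_DATA_HOME") or os.path.join(
        os.path.expanduser("~"), ".local", "share"
    )
    return Path(base) / appname


# Directory where aw-server keeps its files on Linux, per the comment above.
print(linux_user_data_dir("activitywatch") / "aw-server")
```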

@x-ji

x-ji commented Dec 19, 2018

Just to be sure, there is currently no cross-device syncing available, right? If so, once syncing is available I'd gladly switch from RescueTime. I constantly switch between different computers.

@johan-bjareholt
Member

@x-ji No it's sadly not available yet.

@supertinou

@ErikBjare any way we can help you with this feature? ❤️

@BeatLink

Decsync is a P2P syncing library built directly around using syncthing.

@ErikBjare
Member Author

@BeatLink DecSync looks interesting, thanks for the tip! Lots of similarities to the sync MVP.

@contrun

contrun commented Sep 23, 2021

It may take a while for us to get to decentralized synchronization. Meanwhile, can we set up a centralized server which receives data from various devices? Yeah, we need authorization. What about using something like basic auth for that?

@ActivityWatch ActivityWatch locked and limited conversation to collaborators Oct 4, 2021
@ErikBjare
Member Author

ErikBjare commented Oct 4, 2021

To summarize the status: there's a fair bit of work done, but it needs testing, review, and refactoring.

The reason it's taking this long is limited time (mostly due to my thesis work) and other issues taking priority. Once I'm done with my thesis, I'll have more time to work on ActivityWatch (and hopefully get to this issue).

I've also recently started hiring freelancers to help with development, which will hopefully lead to some progress on this issue.

To people suggesting centralized sync solutions as a stop-gap in the meantime (in addition to #35 (comment)): they are not faster to build ("sync with remote server" is at least as difficult as "sync with folder"), but if you really want to you can set up ActivityWatch to send events to a remote server (no real sync taking place, events are sent directly to the remote without a local server), as described in the docs: https://docs.activitywatch.net/en/latest/remote-server.html

The last merged PR (which needs continued work): ActivityWatch/aw-server-rust#89

You can help by:

  • looking at that PR and related code (after reading this issue to understand the process)
  • testing it out
  • making PRs with improvements.

You can get paid for working on all of these, if you can show the related ActivityWatch events! (cc @supertinou)

After the sync itself is done, there's then a bunch of issues around buckets with non-unique IDs (like the web watcher), and hostnames not being set for some buckets (like the web watcher). And then, finally, analysis results need to be merged so that data from several devices can be combined and shown in a single view.

Limiting comments to collaborators due to lots of comments.

@ErikBjare added the bounty and type: enhancement labels Oct 4, 2021
@ErikBjare ErikBjare self-assigned this Mar 24, 2022
@ErikBjare ErikBjare removed the backlog label Mar 24, 2022
@ActivityWatch ActivityWatch unlocked this conversation Mar 24, 2023
@ErikBjare
Member Author

I'm looking for more people to try out the WIP sync implementation. Try to get it working, find some bugs, fix some bugs, etc.

If you feel up to the task of writing a little bit of Rust and helping out with an important feature, please give it a shot!

@OGoodness

Is there a timeline for the feature? Understandably, not looking for any hard commitments, just wondering how things are looking given the current state.

@rolltidehero

Is there a timeline for the feature? Understandably, not looking for any hard commitments, just wondering how things are looking given the current state.

First off — I really appreciate the work you guys put into this. But I would also like to know if the syncing feature will be coming soon? I would def be willing to beta test if needed. Unfortunately, I don't know Rust, so I'm unable to help with the coding.

@Andrew-Pynch

Is there a timeline for the feature? Understandably, not looking for any hard commitments, just wondering how things are looking given the current state.

First off — I really appreciate the work you guys put into this. But I would also like to know if the syncing feature will be coming soon? I would def be willing to beta test if needed. Unfortunately, I don't know Rust, so I'm unable to help with the coding.

+1 to this. Although I do know Rust and would love to help. Is this feature even being pursued still?

@dada051

dada051 commented Nov 9, 2023

Distributed sync would be a great feature. But wouldn't it be easier to host a dockerized version of ActivityWatch that uses MariaDB? It would allow easy backups (MariaDB is easy to back up, can be replicated if needed, etc.). A lot of software and services work this way.

@axmachina

How about syncing to one's social account like Google?

@ErikBjare
Member Author

I just updated the aw-sync README with proper usage instructions, for those bold enough to try it out.

Still probably rough around the edges, but I've had it working for a while now.

Please give it a try, report issues, and submit PRs!

@OGoodness No timelines. I work on it when I find the spare time. It's ready when it's ready :)
@rolltidehero Time for you to try it out now! See if you can follow the README.
@Andrew-Pynch I've been busy, things have been slowly moving forward for years, but it is finally nearing fruition! I just really need other people to test it and give their feedback, which has been harder than I thought.

Shoutout to @nathanmerrill for his awesome PR improving error handling and other stuff in the syncing code: ActivityWatch/aw-server-rust#437

Thank you all for your kind expressions of impatience and overall understanding. I hope I get to give you all what you are waiting for soon :)

@Hananel-Hazan

I would love to test this feature. Can you please compile it for Windows?

@Terrance

Terrance commented Mar 3, 2024

Is this coming to Android too? My main use case for syncing would be pulling AW data from my phone, both to back it up and to hopefully visualise it on desktop rather than fumbling around with the UI on mobile.

As far as I can tell, the beta builds don't include Android, as that's maintained separately but only released occasionally?

E: Looks like this has been requested already: ActivityWatch/aw-android#107

@asandikci

It would be good to be able to self-host and store all data encrypted on our own (centralized) server.

vasuemme111 pushed a commit to vasuemme111/activitywatch that referenced this issue Mar 26, 2024
@ethnh

ethnh commented May 28, 2024

veilid is p2p, secure, mobile-first and has python bindings here: https://gitlab.com/veilid/veilid/-/tree/main/veilid-python?ref_type=heads
demo here: https://gitlab.com/veilid/python-demo

@ethnh

ethnh commented May 28, 2024

file upload example (very old) : http://vdrop.link

@ErikBjare
Member Author

The latest version v0.13.1 that was released last week now ships with aw-sync bundled! 🎉

It is not enabled by default, so you need to manually start it in the tray icon menu, or modify the aw-qt.toml config file to autostart it when ActivityWatch starts.

I hope this gets more people to try it out and give their feedback. It's not guaranteed to be stable, but it works.

I'm still working out how to bring this to Android.

@tippfehlr

image

here is my timeline - it works wonderfully for me, syncing with syncthing.

I think it took a day to appear in the web ui (I have no data on this), but because the data can’t conflict it just works for now.

the ui could be polished (eg. visually grouping the devices in timeline mode and much more) but that is a topic on its own.

Thanks for making this work!!

@brimwats

brimwats commented Jul 7, 2024

It has been working well for me using syncthing and syncing both to the same folder; I haven't quite figured out getting both on the same timeline, but I can switch between both easily

image

@Terrance

Terrance commented Jul 8, 2024

Not sure if the same folks are watching all the AW repositories, but I've logged a sync-related issue ActivityWatch/aw-qt#105 where sync (along with the input watcher, another new module?) doesn't load at startup, unlike the other modules.

Projects
Road to 1.0
  
To do
Status: In Progress