Delete server data after 21 days #115

Closed
mitra42 opened this issue May 7, 2020 · 21 comments

@mitra42
Collaborator

mitra42 commented May 7, 2020

Need to make sure to delete old data

@danaronson
Collaborator

Should old data be archived somewhere? I'm a bit concerned that data could be deleted accidentally with no way to recover it.

@jmday
Collaborator

jmday commented May 7, 2020

Can we have rolling backups that allow us to discard backup data when it's 21 days old?

@danaronson
Collaborator

danaronson commented May 7, 2020 via email

@jmday
Collaborator

jmday commented May 7, 2020

I'll let @mitra42 weigh in as well, but I think it's important that we delete all data (including backed-up data) within at most 45 days.

Any data that is not being used (or returned) should really not be saved in the live system.

@danaronson
Collaborator

danaronson commented May 7, 2020 via email

@mitra42
Collaborator Author

mitra42 commented May 7, 2020

I'd suggest not retaining the data past some point, mostly because we need to be able to back up our privacy claims. I'd suggest two values in the config file: probably 21 days for live data (that we return in response to a query) and 45 days for full deletion (nothing kept, even in backups). Note the live data could actually be kept for a MUCH shorter time (as little as 2 days) since
a) active clients poll for it regularly, so we are really only trying to allow a client to catch up after it's been offline.
b) new clients can't get anything useful since they don't have a location or id history to compare against the old data.
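
The two config values proposed above could be sketched roughly as follows. This is only an illustration: `RetentionConfig`, `live_days`, `purge_days` and the two cutoff helpers are hypothetical names, not taken from the project's actual config file. The defaults are the 21 and 45 days suggested in this comment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RetentionConfig:
    # Hypothetical field names; defaults are the values proposed in the thread.
    live_days: int = 21   # data newer than this is returned to queries
    purge_days: int = 45  # data older than this is deleted everywhere

    def __post_init__(self):
        # The live window cannot outlast the point of full deletion.
        if not 0 < self.live_days <= self.purge_days:
            raise ValueError("need 0 < live_days <= purge_days")

    def live_cutoff(self, now):
        """Oldest timestamp that still counts as live data."""
        return now - timedelta(days=self.live_days)

    def purge_cutoff(self, now):
        """Anything older than this timestamp should be fully deleted."""
        return now - timedelta(days=self.purge_days)


cfg = RetentionConfig()
now = datetime(2020, 5, 30, tzinfo=timezone.utc)
print(cfg.live_cutoff(now).date(), cfg.purge_cutoff(now).date())
```

Keeping both values in one object makes it easy to reject a configuration where the live window is longer than the full-deletion window.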

@danaronson
Collaborator

danaronson commented May 7, 2020 via email

@jmday
Collaborator

jmday commented May 9, 2020

Suggest at least 14 days for live data. This will ensure the servers have any data necessary to inform Safe Score calculations, even if someone has not had signal for 14 days (such as some of the remote communities we are seeking to serve).

@mitra42
Collaborator Author

mitra42 commented May 9, 2020

OK - I'll take this

@mitra42 mitra42 self-assigned this May 9, 2020
@mitra42
Collaborator Author

mitra42 commented May 9, 2020

  • Add to config
  • Add a task, run periodically, that deletes any data older than now - (configured days)
  • Figure out how to run this under cron or under whatever mechanism runs sync
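
A minimal sketch of the deletion task in the list above, assuming the data lives as plain files in one directory and that file modification time is a good enough proxy for age. Both are assumptions, not details confirmed in this thread. `delete_older_than` is a hypothetical name, and the scheduling (cron, or the server's own loop) is left out.

```python
import os
import time

SECONDS_PER_DAY = 86400


def delete_older_than(directory, max_age_days, now=None):
    """Delete regular files in `directory` whose mtime is older than
    now - max_age_days, and return the sorted names that were removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY

    # Collect candidates first so the directory is not modified mid-scan.
    with os.scandir(directory) as entries:
        expired = [
            entry.path
            for entry in entries
            if entry.is_file(follow_symlinks=False)
            and entry.stat(follow_symlinks=False).st_mtime < cutoff
        ]

    for path in expired:
        os.remove(path)
    return sorted(os.path.basename(path) for path in expired)
```

Passing `now` explicitly keeps the function testable without touching the system clock.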

@danaronson
Collaborator

Twisted makes it easy to run scheduled tasks in the server. I think it makes most sense to do it there.

@mitra42
Collaborator Author

mitra42 commented May 9, 2020

Ok, done and fixed the timing issues - PR submitted

@danaronson
Collaborator

I made some significant code changes, which add a serial number to the item (so you don't have to do the clock hack). I also modified the deletion code to move the actual file deletes to a thread. The tests pass, but I think you might want to take a look at the update code and make sure I got it right. FYI, file names are now of the format KEY:FLOATING_TIME:SERIAL_NUMBER.data
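
As an illustration of moving the file deletes off the main thread, here is a sketch using only the standard library's `threading` module. It is not the code from the PR: the server runs on Twisted, where `twisted.internet.threads.deferToThread` would be the more natural tool, and `delete_files_in_background` is a hypothetical name.

```python
import os
import threading


def delete_files_in_background(paths):
    """Start a worker thread that removes each path, and return the thread.

    The caller decides which files have expired; only the slow
    filesystem work happens off the main thread.
    """
    paths = list(paths)  # snapshot, so later changes by the caller don't matter

    def _worker():
        for path in paths:
            try:
                os.remove(path)
            except FileNotFoundError:
                # Already gone (e.g. removed by an earlier run); nothing to do.
                pass

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    return thread
```

Callers that need to know when the deletes have finished can `join()` the returned thread.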

@mitra42
Collaborator Author

mitra42 commented May 9, 2020

Ok - but can we not change the file name format any more! It's not pulled out into separate functions, and there is code in multiple places going from values to filenames and back to indexes, so changes such as this are likely to break stuff in other places.
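
One way to keep the file name format in a single place is a pair of helpers like the sketch below, which builds and parses the KEY:FLOATING_TIME:SERIAL_NUMBER.data format mentioned earlier in the thread. `make_filename` and `parse_filename` are hypothetical names, and the exact way the project renders the floating-point time is an assumption (here, Python's `repr`, which round-trips floats exactly).

```python
SUFFIX = ".data"


def make_filename(key, timestamp, serial):
    """Build 'KEY:FLOATING_TIME:SERIAL_NUMBER.data' from its parts."""
    return f"{key}:{float(timestamp)!r}:{int(serial)}{SUFFIX}"


def parse_filename(filename):
    """Split a file name back into (key, timestamp, serial).

    Splits from the right, so a key that itself contains ':' still parses.
    Raises ValueError if the name does not match the format.
    """
    if not filename.endswith(SUFFIX):
        raise ValueError(f"not a data file: {filename!r}")
    stem = filename[: -len(SUFFIX)]
    key, timestamp, serial = stem.rsplit(":", 2)
    return key, float(timestamp), int(serial)


name = make_filename("abc", 1589000000.5, 7)
print(name)                  # abc:1589000000.5:7.data
print(parse_filename(name))  # ('abc', 1589000000.5, 7)
```

With every caller going through these two functions, a later format change only needs to be made (and tested) here.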

@mitra42
Collaborator Author

mitra42 commented May 10, 2020

Also - this version is failing tests. I can't figure out the code changes, so I think it will have to be you who finds the problem. (Note the tests were all passing pre-refactor.)

@danaronson
Collaborator

All good to go now. Yes, I agree that file name formats should be centralized in one place. I don't see a problem with changing formats as long as the code supports the old formats. Happy to discuss though.

@mitra42
Collaborator Author

mitra42 commented May 13, 2020

I think this is complete now - unless there is refactoring to happen ?

@mitra42 mitra42 closed this as completed May 13, 2020
@danaronson
Collaborator

we don't have the 14 day live window yet, let's keep this open until we do.

@danaronson danaronson reopened this May 13, 2020
@mitra42
Collaborator Author

mitra42 commented May 13, 2020

You mean that when a request for a set of locations comes in, then it should only return those after a certain time ?
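
If that is the intent, the query-side filter might look something like this sketch. It assumes each stored record carries a Unix timestamp as its first element, which is an assumption about the data layout. `live_records` is a hypothetical name, and the 14-day default is the window suggested earlier in the thread.

```python
SECONDS_PER_DAY = 86400


def live_records(records, now, live_days=14):
    """Return only the (timestamp, payload) records inside the live window."""
    cutoff = now - live_days * SECONDS_PER_DAY
    return [record for record in records if record[0] >= cutoff]


now = 1_600_000_000.0
records = [
    (now - 20 * SECONDS_PER_DAY, "stale"),  # outside the 14-day window
    (now - 3 * SECONDS_PER_DAY, "recent"),
]
print(live_records(records, now))
```

Older records would stay on disk until the full-deletion cutoff; they would just stop being returned to clients.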

@mitra42
Collaborator Author

mitra42 commented May 13, 2020

If so, then that is worth its own Issue - and I can tackle it.

@danaronson
Collaborator

ok

3 participants