Delete server data after 21 days #115
Should old data be archived somewhere? I'm a bit concerned that data could be deleted accidentally with no way to recover it.
Can we have rolling backups that allow us to discard backup data once it is 21 days old?
Since each datum is timestamped when it arrives at the server, we can:
1) have the server return only data with timestamps within the last 21 days (probably configurable), and
2) have the server delete data older than 42 days (also probably configurable).
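The two rules above amount to two cutoffs measured from the server arrival time. A minimal sketch, assuming each item carries a received_at timestamp in seconds since the epoch; the Datum class and split_by_age function are hypothetical names, not code from the project:

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds in a day


@dataclass
class Datum:
    key: str
    received_at: float  # when the datum arrived at the server, seconds since the epoch


def split_by_age(data, now, live_days=21, delete_days=42):
    """Partition data into (returnable, deletable) by server arrival time.

    Items received within the last live_days are returned to clients; items
    older than delete_days are due for deletion; anything in between is kept
    but not returned.
    """
    live_cutoff = now - live_days * DAY
    delete_cutoff = now - delete_days * DAY
    returnable = [d for d in data if d.received_at >= live_cutoff]
    deletable = [d for d in data if d.received_at < delete_cutoff]
    return returnable, deletable
```

With the default values, an item received 30 days ago lands in neither list: it is no longer served, but it is not yet deleted.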
(In reply to jmday, Thu, May 7, 2020 at 11:55 AM.)
I'll let @mitra42 weigh in as well, but I think it's important that we delete all data (including backed-up data) within at most 45 days. Any data that is not being used (or returned) really should not be kept in the live system.
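A hard cap that also covers backups could be enforced by pruning old snapshots on the same schedule. This is a minimal sketch, assuming backups are tracked as a mapping from snapshot name to creation time. The thread does not say how backups are stored, so the layout and the backups_to_prune name are hypothetical:

```python
from datetime import datetime, timedelta

# Hard cap from the thread: nothing, backups included, lives longer than 45 days.
MAX_AGE = timedelta(days=45)


def backups_to_prune(snapshots, now, max_age=MAX_AGE):
    """Return the names of backup snapshots older than max_age, sorted.

    snapshots maps a snapshot name to its creation datetime (a hypothetical
    layout; the thread does not say how backups are stored).
    """
    cutoff = now - max_age
    return sorted(name for name, created in snapshots.items() if created < cutoff)
```

Passing max_age=timedelta(days=21) gives the rolling 21-day backup window asked about earlier.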
Agreed! By not keeping the data in the live system before we delete it (even if we don't return it), what problem are we trying to solve? Depending on the answer, I would recommend different solutions. Note that if we save it "off" system, then we have to figure out where, how it gets there, etc., which adds complexity.
I'm not necessarily advocating keeping data for too long (I'm very aware of the privacy concerns), but I want us to vet our retention procedures as thoroughly as possible.
(In reply to jmday, Thu, May 7, 2020 at 1:14 PM.)
I'd suggest not retaining the data past some point, mostly because it is important to back up our assertion of privacy. I'd suggest these are two values in the config file: probably 21 days for live data (which we return in response to a query) and 45 days for full deletion (not backed up). Note that the live data could actually be kept for a MUCH shorter time (as little as 2 days), since:
a) active clients poll for it regularly, so we are really only trying to allow a client to catch up after it has been offline; and
b) new clients can't get anything useful from the old data, since they don't have a location or ID history to compare it against.
yup
(In reply to Mitra Ardron, Thu, May 7, 2020 at 2:23 PM.)
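Reading the two retention values from a config file could look like the following. This is a minimal sketch using the standard-library configparser. The [retention] section, the live_days and delete_days keys, and the load_retention function are hypothetical; the defaults are the 21 and 45 days suggested above:

```python
import configparser

EXAMPLE_CONFIG = """
[retention]
live_days = 21
delete_days = 45
"""


def load_retention(text):
    """Return (live_days, delete_days) from an INI-style config string.

    Missing sections or keys fall back to the values suggested in the thread.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    live_days = parser.getint("retention", "live_days", fallback=21)
    delete_days = parser.getint("retention", "delete_days", fallback=45)
    return live_days, delete_days
```

Keeping the defaults in one place means the server still behaves sensibly if an older config file lacks the new keys.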
I suggest at least 14 days for live data. This ensures the servers still have the data needed for Safe Score calculations even if someone has had no signal for 14 days (as in some of the remote communities we are seeking to serve).
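If the live window becomes configurable, the server could refuse settings that break this floor or that would delete data while it is still being served. A minimal sketch, assuming a hypothetical validate_retention check run at startup:

```python
# Suggested floor so that clients offline for up to 14 days can still catch up.
MIN_LIVE_DAYS = 14


def validate_retention(live_days, delete_days, min_live_days=MIN_LIVE_DAYS):
    """Return the settings unchanged, or raise ValueError if they conflict."""
    if live_days < min_live_days:
        raise ValueError(
            f"live window of {live_days} days is below the {min_live_days}-day minimum"
        )
    if delete_days < live_days:
        raise ValueError("data would be deleted while it is still returned to clients")
    return live_days, delete_days
```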
OK, I'll take this.
Twisted makes it easy to run scheduled tasks in the server, so I think it makes the most sense to do the deletion there.
OK, done, and I fixed the timing issues. PR submitted.
I made some significant code changes, which add a serial number to each item (so you don't have to do the clock hack). I also modified the deletion code to move the actual file deletes to a thread. The tests pass, but you might want to look at the update code and make sure I got it right. FYI, file names are now of the format KEY:FLOATING_TIME:SERIAL_NUMBER.data
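Keeping the conversion between values and file names in one pair of functions would stop the format from leaking into several places. A minimal sketch for the KEY:FLOATING_TIME:SERIAL_NUMBER.data format; format_name, parse_name and ItemName are hypothetical names:

```python
from typing import NamedTuple

SUFFIX = ".data"


class ItemName(NamedTuple):
    key: str
    timestamp: float
    serial: int


def format_name(key, timestamp, serial):
    """Build KEY:FLOATING_TIME:SERIAL_NUMBER.data; repr() round-trips floats exactly."""
    return f"{key}:{timestamp!r}:{serial}{SUFFIX}"


def parse_name(name):
    """Inverse of format_name; rsplit keeps any ':' inside the key intact."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a data file: {name!r}")
    key, timestamp, serial = name[: -len(SUFFIX)].rsplit(":", 2)
    return ItemName(key, float(timestamp), int(serial))
```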
OK, but can we not change the file name format any more! It's not pulled out into separate functions, and there is code in multiple places converting values to filenames and filenames back to indexes, so changes like this are likely to break things elsewhere.
Also, this version is failing tests. I can't follow the code changes, so I think you'll have to find the problem. (Note that the tests were all passing before the refactor.)
All good to go now. Yes, I agree that file name formats should be centralized in one place. I don't see a problem with changing formats as long as the code still supports the old formats, but I'm happy to discuss.
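Supporting old formats could be handled by one parser that accepts both layouts. The pre-serial layout is not spelled out in the thread, so this sketch assumes a hypothetical legacy KEY:FLOATING_TIME.data form. It also assumes keys contain no ':'. parse_any_name is a hypothetical name:

```python
SUFFIX = ".data"


def parse_any_name(name):
    """Return (key, timestamp, serial) for current or (assumed) legacy file names.

    Current: KEY:FLOATING_TIME:SERIAL_NUMBER.data
    Legacy (hypothetical): KEY:FLOATING_TIME.data, which has no serial, so serial is None.
    Assumes keys never contain ':'.
    """
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a data file: {name!r}")
    parts = name[: -len(SUFFIX)].split(":")
    if len(parts) == 3:
        key, timestamp, serial = parts
        return key, float(timestamp), int(serial)
    if len(parts) == 2:
        key, timestamp = parts
        return key, float(timestamp), None
    raise ValueError(f"unrecognized file name format: {name!r}")
```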
I think this is complete now, unless there is still refactoring to do?
We don't have the 14-day live window yet; let's keep this open until we do.
Do you mean that when a request for a set of locations comes in, the server should only return those newer than a certain time?
If so, that is worth its own issue, and I can tackle it.
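A request-time filter for the live window could combine the window with the point a client has already caught up to. A minimal sketch, assuming items are (timestamp, payload) pairs and a hypothetical since request parameter carrying the client's last-seen server timestamp. items_for_request is also a hypothetical name:

```python
DAY = 24 * 60 * 60  # seconds in a day


def items_for_request(items, now, since=None, live_days=14):
    """Return the (timestamp, payload) pairs newer than the effective cutoff.

    The cutoff is the start of the live window, moved later if the client
    says it has already seen everything up to `since`.
    """
    cutoff = now - live_days * DAY
    if since is not None:
        cutoff = max(cutoff, since)
    # Strictly newer, so the item at exactly `since` is not sent again.
    return [(ts, payload) for ts, payload in items if ts > cutoff]
```

A client that has been offline longer than the window simply gets everything still inside it, which matches the goal of letting clients catch up without serving older data.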
ok
Need to make sure to delete old data