Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPII-3646: Remove expired access tokens from the database #777

Merged
merged 27 commits into from Jun 10, 2019

Conversation

Projects
None yet
4 participants
@klown
Copy link
Contributor

commented Apr 16, 2019

JIRA: https://issues.gpii.net/browse/GPII-3646

@cindyli, @mrtyler,
Here is code for archiving and then flushing access tokens from the Couch database. Some discussion points follow.

First, the access tokens are filtered based on whether they have expired. That is, the ones that are still active are left in the database. However, there is an option to archive and flush all of them.

Secondly, the archive is pretty simple: it's a file that is appended to whenever this code is run. Its contents is an array of JSON objects, where each object is an access token. I suspect we want something more sophisticated, but I'm uncertain what. I could use some ideas from ops here.

Thanks.

klown added some commits Apr 10, 2019

GPII-3646: Improved append operation
Improved the way that new, to-be-archived access tokens are
appended to the existing archive.
@mrtyler

This comment has been minimized.

Copy link
Contributor

commented Apr 16, 2019

Secondly, the archive is pretty simple

Why do we want to keep old, expired tokens? How long do we want to keep them?

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Apr 16, 2019

@klown

This comment has been minimized.

Copy link
Contributor Author

commented Apr 16, 2019

Why do we want to keep old, expired tokens? How long do we want to keep them?

My understanding is for statistics -- who used the system, and when, how frequently, and ?? -- I don't know how exactly how much information one could get from an access token.

As for how long, I have no intuitions. This is likely a Gregg question.

@mrtyler

This comment has been minimized.

Copy link
Contributor

commented Apr 16, 2019

I guess we need more discussion to figure out the requirements but here are my first thoughts:

  1. GPII Cloud has exactly one official persistence layer: couchdb. If we want to keep it, that's where it goes. The ops team provides monitoring and backups for couchdb data.

  2. I agree that it is unclear what we hope to learn by analyzing expired access tokens.

  3. If you asked me to analyze some data, the first thing I would do is put it in a database so I can work with it. Moving the data out of the database and into a static JSON file seems like a step in the wrong direction.

  4. Given all this, my recommendation is to discard expired access tokens. If at a later time we have a real use case for post-hoc analysis, we can revisit this decision.

@klown

This comment has been minimized.

Copy link
Contributor Author

commented Apr 16, 2019

  1. GPII Cloud has exactly one official persistence layer: couchdb. If we want to keep it, that's where it goes. The ops team provides monitoring and backups for couchdb data.

I'm speculating, but it might be useful to separate the access tokens from the gpii database and move them to a access database. Then again, it's trivial to find them all -- there is a design view for that.

What triggered this was the proliferation of these tokens and worries about their taking up too much space. If we decide to simply delete them periodically, I can easily change the code here to do only that.

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Apr 16, 2019

@klown

This comment has been minimized.

Copy link
Contributor Author

commented Apr 16, 2019

More spit-balling: the current documentation on the access tokens suggests the kinds of statistics that could be gleaned from them.

Although they are commonly referred to as "access tokens", they are officially "GPII App Installation Authorizations". Relevant fields are:

  • The GPII Key that picks a specific preferences safe:
    • Sort of a user ID -- can track the usage of that key.
    • Can track how often a particular set of preferences is "used"**,
  • Whether this authorization has been revoked, why, and when -- track unauthorized access(?)
  • When created -- tracks how often this GPII key was used, and the interval between uses.

** where "used" means "retrieved to configure machine", or "accessed to modify the preferences set".

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Apr 17, 2019

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Apr 18, 2019

@klown

This comment has been minimized.

Copy link
Contributor Author

commented Apr 18, 2019

@mrtyler I've written a bash script that uses docker and the universal image to run the access token deletion script. It's useful to me for testing, at least. But it might be useful for the cron job you mentioned at the architecture meeting.

FYI: The default values of the environment variables are all based on what vagrantCloudBasedContainers.sh creates.

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Apr 22, 2019

@@ -0,0 +1,451 @@
/*!

This comment has been minimized.

Copy link
@cindyli

cindyli Apr 23, 2019

Contributor

Shall this script be removed if the decision is not to archive access tokens?

This comment has been minimized.

Copy link
@klown

klown Apr 23, 2019

Author Contributor

Strange ... I thought I used git mv to change the name of the script file. This file shouldn't exist anymore. Anyhow, I"ll remove it.

* @return {ResponseCallback} - Reponse callback function suitable for an http
* request.
*/
gpii.accessTokens.createResponseHandler = function (handleEnd, options, promise, errorMsg) {

This comment has been minimized.

Copy link
@cindyli

cindyli Apr 23, 2019

Contributor

It seems there are some functions shared between this script and scripts/deleteAndLoadSnapsets.js. This function is one of them. Can you please move shared utility functions into a separate file in the /shared sub-directory to reduce the duplicates. Thanks.

This comment has been minimized.

Copy link
@klown

klown Apr 23, 2019

Author Contributor

Sure.

klown added some commits Apr 23, 2019

GPII-3646: Refactored common database requests
Moved data base requests to the shared folder as a set of utility
functions and modified the deletion script to use them.
@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Apr 23, 2019

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Apr 23, 2019

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Apr 23, 2019

GPII-3646: Refactored common database requests, part II
Modified deleteAndLoadSnapsets.js script to use the new database
utilities.
@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented May 8, 2019

GPII-3646: Fixed exit status on a request failure
Modified both the node and bash scripts to return/exit with a
non-zero exit status.
@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented May 9, 2019

node ${DELETE_ACCESS_TOKENS_JS} ${GPII_COUCHDB_URL} ${DELETE_ALL}
err=$?
if [ "${err}" != '0' ]; then
log "${DELETE_ACCESS_TOKENS_JS} failed with ${err}, exiting"

This comment has been minimized.

Copy link
@cindyli

cindyli May 9, 2019

Contributor

It might be easier to understand by saying "failed with the exit code ${err}" to indicate what this number is.

This comment has been minimized.

Copy link
@klown

klown May 9, 2019

Author Contributor

Okay. I also got rid of the trailing ", exiting". It's redundant.

@@ -88,7 +88,7 @@ gpii.dbRequest.configureStep = function (details, options) {
);
} else {
request = gpii.dbRequest.queryDatabase(
details.requestUrl, response, details.requestErrMsg
details.requestUrl, response, details.requestErrMsg, togo

This comment has been minimized.

Copy link
@cindyli

cindyli May 9, 2019

Contributor

Shall the similar error handling be added for gpii.dbRequest.createPostRequest?

This comment has been minimized.

Copy link
@klown

klown May 9, 2019

Author Contributor

Yes! How did I miss that? Thanks for catching it.

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented May 9, 2019

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented May 9, 2019

GPII-3646: Added CouchDB health check to the dataloader
Also fixed up some log messages and formatting.
@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented May 9, 2019

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented May 21, 2019

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented May 27, 2019

@klown

This comment has been minimized.

Copy link
Contributor Author

commented Jun 2, 2019

@mrtyler wrote (taken from a GPII-3888 comment):

If you want to make dockerDeleteExpiredAccessTokens.sh usable in both development and cloud environments, you could add a flag RUN_IN_DOCKER (defaulting to true), add the docker run parts of the command only if RUN_IN_DOCKER, and set RUN_IN_DOCKER to false in the chart.

I replied:

Working on it...

I changed my mind because it's semi self-referential. What I mean is ...

The current scenario:

  1. a docker container is run,
  2. the bash script is executed inside that docker container (dockerDeleteExpiredAccessTokens.sh)

If that same script is run outside of a docker container, but with the option that the script itself creates the docker container to run inside of, then you find yourself in the position of wondering what exactly do I run at this point in the script when I've launched the docker container?

Put another way: in the first scenario you run the script in a container. In the second scenario you are running inside the script already when you want to run something inside a docker container the script is launching.

There is a way to do it, but I think it makes the script overly complicated and difficult to follow. If we really want something that launches a docker container to run the script, then I think a better way is to write another script to do that (start a container and have it execute dockerDeleteExpiredAccessTokens.sh). That feels cleaner and more obvious in terms of what is going on.

Even so, I'm not sure this second scenario is all that useful. I have been working on this branch and its associates in gpii-ops (PR 383 and PR 22), and have yet to use this second scenario.

@klown klown changed the title GPII-3646: Archive and remove access tokens from the database GPII-3646: Remove expired access tokens from the database Jun 3, 2019

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Jun 5, 2019

var expiredTokens = [];
options.totalTokens = 0;
if (tokens.rows) {
fluid.each(tokens.rows, function (aRow) {

This comment has been minimized.

Copy link
@cindyli

cindyli Jun 5, 2019

Contributor

It would save a round of iteration by incorporating the 2nd loop at line 151, which marks the "deletion" flag, into this loop since the 2nd one is dealing with filtered access tokens anyway.

This comment has been minimized.

Copy link
@klown

klown Jun 5, 2019

Author Contributor

I've looked and it appears that gpii.dbRequest.markDocsForDeletion is used only by this loop in deleteExpiredAccessTokens.js. What do you think about simply removing the markDocsForDeletion() function from the gpii.dbRequest namespace?

This comment has been minimized.

Copy link
@cindyli

cindyli Jun 5, 2019

Contributor

I think it's a good idea.


GPII_APP_DIR=${GPII_APP_DIR:-"/app"}
GPII_COUCHDB_URL=${GPII_COUCHDB_URL:-"http://couchdb:5984/gpii"}
DELETE_ALL=${DELETE_ALL:-""}

This comment has been minimized.

Copy link
@cindyli

cindyli Jun 5, 2019

Contributor

As we chatted, let's explore the possibility of setting the default "DELETE_ALL" flag to false.

This comment has been minimized.

Copy link
@klown

klown Jun 6, 2019

Author Contributor

Given @mrtyler's response on PR 383, switching to an explicit false for a default is fine and can be used in the .yaml file in gpii-infra.

But, the actual optional argument for deleteAccessTokens.js is --deleteAll. Like all optional arguments, it's either "--deleteAll" or missing (the empty string). Hence, making the default argument "false" here entails translating it to the empty string, and I've added that logic.

This comment has been minimized.

Copy link
@klown

klown Jun 6, 2019

Author Contributor

Actually, after thinking about it for a bit, it doesn't entail transforming to the empty string, since "false" will not be seen as "--deleteAll" within deleteAccessTokens.js, and it will run properly (it won't delete all of the access tokens). Still it's somewhat cleaner to make the switch.

* request.
*/
gpii.dbRequest.createResponseHandler = function (handleEnd, options, promise, errorMsg) {
if (!errorMsg) {

This comment has been minimized.

Copy link
@cindyli

cindyli Jun 5, 2019

Contributor

These 3 lines can be simplified to errorMsg = errorMsg || "";

This comment has been minimized.

Copy link
@klown

klown Jun 6, 2019

Author Contributor

Ok

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Jun 6, 2019

* @param {Array} docs - Array of records to delete from the database.
* @return {Array} - the records marked for deletion.
*/
gpii.dbRequest.markDocsForDeletion = function (docs) {

This comment has been minimized.

Copy link
@cindyli

cindyli Jun 7, 2019

Contributor

This utility function can be removed.

This comment has been minimized.

Copy link
@klown

klown Jun 7, 2019

Author Contributor

Yup.

I thought I had, but I missed it. It's gone now.

@gpii-bot

This comment has been minimized.

Copy link
Collaborator

commented Jun 7, 2019

@cindyli cindyli merged commit f49316d into GPII:master Jun 10, 2019

1 check passed

default Build finished.
Details
@cindyli

This comment has been minimized.

Copy link
Contributor

commented Jun 10, 2019

Merged at 9bd021f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.