GPII-3138: Update snapsets in data base #626

Merged
51 commits merged into GPII:master on Oct 17, 2018

8 participants
@klown
Contributor

klown commented Jul 10, 2018

@cindyli Here is the pull request that goes with the one in gpii-dataloader, gpii-ops/gpii-dataloader#6

cindyli and others added some commits Jun 27, 2018

GPII-3138: Update snapsets in the data base
Modified vagrantCloudBasedContainers.sh script to make use of
the changes in the GPII-3138 branch of gpii/gpii-dataloader
GPII-3138: Update snapsets in the data base
Moved deleteSnapset.js from gpii-dataloader to universal's
script folder.
GPII-3138: Update snapsets in the data base
Fixed some (grievous) typos.
GPII-3138: Update snapsets in the data base
Fixed erroneous call to fluid.error() -- replaced with fluid.log().
GPII-3138: Update snapsets in the data base
Removed hard-coded host ("localhost") and port for the CouchDB
URL and used the actual value passed in on the command line.
GPII-3138: Update snapsets in the data base
- Replaced all occurrences of forEach() with fluid.each().
- Properly set up shell environment variable NODE_PATH.
@gpii-bot

Collaborator

gpii-bot commented Jul 10, 2018

var dbLoader = gpii.dataLoader;
dbLoader.couchDbUrl = process.argv[2];
if (!fluid.isValue(dbLoader.couchDbUrl)) {
fluid.log ("COUCHDB_URL environment variable must be defined");

@cindyli

cindyli Jul 10, 2018

Contributor

Would be useful to also output the command usage: node deleteSnapsets.js $COUCHDBURL

@klown

klown Jul 20, 2018

Author Contributor

Done.
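
One way to act on the usage suggestion is sketched below. The helper name `checkCouchDbUrl` and the exact message wording are assumptions made for illustration; this is not the code that was merged, and the returned string stands in for whatever would be passed to `fluid.log()`.

```javascript
"use strict";

// Hypothetical helper: returns null when the CouchDB URL argument is present,
// otherwise a message that also shows the command usage.
function checkCouchDbUrl(argv) {
    var couchDbUrl = argv[2];
    if (couchDbUrl === undefined || couchDbUrl === null || couchDbUrl === "") {
        return "COUCHDB_URL must be defined. " +
            "Usage: node deleteSnapsets.js $COUCHDBURL";
    }
    return null;
}
```

A caller would pass `process.argv`, log the returned message if there is one, and exit with a non-zero status.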

}
fluid.log("COUCHDB_URL: '" + dbLoader.couchDbUrl + "'");
dbLoader.prefsSafesViewUrl = dbLoader.couchDbUrl + "/_design/views/_view/findSnapsetPrefsSafes";
dbLoader.gpiiKeyViewUrl = dbLoader.couchDbUrl + "/_design/views/_view/findGpiiKeysByPrefsSafeId";

@cindyli

cindyli Jul 10, 2018

Contributor

What do you think about defining the full URL here to provide a general view: dbLoader.gpiiKeyViewUrl = dbLoader.couchDbUrl + "/_design/views/_view/findGpiiKeysByPrefsSafeId%22%gpiiKey%22"? When this var is used later on, replace %gpiiKey with the actual key value using fluid.stringTemplate().

@klown

klown Jul 20, 2018

Author Contributor

Since we now use the get-all-gpiikeys view, this is no longer needed -- "overtaken by events".
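
Although the suggestion was overtaken by events, the templating idea can be sketched as follows. To keep the example runnable on its own, `fillTemplate` is a local stand-in that mimics the `%name` token replacement of Infusion's `fluid.stringTemplate()`; the `?key=` query parameter is an assumption borrowed from the string concatenation in the diff hunk further down this review, and `gpiiKeyViewUrl` is a hypothetical helper name.

```javascript
"use strict";

// Stand-in for fluid.stringTemplate(): replaces every "%name" token with the
// matching value. Written out here only so the sketch runs without Infusion.
function fillTemplate(template, values) {
    return Object.keys(values).reduce(function (text, name) {
        return text.split("%" + name).join(values[name]);
    }, template);
}

// Assumption: "?key=" mirrors the concatenated URL used elsewhere in the diff.
var gpiiKeyViewTemplate =
    "/_design/views/_view/findGpiiKeysByPrefsSafeId?key=%22%gpiiKey%22";

// Hypothetical helper: build the view URL for one prefs safe id.
function gpiiKeyViewUrl(couchDbUrl, prefsSafeId) {
    return couchDbUrl + fillTemplate(gpiiKeyViewTemplate, {
        gpiiKey: encodeURIComponent(prefsSafeId)
    });
}
```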

dbLoader.addGpiiKeysAndBulkDelete = function (snapSets, docsToRemove) {
fluid.each(snapSets, function (aSnapset) {
var gpiiKeyId = aSnapset.value._id;
fluid.log("Snapset: " + gpiiKeyId);

@cindyli

cindyli Jul 10, 2018

Contributor

Output a more meaningful message such as "Finding GPII keys associated with the snapset prefs safe id: ".

@klown

klown Jul 20, 2018

Author Contributor

I've updated all the log messages to be very simple, such as you have suggested. Actual snapset or gpiikeys internal information such as the "_id" field are no longer logged.

};

/**
* Delete the snapset Prefs Safes and their associated GPII Keys.

@cindyli

cindyli Jul 10, 2018

Contributor

This function only deletes prefs safes, not gpii keys.

@klown

klown Jul 20, 2018

Author Contributor

With the use of promises, this now does delete all prefs safes and associated gpii keys.

fluid.log("STATUS: " + res.statusCode);
fluid.log("HEADERS: " + JSON.stringify(res.headers, null, 2));
res.on('end', function () {
fluid.log('Batch deletion of snapsets');

@cindyli

cindyli Jul 10, 2018

Contributor

snapsets -> snapset prefs safes.

@klown

klown Jul 20, 2018

Author Contributor

Due to the refactoring, the log message includes "Prefs Safes", and also "GPII Keys".

batchDeleteReq.end();
};

dbLoader.snapSetsRequest = http.request(dbLoader.prefsSafesViewUrl, dbLoader.processSnapsets);

@cindyli

cindyli Jul 10, 2018

Contributor

The CouchDB URL on the GPII cloud might be using https instead of http. Will find out by testing with the developer cloud.

@klown

klown Jul 20, 2018

Author Contributor

Any progress on this, @cindyli ?

@cindyli

cindyli Jul 20, 2018

Contributor

Sorry about the confusion. I meant YOU will find out when testing with the cloud, not me. It's also worth checking with @mrtyler about this, since production/staging might act differently than dev clusters.
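
Whichever scheme the clusters turn out to use, a script can avoid hard-coding `http` by choosing the Node module from the URL itself. The sketch below is only an illustration of that idea; `pickTransport` is a hypothetical name and is not taken from the code under review.

```javascript
"use strict";
var http = require("http");
var https = require("https");

// Pick the Node module that matches the URL's scheme, so the same script can
// talk to both "http://" and "https://" CouchDB endpoints.
function pickTransport(couchDbUrl) {
    var protocol = new URL(couchDbUrl).protocol;
    if (protocol === "https:") {
        return https;
    }
    if (protocol === "http:") {
        return http;
    }
    throw new Error("Unsupported protocol: " + protocol);
}
```

The call `http.request(url, callback)` in the hunk above would then become `pickTransport(url).request(url, callback)`.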

});
getGpiiKeysRequest.end();
});
dbLoader.doBatchDelete(docsToRemove);

@cindyli

cindyli Jul 10, 2018

Contributor

Deleting all prefs safe records here at the end of the script doesn't guarantee that the deletion runs as the last task, due to the async nature of the HTTP requests above for getting and deleting GPII keys.

@klown

klown Jul 20, 2018

Author Contributor

Right, but it's another instance of "overtaken by events": The latest code handles the asynchrony.
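
The ordering problem raised in this thread can be illustrated with a small sketch that chains promise-returning steps so that the bulk delete cannot start before the lookups have finished. It uses native promises and a hypothetical `runInOrder` helper; it is an assumption about one possible shape, not the refactored code from this pull request.

```javascript
"use strict";

// Run promise-returning steps strictly one after another. Each step receives
// the previous step's result; a rejected step skips the remaining ones.
function runInOrder(steps) {
    return steps.reduce(function (previous, step) {
        return previous.then(step);
    }, Promise.resolve());
}
```

For example, `runInOrder([findPrefsSafes, findGpiiKeys, bulkDelete])`, with three hypothetical functions that each return a promise, would only issue the bulk delete once both lookups have resolved.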

var gpiiKeyId = aSnapset.value._id;
fluid.log("Snapset: " + gpiiKeyId);
var gpiiKeyViewUrl = dbLoader.gpiiKeyViewUrl + "?key=%22" + gpiiKeyId + "%22";
var getGpiiKeysRequest = http.request(gpiiKeyViewUrl, function (resp) {

@amb26

amb26 Jul 10, 2018

Member

Worrying lack of abstraction in all of this work. At the least there should be a wrapper for the process of invoking an HTTP request and waiting for responses from it as some kind of promise-producer. I don't want to slow down this work one atom so I'm not suggesting that it be re-expressed using https://github.com/fluid-project/kettle/blob/master/docs/DataSources.md#simple-example-of-using-an-http-datasource but something needs to happen to untangle this method, in which by line 88 we are 4 closures deep. It's hard to follow this expression of the algorithm, it is brittle, and it seems prone to races, which are already causing confusion. Also consider utilities like https://docs.fluidproject.org/infusion/development/PromisesAPI.html#fluidpromisesequencesources-options
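
A minimal sketch of such a promise-producing wrapper is shown below. It uses Node's native `Promise` rather than Infusion's promises or a kettle DataSource, and the names `getBody` and `transport` are hypothetical; treat it as an illustration of the idea under those assumptions, not as the approach that was eventually merged.

```javascript
"use strict";
var http = require("http");

// Hypothetical wrapper: issue a GET request and return a promise for the
// whole response body. `transport` defaults to Node's http module and can be
// replaced (for instance by https, or by a fake in tests).
function getBody(databaseUrl, transport) {
    transport = transport || http;
    return new Promise(function (resolve, reject) {
        var request = transport.request(databaseUrl, function (response) {
            var body = "";
            response.setEncoding("utf8");
            response.on("data", function (chunk) {
                body += chunk;
            });
            response.on("end", function () {
                if (response.statusCode >= 200 && response.statusCode < 300) {
                    resolve(body);
                } else {
                    reject(new Error("HTTP status " + response.statusCode));
                }
            });
        });
        request.on("error", reject);
        request.end();
    });
}
```

With a wrapper like this, each database call becomes a single flat `.then()` step instead of another nested closure.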

var snapSetsString = "";
response.setEncoding("utf8");
response.on("data", function (chunk) {
snapSetsString += chunk;

@amb26

amb26 Jul 10, 2018

Member

See below

@mrtyler
Contributor

mrtyler left a comment

IANA Javascript developer but to me this looks like it implements the dataloading strategy Cindy and I have discussed.

Two areas I want to make sure we're thinking about:

  1. Idempotence - the goal is that we should be able to run the dataloader over and over again, and the end result will always be that the correct data is in the database (and any old data is removed). I think this implementation satisfies that goal, but please confirm for me Joseph/Cindy.

  2. Failure modes - the goal is that a failure along the way (e.g. an HTTP timeout while communicating with couchdb) should NOT leave the database in a bad or inconsistent state.

This is a nuanced topic with many possible solutions that range from "cheap but disruptive" to "resilient but expensive". A few examples:

  • Current solution: drop entire database and re-populate from scratch

    • Cheap!
    • Very disruptive. Ignoring the showstopping problem -- this approach destroys any data that cannot be reloaded from canned snapsets -- a user will be unable to use the system from the moment the database is dropped until the Snapset the user wants is re-uploaded (on the order of 10s of seconds).
  • Proposed solution in this PR: delete all snapsets in a first pass, re-populate snapsets in a second pass

    • More expensive (the cost of writing the code in this PR, basically)
    • Less disruptive. This solves the showstopping problem above, and narrows the window where a user is affected (on the order of seconds).
  • Possible future solution: delete each snapset individually, then re-upload that snapset immediately

    • A bit more expensive to implement
    • A bit less disruptive. Narrows the window where a user is affected (on the order of deciseconds).

Note that these are all "happy paths". If the dataloading process halts due to an error after it has deleted a snapset but before it has re-uploaded that snapset, the window where users are affected grows to the time it takes the dataloader process to be run again (on the order of minutes or tens of minutes).

First, are my concerns clear? If not, I'm happy to elaborate and clarify, perhaps in a real-time chat.

Second, are my concerns worth considering now? The solution in this PR is better than what we have today, and deadlines are looming, so perhaps this is sufficient. My main purpose in raising these questions is so that everyone is thinking about failure modes and how to handle them since failure on the internet is inevitable :p.
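
The idempotence goal and the two-pass strategy described above can be illustrated with an in-memory sketch. A `Map` stands in for the database and `reloadSnapsets` is a hypothetical name; real code would talk to CouchDB and would still have to deal with the failure window between the two passes.

```javascript
"use strict";

// In-memory stand-in for the database: a Map from document id to document.
// First pass deletes every snapset; second pass loads the fresh snapsets.
// Documents whose prefsSafeType is not "snapset" are never touched.
function reloadSnapsets(db, freshSnapsets) {
    db.forEach(function (doc, id) {
        if (doc.prefsSafeType === "snapset") {
            db.delete(id);
        }
    });
    freshSnapsets.forEach(function (doc) {
        db.set(doc._id, doc);
    });
    return db;
}
```

Running the function twice with the same input leaves the map in the same state as running it once, which is the property asked for in point 1.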

fluid.log ("COUCHDB_URL environment variable must be defined");
process.exit(1);
}
fluid.log("COUCHDB_URL: '" + dbLoader.couchDbUrl + "'");

@mrtyler

mrtyler Jul 12, 2018

Contributor

COUCHDB_URL can (and usually will) contain credentials. Please do not write it to the log without sanitization.

It looks like url.parse may be helpful in reporting useful data like hostname without reporting sensitive data like password.

@klown

klown Jul 20, 2018

Author Contributor

Thanks for pointing that out, @mrtyler. I've gone with logging the

  • protocol
  • host
  • port
  • pathname

I don't think any of those contain sensitive information, but I could be wrong. What do you think?

@mrtyler

mrtyler Jul 24, 2018

Contributor

Looks good.
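
A sketch of the sanitized logging agreed on in this thread follows. It uses Node's WHATWG `URL` class instead of the `url.parse` function mentioned above, and `describeForLog` is a hypothetical helper name; the message format is an assumption.

```javascript
"use strict";

// Build a log-safe description of the CouchDB URL. Only protocol, host name,
// port and pathname are reported; username and password are left out.
function describeForLog(couchDbUrl) {
    var parsed = new URL(couchDbUrl);
    return "COUCHDB_URL: protocol '" + parsed.protocol +
        "', host '" + parsed.hostname +
        "', port '" + parsed.port +
        "', pathname '" + parsed.pathname + "'";
}
```

Note that `port` is an empty string when the URL uses the default port for its scheme.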

});
});
getGpiiKeysRequest.on("error", function (e) {
fluid.log("Error finding snapsets' associated GPII Keys: " + e.message);

@mrtyler

mrtyler Jul 12, 2018

Contributor

I think this should be "snapset's", unless the error message is going to contain errors from all the snapsets that failed?

@klown

klown Jul 20, 2018

Author Contributor

The closest log message to this old one now has no apostrophe: "Error finding snapset Prefs Safes associated GPII Keys".

fluid.log("STATUS: " + res.statusCode);
fluid.log("HEADERS: " + JSON.stringify(res.headers, null, 2));
res.on('end', function () {
fluid.log('Batch deletion of snapsets');

@mrtyler

mrtyler Jul 12, 2018

Contributor

Maybe this is clearer as "Finished batch deletion of..."?

@klown

klown Jul 20, 2018

Author Contributor

And this one is now "Bulk deletion completed."

klown and others added some commits Jul 16, 2018

Merge pull request #1 from cindyli/GPII-3138
GPII-3138: Add a couchDB view to return all GPII keys
@gpii-bot

Collaborator

gpii-bot commented Jul 18, 2018

@klown

Contributor Author

klown commented Jul 20, 2018

@cindyli, @mrtyler, @amb26 I'm pushing the latest code that uses promises. But, this is not complete as I have yet to migrate the code that loads the snapsets and their keys using bash, as you suggested @cindyli. I wanted to test what I have so far using the docker image and see if there are any problems.

GPII-3138: Delete and (re)load snapsets into the database
- Refactored deleteSnapsets.js to use promises.
- Modified Dockerfile to remove dependency on NODE_ENV, as it is
not needed by the gpii-dataloader script built into the docker image.
@gpii-bot

Collaborator

gpii-bot commented Jul 20, 2018

dbLoader.snapsetPrefsSafes.push(aSnapset.value);
});
fluid.log("\tSnapset Prefs Safes marked for deletion.");
return dbLoader.snapsetPrefsSafes;

@cindyli

cindyli Jul 20, 2018

Contributor

This line seems unnecessary because dbLoader.snapsetPrefsSafes is a global variable.

@klown

klown Jul 23, 2018

Author Contributor

... and nothing actually uses the return value. Removed.

gpiiKeyRecords, dbLoader.snapsetPrefsSafes
);
fluid.log("\tGPII Keys associated with snapset Prefs Safes marked for deletion.");
return dbLoader.gpiiKeys;

@cindyli

cindyli Jul 20, 2018

Contributor

Same as above.

@klown

klown Jul 23, 2018

Author Contributor

Right.

@cindyli

Contributor

cindyli commented Oct 10, 2018

I believe these snapset data are already in the production and staging databases and marked as "user" type. We can do a one-time cleanup to wipe them out before the new dataloader goes live.

@klown

Contributor Author

klown commented Oct 10, 2018

Okay, I won't add that one-time cleanup to the new dataloader.

dbOptions.couchDbUrl = processArgv[2];
dbOptions.staticDataDir = processArgv[3];
dbOptions.buildDataDir = processArgv[4];
if (processArgv.length > 5 && processArgv[5] === "--justDelete") { // for debugging.

@amb26

amb26 Oct 10, 2018

Member

This can just be dbOptions.justDelete = processArgv.length > 5 && processArgv[5] === "--justDelete";

@klown

klown Oct 11, 2018

Author Contributor

Okay.
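
Applied to the hunk above, the suggested one-liner might look like the sketch below. The function name `parseDbOptions` is hypothetical and the surrounding structure is an assumption; only the property names and the `--justDelete` flag come from the diff.

```javascript
"use strict";

// Hypothetical argument parser: the boolean expression is assigned directly,
// so no if/else is needed to set the debugging flag.
function parseDbOptions(processArgv) {
    var dbOptions = {};
    dbOptions.couchDbUrl = processArgv[2];
    dbOptions.staticDataDir = processArgv[3];
    dbOptions.buildDataDir = processArgv[4];
    dbOptions.justDelete = processArgv.length > 5 && processArgv[5] === "--justDelete";
    return dbOptions;
}
```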

return dbOptions;
};

/*

@amb26

amb26 Oct 10, 2018

Member

Turn this into a proper doc comment that gets linted. Add a special warning that the input argument will be modified.

@klown

klown Oct 11, 2018

Author Contributor

In terms of a warning, I've documented the properties added to the options input parameter in the function description. It would be nice to be able to use some form of @return block tag here, but everything I tried failed to lint.

fluid.log("\tViews data " + ( views ? "retrieved." : "missing." ));
};

/*

@amb26

amb26 Oct 10, 2018

Member

/** To make this a proper doc comment

@amb26

amb26 Oct 10, 2018

Member

And the following 3 - fix globally

@klown

klown Oct 11, 2018

Author Contributor

Thanks for catching them -- done, globally.


/*
* Create the step that retrieves the current views from the database.
* @param {Object} options - Object containing the views URL into the database.

@amb26

amb26 Oct 10, 2018

Member

Be a little clearer about what fields are expected/allowed in this options block, possibly using a JSDocs @typedef

@klown

klown Oct 11, 2018

Author Contributor

See below.

/**
* Generate a response handler, setting up the given promise to resolve/reject
* at the correct time.
* @param {Function} handleEnd - Function to call that deals with the response

@amb26

amb26 Oct 10, 2018

Member

Document signature of this function, possibly using an @callback JSDocs directive

@klown

klown Oct 11, 2018

Author Contributor

See below.

* return. It is up to the caller to trigger the request by calling its end()
* function.
* @param {String} databaseURL - URL to query the database with.
* @param {Function} handleResponse - callback that processes the response from

@amb26

amb26 Oct 10, 2018

Member

As above, use @callback/@typedef

@klown

klown Oct 11, 2018

Author Contributor

See below.

* Utility to configure a step: creates a response callback, binds it to an
* http database request, and configures a promise to resolve/reject when the
* response callback finishes or fails.
* @param {Object} details - Specific information for the request and response,

@amb26

amb26 Oct 10, 2018

Member

Document mandatory fields in this object either inline here as with http://usejsdoc.org/tags-param.html#parameters-with-properties or else using a JSDocs @typedef http://usejsdoc.org/tags-typedef.html if this options structure appears in the signature of other functions

@klown

klown Oct 11, 2018

Author Contributor

I've defined a few @typedef and @callback block tags and used them accordingly throughout. And, where options is a parameter, I've followed the Parameters with properties JSDoc section, listing the properties of options specifically used within the function.

That said, this function, gpii.dataLoader.configureStep(), was a struggle because it is abstract and generic. The actual option properties and their meanings are defined in the specific "set up" functions that ultimately call this one. It's tricky being informative and vague at the same time...
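
For readers unfamiliar with the JSDoc block tags discussed in these threads, here is a small sketch of `@typedef`, `@callback` and a modification warning. All names in it (`StepOptions`, `ResponseHandler`, `rememberViews`, `createStep`, `oldViews`) are hypothetical and chosen for illustration; they are not the definitions added in this pull request, and whether the project's lint rules accept every tag used here is not confirmed.

```javascript
"use strict";

/**
 * Options shared by the step functions in this sketch.
 * @typedef {Object} StepOptions
 * @property {String} viewsUrl - URL used to fetch the views document.
 * @property {Object} [oldViews] - Added by rememberViews().
 */

/**
 * Handles the complete response body once the request has ended.
 * @callback ResponseHandler
 * @param {String} responseString - The full response body.
 * @param {StepOptions} options - The options the step was created with.
 * @return {Object} The value to resolve the step's promise with.
 */

/**
 * Parse a views response. WARNING: modifies `options` by adding `oldViews`.
 * @param {String} responseString - JSON text returned by the database.
 * @param {StepOptions} options - Step options; modified in place.
 * @return {Object} The parsed views document.
 */
function rememberViews(responseString, options) {
    options.oldViews = JSON.parse(responseString);
    return options.oldViews;
}

/**
 * Bind a response handler to the options of one step.
 * @param {StepOptions} options - Step options.
 * @param {ResponseHandler} handleEnd - Called with the whole response body.
 * @return {Function} A function that feeds a response body to `handleEnd`.
 */
function createStep(options, handleEnd) {
    return function (responseString) {
        return handleEnd(responseString, options);
    };
}
```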

GPII-3138: Update snapsets in the database
Modified based on Antranig's comments:
- modified the JSDoc documentation to better explain the rationale
behind each function, its input parameters and outputs.
@gpii-bot

Collaborator

gpii-bot commented Oct 11, 2018

README.md Outdated
`%gpii-universal/build/dbData/snapset/` folder. These are used to update the snapsets in CouchDB when GPII is
run in a production or staging configuration.
* They are also converted into `user` preferences safes and GPII keys and placed into the
`%gpii-universal/build/dbData/user/` folder. These are used with PouchDB when GPII runs in a development

@cindyli

cindyli Oct 12, 2018

Contributor

When GPII runs in a development configuration, it also uses snapset-type data from %gpii-universal/build/dbData/snapset/. See pouchManager code.

%gpii-universal/build/dbData/user/ is only used for running integration tests. See PouchTestCaseHolder.

The doc in testData/dbData/README.txt for these data sets is correct. Please sync up. Thanks.

README.md Outdated
* The preferences files for running GPII and for integration tests are located at
`%gpii-universal/testData/preferences`. These files are converted into two types of preferences safes and GPII keys:
* They are converted into `snapset` preferences safes and GPII keys and placed into the
`%gpii-universal/build/dbData/snapset/` folder. These are used to update the snapsets in CouchDB when GPII is

@cindyli

cindyli Oct 12, 2018

Contributor

Reading "when GPII is run in a production or staging configuration" reminds me of the config files in the gpii/configs directory. All those configs use PouchDB as the backend.

Probably adjust this sentence to express that the datasets in this folder are:

  1. loaded into the production and staging CouchDB in the real clouds.
  2. loaded into PouchDB when GPII runs locally (regardless of which config is used).

@klown

klown Oct 15, 2018

Author Contributor

Thanks @cindyli I've reworked the whole section to:

  1. better sync up with the testData\dbData\README.txt, and
  2. adjust the wording to reflect your suggestions.
GPII-3138: Update snapsets in the database
- modified the main README.md to better explain the various ways
that the source preferences files are converted into PrefsSafes
and GPII Keys, and for which purposes they are used.
- minor grammaticial changes to README.txt.
@gpii-bot

Collaborator

gpii-bot commented Oct 15, 2018

GPII-3138: Update snapsets in the database
Fixed markdown errors.
@gpii-bot

Collaborator

gpii-bot commented Oct 17, 2018

@amb26 amb26 merged commit ebde69d into GPII:master Oct 17, 2018

1 check passed

default Build finished.
@stepanstipl

Contributor

stepanstipl commented Oct 18, 2018

I already confirmed this with @cindyli (thanks for the help), but can you @klown please also confirm that removing the following keys and corresponding prefsSafes is what we want to do for cleanup? (The list matches the files in https://github.com/GPII/universal/tree/master/testData/preferences)

GPII-270-rbmm-demo
MikelVargas
alice
alsa
andrei
audio
carla
carla_24751
catalina
chris
chromeDefault
condTest
condTest2
davey
david
debbie
easit1
easit2
elaine
elmer
elmerv
elod
explodeLaunchHandlerStart
explodeLaunchHandlerStop
explodeSettingsHandlerGet
explodeSettingsHandlerSet
franklin
gert
jaws
jme_app
jme_common
li
livia
maavis1
maavis2
maggie
maguro
manuel
mary
mickey
mobileaccessibility1
mobileaccessibility2
multi_context
nisha
olb_Alicia_app
omar
omnitor1
omnitor2
os_android
os_android_common
phil
randy
review3_chrome_high_contrast
review3_ma1
review3_ma2
review3_user_1
review3_user_2
review3_user_3
review3_user_4
roger
rwg1
rwg2
salem
sammy
slater
snapset_1a
snapset_1b
snapset_1c
snapset_2a
snapset_2b
snapset_2c
snapset_3
snapset_4a
snapset_4b
snapset_4c
snapset_4d
snapset_5
sociable1
sociable2
talkback1
talkback2
telugu
testUser1
timothy
tom
tvm_jasmin
tvm_sammy
tvm_vladimir
uioPlusCommon
uioPlus_captions
uioPlus_character_space
uioPlus_defaults
uioPlus_font_size
uioPlus_high_contrast
uioPlus_highlight_colour
uioPlus_inputs_larger
uioPlus_line_space
uioPlus_multiple_settings
uioPlus_self_voicing
uioPlus_simplified
uioPlus_syllabification
uioPlus_toc
vicky
vladimir
wayne
@klown

Contributor Author

klown commented Oct 18, 2018

@stepanstipl @cindyli
I compared @stepanstipl's list against what is in testData/preferences/ and found a couple more, likely because they are relatively new: empty and uioPlus_word_space. After the snapset update machinery runs, there are both empty and uioPlus_word_space snapset prefsSafes in the database.

There is also an nyx, but it's ignored because its extension is .json. The preferences conversion script converts only files with .json5 extensions. (Aside: I'm wondering if nyx.json is a bug or intentional (@stegru ?))

And, sorry in advance for being pedantic: the prefsSafes for this one-time removal must have prefsSafeType equal to "user". The problem is the self-same testData/preferences files are also converted to "snapset" prefsSafes and we don't want to remove those. Having said that, it's likely that if prefsSafes are in the database as "user", then they are not also in as "snapset", since they would have the same database ID, and that's not allowed.

Using empty as an example, here is what I mean regarding prefsSafeType:

{
  "_id": "prefsSafe-empty",
  "_rev": "1-ae02f36527f53c9fde5d60e7422160be",
  "type": "prefsSafe",
  "schemaVersion": "0.1",
  "prefsSafeType": "user",  <== NOT "snapset"  *****
  "name": "empty",
  "password": null,
  "email": null,
  "preferences": {
    "flat": {
      "name": "Empty",
      "contexts": {
        "gpii-default": {
          "name": "Default preferences",
          "preferences": {}
        }
      }
    }
  },
  "timestampCreated": "2018-10-18T18:52:21.285Z",
  "timestampUpdated": null
}

Hope that helps

@stepanstipl

Contributor

stepanstipl commented Oct 18, 2018

thanks @klown . You're right that empty, uioPlus_word_space and nyx are in testData/preferences, but they are not in the actual database (neither prod nor staging). The list here is based on the current state of prod DB.

And yes, the prefsSafes were selected using "selector": {"type": "prefsSafe", "prefsSafeType": "user"} filter, I believe that's what you had in mind. I documented the intended cleanup process, please check the description of gpii-ops/gpii-infra#163 PR for details.

I agree, and that's why I'm double-checking, since we're gonna do the changes on live production database, it's better to be safe than sorry! thanks for comments and if you see anything else that might be an issue pls. let me know.
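
The filtering described in this exchange can be sketched as follows. The selector is the one quoted above and `_deleted: true` is the standard way to mark documents for removal in a CouchDB `_bulk_docs` request body; everything else (the helper names, matching on the `name` field, and the restriction to prefs safes only) is an assumption for illustration. The actual cleanup procedure is the one documented in gpii-ops/gpii-infra#163, which is not reproduced here.

```javascript
"use strict";

// Selector quoted in the comment above: only "user" prefs safes match.
var userPrefsSafeSelector = { type: "prefsSafe", prefsSafeType: "user" };

// Minimal matcher for flat, equality-only selectors such as the one above.
function matchesSelector(doc, selector) {
    return Object.keys(selector).every(function (field) {
        return doc[field] === selector[field];
    });
}

// Build a _bulk_docs request body that deletes the given documents.
function toBulkDelete(docs) {
    return {
        docs: docs.map(function (doc) {
            return { _id: doc._id, _rev: doc._rev, _deleted: true };
        })
    };
}

// Hypothetical planner: keep only "user" prefs safes whose name is on the
// removal list, so "snapset" prefs safes and real users are left alone.
function planCleanup(allDocs, namesToRemove) {
    var doomed = allDocs.filter(function (doc) {
        return matchesSelector(doc, userPrefsSafeSelector) &&
            namesToRemove.indexOf(doc.name) !== -1;
    });
    return toBulkDelete(doomed);
}
```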

@stegru

Member

stegru commented Oct 18, 2018

nyx.json is fixed with #630.

@klown

Contributor Author

klown commented Oct 18, 2018

> thanks @klown . You're right that empty, uioPlus_word_space and nyx are in testData/preferences, but they are not in the actual database (neither prod nor staging). The list here is based on the current state of prod DB.

Of course. Makes sense.

> And yes, the prefsSafes were selected using "selector": {"type": "prefsSafe", "prefsSafeType": "user"} filter, I believe that's what you had in mind. I documented the intended cleanup process, please check the description of gpii-ops/gpii-infra#163 PR for details.

> I agree, and that's why I'm double-checking, since we're gonna do the changes on live production database, it's better to be safe than sorry! thanks

Oh, for sure.

@klown

Contributor Author

klown commented Oct 18, 2018

> nyx.json is fixed with #630.

Thanks @stegru.
