Region mapping advanced matching #924

Merged
merged 105 commits into from Oct 15, 2015

Projects

None yet

2 participants

@stevage
Contributor
stevage commented Sep 21, 2015

Significant refactoring of the region mapping code. Major benefits:

  • Region mapping code split out from CsvCatalogItem to RegionProvider, RegionProviderList, DataVariable, DataTable and TableDataSource
  • added functionality: matching by regex (enabling matching LGA names), disambiguation using a second column
  • dozens of new unit tests
  • fixed many edge cases, particularly with drag-and-drop CSV files (which lack explicit styling).
  • temporal datasets work much better: each datapoint by default is shown until the next time point.
  • new architecture will make it much easier to add new region provider types
stevage added some commits Jul 1, 2015
@stevage stevage Add more comments 25337e0
@stevage stevage Give better feedback about regions that failed to match, or matched
more than once.
e14ad34
@stevage stevage Typos
7537dc0
@stevage stevage Improve ux of CSV feedback with styling, show message properly,
discard CSV if no rows match.
ebae69d
@stevage stevage Fix lint.
342fff1
@stevage stevage Support regexes to improve name matching. There are still 6 ambiguous
LGAs using this method though.
4157348
@stevage stevage Give blank CSV column names a name like _Column5 7923f05
@stevage stevage Work in progress supporting more sophisticated matches. In particular:
- fuzzy matches ('City of Blah' = 'Blah (C)'), through regexes
- multi-column matches (State+LGA for disambiguation)
f516bca
@stevage stevage Merge branch 'master' into region-mapping-advanced-matching
Conflicts:
  lib/Models/CsvCatalogItem.js
  lib/Styles/PopupMessage.less
24f0bf1
@stevage stevage Restructure CsvCatalogItem, splitting into three. Semi-functional at …
…this stage.
bf3a14d
@stevage stevage Temporal region datasets work again 1c68b45
@stevage stevage More cleanups, still no disambiguation support.
835d6af
@stevage stevage Disambiguation works! c929fdd
@stevage stevage Allow DataTable.getDataVariable(true) to include ENUM types. 27d2d9a
@stevage stevage Avoid casting numbers with leading zeros to scalars. 1b497d7
@stevage stevage More selective data column type guessing based on column headings. Fixes
issue with "lone_person" column classification as longitude.
66b8310
@stevage stevage Remove extraneous extra leading / on CKAN API, not liked by some inst…
…ances.
9ab4e5a
@stevage stevage RegionProvider code cleanups and robustness. 8614660
@stevage stevage CsvCatalogItem code cleanup, docs etc. 996dda7
@stevage stevage Rework RegionpProviderList slightly. d4365dc
@stevage stevage Test cases for CsvCatalogItem, RegionProvider, RegionProviderList 2c2fc3b
@stevage stevage Test files for CSV and Socrata
15e0fe1
@stevage stevage Support CSVs lacking final linefeed. b49a4f3
@stevage stevage CSV test cases. e31c6f4
@stevage stevage CSV test cases. e8490df
@stevage stevage Make feature picking work on region-mapped CSVs. 38de950
@stevage stevage More CSV test case work c41659e
@stevage stevage Browserify spam c9846bd
@stevage stevage Move some region mapping stuff into DataTable/DataVariable e6b85bd
@stevage stevage Wrong comment. 6b90a03
@stevage stevage Automatically choose data variable. Risky? 113e8b0
@stevage stevage Move more region mapping to dataset 979feb5
@stevage stevage Comments and cleanup e5cdd42
@stevage stevage CSV refactoring 4c95d96
@stevage stevage CSV test cases 8a83a30
@stevage stevage Automatically display time-based records until the next time record, …
…not a fixed amount.
14da819
@stevage stevage Make region mapped dates robust. ac56b18
@stevage stevage Fix auto-setting of datavariable. e267db9
@stevage stevage General comments and cleanup. 4ed6196
@stevage stevage Don't report ambiguous matches for CSVs with time fields.
f4f9a7c
@stevage stevage More CSV test cases for dates. 9f349ea
@stevage stevage ABS catalog items works again.
6219527
@stevage stevage Support thousand-commas in CSV files: "1,234.56". Fixes #901.
93e9073
@stevage stevage Respect provided csvItem.rectangle Fixes #898. d78ea5b
@stevage stevage Treat region provider aliases as regexes. b9ec231
@stevage stevage Don't use our production region mapping server for unit tests.
22e6d77
@stevage stevage A much better idiom for when.js test cases: .then(done).otherwise(fail); 42c8f02
@stevage stevage Fixes #914
7a24cf4
@stevage stevage Move feature picking and image recoloring out to Cesium.
ced41d5
@stevage stevage Clean up tests. cae90eb
@stevage stevage Revert "Move feature picking and image recoloring out to Cesium."
This reverts commit ced41d5.
5aa86f3
@stevage stevage Move imagery hooks to new file, addImageryProviderHooks.js
f4a2c5f
@stevage stevage Trimming. 49d8345
@stevage stevage Rename to ImageryProviderHooks
11a263a
@stevage stevage Merge remote-tracking branch 'origin/master' into region-mapping-adva…
…nced-matching

* origin/master:
  Bump version.
  Update CHANGES.md.
  Fix typo preventing open state of groups from being serialized.
  Revert "Merge pull request #1 from travissimon/master"
  Fix typo in CHANGES.md.
  Update CHANGES.md.
  Reset animation clock on initial playback
  Add the ability to specify a cache duration.
  Make timeseries autoplay optional, preserving existing autoplay behaviour.
  Remove spammy JSON browserify dumps as per TerriaJS/nationalmap@680b4dc
  1.0.42
  Don't say that we are downloading the WFS data as GeoJSON, because many (GA in particular) services do not support GeoJSON.
  Add proxy support to the OpenStreetMapImageryProvider.
  Don't serialize csvDataset property.
  b35 and b33

Conflicts:
    lib/Models/CkanCatalogGroup.js
    lib/Models/CsvCatalogItem.js
e9571cd
@stevage stevage Small refinements of region mapping. baf2993
@stevage stevage Support feature picking with fuzzy matched regions.
c5d5eb8
@kring
Member
kring commented Sep 22, 2015

I haven't looked at the code yet, but I ran through the CSVs in #test and noticed some differences. It's possible these are intentional but I wanted to note them regardless:

  • Miscellaneous Files -> Auto Incidents No Time now uses the id to color the points. Previously all the points were the same color.
    image
    image
  • Region Mapping (CSV) -> NEXIS NSW 2012 (LGA) and SEIFA 2011 (POA) no longer load:
    image
  • Temporal -> Pacific Earthquakes shows each earthquake for a very brief time, so it's flickery. This might be a case where showing each point until the start of the next doesn't make sense?
@kring kring commented on the diff Sep 22, 2015
wwwroot/test/csv/regionMapping.json
@@ -0,0 +1,197 @@
+{
+ "comments": "Matching takes place in the order defined in this file. Place code matches before name matches, and smaller regions before larger ones.",
@kring
kring Sep 22, 2015 Member

As you probably know, JSON keys are unordered. So if this currently works in all browsers it's just a coincidence. Unless you're talking about the order in the aliases array, in which case maybe clarify this in the comment?

@kring
kring Sep 22, 2015 Member

Actually this may not be a new problem, eh? Probably worth fixing in any case.

@stevage
stevage Sep 23, 2015 Contributor

As you probably know, JSON keys are unordered

Yeah, good point. The truth seems to be a lot murkier than that simple statement. To fix it will require backwardly incompatible changes so I won't try to squeeze it in here. #930

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+ /**
+ * Name of the WMS layer where these regions are found.
+ * @type {String}
+ */
+ this.layerName = properties.layerName;
+ /**
+ * URL of the WMS server
+ * @type {String}
+ */
+ this.server = properties.server;
+ /**
+ * List of aliases which will be matched against if found as column headings.
+ * @type {String[]}
+ */
+ this.aliases = defaultValue(properties.aliases, [this.regiontype]);
+ this.serverReplacements = properties.serverReplacements;
@kring
kring Sep 22, 2015 Member

It'd be helpful to have doc for these three properties.

@stevage
stevage Sep 23, 2015 Contributor

Done. Unfortunately that reminded me how silly the disambig data structure is.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+ * @type {Boolean}
+ */
+ this.textCode = defaultValue(properties.textCodes, false); // yes, it's singular...
+
+ /**
+ * Array of attributes of each region, once retrieved from the server.
+ * @type {Object[]}
+ */
+ this.regions = [];
+
+
+};
+
+/**
+ * Turn a WFS list of attribute values into a straight array
+ * @param {String} propertyName: optional, for setting an additional (disambiguating) property on a region
@kring
kring Sep 22, 2015 Member

Optional properties are indicated in jsdoc by enclosing them in square brackets. E.g. [propertyName] or [propertyName='foo'] (to state the default value when a value isn't supplied).

@stevage
stevage Sep 23, 2015 Contributor

Thanks, done.

@kring kring commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+
+ if (!(obj.member instanceof Array)) {
+ obj.member = [obj.member];
+ }
+ if (obj.member.length === 0) {
+ throw new ModelError("Zero region boundaries found for region " + this.regionProp);
+ }
+ for (var i = 0; i < obj.member.length; i++) {
+ if (this.regions[i] === undefined) {
+ this.regions[i] = {};
+ }
+ // either { id: 123 } or { id: 'Bunbury', STE_NAME: 'VIC' }
+ if (!propertyName) {
+ this.regions[i].id = obj.member[i][this.regionProp];
+ } else {
+ this.regions[i][propertyName] = obj.member[i][propertyName];
@kring
kring Sep 22, 2015 Member

Not sure if I'm following this right, but isn't it necessary to set the value of the id property even when a propertyName is supplied?

@kring
kring Sep 22, 2015 Member

Nevermind, I see now that the id and propertyName are set on separate invocations of the function from loadRegionIDs. But it does rely on the WFS getPropertyValue returning the same features in the same order in two successive calls to getPropertyValue. Is there a better way to do it?

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+ }
+ if (this.server === undefined) {
+ throw (new DeveloperError('No server for region mapping defined: ' + this.regionType));
+ }
+ var that = this;
+
+ var baseuri = URI(this.server).addQuery({
+ service: 'wfs',
+ version: '2.0',
+ request: 'getPropertyValue',
+ typenames: this.layerName});
+
+ // get the list of IDs that we will attempt to match against for this column
+ var p = [];
+ var url = baseuri.addSearch('valueReference', this.regionProp).toString();
+ p.push(loadText(corsProxy.getURL(url))
@kring
kring Sep 22, 2015 Member

It's unfortunate, but corsProxy.getURL always proxies the URL even if it is not necessary to do so. That's why we usually call corsProxy.shouldUseProxy first. A helper function would be great, along the lines of proxyCatalogItemUrl.

@stevage
stevage Sep 23, 2015 Contributor

Definitely unfortunate, I just stripped out the references to terria ;) My short term fix is to never proxy - our region mapping server(s) support CORS.

@kring
kring Sep 23, 2015 Member

My short term fix is to never proxy - our region mapping server(s) support CORS.

This will break region mapping in IE9, because IE9 doesn't support CORS at all.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+ var p = [];
+ var url = baseuri.addSearch('valueReference', this.regionProp).toString();
+ p.push(loadText(corsProxy.getURL(url))
+ .then(function(xml) {
+ that.loadRegionsFromXML(xml);
+ }).otherwise(function(err) {
+ console.log(err);
+ throw(err);
+ }));
+ // if this column might be ambiguous then fetch the disambiguating values for each column as well (usually State)
+ var dab = this.disambigColumns;
+ if (defined(dab)) {
+
+ url = baseuri.addQuery('valueReference' + dab[0].regionProp).toString();
+ p.push(loadText(corsProxy.getURL(url)).then(function(xml) {
+ that.loadRegionsFromXML(xml, dab[0].regionProp);
@kring
kring Sep 22, 2015 Member

Only the first disambiguation column is used here, and that appears to be true everywhere else disambigColumns is used in this file as well. Why take an array and then only use the first element?

@stevage
stevage Sep 22, 2015 Contributor

Yeah, initially I was thinking there would be multiple levels of disambiguation (suburb, state, country?), but I don't think any use cases warrant it at this stage. The disambiguation stuff is a bit more general than it needs to be.

@kring
kring Sep 22, 2015 Member

Probably worthwhile to simplify it then, no?

@stevage
stevage Sep 23, 2015 Contributor

Yep.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+ }
+ return when.all(p).yield(true);
+};
+
+
+function applyReplacements(s, replacements) {
+// replacements: [ [ regex, replacement], ... ]
+// replacement can contain '$1' etc.
+ var r = s.trim().toLowerCase(); // just in case...
+
+ if (replacements === undefined || replacements.length === 0) {
+ return r;
+ }
+
+ var cachekey = s + '%' + replacements.reduce(function(key, item) {
+ return key + item[0] + '%' + (item[1] ? item[1] : '') + '%';
@kring
kring Sep 22, 2015 Member

How about using JSON.stringify(replacements) as the cache key?

@kring
kring Sep 22, 2015 Member

Sorry, s is in there too. So JSON.stringify({s: s, replacements: replacements}).

@stevage
stevage Sep 22, 2015 Contributor

JSON.stringify turns regexes into {}, but good point.

@kring
kring Sep 22, 2015 Member

Despite the name, the regexes are actually strings that will be turned into regexes, not actual RegExp instances, right?

@stevage
stevage Sep 23, 2015 Contributor

Oh, good point. Done. (The cache is much less important now that server side matching happens up front. It was critical when I first implemented it.)

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+ }
+
+ replacements.forEach(function(rep) {
+ r = r.replace(new RegExp(rep[0].toLowerCase(), 'gi'), rep[1].toLowerCase());
+ });
+ applyReplacements.cache[s] =r;
+ return r;
+}
+applyReplacements.cache = {};
+
+
+/* Checks if a given region code matches a given ID after applying regexs and additional disambiguation lookups */
+RegionProvider.prototype.codeMatchesRegionID = function(id, code, disambigId, disambigCode) { //, row, idnum, csvItem ) {
+ code = String(code).toLowerCase(); //## should be happening upstream somewhere
+ id = String(id).toLowerCase();
+ if (applyReplacements(id, this.serverReplacements) !== applyReplacements(code, this.dataReplacements)) {
@kring
kring Sep 22, 2015 Member

Couldn't you do the serverReplacements once instead of doing it while testing each code? Then your replacement cache, if you need it at all after turning this n*n operation into an n operation, could just be a map of strings (pre-replacement codes) to region indices.

@kring
kring Sep 22, 2015 Member

Also can you come up with clearer names than id and code so that it's clearer that id is the actual ID of a region on the server and code is a value in a column that hopefully matches one of the known region IDs?

@stevage
stevage Sep 22, 2015 Contributor

Couldn't you do the serverReplacements once instead of doing it while testing each code?

Ah, yep. Logical next refinement. Basically just moving the cache up a level, so it's a simple lookup. Nice catch.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+ id = String(id).toLowerCase();
+ if (applyReplacements(id, this.serverReplacements) !== applyReplacements(code, this.dataReplacements)) {
+ return false; // failed match, even after best efforts
+ }
+ if (!defined(this.disambigColumns) || this.disambigColumns.length === 0) {
+ return true; // unambiguous successful match
+ }
+
+ if (defined(disambigId) && !defined(disambigCode)) {
+ // warn? we have a value to disambiguate against, but it's not provided
+ return true;
+ }
+
+ // Hooray, let's disambiguate. Our args look something like: Campbelltown, Campbelltown, New South Wales, NSW.
+
+ var dabMatched = this._getDisambigProvider().codeMatchesRegionID(disambigId, disambigCode);
@kring
kring Sep 22, 2015 Member

It seems a bit awkward for each region provider to keep a reference back to the list of providers that contains it just so that it can go find its disambiguation provider.

It would be better, I think, if the regionProviderList were passed down as parameters to the functions along the way, rather than given to the RegionProvider constructor and stored.

Also, I suspect that looking up the disambig provider with a a linear search on every attempted match is making the matching go quite a bit slower than it needs to.

@stevage
stevage Sep 22, 2015 Contributor

Agreed - in fact, as I progressed, I started backtracking on actually needing to store that reference.

I think actually the DataTable should know what the RegionVariable and DisambigVariable are, and pass those references around as necessary.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+/**
+ * Provides a list of values for each region, based on a provided dataset.
+ *
+ * @param {Dataset} dataset
+ * @param {String} regionVariable
+ * @param {Array} rowList Optional array of integers: only show these rows from dataset. Used for time-filtering.
+ * @returns {Object} Object containing properties:
+ * regionValues: Sparse array of values corresponding to the provided list.
+ * ambiguousMatches: Object whose keys are IDs of regions that had more than one row match against them.
+ * failedMatches: Object whose keys are unmatched region codes.
+ * successes: Number of successful matches.
+ * totalRows: Number of attempted matches.
+ */
+RegionProvider.prototype.getRegionValues = function (dataset, regionVariable, disambigVariable, rowList) {
+ if (this.regions.length < 1) {
+ // should this return a promise instead of throwing?
@kring
kring Sep 22, 2015 Member

I don't think so, it looks like a synchronous function to me.

@stevage
stevage Sep 22, 2015 Contributor

Fixed.

@kring kring commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+RegionProvider.prototype.getRegionValues = function (dataset, regionVariable, disambigVariable, rowList) {
+ if (this.regions.length < 1) {
+ // should this return a promise instead of throwing?
+ throw new DeveloperError('Region provider not ready to match regions.');
+ }
+ var codes = dataset.getVariableEnums(regionVariable);
+ var disambigCodes = (defined(disambigVariable) ? dataset.getVariableEnums(disambigVariable) : []);
+ var vals = dataset.getVariableValues(dataset.getDataVariable());
+ if (!defined(vals)) {
+ console.log('Warning: no data variable for region-mapped dataset with region variable "' + regionVariable + '"');
+ }
+
+ var numericCodes = false;
+ if (!defined(codes)) {
+ // not an enum type? the IDs must be digit-based and have been misclassified as numbers
+ // ##TODO think about whether this matters for disambigs
@kring
kring Sep 22, 2015 Member

Let's try to avoid TODOs in the code. This sounds like something that should be considered before we merge this. In other cases, TODOs can be turned into GitHub issues instead.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+ for (var i = 0; i < this.regions.length; i++) {
+ var dabId;
+ if (defined(this.disambigColumns) && defined(this.disambigColumns[0])) {
+ dabId = this.regions[i][this.disambigColumns[0].regionProp];
+ }
+ if (this.codeMatchesRegionID(this.regions[i].id, code, dabId, disambigCode)) {
+ return i;
+ }
+ }
+ return -1;
+};
+
+/**
+ * Provides a list of values for each region, based on a provided dataset.
+ *
+ * @param {Dataset} dataset
@kring
kring Sep 22, 2015 Member

I don't see a type called Dataset. DataTable maybe?

@stevage
stevage Sep 22, 2015 Contributor

Fixed. There seems to be a convention of variables called dataset referring to DataTable objects.

@stevage
Contributor
stevage commented Sep 22, 2015

Miscellaneous Files -> Auto Incidents No Time now uses the id to color the points.

Previously lat-long CSVs didn't support enums - which I think they ought to. The question now is really, "for how many distinct values does colour coding not make sense for enums" or "are there fields which should not be colour coded on"?

So, yes, it's intentional, but there are cases where the new behaviour is undesirable.

Temporal -> Pacific Earthquakes shows each earthquake for a very brief time, so it's flickery. This might be a case where showing each point until the start of the next doesn't make sense?

Yeah. I suspect that the default behaviour was probably crafted around this dataset, or ones very similar to it. Options:

  • specify an explicit displayDuration value for this dataset
  • have a minimum display duration for any point

Region Mapping (CSV) -> NEXIS NSW 2012 (LGA) and SEIFA 2011 (POA) no longer load:

That's because you're testing in NM, which has a regionMapping.json which is incompatible. Try copying over the one from node_modules/terriajs/wwwroot/test/csv/regionMapping.json.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+ results.successes ++;
+ }
+ return results;
+};
+
+/**
+ * Pre-generates a function which quickly turns a value into a colour.
+ * @param {Number[]} regionValues Array of values.
+ * @param {Function} colorFunc should be function(val) { return [r,g,b,a]; }
+ * @returns {Function} Function of type f(regionIndex) { return [r,g,b,a]; }
+ */
+RegionProvider.prototype.getColorLookupFunc = function(regionValues, colorFunc) {
+ var colors = regionValues.map(colorFunc);
+ return function(regionIndex) {
+ // actually maybe it's ok to return undefined after all
+ return colors[regionIndex] !== undefined ? colors[regionIndex] : [0,0,0,0]; //# is it important to get rid of this if?
@kring
kring Sep 22, 2015 Member

These comments are a bit dodgy. :)

@stevage
stevage Sep 23, 2015 Contributor

Done.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+
+/*
+ * @returns The index of the first column that left-matches any of the given aliases
+ * @param varNAmes Array of variable names.
+ */
+RegionProvider.prototype.findRegionVariable = function(varNames) {
+ for (var j = 0; j < this.aliases.length; j++) {
+ for (var i = 0; i < varNames.length; i++) {
+ //check that it is the right type of code
+ //if (typeof this.regionTextCode === "boolean" && this.regionTextCode !== variables[i].isTextCode) {
+ // continue;
+ //}
+ // SB: I don't think that's a worthwhile test
+
+ //if (varNames[i].toLowerCase() === this.aliases[j]) {
+ if (new RegExp('^' + this.aliases[j] + '$', 'i').test(varNames[i])) {
@kring
kring Sep 22, 2015 Member

Hoist the construction of the regex out of the inner loop.

@stevage
stevage Sep 23, 2015 Contributor

Nice. Done.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+};
+
+
+
+/*
+ * @returns The index of the first column that left-matches any of the given aliases
+ * @param varNAmes Array of variable names.
+ */
+RegionProvider.prototype.findRegionVariable = function(varNames) {
+ for (var j = 0; j < this.aliases.length; j++) {
+ for (var i = 0; i < varNames.length; i++) {
+ //check that it is the right type of code
+ //if (typeof this.regionTextCode === "boolean" && this.regionTextCode !== variables[i].isTextCode) {
+ // continue;
+ //}
+ // SB: I don't think that's a worthwhile test
@kring
kring Sep 22, 2015 Member

I don't understand the question well enough to say if I agree with you, but if you're sure then just delete the code and comment.

@stevage
stevage Sep 23, 2015 Contributor

Done.

@kring kring commented on an outdated diff Sep 22, 2015
lib/Models/RegionProvider.js
+ var disambig = this.disambigColumns[0];
+ return this.regionProviderList.getRegionProvider(disambig.region);
+};
+
+/**
+ * If a disambiguation column is known for this provider, return a column matching its description.
+ */
+RegionProvider.prototype.findDisambigVariable = function(variables) {
+ var dabp = this._getDisambigProvider();
+ if (!defined(dabp)) {
+ return undefined;
+ }
+ var dv = variables[dabp.findRegionVariable(variables)];
+ return (defined(dv) ? dv.name : undefined);
+};
+// this probably doesn't belong here
@kring
kring Sep 22, 2015 Member

yeah, I agree. It's ok for RegionProvider to offer up this information in an object literal or something like that, but it shouldn't be formatting it as HTML.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProviderList.js
+* 3. Contact the WFS server defined in that file, fetch IDs for all regions of that type
+* 4. Based on values in the region variable column, generate a linear choropleth mapping
+* 5. Fetch specially prepared WMS tiles which are coloured with a unique colour per region.
+* 6. Recolor each tile, using the unique colour to determine its ID, then replacing that unique colour with a mapped color.
+*
+* Voilà - very fast client-side choroplething with no user data sent to servers, and no vector boundaries sent to client. */
+
+
+/**
+ RegionProviderList encapsulates the regionMapping.json file and provides support for choosing the best region
+ provider for a given dataset.
+*/
+
+var RegionProviderList = function(terria) {
+ /**
+ * URL pointing to a JSON file where the region provider definitions will be loaded from.
@kring
kring Sep 22, 2015 Member

Doc is wrong.

@stevage
stevage Sep 23, 2015 Contributor

Err, which bit?

@kring
kring Sep 23, 2015 Member

The doc says (said?) it's a URL, but it's actually a Terria instance.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProviderList.js
+* 6. Recolor each tile, using the unique colour to determine its ID, then replacing that unique colour with a mapped color.
+*
+* Voilà - very fast client-side choroplething with no user data sent to servers, and no vector boundaries sent to client. */
+
+
+/**
+ RegionProviderList encapsulates the regionMapping.json file and provides support for choosing the best region
+ provider for a given dataset.
+*/
+
+var RegionProviderList = function(terria) {
+ /**
+ * URL pointing to a JSON file where the region provider definitions will be loaded from.
+ * @type {String}
+ */
+ this.terria = terria;
@kring
kring Sep 22, 2015 Member

This isn't used for anything other than getting the regionMappingDefinitionsUrl. RegionProvider doesn't really use Terria or any other models, either.

So I think it would make sense to move both RegionProvider and RegionProviderList to Map.

Then, change RegionProviderList to take the JSON region definition (like initFromObject), remove the _initialized logic entirely, and add a static fromRegionMappingDefinitionsUrl function that returns a promise for a RegionProviderList. fromRegionMappingDefinitionsUrl can also cache the loaded instance so that multiple CsvCatalogItems only need to load it once.

@stevage
stevage Sep 22, 2015 Contributor

Great ideas, done.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
test/Models/CsvCatalogItemSpec.js
});
- it('is correctly loading csv data from text', function() {
- expect(csvItem instanceof CatalogItem).toBe(true);
+ it('throws an error on non-csv file', function(done) {
+ csvItem.url = 'test/GeoJSON/polygon.topojson';
+ expect(true).toBe(true);
@kring
kring Sep 22, 2015 Member

Remove this.

@stevage
stevage Sep 23, 2015 Contributor

Done.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
test/Models/CsvCatalogItemSpec.js
});
- it('returns an error on non-csv text', function() {
- expect(csvItem instanceof CatalogItem).toBe(true);
+ it('matches LGAs by code', function(done) {
+ csvItem.updateFromJson( { data: 'lga_code,value\n31000,1' });
+ csvItem.load().then(function() {
+ expect(csvItem._regionMapped).toBe(true);
+ expect(csvItem.colorFunc).toBeDefined();
+ // 242 is the shapefile index of LGA boundary 31000. What a crappy way to test...
@kring
kring Sep 22, 2015 Member

Can you look at the created entities maybe? expect(csvItem._tableDataSource.czmlDataSource.entites.values[0].point.color.getValue()).not.toEqual(new Color(0,0,0,0)). Still pretty ugly but at least there's no magic 242 involved.

@kring
kring Sep 22, 2015 Member

Actually that can be slightly simplified:
expect(csvItem._tableDataSource.entites.values[0].point.color.getValue()).not.toEqual(new Color(0,0,0,0))

@stevage
stevage Sep 23, 2015 Contributor

Done.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
test/Models/CsvCatalogItemSpec.js
});
+ it('matches LGAs by names in various formats', function(done) {
+ csvItem.updateFromJson( { data: 'lga_name,value\nCity of Melbourne,1\nGreater Geelong,2\nSydney (S),3' });
+ csvItem.load().then(function() {
+ expect(csvItem._regionMapped).toBe(true);
+ expect(csvItem.colorFunc).toBeDefined();
+ expect(csvItem.colorFunc(121)).not.toEqual([0,0,0,0]);
+ expect(csvItem.colorFunc(180)).not.toEqual([0,0,0,0]);
+ expect(csvItem.colorFunc(197)).not.toEqual([0,0,0,0]);
+ }).otherwise(fail).then(function() { done(); });
@kring
kring Sep 22, 2015 Member

No reason for the anonymous function, right?

@stevage
stevage Sep 22, 2015 Contributor

Right. I was going a little crazy trying to get Jasmine and (our version of) When to play together.

@kring kring commented on an outdated diff Sep 22, 2015
test/Models/CsvCatalogItemSpec.js
+ expect(csvItem._regionMapped).toBe(true);
+ expect(csvItem.colorFunc).toBeDefined();
+ expect(csvItem.colorFunc(121)).not.toEqual([0,0,0,0]);
+ expect(csvItem.colorFunc(180)).not.toEqual([0,0,0,0]);
+ expect(csvItem.colorFunc(197)).not.toEqual([0,0,0,0]);
+ }).otherwise(fail).then(function() { done(); });
+
+ });
+
+ it('matches SA4s', function(done) {
+ csvItem.updateFromJson( { data: 'sa4,value\n209,correct' });
+ csvItem.load().then(function() {
+ expect(csvItem._regionMapped).toBe(true);
+ expect(csvItem.colorFunc).toBeDefined();
+ expect(csvItem.rowProperties(209).value).toBe('correct');
+ }).otherwise(fail).then(function() { done(); });
@kring
kring Sep 22, 2015 Member

Same here.

@kring kring commented on an outdated diff Sep 22, 2015
test/Models/CsvCatalogItemSpec.js
+ csvItem.url = 'test/csv/lat_lon_enum_moving_date_unsorted.csv';
+ csvItem.load().then(function() {
+ var j = JulianDate.fromIso8601;
+ var source = csvItem._tableDataSource;
+ expect(source.dataset.getRowCount()).toEqual(13);
+ expect(csvItem._regionMapped).toBe(false);
+ expect(source.dataset.hasTimeData()).toBe(true);
+ expect(source.getDataPointList(j('2015-07-31')).length).toBe(0);
+ expect(source.getDataPointList(j('2015-08-01')).length).toBe(2);
+ expect(source.getDataPointList(j('2015-08-02')).length).toBe(3);
+ expect(source.getDataPointList(j('2015-08-06')).length).toBe(2);
+ expect(source.getDataPointList(j('2015-08-07')).length).toBe(0);
+ }).otherwise(fail).then(done);
+ });
+ /*
+ Nope - I don't know how feature picking on lat-longs works.
@kring
kring Sep 22, 2015 Member

Yeah, it's tricky. Lat-long files create actual vector points (attached to Cesium entities), which are then rendered and picked differently in the two clients:

  • In Leaflet, LeafletVisualizer turns the point into a circle marker, which is an actual browser DOM element. The browser detects a click on the circle marker DOM element and raises an event on LeafletScene that causes featurePicked in Leaflet to be invoked.
  • In Cesium, our Cesium.js handles left clicks by calling drillPick on Cesium's Scene. drillPick does one or more actual renders of the scene to an off-screen buffer in a fancy way so that the color of the rendered pixel indicates the ID of the vector feature that was picked.

Neither of these can be tested without the actual Leaflet / Cesium machinery up and running.

So, instead of testing picking, I recommend that you simply check that entities with the correct properties were created (one of my comments above has an example of checking the color of an entity's point) and trust that correctly-created entities will be rendered and will be pickable.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
test/Models/CsvCatalogItemSpec.js
@@ -180,7 +484,8 @@ describe('CsvCatalogItem', function() {
done();
});
});
-
+ /*
+ // Removed: not clear that this is correct behaviour, and it's failing.
@kring
kring Sep 22, 2015 Member

I don't know if transparent black is the right behavior.... but it's certainly possible to have a CSV file where the value column for most rows has a value, but it's missing for a few. So what should happen in that case? Transparent black (i.e. don't show the point or region) seems reasonable, but maybe there are other reasonable options. In any case, being clear about what happens and writing a test for it seems worthwhile.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
test/Models/RegionProviderListSpec.js
+
+/*global require,describe,it,expect,beforeEach,fail*/
+
+var Terria = require('../../lib/Models/Terria');
+var RegionProviderList = require('../../lib/Models/RegionProviderList');
+var RegionProvider = require('../../lib/Models/RegionProvider');
+var DataTable = require('../../lib/Map/DataTable.js');
+var terria;
+var rpl;
+
+beforeEach(function() {
+ terria = new Terria({
+ baseUrl: './',
+ regionMappingDefinitionsUrl: 'test/csv/regionMapping.json',
+ });
+ terria.corsProxy.baseProxyUrl = ""; // there is no localhost:3002/proxy, so ...
@kring
kring Sep 22, 2015 Member

Is this necessary because we're calling corsProxy.getURL directly somewhere, so it's always trying to proxy even when it shouldn't?

@stevage
stevage Sep 23, 2015 Contributor

Yep, removed.

@kring kring commented on an outdated diff Sep 22, 2015
test/Models/RegionProviderListSpec.js
+ baseUrl: './',
+ regionMappingDefinitionsUrl: 'test/csv/regionMapping.json',
+ });
+ terria.corsProxy.baseProxyUrl = ""; // there is no localhost:3002/proxy, so ...
+ rpl = new RegionProviderList(terria);
+});
+
+describe('RegionProviderList', function() {
+ it('loads some region providers', function(done) {
+ rpl.init().then(function(x) {
+ expect(rpl.regionProviders.length).toBeGreaterThan(2);
+ }).otherwise(fail).then(done);
+ });
+ /*
+ Test fails...and do we care anyway?
+ it('does not allow duplicate identifiers', function(done) {
@kring
kring Sep 22, 2015 Member

apparently JSON.parse doesn't fail if the JSON string contains duplicate keys. But if we change to using an array instead of an object in order to preserve order, it could be worthwhile (but not essential) to check for duplicate IDs.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/RegionProviderList.js
+};
+
+/**
+ Get the region provider matching a regionType string (eg, STE)
+ @param {String} regionType The type string to look up.
+ @return {Object} Region Provider, or null.
+*/
+RegionProviderList.prototype.getRegionProvider = function(regionType) {
+ if (!this._initialized) {
+ throw new DeveloperError('Region mapping list not initialized.');
+ }
+ var r = this.regionProviders.filter(function(p) { return p.regionType.toLowerCase() === regionType.toLowerCase(); });
+ if (r.length > 1)
+ throw new DeveloperError ('More than one definition of region provider: ' + regionType);
+ if (r.length === 0)
+ return null;
@kring
kring Sep 22, 2015 Member

We usually use undefined instead of null. JS conveniently has two "there's no object here" values. Using one consistently (undefined) means we don't have to check for both all over the place.

@stevage
stevage Sep 22, 2015 Contributor

Agreed. Changed.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Map/DataVariable.js
var hintSet = [
- { hints: ['lon'], type: VarType.LON },
- { hints: ['lat'], type: VarType.LAT },
- { hints: ['depth', 'height', 'elevation'], type: VarType.ALT },
- { hints: ['time', 'date'], type: VarType.TIME }];
-
- for (var vt in hintSet) {
- if (matchColumn(this.name, hintSet[vt].hints)) {
- this.varType = hintSet[vt].type;
+ { hint: /^(lon|longitude|lng)$/i, type: VarType.LON },
+ { hint: /^(lat|latitude)$/i, type: VarType.LAT },
+ { hint: /^(.*[_ ])?(depth|height|elevation)$/i, type: VarType.ALT },
+ { hint: /^(.*[_ ])?(time|date)$/i, type: VarType.TIME },
+ { hint: /^postcode|poa|(.*_code)$/i, type: VarType.ENUM }]; //## This doesn't actually prevent misparsing.
@kring
kring Sep 22, 2015 Member

Not sure what the comment means.

@stevage
stevage Sep 22, 2015 Contributor

I was trying to figure out why numeric codes were being coerced into integers. Figured it out. Fixed it. Comment gone now.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Map/TableDataSource.js
+ return undefined;
+ }
+ if (this._timeSlices) {
+ return this._timeSlices;
+ }
+ var pointList = this.dataset.getPointList();
+ var times = {};
+ for (var i = 0; i < pointList.length; i++) {
+ times[pointList[i].time.toString()] = true;
+ }
+ this._timeSlices = Object.keys(times).map(function(x) { return JulianDate.fromIso8601(x); } );
+
+ this._timeSlices = this._timeSlices.sort(function(a,b) {
+ if (a.equals(b))
+ return 0; // probably can't happen
+ return JulianDate.lessThan(a,b) ? -1 : 1;
@kring
kring Sep 22, 2015 Member

Just use JulianDate.compare.

@kring
kring Sep 22, 2015 Member

Well this is one way to remove duplicates, but it also round-trips every date through a string and allocates a bunch of memory and stuff.. How about something like this (not tested):

this._timeSlices = pointList.map(function(point) { return item.time; });
this._timeSlices.sort(JulianDate.compare);
this._timeSlices = this._timeSlices.filter(function(element, index, array) { return index === 0 || !JulianDate.equals(array[index-1], element); });

The filter creates an extra copy still (the elements could be removed in place) but it's still faster and clearer that the original implementation I think.

@stevage
stevage Sep 22, 2015 Contributor

Yep, that's better, fixed.

@kring kring commented on the diff Sep 22, 2015
lib/Map/TableDataSource.js
+ times[pointList[i].time.toString()] = true;
+ }
+ this._timeSlices = Object.keys(times).map(function(x) { return JulianDate.fromIso8601(x); } );
+
+ this._timeSlices = this._timeSlices.sort(function(a,b) {
+ if (a.equals(b))
+ return 0; // probably can't happen
+ return JulianDate.lessThan(a,b) ? -1 : 1;
+ });
+ return this._timeSlices;
+};
+
+/**
+ * Returns a { start, finish } pair of JulianDates corresponding to some pair of rows around this time.
+ */
+TableDataSource.prototype.getTimeSlice = function(time) {
@kring
kring Sep 22, 2015 Member

Have you seen Cesium's TimeInterval and TimeIntervalCollection types? I think they may simplify this logic quite a bit.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/CkanCatalogGroup.js
@@ -329,8 +329,7 @@ CkanCatalogGroup.prototype._load = function() {
var promises = [];
for (var i = 0; i < this.filterQuery.length; i++) {
- var url = cleanAndProxyUrl(this, this.url) + '/api/3/action/package_search?rows=100000&' + this.filterQuery[i];
-
+ var url = cleanAndProxyUrl( this.terria, this.url) + 'api/3/action/package_search?rows=100000&' + this.filterQuery[i];
@kring
kring Sep 22, 2015 Member

Is this a bad merge? I believe the original code is correct and the new code is wrong.

@stevage
stevage Sep 22, 2015 Contributor

Thanks, yep. The change from /api to api/ is intentional - totally unrelated to this branch, but necessary (vanilla CKANs don't like foo//api URLs).

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/CsvCatalogItem.js
var WebMapServiceImageryProvider = require('terriajs-cesium/Source/Scene/WebMapServiceImageryProvider');
var WebMapServiceCatalogItem = require('./WebMapServiceCatalogItem');
var WebMercatorTilingScheme = require('terriajs-cesium/Source/Core/WebMercatorTilingScheme');
var when = require('terriajs-cesium/Source/ThirdParty/when');
var CatalogItem = require('./CatalogItem');
-var corsProxy = require('../Core/corsProxy');
+//var corsProxy = require('../Core/corsProxy'); // really not used?
@kring
kring Sep 22, 2015 Member

Yes, because it uses proxyCatalogItemUrl now. Please remove this line.

@stevage
stevage Sep 22, 2015 Contributor

Done.

@kring kring and 1 other commented on an outdated diff Sep 22, 2015
lib/Models/CsvCatalogItem.js
this._regionMapped = false;
this._csvDataset = undefined;
this._clockTickUnsubscribe = undefined;
+ // a function that returns a colour for a given index.
+ this.colorFunc = undefined;
@kring
kring Sep 22, 2015 Member

I don't think this should be a public property. Public properties get serialized for sharing, for one thing.

@stevage
stevage Sep 22, 2015 Contributor

Is renaming it to _colorFunc sufficient?

@kring
kring Sep 22, 2015 Member

Yep. The serialization code ignores properties that start with an underscore.

@stevage stevage Add disambiguation CSV test. (Not very diagnostic yet.)
92f1169
@kring
Member
kring commented Sep 22, 2015

Previously lat-long CSVs didn't support enums - which I think they ought to. The question now is really, "for how many distinct values does colour coding not make sense for enums" or "are there fields which should not be colour coded on"?

So, yes, it's intentional, but there are cases where the new behaviour is undesirable.

Yeah I don't have any major problem with the new behavior, especially if it's possible to put something in the init file to make it draw all the points with a single color (is it?).

@kring
Member
kring commented Sep 22, 2015

Yeah. I suspect that the default behaviour was probably crafted around this dataset, or ones very similar to it. Options:

  • specify an explicit displayDuration value for this dataset
  • have a minimum display duration for any point

The second point sounds useful, but I'm ok with the first point being the answer for now.

@kring
Member
kring commented Sep 22, 2015

That's because you're testing in NM, which has a regionMapping.json which is incompatible. Try copying over the one from node_modules/terriajs/wwwroot/test/csv/regionMapping.json.

Thanks, it works now. However, I notice that loading SEIFA 2011 now locks up the UI for several seconds while it loads. It used to load almost instantly.

@kring
Member
kring commented Sep 22, 2015

That's all I have for now. Thanks for doing this @stevage, it looks like a nice improvement!

stevage added some commits Sep 22, 2015
@stevage stevage Rejig RegionProviderList initialisation process, move to Maps/ as per @…
…kring's

excellent suggestion.
d773298
@stevage stevage Remove unworthy test. b23f6d5
@stevage stevage Fix previous commit.
ccec274
@stevage stevage Spec housekeeping.
c9ee8bb
@stevage
Contributor
stevage commented Sep 22, 2015

Yeah I don't have any major problem with the new behavior, especially if it's possible to put something in the init file to make it draw all the points with a single color (is it?).

Yes - manually confirmed, will add a test case. Will also update the sample file.

The second point sounds useful, but I'm ok with the first point being the answer for now.

Ok, will update this one too.
TerriaJS/nationalmap#114

stevage added some commits Sep 23, 2015
@stevage stevage Cleaner timeslice method (thanks @kring) 6b70e48
@stevage stevage Fix broken CKAN merge. 3297211
@stevage stevage Cleanup. c902e4a
@stevage stevage Rename colorFunc to _colorFunc 51001d2
@stevage stevage Make tests expect undefined, not null. c4b3f1a
@stevage stevage Comment fields in RegionProviderList
d14b844
@stevage stevage JSDoc b2e3ba6
@stevage stevage Never proxy region provider WFS requests. 23c4467
@stevage stevage Simpler region provider replacement cache keys, thanks @kring. ff80928
@stevage stevage Simplify getColorLookupFunc and remove dodgy comments.
fb8d832
@stevage
Contributor
stevage commented Sep 23, 2015

Just keeping track of comments/issues not resolved yet.

High priority to fix

  • DONE the data structure for disambigColumns is unnecessarily convoluted.
  • DONE broken test case for fuzzy-matched feature picking

Should be fixed, but not as urgent

  • SEMI-DONE #924 (diff) "But it does rely on the WFS getPropertyValue returning the same features in the same order in two successive calls to getPropertyValue. Is there a better way to do it?"
    -- it makes that assumption much more explicit now. Any alternative way would generate more traffic.
  • "Also can you come up with clearer names than id and code so that it's clearer that id is the actual ID of a region on the server and code is a value in a column that hopefully matches one of the known region IDs?"
  • DONE RegionProvider shouldn't generate HTML. #924 (diff)
  • DONE #924 (comment) It seems a bit awkward for each region provider to keep a reference back to the list of providers that contains it just so that it can go find its disambiguation provider.
  • DONE RegionProviderList doc #924 (comment)
  • DONE "However, I notice that loading SEIFA 2011 now locks up the UI for several seconds while it loads. It used to load almost instantly."

Nice to have

  • #924 (comment) >Have you seen Cesium's TimeInterval and TimeIntervalCollection types? I think they may simplify this logic quite a bit.
  • DONE How to test lat-long feature picking: #924 (diff)
  • Behaviour of no-data rows in region-mapped CSVs #928
  • DONE Simplify disambig stuff: #924 (comment)
  • DONE disambig-related TODO #924 (comment)
stevage added some commits Sep 23, 2015
@stevage stevage Remove test cruft 6120d45
@stevage stevage Regex performance tweak.
00d71d4
@stevage
Contributor
stevage commented Sep 23, 2015

However, I notice that loading SEIFA 2011 now locks up the UI for several seconds while it loads. It used to load almost instantly.

That's annoying. It's not bad on my system (just over 1 second). Probably there are opportunities to streamline the matching process.

@kring
Member
kring commented Sep 23, 2015

That's annoying. It's not bad on my system (just over 1 second). Probably there are opportunities to streamline the matching process.

It was closer to 5 seconds on my system, but I tried again just now and it's more like 2-3 seconds. Maybe you optimized it a bit? In any case, this is only the first time it's enabled after a fresh reload; after that it's fast. Could that be why you're seeing only 1 second? This machine isn't exactly slow...

@stevage
Contributor
stevage commented Sep 23, 2015

Oh, another possible reason is that the test regionMapping.json uses our regionmap-dev. server, which I think is less powerful than our geoserver. server, which Peter recently upgraded.

@kring
Member
kring commented Sep 23, 2015

Oh, another possible reason is that the test regionMapping.json uses our regionmap-dev. server, which I think is less powerful than our geoserver. server, which Peter recently upgraded.

I don't think so. My browser is locked up during the 2-3 seconds, which means it's actively doing something CPU intensive, not waiting for a remote server.

@kring
Member
kring commented Sep 28, 2015

Looks like sharing doesn't work with (some?) CsvCatalogItems at the moment. Add this file by URL using the Add Data panel: test/NSW_LGA_NEXIS_201212.csv and then hit the share button.

stevage added some commits Sep 28, 2015
@stevage stevage Simplify disambig matching back to a single column. 77daa85
@stevage stevage Remove logging e382050
@stevage stevage Tweak definition of 'enum' - region variables aren't 'ENUM' but they …
…have enumList
775cbcb
@stevage stevage Record region unique IDs. c8d9305
@stevage stevage Move region feedback out of RegionProvider 98ecd9c
@stevage stevage Improve tests. 0839584
@stevage stevage Clean up RegionProvider.
c66e6c6
@stevage stevage Fix provider feedback 13c4e98
@stevage stevage Remove back reference to RegionProviderList. Also fixes CSV sharing i…
…ssue.
37da09d
@stevage
Contributor
stevage commented Sep 29, 2015

Looks like sharing doesn't work with (some?) CsvCatalogItems at the moment.

Caused by circular object, fortunately fixed by removing back reference to RegionProviderList from RegionProvider.

@stevage stevage Fix dumb mistake.
54b5b33
@kring
Member
kring commented Sep 29, 2015

Caused by circular object, fortunately fixed by removing back reference to RegionProviderList from RegionProvider.

Hmm I'd argue we shouldn't even be trying to serialize RegionProvider and RegionProviderList for sharing. Is there a public property somewhere that shouldn't be public?

@stevage stevage Remove regionProvider refrerence from csvItem._tableStyle to csvItem …
…istelf, to avoid

it being serialised in horrible ways.
5637fb0
@stevage
Contributor
stevage commented Sep 29, 2015

Hmm I'd argue we shouldn't even be trying to serialize RegionProvider and RegionProviderList for sharing. Is there a public property somewhere that shouldn't be public?

Fixed.

stevage added some commits Sep 29, 2015
@stevage stevage Add test for serialized JSON length.
e4670b6
@stevage stevage Never serialize CsvCatalogItem.legendUrl
bc33b21
@stevage stevage Add a couple of CsvCatalogItem tests, clean up.
368ea3d
@stevage stevage Lint.
1ac47c8
@stevage stevage Remove TODO
2a356d8
@stevage stevage Update CHANGES.md
d66843d
@kring
Member
kring commented Sep 30, 2015

There are still performance problems here. Enable the Age layer under ABS, and then switch to SA2. On my system the app freezes for about 5 seconds. On nationalmap.gov.au there is no freezing.

@kring
Member
kring commented Sep 30, 2015

Here's where the time is spent:
image

@kring
Member
kring commented Oct 1, 2015

Some ideas for making codeMatchesRegionID much faster:

  • Don't do trim, toLowerCase, etc. any more than necessary. Those operations add up because they make a copy of the string.
  • When IDs are numeric (as they are in the SAx case), don't convert them to strings. Comparing numbers is much faster than comparing strings.
@kring
Member
kring commented Oct 1, 2015

And one more:

  • Construct RegExp instances once. Currently a new instance is constructed for each regex string for every attempted match. Unless the browser is being clever, this should result in a huge speedup. Even if the browser is clever, it'll still be noticeably faster.

This one won't help with the ABS case, though, because there are no replacements.

stevage added some commits Oct 1, 2015
@stevage stevage Add more to CHANGES.md
47e53f7
@stevage stevage Merge remote-tracking branch 'origin/master' into region-mapping-adva…
…nced-matching

Conflicts:
	lib/Map/DataTable.js
5b0dedf
@stevage stevage Merge branch 'cesiumUpgrade' into region-mapping-advanced-matching
* cesiumUpgrade:
  Removed unused var.
  Make handleInitialMessage() actually call the callback if no message is to be shown.
  Update CHANGES.md.
  Use terriajs-cesium 1.13.0.
010fdde
@stevage stevage Make broken test case more permissive. 475739f
@stevage stevage Make disabled spec a "pending" spec. fee1711
@stevage stevage Correct library location.
9b56708
@stevage stevage Merge remote-tracking branch 'origin/master' into region-mapping-adva…
…nced-matching
82a15e8
@stevage
Contributor
stevage commented Oct 5, 2015

There are still performance problems here. Enable the Age layer under ABS, and then switch to SA2. On my system the app freezes for about 5 seconds.

This is intriguing to me - I don't get this on my Macbook Pro. I enable the layer, and immediately grab the map and start panning around. It takes maybe 2-3 seconds for all the SA2s to display, but the app is responsive and panning during that time. There are two brief glitches (<0.5 seconds) when you could say it's "frozen", but nothing like what you're seeing.

I'll still try to fix it. :)

stevage added some commits Oct 6, 2015
@stevage stevage Create RegExp objects at load time for performance. d61969e
@stevage stevage Use much simpler replacements cache, maybe for performance. 1c13b51
@stevage stevage Speed up RegionProvider significantly by applying replacements to eac…
…h data row only once.
156a9f3
@stevage
Contributor
stevage commented Oct 6, 2015

Ok, the really inefficient bit was applying all the replacements to each region code every time it was checked against each server side ID. That was dumb.

Currently on my Macbook, the whole call to updateRegionMapping takes 150ms for the ABS SA2 case. And the whole block of loadRegionsFromXML + updateRegionMapping + finishTableLoad (everything under loadWithXhr.load.xhr.onload) takes 215ms.

The equivalent running on a fresh master build on my machine is actually slower: 250ms. Significantly faster running on nationalmap.research.nicta.com.au, which suggests that the gulp release process is actually doing something good :)

@stevage
Contributor
stevage commented Oct 6, 2015

Don't do trim, toLowerCase, etc. any more than necessary. Those operations add up because they make a copy of the string.

Removed a few of these.

When IDs are numeric (as they are in the SAx case), don't convert them to strings. Comparing numbers is much faster than comparing strings.

"Numeric" IDs come out of xml2json as strings. It's possible to convert them there to integers, but there are messy edge cases (leading zeroes) to deal with. Leaving this out for now.

Construct RegExp instances once.

Done.

Thanks very much for these tips btw, I'm definitely learning a lot about writing less sucky JavaScript :)

@RacingTadpole RacingTadpole referenced this pull request Oct 14, 2015
Closed

2 tests (specs) fail #970

kring added some commits Oct 15, 2015
@kring kring Update CHANGES.md.
0878d30
@kring kring Merge remote-tracking branch 'origin/master' into region-mapping-adva…
…nced-matching
52eb480
@kring kring merged commit a63619c into master Oct 15, 2015

2 of 4 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details
clahub All contributors have signed the Contributor License Agreement.
Details
licence/cla Contributor License Agreement is signed.
Details
@kring kring deleted the region-mapping-advanced-matching branch Oct 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment