Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regionprovider Region ID fetching optimisation #1202

Merged
merged 9 commits into from Jan 5, 2016
Merged

Conversation

@stevage
Copy link
Contributor

@stevage stevage commented Dec 22, 2015

This PR includes two separate optimisations:

  1. Maintaining an index of pre-computed codes for fast matching
  2. Allowing the use of server-side JSON instead of fetching IDs from WFS. The WFS method is still available as a fallback.
stevage added a commit to TerriaJS/nationalmap that referenced this pull request Dec 22, 2015
loadText('test/csv/mini_ced.xml').then(function(xml) {
ced.loadRegionsFromXML(xml);
expect(ced.regions.length).toEqual(6);
ced.loadRegionIDs().then(function(json) {

This comment has been minimized.

@RacingTadpole

RacingTadpole Dec 23, 2015
Contributor

This isn't really a comment on your commit, but on the existing code - can you set it up so the unit tests didn't go off to a server at all, eg. using a spy? (eg. see ArcGisMapServerCatalogItemSpec)

This comment has been minimized.

@RacingTadpole

RacingTadpole Dec 23, 2015
Contributor

btw @kring's doing some cool work on this right now using sinon.fakeServer.create(), to solve the problem of failing unit tests on IE9 (which cannot load data from the region mapping server without a proxy).

This comment has been minimized.

@stevage

stevage Dec 23, 2015
Author Contributor

Yeah, good idea.

this.regionIdList = properties.regionIdList;



This comment has been minimized.

@RacingTadpole

RacingTadpole Dec 23, 2015
Contributor

I like the way Cesium uses jsdoc to specify which options are allowed, eg. https://github.com/TerriaJS/cesium/blob/master/Source/Core/CylinderGeometry.js#L53 - would it be worth doing this for properties here?

This comment has been minimized.

@stevage

stevage Dec 23, 2015
Author Contributor

OIC, documenting the properties as constructor parameters rather than on the individual statements that set them. I like it - but I think the style I followed is the one we mostly use. Thoughts, @kring ?

* be used in preference to requesting those attributes from the WFS server.
* @type {String}
*/
this.regionIdList = properties.regionIdList;

This comment has been minimized.

@RacingTadpole

RacingTadpole Dec 23, 2015
Contributor

A nit-picky suggestion - it's potentially confusing to have something whose name ends in List actually be a URL (ie a String) - could you change this to this.regionIdListUrl or something similar?

This comment has been minimized.

@RacingTadpole

RacingTadpole Dec 23, 2015
Contributor

Actually on closer examination, am I right that this URL returns a json object eg.

{"layer": "region_map:FID_SA1_2011...", "property": "SA1_MAIN11", "values": [10101, ...]}

? I'd have guessed something named regionIdsListUrl would return an array, not an object - or perhaps an object with a property called regionIdsList.
So, what would you say to renaming values to ids in the json, and renaming this to this.regionIdsUrl?
Of course, I realise I'm probably being too pedantic. :-)

This comment has been minimized.

@stevage

stevage Dec 23, 2015
Author Contributor

Interesting points. I had thought that the JSON spec required the outer most object to be an object, not an array, but it looks like it doesn't. It's still sort of frowned on in API design though - I prefer passing an object that has a tiny bit of context at least.

Personally I think it's ok to describe an object which consists of two strings and an array of thousands of values as "a list". It really is just a list with a tiny bit of metadata. Otherwise you'd have to call virtually everything an object - even a javascript Array is an object, right? :)

In any case, I'm not really proposing that the format of this JSON file be a public interface - it's a private format that is used in this one place, and by the script that generates it.

So, what would you say to renaming values to ids

This is really tricky. Depending on the exact context, you can call these things "values", "ids", "codes", "attributes" etc etc. I'm a bit uncomfortable calling them "ids" because for certain layers and attributes, they're English text strings like "Baw Baw (S)", which don't really have the character of "IDs" to me. At the time they're generated by the script, they're (agnostically) the value of a particular attribute on a particular layer. Later on, our code treats them as IDs, possibly too optimistically (as with ambiguous names like "Camperdown (C)").

and renaming this to this.regionIdsUrl

Technically, it's not a URL, because it can just be a relative path, data/regionids/foo.json. I think regionIdsFile would be more accurate.

(Ha, I think I out-nitpicked your nitpick :))

This comment has been minimized.

@stevage

stevage Dec 23, 2015
Author Contributor

Changed it to regionIdsFile

promises.push(loadText(url).then(function(xml) {
that.loadRegionsFromXML(xml, that.disambigProp, "disambigServerReplacements");
}));
var dp = this.loadRegionsFromWfs(this.disambigProp);

This comment has been minimized.

@RacingTadpole

RacingTadpole Dec 23, 2015
Contributor

Wondering about this line - as you say, loading from wfs is the slower option - is there a way to tell if the faster json option is available for the disambiguation variable? I imagine it's hard because this RegionProvider would need to know things about other regions.

This comment has been minimized.

@stevage

stevage Dec 23, 2015
Author Contributor

Heh, yeah. I was kind of over it by this point :) So, to be explicit, the current situation for disambiguation matching on LGA+State is:

  1. Load LGA name column, using JSON file
  2. Load LGA state column, without using JSON file
  3. Go to town.

I kind of dismissed it as an edge case, but I think you're right. There are enough LGAs to make this worthwhile.

This comment has been minimized.

@stevage

stevage Dec 23, 2015
Author Contributor

Oh yeah, now I remember - as you pointed out, "RegionProvider would need to know things about other regions". In a previous iteration of this RegionProvider thing, each RP actually did contain a reference to a whole separate RP, but that turned out to be a pretty flakey concept. A disambiguation field is just an additional field of the same boundary layer, many of the other attributes don't apply.

Fixed by adding regionDisambigIdList field, and cleaning up the code some more.

}

// store a lookup by attribute, for performance.
this._idIndex[prop] = i;
that._idIndex[id] = i;

This comment has been minimized.

@RacingTadpole

RacingTadpole Dec 23, 2015
Contributor

processRegionIds is potentially called twice, once for the regions and sometimes also for the disambiguation column (eg. if two different States can have regions with the same name, the disambiguation variable is the State). Since both cases are written into the same object that._idIndex, is there ever a risk that the disambiguation id might be the same as the region id, and overwrite it here?

This comment has been minimized.

@stevage

stevage Dec 23, 2015
Author Contributor

Interesting point! As the code stands, this is never an issue, because _idIndex isn't even consulted when disambiguation is in question (see findRegionIndex). By definition, this kind of a hash table isn't much use when you have ambiguous values.

But you've prompted me to finish the job of the index, so it's used for disambiguation, too. (The disambiguation property itself doesn't get an index, because it wouldn't be useful.)

@RacingTadpole
Copy link
Contributor

@RacingTadpole RacingTadpole commented Dec 23, 2015

OK, that's all I can think of! Over to you again :-)

@RacingTadpole
Copy link
Contributor

@RacingTadpole RacingTadpole commented Dec 23, 2015

Looks good! Let me know when you're ready for me to take another look.

@stevage
Copy link
Contributor Author

@stevage stevage commented Dec 24, 2015

Oh, yep, I'm ready - think I addressed everything.

kring added a commit that referenced this pull request Jan 5, 2016
Regionprovider Region ID fetching optimisation
@kring kring merged commit fa03cf3 into master Jan 5, 2016
2 of 3 checks passed
2 of 3 checks passed
continuous-integration/travis-ci/push The Travis CI build is in progress
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
licence/cla Contributor License Agreement is signed.
Details
@kring kring deleted the regionprovider-fidList branch Jan 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.