
Region mapping advanced matching #924

Merged
merged 105 commits into from Oct 15, 2015

2 participants
@stevage
Contributor

stevage commented Sep 21, 2015

Significant refactoring of the region mapping code. Major benefits:

  • Region mapping code split out from CsvCatalogItem into RegionProvider, RegionProviderList, DataVariable, DataTable and TableDataSource
  • Added functionality: matching by regex (enabling matching on LGA names) and disambiguation using a second column
  • Dozens of new unit tests
  • Fixed many edge cases, particularly with drag-and-drop CSV files (which lack explicit styling)
  • Temporal datasets work much better: by default, each data point is shown until the next time point
  • The new architecture will make it much easier to add new region provider types
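One way to read the default temporal behaviour described above, as a sketch: sort the distinct time points and give each one a display interval that ends where the next one begins. `computeDisplayIntervals` and `activeInterval` are hypothetical names, and leaving the last interval open-ended is an assumption, not necessarily what TableDataSource does.

```javascript
'use strict';

// Hypothetical helper: given the distinct time points of a dataset (as
// milliseconds since the epoch), return the [start, end) interval during
// which each point is displayed. Each point is shown until the next time
// point; the last point has no successor, so its interval is open-ended.
function computeDisplayIntervals(times) {
  // Sort numerically without mutating the caller's array.
  const sorted = times.slice().sort((a, b) => a - b);
  return sorted.map((start, i) => ({
    start: start,
    end: i + 1 < sorted.length ? sorted[i + 1] : null
  }));
}

// Return the interval that is active at time t, or null if t precedes
// every time point.
function activeInterval(intervals, t) {
  for (const interval of intervals) {
    if (t >= interval.start && (interval.end === null || t < interval.end)) {
      return interval;
    }
  }
  return null;
}

// Usage: census-style data points, deliberately out of order.
const intervals = computeDisplayIntervals([
  Date.UTC(2011, 0, 1),
  Date.UTC(2006, 0, 1),
  Date.UTC(2016, 0, 1)
]);
const shown = activeInterval(intervals, Date.UTC(2013, 5, 1));
// shown.start is 2011-01-01 and shown.end is 2016-01-01
```

An alternative design would close the last interval at the dataset's overall end time instead of leaving it open.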

stevage added some commits Jul 1, 2015

Work in progress supporting more sophisticated matches. In particular:
- fuzzy matches ('City of Blah' = 'Blah (C)'), through regexes
- multi-column matches (State+LGA for disambiguation)
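A minimal sketch of the two matching ideas in this commit, assuming a hypothetical list of regex replacement rules that reduce both spellings to one canonical name, plus a hypothetical `state` column that is consulted only when a name is ambiguous. `REPLACEMENTS`, `canonicalName` and `matchRegion` are illustrative and are not the RegionProvider API.

```javascript
'use strict';

// Hypothetical replacement rules, applied in order, that reduce different
// spellings of an LGA name to one canonical form, e.g.
// 'City of Blah' -> 'blah' and 'Blah (C)' -> 'blah'.
const REPLACEMENTS = [
  [/^city of /i, ''],
  [/\s*\((c|s|rc|t)\)$/i, ''] // strip suffixes such as "(C)" or "(RC)"
];

function canonicalName(name) {
  let result = name.trim().toLowerCase();
  for (const [pattern, replacement] of REPLACEMENTS) {
    result = result.replace(pattern, replacement);
  }
  return result;
}

// Hypothetical server-side region records. Two LGAs share the name "Blah",
// so the name alone is ambiguous and a second column (state) is needed.
const REGIONS = [
  { id: 101, name: 'Blah (C)', state: 'VIC' },
  { id: 202, name: 'Blah (S)', state: 'QLD' },
  { id: 303, name: 'Other (RC)', state: 'NSW' }
];

// Return the id of the region matching `name`, using `state` only when the
// name alone matches more than one region. Returns null if no unique match.
function matchRegion(name, state, regions) {
  const target = canonicalName(name);
  let candidates = regions.filter(r => canonicalName(r.name) === target);
  if (candidates.length > 1 && state !== undefined) {
    candidates = candidates.filter(r => r.state === state);
  }
  return candidates.length === 1 ? candidates[0].id : null;
}
```

For example, `matchRegion('City of Blah', 'VIC', REGIONS)` resolves to 101, while `matchRegion('Blah', undefined, REGIONS)` is ambiguous and returns null. Note that this version re-canonicalises every region name on each call, which is the kind of repeated work that becomes expensive with thousands of regions.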
Merge branch 'master' into region-mapping-advanced-matching
Conflicts:
  lib/Models/CsvCatalogItem.js
  lib/Styles/PopupMessage.less
More selective data column type guessing based on column headings. Fixes
an issue where the "lone_person" column was classified as longitude.
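A guess at how "lone_person" could be taken for longitude: a plain substring test for "lon" matches it. The sketch below compares whole tokens of the heading against a hypothetical alias list instead; it is an illustration, not the actual TerriaJS heading logic.

```javascript
'use strict';

// Naive guess: any heading containing "lon" is treated as longitude,
// which wrongly classifies "lone_person".
function isLongitudeNaive(heading) {
  return heading.toLowerCase().indexOf('lon') !== -1;
}

// More selective guess (hypothetical alias list): split the heading into
// tokens on non-alphanumeric characters and require an exact token match.
const LONGITUDE_ALIASES = ['lon', 'lng', 'longitude'];

function isLongitude(heading) {
  const tokens = heading.toLowerCase().split(/[^a-z0-9]+/);
  return tokens.some(token => LONGITUDE_ALIASES.indexOf(token) !== -1);
}
```

Leaving "long" out of the alias list avoids a similar false positive on headings such as "long_term_unemployed", at the cost of not recognising a bare "long" column.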
Contributor

stevage commented Sep 29, 2015

Looks like sharing doesn't work with (some?) CsvCatalogItems at the moment.

Caused by a circular object reference; fortunately fixed by removing the back reference from RegionProvider to RegionProviderList.
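The failure can be reproduced in miniature, assuming the share state is produced with something like `JSON.stringify`: a provider that points back at its list makes the object graph cyclic, and `JSON.stringify` throws a `TypeError` on cycles. The object shapes and the `regionProviderList` property name are simplified, hypothetical stand-ins for the real classes.

```javascript
'use strict';

// A provider list holds providers; before the fix, each provider also held
// a back reference to its list (shapes here are hypothetical and simplified).
function makeProviders(withBackReference) {
  const list = { providers: [] };
  const provider = { regionType: 'SA2' };
  if (withBackReference) {
    provider.regionProviderList = list; // cycle: list -> provider -> list
  }
  list.providers.push(provider);
  return list;
}

// Returns true if the object can be serialised with JSON.stringify.
function canSerialize(obj) {
  try {
    JSON.stringify(obj);
    return true;
  } catch (e) {
    // JSON.stringify throws a TypeError on circular structures.
    if (e instanceof TypeError) {
      return false;
    }
    throw e;
  }
}
```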

Member

kring commented Sep 29, 2015

> Caused by circular object, fortunately fixed by removing back reference to RegionProviderList from RegionProvider.

Hmm I'd argue we shouldn't even be trying to serialize RegionProvider and RegionProviderList for sharing. Is there a public property somewhere that shouldn't be public?

Remove regionProvider reference from csvItem._tableStyle to csvItem itself, to avoid it being serialised in horrible ways.
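Along the lines of kring's suggestion, a sketch of a serializer that copies only public properties, assuming the leading-underscore convention (as in `csvItem._tableStyle`) marks private state. `serializePublic` is hypothetical and is not the TerriaJS share code.

```javascript
'use strict';

// Hypothetical share serializer: copy only "public" own properties,
// treating a leading underscore as the marker for private state. Values are
// copied shallowly; public values are assumed to be plain, acyclic data.
function serializePublic(item) {
  const result = {};
  for (const key of Object.keys(item)) {
    if (key.charAt(0) !== '_') {
      result[key] = item[key];
    }
  }
  return result;
}

// Example item whose private style holds a cyclic reference back to it.
const csvItem = { name: 'Age', url: 'age.csv' };
csvItem._tableStyle = { regionProvider: { owner: csvItem } };
```

The same filter can also be passed to `JSON.stringify` as a replacer, `(key, value) => key.charAt(0) === '_' ? undefined : value`, because object properties for which the replacer returns `undefined` are omitted and never traversed.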
Contributor

stevage commented Sep 29, 2015

> Hmm I'd argue we shouldn't even be trying to serialize RegionProvider and RegionProviderList for sharing. Is there a public property somewhere that shouldn't be public?

Fixed.

stevage added some commits Sep 29, 2015

Member

kring commented Sep 30, 2015

There are still performance problems here. Enable the Age layer under ABS, and then switch to SA2. On my system the app freezes for about 5 seconds. On nationalmap.gov.au there is no freezing.

Member

kring commented Sep 30, 2015

Here's where the time is spent:
[screenshot not included]

Member

kring commented Oct 1, 2015

Some ideas for making codeMatchesRegionID much faster:

  • Don't do trim, toLowerCase, etc. any more than necessary. Those operations add up because they make a copy of the string.
  • When IDs are numeric (as they are in the SAx case), don't convert them to strings. Comparing numbers is much faster than comparing strings.
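A sketch of the first tip, using hypothetical `findRegionSlow`, `normalizeIds` and `findRegionFast` helpers rather than the real `codeMatchesRegionID`: hoist the `trim`/`toLowerCase` work out of the comparison loop so each region ID is normalised once when loaded, instead of once per CSV row.

```javascript
'use strict';

// Slow pattern: both strings are trimmed and lower-cased inside every
// comparison, so each lookup allocates new strings for every region ID.
function findRegionSlow(code, regionIds) {
  for (let i = 0; i < regionIds.length; i++) {
    if (code.trim().toLowerCase() === regionIds[i].trim().toLowerCase()) {
      return i;
    }
  }
  return -1;
}

// Faster pattern: normalise the region IDs once, when they are loaded...
function normalizeIds(regionIds) {
  return regionIds.map(id => id.trim().toLowerCase());
}

// ...and normalise each CSV code once per lookup, not once per comparison.
function findRegionFast(code, normalizedIds) {
  const key = code.trim().toLowerCase();
  return normalizedIds.indexOf(key);
}
```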
Member

kring commented Oct 1, 2015

And one more:

  • Construct RegExp instances once. Currently a new instance is constructed for each regex string for every attempted match. Unless the browser is being clever, this should result in a huge speedup. Even if the browser is clever, it'll still be noticeably faster.

This one won't help with the ABS case, though, because there are no replacements.
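A sketch of the RegExp tip, assuming replacement rules arrive as plain string pairs (for example from a JSON config); the rule list and function names are hypothetical.

```javascript
'use strict';

// Hypothetical replacement rules as plain [pattern, replacement] strings.
const REPLACEMENT_STRINGS = [
  ['^city of ', ''],
  [' \\(c\\)$', '']
];

// Slow: a new RegExp is constructed for every rule on every call.
function applySlow(value) {
  let result = value.toLowerCase();
  for (const [pattern, replacement] of REPLACEMENT_STRINGS) {
    result = result.replace(new RegExp(pattern), replacement);
  }
  return result;
}

// Fast: compile each pattern once, up front, and reuse the RegExp objects.
const COMPILED = REPLACEMENT_STRINGS.map(
  ([pattern, replacement]) => [new RegExp(pattern), replacement]
);

function applyFast(value) {
  let result = value.toLowerCase();
  for (const [regex, replacement] of COMPILED) {
    result = result.replace(regex, replacement);
  }
  return result;
}
```

Sharing compiled regexes is fine with `replace`; if the rules used the `g` flag with `test()` or `exec()`, the stateful `lastIndex` would need resetting between calls.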

stevage added some commits Oct 1, 2015

Merge remote-tracking branch 'origin/master' into region-mapping-advanced-matching

Conflicts:
	lib/Map/DataTable.js
Merge branch 'cesiumUpgrade' into region-mapping-advanced-matching
* cesiumUpgrade:
  Removed unused var.
  Make handleInitialMessage() actually call the callback if no message is to be shown.
  Update CHANGES.md.
  Use terriajs-cesium 1.13.0.
Contributor

stevage commented Oct 5, 2015

> There are still performance problems here. Enable the Age layer under ABS, and then switch to SA2. On my system the app freezes for about 5 seconds.

This is intriguing to me: I don't get this on my MacBook Pro. I enable the layer, and immediately grab the map and start panning around. It takes maybe 2-3 seconds for all the SA2s to display, but the app stays responsive and I can pan during that time. There are two brief glitches (<0.5 seconds) when you could say it's "frozen", but nothing like what you're seeing.

I'll still try to fix it. :)

Contributor

stevage commented Oct 6, 2015

Ok, the really inefficient bit was applying all the replacements to each region code every time it was checked against each server side ID. That was dumb.

Currently, on my MacBook, the whole call to updateRegionMapping takes 150ms for the ABS SA2 case, and the whole block of loadRegionsFromXML + updateRegionMapping + finishTableLoad (everything under loadWithXhr.load.xhr.onload) takes 215ms.

The equivalent running on a fresh master build on my machine is actually slower: 250ms. It's significantly faster on nationalmap.research.nicta.com.au, which suggests that the gulp release process is actually doing something good :)
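To make the cost of the original approach concrete, a sketch with a hypothetical call counter: re-applying the replacements inside the inner loop does regex work for every (CSV code, server-side ID) pair, while processing each ID once into a `Map` does it once per ID plus once per code. The rules, data and function names are illustrative assumptions.

```javascript
'use strict';

// Hypothetical replacement rules (regex literals, compiled once).
const REPLACEMENTS = [[/^city of /i, ''], [/\s*\(c\)$/i, '']];

let replaceCalls = 0; // instrumentation: number of replace() calls made

function applyReplacements(s) {
  let result = s.toLowerCase();
  for (const [pattern, replacement] of REPLACEMENTS) {
    replaceCalls++;
    result = result.replace(pattern, replacement);
  }
  return result;
}

// Before: replacements are re-applied for every (code, id) pair,
// i.e. O(codes * ids * rules) regex work.
function mapRegionsNaive(codes, ids) {
  return codes.map(code =>
    ids.findIndex(id => applyReplacements(id) === applyReplacements(code)));
}

// After: each server-side ID is processed once into a Map, and each CSV
// code once per lookup, i.e. O((codes + ids) * rules) regex work.
function mapRegionsIndexed(codes, ids) {
  const index = new Map();
  ids.forEach((id, i) => {
    const key = applyReplacements(id);
    if (!index.has(key)) {
      index.set(key, i); // keep the first match, as findIndex would
    }
  });
  return codes.map(code => {
    const i = index.get(applyReplacements(code));
    return i === undefined ? -1 : i;
  });
}

const ids = ['Blah (C)', 'Foo (C)', 'Bar (C)'];
const codes = ['City of Bar', 'City of Blah', 'Nowhere'];
```

With these three codes and three IDs, both versions return `[2, 0, -1]`, but the naive version makes 28 `replace` calls against 12 for the indexed one, and the gap grows with the product of rows and regions.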

Contributor

stevage commented Oct 6, 2015

> Don't do trim, toLowerCase, etc. any more than necessary. Those operations add up because they make a copy of the string.

Removed a few of these.

> When IDs are numeric (as they are in the SAx case), don't convert them to strings. Comparing numbers is much faster than comparing strings.

"Numeric" IDs come out of xml2json as strings. It's possible to convert them there to integers, but there are messy edge cases (leading zeroes) to deal with. Leaving this out for now.

> Construct RegExp instances once.

Done.

Thanks very much for these tips btw, I'm definitely learning a lot about writing less sucky JavaScript :)
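One way the leading-zero edge case mentioned above could be handled, as a sketch: convert an ID to a number only when the conversion is lossless, and otherwise keep the string. `toComparableId` is a hypothetical helper; whether this was worth doing for the SAx codes was left open in the thread.

```javascript
'use strict';

// Hypothetical helper: convert an ID string to a number only when doing so
// is lossless. "0123" -> 123 would make it compare equal to "123", and very
// long codes can exceed Number.MAX_SAFE_INTEGER, so both stay strings.
function toComparableId(id) {
  if (id === '0' || /^[1-9][0-9]*$/.test(id)) {
    const n = Number(id);
    if (Number.isSafeInteger(n)) {
      return n;
    }
  }
  return id; // not a canonical integer string: leave as a string
}
```

Both sides of a comparison must go through the same conversion, since `101021007 === '101021007'` is false.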

RacingTadpole referenced this pull request Oct 14, 2015 in #970, "2 tests (specs) fail" (Closed)

kring added some commits Oct 15, 2015

kring added a commit that referenced this pull request Oct 15, 2015

kring merged commit a63619c into master Oct 15, 2015

2 of 4 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
continuous-integration/travis-ci/push The Travis CI build is in progress
clahub All contributors have signed the Contributor License Agreement.
licence/cla Contributor License Agreement is signed.

kring deleted the region-mapping-advanced-matching branch Oct 15, 2015
