Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New module @turf/clean-coords #875

Merged
merged 19 commits into from
Aug 1, 2017
Merged

New module @turf/clean-coords #875

merged 19 commits into from
Aug 1, 2017

Conversation

stebogit
Copy link
Collaborator

@stebogit stebogit commented Jul 30, 2017

New module @turf/clean-coords

Implements #874

/**
 * Removes redundant coordinates from any GeoJSON Geometry.
 *
 * @name cleanCoords
 * @param {Geometry|Feature} geojson Feature or Geometry
 * @param {boolean} [mutate=false] allows GeoJSON input to be mutated
 * @returns {Geometry|Feature} the cleaned input Feature/Geometry
 * @example
 * var line = turf.lineString([[0, 0], [0, 2], [0, 5], [0, 8], [0, 8], [0, 10]]);
 * var multiPoint = turf.multiPoint([[0, 0], [0, 0], [2, 2]]);
 *
 * turf.cleanCoords(line).geometry.coordinates;
 * //= [[0, 0], [0, 10]]
 *
 * turf.cleanCoords(multiPoint).geometry.coordinates;
 * //= [[0, 0], [2, 2]]
 */
module.exports = function (geojson, mutate) {

@stebogit stebogit self-assigned this Jul 30, 2017
@DenisCarriere
Copy link
Member

DenisCarriere commented Jul 31, 2017

@stebogit Looking at this right now, I'll spend more time on it tomorrow, but would it be worth cleaning MultiPoint as well? points which have multiple points on top of each other should be cleaned?

* Removes redundant coordinates from a (Multi)LineString or (Multi)Polygon; ignores (Multi)Point.
*
* @name cleanCoords
* @param {Geometry|Feature<any>} geojson Feature or Geometry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been dropping the <any> in the recent JSDocs.
{Geometry|Feature<any>} => {Geometry|Feature}

throw new Error(type + ' geometry not supported');
}

var output = (mutate === true) ? geojson : clone(geojson, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit of "black magic" how clone works, but you don't need to define a new variable if you apply clone (aka. JSON.parse + stringify).

This below is valid:

if (mutate !== true) geojson = clone(geojson, false);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2nd - Since we aren't mutating properties we don't need to define clone(geojson, true) as TRUE, if we use the default or FALSE it will only clone the coordinates (which is a lot faster process).

Copy link
Collaborator Author

@stebogit stebogit Jul 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DenisCarriere that's right! Now that I looked better at it, @turf/clone does already clone (if defined) all the "standard" GeoJSON attributes (i.e. properties, id and bbox); that's what I was aiming for. Very well 👍

if (mutate !== true) geojson = clone(geojson, false);

I really like this, it's much nicer to look at, even if for some reason it's always really hard for me to decode... 🤔
I am a big fan of immutable data though, so I was trying to avoid changing the value of geojson. I guess however in this small modules mutate the data is not really a big deal, and the code gains in readability 👍👍

@stebogit
Copy link
Collaborator Author

@DenisCarriere we can definitely clean MultiPoint; at this point we can allow Points for completeness and just ignoring them.
I think a MultiPoint with overlapping points it's not necessarily wrong (or "dirt" 😄 ), but if the user is using this module on MultiPoint I guess it means he wants to clean it! 👍

var getGeomType = invariant.getGeomType;

/**
* Removes redundant coordinates from a (Multi)LineString or (Multi)Polygon; ignores (Multi)Point.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added MultiPoint support, we can simply say this now:

* Removes redundant coordinates from any GeoJSON Geometry.

/**
* http://turfjs.org/docs/#cleancoords
*/
declare function cleanCoords(feature: Feature, mutate?: boolean): Feature;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stebogit This one gets a bit tricky, one thing to keep in mind is trying to preserve the Input/Output type as much as possible.

At the moment, this definition is saying the input is any Feature or Geometry and it outputs either Geometry or a Feature.

Using the extends you validate the input, however the output will be the same as what the user enters, that way it preserves their Types.

type GeometryObject = GeoJSON.GeometryObject
type Feature = GeoJSON.Feature<any>

/**
 * http://turfjs.org/docs/#cleancoords
 */
declare function cleanCoords<T extends GeometryObject|Feature>(feature: T, mutate?: boolean): T;
declare namespace cleanCoords {}
export = cleanCoords;

console.time(name);
cleanCoords(geojson);
console.timeEnd(name);
suite.add(name, () => cleanCoords(geojson));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For internal benchmark tests, we can add mutate=true, that way we get the "real" benchmarks results.

 * geometry x 3,053,905 ops/sec ±1.84% (87 runs sampled)
 * multiline x 3,943,840 ops/sec ±2.60% (85 runs sampled)
 * multipoint x 465,553 ops/sec ±1.04% (89 runs sampled)
 * multipolygon x 2,046,659 ops/sec ±0.77% (91 runs sampled)
 * point x 22,318,913 ops/sec ±1.20% (86 runs sampled)
 * polygon-with-hole x 3,109,098 ops/sec ±2.10% (91 runs sampled)
 * polygon x 4,603,124 ops/sec ±1.99% (84 runs sampled)
 * simple-line x 7,732,876 ops/sec ±1.18% (86 runs sampled)
 * triangle x 4,950,167 ops/sec ±1.36% (89 runs sampled)

@DenisCarriere
Copy link
Member

but if the user is using this module on MultiPoint I guess it means he wants to clean it!

Exactly.. :)

Copy link
Member

@DenisCarriere DenisCarriere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to go! @stebogit

@DenisCarriere DenisCarriere added this to the 4.6.0 milestone Jul 31, 2017
var existing = {};
for (var i = 0; i < coords.length; i++) {
var point = coords[i];
var key = point.join('-');
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DenisCarriere this is really smart! 🤓 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 🤓 !

@stebogit
Copy link
Collaborator Author

stebogit commented Jul 31, 2017

@DenisCarriere now that I cleaned the horrible for-loops I used initially I feel better. I can't stand them! 😄
Good for me to merge this.

@rowanwins do you have any suggestion for an alternative name, if you feel it should be different?
I am personally fine with this (of course 😄 ).

@DenisCarriere
Copy link
Member

@stebogit 👍 I'm on the same page with you for those for loops, there's some slight performance decrease (~8-10%) when using forEach, but it's a lot easier to read!

@rowanwins
Copy link
Member

Happy to go with group consensus on the name - thoughts @DenisCarriere ? See history

@DenisCarriere
Copy link
Member

DenisCarriere commented Jul 31, 2017

@stebogit Let me know what you think about my last commit d684814, I dropped clone entirely so that the cleanCoords module doesn't mutate the input by adding geometry & feature helper functions.

will need to update @turf/helpers to support bbox & id properties, also a geometry method would be useful.

@DenisCarriere
Copy link
Member

@rowanwins Happy to go with group consensus on the name

I'm good with this name, struggling to find a better name for this.

@stebogit
Copy link
Collaborator Author

@DenisCarriere

I dropped clone entirely

Why? is't this the perfect situation for using this module? It is basically exactly using geometry & feature helpers.

so that the cleanCoords module doesn't mutate the input

Why? I would leave this option to the user. We have this useful option in quite a few other modules, like truncate, transform, kmeans and other

@DenisCarriere
Copy link
Member

DenisCarriere commented Jul 31, 2017

@stebogit
Well unlike truncate or those other modules, we aren't using coordEach which is impossible to prevent input mutation without using clone. Since we've already deconstructed the entire GeoJSON geometry we can easily put it back together with little performance loss (1.3x slower).

Reason why I changed it to geometry/feature is purely for performance since mutate will most likely always be defined as false (which will be the slowest option).

I ran some benchmarks using the different techniques.

Considerations

Foreign members are dropped using the geometry/feature approach, however if we are enforcing a "strict" GeoJSON implementation that might be a good thing, also this might be considered a positive +1 for this "clean" module.

👍 It's a good idea to keep the mutate options that way it allows the user to preserve foreign members and/or wants the input to be mutated.

Summary

  • mutate=true (0x - speed factors are compared to these benchmark results)
  • mutate=false
    • using geometry/feature (~1.3x slower)
    • using @turf/clone (~10x slower)
    • using JSON.parse + JSON.stringify (~20x slower)
 * // mutate=false (using geometry/feature)
 * geometry x 1,524,640 ops/sec ±4.60% (74 runs sampled)
 * multiline x 1,511,608 ops/sec ±8.79% (72 runs sampled)
 * multipoint x 382,429 ops/sec ±3.56% (84 runs sampled)
 * multipolygon x 808,277 ops/sec ±2.84% (82 runs sampled)
 * point x 14,675,464 ops/sec ±4.42% (80 runs sampled)
 * polygon-with-hole x 1,493,507 ops/sec ±5.53% (72 runs sampled)
 * polygon x 2,386,278 ops/sec ±1.27% (86 runs sampled)
 * simple-line x 4,195,499 ops/sec ±2.88% (86 runs sampled)
 * triangle x 2,254,753 ops/sec ±1.10% (88 runs sampled)
 *
 * // mutate=false (using @turf/clone)
 * geometry x 202,410 ops/sec ±1.43% (88 runs sampled)
 * multiline x 235,421 ops/sec ±3.48% (86 runs sampled)
 * multipoint x 280,757 ops/sec ±1.59% (87 runs sampled)
 * multipolygon x 127,353 ops/sec ±1.35% (88 runs sampled)
 * point x 18,233,987 ops/sec ±1.34% (86 runs sampled)
 * polygon-with-hole x 199,203 ops/sec ±2.61% (84 runs sampled)
 * polygon x 355,616 ops/sec ±1.58% (86 runs sampled)
 * simple-line x 515,430 ops/sec ±2.40% (83 runs sampled)
 * triangle x 286,315 ops/sec ±1.64% (86 runs sampled)
 *
 * // mutate=false (using JSON.parse + JSON.stringify)
 * geometry x 93,681 ops/sec ±7.66% (75 runs sampled)
 * multiline x 112,836 ops/sec ±4.60% (82 runs sampled)
 * multipoint x 113,937 ops/sec ±1.09% (90 runs sampled)
 * multipolygon x 71,131 ops/sec ±1.32% (90 runs sampled)
 * point x 18,181,612 ops/sec ±1.36% (91 runs sampled)
 * polygon-with-hole x 100,011 ops/sec ±1.14% (85 runs sampled)
 * polygon x 154,331 ops/sec ±1.31% (89 runs sampled)
 * simple-line x 193,304 ops/sec ±1.33% (90 runs sampled)
 * triangle x 130,921 ops/sec ±3.37% (87 runs sampled)
 *
 * // mutate=true
 * geometry x 2,016,365 ops/sec ±1.83% (85 runs sampled)
 * multiline x 2,266,787 ops/sec ±3.69% (79 runs sampled)
 * multipoint x 411,246 ops/sec ±0.81% (89 runs sampled)
 * multipolygon x 1,011,846 ops/sec ±1.34% (85 runs sampled)
 * point x 17,635,961 ops/sec ±1.47% (89 runs sampled)
 * polygon-with-hole x 2,110,166 ops/sec ±1.59% (89 runs sampled)
 * polygon x 2,887,298 ops/sec ±1.75% (86 runs sampled)
 * simple-line x 7,109,682 ops/sec ±1.52% (87 runs sampled)
 * triangle x 3,116,940 ops/sec ±0.71% (87 runs sampled)

@stebogit
Copy link
Collaborator Author

stebogit commented Aug 1, 2017

Great! I can't wait to use this in @turf/boolean-equal @DenisCarriere 😃

@DenisCarriere DenisCarriere merged commit adb56ea into master Aug 1, 2017
@DenisCarriere DenisCarriere deleted the clean-coords branch August 1, 2017 05:36
@DenisCarriere
Copy link
Member

Now all I need to do is find some time to push out a new minor release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants