This repository has been archived by the owner. It is now read-only.

Add GeographicMidpointReduction #446

Merged
merged 4 commits into from Oct 26, 2017

Conversation

Projects
None yet
3 participants
@AlexDBlack
Copy link
Member

AlexDBlack commented Oct 25, 2017

@AlexDBlack AlexDBlack requested a review from crockpotveggies Oct 25, 2017


@Override
public List<String> getColumnsOutputName(String columnInputName) {
return null;

This comment has been minimized.

@crockpotveggies

crockpotveggies Oct 25, 2017

Collaborator

Do we not need to return a new column name? If null, does it just perform the op on the existing column? (Sorry not versed in the new API)

This comment has been minimized.

@huitseeker

huitseeker Oct 25, 2017

Contributor
return Collections.singletonList("slerp(" + columnInputName + ")");

?

This comment has been minimized.

@AlexDBlack

@Override
public List<ColumnMetaData> getColumnOutputMetaData(List<String> newColumnName, ColumnMetaData columnInputMeta) {
return null;

This comment has been minimized.

@crockpotveggies

crockpotveggies Oct 25, 2017

Collaborator

Similar question as above.

This comment has been minimized.

@crockpotveggies

crockpotveggies Oct 25, 2017

Collaborator

Yea so in testing I'm getting NPEs from the reducer when this is null

...java.lang.NullPointerException
  at java.util.ArrayList.addAll(ArrayList.java:577)
  at org.datavec.api.transform.reduce.Reducer.transform(Reducer.java:130)
  at org.datavec.api.transform.sequence.window.ReduceSequenceByWindowTransform.transform(ReduceSequenceByWindowTransform.java:65)

This comment has been minimized.

@crockpotveggies

crockpotveggies Oct 25, 2017

Collaborator

The code that triggers the NPE

val transform = new TransformProcess.Builder(schema)
    .removeAllColumnsExceptFor("Timestamp","MMSI","Lat","Lon")
    .transform(new StringToTimeTransform("Timestamp","dd/MM/YYYY HH:mm:ss",DateTimeZone.UTC))
    .transform(new ConcatenateStringColumns("LatLon", ",", List("Lat","Lon")))
    .convertToSequence("MMSI", new NumericalColumnComparator("Timestamp", true))
    .transform(
        new ReduceSequenceByWindowTransform(
            new Reducer.Builder(ReduceOp.Count).keyColumns("MMSI")
                .countColumns("Timestamp")
                .customReduction("LatLon", new Reductions.GeoAveragingReduction)
                .takeFirstColumns("Timestamp")
                .build(),
            new TimeWindowFunction("Timestamp",10L,TimeUnit.MINUTES)
        )
    )
    .removeAllColumnsExceptFor("LatLon")
    .build

This comment has been minimized.

@AlexDBlack

//http://www.geomidpoint.com/example.html
//That particular example is weighted - have 3x weight for t1, 2x weight for t2, 1x weight for t1
Text t1 = new Text("40.7143528,-74.0059731");

This comment has been minimized.

@crockpotveggies

crockpotveggies Oct 25, 2017

Collaborator

Maybe move the coordinates to a polar extreme? That will exaggerate any mathematical issues.

This comment has been minimized.

@huitseeker

huitseeker Oct 25, 2017

Contributor

Actually, you may want to test the ka-boom : two polar opposites, or more generally equally-distributed points on the sphere. Their Ka-boom nature should be apparent (i.e. raise an exception).

This comment has been minimized.

@AlexDBlack

AlexDBlack Oct 26, 2017

Member

Added exception, based on "special case" / pt 9 here:
http://www.geomidpoint.com/calculation.html

}

@Override
public List<Writable> get() {

This comment has been minimized.

@huitseeker

huitseeker Oct 25, 2017

Contributor

This one and the above LGTM.


//http://www.geomidpoint.com/example.html
//That particular example is weighted - have 3x weight for t1, 2x weight for t2, 1x weight for t1
Text t1 = new Text("40.7143528,-74.0059731");

This comment has been minimized.

@huitseeker

huitseeker Oct 25, 2017

Contributor

Actually, you may want to test the ka-boom : two polar opposites, or more generally equally-distributed points on the sphere. Their Ka-boom nature should be apparent (i.e. raise an exception).


double lat = latDeg * PI_180;
double lng = longDeg * PI_180;

This comment has been minimized.

@huitseeker

huitseeker Oct 25, 2017

Contributor

The interpolation is nice but computing a large sum, and our friend @crockpotveggies has tons of ships that stay at berth for days (i.e. immobile or close to that). Should we whip out a rounding error reduction (Kahan?). Does it make sense to keep a cache of the last encountered point and just ignore a new point if the angle with the cached point was negligible? Am I getting too wound up about this?

I believe we can get the angle between (lat1, long1), (lat2, long2) as:

havAngle = (Math.sin(dlat/2))^2 + Math.cos(lat1) * Math.cos(lat2) * (Math.sin(dlon/2))^2 
angle = 2 * Math.arcsin(min(1,sqrt(havAngle))) 

If we neglect points s.t. angle < 1e-6 rad, our error is bound to 6.371 m, i.e. less than the length of a ship of a size susceptible to carry an AIS beacon.

This comment has been minimized.

@AlexDBlack

AlexDBlack Oct 26, 2017

Member

Maybe overkill? But if one of you wants to implement rounding error reduction, by all means go ahead. As to cache of last point: hm, that implies order, which I don't think we can assume in general (like, spark execution, or even just non-sequences).

This comment has been minimized.

@huitseeker

huitseeker Oct 26, 2017

Contributor

Yeah, except I know @crockpotveggies is going to use this on a list where 50% of points or more are within meters of each other. I guess he'll have to implement the above then :)

@@ -0,0 +1,158 @@
/*

This comment has been minimized.

@huitseeker

huitseeker Oct 25, 2017

Contributor

Are we sure this is the right folder ?

.../org/datavec/api/transform/transform/reduce/GeographicMidpointReduction.java

This comment has been minimized.

@AlexDBlack

AlexDBlack Oct 26, 2017

Member

Moved to org.datavec.api.transform.reduce.impl.GeographicMidpointReduction

@huitseeker
Copy link
Contributor

huitseeker left a comment

Nice op, thanks. @crockpotveggies caught the warts, I added some idle comments.

@AlexDBlack AlexDBlack force-pushed the ab_avglocreduction branch from d65260c to 216a786 Oct 26, 2017

@huitseeker
Copy link
Contributor

huitseeker left a comment

👍

@AlexDBlack AlexDBlack merged commit 0b7644d into master Oct 26, 2017

@AlexDBlack AlexDBlack deleted the ab_avglocreduction branch Oct 26, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.