Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Location-dependent left hand driving flag #4415

Merged
merged 25 commits into from
Oct 4, 2017
Merged

Conversation

oxidase
Copy link
Contributor

@oxidase oxidase commented Aug 16, 2017

Issue

PR adds location-dependent tags and adds parsing driving_side tag.
Location-dependent data is taken from a GeoJSON file (file name is an argument of a new extractor option --location-dependent-data) and provided as an optional argument location_data of function process_way(profile, way, result, location_data). At the moment to use location-dependent data OSM files must be preprocessed with osmium add-locations-to-ways.

Left-hand driving flag is added to node-based-graph edges, edge-based-graph nodes and used in guidance pre-processing and in geospatial queries.

Some points to be discussed:

  1. MultiPolygon support?
  2. Use a nodes locations cache. This will require an implementation equivalent to add-locations-to-ways before parsing ways Country-aware way function #4167
  3. Merging conflicting tags at different locations of a way or multiple-defined at one node. At the moment a way is represented by a single node node and no dedicated merging is implemented.
  4. is_left_hand_driving flag is indexed by EBG node ids, but it can be indexed by geometry ids. This will require an additional file that is indexed by geometry ids, but contains geometry non-segments data.
  5. vector<bool> is stored as 8-bit per 1-bit value, is it a new issue?

Tasklist

Requirements / Relations

Partially supersedes #4167

@oxidase oxidase added the Review label Aug 16, 2017
@emiltin
Copy link
Contributor

emiltin commented Aug 16, 2017

do i understand it correctly, that this adds a general ability to use location-based data when preprocessing osm data, not just data about left/right side driving?

can the features in the geojson file overlap?

@oxidase
Copy link
Contributor Author

oxidase commented Aug 16, 2017

@emiltin yes, it is not limited to left-hand driving tags and GeoJSON can have any data that will be forwarded into parse_way function, but I did not checked yet other tags. The next step will be to check driving_side together with some other tags.

Polygons with non-intersecting tags sets can overlap. If some overlapping polygons have the same tag then last-value-wins merging in the rtree indexing order will be used. Basically it just a

boost::apply_visitor(table_setter(table, key_value.first), key_value.second);
so no dedicated conflict resolution yet.

@emiltin
Copy link
Contributor

emiltin commented Aug 16, 2017

ok, thanks for clarifying. this seems very useful. it will cover a different set of use-cases related to location data than the raster-based approach already available.

@TheMarex
Copy link
Member

TheMarex commented Aug 17, 2017

vector is stored as 8-bit per 1-bit value, is it a new issue?

You mean stored like that on disk? As far as I remember there were some issues around using out default memory-dump based serialization. If you see a fix for that, go for it. 👍

Merging conflicting tags at different locations of a way or multiple-defined at one node.

This is mainly an issue while crossing country polygon borders?

Location-dependent data is taken from a GeoJSON file (file name is an argument of a new extractor option --location-dependent-data)

Do you think it would be possible to specify this from within the profile? Do we want to allow for multiple of those sources? This could work similar to how we use raster-data, in this case it would just be polygon based data.

@oxidase
Copy link
Contributor Author

oxidase commented Aug 17, 2017

@TheMarex i added a vector serializer

This is mainly an issue while crossing country polygon borders?

yes, maybe for driving side flag this would make no sense (an edge case https://www.openstreetmap.org/export#map=17/22.52030/114.07047), but in many cases ways can cross polygons with the same tags, but different values.

Do you think it would be possible to specify this from within the profile? Do we want to allow for multiple of those sources? This could work similar to how we use raster-data, in this case it would just be polygon based data.

yes, it is possible to specify in profiles, but data must loaded before creating contexts, otherwise many copies of rtrees will be multiplied for every thread-local context.

Multiple of sources are also possible, in principle we can index data in a single rtree, but it will bring the merging question to discussion, i would go first towards a single GeoJSON file with MultiPolygon support and Sol2ScriptingEnvironment as the sole owner of a constant single copy of data.

@TheMarex
Copy link
Member

yes, it is possible to specify in profiles, but data must loaded before creating contexts,

We could call the initialization in the setup functions. In setup we would call a function in C++ land that would check if the specific GeoJSON file has already been loaded. Data would be owned by SolScriptingEnvironment (not SolScriptingContext) so one global R-Tree per file and many threads access. I take it lookups in the RTree would be thread-safe?

Copy link
Member

@TheMarex TheMarex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the code, I think it might make sense to have a CLI parameter for the location data after all, since it is optional. I will leave this up to you to decide.

What is the plan with relation to the node-location? Do we want to wait for that PR or should be tell users to use the osmium re-writer?

Thanks for pushing this forward. 👍

CHANGELOG.md Outdated
@@ -18,6 +19,10 @@
- Fix a pre-processing bug where incorrect directions could be issued when two turns would have similar instructions and we tried to give them distinct values (https://github.com/Project-OSRM/osrm-backend/pull/4375)
- The entry bearing for correct the cardinality of a direction value (https://github.com/Project-OSRM/osrm-backend/pull/4353
- Change timezones in West Africa to the WAT zone so they're recognized on the Windows platform
- Profiles
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changelog entries need to be moved to 5.12

CHANGELOG.md Outdated
@@ -6,6 +6,7 @@
- BREAKING: Traffic signals will no longer be represented as turns internally. This requires re-processing of data but enables via-way turn restrictions across highway=traffic_signals
- Additional checks for empty segments when loading traffic data files
- Tunes the constants for turns in sharp curves just a tiny bit to circumvent a mix-up in fork directions at a specific intersection (https://github.com/Project-OSRM/osrm-backend/issues/4331)
- BREAKING: added `is_left_hand_driving` vector to `.ebg_nodes` file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want to just say something like File format for .ebg_nodes not compatible with 5.11.

"""
And the ways with locations
| nodes | driving_side |
| ab | right |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests the driving_side tag on ways in additional to the tag coming from outside data? Do you think in the future we might want to parse this data automatically from the OSM relations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there some residential roads, service roads, private ways or u-turns. Yes, we can add in future such handling, atm tags on ways have higher priority than default values.

@@ -114,6 +115,7 @@ struct ExtractionWay
bool is_startpoint : 1;
bool forward_restricted : 1;
bool backward_restricted : 1;
bool is_left_hand_driving : 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought: Maybe we should add some bool unused : 2 to make it explicit how many bits are still free.

Unrelated to your change but looking at this again, didn't VSC have problem with packing bools? We might want to change this to std::uint8_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VCS have other rules for padding, but here it should be similar to gcc packing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after some hours of debugging i think i will keep bool type here, because of some strange gcc behavior here: with std::uint8_t setting is_startpoint also sets forward_restricted and backward_restricted. it can be either ub or gcc optimization issue

@@ -103,7 +103,7 @@ struct ProfileProperties
bool continue_straight_at_waypoint;
//! flag used for restriction parser (e.g. used for the walk profile)
bool use_turn_restrictions;
bool left_hand_driving;
bool left_hand_driving; // DEPRECATED: property value is local to edges from API version 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if a user sets this value? Will it be used as the default for per-edge values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@@ -114,44 +114,64 @@ template <typename T> void write(io::FileWriter &writer, const util::vector_view
writer.WriteFrom(data.data(), count);
}

template <typename T> inline unsigned char packBits(T &data, std::size_t index, std::size_t count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data can be a const-ref.

};
}

sol::table LocationDependentData::operator()(sol::state &state, const osmium::Way &way) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to return a non-sol object, so we can keep this interface lua-agnostic? I think sol should be able to wrap an unordered_map automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would like to keep it as a shortcut to avoid creating unordered_map. It can be easily adjusted when it will be clear how conflicting tags must be merged

@@ -224,7 +225,7 @@ void Sol2ScriptingEnvironment::InitContext(LuaScriptingContext &context)
"interpolate",
&RasterContainer::GetRasterInterpolateFromSource);

context.state.new_usertype<ProfileProperties>(
auto registration_ProfileProperties = context.state.new_usertype<ProfileProperties>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to save the return type of new_usertype here?

@@ -0,0 +1,38 @@
#include "storage/serialization.hpp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. We should add tests for the other serialization methods too at some point.

@oxidase oxidase force-pushed the feature/left-hand-driving branch 3 times, most recently from d4f5c92 to be68c57 Compare August 21, 2017 14:46
@oxidase
Copy link
Contributor Author

oxidase commented Aug 21, 2017

@TheMarex please can you check the last changes?

I have tried to use real data with OSM administrative boundaries and results are very disappointing:
265 boost::geometry::within(point, polygons[v.second].first) calls increase OSM parsing from 30ms to 835 with just Thailand single border line.

EDIT: Before using location-dependent data, i think we need benchmarks of desired features and make an optimization of within or make better use of the r-tree data structure. ATM it is possible to use very simplified geometries without performance penalties.

@oxidase oxidase force-pushed the feature/left-hand-driving branch 2 times, most recently from 5550b63 to 57e7819 Compare August 31, 2017 11:14
@oxidase
Copy link
Contributor Author

oxidase commented Aug 31, 2017

@TheMarex i have ported point-in-polygon check from osmium and added some unit tests to check correctness. The performance results are better than using boost:geometry::within but the price is additional memory usage to store segment bands. For berlin-latest with 690932 ways anf the Germany border parsing time changed from 5.2 to 40 seconds, that is ~ 100-200 microseconds per lookup that is still bad.

There are still some space for improvement because "bands" solution is a naive implementation of an interval tree with fixed-size bands. For OSM Germany border with latitudes from 47.2701 to 55.0992 with a band width 0.00056389 degrees the histogram of segments is
germany-bands13884

so the check performance will depend on node locations and at the southern border it will be 1000 times slower than at the northern one. I think it is ok to merge under assumption of simple polygons for the left-hand side driving checks, and performance can be improved later by using similar check with an interval tree.

Another huge performance improvement is to avoid using unordered_map merging because 54% out 59% of LocationDependentData::operator() time for berlin-latest extraction is spent at

result.insert(polygon_properties.begin(), polygon_properties.end());

But i don't know atm the best way how to deal with the issue: may be create Lua tables in every context in advance, don't merge tables in c++ and provide an optional table of tags tables (it is possible to add also relation tags or any other "extension" tags into that table). So it is mainly a Lua interface question.

Copy link
Member

@TheMarex TheMarex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear on how the multi-file handling works, otherwise the code looks good.

However there is a big speedup potential for the point-in-polygon checks using quad-tree approximation (actually just removing most of the point-in-polygon checks).

}

// Create R-tree for bounding boxes of collected polygons
rtree = rtree_t(bounding_boxes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm I don't quite understand how this works with multiple input files: It seems to me on every call of this functions the rtree gets overwritten and all previous data is lost?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️ will be fixed

}
else
{
way_function(profile_table, way, result, toLua(state, location_dependent_data(way)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this add any overhead when not giving any location dependent data as input (e.g. no geojson file)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty data should handled in then branch when rtree is empty, and no optional arguments will be passed to the lua function

};

// Search the R-tree and collect a Lua table of tags that correspond to the location
rtree.query(boost::geometry::index::intersects(point) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of saving each polygon in one bounding box you could try a quad tree approximation:

  • Divide the bounding box into 4 equal parts
  • For every new box check if its fully container, fully outside or intersects the boundary
  • If a box intersects the boundary: Continue dividing that box.
  • Stop when all boxes are either fully in or fully out, or you reached a minimum box size (like 10x10m)

Now your query can terminate immediately when it hits a fully in/fully out bounding box. You only need to do the expensive point-in-polygon check on the points that hit the boundary boxes.

Copy link
Member

@daniel-j-h daniel-j-h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just stumbled upon some guidance profile decisions which depend on driving side here. I think it is missing from this changeset.

@oxidase oxidase force-pushed the feature/left-hand-driving branch 2 times, most recently from 4dad3c3 to 480bc4a Compare September 21, 2017 13:21
@oxidase
Copy link
Contributor Author

oxidase commented Oct 3, 2017

@TheMarex just checked the mean difference is 0.3 seconds or 5%

> t.test(d[d$V2=='no',1],d[d$V2=='yes',1])

	Welch Two Sample t-test

data:  d[d$V2 == "no", 1] and d[d$V2 == "yes", 1]
t = -4.4597, df = 197.97, p-value = 1.374e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4449259 -0.1720891
sample estimates:
mean of x mean of y 
 6.723882  7.032389 

berlin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants