generalize node_keys; add way_keys
This PR generalizes the idea of `node_keys`, adds `way_keys`, and fixes systemed#402.

I'm not too sure if this is generally useful - it's useful for one of my
use cases, and I see someone asking about it in systemed#190
and, elsewhere, in onthegomap/planetiler#99

If you feel it complicates the maintainer story too much, please reject.

The goal is to reduce memory usage for users doing thematic extracts by
not indexing nodes that are only used by uninteresting ways.

For example, North America has ~1.8B nodes, needing 9.7GB of RAM for its node
store. By contrast, if your interest is only to build a railway map, you
require only ~8M nodes, needing 70MB of RAM. Or, to build a map of
national/provincial parks, 12M nodes and ~120MB of RAM.

Currently, a user can achieve this by pre-filtering their PBF using
osmium-tool. If you know exactly what you want, this is a good
long-term solution. But if you're me, flailing about in the OSM data
model, it's convenient to be able to tweak something in the Lua script
and observe the results without having to re-filter the PBF and update
your tilemaker command to use the new PBF.

Sample use cases:

```lua
-- Building a map without building polygons. A leading ~ excludes ways
-- whose only tags are matched by the filter.
way_keys = {"~building"}
```

```lua
-- Building a railway map
way_keys = {"railway"}
```

```lua
-- Building a map of major roads
way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}
```
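As a rough illustration of how filter lists like the three above might be evaluated, here is a minimal C++ sketch. It is not the code in `src/significant_tags.cpp`; `Filter`, `Tags`, `matches` and `keepObject` are hypothetical names, and the handling of lists that mix plain and `~` filters is an assumption, since the description above only covers each form on its own.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not tilemaker's actual types.
struct Filter {
    bool accept;        // false when the filter was written with a leading '~'
    std::string key;
    std::string value;  // empty means "any value for this key"
};

using Tags = std::vector<std::pair<std::string, std::string>>;

// A filter matches a tag when the keys agree and, if the filter names a value,
// the values agree too.
static bool matches(const Filter& f, const std::pair<std::string, std::string>& tag) {
    return tag.first == f.key && (f.value.empty() || tag.second == f.value);
}

// Assumed semantics: keep the object if at least one tag is "interesting".
// A tag is interesting if a plain filter matches it, or if negated filters
// exist and none of them matches it.
bool keepObject(const std::vector<Filter>& filters, const Tags& tags) {
    bool hasNegated = false;
    for (const Filter& f : filters) {
        if (!f.accept) hasNegated = true;
    }

    for (const auto& tag : tags) {
        bool positiveHit = false;
        bool negatedHit = false;
        for (const Filter& f : filters) {
            if (!matches(f, tag)) continue;
            if (f.accept) positiveHit = true;
            else negatedHit = true;
        }
        if (positiveHit) return true;
        if (hasNegated && !negatedHit) return true;
    }
    return false;  // no tags at all, or every tag was excluded
}
```

Under these assumptions, a way tagged only `building=yes` is dropped by `{"~building"}`, while a way tagged `building=yes` plus `name=...` is kept, because `name` is a tag the filter does not match.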

Nodes used in ways which are used in relations (as identified by
`relation_scan_function`) will always be indexed, regardless of
`node_keys` and `way_keys` settings that might exclude them.
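The bookkeeping behind "this node is used by an interesting way or relation" can be pictured as a thread-safe set of IDs. The sketch below is only loosely modelled on the `UsedObjects` declaration in this commit; the class name `UsedIds`, the shard count, and the choice that a disabled tracker reports every ID as used are assumptions, not confirmed behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical sketch of a sharded, mutex-protected bitset of object IDs.
class UsedIds {
public:
    explicit UsedIds(bool enabled, std::size_t shards = 256)
        : enabled_(enabled), locks_(shards), bits_(shards) {}

    void enable() { enabled_ = true; }

    // Mark an ID as used. Does nothing while tracking is disabled.
    void set(std::uint64_t id) {
        if (!enabled_) return;
        const std::size_t shard = id % bits_.size();
        const std::uint64_t slot = id / bits_.size();
        std::lock_guard<std::mutex> guard(locks_[shard]);
        if (bits_[shard].size() <= slot) bits_[shard].resize(slot + 1, false);
        bits_[shard][slot] = true;
    }

    // Assumption: when tracking is disabled, every ID counts as used,
    // so nothing is filtered out.
    bool test(std::uint64_t id) {
        if (!enabled_) return true;
        const std::size_t shard = id % bits_.size();
        const std::uint64_t slot = id / bits_.size();
        std::lock_guard<std::mutex> guard(locks_[shard]);
        return slot < bits_[shard].size() && bits_[shard][slot];
    }

private:
    bool enabled_;
    std::vector<std::mutex> locks_;         // one lock per shard
    std::vector<std::vector<bool>> bits_;   // one bitset per shard
};
```

Splitting IDs across shards lets several reader threads mark nodes at once without contending on a single lock, at the cost of one mutex per shard.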

A concrete example, given a Lua script like:

```lua
function way_function()
  if Find("railway") ~= "" then
    Layer("lines", false)
  end
end
```

it takes 13GB of RAM and 100 seconds to process North America.

If you add:

```lua
way_keys = {"railway"}
```

it takes 2GB of RAM and 47 seconds.

Notes:

1. This is based on `lua-interop-3`, as it interacts with files that are
   changed by that. I can rebase against master after lua-interop-3 is
   merged.

2. The names `node_keys` and `way_keys` are perhaps out of date, as they
   can now express conditions on the values of tags in addition to their
   keys. Leaving them as-is is nice, as it's not a breaking change.
   But if breaking changes are OK, maybe these should be
   `node_filters` and `way_filters` ?

3. Maybe the value for `node_keys` in the OMT profile should be
   expressed in terms of a negation, e.g. `node_keys = {"~created_by"}`?
   This would avoid issues like systemed#337
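For reference, the filter strings discussed in these notes have the shape `[~]key[=value]`. A minimal C++ sketch of splitting such a string into its parts follows; `ParsedFilter` and `parseFilterSketch` are hypothetical names, and this is not claimed to match the `SignificantTags::parseFilter` implementation added by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type mirroring the three fields of a filter.
struct ParsedFilter {
    bool accept;        // false when the raw string starts with '~'
    std::string key;
    std::string value;  // empty when no "=value" part is given
};

// Split "[~]key[=value]" into its parts. Only the first '=' is treated as the
// separator, so a value may itself contain '='.
ParsedFilter parseFilterSketch(std::string raw) {
    ParsedFilter out{true, "", ""};
    if (!raw.empty() && raw[0] == '~') {
        out.accept = false;
        raw.erase(0, 1);
    }
    const std::size_t eq = raw.find('=');
    if (eq == std::string::npos) {
        out.key = raw;
    } else {
        out.key = raw.substr(0, eq);
        out.value = raw.substr(eq + 1);
    }
    return out;
}
```

With this sketch, `"~created_by"` yields a negated filter on the key `created_by`, and `"highway=motorway"` yields a plain filter on key `highway` with value `motorway`.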
cldellow committed Dec 29, 2023
1 parent 6ba38b0 commit 0576b26
Showing 15 changed files with 570 additions and 71 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -94,6 +94,7 @@ file(GLOB tilemaker_src_files
src/sharded_way_store.cpp
src/shared_data.cpp
src/shp_mem_tiles.cpp
src/significant_tags.cpp
src/sorted_node_store.cpp
src/sorted_way_store.cpp
src/tag_map.cpp
8 changes: 8 additions & 0 deletions Makefile
@@ -118,6 +118,7 @@ tilemaker: \
src/sharded_way_store.o \
src/shared_data.o \
src/shp_mem_tiles.o \
src/significant_tags.o \
src/sorted_node_store.o \
src/sorted_way_store.o \
src/tag_map.o \
@@ -133,6 +134,7 @@ test: \
test_deque_map \
test_pbf_reader \
test_pooled_string \
test_significant_tags \
test_sorted_node_store \
test_sorted_way_store

@@ -163,6 +165,12 @@ test_pooled_string: \
test/pooled_string.test.o
$(CXX) $(CXXFLAGS) -o test.pooled_string $^ $(INC) $(LIB) $(LDFLAGS) && ./test.pooled_string

test_significant_tags: \
src/significant_tags.o \
src/tag_map.o \
test/significant_tags.test.o
$(CXX) $(CXXFLAGS) -o test.significant_tags $^ $(INC) $(LIB) $(LDFLAGS) && ./test.significant_tags

test_sorted_node_store: \
src/external/streamvbyte_decode.o \
src/external/streamvbyte_encode.o \
23 changes: 13 additions & 10 deletions docs/CONFIGURATION.md
@@ -109,16 +109,19 @@ For example:

Your Lua file needs to supply a few things:

1. `node_keys`, a list of those OSM keys which indicate that a node should be processed
2. `node_function()`, a function to process an OSM node and add it to layers
3. `way_function()`, a function to process an OSM way and add it to layers
4. (optional) `init_function(name)`, a function to initialize Lua logic
5. (optional) `exit_function`, a function to finalize Lua logic (useful to show statistics)
6. (optional) `relation_scan_function`, a function to determine whether your Lua file wishes to process the given relation
7. (optional) `relation_function`, a function to process an OSM relation and add it to layers
8. (optional) `attribute_function`, a function to remap attributes from shapefiles

`node_keys` is a simple list (or in Lua parlance, a 'table') of OSM tag keys. If a node has one of those keys, it will be processed by `node_function`; if not, it'll be skipped. For example, if you wanted to show highway crossings and railway stations, it should be `{ "highway", "railway" }`. (This avoids the need to process the vast majority of nodes which contain no important tags at all.)
1. (optional) `node_keys`, a list of those OSM tags which indicate that a node should be processed
2. (optional) `way_keys`, a list of those OSM tags which indicate that a way should be processed
3. `node_function()`, a function to process an OSM node and add it to layers
4. `way_function()`, a function to process an OSM way and add it to layers
5. (optional) `init_function(name)`, a function to initialize Lua logic
6. (optional) `exit_function`, a function to finalize Lua logic (useful to show statistics)
7. (optional) `relation_scan_function`, a function to determine whether your Lua file wishes to process the given relation
8. (optional) `relation_function`, a function to process an OSM relation and add it to layers
9. (optional) `attribute_function`, a function to remap attributes from shapefiles

`node_keys` is a simple list (or in Lua parlance, a 'table') of OSM tags. If a node has one of those keys, it will be processed by `node_function`; if not, it'll be skipped. For example, if you wanted to show highway crossings and railway stations, it should be `{ "highway", "railway" }`. (This avoids the need to process the vast majority of nodes which contain no important tags at all.)

`way_keys` is similar to `node_keys`, but for ways. For ways, you may also wish to express the filter in terms of the tag value, or as an inversion. For example, to exclude buildings: `way_keys = {"~building"}`. To build a map only of major roads: `way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}`

`node_function` and `way_function` work the same way. They are called with an OSM object; you then inspect the tags of that object, and put it in your vector tiles' layers based on those tags. In essence, the process is:

7 changes: 5 additions & 2 deletions include/osm_lua_processing.h
@@ -18,6 +18,7 @@
#include <boost/container/flat_map.hpp>

class TagMap;
class SignificantTags;

// Lua
extern "C" {
@@ -72,6 +73,7 @@ class OsmLuaProcessing {
~OsmLuaProcessing();

// ---- Helpers provided for main routine
void handleUserSignal(int signum);

// Has this object been assigned to any layers?
bool empty();
@@ -93,7 +95,7 @@ class OsmLuaProcessing {
bool scanRelation(WayID id, const TagMap& tags);

/// \brief We are now processing a significant node
void setNode(NodeID id, LatpLon node, const TagMap& tags);
bool setNode(NodeID id, LatpLon node, const TagMap& tags);

/// \brief We are now processing a way
bool setWay(WayID wayId, LatpLonVec const &llVec, const TagMap& tags);
@@ -194,7 +196,8 @@ class OsmLuaProcessing {

void setVectorLayerMetadata(const uint_least8_t layer, const std::string &key, const uint type);

std::vector<std::string> GetSignificantNodeKeys();
SignificantTags GetSignificantNodeKeys();
SignificantTags GetSignificantWayKeys();

// ---- Cached geometries creation

29 changes: 27 additions & 2 deletions include/osm_store.h
@@ -18,6 +18,21 @@ extern bool verbose;
class NodeStore;
class WayStore;

class UsedObjects {
public:
enum class Status: bool { Disabled = false, Enabled = true };
UsedObjects(Status status);
bool test(NodeID id);
void set(NodeID id);
void enable();
void clear();

private:
Status status;
std::vector<std::mutex> mutex;
std::vector<std::vector<bool>> ids;
};

// A comparator for data_view so it can be used in boost's flat_map
struct DataViewLessThan {
bool operator()(const protozero::data_view& a, const protozero::data_view& b) const {
@@ -188,8 +203,18 @@ class OSMStore
RelationScanStore scanned_relations;

public:

OSMStore(NodeStore& nodes, WayStore& ways): nodes(nodes), ways(ways)
UsedObjects usedNodes;
UsedObjects usedRelations;

OSMStore(NodeStore& nodes, WayStore& ways):
nodes(nodes),
ways(ways),
// We only track usedNodes if way_keys is present; a node is used if it's
// a member of a way used by a used relation, or a way that meets the way_keys
// criteria.
usedNodes(UsedObjects::Status::Disabled),
// A relation is used only if it was previously accepted from relation_scan_function
usedRelations(UsedObjects::Status::Enabled)
{
reopen();
}
16 changes: 11 additions & 5 deletions include/pbf_processor.h
@@ -8,6 +8,7 @@
#include <mutex>
#include <map>
#include "osm_store.h"
#include "significant_tags.h"
#include "pbf_reader.h"
#include "tag_map.h"
#include <protozero/data_view.hpp>
@@ -44,7 +45,7 @@ struct IndexedBlockMetadata: BlockMetadata {
class PbfProcessor
{
public:
enum class ReadPhase { Nodes = 1, Ways = 2, Relations = 4, RelationScan = 8 };
enum class ReadPhase { Nodes = 1, Ways = 2, Relations = 4, RelationScan = 8, WayScan = 16 };

PbfProcessor(OSMStore &osmStore);

@@ -54,7 +55,8 @@ class PbfProcessor
int ReadPbfFile(
uint shards,
bool hasSortTypeThenID,
const std::unordered_set<std::string>& nodeKeys,
const SignificantTags& nodeKeys,
const SignificantTags& wayKeys,
unsigned int threadNum,
const pbfreader_generate_stream& generate_stream,
const pbfreader_generate_output& generate_output,
@@ -77,28 +79,32 @@ class PbfProcessor
std::istream &infile,
OsmLuaProcessing &output,
const BlockMetadata& blockMetadata,
const std::unordered_set<std::string>& nodeKeys,
const SignificantTags& nodeKeys,
const SignificantTags& wayKeys,
bool locationsOnWays,
ReadPhase phase,
uint shard,
uint effectiveShard
);
bool ReadNodes(OsmLuaProcessing& output, PbfReader::PrimitiveGroup& pg, const PbfReader::PrimitiveBlock& pb, const std::unordered_set<int>& nodeKeyPositions);
bool ReadNodes(OsmLuaProcessing& output, PbfReader::PrimitiveGroup& pg, const PbfReader::PrimitiveBlock& pb, const SignificantTags& nodeKeys);

bool ReadWays(
OsmLuaProcessing& output,
PbfReader::PrimitiveGroup& pg,
const PbfReader::PrimitiveBlock& pb,
const SignificantTags& wayKeys,
bool locationsOnWays,
uint shard,
uint effectiveShards
);
bool ScanRelations(OsmLuaProcessing& output, PbfReader::PrimitiveGroup& pg, const PbfReader::PrimitiveBlock& pb);
bool ScanWays(OsmLuaProcessing& output, PbfReader::PrimitiveGroup& pg, const PbfReader::PrimitiveBlock& pb, const SignificantTags& wayKeys);
bool ScanRelations(OsmLuaProcessing& output, PbfReader::PrimitiveGroup& pg, const PbfReader::PrimitiveBlock& pb, const SignificantTags& wayKeys);
bool ReadRelations(
OsmLuaProcessing& output,
PbfReader::PrimitiveGroup& pg,
const PbfReader::PrimitiveBlock& pb,
const BlockMetadata& blockMetadata,
const SignificantTags& wayKeys,
uint shard,
uint effectiveShards
);
39 changes: 39 additions & 0 deletions include/significant_tags.h
@@ -0,0 +1,39 @@
#ifndef SIGNIFICANT_TAGS_H
#define SIGNIFICANT_TAGS_H

#include <string>
#include <vector>

class TagMap;
// Data structures to permit users to express filters on which nodes/ways
// to be accepted.
//
// Filters are of the shape: [~]key-name[=value-name]
//
// When a tilde is present, the filter's meaning is inverted.

struct TagFilter {
bool accept;
std::string key;
std::string value;

bool operator==(const TagFilter& other) const {
return accept == other.accept && key == other.key && value == other.value;
}
};

class SignificantTags {
public:
SignificantTags();
SignificantTags(std::vector<std::string> rawTags);
bool filter(const TagMap& tags) const;

static TagFilter parseFilter(std::string rawTag);
bool enabled() const;

private:
bool enabled_;
std::vector<TagFilter> filters;
};

#endif
19 changes: 19 additions & 0 deletions include/tag_map.h
@@ -23,11 +23,17 @@
// This is true since the strings are owned by the protobuf block reader
// 3. Max number of tag values will fit in a short
// OSM limit is 5,000 tags per object
struct Tag {
protozero::data_view key;
protozero::data_view value;
};

class TagMap {
public:
TagMap();
void reset();

bool empty();
void addTag(const protozero::data_view& key, const protozero::data_view& value);

// Return -1 if key not found, else return its keyLoc.
@@ -41,6 +47,19 @@ class TagMap {

boost::container::flat_map<std::string, std::string> exportToBoostMap() const;

struct Iterator {
const TagMap& map;
size_t shard = 0;
size_t offset = 0;

bool operator!=(const Iterator& other) const;
void operator++();
Tag operator*() const;
};

Iterator begin() const;
Iterator end() const;

private:
uint32_t ensureString(
std::vector<std::vector<const protozero::data_view*>>& vector,
56 changes: 51 additions & 5 deletions src/osm_lua_processing.cpp
@@ -5,14 +5,36 @@
#include "helpers.h"
#include "coordinates_geom.h"
#include "osm_mem_tiles.h"
#include "significant_tags.h"
#include "tag_map.h"
#include <signal.h>

using namespace std;

const std::string EMPTY_STRING = "";
thread_local kaguya::State *g_luaState = nullptr;
thread_local OsmLuaProcessing* osmLuaProcessing = nullptr;

void handleOsmLuaProcessingUserSignal(int signum) {
osmLuaProcessing->handleUserSignal(signum);
}

class Sigusr1Handler {
public:
Sigusr1Handler() {
#ifndef _WIN32
signal(SIGUSR1, handleOsmLuaProcessingUserSignal);
#endif
}

void initialize() {
// No-op just to ensure the compiler doesn't optimize away
// the handler.
}
};

thread_local Sigusr1Handler sigusr1Handler;

// A key in `currentTags`. If Lua code refers to an absent key,
// found will be false.
struct KnownTagKey {
@@ -157,6 +179,8 @@ OsmLuaProcessing::OsmLuaProcessing(
layers(layers),
materializeGeometries(materializeGeometries) {

sigusr1Handler.initialize();

// ---- Initialise Lua
g_luaState = &luaState;
luaState.setErrorHandler(lua_error_handler);
@@ -212,6 +236,10 @@ OsmLuaProcessing::~OsmLuaProcessing() {
luaState("if exit_function~=nil then exit_function() end");
}

void OsmLuaProcessing::handleUserSignal(int signum) {
std::cout << "processing OSM ID " << originalOsmID << std::endl;
}

// ---- Helpers provided for main routine

// Has this object been assigned to any layers?
@@ -723,8 +751,7 @@ bool OsmLuaProcessing::scanRelation(WayID id, const TagMap& tags) {
return true;
}

void OsmLuaProcessing::setNode(NodeID id, LatpLon node, const TagMap& tags) {

bool OsmLuaProcessing::setNode(NodeID id, LatpLon node, const TagMap& tags) {
reset();
originalOsmID = id;
isWay = false;
@@ -747,7 +774,11 @@ void OsmLuaProcessing::setNode(NodeID id, LatpLon node, const TagMap& tags) {
for (auto &output : finalizeOutputs()) {
osmMemTiles.addObjectToSmallIndex(index, output, originalOsmID);
}
}

return true;
}

return false;
}

// We are now processing a way
@@ -837,10 +868,25 @@ void OsmLuaProcessing::setRelation(int64_t relationId, WayVec const &outerWayVec
}
}

vector<string> OsmLuaProcessing::GetSignificantNodeKeys() {
return luaState["node_keys"];
SignificantTags OsmLuaProcessing::GetSignificantNodeKeys() {
if (!!luaState["node_keys"]) {
std::vector<string> keys = luaState["node_keys"];
return SignificantTags(keys);
}

return SignificantTags();
}

SignificantTags OsmLuaProcessing::GetSignificantWayKeys() {
if (!!luaState["way_keys"]) {
std::vector<string> keys = luaState["way_keys"];
return SignificantTags(keys);
}

return SignificantTags();
}


std::vector<OutputObject> OsmLuaProcessing::finalizeOutputs() {
std::vector<OutputObject> list;
list.reserve(this->outputs.size());