Investigate disk caching in GraphContractor #2654

Open
1 of 5 tasks
TheMarex opened this issue Jul 13, 2016 · 2 comments
TheMarex commented Jul 13, 2016

Currently, once roughly 65% of the nodes are contracted (0.65 * core_factor in the code below), we save all contracted nodes + edges on disk and then rebuild the graph:

if (!flushed_contractor &&
    (number_of_contracted_nodes > static_cast<NodeID>(number_of_nodes * 0.65 * core_factor)))
{
    // this one is not explicitly cleared since it goes out of scope anyway
    util::DeallocatingVector<ContractorEdge> new_edge_set;
    std::cout << " [flush " << number_of_contracted_nodes << " nodes] " << std::flush;

    // Delete old heap data to free memory that we need for the coming operations
    thread_data_list.data.clear();

    // Create new priority array
    std::vector<float> new_node_priority(remaining_nodes.size());
    std::vector<EdgeWeight> new_node_weights(remaining_nodes.size());
    // this map gives the old IDs from the new ones, necessary to get a consistent graph
    // at the end of contraction
    orig_node_id_from_new_node_id_map.resize(remaining_nodes.size());
    // this map gives the new IDs from the old ones, necessary to remap targets from the
    // remaining graph
    std::vector<NodeID> new_node_id_from_orig_id_map(number_of_nodes, SPECIAL_NODEID);
    for (const auto new_node_id : util::irange<std::size_t>(0UL, remaining_nodes.size()))
    {
        auto &node = remaining_nodes[new_node_id];
        BOOST_ASSERT(node_priorities.size() > node.id);
        new_node_priority[new_node_id] = node_priorities[node.id];
        BOOST_ASSERT(node_weights.size() > node.id);
        new_node_weights[new_node_id] = node_weights[node.id];
    }
    // build forward and backward renumbering maps and remap ids in remaining_nodes
    for (const auto new_node_id : util::irange<std::size_t>(0UL, remaining_nodes.size()))
    {
        auto &node = remaining_nodes[new_node_id];
        // create renumbering maps in both directions
        orig_node_id_from_new_node_id_map[new_node_id] = node.id;
        new_node_id_from_orig_id_map[node.id] = new_node_id;
        node.id = new_node_id;
    }
    // walk over all nodes
    for (const auto source : util::irange<NodeID>(0UL, contractor_graph->GetNumberOfNodes()))
    {
        for (auto current_edge : contractor_graph->GetAdjacentEdgeRange(source))
        {
            ContractorGraph::EdgeData &data = contractor_graph->GetEdgeData(current_edge);
            const NodeID target = contractor_graph->GetTarget(current_edge);
            if (SPECIAL_NODEID == new_node_id_from_orig_id_map[source])
            {
                // source node is already contracted: move its edges to the external list
                external_edge_list.push_back({source, target, data});
            }
            else
            {
                // node is not yet contracted.
                // add (renumbered) outgoing edges to new util::DynamicGraph.
                ContractorEdge new_edge = {new_node_id_from_orig_id_map[source],
                                           new_node_id_from_orig_id_map[target],
                                           data};
                new_edge.data.is_original_via_node_ID = true;
                BOOST_ASSERT_MSG(SPECIAL_NODEID != new_node_id_from_orig_id_map[source],
                                 "new source id not resolvable");
                BOOST_ASSERT_MSG(SPECIAL_NODEID != new_node_id_from_orig_id_map[target],
                                 "new target id not resolvable");
                new_edge_set.push_back(new_edge);
            }
        }
    }
    // Delete map from old NodeIDs to new ones.
    new_node_id_from_orig_id_map.clear();
    new_node_id_from_orig_id_map.shrink_to_fit();
    // Replace old priorities array by new one
    node_priorities.swap(new_node_priority);
    // Delete old node_priorities vector
    // Due to the scope, these should get cleared automatically? @daniel-j-h do you agree?
    new_node_priority.clear();
    new_node_priority.shrink_to_fit();
    node_weights.swap(new_node_weights);
    // old graph is removed
    contractor_graph.reset();
    // create new graph
    tbb::parallel_sort(new_edge_set.begin(), new_edge_set.end());
    contractor_graph = std::make_shared<ContractorGraph>(remaining_nodes.size(), new_edge_set);
    new_edge_set.clear();
    flushed_contractor = true;
    // INFO: MAKE SURE THIS IS THE LAST OPERATION OF THE FLUSH!
    // reinitialize heaps and ThreadData objects with appropriate size
    thread_data_list.number_of_nodes = contractor_graph->GetNumberOfNodes();
}

It is not clear how much memory this really saves us or how big the performance impact is.

  • Remove/Disable the code
  • Benchmark for:
    • Planet on car.lua
    • Planet on foot.lua
  • Get memory increase for both

If the memory increase is too high for the foot profile, we should make this conditional on the number of inserted edges.


TheMarex commented Jul 21, 2016

I did a preliminary test for Berlin using the foot profile. Results look to be in favor of keeping disk caching:

With disk caching   Cached priority   Time
Yes                 No                554s
No                  No                616s
Yes                 Yes               22s
No                  Yes               22s

It might be that the graph is small enough that cache effects really kick in and disk caching does not create significant IO overhead.

Next steps would be to re-run this on a bigger graph.

/cc @MoKob


MoKob commented Jul 21, 2016

@TheMarex given the timing of when the caching happens, I'd expect the results to swing in favour of non-cached for larger graphs. But I'd say we have to actually do some tests. I'd expect to see a difference starting at graphs the size of California and up.
