Optimize intermediate file of <way, nodeids> mapping table #31

Closed
CodeBear801 opened this issue Jun 27, 2019 · 12 comments

CodeBear801 commented Jun 27, 2019

We need to generate a <wayid, nodeids> mapping table to enable Telenav traffic for OSRM. More background is in #22.
This table will be used to regenerate speed.csv every few minutes for OSRM customization. The initial version takes about 70~80 seconds to iterate over the whole table and uses about 10 GB of memory.
Our target is to optimize the time cost of processing this file.


CodeBear801 commented Jun 27, 2019

Current file format:

wayid1,nodeid1,nodeid2,nodeid3
wayid2,nodeid4,nodeid5
...

Example

24418343,84760849102,84760850102
24418344,84760846102,84760858102,847608599102

Statistics
Based on NA data: way count 60M, node count 700M

| format  | size (bytes)   | comments |
| ------- | -------------- | -------- |
| csv     | 12,521,206,024 | |
| tar     | 2,035,689,499  | |
| snappy  | 3,623,614,741  | |
| parquet | 3,768,399,074  | just for comparison; iterating all the data doesn't need parquet |

Currently we use csv for the following steps, and I think a large share of the time goes to I/O while loading this 10 GB file. The file does not need to be human readable, so the next steps would be:

  1. Design a lossless data schema. We don't need to record the original values; I believe recording deltas would help a lot.

  2. For how the final result is consumed: if we load all of it into memory at once, we should choose a proper way to compress and decompress the data. I will try snappy first, since it has good decompression performance.

  3. If we plan to break the big task into smaller ones, we could consider some kind of partition strategy, e.g. partitioning the entire data set into a fixed number of parts based on wayid.
    Say we partition the data by the last digit of the wayid, and each partition is sorted:
    (partition1)
    34343401
    34343511
    34343551


(partition2)
34343402
34373572
34383552


(partition3)

If the traffic component passes its results to us partitioned by the same strategy, we could benefit from Swift's streaming and Go concurrency; a minimal merge-join sketch follows below.
After receiving all the traffic data whose wayid ends with 1, we could start a job joining it with the map data ending with 1. Since both data sets are sorted, generating the final joined result and pushing it into the output channel would be very quick.
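As an illustration only (these helper names do not exist in the repo; it is assumed both partitions arrive sorted by way id), the per-partition merge join could look like:

```go
package main

// partitionOf returns which of the 10 partitions a way belongs to,
// based on the last digit of its id.
func partitionOf(wayID int64) int {
	return int(wayID % 10)
}

// joinPartition merge-joins one sorted map-data partition with the matching
// sorted traffic partition and streams the way ids present in both into out.
func joinPartition(mapWays, trafficWays []int64, out chan<- int64) {
	i, j := 0, 0
	for i < len(mapWays) && j < len(trafficWays) {
		switch {
		case mapWays[i] < trafficWays[j]:
			i++
		case mapWays[i] > trafficWays[j]:
			j++
		default: // same way id in both partitions -> joined result
			out <- mapWays[i]
			i++
			j++
		}
	}
}
```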

Will experiment with 1 and 2 for now; 3 is just a draft idea for future discussion.


CodeBear801 commented Jun 27, 2019

After analyzing the source data, we feel a delta strategy could be a good way to reduce the size of the original information.
Instead of recording

24418343,84760849102,84760850102
24418344,84760846102,84760858102,847608599102

we could record

wayid1, nodeid1, (nodeid2-nodeid1)
wayid2-wayid1, nodeid3-nodeid1, nodeid4-nodeid3, nodeid5-nodeid4
wayid3-wayid2, nodeid6-nodeid3,nodeid7-nodeid6
//...

wayid 1000, nodeid1000, nodeid1001-nodeid1000
//...

Sample result is

24418343,84760849102,1102
1,-3102,12102,18102

Notes:

  • For nodes, because consecutive ids are similar, diffing each one against the previous id saves a lot of space
  • For ways, diffing against the previous way id also helps
  • If we sort the data first, the result should be even better
  • We must be careful about offsets going out of range during delta generation
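For illustration, decoding one such line back into absolute ids could look like the sketch below (decodeLine is a hypothetical name; it assumes the caller tracks the previous way id and node base across lines, mirroring whatever bases the encoder used):

```go
package main

import (
	"strconv"
	"strings"
)

// decodeLine turns one delta-encoded line back into an absolute way id and
// node ids, given the bases used when the line was encoded.
func decodeLine(line string, preWayID, preNodeID int64) (int64, []int64, error) {
	fields := strings.Split(strings.TrimSpace(line), ",")

	wayDelta, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return 0, nil, err
	}
	wayID := preWayID + wayDelta

	nodeIDs := make([]int64, 0, len(fields)-1)
	prev := preNodeID
	for _, f := range fields[1:] {
		d, err := strconv.ParseInt(f, 10, 64)
		if err != nil {
			return 0, nil, err
		}
		id := prev + d // each delta is relative to the previously decoded node
		nodeIDs = append(nodeIDs, id)
		prev = id
	}
	return wayID, nodeIDs, nil
}
```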


CodeBear801 commented Jun 27, 2019

Update on the experiment

Result:

| format | size (bytes)  | comments |
| ------ | ------------- | -------- |
| csv    | 4,457,788,518 | |
| snappy | 1,280,865,368 | 30% of the previous snappy size, ~10% of the original csv file |

Code:

// generateIDMapptingString encodes one <wayid, nodeids> record as a
// delta-encoded csv line: the way id is written as a diff against the
// previous way id, and each node id as a diff against the previously
// written node id.
func generateIDMapptingString(nodeids []int64, wayid int64,
	preNodeID int64, preWayID int64) string {
	var str string
	str += strconv.FormatInt((wayid-preWayID), 10) + ","

	for i, n := range nodeids {
		str += strconv.FormatInt((n - preNodeID), 10)
		if i < (len(nodeids) - 1) {
			str += ","
		}
		preNodeID = n // later deltas are relative to the previous node in this record
	}
	str += "\n"

	return str
}

// updateDeltaBase returns the delta base for the next record.  Every blobSize
// records it resets the base to zero, so that record is written with absolute
// ids and decoding can restart from any blob boundary.
func updateDeltaBase(wc uint32, currNodeID, currWayID int64) (int64, int64) {
	if wc%blobSize == 0 {
		return 0, 0
	}
	return currNodeID, currWayID
}

Experiment with different blob sizes

| blob size | snappy file size | comments |
| --------- | ---------------- | -------- |
| 10        | 1,337,122,870    | |
| 1000      | 1,280,865,368    | will choose this threshold |
| 10000     | 1,280,210,252    | |
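For reference, the snappy file can be produced by streaming the encoded csv through a snappy writer, roughly as in this sketch (assumes the github.com/golang/snappy package; file names are illustrative):

```go
package main

import (
	"bufio"
	"os"

	"github.com/golang/snappy"
)

// compressFile streams src into a snappy-framed dst line by line, so the
// whole multi-GB file never has to sit in memory at once.
func compressFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()

	sw := snappy.NewBufferedWriter(out)

	scanner := bufio.NewScanner(in)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long lines
	for scanner.Scan() {
		if _, err := sw.Write(scanner.Bytes()); err != nil {
			return err
		}
		if _, err := sw.Write([]byte{'\n'}); err != nil {
			return err
		}
	}
	if err := scanner.Err(); err != nil {
		return err
	}
	return sw.Close() // Close flushes the remaining compressed frames
}
```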

CodeBear801 self-assigned this Jun 27, 2019
@wangyoucao577

Seems good. The blob size is the wayid2nodeids count, right?

@wangyoucao577

How about the snappy compression and decompression time cost?

@CodeBear801

> Seems good. The blob size is the wayid2nodeids count, right?

Blob size means the group count used while compressing the elements of wayid2nodeids. I chose the delta strategy to generate the diffs, and after every "blob size" elements the base is reset.
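For clarity, a driver loop tying the two functions above together could look like this sketch (the way type and emit callback are illustrative; blobSize is the constant already referenced by updateDeltaBase, 1000 in the chosen configuration):

```go
// way and emit are illustrative; generateIDMapptingString and updateDeltaBase
// are the functions shown in the experiment above.
type way struct {
	ID      int64
	NodeIDs []int64
}

func encodeAll(ways []way, emit func(string)) {
	var preNodeID, preWayID int64
	var wayCount uint32
	for _, w := range ways {
		if len(w.NodeIDs) == 0 {
			continue // nothing to encode for this way
		}
		emit(generateIDMapptingString(w.NodeIDs, w.ID, preNodeID, preWayID))
		wayCount++
		// Every blobSize records updateDeltaBase resets the base to zero,
		// so that record is written with absolute ids and decoding can
		// restart from any blob boundary.
		last := w.NodeIDs[len(w.NodeIDs)-1]
		preNodeID, preWayID = updateDeltaBase(wayCount, last, w.ID)
	}
}
```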

@CodeBear801

Back from emergency firefighting. Will continue this task.


CodeBear801 commented Jul 13, 2019

Latest strategy (v4)

  • Record the wayid->nodeids mapping table as snappy (12 GB -> 3 GB)
  • When the rpc request to the traffic server starts, start loading the mapping table and write its records into 64 channels (configurable).
    As soon as communication with the traffic server is finished, which means the wayid->speed mapping is generated, start the transform and dump.
  • 64 goroutines work on the transform, and finally one goroutine dumps the result into the file; a minimal sketch of this fan-out/fan-in shape follows below.

Processing time for getting traffic flow takes 14.504480 seconds
Processing time for building traffic map takes 6.370307 seconds
Processing time for loadWay2NodeidsTable takes 48.059116 seconds
Processing time for dumpSpeedTable4Customize takes 32.162450 seconds

Total time is 14 + 6 + 32 seconds, compared with 14 + 6 + 98 for v1.
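The sketch below shows the general shape only (transform is a placeholder for the real wayid->speed conversion, and none of these names are the actual functions in this repo):

```go
package main

import (
	"bufio"
	"os"
	"sync"
)

const numWorkers = 64 // configurable; could also default to runtime.NumCPU()

// transform is a placeholder for the real conversion of one mapping-table
// line plus the traffic map into speed-table rows.
func transform(line string, traffic map[int64]float64) string {
	_ = traffic
	return line + "\n"
}

// runPipeline fans the mapping-table lines out to numWorkers goroutines and
// fans the transformed rows back in to a single goroutine that writes the file.
func runPipeline(lines <-chan string, traffic map[int64]float64, outPath string) error {
	out := make(chan string, 1024)

	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range lines {
				out <- transform(line, traffic)
			}
		}()
	}

	// Close the output channel once every worker has drained its input.
	go func() {
		wg.Wait()
		close(out)
	}()

	f, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	for row := range out {
		if _, err := w.WriteString(row); err != nil {
			return err
		}
	}
	return w.Flush()
}
```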

The strategy of V2

  • one goroutine to generate the traffic result (13 seconds)
  • one goroutine to build a map of wayid to nodeids array (280 seconds)
  • 16~32 goroutines to iterate the traffic array, query the map, and generate the final result (15 seconds)

Problems

  • Loading all the data into memory takes too much time. In particular, storing the nodeids as slices causes lots of memory allocation and deallocation.
  • The dump step has to wait for the first two steps to finish.

The strategy of V3

  • one goroutine to generate the traffic result (13 seconds)
  • 16~64 goroutines to decode the id mapping table in delta format and put [wayid, nodeids array] into a channel (100+ seconds)
  • as soon as step 1 is finished, start dumping (100+ seconds)

Problems:

  • Decoding the string and putting [wayid, nodeids array] into the channel causes a lot of additional memory usage; in v4 I put the string directly into the channel and move the decoding close to the final dump.
  • Both V2 and V3 use my own delta compression strategy, so the snappy input file is only about 1.2 GB, but it must be decoded while reading the strings, which makes the following steps hard to run in parallel. So I decided to go back to simply snappy-compressing the original file to simplify the flow.

@CodeBear801

Uploading the cpu profile and mem profile for v2; will do more analysis and compare with v4 later.
cpu-v2.pdf
mem-v2.pdf

@wangyoucao577

It's awesome! It would be better if there were a data flow diagram describing the flow.
For the 64 threads (goroutines, I think), how about setting the count to the number of CPU cores by default?

@CodeBear801
Copy link
Author

@wangyoucao577
Thanks for the suggestion. A flow diagram is needed :)
Will do more experiments to find how many goroutines give the best result.


CodeBear801 commented Jul 16, 2019

Some ideas about next steps:

  • The original requirement of this task exists because the traffic team dropped the original pbf node ids, so we have to regenerate them.
    Need to confirm with the traffic team: adding the original node id should not be hard on their side, they could put it in the rpc protocol.

  • When we consider end-to-end time, we should evaluate the time between traffic happening in the real world and that traffic being used by our routing engine:
    Real-world accident -> Traffic Provider -> Pull & Push request from Telenav -> Traffic component -> Script in this task -> OSRM customization -> Restart routing
    Among the parts we can change, we should consider how to optimize "Script in this task -> OSRM customization -> Restart routing".

  • For the script part, if we load way->nodeids into memory the time cost would be around 20 seconds, but memory usage could be up to 10 GB.
    We could consider keeping only the most frequent items in memory and putting the other items into an in-memory cache such as redis. The in-memory part could be updated more frequently (minute level), while the items in redis could be updated every few minutes; see the lookup sketch after this list.

  • Currently the biggest time cost is the OSRM customization part; each round of "OSRM customization" seems to take about 10 minutes.
    We might need to think about a delta strategy here as well, so that we do not update everything in every round.
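A minimal sketch of that split lookup, under the assumption of some external store for the cold items (the Store interface and all names here are hypothetical):

```go
package main

// Store is an assumed interface over the external cache (e.g. redis) that
// holds the less frequently used ways.
type Store interface {
	Get(wayID int64) (nodeIDs []int64, ok bool)
}

// NodeIDLookup keeps the hottest ways in process memory and falls back to
// the external store for everything else.
type NodeIDLookup struct {
	hot      map[int64][]int64 // most frequently used ways
	fallback Store             // colder ways, fetched on demand
}

func (l *NodeIDLookup) NodeIDs(wayID int64) ([]int64, bool) {
	if ids, ok := l.hot[wayID]; ok {
		return ids, true
	}
	return l.fallback.Get(wayID)
}
```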

CodeBear801 added a commit that referenced this issue Jul 16, 2019
…tes -wayid indicate for traffic flow on reverse direction.

issues: #31
wangyoucao577 pushed a commit that referenced this issue Jul 17, 2019
* feat: Optimize output of wayid2nodeids format, use delta format to comparess data

Issues: #31

* feat: Implement logic to compress/decompress file to snappy.

issues: #31

* feat: Modify osrm speed table generator to support snappy compression format.

issues: #31

* feat: Fix bug during conversion

* feat: Adjust traffic updater's logic to improve performance.

* feat: Adjust traffic updater's logic to improve performance.

issues: #39

* feat: Refine the code for osrm_traffic_updater.

issues: #31

* fix: fix dead lock in the code.

* fix: optimize performance with new architecture.
issues: #31

* fix: revert way id generator to original solution

* fix: Use string to pass between different components
issues: #31

* fix: update unit test for latest changes
issues: #31

* fix: remove useless printf

* fix: update unit test for osrm_traffic_updater.go

* fix: fix the misunderstanding with requirement.  Traffic server generates -wayid indicate for traffic flow on reverse direction.
issues: #31