# Playing with Uber GPS tracks in SF

## Task 1 & 2

### Problem:

Develop and train a model that can predict travel times across SF.
1ff2e74 @chelm initial commit
authored

### Approach:

I used a time-weighted, grid-based approach to solve this problem. This is accomplished with the help of NumPy and its awesome multi-dimensional arrays. The general idea is to loop over each point in the GPS data and operate on pairs of locations, calculating the distance and a time delta. Using a very simple Euclidean distance algorithm we can linearly divide the total time difference across each cell the path crosses. Essentially we train each cell by modulating its time weight based on the time it takes to cross it. Once the model is trained I save it to disk so we can use it to predict times very quickly. Prediction re-uses the distance/vector algorithm to traverse a path and sum the time weight of each cell the path crosses.
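
The training step might be sketched roughly like this — a minimal illustration, not the actual task1_2.py; the bounding box, grid mapping, and function names are all assumptions:

```python
import numpy as np

GRID = 1000  # grid resolution over SF (per the model description)
# (lat_min, lon_min, lat_max, lon_max) -- a rough SF bounding box, assumed
BBOX = (37.60, -122.52, 37.84, -122.35)

def to_cell(lat, lon):
    """Map a lat/lon pair onto integer grid coordinates."""
    lat_min, lon_min, lat_max, lon_max = BBOX
    row = int((lat - lat_min) / (lat_max - lat_min) * (GRID - 1))
    col = int((lon - lon_min) / (lon_max - lon_min) * (GRID - 1))
    return row, col

def train_pair(weights, p1, p2, dt_seconds, day):
    """Linearly divide the time delta between two GPS fixes across
    every cell on the straight ("as the crow flies") path between them."""
    r1, c1 = to_cell(*p1)
    r2, c2 = to_cell(*p2)
    n = max(abs(r2 - r1), abs(c2 - c1)) + 1   # cells the segment crosses
    rows = np.linspace(r1, r2, n).round().astype(int)
    cols = np.linspace(c1, c2, n).round().astype(int)
    # np.add.at accumulates correctly even when cell indices repeat
    np.add.at(weights, (rows, cols, np.full(n, day)), dt_seconds / n)

weights = np.zeros((GRID, GRID, 7))   # 1000 x 1000 x 7 day-of-week grid
train_pair(weights, (37.77, -122.42), (37.78, -122.41), 60.0, day=2)
```

In the real model each cell would presumably hold a running average rather than a raw sum, but the traversal idea is the same.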

One thing to note is the structure of the nd-array. It's a 1000 x 1000 x 7 grid over SF. Each coordinate pair trains on a daily index (the z-axis, 0 to 6). To predict travel times without a given time/day, the grid is simply averaged into a 2D array and used to predict. To predict at a given location and day, the corresponding daily index grid is used. Originally I was using an hourly index, and the grid could easily be modified to add a fourth axis for hour. This might improve the accuracy.
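
The two prediction modes fall directly out of the array shape — a sketch, with random stand-in data in place of the trained grid:

```python
import numpy as np

weights = np.random.rand(1000, 1000, 7)   # stand-in for the trained grid

# No day supplied: collapse the day-of-week axis into a single 2D grid.
overall = weights.mean(axis=2)            # shape (1000, 1000)

# Day supplied (say index 2): use that day's layer directly.
day_grid = weights[:, :, 2]               # shape (1000, 1000)
```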

### Problems:

1. This approach just uses "as the crow flies" distance, which is less than perfect for a city.

2. By using only a daily index on the data we assume a uniform temporal distribution throughout each day. This is an obvious error, as traffic is highly dynamic yet patterned. It would probably be simple to correct by adding a fourth dimension to the matrix for time of day.

3. A grid-based solution works, but a node-graph solution would probably be better. I'd like to experiment with using a graph database as the adaptation model; it would be interesting for sure.

4. The training is slow, and the time diff function is to blame. If I wrote a time diff that avoided the timedelta logic in datetime it'd be faster.

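
One possible way to attack that last point — purely a sketch, with a made-up timestamp format — is to parse the timestamps into a NumPy datetime64 array once and take every consecutive delta in a single vectorized call, instead of building a datetime/timedelta pair per point:

```python
import numpy as np

# Hypothetical timestamps in the shape the GPS logs might use (ISO 8601)
stamps = np.array(["2007-01-05T08:00:00",
                   "2007-01-05T08:00:04",
                   "2007-01-05T08:00:09"], dtype="datetime64[s]")

# One vectorized call yields all consecutive time deltas, in seconds.
deltas = np.diff(stamps).astype(int)   # -> array([4, 5])
```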

### Running:

> python task1_2.py

### Results:

* (output from running task1_2.py):
* total1, total2
* 00:01:15, 00:01:03
* 00:02:27, 00:02:49
* 00:00:19, 00:00:15

## Task 3

### Problem:

Using the 25k GPS tracks, can I think of a way to detect errors/outliers and "scrub" them from any given GPS track?

### Approach:

I chose to treat this as a distance-threshold problem, where any point that is farther from the previous point than a given threshold is dropped from the track. The goal then is to eliminate the highest number of points while minimizing the total time lost from the GPS track. All this solution does is compute distances and drop points if they cross a given threshold.

I think I've got a pretty solid way to solve this problem though. It's a simple solution that works.
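
The scrubbing pass described above can be sketched like this — a minimal stand-in for task3.py, with hypothetical track data and function name; thresholds are in degrees:

```python
import math

def scrub(track, threshold):
    """Drop any point farther (straight-line, in degrees) than
    `threshold` from the last point that was kept."""
    if not track:
        return [], 0
    kept, dropped = [track[0]], 0
    for lat, lon in track[1:]:
        prev_lat, prev_lon = kept[-1]
        if math.hypot(lat - prev_lat, lon - prev_lon) > threshold:
            dropped += 1
        else:
            kept.append((lat, lon))
    return kept, dropped

track = [(37.750, -122.450), (37.751, -122.449),
         (37.600, -122.383),                  # a glitch-sized jump
         (37.752, -122.448)]
kept, dropped = scrub(track, threshold=0.05)  # drops the one outlier
```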

### Problems:

1. This is a pretty rough and simple way of solving the problem, and there are several issues with it. It doesn't handle stationary sensors, which would cause problems for the ETA algorithm.

2. Could/should use a combination of bearing and distance to more accurately drop points.
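
A sketch of what that combination might look like — flag a point only when it is both far away and forces a near-reversal in heading; the thresholds and helper names here are made up for illustration:

```python
import math

def bearing(p1, p2):
    """Flat-earth heading in degrees from p1 to p2 (fine at city scale)."""
    (lat1, lon1), (lat2, lon2) = p1, p2
    return math.degrees(math.atan2(lon2 - lon1, lat2 - lat1)) % 360

def is_outlier(prev, cur, nxt, dist_thresh=0.025, turn_thresh=150.0):
    """A GPS glitch typically looks like a long hop out and straight
    back: a large distance plus a near-180-degree turn."""
    dist = math.hypot(cur[0] - prev[0], cur[1] - prev[1])
    turn = abs(bearing(prev, cur) - bearing(cur, nxt))
    turn = min(turn, 360.0 - turn)   # smallest angle between headings
    return dist > dist_thresh and turn > turn_thresh
```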

Notes: One thing I noticed was that one track, "18880", had a lot of dropped points. It turned out there was something funky: several points share the same lat/lon, 37.600135, -122.383066. This is most likely an error, but I was curious so I made a simple map of this GPS track. See http://geocommons.com/maps/154890 - there is a preliminary, dirty version of the track and a cleaned version.

### Running the code:

> python task3.py

### Results:

* 21 points are dropped at a threshold of .1 degrees
* 26 points are dropped at a threshold of .05 degrees
* 48 points are dropped at a threshold of .025 degrees
* 135 points are dropped at a threshold of .0125 degrees

## Visualization

I've got some images going, have a cool viz tool in mind, and am hacking on it. Wanted to share the code anyway.