add one more link

1 parent 2460d38 commit 7241873239772a09e977da86510b647deb8c6481 Marc Pfister committed Jun 29, 2012
Showing with 1 addition and 1 deletion.
  1. +1 −1 README.md
@@ -10,7 +10,7 @@ In October 2011 Joel Lawhead figured out [how to decode ESRI's SBN spatial index
I took his data and started trying to visualize it differently, looking for patterns that might give away the algorithm. First I noticed [Z-shaped curves](http://flic.kr/p/atrEim), which suggest quadtrees or a z-order curve. I then noticed that many of the features were actually on seams between possible divisions of the space, and by tracking them it became clear that the spatial index was a binary division. The index space (a 256 x 256 grid) is initially split in half horizontally. Then each rectangular half is split vertically into two squares, and so on.
-To further test how the algorithm worked, I created a series of incremental shapefiles, each adding one feature to the previous one. I then wrote some scripts to track how the index tree grew and how features were sorted into the tree. Tracking through thousands of shapefiles, I noticed that, regardless of the shapefile, the size of the tree jumped at fixed feature counts. Looking at the numbers, it became clear that the size of the tree was designed to keep an average of 8 features or fewer per node. By tracking individual nodes, I also noticed that a node would fill up until it held 8 features, and after that it would send new features down to its child nodes.
+To further test how the algorithm worked, I created a series of incremental shapefiles, each adding one feature to the previous one. I then wrote some scripts to track how the index tree grew and how features were sorted into the tree. Tracking through thousands of shapefiles, I noticed that, regardless of the shapefile, the size of the tree [jumped at fixed feature counts](http://flic.kr/p/cnSSTC). Looking at the numbers, it became clear that the size of the tree was designed to keep an average of 8 features or fewer per node. By tracking individual nodes, I also noticed that a node would fill up until it held 8 features, and after that it would send new features down to its child nodes.
With thousands of test shapefiles to work with, I also tested the algorithm that maps features to index space and found that it does some interesting rounding. At this point it seemed like we had the algorithm figured out. But in some of the test shapefiles, some features would be sorted one level lower than where the ESRI algorithm places them. After checking the individual features, we couldn't find any reason why they did not sort all the way down to the bin that should contain them, and there was no obvious explanation for the discrepancy.
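
The binary division of the 256 x 256 index space can be illustrated with a minimal Python sketch. It assumes that "split in half horizontally" means a horizontal cut that halves the y range at even depths, with vertical cuts halving the x range at odd depths. It also numbers nodes heap-style (root = 1, children of node n are 2n and 2n + 1); that numbering and the names `node_rect` and `smallest_containing_node` are hypothetical illustrations, not ESRI's documented format. The second function shows why features end up on seams: a feature whose box straddles a cut cannot descend into either child, so it stays at the parent.

```python
from dataclasses import dataclass

GRID = 256  # the index space is a 256 x 256 grid


@dataclass(frozen=True)
class Rect:
    """Half-open cell range: xmin <= x < xmax, ymin <= y < ymax."""
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def split(rect: Rect, depth: int) -> tuple[Rect, Rect]:
    """Assumption: even depths cut horizontally (halve y), odd depths cut vertically (halve x)."""
    if depth % 2 == 0:
        ymid = (rect.ymin + rect.ymax) // 2
        return (Rect(rect.xmin, rect.ymin, rect.xmax, ymid),
                Rect(rect.xmin, ymid, rect.xmax, rect.ymax))
    xmid = (rect.xmin + rect.xmax) // 2
    return (Rect(rect.xmin, rect.ymin, xmid, rect.ymax),
            Rect(xmid, rect.ymin, rect.xmax, rect.ymax))


def node_rect(node_id: int) -> Rect:
    """Hypothetical heap numbering: the bits after the leading 1 pick the lower (0) or upper (1) child."""
    rect = Rect(0, 0, GRID, GRID)
    depth = node_id.bit_length() - 1
    for level in range(depth):
        bit = (node_id >> (depth - 1 - level)) & 1
        lo, hi = split(rect, level)
        rect = hi if bit else lo
    return rect


def contains(rect: Rect, xmin: int, ymin: int, xmax: int, ymax: int) -> bool:
    """True if the inclusive cell box [xmin..xmax] x [ymin..ymax] lies inside rect."""
    return (rect.xmin <= xmin and xmax < rect.xmax
            and rect.ymin <= ymin and ymax < rect.ymax)


def smallest_containing_node(xmin: int, ymin: int, xmax: int, ymax: int, max_depth: int) -> int:
    """Descend while the feature's box fits entirely inside one child."""
    node, rect = 1, Rect(0, 0, GRID, GRID)
    for level in range(max_depth):
        lo, hi = split(rect, level)
        if contains(lo, xmin, ymin, xmax, ymax):
            node, rect = 2 * node, lo
        elif contains(hi, xmin, ymin, xmax, ymax):
            node, rect = 2 * node + 1, hi
        else:
            break  # the box straddles this seam, so it stays at the current node
    return node
```

Under these assumptions, `node_rect(2)` is the 256 x 128 lower half and `node_rect(4)` is the 128 x 128 square in its corner, while a one-cell-wide feature spanning rows 127 and 128 stays at the root because it sits on the first seam.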

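The node-capacity behavior described in the diff can be sketched the same way. This is a hypothetical reading, not ESRI's confirmed rule: it assumes the tree gets the fewest levels L for which a full binary tree of 2**L - 1 nodes averages at most 8 features per node (a rule of that shape would make the tree size jump at fixed feature counts). It also assumes each feature arrives with a precomputed target node, meaning the deepest node that fully contains it, heap-numbered with root = 1. The names `levels_for` and `insert_all` are illustrative only.

```python
from collections import defaultdict

CAPACITY = 8  # a node fills to 8 features before passing new ones down


def levels_for(feature_count: int) -> int:
    """Assumed sizing rule: fewest levels so a full binary tree averages <= 8 features per node."""
    levels = 1
    while feature_count > CAPACITY * (2 ** levels - 1):
        levels += 1
    return levels


def insert_all(target_nodes: list[int], levels: int) -> dict[int, list[int]]:
    """Place features in order; each is kept by the first node on its root-to-target path with room.

    target_nodes[i] is the hypothetical heap-numbered deepest node that fully contains feature i.
    """
    bins: dict[int, list[int]] = defaultdict(list)
    for feature_id, target in enumerate(target_nodes):
        if target < 1:
            raise ValueError("node ids start at 1 (the root)")
        depth = target.bit_length() - 1
        if depth > levels - 1:
            # Clamp to the deepest level this tree actually has.
            target >>= depth - (levels - 1)
            depth = levels - 1
        for d in range(depth + 1):
            node = target >> (depth - d)  # ancestor of target at depth d
            if len(bins[node]) < CAPACITY:
                break
        # If every node on the path is full, the feature stays at its target (an assumption).
        bins[node].append(feature_id)
    return dict(bins)
```

For example, under this rule 8 features fit in a single node while the 9th forces a second level, and ten features that all fit under node 4 fill the root with the first eight before the last two spill into node 2.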