-
Notifications
You must be signed in to change notification settings - Fork 0
Theory of operation
A given index comprises sets of tiles at various zoom levels with each tile containing a list of the points that lie within it. The lowest level of the index covers the entire globe. Each higher index level contains twice as many tiles in each direction. At each zoom level points are linearly mapped to grid tiles based on their latitudes and longitudes using an equirectangular projection. This is fairly analogous to how typical web slippy maps are organized (though they use a pseudo-mercator projection).
As one approaches the poles the tiles become increasingly distorted with the area (in square meters) covered by each tile becoming progressively smaller. The distance in meters for one degree of longitude gets smaller as one moves away from the equator. The distance in meters for one degree of latitude, however, remains constant at all latitudes.
Each tile has a name that gives its zoom level and position. These names are used as keys into a Perl hash allowing the quick retrieval of the points that lie within a given tile. The various search methods are designed to efficiently pull points from this index and filter them in various ways. The format used for the keys is described in the Tile naming section below.
Additional datastructures are also present but knowing their details is not needed to understand how the index functions. In the descriptions below, some minor (but often critical) details have been omitted and some simplifications have been made; these details (mainly edge cases) are discussed in the code comments.
When a point is added to the index it is stored multiple times in the index hash, once for each level. This is done as follows:
-
The point's latitude and longitude are converted to integers. This is done using a simple linear mapping. In pseudo-code, the equations used are:
max_size = 2**levels integer_latitude = floor( ( latitude + 90.0 ) * max_size / 180.0 ) integer_latitude = max_size - 1 if (integer_latitude == max_size) integer_longitude = floor( ( longitude + 180.0 ) * max_size / 360.0 ) % max_size
The values for
latitude
andlongitude
are in degrees andlevels
is the number of levels in the index (not counting the global one). -
Each index level is looped through from the index's maximum level to
0
. At each level, the key (comprised oflevel
,integer_latitide
, andinteger_longitide
, see also below) is used to retrieve the corresponding value from the index hash. This value is a reference to the list of points that lie within the grid tile named by the key. The point being indexed is added to the retreived list. If there is no list stored in the index for the current key then a new list is created and added. As a special case, all points adjacent to the poles (that is points with integer latitudes of0
ormax_size - 1
) use the longitudeALL
in their keys.Once the point has been added, the integer latitudes and longitudes as well as the
max_size
are shifted right by one bit in preparation for the the next level. -
Once a the point has been added to the index at each level, the point is added to the global index entry using the key
ALL
,ALL
,ALL
. (All indexed points can be found under this key.)
The Search(…)
method is typically used to find all points lying
within a given radius of a search point. Two steps are performed by this
method: retrieval of preliminary results and filtering of the results based on
the search criteria.
If no search radius was specified, if a global search was requested, or if the search radius covers more than half the globe then the preliminary results are all points in the index. Otherwise, the preliminary results are gathered as follows:
The appropriate tile zoom level to use is determined using:
shift = ceil( log2( search_radius / half_circumference ) )
level = max_level - shift
This results in the smallest level that could potentially contain all result points within a single tile.
The search radius (in meters) is converted to two search angular radii, one for latitude and one for longitude. This is done since the number of meters per degree longitude decreases as one approaches the poles. Thus the north-south (latitude) search radius remains constant at all latitudes but the east-west (longitude) search radius increases as one nears the poles.
Each extreme is converted to an integer and shifted right by the determined
shift
, The preliminary results are retrieved from the index by iterating
over the keys for the computed level, bounded by the integer extrema. This
typically, but not always, results in a list of pointers to four tiles' points.
If the quick_results
option is active then this preliminary list of lists of
points is returned. If not then the points are filtered to only include those
matching the search criteria. The filtered points are optionally sorted and
then returned. Note that when large numbers of points have been found this
filtering can be very slow; see PERFORMANCE for details.
The Closest(…)
and Farthest(…)
methods find the points
closest to (or farthest from) a search point. The Closest(…)
method
works as follows:
The search starts at the most detailed level of the index and proceeds to the
least detailed (0
). At each level, the grid tile that the search point lies
in along with the three closest grid squares are identified. The method used
for selecting the adjacent tiles is to look at the least-significant bits of the
integer position at the previous (more detailed) level. A 1
bit for the
latitude selects tiles to the north, a 0
bit the ones to the south. Likewise
a 1
for the longitude selects the ones east and a 0
the ones west.
Now that the four tiles have been identified, the largest radius from the search
point to the tile edges is determined. The distance from the search point to
each point within the four tiles is measured. If the point is within the radius
computed and it passes any pre- or post-condition tests it is added to the
results list. To speed up processing, points that have already been rejected
along with the distances so far measured are cached. As a convenience, by
default a filter is applied that omits the search point from the results.
Once all points within a level's four chosen tiles have been gathered a check is done to see whether at least the requested number of points have been found. If they have then the loop ends, if not then the next (less-detailed) level is processed.
By default, the results are then sorted by distance which ensures that the closest results are earliest in the list. This is necessary since although the nature of the algorithm tends to place closer points earlier in the results there is no inherent order to the points added from a particular index level. Lastly, the requested number of result points is returned.
The Farthest(…)
method is largely implemented as a wrapper for
Closest(…)
. It functions by finding the closest points to the search
point's antipode.
The SearchByBounds(…)
method works much the same as Search(…)
method. Instead of computing extrema of a search circle, those of the supplied
bounding box are used. The tile level used is max( latitude_level, longitude_level )
where latitude_level
and longitude_level
are the most detailed levels that
could potentially (given their extrema's angular distances) contain their
respective extrema within a single tile in each direction. The remainder of the
method is identical to that of Search(…)
albeit with all
distance-related code removed.
As mentioned earlier, keys consist of a zoom level, a latitude, and a longitude. Each key uniquely names a given tile.
Zoom levels are either integers between 0
and the maximum zoom level or the
special zoom level ALL
(with the value -1
) that covers the entire globe.
Latitudes and longitudes are integers between 0
and one less than the maximum
grid size for the level. The tiles immediately adjacent to the poles are
treated differently. In these areas the coverage of each tile is quite small
and the algorithm around the poles would normally be complex. To accommodate
these issues, the special value ALL
(with the value -1
) is used for the
longitude of the polar tiles (those areas with the lowest or highest latitude
value for the key's level). All points lying in a polar region are assigned to
that region's overarching ALL
tile. At the global level all three components
are set to ALL
.
If Perl has been compiled with 64-bit support then each key is packed into a
64 bit integer. The level is stored in the upper 6 bits (bits 58 .. 63), the
integer latitude in the next 29 bits (bits 29 .. 57), and the integer longitude
in the low 29 bits (bits 0 .. 28). To represent the ALL
value all bits in
the relevant field are set to 1
. Note that even on 32-bit systems Perl is
often compiled with 64-bit support.
If Perl does not have 64-bit support then a different format is used. In most
places within Geo::Index, keys are stored as three-element array references.
The first field contains the level, the second the integer latitude and the
third the integer longitude. If present, ALL
values are stored as-is as
their integer value (-1
). For accessing the index, keys are converted to
strings with the format "level:
latitude,
longitude" with the
literal string "ALL
" being used for ALL
values.
Each index object is a hash containing a number of entries These are:
-
$self->{index}
- The points indexEntry is a hash reference. Keys are tile names (as discussed above), values are lists of point references.
-
$self->{indices}
- Indices used for each pointEntry is a hash reference. Keys are point references, values are lists of tile names.
-
$self->{positions}
- Each point's position when indexedEntry is a hash reference. Keys are point references, values are two-element lists giving each point's latitude and longitude at the time it was indexed.
-
$self->{levels}
- Number of levels in the index (excluding the global level) -
$self->{max_level}
- The highest-resolution level number (i.e.levels
- 1) -
$self->{max_size}
- Number of grid tiles in each direction at most detailed level of index -
$self->{planetary_radius}
- The planetary radius used by the index (in meters) -
$self->{polar_circumference}
- The polar circumference used by the index (in meters) -
$self->{equatorial_circumference}
- The equatorial circumference used by the index (in meters)
- Overview
- Points
-
Methods
-
Indexing
new
Index
IndexPoints
Unindex -
Searching
Search
SearchByBounds
Closest
Farthest
AllPoints
Condition functions -
Distance and measurement
Distance
DistanceFrom
DistanceTo
OneDegreeInMeters
OneMeterInDegrees -
Metadata
GetConfiguration
GetStatistics -
Miscellaneous
Sweep
Vacuum
PointCount
-
Indexing
- Alternate method names
- Performance
- Theory of operation
- Bugs and deficiencies
- Version history
- Author and copyright