Skip to content

Theory of operation

Alex-Kent edited this page Apr 5, 2019 · 4 revisions

Overview

A given index comprises sets of tiles at various zoom levels with each tile containing a list of the points that lie within it. The lowest level of the index covers the entire globe. Each higher index level contains twice as many tiles in each direction. At each zoom level points are linearly mapped to grid tiles based on their latitudes and longitudes using an equirectangular projection. This is fairly analogous to how typical web slippy maps are organized (though they use a pseudo-mercator projection).

As one approaches the poles the tiles become increasingly distorted with the area (in square meters) covered by each tile becoming progressively smaller. The distance in meters for one degree of longitude gets smaller as one moves away from the equator. The distance in meters for one degree of latitude, however, remains constant at all latitudes.

Each tile has a name that gives its zoom level and position. These names are used as keys into a Perl hash allowing the quick retrieval of the points that lie within a given tile. The various search methods are designed to efficiently pull points from this index and filter them in various ways. The format used for the keys is described in the Tile naming section below.

Additional datastructures are also present but knowing their details is not needed to understand how the index functions. In the descriptions below, some minor (but often critical) details have been omitted and some simplifications have been made; these details (mainly edge cases) are discussed in the code comments.

Populating the index

When a point is added to the index it is stored multiple times in the index hash, once for each level. This is done as follows:

  • The point's latitude and longitude are converted to integers. This is done using a simple linear mapping. In pseudo-code, the equations used are:

      max_size = 2**levels
      
      integer_latitude  = floor( ( latitude + 90.0 )  * max_size / 180.0 )
      integer_latitude  = max_size - 1 if (integer_latitude == max_size)
      
      integer_longitude = floor( ( longitude + 180.0 ) * max_size / 360.0 ) % max_size
    

    The values for latitude and longitude are in degrees and levels is the number of levels in the index (not counting the global one).

  • Each index level is looped through from the index's maximum level to 0. At each level, the key (comprised of level, integer_latitide, and integer_longitide, see also below) is used to retrieve the corresponding value from the index hash. This value is a reference to the list of points that lie within the grid tile named by the key. The point being indexed is added to the retreived list. If there is no list stored in the index for the current key then a new list is created and added. As a special case, all points adjacent to the poles (that is points with integer latitudes of 0 or max_size - 1) use the longitude ALL in their keys.

    Once the point has been added, the integer latitudes and longitudes as well as the max_size are shifted right by one bit in preparation for the the next level.

  • Once a the point has been added to the index at each level, the point is added to the global index entry using the key ALLALLALL. (All indexed points can be found under this key.)

Basic searching

The Search(…) method is typically used to find all points lying within a given radius of a search point. Two steps are performed by this method: retrieval of preliminary results and filtering of the results based on the search criteria.

If no search radius was specified, if a global search was requested, or if the search radius covers more than half the globe then the preliminary results are all points in the index. Otherwise, the preliminary results are gathered as follows:

The appropriate tile zoom level to use is determined using:

shift = ceil( log2( search_radius / half_circumference ) )
level = max_level - shift

This results in the smallest level that could potentially contain all result points within a single tile.

The search radius (in meters) is converted to two search angular radii, one for latitude and one for longitude. This is done since the number of meters per degree longitude decreases as one approaches the poles. Thus the north-south (latitude) search radius remains constant at all latitudes but the east-west (longitude) search radius increases as one nears the poles.

Each extreme is converted to an integer and shifted right by the determined shift, The preliminary results are retrieved from the index by iterating over the keys for the computed level, bounded by the integer extrema. This typically, but not always, results in a list of pointers to four tiles' points.

If the quick_results option is active then this preliminary list of lists of points is returned. If not then the points are filtered to only include those matching the search criteria. The filtered points are optionally sorted and then returned. Note that when large numbers of points have been found this filtering can be very slow; see PERFORMANCE for details.

Proximity searching

The Closest(…) and Farthest(…) methods find the points closest to (or farthest from) a search point. The Closest(…) method works as follows:

The search starts at the most detailed level of the index and proceeds to the least detailed (0). At each level, the grid tile that the search point lies in along with the three closest grid squares are identified. The method used for selecting the adjacent tiles is to look at the least-significant bits of the integer position at the previous (more detailed) level. A 1 bit for the latitude selects tiles to the north, a 0 bit the ones to the south. Likewise a 1 for the longitude selects the ones east and a 0 the ones west.

Now that the four tiles have been identified, the largest radius from the search point to the tile edges is determined. The distance from the search point to each point within the four tiles is measured. If the point is within the radius computed and it passes any pre- or post-condition tests it is added to the results list. To speed up processing, points that have already been rejected along with the distances so far measured are cached. As a convenience, by
default a filter is applied that omits the search point from the results.

Once all points within a level's four chosen tiles have been gathered a check is done to see whether at least the requested number of points have been found. If they have then the loop ends, if not then the next (less-detailed) level is processed.

By default, the results are then sorted by distance which ensures that the closest results are earliest in the list. This is necessary since although the nature of the algorithm tends to place closer points earlier in the results there is no inherent order to the points added from a particular index level. Lastly, the requested number of result points is returned.

The Farthest(…) method is largely implemented as a wrapper for Closest(…). It functions by finding the closest points to the search point's antipode.

Searching by bounding box

The SearchByBounds(…) method works much the same as Search(…) method. Instead of computing extrema of a search circle, those of the supplied bounding box are used. The tile level used is max( latitude_level, longitude_level ) where latitude_level and longitude_level are the most detailed levels that could potentially (given their extrema's angular distances) contain their respective extrema within a single tile in each direction. The remainder of the method is identical to that of Search(…) albeit with all distance-related code removed.

Tile naming (key generation)

As mentioned earlier, keys consist of a zoom level, a latitude, and a longitude. Each key uniquely names a given tile.

Zoom levels are either integers between 0 and the maximum zoom level or the special zoom level ALL (with the value -1) that covers the entire globe. Latitudes and longitudes are integers between 0 and one less than the maximum grid size for the level. The tiles immediately adjacent to the poles are treated differently. In these areas the coverage of each tile is quite small and the algorithm around the poles would normally be complex. To accommodate these issues, the special value ALL (with the value -1) is used for the longitude of the polar tiles (those areas with the lowest or highest latitude value for the key's level). All points lying in a polar region are assigned to that region's overarching ALL tile. At the global level all three components are set to ALL.

If Perl has been compiled with 64-bit support then each key is packed into a 64 bit integer. The level is stored in the upper 6 bits (bits 58 .. 63), the integer latitude in the next 29 bits (bits 29 .. 57), and the integer longitude in the low 29 bits (bits 0 .. 28). To represent the ALL value all bits in the relevant field are set to 1. Note that even on 32-bit systems Perl is often compiled with 64-bit support.

If Perl does not have 64-bit support then a different format is used. In most places within Geo::Index, keys are stored as three-element array references. The first field contains the level, the second the integer latitude and the third the integer longitude. If present, ALL values are stored as-is as their integer value (-1). For accessing the index, keys are converted to strings with the format "level:latitude,longitude" with the literal string "ALL" being used for ALL values.

Object structure

Each index object is a hash containing a number of entries These are:

  • $self->{index} - The points index

    Entry is a hash reference. Keys are tile names (as discussed above), values are lists of point references.

  • $self->{indices} - Indices used for each point

    Entry is a hash reference. Keys are point references, values are lists of tile names.

  • $self->{positions} - Each point's position when indexed

    Entry is a hash reference. Keys are point references, values are two-element lists giving each point's latitude and longitude at the time it was indexed.

  • $self->{levels} - Number of levels in the index (excluding the global level)

  • $self->{max_level} - The highest-resolution level number (i.e. levels - 1)

  • $self->{max_size} - Number of grid tiles in each direction at most detailed level of index

  • $self->{planetary_radius} - The planetary radius used by the index (in meters)

  • $self->{polar_circumference} - The polar circumference used by the index (in meters)

  • $self->{equatorial_circumference} - The equatorial circumference used by the index (in meters)