Skip to content
Pedro R. Andrade edited this page Sep 2, 2019 · 33 revisions

Neighborhoods from Geospatial Data

Pedro R. Andrade, Rodrigo Avancini

Summary

Introduction

Neighborhoods describe how spatial entities are connected. They summarize (possibly complex) geospatial relations into a simple representation that makes the simulation faster. This tutorial describes how to create Generalized Proximity Matrixes (GPM) within TerraME. GPM is based on the idea that Euclidean spaces are not enough to describe the relations that take place within the geographical space GPM is composed by a set of strategies that try to capture such spatial warp, computing operations over sets of spatial data.

In the next sections, we will describe the basic structure of the implementation and present some examples of creating proximity matrixes. For more information about GPM, see Aguiar et al. (2003).

In this tutorial, we use some spatial data available within gpm package (documentation available here):

  • a set of lines, representing roads;
  • a set of points, representing the center of communities;
  • a set of polygons, representing farms.
  • a set of cells (created from farms using script cells.lua within the package itself).
import("gis")

proj = Project{
	file = "gpm-test.tview",
	clean = true,
	roads = filePath("roads.shp", "gpm"),
	communities = filePath("communities.shp", "gpm"),
	farms = filePath("farms.shp", "gpm"),
	cells = filePath("cells.shp", "gpm")
}

roads       = CellularSpace{project = proj, layer = "roads"}
communities = CellularSpace{project = proj, layer = "communities"}
farms       = CellularSpace{project = proj, layer = "farms"}
cells       = CellularSpace{project = proj, layer = "cells"}

Figure below shows them.

Basic Strategies

This section presents the strategies that use Euclidean spaces to create GPM. All the code below uses gpm package, that can be loaded using:

import("gpm")

Neighborhoods can be built from any vectorial data. Different strategies are available according to the spatial representation of the origin and destination to be connected. They are summarized in the table below.

From To Points To Lines To Polygons
Points Distance, Network Distance, Network Distance, Network
Lines Distance, Network Distance, Network Distance, Network
Polygons Contains, Distance, Network Distance, Length, Network Area, Border, Distance, Network

Area

The first example of GPM starts with a strategy that uses the intersection area to create relations between polygons. Two spatial objects are connected if they have some intersection area. We can declare a GPM using cells as origin, farms as destination, and area as strategy to create relations based on the intersection area.

gpm = GPM{
    origin = cells,
    strategy = "area",
    destination = farms
}

While gpm is being created, TerraME shows messages on the screen indicating which object is being processed. To avoid such messages, it is possible to set argument progress = false. The GPM stored in object gpm can then be saved as a .gal file by using save(), presented in more details in last section.

GPM can also be used to create new attributes for the CellularSpace using fill(). For example, if we want to count how many polygons cover each cell, we can use strategy = "count". In this case, we will also use an optional argument max = 5 to indicate that, if a given cell has more than five neighbors, it will use five as the output value.

gpm:fill{
    strategy = "count",
    attribute = "quantity",
    max = 5
}

This function creates an attribute named quantity in the origin of the gpm (i.e., cells). We can then create a Map to visualize the output.

Map{
    target = cells,
    select = "quantity",
    min = 0,
    max = 5,
    slices = 6,
    color = "Reds"
}

The output is shown below. Note that this attibute was not yet saved into a file. To do so, it is necessary to call CellularSpace:save() explicitly. For example:

cells:save("gpm", "quantity") -- create layer "gpm" pointing to file "gpm.shp"

Some fill strategies have the option to copy attributes from the destination to the origin. For example, if we want to create an attribute with the maximum intersection area, as it is related to a single destination, we can copy an attribute of this destination to the origin. The code below uses this strategy and copies the attribute "id" from the destination to the origin. This attribute will be called "farm" in the origin.

gpm:fill{
    strategy = "maximum",
    attribute = "max",
    copy = {farm = "id"}
}

To paint them with different colors, we use the rest of division of the id by 9. In the code below, we replace the value of farms with this content and then create a Map with nine colors to visualize the output.

forEachCell(gpm.origin, function(cell)
    cell.farm = tonumber(cell.farm) % 9
end)

Map{
    target = gpm.origin,
    select = "farm",
    min = 0,
    max = 8,
    slices = 9,
    color = "Set1"
}

The output is shown below. Note that it is possible to recognize the largest farms in this figure. If one wants to better represent the relations to smaller farms, it is necessary to use cells with a smaller resolution.

Distance

The strategy "distance", as default, computes the Euclidean distance between each pair of object to create relations. It can be used with any geometry because it uses the centroids of the objects. The example below creates relations between cells and communities based on their distances.

gpm = GPM{
    origin = cells,
    destination = communities,
    strategy = "distance"
}

An attribute can then be created using the community closest to each cell using the code below. This funcion also copies the attribute "LOCALIDADE" of the closest community to the cells.

gpm:fill{
    strategy = "minimum",
    attribute = "dist",
    copy = "LOCALIDADE"
}

Map{
    target = cells,
    select = "dist",
    slices = 8,
    min = 0,
    max = 7000,
    color = "YlOrRd",
    invert = true
}

The output is shown below.

It is also possible to visualize the closest community using the copied attribute "LOCALIDADE".

Map{
    target = cells,
    select = "LOCALIDADE",
    value = {"Palhauzinho", "Santa Rosa", "Garrafao", "Mojui dos Campos"},
    color = "Set1"
}

As we created an attribute using all destinations, it is possible to create one attribute per destination in the origin, using the distances as values. For example, the code below creates four attributes, "d_0","d_1", "d_2", and "d_3", with the distance to the respective points.

gpm:fill{
    strategy = "all",
    attribute = "d"
}

The code and image below shows the values of attribute "d_0" using eight slices.

Map{
    target = cells,
    select = "d_0",
    slices = 8,
    min = 0,
    max = 10000,
    color = "YlOrRd",
    invert = true
}

Another option to compute distance relations is to use a threshold with a given a maximum distance. The code below creates relations only when the distance is at most 4000m.

gpm = GPM{
    origin = cells,
    destination = communities,
    distance = 4000
}

Note that now cells can have different quantities of neighbors. The code below creates an attribute counting how many neighbors each cell has.

gpm:fill{
    strategy = "count",
    attribute = "quantity"
}

Map{
    target = cells,
    select = "quantity",
    min = 0,
    max = 5,
    slices = 6,
    color = "RdPu"
}

To plot the nearest neighbor, it is necessary to compute the closest neighbor first. The code below creates attribute "dist", copying attribute "LOCALIDADE" and using value 7000 in those cells that do not have any nearest neighbor (those that are far from 4000m from the closest community).

gpm:fill{
    strategy = "minimum",
    attribute = "dist",
    dummy = 7000,
    copy = "LOCALIDADE"
}

As this operation can create some nil values, those cells that do not have attribute LOCALIDADE need to be filled with some value. The code below fills them with a value called "<none>", and draws a Map using "LOCALIDADE". The cells painted with orange have this value.

forEachCell(cells, function(cell)
    if not cell.LOCALIDADE then
        cell.LOCALIDADE = "<none>"
    end
end)

Map{
    target = cells,
    select = "LOCALIDADE",
    value = {"Palhauzinho", "Santa Rosa", "Garrafao", "Mojui dos Campos", "<none>"},
    color = "Set1"
}

Length

The strategy presented in this section computes neighborhoods based on the intersection between lines and cells. Each cell is connected with the line segments that intersects it by using strategy "length". The code below creates a neighborhood between "cells" and "rodovias".

gpm = GPM{
    origin = cells,
    strategy = "length",
    destination = roads
}

To visualize the output, the code below counts the number of neighbors, using a maximum of one (any cell with more than one neighbor will have one as attribute value).

gpm:fill{
    strategy = "count",
    attribute = "quantity",
    max = 1
}

Map{
    target = cells,
    select = "quantity",
    value = {0, 1},
    label = {"0", "1 or more"},
    color = {"gray", "blue"}
}

Contains

In this section, we present a function that computes neighborhoods between a layer of polygons and a layer of points based on spatial relation "contains". A cell is connected to the points located inside its area. The code below creates a neighborhood between the layer "cells" and the layer "comunidades".

gpm = GPM{
    origin = cells,
    strategy = "contains",
    destination = communities
}

It then creates an attribute "quantity" that counts the number of points within each polygon and draws this attribute within a map.

gpm:fill{
    strategy = "count",
    attribute = "quantity"
}

Map{
    target = cells,
    select = "quantity",
    value = {0, 1},
    color = {"lightGray", "blue"}
}

The output is shown in the figure below.

Border

The strategy "border" computes relations between polygons based on their common border. The example below uses a set of five polygons stored in file "partofbrazil.shp", within gpm package, represented by object states. It creates a GPM from states to themselves by omitting the destination.

local states = CellularSpace{
    file = filePath("partofbrazil.shp", "gpm"),
    geometry = true
}

local gpm = GPM{
    origin = states,
    strategy = "border",
    progress = false
}

This example manipulates the relations stored within gpm directly. This object has an attribute neighbor, a table whose indexes are unique identifiers from the origin, and whose value are named tables mapping the neighbors (indexes) to the weight (values). Code below shows how to retrieve such values from a GPM.

forEachOrderedElement(gpm.neighbor, function(idx, neigh)
    print(states:get(idx).name)

    forEachOrderedElement(neigh, function(midx, weight)
        print("\t"..states:get(midx).name.." ("..string.format("%.2f", weight)..")")
    end)
end)

This script will produce the following output, showing some Brazilian states and their neighbors. The value between parenthesis indicates the percentage of the origin's border shared with the given destination.

MINAS GERAIS
	RIO DE JANEIRO (0.10)
	SAO PAULO (0.25)
	ESPIRITO SANTO (0.12)
PARANA
	SAO PAULO (0.32)
RIO DE JANEIRO
	MINAS GERAIS (0.29)
	SAO PAULO (0.14)
	ESPIRITO SANTO (0.09)
SAO PAULO
	MINAS GERAIS (0.38)
	PARANA (0.26)
	RIO DE JANEIRO (0.07)
ESPIRITO SANTO
	MINAS GERAIS (0.48)
	RIO DE JANEIRO (0.11)

Network

In human-environment models, usually Euclidean spaces are not enough to describe the relations that take place in the geographic space. Networks can then be used to better compute such relations.

  1. The original data has to be very well represented, with the starting and ending points of two lines being connected to one another only when they share the same position in space.
  2. A given path from an origin to a destination enters in the Network only once. Because of this, the Network must be fully connected.

In this type of network, it is possible to enter and leave the roads in any position.

The type Network is used to generate the network. It takes as arguments the destination (reference) points, the lines that will be used to represent the network, and a function that computes the distance on the network given the length of the lines and their id. The code below creates a Network that reduces the distance within the network by one fifth of the Euclidean distance for paved roads and by half on the others. The attribute "STATUS" of the table connected to the layer of lines indicates whether the road is paved or not.

When we use Networks, the created attributes are not distances (unless the weight and outside functions return the distance itself). Usually it has a meaning of cost.

network = Network{
    target = communities,
    lines = roads,
    weight = function(distance, cell)
        if cell.STATUS == "paved" then
            return distance / 5
        else
            return distance / 2
        end
    end,
    outside = function(distance) return distance * 4 end
}

Fixing Networks

The algorithm to create a Network verifies if the data is valid. Whenever there is some problem, TerraME stops and prompts the lines with problem. It is possible to fix the Network using software like QGIS. Some functionalities of this QGIS are interesting to solve geometric problems. First, it is possble to visualize the features with problems. Click in the icon Select features using an expression (Ctrl + F3), as shown below.

Then write the number of the feature using the expression $id = <number>. The figure below selects the feature with id 1. It is then possible to Zoom to the selected feature to investigate the problem.

It is possible to simply remove the lines with problems, as long as they do not affect the connectivity of the Network. To do so, use the following icons:

If the lines must be edited, sometimes it is interesting to visualize all the points of the lines, which can be computed using Extract vertices, shown below.

After editing the data, just save it and run the script again in TerraME to check if everything is now working properly.

GPM from Network

A GPM can then be created using the Network as destination.

gpm = GPM{
    destination = network,
    origin = cells
}

gpm:fill{
    strategy = "minimum",
    attribute = "dist",
    copy = "LOCALIDADE"
}

Map{
    target = cells,
    select = "dist",
    slices = 10,
    min = 0,
    max = 14000,
    color = "YlOrBr"
}

Map{
    target = cells,
    select = "LOCALIDADE",
    value = {"Palhauzinho", "Santa Rosa", "Garrafao", "Mojui dos Campos"},
    color = "Set1"
}

Figure below shows the polygons drawn with the color of the closest point through the network.

Neighborhood files

Nighborhoods created from geospatial data usually takes a considerable time to be computed. As it is normally considered as an exogenous property of the model and will not change along a simulation, it is usually recommended to save them into files to be loaded in the beginning of the simulation. Once we have created a GPM through one of the strategies presented above, we can save it in a file, which can be a gal file (".gal"), a gwt file (".gwt"), of a gpm file (".gpm") through the function save(). The only argument of this function is the file name to be saved.

gpm:save("myfile.gal")

This neighborhood can then be loaded into a simulation using CellularSpace:loadNeighborhood().

cs:loadNeighborhood{file = "myfile.gal"}

The three available file extensions are summarized below and described as follows.

gal gwt gpm
Number of attributes Zero One Zero or more
How many layers does it connect? One One One or two
Author Anselin (2005) Anselin (2005) TerraME team

gal

GAL files store connections between objects, without using weight. These objects must belong to the same dataset, which means that a GPM must be created with origin being equals to destination. The first line of the file contains a header with the following fields:

  1. Character "0" indicating that it is a neighborhood file.
  2. Number of objects in the data;
  3. Name of the layer (or file) over which GPM was created;
  4. Name of the object attribute used as unique identifier for the objects.

The relations are described from the second line until the end of file. Each relation uses two lines. The first contains an unique identifier of the object and then the number of neighbors of the object. The second line contains the unique identifier of each of its connections. See a general descripttion below.

0 Num_Objects Layer Key_Variable
ID_Object_1 Num_Neighbors
ID_Neighbor_1 ID_Neighbor_2 ... ID_Neighbor_N
...

A small example of a GAL file is shown below. This file was created from the output of the GPM created in section Border.

0 5 partofbrazil.shp object_id_
0 3
2 3 4
1 1
3
2 3
0 3 4
3 3
0 1 2
4 2
0 2

gwt

Gwt format describes relations using an additional value indicating the connection weight. It also requires that the objects must belong to the same dataset (origin equals to destination). The header of the gwt format is the same of gal. From the second line until the end of file, it stores one connection by line using the following fields:

  1. Unique identifier of the object N-th object;
  2. M-th neighbor of the N-th object;
  3. Weight (attribute value) of the relation between the N-th object and the M-th neighbor.

The structure is presended below.

0 Num_Objects Layer Key_Variable
ID_Object_1 ID_Neighbor_1 Weight_Neighbor_1
ID_Object_1 ID_Neighbor_2 Weight_Neighbor_2
...
ID_Object_1 ID_Neighbor_N Weight_Neighbor_N
ID_Object_2 ID_Neighbor_1 Weight_Neighbor_1
...

A small example of a GWT file is shown below. This file was created from the output of the GPM created in section Border.

0 5 partofbrazil.shp object_id_
0 2 0.096047689354866
0 3 0.25358518320813
0 4 0.12346779200856
1 3 0.3231305819424
2 0 0.28661770283377
2 3 0.13545641162995
2 4 0.085637388476816
3 0 0.38457681249402
3 1 0.26322254348472
3 2 0.06884029109318
4 0 0.4839820977464
4 2 0.11249233755161

gpm

Gpm is a more general format that can connects two different layers and store any amount of attributes to each connection. The structure of the GPM file contains a header in the first line, with the following fields:

  1. Number of attributes associated to each relation
  2. Name of the first layer (origin) of GPM. If the data was loaded directly from a shapefile, it will be the name of the file.
  3. Name of the second layer (destination). Connections are established from the first layer to the second one. If the GPM was created from a single layer, then the name of the first layer will be repeated in this field.
  4. Names of the GPM attributes.

The second line until the end of the file describe the connections. The neighborhood of each object uses two lines. The first one contains:

  1. Unique identifier of the N-th object;
  2. Number of neighbors of the N-th object.

The second line contais the neighborhood of the object which ID is in the previous line, represented by the fields below.

  1. M-th neighbor of the N-th object;
  2. Value of the k-th attribute of the M-th neighbor.

The structure of a GPM file is summarized below.

Num_attributes Layer_1 Layer_2 Attribute_1 Attribute_2 ... Attribute_N
ID_Object_1 Num_Neighbors
ID_Neighbor_1 Attrib_1_Neigh_1 Attrib_2_Neigh_1 ... Attrib_N_Neigh_1 ID_Neighbor_2 ...
ID_Object_2 ...
...

An example of a GPM file with one attribute is shown below. This output is part of the file produced by the GPM created in section Network.

111 farms_cells.shp communities.shp object_id_
0 4
0 5501.9562754449 1 6153.7686145953 2 10641.235146975 3 13929.451501441
1 4
0 7365.1627505185 1 8020.7951384333 2 9759.3030416018 3 12486.174413104
10 4
0 5012.8344428588 1 3069.7189628308 2 9426.2638737062 3 15346.631146027
100 4
0 19168.291791002 1 8013.275527337 2 3320.4380158657 3 10619.245311459
101 4
0 18116.038549794 1 6568.2057569898 2 4562.1923447465 3 8744.1659540264

References

Aguiar et al. (2003) Modeling Spatial Relations by Generalized Proximity Matrices. Proceedings of V Brazilian Symposium in Geoinformatics (GeoInfo'03).

Anselin, L. (2005) Exploring Spatial Data with GeoDaTM : A Workbook. http://www.csiss.org/clearinghouse/GeoDa/geodaworkbook.pdf