# Characterizing Raster Data around Buffered Spatial Points

Author: Matt Oakley

One useful operation on raster data is computing summary statistics (mean, standard deviation, variance, and so on) for the cells around a point of interest. Doing so lets us describe a neighborhood of the raster with a few numbers instead of handling every individual cell.

## Objectives

- Extract the cell values within a given radius of a center point
- Compute statistics from these values

## Dependencies

- GDAL
- NumPy

In [1]:
!conda install gdal
!conda install numpy

Fetching package metadata .......
Solving package specifications: ..........

# All requested packages already installed.
# packages in environment at /home/user/anaconda2:
#
gdal                      2.0.0                    py27_1  
Fetching package metadata .......
Solving package specifications: ..........

# All requested packages already installed.
# packages in environment at /home/user/anaconda2:
#
numpy                     1.11.1                   py27_0  


In [2]:
from osgeo import gdal
import numpy as np

## Extracting Points in the Area

Our first objective is to choose a radius, which sets how many cells the buffer extends from the central point in each direction. The larger the radius, the more cells contribute to each statistic and the more generalized the results become. Let's first read in our data and check its dimensions so we can pick a point on which to center our buffer.

In [3]:
#Read in our data
filename = "front_range.dem"
data = gdal.Open(filename)
data_array = np.array(data.GetRasterBand(1).ReadAsArray())
num_rows = data_array.shape[0]
num_cols = data_array.shape[1]

print "Number of rows:", num_rows
print "Number of cols:", num_cols

Number of rows: 466
Number of cols: 357
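The row and column counts describe the raster in cell units. If the geographic extent is also of interest, it can be derived from the dataset's geotransform. The following is a minimal sketch, assuming a north-up raster with no rotation; `raster_extent` is a hypothetical helper, and the geotransform values in the example are made up for illustration rather than read from `front_range.dem`.

```python
def raster_extent(geotransform, n_cols, n_rows):
    """Return (x_min, y_min, x_max, y_max) for an unrotated raster.

    geotransform follows GDAL's six-value layout:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height)
    """
    origin_x, pixel_w, rot_x, origin_y, rot_y, pixel_h = geotransform
    if rot_x != 0 or rot_y != 0:
        raise ValueError("rotated rasters are not handled by this sketch")
    # Coordinates of the corner opposite the origin
    x_far = origin_x + n_cols * pixel_w
    y_far = origin_y + n_rows * pixel_h
    return (min(origin_x, x_far), min(origin_y, y_far),
            max(origin_x, x_far), max(origin_y, y_far))

# With GDAL, the inputs would come from the opened dataset, e.g.:
#   raster_extent(data.GetGeoTransform(), data.RasterXSize, data.RasterYSize)
# Hypothetical geotransform (30-unit cells), NOT taken from front_range.dem:
example_gt = (500000.0, 30.0, 0.0, 4400000.0, 0.0, -30.0)
example_extent = raster_extent(example_gt, 357, 466)
```

Keeping the calculation separate from the GDAL calls makes it easy to check with known numbers before applying it to a real dataset.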


Now that we know the number of rows and columns that make up our raster, we can pick a point on which to center our buffer. The example data has 466 rows and 357 columns, so valid row indices run from 0 to 465 and valid column indices from 0 to 356. Let's choose (100, 200) as our center (row 100, column 200), set our radius to 2, and extract the surrounding values. Every cell within 2 rows and 2 columns of the center, a 5 x 5 square window of 25 cells, will then be taken into consideration when calculating statistics on the data. Note that in the code below the `x` variables index rows and the `y` variables index columns.
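Before extracting values, it can be worth checking that the whole window lies inside the raster. NumPy treats negative indices as counting from the end of the array, so a window that runs off the top or left edge would silently pull values from the opposite side, while one that runs off the bottom or right edge raises an `IndexError`. A minimal sketch of such a check follows; `window_fits` is a hypothetical helper, not part of GDAL or NumPy.

```python
def window_fits(shape, center, radius):
    """Return True if the square window around center lies fully inside the array."""
    n_rows, n_cols = shape
    row, col = center
    return (radius >= 0
            and row - radius >= 0 and row + radius <= n_rows - 1
            and col - radius >= 0 and col + radius <= n_cols - 1)

raster_shape = (466, 357)  # (rows, columns) of the example raster
fits_center = window_fits(raster_shape, (100, 200), 2)  # well inside the raster
fits_corner = window_fits(raster_shape, (0, 0), 2)      # window hangs off the top-left
fits_last = window_fits(raster_shape, (465, 356), 0)    # last cell, zero radius
```

For the center and radius used in this tutorial, the check passes, so no edge handling is needed.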

In [4]:
coords = (100, 200)
radius = 2

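buffer_list = []
# "x" indexes rows and "y" indexes columns of data_array
lower_x = coords[0] - radius
upper_x = coords[0] + radius
lower_y = coords[1] - radius
upper_y = coords[1] + radius

# Walk the square window row by row; the + 1 makes the upper bounds inclusive
for row in range(lower_x, upper_x + 1):
    for col in range(lower_y, upper_y + 1):
        val = data_array[row][col]
        buffer_list.append(val)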

buffer_array = np.asarray(buffer_list)

print "Center point at coordinates " + str(coords) + ": " + str(data_array[coords[0]][coords[1]])
print "Radius:", radius
print "Points in buffer [line 1]: ", buffer_array[0:5]
print "Points in buffer [line 2]: ", buffer_array[5:10]
print "Points in buffer [line 3]: ", buffer_array[10:15]
print "Points in buffer [line 4]: ", buffer_array[15:20]
print "Points in buffer [line 5]: ", buffer_array[20:25]

Center point at coordinates (100, 200): 1915
Radius: 2
Points in buffer [line 1]:  [1868 1879 1888 1901 1912]
Points in buffer [line 2]:  [1880 1889 1903 1915 1929]
Points in buffer [line 3]:  [1890 1900 1915 1929 1944]
Points in buffer [line 4]:  [1898 1911 1925 1943 1961]
Points in buffer [line 5]:  [1905 1919 1935 1956 1967]
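The nested loop above can also be expressed with NumPy slicing, which avoids the Python-level loop. The sketch below is one possible variant, not the tutorial's original method: `extract_buffer` is a hypothetical helper that clips the window at the raster edges and, optionally, keeps only cells within a Euclidean distance of `radius` cells from the center, giving a roughly circular buffer instead of a square one. The demo array is a made-up stand-in for `data_array`.

```python
import numpy as np

def extract_buffer(arr, row, col, radius, circular=False):
    """Return the values within `radius` cells of (row, col) as a 1-D array.

    The window is clipped at the array edges. With circular=True, only cells
    whose Euclidean distance (in cell units) from the center is <= radius
    are kept.
    """
    r0 = max(row - radius, 0)
    r1 = min(row + radius, arr.shape[0] - 1)
    c0 = max(col - radius, 0)
    c1 = min(col + radius, arr.shape[1] - 1)
    window = arr[r0:r1 + 1, c0:c1 + 1]
    if not circular:
        # Row-major order, the same order the nested loop produces
        return window.ravel()
    # Open grids of row and column indices that broadcast to the window shape
    rr, cc = np.ogrid[r0:r1 + 1, c0:c1 + 1]
    mask = (rr - row) ** 2 + (cc - col) ** 2 <= radius ** 2
    return window[mask]

# Hypothetical 10 x 10 stand-in for data_array
demo = np.arange(100).reshape(10, 10)
square = extract_buffer(demo, 5, 5, 2)                 # full 5 x 5 window
disk = extract_buffer(demo, 5, 5, 2, circular=True)    # corners of the window dropped
corner = extract_buffer(demo, 0, 0, 2)                 # clipped at the raster edge
```

With the tutorial's data, `extract_buffer(data_array, 100, 200, 2)` should yield the same 25 values as the loop, in the same order.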


## Compute Statistics

Now that we have all of the values within the buffer, we can use NumPy to compute statistics such as the mean, standard deviation, and variance. Note that `np.std` and `np.var` compute population statistics (dividing by N) by default.

In [5]:
mean = np.mean(buffer_array)
std = np.std(buffer_array)
variance = np.var(buffer_array)

print "Mean:", mean
print "Standard Deviation:", std
print "Variance:", variance

Mean: 1914.48
Standard Deviation: 25.8969032898
Variance: 670.6496
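These figures are population statistics, since NumPy's `np.var` and `np.std` divide by N unless told otherwise. If the buffer is treated as a sample of a larger surface, the `ddof=1` argument gives the sample versions, which divide by N - 1. The sketch below reuses the 25 buffer values printed earlier so that it runs without the raster file.

```python
import numpy as np

# The 25 buffer values printed earlier in this tutorial
buffer_values = np.array([
    1868, 1879, 1888, 1901, 1912,
    1880, 1889, 1903, 1915, 1929,
    1890, 1900, 1915, 1929, 1944,
    1898, 1911, 1925, 1943, 1961,
    1905, 1919, 1935, 1956, 1967,
])

pop_var = np.var(buffer_values)              # default ddof=0: divide by N
sample_var = np.var(buffer_values, ddof=1)   # divide by N - 1
sample_std = np.std(buffer_values, ddof=1)   # square root of sample_var
```

For a buffer of 25 cells the two variances differ by a factor of 25/24, so the choice matters most for small radii.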
