# FPGA implementation of an efficient similarity-based adaptive window algorithm for real-time stereo matching



Journal of Real-Time Image Processing manuscript No.

(will be inserted by the editor)

Madaín Pérez-Patricio · Abiel Aguilar-González


Preprint version, the final publication is available at http://link.springer.com/article/10.1007/s11554-015-0530-6

Received: date / Revised: date

**Abstract** Stereo matching is one of the most widely used techniques in real-time image processing applications such as positioning systems for mobile robots, three-dimensional building mapping, and the recognition, detection and three-dimensional reconstruction of objects. In area-based algorithms, the similarity between one pixel of the left image and one pixel of the right image is measured using a correlation index computed on vicinities of these pixels called correlation windows. To preserve edges, small windows must be used; for homogeneous areas, on the other hand, large windows are required. Because only local information is used, matching between primitives is difficult. This paper describes the FPGA implementation of an efficient similarity-based adaptive window algorithm for real-time estimation of dense disparity maps. To evaluate the behavior of the proposed algorithm, the developed FPGA architecture was simulated in ModelSim-Altera 6.6c using different synthetic stereo pairs and different correlation window sizes. In addition, the architecture was implemented on a Cyclone II EP2C35F672C6 FPGA embedded in the Altera DE2 development board. Disparity maps are computed at a rate of 76 frames per second for stereo pairs of 1280×1024 pixel resolution and a maximum expected disparity equal to 15. The proposed algorithm performs significantly better than other correlation-based algorithms. Furthermore, the developed FPGA architecture offers better results than most real-time area-based stereo matching algorithms reported in the literature, allows the processing speed to be increased up to 93,061,120 pixels per second, and can be implemented in the majority of mid-range FPGA devices.

 Madaín Pérez-Patricio · Abiel Aguilar-González ( $\boxtimes$ ) Instituto Tecnológico de Tuxtla Gutiérrez, Tuxtla Gutiérrez, México. Departamento de investigación y posgrado.

Tel.: +123-45-678910

E-mail: 13270869@ittuxtlaguitierrez.edu.mx

#### 1 Introduction

The perception of the depth of the points contained in a scene is one of the most important tasks of computer vision systems and has been used in several applications such as the recognition, detection and three-dimensional reconstruction of objects and positioning systems for mobile robots [4–6, 8, 17, 20, 35, 42–44, 46, 47, 54, 60].

Although numerous techniques exist to determine the depth of a scene, extracting depth information from images obtained by a stereo configuration has become the most widely used. In this technique, the correspondence between the images of a stereo pair and the geometrical configuration of the stereo camera make it possible to obtain depth images called disparity maps [11]. To determine a disparity map it is necessary to measure the similarity of the points contained in the stereo pair. Techniques to determine these similarities are divided into two categories: area-based algorithms [2, 23, 37, 51] and feature-based algorithms [12, 16, 26, 27, 52, 56].

Area-based algorithms use the gray scale values of the pixels surrounding the pixel of interest for similarity estimation and produce dense disparity maps, i.e. they compute a disparity for each point in the stereo pair. Compared with feature-based algorithms, they are more efficient in runtime and computational resource consumption and are mathematically simpler. Feature-based algorithms, on the other hand, rely on certain points of interest and are more stable against changes of contrast, environmental conditions and illumination, because they represent the geometric properties of the scene and the interest points are selected by detectors of specific features. The main restriction of feature-based algorithms is that they cannot generate dense disparity maps, so they often need to be combined with other techniques. Additionally, a pre-processing stage for feature extraction is necessary, which increases computational resource consumption and runtime.

Because FPGA devices allow high-speed handling of large amounts of data, several algorithms for the estimation of disparity maps have been implemented in these devices [22, 25]. The range of disparity levels varies with the configuration of the cameras; for algorithms implemented in FPGAs, this implies a significant increase in hardware resource consumption. This has motivated several authors to study ways of reducing that disadvantage [30, 40] and to search for new approaches to implementing stereo vision algorithms in FPGA devices.

The system presented in [58] consists of a  $4\times4$  array of FPGAs connected in a mesh configuration; the authors use a maximum of about 35,000 4-input LUTs and process 40 frames per second for images of  $320\times240$  pixel resolution. In [7], a structure based on four Xilinx Virtex 2000E FPGAs is presented, which obtains dense disparity maps at 40 frames per second for images of  $256\times360$  pixel resolution. In [9], the use of a single FPGA is proposed; the developed system processes images of  $640\times480$  pixel resolution at 30 frames per second.

The architecture developed in [41] uses a SAD-based technique to calculate the optical flow efficiently; the system generates dense vector maps at more than 800 frames per second for images of 320×240 pixels, and at 30 frames per second for images of 640×480 pixel resolution. A  $7 \times 7$  correlation window with a maximum expected disparity equal to 121 is used. A modification of SAD is shown in [32]; the authors synthesize several versions of SAD to determine their hardware resource requirements and performance. By decomposing the SAD correlation window into rows and columns using buffers, a resource saving of around 50% is reached. Using different window shapes decreases the high memory consumption without any loss of quality. Disparity maps are calculated at 122 frames per second for images of  $320 \times 240$ pixels and a maximum expected disparity equal to 64.

The architecture in [38] uses four FPGAs to perform rectification in real time; a left-right consistency check is then applied to improve the quality of the produced disparity map. Speeds of 30 frames per second are reached for images of  $640 \times 480$  pixel resolution and a maximum expected disparity equal to 128. In [13], a module for real-time disparity map computation is proposed; the module was implemented in a single Altera Stratix IV FPGA, and disparity maps are computed at a rate of 320 frames per second for images of  $640 \times 480$  pixels and a maximum expected disparity equal to 80.

The module developed in [15] processes 275 frames per second for images of 640×480 pixel resolution with a maximum expected disparity equal to 80; the presented architecture provides high processing speed at the expense of accuracy, with great scalability in terms of disparity levels. An adaptive window technique in combination with SAD is used in [45]; the algorithm processes images of up to  $1024\times1024$  pixels with a maximum expected disparity equal to 32 at 47 frames per second. In [51], an FPGA correlation-edge distance approach for disparity maps is proposed. Speeds of 76 frames per second are reached for images of  $1280\times1024$  pixel resolution and a maximum expected disparity equal to 15. By using a geometric feature, the Euclidean distance between the selected point and the nearest left edge, the developed FPGA architecture provides a significant improvement over other conventional correlation-based stereo matching algorithms while maintaining low hardware resource consumption and high processing speed.

#### 1.1 Adaptive window algorithm

Several adaptive algorithms have been proposed to improve results in both depth discontinuities and homogeneous areas. The algorithm in [28] changes the correlation window size and shape iteratively according to the local variation of the gray scale values and estimates the current depth. However, the algorithm is computationally expensive and sensitive to the initial depth estimates [49]. The authors of [55] change the window size and shape by optimizing over a large class of compact windows via a minimum ratio cycle. The algorithm presented in [33] uses edges in the reference image to determine the size of a rectangular window. In [61], pixels are aggregated adaptively based on pixel similarity using a tree structure. The authors of [39] treat the cost aggregation process from the perspective of a histogram, reducing its complexity. These algorithms cannot be implemented in dedicated hardware for real-time processing.

To simplify the adaptive algorithms, efficient multiple-window algorithms have been proposed. The authors of [1, 21] compute correlation coefficients on nine windows and retain the one yielding the lowest value. In [19], the use of a central window surrounded by several support windows is proposed. The correlation coefficients of the best support windows, i.e. the lowest values, are added to the coefficient computed on the central window. The reduced number of windows used in these algorithms cannot cover the whole range of sizes and shapes required in all situations. The use of non-parametric measures has been proposed by the authors of [62]. In the Census transform, each pixel and its surroundings are mapped into a vector of boolean variables, each denoting the ordering relation between the center pixel and a vicinity pixel. Boolean vectors are compared using the Hamming distance. Hamming distances are summed over a small local area, and the shift that minimizes the Hamming distance is retained as the disparity. Non-parametric measures reduce the sensitivity to outliers but do not resolve the problem of the window size, because the window must remain small.
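
As an illustration of the Census transform and Hamming comparison described above, the following minimal Python sketch encodes one pixel's vicinity as a list of bits. The function names `census_bits` and `hamming`, the row-major image layout, and the choice of "vicinity pixel darker than center" as the ordering relation are assumptions made for this example, not details taken from [62].

```python
def census_bits(img, x, y, w):
    """Census signature of pixel (x, y): one bit per vicinity pixel.

    img is a list of rows (img[y][x]); the window is (2w+1) x (2w+1).
    Assumption: a bit is 1 when the vicinity pixel is darker than the center.
    """
    center = img[y][x]
    return [
        1 if img[y + j][x + i] < center else 0
        for j in range(-w, w + 1)
        for i in range(-w, w + 1)
        if (i, j) != (0, 0)  # the center pixel itself is skipped
    ]


def hamming(a, b):
    """Number of positions in which two equal-length bit vectors differ."""
    return sum(p != q for p, q in zip(a, b))
```

Two signatures computed this way, one per image, can be compared with `hamming`; summing these distances over a small area gives the matching cost that is minimized over the shift.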



Fig. 1: Disparity maps generated for different test synthetic stereo pairs by applying the SAD algorithm

In this research we are interested in stereo matching algorithms that can be implemented in dedicated hardware for real-time processing. The most suitable are correlation-based algorithms such as the sum of absolute differences (SAD), because they have a regular structure with a fixed runtime. Several systems that use correlation-based algorithms have been described in the literature [29, 31, 57, 59].

#### 1.2 Correlation using fixed size windows

In the majority of area-based algorithms, a rectangular vicinity centered on a reference pixel in one of the images of the stereo pair is compared with similar vicinities for some pixels on the same raster line of the other image. These vicinities are called correlation windows and can be compared using a correlation-based measure such as the Sum of Absolute Differences (SAD):

$$C_l(x, y, s) = \sum_{i = -w_x}^{i = w_x} \sum_{j = -w_y}^{j = w_y} |I_l(x + i, y + j) - I_r(x + s + i, y + j)|,$$

where  $I_l(x+i,y+j)$  and  $I_r(x+s+i,y+j)$  are the grey scale values of the pixels within the window in the left and right images, respectively,  $(2 \times w+1)^2$  is the window size, s is the shift of the window in the right image, and  $s_m$  is the maximal shift of the correlation window in the right image. A correlation coefficient is determined for each pixel and each shift, and the shift that minimizes the correlation coefficient is retained as the disparity. These algorithms yield a dense depth map, but at the cost of a high runtime.
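
The fixed-window procedure can be sketched in a few lines of Python. This is a minimal software illustration under stated assumptions, not the implementation evaluated in this paper: the function name `sad_disparity`, the list-of-rows image layout (`img[y][x]`), the square window with $w_x = w_y = w$, and the handling of borders (left at disparity 0) are all choices made for the example.

```python
def sad_disparity(left, right, w, s_max):
    """Dense disparity map by fixed-window SAD, left image as reference.

    left, right: grey scale images as lists of rows of ints.
    w: window half-size (window is (2w+1) x (2w+1)).
    s_max: maximal shift of the window in the right image.
    """
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(w, height - w):
        for x in range(w, width - w):
            best_cost, best_shift = None, 0
            for s in range(s_max + 1):
                if x + s + w >= width:
                    break  # shifted window would leave the right image
                cost = 0
                for j in range(-w, w + 1):
                    for i in range(-w, w + 1):
                        cost += abs(left[y + j][x + i] - right[y + j][x + s + i])
                # keep the shift with the lowest correlation coefficient
                if best_cost is None or cost < best_cost:
                    best_cost, best_shift = cost, s
            disparity[y][x] = best_shift
    return disparity
```

The four nested loops make the cost of this approach visible: every pixel requires $(s_m + 1)(2w+1)^2$ absolute differences, which is why the runtime grows quickly with the window size.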

Disparity maps generated by applying the SAD algorithm to different synthetic stereo pairs are shown in **Fig.** 1. The main problem with this algorithm is selecting the correlation window size. Large windows make it possible to determine correct correlation values in areas with uniform texture. However, they imply a high computational demand and produce erroneous values at certain points, because edges are blurred and small features are eliminated, **Fig.** 1.(c),(f),(i). Small windows, on the other hand, imply a low computational demand, but the correlation coefficient becomes sensitive to noise; hence, erroneous values are generated in regions of uniform texture, as seen in **Fig.** 1.(b),(e),(h). To improve this behavior, the use of an adaptive correlation window is proposed.

This paper describes an efficient area-based stereo matching algorithm, in which the size and shape of the correlation window are adjusted for each pixel of the reference image according to its content, together with its FPGA implementation. The proposed algorithm uses the grey scale value variations in the window to determine the similarity criterion. It is demonstrated that even with a simple similarity criterion, the proposed algorithm significantly outperforms other similar area-based algorithms and can be implemented in dedicated hardware for real-time processing such as FPGA devices. Furthermore, it is demonstrated that the developed FPGA architecture outperforms most other real-time area-based stereo matching algorithms reported in the literature while maintaining a high processing speed. The rest of this paper is organized as follows: section 2 presents the proposed algorithm and the technique used to determine the similarity criterion for the selection of pixels. In section 3, the FPGA architecture for the proposed algorithm is described. Experimental results with synthetic stereo pairs, a comparison with similar algorithms, and hardware implementation results are reported in section 4. Finally, section 5 concludes this paper.

#### 2 Correlation using an efficient adaptive size windows approach

The main objective is to develop an algorithm that uses a single window, processed only once, using a recursive approach suitable for real-time implementation in dedicated hardware. To explain the proposed algorithm, the image of the University of Tsukuba shown in **Fig.** 2.(a) is used. This image presents multiple objects at different depths. The depth of each object is indicated using grey scale values, as shown in **Fig.** 2.(b).





a) Original image

b) Ground truth

Fig. 2: Image of the University of Tsukuba

The pixels within the small overlapped window illustrated in **Fig. 3**.(a) include projections of points of different objects, as shown in **Fig. 3**.(b). When the correlation coefficient is computed using all the pixels of this window, the averaging effect yields errors in the estimated disparity. On the other hand, **Fig. 3**.(c) shows a vicinity in which only the pixels that are projections of points of the same object are used, while the others are discarded from the window. Discarded pixels are indicated in black. The color of the retained pixels is similar to that of the central pixel, and they have the same depth, as shown in **Fig. 3**.(d). With this window, disparity estimation is more accurate.



Fig. 3: Fixed versus adaptive window

In the Similarity-Based Adaptive Window algorithm (SBAW), a fixed size window is centered on each pixel of the reference image, but only the pixels selected by the similarity criterion are used to compute the correlation coefficient. Any correlation coefficient based on gray scale values can be modified using this technique. For example, the standard SAD expression turns into:

$$C_{l}(x, y, s) = \sum_{i=-w_{x}}^{i=w_{x}} \sum_{j=-w_{y}}^{j=w_{y}} \beta(x, y, i, j) \times |I_{l}(x+i, y+j) - I_{r}(x+s+i, y+j)|, \tag{1}$$

where the coefficient  $\beta(x,y,i,j)$  is equal to 1 when the pixel of the correlation window is a projection of the same object as the selected point, and zero otherwise; i and j are the summation indices. This amounts to defining a window with variable size and shape that adapts to the local reference image data. To ensure that the pixels within the window correspond to the same object as the selected pixel  $P_l(x,y)$ , a pixel  $P_l(x+i,y+j)$  is included in or excluded from the window according to a similarity criterion: if the two pixels are similar,  $\beta(x,y,i,j)$  is set to 1, otherwise to zero. Several techniques can be used to define the similarity criterion, from simple ones such as grey scale value comparison to more complex ones such as local texture analysis.
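
To make the role of the coefficient $\beta$ concrete, the sketch below evaluates the masked SAD cost for one pixel and one shift. It is only an illustration: the function name `masked_sad` and the `is_similar` callback are hypothetical, introduced so that any similarity criterion can be plugged in, and a square window with half-size `w` is assumed.

```python
def masked_sad(left, right, x, y, s, w, is_similar):
    """SAD cost for pixel (x, y) and shift s, restricted to similar pixels.

    is_similar(center_value, pixel_value) -> bool plays the role of the
    similarity criterion; it decides the value of beta for each window pixel.
    Images are lists of rows (img[y][x]).
    """
    center = left[y][x]
    cost = 0
    for j in range(-w, w + 1):
        for i in range(-w, w + 1):
            pixel = left[y + j][x + i]
            beta = 1 if is_similar(center, pixel) else 0
            cost += beta * abs(pixel - right[y + j][x + s + i])
    return cost
```

For instance, passing `lambda c, p: abs(c - p) <= 20` excludes any window pixel whose grey scale value differs from the center by more than 20, while `lambda c, p: True` recovers the standard fixed-window SAD.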

In standard algorithms, the disparity  $d_l(x, y)$  is defined as the shift s giving the maximum (or minimum) value of  $C_l(x, y, s)$ . To detect occlusions, left-right consistency is used. For each pixel, if the disparity  $d_l(x, y)$  computed using the left image as the reference is equal to the disparity  $d_r(x + d_l, y)$  computed using the right image as the reference, the solution is considered correct. Otherwise the pixel is marked as occluded, and its disparity can be computed with subpixel accuracy or assigned the minimum of  $d_l(x, y)$  and  $d_r(x + d_l, y)$ . In this work, the minimum of  $d_l(x, y)$  and  $d_r(x + d_l, y)$  is used.
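
The consistency check and the minimum rule can be illustrated with the following Python sketch. The function name `fuse_disparities` is hypothetical, and clamping the lookup position $x + d_l$ to the last column is an assumption made here so that the example stays inside the image; the text above does not specify the behavior at the border.

```python
def fuse_disparities(d_left, d_right):
    """Left-right consistency check followed by the minimum rule.

    d_left, d_right: disparity maps (lists of rows) computed with the left
    and the right image as reference. Returns (fused_map, occlusion_mask).
    """
    height, width = len(d_left), len(d_left[0])
    fused = [[0] * width for _ in range(height)]
    occluded = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dl = d_left[y][x]
            xr = min(x + dl, width - 1)  # assumption: clamp at the border
            dr = d_right[y][xr]
            occluded[y][x] = dl != dr     # inconsistent pixels are occluded
            fused[y][x] = min(dl, dr)     # equals dl when both agree
    return fused, occluded
```

Because `min(dl, dr)` equals `dl` whenever the two disparities agree, a single minimum operation covers both the consistent and the occluded case.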

#### 2.1 Techniques to define the similarity criteria

Several techniques can be used to define the similarity criterion. However, a technique based on a recursive approach is more suitable in terms of computational efficiency and makes it easier to implement the proposed algorithm in dedicated hardware for real-time processing. In this section, a technique based on the comparison of grey scale values is described. The aim is to demonstrate that even a simple technique allows adaptive correlation windows to significantly increase the accuracy of the disparity map estimation.

#### 2.1.1 Criterion based on comparison of the grey scale values

We can assume that two pixels are not similar, and therefore have different disparities, when there is a significant difference between their grey scale values [63]. We therefore set  $\beta(x, y, i, j)$  to 1 only when the grey scale value  $I_l(x + i, y + j)$  is close to the grey scale value of the selected pixel  $I_l(x, y)$ , i.e. if:

$$|I_l(x+i,y+j) - I_l(x,y)| \le T_l(x,y), \tag{2}$$

where  $T_l(x,y)$  is the maximum acceptable difference between the grey scale values. In practice, it is sufficient to assign  $T_l$  a user-defined constant value for all  $I_l(x,y)$ . However, the problem is to determine a value appropriate for all the points contained in the input stereo pair. By analyzing simulations performed in Matlab R2013a, it was determined that small values of  $T_l$  are most appropriate for points in regions near edges, while higher values of  $T_l$  are more suitable for regions that belong to the same object. It was also determined that a constant  $T_l$  produces erroneous estimations in regions where, due to the color of the selected pixel, some pixels of the same object are eliminated from the correlation window. Therefore, assigning  $T_l$  a constant value does not ensure that an appropriate value is used for each point of the input stereo pair, and wrong estimations will be obtained at some points.

To compute an appropriate value of  $T_l$ , the use of a recursive approach that considers multiple points of the correlation window is proposed. Through multiple analyses performed in Matlab R2013a, it was determined that the minimum set of pixels required for an accurate estimation of  $T_l$  is the set of pixels around the selected pixel,  $I_l(x, y)$  (cf. Fig. 4). To compute  $T_l$ , the use of the sum of absolute differences between the selected pixel and its vicinity pixels is proposed. This value adapts appropriately to most points contained in the stereo pair. However, in regions where the variation within the correlation window is high, some pixels corresponding to different objects are included in the correlation window. It was determined that this error is proportional to the size of the correlation window; therefore, it is proposed to compute the similarity criterion as follows:

$$T_l(x,y) = \begin{cases} K_l(x,y), & K_l(x,y) \le \beta \\ \beta, & \text{otherwise,} \end{cases} \tag{3}$$

$$K_l(x,y) = \sum_{i=-1}^{i=1} \sum_{j=-1}^{j=1} |I_l(x,y) - I_l(x+i,y+j)|, \tag{4}$$

$$\beta = \frac{2^n}{(2 \times w + 1)^2}, \text{ where } n = \text{bits per pixel (bpp)}. \tag{5}$$
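
Equations 2 to 5 can be sketched in Python as follows. The names `similarity_threshold` and `window_mask` are hypothetical, the constant $\beta$ of Equation 5 is called `limit` in the code to avoid confusion with the coefficient $\beta(x,y,i,j)$, and the division is kept in floating point, which is an assumption: a hardware implementation would more likely use an integer constant.

```python
def similarity_threshold(img, x, y, w, bits_per_pixel=8):
    """Threshold T_l(x, y) following Equations 3 to 5.

    img is a list of rows (img[y][x]); w is the window half-size.
    """
    # Equation 4: sum of absolute differences over the 3x3 vicinity
    k = sum(
        abs(img[y][x] - img[y + j][x + i])
        for j in (-1, 0, 1)
        for i in (-1, 0, 1)
    )
    # Equation 5: upper bound that shrinks as the window grows
    limit = (2 ** bits_per_pixel) / (2 * w + 1) ** 2
    # Equation 3: saturate K_l at the bound
    return k if k <= limit else limit


def window_mask(img, x, y, w, bits_per_pixel=8):
    """Coefficients beta(x, y, i, j) of the window, as a (2w+1)x(2w+1) grid."""
    t = similarity_threshold(img, x, y, w, bits_per_pixel)
    return [
        [1 if abs(img[y + j][x + i] - img[y][x]) <= t else 0  # Equation 2
         for i in range(-w, w + 1)]
        for j in range(-w, w + 1)
    ]
```

For an 8-bit image and a $3 \times 3$ window the bound is $256/9 \approx 28.4$, so a vicinity containing a single pixel of a much brighter object saturates the threshold and that pixel is excluded from the window.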



Fig. 4: Pixels used to calculate the similarity criterion

Fig. 5 shows both the included and the excluded pixels for some windows of the Tsukuba scene using Equation 3. The included pixels are indicated with light grey values, while the excluded pixels are indicated with dark grey values. We can conclude that, although the window used is rectangular, it can be adapted to the local variations in the stereo pair by using a simple similarity criterion.



Fig. 5: The selected pixels using the proposed algorithm



Fig. 6: General diagram of the developed FPGA architecture

#### 3 The FPGA architecture

The algorithm presented in section 2 has low mathematical complexity; however, computing a disparity map for a synthetic stereo pair of 384×288 pixel resolution (the resolution of the Tsukuba scene) implies a runtime close to 1 second. This time is not appropriate for real-time applications, which motivated the search for efficient ways to implement the proposed algorithm; an FPGA implementation was selected. Fig. 6 shows an overview of the developed FPGA architecture. The architecture has three inputs: clk\_pixel, the pixel rate of the input stereo pairs, and left\_image [7:0] and right\_image [7:0], the gray scale values of the pixels from the left and right images, respectively. It has one output, disparity [7:0], corresponding to the disparity value of the selected pixel. The developed FPGA architecture can process input stereo pairs of  $x \times y$  pixel resolution, where  $x \in \mathbb{N}$  and  $y \le 2048$ . Furthermore, this architecture computes the disparity maps by applying the SBAW algorithm using  $n \times n$  correlation windows, where  $n=2k+1 \ \forall \ k \in \mathbb{N}$ , and considering a maximum expected disparity equal to  $2^k - 1 \ \forall k \in \mathbb{N}$ . Its general behavior can be described as follows. First, the **buffer** modules store the gray scale values of the pixels contained in n horizontal lines of both the left and right images of the input stereo pair. Next, the storage\_vector modules generate n storage vectors; each vector consists of a register holding the gray scale values of n vertical pixels taken from the horizontal lines stored previously. Then, the left-disparity and right-disparity values are computed separately by the **SBAW** modules. A multiplexer (mux) then sets the final disparity value as the minimum of the two disparity values computed by the SBAW modules. Finally, the **equalizer** module converts the final disparity value to a gray scale value with 8 bits of depth. In the following subsections, the architecture of each individual module is described in detail.

#### 3.1 The **buffer** module

To store the data necessary for the disparity computation, the use of **buffer** modules is proposed. These modules store the gray scale values of the pixels contained in n horizontal lines of an image and allow all stored lines to be read in parallel. An overview of the FPGA architecture of the **buffer** module is shown in Fig. 7. This module consists of three different sub-modules. The RAM\_driver module manages an array of n+1 single-port RAM units (**RAM**), assigning to each one the corresponding address, address [9:0], and the corresponding write-read value, w/r [n+1:0]. The w/r [n+1:0] output consists of one logic vector of n+1 bits; the write-read value of each **RAM** is determined by one of the bits of the w/r [n+1:0] output. The outputs of the **buffer** modules are determined via state machines, which are controlled by the horizontal resolution of the input stereo pairs, x\_resolution [11:0], and the correlation window size, n [5:0]. **Table 1** shows the behavior of the state machine for the output w/r [n+1:0]; the number of states is n + 1. In every state, n **RAM**s are in read mode while one **RAM** is in write mode. **Table 2** shows the behavior of the state machine for the output address [9:0].

Table 1: Behavior of the state machine for w/r [n+1:0] output of the **buffer** module

| State | Behavior                                                                                    |
|-------|---------------------------------------------------------------------------------------------|
| 1     | If address < x_resolution [11:0] and state = 1<br>then w/r [n+1:0] = 00001 else state = 2   |
| 2     | If address < x_resolution [11:0] and state = 2<br>then w/r [n+1:0] = 00010 else state = 3   |
| ...   | ...                                                                                         |
| n+1   | If address < x_resolution [11:0] and state = n+1<br>then w/r [n+1:0] = 1000 else state = 1  |



Fig. 7: FPGA architecture for the **buffer** module

Table 2: Behavior of the state machine for address [9:0] output of the **buffer** module

| State | Behavior                                                                                                            |
|-------|---------------------------------------------------------------------------------------------------------------------|
| 1     | If address < x_resolution [11:0] and state = 1<br>then address [9:0] = address + 1 else state = 2,<br>address = 0   |
| 2     | If address < x_resolution [11:0] and state = 2<br>then address [9:0] = address + 1 else state = 3,<br>address = 0   |
| ...   | ...                                                                                                                 |
| n+1   | If address < x_resolution [11:0] and state = n+1<br>then address [9:0] = address + 1 else state = 1,<br>address = 0 |
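
The two state machines of Tables 1 and 2 can be modelled together in software. The Python sketch below is a behavioral approximation only: the function name `ram_driver` is hypothetical, states are numbered from 0 instead of 1, and the state change is assumed to take effect on the clock that follows the last pixel of a line, a timing detail the tables do not specify.

```python
def ram_driver(n, x_resolution, num_pixels):
    """Behavioral model of the w/r and address outputs of RAM_driver.

    Returns one (w_r, address) tuple per pixel clock, where w_r is a one-hot
    integer whose set bit selects the single RAM in write mode; the other
    n RAMs are in read mode.
    """
    state, address = 0, 0
    outputs = []
    for _ in range(num_pixels):
        outputs.append((1 << state, address))
        address += 1
        if address == x_resolution:   # end of the current image line
            address = 0
            state = (state + 1) % (n + 1)  # next RAM becomes the write target
    return outputs
```

With `n = 3` and a line length of 4 pixels, the write bit moves to the next RAM every 4 clocks and returns to the first RAM after 16 clocks, so each RAM is overwritten only once the n lines read from the others have been consumed.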

The RAM module consists of a synchronous single-port RAM unit with the following general settings: type = synchronous, width = 8, depth = 2048, operation type = single port; all other parameters are left at their defaults. These parameters make it possible to store the gray scale value of each pixel contained in a horizontal line of images of up to 2048 pixels of horizontal resolution with 8 bits of color depth. The use of an array of RAM modules makes it possible to read the gray scale values of the pixels contained in n horizontal lines of an image, see Table 3.

Table 3: Behavior of the array of RAM modules

| w/r [n+1:0] | Read lines by $RAM_{1,2,3,4,n}$ |
|-------------|---------------------------------|
| 1000        | -,-,-                           |
| 0100        | 1,-,-,-                         |
| 0010        | 1,-,-,-                         |
| 0001        | 1,2,3,-                         |
| 0001        | 1,2,3,n                         |
| 1000        | n+1,2,3,n                       |
| 0100        | n+1, n+2, 3, n                  |
| 0010        | n+1, n+2, n+3, n                |
| 0001        | n+1, n+2, n+3, n+4              |
| 0001        | n+5, n+2, n+3, n+4              |

The n\_lines\_generator module reads the outputs of the RAM modules and determines which RAM modules are in read mode at any time. The lines are assigned to the outputs of the n\_lines\_generator module in ascending order, i.e. pixel\_1 [7:0] = input image line number l, pixel\_2 [7:0] = input image line number l+1, pixel\_n [7:0] = input image line number l+n-1. To this end, the outputs of the RAM modules in read mode are assigned to the outputs of the n\_lines\_generator module as seen in Table 4: the first column corresponds to the output w/r [n+1:0] of the RAM\_driver module, and the second column corresponds to the numbers of the RAM modules assigned to the outputs of the n\_lines\_generator module.

Table 4: Output assignment for the n\_lines\_generator module

| w/r [n+1:0]                  | Assignment pixel_1 [7:0] for 1=1,2,,n                                  |
|------------------------------|------------------------------------------------------------------------|
| 1000<br>0100<br>0010<br>0001 | $2,3,4,5,,n \\ 3,4,5,,n,1 \\ 4,5,,n,1,2 \\ 5,n,,1,2,3 \\ 1,2,3,4,,n-1$ |
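
The ascending-order assignment can be sketched as a rotation over the ring of n+1 RAM units. The Python function below is a simplified model under assumptions: the name `read_order` is hypothetical, RAM units are numbered from 0, and the RAM that follows the one being written is assumed to hold the oldest stored line, which matches the rotating pattern of Table 4 but is not stated explicitly in the text.

```python
def read_order(write_index, n):
    """RAM indices feeding pixel_1 .. pixel_n, oldest image line first.

    write_index: index (0-based) of the RAM currently in write mode.
    n: number of lines read in parallel; the ring holds n + 1 RAM units.
    """
    return [(write_index + k) % (n + 1) for k in range(1, n + 1)]
```

For example, with `n = 4` and RAM 2 in write mode, the outputs are fed from RAMs 3, 4, 0 and 1, in that order.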

#### 3.2 The storage\_vector module

To compute the disparity value via the SBAW algorithm, the gray scale values of all the pixels of the correlation window must be stored. However, the **buffer** module only provides the gray scale values of one of the vertical lines of the correlation window at a time. To store the rest of the values efficiently, the use of several register-based storage vectors is proposed.

All storage vectors behave similarly to a shift register, but they allow multiple values to be read in one clock cycle. In general, when a line begins, the gray scale value of the pixel with coordinate (1) is stored in index [7:0] of a storage vector; in the following clock cycle, this value is moved to index [15:8] and the gray scale value of the pixel with coordinate (2) is stored in index [7:0]. A similar process is repeated for all the pixels in the line. **Fig.** 8 shows the behavior of the **storage\_vector** module with the following settings: number of lines to process = n, v = 8 \* n - 1. **Fig.** 9 shows the architecture of the **storage\_vector** module.
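
The shift behavior of a storage vector can be modelled with integer bit operations. The class below is a software sketch, not the register-transfer description of the module: the name `StorageVector` and its methods are hypothetical, and one Python integer stands in for the register of 8·n bits.

```python
class StorageVector:
    """Shift register of n pixels whose slots can all be read at once."""

    def __init__(self, n, bits=8):
        self.n = n
        self.bits = bits
        self.pixel_mask = (1 << bits) - 1
        self.vector_mask = (1 << (n * bits)) - 1  # register is n*bits wide
        self.value = 0

    def shift_in(self, pixel):
        """One clock cycle: move every slot up and store pixel in [7:0]."""
        shifted = (self.value << self.bits) | (pixel & self.pixel_mask)
        self.value = shifted & self.vector_mask  # the oldest pixel drops out

    def read(self, k):
        """Slot k (0 = newest), i.e. bits [8k+7 : 8k] for 8-bit pixels."""
        return (self.value >> (k * self.bits)) & self.pixel_mask
```

After shifting in 10, 20, 30 and 40 with `n = 3`, slots 0 to 2 hold 40, 30 and 20, and the first pixel has been shifted out of the vector.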



Fig. 8: Behavior of the **storage\_vector** module



Fig. 9: FPGA architecture for the **storage\_vector** module

#### 3.3 The **SBAW** module

For the computation of the disparity map via the SBAW algorithm, a pixel-parallel and window-parallel architecture was designed. The necessary data are obtained from the **storage\_vector** modules; by using the appropriate indexes it is possible to process video streams in real time, yielding disparity maps of  $(X-w) \times (Y-w)$  pixel resolution, where X and Y are the resolution values of the input video stream and 2w+1 is the dimension of the correlation window used. The architecture of the SBAW module is presented in Fig. 10. Its general behavior is as follows. First, the absolute\_differences modules compute the absolute differences between the pixels of the left and right images within the correlation window. This process is executed in each of the  $d_{\text{max}} + 1$  absolute\_differences modules, which are implemented in parallel and configured for the expected disparity levels from 0 to  $d_{\text{max}}$ , each module processing only one disparity level. Then, the output of each absolute\_differences module is sent to its corresponding sumator module, where adder blocks compute the sum of the absolute differences only over those pixels of the correlation window that are projections of the same object as the selected pixel, **Equations 2 - 5.** Finally, the **minimum** module assigns the corresponding index to each correlation value, determines the minimum correlation value, and sets the disparity value as the index of the minimum correlation value. In the developed FPGA architecture, two **SBAW** modules were implemented in parallel: the first uses the left image as the reference and the second uses the right image.

#### 3.3.1 The minimum module

To propagate the processed data appropriately, the minimum module is used. It consists of an index\_generator module and k min modules connected in sequence. First, the index\_generator module assigns the corresponding index to each correlation value from the previous stage. The min<sub>1</sub> module then receives all the correlation values and their indexes, groups the correlation values in pairs so that no value belongs to more than one pair, and determines the minimum of each pair. The minimum correlation values and their indexes obtained in this way are placed in the vectors value [x:0], where $x = 16 * (d_{\text{max}} + 1) - 1$, and index [x:0], where $x = 8*(d_{\text{max}}+1)-1$, respectively. This process is repeated by the following min modules until only one correlation value and its index remain in the output vectors.
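The pairwise reduction performed by the chain of min modules can be sketched as follows. This is an illustrative Python model under stated assumptions: the function name `minimum_module` is hypothetical, ties are resolved in favor of the lower index (the text does not specify this), and an unpaired value is simply forwarded to the next pass.

```python
def minimum_module(correlation_values):
    """Return (minimum value, its index) by successive pairwise comparisons.

    Expects at least one correlation value.
    """
    # index_generator: tag each correlation value with its disparity level.
    stage = [(value, index) for index, value in enumerate(correlation_values)]
    while len(stage) > 1:                    # one pass per min stage
        next_stage = []
        for j in range(0, len(stage) - 1, 2):
            a, b = stage[j], stage[j + 1]
            # Assumed tie rule: keep the entry with the lower index.
            next_stage.append(a if a[0] <= b[0] else b)
        if len(stage) % 2:                   # unpaired value passes through
            next_stage.append(stage[-1])
        stage = next_stage
    return stage[0]


best_value, best_index = minimum_module([9, 4, 7, 4, 12])
```

In this sketch each pass halves the number of candidates, so 16 correlation values ($d_{\text{max}} = 15$) are reduced in four passes.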

#### 3.4 The **equalizer** module

So that the disparity values are suitable for display on LCD screens or other output devices, the **equalizer** module is used. This module converts the final disparity value to a gray-scale value through disparity  $[7:0]*256/d_{\rm max}$. To reduce hardware resource consumption, this operation was implemented with a CASE structure that covers all the expected disparity levels and maps each final disparity value to the integer constant given by the operation above.
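A CASE structure with one constant per disparity level is equivalent to a precomputed lookup table, which can be sketched in Python as follows. The function name is hypothetical, the division is assumed to be an integer division, and the clamp to 255 is an assumption introduced here so that the result for disparity $= d_{\rm max}$ fits in 8 bits; the text does not state how that case is handled.

```python
def build_equalizer_lut(d_max):
    """Precompute disparity * 256 / d_max for every expected disparity."""
    # Integer division and the clamp to 255 are assumptions of this sketch.
    return [min(255, d * 256 // d_max) for d in range(d_max + 1)]


lut = build_equalizer_lut(15)
gray = lut[8]    # gray-scale value shown for a disparity of 8
```

Because every entry is a constant, no multiplier or divider is needed at run time, which is consistent with the resource saving mentioned in the text.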



Fig. 10: FPGA architecture for the **SBAW** module



Fig. 11: FPGA architecture for the **minimum** module



Fig. 12: Behavior of the developed FPGA architecture

#### 4 Discussion and analysis of results

The FPGA architecture presented in Section 3 was implemented with a top-down approach. All the modules were written in Verilog, and Quartus II Web Edition version 10.1SP1 was used for the synthesis process. To verify the functionality of each module individually, post-synthesis simulations were executed in ModelSim-Altera 6.6c.

#### 4.1 Simulation results

To evaluate the behavior of the proposed algorithm, the developed FPGA architecture was simulated in ModelSim-Altera 6.6c using different synthetic stereo pairs and different correlation window sizes. The selected test stereo pairs were the Tsukuba, Venus, Teddy and Cones scenes. The window sizes used were:  $\{3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41\}$ . **Fig.** 12 shows the error obtained in the disparity maps generated with the different window sizes for the selected stereo pairs, which demonstrates the effectiveness of the developed FPGA architecture. The disparity maps have been compared using the method proposed in [48], in which the percentage of pixels with a disparity error greater than one is computed. Three percentages are computed: one for all non-occluded pixels (nonocc), one for all pixels (all) and one for pixels near depth discontinuities (disc). Performance for occluded pixels is not considered because none of the algorithms handles occluded pixels explicitly.
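The error measure described above, the percentage of pixels whose disparity differs from the ground truth by more than one, can be written as a short Python function. The function name, the optional `mask` argument (used here to restrict the count to a region such as nonocc or disc) and the list-of-rows image representation are assumptions of this sketch and are not taken from [48].

```python
def bad_pixel_percentage(disparity, ground_truth, mask=None, tolerance=1):
    """Percentage of evaluated pixels with |disparity - truth| > tolerance."""
    total = 0
    bad = 0
    for y, row in enumerate(ground_truth):
        for x, truth in enumerate(row):
            if mask is not None and not mask[y][x]:
                continue                     # pixel outside the evaluated region
            total += 1
            if abs(disparity[y][x] - truth) > tolerance:
                bad += 1
    return 100.0 * bad / total if total else 0.0


estimated = [[5, 5], [9, 2]]
truth = [[5, 6], [5, 2]]
error_all = bad_pixel_percentage(estimated, truth)
```

Passing a mask that is true only for non-occluded pixels, or only for pixels near depth discontinuities, yields the other two percentages.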

Fig. 13 compares, for all the evaluated synthetic stereo pairs and for different window sizes, the error percentage over all pixels (all) obtained with the SBAW algorithm and with the SAD algorithm. With a small correlation window, the error of the SBAW algorithm is larger than that of the SAD algorithm because some pixels of the window are discarded, which reduces the effective area of the window. However, the error in untextured areas is significantly reduced when a large correlation window is used. The error at discontinuities grows with the size of the correlation window, but it remains smaller than the error of the SAD algorithm. We can therefore conclude that the SBAW algorithm performs better when a large correlation window is used.



Fig. 13: Comparison between the SBAW algorithm and the SAD algorithm for all the pixels (all)

Table 5 presents quantitative results for the erroneous pixels obtained by the proposed algorithm for the Tsukuba, Venus, Teddy and Cones scenes using a 41×41 correlation window, compared with other real-time stereo matching algorithms reported in the literature. To collect the data presented in Table 5, the developed architecture was scaled and synthesized to operate with the appropriate maximum expected disparity values. In all cases, Quartus II Web Edition version 10.1SP1 was used for synthesis and the simulations were executed in ModelSim-Altera 6.6c. Table 5 shows that the proposed algorithm improves on most of the real-time stereo matching algorithms reported in the literature. In addition, like the majority of these algorithms, the proposed algorithm performs well for small values of maximum disparity (Tsukuba and Venus scenes), whereas its performance is moderate for high values of maximum disparity (Teddy and Cones scenes). The disparity maps generated for the Tsukuba, Venus, Teddy and Cones scenes with 2w + 1 = 41 are shown in **Fig. 14**.

Table 5: Comparison between quantitative results of real-time stereo matching algorithms

| Algorithm | Tsukuba | Venus | Teddy  | Cones |
|-----------|---------|-------|--------|-------|
| [34]      | 11%     | 8%    | -      | -     |
| [18]      | 8.7%    | 8.6%  | -      | -     |
| [15]      | 12.0%   | 8.0%  | -      | -     |
| [16]      | 15.2%   | 14.1% | -      | -     |
| [13]      | 12.8%   | 10.8% | 10.7%  | -     |
| [2]       | 8.8%    | 6.9%  | 30.2%  | 43.4% |
| [3]       | 3.8%    | 2.12% | 11.85% | 8.45% |
| [50]      | 7.5%    | 4.1%  | 17.6%  | 18.4% |
| [53]      | 10.4%   | 12.1% | 29.1%  | 25.3% |
| [24]      | 11.5%   | 5.27% | 21.5%  | 17.5% |
| [14]      | 7.8%    | 11%   | 21%    | 16.8% |
| SBAW      | 7.6%    | 3.2%  | 13.6%  | 16.4% |

Furthermore, comparisons with respect to similar correlation-based algorithms, such as the SMW algorithm [1], the Census algorithm [62] and the Hirschmüller (HIR) algorithm [19], were performed. In order to perform comparisons, the synthetic image shown in Fig. 15 (a) is used. Two textured objects are present in the synthetic scene, which appear as a square and as the background in the image. Fig. 15(b) shows the ground truth map, where well defined edges correspond to depth discontinuities.



Fig. 15: Test scene stereo pair

Fig. 16 shows the disparity maps computed by all the algorithms using a  $27 \times 27$  square window.



Fig. 16: Results for the test scene (2w + 1 = 27)



Fig. 14: Disparity maps generated for different test synthetic stereo pairs

In the areas corresponding to a single object, all the algorithms estimate the disparity precisely because the correlation window is large. However, the averaging effect generates errors at depth discontinuities, which is clearly visible in the disparity maps of the SMW and HIR algorithms (**Fig.** 16(a) and (b)). The HIR algorithm reduces errors at discontinuities, but false matches remain because the central window is always used. The square windows used in the SMW algorithm are well adapted to this image; nevertheless, there are false matches at the corners of the central square object. The performance of the Census algorithm is worse because of the repetitive pattern in the image. With the SBAW algorithm (**Fig.** 16(d)), the estimated disparity map is very similar to the ground truth, even near depth discontinuities and at the corners of the central square object. Table 8 shows the numerical values obtained in this comparison.

Table 8: Errors for the synthetic pair (in %)

| Algorithm   | Non-occluded, 15×15 | Non-occluded, 21×21 | Non-occluded, 27×27 | Discontinuities, 15×15 | Discontinuities, 21×21 | Discontinuities, 27×27 |
|-------------|-------|-------|-------|-------|-------|-------|
| HIR [19]    | 2.24  | 3.51  | 5.01  | 27.32 | 33.05 | 35.63 |
| SMW [1]     | 0.53  | 0.86  | 1.4   | 8.9   | 12.38 | 16.49 |
| Census [62] | 3.21  | 3.16  | 3.62  | 26.97 | 29.97 | 33.46 |
| SBAW        | 0.34  | 0.34  | 0.33  | 4.94  | 4.53  | 4.37  |

Table 6: Logic elements (combinational functions and logic registers) consumption for different configurations of the developed FPGA architecture

| $d_{\max}$ (rows) / $2w+1$ (columns) | 3      | 9       | 15      | 21      | 27      | 33      | 41      |
|--------------------------------------|--------|---------|---------|---------|---------|---------|---------|
| 15                                   | 7,375  | 47,286  | 119,991 | 143,988 | 165,586 | 182,144 | 191,252 |
| 31                                   | 14,699 | 94,099  | 238,782 | 286,536 | 329,516 | 362,466 | 380,591 |
| 63                                   | 29,104 | 186,316 | 470,400 | 561,610 | 645,851 | 710,434 | 726,757 |

Table 7: Memory bits consumption for different configurations of the developed FPGA architecture

| $d_{\max}$ (rows) / $2w+1$ (columns) | 3      | 9       | 15      | 21      | 27      | 33      | 41      |
|--------------------------------------|--------|---------|---------|---------|---------|---------|---------|
| 15                                   | 65,636 | 163,840 | 262,144 | 344,064 | 442,368 | 573,440 | 671,744 |
| 31                                   | 65,636 | 163,840 | 262,144 | 344,064 | 442,368 | 573,440 | 671,744 |
| 63                                   | 65,636 | 163,840 | 262,144 | 344,064 | 442,368 | 573,440 | 671,744 |

In Table 8 only two percentages are computed, one for all the pixels and one for the pixels near discontinuities, because the objects are well textured. The quantitative comparison demonstrates that the error percentages increase with window size for the SMW and HIR algorithms but decrease with window size for the SBAW algorithm. In the SMW algorithm the window is adapted according to the local texture, as confirmed by the low error percentages, but a large window is not well adapted at the corners and the error percentages rise with the window size. The error percentages of the Census algorithm are high due to the repetitive patterns in the image, but its performance at discontinuities is better than that of the HIR algorithm. When the SBAW algorithm is applied with a small window, errors are caused by a lack of information in the correlation window. With large windows, on the other hand, the SBAW algorithm avoids the errors near depth discontinuities, as confirmed by its low error percentages for pixels near depth discontinuities. For the SBAW algorithm, the best performance is obtained with a large window. This is a difference and an advantage with respect to the other algorithms, for which the window size must remain small.

Tables 6-7 compare the hardware resources used by all the synthesized and simulated configurations of the developed FPGA architecture. According to Fig. 12, an acceptable behavior of the SBAW algorithm is obtained with a 21×21 correlation window. In this case the hardware consumption of the developed FPGA architecture is appropriate for the majority of mid-range FPGA devices, such as the Stratix III family of Altera or the Spartan III family of Xilinx. For larger window sizes, however, only high-end FPGA devices such as the Stratix V family of Altera can accommodate the hardware resource consumption.

It is up to the user to select the configuration of the SBAW algorithm that best suits their particular requirements.

Finally, Table 9 compares the processing speed with that of other real-time stereo matching algorithms reported in the literature. Owing to its mathematical simplicity, the proposed algorithm does not require complex arithmetic operations such as quotients and radicals (which require a high runtime); hence, the developed architecture maintains a high processing speed. The comparison in Table 9 shows an increase of up to 93,061,120 pixels per second with respect to other algorithms implemented in FPGA devices. This processing speed is appropriate for real-time stereo vision applications.
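The pixel rates in Table 9 follow from width × height × frames per second, which the short Python check below reproduces. Reading the 93,061,120 figure as the difference between the rate of the implemented 1280×1024 configuration at 76 fps (Section 4.2) and the lowest rate in Table 9, entry [3], is an interpretation made here: the arithmetic matches exactly, but the text does not state which entries were compared. The function and variable names are hypothetical.

```python
def pixels_per_second(width, height, fps):
    """Throughput of a stereo system in processed pixels per second."""
    return width * height * fps


sbaw_1280 = pixels_per_second(1280, 1024, 76)   # implemented configuration
slowest = pixels_per_second(256, 256, 100)      # entry [3] in Table 9
gain = sbaw_1280 - slowest                      # 93,061,120 pixels per second
```

The same function reproduces the SBAW rows of Table 9, for example 384 × 288 × 904 = 99,975,168 pixels per second.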

Table 9: Processing speed for different real-time stereo matching algorithms

| Algorithm         | Resolution | Frames/s | Pixels/s    |
|-------------------|------------|----------|-------------|
| [15]              | 1280×1024  | 65       | 85,196,800  |
| [50]              | 640×480    | 68       | 20,889,600  |
| [3]               | 256×256    | 100      | 6,553,600   |
| [53]              | 1280×1024  | 50       | 65,536,000  |
| [14]              | 384×288    | 1250     | 138,240,000 |
| [24]              | 640×480    | 230      | 70,656,000  |
| [36]              | 320×240    | 574      | 44,083,200  |
| [13]              | 1024×1024  | 102      | 106,954,752 |
| SBAW<sup>\*</sup> | 450×375    | 592      | 99,900,000  |
| SBAW<sup>\*</sup> | 434×383    | 601      | 99,899,422  |
| SBAW<sup>\*</sup> | 384×288    | 904      | 99,975,168  |

<sup>\*</sup>Operating frequency = 50 MHz

#### 4.2 Implementation results

The developed FPGA architecture was implemented in a Cyclone II EP2C35F672C6 FPGA embedded in the Altera DE2 development board, and the selected configuration for the SBAW algorithm was  $d_{\text{max}} = 15$, $2w + 1 = 7$. To acquire the input stereo pairs, a TRDB\_DC2 board connected to the first expansion port of the DE2 board is used. The TRDB\_DC2 board provides stereo pairs of 1280×1024 pixel resolution in RGB; the value of the green channel was used as the gray-scale value of the input stereo pairs. So that the captured images suit the environmental characteristics of the input scene, the implementation allows the exposure of the cameras to be configured. To assign the exposure value of the cameras, 4 push buttons of the DE2 board are used. The function of each of these push buttons is detailed in Table 10.

Table 10: Control of the board TRDB\_DC2

| Name | Description               |
|------|---------------------------|
| Key0 | Reset the frame capture   |
| Key1 | Assign the exposure value |
| Key2 | Pause the frame capture   |
| Key3 | Continuous frame capture  |

The output disparity maps were displayed on a terasIC 4.3" LCD screen of  $800\times480$  pixel resolution connected to the second expansion port of the DE2 board. The processing speed of the FPGA implementation is 76 fps (99,614,720 pixels/s) for input stereo pairs of  $1280\times1024$  pixel resolution. The resource consumption of the implemented architecture is shown in Table 11.

Table 11: Hardware resource consumption for the FPGA implementation

| Resource                      | Demand                |
|-------------------------------|-----------------------|
| Total logic elements          | 27,061/33,216 (81%)   |
| Total combinational functions | 21,639/33,216 (65%)   |
| Dedicated logic registers     | 15,180/33,216 (46%)   |
| Total pins                    | 161/475 (34%)         |
| Total Memory Bits             | 407,472/483,840 (84%) |
| Embedded multiplier elements  | 0/70 (0%)             |
| Total PLLs                    | 2/4 (50%)             |

#### 5 Conclusions

An efficient area-based algorithm for stereo matching using an adaptive window technique was presented. Only the pixels of the window that are selected according to their similarity to the central pixel are used. A technique to determine the similarity criterion has been described, and it was demonstrated that, even with a simple similarity criterion, the SBAW algorithm significantly outperforms other similar area-based algorithms. The best performance of the SBAW algorithm was obtained with a large window, which is appropriate for homogeneous areas. Since the effective size and shape of the window are adaptive, blurring effects at discontinuities are nevertheless avoided.

To improve its processing speed, the proposed algorithm was implemented in an FPGA device. For this purpose, a criterion based on the comparison of gray-scale values was used. The developed FPGA architecture outperforms other real-time stereo matching algorithms in the literature, increases the processing speed and can be implemented in the majority of mid-range FPGA devices.

Furthermore, an important characteristic of the presented architecture is its scalability: all the modules and submodules of the developed FPGA architecture can easily be adapted to process correlation windows larger than those simulated and implemented here. In addition, the FPGA architecture allows different levels of maximum expected disparity  $(d_{\text{max}})$ to be configured; consequently, disparity maps can be computed with values appropriate to the environmental characteristics of the input video streams. The developed architecture can therefore be applied to a wide range of real-time stereo vision applications, such as positioning systems for mobile robots and the recognition, detection and three-dimensional reconstruction of objects.

#### References

- 1. A Fusiello VR, Trucco E (2000) Symmetric stereo with multiple windowing. International Journal of Pattern Recognition and Artificial Intelligence 8(14):1053–1066
- 2. Aguilar-González A, Pérez-Patricio M, Arias-Estrada M, Camas-Anzueto JL, Hernández-deLeón HR, Sánchez-Alegría A (2015) An FPGA correlation-edge distance approach for disparity map. In: Proceedings of IEEE International Conference on Electronics, Communications and Computers (CONIELECOMP 2015) pp 21–28
- 3. Alba A, Arce-Santana E, Aguilar-Ponce RM, Campos-Delgado DU (2014) Phase-correlation guided area matching for realtime vision and video encoding. J Real-Time Image Proc 9:621–633
- 4. Asadi E, Bottasso CL (2013) Delayed fusion for real-time vision-aided inertial navigation. J Real-Time Image Proc pp 1–14
- 5. Bartczak B, Koeser K, Woelk F, Koch R (2007) Extraction of 3D freeform surfaces as visual landmarks for real-time tracking. J Real-Time Image Proc 2:81–101
- 6. Correal R, Pajares G, Ruz J (2013) Automatic expert system for 3D terrain reconstruction based on stereo vision and histogram matching. Expert Systems with Applications 106:75–90
- 7. Darabiha A, MacLean W, Rose J (2006) Reconfigurable hardware implementation of a phase-correlation stereo algorithm. Journal of Machine Vision and Applications 17:116–132
- 8. Delmerico JA, Davidb P, Corso JJ (2013) Building facade detection, segmentation, and parameter estimation for mobile robot stereo vision. Image and Vision Computing 31(29):841–852
- 9. Diaz J, Ros E, Pelayo F, Ortigosa E, Mota S (2006) FPGA based real-time optical-flow system. IEEE Transactions on Circuits and Systems for Video Technology 16:274–279
- 10. ECCV'94 (1994) Proceedings of the European Conference on Computer Vision, ECCV'94
- 11. Faugeras O (1993) Three Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, Cambridge, MA
- 12. FeiyangCheng, HongZhang, DingYuan, MinguiSun (2013) Stereo matching by using the global edge constraint. Neurocomputing 131(11):217–226
- 13. Georgoulas C, Andreadis I (2010) FPGA based disparity map computation with vergence control. Microprocessors and Microsystems 34:259–273
- 14. Georgoulas C, Andreadis I (2011) A real-time fuzzy hardware structure for disparity map computation. J Real-Time Image Proc 6:257–273
- 15. Georgoulas C, Kotoulas L, Sirakoulis GC, Andreadis I, Gasteratos A (2008) Real-time disparity map computation module. Microprocessors and Microsystems 32:159–170
- 16. Gong M, Yang YH (2005) Near real-time reliable stereo matching using programmable graphics hardware. Computer Vision and Pattern Recognition, IEEE Computer Society Conference on 1:924–931
- 17. Granados S, Barranco F, Mota S, Díaz J, Ros E (2014) On-chip semidense representation map for dense visual features driven by attention processes. J Real-Time Image Proc 9:171–185
- 18. Guerra-Filho G (2012) An optimal time-space algorithm for dense stereo matching. J Real-Time Image Proc 7:69–86
- 19. Hirschmuller H (2001) Improvements in real-time correlation-based stereo vision. In: Proceedings of IEEE workshop on Stereo and Multi-Baseline Vision, Kauai, Hawaii, pp 141–148
- 20. Ideses I, Yaroslavsky LP, Fishbain B (2007) Real-time 2D to 3D video conversion. J Real-Time Image Proc 2:3–9
- 21. Intille S, Bobick A (1994) Disparity-space images and large occlusion stereo. In: [10]
- 22. ISAKOVA N, cuk BASAK S, SONMEZ A (2010) FPGA design and implementation of a real-time stereo vision system. Circuits and Systems for Video Technology, IEEE Transactions on 20:1–5
- 23. Jin M, Maruyama T (2012) A fast and high quality stereo matching algorithm on FPGA. Field Programmable Logic and Applications (FPL), International Conference on 22(31):507–510
- 24. Jin S, Cho J, Pham XD, Lee KM, Park SK, Kim M, Jeon JW (2010) FPGA design and implementation of a real-time stereo vision system. IEEE Transactions on Circuits and Systems for Video Technology 20:15–26
- 25. Jin S, Cho J, Pham XD, Lee KM, Park SK, Kim M, Jeon JW (2012) FPGA design and implementation of a real-time stereo vision system. Innovations in Intelligent Systems and Applications (INISTA), International Symposium on 125:15–26
- 26. Jung HY, Park H, Park IK, Lee KM, Lee SU (2014) Stereo reconstruction using high-order likelihoods. Computer Vision and Image Understanding 125(21):223–236
- 27. Kalomiros J, Lygouras J (2011) Design and hardware implementation of a stereo-matching system based on dynamic programming. Microprocessors and Microsystems 35(05):496–509
- 28. Kanade T, Okutomi M (1991) A stereo matching algorithm with an adaptive window: Theory and experiment. In: Proceedings of the 1991 IEEE International Conference on Robotics and Automation, Sacramento, CA, USA, vol 16, pp 920–932
- 29. Kanade T, Kano H, Kimura S, Yoshida A, Oda K (1995) Development of a video-rate stereo machine. In: Proceedings of IEEE workshop on Stereo and Multi-Baseline Vision, pp 95–100
- 30. Kanade T, Yoshida A, Oda K, Kano H, Tanaka M (1996) A stereo machine for video-rate dense depth mapping and its new applications. IEEE Computer Vision & Pattern Recognition Conference 15:196–202
- 31. Konolige K (1997) Small vision systems: Hardware and implementation. In: Proceedings of Eighth International Symposium on Robotics, ISR'97, Hayama, Japan, pp 111–116
- 32. Lee S, Yi J, Kim J (2005) Real-time stereo vision on a reconfigurable system. Lecture Notes in Computer Science: Embedded Computer Systems 3553:299–307
- 33. Lotti J, Giraudon G (1994) Correlation algorithm with adaptive window for aerial image in stereo vision. In: Proceedings of the Image and Signal Processing for Remote Sensing, EUROPTO'94, Rome, Italy, vol 1, pp 701–703
- 34. Madeo S, Pelliccia R, Salvadori C, del Rincon JM, Nebel JC (2014) An optimized stereo vision implementation for embedded systems: application to RGB and infra-red images. J Real-Time Image Proc pp 1–22
- 35. Mahotra S, Patlolla C, Kehtarnavaz N (2012) Real-time computation of disparity for hand-pair gesture recognition using a stereo webcam. J Real-Time Image Proc 7:257–266
- 36. Martin Humenberger MWWK Christian Zinner, Vincze M (2010) A fast stereo matching algorithm suitable for embedded real-time systems. Computer Vision and Image Understanding 114:1180–1202
- 37. Marzotto R, Zoratti P, Bagni D, Colombari A, Murino V (2010) A real-time versatile roadway path extraction and tracking on an FPGA platform. Computer Vision and Image Understanding 114:1164–1179
- 38. Masrani D, MacLean W (2006) A real-time large disparity range stereo-system using FPGAs. IEEE International Conference on Computer Vision Systems pp 13–19
- 39. Min D, Lu J, Do MN (2013) Joint histogram-based cost aggregation for stereo matching. IEEE Trans Pattern Anal Mach Intell 35(10):2539–2545
- 40. Murphy C, Lindquist D, Cecil AMRT, Leavitt S, Chang ML (2007) Low-cost stereo vision on an FPGA. International Symposium on Field-Programmable Custom Computing Machines 15(21):333–334
- 41. Niitsuma H, Maruyama T (2005) High-speed computation of the optical flow. Lecture Notes in Computer Science: Image Analysis and Processing 3617:287–295
- 42. Orts-Escolano S, Morell V, Garcia-Rodriguez J, Cazorla M, Fisher RB (2013) Fuzzy control for obstacle detection in stereo video sequences. J Real-Time Image Proc pp 1–20
- 43. Parrilla E, Torregrosa JR, Riera J, Hueso JL (2011) Fuzzy control for obstacle detection in stereo video sequences. Mathematical and Computer Modelling 54(10):1813–1817
- 44. Roberto Marzotto DBAC Paul Zoratti, Murino V (2010) A real-time versatile roadway path extraction and tracking on an FPGA platform. Computer Vision and Image Understanding 114:1164–1179
- 45. Roh C, Ha T, Kim S, Kim J (2004) Symmetrical dense disparity estimation: algorithms and FPGAs implementation. IEEE International Symposium on Consumer Electronics pp 452–456
- 46. Rong X, Huanyu J, Yibin Y (2014) Recognition of clustered tomatoes based on binocular stereo vision. Computers and Electronics in Agriculture 106(18):75–90
- 47. Santos PM, o Canas Ferreira J, Matos JS (2013) Scalable hardware architecture for disparity map computation and object location in real-time. J Real-Time Image Proc pp 1–13
- 48. Scharstein D, Szeliski R (2002) A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision 47(1):7–42
- 49. Scherer S, Andexer W, Pinz A (1998) Robust adaptive window matching by homogeneity constraint and integration of descriptions. In: Proceedings of the 14th International Conference on Pattern Recognition, ICPR'98, Brisbane, Australia, vol 1, pp 777–780
- 50. Stefania Perri PC, Cocorullo G (2013) Adaptive census transform: A novel hardware-oriented stereovision algorithm. Computer Vision and Image Understanding 117:29–41
- 51. Stefano LD, Marchionni M, Mattoccia S (2004) A fast area-based stereo matching algorithm. Image and Vision Computing 22(22):983–1005
- 52. Tingbo Hu TWXX Baojun Qi, He H (2012) Stereo matching using weighted dynamic programming on a single-direction four-connected tree. Computer Vision and Image Understanding 116:908–921
- 53. Ttofis C, Hadjitheophanous S, Georghiades AS, Theocharides T (2013) Edge-directed hardware architecture for real-time disparity map computation. IEEE Transactions on Computers 62:690–704
- 54. Šuligoj F, Šekoranja B, Švaco M, Jerbić B (2014) Object tracking with a multiagent robot system and a stereo vision camera. Procedia Engineering 69:968–973
- 55. Veksler O (2002) Stereo matching by compact windows via minimum ratio cycle. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(12):1654
- 56. W James MacLean SS, Islam J (2010) Leveraging cost matrix structure for hardware implementation of stereo disparity computation using dynamic programming. Computer Vision and Image Understanding 114:1126–1138
- 57. Wal G, Hansen M, Piacentino M (2000) The acadia vision processor. In: Proceedings of the 5th IEEE International Workshop on Computer Architectures for Machine Perception, CAMP'00, Padova, Italy, vol 1, pp 31–40
- 58. Woodfill J, Herzen BV (1997) Real time stereo vision on the parts reconfigurable computer. IEEE Symposium on Field-Programmable Custom Computing Machines 5:201–210
- 59. Woodfill J, Herzen BV (1997) Real-time stereo vision on the parts reconfigurable computer. In: Proceedings of the IEEE Symposium for Custom Computing Machines, CCM'97, pp 201–210
- 60. Yang L, Noguchi N (2012) Human detection for a robot tractor using omni-directional stereo vision. Computers and Electronics in Agriculture 89(17):112–125
- 61. Yang Q (2012) A non-local cost aggregation method for stereo matching. In: Proceedings of the Conference on Computer Vision and Pattern Recognition, CVPR'2012, Providence, USA, vol 1, pp 1–8
- 62. Zabih R, Woodfill J (1994) Non-parametric local transforms for computing visual correspondence. In: [10], pp 151–158
- 63. Zhang Y, Kambhamettu C (2002) Stereo matching with segmentation-based cooperation. In: Proceedings of the Seventh European Conference on Computer Vision, ECCV'02, Copenhagen, Denmark



Madaín Pérez-Patricio received the Ph.D. degree in Automation and Industrial Computing in 2005 from Université Lille 1 : Sciences et Technologies, France. Since September 1997 he has been a research professor in the department of postgraduate and research, Instituto Tecnológico de Tuxtla Gutiérrez, México. His primary research interests include computer vision and reconfigurable computing.



Abiel Aguilar-González received the BEng degree in Mechatronic Engineering in June 2012 from Universidad Politécnica de Chiapas, Tuxtla Gutiérrez, México. He is currently pursuing his MSc degree in the department of postgraduate and research of Instituto Tecnológico de Tuxtla Gutiérrez, Tuxtla Gutiérrez, México. His research interests are mainly image processing, real-time FPGA-based system design, computer vision and reconfigurable computing.