# **BSP-01.04.00.08 Feature Performance Guide**

### **BSP Drivers**

This section provides a brief overview of the device drivers supported in the BSP release. Drivers are classified into the following categories:

- DSS Display Driver
- VPE Memory-to-Memory (M2M) Driver (Not applicable for TDA3xx Platform)
- VIP Capture Driver
- ISS Capture and M2M Drivers (applicable only for TDA3xx Platform)
- Serial Drivers: I2C, McASP, McSPI, UART

#### **BSP Driver Features**

1. Supports TDA2xx, TDA2Ex and TDA3xx SoCs.
2. Most of the drivers run on IPU1 (M4) core 0 with the BIOS operating system and the FVID2 interface.
3. Ships with sample applications and documentation.

## **VPDMA List Usage**

In TDA2xx/TDA2Ex/TDA3xx, each VIP and VPE has a separate VPDMA instance, and each VPDMA in turn has 8 lists. The drivers use these lists as follows:

#### **VPDMA** usage

| Driver                           | DMA usage                                                        |
|----------------------------------|------------------------------------------------------------------|
| VIP Capture                      | One list per port, hence a maximum of 4 lists per VIP (Slice0/1 x PortA/B) |
| M2M VPE (only for TDA2xx/TDA2Ex) | Only one list for VPE1                                           |

## **Setup Details**


|             | Details                 | TDA2xx/TDA2Ex | TDA3xx | TI814x |
|-------------|-------------------------|---------------|--------|--------|
| SoC Details | Core                    | IPU1 (M4) core 0 | IPU1 (M4) core 0 | VPSS-M3 |
|             | Operating speed of Core | 212.5 MHz | 212.5 MHz | 200 MHz |
|             | Operating speed of VPE  | 266 Mpixels/sec | NA | 200 Mpixels/sec |
|             | EVM Configuration       | TDA2xx: 2 EMIFs Non-Interleaved, DDR3 @ 532MHz<br>TDA2Ex: 1 EMIF Non-Interleaved, DDR3 @ 666MHz | 1 EMIF Non-Interleaved, DDR3 @ 532MHz | Ducati, HDVPSS, 2 EMIFs Interleaved, DDR3 @ 333MHz |

|                      | Details | TDA2xx/TDA2Ex | TDA3xx | TI814x |
|----------------------|---------|---------------|--------|--------|
| Optimization Details | Is the Ducati cache enabled? | Yes | Yes | Yes |
|                      | Profile | release | release | release |
|                      | M4/M3 compile options (release build) | `-c -qq -pdsw225 --endian=little -mv7M4 --float_support=vfplib --abi=eabi --symdebug:dwarf --embed_inline_assembly -g -ms -oe -O3 -op0 -os --optimize_with_debug --inline_recursion_limit=20` | Same as TDA2xx | `-c -qq -pdsw225 --endian=little -mv7M3 --abi=eabi --symdebug:dwarf --embed_inline_assembly -g -ms -oe -O3 -op0 -os --optimize_with_debug --inline_recursion_limit=20` |
|                      | M4/M3 linker options (release build) | `-w -q -u _c_int00 --silicon_version=7M4 -c --opt='--endian=little -mv7M4 --float_support=vfplib --abi=eabi -qq -pdsw225 -g -ms -oe --symdebug:dwarf -op2 -O3 -os --optimize_with_debug --inline_recursion_limit=20 --diag_suppress=23000' --strict_compatibility=on -x --zero_init=on` | Same as TDA2xx | `-w -q -u _c_int00 --silicon_version=7M3 -c --opt='--endian=little -mv7M3 --abi=eabi -qq -pdsw225 -g -ms -oe --symdebug:dwarf -op2 -O3 -os --optimize_with_debug --inline_recursion_limit=20 --diag_suppress=23000' --strict_compatibility=on -x --zero_init=on` |
|                      | DSP compile options (release build) | `-mv6600 -abi=eabi -q -mi10 -mo -pden -pds=238 -pds=880 -pds1110 --program_level_compile -g --endian=little -O2 --display_error_number --diag_warning=225 --diag_wrap=off --preproc_with_compile` | Same as TDA2xx | `-mv6740 -abi=eabi -q -mi10 -mo -pden -pds=238 -pds=880 -pds1110 --program_level_compile -g --endian=little -O2 --display_error_number --diag_warning=225 --diag_wrap=off --preproc_with_compile` |
|                      | DSP linker options (release build) | `--warn_sections -q -e=_c_int00 --silicon_version=6600 -c` | Same as TDA2xx | `--warn_sections -q -e=_c_int00 --silicon_version=6740 -c` |
|                      | Is the code and data placed in L2/L3 memory? | No | No | No |
|                      | Is the L3 interconnect optimized? | No | No | No |

## **Resources Details**

## Resource usage

| Details | TDA2xx/TDA2Ex | TDA3xx | TI814x |
|---------|---------------|--------|--------|
| Timers | M4 Internal timer | M4 Internal timer | M3 Internal timer |
| HWI | IPU1_23 (DSS DISPC), IPU1_26 (HDMI_IRQ)<br>IPU1_27 (VIP1), IPU1_28 (VIP2), IPU1_29 (VIP3)<br>IPU1_30 (VPE1)<br>IPU1_41 (I2C1), IPU1_42 (I2C2), IPU1_43 (I2C3)<br>DSP1_58 (MCASP1 RX), DSP1_59 (MCASP1 TX), DSP1_60 (MCASP2 RX), DSP1_61 (MCASP2 TX), DSP1_91 (MCASP3 RX), DSP1_92 (MCASP3 TX), DSP1_74 (MCASP4 RX), DSP1_51 (MCASP4 TX), DSP1_79 (MCASP5 RX), DSP1_81 (MCASP5 TX), DSP1_86 (MCASP6 RX), DSP1_87 (MCASP6 TX), DSP1_88 (MCASP7 RX), DSP1_43 (MCASP7 TX), DSP1_48 (MCASP8 RX), DSP1_49 (MCASP8 TX)<br>IPU1_57 (MCSPI1), IPU1_58 (MCSPI2)<br>IPU1_44 (UART1), IPU1_60 (UART2), IPU1_62 (UART5), IPU1_63 (UART6), IPU1_64 (UART7), IPU1_65 (UART8), IPU1_69 (UART9), IPU1_70 (UART10) | IPU1_23 (DSS DISPC), IPU1_27 (VIP1)<br>IPU1_41 (I2C1), IPU1_42 (I2C2)<br>IPU1_64 (MCSPI1), IPU1_65 (MCSPI2), IPU1_48 (MCSPI3), IPU1_49 (MCSPI4)<br>IPU1_44 (UART1), IPU1_43 (UART2), IPU1_45 (UART3) | IPU1_41 (I2C0), DSP1_75 (MCASP2 RX), DSP1_74 (MCASP2 TX) (TI814x instances start from 0) |
| Low Latency HWI (cannot be preempted or disabled using the Hwi_disable() BIOS API) | NA | NA | NA |
| I2C Instances (starting from 1) | I2C1, I2C2, I2C5 (for TDA2Ex) (Usage can be controlled from App) | I2C1, I2C2 (Usage can be controlled from App) | I2C1 (Usage can be controlled from App) |
| EDMA Channels | UART1 (TX-48, RX-49), UART2 (TX-50, RX-51), UART3 (TX-52, RX-53), UART4 (TX-54, RX-55), UART5 (TX-62, RX-63), UART6 (TX-50, RX-51), UART7 (TX-50, RX-51), UART8 (TX-50, RX-51), UART9 (TX-50, RX-51), UART10 (TX-50, RX-51)<br>MCASP1TX - 1 (DSP EDMA), MCASP1TX - 0 (DSP EDMA), MCASP2TX - 3 (DSP EDMA), MCASP2TX - 2 (DSP EDMA), MCASP3TX - 5 (DSP EDMA), MCASP3TX - 5 (DSP EDMA), MCASP3TX - 7 (DSP EDMA), MCASP4TX - 6 (DSP EDMA), MCASP5TX - 9 (DSP EDMA), MCASP5TX - 11 (DSP EDMA), MCASP6TX - 11 (DSP EDMA), MCASP6TX - 10 (DSP EDMA), MCASP7TX - 13 (DSP EDMA), MCASP7TX - 12 (DSP EDMA), MCASP8TX - 15 (DSP EDMA), MCASP8TX - 14 (DSP EDMA)<br>MCSPI1TX - 34, MCSPI1RX - 35, MCSPI2TX - 42, MCSPI2RX - 43, MCSPI3TX - 14, MCSPI3RX - 15, MCSPI4TX - 22, MCSPI4RX - 23<br>(TDA2xx instances start from 1) | UART1 (TX-48, RX-49), UART2 (TX-50, RX-51), UART3 (TX-52, RX-53)<br>MCSPI1TX - 34, MCSPI1RX - 35, MCSPI2TX - 42, MCSPI2RX - 43, MCSPI3TX - 14, MCSPI3RX - 15, MCSPI4TX - 22, MCSPI4RX - 23<br>(TDA3xx instances start from 1) | UART0TX - 26, UART0RX - 27, MCASP2TX - 12, MCASP2RX - 13, MCSPI0TX - 16, MCSPI0RX - 17<br>(TI814x instances start from 0) |
| PLLs Used | Video1_PLL and HDMI_PLL (All video PLLs configured according to display resolution selected) | DSP_EVE_VID_PLL (configured according to display resolution selected) | HDVPSS PLL (200MHz), DPLL_VIDEO0, DPLL_VIDEO1, DPLL_HDMI (All video PLLs configured according to display resolution selected) |
| PRCM Done | PRCM Done | None (all through GEL file/SBL) | HDVPSS and I2C0_2 (Usage can be controlled from App) |
| GPIO | GPIO4_13, GPIO4_14, GPIO4_15, GPIO4_16 and GPIO6_17 control the video mux select and sensor power on the vision application card.<br>GPIO2_29, GPIO1_4 and GPIO6_7 act as Demux_FPD_A/B/C control signals on the LVDS multi-deserializer board. | None | None |
| PinMuxing Details (Usage can be controlled from App) | See TDA2xx platform/board file for details | See TDA3xx platform/board file for details | PINCNTL134-PINCNTL167 and PINCNTL204-PINCNTL231 are configured as video pins; PINCNTL74 and PINCNTL75 are configured as I2C pins |
| Memory Requirements (Cacheable) | See Memory Footprint table below | See Memory Footprint table below | Code Memory 1MB, Data Memory 8MB (This includes drivers and M3 sample application) |
| Memory Requirements (Non-cacheable) | VIP/VPE Descriptor memory, see **Memory Footprint** table below | VIP Descriptor memory, see **Memory Footprint** table below | HDVPSS Descriptor memory **2MB**, HDVPSS Shared memory **2MB**, Notify Shared memory **1MB** (HDVPSS shared and Notify shared memory are required only if the proxy server is used for FBDEV or V4L2) |
| SWI | 1 per UART instance in case of DMA or Interrupt mode to handle UART RX/TX ISR | 1 per UART instance in case of DMA or Interrupt mode to handle UART RX/TX ISR | 1 per UART instance in case of DMA or Interrupt mode to handle UART RX/TX ISR |
| Tasks | 1 (highest priority) | 1 (highest priority) | 1 (highest priority) |

## **Memory Footprint**

## **TDA2xx Memory Footprint in bytes (Static Sections)**

| Modules                   | TDA2xx          |                     |              |
|---------------------------|-----------------|---------------------|--------------|
|                           | Code<br>(.text) | DATA (.data,.const) | UDATA (.bss) |
| BSP Audio (DSP)           | 2560            | 142                 | 672          |
| BSP Boards                | 19280           | 6868                | 0            |
| BSP Common                | 6398            | 0                   | 0            |
| BSP Devices               | 85455           | 368696              | 6440         |
| FVID2                     | 6906            | 9276                | 0            |
| BSP I2C                   | 2206            | 492                 | 0            |
| BSP McASP (DSP)           | 41568           | 10354               | 10864        |
| BSP McSPI                 | 16528           | 10796               | 0            |
| BSP OSAL                  | 2093            | 46092               | 0            |
| BSP Platforms             | 2736            | 376                 | 0            |
| BSP UART                  | 11212           | 5348                | 0            |
| BSP VPS                   | 88850           | 64                  | 2821960      |
| Starterware HAL           | 19492           | 4                   | 52           |
| Starterware I2C Lib       | 4642            | 820                 | 0            |
| Starterware PM HAL        | 17016           | 73635               | 0            |
| Starterware PM Lib        | 5230            | 21276               | 8            |
| Starterware System Config | 1168            | 1584                | 0            |
| Starterware Common        | 979             | 2056                | 0            |
| Starterware VPS Lib       | 139176          | 558306              | 736952       |
| Total                     | 473 KB          | 1116 KB             | 3576 KB      |
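As a cross-check of the Total row, the per-module TDA2xx figures can be summed directly. The short Python sketch below is illustrative only (the variable names are not part of the BSP); it suggests that the totals are the column sums expressed in units of 1000 bytes and truncated, which is an inference from the numbers rather than a documented convention.

```python
# Per-module static footprint in bytes, copied from the TDA2xx table:
# (Code .text, DATA .data/.const, UDATA .bss)
tda2xx_footprint = {
    "BSP Audio (DSP)": (2560, 142, 672),
    "BSP Boards": (19280, 6868, 0),
    "BSP Common": (6398, 0, 0),
    "BSP Devices": (85455, 368696, 6440),
    "FVID2": (6906, 9276, 0),
    "BSP I2C": (2206, 492, 0),
    "BSP McASP (DSP)": (41568, 10354, 10864),
    "BSP McSPI": (16528, 10796, 0),
    "BSP OSAL": (2093, 46092, 0),
    "BSP Platforms": (2736, 376, 0),
    "BSP UART": (11212, 5348, 0),
    "BSP VPS": (88850, 64, 2821960),
    "Starterware HAL": (19492, 4, 52),
    "Starterware I2C Lib": (4642, 820, 0),
    "Starterware PM HAL": (17016, 73635, 0),
    "Starterware PM Lib": (5230, 21276, 8),
    "Starterware System Config": (1168, 1584, 0),
    "Starterware Common": (979, 2056, 0),
    "Starterware VPS Lib": (139176, 558306, 736952),
}

# Sum each column across all modules.
totals_bytes = tuple(sum(column) for column in zip(*tda2xx_footprint.values()))

# Assumption: the table reports KB as bytes // 1000 (decimal, truncated).
totals_kb = tuple(total // 1000 for total in totals_bytes)

for name, b, kb in zip(("Code", "DATA", "UDATA"), totals_bytes, totals_kb):
    print(f"{name}: {b} bytes = {kb} KB")
```

The same check can be repeated for the TDA2Ex and TDA3xx tables by substituting their rows.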

## **TDA2Ex Memory Footprint in bytes (Static Sections)**

| Modules                   | TDA2Ex          |                     |              |  |
|---------------------------|-----------------|---------------------|--------------|--|
|                           | Code<br>(.text) | DATA (.data,.const) | UDATA (.bss) |  |
| BSP Audio (DSP)           | 2560            | 142                 | 672          |  |
| BSP Boards                | 19324           | 6868                | 0            |  |
| BSP Common                | 6398            | 0                   | 0            |  |
| BSP Devices               | 85459           | 368696              | 6440         |  |
| FVID2                     | 6906            | 9276                | 0            |  |
| BSP I2C                   | 2206            | 568                 | 0            |  |
| BSP McASP (DSP)           | 41568           | 10354               | 10864        |  |
| BSP McSPI                 | 16528           | 10796               | 0            |  |
| BSP OSAL                  | 2093            | 46092               | 0            |  |
| BSP Platforms             | 2544            | 376                 | 0            |  |
| BSP UART                  | 11212           | 5348                | 0            |  |
| BSP VPS                   | 88750           | 64                  | 1200808      |  |
| Starterware HAL           | 19388           | 4                   | 52           |  |
| Starterware I2C Lib       | 4718            | 956                 | 0            |  |
| Starterware PM HAL        | 16980           | 70612               | 0            |  |
| Starterware PM Lib        | 5108            | 20452               | 8            |  |
| Starterware System Config | 998             | 1584                | 0            |  |
| Starterware Common        | 979             | 2056                | 0            |  |
| Starterware VPS Lib       | 139188          | 253058              | 196280       |  |
| Total                     | 472 KB          | 807 KB              | 1415 KB      |  |

## **TDA3xx Memory Footprint in bytes (Static Sections)**

| Modules                   | TDA3xx          |                     |              |  |
|---------------------------|-----------------|---------------------|--------------|--|
|                           | Code<br>(.text) | DATA (.data,.const) | UDATA (.bss) |  |
| BSP Audio (DSP)           | 21988           | 796                 | 52           |  |
| BSP Boards                | 16640           | 6836                | 0            |  |
| BSP Common                | 6398            | 0                   | 0            |  |
| BSP Devices               | 85455           | 368696              | 6440         |  |
| FVID2                     | 6906            | 9276                | 0            |  |
| BSP I2C                   | 2186            | 260                 | 0            |  |
| BSP McASP (DSP)           | 39840           | 10422               | 1368         |  |
| BSP McSPI                 | 16532           | 10796               | 0            |  |
| BSP OSAL                  | 2093            | 46092               | 0            |  |
| BSP Platforms             | 2088            | 376                 | 0            |  |
| BSP UART                  | 11104           | 1792                | 0            |  |
| BSP VPS                   | 52272           | 64                  | 3014920      |  |
| Starterware HAL           | 21988           | 796                 | 52           |  |
| Starterware I2C Lib       | 4382            | 412                 | 0            |  |
| Starterware PM HAL        | 15424           | 38549               | 0            |  |
| Starterware PM Lib        | 4434            | 7812                | 8            |  |
| Starterware System Config | 1356            | 1584                | 0            |  |
| Starterware Common        | 979             | 2056                | 0            |  |
| Starterware VPS Lib       | 198912          | 331038              | 192076       |  |
| Total                     | 491 KB          | 836 KB              | 3215 KB      |  |

## **TDA2xx Memory Footprint in bytes (Dynamic Heap memories)**

| Use Case or Example           | System Stack (Cached section) | Task Stack (Cached section) | System Heap (Cached section) | VPDMA Descriptor Heap<br>(Non-cached section) |
|-------------------------------|-------------------------------|-----------------------------|------------------------------|-----------------------------------------------|
| Loopback Example<br>(VIP-DSS) | 1772                          | 1292                        | 3152                         | 722880 (Static)                               |
| M2M VPE Example               | 404                           | 1328                        | 2080                         | 722880 (Static)                               |

## **TDA2Ex Memory Footprint in bytes (Dynamic Heap memories)**

| Use Case or Example           | System Stack (Cached section) | Task Stack (Cached section) | System Heap (Cached section) | VPDMA Descriptor Heap<br>(Non-cached section) |
|-------------------------------|-------------------------------|-----------------------------|------------------------------|-----------------------------------------------|
| Loopback Example<br>(VIP-DSS) | 1296                          | 1804                        | 3152                         | 182208 (Static)                               |
| M2M VPE Example               | 404                           | 1352                        | 2080                         | 182208 (Static)                               |

## **TDA3xx Memory Footprint in bytes (Dynamic Heap memories)**

| Use Case or Example | System Stack (Cached section) | Task Stack (Cached section) | System Heap (Cached section) | VPDMA Descriptor Heap<br>(Non-cached section) |
|---------------------|-------------------------------|-----------------------------|------------------------------|-----------------------------------------------|
| Loopback Example<br>(VIP-DSS) | 1296                  | 1788                        | 3152                         | 108544 (Static)                               |

## **Software Performance Numbers**

| SETUP                     |                    |
|---------------------------|--------------------|
| Profile Clock (MHz) - CTM | 425                |
| Platform                  | TDA2XX ES1.0/ES1.1 |
| M4 Clock (MHz)            | 212.5              |
| Cache                     | Enabled            |
| Build                     | Release            |
| DDR3 (MHz)                | 532                |
## **Summary**

| Summary                                                                | FPS | Load  | MHz  |
|------------------------------------------------------------------------|-----|-------|------|
| VIP Capture Driver Load (1 Channel 720p60 capture)                     | 60  | 0.25% | 0.53 |
| VPE M2M Driver (1 Channel 720x240 YUV420SP to 360x240 YUV422I, DEI ON) | 30  | 0.32% | 0.68 |
| DSS Display Driver (1 Video Pipe @720p60 display)                      | 60  | 0.11% | 0.23 |

## **VIP Capture Driver Performance**

| VIP Capture Driver<br>(1 Channel 720p60 capture) | Average Ticks | Average Duration<br>(in us) | Max Ticks | Max Duration<br>(in us) |
|----------------------------------------|-------|-------|-------|-------|
| M3 Load per frame (Including App Q/DQ) | 16664 | 41.66 | 32020 | 80.05 |
| Queue                                  | 2637  | 6.59  | 6038  | 15.10 |
| DeQueue                                | 2441  | 6.10  | 5646  | 14.12 |

#### **VPE M2M Driver Performance**

| VPE M2M Driver<br>(1 Channel 720x240 YUV420SP to 360x240 YUV422I, DEI ON) | Average Ticks | Average Duration<br>(in us) | Max Ticks | Max Duration<br>(in us) |
|----------------------------------------|-------|--------|-------|--------|
| M3 Load per frame (Including App Q/DQ) | 42831 | 107.08 | 73072 | 182.68 |
| Queue                                  | 32046 | 80.12  | 48642 | 121.61 |
| DeQueue                                | 2416  | 5.37   | 12708 | 31.77  |

#### **DSS Display Driver Performance**

| DSS Display Driver<br>(1 Video Pipe @720p60 display) | Average Ticks | Average Duration<br>(in us) | Max Ticks | Max Duration<br>(in us) |
|----------------------------------------|-------|-------|-------|-------|
| M3 Load per frame (Including App Q/DQ) | 7339  | 18.35 | 14942 | 37.36 |
| Queue                                  | 1528  | 3.82  | 2800  | 7.00  |
| DeQueue                                | 1341  | 3.35  | 3692  | 9.23  |
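The Load and MHz columns of the Summary table appear to follow from the average per-frame durations in the three driver tables: load is the per-frame duration multiplied by the frame rate, and the MHz figure is that fraction of the 212.5 MHz M4 clock. The Python sketch below illustrates this relationship under that assumption; the function and variable names are illustrative and not part of the BSP.

```python
M4_CLOCK_MHZ = 212.5  # M4 clock from the setup table


def cpu_load(duration_us, fps):
    """Return (load in percent, load in MHz) for a per-frame cost and frame rate."""
    busy_fraction = duration_us * fps / 1_000_000  # busy microseconds per second
    return busy_fraction * 100, busy_fraction * M4_CLOCK_MHZ


# (average per-frame duration in us, frames per second) from the tables above
cases = {
    "VIP Capture": (41.66, 60),
    "VPE M2M": (107.08, 30),
    "DSS Display": (18.35, 60),
}

results = {name: cpu_load(dur, fps) for name, (dur, fps) in cases.items()}
for name, (percent, mhz) in results.items():
    print(f"{name}: {percent:.2f}% load, {mhz:.2f} MHz")
```

Rounded to two decimals, these values match the Summary table entries for the three drivers.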

## VIP Capture to DSS Display Glass-to-Glass Latency Numbers

#### **Setup Details**

- TDA2xx EVM running the default video loopback application: OV Sensor -> VIP -> DSS -> LCD
- The OV sensor points at another monitor displaying a millisecond counter running at 60 Hz
- Both the LCD image and the original monitor are captured side by side at the same time using a separate digital still camera
- Glass-to-glass latency is then calculated as the difference between the times shown on the LCD and on the monitor

With this method, the glass-to-glass VIP to DSS latency is measured to vary from 44 ms to 66 ms.

The explanation and the split-up for the above observation are as follows:

- Capture runs at 30 FPS. This adds 33.33 ms of latency because the end-of-frame callback is used to trigger the display
- Display runs at 60 FPS. Since the capture VSYNC and display VSYNC are not synchronized, the latency can vary from 0 to 16.66 ms. Also, since the display FPS is higher than the capture FPS, the display repeats the frame, resulting in another possible 0 to 16.66 ms latency difference
- Since this measurement is done by capturing a PC monitor that also runs at 60 FPS, an additional 0 to 16.66 ms of latency can be introduced by quantization error (i.e. the counter can't display time at a granularity finer than 16.66 ms)
- The sensor and LCD latency should also be considered, but it appears negligible given how closely the measured range matches the theoretical calculation above
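The split-up above can be turned into a rough latency budget. The following is a minimal sketch, assuming the pipeline contributes one full capture frame period plus up to two display periods (VSYNC misalignment and frame repeat); the variable names are illustrative and the measurement quantization error is left out.

```python
CAPTURE_FPS = 30
DISPLAY_FPS = 60

capture_period_ms = 1000 / CAPTURE_FPS   # end-of-frame callback delay
display_period_ms = 1000 / DISPLAY_FPS   # one display VSYNC interval

# Best case: display VSYNC lines up with capture completion, no repeat delay.
best_case_ms = capture_period_ms
# Worst case: a full display period of misalignment plus a full repeat period.
worst_case_ms = capture_period_ms + 2 * display_period_ms

print(f"Expected pipeline latency: {best_case_ms:.2f} ms to {worst_case_ms:.2f} ms")
```

The upper bound of this budget is close to the 66 ms measured maximum, which supports treating sensor and LCD latency as small.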

## **Video Display Driver**

This section describes the display drivers performance numbers - throughput and CPU load.

#### Introduction

Display drivers take video buffers from the application and display them on HDMI/LCD at the specified frame rate and resolution. Display drivers follow the FVID2 interface.

### Video 1,2,3 and Graphics 1 Display Driver

#### **Setup Details**

- TDA2xx/TDA2Ex EVM & TFC-S9700RTWV35TR-01 800x480 LCD from ThreeFive Corp
- TDA3xx EVM & LG LP101WX2 1280x800 LCD

#### Video Display performance values

| Output Display   | TDA2xx/TDA2Ex (IPU1 Core0)    |                       | TDA3xx (IPU1 Core0)           |                       |  |
|------------------|-------------------------------|-----------------------|-------------------------------|-----------------------|--|
| (Resolution)     | Frame Rate<br>(in Frames/sec) | CPU<br>Load<br>(in %) | Frame Rate<br>(in Frames/sec) | CPU<br>Load<br>(in %) |  |
| On/Off-Chip HDMI | 60 FPS (on-Chip<br>HDMI)      | 1%                    | 60 FPS (Off-Chip HDMI)        | 1%                    |  |
| LCD              | 60 FPS                        | 1%                    | 60 FPS                        | 1%                    |  |

## **Buffer Queue Latency**

Driver latency to program the buffer to DSS = code execution time from the application queue call to programming (T1) + the time taken to display 5 lines (T2). With the TDA2xx EVM, T1 is measured to be around 20 microseconds.

#### Value of T2 for different resolution

| Display Resolution | T2 (in microseconds) |
|--------------------|----------------------|
| 800x480@60fps      | 158.25               |
| 1280x720@60fps     | 107.74               |
| 1920x1080@60fps    | 74.07                |

The total latency comes to around 180 us (T1 + T2) for an 800x480 @ 60 FPS display. So if a buffer is queued at least 180 us before the VSYNC, it will be displayed in the next frame period.

**Note:** This measurement is done with the standalone display application. In a fully loaded system, the interrupt latency adds to it.

**Reason for 5 lines check:** This check is required so that the driver won't program the buffer address around the display VSYNC period. Doing so would result in DSS HW not accepting the programmed buffer resulting in frame drop.
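The T1 + T2 rule above can be sketched as a small lookup. This is only an illustration using the T1 and T2 values quoted in this section; the function name `queue_deadline_us` is hypothetical and not part of the BSP API.

```python
T1_US = 20.0  # measured code execution time on the TDA2xx EVM (approximate)

# T2: time for 5 display lines, per resolution, from the table above.
T2_US = {
    "800x480@60fps": 158.25,
    "1280x720@60fps": 107.74,
    "1920x1080@60fps": 74.07,
}

def queue_deadline_us(resolution):
    """Minimum lead time before VSYNC for a buffer to reach the next frame."""
    return T1_US + T2_US[resolution]

for res in T2_US:
    print(f"{res}: queue at least {queue_deadline_us(res):.2f} us before VSYNC")
```

In a loaded system the interrupt latency would need to be added on top of these figures.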

## **Video Capture Driver**

This section describes the video capture driver performance numbers - throughput and CPU load.

#### Introduction

The VIP capture driver uses the VIP hardware block to capture data from external video sources such as sensors and video decoders. The video data is captured from the external source by the VIP Parser sub-block in the VIP block. The VIP Parser then sends the captured data for further processing in the VIP block, which can include color space conversion, scaling and chroma downsampling, before the video data is written to external DDR memory.

#### **Setup Details**

- TDA2xx/TDA2Ex Base EVM + Vision App board or TDA3xx Base EVM
- Sensor Omnivision OV10635

#### Video Capture (OV10635 Video Sensor) performance values

| Video                | TDA2xx/TDA2Ex (IPU1 Core0) |        | TDA3xx (IPU1 Core0) |        | TI814x (M3 Core1) |        |
|----------------------|----------------------------|--------|---------------------|--------|-------------------|--------|
| (Resolution)         | Field Rate per<br>Channel<br>(in Frames/sec) | CPU<br>Load<br>(in %) | Field Rate per<br>Channel<br>(in Frames/sec) | CPU<br>Load<br>(in %) | Field Rate per<br>Channel<br>(in Frames/sec) | CPU<br>Load<br>(in %) |
| 1 CH 720P resolution | 30                         | 1%     | 30                  | 1%     | NRY               | NRY    |

#### Video Capture (Video Decoder - TVP7002) performance values

| Video                | TDA2xx/TDA2Ex/TDA3xx (IPU1 Core0)            |                    | TI814x (M3 Core1)                            |                       |  |
|----------------------|----------------------------------------------|--------------------|----------------------------------------------|-----------------------|--|
| (Resolution)         | Field Rate per<br>Channel<br>(in Frames/sec) | CPU Load<br>(in %) | Field Rate per<br>Channel<br>(in Frames/sec) | CPU<br>Load<br>(in %) |  |
| 1 CH 720P resolution | NA                                           | NA                 | 60                                           | 1%                    |  |

## **Memory to Memory Drivers**

This section describes the memory-to-memory drivers' performance numbers - throughput and CPU load.

#### Introduction

M2M drivers take a video buffer from memory, optionally process it (the processing depends on the specific M2M driver) and write it back to memory. M2M drivers follow the FVID2 interface for applications.

### **VPE M2M Driver**

This driver takes YUYV422/YUV420 interlaced/progressive input via the DEI path and provides a scaled version of the deinterlaced/bypassed frame, with optional conversion to YUV422/YUV420/RGB output.

The performance is calculated based on the following dimensions:

- Width to consider = MAX(In Width, Out Width)
- Height to consider = MAX(In Height, Out Height)

#### **Setup Details**

- CPU Idle Disabled
- Calculate the time required for a single scaler operation. For CPU load, issue scaler operations in a contiguous loop, queuing a buffer for each scaling operation.

### **VPE Driver Performance values**

| Scaling Factor                                                                   | TDA2xx (IPU1 Core0)      |                           |                         |                       | TDA2Ex (IPU1 Core0)      |                           |                         |                       |
|----------------------------------------------------------------------------------|--------------------------|---------------------------|-------------------------|-----------------------|--------------------------|---------------------------|-------------------------|-----------------------|
| (Resolution)                                                                     | Max<br>Frames per<br>Sec | Mega<br>Pixels per<br>Sec | Hardware<br>Utilization | CPU<br>Load<br>(in %) | Max<br>Frames per<br>Sec | Mega<br>Pixels per<br>Sec | Hardware<br>Utilization | CPU<br>Load<br>(in %) |
| 1 CH D1 (720x480) YUYV422I to CIF<br>(360x240) YUYV422I with DEI OFF<br>(TC0001) | 707                      | 244 MP/s                  | 91%                     | 9%                    | 706                      | 244 MP/s                  | 91%                     | 9%                    |
| 1 CH D1 (720x480) YUYV422I to<br>1080P YUYV422I with DEI OFF<br>(TC0004)         | 126                      | 261 MP/s                  | 98%                     | 15%                   | 126                      | 261 MP/s                  | 98%                     | 6%                    |
| 1 CH D1 (720x480) YUYV422I to CIF<br>(360x240) YUYV422I with DEI ON<br>(TC0021)  | 695                      | 240 MP/s                  | 90%                     | 69%                   | 690                      | 238 MP/s                  | 89%                     | 39%                   |
| 4 CH D1 (720x480) YUYV422I to CIF<br>(360x240) YUYV422I with DEI OFF<br>(TC2001) | 730                      | 252 MP/s                  | 94%                     | 45%                   | 731                      | 252 MP/s                  | 94%                     | 45%                   |
| 8 CH D1 (720x480) YUYV422I to D1<br>(720x480) YUYV422I with DEI OFF<br>(TC2002)  | 737                      | 254 MP/s                  | 95%                     | 7%                    | 737                      | 254 MP/s                  | 95%                     | 4%                    |
| 4 CH WXGA (1280x800)<br>YUV420SP_UV to 640x400 YUYV422I<br>with DEI OFF (TC2007) | 252                      | 258 MP/s                  | 96%                     | 8%                    | 252                      | 258 MP/s                  | 96%                     | 3%                    |
| 6 CH WXGA (1280x800) YUYV422I to<br>640x400 YUYV422I with DEI OFF<br>(TC2008)    | 254                      | 260 MP/s                  | 97%                     | 7%                    | 254                      | 260 MP/s                  | 97%                     | 3%                    |
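The MP/s and Hardware Utilization columns can be reproduced from the frame rate using the MAX(in, out) rule described earlier. The sketch below assumes utilization is measured against the 266 MHz SC clock quoted later in this guide (one pixel per clock) and that the table truncates fractions; both are inferences, not statements from the guide.

```python
SC_CLOCK_MHZ = 266  # assumed reference: TDA2xx scaler clock, 1 pixel per clock

def vpe_throughput(in_size, out_size, fps):
    """Return (MP/s, utilization %) using the larger of input and output size."""
    width = max(in_size[0], out_size[0])
    height = max(in_size[1], out_size[1])
    mp_per_sec = width * height * fps / 1e6
    utilization_pct = 100 * mp_per_sec / SC_CLOCK_MHZ
    return mp_per_sec, utilization_pct

# TC0001: D1 to CIF at 707 frames/sec (TDA2xx column).
tc0001 = vpe_throughput((720, 480), (360, 240), 707)
# TC0004: D1 to 1080P at 126 frames/sec (TDA2xx column).
tc0004 = vpe_throughput((720, 480), (1920, 1080), 126)

for name, (mp, util) in (("TC0001", tc0001), ("TC0004", tc0004)):
    print(f"{name}: {int(mp)} MP/s, {int(util)}% utilization")
```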

### VPE Driver Performance values with 304MHz from Video PLL1

| Scaling Factor                                                             | TDA2xx (IPU1 Core0)   |                        |                         |                       |
|----------------------------------------------------------------------------|-----------------------|------------------------|-------------------------|-----------------------|
| (Resolution)                                                               | Max Frames per<br>Sec | Mega Pixels per<br>Sec | Hardware<br>Utilization | CPU<br>Load<br>(in %) |
| 1 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI OFF (TC0001) | 802                   | 277 MP/s               | 91%                     | 8%                    |
| 1 CH D1 (720x480) YUYV422I to 1080P YUYV422I with DEI OFF (TC0004)         | 142                   | 295 MP/s               | 97%                     | 4%                    |
| 1 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI ON (TC0021)  | 782                   | 270 MP/s               | 88%                     | 8%                    |
| 4 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI OFF (TC2001) | 825                   | 285 MP/s               | 93%                     | 6%                    |
| 8 CH D1 (720x480) YUYV422I to D1 (720x480) YUYV422I with DEI OFF (TC2002)  | 830                   | 287 MP/s               | 94%                     | 8%                    |
| 4 CH WXGA (1280x800) YUV420SP_UV to 640x400 YUYV422I with DEI OFF (TC2007) | 285                   | 292 MP/s               | 96%                     | 6%                    |
| 6 CH WXGA (1280x800) YUYV422I to 640x400 YUYV422I with DEI OFF (TC2008)    | 286                   | 293 MP/s               | 96%                     | 2%                    |

## Calculating Performance for Memory to memory drivers (VPE)

The description below is based on actual performance seen with SW drivers on actual Si.

#### Performance of Scaler (SC) with DEI OFF

This is applicable for TDA2xx VPE & TI814x (DEI-WB path).

Here DEI, wherever applicable, is assumed to be in bypass mode.

Performance when DEI is not in bypass mode is described in a subsequent section.

Each SC operates at a 266 MHz clock (in TDA2xx) and 200 MHz (in TI814x).

In theory it can process 1 pixel per clock, i.e.

- about 266 mega pixel per second (MP/s) in TDA2xx.
- about 200 mega pixel per second (MP/s) in TI814x.

However, because of the inherent overhead of the overlapping needed for various filtering operations, the practical standalone speed (i.e. only SC running in the system) would be

- about 240-250 MP/s (mega pixels/sec) in TDA2xx
- about 180-190 MP/s (mega pixels/sec) in TI814x

When the SC runs alongside other modules, such as other drivers or codecs, performance may drop further because of DDR bandwidth.

SW overheads will also reduce SC performance, but with the TI BSP driver we see very little impact from SW overheads.

Taking a typical use case, each SC can safely do

- about 186MP/s processing (in TDA2xx).
- about 130MP/s processing (in TI814x).

The number of pixels processed when scaling one D1 channel of 720x480 at 30 frames per second is 720x480x30 (frames per second) = 10.3 MP/s

Here the output from the SC is <= 720x480

Thus the SC can safely do about 18 CHs of D1 (in TDA2xx) and about 12 CHs of D1 (in TI814x) when its output size is <= 720x480, i.e. only downscaling is done in the scaler.

In practice with BSP only applications we found that measured SC performance is

- about 22 D1 CHs (about 236MP/s) in TDA2xx
- about 13 D1 CHs (about 140MP/s) in TI814x

With other activity like codec, performance should drop but we know each SC will safely give

- 20CH D1 performance (200MP/s) in TDA2xx
- 12CH D1 performance (130MP/s) in TI814x

When scaler upscaling is used, the results are a bit different.

For the use case of scaling 720x480 to a 960x540 output size, the performance for 1 CH would be

960x540 (since 960x540 > 720x480) x 30 (frames per second) = 15.5 MP/s

In TDA2xx, assuming SC performance is 200 MP/s, that is about 12 CHs

In TI814x, assuming SC performance is 130 MP/s, that is about 8 CHs
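The channel-count arithmetic above can be sketched as follows. This is a minimal illustration, assuming the per-channel cost is the larger of the input and output frame sizes multiplied by the frame rate; `sc_channels` is a hypothetical helper name, and the MP/s budgets are the "safe" figures quoted in the text.

```python
def sc_channels(budget_mp_s, in_size, out_size, fps=30):
    """Estimate how many channels fit in a scaler budget given in MP/s."""
    width = max(in_size[0], out_size[0])
    height = max(in_size[1], out_size[1])
    per_channel_mp_s = width * height * fps / 1e6
    return budget_mp_s / per_channel_mp_s

D1 = (720, 480)

# Downscaling only (output <= D1), TI814x budget of 130 MP/s.
ti814x_down = sc_channels(130, D1, D1)
# Upscaling D1 to 960x540 with the 200 MP/s and 130 MP/s budgets.
tda2xx_up = sc_channels(200, D1, (960, 540))
ti814x_up = sc_channels(130, D1, (960, 540))

print(f"TI814x downscale: {ti814x_down:.1f} CHs")
print(f"TDA2xx upscale:   {tda2xx_up:.1f} CHs")
print(f"TI814x upscale:   {ti814x_up:.1f} CHs")
```

Rounding the results down gives the whole-channel counts quoted in the text.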

#### Performance of Scaler (SC) with DEI ON

This is applicable for TDA2xx VPE & TI814x (DEI-WB path).

Each DEI operates at a 266 MHz clock (in TDA2xx) and 200 MHz (in TI814x).

In theory it can process 1 pixel per clock, i.e.

- about 266 mega pixel per second. (MP/s) in TDA2xx
- about 200 mega pixel per second. (MP/s) in TI814x

However, because of the inherent overhead of the overlapping needed for various filtering operations, the practical standalone speed (only DEI running in the system) would be

- about 200-210 MP/s (mega pixels/sec) in TDA2xx
- about 150-160 MP/s (mega pixels/sec) in TI814x

When the DEI runs alongside other modules, such as other drivers or codecs, performance may drop further because of DDR bandwidth.

SW overheads will also reduce DEI performance, but with the TI BSP drivers we see very little impact from SW overheads.

Taking a DVR kind of use case, each DEI can safely do

- about 170MP/s processing in TDA2xx
- about 130MP/s processing in TI814x

The number of pixels processed when deinterlacing one D1 channel of 720x240 at 60 fields per second is

720x240x2 (since DEI turns 1 line into two lines) x 60 (fields per second) = 20.7 MP/s

Here the output from the DEI is <= 720x480

Thus DEI can safely do,

- about 8CHs of D1 in TDA2xx
- about 6CHs of D1 in TI814x

when its output size is <= 720x480, i.e only downscaling is done in the scaler after DEI.

In practice with BSP only applications we found that measured DEI performance is

- about 9-10 D1 CHs (about 200MP/s) in TDA2xx
- about 6-7 D1 CHs (about 140MP/s) in TI814x

With other activity like codec, performance should drop but we know each DEI will safely give

- 8CH D1 performance in TDA2xx.
- 6CH D1 performance in TI814x.

The above applies when scaler downscaling is used after DEI.

When scaler upscaling is used, the results are a bit different.

For the use case of a 960x540 output size, the performance for 1 CH would be

960x540 (since 960x540 > 720x480) x 60 (fields per second) = 31.1 MP/s

In TDA2xx, assuming DEI performance is 170 MP/s, that is about 5-6 CHs

In TI814x, assuming DEI performance is 130 MP/s, that is about 4 CHs
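The DEI channel estimates can be sketched in the same way. The sketch assumes each input field is doubled in height by the deinterlacer and that the per-channel cost is the larger of the deinterlaced size and the output size multiplied by the field rate; `dei_channels` is a hypothetical helper name.

```python
def dei_channels(budget_mp_s, field_size, out_size, fields_per_sec=60):
    """Estimate how many interlaced channels fit in a DEI budget given in MP/s."""
    # The deinterlacer turns each field line into two lines.
    dei_width, dei_height = field_size[0], field_size[1] * 2
    width = max(dei_width, out_size[0])
    height = max(dei_height, out_size[1])
    per_channel_mp_s = width * height * fields_per_sec / 1e6
    return budget_mp_s / per_channel_mp_s

D1_FIELD = (720, 240)

# Output <= 720x480 (downscaling only after DEI).
tda2xx_down = dei_channels(170, D1_FIELD, (720, 480))
ti814x_down = dei_channels(130, D1_FIELD, (720, 480))
# Upscaling to 960x540 after DEI.
tda2xx_up = dei_channels(170, D1_FIELD, (960, 540))
ti814x_up = dei_channels(130, D1_FIELD, (960, 540))

for label, value in (("TDA2xx down", tda2xx_down), ("TI814x down", ti814x_down),
                     ("TDA2xx up", tda2xx_up), ("TI814x up", ti814x_up)):
    print(f"{label}: {value:.1f} CHs")
```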

### **ISS Drivers**

### **ISS Capture Driver (CAL)**

ISS captures video streams via the CAL sub-block of the ISS. It provides interfaces to capture via MIPI CSI-2 and parallel ports. It is typically used to capture streams from sensors such as the Omnivision OV10640, Aptina AR0132 and Aptina AR0140. To measure performance, a RAW 12 video stream at 30 FPS is captured from the OV10640 and written to memory.

#### **Setup Details**

- TDA3xx EVM
- Sensor Omnivision OV10640, Data Format as RAW 12

#### Video Capture (OV10640 Video Sensor) performance values

| Video                | TDA3xx (IPU1 Core0)                          |                       |  |  |
|----------------------|----------------------------------------------|-----------------------|--|--|
| (Resolution)         | Field Rate per<br>Channel<br>(in Frames/sec) | CPU<br>Load<br>(in %) |  |  |
| 1 CH 720P resolution | 30                                           | < 1%                  |  |  |

### ISS M2M ISP WDR Driver

This driver takes a companded RAW 12 video frame and performs 2-pass processing. In pass 1, the low exposure is processed; in pass 2, the high exposure is processed and merged with the low exposure. The processed frame is written to memory in YUV420 SP (NV12) data format.

#### **Setup Details**

- Input frame RAW12 of size 1280x960
- Output YUV420 SP (NV12)

#### **WDR Driver Performance values**

| WDR                | TDA3xx (IPU1 Core0)   |                        |  |  |
|--------------------|-----------------------|------------------------|--|--|
|                    | Max Frames per<br>Sec | Mega Pixels per<br>Sec |  |  |
| Pass 1             | 125                   | 146.48 MP/s            |  |  |
| Pass 2             | 143                   | 167.41 MP/s            |  |  |
| Pass 1 & Pass<br>2 | 66                    | 156.25 MP/s            |  |  |
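The values in the WDR table are consistent with counting a "mega pixel" as 2^20 pixels and with per-pass processing times of about 8 ms and 7 ms. Both are inferences from the table values rather than statements from the guide; the sketch below only shows that the numbers line up under those assumptions.

```python
MEBI = 1 << 20             # assumed: "Mega" taken as 2**20 pixels
WIDTH, HEIGHT = 1280, 960  # input frame size from the setup details

# Assumed per-pass processing times in microseconds (inferred, not measured here).
PASS1_US = 8000
PASS2_US = 7000

def frames_per_sec(time_us):
    """Frames per second for a given per-frame processing time."""
    return 1e6 / time_us

def mp_per_sec(time_us, passes=1):
    """Pixels processed per second, in units of 2**20 pixels."""
    return passes * WIDTH * HEIGHT * frames_per_sec(time_us) / MEBI

pass1 = mp_per_sec(PASS1_US)
pass2 = mp_per_sec(PASS2_US)
# Running both passes processes every frame twice in the combined time.
both = mp_per_sec(PASS1_US + PASS2_US, passes=2)

print(f"Pass 1: {frames_per_sec(PASS1_US):.2f} fps, {pass1:.2f} MP/s")
print(f"Pass 2: {frames_per_sec(PASS2_US):.2f} fps, {pass2:.2f} MP/s")
print(f"Both:   {frames_per_sec(PASS1_US + PASS2_US):.2f} fps, {both:.2f} MP/s")
```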

### ISS M2M SIMCOP (LDC + VTNF) Driver

This driver reads a YUV420 frame (NV12), performs LDC correction, applies a temporal noise filter and writes the result back to memory.

#### **Setup Details**

- TDA3xx EVM
- Input frame YUV420 SP (NV12) of size 1920x1080
- Output YUV420 SP (NV12)

#### **SIMCOP Driver Performance values**

| SIMCOP LDC + VTNF                   | TDA3xx (IPU1 Core0)   |                        |  |  |
|-------------------------------------|-----------------------|------------------------|--|--|
|                                     | Max Frames per<br>Sec | Mega Pixels per<br>Sec |  |  |
| LDC in Bi-Linear Interpolation mode | 87                    | 172 MP/s               |  |  |
| LDC in Bi-Cubic Interpolation mode  | 50                    | 99 MP/s                |  |  |

## **Overall System Performance**

The BSP package includes a video loopback example. The table below shows the performance numbers for different combinations of the BSP drivers.

#### **System Performance Values**

| Mode                                          | TDA2xx (IPU1 Core0)<br>CPU Load (in %) | TDA2Ex (IPU1 Core0)<br>CPU Load (in %) | TDA3xx (IPU1 Core0)<br>CPU Load (in %) |  |
|-----------------------------------------------|----------------------------------------|----------------------------------------|----------------------------------------|--|
| 1 channel Capture (30 FPS) + Display (60 FPS) | 2%           | 2%           | 2%           |  |
| 1 channel Capture + VPE + Display             | NRY          | NRY          | NA           |  |

## **UART Driver**

This section describes the UART drivers' performance numbers - throughput and CPU load.

#### Introduction

The UART driver is used to transfer data to and from the UART terminal. The UART driver follows the BIOS GIO/IOM driver model.

#### **Setup Details**

- Calculate the time and CPU load required for a UART transfer operation by issuing GIO\_submit operations in a contiguous loop. The test parameters are:
  - Instance: UART1
  - Baudrate: 115200
  - Stop Bits: 1
  - Parity: None
  - Character Length: 8 bits
  - Bytes per GIO Submit: 138

## **UART Driver Performance values**

| Test Case                                                               | TDA2xx (IPU1 Core0)       |                         |                       | TDA2Ex (IPU1 Core0)       |                         |                       | TDA3xx (IPU1 Core0)       |                         |                       |
|-------------------------------------------------------------------------|---------------------------|-------------------------|-----------------------|---------------------------|-------------------------|-----------------------|---------------------------|-------------------------|-----------------------|
|                                                                         | TX Bytes<br>per<br>Second | Hardware<br>Utilization | CPU<br>Load<br>(in %) | TX Bytes<br>per<br>Second | Hardware<br>Utilization | CPU<br>Load<br>(in %) | TX Bytes<br>per<br>Second | Hardware<br>Utilization | CPU<br>Load<br>(in %) |
| Polled Mode, FIFO Enable (TC_00102)                                     | 11412 BP/s                | 99%                     | 87%                   | 11416 BP/s                | 99%                     | 74%                   | 11416 BP/s                | 99%                     | 68%                   |
| Polled Mode, FIFO Disable (TC_00132)                                    | 1000 BP/s                 | 8%                      | 2%                    | 1000 BP/s                 | 8%                      | 2%                    | 1000 BP/s                 | 8%                      | 2%                    |
| Interrupt Mode, FIFO<br>Enable, TX Trigger Level 56<br>bytes (TC_00202) | 11439 BP/s                | 99%                     | 22%                   | 11439 BP/s                | 99%                     | 21%                   | 11440 BP/s                | 99%                     | 21%                   |
| Interrupt Mode, FIFO Disable (TC_00232)                                 | 11070 BP/s                | 96%                     | 100%                  | 11147 BP/s                | 96%                     | 100%                  | 11122 BP/s                | 96%                     | 100%                  |
| Interrupt Mode, FIFO<br>Enable, TX Trigger Level 8<br>bytes (TC_00241)  | 11439 BP/s                | 99%                     | 91%                   | 11439 BP/s                | 99%                     | 91%                   | 11439 BP/s                | 99%                     | 91%                   |
| Interrupt Mode, FIFO<br>Enable, TX Trigger Level 16<br>bytes (TC_00242) | 11438 BP/s                | 99%                     | 80%                   | 11438 BP/s                | 99%                     | 79%                   | 11439 BP/s                | 99%                     | 79%                   |
| Interrupt Mode, FIFO<br>Enable, TX Trigger Level 32<br>bytes (TC_00243) | 11438 BP/s                | 99%                     | 57%                   | 11438 BP/s                | 99%                     | 56%                   | 11439 BP/s                | 99%                     | 56%                   |
| DMA Mode, FIFO Enable,<br>TX Trigger Level 56 bytes<br>(TC_00302)       | 11450 BP/s                | 99%                     | 1%                    | 11450 BP/s                | 99%                     | 1%                    | 11449 BP/s                | 99%                     | 1%                    |
| DMA Mode, FIFO Disable (TC_00332)                                       | 11450 BP/s                | 99%                     | 1%                    | 11449 BP/s                | 99%                     | 1%                    | 11450 BP/s                | 99%                     | 1%                    |
| DMA Mode, FIFO Enable,<br>TX Trigger Level 8 bytes<br>(TC_00341)        | 11450 BP/s                | 99%                     | 1%                    | 11450 BP/s                | 99%                     | 1%                    | 11450 BP/s                | 99%                     | 1%                    |
| DMA Mode, FIFO Enable,<br>TX Trigger Level 16 bytes<br>(TC_00342)       | 11450 BP/s                | 99%                     | 1%                    | 11450 BP/s                | 99%                     | 1%                    | 11450 BP/s                | 99%                     | 1%                    |
| DMA Mode, FIFO Enable,<br>TX Trigger Level 32 bytes<br>(TC_00343)       | 11450 BP/s                | 99%                     | 2%                    | 11450 BP/s                | 99%                     | 2%                    | 11450 BP/s                | 99%                     | 2%                    |
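The Hardware Utilization column can be reproduced from the test parameters. The sketch below assumes the usual asynchronous framing of one start bit, 8 data bits and one stop bit with no parity (10 bits on the wire per byte) and assumes the table truncates fractions; the function name is illustrative.

```python
BAUD = 115200
BITS_PER_BYTE = 1 + 8 + 1  # start bit + 8 data bits + 1 stop bit, no parity

# Theoretical maximum payload rate on the wire.
max_bytes_per_sec = BAUD / BITS_PER_BYTE

def utilization_pct(tx_bytes_per_sec):
    """Share of the theoretical UART byte rate actually achieved."""
    return 100 * tx_bytes_per_sec / max_bytes_per_sec

# TX rates from the TDA2xx column of the table above.
samples = {
    "Polled, FIFO Enable": 11412,
    "Polled, FIFO Disable": 1000,
    "Interrupt, FIFO Disable": 11070,
    "DMA, FIFO Enable": 11450,
}

for name, rate in samples.items():
    print(f"{name}: {rate} BP/s = {int(utilization_pct(rate))}% of {max_bytes_per_sec:.0f} BP/s")
```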

# **Article Sources and Contributors**