Clocks and programmable delays in the DDR3 memory interface of Elphel NC393 camera
Block diagram of the memory interface is available here: [http://blog.elphel.com/wp-content/uploads/2014/06/eddr3_bdiag.png]
Used PLL and clock buffer resources
The DDR3 controller uses one MMCME2_ADV and one PLLE2_ADV primitive. The PLLE2_ADV drives only clk_ref; all other clocks described below (with the exception of byte_lane0_i/iclk and byte_lane1_i/iclk, which are generated from the I/O ports) are driven by the MMCME2_ADV.
All referenced parameters are defined in x393_parameters.vh
The controller uses the following clock buffers (frequencies are shown for the current implementation). The Verilog keywords posedge and negedge refer to the rising and falling clock edges:
- 2 of BUFG:
- mclk (200 MHz) is used in the high-level parts of the memory controller: request arbitration logic, organization of the multi-channel client memory access as a sequence of transactions, and filling/reading the channel buffers from the memory side. Most events are synchronized to the rising clock edge, but the data from the memory to the buffers is synchronized to the falling edge of this clock. This clock has a static phase shift (defined by the parameter MCLK_PHASE=90) relative to the other clocks, such as clk_div (below) used in the ISERDES and OSERDES I/O serializers/deserializers, so when crossing the clock boundary mclk -> clk_div (from posedge mclk to posedge clk_div) there is 0.75 of the clock period (3.75 ns) available, and the same time is available when crossing back: clk_div -> mclk (from posedge clk_div to negedge mclk)
- clk_ref is used only for I/O delay modules (connected to the IDELAYCTRL) and is now 200MHz, will likely be 300MHz in the future.
- 1 BUFIO clock buffer:
- sdclk (400MHz) with OBUFTDS drives the DDR3 memory differential clock signal. Its source is the only MMCME2_ADV output that is not driven by the dynamic phase shifter; because all the other outputs are shifted together, the effect is the same as if sdclk were the only clock controlled by it. This is done to avoid dependence on the PLL in the external DDR3 chip. With the current settings of the MMCME2_ADV (VCO frequency = 800MHz) the phase adjustment step is 1/56 of 1/Fvco ~= 22ps, and the dynamic phase shift can be adjusted in the +/-127 counts range, or more than 1 full period of sdclk in each direction (a full sdclk period corresponds to 112=0x70 phase steps). When the phase shift is increased, the DDR3 clock arrives earlier at the memory chip.
- 4 of the regional clock buffers (BUFR):
- clk (400MHz) drives I/O serializers and deserializers, with its rising edge aligned with that of clk_div. The phase of this clock is statically defined by the parameter CLK_PHASE=0
- clk_div (200 MHz) also drives I/O serializers and deserializers, synchronizing their parallel side (interfacing to the rest of the system). Its static phase is specified by the parameter CLK_DIV_PHASE=0
- byte_lane0_i/iclk and
- byte_lane1_i/iclk are parts of the two byte_lane modules and provide clocks derived from the differential DQSL and DQSU ports. Normally these clocks are used for memory read operations, when DQSL and DQSU are generated by the memory device, but in the write levelling mode DQSL and DQSU are generated by the FPGA and are fed back through these clocks to drive the input deserializers. These two clocks are gated, and the memory provides just enough pulses to push the data through the ISERDES modules.
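The timing figures quoted in the list above can be reproduced with a few lines of arithmetic. The following is only an illustrative sketch: the function and constant names are hypothetical (they are not taken from x393_parameters.vh), and the frequencies, MCLK_PHASE and the 1/56 step fraction are simply the values stated in the text.

```python
# Illustrative arithmetic only; names below are hypothetical helpers,
# the numeric values are the ones quoted in the text.
MCLK_MHZ = 200.0            # mclk and clk_div frequency
SDCLK_MHZ = 400.0           # DDR3 clock frequency
VCO_MHZ = 800.0             # MMCME2_ADV VCO frequency
MCLK_PHASE_DEG = 90.0       # MCLK_PHASE parameter
STEPS_PER_VCO_PERIOD = 56   # phase step is 1/56 of the VCO period


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def crossing_margins_ns(mclk_phase_deg=MCLK_PHASE_DEG, freq_mhz=MCLK_MHZ):
    """Return (mclk -> clk_div, clk_div -> mclk) time budgets in ns.

    posedge clk_div is taken as 0 degrees, posedge mclk is mclk_phase_deg
    later, and negedge mclk is another 180 degrees after that.
    """
    t = period_ns(freq_mhz)
    # posedge mclk to the next posedge clk_div
    to_clk_div = ((360.0 - mclk_phase_deg) % 360.0) / 360.0 * t
    # posedge clk_div to the following negedge mclk
    to_mclk = ((mclk_phase_deg + 180.0) % 360.0) / 360.0 * t
    return to_clk_div, to_mclk


def phase_step_ps(vco_mhz=VCO_MHZ):
    """Dynamic phase shift step in picoseconds."""
    return period_ns(vco_mhz) * 1000.0 / STEPS_PER_VCO_PERIOD


def steps_per_sdclk_period():
    """Number of phase steps that span one full sdclk period."""
    return round(period_ns(SDCLK_MHZ) * 1000.0 / phase_step_ps())
```

With the values above, crossing_margins_ns() gives 3.75 ns in both directions, phase_step_ps() is a little over 22 ps, and steps_per_sdclk_period() is 112, so the +/-127 count range indeed covers more than one sdclk period each way.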
Available programmable delays
Clocks mclk, clk and clk_div have statically defined relative phases (clk and clk_div are posedge-aligned and mclk is 90 degrees later). The external memory device differential clock can be adjusted to any phase with a calibrated phase shift step of 360/112 degrees (about 3.2 degrees). All other available programmable delays are based on IDELAY2_FINEDELAY and ODELAY2_FINEDELAY primitives with delays consisting of 2 parts:
- 31-tap delay with calibrated 78ps/tap resolution for the 200MHz reference clock (currently used) and shorter 52ps/tap delays when using 300MHz. At 78ps/tap this provides a full range of 2.4ns, slightly less than the full period of the 400MHz clock
- 5-tap uncalibrated delay of approximately 10ps/tap connected in series with the main 31-tap one. When using 200MHz this stage covers approximately 1/2 of the step of the 31-tap delay, effectively adding just one extra bit of resolution. 300 MHz (available in Zynq devices faster than the one used in the prototype of the NC393 camera) will allow more uniform subdivision of the delays. All delays in this controller use an 8-bit delay value, with the 5 MSBs controlling the 31-tap delay and the 3 LSBs controlling the fine delay. Only values of 0, 1, 2, 3 and 4 are valid for the 3 LSBs of each delay.
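The 8-bit delay value layout described above (5 MSBs for the calibrated taps, 3 LSBs limited to 0..4 for the fine stage) can be sketched as follows. The helper names are hypothetical and are not part of the controller code; the per-tap figures are the approximate ones quoted in the text, and the fine stage is uncalibrated, so the computed delay is only a rough estimate.

```python
# Hypothetical helpers illustrating the 8-bit delay value layout.
COARSE_TAP_PS = {200: 78.0, 300: 52.0}  # calibrated tap size per reference clock (MHz)
FINE_TAP_PS = 10.0                      # approximate, uncalibrated fine tap
COARSE_MAX = 31                         # 5 MSBs
FINE_MAX = 4                            # only 0..4 are valid in the 3 LSBs


def encode_delay(coarse, fine):
    """Pack coarse (0..31) and fine (0..4) tap counts into an 8-bit value."""
    if not 0 <= coarse <= COARSE_MAX:
        raise ValueError("coarse tap count must be in 0..31")
    if not 0 <= fine <= FINE_MAX:
        raise ValueError("fine tap count must be in 0..4")
    return (coarse << 3) | fine


def decode_delay(value):
    """Split an 8-bit delay value into (coarse, fine) tap counts."""
    if not 0 <= value <= 0xFF:
        raise ValueError("delay value must fit in 8 bits")
    coarse, fine = value >> 3, value & 0x7
    if fine > FINE_MAX:
        raise ValueError("3 LSBs must be in 0..4")
    return coarse, fine


def approx_delay_ps(value, ref_mhz=200):
    """Rough delay estimate in picoseconds for an 8-bit delay value."""
    coarse, fine = decode_delay(value)
    return coarse * COARSE_TAP_PS[ref_mhz] + fine * FINE_TAP_PS
```

For example, encode_delay(8, 0) is 0x40, and approx_delay_ps(0x40) is 624 ps at the 200MHz reference. Values such as 0x45 (3 LSBs equal to 5) are rejected by decode_delay.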
There are 18 programmable input delays and 43 output delays, each individually controlled:
- Two DQS_IDELAY values control the delay from the I/O ports to the byte_lane0_i/iclk and byte_lane1_i/iclk clocks that drive the input deserializers. Differences between the individual DQ_IDELAY values and DQS_IDELAY should land in the center of the data "eyes" for reliable reading, but the absolute values of DQS_IDELAY are also important for crossing boundaries from the byte_lane*_i/iclk clocks to the clk and clk_div ones. Input clocks are generated by the memory device with its PLL driven by sdclk (and so indirectly dependent on the programmable phase shift), and the DQS phase may fluctuate relative to the FPGA clocks. During
- 16 DQ_IDELAY values set the input delays of the individual data bit signals. Data acquisition windows are determined by the DQ_IDELAY - DQS_IDELAY differences, while the DQS_IDELAY values have to satisfy other requirements too.
- Two DQS_ODELAY values determine the delay of the DQS signals. These values can be determined with the "write levelling" procedure: when the memory device sees the DQS* signal lagging behind the clock, it outputs 8'h1 on each of the data bytes; when DQS* is too early, it outputs 8'h0
- 16 DQ_ODELAY values set the output delays for the individual data bit signals to the memory. Data sent to the memory should be centered around the DQS transitions, so the DQ_ODELAY values should be approximately 90 degrees (~0x40 when other signal delays in the FPGA are equal) higher than DQS_ODELAY. If the available ranges do not allow that, DQS_ODELAY can be modified together with the clock phase (verifying the DQS_IDELAY requirements above).
- 2 of the DM_ODELAY values control data mask signals, they have the same timing requirements as
- 23 _DLY_CMDA_ODELAY values control command, address, bank address and ODT outputs to DDR3. Their timing requirements are more relaxed as they operate in SDR (not DDR) mode.
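As a cross-check of the numbers in this list, the sketch below tallies the delay counts and derives the ~0x40 offset between DQ_ODELAY and DQS_ODELAY. It is only an illustration: the dictionary keys are shorthand labels, the function names are hypothetical, and the calculation assumes the 78ps/tap calibration (200MHz reference) and equal internal FPGA delays on all signals, as the text does.

```python
# Hypothetical bookkeeping and offset arithmetic; not controller code.
INPUT_DELAYS = {"DQS_IDELAY": 2, "DQ_IDELAY": 16}
OUTPUT_DELAYS = {"DQS_ODELAY": 2, "DQ_ODELAY": 16, "DM_ODELAY": 2, "CMDA_ODELAY": 23}

SDCLK_PERIOD_PS = 2500.0   # 400MHz DDR3 clock
COARSE_TAP_PS = 78.0       # calibrated tap at the 200MHz reference clock
COARSE_MAX = 31            # largest value of the 5 MSBs
FINE_MAX = 4               # largest valid value of the 3 LSBs


def quarter_period_code():
    """8-bit delay increment closest to 90 degrees of sdclk (coarse taps only)."""
    taps = round(SDCLK_PERIOD_PS / 4.0 / COARSE_TAP_PS)
    return taps << 3


def dq_odelay_from_dqs(dqs_odelay):
    """Return a DQ_ODELAY value about 90 degrees above dqs_odelay.

    Returns None when the result would exceed the coarse range; per the
    text, DQS_ODELAY and the clock phase would then have to be changed
    together.
    """
    fine = dqs_odelay & 0x7
    if fine > FINE_MAX:
        raise ValueError("3 LSBs must be in 0..4")
    coarse = (dqs_odelay >> 3) + (quarter_period_code() >> 3)
    if coarse > COARSE_MAX:
        return None
    return (coarse << 3) | fine
```

The tallies sum to 18 input and 43 output delays, and quarter_period_code() evaluates to 0x40, matching the figures in the text. A DQS_ODELAY of 0xC0 or above leaves no room for the 90 degree offset, which is the case where the clock phase has to be adjusted as well.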