Support for PLL primitives #425
The goal of this issue is to document the kinds of PLL primitives of the various FPGA manufacturers, so that we can support them in nmigen.
Lattice PLLs are different for the different FPGA families.
The iCE40 family has five types of PLLs. They are all capable of shifting the output clock by 0 and 90 degrees, and allow fine delay adjustments of up to 2.5 ns (typical) in 150 ps increments (typical).
Can be used if the source clock of the PLL originates on the FPGA or is driven by an input pad that is not the bottom IO bank (bank 2).
Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively). When using this PLL, the source clock cannot be used anymore.
Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively). This PLL outputs the requested clock as well as the the source clock.
Can generate two different output frequencies. Can be used if the source clock of the PLL originates on the FPGA.
Can generate two different output frequencies. Can be used if the source clock of the PLL is driven by an input pad that is located in the bottom or top IO bank (banks 2 and 0 respectively).
The ECP5 family has bunch of clocking elements, but only one type of PLL.
It has four outputs, but if the user wants hook the PLL feedback, then one of the outputs cannot be used in the fabric. It supports dynamic clock selection and control, dynamic phase adjustment, etc.
Xilinx has two clock synthesis IPs as far as I know: PLL and Mixed-Mode Clock Manager (MMCM). The different FPGA families have different versions of these IPs.
The Spartan-6 family have Clock Management Tiles (CMTs), where each contain one PLL and two Digital Clock Managers (DCMs). The latter can be used to implement Delay Locked Loops, digital frequency synthesizers, digital phase shifters, or a digital spread spectrum.
There are two PLL primitives, that each have six clock outputs and a dedicated feedback output.
PLL_BASE provides access to the most commonly used features, such as clock deskewing, frequency synthesis, coarse phase shifting, and duty cycle programming.
PLL_ADV provides access to all PLL_BASE features, such as dynamically reconfiguring the PLL.
The 7 series also have CMTs, where each contains a MMCM and a PLL. The PLL contains a subset of the functions of the MMCM. The PLL has six clock outputs, whereas the MMCM has seven clock outputs. Both have a dedicated feedback output.
MMCME2_BASE and PLLE2_BASE
MMCME2_ADV and PLLE2_ADV
UltraScale / UltraScale+
The CMTs in the Ultrascale family contain an MMCM and two PLLs. Just like with the 7 series FPGAs, there are
The MMCMs have seven clock outputs and a dedicate feedback output. The MMCM IPs for the UltraScale are
The PLLs have two clock outputs each. Similarly to the MMCMs, the PLL IPs are
From what I can gather, all Intel FPGA PLLs are instantiated using the
Maybe it would be a good idea to just start with the iCE40 family (since its the simplest, and I use it my current design). We could implement the PLLs similarly to how Litex does it.
The text was updated successfully, but these errors were encountered:
Here are some of my thoughts on how a PLL could be used:
As I see it, there are three ways of creating a PLL.
The first one is basically a class that instantiates the specific PLL primitive, with some logic that calculates the PLL parameters. I think this leads to repeated code, since most primitives differ slightly among themselves.
The second one somewhat abstracts away the specifics of each PLL primitive and leads to code reuse.
The third one looks like it would require quite a bit of code. Who decides what the "correct" PLL primitive is? For example, for the iCE40 family, the usage of the
Creating Output Clocks
In Litex, you create a PLL object and call
I like how this works, but of course I'm open to suggestions. It might not make sense for the iCE40 PLL primitives with one clock output, but it would be consistent.
Also, maybe we should supply the Platform when creating the PLL object, so that when we create an output clock, it would automatically add the corresponding period constraint so that the user doesn't forget it. But maybe this is not necessary.
One more thing that might be interesting to think about (but not sure if this actually fits this issue) is where the PLL gets stored. Given there is usually only a small number of them, it almost feels like putting them into the platform could be a good idea. It would make it easier to generate the required clocks for different parts of a design without manually passing a PLL instance around or having to create all clocks toplevel.
Another thing (atleast on xilinx 7series, not sure if there are similar things on other platforms) is the placement of the PLL, if it is in the same clocking region as say the pin the input clock comes into the fpga from, one only needs a regional clock buffer for the input clock, if it is not one needs a global buffer. This is probably far out (and maybe out of scope), but some support from nmigen to help with choosing the correct clock buffer / automatically finding a good combination would be pretty interesting.
Okay, so I think we have two basic problems here that are essentially separate:
It seems to me that it's worthwhile to start with the frequency problem. Do you think you can start collecting the info about PLLs and expressing it in the form of Diophantine equations? Then we'll need to figure out some way to solve them other than brute force, I'm sure there should be libraries that help with that.
I think the PLL should definitely create the clock constraints and given that the platform is already passed to elaborate it should be fairly easy to do that there.
This isn't usually necessary because the backend toolchain already knows the output frequency if you constrained the inputs. But in any case, the platform is the factory of PLLs, so a PLL naturally knows what the platform is.
These are the polynomial equations that describe the relationship between the input and output frequencies, and the multiplication and division (integer) values, right?
Is brute forcing an actual issue? It might just be me, but it's not like Python chokes on calculating the parameters every time I build my design. Maybe I'm overlooking something?
Yes, with integer solutions, and some inequalities too, since none of the parameters have an arbitrary range.
We can start with brute forcing and improve it later, assuming that our data representation doesn't prevent us from doing so. (E.g. expressing the equations with arbitrary Python code obviously won't work.)
Something like that. It's not clear if we can use an existing library or if we should make yet another tiny sub-language for this--I'm leaning towards the latter since historically nMigen has been very conservative with the dependencies and lately it became more conservative, not less (126f0be).
For now I actually think it makes sense to collect PLL equations in this issue in human-readable form while the rest of the design is being worked on; the data entry part is the most laborous one anyhow, and converting them to the format accepted by whatever CAS we end up using is easy.
Is the final API meant to be for deriving different digital clocks are does one want to be more general ?
I would want it to be as general as much as it makes sense. I think most of the issues you mentioned can be solved if we can specify the location of the PLLs. In the DDR example, the user can use the same PLL to generate the two clocks, and use the two different clock domains as needed. Maybe there is a way to specify the phase difference as a constraint?
Yeah, you're right. I got confused :)
Output clocks are routed through dedicated clock networks, so it doesn't really matter. I would matter if they were to be routed through the fabric, but that's not recommended and you have specify that by hand for some vendors.
But I don't think it would be bad if you could specify its location.
Both the LP and HX parts have the same constraints when it comes to PLL parameters. The parameters are:
Now, depending on the chosen feedback path (
FEEDBACK_PATH other than SIMPLE
FEEDBACK_PATH = SIMPLE
These equations come from the
The parameters are:
These equations come from the
Looking at the frequency problem, here's a first attempt to document a generalized model of PLLs. I believe (based on the docs from @GuzTech above, and a passing familiarity with the code) that a solver based on this model would handle all of the Lattice PLLs in LiteX's clock.py.
Xilinx clocking seems to be a little simpler - I'll need to add that to the model. Have not looked hard at other vendors yet.
Please point out problems!
Older Altera parts (and Cyclone 10LP for some reason) use the ALTPLL (or ALTPLL_RECONFIG) IP to generate a PLL instantiating altpll. The key parameters are:
pll_type (string) which should probably be "AUTO" in general so Quartus can select which is used (only relevant for stratix, arria parts)
V series instantiates an altera_pll with a slightly simpler set of parameters for a basic pll:
fractional_vco_multipler (string) "true" or "false". Example was done with an integer-N PLL
I think there is a balance deciding how much to make user controllable and how much to default to basic settings to avoid creating something that can't be synthesized, but I've tried to show here the parameters that would be relevant to the majority of use cases.
Thanks @H-S-S-11 for the Altera info. In terms of the basic PLL architecture and frequency calculations, it lines up well with Lattice and Xilinx parts.
Agreed! I think the following should cover most designs, and be included in the initial PLL API.
For a basic PLL API, I don't think we need features such as
Here's an outline of a solver: https://github.com/alanvgreen/nmigen/blob/pll/experiments/PLLSolver_pynb.ipynb
It's in Jupyter notebook form so, please mess around with it.
At this stage, I'd like feedback on the FrequencySolver API.
The shape of the solver is based on algorithms found in migen's clock.py. In clock.py, though the frequency problem and routing problem solutions are closely tied.
The "brute-force" solver is probably fast enough for most situations. Due to the allowed ranges of input and VCO frequencies, large parts of the search space are trimmed quite quickly. And it seems to be working for litex, too.
I'll keep going with this, then.