
Move clock generation into C++ #89

Closed
chiggs opened this issue Sep 5, 2013 · 23 comments · Fixed by #3983
Labels: category:performance (performance topics) · type:feature (new or enhanced functionality)

Comments

@chiggs (Contributor) commented Sep 5, 2013:

Having moved the clock into Python to ensure externals have a scheduling trigger, we have diminished performance. We should use a C clock and create a monitor while an external is in progress.

@ktbarrett added the category:performance (performance topics) label on Nov 17, 2020
@ktbarrett changed the title from "Performance enhancement: use python monitor while external in progress" to "Move clock generation into C++" on Nov 17, 2020
@marlonjames (Contributor):

This would be easier with CancelledError, so the clock could be cleaned up between tests from a coroutine rather than having the scheduler manage yet another thing.

@themperek (Contributor):

This may require #2208 first?

@marlonjames (Contributor):

If it's just a clock generator, it should only need a timed callback (or two for different duty cycle) and a handle to the signal to be toggled.

@marlonjames (Contributor) commented Dec 8, 2020:

Possible GPI interface for clocks:

class GpiClk;
typedef GpiClk *gpi_clk_hdl;

/**
 * Create a clock object associated with a signal handle.
 * @param[in] gpi_hdl       Handle to clock signal
 * @param[in] high_steps    Number of timesteps clock is high per period
 * @param[in] low_steps     Number of timesteps clock is low per period
 * @returns                 Clock object handle (NULL on error)
*/
GPI_EXPORT gpi_clk_hdl gpi_clock_create (gpi_sim_hdl gpi_hdl, uint64_t high_steps, uint64_t low_steps);

/**
 * Start driving a clock.
 * @param[in] clk_hdl       Handle to clock object
 * @param[in] cycles        Number of clock periods to cycle. A negative value cycles the clock until stopped
 * @param[in] start_high    Non-0 to start clock high, 0 to start clock low
 * @returns                 0 for successful start, non-0 on error
*/
GPI_EXPORT int gpi_clock_start (gpi_clk_hdl clk_hdl, int64_t cycles, int start_high);

/**
 * Stop driving a clock.
 * If the clock is already stopped, this has no effect.
 * @param[in] clk_hdl       Handle to clock object
 * @returns                 0 if clock was already stopped, non-0 if clock was started
*/
GPI_EXPORT int gpi_clock_stop (gpi_clk_hdl clk_hdl);

/**
 * Delete a clock object.
 * If the clock is started, this will stop the clock before deleting.
 * @param[in] clk_hdl       Handle to clock object
 * @returns                 0 if clock was already stopped, non-0 if clock was started
*/
GPI_EXPORT int gpi_clock_delete (gpi_clk_hdl clk_hdl);

Maybe a gpi_clock_is_started(), although that should be managed by the GPI user.

I imagine this will be encapsulated in a class in pure Python or in simulatormodule.cpp.
GpiClk will use gpi_register_timed_callback() to call its own callback, and gpi_set_signal_value_long() to cycle the clock signal, staying within the GPI layer and providing a huge performance improvement.

The main issue is how to handle clean-up between tests. Currently, I think the only way to do that is to keep track of all the clocks in the Scheduler and delete them after a test finishes.
If something like #1349 were in, the clock could be managed by a forked task that deletes the clock when the task is killed/canceled.
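
As a rough sketch of that Python-side encapsulation, assuming hypothetical simulator.clock_* bindings that mirror the proposed gpi_clock_* functions (all names here are illustrative only):

from cocotb import simulator  # hypothetical clock_* bindings assumed below


class GpiClock:
    """Thin Python wrapper around a GPI-side clock object (sketch only)."""

    def __init__(self, signal, high_steps, low_steps):
        # signal._handle is cocotb's internal GPI handle for the signal
        self._hdl = simulator.clock_create(signal._handle, high_steps, low_steps)

    def start(self, cycles=-1, start_high=True):
        # cycles < 0 runs until stop() is called, matching the proposal above
        simulator.clock_start(self._hdl, cycles, int(start_high))

    def stop(self):
        simulator.clock_stop(self._hdl)

    def delete(self):
        # Stops the clock if it is still running, then frees the GPI object
        simulator.clock_delete(self._hdl)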

@ktbarrett (Member):

Should this be in the GPI or in the simulatormodule? I have a feeling the GPI will become an end product in its own right, given the interest in GPI implementations in extensions, GPI extensions for cocotb, and integration with other languages.

@marlonjames (Contributor):

It could be in either location. Clocks are so fundamental to simulation that I think providing that functionality makes sense. Users could always roll their own using the same primitives.

@themperek (Contributor):

Maybe it should follow sc_clock's input parameters? http://www.cecs.uci.edu/~doemer/risc/v020/html_oopsc/a00033.html

@marlonjames (Contributor):

@themperek That might make more sense at the Python level.

@ktbarrett (Member):

The only issue I have with your suggestion (and the SystemC one) is that it doesn't cover skew. It's not a common ask, but we have already had a couple of users ask for it. It should be pretty easy to add in C++ (std::normal_distribution).

@marlonjames (Contributor):

@ktbarrett Do you mean clock jitter? If you want to skew two clocks from each other just start them at different times.

@ktbarrett (Member):

Yes jitter (brainfart).

@cmarqu (Contributor) commented Dec 8, 2020:

Ha, I was also about to ask for a jitter capability, but I then thought that in those rare cases, one could go to an HDL clock generator.

@marlonjames (Contributor):

So support Gaussian random jitter? Across each period or for each edge?
If we set the standard deviation to 1/3 of the jitter limit and use [-limit, +limit] bounds, that would give us a pretty nice distribution.
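
For illustration, a minimal Python sketch of that clamped-Gaussian idea (a C++ implementation would use std::normal_distribution the same way); applying the jitter once per period is an assumption here:

import random

def jittered_period(nominal_steps, jitter_limit):
    """Nominal period plus Gaussian jitter with sigma = jitter_limit / 3,
    clamped to [-jitter_limit, +jitter_limit]."""
    offset = random.gauss(0.0, jitter_limit / 3.0)
    offset = max(-jitter_limit, min(jitter_limit, offset))
    return nominal_steps + offset

# e.g. split jittered_period(10_000, 50) into high/low phases for one cycle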

@alexforencich (Contributor):

@themperek Interesting that the SystemC API doesn't support a number of cycles. Does anyone actually use that capability of the cocotb clocks?

@ktbarrett (Member):

@alexforencich I seriously doubt it.

@alexforencich (Contributor):

In that case, maybe we should leave that out of the GPI/Verilator implementation, and either deprecate it or fall back on the current Python implementation if a specific number of cycles is requested.

@cmarqu (Contributor) commented Dec 8, 2020:

If I wanted to change the frequency of the clock, I would create a new clock object for the same handle, stop the old clock and start the new clock? By waiting on the old clock edge, I could do the switching glitch-free, right?

I'm thinking of more complicated but still existing scenarios, like generating a spread spectrum clock with a "random" sequence (of rather short length) of discrete frequencies to be switched through cyclically. This is done in my world to shape the EMI spectrum the IC emits.

@alexforencich (Contributor) commented Dec 8, 2020:

I'm wondering if it is possible to bind the clock to the signal, instead of having a separate structure for the clock, or perhaps bind it internally somehow. After all, it doesn't make sense to attach more than one clock to a signal. In that case, you could just call start_clock(signal, parameters...) as many times as you like, and it could certainly be set up to support glitch-free switching (assuming the parameters are reasonable). But maybe that's more trouble than it's worth.

Edit: maybe what you could do is add a reference to the clock object (if present) to the signal object, and then allocate that internally if needed when starting the clock, and free it when stopping the clock. And then all you need to do is set it up so that if the clock is running, the new parameters would apply at the end of the current cycle. If you want to apply the change immediately, stop the clock and then start it. Another option would perhaps be to add an update_period API or something that changes the settings used starting from the next clock edge.

@marlonjames (Contributor) commented Dec 8, 2020:

@cmarqu I think that would work, if you use the right start_high value. It would have to be controlled from Python, because only a single value-change callback is supported at the moment.

Something like that could be done as a custom clock extension.
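
For reference, a sketch of the glitch-free switch using the existing Python Clock (the periods and dut.clk are placeholders):

import cocotb
from cocotb.clock import Clock
from cocotb.triggers import RisingEdge, Timer

async def switch_clock_frequency(dut):
    # Run at 100 MHz for a while.
    clk_task = cocotb.fork(Clock(dut.clk, 10, units="ns").start())
    await Timer(500, units="ns")
    # Switch at a rising edge: the old clock has just driven the signal high,
    # and the new clock (start_high=True) re-drives it high, so no glitch.
    await RisingEdge(dut.clk)
    clk_task.kill()
    clk_task = cocotb.fork(Clock(dut.clk, 8, units="ns").start(start_high=True))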

@marlonjames (Contributor) commented Dec 8, 2020:

We could simplify it to a stimulus interface that takes a signal handle and a generator which yields (value, duration) tuples. Then cocotb could provide clock generators for common use cases.

That's not much different from just using the underlying gpi functions, though.
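
A minimal sketch of that stimulus interface at the Python level (names are illustrative):

from cocotb.triggers import Timer


async def drive_waveform(signal, waveform):
    """Apply (value, duration_in_steps) tuples from a generator to a signal."""
    for value, duration in waveform:
        signal.setimmediatevalue(value)
        await Timer(duration)


def square_wave(high_steps, low_steps):
    """A plain clock is then just the simplest possible waveform generator."""
    while True:
        yield 1, high_steps
        yield 0, low_steps

# e.g. cocotb.fork(drive_waveform(dut.clk, square_wave(5000, 5000)))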

@alexforencich (Contributor) commented Dec 8, 2020:

What about (high_duration, low_duration) tuples, but then use a FIFO interface or similar and preload/batch it to reduce the communication overhead from python? Basically, instead of evaluating the generator on every clock edge, evaluate it 100 times, stuff those in a FIFO, and then evaluate it another 100 times when the FIFO drains. I'm considering doing something with a FIFO-style interface to GPI at some point (basically, provide a GPI back-end for my generic streaming modules), so figuring out how to do that efficiently would be useful.

But on the flip side, maybe this sort of per-cycle adjustment should be peeled off into a different API so the fixed-duration cycle stuff (which is probably going to be the most common) can be as efficient as possible.
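
A rough sketch of the batching idea; push_durations() is a hypothetical low-level call standing in for whatever would fill the GPI-side FIFO:

from itertools import islice


def batched(waveform, batch_size=100):
    """Evaluate the Python generator in chunks to amortize the call overhead."""
    while True:
        chunk = list(islice(waveform, batch_size))
        if not chunk:
            return
        yield chunk

# for chunk in batched(duration_tuples):   # chunk: list of (high, low) duration tuples
#     push_durations(signal, chunk)        # hypothetical GPI-side FIFO fill
#     ...wait for the FIFO to drain before pushing the next chunk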

@ktbarrett (Member) commented Dec 9, 2020:

Before we go too far off into the weeds, remember this is an optimization. You optimize the common case, then you optimize less common stuff later. This is mostly implementation detail (with the exception that everything the GPI does will have to be reasonably achievable by an implementation), so we can start simple, then toss it and replace it with something else later on. We don't need to design everything up front.

I'm still okay with the first suggestion. Everything that is supported by Clock can be moved into that.

@marlonjames (Contributor) commented Dec 10, 2020:

There is another complication: the clock signals will change value immediately in the timed callback. This is different from how cocotb.clock.Clock currently works, which uses cached writes like self.signal <= 1 that delay the change until the ReadWrite callback.
The C++ clocks would be roughly equivalent to:

from cocotb.triggers import Timer

async def clock(signal, steps):
    while True:
        # setimmediatevalue() writes the value right away rather than
        # scheduling a cached write for the ReadWrite phase
        signal.setimmediatevalue(1)
        await Timer(steps)
        signal.setimmediatevalue(0)
        await Timer(steps)

Matching current behavior would require GPI re-work to support registering multiple readwrite callback functions.

The Scheduler also guarantees cached writes will occur before user tasks are run. Matching that behavior would require something like an additional bool append argument to gpi_register_readwrite_callback() to decide where to add the new callback in the list.

I'm not sure that the current behavior is necessary, but the change needs to be considered.
