Refactor opcode decoding a bit to kill FifoCommandRunnable. #924

Merged (1 commit) on Sep 3, 2014

Commits on Sep 1, 2014

  1. Refactor opcode decoding a bit to kill FifoCommandRunnable.

    Separated out from my gpu-determinism branch by request.  It's not a big
    commit; I just like to write long commit messages.
    
    The main reason to kill it is hopefully a slight performance improvement
    from avoiding the double switch (especially in single core mode);
    however, this also improves cycle calculation, as described below.
    
    - FifoCommandRunnable is removed; in its stead, Decode returns the
    number of cycles (which only matters for "sync" GPU mode), or 0 if there
    was not enough data, and is also responsible for unknown opcode alerts.
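
A minimal sketch of the return convention described in the bullet above, not the actual Dolphin code. The opcodes, cycle costs, `DecodeState`, `DecodeSketch` and `RunSketch` are all hypothetical names invented for illustration; in particular, stopping on an unknown opcode is an assumption, since the commit only says that Decode raises the alert.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcodes and cycle costs, for illustration only.
constexpr uint8_t kOpNop = 0x00;      // 1 byte
constexpr uint8_t kOpLoadReg = 0x10;  // opcode byte + 4 payload bytes
constexpr uint32_t kCyclesNop = 6;
constexpr uint32_t kCyclesLoadReg = 12;

struct DecodeState
{
  uint32_t reg = 0;             // stands in for GPU state
  bool unknown_opcode = false;  // stands in for the unknown opcode alert
};

// Decodes one command from [src, end). On success, advances src and returns
// the command's cycle cost (always nonzero). Returns 0 and leaves src alone
// if the buffer does not yet hold a complete command.
uint32_t DecodeSketch(const uint8_t*& src, const uint8_t* end, DecodeState& state)
{
  assert(src <= end);
  if (src == end)
    return 0;

  switch (*src)
  {
  case kOpNop:
    src += 1;
    return kCyclesNop;

  case kOpLoadReg:
    if (end - src < 5)
      return 0;  // payload has not fully arrived
    state.reg = (uint32_t(src[1]) << 24) | (uint32_t(src[2]) << 16) |
                (uint32_t(src[3]) << 8) | uint32_t(src[4]);
    src += 5;
    return kCyclesLoadReg;

  default:
    // Assumption: record the alert and stop decoding.
    state.unknown_opcode = true;
    return 0;
  }
}

// Caller loop: a single switch per command, summing cycles until Decode
// reports that nothing more can be decoded.
uint32_t RunSketch(const uint8_t*& src, const uint8_t* end, DecodeState& state)
{
  uint32_t total = 0;
  while (uint32_t cycles = DecodeSketch(src, end, state))
    total += cycles;
  return total;
}
```

Because every successfully decoded command costs at least one cycle, 0 is free to serve as the "stop" signal, and the caller needs no separate size check before decoding.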
    
    Decode and DecodeSemiNop are almost identical, so the latter is replaced
    with a skipped_frame parameter to Decode.  This doesn't mean we can't
    improve skipped_frame mode to do less work later; if, at that point,
    branching on it has too much overhead (it certainly doesn't now), it
    can always be changed to a template parameter.
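
For illustration, the template-parameter fallback mentioned above could look roughly like this. This is a sketch under assumptions, not code from the commit: the draw opcode, its encoding, the cycle cost, `DrawStats` and `DecodeImpl` are hypothetical, and so is the idea that a skipped frame consumes draw data without submitting it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical draw command: opcode byte, count byte, then `count` data bytes.
constexpr uint8_t kOpDraw = 0x80;
constexpr uint32_t kCyclesDraw = 20;  // hypothetical cost

struct DrawStats
{
  uint32_t vertices_submitted = 0;
};

// skipped_frame is a compile-time constant here, so the compiler can drop
// the branch entirely in each instantiation.
template <bool skipped_frame>
uint32_t DecodeImpl(const uint8_t*& src, const uint8_t* end, DrawStats& stats)
{
  if (end - src < 2 || src[0] != kOpDraw)
    return 0;
  const uint32_t count = src[1];
  if (uint32_t(end - src) < 2 + count)
    return 0;  // vertex data has not fully arrived

  if (!skipped_frame)
    stats.vertices_submitted += count;  // real code would load vertices here

  src += 2 + count;  // the data is consumed either way
  return kCyclesDraw;
}

// Runtime-flag entry point, matching the shape the commit describes; it
// picks one of the two instantiations once per call.
uint32_t DecodeSketch(const uint8_t*& src, const uint8_t* end, bool skipped_frame,
                      DrawStats& stats)
{
  return skipped_frame ? DecodeImpl<true>(src, end, stats)
                       : DecodeImpl<false>(src, end, stats);
}
```

With a plain bool parameter the check happens inside every command; with the template form it moves to the call site, which only matters if the branch ever shows up in a profile.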
    
    - FifoCommandRunnable used a fixed, large cycle count for display lists,
    regardless of the contents.  Presumably the actual hardware's processing
    time is mostly the processing time of whatever commands are in the list,
    and with this change InterpretDisplayList can just return the list's
    cycle count to be added to the total.  (Since the calculation for this
    is part of Decode, it didn't seem easy to split this change up.)
    
    To facilitate this, Decode also gains an explicit 'end' parameter in
    lieu of FifoCommandRunnable's call to GetVideoBufferEndPtr; 'end' can
    point to the end of the video buffer or to the end of a display list
    (or elsewhere in gpu-determinism, but that's another story).  Also, as
    a small optimization, InterpretDisplayList now calls OpcodeDecoder_Run
    rather than having its own Decode loop, to allow Decode to be inlined
    (I haven't checked whether this actually happens, though).
    
    skipped_frame mode still does not traverse display lists and uses the
    old fake value of 45 cycles.  degasus has suggested that this hack is
    not essential for performance and can be removed, but I want to separate
    any potential performance impact of that from this commit.
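
The display list handling described above can be sketched as follows. Only the value of 45 cycles for skipped frames comes from the commit message; the opcode encoding, the per-command costs, the call overhead, the `GpuSketch` storage and the nested-call guard are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint8_t kOpNop = 0x00;           // 1 byte
constexpr uint8_t kOpCallDL = 0x40;        // opcode + 1-byte list index (hypothetical)
constexpr uint32_t kCyclesNop = 6;         // hypothetical
constexpr uint32_t kCyclesCallDL = 6;      // hypothetical call overhead
constexpr uint32_t kCyclesSkippedDL = 45;  // fixed value from the commit message

struct GpuSketch
{
  std::vector<std::vector<uint8_t>> lists;  // hypothetical display list storage
  bool in_display_list = false;             // guard against nested calls
};

uint32_t RunSketch(GpuSketch& gpu, const uint8_t*& src, const uint8_t* end,
                   bool skipped_frame);

// Reuses the ordinary run loop, with 'end' pointing at the end of the list,
// and returns the summed cycle count of the list's contents.
uint32_t InterpretDisplayListSketch(GpuSketch& gpu, uint8_t index)
{
  if (gpu.in_display_list || index >= gpu.lists.size())
    return 0;
  const std::vector<uint8_t>& list = gpu.lists[index];
  const uint8_t* p = list.data();
  gpu.in_display_list = true;
  const uint32_t cycles = RunSketch(gpu, p, p + list.size(), false);
  gpu.in_display_list = false;
  return cycles;
}

uint32_t DecodeSketch(GpuSketch& gpu, const uint8_t*& src, const uint8_t* end,
                      bool skipped_frame)
{
  if (src == end)
    return 0;
  switch (*src)
  {
  case kOpNop:
    src += 1;
    return kCyclesNop;
  case kOpCallDL:
  {
    if (end - src < 2)
      return 0;  // index byte has not arrived
    const uint8_t index = src[1];
    src += 2;
    if (skipped_frame)
      return kCyclesSkippedDL;  // list is not traversed on skipped frames
    // The nonzero overhead keeps an empty list from looking like "no data".
    return kCyclesCallDL + InterpretDisplayListSketch(gpu, index);
  }
  default:
    return 0;
  }
}

uint32_t RunSketch(GpuSketch& gpu, const uint8_t*& src, const uint8_t* end,
                   bool skipped_frame)
{
  uint32_t total = 0;
  while (uint32_t cycles = DecodeSketch(gpu, src, end, skipped_frame))
    total += cycles;
  return total;
}
```

In this sketch a list of three no-ops costs 6 + 18 cycles when traversed and a flat 45 when the frame is skipped, so the two modes report different totals for the same FIFO contents.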
    comex committed Sep 1, 2014
    608f9bc