
Keep working row of pixels into PRU RAM, use devmemkm for low GPIO latency, new redbeat demo, update README. #66

Merged
merged 41 commits into from Mar 25, 2021


@bigjosh bigjosh commented Dec 15, 2020

The main focus of this update was to get rid of the white flashing from issue #49. These flashes happened when the PRU was delayed in accessing the GPIO registers in between the initial rising edge and the zero bit falling edge (time period T0H). If this time was stretched long enough, it would cause the zero bit to look like a one bit to the pixels, so they would flash. In practice, this would happen often enough on my installations to be visually distracting.

To fix it, we needed to reduce the jitter in accessing those GPIO lines during the T0H period so that the signal sent was always shorter than a one bit signal would be.
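To make the failure mode concrete, here is a minimal Python sketch of how a pixel might classify an incoming bit from the width of its high pulse. The `decode_bit` name and the single-threshold model are assumptions for illustration only; the 500 ns figure is the spec'ed max T0H quoted later in this description, and real chips vary.

```python
T0H_MAX_NS = 500  # spec'ed max T0H quoted in this PR; real chips vary


def decode_bit(high_ns, threshold_ns=T0H_MAX_NS):
    """Return the bit a pixel would read for a high pulse of `high_ns`.

    Simplified single-threshold model (an assumption, not datasheet logic):
    a high pulse no longer than the threshold reads as 0, anything longer
    reads as 1.
    """
    return 0 if high_ns <= threshold_ns else 1


# A short pulse reads as a 0 bit.
print(decode_bit(250))
# The same 0 bit, delayed past the threshold, is read as a 1 (a flash).
print(decode_bit(650))
```

In this model a stretched one bit still decodes as 1, so only the zero bit's high time needs a hard upper bound.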

This took a multi-pronged approach that included:

  1. Rewriting the WS281X PRU code so that it preloads a full row of pixels into registers and PRU RAM at the beginning of each row cycle. This reduced traffic from the PRU over interconnects to the DDR RAM to 1/24th of the previous level.
  2. Blocking PRU1 from actually writing any signals onto the pins. Previously both PRUs were started with identical code, and these two copies of the code would then pound the exact same GPIO addresses at almost exactly the same time (within tens of ns of each other). This caused 2X the traffic at exactly the time you didn't want it (when the bits were changing) and also some very rare and otherwise very hard-to-explain waveforms!
  3. Streamlining the PRU code that writes to the GPIO registers so that as much potential jitter-causing access is moved outside the T0H time. This meant stretching out the T1H waveform, but that is OK because a long T1H is still a T1H, whereas a long T0H is a T1H.
  4. Using full blast dummy writes to the GPIO registers to generate the T0H delay rather than spinning in place. This blocks other requestors from getting access to the interconnect during this critical time period.
  5. Using devmemkm (a new repo) to set the PRIORITY of the PRU's requestor on the interconnects to be the highest in the system. This is what allows us to block other requestors in step 4 above.
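
The 1/24th figure in step 1 can be illustrated with a rough count. This is a sketch under stated assumptions, not a measurement: it assumes the earlier code fetched pixel data from DDR once per outgoing bit, that a row is 24 bits long (three 8-bit colour bytes), and that the new code fetches once per row. `ddr_fetches` and `BITS_PER_PIXEL` are hypothetical names.

```python
BITS_PER_PIXEL = 24  # assumption: three colour bytes of 8 bits each


def ddr_fetches(rows, preload_row):
    """Count DDR fetch bursts needed to send `rows` rows of pixels.

    preload_row=False models one fetch per outgoing bit (assumed old behaviour).
    preload_row=True models one fetch per row, with the bits then served
    from registers and PRU RAM.
    """
    per_row = 1 if preload_row else BITS_PER_PIXEL
    return rows * per_row


old = ddr_fetches(100, preload_row=False)
new = ddr_fetches(100, preload_row=True)
print(old, new, old // new)
```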

Taken individually, each of these changes noticeably reduces the incidence of the flashes. Used together, they eliminate the flashes completely with a wide margin.

Here is what the new WS281X bit waveforms look like (GPIO0, GPIO1, GPIO2 from bottom to top) ...

[Image: NewFile6, oscilloscope capture of the new WS281X bit waveforms]

I've tested this combination for many, many hours and under many load conditions and I have not been able to create more than 100ns of jitter on the T0H waveform, which is well within spec. Here is 24 hours of 0-bits under as many stress conditions as I could throw at it...
[Image: NewFile5, oscilloscope capture of 24 hours of 0-bits]

You can see that the max T0H is about 280ns, well under the spec'ed max T0H of 500ns. Also note the jitter is <100ns (only visible on the bottom trace since that one is the trigger).
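
As a quick worked check of that margin, here is a small Python sketch that uses only the two numbers quoted above. The function and constant names are hypothetical.

```python
T0H_SPEC_MAX_NS = 500      # spec'ed max T0H quoted above
T0H_MEASURED_MAX_NS = 280  # approximate worst case read off the capture


def headroom_ns(measured_max_ns, spec_max_ns):
    """Return how far the worst measured pulse sits below the spec limit."""
    return spec_max_ns - measured_max_ns


print(headroom_ns(T0H_MEASURED_MAX_NS, T0H_SPEC_MAX_NS))  # 220
```

So the worst observed zero-bit pulse would have to stretch by roughly another 220ns before it reached the quoted limit.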

Functionally I've had several dozen strings running the black demo next to me for several days and have not seen a single flash.

Note that this approach does slightly lower the frame update rate - probably by about 10%. For my application I am happy to live with this cost for not having flashes, but if anyone really needs every drop of FPS then it would be possible to rework this code and get a 10% speedup over the previous versions by changing the format of the pixel array data passed to the PRU so that it was already in GPIO0 zeros, GPIO1 zeros, ... format rather than RBGW format. LMK if anyone is interested in pursuing this.
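
The reformatting idea above can be sketched as follows. This is a minimal Python model, not the PRU or host code from this PR: the `zero_masks` function, the `pin_map` channel-to-(bank, pin) table, and the four-bank default are all hypothetical, chosen for illustration. It turns one byte per channel into eight rows of per-bank masks, most significant bit first, where a set bit marks a pin whose current bit is 0.

```python
def zero_masks(channel_bytes, pin_map, num_banks=4):
    """Precompute per-bank 'zero' masks for one byte position.

    channel_bytes: one byte value per channel (same colour byte, same pixel).
    pin_map: hypothetical list of (bank, pin) tuples, one per channel.
    Returns 8 rows, MSB first; each row holds one 32-bit mask per GPIO bank.
    A set bit means that pin carries a 0 bit and must go low at the end of T0H.
    """
    rows = []
    for bit in range(7, -1, -1):  # most significant bit goes out first
        masks = [0] * num_banks
        for channel, value in enumerate(channel_bytes):
            if not (value >> bit) & 1:  # zero bit: pin must drop early
                bank, pin = pin_map[channel]
                masks[bank] |= 1 << pin
        rows.append(masks)
    return rows


# Hypothetical wiring: channels 0 and 1 on bank 0, channel 2 on bank 1.
pin_map = [(0, 2), (0, 3), (1, 12)]
for row in zero_masks([0xFF, 0x00, 0x80], pin_map):
    print([hex(m) for m in row])
```

With data already in this shape, the inner loop would only need to write one ready-made mask per bank for each bit instead of assembling the masks from pixel bytes during the time-critical window.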

Additionally this PR includes a couple of other random improvements, README updates, and a new redbeat demo mode that I use so that I can see if a remote system has lost network connectivity but is still getting power.

Questions, comments, and testing welcome!

root and others added 30 commits September 12, 2019 16:05
Each one will suppress the signal for the pin for channel `?`.

Needed this because hot or bad pixels will cause the whole row to light up and ruin the display. This at least lets you make the bad row look dark.
Update LED reset timing to work with newer chips as per this commit...
Clean up markdown headings
see the timing problem that makes 0 bits get stretched into 1 bits.
into hard coded #defines. This SUCKS because you can't have any comments or anything but I'll fricken take it after like 5 hours of battles.

Now just to build it out and add the comments above because there is no way to figure
out what the hell is going on with this code looking at it even 5 mins after I wrote it.
What a mess.
...but that led me to a big discovery when I could not figure out where a glitch was coming from!

Turns out the PRU code IS RUNNING ON BOTH PRUs!

So now let's go back and see if just disabling one will fix the original code.

If not, then I will see you back in the branch shortly and we will pick up where we left off.
make a 250ns pulse and that pulse is rock solid. So we move on from here.
…ulse

on all 32 bits (4 bytes) of GPIO2. The pulse is nominally 250ns wide with
a nominal delay of 2000ns between pulses. After much banging I am not able
to get the pulse width to exceed 325ns so this is well well within the
limits for both T0H and TL. Next we will see what happens when we do multiple
GPIO banks like this...
about 350ns. Will keep running oscilloscope traces for many hours to
find any outliers.

No pins on gpio bank #4 seem to be working, but my guess is that there
is something messed up in my device tree rather than a problem with the code.
pushing so I can see the diff on github.com and find the typo.
…0 bits. :/

Time to go back to earlier versions and see what changed.
bits that are too short. :/ Even with only a single channel. :/
no matter what I do.

We are writing repeatedly to the SET register to make the delay.

We are setting the PRU interconnect priority to 3 with a kernel driver.

We are not doing anything else. No T1H phase, no reading pixels from memory.

We will see if this holds up and if the kernel driver is even necessary.
Multiple banks, real live data.

Now optimize and make easily buildable and installable.
Optimized to only load channels 0-7 from DDR once per row to save some time,
but not enough regs to avoid having to reload the other 16 bits for every
outgoing bit.

Currently running about 500kHz rather than the theoretical 800kHz.

Maybe try copying pixels from DDR to PRU on each row?
Works perfectly, in fact slightly too fast: the T1L is only 200ns.
This works fine but heck we are in here so might as well make it to spec.
Now noticing that occasionally the T0L is getting stretched again. :/

Need to track this down.
…s our

slot on the interconnect.

Right now only banks 0 and 1.
@Yona-Appletree
Owner

Thanks for this work, @bigjosh. Apologies for not paying attention to this for a long while. Just starting to get back into things. Merged.

@Yona-Appletree Yona-Appletree merged commit 880fb0f into Yona-Appletree:master Mar 25, 2021
2 participants