
GPU-accelerated Display (Vulkan) #93



Merged: 128 commits, merged Oct 5, 2023



osnr (Collaborator) commented Sep 30, 2023

Replace our software rendering with GPU rendering using shaders. All Display primitives (text, lines, images, filled shapes) should work as before.

Performance is much better on the NUC: a steady 60 fps with no drops, even when rotating multiple large images and text, compared to 20-30 fps before. It's less impressive on the Pi 4, where it may be moderately worse than the software renderer, but not terrible.

You'll need to install Vulkan on your machine to get this to work; see the README and wiki.

Replaces

  • pi/Display.tcl + pi/lineclip.tcl + pi/rotate.tcl + virtual-programs/display.folk + vendor/font.tcl

with


Implements a new 'GPU FFI' which allows you to write vertex and fragment shaders to implement drawing primitives:

```tcl
set fillTriangle [Gpu::pipeline {vec2 p0 vec2 p1 vec2 p2 vec4 color} {
    vec2 vertices[4] = vec2[4](p0, p1, p2, p0);
    return vertices[gl_VertexIndex];
} {
    return color;
}]
set circle [Gpu::pipeline {vec2 center float radius} {
    // HACK: just draws to full screen
    vec2 vertices[4] = vec2[4](vec2(0, 0), vec2(_resolution.x, 0), vec2(0, _resolution.y), _resolution);
    return vertices[gl_VertexIndex];
} {
    float dist = length(gl_FragCoord.xy - center) - radius;
    return dist < 0.0 ? vec4(1.0, 0, 0, 1.0) : vec4(0, 0, 0, 0);
}]

# (must be called in between Gpu::drawStart and Gpu::drawEnd)
Gpu::draw $circle {300 300} 20
Gpu::draw $fillTriangle {0 0} {100 400} {200 300} {1 0 0 1}
```

The first argument to Gpu::pipeline is a list of arguments shared by both shaders: they're accessible from both the vertex and the fragment shader, and you pass them in at draw time. They're push constants, which are sort of like uniforms in GL but limited to 128 bytes in total. The exception is fn-type arguments, which aren't actually passed in: they name Gpu::fn functions you've already made, which are looked up at call time from the surrounding Tcl environment and inlined into the shader. They drop out of the argument list and don't correspond to any argument of Gpu::draw.

The second argument to Gpu::pipeline is the source code of the main function of a vertex shader. It will be called with gl_VertexIndex 0, 1, 2, and 3, and should return the corresponding vec2 vertex of a quad in Vulkan triangle-strip order: topleft, topright, bottomleft, bottomright (a Z shape, not counterclockwise).
The returned vertex should be in screen (pixel) coordinates, not normalized coordinates. You can read the builtin vec2 _resolution to get the screen resolution. (So far, in practice, we mostly use the vertex shader for clipping, so that a draw doesn't have to touch the whole display.)

The third argument to Gpu::pipeline (or the fourth, if you pass a list of fragment-shader fn arguments before it) is the source code of the main function of a fragment shader. It should return a vec4 color. You can read gl_FragCoord.xy to get the current pixel coordinates, and you can access any of the overall arguments (including _resolution) here as well.

See pi/Gpu.tcl and display.folk for more examples. You can declare sampler2D-type arguments if you want to pass an image in, and you can give the fragment shader fn arguments by passing them as the third argument.

A pressing TODO is to break the drawing primitives up into separate virtual programs and expose all of this better to the user. It's currently hard to use from a user program, because this code has to run in the Display process.


Substantially rewrites calibrate.tcl to use the new drawing system, and makes its printed output slightly friendlier.

Removes the live camera preview from pi/Camera.tcl. For now we'll recommend the web image instead, unless we reimplement the preview.

Extends c.tcl to automatically generate Tcl-side getter functions for struct fields, like image_t data, image_t width, etc. (ensemble commands under the struct type name, one per field).

Extends c.tcl to allow throwing Tcl exceptions from deeper inside C code (still WIP).


Extends the Folk interprocess heap in main.tcl to support freeing allocations (it's now built on top of dlmalloc).
On each allocation, the heap now also stores a random 64-bit 'version' for that allocation, which the caller can remember. You can query the heap with any address, and if some allocation contains that address, it will try to return that allocation's version. This is used to check whether images that have been copied to the GPU (like camera slices) are stale: if the version doesn't match the one we stored at the previous copy, we know we have to recopy that image to the GPU.
The heap's current implementation is pretty inefficient and unsafe (it linearly scans an array of all allocations, capped at 256 entries), and we may want to replace it with an interval tree or something similar at some point. We will probably want to introduce more locking, too.


Fixes #22.

Please test this if you can (and help with documentation if you can). I'd like to merge it by Monday if possible, with at least 1-2 people signing off on it. I know @charlesetc has been running it for a few days without problems.

osnr added 30 commits July 31, 2023 10:43
First step toward rendering multiple objects (using different SDF shaders)
(I want to draw multiple things in a row without having them each
clear the screen.)
Remove csubst now that it's part of base C lib
Also fix linebreaks in csubst
(next, update images, which is harder)
Now we can standardize on struct representation of C structs in Tcl
instead of dict representation... hopefully simpler code too in the end.
WIP: Start on images/sampler2D.

Improve char array support in C. Actually put uboFields in Pipeline
struct so they don't shimmer out (because we now run the Pipeline data
back into C to set up the image right before draw time).
l3gacyb3ta (Collaborator) commented:
Review time!

l3gacyb3ta (Collaborator) left a review:

99% amazing!

README.md (outdated):
$ CFLAGS="-O2 -march=armv8-a+crc+simd -mtune=cortex-a72" CXXFLAGS="-O2 -march=armv8-a+crc+simd -mtune=cortex-a72" meson -Dglx=disabled -Dplatforms= -Dllvm=disabled -Dvulkan-drivers=broadcom -Dgallium-drivers=v3d,vc4,kmsro -Dbuildtype=release ..

# AMD (radeonsi), including Beelink SER5
$ meson -Dglx=disabled -Dplatforms= -Ddri-drivers='' -Dvulkan-drivers=amd -Dgallium-drivers=radeonsi -Dbuildtype=release ..
Collaborator comment:

-Ddri-drivers='' is unnecessary and causes it to fail


1. See [notes](https://folk.computer/notes/vulkan) and [Naveen's
notes](https://gist.github.com/nmichaud/1c08821833449bdd3ac70dcb28486539).
1. `sudo adduser folk video` & `sudo adduser folk input` (?) & log out and log back in (re-ssh)
Collaborator comment:

Maybe put the video and input group step before the Vulkan steps? It tripped me up: the folk user couldn't use the video output, which made vkcube fail.


Go to http://whatever.local:4273/frame-image/ to see the camera's
current field of view. Reposition your camera to cover your table.
Collaborator comment:

oh this is awesome! Way better than trying to use the projection itself.

calibrate.tcl (outdated):
```diff
 int captureNum = 0;
 uint8_t* delayThenCameraCapture(Tcl_Interp* interp, const char* description) {
-    usleep(100000);
+    usleep(500000);
```
Collaborator comment:

why is this longer?

variable rtypes {
int { expr {{ $robj = Tcl_NewIntObj($rvalue); }}}
int32_t { expr {{ $robj = Tcl_NewIntObj($rvalue); }}}
double { expr {{ $robj = Tcl_NewDoubleObj($rvalue); }}}
float { expr {{ $robj = Tcl_NewDoubleObj($rvalue); }}}
char { expr {{ $robj = Tcl_ObjPrintf("%c", $rvalue); }}}
bool { expr {{ $robj = Tcl_NewIntObj($rvalue); }}}
uint16_t { expr {{ $robj = Tcl_NewIntObj($rvalue); }}}
Collaborator comment:

feels like uint8_t is missing?

pi/Gpu.tcl (outdated):

Collaborator comment:

This all is flying over my head, but it seems great!

Collaborator comment:

Maybe write up how to generate these?

namespace eval ::Display {
variable WIDTH
variable HEIGHT
variable LAYER 0
regexp {mode "(\d+)x(\d+)"} [exec fbset] -> WIDTH HEIGHT
if {$::isLaptop} {
Collaborator comment:

Idk if this is relevant to this code, but the laptop stuff still doesn't work on my end

l3gacyb3ta self-requested a review on October 4, 2023, 18:57
l3gacyb3ta (Collaborator) left a review:

Seems good to me, but the calibration being broken is weird.

osnr (Collaborator, Author) commented Oct 4, 2023:

> Seems good to me, but the calibration being broken is wierd

It's worked for me. Can you post some of the JPEGs that get written to the folk folder during calibration? I wonder whether it's a timing issue or a repeatability/optics issue. It's definitely janky, though.

osnr (Collaborator, Author) commented Oct 4, 2023:

The sleep (and the extra webcam frame grabs) during calibration are there because it takes some time for the projected stripes to reach the real world, become visible to the webcam, and show up in the received buffer. The durations are basically guesses, though.

Successfully merging this pull request may close this issue: Add Vulkan renderer
3 participants