
Lemonade Playbook for Local AI on CPU, GPU, and NPU #75

Merged
vgodsoe merged 8 commits into main from vgodsoe/lemonade-playbook on Mar 9, 2026
Conversation

@vgodsoe (Collaborator) commented Feb 12, 2026

Closes #53

vgodsoe self-assigned this on Feb 12, 2026
vgodsoe marked this pull request as draft on February 12, 2026 at 20:14
vgodsoe marked this pull request as ready for review on February 23, 2026 at 17:32
Review thread on playbooks/supplemental/lemonade-getting-started/README.md (outdated)

@sdevinenamd (Collaborator)
  1. For the 'Separation of concerns' benefit section, the sentence 'Model management, streaming, and fault tolerance are handled by the server so app developers can focus on their app' uses the word app twice, which sounds repetitive 😅. Can we either remove the first occurrence of app or replace the last occurrence with 'application'? (See the client sketch after this list.)
  2. Under the 'Running Models on the NPU' section, the sentence 'NPU inference is not yet supported on Linux. You can still run all GGUF models on CPU or GPU. To request Linux NPU support, file an issue at [amd/ryzenai-sw] or [FastFlowLM]'. Could you please clarify which issue we are referring to here? Is it specifically the request for NPU inference support on Linux? If so, how does it help for users to file an issue? My understanding is that the feature is still a WIP for our engineering team.
  3. 'What is happening under the hood? When you send a message, the NPU processes your entire prompt in parallel (this is called "prefill"). Then the iGPU takes over to generate the response one token at a time. This hybrid approach plays to each chip's strengths.' Since we are mentioning 'prefill', do you think it would help to include 'decode' too?
  4. After covering the flashcard examples on the NPU, do you think we can include a performance comparison [TTFT & TPS] for a similar model between NPU & GPU/hybrid to highlight the strengths of the NPU?
  5. The 'Next Steps' section includes icons, but I haven't seen them in other playbooks. I think it would look good if we maintained consistency.
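
The separation of concerns that item 1 describes is easiest to see from the client side. Below is a minimal sketch, assuming Lemonade Server is running locally with an OpenAI-compatible endpoint; the base URL, port, and model name are illustrative assumptions, not values confirmed by the playbook:

```python
# A minimal sketch: talk to a locally running Lemonade Server through an
# OpenAI-compatible API. The base URL, port, and model name below are
# assumptions for illustration; check the playbook for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed local endpoint
    api_key="not-needed",                     # local server; no real key required
)

response = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # placeholder model name
    messages=[{"role": "user", "content": "Make a flashcard about NPUs."}],
)
print(response.choices[0].message.content)
```

Model download, device placement, streaming, and fault tolerance all stay behind that HTTP boundary, which is the benefit the sentence under review is describing.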

Further review threads on playbooks/supplemental/lemonade-getting-started/README.md (most marked outdated)
@danielholanda (Collaborator)
@sdevinenamd Would be great if you could add your review here before the end of the week

@adamlam2-amd (Collaborator)
Hey - good job everyone, the overall UI and language look pretty good.

Agree with the sentiment that screenshots would be helpful.

I don't have any major comments for now - will probably have more once QA actually tests this as well.

One question I do have:

  • It seems we are doing things through the command line. What is the difference between that and the Lemonade app? In the HaloBox, the actual Lemonade app is installed on both Linux and Windows. Can you help me understand this? Thanks!

@sdevinenamd (Collaborator)
> @sdevinenamd Would be great if you could add your review here before the end of the week

@danielholanda I provided my comments already; they are the same five points as in the review thread above (items 1-5).

@vgodsoe (Author) commented Mar 6, 2026

> (Quoting the five review points from the thread above.)

  1. Removed the repeated "app".
  2. Removed the Windows-only section and got rid of the GitHub issue info.
  3. Added "decode".
  4. I tend to avoid perf comparisons in docs like this; they depend on the user's hardware, the software at the time of testing, etc. (Readers can measure TTFT and TPS themselves; see the sketch after this comment.)
  5. Removed the icons for consistency.
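
For readers who do want the TTFT and TPS numbers from point 4, they are straightforward to measure on whatever hardware is at hand, which sidesteps the staleness concern in the response above. A rough sketch, again assuming a local OpenAI-compatible Lemonade Server endpoint (the URL and model name are placeholders):

```python
# A rough TTFT/TPS measurement against a streaming chat endpoint.
# Prefill cost shows up as time-to-first-token (TTFT); decode speed
# shows up as tokens per second (TPS) over the rest of the stream.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="not-needed")

start = time.perf_counter()
first_token_at = None
pieces = 0
stream = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # placeholder model name
    messages=[{"role": "user", "content": "Explain prefill vs. decode."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first visible output
        pieces += 1
end = time.perf_counter()

ttft = first_token_at - start
tps = pieces / max(end - first_token_at, 1e-9)  # stream pieces approximate tokens
print(f"TTFT: {ttft:.2f} s   TPS: {tps:.1f}")
```

Running the same script against an NPU/hybrid model and then a GPU-only model would surface the prefill advantage from point 3 directly in the TTFT figure.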

@vgodsoe (Author) commented Mar 6, 2026
> (Quoting @adamlam2-amd's question above about the command line vs. the Lemonade app.)

For each step, I included both app instructions and command-line instructions so the user learns the different ways to interact with Lemonade. Some people prefer an app, some the CLI.
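
To make the app-versus-CLI equivalence concrete: both front ends ultimately talk to the same local server, so anything either one does can also be reproduced with a plain HTTP request. A small sketch, assuming the server follows the OpenAI /models convention (the URL is an illustrative assumption):

```python
# List the models the local server currently serves. This is the same
# information the Lemonade app shows in its model picker, assuming the
# server follows the OpenAI /models convention at this (assumed) URL.
import requests

resp = requests.get("http://localhost:8000/api/v1/models")
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```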

@adamlam2-amd (Collaborator) left a comment
lgtm to pass into qa testing.

@danielholanda (Collaborator) left a comment
Looks great. Thanks for addressing the comments!

vgodsoe merged commit 1336d3a into main on Mar 9, 2026
5 checks passed
vgodsoe deleted the vgodsoe/lemonade-playbook branch on March 9, 2026 at 19:22

Development

Successfully merging this pull request may close these issues.

[Playbook] Using Lemonade Across CPU, GPU, and NPU

5 participants