
Lemonade Playbook for Local AI on CPU, GPU, and NPU #75

Merged
vgodsoe merged 8 commits into main from vgodsoe/lemonade-playbook on Mar 9, 2026
Conversation

@vgodsoe (Collaborator) commented Feb 12, 2026

Closes #53

vgodsoe self-assigned this on Feb 12, 2026
vgodsoe marked this pull request as draft on February 12, 2026 at 20:14
vgodsoe marked this pull request as ready for review on February 23, 2026 at 17:32
Review thread on playbooks/supplemental/lemonade-getting-started/README.md (outdated)

@sdevinenamd (Collaborator)
  1. For the 'Separation of concerns' benefit section, the sentence 'Model management, streaming, and fault tolerance are handled by the server so app developers can focus on their app' uses the word app twice, which sounds repetitive 😅. Can we either remove the first occurrence of app or replace the last occurrence with 'application'? (See the client sketch after this list.)
  2. Under the 'Running Models on the NPU' section, the sentence 'NPU inference is not yet supported on Linux. You can still run all GGUF models on CPU or GPU. To request Linux NPU support, file an issue at [amd/ryzenai-sw] or [FastFlowLM]'. Could you please clarify which issue we are referring to here? Is it specifically the request for NPU inference support on Linux? If so, how does it help for users to file an issue? My understanding is that the feature is still a WIP for our engineering team.
  3. 'What is happening under the hood? When you send a message, the NPU processes your entire prompt in parallel (this is called "prefill"). Then the iGPU takes over to generate the response one token at a time. This hybrid approach plays to each chip's strengths.' Since we are mentioning 'prefill', do you think it would help to include 'decode' too?
  4. After covering the flashcard examples on the NPU, do you think we can include a performance comparison [TTFT & TPS] for a similar model between NPU & GPU/hybrid to highlight the strengths of the NPU?
  5. The 'Next Steps' section includes icons, but I haven't seen them in other playbooks. I think it would look good if we maintained consistency.
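
The separation of concerns that item 1 describes is easiest to see from the client side. Below is a minimal sketch, assuming Lemonade Server is running locally with an OpenAI-compatible endpoint; the base URL, port, and model name are illustrative assumptions, not values confirmed by the playbook:

```python
# A minimal sketch: talk to a locally running Lemonade Server through an
# OpenAI-compatible API. The base URL, port, and model name below are
# assumptions for illustration; check the playbook for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed local endpoint
    api_key="not-needed",                     # local server; no real key required
)

response = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # placeholder model name
    messages=[{"role": "user", "content": "Make a flashcard about NPUs."}],
)
print(response.choices[0].message.content)
```

Model download, device placement, streaming, and fault tolerance all stay behind that HTTP boundary, which is the benefit the sentence under review is describing.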

Further review threads on playbooks/supplemental/lemonade-getting-started/README.md (most marked outdated)
@danielholanda (Collaborator)
@sdevinenamd Would be great if you could add your review here before the end of the week

@adamlam2-amd (Collaborator)
Hey - good job everyone, the overall UI and language look pretty good.

Agree with the sentiment that screenshots would be helpful.

I don't have any major comments for now - will probably have more once QA actually tests this as well.

One question I do have:

  • It seems we are doing things through the command line. What is the difference between that and the Lemonade app? In the HaloBox, the actual Lemonade app is installed on both Linux and Windows. Can you help me understand this? Thanks!

@sdevinenamd (Collaborator)
> @sdevinenamd Would be great if you could add your review here before the end of the week

@danielholanda I provided my comments already; they are the same five points as in the review thread above (items 1-5).

@vgodsoe (Author) commented Mar 6, 2026

> (Quoting the five review points from the thread above.)

  1. Removed the repeated "app".
  2. Removed the Windows-only section and got rid of the GitHub issue info.
  3. Added "decode".
  4. I tend to avoid perf comparisons in docs like this; they depend on the user's hardware, the software at the time of testing, etc. (Readers can measure TTFT and TPS themselves; see the sketch after this comment.)
  5. Removed the icons for consistency.
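
For readers who do want the TTFT and TPS numbers from point 4, they are straightforward to measure on whatever hardware is at hand, which sidesteps the staleness concern in the response above. A rough sketch, again assuming a local OpenAI-compatible Lemonade Server endpoint (the URL and model name are placeholders):

```python
# A rough TTFT/TPS measurement against a streaming chat endpoint.
# Prefill cost shows up as time-to-first-token (TTFT); decode speed
# shows up as tokens per second (TPS) over the rest of the stream.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="not-needed")

start = time.perf_counter()
first_token_at = None
pieces = 0
stream = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # placeholder model name
    messages=[{"role": "user", "content": "Explain prefill vs. decode."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first visible output
        pieces += 1
end = time.perf_counter()

ttft = first_token_at - start
tps = pieces / max(end - first_token_at, 1e-9)  # stream pieces approximate tokens
print(f"TTFT: {ttft:.2f} s   TPS: {tps:.1f}")
```

Running the same script against an NPU/hybrid model and then a GPU-only model would surface the prefill advantage from point 3 directly in the TTFT figure.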

@vgodsoe (Author) commented Mar 6, 2026
> (Quoting @adamlam2-amd's question above about the command line vs. the Lemonade app.)

For each step, I included both app instructions and command-line instructions so the user learns the different ways to interact with Lemonade. Some people prefer an app, some the CLI.
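
To make the app-versus-CLI equivalence concrete: both front ends ultimately talk to the same local server, so anything either one does can also be reproduced with a plain HTTP request. A small sketch, assuming the server follows the OpenAI /models convention (the URL is an illustrative assumption):

```python
# List the models the local server currently serves. This is the same
# information the Lemonade app shows in its model picker, assuming the
# server follows the OpenAI /models convention at this (assumed) URL.
import requests

resp = requests.get("http://localhost:8000/api/v1/models")
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```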

@adamlam2-amd (Collaborator) left a comment
lgtm to pass into qa testing.

@danielholanda (Collaborator) left a comment
Looks great. Thanks for addressing the comments!

vgodsoe merged commit 1336d3a into main on Mar 9, 2026
5 checks passed
vgodsoe deleted the vgodsoe/lemonade-playbook branch on March 9, 2026 at 19:22

Development

Successfully merging this pull request may close these issues.

[Playbook] Using Lemonade Across CPU, GPU, and NPU

5 participants