Local LLMs for TurboWarp #2486

Brackets-Coder · 2026-05-16T03:09:20Z

Brackets-Coder
May 16, 2026
Collaborator

Introduction

Lots of pull requests have attempted to add large language models to TurboWarp. There's clearly high demand for it, but many of these have been disputed or closed because they rely on external APIs (especially with authentication keys stored inside projects) or, for self-hosted APIs, because they depend on someone else keeping a free server available indefinitely.

The clear solution to both of these problems is to bring the AI to the device, where it runs offline, keeps working without any server, and never sends keys or prompts to a third party. This hasn't been practical in the past due to hardware requirements, but Gemma 4 E4B just came out: a fairly small but powerful model designed to bring high-level reasoning to mobile devices. I'm not certain, but I think it may be able to fit within browser memory limits.

Proposal

We could use something like WebLLM, or compile llama.cpp to WASM + WebGPU for a custom implementation, and then get everything running in JS. My main concern is where to store the model weights, since the smallest Gemma 4 model is 7 GB. That is obviously too large to store directly in the extension, and it would be impractical to ask users to install it themselves. When quantized or compressed, though, it should fit in a much smaller amount of memory. If we can figure this out, I think it's a feasible idea.
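As a rough back-of-the-envelope check on the quantization point, here is a minimal sketch that scales the 7 GB figure by bits per weight. It assumes the 7 GB refers to 16-bit weights, which I haven't verified, and it ignores quantization metadata and runtime memory such as the KV cache. `estimateQuantizedGB` is a hypothetical helper, not part of WebLLM or llama.cpp.

```typescript
// Hypothetical helper: scale a known weight size by the ratio of bit widths.
// Ignores per-block scale factors, embeddings kept at higher precision, etc.
function estimateQuantizedGB(
  baselineGB: number,
  baselineBits: number,
  targetBits: number
): number {
  return baselineGB * (targetBits / baselineBits);
}

const fullSizeGB = 7; // figure quoted in the post
const assumedBaselineBits = 16; // ASSUMPTION: the 7 GB is for 16-bit weights

for (const bits of [8, 4]) {
  const gb = estimateQuantizedGB(fullSizeGB, assumedBaselineBits, bits);
  console.log(`${bits}-bit: ~${gb.toFixed(2)} GB`);
}
// prints "8-bit: ~3.50 GB" then "4-bit: ~1.75 GB"
```

Under that assumption, even a 4-bit build is still a download of well over a gigabyte, so the weights would need to be fetched once and cached by the browser rather than shipped with the extension.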

Attempt

Before coming up with the idea of running local LLMs in the browser, I tried to develop a small Ollama extension for TW that sent a fetch request to localhost:11434. This approach worked for a plugin I'm writing for another application, but in the browser I seem to be running into CORS restrictions, and it would also require users to install Ollama and the appropriate open models (and Ollama is only available for desktop devices). That kept me from looking into it further.
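For reference, the request looked roughly like the sketch below. This is a simplified reconstruction rather than the actual extension code: it assumes Ollama's `/api/generate` endpoint on the default port, and `buildGenerateRequest` and `askOllama` are hypothetical names. The JSON `Content-Type` header makes the browser send a CORS preflight, which fails unless the local Ollama server is configured to allow the page's origin (I believe the `OLLAMA_ORIGINS` environment variable controls this, but that is one more setup step for users).

```typescript
// ASSUMPTION: Ollama is running locally on its default port.
const OLLAMA_URL = "http://localhost:11434/api/generate";

interface GenerateRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: build the non-streaming generate request.
function buildGenerateRequest(model: string, prompt: string): GenerateRequest {
  return {
    url: OLLAMA_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // stream: false asks for a single JSON object instead of a stream
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Hypothetical helper: send the prompt and return the generated text.
// In a browser, fetch rejects here if the CORS check fails.
async function askOllama(model: string, prompt: string): Promise<string> {
  const { url, init } = buildGenerateRequest(model, prompt);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

The model name passed to `askOllama` has to match a model the user has already pulled locally, which is part of why this route puts so much setup on the user.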

I just wanted to get the community's thoughts on this. It seems feasible conceptually, but I don't know if it's practical or logistically realistic to pursue.

GarboMuffin · 2026-05-16T06:53:51Z

GarboMuffin
May 16, 2026
Maintainer

not planning anything. don't think there's a reasonable way to run a useful LLM on typical end user devices ($200 chromebooks), and i don't intend to fund anything at this time

1 reply

Brackets-Coder May 16, 2026
Collaborator Author

Okay, just thought it would be worth considering, even if only for a moment

Brackets-Coder · 2026-05-16T14:26:45Z

Brackets-Coder
May 16, 2026
Collaborator Author

@CubesterYT if you're interested, it might be possible for NitroBolt, but I don't know that it would be practical

0 replies
