refactor: v1.9.0 + scheduler, idle RAM management, Observation rewrite #88
## Goals

- `@Model` accessors for PromptView

  With the existing setup, implementing new interface features from the A1111 backend requires a number of moving parts, along with a handful of modifications in different parts of the codebase. With this change, we look to merge all of these parts into one model. This will also bring us closer to the goal of direct API → SwiftUI translation for plugin components.
- Observation framework (see Apple docs), replacing `ObservableObject`, `@Published`, ... (a minimal migration sketch follows this list)
- A1111 v1.9.0 compatibility (see stable-diffusion-webui releases), including the new `.automatic` scheduler
- (Optional) Idle RAM management for powerful hardware (`--api-server-stop` in A1111 CLI args)
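For reference, a minimal sketch of what the Observation migration looks like (the type and property names here are illustrative, not the actual SwiftDiffusion model):

```swift
import Observation

// Before: Combine-style observation, one @Published property per mirrored API field.
// final class PromptViewModel: ObservableObject {
//     @Published var positivePrompt: String = ""
//     @Published var samplingSteps: Int = 20
// }

// After: the Observation framework (macOS 14+). @Observable tracks reads of
// stored properties automatically, so the wrappers disappear and SwiftUI views
// re-render only for the properties they actually access.
@Observable
final class PromptModel {
    var positivePrompt: String = ""
    var negativePrompt: String = ""
    var samplingSteps: Int = 20
}
```

On the SwiftUI side, `@StateObject`/`@ObservedObject` then become plain `@State` (for owned models) or `@Bindable` (where bindings are needed).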
The current setup for all stable diffusion clients is as follows: load the current model into RAM, use said model to generate the current prompt, and leave the model (weights, prompt dependencies, etc.) in memory until either it is overridden by another model or the process is shut down. This is beneficial for lower-end hardware, as it removes the need to reload the model for each new prompt generation, saving anywhere from 30-90s between prompts.
However, on higher-end hardware (especially the M3 Pro/Max), loading an SDXL model into RAM takes at most 2-3s. As a result, these clients reserve 30-50GB of active memory for as long as the process is running, all to save this particular user a second or two of load time (2-3s in the worst case). Furthermore, restarting the Python process and loading the previous model back into RAM results in only ~5GB of idle memory usage while adding a measly 1-2s to each generation queue.
As such, I propose two separate strategies that I plan to implement (as options) within SwiftDiffusion (a sketch of the option type follows this list):

- `default` (current)
- `restartWithLoad`
- `startOnQueue`
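How these options might be represented in code (the `ModelLoadStrategy` name is hypothetical, not existing SwiftDiffusion code):

```swift
/// Strategy for managing A1111's resident model memory between generations.
enum ModelLoadStrategy: String, Codable {
    case `default`        // current behavior: keep the model resident in RAM
    case restartWithLoad  // restart the Python process, then reload the last model
    case startOnQueue     // restart the Python process; load nothing until a queue starts
}
```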
After a generation queue has finished successfully:

- `restartWithLoad`: end Py process, start Py process, load last model into RAM
- `startOnQueue`: end Py process, start Py process with no loaded model

On new generation queue (see the sketch after this list):

- `restartWithLoad`: make generation request
- `startOnQueue`: load model into RAM, make generation request
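A rough sketch of both hooks. Here the restart is driven through A1111's own API (launching with `--api-server-stop` exposes `/sdapi/v1/server-restart`), and the model reload is done by POSTing `sd_model_checkpoint` to `/sdapi/v1/options`. The `IdleRAMManager` type and its method names are assumptions for illustration, and SwiftDiffusion could just as well terminate and relaunch the process itself:

```swift
import Foundation

/// Hypothetical coordinator for the idle RAM strategies described above.
struct IdleRAMManager {
    let api = URL(string: "http://127.0.0.1:7860")!
    let strategy: ModelLoadStrategy
    let lastCheckpoint: String?

    /// After a generation queue has finished successfully.
    func queueDidFinish() async throws {
        guard strategy != .default else { return }  // keep the model resident
        // Requires A1111 to be launched with --api-server-stop.
        try await post("sdapi/v1/server-restart", body: [:])
        if strategy == .restartWithLoad, let checkpoint = lastCheckpoint {
            // Reload the previous model now so the next request pays no load cost
            // (in practice, wait for the server to come back up before this call).
            try await post("sdapi/v1/options", body: ["sd_model_checkpoint": checkpoint])
        }
    }

    /// On a new generation queue.
    func queueWillStart(checkpoint: String) async throws {
        if strategy == .startOnQueue {
            // The model was left unloaded; load it now (adds ~1-2s per queue).
            try await post("sdapi/v1/options", body: ["sd_model_checkpoint": checkpoint])
        }
        // ...then make the generation request as usual (POST sdapi/v1/txt2img),
        // where the v1.9.0 payload can also carry the new scheduler selection,
        // e.g. "sampler_name": "DPM++ 2M", "scheduler": "Automatic".
    }

    private func post(_ path: String, body: [String: String]) async throws {
        var request = URLRequest(url: api.appendingPathComponent(path))
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONEncoder().encode(body)
        _ = try await URLSession.shared.data(for: request)
    }
}
```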
## Other Planned Improvements
- `stable-diffusion-webui` release (exclude repository, venv)