compel.compel

Compel Objects

``` python
class Compel()
```

__init__

``` python
def __init__(
        tokenizer: Union[CLIPTokenizer, List[CLIPTokenizer]],
        text_encoder: Union[CLIPTextModel, List[CLIPTextModel]],
        textual_inversion_manager: Optional[
            BaseTextualInversionManager] = None,
        dtype_for_device_getter: Callable[
            [torch.device], torch.dtype] = lambda device: torch.float32,
        truncate_long_prompts: bool = True,
        padding_attention_mask_value: int = 1,
        downweight_mode: DownweightMode = DownweightMode.MASK,
        returned_embeddings_type: ReturnedEmbeddingsType = ReturnedEmbeddingsType.LAST_HIDDEN_STATES_NORMALIZED,
        requires_pooled: Union[bool, List[bool]] = False,
        device: Optional[str] = None)
```

Initialize Compel. The tokenizer and text_encoder can be lifted directly from any DiffusionPipeline. For SDXL, pass a list of tokenizers and a list of text encoders - see https://github.com/damian0815/compel/pull/41 for details.

Arguments:

  • textual_inversion_manager: Optional instance to handle expanding multi-vector textual inversion tokens.
  • dtype_for_device_getter: A Callable that returns a torch dtype for a given device. You probably don't need to use this.
  • truncate_long_prompts: If True (the default), truncate input prompts to 77 tokens, including the beginning/end markers. If False, do not truncate; instead, assemble as many 77-token chunks, each capped by beginning/end markers, as are needed to encode the whole prompt. In that case you will likely need to supply both positive and negative prompts - use pad_conditioning_tensors_to_same_length to avoid tensor length mismatch errors when passing the embeds on to your DiffusionPipeline for inference.
  • padding_attention_mask_value: Value to write into the attention mask for padding tokens. Stable Diffusion needs 1.
  • downweight_mode: Whether downweighting should be applied by MASKing out the downweighted tokens (default) or REMOVEing them (legacy behaviour; it messes up the position embeddings of the tokens that follow).
  • returned_embeddings_type: Controls how the embedding vectors are taken from the result of running the text encoder over the parsed prompt's text. For SD<=2.1, use LAST_HIDDEN_STATES_NORMALIZED, or PENULTIMATE_HIDDEN_STATES_NORMALIZED if you want to do "clip skip". For SDXL, use PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED.
  • requires_pooled: For SDXL, also append the pooled embeddings when returning conditioning tensors.
  • device: The torch device on which the tensors should be created. If not specified, the device of the text_encoder at the moment build_conditioning_tensor() is called will be used.
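For example, a minimal sketch of wiring Compel to a standard diffusers Stable Diffusion pipeline (the model id and the pipeline attribute names below are illustrative, not prescribed by Compel):

``` python
from diffusers import StableDiffusionPipeline
from compel import Compel

# Load any Stable Diffusion pipeline; the model id is just an example.
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Lift the tokenizer and text encoder straight off the pipeline.
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
```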

make_conditioning_scheduler

``` python
def make_conditioning_scheduler(
        positive_prompt: str,
        negative_prompt: str = '') -> ConditioningScheduler
```

Return a ConditioningScheduler object that provides conditioning tensors for different diffusion steps (currently not fully implemented).

build_conditioning_tensor

``` python
def build_conditioning_tensor(text: str) -> torch.Tensor
```

Build a conditioning tensor by parsing the text for Compel syntax, constructing a Conjunction, and then building a conditioning tensor from that Conjunction.
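For example, assuming the `compel` instance from the sketch above, and using Compel's `++` upweighting syntax:

``` python
# "++" upweights the preceding term; the result can be passed to a pipeline
# via its prompt_embeds argument.
conditioning = compel.build_conditioning_tensor("a cat playing with a ball++ in the forest")
print(conditioning.shape)  # e.g. torch.Size([1, 77, 768]) for SD 1.x
```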

__call__

``` python
@torch.no_grad()
def __call__(text: Union[str, List[str]]) -> torch.FloatTensor
```

Take a string or a list of strings and build conditioning tensors to match.

If multiple strings are passed, the resulting tensors will be padded until they have the same length.

Returns:

A tensor consisting of conditioning tensors for each of the passed-in strings, concatenated along dim 0.
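For example, reusing the `pipeline` and `compel` objects from the earlier sketch:

``` python
prompts = ["a cat playing with a ball++ in the forest",
           "an impressionist oil painting of a cat"]
embeds = compel(prompts)  # padded to the same length, concatenated along dim 0
images = pipeline(prompt_embeds=embeds).images
```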

parse_prompt_string

``` python
@classmethod
def parse_prompt_string(cls, prompt_string: str) -> Conjunction
```

Parse the given prompt string and return a structured Conjunction object that represents the prompt it contains.
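Because this is a classmethod, no instance is needed. For example:

``` python
from compel import Compel

conjunction = Compel.parse_prompt_string('("a cat playing in the forest", "an impressionist oil painting").and()')
print(conjunction)  # structured Conjunction representing the two prompts joined by .and()
```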

describe_tokenization

``` python
def describe_tokenization(text: str) -> List[str]
```

For the given text, return a list of strings showing how it will be tokenized.

Arguments:

  • text: The text that is to be tokenized.

Returns:

A list of strings representing the output of the tokenizer. The output list may be longer than the number of words in text because the tokenizer may split words into multiple tokens. Because of this, word boundaries are indicated in the output with </w> strings.
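For example, with the `compel` instance from the earlier sketch (the exact split depends on the tokenizer's vocabulary, so any output shown here would only be indicative):

``` python
# Sub-word splits are tokenizer-dependent; </w> marks the end of each word.
print(compel.describe_tokenization("a photorealistic portrait of a cat"))
```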

build_conditioning_tensor_for_conjunction

``` python
def build_conditioning_tensor_for_conjunction(
        conjunction: Conjunction) -> Tuple[torch.Tensor, dict]
```

Build a conditioning tensor for the given Conjunction object.

Returns:

A tuple of (conditioning tensor, options dict). The contents of the options dict depend on the prompt; at the moment it is only used for returning cross-attention control conditioning data (.swap()).
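For example, combining this with parse_prompt_string from above:

``` python
conjunction = Compel.parse_prompt_string("a cat playing with a ball++ in the forest")
conditioning, options = compel.build_conditioning_tensor_for_conjunction(conjunction)
# `options` only carries data (cross-attention control conditioning) when the
# prompt uses .swap().
```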

build_conditioning_tensor_for_prompt_object

``` python
def build_conditioning_tensor_for_prompt_object(
        prompt: Union[Blend, FlattenedPrompt]) -> Tuple[torch.Tensor, dict]
```

Build a conditioning tensor for the given prompt object (either a Blend or a FlattenedPrompt).
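For example, a sketch that pulls a Blend out of a parsed Conjunction (this assumes the Conjunction exposes its parsed prompts via a .prompts list):

``` python
conjunction = Compel.parse_prompt_string('("a cat", "an oil painting").blend(0.7, 0.3)')
prompt_object = conjunction.prompts[0]  # assumed attribute: the parsed Blend
conditioning, options = compel.build_conditioning_tensor_for_prompt_object(prompt_object)
```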

pad_conditioning_tensors_to_same_length

``` python
def pad_conditioning_tensors_to_same_length(
        conditionings: List[torch.Tensor]) -> List[torch.Tensor]
```

If truncate_long_prompts was set to False on initialization, or if your prompt uses the .and() operator, conditioning tensors do not have a fixed length. This is a problem when using a negative and a positive prompt to condition the diffusion process. This function pads the passed-in tensors, as necessary, so that they all have the same length, and returns the padded tensors in the order they were passed.

Example:

``` python
embeds = compel('("a cat playing in the forest", "an impressionist oil painting").and()')
negative_embeds = compel("ugly, deformed, distorted")
[embeds, negative_embeds] = compel.pad_conditioning_tensors_to_same_length([embeds, negative_embeds])
```
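The padded tensors can then be handed to the diffusers pipeline from the earlier sketch in the usual way, for example:

``` python
images = pipeline(prompt_embeds=embeds, negative_prompt_embeds=negative_embeds).images
```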