[QUESTION] How to figure out correct injection_policy for Flan-T5 #2689

@ivo-1

Description

I would like to use DeepSpeed-Inference with the Flan-T5 model, and I have the following code:

import os

import deepspeed
from deepspeed.inference.config import DeepSpeedInferenceConfig, DeepSpeedTPConfig
from transformers import T5ForConditionalGeneration, T5Tokenizer, pipeline
from transformers.models.t5.modeling_t5 import T5Block


def get_model():
    model_name = "google/flan-t5-small"
    tensor_parallel = int(os.getenv("TENSOR_PARALLEL_DEGREE", "2"))
    local_rank = int(os.getenv("LOCAL_RANK", "0"))
    model = T5ForConditionalGeneration.from_pretrained(
        model_name, device_map="auto"
    )
    tokenizer = T5Tokenizer.from_pretrained(model_name)

    # configure DeepSpeed-Inference with tensor parallelism
    config = DeepSpeedInferenceConfig(
        replace_with_kernel_inject=True,
        dtype=model.dtype,
        tensor_parallel=DeepSpeedTPConfig(
            enabled=True, tp_size=tensor_parallel, mpu=None, tp_group=None
        ),
        # map T5Block to the output projections of its three sublayers
        injection_policy={T5Block: ('SelfAttention.o', 'EncDecAttention.o', 'DenseReluDense.wo')}
    )

    model = deepspeed.init_inference(
        model,
        config=config,
    )
    generator = pipeline(
        task="text2text-generation", model=model, tokenizer=tokenizer, device=local_rank
    )
    return generator
    return generator

Basically I'm wondering whether I can use the T5Block class in the injection_policy for the flan-t5 model, since it's part of the same model family, and how I can figure out whether this will work without just blindly trying it out.

More generally, how can I find more information on the requirements an injection_policy must satisfy for a given model, and how can I verify that a given injection_policy actually makes sense?
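One sanity check you can run without DeepSpeed at all is to instantiate the architecture and walk its modules: if the model is built from T5Block instances, and each attribute path named in the policy (`SelfAttention.o`, `EncDecAttention.o`, `DenseReluDense.wo`) resolves to a real submodule, the policy at least refers to modules that exist. Below is a minimal sketch of that idea; it uses a tiny randomly initialized T5Config so nothing is downloaded, on the assumption that flan-t5 checkpoints share the stock T5 architecture (they load into T5ForConditionalGeneration). This is not an official DeepSpeed verification API, just module introspection:

```python
from transformers import T5Config, T5ForConditionalGeneration
from transformers.models.t5.modeling_t5 import T5Block

# Tiny random-weight T5 so nothing is downloaded; the module
# structure is the same as google/flan-t5-small, just smaller.
config = T5Config(vocab_size=100, d_model=16, d_kv=8, d_ff=32,
                  num_layers=2, num_heads=2)
model = T5ForConditionalGeneration(config)

# 1) Does the model actually contain T5Block instances?
blocks = [m for m in model.modules() if isinstance(m, T5Block)]
print(f"T5Block count: {len(blocks)}")

# 2) Do the attribute paths from the injection_policy resolve somewhere
#    in the model? Full module names look like
#    'encoder.block.0.layer.0.SelfAttention.o', so a suffix match is enough.
policy_paths = ('SelfAttention.o', 'EncDecAttention.o', 'DenseReluDense.wo')
module_names = {name for name, _ in model.named_modules()}
for path in policy_paths:
    print(path, "->", any(name.endswith(path) for name in module_names))
```

If all three paths resolve and T5Block shows up, the policy refers to real modules; whether the kernel injection itself then behaves correctly still has to be confirmed by comparing generation outputs with and without `init_inference`.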

I have read:

but wasn't able to find an answer to my question.
