I would like to use deepspeed-inference with the flan-t5 model and I have the following code:
```python
import os

import deepspeed
from deepspeed.inference.config import DeepSpeedInferenceConfig, DeepSpeedTPConfig
from transformers import T5ForConditionalGeneration, T5Tokenizer, pipeline
from transformers.models.t5.modeling_t5 import T5Block


def get_model():
    model_name = "google/flan-t5-small"
    tensor_parallel = int(os.getenv("TENSOR_PARALLEL_DEGREE", "2"))
    local_rank = int(os.getenv("LOCAL_RANK", "0"))
    model = T5ForConditionalGeneration.from_pretrained(model_name, device_map="auto")
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    # create the inference config
    config = DeepSpeedInferenceConfig(
        replace_with_kernel_inject=True,
        dtype=model.dtype,
        tensor_parallel=DeepSpeedTPConfig(
            enabled=True, tp_size=tensor_parallel, mpu=None, tp_group=None
        ),
        injection_policy={
            T5Block: ("SelfAttention.o", "EncDecAttention.o", "DenseReluDense.wo")
        },
    )
    model = deepspeed.init_inference(model, config=config)
    generator = pipeline(
        task="text2text-generation", model=model, tokenizer=tokenizer, device=local_rank
    )
    return generator
```
Basically, I'm wondering whether I can use the T5Block class in the injection_policy for the flan-t5 model, since it belongs to the same model family as T5. How can I determine whether this will work without more or less blindly trying it out?
More generally, where can I find information on the requirements an injection_policy must satisfy for a given model, and how can I verify that a given injection_policy actually makes sense?
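For what it's worth, one way I've found to sanity-check the policy before involving DeepSpeed at all is to inspect the model's module tree and confirm that T5Block really is the repeated transformer layer in flan-t5, and that submodules matching the policy's projection names ("SelfAttention.o", etc.) actually exist. The sketch below builds a tiny T5 from a local config (random weights, no download; the small dimensions are arbitrary) since flan-t5 checkpoints use the same T5 architecture:

```python
from transformers import T5Config, T5ForConditionalGeneration
from transformers.models.t5.modeling_t5 import T5Block

# Tiny T5 built from a local config -- same architecture/module tree as flan-t5,
# just with small, arbitrary dimensions so it constructs instantly.
config = T5Config(
    d_model=64, d_kv=16, d_ff=128, num_layers=2, num_heads=4, vocab_size=128
)
model = T5ForConditionalGeneration(config)

# 1) Is T5Block the repeated layer class? (2 encoder + 2 decoder blocks here)
num_blocks = sum(isinstance(m, T5Block) for m in model.modules())
print(f"T5Block instances: {num_blocks}")

# 2) Do the output projections named in the injection_policy exist in the tree?
policy_suffixes = ("SelfAttention.o", "EncDecAttention.o", "DenseReluDense.wo")
module_names = [name for name, _ in model.named_modules()]
for suffix in policy_suffixes:
    matches = [n for n in module_names if n.endswith(suffix)]
    print(f"{suffix}: {len(matches)} match(es), e.g. {matches[0]}")
```

This doesn't prove the DeepSpeed kernels are compatible, but if T5Block were absent or a policy path didn't resolve, that would rule the policy out immediately.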
I have read:
- https://deepspeed.readthedocs.io/en/latest/inference-init.html
- https://www.deepspeed.ai/tutorials/inference-tutorial/#initializing-for-inference
but wasn't able to find an answer to my question.