What happened?
Right now in RunInference, when loading large models from remote locations (e.g. GCS), the request times out and the work item is eventually killed and retried. We should have some mechanism for loading large remote models without timing out.
Note that the recommended path for large models will mostly be building a custom container, so this isn't a huge deal, but that approach doesn't play well with model updates or pulling from model registries.
You can reproduce this by loading the t5-11b model remotely instead of from a custom container, as in the sketch below (https://beam.apache.org/documentation/ml/large-language-modeling/).
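A minimal sketch of the remote-load path that hits the timeout, adapted from the linked large-language-modeling example. The GCS path (`gs://my-bucket/t5-11b/pytorch_model.bin`) is hypothetical; point it at wherever the t5-11b state dict actually lives instead of baking the weights into a custom container image.

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import (
    PytorchModelHandlerTensor,
    make_tensor_model_fn,
)
from transformers import AutoConfig, AutoTokenizer, T5ForConditionalGeneration

# Hypothetical GCS location holding the ~45 GB t5-11b state dict.
STATE_DICT_PATH = "gs://my-bucket/t5-11b/pytorch_model.bin"

model_handler = PytorchModelHandlerTensor(
    # The state dict is fetched lazily on the worker; for a model this
    # large the download can exceed the work item timeout.
    state_dict_path=STATE_DICT_PATH,
    model_class=T5ForConditionalGeneration,
    model_params={"config": AutoConfig.from_pretrained("t5-11b")},
    inference_fn=make_tensor_model_fn("generate"),
)

tokenizer = AutoTokenizer.from_pretrained("t5-11b")
example = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
).input_ids[0]

with beam.Pipeline() as p:
    _ = (
        p
        | "CreateExamples" >> beam.Create([example])
        | "RunInference" >> RunInference(model_handler)
    )
```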
Issue Priority
Priority: 3 (minor)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner