What happened?
Right now in RunInference, when loading large models from remote locations (e.g. GCS), the request times out and the work item is eventually killed and retried. We should have some mechanism for loading large remote models without timing out.
Note that the recommended path for large models will mostly be building a custom container, so this isn't a huge deal, but that approach doesn't play well with model updates or pulling from model registries.
You can reproduce this by loading the t5-11b model remotely instead of from a custom container, as in the sketch below (https://beam.apache.org/documentation/ml/large-language-modeling/).
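A minimal sketch of the remote-load path that hits the timeout, adapted from the linked large-language-modeling example. The GCS path (`gs://my-bucket/t5-11b/pytorch_model.bin`) is hypothetical; point it at wherever the t5-11b state dict actually lives instead of baking the weights into a custom container image.

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import (
    PytorchModelHandlerTensor,
    make_tensor_model_fn,
)
from transformers import AutoConfig, AutoTokenizer, T5ForConditionalGeneration

# Hypothetical GCS location holding the ~45 GB t5-11b state dict.
STATE_DICT_PATH = "gs://my-bucket/t5-11b/pytorch_model.bin"

model_handler = PytorchModelHandlerTensor(
    # The state dict is fetched lazily on the worker; for a model this
    # large the download can exceed the work item timeout.
    state_dict_path=STATE_DICT_PATH,
    model_class=T5ForConditionalGeneration,
    model_params={"config": AutoConfig.from_pretrained("t5-11b")},
    inference_fn=make_tensor_model_fn("generate"),
)

tokenizer = AutoTokenizer.from_pretrained("t5-11b")
example = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
).input_ids[0]

with beam.Pipeline() as p:
    _ = (
        p
        | "CreateExamples" >> beam.Create([example])
        | "RunInference" >> RunInference(model_handler)
    )
```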
Issue Priority
Priority: 3 (minor)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner