
[Bug]: Model loading repeatedly fails for large models in RunInference #25286

@damccorm

What happened?

Right now in RunInference, when loading large models from remote locations (e.g. GCS), we time out the request and eventually kill the work item and try a new one. We should have some mechanism for loading large remote models without timing out.
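
For context, the remote fetch happens inside the handler's `load_model` call on the worker, so the whole download has to finish within the work item's lifetime. A rough sketch of where that long-running step sits, using a hypothetical `CopyingModelHandler` that stages the state dict on local disk before loading (`_state_dict_path` is an internal attribute of `PytorchModelHandlerTensor` and an assumption here):

```python
import shutil

from apache_beam.io.filesystems import FileSystems
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor


class CopyingModelHandler(PytorchModelHandlerTensor):
    """Hypothetical handler that stages the remote state dict locally.

    The streaming copy below is the long-running step: for a multi-GB
    model it can outlast the work item timeout, which is the failure
    described above.
    """

    def load_model(self):
        local_path = "/tmp/model.bin"  # assumed scratch space on the worker
        with FileSystems.open(self._state_dict_path) as src, \
                open(local_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream gs:// -> local disk
        # Point the parent handler at the local copy and load as usual.
        self._state_dict_path = local_path
        return super().load_model()
```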

Note that the recommended path for large models will mostly be building a custom container, so this isn't a huge deal, but that approach doesn't play well with model updates or pulling from model registries.

You can reproduce this by trying to load the t5-11b model remotely instead of from a custom container (https://beam.apache.org/documentation/ml/large-language-modeling/).
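
A minimal repro sketch along those lines, adapted from the linked example; the GCS path is a placeholder for wherever the t5-11b state dict was uploaded, and the exact handler arguments are assumptions based on that doc page:

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import (
    PytorchModelHandlerTensor,
    make_tensor_model_fn,
)
from transformers import AutoConfig, AutoTokenizer, T5ForConditionalGeneration

# Loading the ~45 GB state dict straight from GCS (instead of baking it
# into a custom container) is what triggers the timeout.
model_handler = PytorchModelHandlerTensor(
    state_dict_path="gs://<your-bucket>/t5-11b/pytorch_model.bin",
    model_class=T5ForConditionalGeneration,
    model_params={"config": AutoConfig.from_pretrained("t5-11b")},
    device="cpu",
    inference_fn=make_tensor_model_fn("generate"),
)

tokenizer = AutoTokenizer.from_pretrained("t5-11b")

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create(["translate English to German: The house is wonderful."])
        | beam.Map(lambda text: tokenizer(text, return_tensors="pt").input_ids[0])
        | RunInference(model_handler)
    )
```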

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
