Weird model quadruplication when deploying pytorch inference model #2465

@johann-petrak

I am testing deploying a pytorch inference model trained elsewhere using the SageMaker API, roughly like this, trying to test everything locally on my own machine for now:

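from sagemaker.deserializers import JSONDeserializer
from sagemaker.local import LocalSession
from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import JSONSerializer

# Local mode: run the container on my own machine and use the local entry point
sess = LocalSession()
sess.config = {'local': {'local_code': True}}
model = PyTorchModel(
    model_data="model1.tar.gz",
    role=MYROLE,  # my IAM role, defined elsewhere
    framework_version="1.8.1",
    py_version="py3",
    entry_point="inference.py",
)
model.sagemaker_session = sess
# Starts the local container and returns a predictor that talks JSON
predictor = model.deploy(
    instance_type="local",
    initial_instance_count=1,
    deserializer=JSONDeserializer(),
    serializer=JSONSerializer(),
)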

The model.deploy call creates two temporary directories: one for the docker-compose command and config file, the other for mounting in the container as /opt/ml/model.

However, there are some things happening which I find odd and incorrect, and especially annoying for large models:

  • the tmp directory that gets mounted contains the content of the model1.tar.gz file I provided PLUS the original model1.tar.gz file
  • when the container initializes, it internally uses the torch-model-archiver to create the file /sagemaker/ts/models/model.mar inside the container, which contains the content of model1.tar.gz PLUS AGAIN the model1.tar.gz file itself
  • in summary, the model (which may be big) is present 4 times on the filesystem of the instance running this:
    • in the /tmp directory that is mounted, twice: once as the tar.gz file and once extracted
    • in the container filesystem, twice, inside a zip file (the model.mar file) which contains the extracted files plus the original tar.gz file
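The layout described above can be checked with a short script. This is only a minimal sketch, assuming (as observed above) that the model.mar file is a zip archive; list_archive_members and find_nested_archives are hypothetical helper names, not part of SageMaker or TorchServe.

```python
import tarfile
import zipfile


def list_archive_members(path):
    """Return the sorted file names stored in a zip (.mar) or tar(.gz) archive."""
    path = str(path)
    # Check zip first: a .mar file is assumed to be a plain zip archive.
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return sorted(zf.namelist())
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as tf:
            return sorted(m.name for m in tf.getmembers() if m.isfile())
    raise ValueError(f"not a zip or tar archive: {path}")


def find_nested_archives(path, suffixes=(".tar.gz", ".tgz")):
    """Return members of the archive that are themselves tarballs."""
    return [n for n in list_archive_members(path) if n.endswith(suffixes)]
```

Calling find_nested_archives on a copy of model.mar taken from the container should list model1.tar.gz if the tarball really was packed into the archive next to its own extracted contents.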

UPDATE: actually, it seems the model is also present in

  • /home/model-server/tmp/model
  • something like /home/model-server/tmp/models/1547c25a74854a1aaaf72751c0febd02/
    • this apparently is the only path actually needed when I have my own model loading function, as this is the path that function gets

In both cases, the tar.gz is present together with the unarchived files of the tar.gz.

Surely that cannot be intentional. Why would all that duplication be needed?

Since I am using my own model loading code, I do not understand why packing the model up into a model.mar file is even needed. Could I not just load it from the mounted /opt/ml/model directory?
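To see which directory the loading code actually receives, the model_fn in inference.py can log its argument and the files found there. The following is a rough sketch only: describe_model_dir is a hypothetical helper, and the returned dict is a placeholder standing in for the real model object that my loading code would build.

```python
import logging
import os

logger = logging.getLogger(__name__)


def describe_model_dir(model_dir):
    """Return sorted (relative path, size in bytes) pairs for all files under model_dir."""
    entries = []
    for root, _dirs, files in os.walk(model_dir):
        for name in files:
            full = os.path.join(root, name)
            entries.append((os.path.relpath(full, model_dir), os.path.getsize(full)))
    return sorted(entries)


def model_fn(model_dir):
    """Entry point called by the serving container with the model directory."""
    for rel_path, size in describe_model_dir(model_dir):
        logger.info("model_dir=%s file=%s bytes=%d", model_dir, rel_path, size)
    # Placeholder: the real model loading code would go here.
    return {"model_dir": model_dir}
```

If the logged directory is the one under /home/model-server/tmp/models/ and the file list includes model1.tar.gz, that would confirm both which copy is read and that the tarball travels along with it.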
