'torch._C.PyTorchFileReader' object has no attribute 'seek' #994

Closed
deadsoul44 opened this issue Jun 10, 2021 · 13 comments

Comments

@deadsoul44

deadsoul44 commented Jun 10, 2021

Hello,

I am using the following model for sentence similarity

https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual/tree/main

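from sentence_transformers import SentenceTransformer, models

# bert_model_dir: local path to the downloaded model; device_str: e.g. "cpu" or "cuda"
word_embedding_model = models.Transformer(bert_model_dir)  # , max_seq_length=512
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model], device=device_str)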

But I get this error:

Traceback (most recent call last):
  File "/home/work/anaconda/lib/python3.6/site-packages/torch/serialization.py", line 306, in _check_seekable
    f.seek(f.tell())
AttributeError: 'torch._C.PyTorchFileReader' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/work/anaconda/lib/python3.6/site-packages/transformers/modeling_utils.py", line 1205, in from_pretrained
    state_dict = torch.load(resolved_archive_file, map_location="cpu")
  File "/home/work/anaconda/lib/python3.6/site-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/work/anaconda/lib/python3.6/site-packages/moxing/framework/file/file_io_patch.py", line 200, in _load
    _check_seekable(f)
  File "/home/work/anaconda/lib/python3.6/site-packages/torch/serialization.py", line 309, in _check_seekable
    raise_err_msg(["seek", "tell"], e)
  File "/home/work/anaconda/lib/python3.6/site-packages/torch/serialization.py", line 302, in raise_err_msg
    raise type(e)(msg)
AttributeError: 'torch._C.PyTorchFileReader' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "code/similarity.py", line 118, in <module>
    word_embedding_model = models.Transformer(bert_model_dir) #, max_seq_length=512
  File "/home/work/anaconda/lib/python3.6/site-packages/sentence_transformers/models/Transformer.py", line 30, in __init__
    self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
  File "/home/work/anaconda/lib/python3.6/site-packages/transformers/models/auto/auto_factory.py", line 381, in from_pretrained
    return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
  File "/home/work/anaconda/lib/python3.6/site-packages/transformers/modeling_utils.py", line 1208, in from_pretrained
    f"Unable to load weights from pytorch checkpoint file for '{pretrained_model_name_or_path}' "
OSError: Unable to load weights from pytorch checkpoint file for '/home/work/user-job-dir/input/pretrained_models/stsb-xlm-r-multilingual/' at '/home/work/user-job-dir/input/pretrained_models/stsb-xlm-r-multilingual/pytorch_model.bin'If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

I searched the web but could not find a solution. What could be the problem? Thank you.

@nreimers
Member

Which PyTorch version are you using? Have you tried to update it to some recent version (>= 1.6.0)?

@deadsoul44
Author

I am using 1.6.0

@nreimers
Member

Do other models work?

@deadsoul44
Author

I will try this one:

https://huggingface.co/sentence-transformers/paraphrase-xlm-r-multilingual-v1/tree/main

I am renaming the downloaded model zip file to pytorch_model.bin.

@nreimers
Member

Hi,
I am not sure what you are doing. You can either pass the model name directly, in which case the code downloads the model, or download the zip file from here:
https://sbert.net/models/

Then unzip it yourself. Do not rename any files.

@deadsoul44
Author

deadsoul44 commented Jun 10, 2021

Without renaming, I get this:

OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index', 'flax_model.msgpack'] found in directory /home/work/user-job-dir/input/pretrained_models/paraphrase-xlm-r-multilingual-v1/ or from_tf and from_flax set to False.

Unzipping extracts an archive folder and there are these files inside:
[image: listing of the files inside the extracted archive folder]

I will try to download from your link.

@nreimers
Member

What do you download?

As mentioned, you must use the zip files from here:
https://sbert.net/models/

It should include 0_Transformer and 1_Pooling folders and several JSON files.

Also, you can just load it with:
model = SentenceTransformer('path/to/unzipped/folder')

@deadsoul44
Author

Previously, I downloaded from huggingface. Now, I downloaded from sbert.net

I still get the same error:

I0611 04:23:01.643567 140559902119680 SentenceTransformer.py:39] Load pretrained SentenceTransformer: /home/work/user-job-dir/input/pretrained_models/paraphrase-xlm-r-multilingual-v1/

I0611 04:23:01.644223 140559902119680 SentenceTransformer.py:100] Load SentenceTransformer from folder: /home/work/user-job-dir/input/pretrained_models/paraphrase-xlm-r-multilingual-v1/

Traceback (most recent call last):
  File "/home/work/anaconda/lib/python3.6/site-packages/torch/serialization.py", line 306, in _check_seekable
    f.seek(f.tell())
AttributeError: 'torch._C.PyTorchFileReader' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/work/anaconda/lib/python3.6/site-packages/transformers/modeling_utils.py", line 1205, in from_pretrained
    state_dict = torch.load(resolved_archive_file, map_location="cpu")
  File "/home/work/anaconda/lib/python3.6/site-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/work/anaconda/lib/python3.6/site-packages/moxing/framework/file/file_io_patch.py", line 200, in _load
    _check_seekable(f)
  File "/home/work/anaconda/lib/python3.6/site-packages/torch/serialization.py", line 309, in _check_seekable
    raise_err_msg(["seek", "tell"], e)
  File "/home/work/anaconda/lib/python3.6/site-packages/torch/serialization.py", line 302, in raise_err_msg
    raise type(e)(msg)
AttributeError: 'torch._C.PyTorchFileReader' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "code/similarity.py", line 128, in <module>
    model = SentenceTransformer(bert_model_dir, device=device_str)
  File "/home/work/anaconda/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 114, in __init__
    module = module_class.load(os.path.join(model_path, module_config['path']))
  File "/home/work/anaconda/lib/python3.6/site-packages/sentence_transformers/models/Transformer.py", line 105, in load
    return Transformer(model_name_or_path=input_path, **config)
  File "/home/work/anaconda/lib/python3.6/site-packages/sentence_transformers/models/Transformer.py", line 30, in __init__
    self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
  File "/home/work/anaconda/lib/python3.6/site-packages/transformers/models/auto/auto_factory.py", line 381, in from_pretrained
    return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
  File "/home/work/anaconda/lib/python3.6/site-packages/transformers/modeling_utils.py", line 1208, in from_pretrained
    f"Unable to load weights from pytorch checkpoint file for '{pretrained_model_name_or_path}' "
OSError: Unable to load weights from pytorch checkpoint file for '/home/work/user-job-dir/input/pretrained_models/paraphrase-xlm-r-multilingual-v1/0_Transformer' at '/home/work/user-job-dir/input/pretrained_models/paraphrase-xlm-r-multilingual-v1/0_Transformer/pytorch_model.bin'If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

I have now also tried it on my local machine, and it works fine there.
But I get this error on the company cloud, although the same package versions are installed.

@nreimers
Member

Ah ok, that is the problem.

The torch load function requires that the file system supports seek (e.g. see https://www.tutorialspoint.com/python/file_seek.htm).

Apparently, your company cloud file system does not support this elementary file system operation. Hence, torch.load cannot load any file.
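As a rough illustration of what "supports seek" means here, the sketch below runs the same probe that appears in the traceback (f.seek(f.tell())) against two objects. The helper name supports_seek_and_tell and the NoSeekReader class are made up for this example; they are not part of torch or of the cloud platform's file layer.

```python
import io


def supports_seek_and_tell(f):
    """Return True if f.seek(f.tell()) succeeds, the probe shown in the traceback."""
    try:
        f.seek(f.tell())
        return True
    except (AttributeError, io.UnsupportedOperation, OSError):
        # AttributeError: the object has no seek()/tell() at all.
        # UnsupportedOperation/OSError: the methods exist but the stream cannot seek.
        return False


class NoSeekReader:
    """Hypothetical stand-in for a reader that exposes read() but not seek()/tell()."""

    def __init__(self, data):
        self._data = data

    def read(self):
        return self._data


in_memory = supports_seek_and_tell(io.BytesIO(b"abc"))
no_seek = supports_seek_and_tell(NoSeekReader(b"abc"))
print(in_memory, no_seek)
```

Running the same helper on a file opened from the cloud storage path would show whether that handle passes the probe.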

The real solution would be to get a better company cloud with a file system that supports basic I/O operations.

In PyTorch 1.6, they changed the file format when models are saved. Maybe on your company cloud the old file format works? You can try it with this model:
https://public.ukp.informatik.tu-darmstadt.de/reimers/sentence-transformers/v0.2/bert-base-nli-cls-token.zip

It still uses the pre-PyTorch 1.6 file format.
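To tell which of the two formats a given pytorch_model.bin uses, one option is to check whether the file is a zip archive, since the format introduced in PyTorch 1.6 is zip-based. The sketch below uses only the standard library; the helper name looks_like_zip_checkpoint is made up, and the two demo files are throwaway placeholders, not real checkpoints.

```python
import os
import tempfile
import zipfile


def looks_like_zip_checkpoint(path):
    """Return True if the file at path is a zip archive (the newer save format's container)."""
    return zipfile.is_zipfile(path)


# Demo on two placeholder files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a new-style file: any valid zip archive.
    zip_path = os.path.join(tmp, "new_style.bin")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("archive/data.pkl", b"placeholder")

    # Stand-in for an old-style file: plain bytes, not a zip archive.
    plain_path = os.path.join(tmp, "old_style.bin")
    with open(plain_path, "wb") as fh:
        fh.write(b"\x80\x02placeholder")

    zip_result = looks_like_zip_checkpoint(zip_path)
    plain_result = looks_like_zip_checkpoint(plain_path)

print(zip_result, plain_result)
```

A file that reports False here is not in the zip-based format, which fits the observation that only the older model loads on the cloud.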

@deadsoul44
Author

It seems to be working. Can this model be used for multilingual similarity calculation? Is there any alternative? Thank you.

@nreimers
Member

Sadly not.

But you can convert from the new torch format to the old format like this:

import torch
model = torch.load('pytorch_model.bin')
torch.save('pytorch_model.bin', model, _use_new_zipfile_serialization=False)

Run this on your local machine, and then you can push the converted file to your cloud.

@deadsoul44
Author

I get this error when saving the model.

Traceback (most recent call last):
  File "lib\site-packages\torch\serialization.py", line 366, in save
    _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
  File "lib\site-packages\torch\serialization.py", line 426, in _legacy_save
    pickle_module.dump(MAGIC_NUMBER, f, protocol=pickle_protocol)
TypeError: file must have a 'write' attribute

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "code/similarity.py", line 85, in <module>
    torch.save(
  File "lib\site-packages\torch\serialization.py", line 366, in save
    _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
  File "lib\site-packages\torch\serialization.py", line 224, in __exit__
    self.file_like.flush()
AttributeError: 'collections.OrderedDict' object has no attribute 'flush'

Process finished with exit code 1

@deadsoul44
Author

torch.save(model, 'pytorch_model.bin', _use_new_zipfile_serialization=False)

This is working. Thank you.
