convert_marian_tatoeba_to_pytorch FileNotFoundError #28

Open
Lyaaaaaaaaaaaaaaa opened this issue Oct 30, 2022 · 5 comments

Comments

@Lyaaaaaaaaaaaaaaa

Hello, I'm trying to convert more models to the PyTorch format, but I'm getting an error.

I'm running the convert_marian_tatoeba_to_pytorch script. It looks for a README.md file in the models/results folder, but there is none.

Traceback (most recent call last):
  File "Tatoeba-Challenge/scripts/convert_marian_tatoeba_to_pytorch.py", line 1282, in <module>
    resolver = TatoebaConverter(save_dir=args.save_dir)
  File "Tatoeba-Challenge/scripts/convert_marian_tatoeba_to_pytorch.py", line 58, in __init__
    reg = self.make_tatoeba_registry()
  File "Tatoeba-Challenge/scripts/convert_marian_tatoeba_to_pytorch.py", line 264, in make_tatoeba_registry
    lns = list(open(p / "README.md").readlines())
    
FileNotFoundError: [Errno 2] No such file or directory: 'Tatoeba-Challenge/models/results/README.md'
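The traceback shows that make_tatoeba_registry opens README.md directly, with no existence check. Below is a minimal sketch of a pre-flight check you could run before constructing TatoebaConverter, so the failure states what is missing. Two things here are assumptions: the helper name check_results_readme is hypothetical and not part of the script, and results_dir is taken to be the directory named in the error above.

```python
from pathlib import Path


def check_results_readme(results_dir: Path) -> Path:
    """Raise a descriptive error if results_dir/README.md is missing.

    Hypothetical helper: results_dir is assumed to be the directory named in
    the FileNotFoundError (Tatoeba-Challenge/models/results), whose README.md
    make_tatoeba_registry() reads line by line.
    """
    readme = Path(results_dir) / "README.md"
    if not readme.is_file():
        raise FileNotFoundError(
            f"{readme} is missing; make_tatoeba_registry() reads it to build "
            "the model registry, so check that the repository checkout "
            "includes this file before creating TatoebaConverter"
        )
    return readme
```

For example, call check_results_readme(Path("Tatoeba-Challenge/models/results")) just before TatoebaConverter(save_dir=args.save_dir).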
@jorgtied
Member

@Lyaaaaaaaaaaaaaaa
Author

Hello, I will try this one and report back.

@Lyaaaaaaaaaaaaaaa
Author

Hello, sorry for the long delay.
I ran your script and got a different error: TypeError: expected str, bytes or os.PathLike object, not NoneType

The logs:

python3 model_converter/convert_to_pytorch.py --model-path opus-en-pt --dest-path converted/opus-en-pt

added 1 tokens to vocab
Traceback (most recent call last):
  File "/home/path_to_project/model_converter/convert_to_pytorch.py", line 28, in <module>
    convert(Path(args.model_path), Path(args.dest_path))
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 663, in convert
    opus_state = OpusState(source_dir)
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 494, in __init__
    self.tokenizer = self.load_tokenizer()
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 593, in load_tokenizer
    return MarianTokenizer.from_pretrained(str(self.source_dir))
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/tokenization_marian.py", line 158, in __init__
    assert Path(source_spm).exists(), f"cannot find spm source {source_spm}"
  File "/home/path_to_env/lib/python3.9/pathlib.py", line 1082, in __new__
    self = cls._from_parts(args, init=False)
  File "/home/path_to_env/lib/python3.9/pathlib.py", line 707, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/home/path_to_env/lib/python3.9/pathlib.py", line 691, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
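The last frames show that the TypeError comes from the check assert Path(source_spm).exists() in MarianTokenizer.__init__, with source_spm equal to None. The sketch below reproduces that mechanism in isolation. It assumes, as the traceback suggests but does not prove, that source_spm is None because no matching file was found in the model folder. Both function names are hypothetical stand-ins, not transformers APIs.

```python
from pathlib import Path
from typing import Optional


def resolve_optional_file(model_dir: Path, filename: str) -> Optional[str]:
    """Hypothetical lookup that returns None instead of raising when the
    file is absent (assumed to mirror how the tokenizer's files are resolved)."""
    candidate = Path(model_dir) / filename
    return str(candidate) if candidate.is_file() else None


def spm_file_exists(source_spm: Optional[str]) -> bool:
    """Mirror of the failing check in MarianTokenizer.__init__.

    Path(None) raises TypeError before .exists() is ever called, so the
    assert's own "cannot find spm source" message never appears.
    """
    return Path(source_spm).exists()
```

If resolve_optional_file(Path("opus-en-pt"), "source.spm") returns None, passing that value to spm_file_exists fails with the same TypeError as in the log rather than a readable "cannot find spm source" message.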

Additional information:

  • I'm running the script in a Miniconda environment (created with Miniconda3-py39_23.1.0-1-Linux-x86_64.sh).
    Here are the environment's packages:
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
accelerate                0.18.0             pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.4            py39h72bdee0_0    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
arrow-cpp                 11.0.0          ha770c72_13_cpu    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     22.2.0             pyh71513ae_0    conda-forge
aws-c-auth                0.6.26               hf365957_1    conda-forge
aws-c-cal                 0.5.21               h48707d8_2    conda-forge
aws-c-common              0.8.14               h0b41bf4_0    conda-forge
aws-c-compression         0.2.16               h03acc5a_5    conda-forge
aws-c-event-stream        0.2.20               h00877a2_4    conda-forge
aws-c-http                0.7.6                hf342b9f_0    conda-forge
aws-c-io                  0.13.19              h5b20300_3    conda-forge
aws-c-mqtt                0.8.6               hc4349f7_12    conda-forge
aws-c-s3                  0.2.7                h909e904_1    conda-forge
aws-c-sdkutils            0.1.8                h03acc5a_0    conda-forge
aws-checksums             0.1.14               h03acc5a_5    conda-forge
aws-crt-cpp               0.19.8              hf7fbfca_12    conda-forge
aws-sdk-cpp               1.10.57              h17c43bd_8    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39he91dace_3    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
cryptography              40.0.1           py39h079d5ae_0    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cudnn                     8.4.1.50             hed8a83a_0    conda-forge
dataclasses               0.8                pyhc8e2a94_3    conda-forge
datasets                  2.11.0             pyhd8ed1ab_0    conda-forge
dill                      0.3.6              pyhd8ed1ab_1    conda-forge
filelock                  3.10.7             pyhd8ed1ab_0    conda-forge
frozenlist                1.3.3            py39hb9d737c_0    conda-forge
fsspec                    2023.3.0           pyhd8ed1ab_1    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
glog                      0.6.0                h6f12383_0    conda-forge
huggingface_hub           0.13.3             pyhd8ed1ab_0    conda-forge
icu                       72.1                 hcb278e6_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.1.0              pyha770c72_0    conda-forge
importlib_metadata        6.1.0                hd8ed1ab_0    conda-forge
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libabseil                 20230125.0      cxx17_hcb278e6_1    conda-forge
libarrow                  11.0.0          h93537a5_13_cpu    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcurl                   7.88.1               hdc1c0ab_1    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h28343ad_4    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libgoogle-cloud           2.8.0                h0bc5f78_1    conda-forge
libgrpc                   1.52.1               hcf146ea_1    conda-forge
libhwloc                  2.9.0                hd6dc26d_0    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnuma                   2.0.16               h0b41bf4_1    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libsentencepiece          0.1.97               h47aad16_1    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libthrift                 0.18.1               h5e4af38_0    conda-forge
libutf8proc               2.8.0                h166bdaf_0    conda-forge
libxml2                   2.10.3               hfdac1af_6    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
llvm-openmp               16.0.0               h417c0b6_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
magma                     2.6.2                hc72dce7_0    conda-forge
mkl                       2022.2.1         h84fe81f_16997    conda-forge
multidict                 6.0.4            py39h72bdee0_0    conda-forge
multiprocess              0.70.14          py39hb9d737c_3    conda-forge
nccl                      2.14.3.1             h0800d71_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
ninja                     1.11.1               h924138e_0    conda-forge
numpy                     1.24.2           py39h7360e5f_0    conda-forge
openssl                   3.1.0                h0b41bf4_0    conda-forge
orc                       1.8.3                hfdbbad2_0    conda-forge
packaging                 23.0               pyhd8ed1ab_0    conda-forge
pandas                    1.5.3            py39h2ad29b5_1    conda-forge
parquet-cpp               1.5.1                         2    conda-forge
pip                       23.0.1             pyhd8ed1ab_0    conda-forge
psutil                    5.9.4            py39hb9d737c_0    conda-forge
pyarrow                   11.0.0          py39hf0ef2fd_13_cpu    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyopenssl                 23.1.1             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.9.7           hf930737_3_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-xxhash             3.2.0            py39h72bdee0_0    conda-forge
python_abi                3.9                      3_cp39    conda-forge
pytorch                   1.13.1          cuda112py39hb0b7ed5_200    conda-forge
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0              py39hb9d737c_5    conda-forge
re2                       2023.02.02           hcb278e6_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
regex                     2023.3.23        py39h72bdee0_0    conda-forge
requests                  2.28.2             pyhd8ed1ab_1    conda-forge
responses                 0.18.0             pyhd8ed1ab_0    conda-forge
s2n                       1.3.41               h3358134_0    conda-forge
sacremoses                0.0.53             pyhd8ed1ab_0    conda-forge
sentencepiece             0.1.97               hf3d152e_1    conda-forge
sentencepiece-python      0.1.97           py39h0fce851_1    conda-forge
sentencepiece-spm         0.1.97               h47aad16_1    conda-forge
setuptools                67.6.1             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sleef                     3.5.1                h9b69904_2    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
sqlite                    3.40.0               h4ff8645_0    conda-forge
tbb                       2021.8.0             hf52228f_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tokenizers                0.13.2           py39h585fa2d_0    conda-forge
tqdm                      4.65.0             pyhd8ed1ab_1    conda-forge
transformers              4.27.4             pyhd8ed1ab_0    conda-forge
typing-extensions         4.5.0                hd8ed1ab_0    conda-forge
typing_extensions         4.5.0              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
ucx                       1.14.0               h538f049_0    conda-forge
urllib3                   1.26.15            pyhd8ed1ab_0    conda-forge
websockets                10.4             py39hb9d737c_1    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
xxhash                    0.8.1                h0b41bf4_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yarl                      1.8.2            py39hb9d737c_0    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge

@jorgtied
Member

jorgtied commented Apr 1, 2023

Did you download the model that you want to convert? The script expects the model to be in the model path you specify on the command line. Maybe this Makefile helps you see how I use the script to convert models: https://github.com/Helsinki-NLP/Opus-MT/blob/master/hf/Makefile
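Before running the converter, you can confirm that the directory passed as --model-path actually holds the unpacked model. The sketch below assumes that an unpacked Marian model contains its weights as a .npz file directly inside that directory. The helper name check_model_dir is hypothetical.

```python
from pathlib import Path
from typing import List


def check_model_dir(model_path: Path) -> List[Path]:
    """Return the .npz checkpoint(s) found in model_path, raising if none exist.

    Assumption: the converter needs the downloaded, unpacked Marian model
    (including its .npz weights) directly inside the --model-path directory.
    """
    model_path = Path(model_path)
    if not model_path.is_dir():
        raise NotADirectoryError(f"model path {model_path} is not a directory")
    checkpoints = sorted(model_path.glob("*.npz"))
    if not checkpoints:
        raise FileNotFoundError(
            f"no .npz checkpoint in {model_path}; was the model archive unpacked here?"
        )
    return checkpoints
```

Running check_model_dir(Path("opus-en-pt")) first separates a wrong or empty path from problems inside the model files themselves.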

@Lyaaaaaaaaaaaaaaa
Author

Hello, yes, I downloaded the model I want to convert, Opus-en-pt.
I believe I downloaded the right format. Just in case, here is the list of files in the opus-en-pt folder:

decoder.yml
opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus.bpe32k-bpe32k.transformer.valid1.log
postprocess.sh
README.md
source.tcmodel
tokenizer_config.json
LICENSE
opus.bpe32k-bpe32k.transformer.train1.log
opus.bpe32k-bpe32k.vocab.yml
preprocess.sh
source.bpe
target.bpe
vocab.json
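Comparing this listing against the tokenizer files named in the traceback may narrow the problem down. The sketch below hard-codes the listing and checks it against a set of file names that is our assumption: source.spm, target.spm and vocab.json, which we believe MarianTokenizer loads. The constant and function names are hypothetical.

```python
from typing import Iterable, List

# Assumption: the files MarianTokenizer loads from a model directory
# (SentencePiece models plus a shared vocabulary); verify against the
# tokenization_marian.py shipped with your transformers version.
MARIAN_TOKENIZER_FILES = ("source.spm", "target.spm", "vocab.json")

# The opus-en-pt listing from the comment above.
OPUS_EN_PT_FILES = [
    "decoder.yml",
    "opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz",
    "opus.bpe32k-bpe32k.transformer.valid1.log",
    "postprocess.sh",
    "README.md",
    "source.tcmodel",
    "tokenizer_config.json",
    "LICENSE",
    "opus.bpe32k-bpe32k.transformer.train1.log",
    "opus.bpe32k-bpe32k.vocab.yml",
    "preprocess.sh",
    "source.bpe",
    "target.bpe",
    "vocab.json",
]


def missing_tokenizer_files(present: Iterable[str]) -> List[str]:
    """Return the expected tokenizer files that do not appear in `present`."""
    present_set = set(present)
    return [name for name in MARIAN_TOKENIZER_FILES if name not in present_set]
```

For this listing, missing_tokenizer_files(OPUS_EN_PT_FILES) returns ["source.spm", "target.spm"]. The folder contains BPE files (source.bpe, target.bpe) instead. That would be consistent with source_spm being None in the traceback, but the thread does not confirm that this is the cause.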

I have difficulty understanding the Makefile.
