
RuntimeError: Vector for token darang has 230 dimensions, but previously read vectors have 300 dimensions. All vectors must have the same number of dimensions. #57

Closed
aurooj opened this issue Nov 30, 2018 · 4 comments

Comments

aurooj commented Nov 30, 2018

Expected Behavior

Load FastText vectors

Environment:
Ubuntu 16.04
Python 3.6.4
Pytorch 0.4.1

Actual Behavior

Throws the following error:

  File "<stdin>", line 1, in <module>
  File "/home/zxi/.local/lib/python3.6/site-packages/torchnlp/word_to_vector/fast_text.py", line 83, in __init__
    super(FastText, self).__init__(name, url=url, **kwargs)
  File "/home/zxi/.local/lib/python3.6/site-packages/torchnlp/word_to_vector/pretrained_word_vectors.py", line 72, in __init__
    self.cache(name, cache, url=url)
  File "/home/zxi/.local/lib/python3.6/site-packages/torchnlp/word_to_vector/pretrained_word_vectors.py", line 153, in cache
    word, len(entries), dim))
RuntimeError: Vector for token darang has 230 dimensions, but previously read vectors have 300 dimensions. All vectors must have the same number of dimensions.

Steps to Reproduce the Problem

  1. Open a Python console
  2. Run the following code:
        from torchnlp.word_to_vector import FastText
        vectors = FastText()
  3. The error above is thrown.

PetrochukM (Owner) commented Nov 30, 2018

Hi there!

This code base works just fine:

>>> from torchnlp.word_to_vector import FastText
>>> vectors = FastText()
wiki.en.vec: 6.60GB [05:28, 21.4MB/s]
  0%|                                                                      | 0/2519371 [00:00<?, ?it/s]Skipping token 2519370 with 1-dimensional vector ['300']; likely a header
100%|██████████████████████████████████████████████████████| 2519371/2519371 [05:19<00:00, 7884.92it/s]
>>> vectors['derang']
tensor([ 0.3663, -0.2729, -0.5492,  0.2594, -0.2059, -0.6579,  0.3311, -0.3561,
        -0.0211, -0.4950,  0.2345,  0.5009,  0.1284, -0.0284,  0.4262,  0.1306,
         0.0736, -0.1482,  0.1071,  0.3749, -0.3396,  0.2189, -0.0933, -0.6236,
         0.2598,  0.1215,  0.3682,  0.0977,  0.3826,  0.2483,  0.0497,  0.3010,
         0.1354, -0.1132,  0.3291,  0.1183,  0.0862, -0.2852, -0.2880,  0.4053,
        -0.2330,  0.4374, -0.0842,  0.1315, -0.1406,  0.1829, -0.1734,  0.2383,
         0.1084,  0.0826, -0.2086,  0.1929,  0.4043, -0.0709,  0.0764, -0.2958,
         0.0644,  0.4529,  0.0039,  0.0321,  0.2296,  0.1703,  0.3169,  0.3324,
        -0.1998,  0.1265, -0.4961, -0.1126,  0.3073, -0.0775,  0.1673, -0.1065,
         0.1746, -0.3484, -0.1683,  0.3709,  0.1794, -0.1061, -0.3025,  0.0797,
         0.7037, -0.3384,  0.0654,  0.0047,  0.0675,  0.2268, -0.2287, -0.0502,
        -0.1027, -0.1576,  0.0931, -0.5580,  0.3006, -0.6026,  0.0979, -0.1607,
         0.2291,  0.2667, -0.2266,  0.3741, -0.3300,  0.2384, -0.1749,  0.1554,
        -0.0474,  0.1531, -0.2938,  0.3155,  0.1208, -0.4494,  0.0461,  0.1716,
        -0.3338,  0.1848,  0.2872, -0.4439, -0.0408,  0.0823, -0.3677,  0.0684,
         0.1709, -0.2148, -0.0842,  0.4830, -0.2937, -0.0804, -0.1713, -0.1559,
        -0.1759,  0.1321,  0.0048,  0.1698,  0.1019,  0.1963,  0.0649, -0.0431,
        -0.3056, -0.2303, -0.2197,  0.0797, -0.1263,  0.2204, -0.0276, -0.0039,
         0.2605, -0.0019, -0.0057,  0.3839,  0.5118,  0.0172,  0.1729, -0.0898,
         0.1416, -0.4514, -0.0455,  0.2964, -0.1571,  0.5023,  0.0768, -0.3092,
        -0.1937,  0.2595, -0.2484,  0.5232, -0.1842, -0.3832, -0.4159, -0.3071,
         0.3744,  0.5791,  0.0642, -0.1190, -0.0598,  0.0508,  0.1179,  0.0383,
        -0.3242,  0.1952, -0.0211, -0.1509, -0.4514, -0.1727, -0.0395, -0.4362,
         0.3575,  0.1249,  0.0599,  0.0472,  0.6013,  0.1357, -0.0937,  0.1200,
         0.1294,  0.4008, -0.1689,  0.1403, -0.7018, -0.0751, -0.6768, -0.1206,
         0.5307, -0.0490, -0.1083,  0.2631,  0.0748, -0.1714,  0.1157,  0.3715,
         0.6093,  0.3088,  0.4642,  0.0930,  0.0624, -0.0640,  0.1391, -0.7331,
        -0.1361, -0.0859, -0.3891,  0.0768, -0.4963,  0.0695, -0.3626,  0.8411,
         0.1532, -0.1458, -0.2630, -0.2151, -0.3103,  0.1697, -0.1632, -0.3756,
        -0.0803, -0.1968,  0.5468,  0.1773, -0.2990, -0.0036,  0.0758, -0.3991,
        -0.0524,  0.2814, -0.2947, -0.1843,  0.3038,  0.4715, -0.3175,  0.1851,
         0.0134, -0.1914,  0.4584,  0.2807,  0.1590,  0.3280,  0.3517,  0.3911,
         0.1309, -0.2509, -0.0008, -0.2097,  0.2152,  0.1403,  0.3071,  0.0773,
         0.1583, -0.6938,  0.0017, -0.3672,  0.1968,  0.0241, -0.5667,  0.1639,
         0.0899, -0.1899, -0.1444,  0.3414,  0.4791,  0.0642,  0.0116, -0.1053,
         0.5087,  0.0990,  0.1311,  0.3384, -0.3098, -0.1424, -0.0206, -0.1233,
         0.1623, -0.0964, -0.2188,  0.4343,  0.1835, -0.0482, -0.3140,  0.2048,
        -0.0942,  0.0402,  0.0923, -0.1973])

You must have modified the wiki.en.vec file. Try deleting it with rm -r .word_vectors_cache/wiki.en.vec and rerunning.

aurooj commented Nov 30, 2018

Thanks for your reply!

I am running into one more issue:

After downloading the pre-trained embeddings, when it starts loading them, my RAM fills up and the machine either freezes or throws a memory error.
The same happens when I try to load GloVe.

I am not an expert in NLP, nor do I have any prior experience with text data. All I want to do is load pre-trained embeddings as features for the words in my dataset.

I tried on two machines with the following configurations:
Machine 1:
Ubuntu 16.04
RAM 24GB
Python 3.6.4
Pytorch 0.4.1

Machine 2:
Ubuntu 14.04
RAM 16GB
Python 3.6.6
Pytorch 0.4.1

wiki.en.vec: 6.60GB [05:28, 21.4MB/s] <-- [this step finishes successfully.]
0%| | 0/2519371 [00:00<?, ?it/s]Skipping token 2519370 with 1-dimensional vector ['300']; likely a header
100%|██████████████████████████████████████████████████████| 2519371/2519371 [05:19<00:00, 7884.92it/s] <-- [My RAM starts filling up at this step resulting in freezing my machine or throwing the error I posted in this issue]

Your help is highly appreciated. Thanks.

PetrochukM (Owner) commented:

Yup, this is a known problem. You are attempting to load all 6 gigabytes of embeddings into memory. I'd use is_include to filter the embeddings down to your vocabulary.
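
For example, a minimal sketch of the is_include approach (the vocab set below is just a placeholder for your own dataset's vocabulary):

    from torchnlp.word_to_vector import FastText

    # Placeholder vocabulary -- build this set from the tokens in your dataset.
    vocab = {'the', 'cat', 'sat', 'on', 'mat'}

    # is_include is called for every token in wiki.en.vec; only tokens for which
    # it returns True are kept in memory, so roughly |vocab| vectors are loaded
    # instead of all 2.5 million.
    vectors = FastText(is_include=lambda token: token in vocab)

    print(vectors['cat'].shape)  # torch.Size([300])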

There are other, more sophisticated options as well, e.g. https://github.com/vzhong/embeddings

aurooj commented Dec 1, 2018

Ah, I see. Thank you, I will try these solutions.
