No such file Error: ../data/MSRvid2012 #2

Closed
huache opened this issue Feb 20, 2017 · 11 comments

Comments

@huache

huache commented Feb 20, 2017

When I run demo.sh in the examples directory, this error occurs:

word vectors loaded from ../data/glove.840B.300d.txt
word weights computed from ../auxiliary_data/enwiki_vocab_min200.txt using parameter a=-1.000000
remove the first 0 principal components
Traceback (most recent call last):
  File "sim_sif.py", line 28, in <module>
    parr, sarr = eval.sim_evaluate_all(We, words, weight4ind, sim_algo.weighted_average_sim_rmpc, params)
  File "../src/eval.py", line 64, in sim_evaluate_all
    p,s = sim_getCorrelation(We, words, prefix+i, weight4ind, scoring_function, params)
  File "../src/eval.py", line 13, in sim_getCorrelation
    f = open(f,'r')
IOError: [Errno 2] No such file or directory: '../data/MSRvid2012'

Could you please tell me:

  1. Is this error detrimental to the model training?
  2. Where can I download the missing data file?

Thanks a lot!

@YingyuLiang
Collaborator

This happens because the function sim_evaluate_all evaluates over all the textual similarity datasets, but I only put a few example datasets online.

You can:

  1. Run sim_evaluate_one instead of sim_evaluate_all. The function sim_evaluate_one checks only one example dataset (see the sketch after this list).
  2. If you would like to evaluate over all the datasets, you will need to contact John Wieting to obtain them. These datasets are from https://github.com/jwieting/iclr2016, but both of us only put example datasets online, since some of the other datasets have copyright issues.
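
For reference, here is a minimal sketch of option 1 in examples/sim_sif.py, assuming sim_evaluate_one takes the same arguments as sim_evaluate_all (We, words, weight4ind, params, and the eval and sim_algo modules come from the script's existing setup):

    # Replace the sim_evaluate_all call shown in the traceback with:
    parr, sarr = eval.sim_evaluate_one(We, words, weight4ind,
                                       sim_algo.weighted_average_sim_rmpc, params)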

@huache
Author

huache commented Feb 21, 2017

My fault, I should have checked the comments in the source code before opening a new issue.
I will change the function and try again. Thank you for replying so quickly!

@huache
Author

huache commented Feb 21, 2017

I ran demo.sh again with the updated code, and the error still occurred when running sim_tfidf.py:

Traceback (most recent call last):
  File "sim_tfidf.py", line 22, in <module>
    weight4ind = data_io.getIDFWeight(wordfile)
  File "../src/data_io.py", line 355, in getIDFWeight
    g1x,g1mask,g2x,g2mask = getDataFromFile(prefix+f, words)
  File "../src/data_io.py", line 309, in getDataFromFile
    f = open(f,'r')
IOError: [Errno 2] No such file or directory: '../data/MSRvid2012'

I will change the farr list at line 326 of data_io.py so that only one element, ["MSRpar2012"], remains (as sketched below), and run it again.
Hope everything works.
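
A minimal sketch of that workaround in src/data_io.py, inside getIDFWeight (the exact line number may differ between versions of the repo):

    # Keep only the bundled example dataset in the list of files to read:
    farr = ["MSRpar2012"]  # originally listed every STS dataset file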

@YingyuLiang
Collaborator

Yes, data_io.getIDFWeight reads all the data files to compute the IDF weights. I forgot to change it to read only the example file; just changed it.

That being said, I recommend using more files when computing the IDF weights. Using a single file will probably lead to a less accurate estimate.
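
To see why, consider a toy IDF computation (illustrative only, not the repo's getIDFWeight): with few documents, the document frequency df(w) can only take a few distinct values, so the resulting weights are coarse.

    import math
    from collections import Counter

    def toy_idf(documents):
        # documents: a list of token lists, standing in for the data files
        N = len(documents)
        df = Counter()
        for doc in documents:
            for w in set(doc):  # count each word at most once per document
                df[w] += 1
        # idf(w) = log(N / df(w)); with small N, only a few distinct values exist
        return {w: math.log(float(N) / df[w]) for w in df}

Adding files increases N and spreads out the document frequencies, which sharpens the estimate.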

@huache
Author

huache commented Feb 23, 2017

Sorry to bother you again.
I have run demo.sh two more times, but it seems I need more memory. I got this error on a machine with 32 GB of RAM (28 GB free):

Traceback (most recent call last):
  File "train.py", line 237, in <module>
    model = proj_model_sentiment(We, params)
        ......
        ......
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

How much memory do I need to run demo.sh?

@YingyuLiang
Collaborator

Are you running it with the word vector file glove.840B.300d.txt?

This file contains a very large vocabulary (the file is about 6 GB), and loading the whole set of word vectors is probably what causes the memory issue. You can try using only the first 50,000 words (i.e., keep only the first 50,000 lines of the file), which barely affects the experiments.
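
A minimal way to do that truncation in Python (binary mode sidesteps any encoding issues; the output filename here is just an example, so point the demo's word vector path at whatever file you produce):

    # Keep only the first 50,000 lines of the GloVe file.
    with open('glove.840B.300d.txt', 'rb') as src, \
         open('glove.840B.300d.50k.txt', 'wb') as dst:
        for i, line in enumerate(src):
            if i >= 50000:
                break
            dst.write(line)

A shell one-liner such as head -n 50000 achieves the same thing.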

@huache
Author

huache commented Feb 24, 2017

Yes, I hadn't noticed that...
I ran demo.sh again with only the first 50,000 words, and it finally works.
Thank you so much!

@YingyuLiang
Collaborator

No problem!

@loretoparisi
Contributor

loretoparisi commented Jul 20, 2017

@huache in my case it is not an OOM problem, but it takes too long. How did you cap it to the first 50K words?

@huache
Author

huache commented Jul 21, 2017

@loretoparisi As YingyuLiang said, modify glove.840B.300d.txt (or whichever word vector file you use): "keep only the first 50,000 lines in the file".

@loretoparisi
Contributor

@huache OK, so I just take the first 50K rows of the text file.
