util.preprocessing.perpareDataset() reducePretrainedEmbeddings==True causes error #29

apmoore1 · 2018-09-18T20:03:56Z

Calling util.preprocessing.perpareDataset() with the reducePretrainedEmbeddings argument set to True causes the following error:

Traceback (most recent call last):
  File "test.py", line 25, in <module>
    reducePretrainedEmbeddings=True)
  File "/home/andrew/Documents/another/emnlp2017-bilstm-cnn-crf/util/preprocessing.py", line 42, in perpareDataset
    embeddings, word2Idx = readEmbeddings(embeddingsPath, datasets, frequencyThresholdUnknownTokens, reducePretrainedEmbeddings)
  File "/home/andrew/Documents/another/emnlp2017-bilstm-cnn-crf/util/preprocessing.py", line 118, in readEmbeddings
    dataColumnsIdx = {y: x for x, y in dataset['cols'].items()}
TypeError: string indices must be integers

To reproduce this error, run the Python code below. Make sure that util.preprocessing.perpareDataset() does not load a cached pickle file (the caching at lines 37-39). Otherwise it returns the cached pickle, which I believe might not have the reduced embeddings, so no error is raised.

Example code to re-create error:

import os
import logging
import sys
from neuralnets.BiLSTM import BiLSTM
from util.preprocessing import perpareDataset, loadDatasetPickle
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)
datasets = {
    'unidep_pos':                            #Name of the dataset
        {'columns': {1:'tokens', 3:'POS'},   #CoNLL format for the input data. Column 1 contains tokens, column 3 contains POS information
         'label': 'POS',                     #Which column we like to predict
         'evaluate': True,                   #Should we evaluate on this task? Set true always for single task setups
         'commentSymbol': None}              #Lines in the input data starting with this string will be skipped. Can be used to skip comments
}
embeddingsPath = 'komninos_english_embeddings.gz'
pickleFile = perpareDataset(embeddingsPath, datasets, 
                            reducePretrainedEmbeddings=True)
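
The traceback suggests a likely cause, sketched below without access to the library's code. The sketch assumes that readEmbeddings iterates directly over the datasets dict, which yields the dataset names (strings) rather than their configurations, and that it looks up 'cols' although the configuration above uses 'columns'. The function names invert_columns_buggy and invert_columns are hypothetical and only illustrate the pattern. This is not the library's implementation or its actual fix.

```python
# Hypothetical minimal illustration; not the library's actual code.
datasets = {
    'unidep_pos': {
        'columns': {1: 'tokens', 3: 'POS'},
        'label': 'POS',
    }
}


def invert_columns_buggy(datasets):
    # Iterating over a dict yields its keys, so `dataset` is the
    # string 'unidep_pos', and indexing a string with 'cols' raises TypeError.
    for dataset in datasets:
        return {y: x for x, y in dataset['cols'].items()}


def invert_columns(datasets):
    # Iterate over (name, config) pairs and read the 'columns' key
    # that the dataset configuration actually defines.
    return {
        name: {col: idx for idx, col in cfg['columns'].items()}
        for name, cfg in datasets.items()
    }


try:
    invert_columns_buggy(datasets)
except TypeError as err:
    print('TypeError:', err)

print(invert_columns(datasets))
```

If this assumption holds, iterating with datasets.items() and reading dataset['columns'] would avoid the TypeError.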

apmoore1 mentioned this issue Sep 18, 2018

fixes #29 #30

Merged

nreimers closed this as completed in b709f58 Sep 19, 2018
