util.preprocessing.perpareDataset() reducePretrainedEmbeddings==True causes error #29

Closed
apmoore1 opened this issue Sep 18, 2018 · 0 comments

@apmoore1
Contributor

Calling util.preprocessing.perpareDataset() with the reducePretrainedEmbeddings argument set to True raises the following error:

Traceback (most recent call last):
  File "test.py", line 25, in <module>
    reducePretrainedEmbeddings=True)
  File "/home/andrew/Documents/another/emnlp2017-bilstm-cnn-crf/util/preprocessing.py", line 42, in perpareDataset
    embeddings, word2Idx = readEmbeddings(embeddingsPath, datasets, frequencyThresholdUnknownTokens, reducePretrainedEmbeddings)
  File "/home/andrew/Documents/another/emnlp2017-bilstm-cnn-crf/util/preprocessing.py", line 118, in readEmbeddings
    dataColumnsIdx = {y: x for x, y in dataset['cols'].items()}
TypeError: string indices must be integers
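
The message "string indices must be integers" means that `dataset` was a string at the point where it was indexed. One way this can happen, offered here as an unconfirmed guess rather than a diagnosis of the repository's code, is a loop that iterates over the `datasets` dict directly and so receives its keys (the dataset names) instead of the per-dataset dicts. The sketch below reproduces the same error outside the repository. Both function names are hypothetical, and it uses the 'columns' key from the example dict rather than the 'cols' key shown in the traceback.

```python
# Stand-in for the `datasets` dict from the report (trimmed to what matters here).
datasets = {
    'unidep_pos': {'columns': {1: 'tokens', 3: 'POS'}, 'label': 'POS'},
}


def column_indices_from_keys(datasets):
    """Hypothetical buggy variant: iterating a dict yields its string keys."""
    result = {}
    for dataset in datasets:  # `dataset` is the str 'unidep_pos' here
        # Indexing a str with a str raises TypeError.
        result[dataset] = {name: idx for idx, name in dataset['columns'].items()}
    return result


def column_indices_from_values(datasets):
    """Hypothetical fixed variant: iterate over (name, config) pairs."""
    result = {}
    for dataset_name, dataset in datasets.items():
        # Invert {column index: column name} into {column name: column index}.
        result[dataset_name] = {name: idx for idx, name in dataset['columns'].items()}
    return result


try:
    column_indices_from_keys(datasets)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(column_indices_from_values(datasets))
```

If the real loop has this shape, iterating with `.items()` would be one possible fix; whether the key should be 'cols' or 'columns' is a separate question that the traceback alone does not settle.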

To re-create this error, run the Python code below. Make sure that util.preprocessing.perpareDataset() does not load a cached pickle file (the caching at lines 37-39); otherwise it returns the cached pickle, which I believe might not contain the reduced embeddings, so no error occurs.

Example code to re-create error:

import os
import logging
import sys
from neuralnets.BiLSTM import BiLSTM
from util.preprocessing import perpareDataset, loadDatasetPickle
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)
datasets = {
    'unidep_pos':                            # Name of the dataset
        {'columns': {1:'tokens', 3:'POS'},   # CoNLL format for the input data. Column 1 contains tokens, column 3 contains POS information
         'label': 'POS',                     # Which column we like to predict
         'evaluate': True,                   # Should we evaluate on this task? Set true always for single task setups
         'commentSymbol': None}              # Lines in the input data starting with this string will be skipped. Can be used to skip comments
}
embeddingsPath = 'komninos_english_embeddings.gz'
pickleFile = perpareDataset(embeddingsPath, datasets, 
                            reducePretrainedEmbeddings=True)
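
Because a previously cached pickle hides the error, one way to force a fresh run is to delete that pickle before calling perpareDataset(). The helper below is a minimal sketch: `remove_cached_pickle` is a hypothetical name, and the path shown is only an assumption about where the pickle is written, so check the output path in your own checkout before using it.

```python
import os


def remove_cached_pickle(pickle_path):
    """Delete the cached pickle if it exists; return True if a file was removed."""
    if os.path.isfile(pickle_path):
        os.remove(pickle_path)
        return True
    return False


# Assumed location only; replace with the path perpareDataset() actually uses.
cached_pickle = os.path.join('pkl', 'unidep_pos_komninos_english_embeddings.pkl')

if remove_cached_pickle(cached_pickle):
    print('Removed cached pickle:', cached_pickle)
else:
    print('No cached pickle found at:', cached_pickle)
```

Running this before the script above means perpareDataset() has to rebuild the embeddings rather than return the cached file.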
@apmoore1 apmoore1 mentioned this issue Sep 18, 2018