
Conversion issue #1

Closed
ManniSingh opened this issue Nov 10, 2017 · 17 comments

Comments

@ManniSingh

Hi,
I am getting the following error. Any suggestions?
Thanks


TypeError Traceback (most recent call last)
in ()
11 word, char, _, _, labels, masks, lengths = conll03_data.get_batch_variable(data_train, batch_size)
12 optim.zero_grad()
---> 13 loss = network.loss(word, char, labels, mask=masks)
14 loss.backward()
15 optim.step()

~/NeuroNLP2/neuronlp2/models/sequence_labeling.py in loss(self, input_word, input_char, target, mask, length, hx)
291 def loss(self, input_word, input_char, target, mask=None, length=None, hx=None):
292 # output from rnn [batch, length, tag_space]
--> 293 output, _, mask, length = self._get_rnn_output(input_word, input_char, mask=mask, length=length, hx=hx)
294
295 if length is not None:

~/NeuroNLP2/neuronlp2/models/sequence_labeling.py in _get_rnn_output(self, input_word, input_char, mask, length, hx)
252
253 # [batch, length, char_length, char_dim]
--> 254 char = self.char_embedd(input_char)
255 char_size = char.size()
256 # first transform to [batch *length, char_length, char_dim]

/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)

~/NeuroNLP2/neuronlp2/nn/modules/sparse.py in forward(self, input)
68 if input.dim() > 2:
69 num_inputs = np.prod(input_size[:-1])
---> 70 input = input.view(num_inputs, input_size[-1])
71
72 output_size = input_size + (self.embedding_dim, )

/anaconda/lib/python3.6/site-packages/torch/autograd/variable.py in view(self, *sizes)
508
509 def view(self, *sizes):
--> 510 return View.apply(self, sizes)
511
512 def view_as(self, tensor):

/anaconda/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py in forward(ctx, i, sizes)
94 ctx.new_sizes = sizes
95 ctx.old_size = i.size()
---> 96 result = i.view(*sizes)
97 ctx.mark_shared_storage((i, result))
98 return result

TypeError: view received an invalid combination of arguments - got (numpy.int64, int), but expected one of:

  • (int ... size)
    didn't match because some of the arguments have invalid types: (numpy.int64, int)
  • (torch.Size size)
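
For reference, my guess at what is going wrong: np.prod returns a numpy.int64, which this PyTorch version's view() rejects. A minimal sketch of a local workaround (my own guess, not necessarily the right fix) would be to cast to a plain Python int inside forward() of neuronlp2/nn/modules/sparse.py:

import numpy as np

# inside forward(self, input); `input` is the char-index tensor passed in
input_size = input.size()
if input.dim() > 2:
    # np.prod returns numpy.int64; the old view() only accepts Python ints,
    # so cast explicitly before reshaping.
    num_inputs = int(np.prod(input_size[:-1]))
    input = input.view(num_inputs, input_size[-1])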
@XuezheMax
Owner

XuezheMax commented Nov 10, 2017 via email

@ManniSingh
Author

Thanks Max, I will wait for it.
I do have a TensorFlow version, but for some reason it doesn't reach F1 91+ with the same parameters.
The Lasagne version is also very slow.

@XuezheMax
Owner

Hi Manni,

Please try again to see if it works now.
Thanks.

@ManniSingh
Author

ManniSingh commented Nov 18, 2017

Thanks, it works now.
However, it is not giving me anywhere near F1 90+ on CoNLL-2003. I tried both of your codebases.

This one gives:
dev acc: 97.91%, precision: 90.59%, recall: 86.55%, F1: 88.52%
best dev acc: 98.01%, precision: 89.95%, recall: 87.31%, F1: 88.61% (epoch: 14)
best test acc: 96.40%, precision: 82.97%, recall: 80.67%, F1: 81.80% (epoch: 14)

Without any early stopping:
Epoch 100 (LSTM(std), learning rate=0.0025, decay rate=0.0500 (1)):
loss: 0.0423, time: 133.20s
dev acc: 97.98%, precision: 89.70%, recall: 87.45%, F1: 88.56%
best dev acc: 98.03%, precision: 90.04%, recall: 87.89%, F1: 88.95% (epoch: 69)
best test acc: 96.27%, precision: 82.60%, recall: 80.74%, F1: 81.66% (epoch: 69)

Lasagne code gives:
best dev acc: 97.99%, precision: 89.74%, recall: 87.73%, F1: 88.72%
best test acc: 96.21%, precision: 81.13%, recall: 79.78%, F1: 80.45%

Under the settings mentioned in your paper, the following are the logs from the code:

loading embedding: glove from /data/manni/ner/glove/glove.6B.100d.txt.gz
2017-11-18 13:02:00,893 - NERCRF - INFO - Creating Alphabets
2017-11-18 13:02:00,911 - Create Alphabets - INFO - Word Alphabet Size (Singleton): 11985 (0)
2017-11-18 13:02:00,913 - Create Alphabets - INFO - Character Alphabet Size: 86
2017-11-18 13:02:00,915 - Create Alphabets - INFO - POS Alphabet Size: 47
2017-11-18 13:02:00,916 - Create Alphabets - INFO - Chunk Alphabet Size: 19
2017-11-18 13:02:00,918 - Create Alphabets - INFO - NER Alphabet Size: 9
2017-11-18 13:02:00,919 - NERCRF - INFO - Word Alphabet Size: 11985
2017-11-18 13:02:00,921 - NERCRF - INFO - Character Alphabet Size: 86
2017-11-18 13:02:00,923 - NERCRF - INFO - POS Alphabet Size: 47
2017-11-18 13:02:00,925 - NERCRF - INFO - Chunk Alphabet Size: 19
2017-11-18 13:02:00,926 - NERCRF - INFO - NER Alphabet Size: 9
2017-11-18 13:02:00,928 - NERCRF - INFO - Reading Data
Reading data from /data/manni/ner/conll2003/eng.train
reading data: 10000
Total number of data: 14041
Reading data from /data/manni/ner/conll2003/eng.testa
Total number of data: 3250
Reading data from /data/manni/ner/conll2003/eng.testb
Total number of data: 3453
oov: 11984
2017-11-18 13:02:12,675 - NERCRF - INFO - constructing network...
2017-11-18 13:02:13,287 - NERCRF - INFO - Network: LSTM, num_layer=1, hidden=200, filter=30, tag_space=128, crf=bigram
2017-11-18 13:02:13,289 - NERCRF - INFO - training: l2: 0.000000, (#training data: 14041, batch: 10, dropout: 0.50)

settings:
tag_space = 128 # WHAT IS THIS?
dropout = 'std'
logger = get_logger("NERCRF")
mode = 'LSTM'
train_path = "/data/manni/ner/conll2003/eng.train"
dev_path = "/data/manni/ner/conll2003/eng.testa"
test_path = "/data/manni/ner/conll2003/eng.testb"
num_epochs = 100
batch_size = 10
hidden_size = 200
num_filters = 30
learning_rate = 0.015
momentum = 0.9
decay_rate = 0.05
gamma = 0.0
schedule = 1
p = 0.5
bigram = True
embedding = 'glove'

@XuezheMax
Owner

Hi Manni,
Here are my logs for the first few epochs.
loading embedding: glove from data/glove/glove.6B/glove.6B.100d.gz
2017-11-18 17:47:27,021 - NERCRF - INFO - Creating Alphabets
2017-11-18 17:47:27,041 - Create Alphabets - INFO - Word Alphabet Size (Singleton): 23598 (8122)
2017-11-18 17:47:27,041 - Create Alphabets - INFO - Character Alphabet Size: 86
2017-11-18 17:47:27,041 - Create Alphabets - INFO - POS Alphabet Size: 47
2017-11-18 17:47:27,041 - Create Alphabets - INFO - Chunk Alphabet Size: 19
2017-11-18 17:47:27,041 - Create Alphabets - INFO - NER Alphabet Size: 18
2017-11-18 17:47:27,041 - NERCRF - INFO - Word Alphabet Size: 23598
2017-11-18 17:47:27,041 - NERCRF - INFO - Character Alphabet Size: 86
2017-11-18 17:47:27,041 - NERCRF - INFO - POS Alphabet Size: 47
2017-11-18 17:47:27,041 - NERCRF - INFO - Chunk Alphabet Size: 19
2017-11-18 17:47:27,041 - NERCRF - INFO - NER Alphabet Size: 18
2017-11-18 17:47:27,042 - NERCRF - INFO - Reading Data
Reading data from data/conll2003/english/eng.train.bioes.conll
reading data: 10000
Total number of data: 14987
Reading data from data/conll2003/english/eng.dev.bioes.conll
Total number of data: 3466
Reading data from data/conll2003/english/eng.test.bioes.conll
Total number of data: 3684
oov: 339
2017-11-18 17:47:31,883 - NERCRF - INFO - constructing network...
2017-11-18 17:47:32,315 - NERCRF - INFO - Network: LSTM, num_layer=1, hidden=256, filter=30, tag_space=128, crf=bigram
2017-11-18 17:47:32,315 - NERCRF - INFO - training: l2: 0.000000, (#training data: 14987, batch: 16, dropout: 0.50, unk replace: 0.00)
Epoch 1 (LSTM(std), learning rate=0.0100, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.3267, time: 50.82s
dev acc: 97.04%, precision: 88.88%, recall: 85.93%, F1: 87.38%
best dev acc: 97.04%, precision: 88.88%, recall: 85.93%, F1: 87.38% (epoch: 1)
best test acc: 96.16%, precision: 84.89%, recall: 82.95%, F1: 83.91% (epoch: 1)
Epoch 2 (LSTM(std), learning rate=0.0095, decay rate=0.0500 (schedule=1)):
train: 937 loss: 0.8332, time: 50.86s
dev acc: 97.66%, precision: 90.54%, recall: 89.26%, F1: 89.90%
best dev acc: 97.66%, precision: 90.54%, recall: 89.26%, F1: 89.90% (epoch: 2)
best test acc: 96.83%, precision: 86.99%, recall: 86.51%, F1: 86.75% (epoch: 2)
Epoch 3 (LSTM(std), learning rate=0.0091, decay rate=0.0500 (schedule=1)):
train: 937 loss: 0.6846, time: 33.49s
dev acc: 98.10%, precision: 92.61%, recall: 90.86%, F1: 91.73%
best dev acc: 98.10%, precision: 92.61%, recall: 90.86%, F1: 91.73% (epoch: 3)
best test acc: 97.35%, precision: 88.87%, recall: 87.98%, F1: 88.42% (epoch: 3)
Epoch 4 (LSTM(std), learning rate=0.0087, decay rate=0.0500 (schedule=1)):
train: 937 loss: 0.5764, time: 39.42s
dev acc: 98.29%, precision: 92.49%, recall: 92.07%, F1: 92.28%
best dev acc: 98.29%, precision: 92.49%, recall: 92.07%, F1: 92.28% (epoch: 4)
best test acc: 97.46%, precision: 88.24%, recall: 88.60%, F1: 88.42% (epoch: 4)

It seems that there are some issues with your data. Please make sure that you follow the data format described in #2 and use the BIOES tagging scheme.
When you re-run the code, please first remove the vocabulary dir in data/alphabets/ner_crf/ so that the new vocabulary can be rebuilt.
Thanks.
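
In case it helps, here is a rough sketch of the kind of BIO-to-BIOES conversion I mean (a minimal example written for illustration, not the actual converter script):

def bio_to_bioes(tags):
    # Convert one sentence's BIO/IOB2 tags to BIOES (sketch only).
    bioes = []
    for i, tag in enumerate(tags):
        if tag == 'O':
            bioes.append('O')
            continue
        prefix, label = tag.split('-', 1)
        # does the next tag continue this entity?
        next_inside = i + 1 < len(tags) and tags[i + 1] == 'I-' + label
        if prefix == 'B':
            bioes.append(('B-' if next_inside else 'S-') + label)
        else:  # prefix == 'I'
            bioes.append(('I-' if next_inside else 'E-') + label)
    return bioes

# e.g. ['B-PER', 'I-PER', 'O', 'B-LOC'] -> ['B-PER', 'E-PER', 'O', 'S-LOC']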

@XuezheMax
Owner

Moreover, PyTorch has some implicit parameter initialization which makes the training of the first few epochs unstable. When you see a pretty large loss at the beginning of the training (like 50+), just kill the program and re-run it :)
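
If you prefer not to restart by hand, one workaround (just an illustration, not part of the repo) is to fix the random seeds before constructing the network, so an initialization that gives a sane first-epoch loss can be reproduced:

import random
import numpy as np
import torch

def seed_everything(seed):
    # Make the implicit PyTorch parameter initialization reproducible.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

# Try a few seeds and keep one whose first-epoch loss is not huge (e.g. well below 50).
seed_everything(1234)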

@ManniSingh
Author

Hi Max,

  • Cleaned everything
  • The loss at the start of training is below 15
  • I converted the data to BIOES
  • I did not remove the "-DOCSTART-" sentences, since I noticed your log says "Total number of data: 14987" (a counting sketch is below)

But I still only got about a 1% improvement:

Epoch 100 (LSTM(std), learning rate=0.0025, decay rate=0.0500 (1)):
1499/1499 [===========================>] - ETA: 8s - train loss: 0.0395, time: 126.78s
dev acc: 97.68%, precision: 89.58%, recall: 87.28%, F1: 88.42%
best dev acc: 97.87%, precision: 90.29%, recall: 88.44%, F1: 89.36% (epoch: 21)
best test acc: 96.17%, precision: 83.23%, recall: 81.80%, F1: 82.51% (epoch: 21)
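
As a sanity check on the sentence counts (a rough sketch, assuming the standard CoNLL format with one token per line and blank lines between sentences):

def count_sentences(path, keep_docstart=True):
    # Count blank-line-separated sentences in a CoNLL-format file (sketch only).
    count, in_sentence = 0, False
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line:
                if in_sentence:
                    count += 1
                in_sentence = False
            elif line.startswith('-DOCSTART-') and not keep_docstart:
                continue
            else:
                in_sentence = True
    if in_sentence:
        count += 1
    return count

# The logs above show 14041 sentences for eng.train when -DOCSTART- lines are
# dropped, and 14987 when they are kept.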

@XuezheMax
Owner

XuezheMax commented Nov 19, 2017 via email

@ManniSingh
Author

ManniSingh commented Nov 20, 2017

Hi Max,
I did clone the latest code; that is why it is working.

The following is the log:

loading embedding: glove from /data/manni/ner/glove/glove.6B.100d.txt.gz
2017-11-19 09:08:22,562 - NERCRF - INFO - Creating Alphabets
2017-11-19 09:08:22,565 - Create Alphabets - INFO - Creating Alphabets: /data/manni/alphabets/ner_crf/
2017-11-19 09:08:23,173 - Create Alphabets - INFO - Total Vocabulary Size: 23625
2017-11-19 09:08:23,175 - Create Alphabets - INFO - TOtal Singleton Size: 11641
2017-11-19 09:08:23,180 - Create Alphabets - INFO - Total Vocabulary Size (w.o rare words): 11984
2017-11-19 09:08:23,351 - Create Alphabets - INFO - Word Alphabet Size (Singleton): 11985 (0)
2017-11-19 09:08:23,352 - Create Alphabets - INFO - Character Alphabet Size: 86
2017-11-19 09:08:23,353 - Create Alphabets - INFO - POS Alphabet Size: 47
2017-11-19 09:08:23,354 - Create Alphabets - INFO - Chunk Alphabet Size: 19
2017-11-19 09:08:23,355 - Create Alphabets - INFO - NER Alphabet Size: 18
2017-11-19 09:08:23,357 - NERCRF - INFO - Word Alphabet Size: 11985
2017-11-19 09:08:23,358 - NERCRF - INFO - Character Alphabet Size: 86
2017-11-19 09:08:23,359 - NERCRF - INFO - POS Alphabet Size: 47
2017-11-19 09:08:23,360 - NERCRF - INFO - Chunk Alphabet Size: 19
2017-11-19 09:08:23,361 - NERCRF - INFO - NER Alphabet Size: 18
2017-11-19 09:08:23,362 - NERCRF - INFO - Reading Data
Reading data from /data/manni/ner/conll2003/eng.bioes.train
reading data: 10000
Total number of data: 14987
Reading data from /data/manni/ner/conll2003/eng.bioes.testa
Total number of data: 3466
Reading data from /data/manni/ner/conll2003/eng.bioes.testb
Total number of data: 3684
oov: 11984
2017-11-19 09:08:36,198 - NERCRF - INFO - constructing network...
2017-11-19 09:08:36,776 - NERCRF - INFO - Network: LSTM, num_layer=1, hidden=200, filter=30, tag_space=128, crf=bigram
2017-11-19 09:08:36,777 - NERCRF - INFO - training: l2: 0.000000, (#training data: 14987, batch: 10, dropout: 0.50)

After Epoch 100:

Epoch 100 (LSTM(std), learning rate=0.0025, decay rate=0.0500 (1)):
dev acc: 97.68%, precision: 89.58%, recall: 87.28%, F1: 88.42%
best dev acc: 97.87%, precision: 90.29%, recall: 88.44%, F1: 89.36% (epoch: 21)
best test acc: 96.17%, precision: 83.23%, recall: 81.80%, F1: 82.51% (epoch: 21)

@XuezheMax
Owner

XuezheMax commented Nov 20, 2017 via email

@ManniSingh
Author

ManniSingh commented Nov 21, 2017

  • I lowercased the words in "conll03_data.py"
  • I did the same in "reader.py"
  • Now the OOV count is 476 and I am getting F1 91+.
    But why is the OOV count so low (I have "normalize_digits=False")? Are you doing masking somewhere?

The original overlap of GloVe with CoNLL-2003 that I calculated is:

Glove Vocab length:400000
Vocab length:21009
Vocab length:9002
Vocab length:8548
Total Vocab: 26869
Total OOV: 3922
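
Roughly how I computed these numbers (a sketch; conll_vocab and glove_vocab stand in for the token sets loaded from the CoNLL-2003 files and from glove.6B.100d.txt):

def count_oov(conll_vocab, glove_vocab, lowercase=False):
    # Count CoNLL-2003 tokens missing from the GloVe vocabulary (sketch only).
    if lowercase:
        conll_vocab = {w.lower() for w in conll_vocab}
        glove_vocab = {w.lower() for w in glove_vocab}
    return len(conll_vocab - glove_vocab)

# With the cased vocabulary the OOV count is large (3922 above);
# lowercasing before lookup shrinks it because glove.6B is uncased.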

Now the result is:

Epoch 41 (LSTM(std), learning rate=0.0050, decay rate=0.0500 (1)):
1499/1499 [===========================>] - ETA: 8s - train: 61459 loss: 0.1576, time: 126.63s
dev acc: 98.81%, precision: 94.80%, recall: 94.58%, F1: 94.69%
best dev acc: 98.81%, precision: 94.80%, recall: 94.58%, F1: 94.69% (epoch: 41)
best test acc: 97.93%, precision: 91.26%, recall: 90.92%, F1: 91.09% (epoch: 41)

@XuezheMax
Owner

XuezheMax commented Nov 21, 2017 via email

@ManniSingh
Author

I noticed that in the code. But, to my knowledge, the 6B GloVe is uncased. Also, there are many non-word tokens (digits, punctuation, etc.) in the CoNLL-2003 English dataset; are you masking those?

@XuezheMax
Owner

XuezheMax commented Nov 21, 2017 via email

@ManniSingh
Author

ManniSingh commented Nov 21, 2017

So does that mean you are treating them (the singletons) as "unk"?
BTW, that seems like a good idea for generalisation!
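
If I understand it correctly, something like this sketch (my own illustration of the idea, not the repo's code): during training, words that occur only once are sometimes swapped for the unknown-word index so the model learns a useful unk embedding.

import random

def maybe_replace_singletons(word_ids, singleton_ids, unk_id, unk_replace=0.5):
    # Randomly map singleton word ids to the unk id during training (sketch only).
    # word_ids: list of int ids for one sentence; singleton_ids: ids seen exactly
    # once in the training data.
    return [unk_id if (w in singleton_ids and random.random() < unk_replace) else w
            for w in word_ids]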

@XuezheMax
Owner

XuezheMax commented Nov 21, 2017 via email

@ManniSingh
Author

Great! Thanks, I will try that.
