Using the NeuroNLP2 in a different data format #11

ayrtondenner · 2018-04-24T18:46:23Z

Hello, I saw in #9 that you used data with 4 columns for NER. I am trying to run it on a corpus with 2 columns.

So my corpus has one column with a word and another column with only a tag. Is there any way to parameterize the script to support this kind of data, or will I have to adapt the code for my specific use? For instance, I would have to change conll03_data to read tokens[0] instead of tokens[1] as the word, and deal with the pos, chunk and ner alphabets. Is there anything else I should know?

Thanks.

XuezheMax · 2018-04-24T18:59:45Z

Hi,
There are two ways you can do this.
Since the NER model uses only the words and the NER labels in the data, one way is to convert your format to match the original format by filling the POS and Chunking columns with any symbols you like.
Another way is to write a new Reader to handle your format.
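For the second option, a minimal sketch of what such a reader could look like is given below. It is a standalone illustration, not NeuroNLP2's reader interface: the function name `read_two_column` is hypothetical, and it assumes one token per line, whitespace-separated columns, and a blank line between sentences.

```python
from typing import Iterable, Iterator, List, Tuple

# A sentence is a list of (word, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_two_column(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from two-column (word, tag) data.

    Assumption: blank lines separate sentences and columns are
    separated by whitespace.
    """
    sentence: Sentence = []
    for line in lines:
        line = line.strip()
        if not line:
            # Blank line marks a sentence boundary.
            if sentence:
                yield sentence
                sentence = []
            continue
        tokens = line.split()
        if len(tokens) != 2:
            raise ValueError(f"expected 2 columns, got {len(tokens)}: {line!r}")
        # tokens[0] is the word and tokens[1] is the tag in this layout.
        sentence.append((tokens[0], tokens[1]))
    # Emit the last sentence if the input does not end with a blank line.
    if sentence:
        yield sentence
```

The function accepts any iterable of lines, so it can be fed an open file handle directly. Hooking it into the training script would still require building the word and NER alphabets from its output.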

ayrtondenner · 2018-04-24T19:06:24Z

I see. Isn't assigning "None" to the pos, chunk and ner variables in create_alphabets enough? That way no real values would be assigned to them. Otherwise, I guess I will insert "_" characters in my data to create two more columns and match the current code.

XuezheMax · 2018-04-24T19:14:26Z

I am not sure whether assigning None to them will raise errors. I read the POS and chunk information so that they can be used in the future. I guess inserting '_' is a good idea :)
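The padding step could be sketched roughly as follows. The function name `pad_columns` is hypothetical, and the leading index column is an assumption inferred from the word being read from tokens[1]; set `add_index=False` if your copy of the reader expects the word in the first column.

```python
from typing import Iterable, Iterator


def pad_columns(lines: Iterable[str], placeholder: str = "_",
                add_index: bool = True) -> Iterator[str]:
    """Turn 'word tag' lines into '[index] word _ _ tag' lines.

    Assumption: blank lines separate sentences; the index restarts
    at 1 for every sentence.
    """
    position = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            # Keep the sentence boundary and reset the token index.
            position = 0
            yield ""
            continue
        if len(tokens) != 2:
            raise ValueError(f"expected 2 columns, got {len(tokens)}: {line!r}")
        word, tag = tokens
        position += 1
        # Placeholder values stand in for the unused POS and chunk columns.
        columns = [word, placeholder, placeholder, tag]
        if add_index:
            columns.insert(0, str(position))
        yield " ".join(columns)
```

To convert a file, read it line by line, pass the lines through `pad_columns`, and write each result followed by a newline to the output file.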

ayrtondenner · 2018-04-24T19:16:39Z

Ok, so I will try that. Thanks!

XuezheMax closed this as completed Apr 25, 2018

jk78346 mentioned this issue May 17, 2018

tag columns of Input data for NER: can be self-defined tags? #15

Closed
