Using the NeuroNLP2 in a different data format #11

Closed
ayrtondenner opened this issue Apr 24, 2018 · 4 comments

Comments

@ayrtondenner

Hello, I saw in #9 that you used data with 4 columns for NER. I am trying to run it on a corpus with 2 columns, as in this picture:

image

So my corpus has only two columns: one with a word and one with a tag. Is there a way to parameterize the script to support this kind of data, or will I have to adapt the code for my use case? For instance, I would have to change conll03_data to read tokens[0] instead of tokens[1] as the word, and deal with the POS, chunk, and NER alphabets. Is there anything else I should know?

Thanks.

@XuezheMax
Owner

Hi,
There are two ways you can do this.
Since the NER model uses only the words and the NER labels in the data, one way is to convert your format to match the original format by filling the POS and Chunking columns with any symbols you like.
Another way is to write a new Reader to handle your format.
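
For the second option, a minimal sketch of a reader for two-column data might look like the following. The function name read_two_column is hypothetical and not part of NeuroNLP2; the sketch assumes whitespace-separated "word tag" lines with a blank line between sentences, and it only groups tokens into sentences. Hooking the result into the alphabets and the training script would still need to follow what conll03_data does.

```python
def read_two_column(lines):
    """Yield (words, tags) pairs, one per sentence.

    Assumes each non-blank line is "word tag" separated by whitespace
    and that a blank line marks a sentence boundary (an assumption
    about the corpus, not something NeuroNLP2 defines).
    """
    words, tags = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank line: close the current sentence, if any.
            if words:
                yield words, tags
                words, tags = [], []
            continue
        tokens = line.split()
        if len(tokens) != 2:
            raise ValueError("expected 'word tag', got: %r" % line)
        words.append(tokens[0])  # the word is column 0 here
        tags.append(tokens[1])   # the NER tag is column 1
    # Flush the last sentence when the input does not end with a blank line.
    if words:
        yield words, tags


sample = ["EU B-ORG", "rejects O", "", "Peter B-PER"]
for words, tags in read_two_column(sample):
    print(words, tags)
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.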

@ayrtondenner
Author

I see. Isn't assigning "None" to the pos, chunk and ner variables in create_alphabets enough? That way there would be no real assignment to those values. Otherwise, I guess I will insert "_" characters into my data, creating two more columns to match the current code.
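
A rough sketch of that padding step, in case it is useful to others. The function name pad_columns is made up for illustration. The output layout (token index, word, POS, chunk, NER) is an assumption inferred from the word being read from tokens[1]; it should be checked against the reader in conll03_data, and with_index=False drops the index column if it is not wanted.

```python
def pad_columns(lines, filler="_", with_index=True):
    """Turn 'word tag' lines into 'word filler filler tag' lines.

    Assumptions (not confirmed by the NeuroNLP2 code): columns are
    whitespace-separated, a blank line separates sentences, and the
    target layout optionally starts with a 1-based token index.
    """
    position = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            # Sentence boundary: keep the blank line, restart numbering.
            position = 0
            yield ""
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError("expected 'word tag', got: %r" % line)
        word, tag = parts
        position += 1
        # Placeholder POS and chunk columns; the NER model ignores them.
        fields = [word, filler, filler, tag]
        if with_index:
            fields.insert(0, str(position))
        yield " ".join(fields)


sample = ["EU B-ORG", "rejects O", "", "Peter B-PER"]
for out in pad_columns(sample):
    print(out)
```

To convert a file, the same generator can be fed an open input file and its output written line by line to a new file.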

@XuezheMax
Owner

XuezheMax commented Apr 24, 2018 via email

@ayrtondenner
Author

Ok, so I will try that. Thanks!
