
how to add some new feature? #4

Closed
EricAugust opened this issue Jan 22, 2019 · 4 comments

Comments

@EricAugust

I want to add some features to the data, e.g. is_in_some_vocab, to train a more generalized model.
How can I do this?

@king-menin
Collaborator

We have now added from_config to data and models (but without examples for now). What do you mean when you say "more generalized model"?

As for vocabs :( saving of label vocabs will be released next month. The text vocab is BERT's vocab_file.

@EricAugust
Author

You know, in some situations a word may be predicted as a location when it is actually a person's name, or vice versa. So maybe I can keep a vocabulary of location, person, or organization names.
I could also use other features, e.g. POS tags or other manually defined features.
After doing that, there should be fewer wrong predictions at test time.
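A minimal sketch of the "is_in_some_vocab" idea, assuming you already have name lists (gazetteers) for each entity type. `GAZETTEERS` and `gazetteer_features` are hypothetical illustrations, not part of this project. Each token gets one binary column per entity type, which could then be passed to the model as extra per-token features.

```python
from typing import Dict, List, Set

# Hypothetical gazetteers: tiny lowercase name lists per entity type (illustrative only).
GAZETTEERS: Dict[str, Set[str]] = {
    "PER": {"john", "maria"},
    "LOC": {"paris", "london"},
    "ORG": {"google", "unesco"},
}

def gazetteer_features(tokens: List[str],
                       gazetteers: Dict[str, Set[str]]) -> List[List[float]]:
    """Return one binary vector per token: 1.0 in column t if the lowercased token is in list t."""
    types = sorted(gazetteers)  # fixed column order: LOC, ORG, PER
    return [[1.0 if tok.lower() in gazetteers[t] else 0.0 for t in types]
            for tok in tokens]

if __name__ == "__main__":
    print(gazetteer_features(["John", "visited", "Paris"], GAZETTEERS))
```

Only single-token matches are checked here; multi-word names such as "New York" would need a span lookup. Because BERT works on WordPiece sub-tokens, word-level vectors like these probably have to be aligned to sub-tokens before use.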

@EricAugust
Author

I have another question. For prediction, I need to create the data, model, and learner, and then load the pre-trained model.
But once the model is trained, I don't want to create the data, model, etc. again. I only want to load the model, process the data, and predict the sequence labels.
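Until a built-in save/load path exists, one piece of a predict-only workflow can be handled by hand: storing the label vocabulary next to the trained weights so the training data is not needed to decode predictions. This sketch assumes `data.label2idx` (used in the collaborator's snippet in this thread) is a plain dict from label string to index; `save_label_vocab`, `load_label_vocab`, and `decode` are hypothetical helpers. Saving and restoring the model weights is a separate step not shown here.

```python
import json
import os
import tempfile
from typing import Dict, List

def save_label_vocab(label2idx: Dict[str, int], path: str) -> None:
    """Write the label-to-index mapping as JSON, e.g. next to the model checkpoint."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(label2idx, f, ensure_ascii=False, indent=2)

def load_label_vocab(path: str) -> Dict[str, int]:
    """Read the mapping back; string keys and int values survive the JSON round trip."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def decode(pred_ids: List[int], label2idx: Dict[str, int]) -> List[str]:
    """Turn predicted label indices back into label strings."""
    idx2label = {idx: label for label, idx in label2idx.items()}
    return [idx2label[i] for i in pred_ids]

if __name__ == "__main__":
    demo = {"O": 0, "B-PER": 1, "I-PER": 2}
    path = os.path.join(tempfile.mkdtemp(), "label2idx.json")
    save_label_vocab(demo, path)
    print(decode([1, 2, 0], load_label_vocab(path)))
```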

@king-menin
Collaborator

You are right. Until now we have used this code only for experiments. We will add these functions next month. As for meta (additional) information about words or sentences, you can add your own vector with that info:
```python
data = NerData.create(train_path, valid_path, vocab_file, is_cls=False, is_meta=True)
model = BertBiLSTMAttnCRF.create(len(data.label2idx), bert_config_file, init_checkpoint_pt, meta_dim=30)
```
meta_dim is the dimension of your additional information. You can encode POS tags as one-hot vectors (though we know this is not ideal). We will add an embedder for meta soon.
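The one-hot POS encoding suggested above could look roughly like this. It is a sketch under assumptions: `POS_TAGS` is a made-up tag set, one extra slot catches unknown tags, and the remaining positions up to `meta_dim` stay zero. The thread does not say what exact input format `NerData.create(..., is_meta=True)` expects, so check the loader before wiring this in.

```python
from typing import Dict, List

# Hypothetical tag set; the real one depends on your POS tagger.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADP", "DET", "PROPN", "PUNCT"]

def one_hot_pos(tags: List[str], tagset: List[str], meta_dim: int) -> List[List[float]]:
    """Encode each POS tag as a one-hot vector of length meta_dim."""
    if len(tagset) + 1 > meta_dim:
        raise ValueError("meta_dim is too small for the tag set plus an 'unknown' slot")
    index: Dict[str, int] = {t: i for i, t in enumerate(tagset)}
    unk = len(tagset)  # slot reserved for tags not in tagset
    vectors = []
    for tag in tags:
        v = [0.0] * meta_dim  # unused trailing slots stay zero (padding up to meta_dim)
        v[index.get(tag, unk)] = 1.0
        vectors.append(v)
    return vectors

if __name__ == "__main__":
    vecs = one_hot_pos(["PROPN", "VERB"], POS_TAGS, meta_dim=30)
    print([v.index(1.0) for v in vecs])
```

The leftover zero slots could, for example, hold other per-token features such as gazetteer flags, as long as the total width stays equal to `meta_dim`.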

We will do a release next month with new features: meta info, different tagging schemas (BIO, and IOX as in BERT), easy prediction, and so on.
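For the tagging schemas mentioned above, here is one reading of "IOX", stated as an assumption rather than the project's definition: IO tags (every `B-` becomes `I-`) on the first WordPiece of each word, and the placeholder label `X` on the remaining pieces, which is a common convention in BERT-based NER code. `toy_wordpiece` is a hypothetical stand-in for the real BERT tokenizer that simply cuts words into 4-character chunks.

```python
from typing import Callable, List, Tuple

def bio_to_io(label: str) -> str:
    """Map a BIO label to IO by turning B-XXX into I-XXX; O and I-XXX are unchanged."""
    return "I-" + label[2:] if label.startswith("B-") else label

def toy_wordpiece(word: str, size: int = 4) -> List[str]:
    """Hypothetical WordPiece stand-in: fixed-size chunks, '##' marks continuation pieces."""
    chunks = [word[i:i + size] for i in range(0, len(word), size)]
    return chunks[:1] + ["##" + c for c in chunks[1:]]

def to_iox(words: List[str], labels: List[str],
           tokenize: Callable[[str], List[str]]) -> Tuple[List[str], List[str]]:
    """Split words into sub-tokens; the first piece gets the IO label, the rest get 'X'."""
    tokens: List[str] = []
    token_labels: List[str] = []
    for word, label in zip(words, labels):
        pieces = tokenize(word) or ["[UNK]"]  # guard against empty tokenizer output
        tokens.extend(pieces)
        token_labels.append(bio_to_io(label))
        token_labels.extend(["X"] * (len(pieces) - 1))
    return tokens, token_labels

if __name__ == "__main__":
    print(to_iox(["George", "Washington", "spoke"], ["B-PER", "I-PER", "O"], toy_wordpiece))
```

One trade-off of IO over BIO: two adjacent entities of the same type can no longer be told apart, since neither starts with a `B-` tag.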
