BERT / ELMO embeddings for NER #22

Closed · RyanDsilva opened this issue on Oct 20, 2019 · 7 comments
Labels: enhancement (New feature or request)

@RyanDsilva commented Oct 20, 2019

When are pretrained BERT and ELMo embeddings for NER planned?
Could you help me with the development process? I could try to contribute these features.

@amaiya added the enhancement label on Oct 20, 2019
@amaiya (Owner) commented Oct 20, 2019

This sounds like a good first issue if you want to take a crack at it. I'm happy to answer any questions about the development process.

One way to implement this might be to leverage TensorFlow Hub to generate the embeddings dynamically as sentences are transformed for input into the neural network.
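For illustration, here is a minimal sketch of that idea, assuming the TF1-style tensorflow_hub API and the public ELMo module on TF Hub (the module URL, sample sentences, and shapes are assumptions for this sketch, not ktrain code):

# sketch: contextual ELMo token embeddings via TensorFlow Hub (TF1-style API)
import tensorflow as tf
import tensorflow_hub as hub

elmo = hub.Module("https://tfhub.dev/google/elmo/3", trainable=False)

# each input is a space-separated sentence string; the "elmo" output is a
# (batch, max_tokens, 1024) tensor of contextual token embeddings
sentences = ["John lives in New York", "ktrain is a wrapper for Keras"]
embeddings = elmo(sentences, signature="default", as_dict=True)["elmo"]

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vectors = sess.run(embeddings)  # numpy array usable as tagger features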

@RyanDsilva (Author)

Great! I'd definitely like to try this! Taking this up. Thanks @amaiya

@code4kunal commented Feb 7, 2020

Hey @amaiya, thanks a lot for this great work to date. This package has been very intuitive and helpful for most tasks. I have tested BERT with text classification, and it worked like a charm.

When can we expect NER tasks to be supported with BERT and DistilBERT embeddings?

Best Regards
Kunal

@amaiya (Owner) commented Feb 11, 2020

@code4kunal Thanks for your comments.

The user above volunteered to look into this a couple of months ago, but I don't know where it stands. The original idea was to use TensorFlow Hub for this. However, now that the Hugging Face transformers library supports TensorFlow 2, it would probably make more sense to generate the embeddings using the transformers library in ktrain. This is still on the TODO list, which is why this issue is open, but I don't have an exact timeframe, unfortunately. Thanks again.
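As a rough sketch of that transformers-based approach (the model name, tokenization call, and output indexing below are assumptions about that library's TF2 API, not ktrain's actual implementation):

# sketch: contextual BERT token embeddings with Hugging Face transformers (TF2)
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = TFBertModel.from_pretrained("bert-base-cased")

inputs = tokenizer.encode_plus("John lives in New York", return_tensors="tf")
outputs = model(inputs)

# the first output is the sequence of hidden states: one vector per wordpiece
# token, shape (1, seq_len, 768) for bert-base models; these can be fed to a
# downstream sequence-labeling layer such as a BiLSTM
token_embeddings = outputs[0]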

@mdavis95

This might help someone:
huggingface/transformers#1950 (comment)

@amaiya (Owner) commented Mar 3, 2020

@mdavis95 Thanks - it shouldn't be too difficult to incorporate this for sequence-labeling.

@amaiya (Owner) commented Mar 30, 2020

As of v0.12.x of ktrain, BERT and ELMo embeddings for downstream tasks like NER are now supported:

# English NER with BERT embeddings
import ktrain
from ktrain import text

# load a CoNLL-2003-formatted dataset (filenames here are placeholders)
(trn, val, preproc) = text.entities_from_conll2003('train.txt', val_filepath='valid.txt')

# build a BiLSTM sequence tagger fed by BERT embeddings
model = text.sequence_tagger('bilstm-bert', preproc, bert_model='bert-base-cased')
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=128)

# train: 2 cycles of 5 epochs each at a learning rate of 0.01
learner.fit(0.01, 2, cycle_len=5)
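Once trained, the model can be wrapped in a predictor for inference on new text; a short usage sketch (the sample sentence is illustrative):

# tag entities in new sentences with the trained model
predictor = ktrain.get_predictor(learner.model, preproc)
predictor.predict('Paul Newman is my favorite actor.')  # returns (token, tag) pairs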

@amaiya closed this as completed on Mar 30, 2020