Can I extract word embeddings using BanglaBERT ? #2

Closed
MusfiqDehan opened this issue Feb 24, 2022 · 1 comment


@MusfiqDehan

Hi,
Is it possible to extract/generate word embeddings using BanglaBERT?
I have tokenized my Bangla sentence using BanglaBERT. Now I want to generate Word Embeddings from my tokenized sentence.

!pip install transformers
!pip install git+https://github.com/csebuetnlp/normalizer

from transformers import AutoModelForPreTraining, AutoTokenizer
from normalizer import normalize
import torch

model = AutoModelForPreTraining.from_pretrained("csebuetnlp/banglabert")
tokenizer_bbert = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")


text = 'দেশদ্রোহিতার মামলা স্বর্ণ মন্দিরের ভিতর ও বৈশাখী উৎসবের মিছিলে খলিস্তানপন্থী স্লোগান দেওয়ার জন্য কয়েকজন বিশ্ব যুবকের বিরুদ্ধে দেশদ্রোহিতার মামলা দায়ের করা হয়েছে ।'

text = normalize(text)

text = tokenizer_bbert.tokenize(text)

print(text)

# >>  ['দেশদ্রোহ', '##িতার', 'মামলা', 'স্বর্ণ', 'মন্দিরের', 'ভিতর', 'ও', 'বৈশাখী', 'উৎসবের', 'মিছিলে', 'খলি', '##স্তান', '##পন্থী', 'স্লোগান', 'দেওয়ার','জন্য', 'কয়েকজন', 'বিশ্ব', 'যুবকের', 'বিরুদ্ধে', 'দেশদ্রোহ', '##িতার', 'মামলা', 'দায়ের', 'করা', 'হয়েছে', '।']

I have found out how to generate word embeddings using BERT. Here is the link: https://discuss.huggingface.co/t/generate-raw-word-embeddings-using-transformer-models-like-bert-for-downstream-process/2958
Will the same approach work for BanglaBERT and the Bangla language, or would it be better to use a different, Bangla-specific approach?

Any kind of suggestion or advice will be helpful for me. Thanks in advance.

@MusfiqDehan changed the title from "Extracting word embeddings using BanglaBERT" to "Can I extract word embeddings using BanglaBERT ?" on Feb 24, 2022
@Tahmid04
Collaborator

Hi, the method you showed should work just fine.
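
For readers who want something concrete, here is a minimal sketch of one common way to do this; it is not taken from the linked thread or from the BanglaBERT authors. It assumes that loading the checkpoint with `AutoModel` (rather than `AutoModelForPreTraining`) and reading `last_hidden_state` is acceptable for your use case, and that averaging the WordPiece vectors of each word is a reasonable pooling choice. The helper names `pool_wordpieces` and `banglabert_word_embeddings` are hypothetical.

```python
from typing import List, Sequence, Tuple


def pool_wordpieces(
    tokens: Sequence[str], vectors: Sequence[Sequence[float]]
) -> List[Tuple[str, List[float]]]:
    """Merge WordPiece tokens ('##' marks a continuation) into words and
    average the vectors of the pieces that make up each word."""
    groups = []  # each entry: [word_string, list_of_piece_vectors]
    for token, vector in zip(tokens, vectors):
        if token.startswith("##") and groups:
            groups[-1][0] += token[2:]          # extend the current word
            groups[-1][1].append(list(vector))  # collect its piece vector
        else:
            groups.append([token, [list(vector)]])
    return [
        (word, [sum(column) / len(pieces) for column in zip(*pieces)])
        for word, pieces in groups
    ]


def banglabert_word_embeddings(text: str):
    """Contextual word vectors from BanglaBERT's last hidden layer.

    Assumption: `torch`, `transformers` and the csebuetnlp `normalizer`
    package are installed. Imports are local so the pooling helper above
    can be used without them.
    """
    import torch
    from normalizer import normalize
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")
    model = AutoModel.from_pretrained("csebuetnlp/banglabert")
    model.eval()

    encoded = tokenizer(normalize(text), return_tensors="pt")
    with torch.no_grad():
        # Shape: (sequence_length, hidden_size) for the single input sentence.
        hidden = model(**encoded).last_hidden_state[0]

    tokens = tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist())
    # Skip the [CLS] and [SEP] tokens that the tokenizer adds around the sentence.
    skip = {tokenizer.cls_token, tokenizer.sep_token}
    keep = [i for i, tok in enumerate(tokens) if tok not in skip]
    return pool_wordpieces(
        [tokens[i] for i in keep], [hidden[i].tolist() for i in keep]
    )
```

Calling `banglabert_word_embeddings(text)` on the sentence from the question would return one `(word, vector)` pair per word, with pieces such as 'দেশদ্রোহ' and '##িতার' merged back into a single entry. These vectors are contextual, so the same word can get a different vector in a different sentence; other pooling choices (first piece only, or summing several layers) are equally possible.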
