In [3]:
!pip install "transformers[sentencepiece]"



### Zero Shot Classification with Hugging Face Transformers

In [14]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "I am suffering from anxiety and depression.",
    candidate_labels=["mental health", "sports", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


{'sequence': 'I am suffering from anxiety and depression.',
 'labels': ['mental health', 'sports', 'business'],
 'scores': [0.9962947368621826, 0.0018988957162946463, 0.0018063858151435852]}
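
The result is a plain dictionary whose `labels` and `scores` lists are aligned and sorted from most to least likely. As a minimal sketch, the snippet below reuses the output shown above as literal data (no model is loaded) to pick the top label and apply a confidence cutoff. The `top_label` helper and the `threshold` value are illustrative assumptions, not part of the pipeline API.

```python
# Output copied from the zero-shot cell above; no model is loaded here.
result = {
    "sequence": "I am suffering from anxiety and depression.",
    "labels": ["mental health", "sports", "business"],
    "scores": [0.9962947368621826, 0.0018988957162946463, 0.0018063858151435852],
}


def top_label(result, threshold=0.5):
    """Return (label, score) for the best label, or None if it falls below threshold."""
    label, score = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    return (label, score) if score >= threshold else None


best = top_label(result)
# In the default single-label mode the scores are normalized across the candidate labels.
total = sum(result["scores"])
print(best, round(total, 6))
```

A cutoff like this is one way to treat low-confidence predictions as "none of the above" rather than forcing a label.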

### Text generation example


In [25]:
from transformers import pipeline

generator = pipeline("text-generation", model="shahidul034/text_generation_bangla_model")
generator(
    # Bangla prompt, roughly: "I am in a very good mood today, and the reason is"
    "আমার মন আজকে খুব ভাল এর কারণ",
    max_length=100,
    # num_return_sequences=2,
    truncation=True
)

Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'আমার মন আজকে খুব ভাল এর কারণ হতো বাংলাদেশেই। সব বন্ধ করে দিয়ে গেছে। আজকে আবার ভারতের সঙ্গে নয় বলে মন্তব্য করেছেন শেখ হাসিনা। আজ বৃহস্পতিবার সকালে আশিকুলকের'}]
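
As the output above shows, `generated_text` contains the prompt followed by the continuation. Here is a minimal sketch for separating the two, assuming the output starts with the exact prompt string. `split_continuation` is a hypothetical helper, and the English strings are made-up stand-ins for a real pipeline result.

```python
def split_continuation(prompt, generated_text):
    """Return only the newly generated part when the output begins with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # Fall back to the full text if the prompt was altered (for example, truncated).
    return generated_text


# Stand-in for a pipeline result of the form [{'generated_text': ...}].
outputs = [{"generated_text": "My mood is great today because the sun is out."}]
continuation = split_continuation(
    "My mood is great today because", outputs[0]["generated_text"]
)
print(continuation)
```

The text-generation pipeline also accepts `return_full_text=False`, which should make this post-processing step unnecessary.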

### Mask filling

In [29]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("I always feel very when <mask> comes to my house.", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'score': 0.057944636791944504,
  'token': 24,
  'token_str': ' it',
  'sequence': 'I always feel very when it comes to my house.'},
 {'score': 0.04559348523616791,
  'token': 951,
  'token_str': ' someone',
  'sequence': 'I always feel very when someone comes to my house.'}]
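
Each prediction is a dictionary holding the score, the token id, the decoded token, and the filled-in sentence. Note that `token_str` carries a leading space here (`' it'`), which reflects how this tokenizer encodes word boundaries. A small sketch, using the two predictions above as literal data, that produces clean (token, score) pairs. The `summarize` name and the rounding are illustrative choices.

```python
# Predictions copied from the fill-mask output above; no model is loaded here.
predictions = [
    {"score": 0.057944636791944504, "token": 24, "token_str": " it",
     "sequence": "I always feel very when it comes to my house."},
    {"score": 0.04559348523616791, "token": 951, "token_str": " someone",
     "sequence": "I always feel very when someone comes to my house."},
]


def summarize(predictions, digits=3):
    """Strip tokenizer whitespace and round scores for display."""
    return [(p["token_str"].strip(), round(p["score"], digits)) for p in predictions]


pairs = summarize(predictions)
print(pairs)
```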

In [35]:
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-cased')
unmasker("I want to make [MASK] with.")

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'score': 0.44350960850715637,
  'token': 1567,
  'token_str': 'love',
  'sequence': 'I want to make love with.'},
 {'score': 0.2273048311471939,
  'token': 1149,
  'token_str': 'out',
  'sequence': 'I want to make out with.'},
 {'score': 0.10833299905061722,
  'token': 1146,
  'token_str': 'up',
  'sequence': 'I want to make up with.'},
 {'score': 0.03333759680390358,
  'token': 2053,
  'token_str': 'friends',
  'sequence': 'I want to make friends with.'},
 {'score': 0.02144251950085163,
  'token': 12237,
  'token_str': 'babies',
  'sequence': 'I want to make babies with.'}]
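
The two fill-mask cells use different placeholders: `<mask>` for distilroberta-base and `[MASK]` for bert-base-cased. The sketch below builds the prompt from a template so the same sentence can be reused across models. The `MASK_TOKENS` table and the `build_prompt` helper are assumptions for illustration. With a loaded pipeline, the token can be read from `unmasker.tokenizer.mask_token` instead of being hard-coded.

```python
# Mask placeholders as used in the two cells above.
MASK_TOKENS = {
    "distilbert/distilroberta-base": "<mask>",
    "bert-base-cased": "[MASK]",
}


def build_prompt(template, model_name):
    """Substitute the model-specific mask token into a '{mask}' template."""
    return template.replace("{mask}", MASK_TOKENS[model_name])


template = "I want to make {mask} with."
bert_prompt = build_prompt(template, "bert-base-cased")
roberta_prompt = build_prompt(template, "distilbert/distilroberta-base")
print(bert_prompt)
print(roberta_prompt)
```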

In [None]:
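from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")
text = "Replace me by any text you'd like."
# Tokenize to PyTorch tensors (input_ids, token_type_ids, attention_mask)
encoded_input = tokenizer(text, return_tensors="pt")
# Forward pass; output.last_hidden_state holds one vector per token
output = model(**encoded_input)
# Decode the ids back to text to see the special tokens the tokenizer added
tokenizer.decode(encoded_input["input_ids"][0])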


"[CLS] Replace me by any text you'd like. [SEP]"

### Named entity recognition

In [40]:
from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cuda:0


[{'entity_group': 'PER',
  'score': np.float32(0.9981694),
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': np.float32(0.9796019),
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': np.float32(0.9932106),
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
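
The `start` and `end` fields are character offsets into the input string, so each entity can be recovered by slicing the original text. A minimal sketch, using the output above as literal data (with the `np.float32` scores written as plain floats), that checks the offsets and groups entities by type. The `by_type` name is an illustrative choice.

```python
from collections import defaultdict

text = "My name is Sylvain and I work at Hugging Face in Brooklyn."
# Entities copied from the NER output above; no model is loaded here.
entities = [
    {"entity_group": "PER", "score": 0.9981694, "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "score": 0.9796019, "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "score": 0.9932106, "word": "Brooklyn", "start": 49, "end": 57},
]

grouped = defaultdict(list)
for ent in entities:
    # Slice the source text with the reported character offsets.
    span = text[ent["start"]:ent["end"]]
    grouped[ent["entity_group"]].append(span)

by_type = dict(grouped)
print(by_type)
```

Slicing by offset rather than relying on `word` keeps the original casing and spacing of the source text, which can matter when the tokenizer normalizes its input.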

### Automatic speech recognition

In [1]:
from transformers import pipeline

transcriber = pipeline(
    task="automatic-speech-recognition", model="openai/whisper-base.en"
)
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
# Output: {'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

Device set to use cuda:0


{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}