## HuggingFace Transformers


#### Import Libraries

In [4]:
import os
os.environ["TRANSFORMERS_NO_TF"] = "1"
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

In [5]:
#!pip install transformers
#!pip install -U "torch>=2.2" "transformers>=4.42" "huggingface_hub>=0.23"

#### Creating a sentiment classifier pipeline

In [6]:
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

sentiment_classifier = pipeline(
    task="text-classification",   # same as sentiment-analysis
    model=model,
    tokenizer=tok,
    framework="pt",
)

Device set to use cpu


In [17]:
print(sentiment_classifier("bro forgot violence can be the big funny"))

[{'label': 'NEGATIVE', 'score': 0.8438990712165833}]
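A score of about 0.84 is noticeably lower than what this model returns for clear-cut sentences, so it can be useful to flag such predictions instead of trusting the label outright. The snippet below is a minimal sketch of that idea. The helper name `label_with_threshold` and the 0.9 cutoff are assumptions made for illustration and are not part of the `transformers` API. It works on the list-of-dicts format the pipeline printed above, so it runs without loading a model.

```python
def label_with_threshold(predictions, threshold=0.9):
    """Return each predicted label, or "UNCERTAIN" when its score is below threshold.

    `predictions` is a list of {"label": ..., "score": ...} dicts, the same
    shape the text-classification pipeline printed above.
    The 0.9 default is an arbitrary illustrative cutoff, not a recommended value.
    """
    results = []
    for pred in predictions:
        if pred["score"] >= threshold:
            results.append(pred["label"])
        else:
            results.append("UNCERTAIN")
    return results


# Output copied from the cell above
predictions = [{"label": "NEGATIVE", "score": 0.8438990712165833}]
print(label_with_threshold(predictions))
```

In practice the cutoff would need tuning on labelled examples from the target domain.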


#### Named Entity Recognition

In [8]:
ner = pipeline("ner", model = "dslim/bert-base-NER")

Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the chec

In [9]:
ner("Her name is Anna and she works in New York City for Morgan Stanley")

[{'entity': 'B-PER',
  'score': np.float32(0.9954881),
  'index': 4,
  'word': 'Anna',
  'start': 12,
  'end': 16},
 {'entity': 'B-LOC',
  'score': np.float32(0.99960667),
  'index': 9,
  'word': 'New',
  'start': 34,
  'end': 37},
 {'entity': 'I-LOC',
  'score': np.float32(0.9993955),
  'index': 10,
  'word': 'York',
  'start': 38,
  'end': 42},
 {'entity': 'I-LOC',
  'score': np.float32(0.9995803),
  'index': 11,
  'word': 'City',
  'start': 43,
  'end': 47},
 {'entity': 'B-ORG',
  'score': np.float32(0.9957462),
  'index': 13,
  'word': 'Morgan',
  'start': 52,
  'end': 58},
 {'entity': 'I-ORG',
  'score': np.float32(0.9979346),
  'index': 14,
  'word': 'Stanley',
  'start': 59,
  'end': 66}]
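The output above is per token: `B-` marks the beginning of an entity and `I-` a continuation, so "New York City" arrives as three separate entries. The pipeline can group these itself if it is created with `aggregation_strategy="simple"`. The sketch below shows the same grouping done by hand on the printed output, using only the `entity`, `start` and `end` fields. The helper name `merge_entities` is hypothetical, and the merging rule is a simplification: it joins an `I-` token to the previous entity whenever the types match, without checking that the two are adjacent in the text.

```python
def merge_entities(text, tokens):
    """Group consecutive B-/I- tagged tokens into whole entities.

    `tokens` is a list of dicts with "entity", "start" and "end" keys,
    as returned by the NER pipeline without aggregation.
    """
    entities = []
    for tok in tokens:
        # "B-LOC" -> prefix "B", kind "LOC"
        prefix, _, kind = tok["entity"].partition("-")
        last = entities[-1] if entities else None
        if prefix == "I" and last is not None and last["type"] == kind:
            # Continuation of the previous entity: extend its span
            last["end"] = tok["end"]
        else:
            # "B-" tag (or an "I-" tag with nothing to attach to): start a new entity
            entities.append({"type": kind, "start": tok["start"], "end": tok["end"]})
    # Recover the surface text from the character offsets
    for ent in entities:
        ent["text"] = text[ent["start"]:ent["end"]]
    return entities


text = "Her name is Anna and she works in New York City for Morgan Stanley"
# Tags and offsets copied from the output above (scores omitted)
tokens = [
    {"entity": "B-PER", "start": 12, "end": 16},
    {"entity": "B-LOC", "start": 34, "end": 37},
    {"entity": "I-LOC", "start": 38, "end": 42},
    {"entity": "I-LOC", "start": 43, "end": 47},
    {"entity": "B-ORG", "start": 52, "end": 58},
    {"entity": "I-ORG", "start": 59, "end": 66},
]
merged = merge_entities(text, tokens)
for ent in merged:
    print(ent["type"], ent["text"])
```

Slicing the original string by character offsets, rather than joining the `word` fields, keeps the spacing of the source text.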

#### Zero-shot text classification

In [10]:
zeroshot_classifier = pipeline("zero-shot-classification", model = "facebook/bart-large-mnli")

Device set to use cpu


In [11]:
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']

In [12]:
zeroshot_classifier(sequence_to_classify, candidate_labels)

{'sequence': 'one day I will see the world',
 'labels': ['travel', 'dancing', 'cooking'],
 'scores': [0.9938650727272034, 0.0032737981528043747, 0.0028610387817025185]}
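Two things are worth noticing in this result. The labels come back sorted by score rather than in the order they were passed in, and with the pipeline's default single-label setting (`multi_label=False`) the scores are normalized across the candidate labels, so they sum to roughly 1. The sketch below checks both on the dictionary printed above, without loading the model. The variable names are only illustrative.

```python
# Result copied from the cell above
result = {
    "sequence": "one day I will see the world",
    "labels": ["travel", "dancing", "cooking"],
    "scores": [0.9938650727272034, 0.0032737981528043747, 0.0028610387817025185],
}

# Pair each label with its score for easier lookup
label_scores = dict(zip(result["labels"], result["scores"]))

# The best label is the one with the highest score
top_label = max(label_scores, key=label_scores.get)

# With multi_label=False the scores form a distribution over the candidates
total = sum(result["scores"])

print("top label:", top_label)
print("sum of scores:", round(total, 6))
print("sorted descending:", result["scores"] == sorted(result["scores"], reverse=True))
```

Because the scores are relative to the candidate set, a high score only means the label fits better than the other candidates, not that it fits well in absolute terms.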

In [18]:
sequence_to_classify = "I love this new smartphone, it's amazing!"
candidate_labels = ['technology', 'cooking', 'dancing']

In [19]:
zeroshot_classifier(sequence_to_classify, candidate_labels)

{'sequence': "I love this new smartphone, it's amazing!",
 'labels': ['technology', 'dancing', 'cooking'],
 'scores': [0.9925756454467773, 0.0044857775792479515, 0.0029386423993855715]}

In [20]:
print(sentiment_classifier(sequence_to_classify))

[{'label': 'POSITIVE', 'score': 0.9998770952224731}]
