In [1]:
!pip install "transformers[sentencepiece]"



In [2]:
from transformers import pipeline


**Inference**

In [3]:
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.




In [4]:
classifier("i am very excited to learn AI and its applications")

[{'label': 'POSITIVE', 'score': 0.9997367262840271}]
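
The warning printed when the pipeline was created recommends naming the model and revision explicitly. A minimal sketch of that, assuming the default checkpoint and revision reported in the warning are the ones wanted; the helper name `load_sentiment_classifier` and the two constants are hypothetical, introduced only for illustration:

```python
# Checkpoint and revision copied from the warning message above.
MODEL_NAME = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def load_sentiment_classifier():
    """Build the sentiment pipeline with a pinned model and revision."""
    # Imported lazily so the constants can be inspected without loading the library.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME, revision=MODEL_REVISION)
```

Calling `load_sentiment_classifier()` should behave like `pipeline('sentiment-analysis')` did here, but without depending on whatever the library's default happens to be.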

**Multiple Sentences**

In [5]:
sentences = ["I hate to go to church","i love to play football"]
classifier(sentences)

[{'label': 'NEGATIVE', 'score': 0.9925185441970825},
 {'label': 'POSITIVE', 'score': 0.9996190071105957}]
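
The pipeline returns one result per input, in the same order as the inputs. A small sketch of pairing each sentence with its prediction, using the results copied from the output above rather than recomputing them:

```python
sentences = ["I hate to go to church", "i love to play football"]

# Results copied from the notebook output above (not recomputed here).
results = [
    {"label": "NEGATIVE", "score": 0.9925185441970825},
    {"label": "POSITIVE", "score": 0.9996190071105957},
]

# zip() relies on the results being in the same order as the inputs.
paired = [
    (sentence, result["label"], round(result["score"], 4))
    for sentence, result in zip(sentences, results)
]

for sentence, label, score in paired:
    print(f"{label:8s} {score:.4f}  {sentence}")
```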

**Zero-Shot Classification**

In [6]:
classifier = pipeline('zero-shot-classification', model='MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli')

In [7]:
classifier("this is about artificial intelligence and Machine Learning", candidate_labels=['education', 'sports', 'business'])

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


{'sequence': 'this is about artificial intelligence and Machine Learning',
 'labels': ['business', 'education', 'sports'],
 'scores': [0.43000081181526184, 0.366953045129776, 0.20304615795612335]}
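
The result lists the candidate labels sorted by score, with `labels` and `scores` aligned by position. A short sketch of reading it, using the dictionary copied from the output above; in this default single-label mode the scores are expected to sum to roughly 1:

```python
# Result copied from the notebook output above (not recomputed here).
result = {
    "sequence": "this is about artificial intelligence and Machine Learning",
    "labels": ["business", "education", "sports"],
    "scores": [0.43000081181526184, 0.366953045129776, 0.20304615795612335],
}

# labels[i] belongs to scores[i], so the two lists can be zipped into a mapping.
label_scores = dict(zip(result["labels"], result["scores"]))

top_label = max(label_scores, key=label_scores.get)
total = sum(result["scores"])

print(top_label, round(label_scores[top_label], 3))
print("sum of scores:", round(total, 6))
```

Note that the top score here is only about 0.43, so the model is not strongly committed to any of the three candidate labels for this sentence.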

**Text Generation**

In [8]:
generator = pipeline('text-generation', model='distilgpt2')

In [10]:
generator("in this generative AI course, we will teach you how to")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'in this generative AI course, we will teach you how to generate objects based on a finite set of possible inputs. The course will offer you the required questions, concepts, and general overviews of AI technology. Each course will bring you the basics'}]
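
The `generated_text` field contains the prompt followed by the model's continuation. A minimal sketch of separating the two, using the output copied from above; the exact text may differ between runs if sampling is enabled:

```python
prompt = "in this generative AI course, we will teach you how to"

# Output copied from the notebook cell above (not regenerated here).
output = [{
    "generated_text": (
        "in this generative AI course, we will teach you how to generate objects "
        "based on a finite set of possible inputs. The course will offer you the "
        "required questions, concepts, and general overviews of AI technology. "
        "Each course will bring you the basics"
    )
}]

full_text = output[0]["generated_text"]

# The generated text starts with the prompt, so slicing it off leaves the continuation.
continuation = full_text[len(prompt):] if full_text.startswith(prompt) else full_text

print(continuation.strip())
```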

**Named Entity Recognition (NER)**

In [11]:
classifier = pipeline('ner')

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).




In [13]:
sentences = ["My HP office is in UK","My DOB is 4th jan 2000"]
classifier(sentences)

[[{'entity': 'I-ORG',
   'score': 0.9957131,
   'index': 2,
   'word': 'HP',
   'start': 3,
   'end': 5},
  {'entity': 'I-LOC',
   'score': 0.9997191,
   'index': 6,
   'word': 'UK',
   'start': 19,
   'end': 21}],
 []]
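
Each entity carries `start` and `end` character offsets into the original sentence, and the second sentence yields an empty list because no entities were detected in it. A small sketch of using the offsets to recover the entity text, with the entities copied from the output above:

```python
sentence = "My HP office is in UK"

# Entities copied from the notebook output above (not recomputed here).
entities = [
    {"entity": "I-ORG", "score": 0.9957131, "index": 2, "word": "HP", "start": 3, "end": 5},
    {"entity": "I-LOC", "score": 0.9997191, "index": 6, "word": "UK", "start": 19, "end": 21},
]

# start/end are character offsets, so plain string slicing recovers the surface text.
spans = [(sentence[e["start"]:e["end"]], e["entity"]) for e in entities]

for text, tag in spans:
    print(f"{tag}: {text}")
```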

**Unwrap the pipeline**

**1. Preprocessing the text**

In [11]:
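# Let's understand the tokenization task
from transformers import AutoTokenizer
# Use the same checkpoint as the sentiment-analysis pipeline so the tokenizer matches the model
model_checkpoint = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)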



In [12]:
# Let's see how this tokenizer works
raw_input = ["i am very excited for this ML lecture", "the instructor is really bad"]
inputs = tokenizer(raw_input, padding=True, truncation=True, return_tensors="pt")

print(inputs)

{'input_ids': tensor([[  101,  1045,  2572,  2200,  7568,  2005,  2023, 19875,  8835,   102],
        [  101,  1996,  9450,  2003,  2428,  2919,   102,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]])}
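
In the output above, the shorter sentence is padded with `0` up to the length of the longer one, and `attention_mask` marks real tokens with `1` and padding with `0`. The following is a plain-Python sketch of that padding step, not the tokenizer's actual implementation; the token IDs are copied from the output, and `PAD_ID = 0` is inferred from it:

```python
# Unpadded token IDs, copied from the tokenizer output above.
sequences = [
    [101, 1045, 2572, 2200, 7568, 2005, 2023, 19875, 8835, 102],
    [101, 1996, 9450, 2003, 2428, 2919, 102],
]

PAD_ID = 0  # assumption: the padding ID seen in the printed input_ids

max_len = max(len(seq) for seq in sequences)

# Pad every sequence on the right up to the longest one in the batch.
input_ids = [seq + [PAD_ID] * (max_len - len(seq)) for seq in sequences]

# 1 marks a real token, 0 marks padding the model should ignore.
attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]

print(input_ids[1])
print(attention_mask[1])
```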


**2. Feeding the input to the model**

In [13]:
from transformers import AutoModel
# AutoModel loads only the base transformer (no classification head), so it returns hidden states
model_checkpoint = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(model_checkpoint)


In [14]:
outputs = model(**inputs)
print(outputs)

BaseModelOutput(last_hidden_state=tensor([[[ 3.9578e-01, -6.0497e-02,  8.1651e-01,  ...,  3.5094e-01,
           9.4447e-01, -4.7624e-01],
         [ 9.0318e-01,  2.8437e-01,  7.7795e-01,  ...,  1.8319e-01,
           9.6942e-01, -6.0611e-02],
         [ 5.9134e-01,  2.0105e-01,  8.1381e-01,  ...,  1.6735e-01,
           7.6185e-01, -2.0699e-01],
         ...,
         [ 4.0831e-01, -1.6857e-01,  9.3144e-01,  ...,  3.9092e-01,
           8.7775e-01, -1.8266e-01],
         [ 1.3429e-01, -1.7171e-01,  1.0390e+00,  ...,  5.3714e-01,
           7.6197e-01, -4.2492e-02],
         [ 1.3907e+00,  1.3152e-01,  5.3266e-01,  ...,  7.4328e-01,
           2.3623e-01, -5.0415e-01]],

        [[-8.3076e-01,  6.2542e-01,  8.8852e-02,  ..., -4.3956e-02,
          -9.4242e-01, -4.4228e-01],
         [-8.8570e-01,  6.5761e-01,  3.3794e-02,  ..., -1.5362e-01,
          -9.8196e-01, -3.7572e-01],
         [-5.6697e-01,  6.1117e-01,  6.8196e-02,  ..., -3.1362e-01,
          -9.9811e-01, -2.3049e-01],
     

In [15]:
from transformers import AutoModelForSequenceClassification
# AutoModelForSequenceClassification includes the classification head, so the output contains logits
model_checkpoint = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)


In [16]:
outputs = model(**inputs)
print(outputs.logits)

tensor([[-3.4720,  3.6573],
        [ 4.7657, -3.8259]], grad_fn=<AddmmBackward0>)


In [17]:
import torch
# Pass dim explicitly: softmax over the last axis turns each row of logits into probabilities
torch.nn.functional.softmax(outputs.logits, dim=-1)

tensor([[8.0066e-04, 9.9920e-01],
        [9.9981e-01, 1.8563e-04]], grad_fn=<SoftmaxBackward0>)
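
To see what softmax does to each row of logits, here is a plain-Python sketch of the same calculation. The logits are copied from the printed output above, where they are rounded to four decimals, so the results only approximately match the tensor shown:

```python
import math


def softmax(row):
    """Convert one row of logits into probabilities that sum to 1."""
    # Subtracting the max first keeps exp() numerically stable without changing the result.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the printed outputs.logits above (rounded values).
logits = [[-3.4720, 3.6573], [4.7657, -3.8259]]

probs = [softmax(row) for row in logits]

for row in probs:
    print([f"{p:.4e}" for p in row])
```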

In [18]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}
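
The final step of the pipeline is to map the highest-scoring class index to its label. A minimal sketch of that, using the logits and the `id2label` mapping copied from the outputs above; softmax is not needed for this step because it preserves the ordering of the logits:

```python
# Mapping copied from model.config.id2label above.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

raw_input = ["i am very excited for this ML lecture", "the instructor is really bad"]

# Logits copied from the printed outputs.logits above (rounded values).
logits = [[-3.4720, 3.6573], [4.7657, -3.8259]]

# argmax over each row gives the predicted class index.
predicted_ids = [max(range(len(row)), key=row.__getitem__) for row in logits]
predictions = [id2label[i] for i in predicted_ids]

for sentence, label in zip(raw_input, predictions):
    print(f"{label}: {sentence}")
```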