## Hugging Face pre-trained transformer tutorial

In [1]:
!pip install transformers -q

In [2]:
from transformers import pipeline

### Sentiment analysis

In [3]:
classif = pipeline("sentiment-analysis", device=0)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

Device set to use cuda:0
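The warning above advises against relying on the default model. A minimal sketch of one way to pin it: the model name and revision are copied from that warning, while `PINNED`, `pick_device` and `build_classifier` are hypothetical helper names introduced here for illustration. The device helper assumes that `device=-1` selects the CPU, so the notebook can still run without a GPU.

```python
# Model name and revision are copied from the warning printed above.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "714eb0f",
}


def pick_device():
    """Return 0 (first GPU) when CUDA is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily so the helper can be defined before transformers is installed.
    from transformers import pipeline

    return pipeline(device=pick_device(), **PINNED)
```

Calling `classif = build_classifier()` should then give a classifier equivalent to the one created above, without depending on the library's default model.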


In [5]:
# run the pipeline on sample text data
classif("It was a very bad movie i watched")

[{'label': 'NEGATIVE', 'score': 0.9998043179512024}]
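The pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. As a small sketch of how that output might be consumed downstream, the hypothetical helper below (`confident_labels`, with an arbitrary threshold of 0.9) keeps a label only when its score is high enough. The `results` value is copied from the output above.

```python
def confident_labels(results, threshold=0.9):
    """Return each label, or "UNCERTAIN" when its score is below the threshold."""
    return [
        r["label"] if r["score"] >= threshold else "UNCERTAIN"
        for r in results
    ]


# Output of the cell above, pasted in so the sketch runs on its own.
results = [{'label': 'NEGATIVE', 'score': 0.9998043179512024}]
labels = confident_labels(results)
print(labels)
```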

### Summarization

In [10]:
summerizer = pipeline("summarization",device=0)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


In [11]:
text = """
Crop yield prediction is an important predictive analytics technique in the agriculture industry. It is an agricultural practice that can help farmers and agricultural businesses to predict crop yield in a particular season when to plant a crop, and when to harvest for better yield of crop. predictive analytics is a powerful tool that can help to improve decision-making in the agriculture industry.
In agriculture, predictive analytics techniques can be used for crop yield prediction, risk mitigation, reducing the cost of fertilizers, etc. It is used in crop yield prediction based on weather conditions, soil quality, fruit set, fruit mass, etc. It is used to optimize the usage of fertilizers and pesticides in agricultural practice. Predictive analytics is also used in applications like harvest scheduling, and predicting the risks of pesticides during crop plantations.
"""

In [12]:
summerizer(text, max_length=50, min_length=10)

[{'summary_text': ' Predictive analytics is a powerful tool that can help to improve decision-making in the agriculture industry . In agriculture, predictive analytics techniques can be used for crop yield prediction, risk mitigation, reducing the cost of fertilizers, etc. It'}]
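The summary above ends mid-sentence ("It"), most likely because generation stopped once `max_length=50` tokens were reached. One possible workaround, sketched below with a hypothetical helper `trim_to_last_sentence`, is to cut the text back to the last sentence terminator. This is plain string handling and not a feature of the pipeline itself.

```python
def trim_to_last_sentence(summary):
    """Drop any trailing fragment after the last '.', '!' or '?'."""
    text = summary.strip()
    cut = max(text.rfind(ch) for ch in ".!?")
    # If no terminator is found, return the text unchanged.
    return text[: cut + 1] if cut != -1 else text


# Summary text copied from the output above.
summary = (
    " Predictive analytics is a powerful tool that can help to improve "
    "decision-making in the agriculture industry . In agriculture, predictive "
    "analytics techniques can be used for crop yield prediction, risk "
    "mitigation, reducing the cost of fertilizers, etc. It"
)
trimmed = trim_to_last_sentence(summary)
print(trimmed)
```

Raising `max_length` is the other obvious option, at the cost of a longer summary.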

### Text generation

In [13]:
generator = pipeline("text-generation",device=0)

generator("In this notebook, I am using hugging face transformers to... ")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this notebook, I am using hugging face transformers to... \xa0change my face back and forth in both the body and the soul. \xa0The hand is a different kind of transformation - one that feels really tight/small and has the'}]
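Note that `generated_text` starts with the prompt itself. If only the continuation is needed, it can be sliced off, as in the sketch below (`strip_prompt` is a hypothetical helper; the strings are copied from the cell above). The text-generation pipeline also accepts a `return_full_text=False` argument that should have a similar effect, though that is not exercised here.

```python
def strip_prompt(prompt, generated_text):
    """Return only the newly generated part of the text."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    # str.strip() also removes the non-breaking spaces (\xa0) seen in the output.
    return generated_text.strip()


prompt = "In this notebook, I am using hugging face transformers to... "
generated_text = (
    "In this notebook, I am using hugging face transformers to... "
    "\xa0change my face back and forth in both the body and the soul. "
    "\xa0The hand is a different kind of transformation - one that feels "
    "really tight/small and has the"
)
continuation = strip_prompt(prompt, generated_text)
print(continuation)
```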

## Using the AutoTokenizer and AutoModel classes

In [14]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

In [24]:
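model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

In [25]:
# Load the tokenizer and the classification model from the same checkpoint.
# A separate model_name variable keeps this cell safe to re-run.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)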

In [18]:
print(model)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)


In [19]:
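# Tokenize the sentence and return PyTorch tensors
inputs = tokenizer("I watched a movie that was terrible", return_tensors="pt")
# Inference only, so gradient tracking is not needed
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the class with the highest logit and map it to its label
predicted_class_id = logits.argmax(dim=-1).item()
model.config.id2label[predicted_class_id]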

'NEGATIVE'
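The manual route returns only a label, whereas the pipeline earlier also reported a `score`. That score is normally the softmax probability of the winning class. The sketch below shows the calculation in plain Python so it runs without a model. The logit values are made up for illustration, and the `id2label` mapping is assumed to be `{0: "NEGATIVE", 1: "POSITIVE"}`.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sentence, ordered as [NEGATIVE, POSITIVE].
example_logits = [4.3, -3.5]
# Assumed label mapping; in the notebook it comes from model.config.id2label.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
prediction = {"label": id2label[best], "score": probs[best]}
print(prediction)
```

With the real model, `torch.softmax(logits, dim=-1)` applied to the `logits` tensor should give the same kind of probabilities.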