# Transformers
- **What can they do?**

https://huggingface.co/course/chapter1/3?fw=pt

Transformer models are used to solve all kinds of NLP tasks. The following is a list of common NLP tasks, with some examples of each:

- **Classifying whole sentences**: Getting the sentiment of a review, detecting if an email is spam, determining if a sentence is grammatically correct or whether two sentences are logically related or not
- **Classifying each word in a sentence**: Identifying the grammatical components of a sentence (noun, verb, adjective), or the named entities (person, location, organization)
- **Generating text content**: Completing a prompt with auto-generated text, filling in the blanks in a text with masked words
- **Extracting an answer from a text**: Given a question and a context, extracting the answer to the question based on the information provided in the context
- **Generating a new sentence from an input text**: Translating a text into another language, summarizing a text

#### Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [1]:
!pip install datasets evaluate transformers[sentencepiece]

Collecting datasets
  Downloading datasets-2.5.1-py3-none-any.whl (431 kB)
Collecting evaluate
  Downloading evaluate-0.2.2-py3-none-any.whl (69 kB)
Collecting transformers[sentencepiece]
  Downloading transformers-4.22.2-py3-none-any.whl (4.9 MB)
Collecting pyarrow>=6.0.0
  Downloading pyarrow-9.0.0-cp39-cp39-win_amd64.whl (19.6 MB)
Collecting dill<0.3.6
  Downloading dill-0.3.5.1-py2.py3-none-any.whl (95 kB)
Collecting xxhash
  Downloading xxhash-3.0.0-cp39-cp39-win_amd64.whl (29 kB)
Collecting huggingface-hub<1.0.0,>=0.1.0
  Downloading huggingface_hub-0.10.0-py3-none-any.whl (163 kB)
Collecting multiprocess
  Downloading multiprocess-0.70.13-py39-none-any.whl (132 kB)
Collecting responses<0.19
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp39-cp39-win_amd64.whl (3.3 MB)
Collecting sentencepiece!=0.1.92,>=0.1.91
  Downloading sentencepiece-0.1.97-cp39-cp39-win_amd64.whl (1.1 MB)
Installing collec
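
Once the install finishes, it can help to confirm which versions ended up in the environment, since pipeline defaults vary between releases. A minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The three libraries this notebook installs.
print(installed_versions(["datasets", "evaluate", "transformers"]))
```

A `None` entry means the package is not visible to the running kernel, which usually points to an install into a different environment.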

### Working with Pipelines 
The most basic object in the Transformers library is the `pipeline()` function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.

**`pipeline()` is a shortcut: instead of writing the preprocessing and postprocessing steps ourselves, we call the pipeline directly.**

## Classification 

In [1]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


In [3]:
classifier("I have been waiting for a HuggingFace course my whole life")

[{'label': 'POSITIVE', 'score': 0.9450720548629761}]

In [4]:
classifier("Edureka instructors was very good explaining in a understandable way")

[{'label': 'POSITIVE', 'score': 0.9990474581718445}]

In [9]:
classifier("The movie was excellent but it is too lengthy")

[{'label': 'NEGATIVE', 'score': 0.9977620840072632}]

In [6]:
classifier("Modi is developing Gujarat state comparing to other states")

[{'label': 'POSITIVE', 'score': 0.9944183826446533}]

In [7]:
classifier("Modi is developing only Gujarat state comparing to other states")  # compare with the previous statement: adding "only" flips the label

[{'label': 'NEGATIVE', 'score': 0.615805983543396}]

In [10]:
classifier(["Modi developing only one state in all of 28 states","Gujarat state is developing"])

[{'label': 'NEGATIVE', 'score': 0.9317912459373474},
 {'label': 'POSITIVE', 'score': 0.9975851774215698}]
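
The score is the probability the model assigns to the label it picked, so the 0.62 on the "only Gujarat" sentence is a much weaker prediction than the 0.99 on the sentence without "only". A small sketch of how low-confidence predictions could be flagged for manual review; the helper `flag_uncertain` and the 0.9 cutoff are assumptions for illustration, and the results are copied from the outputs above:

```python
# Results copied from the classifier outputs shown earlier in this notebook.
results = {
    "Modi is developing Gujarat state comparing to other states":
        {"label": "POSITIVE", "score": 0.9944183826446533},
    "Modi is developing only Gujarat state comparing to other states":
        {"label": "NEGATIVE", "score": 0.615805983543396},
}


def flag_uncertain(results, threshold=0.9):
    """Return the texts whose predicted label scored below `threshold` (an assumed cutoff)."""
    return [text for text, result in results.items() if result["score"] < threshold]


print(flag_uncertain(results))
# ['Modi is developing only Gujarat state comparing to other states']
```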

### What is a Pipeline?

By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the classifier object. If you rerun the command, the cached model will be used instead and there is no need to download the model again.
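
The log printed when the classifier was created recommends naming the model and revision explicitly rather than relying on the default. A minimal sketch of doing so, reusing the checkpoint name and revision reported in that log; the wrapper `load_classifier` is a hypothetical helper, and the import sits inside it so that nothing is downloaded until it is called:

```python
# Checkpoint and revision taken from the log message printed when the default model was loaded.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def load_classifier(model_name=MODEL_NAME, revision=REVISION):
    """Build a sentiment-analysis pipeline pinned to one model and revision."""
    from transformers import pipeline  # imported lazily: requires the transformers package

    return pipeline("sentiment-analysis", model=model_name, revision=revision)


# Usage (downloads the model on the first call, then reuses the local cache):
# classifier = load_classifier()
# classifier("I have been waiting for a HuggingFace course my whole life")
```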

There are three main steps involved when you pass some text to a pipeline:

- The text is preprocessed into a format the model can understand.
- The preprocessed inputs are passed to the model.
- The predictions of the model are post-processed, so you can make sense of them.
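
The three steps above can be sketched by hand. This is not the library's internal code, only an approximation under stated assumptions: PyTorch is installed (the logs in this notebook show the TensorFlow classes instead), the checkpoint is the default one named in the log, and the function names `postprocess` and `classify` are hypothetical.

```python
import math


def postprocess(logits, id2label):
    """Step 3: turn raw model scores (logits) into a label and a probability via softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def classify(text, checkpoint="distilbert-base-uncased-finetuned-sst-2-english"):
    # Imported here so postprocess() can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    inputs = tokenizer(text, return_tensors="pt")       # Step 1: preprocess text into tensors
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()     # Step 2: run the model
    return postprocess(logits, model.config.id2label)   # Step 3: post-process the scores


# Usage:
# classify("I have been waiting for a HuggingFace course my whole life")
```

Calling `pipeline("sentiment-analysis")` bundles these stages behind a single call, which is why the earlier cells needed no tokenizer or softmax code.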

Some of the currently available pipelines are:

- **feature-extraction** (get the vector representation of a text)
- **fill-mask**
- **ner** (named entity recognition)
- **question-answering**
- **sentiment-analysis**
- **summarization**
- **text-generation**
- **translation**
- **zero-shot-classification**

Let's have a look at a few of these.

### Zero Shot Classification
- We’ll start by tackling a more challenging task where we need to classify texts that haven’t been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise.
- For this use case, the zero-shot-classification pipeline is very powerful: it allows you to specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model.
- You’ve already seen how the model can classify a sentence as positive or negative using those two labels, but it can also classify the text using any other set of labels you like.
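
The zero-shot pipeline relies on a natural language inference (NLI) model: each candidate label is turned into a hypothesis sentence, and the model scores how strongly the input text entails it. The sketch below reproduces only the hypothesis-building step, assuming the library's default template `"This example is {}."`; the helper `build_hypotheses` is hypothetical and not part of Transformers:

```python
def build_hypotheses(candidate_labels, template="This example is {}."):
    """Create one NLI hypothesis sentence per candidate label."""
    return [template.format(label) for label in candidate_labels]


hypotheses = build_hypotheses(["education", "politics", "business"])
for hypothesis in hypotheses:
    print(hypothesis)
# This example is education.
# This example is politics.
# This example is business.
```

The wording can be changed by passing a `hypothesis_template` argument when calling the classifier, which may help when the labels do not read naturally in the default sentence.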


In [13]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to roberta-large-mnli and revision 130fb28 (https://huggingface.co/roberta-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


Downloading:   0%|          | 0.00/1.43G [00:00<?, ?B/s]

ValueError: Could not load model roberta-large-mnli with any of the following classes: (<class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForSequenceClassification'>, <class 'transformers.models.roberta.modeling_tf_roberta.TFRobertaForSequenceClassification'>).

In [5]:
classifier("This is a course about the Transformers library",
           candidate_labels=["education", "politics", "business"])

NameError: name 'classifier' is not defined

In [None]:
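classifier("Stephen Hawking said we can do time travel through a black hole",
           candidate_labels=["science", "politics", "business"])  # candidate_labels is required; these labels are only illustrative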

In [None]:
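classifier("The Godfather movie is all about politics and the Ghost movie is a suspense thriller; both have good ratings",
           candidate_labels=["movies", "politics", "sports"])  # candidate_labels is required; these labels are only illustrative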

In [None]:
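classifier("This Diwali I should buy crackers in bulk, but where should I find wholesale shops?",
           candidate_labels=["shopping", "festival", "travel"])  # candidate_labels is required; these labels are only illustrative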


### Text Generation
- Now let’s see how to use a pipeline to generate some text. The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text.
- This is similar to the predictive text feature that is found on many phones. Text generation involves randomness, so it’s normal if you don’t get the same results as shown below.

In [None]:
from transformers import pipeline

generator = pipeline("text-generation")
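
Because generation is sampled, fixing the random seed is one way to get repeatable output while experimenting. A hedged sketch: `set_seed`, `max_length`, `num_return_sequences` and `do_sample` are existing Transformers names, while the wrapper `generate`, the seed value and the prompt are assumptions for illustration. The default model is downloaded on first use.

```python
def generate(prompt, seed=42, max_length=30, num_return_sequences=2):
    """Generate a few completions of `prompt` with a fixed random seed."""
    from transformers import pipeline, set_seed  # imported lazily: requires transformers

    set_seed(seed)  # seeds Python, NumPy and the installed deep learning framework
    generator = pipeline("text-generation")
    outputs = generator(
        prompt,
        max_length=max_length,                      # total length in tokens, prompt included
        num_return_sequences=num_return_sequences,  # how many completions to sample
        do_sample=True,                             # sample instead of greedy decoding
    )
    return [output["generated_text"] for output in outputs]


# Usage (illustrative prompt):
# for text in generate("In this course, we will teach you how to"):
#     print(text)
```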