<a href="https://colab.research.google.com/github/Behnaz81/MachineLearningDaily/blob/main/day10_transformer_models/MachineLearning10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformer Models

## What is NLP?

Natural language processing (NLP) is a field of linguistics and machine learning focused on understanding everything related to human language. The aim of NLP tasks is not only to understand individual words, but also to understand the context in which those words appear.

## What is an LLM?

A large language model (LLM) is an AI model trained on massive amounts of text data that can understand and generate human-like text, recognize patterns in language, and perform a wide variety of language tasks without task-specific training. LLMs represent a significant advancement in the field of natural language processing (NLP).

## What can transformers do?

The most basic object in the Transformers library is the `pipeline()` function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer:

In [None]:
from transformers import pipeline

In [None]:
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9598049521446228}]
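
As a rough illustration of the three stages that `pipeline()` chains together (preprocessing, model, postprocessing), here is a toy sketch. Everything in it is a hypothetical stand-in: `toy_tokenize`, `toy_model`, `postprocess`, and the tiny word lists are invented for illustration and are not part of the Transformers library. Only the overall shape mirrors the description above.

```python
import math

# Hypothetical toy word lists (not taken from any real model).
POSITIVE_WORDS = {"love", "great", "waiting"}
NEGATIVE_WORDS = {"hate", "awful", "boring"}


def toy_tokenize(text):
    """Preprocessing: lowercase the text and split it into word tokens."""
    return [word.strip(".,!?") for word in text.lower().split()]


def toy_model(tokens):
    """Model: return raw scores (logits) for [NEGATIVE, POSITIVE]."""
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    return [float(negative), float(positive)]


def postprocess(logits):
    """Postprocessing: softmax the logits and return the best label."""
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    labels = ["NEGATIVE", "POSITIVE"]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three stages, as a real pipeline does internally."""
    return postprocess(toy_model(toy_tokenize(text)))


result = toy_pipeline("I love this course!")
print(result["label"])
```

A real pipeline replaces each stand-in with a learned tokenizer and a neural network, but the caller still only passes text in and gets a labeled score out.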

### Zero-shot classification

Zero-shot classification lets you specify the candidate labels at inference time, so you don't need a model that was fine-tuned on those specific labels.

In [None]:
classifier = pipeline("zero-shot-classification")
classifier(
    ["This is a course about the Transformer library", "I love money!"],
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'sequence': 'This is a course about the Transformer library',
  'labels': ['education', 'business', 'politics'],
  'scores': [0.9567621946334839, 0.031608760356903076, 0.011629056185483932]},
 {'sequence': 'I love money!',
  'labels': ['business', 'politics', 'education'],
  'scores': [0.7989078164100647, 0.1051742434501648, 0.09591798484325409]}]
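
The default model named in the log, `facebook/bart-large-mnli`, is a natural language inference model. One way such a model can be used for zero-shot classification is sketched below, assuming each candidate label is turned into a hypothesis sentence and the per-label entailment scores are normalized with a softmax. The template string and the logits are made up for illustration; a real model would produce the logits.

```python
import math


def build_hypotheses(labels, template="This example is about {}."):
    """Turn each candidate label into a hypothesis sentence (illustrative template)."""
    return [template.format(label) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    exps = [math.exp(x) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return {
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


candidate_labels = ["education", "politics", "business"]
hypotheses = build_hypotheses(candidate_labels)

# Made-up entailment logits, one per hypothesis.
fake_logits = [3.0, -1.0, 0.0]
result = rank_labels(candidate_labels, fake_logits)
print(hypotheses[0])
print(result["labels"])
```

As in the pipeline output above, the labels come back sorted by score and the scores sum to 1.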

### Text generation

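You provide a prompt and the model auto-completes it by generating the remaining text. Note that `max_length` counts the prompt tokens as well as the generated ones, whereas `max_new_tokens` counts only the generated ones. As the warning in the output below shows, when both are set `max_new_tokens` takes precedence, which is why the generated text is much longer than 15 tokens.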

In [None]:
generator = pipeline("text-generation")
generator("In this course, we will teach you how to", num_return_sequences=2, max_length=15)

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=15) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'In this course, we will teach you how to create an effective, effective, fast and efficient system of monitoring and evaluating the Internet. It will also use those systems to inform and plan for new and unforeseen problems.\n\nWe will create a system based on the concepts of "monitoring and evaluating the Internet", that is, a system where the Internet is monitored, monitored, monitored, monitored.\n\nOur system will be based on the principles of "monitoring and evaluating the Internet".\n\n"Monitoring and evaluating the Internet" is a term commonly used to describe a system that is designed to measure the amount of traffic flowing through the Internet.\n\n"Monitoring" is a term used to describe a system that is designed to measure the amount of traffic flowing through the Internet.\n\n"Monitoring" is a term used to describe a system that is designed to measure the amount of traffic flowing through the Internet.\n\nThe "Internet" is a physical and digital environm

### Using any model from the Hub in a pipeline

The previous examples used each task's default model, but you can pass any suitable model from the Hub through the `model` argument.

In [None]:
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'In this course, we will teach you how to do it. We will teach you what you need to do and what the tools that we will use are.\n\nWhat is a database?\n\nA database is an organized collection of information. It is a place where you can store information about things.\n\nA database can be a website, a program on your computer, or a file on your hard drive.\n\nWhy do we need databases?\n\nWe use databases to store information. We use them for a lot of different things.\n\nOne of the things that we use them for is to store information about things that we do. For example, if we want to store information about our pets, we would use a database to store information about their names, their ages, and their breeds.\n\nAnother thing that we use them for is to store information about things that we want to buy. For example, if we want to buy a new pair of shoes, we would use a database to store information about the prices of those shoes.\n\nAnd one of the most important thi

In [None]:
from transformers import pipeline, AutoTokenizer, GPT2LMHeadModel
tokenizer = AutoTokenizer.from_pretrained('flax-community/gpt2-medium-persian')
model = GPT2LMHeadModel.from_pretrained('flax-community/gpt2-medium-persian')
generator = pipeline('text-generation', model, tokenizer=tokenizer, config={'max_length':100})
generated_text = generator('در یک اتفاق شگفت انگیز، پژوهشگران')

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:5 for open-end generation.


In [None]:
generated_text

[{'generated_text': 'در یک اتفاق شگفت انگیز، پژوهشگران دانشگاه\u200cام آی تی نوعی پوست الکترونیکی ابداع کرده\u200cاند که می\u200cتواند به طور بالقوه برای درمان زخم\u200cهای پوستی به کار گرفته شود. به گزارش ایسنا و به نقل از گیزمگ، زخم\u200cهای پوستی، رایج\u200cترین نوع از آسیب\u200cهای پوستی هستند که به دلیل جراحت یا آسیب به بافت\u200cهای بدن ایجاد می\u200cشوند. این زخم\u200cها می\u200cتوانند در طول زمان، وخیم\u200cتر شوند و به بافت\u200cهای زیرین\u200cتر پوست آسیب برسانند. به همین دلیل، پژوهشگران دانشگاه\u200cام آی تی یک پوست الکترونیکی ابداع کرده\u200cاند که قادر است از زخم\u200cهای پوستی به عنوان یک ابزار تشخیصی استفاده کند. این پوست الکترونیکی، می\u200cتواند به طور بالقوه برای درمان زخم\u200cهای پوستی به کار گرفته شود. این پوست الکترونیکی که "پوست الکترونیکی"( e - wear ) نامیده می\u200cشود، می\u200cتواند به طور بالقوه برای درمان زخم\u200cهای پوستی به کار گرفته شود. این پوست الکترونیکی از پوست انسان گرفته شده و می\u200cتواند برای تشخیص زخم\u200cهای پوستی مورد استفاده قرار گیرد. به گ

In [None]:
from transformers import pipeline, AutoTokenizer, GPT2LMHeadModel
tokenizer = AutoTokenizer.from_pretrained('bolbolzaban/gpt2-persian')
model = GPT2LMHeadModel.from_pretrained('bolbolzaban/gpt2-persian')
generator = pipeline('text-generation', model, tokenizer=tokenizer, config={'max_length':256})
sample = generator('در یک اتفاق شگفت انگیز، پژوهشگران')

Device set to use cpu


In [None]:
sample

[{'generated_text': 'در یک اتفاق شگفت انگیز، پژوهشگران به گزارش خبرنگار گروه استان\u200cهای باشگاه خبرنگاران جوان از قزوین ؛ مخاطبان صداوسیمای مرکز قزوین می\u200cتوانند برنامه\u200cهای مورد نظر خود را با مراجعه به این جدول پیگیری کنند.'}]

### Mask filling

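The idea of this task is to fill in the blanks in a given text. The `top_k` argument controls how many candidates are returned. The mask token depends on the model: the default model below expects `<mask>`, while `bert-base-cased` expects `[MASK]`.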

In [None]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


[{'score': 0.19619767367839813,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052715748548508,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

In [None]:
unmasker = pipeline('fill-mask', model='bert-base-cased')
unmasker("This course will teach you all about [MASK] models.", top_k=2)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


[{'score': 0.2596322000026703,
  'token': 1648,
  'token_str': 'role',
  'sequence': 'This course will teach you all about role models.'},
 {'score': 0.09427239000797272,
  'token': 1103,
  'token_str': 'the',
  'sequence': 'This course will teach you all about the models.'}]
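
The effect of `top_k` can be sketched without a model: rank the candidate tokens by probability, keep the first `k`, and substitute each into the text. The probabilities below are made up for illustration, and `top_k_candidates` and `fill` are hypothetical helper names rather than library functions.

```python
def top_k_candidates(token_probs, k):
    """Return the k most probable (token, probability) pairs, best first."""
    ranked = sorted(token_probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def fill(template, token, mask_token="<mask>"):
    """Substitute the predicted token for the mask token in the text."""
    return template.replace(mask_token, token)


# Made-up probabilities for the masked position; a real model scores its whole vocabulary.
fake_probs = {"mathematical": 0.20, "computational": 0.04, "role": 0.01, "the": 0.005}
template = "This course will teach you all about <mask> models."

for token, prob in top_k_candidates(fake_probs, k=2):
    print(round(prob, 2), fill(template, token))
```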

### Named entity recognition

Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.

In [None]:
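# aggregation_strategy="simple" groups sub-word tokens into whole entities
# (it replaces the deprecated grouped_entities=True).
ner = pipeline("ner", aggregation_strategy="simple")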
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


[{'entity_group': 'PER',
  'score': np.float32(0.9981694),
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': np.float32(0.9796019),
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': np.float32(0.9932106),
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
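
The `start` and `end` fields in the output above are character offsets into the input string. A small check, with the spans copied from that output and the scores omitted, shows that they behave like a Python slice:

```python
text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Entity spans copied from the pipeline output above (scores omitted).
entities = [
    {"entity_group": "PER", "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "word": "Brooklyn", "start": 49, "end": 57},
]

for entity in entities:
    # `start` is inclusive and `end` is exclusive, like a Python slice.
    span = text[entity["start"]:entity["end"]]
    print(entity["entity_group"], span)

recovered = [text[e["start"]:e["end"]] for e in entities]
```

This makes it easy to highlight or replace entities in the original text without searching for the words again.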

### Question answering

The `question-answering` pipeline answers questions using information from a given context:

In [None]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


{'score': 0.6949766278266907, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
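
The answer above is a span extracted from the context rather than newly generated text. A minimal sketch of span selection follows, assuming the usual extractive setup in which the model scores every token as a possible answer start and as a possible answer end. The scores are made up, the context is split on whitespace instead of using a real tokenizer, and `best_span` is a hypothetical helper.

```python
def best_span(start_scores, end_scores, max_answer_len=5):
    """Return (start, end) token indices maximizing start_score + end_score."""
    best, best_score = (0, 0), float("-inf")
    for start, s_score in enumerate(start_scores):
        # Only consider spans that end at or after the start and are not too long.
        for end in range(start, min(start + max_answer_len, len(end_scores))):
            score = s_score + end_scores[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


context_tokens = "My name is Sylvain and I work at Hugging Face in Brooklyn".split()

# Made-up scores: high start score on "Hugging", high end score on "Face".
start_scores = [0.0] * len(context_tokens)
end_scores = [0.0] * len(context_tokens)
start_scores[context_tokens.index("Hugging")] = 5.0
end_scores[context_tokens.index("Face")] = 5.0

start, end = best_span(start_scores, end_scores)
answer = " ".join(context_tokens[start:end + 1])
print(answer)
```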

### Summarization

Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text.

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]

### Translation

In [None]:
translator = pipeline("translation", model="omid-ebi/mT5_base_translation_English_to_Persian-Farsi")
translator("This is a github repository")

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Device set to use cpu


[{'translation_text': 'این مطلب توسط مؤسسه ی گوردون است.'}]

In [None]:
translator("In this session we will learn about NLP")

[{'translation_text': 'در این جلسه ما درباره ی NLP یاد می کنیم.'}]

## How do Transformers work?

_Pretraining_ is the act of training a model from scratch: the weights are randomly initialized, and training starts without any prior knowledge. Pretraining is usually done on a very large corpus of data, and it can take up to several weeks.

_Fine-tuning_, on the other hand, is the training done **after** a model has been pretrained. To perform fine-tuning, you first acquire a pretrained language model, then perform additional training with a dataset specific to your task.
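
The difference between the two can be illustrated with a deliberately tiny analogy: a one-parameter model `y = weight * x` trained by gradient descent. This is only a sketch under simplifying assumptions. The data, the learning rate, and the `train` and `loss` helpers are invented for illustration and bear no resemblance to how real language models are trained at scale.

```python
def train(weight, data, lr=0.01, steps=50):
    """Plain gradient descent on mean squared error for the model y = weight * x."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def loss(weight, data):
    """Mean squared error of the model y = weight * x on the given data."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)


# "Pretraining": more data, many steps, starting from an arbitrary weight.
pretraining_data = [(float(x), 2.0 * x) for x in range(1, 11)]
pretrained_weight = train(weight=-5.0, data=pretraining_data, steps=200)

# "Fine-tuning": a few examples of a related task (y = 2.2x) and only a few steps.
task_data = [(1.0, 2.2), (2.0, 4.4), (3.0, 6.6)]
fine_tuned = train(weight=pretrained_weight, data=task_data, steps=5)
from_scratch = train(weight=-5.0, data=task_data, steps=5)

print(loss(fine_tuned, task_data) < loss(from_scratch, task_data))
```

With the same small budget of task data and steps, starting from the pretrained weight ends much closer to the task's solution than starting from scratch.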



### General Transformer architecture

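The Transformer model is primarily composed of two blocks:

- **Encoder**: Receives an input and builds a representation of it (its features).
- **Decoder**: Uses the encoder's representation, along with other inputs, to generate a target sequence.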

Each of these parts can be used independently, depending on the task:

- **Encoder-only models**: Good for tasks that require understanding of the input, such as sentence classification.

- **Decoder-only models**: Good for generative tasks like text generation.

- **Encoder-decoder models**: Good for generative tasks that require an input, such as translation or summarization.

## How Transformers solve tasks

### How language models work

There are two main approaches to training a transformer model:

1. **Masked language modeling (MLM)**: Some input tokens are masked and the model predicts them, which lets it learn bidirectional context. Used by encoder models like BERT.

2. **Causal language modeling (CLM)**: The model can only use context from the left (previous tokens) to predict the next token. Used by decoder models like GPT.
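
The two objectives differ in what the model sees and what it must predict. The sketch below builds training examples for each from a toy token list. The function names are hypothetical, and real training masks tokens at random over subword tokens rather than at a chosen index.

```python
def masked_lm_example(tokens, mask_index, mask_token="[MASK]"):
    """MLM: hide one token; the model sees context on both sides and predicts it."""
    inputs = list(tokens)
    target = inputs[mask_index]
    inputs[mask_index] = mask_token
    return inputs, target


def causal_lm_examples(tokens):
    """CLM: at each position, predict the next token from the left context only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


tokens = ["the", "cat", "sat", "down"]

mlm_inputs, mlm_target = masked_lm_example(tokens, mask_index=1)
clm_pairs = causal_lm_examples(tokens)

print(mlm_inputs, "->", mlm_target)
for context, next_token in clm_pairs:
    print(context, "->", next_token)
```

In the MLM example the model can use "sat down" to the right of the mask, while every causal example contains only tokens to the left of the target.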

### Types of language models

1. **Encoder-only models** (like BERT): These models use a bidirectional approach to understand context. They're suited for tasks that require deep understanding of text like classification.

2. **Decoder-only models** (like GPT): These models process text from left to right and are particularly good at text generation tasks like completing sentences.

3. **Encoder-decoder models** (like T5): These models combine both approaches, using an encoder to understand the input and a decoder to generate output. They excel at sequence-to-sequence tasks like translation.



## Inference with LLMs

Inference is the process of using a trained LLM to generate human-like text from a given input prompt.

### The Role of Attention

When predicting the next word, not every word in a sentence carries equal weight. The model learns to give more weight to the words that matter most for the prediction, and this ability to focus on relevant information is what we call attention.
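
A minimal sketch of how such weights can be computed, assuming scaled dot-product attention (the variant used in the original Transformer). The two-dimensional vectors are made up for illustration; real models learn much larger query and key vectors.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention: softmax(q . k / sqrt(d)) over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


# Made-up vectors for three context words.
words = ["The", "capital", "of"]
keys = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]  # stands in for what the model is looking for at this step

weights = attention_weights(query, keys)
for word, weight in zip(words, weights):
    print(word, round(weight, 3))
```

The word whose key vector aligns best with the query ("capital" in this toy setup) receives the largest weight.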

### The Two-Phase Inference Process

Let's dive into how LLMs actually generate text. The process can be broken into two main phases: prefill and decode.

#### The Prefill Phase

The prefill phase is where the input prompt is processed and made ready for generation. It involves three key steps:

1. **Tokenization**: Converting the input text into tokens.

2. **Embedding Conversion**: Transforming these tokens into numerical representations that capture their meaning.

3. **Initial Processing**: Running the embeddings through the model's neural networks to create a rich understanding of the context.

This phase processes all input tokens at once, which makes it compute-intensive.
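
The three prefill steps can be sketched with toy stand-ins. The vocabulary, the embedding table, and the averaging in `initial_processing` are all hypothetical: real models use learned subword tokenizers, learned embeddings, and a stack of Transformer layers instead of a simple average.

```python
# Hypothetical toy vocabulary and embedding table (real models learn these).
vocab = {"<unk>": 0, "hello": 1, "world": 2}
embedding_table = [
    [0.0, 0.0],   # <unk>
    [0.5, -0.1],  # hello
    [0.3, 0.8],   # world
]


def tokenize(text):
    """Step 1: split the text into tokens and map each one to an integer ID."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


def embed(token_ids):
    """Step 2: look up one vector per token ID."""
    return [embedding_table[token_id] for token_id in token_ids]


def initial_processing(embeddings):
    """Step 3 (stand-in): average the vectors as a crude summary of the context."""
    dims = len(embeddings[0])
    return [sum(vec[d] for vec in embeddings) / len(embeddings) for d in range(dims)]


# All prompt tokens are handled in one pass, as in the prefill phase.
token_ids = tokenize("Hello world again")
context = initial_processing(embed(token_ids))
print(token_ids)
```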

#### The Decode Phase

The decode phase involves several key steps that happen for each new token:

1. **Attention Computation**: Looking back at all previous tokens to understand context.

2. **Probability Calculation**: Determining the likelihood of each possible next token.

3. **Token Selection**: Choosing the next token based on these probabilities.

4. **Continuation Check**: Deciding whether to continue or stop generation.

This phase is memory-intensive because the model has to keep track of all previously generated tokens and their relationships.
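
The decode loop can be sketched with a toy "model". The lookup table below is hypothetical and conditions only on the last token, whereas a real LLM attends to all previous tokens. Token selection here is greedy (always the most probable token), which is one of several possible strategies.

```python
import math

EOS = "<eos>"

# Hypothetical next-token scores keyed by the previous token.
NEXT_TOKEN_LOGITS = {
    "we": {"will": 2.0, "teach": 0.5, EOS: -1.0},
    "will": {"teach": 2.5, "will": -2.0, EOS: 0.0},
    "teach": {"you": 3.0, EOS: 0.5},
    "you": {EOS: 2.0, "will": 0.1},
}


def next_token_probs(tokens):
    """Steps 1-2: read the context and compute a probability per candidate token."""
    logits = NEXT_TOKEN_LOGITS[tokens[-1]]
    total = sum(math.exp(value) for value in logits.values())
    return {token: math.exp(value) / total for token, value in logits.items()}


def greedy_decode(prompt_tokens, max_new_tokens=10):
    """Generate one token per iteration until EOS or the token budget is reached."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        choice = max(probs, key=probs.get)  # Step 3: greedy token selection
        if choice == EOS:                   # Step 4: continuation check
            break
        tokens.append(choice)
    return tokens


generated = greedy_decode(["we"])
print(" ".join(generated))
```

Generation stops either when the end-of-sequence token is selected or when `max_new_tokens` is exhausted, whichever comes first.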





# Exercises

Ex1. An example showing `pipeline()` working on different tasks.

In [1]:
from transformers import pipeline

# Sentiment analysis
sentiment = pipeline("sentiment-analysis")
print(sentiment("I love learning NLP with Hugging Face!"))

# Text generation
generator = pipeline("text-generation", model="gpt2")
print(generator("In the future, artificial intelligence will", max_length=30, num_return_sequences=1))

# Fill mask
fill_mask = pipeline("fill-mask")
print(fill_mask("Machine learning is <mask>."))

# Named Entity Recognition
ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True
print(ner("Hugging Face is based in New York and works with Google."))


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9997377991676331}]


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'generated_text': 'In the future, artificial intelligence will be able to detect and exploit a range of problems and solve them by the power of its algorithms. This is the kind of breakthrough that will be the envy of computer scientists, and will be something that will be taken into the next generation of computers in the 21st century.\n\nThe technology is already being developed by companies such as Autodesk, which is building a high-performance computer. It is currently being developed by Deep Blue. They have already been working on chips and hardware for their technology.\n\nWhat is the significance of this technological advance?\n\nAs mentioned, this is the "first generation of artificial intelligence", as opposed to the current generation of artificial intelligence. In the next generation, we are looking at a "computer with a lot of power, but can do anything, and can\'t be manipulated by a human being". This means that the human brain is more capable than ever before.\n\nThis 

Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu
No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.030492667108774185, 'token': 4499, 'token_str': ' essential', 'sequence': 'Machine learning is essential.'}, {'score': 0.02547847479581833, 'token': 14007, 'token_str': ' evolving', 'sequence': 'Machine learning is evolving.'}, {'score': 0.02419901080429554, 'token': 2247, 'token_str': ' powerful', 'sequence': 'Machine learning is powerful.'}, {'score': 0.02404754050076008, 'token': 762, 'token_str': ' key', 'sequence': 'Machine learning is key.'}, {'score': 0.02292276732623577, 'token': 25107, 'token_str': ' ubiquitous', 'sequence': 'Machine learning is ubiquitous.'}]


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


[{'entity_group': 'ORG', 'score': np.float32(0.95679855), 'word': 'Hugging Face', 'start': 0, 'end': 12}, {'entity_group': 'LOC', 'score': np.float32(0.99887264), 'word': 'New York', 'start': 25, 'end': 33}, {'entity_group': 'ORG', 'score': np.float32(0.99925536), 'word': 'Google', 'start': 49, 'end': 55}]


Ex2. For each of the `sentiment`, `ner` and `fill-mask` models, explain what it does and what it is used for.


`sentiment` models detect whether a sentence has a positive or negative meaning. They can be used to extract the feelings expressed in human speech or writing.

`ner` models detect the entities in a sentence. Knowing which words are entities is essential for understanding what a sentence means.

`fill-mask` models fill in a missing part of an incomplete sentence. For example, they can help programmers code faster and more efficiently by completing their code.

Ex3. Try the code in Ex1 with a Persian text and see the results.

In [4]:
sentiment = pipeline("sentiment-analysis")
print(sentiment("من عاشق برنامه نویسی هستم!"))

generator = pipeline("text-generation", model="gpt2")
print(generator("در آینده هوش مصنوعی", max_length=30, num_return_sequences=1))

fill_mask = pipeline("fill-mask")
print(fill_mask("یادگیری ماشین خیلی <mask>."))

ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True
print(ner("شرکت ما در تهران قرار دارد و با گوگل همکاری می‌کند"))

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.7739375829696655}]


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'generated_text': 'در آینده هوش مصنوعی انتیرش تاقریه بنصر هوش مصنوعی انتیرش تاقریه هوش مصنوعی انتیرش تاقریه هوش مصنوعی انتیرش تاقریه هوش مصنوعی انتیرش تاقریه هوش مصنوعی انتیرش تاقریه هوش مصنوعی انتیرش تاقریه هوش مصنوعی انتیرش تاقریه هوش مصنوعی ا'}]


Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu
No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.22545477747917175, 'token': 30902, 'token_str': '\u200e', 'sequence': 'یادگیری ماشین خیلی\u200e.'}, {'score': 0.16830460727214813, 'token': 39004, 'token_str': 'د', 'sequence': 'یادگیری ماشین خیلید.'}, {'score': 0.1358729749917984, 'token': 29438, 'token_str': 'ا', 'sequence': 'یادگیری ماشین خیلیا.'}, {'score': 0.07894087582826614, 'token': 38605, 'token_str': 'ت', 'sequence': 'یادگیری ماشین خیلیت.'}, {'score': 0.07097788900136948, 'token': 40637, 'token_str': 'س', 'sequence': 'یادگیری ماشین خیلیس.'}]


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'entity_group': 'LOC', 'score': np.float32(0.43965322), 'word': 'ش', 'start': 0, 'end': 1}, {'entity_group': 'LOC', 'score': np.float32(0.57393897), 'word': 'تهران', 'start': 11, 'end': 16}, {'entity_group': 'LOC', 'score': np.float32(0.32940742), 'word': '##ر', 'start': 20, 'end': 21}, {'entity_group': 'ORG', 'score': np.float32(0.35137004), 'word': 'د', 'start': 22, 'end': 23}, {'entity_group': 'LOC', 'score': np.float32(0.2865751), 'word': '##د', 'start': 25, 'end': 26}, {'entity_group': 'LOC', 'score': np.float32(0.5538217), 'word': 'گ', 'start': 32, 'end': 33}, {'entity_group': 'LOC', 'score': np.float32(0.43421817), 'word': '##گل همکار', 'start': 34, 'end': 42}, {'entity_group': 'MISC', 'score': np.float32(0.31301263), 'word': '##ی', 'start': 42, 'end': 43}]


As you can see, the pipelines did not work well on Persian, because the default checkpoints were trained on English text. To process Persian text, we can use models pretrained on Persian and fine-tune them on a dataset prepared for our specific purpose.
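As a rough sketch of the first step (swapping in a checkpoint that has seen Persian), the snippet below passes an explicit `model` to `pipeline()`. The choice of `xlm-roberta-base`, a multilingual rather than Persian-specific checkpoint, is an assumption made for illustration, and the helper names `top_fill` and `load_multilingual_fill_mask` are hypothetical. No fine-tuning is shown here.

```python
def top_fill(fill_mask, sentence, mask_token):
    """Swap the "<mask>" placeholder for the model's own mask token and
    return the highest-scoring (token_str, score) prediction."""
    predictions = fill_mask(sentence.replace("<mask>", mask_token))
    best = max(predictions, key=lambda p: p["score"])
    return best["token_str"], best["score"]


def load_multilingual_fill_mask(model_name="xlm-roberta-base"):
    """Build a fill-mask pipeline for an explicitly named checkpoint.

    Assumption: xlm-roberta-base is only an example of a multilingual model;
    a Persian-specific checkpoint could be passed instead.
    """
    # Imported here so that top_fill can be used without loading transformers.
    from transformers import pipeline
    return pipeline("fill-mask", model=model_name)


def demo():
    """Downloads the checkpoint on first use, so it is not called automatically."""
    fill_mask = load_multilingual_fill_mask()
    # Same sentence as in Ex3: "Machine learning is very <mask>."
    return top_fill(fill_mask, "یادگیری ماشین خیلی <mask>.", fill_mask.tokenizer.mask_token)
```

Calling `demo()` in a notebook cell runs the Persian sentence from Ex3 through the multilingual model; how good the prediction is would still need to be checked, and fine-tuning on task-specific data remains the route described above.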