
### **Pipeline Function**

In [7]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")

check_text = classifier([
    "the movie was good",
    "they tried in the movie but the actors where really bad"
])

print(check_text)

Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9998517036437988}, {'label': 'NEGATIVE', 'score': 0.9997195601463318}]
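
The pipeline returns one dictionary per input, in the same order as the inputs. A minimal sketch of pairing each sentence with its prediction, using the output printed above as literal data. The `label_texts` helper and its `threshold` parameter are illustrative names chosen here, not part of the `transformers` API:

```python
# Inputs and the predictions printed above, copied as literal data.
texts = [
    "the movie was good",
    "they tried in the movie but the actors where really bad",
]
predictions = [
    {"label": "POSITIVE", "score": 0.9998517036437988},
    {"label": "NEGATIVE", "score": 0.9997195601463318},
]


def label_texts(texts, predictions, threshold=0.9):
    """Pair each text with its label; mark low-confidence results as UNSURE.

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    labelled = []
    for text, pred in zip(texts, predictions):
        label = pred["label"] if pred["score"] >= threshold else "UNSURE"
        labelled.append((text, label))
    return labelled


for text, label in label_texts(texts, predictions):
    print(f"{label:<8} {text}")
```

Raising `threshold` is one simple way to route borderline predictions to manual review instead of trusting the label outright.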


### **Zero-Shot Classification**

In [11]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="sileod/deberta-v3-base-tasksource-nli")

text_1 = "Real Madrid will be playing against Barcelona tomorrow"
text_2 = "The election is taking place tomorrow afternoon"
text_3 = "A new programming language will be launched tomorrow"
text_4 = "I have an exam to take in school tomorrow"

label_data = ["sport", "politics", "education", "technology"]
check_classifier = classifier(text_4, label_data)

print(check_classifier)

Device set to use cpu
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


{'sequence': 'I have an exam to take in school tomorrow', 'labels': ['education', 'technology', 'sport', 'politics'], 'scores': [0.9379044771194458, 0.03358963504433632, 0.015404932200908661, 0.013100950047373772]}
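
The result lists the candidate labels sorted by descending score, and in this run the scores sum to roughly 1. A small sketch of reading the top label from the dictionary printed above (`top_label` is an illustrative helper, not a library function):

```python
import math

# Result printed above, copied as literal data.
result = {
    "sequence": "I have an exam to take in school tomorrow",
    "labels": ["education", "technology", "sport", "politics"],
    "scores": [
        0.9379044771194458,
        0.03358963504433632,
        0.015404932200908661,
        0.013100950047373772,
    ],
}


def top_label(result):
    """Return the (label, score) pair with the highest score."""
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


label, score = top_label(result)
print(f"{label}: {score:.3f}")

# Check that the scores behave like a distribution over the candidate labels.
print(math.isclose(sum(result["scores"]), 1.0, abs_tol=1e-6))
```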


### **The Text Generation Pipeline**

In [14]:
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

text = "I will learn how to"

check_generator = generator(text, max_length=30, num_return_sequences=5)

print(check_generator)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': "I will learn how to fix this issue. It's a huge step. I'm sure you'll find out why, but keep in mind that to do this, you will have to make a lot of changes and I'm sure it may take a while.\n\nI'm sure you won't agree with me on this, but this is what I've been doing for a while now – I've started trying to fix this issue, but now that I've taken over the development and testing, I'm really enjoying the process. I'm very happy with the result.\n\nSo here's the thing. I'm not going to lie – I know that I've been looking forward to this for a while, and I've done a lot of work to get this fix out there. I'm very happy with it, and I'm really happy with it's release.\n\nI know that, in the first place, I've had a lot of great people, and I wish them the best in their job and in the future, and I have a lot of friends that I've had a lot of great conversations with.\n\nI also know that, in the beginning of this process, I was going to do a lot of personal things on my
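
Each returned item holds the prompt followed by the model's continuation under `generated_text`. The log above also reports that `max_new_tokens` (=256) took precedence over `max_length=30`, which is why the text runs far past 30 tokens. A minimal sketch of separating the continuation from the prompt, using only the opening of the first result printed above (`split_continuation` is an illustrative helper name):

```python
prompt = "I will learn how to"
# Opening of the first generated text printed above (the rest is omitted here).
generated_text = "I will learn how to fix this issue. It's a huge step."


def split_continuation(prompt, generated_text):
    """Return only the text the model appended after the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # If the prompt is not a prefix, return the text unchanged.
    return generated_text


print(split_continuation(prompt, generated_text))
```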

### **The Fill-Mask Pipeline**

In [25]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
text = "This course will teach you all about <mask> programming."
check_unmasker = unmasker(text, top_k=2)

print(check_unmasker)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.28929978609085083, 'token': 12628, 'token_str': ' functional', 'sequence': 'This course will teach you all about functional programming.'}, {'score': 0.05025694519281387, 'token': 31886, 'token_str': ' Python', 'sequence': 'This course will teach you all about Python programming.'}]
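
Each candidate carries the predicted token and the full sentence with that token substituted for `<mask>`. Note that `<mask>` is the mask token of this RoBERTa-style model; BERT-style checkpoints use `[MASK]` instead. The leading space in `token_str` appears to reflect how this tokenizer encodes the preceding space. A short sketch, using the output printed above as literal data, that rebuilds each `sequence` by hand (`fill` is an illustrative helper):

```python
text = "This course will teach you all about <mask> programming."
# Candidates printed above, copied as literal data.
candidates = [
    {"score": 0.28929978609085083, "token": 12628, "token_str": " functional",
     "sequence": "This course will teach you all about functional programming."},
    {"score": 0.05025694519281387, "token": 31886, "token_str": " Python",
     "sequence": "This course will teach you all about Python programming."},
]


def fill(text, token_str, mask="<mask>"):
    """Substitute one candidate token into the first masked position."""
    return text.replace(mask, token_str.strip(), 1)


for cand in candidates:
    # Each rebuilt sentence should match the `sequence` field of the candidate.
    print(f"{cand['score']:.3f}  {fill(text, cand['token_str'])}")
```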


### **NER (Named Entity Recognition)**

In [27]:
from transformers import pipeline

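# aggregation_strategy="simple" merges word pieces into whole entities;
# it replaces the deprecated grouped_entities=True flag.
ner = pipeline("ner", aggregation_strategy="simple")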

text = "My name is Benjamin, i work at Curaflux and Google, i live in Canada and i am from Nigeria"

check_ner = ner(text)

print(check_ner)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu

[{'entity_group': 'PER', 'score': np.float32(0.9994967), 'word': 'Benjamin', 'start': 11, 'end': 19}, {'entity_group': 'ORG', 'score': np.float32(0.98472315), 'word': 'Curaflux', 'start': 31, 'end': 39}, {'entity_group': 'ORG', 'score': np.float32(0.9993425), 'word': 'Google', 'start': 44, 'end': 50}, {'entity_group': 'LOC', 'score': np.float32(0.9997743), 'word': 'Canada', 'start': 62, 'end': 68}, {'entity_group': 'LOC', 'score': np.float32(0.99979967), 'word': 'Nigeria', 'start': 83, 'end': 90}]
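
The `start` and `end` fields are character offsets into the input string, so each entity can be recovered by slicing the original text. A minimal sketch that groups the entities printed above by type (`group_entities` is an illustrative helper, and the scores are written as plain floats rather than `np.float32`):

```python
from collections import defaultdict

text = "My name is Benjamin, i work at Curaflux and Google, i live in Canada and i am from Nigeria"
# Entities printed above; scores are plain floats here instead of np.float32.
entities = [
    {"entity_group": "PER", "score": 0.9994967, "word": "Benjamin", "start": 11, "end": 19},
    {"entity_group": "ORG", "score": 0.98472315, "word": "Curaflux", "start": 31, "end": 39},
    {"entity_group": "ORG", "score": 0.9993425, "word": "Google", "start": 44, "end": 50},
    {"entity_group": "LOC", "score": 0.9997743, "word": "Canada", "start": 62, "end": 68},
    {"entity_group": "LOC", "score": 0.99979967, "word": "Nigeria", "start": 83, "end": 90},
]


def group_entities(text, entities):
    """Map each entity type to the text spans found at its offsets."""
    groups = defaultdict(list)
    for ent in entities:
        groups[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(groups)


print(group_entities(text, entities))
```

Slicing by offset rather than reading `word` keeps the original casing and spacing of the input, which matters when the tokenizer normalizes text.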


### **Summarization Pipeline**

In [3]:
from transformers import pipeline

summarizer = pipeline("summarization")
text = """
The Impact of Technology on Society

Technology has dramatically transformed the way humans live, work, and interact over the past century. From the invention of the wheel to the rise of the internet and artificial intelligence, technological advancements have continuously reshaped societies and economies around the world.

One of the most significant changes technology has brought is in communication. The development of the telephone, followed by mobile phones and the internet, has made it possible for people to connect instantly regardless of geographical distance. Social media platforms have further revolutionized communication by enabling real-time sharing of information, ideas, and opinions on a global scale. This connectivity has facilitated globalization, bringing cultures closer together and creating new opportunities for collaboration and innovation.

In the field of healthcare, technology has led to remarkable improvements in diagnosis, treatment, and patient care. Medical imaging techniques, such as MRI and CT scans, allow doctors to see inside the human body without invasive procedures. Advances in pharmaceuticals and biotechnology have resulted in new treatments for previously incurable diseases. Telemedicine has also emerged, allowing patients to consult healthcare professionals remotely, improving access for those in remote or underserved areas.

The workplace has undergone substantial transformation due to automation, robotics, and artificial intelligence. Routine and repetitive tasks can now be performed by machines, increasing efficiency and productivity. This shift has created new job opportunities in tech-related fields but has also raised concerns about job displacement and the need for workforce retraining. Remote work, enabled by digital tools and platforms, has become more prevalent, changing traditional office dynamics and work-life balance.

Education has been revolutionized by technology as well. Online learning platforms, digital textbooks, and interactive educational tools have made knowledge more accessible to people around the world. Students can learn at their own pace, access a vast array of resources, and connect with instructors and peers virtually. However, the digital divide remains a challenge, with disparities in access to technology affecting educational outcomes.

While technology has brought numerous benefits, it also poses challenges and ethical considerations. Privacy concerns have grown with the rise of data collection and surveillance technologies. The spread of misinformation and digital addiction are social issues that require careful management. Moreover, the environmental impact of producing and disposing of electronic devices is increasingly recognized as a critical problem.

In conclusion, technology continues to be a powerful driver of change in society, offering immense potential to improve quality of life. Balancing innovation with ethical responsibility and inclusivity will be essential to harnessing its benefits for all.
"""

check_summarizer = summarizer(text)

print(check_summarizer)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'summary_text': ' Technology has dramatically transformed the way humans live, work, and interact over the past century . Technology has led to remarkable improvements in diagnosis, treatment, and patient care . Remote work, enabled by digital tools and platforms, has become more prevalent, changing traditional office dynamics and work-life balance . The workplace has undergone substantial transformation due to automation, robotics, and artificial intelligence .'}]
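
The summary comes back with a leading space and a space before each full stop. A small sketch of tidying that spacing before display, using the summary printed above as literal data (`tidy` is an illustrative helper, not part of the pipeline):

```python
import re

# Summary printed above, copied as literal data.
summary_text = (
    " Technology has dramatically transformed the way humans live, work, and"
    " interact over the past century . Technology has led to remarkable"
    " improvements in diagnosis, treatment, and patient care . Remote work,"
    " enabled by digital tools and platforms, has become more prevalent,"
    " changing traditional office dynamics and work-life balance . The"
    " workplace has undergone substantial transformation due to automation,"
    " robotics, and artificial intelligence ."
)


def tidy(summary):
    """Remove whitespace before punctuation and trim the ends."""
    return re.sub(r"\s+([.,!?;:])", r"\1", summary).strip()


print(tidy(summary_text))
```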


### **The Translation Pipeline**

In [13]:
from transformers import pipeline

translate = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

text = "Pas trop bien."

check_translations = translate(text)

print(check_translations)

Device set to use cpu


[{'translation_text': 'Ho vil ikkje gi bort sine personlege data.'}]


### **Tokenization**

In [1]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

check_token = tokenizer.tokenize("Let's try to tokenize!")

print(check_token)

['let', "'", 's', 'try', 'to', 'token', '##ize', '!']
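
The `##` prefix marks a piece that continues the previous one, so `tokenize` is split into `token` and `##ize`. A rough sketch of the greedy longest-match-first idea behind WordPiece, assuming a tiny hypothetical vocabulary (`toy_vocab`) rather than the real `bert-base-uncased` vocabulary, and handling a single lowercase word rather than a full sentence:

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one lowercase word into pieces by greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation of the same word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits at this position: treat the whole word as unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"let", "s", "try", "to", "token", "##ize"}
print([wordpiece(w, toy_vocab) for w in ["let", "try", "to", "tokenize"]])
```

The real tokenizer also lowercases the input and splits off punctuation before this step, which is why `Let's` becomes `let`, `'`, `s` in the output above.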
