#  🧑‍💻 The open-source NLP toolkit 🧑‍💻

📅 _Data Science Summer School 2023, 22.08.2023_

👨‍🏫 By [Moritz Laurer](https://www.linkedin.com/in/moritz-laurer/).
For questions, reach out to: m.laurer@vu.nl


<a target="_blank" href="https://colab.research.google.com/github/MoritzLaurer/summer-school-transformers-2023/blob/main/1_open_source_toolkit.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

**Section structure**
1. The open-source ecosystem: increasing accessibility to machine learning (ML) software and hardware
2. Some simple code demonstrations
3. Q&A

**Your profiles at the Hertie Summer School:**
* Overall cohort: ~36% have prior coding experience; ~39% have "a little bit" of prior coding experience; ~25% have no prior coding experience.
* Some know Python, some R, some both. A minority knows one of: C++, JavaScript, Java, Matlab, Stata, SPSS.
* This workshop tries to be useful to both beginners and more experienced participants.
    * For **beginners**: You probably will not understand every line of code in the notebooks, but that's fine. The main learning objective for you is to get an overall feeling that you can do a lot in a few lines of code and you don't need a PhD for deep learning today. Start by simply copy-pasting my code. There are also sections that do not require any coding (e.g. data-centric AI).
    * For more **experienced participants**: Keep in mind that others might not know as much as you do and everyone is here to learn. Things will get in depth later on.

=> The overall objective is for everyone to get excited about learning more on their own. The best way of learning is getting excited and intrinsically motivated to learn more.



## 1. The open-source ML ecosystem, or the democratisation of ML software & hardware

### 1.1. What does the ML open-source ecosystem look like in 2023?

A key driver for progress in AI is the open-source community. Prominent example: [Hugging Face](https://huggingface.co) is the main platform for sharing ML models.
* Their [Transformers library](https://huggingface.co/transformers/) provides easy-to-use code for using state-of-the-art (SOTA) transformer models.
* Their [model hub](https://huggingface.co/models) provides ~300,000 models trained by NLP researchers from universities, small companies, and large companies like Microsoft, Facebook, and Google.
* Their [Datasets library](https://huggingface.co/datasets) provides ~55,000 datasets.

=> They are the de facto open-source standard for sharing and using transformer language models (alongside PyTorch and TensorFlow for more advanced users).

=> Let's look at the website (model hub, tasks & model cards): https://huggingface.co/

### 1.2. How do people access ML hardware in 2023?

Running and training Transformers requires specialised hardware, so-called GPUs (Graphics Processing Units). They were originally created for computationally expensive graphics calculations in PC gaming. They are also well suited to the calculations needed for running large AI models.

Several providers for cheap access to GPUs exist. A prominent example is Google Colaboratory ("Colab"), a programming environment which enables you to run code in the browser. The main advantages are:
* No setup on your local machine is required. Everything runs in the cloud.
* Free access to GPUs
* Easy sharing of code and text. See an introduction [here](https://colab.research.google.com/notebooks/intro.ipynb?utm_source=scs-index).

Colab is based on [Jupyter Notebooks](https://jupyter.org/) and has two main types of cells:
* code cells
* text cells

In [1]:
1+1

2

## 2. Ease-of-use: Using Transformers in 3 lines of code


**Overview of different tasks that can be automated with ML**
* Key ingredients: (1) a model trained on a specific task; (2) input data (e.g. texts or images); (3) output produced by the model.
* Transformers are currently the most popular type of deep learning algorithm. Most tasks below are solved with Transformers. There might be other types of algorithms coming up in the medium term.



**Install the Transformers library & dependencies**

In [2]:
!pip install transformers~=4.31.0  # The Transformers library from Hugging Face
!pip install sentencepiece==0.1.96  # optional tokeniser, required for some models. e.g. machine translation
!pip install wikipedia==1.4.0  # to download any text from wikipedia
# running large models with accelerate https://huggingface.co/blog/accelerate-large-models
# NOTE: we need to restart the runtime after installing accelerate
!pip install accelerate~=0.21.0

Collecting transformers~=4.31.0
  Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers~=4.31.0)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers~=4.31.0)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers~=4.31.0)
  Downloading safetensors-0.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [None]:
# automatically choose CPU or GPU for inference, depending on your hardware
#import torch
#device_id = torch.cuda.current_device() if torch.cuda.is_available() else -1
# -1 == CPU ; 0 == GPU
#print(device_id)

0


**The Hugging Face Pipeline**
* Makes automation of many NLP tasks possible in 3 lines of code
* Detailed documentation is available [here](https://huggingface.co/transformers/main_classes/pipelines.html)

In [3]:
from transformers import pipeline
import pandas as pd
import numpy as np
from pprint import pprint

### 2.1 Many models tailored to specific tasks


#### 2.1.1 Text classification

Let's search for a few popular text classification models in the [HF model hub](https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads).

In [7]:
pipeline_classification = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-irony")  # cardiffnlp/twitter-roberta-base-irony, SamLowe/roberta-base-go_emotions

Downloading (…)lve/main/config.json:   0%|          | 0.00/705 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/499M [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/150 [00:00<?, ?B/s]

In [8]:
text = "Well that workshop was totally worth my time..."  # "Well that workshop was totally worth my time..."  "This smells weird, I'm not sure if I should eat this ... Yikes, it tasted like old socks!"
output = pipeline_classification(text, top_k=10)
print(output)

[{'label': 'irony', 'score': 0.9424387812614441}, {'label': 'non_irony', 'score': 0.057561274617910385}]


In [9]:
# make output a bit cleaner
df_output = pd.DataFrame(output)
print(df_output)

       label     score
0      irony  0.942439
1  non_irony  0.057561
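
The pipeline returns a list of dictionaries, one per label, each with a `label` and a `score`. As a minimal sketch of how such output could be turned into a single decision, the helper below picks the highest-scoring label. The function name `top_prediction` and the `min_score` cut-off are illustrative assumptions and not part of the Transformers library; the scores are copied from the output above.

```python
# Output copied from the irony example above: one dict per label.
output = [
    {"label": "irony", "score": 0.9424387812614441},
    {"label": "non_irony", "score": 0.057561274617910385},
]


def top_prediction(predictions, min_score=0.5):
    """Return the highest-scoring label, or None if no label reaches min_score.

    `min_score` is a hypothetical cut-off added for illustration.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None


print(top_prediction(output))                   # irony
print(top_prediction(output, min_score=0.99))   # None
```

Returning `None` below the cut-off makes it easy to route uncertain texts to manual review instead of forcing a label.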


#### 2.1.2 Machine Translation

* Open source machine translation (MT) models enable you to translate between many different languages without Google Translate.
* The [University of Helsinki](https://huggingface.co/Helsinki-NLP) uploaded models for more than 1000 language pairs to the Hugging Face hub.
* [Facebook AI](https://huggingface.co/models?search=facebook+m2m) open-sourced several multilingual models.
* The [EasyNMT library](https://github.com/UKPLab/EasyNMT) provides an easy wrapper for all these models.
* Most machine translation models translate between two languages in one direction (e.g. German to English, but not English to German), some can translate in multiple directions.


In [10]:
# translation pipeline docs: https://huggingface.co/transformers/main_classes/pipelines.html#transformers.TranslationPipeline
pipeline_translate = pipeline("translation", model="facebook/m2m100_418M")

Downloading (…)lve/main/config.json:   0%|          | 0.00/908 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.94G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/233 [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/272 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/3.71M [00:00<?, ?B/s]

Downloading (…)tencepiece.bpe.model:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/1.14k [00:00<?, ?B/s]

In [11]:
text = "Ich bin ein Fisch"
pipeline_translate(text, src_lang="de", tgt_lang="en")

[{'translation_text': 'I am a fish'}]

In [12]:
# download any text from wikipedia, via  https://pypi.org/project/wikipedia/
import wikipedia
wikipedia.set_lang("de")

text = wikipedia.summary("Donald Trump").replace('\n', ' ')[:318]
print(f"Original text:\n{text}\n")

# translate the text from wikipedia
text_translated = pipeline_translate(text, src_lang="de", tgt_lang="en")
print(f"Translated text:\n{text_translated[0]['translation_text']}")


Original text:
Donald John Trump [ˈdɑn.əld dʒɑn tɹɐmp] (* 14. Juni 1946 in Queens, New York City, New York) ist ein US-amerikanischer Unternehmer, Entertainer und Politiker der Republikanischen Partei, der von 2017 bis 2021 der 45. Präsident der Vereinigten Staaten war. Der Rechtspopulist gilt als einer der umstrittensten Politiker

Translated text:
Donald John Trump [ˈdɑn.əld dʒɑn tɔmp] (born 14 June 1946 in Queens, New York City, New York) is an American entrepreneur, entertainer and politician of the Republican Party, who from 2017 to 2021 was the 45th President of the United States.
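
The example above cuts the Wikipedia text at a fixed 318 characters, which stops in the middle of a sentence, and the translation drops the unfinished part. One possible alternative, sketched below under simplifying assumptions, is to pack whole sentences into chunks that stay under a character budget and translate each chunk separately. The function `split_into_chunks` and its `max_chars` parameter are hypothetical helpers, and the sentence splitting is deliberately naive: it would also break after German ordinals such as "14. Juni".

```python
import re


def split_into_chunks(text, max_chars=318):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = split_into_chunks(
    "Ich bin ein Fisch. Du bist ein Vogel. Wir sind Freunde.", max_chars=40
)
print(chunks)
# ['Ich bin ein Fisch. Du bist ein Vogel.', 'Wir sind Freunde.']
```

Each chunk could then be passed to `pipeline_translate` in a loop and the translated pieces joined with a space.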


#### 2.1.3 Text Summarization

In [13]:
# docs for summarisation pipeline: https://huggingface.co/transformers/main_classes/pipelines.html#summarizationpipeline
pipeline_summarize = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")  # sshleifer/distilbart-cnn-12-6 , google/pegasus-cnn_dailymail

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.80k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

In [14]:
# download any long text from wikipedia, via  https://pypi.org/project/wikipedia/
import wikipedia
wikipedia.set_lang("en")

text_long = wikipedia.summary("Donald Trump").replace('\n', ' ')
print(f"Original text:\n{text_long}\n")

# summarise the text from wikipedia
text_summarized = pipeline_summarize(text_long, min_length=5, max_length=30)
print(f"Summarized text:\n{text_summarized[0]['summary_text']}")

Original text:
Donald John Trump (born June 14, 1946) is an American politician, media personality, and businessman who served as the 45th president of the United States from 2017 to 2021. Trump graduated from the University of Pennsylvania with a bachelor's degree in economics in 1968. He became president of his father's real-estate business in 1971 and renamed it the Trump Organization. He expanded its operations to building and renovating skyscrapers, hotels, casinos, and golf courses and later started side ventures, mostly by licensing his name. From 2004 to 2015, he co-produced and hosted the reality television series The Apprentice. He and his businesses have been plaintiff or defendant in more than 4,000 state and federal legal actions, including six business bankruptcies. Trump won the 2016 presidential election as the Republican nominee against Democratic nominee Hillary Clinton while losing the popular vote. During the campaign, his political positions were described as popul

#### 2.1.4 Named Entity Recognition

In [15]:
pipeline_ner = pipeline("token-classification", model="dslim/bert-base-NER-uncased", aggregation_strategy="simple")

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.26k [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

Some weights of the model checkpoint at dslim/bert-base-NER-uncased were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Downloading (…)okenizer_config.json:   0%|          | 0.00/39.0 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

In [18]:
import wikipedia
wikipedia.set_lang("en")

text_long = wikipedia.summary("Donald Trump").replace('\n', ' ')

output = pipeline_ner(text_long)

pd.DataFrame(output)

   entity_group     score                        word  start  end
0           PER  0.990035           donald john trump      0   17
1          MISC  0.993141                    american     45   53
2           LOC  0.993848               united states    141  154
3           PER  0.991072                       trump    174  179
4           ORG  0.678423  university of pennsylvania    199  225
5           ORG  0.817550          trump organization    357  375
6          MISC  0.959859              the apprentice    616  630
7           PER  0.994108                       trump    776  781
8          MISC  0.994913                  republican    824  834
9          MISC  0.995848                  democratic    851  861
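
For analysis, it is often handier to have one list of unique mentions per entity type than one row per mention. The sketch below shows one way to do this in plain Python on a few rows copied from the output above. The helper `group_entities` and its `min_score` filter are illustrative assumptions, not part of the pipeline.

```python
from collections import defaultdict

# A few rows copied from the NER output above.
entities = [
    {"entity_group": "PER", "score": 0.990035, "word": "donald john trump"},
    {"entity_group": "LOC", "score": 0.993848, "word": "united states"},
    {"entity_group": "PER", "score": 0.991072, "word": "trump"},
    {"entity_group": "ORG", "score": 0.678423, "word": "university of pennsylvania"},
    {"entity_group": "PER", "score": 0.994108, "word": "trump"},
]


def group_entities(entities, min_score=0.0):
    """Map each entity type to its unique mentions, in order of first appearance.

    Mentions scoring below the hypothetical `min_score` cut-off are skipped.
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score and ent["word"] not in grouped[ent["entity_group"]]:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


print(group_entities(entities, min_score=0.8))
# {'PER': ['donald john trump', 'trump'], 'LOC': ['united states']}
```

With the cut-off at 0.8, the low-confidence `ORG` mention is dropped, which illustrates how the `score` column can be used to trade recall for precision.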


### 2.2. Universal models

The models above are each tailored to **one specific task from one dataset**. The main advantage of these models is that they are very good at this specific task and perform well on this specific dataset. In practice, however, the problems you encounter will often require a slightly different task, with different category definitions or different types of texts.

Universal models can partly address this issue. They are also trained on only one task, but this task is so general that many other tasks can be reformulated as this universal task. Two examples of universal tasks are:
- Natural Language Inference (NLI): a task into which almost any classification task can be reformulated.
- Token generation: an even more universal task, since almost any text-related task can be framed as generating text.
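
To make the NLI reformulation concrete: an NLI model judges whether a "hypothesis" follows from a "premise". A classification input can therefore be turned into one premise-hypothesis pair per class, and the class whose hypothesis is most strongly entailed wins. The sketch below only builds these pairs; the function name `build_nli_pairs` and the template wording are assumptions for illustration, and no model is called.

```python
def build_nli_pairs(text, classes, hypothesis_template="This example is {}."):
    """Turn one classification input into (premise, hypothesis) pairs, one per class.

    The template wording is an illustrative choice; other phrasings are possible.
    """
    return [(text, hypothesis_template.format(label)) for label in classes]


pairs = build_nli_pairs(
    "I have not received my reimbursement yet.",
    ["payment issues", "travel advice"],
)
for premise, hypothesis in pairs:
    print(f"premise: {premise!r} | hypothesis: {hypothesis!r}")
```

Because the classes only appear inside the hypothesis text, they can be changed freely without retraining the model.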

#### Zero-shot classification

In [19]:
pipeline_zeroshot_classification = pipeline("zero-shot-classification", model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli")

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.07k [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/558M [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/463 [00:00<?, ?B/s]

Downloading spm.model:   0%|          | 0.00/4.31M [00:00<?, ?B/s]

Downloading (…)in/added_tokens.json:   0%|          | 0.00/18.0 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/156 [00:00<?, ?B/s]



In [26]:
text = "Customer: I have not received my reimbursement yet. What the hell is going on?"
classes = ['payment issues', 'travel advice', 'bug report']  # "account opening", "customer complaint"

#text = "I do not think the government is trustworthy anymore. We need to mobilize and resist!"
#classes = ["civil disobedience", "praise of the government", "travel advice"]  # "collective action"

output = pipeline_zeroshot_classification(text, classes, multi_label=True)

pd.DataFrame(data=[output["labels"], output["scores"]], index=["class", "probability"]).T


            class  probability
0  payment issues     0.991133
1      bug report     0.076115
2   travel advice     0.018696
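
With `multi_label=True`, each class is scored independently, so the probabilities do not need to sum to 1 and several classes (or none) can apply to the same text. A simple way to turn the scores into decisions is a threshold, as in the sketch below. The helper `select_labels` and the threshold values are illustrative assumptions; the scores are copied from the output above.

```python
# Output copied from the zero-shot example above (same keys as the pipeline's dict).
output = {
    "labels": ["payment issues", "bug report", "travel advice"],
    "scores": [0.991133, 0.076115, 0.018696],
}


def select_labels(output, threshold=0.5):
    """Keep every label whose independent probability reaches the threshold."""
    return [
        label
        for label, score in zip(output["labels"], output["scores"])
        if score >= threshold
    ]


print(select_labels(output))                  # ['payment issues']
print(select_labels(output, threshold=0.05))  # ['payment issues', 'bug report']
```

A suitable threshold depends on the use case and is best chosen by checking a sample of manually labelled texts.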


#### Zero-shot learning with large generative models and prompts (LLMs)

In [5]:
# info on GPU
!nvidia-smi
# info on available ram
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('\n\nYour runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

Mon Aug 21 16:03:18 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P0    25W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
# connecting to my google drive to load a large model from disk instead of downloading it
from google.colab import drive
import os
drive.mount('/content/drive', force_remount=False)

print(os.getcwd())
os.chdir("/content/drive/My Drive/PhD/generative-models")
print(os.getcwd())

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content
/content/drive/My Drive/PhD/generative-models


In [3]:
# if this doesn't work, try restarting the runtime
from transformers import pipeline
import torch

model_name = "flan-t5-xl"  # use "google/flan-t5-xl" to download the model

pipeline_zeroshot_prompting = pipeline(
    "text2text-generation",  # "text2text-generation", "text-generation"
    model=model_name,  device_map="auto",  #device=device_id,
    torch_dtype=torch.bfloat16,  #load_in_8bit=True,
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

**A note on hardware and using LLMs:**

The main disadvantage of generative LLMs is that they are very large and inefficient, so they need expensive hardware to run. You will not be able to run models larger than ~7 billion parameters on the standard T4 GPU from the free version of Google Colab. There are different solutions to this:
* Use better **hardware**. The main factor for loading a large model is GPU memory (often loosely called "RAM"). The standard T4 GPU has 16 GB of memory, which is often not enough. With Colab Pro, you can get A100 GPUs with 40 GB of memory. This GPU can load larger models, but is more expensive.
* **Software solutions**: There are also software solutions like lowering precision or quantization. Making these solutions work takes some testing and depends on the model, so we will not cover this here.
* Hugging Face has some **documentation** [here](https://huggingface.co/docs/transformers/performance).
* Also consider using **smaller models**. For many use-cases, you might not need a very large LLM for your task.
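
A rough back-of-the-envelope calculation shows why the parameter count and the numeric precision matter: the memory needed for the weights alone is approximately the number of parameters times the bytes per parameter. The sketch below is only an estimate under that assumption. It ignores activations, generation caches and framework overhead, so real usage is higher. The helper name `weights_gb` is hypothetical.

```python
# Bytes needed to store one parameter at different precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weights_gb(n_params, dtype="bfloat16"):
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"7B parameters in {dtype}: {weights_gb(7e9, dtype):.0f} GB")
# 7B parameters in float32: 28 GB
# 7B parameters in bfloat16: 14 GB
# 7B parameters in int8: 7 GB
```

Comparing these numbers with the memory of your GPU gives a quick first check of whether a model has a chance of fitting, and shows why lower precision is one of the software solutions mentioned above.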

In [5]:
## text classification (framed as multiple choice)
text = '''
Here is a quote:
"I do not think the government is trustworthy anymore. We need to mobilize and resist!".
Is this quote about either (a) "civil disobedience", or (b) "praise of the government", or (c) "government funded mobility"?
Only choose one of the three options.'''

output = pipeline_zeroshot_prompting(text)
output

[{'generated_text': '(a)'}]
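
When many texts need to be classified this way, writing each prompt by hand becomes tedious. The sketch below shows one possible way to build the multiple-choice prompt from a list of options and to map the model's short answer, such as `(a)`, back to the option text. Both helpers, `build_multiple_choice_prompt` and `parse_choice`, are hypothetical and only mirror the prompt format used above; no model is called.

```python
import string


def build_multiple_choice_prompt(quote, options):
    """Format a quote and a list of options as an (a)/(b)/(c) multiple-choice prompt."""
    lettered = [
        f'({letter}) "{option}"'
        for letter, option in zip(string.ascii_lowercase, options)
    ]
    return (
        f'Here is a quote:\n"{quote}".\n'
        f'Is this quote about either {", or ".join(lettered)}?\n'
        f"Only choose one of the {len(options)} options."
    )


def parse_choice(generated_text, options):
    """Map an answer such as '(a)' back to the option text; None if nothing matches."""
    answer = generated_text.strip().lower()
    for letter, option in zip(string.ascii_lowercase, options):
        if answer.startswith(f"({letter})"):
            return option
    return None


options = ["civil disobedience", "praise of the government", "government funded mobility"]
prompt = build_multiple_choice_prompt(
    "I do not think the government is trustworthy anymore. We need to mobilize and resist!",
    options,
)
print(prompt)
print(parse_choice("(a)", options))  # civil disobedience
```

Returning `None` for unexpected answers is a deliberate choice: generative models do not always follow the requested format, and such cases are easier to inspect when they are flagged explicitly.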

In [7]:
## question answering
text = '''Here is a news article from Thursday 22.12.2022: "
European Parliament website hit by cyberattack after Russian terrorism vote
One official blamed pro-Russian hacking group Killnet for the DDoS attack.
The European Parliament website on Wednesday faced a "sophisticated" cyberattack disrupting its services moments after members voted to declare Russia a state sponsor of terrorism.
"I confirm that the Parliament has been subject to an external cyber attack, but the Parliamentary services are doing well to defend the Parliament," Dita Charanzová, Czech MEP and Parliament vice president responsible for cybersecurity, said in a statement.
Another senior Parliament official, requesting not to be named, said “it might be the most sophisticated attack that the Parliament has known so far.”
The attack is what's known as a distributed denial-of-service (DDoS) attack, in which massive amounts of traffic are sent to servers in an attempt to block internet users from accessing websites, Marcel Kolaja, European Parliament member for the Czech Pirate party, confirmed.
DDoS attacks are used by hacking groups to disrupt and cause chaos. It emerged as a favorite instrument of Russian hacking groups like Killnet, notably as a way to protest against political decisions in European countries to support Ukraine in the war.
The attack on the European Parliament website comes after the chamber voted on Wednesday to adopt a resolution declaring Russia a state sponsor of terrorism because of Moscow’s strikes on civilian targets in Ukraine.
"We have a strong indication that it is from Killnet, the hackers with links to Russia indeed. This is my information, but it is under control. It only cut the external access to the Parliament's website ... Unless there is extra attacks we expect it to be back and accessible very soon," said Eva Kaili, Greek member and vice president of the European Parliament.
"This morning Russia was still designated as a terrorist state in an official resolution. This afternoon the entire network collapses in [the European Parliament]," Alexandra Geese, German Greens' MEP, tweeted.
".

'''

prompt_lst = [
    "Was there a cyber attack? Yes or no.",
    "What is the name of the attacker?",
    "Which country does the attacker come from?",
    "What is the name of the victim of the cyber attack?",
    "Which country does the victim of the cyber attack come from?",
    "If there was a cyber attack, what type of cyber attack was it?",
    "What was the date of the cyber attack?",
    "What or who is the source of information on the cyber attack?",
    "What damages were caused by the cyber attack?",
    "What was the political response to the cyber attack?",
    'How certain is it that there was a cyber attack? "Very certain", "moderately certain", or "not certain"? Choose one of these options.'
]

# chain-of-thought tests https://arxiv.org/pdf/2210.11416.pdf
instructions_begin = ""  #"Answer the following question by reasoning step-by-step: "
instructions_end = ""  #" Explain the answer with step-by-step reasoning"
other_category = ' Answer "unknown" if the correct answer is not explicitly mentioned in the article.'

input_lst = [text + instructions_begin + prompt + other_category + instructions_end for prompt in prompt_lst]

output_lst = []
for input_text, prompt in zip(input_lst, prompt_lst):  # avoid shadowing the built-in input()
    output = pipeline_zeroshot_prompting(input_text)
    output_lst.append(output)
    print(f'{prompt:90}{output[0]["generated_text"]}')


Was there a cyber attack? Yes or no.                                                      Yes
What is the name of the attacker?                                                         Killnet
Which country does the attacker come from?                                                Russia
What is the name of the victim of the cyber attack?                                       European Parliament
Which country does the victim of the cyber attack come from?                              unknown
If there was a cyber attack, what type of cyber attack was it?                            distributed denial-of-service (DDoS) attack
What was the date of the cyber attack?                                                    Wednesday
What or who is the source of information on the cyber attack?                             Eva Kaili
What damages were caused by the cyber attack?                                             unanswerable
What was the political response to the cyber attack?              
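
The output above shows a typical issue with generative models: the prompt asked for "unknown" when no answer is available, but the model also produced "unanswerable". Before storing the answers in a dataset, it can help to normalise such variants. The sketch below is one possible way to do this; the helper `normalise_answers` and the `NO_ANSWER` set are illustrative assumptions, and the answers are copied from the output above.

```python
# Prompts and answers copied from the output above.
answers = {
    "What is the name of the attacker?": "Killnet",
    "Which country does the victim of the cyber attack come from?": "unknown",
    "What was the date of the cyber attack?": "Wednesday",
    "What damages were caused by the cyber attack?": "unanswerable",
}

# Strings the model used when it found no answer. "unanswerable" was not requested
# in the prompt but appeared in the output, so it is treated the same way here.
NO_ANSWER = {"unknown", "unanswerable"}


def normalise_answers(answers):
    """Replace 'no answer' strings with None so that missing values are explicit."""
    return {
        question: (None if answer.strip().lower() in NO_ANSWER else answer.strip())
        for question, answer in answers.items()
    }


cleaned = normalise_answers(answers)
for question, answer in cleaned.items():
    print(f"{question:65}{answer}")
```

Explicit missing values make it straightforward to count how often the model could not answer a given question.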

In [8]:
## text summarisation
text = """
"Donald John Trump (born June 14, 1946) is an American politician, media personality, and businessman who served as the 45th president of the United States from 2017 to 2021.
Trump graduated from the Wharton School of the University of Pennsylvania with a bachelor's degree in 1968. He became president of his father's real estate business in 1971 and renamed it The Trump Organization.
He expanded the company's operations to building and renovating skyscrapers, hotels, casinos, and golf courses. He later started side ventures, mostly by licensing his name. From 2004 to 2015, he co-produced
and hosted the reality television series The Apprentice. Trump and his businesses have been involved in more than 4,000 state and federal legal actions, including six bankruptcies. Trump's political positions
have been described as populist, protectionist, isolationist, and nationalist. He won the 2016 United States presidential election as the Republican nominee against Democratic nominee Hillary Clinton despite
losing the national popular vote. He became the first U.S. president with no prior military or government service. His election and policies sparked numerous protests. The 2017–2019 special counsel investigation
led by Robert Mueller established that Russia interfered in the 2016 election to favor the election of Trump. Trump promoted conspiracy theories and made many false and misleading statements during his campaigns
and presidency, to a degree unprecedented in American politics. Many of his comments and actions have been characterized as racially charged or racist, and many as misogynistic. Trump ordered a travel ban
on citizens from several Muslim-majority countries, diverted military funding towards building a wall on the U.S.–Mexico border, and implemented a policy of family separations for apprehended migrants.
He rolled back more than 100 environmental policies and regulations in an aggressive attempt to weaken environmental protections."
Please summarize this text by providing the key information about Donald Trump. Summary:
"""

output = pipeline_zeroshot_prompting(text)
output

[{'generated_text': '"Donald Trump (born June 14, 1946) is an American politician, media personality,'}]

---

---

## Exercise  +  Q&A


**1. Exercise:** (5 min)

Browse through the Hugging Face Hub and **identify a model or dataset that could be useful for you**. Then open this Google Doc and copy-paste the model identifier and a short explanation of why this model is interesting for you. Google Doc: https://docs.google.com/document/d/1KZ6DnZDUg_sxqpS8hhF0MDohZ0IRUZaV83Ixu93n-X8/edit?usp=sharing




**2. Reading, thinking & asking:** (5 min)

a) Go through the notebook and ask any questions you might have. You can also run the notebook yourself.

b) Write the answers to the following questions on a piece of paper / digital notebook in your own words:

* How does open source help increase accessibility to machine learning? Where does it not help?

* In your own words, write down the main difference between standard models and universal models.

* **Post any questions in the chat!**




---

