
# Building AI Applications with ChatGPT

Sumudu Tennakoon, PhD
<hr>

# NLP With OpenSource Language Models

In this notebook we will explore some basic features of the Python programming language for those who have prior programming experience.

To learn more about Python, refer to the following website:

- Python : https://www.python.org

To learn more about the Python packages we explore in this notebook, refer to the following website:

- HuggingFace : https://huggingface.co


# Getting Started with HuggingFace

* Run the code cell below to install the required libraries before you continue. Skip it if you have already installed them.

In [1]:
# !pip install transformers sentencepiece

# Pipelines

* HuggingFace pipelines provide a streamlined interface for common tasks such as sentiment analysis, text classification, named entity recognition, text generation, and speech recognition.
* You can choose from many different models and tasks on the HuggingFace website.
* Pipelines make it easy to use models without writing a lot of code.

https://huggingface.co/docs/transformers/main_classes/pipelines

In [None]:
from transformers import pipeline

## Sentiment Analysis

In [3]:
from transformers import pipeline

classifier = pipeline('sentiment-analysis')
classifier('I enojoy watching this movie!')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9986220598220825}]
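
The warning above advises against relying on the default model. The following is a minimal sketch of one way to pin the model and revision, assuming the model ID and revision printed in the warning are the ones you want. `PINNED` and `build_classifier` are hypothetical names introduced here for illustration.

```python
# Model ID and revision copied from the warning printed above.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier(config=PINNED):
    """Create a pipeline with an explicit model and revision."""
    # Imported inside the function so the config can be inspected
    # without loading the library or downloading a model.
    from transformers import pipeline

    return pipeline(config["task"], model=config["model"], revision=config["revision"])


# Uncomment to load the pinned model (downloads it on first use):
# classifier = build_classifier()
# classifier("I enjoy watching this movie!")
```

Keeping the identifiers in one dictionary makes it easier to see which model produced a given result.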

In [4]:
classifier = pipeline('sentiment-analysis', model="finiteautomata/bertweet-base-sentiment-analysis")
classifier('This movie was the worst in the series!')

[{'label': 'NEG', 'score': 0.9834356904029846}]
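
The two models above use different label names (`POSITIVE` versus `NEG`). A small sketch of how the outputs could be mapped to a shared vocabulary follows. `LABEL_MAP` and `normalize` are hypothetical helpers, and the `NEGATIVE`, `POS`, and `NEU` entries are assumptions about labels that do not appear in the outputs above.

```python
# Map model-specific label names to a shared vocabulary.
# 'POSITIVE' and 'NEG' appear in the outputs above; the other keys are assumptions.
LABEL_MAP = {
    "POSITIVE": "positive",
    "NEGATIVE": "negative",
    "POS": "positive",
    "NEG": "negative",
    "NEU": "neutral",
}


def normalize(predictions):
    """Return (label, score) pairs with harmonized label names."""
    return [
        (LABEL_MAP.get(p["label"], p["label"].lower()), round(p["score"], 4))
        for p in predictions
    ]


# Outputs recorded in the two cells above.
default_output = [{"label": "POSITIVE", "score": 0.9986220598220825}]
bertweet_output = [{"label": "NEG", "score": 0.9834356904029846}]

print(normalize(default_output))
print(normalize(bertweet_output))
```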

## Question Answering

In [5]:
from transformers import pipeline

nlp = pipeline("question-answering")

context = """ Marie Curie, née Maria Sklodowska, was born in Warsaw on November 7, \
1867, the daughter of a secondary-school teacher. She received a general education \
in local schools and some scientific training from her father. She became involved \
in a students’ revolutionary organization and found it prudent to leave Warsaw, then \
in the part of Poland dominated by Russia, for Cracow, which at that time was under \
Austrian rule. In 1891, she went to Paris to continue her studies at the Sorbonne \
where she obtained Licenciateships in Physics and the Mathematical Sciences. She met \
Pierre Curie, Professor in the School of Physics in 1894 and in the following year \
they were married. She succeeded her husband as Head of the Physics Laboratory at \
the Sorbonne, gained her Doctor of Science degree in 1903, and following the tragic \
death of Pierre Curie in 1906, she took his place as Professor of General Physics in \
the Faculty of Sciences, the first time a woman had held this position. She was also \
appointed Director of the Curie Laboratory in the Radium Institute of the University \
of Paris, founded in 1914.
"""

nlp(question="When did Marie Curie Born?", context=context)


No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.9644644856452942,
 'start': 58,
 'end': 74,
 'answer': 'November 7, 1867'}

In [6]:
nlp(question="What are the positions Marie Curie held at University of Paris?", context=context)

{'score': 0.4384763538837433,
 'start': 1001,
 'end': 1033,
 'answer': 'Director of the Curie Laboratory'}
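
The `start` and `end` values in the result are character offsets into the context string, so the answer can be recovered by slicing. The sketch below checks this against the first result, using only the first sentence of the context (including its leading space).

```python
# First sentence of the context used above (note the leading space).
context = (
    " Marie Curie, née Maria Sklodowska, was born in Warsaw on November 7, "
    "1867, the daughter of a secondary-school teacher."
)

# Result recorded above for the first question.
result = {"score": 0.9644644856452942, "start": 58, "end": 74, "answer": "November 7, 1867"}

# Slice the context with the reported offsets.
span = context[result["start"]:result["end"]]
print(span)
```

Because the model extracts a single span, a question that asks for several items (such as all positions held) still returns only one of them.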

#### Specify Model and Reuse Pipeline for Multiple Questions

In [7]:
from transformers import pipeline

nlp = pipeline("question-answering", model='deepset/roberta-base-squad2')

question = "When did Marie Curie Born?"
response = nlp(question=question, context=context)
print({"question":question, "response": response})

question = "What are the positions Marie Curie held at University of Paris?"
response = nlp(question=question, context=context)
print({"question":question, "response": response})

{'question': 'When did Marie Curie Born?', 'response': {'score': 0.9599596858024597, 'start': 58, 'end': 74, 'answer': 'November 7, 1867'}}
{'question': 'What are the positions Marie Curie held at University of Paris?', 'response': {'score': 0.19194874167442322, 'start': 1001, 'end': 1033, 'answer': 'Director of the Curie Laboratory'}}
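
The second answer has a much lower score than the first. One possible way to act on that is to discard answers below a cutoff. The sketch below uses the responses recorded above. `MIN_SCORE` and `answer_or_none` are hypothetical, and the cutoff of 0.5 is an arbitrary assumption, not a recommended value.

```python
# Responses recorded above for deepset/roberta-base-squad2.
responses = [
    {
        "question": "When did Marie Curie Born?",
        "score": 0.9599596858024597,
        "answer": "November 7, 1867",
    },
    {
        "question": "What are the positions Marie Curie held at University of Paris?",
        "score": 0.19194874167442322,
        "answer": "Director of the Curie Laboratory",
    },
]

MIN_SCORE = 0.5  # hypothetical cutoff; tune it for your own data


def answer_or_none(response, min_score=MIN_SCORE):
    """Return the answer only when the model's score reaches the cutoff."""
    return response["answer"] if response["score"] >= min_score else None


for r in responses:
    print(r["question"], "->", answer_or_none(r))
```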


## Text Generation

In [8]:
from transformers import pipeline

text_generator = pipeline("text-generation")
text_generator("An apple fell from the", max_length=6, do_sample=True)

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'An apple fell from the tree'}]

In [9]:
# Extend the length of Generated Sequence

text_generator("An apple fell from the", max_length=20, do_sample=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'An apple fell from the sky and was knocked to the ground in a hail of rain. The angel'}]
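
With `do_sample=True` the next token is drawn at random from the model's probability distribution, so repeated runs can give different text. With `do_sample=False` (and the default single beam) the most probable token is chosen each time. The toy sketch below illustrates the difference for a single step. The probabilities are made up for illustration and do not come from GPT-2.

```python
import random

# Hypothetical next-token probabilities after "An apple fell from the".
# These numbers are invented; they are not GPT-2 outputs.
next_token_probs = {"tree": 0.6, "sky": 0.25, "table": 0.1, "roof": 0.05}


def pick_greedy(probs):
    """Mimic do_sample=False: always take the most probable token."""
    return max(probs, key=probs.get)


def pick_sampled(probs, rng):
    """Mimic do_sample=True: draw a token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the draws are repeatable
print(pick_greedy(next_token_probs))
print([pick_sampled(next_token_probs, rng) for _ in range(5)])
```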

## Translation

In [10]:
from transformers import pipeline

text = "Hello. How are you?"
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

translator(text)



[{'translation_text': 'Bonjour, comment allez-vous ?'}]

In [11]:
from transformers import pipeline

text = "Hola cómo estás?"
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")

translator(text)

[{'translation_text': 'Hi, how are you?'}]
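
Both translation models above follow the same naming pattern, with the source and target language codes at the end. A small helper can build the model ID from those codes. `opus_mt_model_name` is a hypothetical name, and the sketch assumes a model exists for the requested pair, which is not guaranteed; check the HuggingFace website before using a new pair.

```python
def opus_mt_model_name(source_lang, target_lang):
    """Build a model ID following the pattern of the two models used above."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


print(opus_mt_model_name("en", "fr"))
print(opus_mt_model_name("es", "en"))
```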

## Summarization

In [12]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = """ Marie Curie, née Maria Sklodowska, was born in Warsaw on November 7, \
1867, the daughter of a secondary-school teacher. She received a general education \
in local schools and some scientific training from her father. She became involved \
in a students’ revolutionary organization and found it prudent to leave Warsaw, then \
in the part of Poland dominated by Russia, for Cracow, which at that time was under \
Austrian rule. In 1891, she went to Paris to continue her studies at the Sorbonne \
where she obtained Licenciateships in Physics and the Mathematical Sciences. She met \
Pierre Curie, Professor in the School of Physics in 1894 and in the following year \
they were married. She succeeded her husband as Head of the Physics Laboratory at \
the Sorbonne, gained her Doctor of Science degree in 1903, and following the tragic \
death of Pierre Curie in 1906, she took his place as Professor of General Physics in \
the Faculty of Sciences, the first time a woman had held this position. She was also \
appointed Director of the Curie Laboratory in the Radium Institute of the University \
of Paris, founded in 1914.
"""

summarizer(text)

[{'summary_text': 'Marie Curie, née Maria Sklodowska, was born in Warsaw on November 7, 1867. She received a general education in local schools and some scientific training from her father. In 1891, she went to Paris to continue her studies at the Sorbonne where she obtained Licenciateships in Physics and the Mathematical Sciences.'}]

## Classification

In [13]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sequence_to_classify = "Today I am going to prepare a dinner for my friends"

candidate_labels = ['travel', 'cooking', 'playing', 'learning']

classifier(sequence_to_classify, candidate_labels)

{'sequence': 'Today I am going to prepare a dinner for my friends',
 'labels': ['cooking', 'learning', 'playing', 'travel'],
 'scores': [0.9570969343185425,
  0.035653986036777496,
  0.005505905486643314,
  0.001743113505654037]}

In [14]:
sequence_to_classify = "I am going to visit Paris next year"

candidate_labels = ['travel', 'cooking', 'playing', 'learning']

classifier(sequence_to_classify, candidate_labels)

{'sequence': 'I am going to visit Paris next year',
 'labels': ['travel', 'learning', 'playing', 'cooking'],
 'scores': [0.7273193597793579,
  0.1763770431280136,
  0.09052826464176178,
  0.005775331985205412]}

In [15]:
sequence_to_classify = "I scored 75 runs in the cricket match yesterday"

candidate_labels = ['travel', 'cooking', 'playing', 'learning']

classifier(sequence_to_classify, candidate_labels)

{'sequence': 'I scored 75 runs in the cricket match yesterday',
 'labels': ['playing', 'learning', 'travel', 'cooking'],
 'scores': [0.81123948097229,
  0.1423066407442093,
  0.030837208032608032,
  0.015616626478731632]}
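
In the outputs above, the labels come back sorted by score, highest first, and the scores add up to roughly 1. The sketch below works with the first recorded result to pull out the top label and build a label-to-score lookup. The variable names are illustrative.

```python
# Result recorded above for the first example.
result = {
    "sequence": "Today I am going to prepare a dinner for my friends",
    "labels": ["cooking", "learning", "playing", "travel"],
    "scores": [
        0.9570969343185425,
        0.035653986036777496,
        0.005505905486643314,
        0.001743113505654037,
    ],
}

# Labels are sorted by score, so the first entry is the top prediction.
top_label, top_score = result["labels"][0], result["scores"][0]

# Pair each label with its score for easier lookup.
score_by_label = dict(zip(result["labels"], result["scores"]))

print(top_label, round(top_score, 3))
print(score_by_label["travel"])
```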

# Conversation 

In [16]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

In [17]:
MODEL = "microsoft/GODEL-v1_1-large-seq2seq"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

### No Context

In [18]:
instruction = f'Instruction: given a dialog context, you need to response professionally. Limit answer to one sentence'

question = 'Why the sky is blue?'

context = ''

prompt = f"{instruction}\n[CONTEXT] {question}\n[KNOWLEDGE] {context}"

input_ids = tokenizer(f"{prompt}", return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_length=30, min_length=8, top_p=1.0, do_sample=True)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(input_ids)
print(outputs)
print(prompt)
print(response)

tensor([[21035,    10,   787,     3,     9, 13463,  2625,     6,    25,   174,
            12,  1773, 13931,     5, 18185,  1525,    12,    80,  7142,   784,
         17752,  3463,     4,   382,   908,  1615,     8,  5796,    19,  1692,
            58,   784,   439, 12038, 17717,  5042,   908,     1]])
tensor([[   0,   37, 3412,   63,  774,   19,  147,   21,    8,   97,  230,    5,
            1]])
Instruction: given a dialog context, you need to response professionally. Limit answer to one sentence
[CONTEXT] Why the sky is blue?
[KNOWLEDGE] 
The rainy season is over for the time now.
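
The cells in this section assemble the prompt from three parts: an instruction, the question after `[CONTEXT]`, and optional background text after `[KNOWLEDGE]`. A sketch of a helper that captures this layout follows. `build_prompt` is a hypothetical name, and the layout is simply the one used in this notebook.

```python
def build_prompt(instruction, question, knowledge=""):
    """Assemble the prompt in the layout used in this notebook."""
    return f"{instruction}\n[CONTEXT] {question}\n[KNOWLEDGE] {knowledge}"


instruction = (
    "Instruction: given a dialog context, you need to response professionally. "
    "Limit answer to one sentence"
)

# With no knowledge text, the model has nothing to ground its answer on.
prompt = build_prompt(instruction, "Why the sky is blue?")
print(prompt)
```

Passing different `knowledge` text changes what the model can draw on, which is why the answer with an empty knowledge field above is unrelated to the question.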


### With Scientific Context

In [19]:
instruction = f'Instruction: given a dialog context, you need to response professionally. Limit answer to one sentence'

context = """
A portion of the beam of light coming from the sun scatters off molecules of gas and other \
small particles in the atmosphere. Here, Rayleigh scattering primarily occurs through \
sunlight's interaction with randomly located air molecules. It is this scattered light that \
gives the surrounding sky its brightness and its color. As previously stated, Rayleigh \
scattering is inversely proportional to the fourth power of wavelength, so that shorter \
wavelength violet and blue light will scatter more than the longer wavelengths (yellow and \
especially red light).
"""

question = 'Why the sky is blue?'

prompt = f"{instruction}\n[CONTEXT] {question}\n[KNOWLEDGE] {context}"

input_ids = tokenizer(f"{prompt}", return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_length=30, min_length=8, top_p=1.0, do_sample=False)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(prompt)
print(response)

Instruction: given a dialog context, you need to response professionally. Limit answer to one sentence
[CONTEXT] Why the sky is blue?
[KNOWLEDGE] 
A portion of the beam of light coming from the sun scatters off molecules of gas and other small particles in the atmosphere. Here, Rayleigh scattering primarily occurs through sunlight's interaction with randomly located air molecules. It is this scattered light that gives the surrounding sky its brightness and its color. As previously stated, Rayleigh scattering is inversely proportional to the fourth power of wavelength, so that shorter wavelength violet and blue light will scatter more than the longer wavelengths (yellow and especially red light).

Rayleigh scattering is a process where light is scattered off molecules of gas and other small particles in the atmosphere.


### With Greek Mythology as Context

In [20]:
instruction = f'Instruction: given a dialog context, you need to response professionally. Limit answer to one sentence'
context = """
The story goes that one day Zeus, the Greek god of the sky, asked his daughter Athena \
to make a wish. The blue-eyed Athena, wrapped up in herself, wished that the world \
could see her beauty every single day. Zeus granted Athena’s wish by turning the sky \
in blue, the color of her beautiful eyes.
"""

question = 'Why the sky is blue?'

prompt = f"{instruction}\n[CONTEXT] {question}\n[KNOWLEDGE] {context}"

input_ids = tokenizer(f"{prompt}", return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_length=30, min_length=8, top_p=1.0, do_sample=True)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(prompt)
print(response)

Instruction: given a dialog context, you need to response professionally. Limit answer to one sentence
[CONTEXT] Why the sky is blue?
[KNOWLEDGE] 
The story goes that one day Zeus, the Greek god of the sky, asked his daughter Athena to make a wish. The blue-eyed Athena, wrapped up in herself, wished that the world could see her beauty every single day. Zeus granted Athena’s wish by turning the sky in blue, the color of her beautiful eyes.

Zeus granted Athena’s wish to turn the sky blue, the color of her blue eyes.


<hr/>
First Upload 2023-07-04 | Last update 2023-12-15 by Sumudu Tennakoon

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.