# Summarizing_with_ChatGPT
Copyright 2023 Denis Rothman, MIT License

**March 2023 message by Denis Rothman:**
This notebook replaces [Training_OpenAI_GPT_2_CH09.ipynb](https://github.com/Denis2054/Transformers-for-NLP-2nd-Edition/blob/main/Chapter09/Training_OpenAI_GPT_2_CH09.ipynb). Google Colab no longer supports TensorFlow 1.x, which makes the Training_OpenAI_GPT_2_CH09.ipynb notebook unstable.

The goal of *Transformers for NLP, 2nd Edition, Chapter 9, Matching Tokenizers and Datasets*, is to show how tokenizing works and the limitations of transformer models when embedding tokens.

This notebook shows how to use GPT-3.5 (ChatGPT) with the OpenAI API to perform the summarization task of Chapter 9, experimenting with rare words and showing the limits of state-of-the-art (SOA) transformers, no matter how evolved they are:

1. Installing openai and your API key<br>
2. Summarization<br>
3. Tokenizing<br>
4. Exploring the limits<br>
5. Conclusion<br>

To get the best out of this notebook:

*  make sure you have read Chapter 7

*  once you have understood the theory, go to section *4. Exploring the limits* of this notebook, try to find more limitations, and think about how you can filter them and find solutions.





# December 6, 2023 OpenAI API update

[This notebook has been updated. See README "Getting Started with OpenAI API" section before running this notebook](https://github.com/Denis2054/Transformers-for-NLP-2nd-Edition/blob/main/README.md#getting-started-with-openai-api)


In [1]:
!pip install --upgrade pip

Collecting pip
  Downloading pip-23.3.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.1.2
    Uninstalling pip-23.1.2:
      Successfully uninstalled pip-23.1.2
Successfully installed pip-23.3.1


In [2]:
# December 4, 2023 update: tiktoken is required to install OpenAI on Google Colab
# tiktoken is a fast BPE (byte-pair encoding) tokenizer
!pip install tiktoken

# December 4, 2023 update: Cohere is required to install OpenAI on Google Colab (the preinstalled llmx package depends on cohere and openai)
# Cohere platform: https://dashboard.cohere.com/
!pip install --upgrade cohere

Collecting tiktoken
  Downloading tiktoken-0.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Downloading tiktoken-0.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Installing collected packages: tiktoken
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires openai, which is not installed.
Successfully installed tiktoken-0.5.2
Collecting cohere
  Downloading cohere-4.37-py3-none-any.whl.metadata (5.4 kB)
Collecting backoff<3.0,>=2.0 (from cohere)
  Downloading backoff-2.2.1-py3-none-any.whl (15 kB)
Collecting fastavro<2.0,>=1.8 (from cohere)
  Downloading fastavro-1.9.0-cp310-cp310-manylinux_2

# 1. Installing openai


## Installing and importing openai

In [3]:
#Importing openai, installing it first if it is not available
try:
  import openai
except ImportError:
  !pip install openai
  import openai

Collecting openai
  Downloading openai-1.3.7-py3-none-any.whl.metadata (17 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.25.2-py3-none-any.whl.metadata (6.9 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.2-py3-none-any.whl.metadata (20 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Downloading openai-1.3.7-py3-none-any.whl (221 kB)
Downloading httpx-0.25.2-py3-none-any.whl (74 kB)
Downloading httpcore-1.0.2-py3-none-any.whl (76 kB)

## API Key

In [4]:
#2.API Key
#Store your key in a file and read it (you can type it directly in the notebook, but it will be visible to anyone next to you)
from google.colab import drive
drive.mount('/content/drive')
with open("drive/MyDrive/files/api_key.txt", "r") as f:
    API_KEY = f.readline().strip()  # strip the trailing newline that readline() keeps

#The OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")


Mounted at /content/drive
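`readline()` keeps the line's trailing newline, so a key read this way must be stripped before it is sent to the API. The following is a minimal sketch of a more defensive loader. `load_api_key` is a hypothetical helper, not part of the OpenAI library, and its fallback order (an already-set environment variable first, then the first line of the key file) is an assumption.

```python
import os
from pathlib import Path


def load_api_key(path=None, env_var="OPENAI_API_KEY"):
    """Hypothetical helper: return a key from env_var, else from the file's first line.

    Surrounding whitespace (including a trailing newline) is stripped.
    Raises ValueError if neither source provides a key.
    """
    key = os.environ.get(env_var, "").strip()
    if not key and path is not None:
        lines = Path(path).read_text(encoding="utf-8").splitlines()
        key = lines[0].strip() if lines else ""
    if not key:
        raise ValueError(f"No API key found in ${env_var} or in {path!r}")
    return key
```

In the Colab cell above, after mounting Drive, this could be called as `API_KEY = load_api_key("drive/MyDrive/files/api_key.txt")`. If `OPENAI_API_KEY` is already set in the session, that value takes precedence over the file.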


# 2. GPT-3.5 Turbo (ChatGPT) dialog function

Preparing the NLP message

In [5]:
import os
from openai import OpenAI

def dialog(uinput):
    # preparing the prompt for OpenAI
    role = "user"

    #prompt="Where is Tahiti located?" #maintenance or if you do not want to use a microphone
    line = {"role": role, "content": uinput}

    # creating the message: system instruction, assistant context, then the user request
    assert1 = {"role": "system", "content": "You are a Natural Language Processing Assistant."}
    assert2 = {"role": "assistant", "content": "You are helping viewers analyze social media better."}
    assert3 = line
    iprompt = [assert1, assert2, assert3]

    # sending the message to ChatGPT
    client = OpenAI(
        # This is the default and can be omitted
        api_key=os.environ.get("OPENAI_API_KEY"),
    )

    response = client.chat.completions.create(model="gpt-3.5-turbo", messages=iprompt)  # ChatGPT dialog

    return response
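Because `dialog()` builds its message list and calls the API in a single step, the only way to see the exact prompt is to spend an API call. The following minimal sketch moves message construction into a pure function, so the prompt can be printed or tested offline. `build_messages` and the two constants are hypothetical names whose defaults mirror the system and assistant messages in `dialog()`. Each message is a dictionary with a `role` (`system`, `assistant`, or `user`) and a `content` string, which is the shape that `client.chat.completions.create(messages=...)` expects.

```python
SYSTEM_MSG = "You are a Natural Language Processing Assistant."
ASSISTANT_MSG = "You are helping viewers analyze social media better."


def build_messages(uinput, system=SYSTEM_MSG, assistant=ASSISTANT_MSG):
    """Return the system/assistant/user message list that dialog() sends."""
    if not isinstance(uinput, str) or not uinput.strip():
        raise ValueError("uinput must be a non-empty string")
    return [
        {"role": "system", "content": system},
        {"role": "assistant", "content": assistant},
        {"role": "user", "content": uinput},
    ]


messages = build_messages("English to French: Icing pucks is fun!")
for message in messages:
    # right-align the role name so the transcript is easy to scan
    print(f"{message['role']:>9}: {message['content']}")
```

Inside `dialog()`, the API call could then become `client.chat.completions.create(model="gpt-3.5-turbo", messages=build_messages(uinput))`.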

# 3. Summarizing

The text to summarize:

"During such processes, cells sense the environment and respond to external factors that induce a certain direction of motion towards specific targets (taxis): this results in a persistent migration in a certain preferential direction. The guidance cues leading to directed migration may be biochemical or biophysical. Biochemical cues can be, for example, soluble factors or growth factors that give rise to chemotaxis, which involves a mono-directional stimulus. Other cues generating mono-directional stimuli include, for instance, bound ligands to the substratum that induce haptotaxis, durotaxis, that involves migration towards regions with an increasing stiffness of the ECM, electrotaxis, also known as galvanotaxis, that prescribes a directed motion guided by an electric field or current, or phototaxis, referring to the movement oriented by a stimulus of light [34]. Important biophysical cues are some of the properties of the extracellular matrix (ECM), first among all the alignment of collagen fibers and its stiffness. In particular, the fiber alignment is shown to stimulate contact guidance [22, 21]."


The summary by ChatGPT seems acceptable, but having a subject matter expert (SME) review it is good practice.

In [6]:
uinput="Summarize the following paragraph: During such processes, cells sense the environment and respond to external factors that induce a certain direction of motion towards specific targets (taxis): this results in a persistent migration in a certain preferential direction. The guidance cues leading to directed migration may be biochemical or biophysical. Biochemical cues can be, for example, soluble factors or growth factors that give rise to chemotaxis, which involves a mono-directional stimulus. Other cues generating mono-directional stimuli include, for instance, bound ligands to the substratum that induce haptotaxis, durotaxis, that involves migration towards regions with an increasing stiffness of the ECM, electrotaxis, also known as galvanotaxis, that prescribes a directed motion guided by an electric field or current, or phototaxis, referring to the movement oriented by a stimulus of light [34]. Important biophysical cues are some of the properties of the extracellular matrix (ECM), first among all the alignment of collagen fibers and its stiffness. In particular, the fiber alignment is shown to stimulate contact guidance [22, 21]."
response=dialog(uinput) #preparing the messages for ChatGPT
print("Viewer request",uinput)
print("ChatGPT response:",response.choices[0].message.content)

Viewer request Summarize the following paragraph: During such processes, cells sense the environment and respond to external factors that induce a certain direction of motion towards specific targets (taxis): this results in a persistent migration in a certain preferential direction. The guidance cues leading to directed migration may be biochemical or biophysical. Biochemical cues can be, for example, soluble factors or growth factors that give rise to chemotaxis, which involves a mono-directional stimulus. Other cues generating mono-directional stimuli include, for instance, bound ligands to the substratum that induce haptotaxis, durotaxis, that involves migration towards regions with an increasing stiffness of the ECM, electrotaxis, also known as galvanotaxis, that prescribes a directed motion guided by an electric field or current, or phototaxis, referring to the movement oriented by a stimulus of light [34]. Important biophysical cues are some of the properties of the extracellula
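An SME review remains the real control. The following is a minimal sketch of a cheap automated pre-check that could run before a summary reaches the SME: it measures how much shorter the summary is than the source and lists which key terms the summary dropped. The function names, the word-count ratio, and the choice of key terms are assumptions for illustration. The example summary string is made up and is not ChatGPT output.

```python
import re


def _words(text):
    """Lowercase alphabetic words in text."""
    return re.findall(r"[a-z]+", text.lower())


def summary_checks(source, summary, key_terms):
    """Hypothetical pre-checks to run on a model summary before SME review."""
    ratio = len(_words(summary)) / max(len(_words(source)), 1)
    summary_lower = summary.lower()
    # plain substring test: crude, but enough to flag terms that vanished
    missing = [term for term in key_terms if term.lower() not in summary_lower]
    return {
        "compression_ratio": round(ratio, 2),
        "shorter_than_source": ratio < 1.0,
        "missing_terms": missing,
    }


source = ("The guidance cues leading to directed migration may be biochemical or biophysical. "
          "Biochemical cues can be, for example, soluble factors or growth factors "
          "that give rise to chemotaxis.")
summary = "Directed migration is guided by biochemical cues such as chemotaxis."  # made-up summary
print(summary_checks(source, summary, ["chemotaxis", "biophysical"]))
# {'compression_ratio': 0.36, 'shorter_than_source': True, 'missing_terms': ['biophysical']}
```

The substring test can give false positives (for example, "taxis" would match inside "chemotaxis"), so treat a clean report as a hint, not a verdict.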

# 4. Exploring the limits

In the chapter, GPT-2 struggles with "amoeboid". GPT-3.5 Turbo (ChatGPT) finds the correct definition, even in a difficult sentence.

In [7]:
#amoeboid
uinput="Explain this sentence: I don't use a false foot to move forward so I am not an amoeboid today."
response=dialog(uinput) #preparing the messages for ChatGPT
print("Viewer request",uinput)
print("ChatGPT response:",response.choices[0].message.content)


Viewer request Explain this sentence: I don't use a false foot to move forward so I am not an amoeboid today.
ChatGPT response: This sentence is an example of figurative language and can be interpreted metaphorically. 

The phrase "I don't use a false foot to move forward" suggests that the speaker does not rely on deceptive tactics or dishonesty to make progress or achieve their goals. It implies that they believe in being genuine and authentic in their approach to life.

The second part of the sentence, "so I am not an amoeboid today," adds further metaphorical meaning. Amoeboid is derived from the word amoeba, which is a single-celled organism that can change its shape or direction of movement. In this context, being "amoeboid" metaphorically signifies being aimless, indecisive, or lacking a clear purpose.

By stating "I am not an amoeboid today," the speaker is asserting that they are not feeling directionless or unsure about their path or actions at the present moment. They are em
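One way to see why a rare word such as "amoeboid" is hard for a model is to look at how a subword tokenizer splits it. tiktoken, installed above, is a fast byte-pair encoding (BPE) tokenizer that works on bytes with a large pre-trained merge table. The following sketch is not tiktoken's implementation. It is a toy, character-level version of the BPE idea (repeatedly merge the most frequent adjacent pair of symbols) trained on a tiny made-up corpus. `learn_bpe`, `tokenize`, and the corpus are assumptions for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of pair in symbols with one merged symbol."""
    first, second = pair
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == first and symbols[i + 1] == second:
            merged.append(first + second)
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges merge rules from a {word: frequency} dictionary."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)  # ties: first pair seen wins
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def tokenize(word, merges):
    """Split word into characters, then apply the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = {"low": 10, "lower": 6, "amoeboid": 1}  # toy word frequencies
merges = learn_bpe(corpus, num_merges=3)
print(merges)                        # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
print(tokenize("low", merges))       # ['low']
print(tokenize("amoeboid", merges))  # ['a', 'm', 'o', 'e', 'b', 'o', 'i', 'd']
```

With only three merges learned from this corpus, every merge targets the frequent words. "low" therefore becomes a single token, while "amoeboid" stays as eight single characters that carry little meaning on their own.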

ChatGPT struggles with ["icing" in hockey](https://www.merriam-webster.com/dictionary/icing).

As of March 15, 2023, "icing pucks" is translated into nonsense in French. This might improve in the future.

Viewer request English to French: Icing pucks is fun!
ChatGPT response: Glaçage des rondelles est amusant!

In [8]:
#The verb to ice pucks
uinput="English to French: Icing pucks is fun!"
response=dialog(uinput) #preparing the messages for ChatGPT
print("Viewer request",uinput)
print("ChatGPT response:",response.choices[0].message.content)

Viewer request English to French: Icing pucks is fun!
ChatGPT response: Glaçage des rondelles est amusant !


The back translation produces nonsense:

Viewer request French to English: "Glaçage des rondelles est amusant!!"
ChatGPT response: "Icing the slices is fun!!"

In [9]:
#Back translation of the verb to ice pucks
uinput="French to English: Glaçage des rondelles est amusant!!"
response=dialog(uinput) #preparing the messages for ChatGPT
print("Viewer request",uinput)
print("ChatGPT response:",response.choices[0].message.content)

Viewer request French to English: Glaçage des rondelles est amusant!!
ChatGPT response: Icing the rounds is fun!!
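The round trip above suggests a simple automatic filter: translate, translate back, and compare the result with the original. The following minimal sketch assumes that a plain word-overlap (Jaccard) score is good enough to flag suspicious translations. `round_trip_overlap`, `lost_words`, and any threshold you choose are assumptions, not an OpenAI feature. The two strings are the English input and the back translation printed above.

```python
import re


def _word_set(text):
    """Set of lowercase alphabetic words, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def round_trip_overlap(original, back_translated):
    """Jaccard similarity of the two word sets (1.0 means the same words came back)."""
    a, b = _word_set(original), _word_set(back_translated)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lost_words(original, back_translated):
    """Words of the original that did not survive the round trip."""
    return sorted(_word_set(original) - _word_set(back_translated))


original = "Icing pucks is fun!"
back = "Icing the rounds is fun!!"  # back translation printed above
print(round_trip_overlap(original, back))  # 0.5
print(lost_words(original, back))          # ['pucks']
```

The check flags that "pucks" came back as "rounds", but it cannot tell that "icing" survived only as a word and not in its hockey sense. A low score is a reason to send the sentence to a human; a high score does not prove the translation is good.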


# 5. Conclusion

GPT-2 has reached its limits.

GPT-3.5 Turbo (ChatGPT) represents a huge step forward.

We simply have to accept the limitations and provide alternative solutions when we reach them.

There is still much work to do!

Next Steps: Explore SOA examples in the [BONUS](https://github.com/Denis2054/Transformers-for-NLP-2nd-Edition/blob/main/Bonus/Readme.md) section! See what they can do and take them to their limits!




