# Text analysis tasks with the ChatGPT API

## Text analysis tasks:

- Text summarization
- Extraction of topics, named entities, etc.
- Sentiment analysis
- Translation to other languages
- Rephrasing to correct errors or address a specific need

---

### *Imports and declarations*

In [8]:
import os
import openai
import wikipedia
import tiktoken
from langchain import OpenAI
from langchain.prompts import PromptTemplate
from langchain.callbacks import get_openai_callback

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key = os.environ['OPENAI_API_KEY']

llm_model = OpenAI(temperature=0.0)

tokenizer = tiktoken.encoding_for_model(llm_model.model_name)

# Cost of executing ChatGPT calls is accumulated in 'total_cost'
# Summary is printed at the end of this notebook
total_cost = 0.0

---

### Summarize Wikipedia article on GPT-3 

Python Wikipedia library documentation: https://wikipedia.readthedocs.io/en/latest/

In [9]:
def summarize(text, length, llm=llm_model, print_full_prompt=False):
    # text must be a string; length is the target word count, passed as a string such as "200"
    global total_cost
    
    summarization_template_string = """
    Summarize the text delimited by tripple backticks in {length} words.\
    text: ```{text}```
    """
    summarization_prompt_template = PromptTemplate(
        input_variables=["text", "length"],
        template=summarization_template_string
    )
    
    model_input = summarization_prompt_template.format(text=text, length=length)

    if print_full_prompt:
        print(f"Full prompt:\n{model_input}\n")
    
    with get_openai_callback() as cb:
        response = llm(model_input)
        
    total_cost += cb.total_cost
    
    return response

In [10]:
# Wikipedia page on GPT-3: https://en.wikipedia.org/wiki/GPT-3

wikipedia.set_lang("en")
gpt3_article = wikipedia.page("GPT-3", auto_suggest=False).content

In [11]:
print(gpt3_article[:500])

Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor GPT-2, it is a decoder-only transformer model of deep neural network, which uses attention in place of previous recurrence- and convolution-based architectures. Attention mechanisms allow the model to selectively focus on segments of input text it predicts to be the most relevant. It uses a 2048-tokens-long context and then-unprecedented size of 175 billion parameters, requirin


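Check the article length in tokens to ensure that it fits into the model's input limit together with the prompt template text. That limit is 4096 tokens for GPT-3.5-Turbo, and the same window also has to hold the generated summary. Note that the rate-limit warnings later in this notebook show the default `OpenAI()` wrapper calling `text-davinci-003`, whose context window is of about the same size.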

In [12]:
len(tokenizer.encode(gpt3_article))

3684
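The count above covers the article only. A minimal sketch of a fuller budget check follows, assuming the context window is shared between the prompt and the completion. `fits_context` and its parameters are hypothetical names, and the template size, completion cap, and window size used below are illustrative assumptions, not values read from the model.

```python
def fits_context(prompt_tokens, max_completion_tokens, context_limit):
    """Return (fits, spare_tokens) for a prompt plus a reserved completion."""
    used = prompt_tokens + max_completion_tokens
    spare = context_limit - used
    return spare >= 0, spare


# 3684 article tokens (counted above) plus an assumed 30 tokens of template
# text, an assumed cap of 256 completion tokens, and an assumed 4096-token window.
fits, spare = fits_context(3684 + 30, 256, 4096)
print(f"fits: {fits}, spare tokens: {spare}")
```

In the notebook, `prompt_tokens` could be taken from `len(tokenizer.encode(model_input))` so that the template text is measured, not estimated.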

In [13]:
gpt3_summary = summarize(gpt3_article, length="200", print_full_prompt=True)

Full prompt:

    Summarize the text delimited by tripple backticks in 200 words.    text: ```Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor GPT-2, it is a decoder-only transformer model of deep neural network, which uses attention in place of previous recurrence- and convolution-based architectures. Attention mechanisms allow the model to selectively focus on segments of input text it predicts to be the most relevant. It uses a 2048-tokens-long context and then-unprecedented size of 175 billion parameters, requiring 800GB to store. The model demonstrated strong zero-shot and few-shot learning on many tasks.Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3's underlying model.


== Background ==
According to The Economist, improved algorithms, powerful computers, and an increase in d
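The printed prompt shows the instruction and `text:` on a single line separated by several spaces: the trailing backslash in the template joins the two lines, and the indentation of the second line stays in the string. A minimal sketch of one way to avoid this is given below, assuming plain `str.format` in place of `PromptTemplate` so that the example has no dependencies. `build_prompt` and `FENCE` are hypothetical names.

```python
import textwrap

# Triple backtick, built at run time so it does not clash with this code fence.
FENCE = "`" * 3

SUMMARY_TEMPLATE = textwrap.dedent("""\
    Summarize the text delimited by triple backticks in {length} words.
    text: {fence}{text}{fence}
    """).strip()


def build_prompt(text, length):
    """Fill the summarization template without stray indentation."""
    return SUMMARY_TEMPLATE.format(text=text, length=length, fence=FENCE)


print(build_prompt("Some article text.", 200))
```

The same dedented string could be passed to `PromptTemplate.from_template` if the delimiter is written into the template directly.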

In [14]:
print(f"Summary:\n{gpt3_summary}")

Summary:

Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. It is a decoder-only transformer model of deep neural network, which uses attention in place of previous recurrence- and convolution-based architectures. GPT-3 has an unprecedented size of 175 billion parameters, requiring 800GB to store. It demonstrated strong zero-shot and few-shot learning on many tasks. Microsoft licensed exclusive use of GPT-3, while others can still use the public API to receive output.

GPT-3 is trained on hundreds of billions of words and is capable of coding in CSS, JSX, and Python, among others. It does not require further training for distinct language tasks, but it occasionally generates toxic language as a result of mimicking its training data. OpenAI has implemented strategies to limit the amount of toxic language generated by GPT-3.

InstructGPT is a finetuned version of GPT-3. It has been trained on a dataset of human-written instructions, allowi

In [15]:
# count words in the summary
import re

len(re.findall(r'\w+', gpt3_summary))

192

In [16]:
# count tokens in the summary

len(tokenizer.encode(gpt3_summary))

256

### Summarize Wikipedia article on GPT-4

In [17]:
# Wikipedia page on GPT-4: https://en.wikipedia.org/wiki/GPT-4

gpt4_article = wikipedia.page("GPT-4", auto_suggest=False).content
len(tokenizer.encode(gpt4_article))

3342

In [18]:
gpt4_summary = summarize(gpt4_article, length="200")
    
print(f"Summary:\n{gpt4_summary}")

Summary:

Generative Pre-trained Transformer 4 (GPT-4) is a large language model created by OpenAI, and the fourth in its series of GPT foundation models. It was initially released on March 14, 2023, and has been made publicly available via the paid chatbot product ChatGPT Plus, and via OpenAI's API. GPT-4 is a transformer-based model, which uses pre-training on public data and "data licensed from third-party providers" to predict the next token. After this step, the model was then fine-tuned with reinforcement learning feedback from humans and AI for human alignment and policy compliance.

Observers reported that the iteration of ChatGPT using GPT-4 was an improvement on the previous iteration based on GPT-3.5, with the caveat that GPT-4 retains some of the problems with earlier revisions. GPT-4 is also capable of taking images as input, though this feature has not been made available since launch. OpenAI has declined to reveal various technical details and statistics about GPT-4, suc

In [19]:
short_gpt4_summary = summarize(gpt4_article, length="100")
    
print(f"Summary:\n{short_gpt4_summary}")

Summary:

OpenAI's Generative Pre-trained Transformer 4 (GPT-4) is a large language model released in March 2023. It is a transformer-based model that uses pre-training and reinforcement learning to predict the next token. GPT-4 is a multimodal model, capable of taking images as input, and is used in products such as ChatGPT Plus and Microsoft Bing. It has been tested on standardized tests and medical problems, and has been found to be useful for coding tasks. It has been criticized for its lack of transparency and potential biases, and safety concerns have been raised due to its ability to hallucinate and respond to harmful prompts. GPT-4 is used in products such as ChatGPT Plus, Microsoft Bing, Copilot, Duolingo, Khan Academy, Be My Eyes, and Stripe.


In [20]:
# count words in the short summary

len(re.findall(r'\w+', short_gpt4_summary))

130
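The request was for 100 words and the count is 130, so the length in the prompt acts as a rough target, not a hard limit. The sketch below quantifies the deviation; `length_report` is a hypothetical helper. It also shows that the `\w+` pattern counts hyphenated tokens such as "GPT-4" as two words, so it tends to report more words than a whitespace split does.

```python
import re


def length_report(summary, target_words):
    """Compare a summary's word count with the requested length."""
    regex_words = len(re.findall(r"\w+", summary))  # same pattern as above
    split_words = len(summary.split())              # whitespace-separated tokens
    deviation = (regex_words - target_words) / target_words
    return {
        "regex_words": regex_words,
        "split_words": split_words,
        "deviation": deviation,
    }


report = length_report("GPT-4 is a zero-shot learner.", target_words=5)
print(report)
```

With the notebook's variables, `length_report(short_gpt4_summary, 100)` would report the overshoot for the short summary.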

---

## Extract topics, named entities, etc. from text

In [21]:
def extract(text, topic, llm=llm_model):
    # text and topic must be valid strings
    global total_cost
    
    extraction_template_string = """
    Extract {topic} from the text delimited by triple backticks.\
    text: ```{text}```
    """
    extraction_prompt_template = PromptTemplate.from_template(extraction_template_string)
    
    model_input = extraction_prompt_template.format(text=text, topic=topic)

    with get_openai_callback() as cb:
        response = llm(model_input)
        
    total_cost += cb.total_cost
    
    return response

In [22]:
print(extract(gpt3_summary, "main topic"))


Main Topic: GPT-3 and its variants


In [23]:
print(extract(gpt4_summary, "main topic"))

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-text-davinci-003 in organization org-v8G8kOQ3lnYLEW1SXIh6q9ej on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..


Main topic: GPT-4


In [24]:
print(extract(gpt3_summary, "list of model names"))

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-text-davinci-003 in organization org-v8G8kOQ3lnYLEW1SXIh6q9ej on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..



GPT-3, InstructGPT, GPT-3.5
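The model returns the list as free text with leading blank lines. A minimal sketch for turning such a response into a Python list is shown below. `parse_list` is a hypothetical helper, and it assumes that items are separated by commas or newlines and that no item contains a comma.

```python
import re


def parse_list(response):
    """Split a comma- or newline-separated model response into clean items."""
    items = re.split(r"[,\n]", response)
    # Drop surrounding whitespace and any leading bullet characters.
    cleaned = [item.strip().lstrip("-*• ").strip() for item in items]
    return [item for item in cleaned if item]


print(parse_list("\n\nGPT-3, InstructGPT, GPT-3.5\n"))
```

Asking for a specific output format in the prompt (for example "as a comma-separated list") would make this kind of parsing more dependable.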


In [25]:
print(extract(gpt4_summary, "list of applications"))

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-text-davinci-003 in organization org-v8G8kOQ3lnYLEW1SXIh6q9ej on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..

KeyboardInterrupt: 
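The warnings report a limit of 3 requests per minute, and the cell was eventually interrupted by hand. One option is to space the calls out on the client side. The sketch below is a simple minimum-interval throttle; `Throttle` and `throttled` are hypothetical names, and the sketch assumes that evenly spaced calls are enough to stay under the server-side limit.

```python
import time


class Throttle:
    """Space out calls so that at most `max_calls` start per `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


# 3 requests per 60 seconds, as quoted in the warning above.
throttle = Throttle(max_calls=3, period=60)


def throttled(func, *args, **kwargs):
    """Call `func` after waiting for the throttle."""
    throttle.wait()
    return func(*args, **kwargs)
```

With this in place, a call such as `throttled(extract, gpt4_summary, "list of applications")` would wait up to 20 seconds before sending the request.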

---

## Sentiment analysis

In [None]:
def sentiment_analysis(text, llm=llm_model):
    global total_cost
    
    sentiment_template_string = """
    Classify the sentiment expressed in the review delimited by triple backticks.\
    review: ```{text}```
    """
    sentiment_prompt_template = PromptTemplate.from_template(sentiment_template_string)

    model_input = sentiment_prompt_template.format(text=text)

    with get_openai_callback() as cb:
        response = llm(model_input)
        
    total_cost += cb.total_cost
    
    return response

In [None]:
review_1 = """
I purchased the PixelPioneer Quantum 60" and it's a game-changer.
The 4K resolution is stunning and the smart features are easy to use.
Worth every penny! - George, Liverpool"""

print(sentiment_analysis(review_1))

In [None]:
review_2 = """
I'm not happy with the VisionCast UltraView 43".
The picture quality is subpar and the TV arrived with a scratch on the screen.
I expected better quality control. - Sarah, Los Angeles"""

print(sentiment_analysis(review_2))

In [None]:
review_3 = """
I bought the PixelPioneer Quantum 70" and it's simply fantastic.
The voice control remote is a game-changer.
However, the delivery was delayed by a week which was quite frustrating. - Emma, London"""

print(sentiment_analysis(review_3))
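The prompt does not constrain the answer format, so the response is free text. A minimal sketch for mapping such text to a fixed label follows. `normalize_sentiment` is a hypothetical helper, the sample responses are made up for illustration (they are not recorded model outputs), and plain substring matching is a crude assumption that negated phrases such as "not positive" would defeat.

```python
def normalize_sentiment(response):
    """Map a free-text sentiment answer to one of a few fixed labels."""
    text = response.strip().lower()
    has_positive = "positive" in text
    has_negative = "negative" in text
    if "mixed" in text or (has_positive and has_negative):
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    if "neutral" in text:
        return "neutral"
    return "unknown"


# Hypothetical responses, for illustration only.
for sample in ["Positive", "Sentiment: Negative", "Positive overall, negative about delivery"]:
    print(sample, "->", normalize_sentiment(sample))
```

Listing the allowed labels in the prompt itself would reduce the need for this kind of post-processing.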

---

## Translation to other languages

In [None]:
def translate(text, target_language, llm=llm_model):
    global total_cost
    
    translation_template_string = """
    Translate the text delimited by triple backticks into {language}.\
    text: ```{text}```
    """
    translation_prompt_template = PromptTemplate.from_template(translation_template_string)
    
    model_input = translation_prompt_template.format(text=text, language=target_language)

    with get_openai_callback() as cb:
        response = llm(model_input)
        
    total_cost += cb.total_cost
    
    return response

In [None]:
english_text = "Some of the capabilities of GPT-4 include describing humor in images, \
summarizing text from screenshots, and answering exam questions with diagrams."

spanish_translation = translate(english_text, "Spanish")

print(spanish_translation)

In [None]:
italian_translation = translate(english_text, "Italian")

print(italian_translation)

In [None]:
print(translate(spanish_translation, "Italian"))
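The cell above translates to Italian by way of Spanish, which invites a comparison with the direct Italian translation. A rough sketch of such a comparison with the standard library is shown below. `similarity` is a hypothetical helper, and a character-level ratio is only a crude proxy for translation quality.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


print(similarity("Hello  world", "hello world"))
```

In the notebook this could be applied as `similarity(italian_translation, translate(spanish_translation, "Italian"))`.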

In [None]:
# quote from Wikipedia: https://el.wikipedia.org/wiki/GPT-4

greek_text = "Ως μετασχηματιστής, το GPT-4 ήταν προεκπαιδευμένο για την πρόβλεψη του επόμενου διακριτικού \
(χρησιμοποιώντας δημόσια δεδομένα και «δεδομένα με άδεια από τρίτους παρόχους») και στη συνέχεια βελτιστοποιήθηκε \
με ενισχυτική μάθηση από την ανάδραση ανθρώπου και τεχνητής νοημοσύνης για ανθρώπινη ευθυγράμμιση και πολιτική συμμόρφωση."

print(translate(greek_text, "English"))

---

## Rephrasing to correct errors or address a specific need

In [None]:
def correct_text(text, llm=llm_model):
    global total_cost
    
    correct_grammar_template_string = """
    Correct grammar, punctuation and spelling in the text delimited by triple backticks.\
    text: ```{text}```
    """
    correct_grammar_prompt_template = PromptTemplate.from_template(correct_grammar_template_string)
    
    model_input = correct_grammar_prompt_template.format(text=text)

    with get_openai_callback() as cb:
        response = llm(model_input)
        
    total_cost += cb.total_cost
    
    return response
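The five helpers in this notebook share one shape: fill a template, call the model, return the response. A minimal sketch of factoring that out is given below. `make_task`, `FENCE`, and `echo_llm` are hypothetical names, the model is passed in as a plain callable, and cost tracking is left out to keep the example self-contained.

```python
# Triple backtick, built at run time so it does not clash with this code fence.
FENCE = "`" * 3


def make_task(instruction):
    """Build a function that fills `instruction` and sends it to an LLM callable."""
    template = instruction + "\ntext: " + FENCE + "{text}" + FENCE

    def task(text, llm, **variables):
        return llm(template.format(text=text, **variables))

    return task


translate_task = make_task(
    "Translate the text delimited by triple backticks into {language}."
)
correct_task = make_task(
    "Correct grammar, punctuation and spelling in the text delimited by triple backticks."
)


def echo_llm(prompt):
    """Stand-in for the real model: returns the prompt it receives."""
    return prompt


print(translate_task("Hello", echo_llm, language="Spanish"))
```

Passing `llm_model` in place of `echo_llm` would send the same prompt to the real model.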

In [None]:
original_text = """
The model has limitations, including the tendency to hallucinate and lack transparency
in its decision-making processes. It has also been found to have cognitive biases."""

altered_text = """
The mdel has limmitaions including, the tendency to halucinate and lsck trespacy
in its decision making processes. It has also been fond to hav cognitive biasses."""

print(correct_text(altered_text))
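To see what a correction changed, the corrected text can be compared with the input word by word. A minimal sketch using `difflib` follows. `changed_words` is a hypothetical helper, and it assumes that splitting on whitespace is an adequate notion of a word.

```python
from difflib import SequenceMatcher


def changed_words(before, after):
    """Return (old, new) pairs for every word span that differs."""
    a, b = before.split(), after.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


print(changed_words("The mdel has limits", "The model has limits"))
```

In the notebook, `changed_words(altered_text, correct_text(altered_text))` would list the edits the model made, and `changed_words(original_text, correct_text(altered_text))` would show how close the correction comes to the original.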

---

## Examples of other ChatGPT-based applications:

- Chatbots
- Question answering over documents
- Customer support agents
- Querying and analysis of structured data
- Personal assistants, etc.

---

## Total cost of the ChatGPT API calls in this notebook

In [None]:
print(f"Total cost: ${total_cost:.4f}")
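The notebook accumulates cost in a single `global` variable. A small tracker object is one alternative that also keeps a per-task breakdown. The sketch below is a hypothetical design, and the amounts in the usage lines are made-up numbers, not costs measured in this notebook.

```python
from collections import defaultdict


class CostTracker:
    """Accumulate API cost per task name."""

    def __init__(self):
        self.by_task = defaultdict(float)

    def add(self, task, cost):
        self.by_task[task] += cost

    @property
    def total(self):
        return sum(self.by_task.values())

    def report(self):
        lines = [f"{task}: ${cost:.4f}" for task, cost in sorted(self.by_task.items())]
        lines.append(f"Total cost: ${self.total:.4f}")
        return "\n".join(lines)


tracker = CostTracker()
tracker.add("summarize", 0.5)   # made-up amounts for illustration
tracker.add("extract", 0.25)
print(tracker.report())
```

Inside each helper, `tracker.add("summarize", cb.total_cost)` would take the place of `total_cost += cb.total_cost`.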