# ChatGPT Article Summarizer
## This program takes a PDF (or pasted text) as input and outputs a summary. This is useful for getting the gist of a text you don't have time to read in full.

# Outline
## Step 1: Separate text into manageable sections (small enough for ChatGPT to analyze) <br> Step 2: Calculate cost of computation (cost per token * tokens)<br> Step 3: Give sections to ChatGPT one by one with a summary prompt<br> Step 4: Feed ChatGPT its own summaries and create a general summary

# Start of Code

## Import necessary libraries, functions, variables, and files

In [1]:
#Import Libraries, variables, and files
import PyPDF2
import os
import openai
import tiktoken
openai.api_key = os.getenv("OPENAI_API_KEY")

#This is where you put the pdf file (example here is background info for an author)
myPDF = 'SportHazeReading.pdf'


### This function is what we will use to access ChatGPT (3.5 turbo model)

In [2]:
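#This function sends a prompt to ChatGPT (gpt-3.5-turbo model) and returns the reply text
#Note: openai.ChatCompletion is the interface of openai package versions before 1.0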
def get_completion(prompt, model="gpt-3.5-turbo"): 
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]
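
The function above relies on `openai.ChatCompletion`, which releases 1.0 and later of the `openai` package no longer provide. Below is a minimal sketch of the same call for the newer client interface. The name `get_completion_v1` and the optional `client` parameter are illustrative additions, not part of the original notebook; the parameter lets you pass a stand-in object when you want to try the code without spending tokens.

```python
def get_completion_v1(prompt, model="gpt-3.5-turbo", client=None):
    """Send one user message and return the reply text (openai>=1.0 style)."""
    if client is None:
        # Imported here so the function can be exercised with a stand-in client
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the output as deterministic as possible
    )
    # In the 1.0+ client the message is an object, so use attribute access
    return response.choices[0].message.content
```

Note the two differences from the older interface: the call goes through a client object, and the reply is read with `.content` instead of `["content"]`.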

### Extracts text from pdf

In [3]:
#This function takes a page from a pdf, extracts the text, and puts it in a string variable
#It then repeats this process for as many pages in the pdf
def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        pdf_text = ""
        for page in pdf_reader.pages:
            pdf_text += page.extract_text()
    return pdf_text

### Separates text into manageable sections

In [4]:
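#This library and function are used to break our text into manageable sections (by sentence count)
#Note: nltk.sent_tokenize needs NLTK's Punkt sentence tokenizer data; if it is missing, NLTK raises a LookupError naming the resource to fetch with nltk.download()
import nltk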

def break_into_paragraphs(text, sentences_per_group=8):
    # Tokenize text into sentences
    sentences = nltk.sent_tokenize(text)
    
    # Group sentences into paragraphs/sections based on the specified number of sentences per group
    groups = [sentences[i:i+sentences_per_group] for i in range(0, len(sentences), sentences_per_group)]
    
    # Join sentences within each group to form paragraphs
    paragraphs = [' '.join(sentence_group) for sentence_group in groups]
    return paragraphs
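
Grouping by a fixed sentence count keeps the code simple, but sentence lengths vary, so section sizes vary too. As a possible alternative, here is a sketch that packs sentences into sections under a size budget. The function `group_by_budget` and its parameters are hypothetical, and the default size measure is a plain word count, used here only as a rough stand-in for tokens.

```python
def group_by_budget(sentences, max_units, count_units=lambda s: len(s.split())):
    """Greedily pack sentences into sections of at most max_units each."""
    sections, current, used = [], [], 0
    for sentence in sentences:
        size = count_units(sentence)
        # Start a new section when adding this sentence would exceed the budget
        if current and used + size > max_units:
            sections.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += size
    if current:
        sections.append(" ".join(current))
    return sections


example = ["One two three.", "Four five.", "Six seven eight nine.", "Ten."]
print(group_by_budget(example, max_units=5))
```

A single sentence larger than the budget still ends up as its own oversized section. To budget in real tokens, one option is to pass `count_units=lambda s: num_tokens_from_string(s, "cl100k_base")`, using the token counter defined later in this notebook.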


### Call our functions for our pdf file and display result (limited display for longer text)

In [5]:
#This is where we use our functions on our pdf file to get our separated text
pdf_path = myPDF
pdf_text = extract_text_from_pdf(pdf_path)
paragraphs = break_into_paragraphs(pdf_text, sentences_per_group=8)
print(f"Number of sections: {len(paragraphs)}\n")

# Print paragraphs (only the first 4 if the text is long)
shown = paragraphs if len(paragraphs) < 8 else paragraphs[:4]
for idx, paragraph in enumerate(shown):
    print(f"Group {idx + 1}:\n{paragraph}\n")
    


Number of sections: 78

Group 1:
© C anadian  Journal  of SoCiology /CahierS  CanadienS  de SoCiologie  36(3) 2011 199Through  The LiminaL : a Compara Tive 
anaL ysis of CommuniT as and riTes of 
passage  in spor T hazing  and  
iniTia Tions
Jay Johnson
Abstract. Hazing rituals and ceremonies have been described in traditional, 
historical world cultures, junior and high school, the military, private schools, paramilitary organizations, fraternities and sororities, as well as sport (Allan 
and Madden 2008; Bryshun and Young 1999; Campo, Poulos, and Sipple 
2005; Fields, Collins, and Comstock 2010; Johnson 2000, 2009; Linhares de Albuquerque and Paes-Machado 2004; Nuwer 1999; Winslow 1999; Zacharda 2009). Student athletes often endure hazing practices with 80% of NCAA ath-
letes reporting some form of initiation (Hoover 1999) in exchange for member -
ship affiliation. This qualitative study compared the similarities and deviations 
between contemporary initiations and historical traditi
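
The extracted text above contains words split across line breaks (for example "ath-" followed by "letes", and "member -" followed by "ship"). The sketch below shows one way such text might be tidied before it is split into sections. `clean_extracted_text` is a hypothetical helper, not part of the original notebook, and the rule it applies is a heuristic.

```python
import re


def clean_extracted_text(text):
    """Rejoin words hyphenated across line breaks and collapse whitespace."""
    # "ath-\nletes" -> "athletes", "member -\nship" -> "membership"
    # Caveat: a real compound broken at a line end ("well-\nknown") loses its hyphen
    text = re.sub(r"(\w)\s?-\s*\n\s*(\w)", r"\1\2", text)
    # Replace every run of whitespace (including newlines) with one space
    text = re.sub(r"\s+", " ", text)
    return text.strip()


sample = "NCAA ath-\nletes reporting member -\nship affiliation.  This study"
print(clean_extracted_text(sample))
```

This does not repair the scattered capitals and spacing in the title block; that damage comes from the PDF's typography and would need a different approach.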

## Calculate Price

In [6]:
#This function calculates number of ChatGPT tokens for our text 
def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

In [7]:
#This function estimates the input price of using the model in USD, assuming a rate of $0.001 per 1,000 tokens
def checkInputPrice(text): 
    tokenCount = num_tokens_from_string(text, "cl100k_base")
    print(f"The number of tokens in the given text is: {tokenCount}")
    price = (.001) * (tokenCount / 1000)
    print(f"The price for the following input text is: {price}")

In [8]:
checkInputPrice(pdf_text)

The number of tokens in the given text is: 17320
The price for the following input text is: 0.017320000000000002
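
The arithmetic behind this figure is 17320 tokens / 1000 × $0.001 = $0.01732; the trailing digits in the printed value are ordinary floating-point noise. Here is a small sketch of the same calculation with the rate as a parameter and the result rounded for display. `estimate_cost` is a hypothetical helper, and the default rate is simply the constant used in this notebook, so check current pricing before relying on it.

```python
def estimate_cost(token_count, usd_per_1k_tokens=0.001):
    """Return the estimated price in USD for a given number of input tokens."""
    return token_count / 1000 * usd_per_1k_tokens


cost = estimate_cost(17320)
# Format to four decimals so floating-point noise is not shown
print(f"Estimated input cost: ${cost:.4f}")
```

This covers input tokens only. The summaries the model writes are billed as output tokens, so the real total is somewhat higher.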


## Prompt ChatGPT to summarize each section and display result

In [9]:
#This is where we give the model our prompt and feed it each section individually
totalResponse = ""
for i in range(len(paragraphs)):
    prompt = f"""
    Your task is to generate a short summary of the text, \
    capturing the main ideas and important information. Your \
    goal is to provide a clear understanding to someone \
    who hasn't read the text, ensuring they grasp the main \
    points and crucial details.

    Summarize the section below, delimited by triple \
    backticks in at most 4 sentences. 

    section: ```{paragraphs[i]}```
    """

    response = get_completion(prompt)
    print(i + 1, response, "\n")
    totalResponse += response + "\n"


1 The text discusses hazing rituals and ceremonies in various settings such as traditional cultures, schools, military, and sports. Student athletes often undergo hazing practices as a form of initiation for membership affiliation. Despite the abusive nature of hazing ceremonies in sports, there is still a demand for entry rituals to mark membership and identity within the team structure. The study compares contemporary initiations with historical traditions, highlighting the importance of establishing "Rites of Passage" for group membership. 

2 The section discusses the prevalence of hazing rituals and ceremonies in various cultures and institutions, including high schools, the military, private schools, and sports teams. Athletes often undergo hazing practices, with a high percentage reporting initiation into groups. The study compares contemporary initiations with historical traditions, highlighting the importance of rituals in establishing group membership. Despite the abusive nat
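
The per-section prompt can also be assembled by a small function, which avoids backslash line continuations and keeps code indentation out of the text sent to the model. This is only a sketch: `build_summary_prompt` and `max_sentences` are hypothetical names, and the wording is copied from the prompt used above.

```python
def build_summary_prompt(section, max_sentences=4):
    """Return the summary prompt for one section of text."""
    fence = "`" * 3  # triple backticks used to delimit the section
    instructions = (
        "Your task is to generate a short summary of the text, "
        "capturing the main ideas and important information. Your "
        "goal is to provide a clear understanding to someone "
        "who hasn't read the text, ensuring they grasp the main "
        "points and crucial details."
    )
    request = (
        "Summarize the section below, delimited by triple "
        f"backticks in at most {max_sentences} sentences."
    )
    return f"{instructions}\n\n{request}\n\nsection: {fence}{section}{fence}"


print(build_summary_prompt("Hazing rituals have been described in many settings."))
```

Inside the loop, the call would then read `get_completion(build_summary_prompt(paragraphs[i]))`.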

## Combine all summaries into single summary

In [10]:
#First we check that we are within token limit for responses
finalTokenCount = num_tokens_from_string(totalResponse, "cl100k_base")
print(finalTokenCount)

7160


In [11]:
#If we are not within token limit, we will have to output multiple summaries
if finalTokenCount > 4000:
    finalText = break_into_paragraphs(totalResponse, sentences_per_group= 150)
    for i in range(len(finalText)):
        #Now we combine all of the summaries into a few general summaries of the entire text
        sumPrompt = f"""
            Your task is to generate a short summary of the text, \
            capturing the main ideas and important information. Your \
            goal is to provide a clear understanding to someone \
            who hasn't read the text, ensuring they grasp the main \
            points and crucial details. Make sure to include any necessary context.

            The text is made up of previous summaries you have made. We want \
            to combine the summaries into one complete summary

            Summarize the text below, delimited by triple \
            backticks in at most 10 sentences. Separate each sentence in \
            the form of a bulleted list, with each bullet on a new line. 

            Text: ```{finalText[i]}```
            """

        endSummary = get_completion(sumPrompt)
        print(endSummary, "\n")

#If we are within token limit, we can output normally        
else:
    #Now we combine all of the summaries into one final general summary of the entire text
    sumPrompt = f"""
        Your task is to generate a short summary of the text, \
        capturing the main ideas and important information. Your \
        goal is to provide a clear understanding to someone \
        who hasn't read the text, ensuring they grasp the main \
        points and crucial details. Make sure to include any necessary context.

        The text is made up of previous summaries you have made. We want \
        to combine the summaries into one complete summary

        Summarize the text below, delimited by triple \
        backticks in at most 10 sentences. Separate each sentence in \
        the form of a bulleted list, with each bullet on a new line. 

        Text: ```{totalResponse}```
        """

    endSummary = get_completion(sumPrompt)
    print(endSummary, "\n")


- The text explores hazing rituals and ceremonies in various settings such as traditional cultures, schools, military, and sports.
- Student athletes often undergo hazing practices as a form of initiation for membership affiliation.
- Despite the abusive nature of hazing ceremonies in sports, there is still a demand for entry rituals to mark membership and identity within the team structure.
- The study compares contemporary initiations with historical traditions, emphasizing the importance of establishing "Rites of Passage" for group membership.
- Hazing rituals and ceremonies are prevalent in various cultures and institutions, including high schools, the military, private schools, and sports teams.
- The text discusses the definition of hazing as any activity that humiliates, degrades, abuses, or endangers individuals joining a group.
- There is an increase in convictions for hazing-related offenses in recent years, leading to the implementation of anti-hazing policies and laws in mo
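
When the combined summaries exceed the token limit, the cell above prints several separate summaries. One possible extension, sketched below, keeps summarizing batches of summaries until the joined text fits, then produces a single final summary. `combine_summaries` is hypothetical; `summarize` and `fits` are callables you supply, for example a wrapper around `get_completion` and a token-count check against the limit.

```python
def combine_summaries(summaries, summarize, fits, batch_size=10):
    """Summarize batches of summaries until the joined text fits, then finish."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 so each round shrinks the list")
    while len(summaries) > 1 and not fits("\n".join(summaries)):
        # Split the current summaries into batches and summarize each batch
        batches = [summaries[i:i + batch_size]
                   for i in range(0, len(summaries), batch_size)]
        summaries = [summarize("\n".join(batch)) for batch in batches]
    # One last pass turns whatever remains into the final summary
    return summarize("\n".join(summaries))


# Dry run with stand-ins: "summarizing" keeps the first word,
# and a text "fits" when it has at most 3 words
first_word = lambda text: text.split()[0]
short_enough = lambda text: len(text.split()) <= 3
print(combine_summaries(["a b", "c d", "e f", "g h"], first_word, short_enough, batch_size=2))
```

Each extra round costs more API calls and loses some detail, so this trades completeness for a single, shorter result.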

# We have successfully used ChatGPT to condense an article into a 10-bullet-point summary! (Or several such summaries if the article is extremely long)