# A Python script to summarize blog posts using OpenAI's GPT-3

Import the necessary modules: openai, requests, and BeautifulSoup (from bs4).

In [9]:
import openai
import requests
from bs4 import BeautifulSoup

The code below sets the OpenAI API key through the api_key attribute of the openai module. The key is read from a local file, openai_api_key.txt. This is a private key that grants access to the OpenAI API and its language models, so keep the file out of version control and never share its contents. You'll need to obtain your own API key in order to use this code.

In [10]:
# Read the OpenAI API key from a file in the root folder.
with open("openai_api_key.txt", "r") as f:
    api_key = f.read().strip()

openai.api_key = api_key
# Do not print the key: notebook output is saved with the file and would expose it.
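As an alternative to keeping the key only in a file, here is a minimal sketch that looks for the key in an environment variable first and falls back to the file. The function names load_api_key and mask_key, and the variable name OPENAI_API_KEY used as the default, are assumptions made for this illustration rather than part of the original notebook.

```python
import os
from pathlib import Path


def load_api_key(env_var="OPENAI_API_KEY", key_file="openai_api_key.txt"):
    """Return the API key from the environment, falling back to a local file."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        return path.read_text().strip()
    raise RuntimeError(f"No API key found in ${env_var} or {key_file}")


def mask_key(key, visible=4):
    """Hide all but the last few characters, for a safe confirmation message."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

With these helpers, the cell could set openai.api_key = load_api_key() and, if a confirmation is wanted, print mask_key(openai.api_key) instead of the key itself.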


The **url** variable holds the URL of the article that we want to summarize. In this case, it's set to "https://lawandspace.com/negotiating-their-future-a-marshallese-geography-of-us-policy/", which serves as an example for demonstration purposes. **You would need to replace this with the URL of the actual article that you want to summarize.**

In [11]:
# Define the URL of the article to summarize
url = "https://lawandspace.com/negotiating-their-future-a-marshallese-geography-of-us-policy/"

## For a single URL
Code to generate a summary for a single URL, using the first two headings (h1/h2) and the first five paragraphs of the blog post or article.

In [12]:
# Get the text content of the article
response = requests.get(url, timeout=30)
response.raise_for_status()  # Stop early on HTTP errors such as 404
content = response.text

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(content, 'html.parser')

# Extract the first five paragraphs and first two headers
paragraphs = soup.find_all('p')[:5]
headers = soup.find_all(['h1', 'h2'])[:2]

# Combine the paragraphs and headers into a single string
content_to_summarize = ''
for element in headers + paragraphs:
    content_to_summarize += element.text + '\n\n'

# Set up the GPT-3 API parameters
model_engine = "text-davinci-002"
max_tokens = 256
temperature = 0.7
n = 1
stop = None
prompt = "Please summarize the following article:\n\n"

# Generate the summary for the selected paragraphs and headers
prompt_with_content = prompt + content_to_summarize + "\n\nSummary:"
response = openai.Completion.create(
    engine=model_engine,
    prompt=prompt_with_content,
    max_tokens=max_tokens,
    temperature=temperature,
    n=n,
    stop=stop
)
summary = response.choices[0].text.strip()

# Print the summary
print(summary)

The article discusses the history of the United States' relationship with the Marshall Islands, and the various policies that have been enacted affecting Marshallese people living in the US. These policies have had a significant impact on Marshallese people's ability to access healthcare, education, housing, and employment. The ongoing negotiations to amend the Compact of Free Association are discussed, and the potential implications of these amendments are considered.
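The prompt sent to the model counts against the model's context limit, so long articles may need trimming before the request is made. Below is a minimal sketch of a prompt builder with a character budget. The name build_prompt and the default max_chars value of 8000 are assumptions chosen for illustration; the budget is a rough stand-in for a token count and should be tuned for the model in use.

```python
def build_prompt(sections,
                 instruction="Please summarize the following article:\n\n",
                 max_chars=8000):
    """Join text sections into one prompt, stopping before max_chars is exceeded."""
    kept = []
    used = 0
    for text in sections:
        text = text.strip()
        if not text:
            continue  # skip empty headers or paragraphs
        cost = len(text) + 2  # two newlines separate consecutive sections
        if used + cost > max_chars:
            break  # keep the leading sections, drop the rest
        kept.append(text)
        used += cost
    return instruction + "\n\n".join(kept) + "\n\nSummary:"
```

In the cell above, this could be called as build_prompt([element.text for element in headers + paragraphs]) in place of the manual string concatenation.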


## For multiple URLs
Code to generate summaries for multiple URLs, using the first two headings (h1/h2) and the first five paragraphs of each blog post or article. The URLs need to be passed as a list in the **urls** variable, as shown below. **You would need to replace these with the URLs of the actual articles that you want to summarize.**

In [13]:
urls = ["https://lawandspace.com/negotiating-their-future-a-marshallese-geography-of-us-policy/", 
        "https://immigrationimpact.com/2023/02/08/awaiting-court-dates-notice-to-appear/", 
        "https://www.theguardian.com/uk-news/2023/feb/04/rishi-sunak-stop-deportation-appeals-people-uk-small-boats"]

In [14]:
# Set up the GPT-3 API parameters
model_engine = "text-davinci-002"
max_tokens = 256
temperature = 0.7
n = 1
stop = None
prompt = "Please summarize the following article:\n\n"

# Loop over each URL and generate a summary
for url in urls:
    # Get the text content of the article
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Stop on HTTP errors such as 404
    content = response.text

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(content, 'html.parser')

    # Extract the first five paragraphs and first two headers
    paragraphs = soup.find_all('p')[:5]
    headers = soup.find_all(['h1', 'h2'])[:2]

    # Combine the paragraphs and headers into a single string
    content_to_summarize = ''
    for element in headers + paragraphs:
        content_to_summarize += element.text + '\n\n'

    # Generate the summary for the selected paragraphs and headers
    prompt_with_content = prompt + content_to_summarize + "\n\nSummary:"
    response = openai.Completion.create(
        engine=model_engine,
        prompt=prompt_with_content,
        max_tokens=max_tokens,
        temperature=temperature,
        n=n,
        stop=stop
    )
    summary = response.choices[0].text.strip()

    # Print the summary for the current URL
    print(f"Summary for {url}: {summary}\n")

Summary for https://lawandspace.com/negotiating-their-future-a-marshallese-geography-of-us-policy/: The article discusses the current negotiations between the United States and the Republic of the Marshall Islands (RMI) to amend the Compact of Free Association (COFA), which governs the mobility of Marshallese people between the two countries. It describes how the COFA is the result of decades of negotiations, and how it has been amended in the past. The article goes on to discuss how Marshallese people living in the United States have faced a number of challenges in areas such as healthcare, education, housing, and employment.

Summary for https://immigrationimpact.com/2023/02/08/awaiting-court-dates-notice-to-appear/: The article discusses the NBC report on the lack of NTAs given to 600,000 migrants who have crossed into the United States since March 2021. The report leaves out important context as to why releases occurred in the first place. The NBC report analyzed data on “family un
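When processing several URLs, a single network or API error would otherwise end the whole run. The following is a minimal sketch of a loop that records an error for the failing URL and continues with the rest. The names summarize_all, fetch_text, and summarize are hypothetical; the two callables stand in for the download-and-parse step and the GPT-3 call shown in the loop above.

```python
def summarize_all(urls, fetch_text, summarize):
    """Return {url: summary or error message}, continuing past failures."""
    results = {}
    for url in urls:
        try:
            results[url] = summarize(fetch_text(url))
        except Exception as exc:  # broad on purpose: network, parsing, or API errors
            results[url] = f"ERROR: {exc}"
    return results
```

Passing the fetching and summarizing steps in as functions also makes the loop easy to try out with stand-in functions before spending API credits.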

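## Short description of the API parameters used

- **engine**: This specifies the GPT-3 model to use for generating the summary. text-davinci-002 is one of the more capable GPT-3 models and was commonly used for natural language generation tasks. Note that the engine argument and the openai.Completion interface belong to the pre-1.0 releases of the openai Python library.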

- **prompt**: This is the prompt that is provided to the GPT-3 model. In this case, we are providing a prompt that asks the model to summarize the given article.

- **max_tokens**: This parameter sets the maximum number of tokens (words or sub-word pieces) to be generated by the model. The limit applies to the generated completion only, not to the prompt. In this case, we are limiting the summary to 256 tokens.

- **temperature**: This parameter controls the "creativity" of the model. A lower temperature value will generate more conservative and predictable responses, while a higher temperature value will generate more creative and unpredictable responses. In this case, we are using a temperature of 0.7, a moderate setting that allows some variation in wording; use a lower value for more repeatable summaries.

- **n**: This parameter specifies the number of different completions to generate for the given prompt. In this case, we are generating only one completion.

- **stop**: This parameter specifies a string (or a list of strings) that, if generated by the model, ends the completion at that point. In this case, we are not using a stop sequence, so generation ends when the model finishes on its own or when the max_tokens limit is reached, whichever comes first.
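Because max_tokens can cut a summary off mid-sentence, it helps to check why generation ended. Completion responses report a finish_reason for each choice, with "length" indicating that the token limit was reached. Here is a minimal sketch of a helper that returns the text together with a truncation flag. The name completion_text is an assumption for illustration, and the sketch assumes the response can be indexed like a dictionary.

```python
def completion_text(response, index=0):
    """Return (text, truncated) for one choice of a Completion-style response."""
    choice = response["choices"][index]
    # "length" means the max_tokens limit ended generation, not the model itself
    truncated = choice.get("finish_reason") == "length"
    return choice["text"].strip(), truncated
```

If the flag is true, one option is to repeat the request with a larger max_tokens value or a shorter prompt.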