<a href="https://colab.research.google.com/github/Josh-Em/wikipedia-summarizer/blob/main/WikipediaSummarizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 🛠️ Python Environment Setup with Libraries
Install the libraries this notebook depends on: requests and BeautifulSoup for web scraping, and the OpenAI client for API calls.

In [None]:
!pip install beautifulsoup4 requests
!pip install openai



### 🤖 API Client Initialization with Python
Import requests, BeautifulSoup, and the OpenAI client, then create a client object with your API key so later cells can call the OpenAI API.

In [None]:
import requests
from bs4 import BeautifulSoup
import re
from openai import OpenAI

api_key = "YOUR_API_KEY_HERE"

client = OpenAI(api_key=api_key)
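
A key pasted directly into a cell is easy to leak when the notebook is shared. As a minimal sketch, assuming the key has been exported as an environment variable beforehand, it could be read at runtime instead. The helper name `load_api_key` is hypothetical and not part of this notebook; `OPENAI_API_KEY` is the variable the OpenAI client looks for by default.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    # Treat a missing or whitespace-only value as "not set"
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The result could then be passed along as `OpenAI(api_key=load_api_key())`.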

### 🌐 Wikipedia Scraper Function
Create a function to scrape and extract clean text from Wikipedia pages.

In [None]:
def scrape_wikipedia(url):
    """
    Scrapes text content from a Wikipedia page.

    Args:
        url (str): URL of the Wikipedia page to scrape

    Returns:
        str: The main text content of the Wikipedia page

    Raises:
        ValueError: If the URL is not a valid Wikipedia page
        requests.RequestException: If there's an error retrieving the page
    """
    # Verify it's a Wikipedia URL
    if not url.startswith('https://en.wikipedia.org/wiki/'):
        raise ValueError("URL must be a valid English Wikipedia page")

    try:
        # Send GET request to the URL; the timeout keeps a stalled connection from hanging
        response = requests.get(url, timeout=10)
        response.raise_for_status()

        # Create BeautifulSoup object to parse HTML
        soup = BeautifulSoup(response.text, 'html.parser')

        # Remove unwanted elements
        for element in soup.find_all(['script', 'style', 'table', 'sup', 'span']):
            element.decompose()

        # Get the main content div
        content = soup.find(id='mw-content-text')
        if not content:
            raise ValueError("Could not find main content section")

        # Get all text from the main content
        text = content.get_text(separator=' ')

        # Clean up the text
        text = re.sub(r'\[\d+\]', '', text)  # Remove reference numbers
        text = re.sub(r'\s+', ' ', text)  # Collapse any run of whitespace into a single space
        text = text.strip()

        return text

    except requests.RequestException as e:
        # Chain the original exception so its traceback is preserved.
        # ValueError is deliberately not caught here, so it propagates
        # to the caller as the docstring promises.
        raise requests.RequestException(f"Error retrieving the page: {e}") from e
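
To see what the two cleanup regexes in `scrape_wikipedia` do in isolation, here is a small sketch that applies them to a hand-written string instead of a live page. The helper name `clean_text` and the sample sentence are made up for illustration.

```python
import re


def clean_text(raw):
    """Apply the same regex cleanup used in scrape_wikipedia to a string."""
    text = re.sub(r'\[\d+\]', '', raw)  # drop numeric citation markers such as [12]
    text = re.sub(r'\s+', ' ', text)    # collapse runs of whitespace, including newlines
    return text.strip()


sample = "Mesopotamia[1]  is a historical\nregion [23] of West Asia.\n"
print(clean_text(sample))
# prints: Mesopotamia is a historical region of West Asia.
```

Note that the first pattern only matches digits in brackets, so a non-numeric marker such as `[citation needed]` would pass through unchanged if it survived the earlier tag removal.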

### 📝 Wikipedia Article Summarizer with GPT
Use OpenAI's GPT to generate concise summaries of Wikipedia articles by processing the extracted text.

In [None]:
def get_summary(wiki_text):
  """Return a two- or three-sentence summary of the given article text."""

  response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are a Wikipedia page summarizer. Given a Wikipedia article, please summarize it in two or three sentences."
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": wiki_text
          }
        ]
      }
    ],
    response_format={
      "type": "text"
    },
    temperature=1,
    max_completion_tokens=2048,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
  )

  return response.choices[0].message.content
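
`get_summary` sends the whole article in one request, so a very long page could exceed the model's input limit. One possible guard, sketched here under stated assumptions, is to trim the text before sending it. The helper `truncate_text` is hypothetical, character count is only a rough proxy for tokens, and the default budget is an assumed placeholder, not a documented model limit.

```python
def truncate_text(text, max_chars=400_000):
    """Trim text to at most max_chars without splitting a word.

    Hypothetical helper; max_chars is an assumed rough budget.
    """
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    if not text[max_chars].isspace():
        # The cut landed mid-word, so back up to the previous space
        last_space = cut.rfind(' ')
        if last_space > 0:
            cut = cut[:last_space]
    return cut.rstrip()
```

It could be applied as `get_summary(truncate_text(text))`. Dropping the tail of an article loses content, so splitting the text into chunks and summarizing each may be a better fit for very long pages.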

### 🔍 Scrape and Summarize Wikipedia Article
Scrape a Wikipedia page and generate a brief summary by calling `scrape_wikipedia` and then `get_summary`.

In [None]:
url = "https://en.wikipedia.org/wiki/Mesopotamia"
text = scrape_wikipedia(url)
summary = get_summary(text)

print(summary)

Mesopotamia, known as the "land between rivers," is a historical region in West Asia, primarily within present-day Iraq, but also covering parts of Iran, Turkey, Syria, and Kuwait. It is recognized as the cradle of civilization for its significant contributions, including the development of the first cereal crops, the invention of the wheel, and the establishment of the earliest forms of writing and mathematics. Its complex history saw the rise and fall of empires such as the Sumerians, Akkadians, Babylonians, and Assyrians, influencing the region until the Muslim conquests in the 7th century AD.
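
The cell above assumes both steps succeed. As a hedged sketch of how the two calls might be wrapped so that a bad URL or a network failure yields a readable message instead of a traceback, the hypothetical helper below takes the scraper and summarizer as arguments. The name `summarize_url` and the `(summary, error)` return shape are assumptions for illustration.

```python
def summarize_url(url, scrape, summarize):
    """Scrape a page and summarize it, returning (summary, error_message).

    Exactly one of the two returned values is None. Hypothetical helper.
    """
    try:
        text = scrape(url)
    except ValueError as e:
        # Raised for a URL that is not an English Wikipedia article
        return None, f"Invalid page: {e}"
    except Exception as e:
        # Network errors and anything else raised while fetching
        return None, f"Could not fetch page: {e}"
    if not text:
        return None, "Page contained no text"
    return summarize(text), None
```

With the functions defined earlier, this would be called as `summary, error = summarize_url(url, scrape_wikipedia, get_summary)`. Passing the functions in as arguments also makes the wrapper easy to try out with stand-ins that need neither network access nor an API key.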
