In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [2]:
url = "https://en.wikipedia.org/wiki/Environmental_ethics"

# Fetch the HTML content of a Wikipedia page
response = requests.get(url)

# Check if the request was successful (status code 200 indicates success)
if response.status_code == 200:
    # If successful, print a success message
    print("Successfully fetched the webpage")
else:
    # If not successful, print a failure message
    print("Failed to fetch the webpage.")

Successfully fetched the webpage
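
The fetch-and-check step above can also be wrapped in a reusable function. The following is a minimal sketch, not the notebook's own method: the name `fetch_soup`, the `session` and `timeout` parameters, and the `USER_AGENT` string are assumptions made for illustration. It adds a timeout and a descriptive User-Agent header (Wikimedia asks automated clients to send one), and it returns `None` instead of continuing when the request fails.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical User-Agent string; replace it with one that identifies your project.
USER_AGENT = "wiki-notebook-example/0.1 (educational script)"

def fetch_soup(url, session=None, timeout=10):
    """Fetch url and return a parsed BeautifulSoup tree, or None on failure."""
    session = session or requests.Session()
    try:
        response = session.get(url, headers={"User-Agent": USER_AGENT}, timeout=timeout)
        # Raise an exception for 4xx/5xx responses instead of comparing status codes by hand
        response.raise_for_status()
    except requests.RequestException as error:
        print(f"Failed to fetch the webpage: {error}")
        return None
    return BeautifulSoup(response.content, 'html.parser')
```

Calling `fetch_soup(url)` with the `url` defined above would then return the same kind of `soup` object that the next cell builds by hand.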


In [5]:
soup = BeautifulSoup(response.content, 'html.parser')


In [23]:
soup.prettify()

'<!DOCTYPE html>\n<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-1 vector-feature-appearance-pinned-clientpref-1 vector-feature-night-mode-enabled skin-theme-clientpref-day vector-sticky-header-enabled vector-toc-available" dir="ltr" lang="en">\n <head>\n  <meta charset="utf-8"/>\n  <title>\n   Environmental ethics - Wikipedia\n  </title>\n  <script>\n   (function(){var className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited

In [9]:
def get_title(soup):
    title = soup.find('title')
    if title is not None:
        return title.text.strip()  # strip() to remove leading/trailing whitespaces
    return "Title not found"

In [11]:
# Extract the article title
title = get_title(soup)
print(title)

Environmental ethics - Wikipedia
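
The `<title>` tag carries the site suffix, which is why the output reads "Environmental ethics - Wikipedia". If only the article name is wanted, one option is to read the `h1` element with `id='firstHeading'` (the same element a later cell in this notebook uses) and fall back to the `<title>` tag with the suffix removed. This is a sketch under that assumption; `get_article_title` is a hypothetical name.

```python
from bs4 import BeautifulSoup

WIKIPEDIA_SUFFIX = ' - Wikipedia'

def get_article_title(soup):
    """Return the article name without the ' - Wikipedia' suffix."""
    # Prefer the main page heading when it is present
    heading = soup.find('h1', id='firstHeading')
    if heading is not None:
        return heading.get_text().strip()
    # Otherwise fall back to <title> and drop the site suffix
    title = soup.find('title')
    if title is not None:
        text = title.get_text().strip()
        if text.endswith(WIKIPEDIA_SUFFIX):
            text = text[:-len(WIKIPEDIA_SUFFIX)]
        return text
    return "Title not found"
```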


In [25]:
# Extract the article text: map each heading to the paragraphs that follow it, stored in a dictionary
def extract_article(soup):
    article = {}

    # Find all headings (h1, h2, h3, etc.)
    headings = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])

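    # For each heading, collect every <p> that follows it in the document.
    # Note: find_next('p') does not stop at the next heading, so each list runs to the end of the page.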
    for heading in headings:
        heading_text = heading.text.strip()
        paragraphs = []
        paragraph = heading.find_next('p')
        while paragraph:
            paragraphs.append(paragraph.text.strip())
            paragraph = paragraph.find_next('p')
        article[heading_text] = paragraphs

    return article

# Call the function and store the result in a variable
article = extract_article(soup)

# Print the result
for heading, paragraphs in article.items():
    print(f"**{heading}**")
    for paragraph in paragraphs:
        print(paragraph)
    print()

**Contents**
In environmental philosophy, environmental ethics is an established field of practical philosophy "which reconstructs the essential types of argumentation that can be made for protecting natural entities and the sustainable use of natural resources."[1] The main competing paradigms are anthropocentrism, physiocentrism (called ecocentrism as well), and theocentrism. Environmental ethics exerts influence on a large range of disciplines including environmental law, environmental sociology, ecotheology, ecological economics, ecology and environmental geography.
There are many ethical decisions that human beings make with respect to the environment. These decision raise numerous questions. For example:
The academic field of environmental ethics grew up in response to the works of Rachel Carson and Murray Bookchin and events such as the first Earth Day in 1970, when environmentalists started urging philosophers to consider the philosophical aspects of environmental problems. Two
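
In the output above, the "Contents" heading is followed by the article's opening paragraphs because the loop keeps calling `find_next('p')` until the document ends, so every heading collects all paragraphs that come after it. A possible variant that stops at the next heading is sketched below. The name `extract_sections` and the constant `HEADING_TAGS` are assumptions for illustration, and headings with identical text would still overwrite each other in the dictionary.

```python
from bs4 import BeautifulSoup

HEADING_TAGS = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']

def extract_sections(soup):
    """Map each heading to the paragraphs that appear before the next heading."""
    sections = {}
    for heading in soup.find_all(HEADING_TAGS):
        heading_text = heading.get_text().strip()
        paragraphs = []
        # Walk forward in document order over headings and paragraphs only
        for element in heading.find_all_next(HEADING_TAGS + ['p']):
            if element.name in HEADING_TAGS:
                break  # the next heading starts a new section
            paragraphs.append(element.get_text().strip())
        sections[heading_text] = paragraphs
    return sections
```

With this version a heading that has no paragraphs of its own maps to an empty list rather than to the rest of the page.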

In [27]:
# Collect every link that points to another Wikipedia article
def collect_links(soup):
    wikipedia_links = []

    # Find all links on the page
    links = soup.find_all('a', href=True)

    # Keep internal article links; hrefs containing ':' (e.g. File:, Help:, Special:) are namespace pages and are skipped
    for link in links:
        href = link['href']
        if href.startswith('/wiki/') and ':' not in href:
            wikipedia_links.append(f"https://en.wikipedia.org{href}")

    return wikipedia_links

# Call the function and store the result in a variable
wikipedia_links = collect_links(soup)

# Print the result
for link in wikipedia_links:
    print(link)

https://en.wikipedia.org/wiki/Main_Page
https://en.wikipedia.org/wiki/Main_Page
https://en.wikipedia.org/wiki/Environmental_ethics
https://en.wikipedia.org/wiki/Environmental_ethics
https://en.wikipedia.org/wiki/Environmental_ethics
https://en.wikipedia.org/wiki/Environmental_Ethics_(journal)
https://en.wikipedia.org/wiki/Environmental_philosophy
https://en.wikipedia.org/wiki/Anthropocentrism
https://en.wikipedia.org/wiki/Physiocentrism
https://en.wikipedia.org/wiki/Ecocentrism
https://en.wikipedia.org/wiki/Theocentrism
https://en.wikipedia.org/wiki/Environmental_law
https://en.wikipedia.org/wiki/Environmental_sociology
https://en.wikipedia.org/wiki/Ecotheology
https://en.wikipedia.org/wiki/Ecological_economics
https://en.wikipedia.org/wiki/Ecology
https://en.wikipedia.org/wiki/Integrated_geography
https://en.wikipedia.org/wiki/Clearcutting
https://en.wikipedia.org/wiki/Internal_combustion_engine
https://en.wikipedia.org/wiki/Future_generations
https://en.wikipedia.org/wiki/Extinction
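
The list above contains repeats (for example, `Main_Page` appears twice) because the same article can be linked from several places on a page. A small sketch of one way to remove duplicates while keeping first-seen order is shown below. It uses only the standard library; `unique_article_links` and `BASE_URL` are hypothetical names, and dropping `#fragment` parts is an added assumption so that links to different sections of one article count once.

```python
from urllib.parse import urljoin, urldefrag

BASE_URL = "https://en.wikipedia.org"

def unique_article_links(hrefs, base_url=BASE_URL):
    """Return absolute article URLs without duplicates, in first-seen order."""
    seen = set()
    unique = []
    for href in hrefs:
        # Same filter as collect_links: internal article paths only
        if not href.startswith('/wiki/') or ':' in href:
            continue
        # Build the absolute URL and drop any #fragment
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            unique.append(absolute)
    return unique
```

It could be fed directly from the parsed page, for instance `unique_article_links(a['href'] for a in soup.find_all('a', href=True))`.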


In [19]:
# Combine the previous steps into a single function that takes a Wikipedia link as its parameter

def link_analyzer(wikipedia_link):
    response = requests.get(wikipedia_link)
    soup = BeautifulSoup(response.text, 'html.parser')

    title = soup.find('h1', id='firstHeading').text
    paragraphs = soup.find_all('p')
    text = '\n'.join([p.text for p in paragraphs])

    links = soup.find_all('a', href=True)
    wikipedia_links = []

    for link in links:
        href = link['href']
        if href.startswith('/wiki/') and ':' not in href:
            new_link = f"https://en.wikipedia.org{href}"
            # Send one HEAD request per link to confirm it resolves (slow on link-heavy pages)
            head_response = requests.head(new_link, allow_redirects=True)
            if head_response.status_code == 200:
                wikipedia_links.append(new_link)

    return {
        'title': title,
        'text': text,
        'links': wikipedia_links
    }

# Example usage:
wikipedia_link = "https://en.wikipedia.org/wiki/Environmental_ethics"
result = link_analyzer(wikipedia_link)
print("Title:", result['title'])
print("Text:", result['text'])
print("Links:", result['links'])

Title: Environmental ethics
Text: In environmental philosophy, environmental ethics is an established field of practical philosophy "which reconstructs the essential types of argumentation that can be made for protecting natural entities and the sustainable use of natural resources."[1] The main competing paradigms are anthropocentrism, physiocentrism (called ecocentrism as well), and theocentrism. Environmental ethics exerts influence on a large range of disciplines including environmental law, environmental sociology, ecotheology, ecological economics, ecology and environmental geography.

There are many ethical decisions that human beings make with respect to the environment. These decision raise numerous questions. For example:

The academic field of environmental ethics grew up in response to the works of Rachel Carson and Murray Bookchin and events such as the first Earth Day in 1970, when environmentalists started urging philosophers to consider the philosophical aspects of envi
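
`link_analyzer` sends a separate HEAD request for every link it finds, with no timeout, so a long article can take a while and a single stalled connection can hang the cell. The sketch below shows one way to bound that work. `filter_reachable`, `max_checks`, and the default values are assumptions for illustration, not part of the notebook. It reuses one `requests.Session`, sets a timeout, skips links that raise a network error, and tests only the first `max_checks` links.

```python
import requests

def filter_reachable(links, session=None, timeout=10, max_checks=20):
    """Return the links, among the first max_checks, that answer a HEAD request with 200."""
    session = session or requests.Session()
    reachable = []
    for link in links[:max_checks]:
        try:
            response = session.head(link, allow_redirects=True, timeout=timeout)
        except requests.RequestException:
            continue  # treat network errors as "not reachable"
        if response.status_code == 200:
            reachable.append(link)
    return reachable
```

Links beyond the cap are simply not checked, so raise `max_checks` if the full list matters.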

In [21]:
import requests
from bs4 import BeautifulSoup

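def wikipedia_page_analyzer(url):
    """Fetch a Wikipedia page, print its title, text, and links, and return them in a dict (None on failure)."""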
    response = requests.get(url)
    if response.status_code == 200:
        print("Successfully fetched the webpage!")
    else:
        print("Failed to fetch the webpage.")
        return None

    soup = BeautifulSoup(response.content, 'html.parser')

    def get_title(soup):
        title = soup.find('title')
        if title is not None:
            return title.text.strip()
        return "Title not found"

    def extract_article(soup):
        article = {}
        headings = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
        for heading in headings:
            heading_text = heading.text.strip()
            paragraphs = []
            paragraph = heading.find_next('p')
            while paragraph:
                paragraphs.append(paragraph.text.strip())
                paragraph = paragraph.find_next('p')
            article[heading_text] = paragraphs
        return article

    def collect_links(soup):
        wikipedia_links = []
        links = soup.find_all('a', href=True)
        for link in links:
            href = link['href']
            if href.startswith('/wiki/') and ':' not in href:
                wikipedia_links.append(f"https://en.wikipedia.org{href}")
        return wikipedia_links

    title = get_title(soup)
    print("Title:", title)

    article = extract_article(soup)
    print("Article Text:")
    for heading, paragraphs in article.items():
        print(f"**{heading}**")
        for paragraph in paragraphs:
            print(paragraph)
        print()

    wikipedia_links = collect_links(soup)
    print("Wikipedia Links:")
    for link in wikipedia_links:
        print(link)

    return {
        'title': title,
        'article_text': article,
        'wikipedia_links': wikipedia_links
    }

# Example usage:
url = "https://en.wikipedia.org/wiki/Happiness"
result = wikipedia_page_analyzer(url)

Successfully fetched the webpage!
Title: Happiness - Wikipedia
Article Text:
**Contents**

Happiness is a complex and multifaceted emotion that encompasses a range of positive feelings, from contentment to intense joy. It is often associated with positive life experiences, such as achieving goals, spending time with loved ones, or engaging in enjoyable activities. However, happiness can also arise spontaneously, without any apparent external cause.
Happiness is closely linked to well-being and overall life satisfaction. Studies have shown that individuals who experience higher levels of happiness tend to have better physical and mental health, stronger social relationships, and greater resilience in the face of adversity.
The pursuit of happiness has been a central theme in philosophy and psychology for centuries. While there is no single, universally accepted definition of happiness, it is generally understood to be a state of mind characterized by positive emotions, a sense of purpos