---
title: "Requests and Beautiful Soup"
Description: "Requesting a web page as an object and then using Beautiful Soup to access various elements of that object."
date: 2021-01-29T14:50:18Z
draft: false
---

***


## Import Modules

In [1]:
import requests
from bs4 import BeautifulSoup
# Example adapted from the links below
# https://requests.readthedocs.io/en/master/
# https://www.crummy.com/software/BeautifulSoup/bs4/doc/

## Request a Web Page

In [2]:
# The documentation page is public, so no credentials are passed
html_page = requests.get('https://www.crummy.com/software/BeautifulSoup/bs4/doc/')
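
A request can fail or hang, so it may help to wrap the call. The sketch below is one possible approach, not part of the original example: the helper name `fetch_page` and the 10 second timeout are assumptions chosen for illustration.

```python
import requests


def fetch_page(url, timeout=10):
    """Return the Response for url, or None if the request fails.

    fetch_page is a hypothetical helper; the timeout value is an assumption.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Raise requests.HTTPError for 4xx and 5xx status codes
        response.raise_for_status()
    except requests.RequestException as exc:
        # RequestException is the base class for errors raised by Requests
        print(f"Request failed: {exc}")
        return None
    return response
```

Calling `fetch_page('https://www.crummy.com/software/BeautifulSoup/bs4/doc/')` should return a response object like `html_page` above when the request succeeds.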

## Return Page Headers

In [3]:
# Return the Content-Type header of the response
html_page.headers['content-type']

'text/html; charset=UTF-8'
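
Response headers in Requests are stored in a case-insensitive dictionary, so the capitalization of the key does not matter. The sketch below builds a stand-in `headers` object by hand (an assumption, so that it runs without a network call) and splits the media type from its parameters.

```python
from requests.structures import CaseInsensitiveDict

# Stand-in for html_page.headers, built by hand for illustration
headers = CaseInsensitiveDict({"Content-Type": "text/html; charset=UTF-8"})

# Lookups ignore case; .get() avoids a KeyError if the header is absent
content_type = headers.get("content-type", "")

# Keep only the media type, dropping parameters such as charset
media_type = content_type.split(";")[0].strip()
is_html = media_type == "text/html"
print(media_type, is_html)
```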

## Return Page Status Code

In [4]:
# Return Page Status Code
html_page.status_code

200
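
Besides comparing `status_code` to a number, a response can be checked with `ok` or `raise_for_status()`. The sketch below uses hand-built `requests.Response` objects named `found` and `missing` (stand-ins assumed for illustration, so no request is sent).

```python
import requests

# Stand-in responses, created by hand so the sketch runs offline
found = requests.Response()
found.status_code = 200

missing = requests.Response()
missing.status_code = 404

# .ok is True for status codes below 400
print(found.ok, missing.ok)

# requests.codes maps readable names to numeric status codes
print(found.status_code == requests.codes.ok)

# raise_for_status() raises HTTPError for 4xx and 5xx responses
try:
    missing.raise_for_status()
except requests.HTTPError as exc:
    print(f"HTTP error: {exc}")
```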

## Create a Beautiful Soup Object with HTML text

In [5]:
# Create a Beautiful Soup Object with HTML text
soup = BeautifulSoup(html_page.text, 'html.parser')
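
To see what the resulting object offers without fetching anything, the same constructor can be given a small HTML string. The markup and the names `html_text`, `heading`, and `intro` below are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for html_page.text
html_text = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1 id="top">Heading</h1>
    <p class="intro">First paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html_text, "html.parser")

# Attribute access returns the first matching tag
heading = soup.h1
# find() accepts attribute filters; class_ avoids the Python keyword
intro = soup.find("p", class_="intro")

print(heading["id"], heading.string)
print(intro.get_text())
```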

## Prettify HTML Text

In [6]:
# Prettify HTML Text
# print(soup.prettify()) # Uncomment this line to run this cell
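
On a full page the prettified output is long, which is why the line above is commented out. A minimal sketch on a short, made-up snippet (the `compact` string is an assumption) shows the effect: each tag and string is placed on its own indented line.

```python
from bs4 import BeautifulSoup

# Made-up single-line markup for illustration
compact = "<ul><li>One</li><li>Two</li></ul>"
soup = BeautifulSoup(compact, "html.parser")

# prettify() returns a string with one tag or text node per line
pretty = soup.prettify()
print(pretty)
```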

## Return Page Title

In [7]:
# Return Page Title
soup.title

<title>Beautiful Soup Documentation — Beautiful Soup 4.9.0 documentation</title>
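
`soup.title` returns the whole tag, as the output above shows. The sketch below, using made-up markup, reads just the text with `.string` and guards against pages that have no title, where `soup.title` is `None`.

```python
from bs4 import BeautifulSoup

# Made-up page with a title
soup = BeautifulSoup(
    "<html><head><title>Sample Page</title></head></html>", "html.parser"
)
title_tag = soup.title
print(title_tag.name)    # name of the tag
print(title_tag.string)  # text inside the tag

# Made-up page without a title: soup.title is None here
no_title = BeautifulSoup("<p>No title here</p>", "html.parser")
title_text = no_title.title.string if no_title.title else None
print(title_text)
```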

## Extract all Links

In [8]:
# Extract all links
for link in soup.find_all('a'):
    print(link.get('href'))

genindex.html
#

#beautiful-soup-documentation
http://www.crummy.com/software/BeautifulSoup/
http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html
#porting-code-to-bs4
https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
http://kondou.com/BS4/
https://www.crummy.com/software/BeautifulSoup/bs4/doc.ko/
https://www.crummy.com/software/BeautifulSoup/bs4/doc.ptbr
https://www.crummy.com/software/BeautifulSoup/bs4/doc.ru/
#getting-help
https://groups.google.com/forum/?fromgroups#!forum/beautifulsoup
#diagnose
#quick-start
#installing-beautiful-soup
http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html
http://www.crummy.com/software/BeautifulSoup/download/4.x/
#problems-after-installation
#installing-a-parser
http://lxml.de/
http://code.google.com/p/html5lib/
#differences-between-parsers
#making-the-soup
#id17
#kinds-of-objects
#tag
#navigating-the-tree
#searching-the-tree
#name
#attributes
#multi-valued-attributes
#navigablestring
#navigating-the-tree
#searchi
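
The output mixes relative paths, in-page anchors, and absolute URLs. One possible way to get a list of full URLs is sketched below with `urllib.parse.urljoin`. The sample markup and the names `base_url` and `absolute_links` are assumptions for illustration, and skipping anchors that start with `#` is a design choice, not something the original example does.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://www.crummy.com/software/BeautifulSoup/bs4/doc/"

# Made-up markup covering the kinds of links seen in the output above
html_text = """
<a href="genindex.html">Index</a>
<a href="#">Top</a>
<a>No href</a>
<a href="#quick-start">Quick Start</a>
<a href="http://lxml.de/">lxml</a>
"""
soup = BeautifulSoup(html_text, "html.parser")

absolute_links = []
# href=True matches only <a> tags that have an href attribute
for link in soup.find_all("a", href=True):
    href = link["href"]
    if href.startswith("#"):
        continue  # skip in-page anchors
    # Resolve relative paths against the page URL
    absolute_links.append(urljoin(base_url, href))

for url in absolute_links:
    print(url)
```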

## Extract all Text

In [9]:
# Extract all text
# print(soup.get_text()) # Uncomment this line to run this cell
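
By default `get_text()` joins text nodes exactly as they appear, including surrounding whitespace. A minimal sketch on made-up markup shows how the `separator` and `strip` arguments can tidy the result.

```python
from bs4 import BeautifulSoup

# Made-up markup with extra whitespace inside the heading
html_text = "<div><h1> Title </h1><p>First <b>bold</b> paragraph.</p></div>"
soup = BeautifulSoup(html_text, "html.parser")

# Default: text nodes are concatenated unchanged
raw_text = soup.get_text()

# strip=True trims each text node; separator is placed between them
clean_text = soup.get_text(separator=" ", strip=True)

print(repr(raw_text))
print(repr(clean_text))
```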