# Requests

We use the requests package in Python for making HTTP requests to web servers. Here are some reasons why requests is commonly used:


**Simplicity**: The requests library provides a simple and elegant API for making HTTP requests. It abstracts away the complexities of dealing with low-level networking details, making it easy for developers to interact with web services.

**Ease of Use**: Making HTTP requests with requests is straightforward and requires minimal code. You can perform common operations like GET, POST, PUT, DELETE, etc., with just a few lines of code.


**Versatility**: requests supports various HTTP features and protocols, including SSL, cookies, authentication, proxies, and more. It allows you to customize requests and handle different scenarios effectively.



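**Robustness**: The requests library handles connection management and redirection automatically, and it exposes timeouts and error handling through simple options. Note that no timeout is applied unless you pass one, and HTTP error statuses raise an exception only when you call raise_for_status(). Used this way, it makes your code more robust and less error-prone.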

**Widespread Adoption**: requests is one of the most widely used HTTP libraries in the Python ecosystem. It has a large and active community of developers, which means you can easily find documentation, tutorials, and support if you encounter any issues.



**Integration with Other Libraries**: requests integrates well with other Python libraries commonly used in web development and data science, such as BeautifulSoup for web scraping, Flask and Django for web development, Pandas for data manipulation, etc.



Overall, the requests package simplifies the process of making HTTP requests in Python, making it an essential tool for web development, data extraction, and interacting with web APIs.
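
As a minimal sketch of the points above, the snippet below passes query parameters as a dictionary, sets an explicit timeout, and turns HTTP error statuses into exceptions. The helper names build_page_url and fetch_html and the 10-second timeout are illustrative assumptions, not part of the original notebook; the URL is the test site used in the cells that follow.

```python
import requests

BASE_URL = "https://webscraper.io/test-sites/e-commerce/static/computers/laptops"


def build_page_url(page):
    """Return the URL requests would send for a page number (no network call)."""
    prepared = requests.Request("GET", BASE_URL, params={"page": page}).prepare()
    return prepared.url


def fetch_html(page, timeout=10.0):
    """Fetch one listing page and return its HTML as text.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    response = requests.get(BASE_URL, params={"page": page}, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text


# Only builds the URL; call fetch_html(2) to actually download the page
print(build_page_url(2))
```

Building the prepared request first is a convenient way to inspect what would be sent before touching the network.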


More info: https://www.geeksforgeeks.org/python-requests-tutorial/

In [15]:
import requests 

# Make a GET request (the timeout stops the call from hanging indefinitely)
r = requests.get('https://webscraper.io/test-sites/e-commerce/static/computers/laptops?page=', timeout=10)

# Check the status code of the response (200 means success)
if r.status_code == 200:
    print("Request successful: ", r)
else:
    print(f"Request failed with status code: {r.status_code}")

# Print the raw content (bytes) of the response
print(r.content)


Request successful:  <Response [200]>
b'<!DOCTYPE html>\n<html lang="en">\n<head>\n\t<!-- Google Tag Manager -->\n<script>(function (w, d, s, l, i) {\n\t\tw[l] = w[l] || [];\n\t\tw[l].push({\n\t\t\t\'gtm.start\':\n\t\t\t\tnew Date().getTime(), event: \'gtm.js\'\n\t\t});\n\t\tvar f = d.getElementsByTagName(s)[0],\n\t\t\tj = d.createElement(s), dl = l != \'dataLayer\' ? \'&l=\' + l : \'\';\n\t\tj.async = true;\n\t\tj.src =\n\t\t\t\'https://www.googletagmanager.com/gtm.js?id=\' + i + dl;\n\t\tf.parentNode.insertBefore(j, f);\n\t})(window, document, \'script\', \'dataLayer\', \'GTM-NVFPDWB\');</script>\n<!-- End Google Tag Manager -->\n\t<title>Static | Web Scraper Test Sites</title>\n\t<meta charset="utf-8">\n\t<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\n\n\t<meta name="keywords"\n\t\t  content="web scraping,Web Scraper,Chrome extension,Crawling,Cross platform scraper"/>\n\t<meta name="description"\n\t\t  content="The most popular web scraping extension. Start scraping

# BeautifulSoup

Documentation: https://beautiful-soup-4.readthedocs.io/en/latest/

## Extracting Link URLs from Anchor Tags (<a>)

In [19]:
from bs4 import BeautifulSoup

# Parse the HTML content of the webpage
soup = BeautifulSoup(r.content, 'html.parser')

# Find all links on the page
links = soup.find_all('a')

# Print the href attribute of each link
for link in links:
    print(link.get('href'))


None
/
/
/cloud-scraper
/pricing
#
/documentation
/tutorials
/how-to-videos
/test-sites
https://forum.webscraper.io/
https://chromewebstore.google.com/detail/web-scraper-free-web-scra/jnhgnonknehpejjnehehllkliplmbmhn?hl=en
https://cloud.webscraper.io/
/test-sites/e-commerce/static
/test-sites/e-commerce/static/computers
/test-sites/e-commerce/static/computers/laptops
/test-sites/e-commerce/static/computers/tablets
/test-sites/e-commerce/static/phones
/test-sites/e-commerce/static/product/31
/test-sites/e-commerce/static/product/32
/test-sites/e-commerce/static/product/33
/test-sites/e-commerce/static/product/34
/test-sites/e-commerce/static/product/35
/test-sites/e-commerce/static/product/36
/test-sites/e-commerce/static/computers/laptops?page=2
/test-sites/e-commerce/static/computers/laptops?page=3
/test-sites/e-commerce/static/computers/laptops?page=4
/test-sites/e-commerce/static/computers/laptops?page=5
/test-sites/e-commerce/static/computers/laptops?page=6
/test-sites/e-commerce/s
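
Most of the href values above are relative paths, and the first link has no href at all (printed as None). A small sketch of how they could be turned into absolute URLs with the standard library follows; the helper name absolute_links is an assumption, and the sample list is a hand-copied subset of the output above.

```python
from urllib.parse import urljoin

PAGE_URL = "https://webscraper.io/test-sites/e-commerce/static/computers/laptops"

# A few href values copied from the output above, including None and "#"
sample_hrefs = [
    None,
    "/",
    "/pricing",
    "#",
    "https://forum.webscraper.io/",
    "/test-sites/e-commerce/static/product/31",
]


def absolute_links(base_url, raw_hrefs):
    """Resolve hrefs against base_url, skipping missing and fragment-only values."""
    resolved = []
    for href in raw_hrefs:
        if not href or href.startswith("#"):
            continue
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        resolved.append(urljoin(base_url, href))
    return resolved


for url in absolute_links(PAGE_URL, sample_hrefs):
    print(url)
```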

## Extracting Text Content from Paragraphs (<p>)

In [20]:
# Find all paragraphs on the page
paragraphs = soup.find_all('p')

# Print the text content of each paragraph
for paragraph in paragraphs:
    print(paragraph.text)


Web Scraper
Cloud Scraper
Pricing
Learn
15.6", AMD E2-3800 1.3GHz, 4GB, 500GB, Windows 8.1
2 reviews




15.6", Pentium N3520 2.16GHz, 4GB, 500GB, Linux
2 reviews





15.6", Core i5-4200M, 4GB, 500GB, Win7 Pro 64bit
2 reviews



14", Core i5 2.6GHz, 4GB, 500GB, Win7 Pro 64bit
8 reviews






12.5", Core i5-4300U, 8GB, 240GB SSD, Win7 Pro 64bit
12 reviews





15.6", Core i5-4200U, 8GB, 1TB, Radeon R7 M265, Windows 8.1
2 reviews



Products
Company
Resources
CONTACT US
Copyright © 2024
					Web Scraper | All rights
					reserved
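
The output above contains many blank lines and stray indentation because .text returns the whitespace exactly as it appears in the HTML. One possible way to tidy it up is sketched below on a small hand-written fragment (the fragment is an assumption that imitates the page, not its real markup).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the whitespace seen in the output above
sample_html = """
<p>15.6", AMD E2-3800 1.3GHz, 4GB, 500GB, Windows 8.1</p>
<p>
    <span>2 reviews</span>
</p>
<p>   </p>
<p>Copyright © 2024
        Web Scraper | All rights
        reserved</p>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

clean_paragraphs = []
for p in sample_soup.find_all("p"):
    # split() + join collapses newlines, tabs, and repeated spaces into single spaces
    text = " ".join(p.get_text().split())
    if text:  # skip paragraphs that contain only whitespace
        clean_paragraphs.append(text)

for text in clean_paragraphs:
    print(text)
```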


## Extracting Image URLs from <img> Tags

In [21]:
# Find all image tags on the page
images = soup.find_all('img')

# Print the src attribute of each image (image URL)
for img in images:
    print(img.get('src'))


/img/logo_white.svg
/images/test-sites/e-commerce/items/cart2.png
/images/test-sites/e-commerce/items/cart2.png
/images/test-sites/e-commerce/items/cart2.png
/images/test-sites/e-commerce/items/cart2.png
/images/test-sites/e-commerce/items/cart2.png
/images/test-sites/e-commerce/items/cart2.png


## Extracting Table Data from <table> Tags

In [23]:
# Find all tables on the page
tables = soup.find_all('table')

# Assuming you want to extract data from the first table
if tables:
    # Find all rows in the first table
    rows = tables[0].find_all('tr')
    
    # Print the data from each row
    for row in rows:
        # Find all cells in the row
        cells = row.find_all(['th', 'td'])
        # Print the text content of each cell
        for cell in cells:
            print(cell.text)
else:
    print("No tables")

No tables
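
Since the test page has no tables, the cell above only reaches the else branch. To show what the same idea produces when a table is present, here is a sketch that runs on a small made-up table; the markup and the helper name table_to_records are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical table, loosely based on the laptop listing on the page
table_html = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Packard 255 G2</td><td>$416.99</td></tr>
  <tr><td>Aspire E1-510</td><td>$306.99</td></tr>
</table>
"""


def table_to_records(soup):
    """Return the first table as a list of dicts keyed by the header row."""
    table = soup.find("table")
    if table is None:
        return []
    # Keep the cells of each row together instead of printing them one by one
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


records = table_to_records(BeautifulSoup(table_html, "html.parser"))
for record in records:
    print(record)
```

Keeping each row as a dictionary makes it straightforward to pass the result on to a tool such as Pandas later.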


## Extracting Headings

In [24]:
# Find all headings on the page (h1, h2, h3, ...)
headings = [soup.find_all(f'h{i}') for i in range(1, 7)]

# Print the text content of each heading
for heading_list in headings:
    for heading in heading_list:
        print(heading.text)


Test Sites
Computers / Laptops
$416.99

Packard 255 G2

$306.99

Aspire E1-510

$1178.99

ThinkPad T540p

$739.99

ProBook

$1311.99

ThinkPad X240

$581.99

Aspire E1-572G
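
The cell above groups headings by level, so every h1 is printed before any h2, and so on. If the order in which headings appear on the page matters, find_all also accepts a list of tag names and returns matches in document order. The fragment below is a made-up example; its heading levels are assumptions, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the heading levels are chosen only for illustration
heading_html = """
<h1>Test Sites</h1>
<h4>$416.99</h4>
<h4>Packard 255 G2</h4>
<h2>Products</h2>
"""

heading_soup = BeautifulSoup(heading_html, "html.parser")

# One call with a list of names returns the headings in document order
heading_tags = heading_soup.find_all([f"h{i}" for i in range(1, 7)])
found = [(tag.name, tag.get_text(strip=True)) for tag in heading_tags]

for level, text in found:
    print(level, text)
```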



## Extracting Lists - Unordered List and Ordered List

In [25]:
# Find all unordered lists on the page
unordered_lists = soup.find_all('ul')

# Print the text content of each list item in each unordered list
for ul in unordered_lists:
    list_items = ul.find_all('li')
    for li in list_items:
        print(li.text)

# Similarly, you can do the same for ordered lists (<ol>)




Web Scraper





Cloud Scraper





Pricing





Learn




Documentation


Video Tutorials


How to


Test Sites


Forum




Documentation


Video Tutorials


How to


Test Sites


Forum


Install


Cloud Login


Documentation


Video Tutorials


How to


Test Sites


Forum


Home



					Computers
					




							Laptops
						



							Tablets
						





							Laptops
						



							Tablets
						



					Phones
					




							Laptops
						



							Tablets
						


‹

1
2
3
4
5
6
7
8
9
10
...
19
20

›

Products

Web Scraper browser extension


Web Scraper Cloud

Company

About us


Contact


Website Privacy Policy


Browser Extension Privacy Policy


Media kit

Jobs
Resources
Blog

Documentation


Video Tutorials


Screenshots


Test Sites


Forum


Status

CONTACT US

info@webscraper.io

Ubelu 5-71, Adazi, Latvia, LV-2164
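
Two things make the output above hard to read: the whitespace kept by .text, and nested lists, whose items are printed once for the outer list and again for the inner one. A possible remedy, sketched on a small invented menu (the markup is an assumption), is to visit only the direct li children of each ul and to strip the text.

```python
from bs4 import BeautifulSoup

# Hypothetical nested menu, similar in shape to the sidebar on the page
menu_html = """
<ul id="menu">
  <li>
      Computers
      <ul>
        <li> Laptops </li>
        <li> Tablets </li>
      </ul>
  </li>
  <li> Phones </li>
</ul>
"""

menu_soup = BeautifulSoup(menu_html, "html.parser")

items = []
for ul in menu_soup.find_all("ul"):
    # recursive=False limits the search to direct children of this <ul>
    for li in ul.find_all("li", recursive=False):
        # First non-empty string inside the <li>; in this markup that is its own label
        label = next(li.stripped_strings, "")
        if label:
            items.append(label)

print(items)
```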

## Extracting Data Using CSS Selectors

In [26]:
# Find all elements with class='title'
titles = soup.select('.title')

# Print the text content of each title
for title in titles:
    print(title.text)


Packard 255 G2
Aspire E1-510
ThinkPad T540p
ProBook
ThinkPad X240
Aspire E1-572G


In [29]:
# Find all elements with class='price'
prices = soup.select('.price')

# Print the text content of each element
for price in prices:
    print(price.text)


$416.99
$306.99
$1178.99
$739.99
$1311.99
$581.99
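
The prices above are still strings. A short sketch of how they might be converted into numbers for sorting or summing is shown below; the HTML fragment and the helper name parse_price are assumptions, and Decimal is used here to avoid floating-point rounding on currency values.

```python
from decimal import Decimal

from bs4 import BeautifulSoup

# Hypothetical fragment reusing two products from the output above
product_html = """
<div class="caption">
  <h4 class="price">$416.99</h4>
  <h4><a class="title">Packard 255 G2</a></h4>
</div>
<div class="caption">
  <h4 class="price">$1178.99</h4>
  <h4><a class="title">ThinkPad T540p</a></h4>
</div>
"""

price_soup = BeautifulSoup(product_html, "html.parser")


def parse_price(text):
    """Convert a string such as '$1,178.99' or '$416.99' into a Decimal."""
    return Decimal(text.strip().lstrip("$").replace(",", ""))


# A descendant selector narrows the match to prices inside a caption block
parsed_prices = [parse_price(tag.get_text()) for tag in price_soup.select("div.caption .price")]

print("Highest price:", max(parsed_prices))
print("Total:", sum(parsed_prices))
```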


## Extracting Image URLs from Elements with a Specific Class

In [31]:
# Find all elements with class='thumbnail'
thumbnails = soup.select('.thumbnail')

# Extract image URLs from each thumbnail
for thumbnail in thumbnails:
    image_url = thumbnail.find('img')['src']
    print(image_url)


/images/test-sites/e-commerce/items/cart2.png
/images/test-sites/e-commerce/items/cart2.png
/images/test-sites/e-commerce/items/cart2.png
/images/test-sites/e-commerce/items/cart2.png
/images/test-sites/e-commerce/items/cart2.png
/images/test-sites/e-commerce/items/cart2.png
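
The cell above assumes every thumbnail contains an img tag with a src attribute; if one does not, find returns None and the subscript raises a TypeError. Two defensive variants are sketched below on an invented fragment (the second thumbnail without an image is an assumption made to show the difference).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second thumbnail deliberately has no image
thumb_html = """
<div class="thumbnail"><img src="/images/test-sites/e-commerce/items/cart2.png" alt="item"></div>
<div class="thumbnail"><p>No image for this product</p></div>
"""

thumb_soup = BeautifulSoup(thumb_html, "html.parser")

# Variant 1: check for a missing tag or attribute explicitly
image_urls = []
for thumbnail in thumb_soup.select(".thumbnail"):
    img = thumbnail.find("img")
    if img is None:
        continue
    src = img.get("src")  # .get returns None instead of raising KeyError
    if src:
        image_urls.append(src)

# Variant 2: let the CSS selector match only <img> tags that have a src attribute
image_urls_css = [img["src"] for img in thumb_soup.select(".thumbnail img[src]")]

print(image_urls)
print(image_urls_css)
```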


## Extracting Data from Nested Elements with a Specific Class

In [32]:
# Find all elements with class='row'
rows = soup.select('.row')

# Extract data from nested elements within each row
for row in rows:
    # Find elements with class='title' within the row
    titles = row.select('.title')
    for title in titles:
        print(title.text)

    # Find elements with class='price' within the row
    prices = row.select('.price')
    for price in prices:
        print(price.text)


Packard 255 G2
Aspire E1-510
ThinkPad T540p
ProBook
ThinkPad X240
Aspire E1-572G
$416.99
$306.99
$1178.99
$739.99
$1311.99
$581.99
Packard 255 G2
Aspire E1-510
ThinkPad T540p
ProBook
ThinkPad X240
Aspire E1-572G
$416.99
$306.99
$1178.99
$739.99
$1311.99
$581.99
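
The list above is printed twice, which suggests that the page has one element with class row nested inside another, so both match and both contain every product. Titles and prices also come out as two separate runs rather than as pairs. One way to get a single (title, price) pair per product is to iterate over the product container instead, as sketched here on a made-up fragment; the markup is an assumption that only mimics this nesting.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with a .row nested inside another .row
nested_html = """
<div class="row">
  <div class="row">
    <div class="thumbnail">
      <h4 class="price">$416.99</h4>
      <a class="title">Packard 255 G2</a>
    </div>
    <div class="thumbnail">
      <h4 class="price">$306.99</h4>
      <a class="title">Aspire E1-510</a>
    </div>
  </div>
</div>
"""

nested_soup = BeautifulSoup(nested_html, "html.parser")

# Both the outer and the inner element match, which explains duplicated output
print("Elements matching .row:", len(nested_soup.select(".row")))

products = []
for card in nested_soup.select(".thumbnail"):
    title = card.select_one(".title")
    price = card.select_one(".price")
    if title is not None and price is not None:
        products.append((title.get_text(strip=True), price.get_text(strip=True)))

for name, price in products:
    print(name, price)
```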


## Extracting All Unique Class Names from the HTML Content Using BeautifulSoup

In [33]:
from bs4 import BeautifulSoup
import requests 

# Making a GET request 
r = requests.get('https://webscraper.io/test-sites/e-commerce/static/computers/laptops?page=') 

# Parse the HTML content of the webpage
soup = BeautifulSoup(r.content, 'html.parser')

# Find all elements with class attribute
all_classes = set()
for element in soup.find_all(True):
    classes = element.get('class')
    if classes:
        all_classes.update(classes)

# Print all unique class names
for class_name in all_classes:
    print(class_name)


thumbnail
col-md-4
pull-right
ws-icon-linkedin
wrapper
ws-icon-right
nav
active
col-lg-9
browser-icon-dark
dropdown
test-site
navbar-brand
flex-column
navbar-header
btn-menu2
side-collapse
card-img-top
page-link
btn-menu1
col-lg-3
ws-icon-facebook-f
visually-hidden
fixed-top
navbar-toggler
copyright
row
push
smedia
sidebar
container-fluid
crta
menuitm
in
card-text
bottom-bar
dropdown-menu
image
navbar-right
sidebar-nav
description
disabled
dropdown-item
navbar-light
title
ratings
nav-second-level
caption
product-wrapper
ws-icon-chrome-dark
collapse
middle-bar
card-body
img-responsive
formenu-here
navbar-nav
collapsed
col-xl-4
review-count
page-header
navbar-dark
navbar-expand-lg
ws-icon-twitter
img-fluid
category-link
navbar-collapse
clearfix
subcategory-link
nav-link
card-title
top-bar
price
card
icon-bar
pagination
float-end
ws-icon
col-lg-12
page-item
navbar-static
blog-hero
ws-icon-youtube
install-extension
nav-item
container
dropdown-toggle
footer
col-lg-4
ws-icon-star
navbar
extr
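
A set prints in arbitrary order, so the listing above can change between runs. When looking for useful selectors it can help to sort the names and to count how often each class occurs. The sketch below does this on a small invented fragment (the markup is an assumption).

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical fragment with a few repeated class names
class_html = """
<div class="row">
  <div class="card thumbnail"><p class="price">$1</p></div>
  <div class="card thumbnail"><p class="price">$2</p></div>
</div>
"""

class_soup = BeautifulSoup(class_html, "html.parser")

class_counts = Counter()
for element in class_soup.find_all(True):
    # 'class' is multi-valued, so .get returns a list (or the default when absent)
    class_counts.update(element.get("class", []))

# Sort by descending count, then alphabetically, for a stable listing
for name, count in sorted(class_counts.items(), key=lambda item: (-item[1], item[0])):
    print(f"{name}: {count}")
```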

## Extracting All Unique IDs from the HTML Content Using BeautifulSoup

In [35]:
from bs4 import BeautifulSoup
import requests

# Making a GET request
r = requests.get('https://webscraper.io/test-sites/e-commerce/static/computers/laptops?page=')

# Parse the HTML content of the webpage
soup = BeautifulSoup(r.content, 'html.parser')

# Find all elements with id attribute
all_ids = set()
for element in soup.find_all(True):
    element_id = element.get('id')
    if element_id:
        all_ids.add(element_id)

# Print all unique IDs
for element_id in all_ids:
    print(element_id)


layout-footer
static-pagination
dropdownMenuLink
side-menu
navbar
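
Once the IDs are known, a single element can be looked up directly. The sketch below, run on an invented fragment that reuses three of the IDs printed above (the markup itself is an assumption), collects IDs in document order with an attribute selector and then fetches one element by ID in two equivalent ways.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment reusing three IDs from the output above
id_html = """
<nav id="navbar"><ul id="side-menu"><li>Home</li></ul></nav>
<footer id="layout-footer">Footer</footer>
"""

id_soup = BeautifulSoup(id_html, "html.parser")

# The [id] selector matches every element that has an id attribute
ids_in_order = [element["id"] for element in id_soup.select("[id]")]
print(ids_in_order)

# Two equivalent lookups for one element
footer = id_soup.find(id="layout-footer")
footer_css = id_soup.select_one("#layout-footer")
print(footer.name, footer.get_text(strip=True))

# A missing ID yields None, so check before using the result
missing = id_soup.find(id="does-not-exist")
print(missing)
```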


# urllib


In [12]:
import urllib.request

# URL of the web page to fetch
url = 'https://webscraper.io/test-sites/e-commerce/static/computers/laptops?page='

try:
    # Open the URL; the context manager closes the connection when done
    with urllib.request.urlopen(url, timeout=10) as response:
        # Read the raw response body (bytes)
        data = response.read()

    # Decode the bytes to a string (the page declares UTF-8)
    html_content = data.decode('utf-8')
    print(type(html_content))
    # Print the HTML content of the web page
    print(html_content)

except OSError as e:
    # URLError, HTTPError, and socket timeouts are all OSError subclasses
    print("Error fetching URL:", e)


<class 'str'>
<!DOCTYPE html>
<html lang="en">
<head>
	<!-- Google Tag Manager -->
<script>(function (w, d, s, l, i) {
		w[l] = w[l] || [];
		w[l].push({
			'gtm.start':
				new Date().getTime(), event: 'gtm.js'
		});
		var f = d.getElementsByTagName(s)[0],
			j = d.createElement(s), dl = l != 'dataLayer' ? '&l=' + l : '';
		j.async = true;
		j.src =
			'https://www.googletagmanager.com/gtm.js?id=' + i + dl;
		f.parentNode.insertBefore(j, f);
	})(window, document, 'script', 'dataLayer', 'GTM-NVFPDWB');</script>
<!-- End Google Tag Manager -->
	<title>Static | Web Scraper Test Sites</title>
	<meta charset="utf-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">

	<meta name="keywords"
		  content="web scraping,Web Scraper,Chrome extension,Crawling,Cross platform scraper"/>
	<meta name="description"
		  content="The most popular web scraping extension. Start scraping in minutes. Automate your tasks with our Cloud Scraper. No software to download, no coding needed."/>
	<li
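
Unlike requests, urllib leaves headers and character encoding to the caller. The sketch below shows one way to send a custom User-Agent and to decode the body using the charset announced by the server, falling back to UTF-8. The helper names and the User-Agent string are made-up example values, not something the site requires.

```python
import urllib.request

URL = "https://webscraper.io/test-sites/e-commerce/static/computers/laptops"


def build_request(url):
    """Create a Request object with a custom User-Agent (example value)."""
    return urllib.request.Request(url, headers={"User-Agent": "notebook-example/0.1"})


def fetch_text(url, timeout=10.0):
    """Download url and decode it with the charset from the Content-Type header."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Building the request does not touch the network; fetch_text(URL) would
request = build_request(URL)
print(request.get_method(), request.full_url)
```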

In [39]:
import urllib.request 
from bs4 import BeautifulSoup

# URL of the web page to fetch 
url = 'https://webscraper.io/test-sites/e-commerce/static/computers/laptops?page='


# Open the URL and read its content
response = urllib.request.urlopen(url)
# Read the content of the response
data = response.read()

# Decode the bytes to a string
html_content = data.decode('utf-8')

# Print the type of HTML content
print("Type of HTML content:", type(html_content))

# Print the length of HTML content (in characters)
print("Length of HTML content:", len(html_content))




Type of HTML content: <class 'str'>
Length of HTML content: 18140


In [40]:

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# Find all links on the page
links = soup.find_all('a')

# Print the number of links found
print("Number of links on the page:", len(links))


Number of links on the page: 58


In [41]:

# Print the href attribute of each link
for link in links:
    print(link.get('href'))


None
/
/
/cloud-scraper
/pricing
#
/documentation
/tutorials
/how-to-videos
/test-sites
https://forum.webscraper.io/
https://chromewebstore.google.com/detail/web-scraper-free-web-scra/jnhgnonknehpejjnehehllkliplmbmhn?hl=en
https://cloud.webscraper.io/
/test-sites/e-commerce/static
/test-sites/e-commerce/static/computers
/test-sites/e-commerce/static/computers/laptops
/test-sites/e-commerce/static/computers/tablets
/test-sites/e-commerce/static/phones
/test-sites/e-commerce/static/product/31
/test-sites/e-commerce/static/product/32
/test-sites/e-commerce/static/product/33
/test-sites/e-commerce/static/product/34
/test-sites/e-commerce/static/product/35
/test-sites/e-commerce/static/product/36
/test-sites/e-commerce/static/computers/laptops?page=2
/test-sites/e-commerce/static/computers/laptops?page=3
/test-sites/e-commerce/static/computers/laptops?page=4
/test-sites/e-commerce/static/computers/laptops?page=5
/test-sites/e-commerce/static/computers/laptops?page=6
/test-sites/e-commerce/s
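
Several of the links above carry a page query parameter, which is what a crawler would follow to reach the remaining listing pages. A standard-library sketch for pulling the page numbers out of such links follows; the helper name page_numbers is an assumption, and the sample list is a hand-copied subset of the output above.

```python
from urllib.parse import parse_qs, urlsplit

# A few href values copied from the output above
pagination_hrefs = [
    "/test-sites/e-commerce/static/computers/laptops",
    "/test-sites/e-commerce/static/product/31",
    "/test-sites/e-commerce/static/computers/laptops?page=2",
    "/test-sites/e-commerce/static/computers/laptops?page=3",
    "/test-sites/e-commerce/static/computers/laptops?page=6",
]


def page_numbers(raw_hrefs):
    """Return the sorted, de-duplicated page numbers found in the hrefs."""
    pages = set()
    for href in raw_hrefs:
        # parse_qs maps each query key to a list of values and drops blank ones
        query = parse_qs(urlsplit(href).query)
        for value in query.get("page", []):
            if value.isdigit():
                pages.add(int(value))
    return sorted(pages)


print(page_numbers(pagination_hrefs))
```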

In [45]:
# Extract text from all <h1> tags
titles = soup.find_all('h1')

# Print the text of each <h1> tag
for title in titles:
    print("Title of the page:", title.text)


Title of the page: Test Sites
Title of the page: Computers / Laptops
