## Run a Program with the webbrowser Module

In [1]:
# showmap.py - Launches a map in the browser using an address from the
# command line or clipboard

import webbrowser, sys, pyperclip
if len(sys.argv) > 1:
    # Get address from command line.
    address = ' '.join(sys.argv[1:])
else:
    # Get address from clipboard.
    address = pyperclip.paste()

# Open the web browser.
webbrowser.open('https://www.openstreetmap.org/search?query=' + address)



True
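
Addresses usually contain spaces, commas, and sometimes characters such as `&` that have special meaning in a URL. The sketch below is one possible way to percent-encode the address before appending it to the search URL, using the standard library's `urllib.parse.quote_plus()`. The helper names `build_map_url` and `open_map` are hypothetical and not part of showmap.py, and the sample address is made up for illustration.

```python
import webbrowser
from urllib.parse import quote_plus

BASE_URL = 'https://www.openstreetmap.org/search?query='


def build_map_url(address):
    """Return the search URL with the address percent-encoded."""
    return BASE_URL + quote_plus(address)


def open_map(address):
    """Open the map for the address in the default browser."""
    return webbrowser.open(build_map_url(address))


# Only build and print the URL here; call open_map() to launch the browser.
print(build_map_url('870 Valencia St, San Francisco'))
```

Keeping the URL construction separate from the browser call makes it easy to check the encoded URL without opening a browser window.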

## Downloading Files from the Web with the requests Module


In [2]:
import requests
response = requests.get('https://automatetheboringstuff.com/files/rj.txt')
type(response)

requests.models.Response

In [3]:
response.status_code == requests.codes.ok

True

In [4]:
len(response.text)

174126

In [5]:
print(response.text[:210])

The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-


In [6]:
# A simpler way to check for status
response = requests.get('https://inventwithpython.com/page_that_does_not_exist')
response.raise_for_status()

HTTPError: 404 Client Error: Not Found for url: https://inventwithpython.com/page_that_does_not_exist

In [8]:
# If a failed download isn’t a deal breaker, wrap the raise_for_status()
# call in try and except statements to handle the error without crashing:
import requests
response = requests.get('https://inventwithpython.com/page_that_does_not_exist')
try:
    response.raise_for_status()
except Exception as exc:
    print(f'There was a problem: {exc}')

There was a problem: 404 Client Error: Not Found for url: https://inventwithpython.com/page_that_does_not_exist


## Saving Downloaded Files to the Hard Drive
You can save the web page to a file on your hard drive with the standard open() function and write() method. However, you must open the file in write binary mode by passing the string 'wb' as the second argument to open(). Even if the page is in plaintext (such as the Romeo and Juliet text you downloaded earlier), you need to write binary data instead of text data in order to maintain the Unicode encoding of the text.

In [None]:
import requests
response = requests.get('https://automatetheboringstuff.com/files/rj.txt')
response.raise_for_status()
with open('RomeoAndJuliet.txt', 'wb') as play_file:
    for chunk in response.iter_content(100000): # 100,000 byte chunks at a time
        play_file.write(chunk)
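
The chunked, binary-mode write above can be tried without a network connection. This is a minimal sketch that uses an in-memory `io.BytesIO` stream as a stand-in for the downloaded response; the helper name `save_in_chunks`, the sample text, and the tiny chunk size are all assumptions chosen for illustration. It shows that bytes written in `'wb'` mode come back unchanged, so non-ASCII characters survive the round trip.

```python
import io
import os
import tempfile


def save_in_chunks(source, path, chunk_size=100000):
    """Copy a binary stream to path one chunk at a time; return bytes written."""
    total = 0
    with open(path, 'wb') as out_file:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # An empty bytes object means the stream is exhausted.
                break
            total += out_file.write(chunk)
    return total


# Sample text with non-ASCII characters, encoded as UTF-8 bytes.
text = 'Café scene: “O Romeo, Romeo!”'
payload = text.encode('utf-8')

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
written = save_in_chunks(io.BytesIO(payload), path, chunk_size=8)

# Read the file back as bytes and compare with what was sent.
with open(path, 'rb') as saved_file:
    round_trip = saved_file.read()

print(written == len(payload), round_trip.decode('utf-8') == text)
```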

## Accessing a weather API

Many HTTP APIs deliver their responses as one large string, often formatted as JSON or XML.

json.loads(response.text) parses that string and returns a Python data structure of lists and dictionaries containing the JSON data in response.text.
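
As a small offline illustration of that conversion, the sketch below parses a hand-written JSON string. The payload is hypothetical and only loosely shaped like a geocoding reply; it is not real API output.

```python
import json

# Hypothetical payload: a JSON array holding one object.
raw_text = '[{"name": "San Francisco", "lat": 37.779, "lon": -122.42, "tags": ["city", "port"]}]'

data = json.loads(raw_text)

# JSON arrays become Python lists and JSON objects become dictionaries.
print(type(data).__name__, type(data[0]).__name__)
print(data[0]['name'], data[0]['tags'][1])
```

Once parsed, the values are reached with ordinary list indexes and dictionary keys.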

In [18]:
import requests
city_name = 'San Francisco'
state_code = 'CA'
country_code = 'US'
API_key = "91a9b90134ef0b7913fe21bcf52dec4d"
response = requests.get(f'https://api.openweathermap.org/geo/1.0/direct?q={city_name},{state_code},{country_code}&appid={API_key}')
response.text

'[{"name":"San Francisco","local_names":{"li":"San Francisco","hu":"San Francisco","ie":"San Francisco","ce":"Сан-Франциско","nl":"San Francisco","fy":"San Francisco","ug":"San Fransisko","hr":"San Francisco","bo":"སན་ཧྥུ་རན་སིས་ཁོ","cy":"San Francisco","ml":"സാൻ ഫ്രാൻസിസ്കോ","ms":"San Francisco","ku":"San Francisco","bn":"সান ফ্রান্সিস্কো","bs":"San Francisco","ln":"San Francisco","sw":"San Francisco","fa":"سان فرانسیسکو","da":"San Francisco","vi":"Cựu Kim Sơn","na":"San Francisco","si":"සැන් ෆ්\u200dරැන්සිස්කෝ","ne":"सान फ्रान्सिस्को","ha":"San Francisco","te":"శాన్ ఫ్రాన్సిస్కో","uk":"Сан-Франциско","os":"Сан-Франциско","tr":"San Francisco","ba":"Сан-Франциско","et":"San Francisco","tl":"San Francisco","mg":"San Francisco","vo":"San Francisco","yo":"San Francisco","ru":"Сан-Франциско","eo":"San-Francisko","ca":"San Francisco","ga":"San Francisco","co":"San Francisco","pa":"ਸਾਨ ਫ਼ਰਾਂਸਿਸਕੋ","so":"San Fransisko","ht":"San Francisco","de":"San Francisco","la":"Franciscopolis","kk":"Сан-

In [19]:
import json
response_data = json.loads(response.text)
response_data

[{'name': 'San Francisco',
  'local_names': {'li': 'San Francisco',
   'hu': 'San Francisco',
   'ie': 'San Francisco',
   'ce': 'Сан-Франциско',
   'nl': 'San Francisco',
   'fy': 'San Francisco',
   'ug': 'San Fransisko',
   'hr': 'San Francisco',
   'bo': 'སན་ཧྥུ་རན་སིས་ཁོ',
   'cy': 'San Francisco',
   'ml': 'സാൻ ഫ്രാൻസിസ്കോ',
   'ms': 'San Francisco',
   'ku': 'San Francisco',
   'bn': 'সান ফ্রান্সিস্কো',
   'bs': 'San Francisco',
   'ln': 'San Francisco',
   'sw': 'San Francisco',
   'fa': 'سان فرانسیسکو',
   'da': 'San Francisco',
   'vi': 'Cựu Kim Sơn',
   'na': 'San Francisco',
   'si': 'සැන් ෆ්\u200dරැන්සිස්කෝ',
   'ne': 'सान फ्रान्सिस्को',
   'ha': 'San Francisco',
   'te': 'శాన్ ఫ్రాన్సిస్కో',
   'uk': 'Сан-Франциско',
   'os': 'Сан-Франциско',
   'tr': 'San Francisco',
   'ba': 'Сан-Франциско',
   'et': 'San Francisco',
   'tl': 'San Francisco',
   'mg': 'San Francisco',
   'vo': 'San Francisco',
   'yo': 'San Francisco',
   'ru': 'Сан-Франциско',
   'eo': 'San-Francisko',

In [20]:
response_data[0]['lat']

37.7790262

In [21]:
response_data[0]['lon']

-122.419906

In [22]:
location = json.loads(response.text)[0]  # Parse the geocoding reply once.
lat = location['lat']
lon = location['lon']
response = requests.get(f'https://api.openweathermap.org/data/2.5/weather?lat={lat}&lon={lon}&appid={API_key}')
response_data = json.loads(response.text)
response_data

{'coord': {'lon': -122.4199, 'lat': 37.779},
 'weather': [{'id': 803,
   'main': 'Clouds',
   'description': 'broken clouds',
   'icon': '04n'}],
 'base': 'stations',
 'main': {'temp': 283.75,
  'feels_like': 283.03,
  'temp_min': 282.63,
  'temp_max': 284.74,
  'pressure': 1011,
  'humidity': 83,
  'sea_level': 1011,
  'grnd_level': 1008},
 'visibility': 10000,
 'wind': {'speed': 1.34, 'deg': 61, 'gust': 2.68},
 'clouds': {'all': 64},
 'dt': 1763457393,
 'sys': {'type': 2,
  'id': 2017837,
  'country': 'US',
  'sunrise': 1763477618,
  'sunset': 1763513793},
 'timezone': -28800,
 'id': 5391959,
 'name': 'San Francisco',
 'cod': 200}

In [23]:
response_data['main']['temp']

283.75

In [24]:
round(285.44 - 273.15, 1)  # Convert Kelvin to Celsius.

12.3

In [25]:
round(285.44 * (9 / 5) - 459.67, 1)  # Convert Kelvin to Fahrenheit.

54.1
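
The two conversions can be wrapped in small helpers so that any Kelvin reading, such as the value stored under `response_data['main']['temp']`, can be passed in directly. The function names below are hypothetical; the formulas are the same ones used in the cells above.

```python
def kelvin_to_celsius(kelvin):
    """Convert Kelvin to Celsius, rounded to one decimal place."""
    return round(kelvin - 273.15, 1)


def kelvin_to_fahrenheit(kelvin):
    """Convert Kelvin to Fahrenheit, rounded to one decimal place."""
    return round(kelvin * (9 / 5) - 459.67, 1)


print(kelvin_to_celsius(285.44), kelvin_to_fahrenheit(285.44))
```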

## Parsing HTML with Beautiful Soup

In [13]:
import requests, bs4
res = requests.get('https://autbor.com/example3.html')
res.raise_for_status()
example_soup = bs4.BeautifulSoup(res.text, 'html.parser')
type(example_soup)

bs4.BeautifulSoup

In [None]:
import bs4
with open('example3.html') as example_file:
    example_soup = bs4.BeautifulSoup(example_file, 'html.parser')
type(example_soup)

bs4.BeautifulSoup

In [2]:
import bs4
# Use a with statement so the file is closed once it has been parsed.
with open('/Users/alanwright/Documents/GitHub/Python_Learning/Automate_The_Boring_Stuff_With_Python/example3.html') as example_file:
    example_soup = bs4.BeautifulSoup(example_file.read(), 'html.parser')
elems = example_soup.select('#author')
type(elems) # elems is a ResultSet, a list-like collection of Tag objects.

bs4.element.ResultSet

In [3]:
type(elems[0])

bs4.element.Tag

In [4]:
str(elems[0])

'<span id="author">Al Sweigart</span>'

In [5]:
elems[0].get_text()

'Al Sweigart'

In [6]:
elems[0].attrs

{'id': 'author'}

In [7]:
p_elems = example_soup.select('p')
str(p_elems[0])

'<p>This &lt;p&gt; tag puts <b>content</b> into a <i>single</i> paragraph.</p>'

In [8]:
p_elems[0].get_text()

'This <p> tag puts content into a single paragraph.'

In [9]:
str(p_elems[1])

'<p><a href="https://inventwithpython.com/">This text is a link</a> to books by <span id="author">Al Sweigart</span>.</p>'

In [10]:
p_elems[1].get_text()

'This text is a link to books by Al Sweigart.'

In [11]:
str(p_elems[2])

'<p><img alt="Close up of my cat Zophie." src="./example3_files/wow_such_zophie_thumb.webp"/></p>'

In [12]:
p_elems[2].get_text()

''

#### Getting Data from an Element’s Attributes


In [14]:
import bs4
with open('example3.html') as example_file:
    soup = bs4.BeautifulSoup(example_file, 'html.parser')
span_elem = soup.select('span')[0]
str(span_elem)


'<span id="author">Al Sweigart</span>'

In [15]:
span_elem.get('id')

'author'

In [16]:
span_elem.get('some_nonexistent_addr') is None

True

In [17]:
span_elem.attrs

{'id': 'author'}
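
The same attribute lookups can be tried without example3.html by parsing an inline HTML string. This is a minimal sketch: the HTML snippet and the `'no title'` fallback value are made up for illustration. It also shows that `get()` accepts a default to return when the attribute is missing, much like a dictionary's `get()` method.

```python
import bs4

# Inline HTML stand-in for the example file.
html = '<p>Books by <span id="author">Al Sweigart</span>.</p>'
soup = bs4.BeautifulSoup(html, 'html.parser')
span_elem = soup.select('span')[0]

print(span_elem.get('id'))                 # Attribute that exists.
print(span_elem.get('title') is None)      # Missing attribute gives None.
print(span_elem.get('title', 'no title'))  # Missing attribute with a fallback.
print(span_elem.has_attr('id'))            # Check for presence only.
```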

## Project 7: Open all search results

Note: I couldn't get this to work, because PyPI seemed to be blocking requests from my script. Specialist tools such as Selenium, Playwright, or Exa are a better fit for this type of activity.

In [4]:
# searchpypi.py - Opens several search results on pypi.org

import requests, sys, webbrowser, bs4

print('Searching...') # Displaying text while downloading the search results page
res = requests.get('https://pypi.org/search/?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()

soup = bs4.BeautifulSoup(res.text, 'html.parser')
link_elems = soup.select('.package-snippet')

num_open = min(5, len(link_elems))
for i in range(num_open):
    url_to_open = 'https://pypi.org' + link_elems[i].get('href')
    print('Opening', url_to_open)
    webbrowser.open(url_to_open)


Searching...


## Project 8: Extracting comics from XKCD

In [5]:
# downloadXkcdComics.py - Downloads XKCD comics

import requests, os, bs4, time

url = 'https://xkcd.com'  # Starting URL
os.makedirs('xkcd', exist_ok=True)  # Store comics in ./xkcd
num_downloads = 0
MAX_DOWNLOADS = 10
while not url.endswith('#') and num_downloads < MAX_DOWNLOADS:

    # Download the page.
    print(f'Downloading page {url}...')
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # Find the URL of the comic image.
    comic_elem = soup.select('#comic img')
    if comic_elem == []:
        print('Could not find comic image.')
    else:
        comic_URL = 'https:' + comic_elem[0].get('src')
        # Download the image.
        print(f'Downloading image {comic_URL}...')
        res = requests.get(comic_URL)
        res.raise_for_status()

        # Save the image to ./xkcd.
        image_path = os.path.join('xkcd', os.path.basename(comic_URL))
        with open(image_path, 'wb') as image_file:
            for chunk in res.iter_content(100000):
                image_file.write(chunk)
        num_downloads += 1

    # Get the Prev button's URL, even when this page had no image,
    # so the loop always moves on to the next page.
    prev_link = soup.select('a[rel="prev"]')[0]
    url = 'https://xkcd.com' + prev_link.get('href')
    time.sleep(1)  # Pause so we don't hammer the web server.
print('Done.')

Downloading page https://xkcd.com...
Downloading image https://imgs.xkcd.com/comics/service_outage.png...
Downloading page https://xkcd.com/3169/...
Downloading image https://imgs.xkcd.com/comics/epirbs.png...
Downloading page https://xkcd.com/3168/...
Downloading image https://imgs.xkcd.com/comics/beam_dump.png...
Downloading page https://xkcd.com/3167/...
Downloading image https://imgs.xkcd.com/comics/car_size.png...
Downloading page https://xkcd.com/3166/...
Downloading image https://imgs.xkcd.com/comics/big_and_little_spoons.png...
Downloading page https://xkcd.com/3165/...
Downloading image https://imgs.xkcd.com/comics/earthquake_prediction_flowchart.png...
Downloading page https://xkcd.com/3164/...
Downloading image https://imgs.xkcd.com/comics/metric_tip.png...
Downloading page https://xkcd.com/3163/...
Downloading image https://imgs.xkcd.com/comics/repair_video.png...
Downloading page https://xkcd.com/3162/...
Downloading image https://imgs.xkcd.com/comics/heart_mountain.png...
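
The script builds absolute URLs by string concatenation (`'https:' + src` and `'https://xkcd.com' + href`). As an alternative sketch, the standard library's `urllib.parse.urljoin()` can resolve both shapes of link against the page URL. The helper names are hypothetical, and the sample `src` and `href` values are assumed from the URLs printed in the output above. The sketch does not cover the `'#'` link that the loop uses to detect the first comic.

```python
import posixpath
from urllib.parse import urljoin, urlparse

PAGE_URL = 'https://xkcd.com/3169/'


def absolute_url(page_url, link):
    """Resolve a relative or protocol-relative link against the page URL."""
    return urljoin(page_url, link)


def image_filename(image_url):
    """Return the last path component of a URL, for use as a filename."""
    return posixpath.basename(urlparse(image_url).path)


image_url = absolute_url(PAGE_URL, '//imgs.xkcd.com/comics/epirbs.png')
prev_url = absolute_url(PAGE_URL, '/3168/')

print(image_url)
print(prev_url)
print(image_filename(image_url))
```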

## Selenium

Selenium lets Python directly control the browser by programmatically clicking links and filling in forms, just as a human user would. Using Selenium, you can interact with web pages in a much more advanced way than with requests and Beautiful Soup, but because it launches a web browser, it’s a bit slower and harder to run in the background.

Playwright, from Microsoft, is a newer alternative to Selenium.

In [7]:
from selenium import webdriver
browser = webdriver.Firefox()
type(browser)

selenium.webdriver.firefox.webdriver.WebDriver

In [9]:
browser.get('https://inventwithpython.com')

In [10]:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get('https://autbor.com/example3.html')
elems = browser.find_elements(By.CSS_SELECTOR, 'p')
print(elems[0].text)
print(elems[0].get_property('innerHTML'))

This <p> tag puts content into a single paragraph.
This &lt;p&gt; tag puts <b>content</b> into a <i>single</i> paragraph.


In [11]:
link_elem = browser.find_element(By.LINK_TEXT, 'This text is a link')
type(link_elem)


selenium.webdriver.remote.webelement.WebElement

In [12]:
link_elem.click()

In [13]:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get('https://autbor.com/example3.html')
user_elem = browser.find_element(By.ID, 'login_user')
user_elem.send_keys('your_real_username_here')
password_elem = browser.find_element(By.ID, 'login_pass')
password_elem.send_keys('your_real_password_here')
password_elem.submit()

## Playwright

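Playwright is a newer alternative to Selenium. The cell below uses top-level async with and await, which work in a notebook; in a regular .py script, the same code needs to sit inside an async function that is run with asyncio.run().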

In [25]:
from playwright.async_api import async_playwright
async with async_playwright() as playwright:
    browser = await playwright.firefox.launch()
    page = await browser.new_page()
    await page.goto('https://autbor.com/example3.html')
    print(await page.title())
    await browser.close()

Example Website Title
