# Let's scrape the IRE homepage

Our goal: Print out the headlines from the [IRE home page](https://ire.org/).

[`requests`](http://docs.python-requests.org/en/master/) is a handy third-party library for making HTTP requests. It does the same thing your browser does when you type in a URL and hit enter -- sends a message to a server and requests a copy of the page -- but it lets us do this programmatically instead of pointing and clicking. For our purposes today, we're interested in the library's [`get()`](http://docs.python-requests.org/en/master/api/#requests.get) function.

We'll use the [BeautifulSoup library](https://www.crummy.com/software/BeautifulSoup/bs4/doc/), aka `bs4`, to parse the HTML.

### Import the libraries

In [1]:
import requests
from bs4 import BeautifulSoup

### Fetch and parse the HTML

In [2]:
# use the `get()` method to fetch a copy of the IRE home page
ire_page = requests.get('http://ire.org')

# feed the text of the web page to a BeautifulSoup object
soup = BeautifulSoup(ire_page.text, 'html.parser')
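
If the request fails, the cell above will happily parse whatever error page came back. Here's a minimal sketch of a more defensive fetch. The `fetch_soup` helper name and the 10-second timeout are our own assumptions, not part of either library.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Fetch a page and return it as a BeautifulSoup object (hypothetical helper)."""
    # give up if the server hasn't responded within `timeout` seconds
    response = requests.get(url, timeout=timeout)

    # raise requests.HTTPError on a 4xx/5xx response instead of parsing an error page
    response.raise_for_status()

    return BeautifulSoup(response.text, 'html.parser')
```

Calling `fetch_soup('https://ire.org')` should hand back the same kind of object as `soup` above.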

### Target the headlines

View source on the IRE homepage and find the headlines. What's the pattern?

We'll use the [`find_all()`](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-the-tree) method to get all of the headline tags (`h1`) with the class `title1`.

In [3]:
# get a list of headlines we're interested in
heds = soup.find_all('h1', {'class': 'title1'})
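
To see what `find_all()` matches without hitting the network, here's a small sketch that runs against a made-up HTML snippet (the markup is invented for illustration, not copied from the IRE site). It also shows the `class_` keyword, which does the same job as the dictionary.

```python
from bs4 import BeautifulSoup

# invented sample markup -- NOT the real IRE homepage
sample_html = """
<div>
  <h1 class="title1"><a href="/jobs/1">City Editor</a></h1>
  <h1 class="other"><a href="/about">About us</a></h1>
  <h1 class="title1 featured"><a href="/news/2">Award finalists</a></h1>
</div>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# two equivalent ways to ask for h1 tags with the class `title1`
by_dict = sample_soup.find_all('h1', {'class': 'title1'})
by_keyword = sample_soup.find_all('h1', class_='title1')

# a tag with several classes still matches if any one of them is `title1`
for hed in by_keyword:
    print(hed.a.string)
```

The `h1` with the class `other` gets skipped, and the one with two classes is still picked up.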

### Loop over the heds, printing out the text

You can drill down into a nested tag using a period: `hed.a` is the first `a` tag inside the headline, and `.string` is the text inside that tag.

In [4]:
for hed in heds:
    print(hed.a.string)

Digital Crime Correspondent  (Major Network)
City Editor (The Press Democrat)
Digital Editor (FRONTLINE)
Professor (University of Maryland)
Russiagate Freelance Reporter (WhoWhatWhy)
Investigative Reporter (Talking Points Memo)
Finalists announced for 2018 Golden Padlock Award
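
One thing to watch for: `hed.a` is `None` when a headline has no link, and `.string` is `None` when the tag holds more than one child. Here's a sketch of a more forgiving loop using `get_text()`, run against invented sample markup rather than the live page.

```python
from bs4 import BeautifulSoup

# invented sample markup -- NOT the real IRE homepage
sample_html = """
<h1 class="title1"><a href="/a">Plain headline</a></h1>
<h1 class="title1"><a href="/b">Headline with <em>emphasis</em></a></h1>
<h1 class="title1">Headline with no link</h1>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')
sample_heds = sample_soup.find_all('h1', class_='title1')

texts = []
for hed in sample_heds:
    # fall back to the h1 itself when there's no nested <a> tag
    target = hed.a if hed.a is not None else hed

    # get_text() gathers the text of every nested tag, joined here with a space
    texts.append(target.get_text(' ', strip=True))

for text in texts:
    print(text)
```

With `hed.a.string`, the second headline would print `None` and the third would throw an `AttributeError`.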


👉 For more details on using _for_ loops, [see this notebook](../../reference/Python%20data%20types%20and%20basic%20syntax.ipynb#for-loops).