# IRE Board members

The goal: Scrape [this list of IRE board members](https://www.ire.org/about-ire/past-ire-board-members/) into a CSV.

This project introduces a few new concepts:
- Scraping data that's not part of a table
- Specifying custom request headers to evade a bot detection rule on our server
- Using string methods and default values when parsing out the data

[The completed version is here](IRE%20Board%20members%20-%20complete.ipynb).

([See also this standalone version featuring a few more advanced techniques](/edit/ire-board/ire_board_scrape.py).)

In [None]:
# standard library module we'll use to write the CSV file
import csv

# installed library to handle the HTTP traffic
import requests

# installed library to parse the HTML
from bs4 import BeautifulSoup

In [None]:
URL = 'https://www.ire.org/about-ire/past-ire-board-members/'

In [None]:
# make the request

# check for HTTP errors
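A minimal sketch of this step, assuming `requests.get()` for the call and `raise_for_status()` for the error check. The `fetch` helper and the 30-second timeout are hypothetical choices, not part of the notebook. The hand-built `Response` with a 403 status only stands in for a blocked request so the example runs without touching the network; the status code the server actually returns is an assumption.

```python
import requests

URL = 'https://www.ire.org/about-ire/past-ire-board-members/'


def fetch(url, headers=None):
    """Hypothetical helper: request a page, raising HTTPError on a 4xx/5xx status."""
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response


# offline illustration of what raise_for_status() does with an error status:
# a hand-built Response standing in for a request the server rejected
blocked = requests.Response()
blocked.status_code = 403  # assumed status, for illustration only

try:
    blocked.raise_for_status()
    error_raised = False
except requests.HTTPError:
    error_raised = True
```

In the notebook itself, the equivalent would be calling `fetch(URL)` (or `requests.get(URL)` followed by `raise_for_status()`) and reading the error it reports.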

In [None]:
# set up request headers with a custom user-agent string


In [None]:
# try the request again, with the new headers


# and raise for errors
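One way the headers step might look, as a sketch. The user-agent value here is a made-up placeholder, not a string the server is known to accept. `requests.Request(...).prepare()` is used only to inspect the headers that would be sent without making a network call; the real request is left in comments.

```python
import requests

URL = 'https://www.ire.org/about-ire/past-ire-board-members/'

# custom request headers; the user-agent string is a hypothetical placeholder
headers = {'User-Agent': 'ire-board-scraper-example/0.1'}

# build the request without sending it, to confirm the header is attached
prepared = requests.Request('GET', URL, headers=headers).prepare()
sent_user_agent = prepared.headers['User-Agent']

# the actual request would pass the same dict via the `headers` argument:
# response = requests.get(URL, headers=headers)
# response.raise_for_status()
```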


In [None]:
# parse the HTML into soup


In [None]:
# search the HTML tree to find the div
# with the `id` attribute of "past-ire-board-members"


In [None]:
# within that div, find all the paragraph tags
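A small sketch of the parse-and-search steps, run against a made-up HTML snippet rather than the live page. The names and years are placeholders, and the real page's markup may differ. `html.parser` is Python's built-in parser, chosen here as an assumption.

```python
from bs4 import BeautifulSoup

# invented stand-in for the page HTML; placeholder names and years
html = """
<div id="past-ire-board-members">
  <p>*Jane Doe (1990-1994)</p>
  <p>John Roe (2001-2003) (dec)</p>
</div>
<p>A paragraph outside the target div</p>
"""

# parse the HTML into soup
soup = BeautifulSoup(html, 'html.parser')

# find the div by its `id` attribute
target_div = soup.find('div', id='past-ire-board-members')

# within that div, find all the paragraph tags
paragraphs = target_div.find_all('p')

# each item is a Tag object; `.text` gives just the text inside it
texts = [p.text for p in paragraphs]
```

Searching from `target_div` instead of `soup` is what keeps the third paragraph, which sits outside the div, out of the results.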


In [None]:
# noodle around here to isolate the pieces of data for export
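To experiment with the string methods before writing the loop, a sketch like the following may help. `parse_member` is a hypothetical helper, and the sample strings are made up; it assumes "(dec)" appears after the term(s), since splitting on "(" takes the second piece as the terms.

```python
def parse_member(text):
    """Hypothetical helper: pull name, terms and two flags out of one line of text."""
    # default values: not a past president, not deceased
    was_president = False
    is_deceased = False

    # a leading asterisk denotes a past president
    if text.startswith('*'):
        was_president = True

    # "(dec)" anywhere in the text marks the person as deceased
    if '(dec)' in text:
        is_deceased = True

    # split the name from the terms on "("
    pieces = text.split('(')

    # name: first piece, minus leading asterisks and surrounding whitespace
    name = pieces[0].lstrip('*').strip()

    # terms: second piece, up to the closing parenthesis
    terms = pieces[1].split(')')[0].strip()

    return [name, terms, was_president, is_deceased]


# made-up sample lines
row_one = parse_member('*Jane Doe (1990-1994) (dec)')
row_two = parse_member('John Roe (2001-2003)')
```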

In [None]:
# set up the CSV headers to write to file


In [None]:
# next, set up the file to write the CSV data into
# https://docs.python.org/3/library/csv.html#csv.writer

# open the CSV file in write ('w') mode, specifying newline='' so the csv
# module controls line endings (otherwise Windows adds a blank line after each row)


    # set up a csv.writer object tied to the file we just opened


    # write the list of headers


    # loop over the list of paragraphs we targeted above


        # we don't want the entire Tag object, just the text


        # set up some default values -- the member was not president


        # and is not deceased


        # IRE denotes past presidents with a leading asterisk
        # so check to see if the string startswith '*'
        # https://docs.python.org/3/library/stdtypes.html?highlight=startswith#str.startswith


            # if so, switch the value for the `was_president` variable to True


        # check to see if "(dec)" is anywhere in the text, which
        # indicates this person is deceased
        # https://docs.python.org/3/reference/expressions.html#in


        # next, start parsing out the pieces
        # separate the name from the terms by splitting on "("


        # the name will be the first ([0]) item in the resulting list
        # while we're at it, strip off any leading asterisks
        # https://docs.python.org/3/library/stdtypes.html?highlight=lstrip#str.lstrip
        # and strip() off any leading or trailing whitespace
        # https://docs.python.org/3/library/stdtypes.html?highlight=lstrip#str.strip


        # the term(s) of service will be the second item ([1]) in that list
        # and the term text always ends with a closing parenthesis,
        # so splitting on ")" and taking the first ([0])
        # item in the resulting list will give us the term(s)


        # put the collected data into a list


        # and write this row of data into the CSV file
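The writing pattern outlined in this cell can be sketched on its own with already-parsed rows. The column names, file name and sample rows below are hypothetical, and the file goes into a temporary directory so the example leaves nothing behind in the project folder.

```python
import csv
import os
import tempfile

# hypothetical column names for the CSV header row
HEADERS = ['name', 'terms', 'was_president', 'is_deceased']

# made-up rows standing in for the parsed paragraphs
rows = [
    ['Jane Doe', '1990-1994', True, False],
    ['John Roe', '2001-2003', False, True],
]

# hypothetical output path inside a throwaway directory
csv_path = os.path.join(tempfile.mkdtemp(), 'ire-board-members.csv')

# open in write mode with newline='' so the csv module handles line endings
with open(csv_path, 'w', newline='') as outfile:
    writer = csv.writer(outfile)

    # write the list of headers
    writer.writerow(HEADERS)

    # write one row per member
    for row in rows:
        writer.writerow(row)

# read the file back to check what was written; every value comes back as a string
with open(csv_path, newline='') as infile:
    written = list(csv.reader(infile))
```

Note that `csv.writer` converts the booleans to the strings `True` and `False` on the way out.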
