# Python Web Scraping
In this lab, you will continue building on the Dr. Who popularity solution. What remains
is to evaluate the popularity of each Dr. Who actor,
using the page views of the actor’s Wikipedia page as a proxy for popularity.

##  Using the Names + BeautifulSoup to Get the Stats
Applying the same principles we used to collect the list of Dr. Who actors,
we now need to collect the 30-day page view stat for each actor.
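
As a reminder of those principles, the title-parsing regular expression from the previous lab can be exercised on its own. The sketch below is only illustrative: `TITLE_RE` and `actor_from_title` are hypothetical names introduced here, and the sample titles are the ones that appear in the EW slideshow markup.

```python
import re

# Same pattern used in who_actors(): skip "Slide N:", skip the role up to the
# next colon, then capture everything that remains as the actor's name.
TITLE_RE = re.compile(r'^Slide\s+\d+:[^:]+[:]\s+(?P<actor>.*)$')


def actor_from_title(title):
    """Return the actor name from a slide title, or None if it does not match."""
    m = TITLE_RE.search(title)
    return m.group('actor') if m else None


if __name__ == '__main__':
    for sample in ('Slide 10: Sixth Doctor: Colin Baker',
                   'Slide 2: Ruth/The Doctor: Jo Martin'):
        print(actor_from_title(sample))
```

The same two-step habit (find the elements with a loose pattern, then extract the value with a stricter one) carries over to the stats page.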

The pseudocode for this activity is roughly as follows:

1. Explore the HTML underlying an example Wikipedia stats page:
https://en.wikipedia.org/w/index.php?title=Jodie_Whittaker&action=info
Look (**hard**) for a pattern that will allow you to capture the "Page views in the past 30 days" value.
It turns out there is a perfect pattern you should be able to exploit.
2. For each actor, build the stats URL by inserting the actor’s name as the `title` parameter of the Wikipedia URL string, then:
   - Fetch the stats web page by GET(ting) the URL just constructed
   - Parse the returned HTML using Beautiful Soup
   - Find the stat using the exploitable pattern you observed in step 1
   - Remove any noise (such as thousands separators) from the stat string
   - Convert the stat string to an integer via int()
   - Track the actor’s stat using a list or dictionary
3. Sort the actor stats in descending order
4. Print the top 5

Have a beer – you deserve it!
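
The non-scraping parts of steps 2 through 4 can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the lab solution: `stats_url`, `clean_count`, and `top_n` are hypothetical helper names, the page title is assumed to be the actor's name with spaces replaced by underscores (as in the `Jodie_Whittaker` example URL), and locating the page-view value in the HTML is still left to you.

```python
import re
from urllib.parse import urlencode

# Base of the example stats URL given in step 1.
WIKI_INDEX_URL = 'https://en.wikipedia.org/w/index.php'


def stats_url(actor_name):
    """Build the action=info URL for an actor such as 'Jodie Whittaker'."""
    # Assumption: Wikipedia page titles use underscores in place of spaces.
    title = actor_name.strip().replace(' ', '_')
    query = urlencode({'title': title, 'action': 'info'})
    return WIKI_INDEX_URL + '?' + query


def clean_count(raw):
    """Drop every non-digit character (commas, spaces, ...) and convert to int."""
    digits = re.sub(r'\D', '', raw)
    if not digits:
        raise ValueError(f'no digits found in {raw!r}')
    return int(digits)


def top_n(stats, n=5):
    """Return the n (actor, views) pairs with the most views, highest first."""
    return sorted(stats.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == '__main__':
    print(stats_url('Jodie Whittaker'))
    # Placeholder numbers for illustration only; these are not real page views.
    placeholder_stats = {'Actor A': 300, 'Actor B': 1200, 'Actor C': 50}
    print(top_n(placeholder_stats, n=2))
```

For example, `clean_count('1,234')` gives `1234`. Feeding `top_n` a dictionary that maps each actor to their cleaned count yields the list to print in step 4.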

In [1]:
from requests.exceptions import HTTPError
import requests
from bs4 import BeautifulSoup
import re

EW_URL = 'http://ew.com/tv/doctor-who-actors/'

# PHASE 1: From Previous Lab
# Collect the actor names

def simple_get(url, *args, **kwargs):
    """
    Attempts to get the content at `url` by making an HTTP GET request.
    Returns the `requests.Response` object on success. An HTTP error status
    or any other request failure is printed and then re-raised to the caller.
    """
    try:
        resp = requests.get(url, *args, **kwargs)
        # If the response was successful, no Exception will be raised
        resp.raise_for_status()

    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
        raise http_err
    except Exception as err:
        print(f'Other error occurred: {err}')
        raise err

    return resp

def who_actors(url):
    resp = simple_get(url, timeout=5)
    html = resp.text

    # sanity check. is this HTML?
    assert re.search('html', resp.headers['Content-Type'], re.IGNORECASE)

    soup = BeautifulSoup(html, 'html.parser')

    # to be returned
    actor_list = []
    # Find the slide images once and reuse the result below
    slide_imgs = soup.find_all('img', title=re.compile(r'^Slide\s+\d+:\s+[A-Z]'))

    # Debug output: the first two matches, then the full list
    print("hrer is testing")
    print(slide_imgs[0])
    print(slide_imgs[1])
    print(slide_imgs)

    for img in slide_imgs:

        # I want the name from the title attribute, which looks like this:
        #   Slide 10: Sixth Doctor: Colin Baker
        # Another good use for REs.
        # This RE starts the same as before; however, after the first colon,
        # [^:]+[:]\s+ says "gobble up all (one or more) characters that
        # are not a colon, until you run into a colon that is followed by
        # one or more spaces. After that, capture all remaining characters
        # in a group named <actor>."
        title = img['title']
        #print("here is title")
        print(title)

        m = re.search(r'^Slide\s+\d+:[^:]+[:]\s+(?P<actor>.*)$', title)
        # if no match, then I've screwed up something
        assert m is not None, f'unexpected title format: {title!r}'
        actor_list.append(m.group('actor'))

    # Great, got my list of actors. Return to caller
    return actor_list

# PHASE 2:
# Collect the stats from Wikipedia
# for each who actor

def main():
    # PHASE 1:
    # Get the Dr.Who actors from EW_URL
    actor_list = who_actors(EW_URL)
    print(actor_list)

    # PHASE 2:
    # Collect the stats from Wikipedia
    # for each who actor
    #

if __name__ == "__main__":
    main()

hrer is testing
<img alt="Ruth/The Doctor: Jo Martin" src="https://imagesvc.meredithcorp.io/v3/mm/image?url=https%3A%2F%2Fstatic.onecms.io%2Fwp-content%2Fuploads%2Fsites%2F6%2F2017%2F07%2Fdw_1205_jp_4160_1000_rt-2000.jpg" title="Slide 2: Ruth/The Doctor: Jo Martin"/>
<img alt="Thirteenth Doctor: Jodie Whittaker" src="https://imagesvc.meredithcorp.io/v3/mm/image?url=https%3A%2F%2Fstatic.onecms.io%2Fwp-content%2Fuploads%2Fsites%2F6%2F2017%2F07%2Fdoctor-who_s11_costume-reveal-2000.jpg" title="Slide 3: Thirteenth Doctor: Jodie Whittaker"/>
[<img alt="Ruth/The Doctor: Jo Martin" src="https://imagesvc.meredithcorp.io/v3/mm/image?url=https%3A%2F%2Fstatic.onecms.io%2Fwp-content%2Fuploads%2Fsites%2F6%2F2017%2F07%2Fdw_1205_jp_4160_1000_rt-2000.jpg" title="Slide 2: Ruth/The Doctor: Jo Martin"/>, <img alt="Thirteenth Doctor: Jodie Whittaker" src="https://imagesvc.meredithcorp.io/v3/mm/image?url=https%3A%2F%2Fstatic.onecms.io%2Fwp-content%2Fuploads%2Fsites%2F6%2F2017%2F07%2Fdoctor-who_s11_costume-r