# Web Scraping with Beautiful Soup

* * * 

### Icons used in this notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
⚠️ **Warning**: Heads-up about tricky stuff or common mistakes.<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
🎬 **Demo**: Showing off something more advanced – so you know what Python can be used for!<br>

### Learning Objectives
1. [Reflection: To Scrape Or Not To Scrape](#when)
2. [Extracting and Parsing HTML](#extract)
3. [Scraping the Illinois General Assembly](#scrape)

<a id='when'></a>

# To Scrape Or Not To Scrape

When we'd like to access data from the web, we should first check whether the website we are interested in offers a Web API. Platforms like Twitter, Reddit, and the New York Times offer APIs. **Check out D-Lab's [Python Web APIs](https://github.com/dlab-berkeley/Python-Web-APIs) workshop if you want to learn how to use APIs.**

However, there are often cases when a Web API does not exist. In these cases, we may have to resort to web scraping, where we extract the underlying HTML from a web page and directly obtain the information we want. There are several packages in Python we can use to accomplish these tasks. We'll focus on two packages: Requests and Beautiful Soup.

Our case study will be scraping information on the [state senators of Illinois](http://www.ilga.gov/senate), as well as the [list of bills](http://www.ilga.gov/senate/SenatorBills.asp?MemberID=1911&GA=98&Primary=True) each senator has sponsored. Before we get started, peruse these websites to take a look at their structure.

## Installation

We will use two main packages: [Requests](http://docs.python-requests.org/en/latest/user/quickstart/) and [Beautiful Soup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/). Go ahead and install these packages, if you haven't already:

In [140]:
# 🌐 The requests library is needed to make HTTP requests and download web pages.
# 🕸️ This is fundamental for web scraping (extracting information from web pages).
%pip install requests  

Note: you may need to restart the kernel to use updated packages.


In [141]:
# 🥣 The %pip install beautifulsoup4 command installs the Beautiful Soup 4 library in your Jupyter Notebook environment.
# 🕸️ Beautiful Soup is essential for parsing and extracting information from HTML and XML files, which makes web scraping easier.
%pip install beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


We'll also install the `lxml` package, which helps support some of the parsing that Beautiful Soup performs:

In [142]:
# 🧩 The %pip install lxml command installs the lxml library in your Jupyter Notebook environment.
# ⚡ lxml is a fast, efficient parser for HTML and XML files, and works well with Beautiful Soup for web scraping.
%pip install lxml

Note: you may need to restart the kernel to use updated packages.


In [143]:
# Import the libraries we need
from bs4 import BeautifulSoup
from datetime import datetime
import requests
import time

<a id='extract'></a>

# Extracting and Parsing HTML

To successfully extract and parse HTML, we'll follow these 4 steps:
1. Make a GET request
2. Parse the page with Beautiful Soup
3. Search for HTML elements
4. Get attributes and text of these elements

## Step 1: Make a GET Request to Obtain a Page's HTML

We can use the Requests library to:

1. Make a GET request to the page, and

2. Read the web page's HTML code.

The process of making a request and obtaining a result resembles the Web API workflow. Here, however, we're making the request directly to the website, and we'll have to parse the HTML ourselves. This is in contrast to being handed data already organized into a more straightforward format such as JSON or XML.

In [144]:
# Make a GET request
req = requests.get('http://www.ilga.gov/senate/default.asp')
# Read the content of the server’s response
src = req.text
# View some output
print(src[:1000])

<!DOCTYPE html>
<html lang="en">
<head id="Head1">
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="content-type" content="text/html;charset=utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
    <meta charset="utf-8" />
    <meta charset="UTF-8">
    <!-- Meta Description -->
    <meta name="description" content="Welcome to the official government website of the Illinois General Assembly">
    <meta name="contactName" content="Legislative Information System">
    <meta name="contactOrganization" content="LIS Staff Services">
    <meta name="contactStreetAddress1" content="705 Stratton Office Building">
    <meta name="contactCity" content="Springfield">
    <meta name="contactZipcode" content="62706">
    <meta name="contactNetworkAddress" content="webmaster@ilga.gov">
    <meta name="contactPhoneNumber" content="217-782-3944">
    <meta name="contactFaxNumber" content="217-524-6059">
    <meta name
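The request above assumes the server answers quickly and successfully. A slightly more defensive variant might look like the sketch below. The helper name `fetch_html` and the 10-second timeout are illustrative assumptions, not part of the workshop code; `timeout` and `raise_for_status()` are standard features of the Requests library.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the HTML of `url` as text (hypothetical helper).

    Raises requests.HTTPError for 4xx/5xx responses instead of silently
    handing back an error page.
    """
    response = requests.get(url, timeout=timeout)  # give up after `timeout` seconds
    response.raise_for_status()                    # fail loudly on HTTP error codes
    return response.text


# Example usage (needs network access, so it is left commented out):
# src = fetch_html('http://www.ilga.gov/senate/default.asp')
# print(src[:1000])
```

Failing early on a bad status code can save time later, because otherwise Beautiful Soup will happily parse an error page and your searches will simply come back empty.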


## Step 2: Parse the Page with Beautiful Soup

Now, we use the `BeautifulSoup` function to parse the response into an HTML tree. This returns an object (called a **soup object**) which contains all of the HTML in the original document.

If you run into an error about a parser library, make sure you've installed the `lxml` package so that Beautiful Soup has the tools it needs to parse the content.

In [145]:
# Parse the response into an HTML tree
soup = BeautifulSoup(src, 'lxml')
# Take a look
print(soup.prettify()[:1000])

<!DOCTYPE html>
<html lang="en">
 <head id="Head1">
  <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
  <meta content="text/html;charset=utf-8" http-equiv="content-type"/>
  <meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
  <meta charset="utf-8"/>
  <meta charset="utf-8"/>
  <!-- Meta Description -->
  <meta content="Welcome to the official government website of the Illinois General Assembly" name="description"/>
  <meta content="Legislative Information System" name="contactName"/>
  <meta content="LIS Staff Services" name="contactOrganization"/>
  <meta content="705 Stratton Office Building" name="contactStreetAddress1"/>
  <meta content="Springfield" name="contactCity"/>
  <meta content="62706" name="contactZipcode"/>
  <meta content="webmaster@ilga.gov" name="contactNetworkAddress"/>
  <meta content="217-782-3944" name="contactPhoneNumber"/>
  <meta content="217-524-6059" name="contactFaxNumber"/>
  <meta content="State Of Illinois" name="originatorJur

The output looks quite similar to the above, but now it's organized in a `soup` object, which allows us to traverse the page more easily.
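To see what traversing a soup object looks like on something small, here is a minimal sketch. The HTML string is hand-written for illustration (it is not taken from ilga.gov), and the names `demo_soup`, `demo_title`, and so on are hypothetical. It uses Python's built-in `html.parser` so that it runs even without `lxml`.

```python
from bs4 import BeautifulSoup

# Tiny hand-written HTML document (hypothetical, for illustration only)
demo_html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <p class="intro">Hello, <b>soup</b>!</p>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

# 'html.parser' ships with Python, so no extra install is needed here
demo_soup = BeautifulSoup(demo_html, "html.parser")

demo_title = demo_soup.title.string            # text inside <title>
demo_first_link = demo_soup.a                  # first <a> tag in the document
demo_intro = demo_soup.find("p").get_text()    # text of the first <p>, tags removed

print(demo_title)
print(demo_first_link["href"])
print(demo_intro)
```

Accessing a tag name as an attribute (`demo_soup.title`, `demo_soup.a`) returns only the first matching tag, which is handy for quick exploration.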

## Step 3: Search for HTML Elements

Beautiful Soup has a number of functions to find useful components on a page. It allows you to find elements using:

1. HTML tags
2. HTML attributes
3. CSS selectors

Let's search first for HTML tags.

The function `find_all` searches the `soup` tree to find all the elements with a particular HTML tag, and returns all of those elements.

What does the example below do?

In [146]:
# Find all elements with a certain tag
a_tags = soup.find_all("a")
print(a_tags[:10])

[<a b-0yw6sxot5c="" class="dropdown-item" data-lang="en" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-us"></span> English
                            </a>, <a b-0yw6sxot5c="" class="dropdown-item" data-lang="af" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-za"></span> Afrikaans
                            </a>, <a b-0yw6sxot5c="" class="dropdown-item" data-lang="sq" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-al"></span> Albanian
                            </a>, <a b-0yw6sxot5c="" class="dropdown-item" data-lang="ar" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-ae"></span> Arabic
                            </a>, <a b-0yw6sxot5c="" class="dropdown-item" data-lang="hy" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-am"></span> Armenian
                            </a>, <a b-0yw6sxot5c="" class="dropdown-item" data-lang="az" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-az"></span> Azerbaijani
            

Because `find_all()` is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the `BeautifulSoup` object as though it were a function, then it's the same as calling `find_all()` on that object.

These two lines of code are equivalent:

In [147]:
a_tags = soup.find_all("a")
a_tags_alt = soup("a")
print(a_tags[0])
print(a_tags_alt[0])

<a b-0yw6sxot5c="" class="dropdown-item" data-lang="en" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-us"></span> English
                            </a>
<a b-0yw6sxot5c="" class="dropdown-item" data-lang="en" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-us"></span> English
                            </a>


How many links did we obtain?

In [148]:
print(len(a_tags))

270


That's a lot! Many elements on a page will have the same HTML tag. For instance, if you search for everything with the `a` tag, you're likely to get many results, many of which you might not want. Remember, the `a` tag defines a hyperlink, so you'll usually find many on any page.

What if we wanted to search for HTML tags with certain attributes, such as particular CSS classes?

We can do this by adding an additional argument to `find_all`. In the example below, we are finding all the `a` tags, and then filtering those with `class_="sidemenu"`.

In [149]:
# Get only the 'a' tags that have the class 'sidemenu'
side_menus = soup("a", class_="sidemenu")
side_menus[:5]

[]

A more efficient way to search for elements on a website is via a **CSS selector**. For this we have to use a different method called `select()`. Just pass a string into `.select()` to get all elements matching that CSS selector.

In the example above, we can use `a.sidemenu` as a CSS selector, which returns all `a` tags with class `sidemenu`. As with `find_all`, the result is an empty list here, because no `a` tag on the page currently carries that class.

In [150]:
# Get elements with the CSS selector "a.sidemenu"
selected = soup.select("a.sidemenu")
selected[:5]

[]
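Since both searches come back empty on the live page, it may help to see the two approaches side by side on a page where the class does exist. The following is a minimal sketch on hand-written HTML; the markup and the `demo_` names are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical menu markup, for illustration only
demo_html = """
<ul>
  <li><a class="sidemenu" href="/members">Members</a></li>
  <li><a class="sidemenu active" href="/committees">Committees</a></li>
  <li><a class="footer" href="/contact">Contact</a></li>
</ul>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")

# Search by tag name plus class attribute
by_find_all = demo_soup.find_all("a", class_="sidemenu")
# Search with the equivalent CSS selector
by_select = demo_soup.select("a.sidemenu")

find_all_hrefs = [a["href"] for a in by_find_all]
select_hrefs = [a["href"] for a in by_select]

print(find_all_hrefs)
print(select_hrefs)
```

Both searches match the second link as well, even though its `class` attribute lists two classes: a tag matches as long as `sidemenu` is one of them.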

## 🥊 Challenge: Find All

Use BeautifulSoup to find all the `a` elements with class `mainmenu`. (The solution below searches for `dropdown-item` instead, so that the results are visible.)

In [151]:
### PRACTICE EXERCISE
enlaces = soup.find_all("a", class_="dropdown-item")
print(enlaces[:5])  # limited to 5 so the output doesn't get too long

[<a b-0yw6sxot5c="" class="dropdown-item" data-lang="en" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-us"></span> English
                            </a>, <a b-0yw6sxot5c="" class="dropdown-item" data-lang="af" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-za"></span> Afrikaans
                            </a>, <a b-0yw6sxot5c="" class="dropdown-item" data-lang="sq" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-al"></span> Albanian
                            </a>, <a b-0yw6sxot5c="" class="dropdown-item" data-lang="ar" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-ae"></span> Arabic
                            </a>, <a b-0yw6sxot5c="" class="dropdown-item" data-lang="hy" href="#">
<span b-0yw6sxot5c="" class="flag-icon flag-icon-am"></span> Armenian
                            </a>]


## Step 4: Get Attributes and Text of Elements

Once we identify elements, we want to access the information in those elements. Usually, this means two things:

1. Text
2. Attributes

Getting the text inside an element is easy. All we have to do is use the `text` member of a `tag` object:

In [152]:
# Find all the <a> links in the HTML
links = soup.find_all("a")

# Print the text, href, and classes of each link
for link in links:
    print("Texto:", link.text)
    print("href:", link.get('href'))
    print("clases:", link.get('class'))
    print("------")



Texto: 
 English
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Afrikaans
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Albanian
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Arabic
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Armenian
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Azerbaijani
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Basque
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Bengali
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Bosnian
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Catalan
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Croatian
                            
href: #
clases: ['dropdown-item']
------
Texto: 
 Czech
                     

In [153]:
# Get elements with the CSS selector "a.dropdown-item"
side_menu_links = soup.select("a.dropdown-item")

# Examine the first element
first_link = side_menu_links[0]

# Get the text of the link
print(first_link.text)

# What class of object is this?
print('Class: ', type(first_link))



 English
                            
Class:  <class 'bs4.element.Tag'>


It's a Beautiful Soup tag! This means it has a `text` member:

In [154]:
print(first_link.text)


 English
                            


Sometimes we want the value of certain attributes. This is particularly relevant for `a` tags, or links, where the `href` attribute tells us where the link goes.

💡 **Tip**: You can access a tag’s attributes by treating the tag like a dictionary:

In [155]:
print(first_link['href'])

#
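Two details are worth a quick illustration: the `text` member keeps all surrounding whitespace (visible in the output above), and dictionary-style access fails if the attribute is missing. The sketch below uses hand-written HTML and hypothetical `demo_` names; `get_text(strip=True)` and `.get()` are standard Beautiful Soup `Tag` methods.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link with an href, one without
demo_html = '<a class="dropdown-item" href="#">\n  English\n</a><a class="plain">No href here</a>'
demo_soup = BeautifulSoup(demo_html, "html.parser")
with_href, without_href = demo_soup.find_all("a")

raw_text = with_href.text                      # keeps newlines and spaces
clean_text = with_href.get_text(strip=True)    # whitespace trimmed

href_value = with_href["href"]                 # dictionary-style access
missing_via_get = without_href.get("href")     # .get() returns None if absent

try:
    without_href["href"]                       # [] raises KeyError if absent
    missing_raises = False
except KeyError:
    missing_raises = True

print(repr(raw_text), repr(clean_text))
print(href_value, missing_via_get, missing_raises)
```

When looping over many links, `.get('href')` is usually the safer choice, since a single anchor without an `href` would otherwise stop the loop.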


## 🥊 Challenge: Extract Specific Attributes

Extract all `href` attributes for each `mainmenu` URL.

In [156]:
# YOUR CODE HERE
# Extract the href attribute of every link with the class 'mainmenu'
mainmenu_links = soup.select("a.mainmenu")

for link in mainmenu_links:
    print(link.get('href'))


<a id='scrape'></a>

# Scraping the Illinois General Assembly

Believe it or not, those are really the fundamental tools you need to scrape a website. Once you spend more time familiarizing yourself with HTML and CSS, it's simply a matter of understanding the structure of a particular website and intelligently applying the tools of Beautiful Soup and Python.

Let's apply these skills to scrape the [98th Illinois General Assembly](http://www.ilga.gov/senate/default.asp?GA=98).

Specifically, our goal is to scrape information on each senator, including their name, district, and party.

## Scrape and Soup the Webpage

Let's scrape and parse the webpage, using the tools we learned in the previous section.

In [157]:
# Make a new request, this time to a different page
req = requests.get('https://www.ilga.gov/Senate/Members/rptMemberList')
# Read the content of the server's response
src = req.text
# Parse the response into an HTML tree
soup = BeautifulSoup(src, "lxml")

## Search for the Table Elements

Our goal is to obtain the elements in the table on the webpage. Remember: rows are identified by the `tr` tag. Let's use `find_all` to obtain these elements.

In [158]:
# Get all the table rows
rows = soup.find_all("tr")
len(rows)

60

⚠️ **Warning**: Keep in mind that `find_all` gets *all* the elements with the `tr` tag. We only want some of them. If we use the 'Inspect' feature in Google Chrome and look carefully, we can use CSS selectors to get just the rows we're interested in. Specifically, we want the inner rows of the table:

In [159]:
# NOTE: 'a.dropdown-item' targets links, not table rows, and matches nothing on this page (the next cell prints 0 and [])
rows = soup.select('a.dropdown-item')

for row in rows[:20]:
    print(row, '\n')


It looks like we want everything after the first two rows. Let's work with a single row to start, and build our loop from there.

In [160]:
print(len(rows))
print(rows)


0
[]


In [161]:
if rows:
	example_row = rows[0]
	print(example_row.prettify())
else:
	print("No rows found.")

No rows found.


Let's break this row down into its component cells/columns using the `select` method with CSS selectors. Looking closely at the HTML, there are a couple of ways we could do this.

* We could identify the cells by their tag `td`.
* We could use the class name `.detail`.
* We could combine both and use the selector `td.detail`.
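The difference between these three selectors is easiest to see on a row where they do not all agree. The sketch below uses a hand-written table row; the markup, the senator name, and the `demo_` variable names are all hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical row: three 'detail' cells plus one plain cell wrapping a 'detail' span
demo_html = """
<table>
  <tr>
    <td class="detail">Jane Doe</td>
    <td class="detail">12</td>
    <td class="detail">D</td>
    <td><span class="detail">note</span></td>
  </tr>
</table>
"""
demo_row = BeautifulSoup(demo_html, "html.parser").select_one("tr")

by_tag = demo_row.select("td")            # every cell, regardless of class
by_class = demo_row.select(".detail")     # any element with class 'detail', including the span
by_both = demo_row.select("td.detail")    # only <td> cells with class 'detail'

print(len(by_tag), len(by_class), len(by_both))
print([cell.get_text() for cell in by_both])
```

Combining tag and class is the most specific of the three, which is why it tends to be the safest choice when a page reuses the same class on different kinds of elements.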

In [162]:
# Parse the HTML to get the rows
soup = BeautifulSoup(src, "lxml")
rows = soup.select("tbody tr")

if rows:
	example_row = rows[0]
else:
	example_row = None
	print("No rows found.")
  

In [163]:
if rows:
    example_row = rows[0]
    print(example_row.prettify())

    # Process example_row only if it exists
    for cell in example_row.select('td'):
        print(cell)
    for cell in example_row.select('.detail'):
        print(cell)
    for cell in example_row.select('td.detail'):
        print(cell)
else:
    print("No rows found.")


<tr>
 <td>
  <a class="notranslate" href="/Senate/Members/Details/3312">
   Neil Anderson
  </a>
  (R)
  <br/>
  47th District
 </td>
 <td>
  208 A Capitol Building
  <br/>
  <br/>
  Springfield, IL 62706
  <br/>
  (217) 782-5957
 </td>
 <td>
  103 North College Avenue
  <br/>
  #201
  <br/>
  Aledo IL 61231
  <br/>
  (309) 230-7584
 </td>
</tr>

<td>
<a class="notranslate" href="/Senate/Members/Details/3312">Neil Anderson</a> (R)
                                    <br/>
                                    47th District
                                </td>
<td>
                                    208 A Capitol Building<br/>
<br/>
                                    Springfield, IL 62706    <br/>
                                    (217) 782-5957
                                    
                                </td>
<td>103 North College Avenue<br/>
                                    #201<br/>
                                    Aledo IL 61231    <br/>
                           

Let's check whether all three selectors return the same cells.

In [167]:
# Check how many elements each selector finds
tds = example_row.select('td')
details = example_row.select('.detail')
td_details = example_row.select('td.detail')

print("tds:", len(tds))
print("details:", len(details))
print("td.details:", len(td_details))

# Sanity check: every .detail element should be a <td>
assert td_details == details  # these should be equal

tds: 3
details: 0
td.details: 0


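Let's use the selector `td.detail` to be as specific as possible. Note that in the output above, `td.detail` matched no cells on the current page, so on the live site you may need the plain `td` selector instead.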

In [None]:
# Select only those 'td' tags with class 'detail' 
detail_cells = example_row.select('td.detail')
detail_cells

Most of the time, we're interested in the actual **text** of a website, not its tags. Recall that to get the text of an HTML element, we use the `text` member:

In [None]:
# Keep only the text in each of those cells
row_data = [cell.text for cell in detail_cells]

print(row_data)

Looks good! Now we just use our basic Python knowledge to get the elements of this list that we want. Remember, we want the senator's name, their district, and their party.

In [None]:
print(row_data[0]) # Name
print(row_data[3]) # District
print(row_data[4]) # Party

## Getting Rid of Junk Rows

We saw at the beginning that not all of the rows we got actually correspond to a senator. We'll need to do some cleaning before we can proceed forward. Take a look at some examples:

In [None]:
print('Row 0:\n', rows[0], '\n')
print('Row 1:\n', rows[1], '\n')
print('Last Row:\n', rows[-1])

When we write our for loop, we only want it to apply to the relevant rows. So we'll need to filter out the irrelevant rows. The way to do this is to compare some of these to the rows we do want, see how they differ, and then formulate that in a conditional.

As you can imagine, there are a lot of possible ways to do this, and it'll depend on the website. We'll show some here to give you an idea of how to do this.

In [None]:
# Bad rows
print(len(rows[0]))
print(len(rows[1]))

# Good rows
print(len(rows[2]))
print(len(rows[3]))

Perhaps good rows have a length of 5. Let's check:

In [None]:
good_rows = [row for row in rows if len(row) == 5]

# Let's check some rows
print(good_rows[0], '\n')
print(good_rows[-2], '\n')
print(good_rows[-1])

We found a footer row in our list that we'd like to avoid. Let's try something else:

In [None]:
rows[2].select('td.detail') 

In [None]:
# Bad row
print(rows[-1].select('td.detail'), '\n')

# Good row
print(rows[5].select('td.detail'), '\n')

# How about this?
good_rows = [row for row in rows if row.select('td.detail')]

print("Checking rows...\n")
print(good_rows[0], '\n')
print(good_rows[-1])

Looks like we found something that worked!

## Loop it All Together

Now that we've seen how to get the data we want from one row, as well as filter out the rows we don't want, let's put it all together into a loop.

In [None]:
# Define storage list
members = []

# Get rid of junk rows
valid_rows = [row for row in rows if row.select('td.detail')]

# Loop through all rows
for row in valid_rows:
    # Select only those 'td' tags with class 'detail'
    detail_cells = row.select('td.detail')
    # Keep only the text in each of those cells
    row_data = [cell.text for cell in detail_cells]
    # Collect information
    name = row_data[0]
    district = int(row_data[3])
    party = row_data[4]
    # Store in a tuple
    senator = (name, district, party)
    # Append to list
    members.append(senator)

In [None]:
# Should be 61
len(members)

Let's take a look at what we have in `members`.

In [None]:
print(members[:5])

## 🥊  Challenge: Get `href` elements pointing to members' bills 

The code above retrieves information on:  

- the senator's name,
- their district number,
- and their party.

We now want to retrieve the URL for each senator's list of bills. Each URL will follow a specific format. 

The format for the list of bills for a given senator is:

`http://www.ilga.gov/senate/SenatorBills.asp?GA=98&MemberID=[MEMBER_ID]&Primary=True`

to get something like:

`http://www.ilga.gov/senate/SenatorBills.asp?MemberID=1911&GA=98&Primary=True`

in which `MEMBER_ID=1911`. 

You should be able to see that, unfortunately, `MEMBER_ID` is not currently something pulled out in our scraping code.

Your initial task is to modify the code above so that we also **retrieve the full URL which points to the corresponding page of primary-sponsored bills**, for each member, and return it along with their name, district, and party.

Tips: 

* To do this, you will want to get the appropriate anchor element (`<a>`) in each legislator's row of the table. You can again use the `.select()` method on the `row` object in the loop to do this — similar to the command that finds all of the `td.detail` cells in the row. Remember that we only want the link to the legislator's bills, not the committees or the legislator's profile page.
* The anchor elements' HTML will look like `<a href="/senate/Senator.asp/...">Bills</a>`. The string in the `href` attribute contains the **relative** link we are after. You can access an attribute of a BeautifulSoup `Tag` object the same way you access a Python dictionary: `anchor['attributeName']`. See the <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/#tag">documentation</a> for more details.
* There are a _lot_ of different ways to use BeautifulSoup to get things done. Whatever you need to do to pull the `href` out is fine.
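Turning a relative link into a full URL is a step that is easy to get subtly wrong with string concatenation. One option is `urljoin` from Python's standard library, sketched below. The value of `relative_href` is an assumption, built from the example bills URL shown earlier; the actual `href` strings on the page may look different.

```python
from urllib.parse import urljoin

# Page the link was scraped from
base_url = "http://www.ilga.gov/senate/default.asp?GA=98"

# Assumed relative link, modelled on the example bills URL above
relative_href = "/senate/SenatorBills.asp?MemberID=1911&GA=98&Primary=True"

# urljoin resolves the relative link against the page it came from
full_url = urljoin(base_url, relative_href)
print(full_url)
```

Because `urljoin` resolves the link the way a browser would, it also copes with `href` values that are already absolute.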

The code has been partially filled out for you. Fill it in where it says `#YOUR CODE HERE`. Save the path into an object called `full_path`.

In [None]:
# Make a GET request
req = requests.get('http://www.ilga.gov/senate/default.asp?GA=98')
# Read the content of the server’s response
src = req.text
# Soup it
soup = BeautifulSoup(src, "lxml")
# Create empty list to store our data
members = []

# Select every element matching the 'tr tr tr' CSS selector
rows = soup.select('tr tr tr')
# Get rid of junk rows
rows = [row for row in rows if row.select('td.detail')]

# Loop through all rows
for row in rows:
    # Select only those 'td' tags with class 'detail'
    detail_cells = row.select('td.detail') 
    # Keep only the text in each of those cells
    row_data = [cell.text for cell in detail_cells]
    # Collect information
    name = row_data[0]
    district = int(row_data[3])
    party = row_data[4]

    # YOUR CODE HERE
    full_path = ''

    # Store in a tuple
    senator = (name, district, party, full_path)
    # Append to list
    members.append(senator)

In [None]:
# Uncomment to test 
# members[:5]

## 🥊  Challenge: Modularize Your Code

Turn the code above into a function that accepts a URL, scrapes the URL for its senators, and returns a list of tuples containing information about each senator. 

In [None]:
# YOUR CODE HERE
def get_members(url):
    return [___]


In [None]:
# Test your code
url = 'http://www.ilga.gov/senate/default.asp?GA=98'
senate_members = get_members(url)
len(senate_members)

## 🥊 Take-home Challenge: Writing a Scraper Function

We want to scrape the webpages corresponding to the bills sponsored by each senator.

Write a function called `get_bills(url)` to parse a given bills URL. This will involve:

  - requesting the URL using the <a href="http://docs.python-requests.org/en/latest/">`requests`</a> library
  - using the features of the `BeautifulSoup` library to find all of the `<td>` elements with the class `billlist`
  - returning a _list_ of tuples, each with:
      - the bill ID (1st column)
      - description (2nd column)
      - chamber (S or H) (3rd column)
      - the last action (4th column)
      - the last action date (5th column)
      
This function has been partially completed. Fill in the rest.

In [None]:
def get_bills(url):
    src = requests.get(url).text
    soup = BeautifulSoup(src, "lxml")  # name the parser explicitly, as in the rest of the notebook
    rows = soup.select('tr')
    bills = []
    for row in rows:
        # YOUR CODE HERE
        bill_id =
        description =
        chamber =
        last_action =
        last_action_date =
        bill = (bill_id, description, chamber, last_action, last_action_date)
        bills.append(bill)
    return bills

In [None]:
# Uncomment to test your code
# test_url = senate_members[0][3]
# get_bills(test_url)[0:5]

### Scrape All Bills

Finally, create a dictionary `bills_dict` which maps a district number (the key) onto a list of bills (the value) coming from that district. You can do this by looping over all of the senate members in `senate_members` and calling `get_bills()` for each of their associated bill URLs.

**NOTE:** please call the function `time.sleep(1)` for each iteration of the loop, so that we don't destroy the state's web site.
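The pausing pattern itself can be sketched independently of the scraping logic. In the sketch below, `scrape_politely` and its `fetch` parameter are hypothetical names: `fetch` stands in for whatever function downloads and parses one page (in this notebook, that would be your completed `get_bills`).

```python
import time


def scrape_politely(urls, fetch, delay_seconds=1.0):
    """Call `fetch(url)` for each URL, pausing between requests.

    `fetch` is any callable taking a URL and returning parsed data;
    it is a stand-in here, not a function defined by this notebook.
    """
    results = {}
    for url in urls:
        results[url] = fetch(url)
        time.sleep(delay_seconds)  # be gentle with the server
    return results


# Demo with a stand-in fetch function and no delay, so it runs instantly
demo_results = scrape_politely(["page-a", "page-b"], fetch=str.upper, delay_seconds=0)
print(demo_results)
```

Keeping the delay as a parameter makes it easy to test the loop quickly while still defaulting to a one-second pause for real requests.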

In [None]:
# YOUR CODE HERE


In [None]:
# Uncomment to test your code
# bills_dict[52]