# WebScraping

## 2. Extract all rows of data
We've automated extracting each item from a single row. How do we do the same for every row on the page?

### Imports
First, we import the required libraries and redefine the `row_info_extractor` function we designed in the last session.

In [None]:
import requests
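import urllib.parse  # import the submodule explicitly; a bare `import urllib` does not guarantee urllib.parse is loaded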
from bs4 import BeautifulSoup

In [None]:
def row_info_extractor(row): # We'll feed it the isolated html for a row and let it pull it apart.
    author = row['data-author']
    
    id_item = row['class'][-1]
    thread_id = int(id_item.split('-')[-1])
    
    title_div = row.find('div', class_='structItem-title')
    title = title_div.a.text.strip() # remember to .strip() off the useless spaces on the ends.
    
    date = row.find('time')['datetime']
    
    views = row.find('dl',class_='pairs pairs--justified structItem-minor').dd.text

    relative_url = title_div.a['href']
    full_url = urllib.parse.urljoin('http://uberpeople.net',relative_url)
    
    data_package = {'id': thread_id,
                    'author': author,
                    'title': title,
                    'date': date,
                    'views': views,
                    'url': full_url}
    
    return data_package
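
To make two of the string steps inside `row_info_extractor` concrete, here is a minimal sketch that runs without any network access. The `href` and class values are made-up examples shaped like what the function reads; the real page's values may differ.

```python
from urllib.parse import urljoin

BASE_URL = 'http://uberpeople.net'

# Hypothetical href value, shaped like a relative thread link.
relative_url = '/threads/example-thread.12345/'
full_url = urljoin(BASE_URL, relative_url)
print(full_url)  # http://uberpeople.net/threads/example-thread.12345/

# An href that is already absolute passes through urljoin unchanged.
already_absolute = urljoin(BASE_URL, 'https://example.com/page')
print(already_absolute)  # https://example.com/page

# Hypothetical value for the last class on a row: the thread id is
# whatever follows the final hyphen.
id_item = 'js-threadListItem-12345'
thread_id = int(id_item.split('-')[-1])
print(thread_id)  # 12345
```

Using `urljoin` rather than plain string concatenation means doubled or missing slashes are not something we need to handle ourselves.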

### Re-establish our list of threads

In [None]:
response = requests.get('http://uberpeople.net/forums/Tips/')
soup = BeautifulSoup(response.text,'lxml')
threads_container = soup.find('div', class_="structItemContainer")
threads = threads_container.find_all('div',class_='structItem--thread')

In [None]:
# Remember, our `threads` variable is a list of rows, created when we asked BeautifulSoup to
# .find_all() divisions of a particular class within the threads_container.
threads[0:2]

In [None]:
# We've been working on the first row
first_row = threads[0]
row_info_extractor(first_row)

In [None]:
# We can use a loop to capture ALL the rows.

rows_data = []

for row in threads:
    result = row_info_extractor(row)
    rows_data.append(result)


In [None]:
rows_data
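
The loop-and-append pattern above can also be written as a list comprehension. The sketch below shows that the two forms give the same result; `toy_extractor` and `toy_rows` are hypothetical stand-ins for `row_info_extractor` and `threads`, so it runs without fetching a page.

```python
def toy_extractor(row):
    # Hypothetical stand-in for row_info_extractor: returns one dict per row.
    return {'title': row.strip()}

# Hypothetical stand-in for the `threads` list.
toy_rows = ['  first thread ', ' second thread  ']

# Explicit loop, as in the cell above.
looped = []
for row in toy_rows:
    looped.append(toy_extractor(row))

# Equivalent list comprehension.
comprehended = [toy_extractor(row) for row in toy_rows]

print(looped == comprehended)  # True
```

Either form is fine here. The explicit loop is easier to extend later, for example if we want to skip rows that fail to parse.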

### ACTIVITY: Build a Function
This function should:
- take a `requests` response
- convert it into soup (remember to use the `.text` attribute of the response)
- isolate the list of rows
- iterate over each row to extract the relevant data
- save each row's data to a list
- return the filled list

Note: Use the `row_info_extractor` function we built in the previous activity inside your new function.

In [None]:
# YOUR NEW FUNCTION HERE

def page_info_extractor(response):
    soup = BeautifulSoup(response.text,'lxml')
    threads_container = soup.find('div', class_="structItemContainer")
    threads = threads_container.find_all('div',class_='structItem--thread')
    
    page_data = []
    for row in threads:
        result = row_info_extractor(row)
        page_data.append(result)
    return page_data

In [None]:
# TEST THE FUNCTION

response = requests.get('https://uberpeople.net/forums/Tips/')
page_data = page_info_extractor(response)
page_data
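
A live request is not the only way to test the function. The sketch below runs the same two functions against a small hand-written HTML snippet. Several things here are assumptions for illustration: `SAMPLE_HTML` is invented and mimics only the tags and attributes the functions read (the real page has far more markup), `fake_response` is a stand-in object with just a `.text` attribute, and `'html.parser'` is swapped in for `'lxml'` so no extra parser needs to be installed.

```python
from types import SimpleNamespace
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Invented sample markup: two thread rows inside the container div.
SAMPLE_HTML = """
<div class="structItemContainer">
  <div class="structItem structItem--thread js-threadListItem-101" data-author="alice">
    <div class="structItem-title"><a href="/threads/first-tip.101/"> First tip </a></div>
    <time datetime="2020-01-01T10:00:00-0500">Jan 1, 2020</time>
    <dl class="pairs pairs--justified structItem-minor"><dt>Views</dt><dd>1K</dd></dl>
  </div>
  <div class="structItem structItem--thread js-threadListItem-102" data-author="bob">
    <div class="structItem-title"><a href="/threads/second-tip.102/">Second tip</a></div>
    <time datetime="2020-01-02T11:30:00-0500">Jan 2, 2020</time>
    <dl class="pairs pairs--justified structItem-minor"><dt>Views</dt><dd>250</dd></dl>
  </div>
</div>
"""


def row_info_extractor(row):
    # Same logic as the function defined earlier in this notebook.
    title_div = row.find('div', class_='structItem-title')
    return {
        'id': int(row['class'][-1].split('-')[-1]),
        'author': row['data-author'],
        'title': title_div.a.text.strip(),
        'date': row.find('time')['datetime'],
        'views': row.find('dl', class_='pairs pairs--justified structItem-minor').dd.text,
        'url': urljoin('http://uberpeople.net', title_div.a['href']),
    }


def page_info_extractor(response):
    # 'html.parser' is used here instead of 'lxml' (an assumption for portability).
    soup = BeautifulSoup(response.text, 'html.parser')
    threads_container = soup.find('div', class_='structItemContainer')
    threads = threads_container.find_all('div', class_='structItem--thread')
    return [row_info_extractor(row) for row in threads]


# Stand-in for a requests response: only the .text attribute is needed.
fake_response = SimpleNamespace(text=SAMPLE_HTML)
sample_data = page_info_extractor(fake_response)

for item in sample_data:
    print(item['id'], item['author'], item['title'], item['url'])
```

Testing against a fixed snippet gives the same output on every run, which makes it easier to tell a bug in our code from a change on the live site.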