# This is a sample of a web scrape

#### Collecting data from a webpage

This page demonstrates grabbing data from a webpage and turning it into a dataframe for further analysis. That further analysis is not conducted here; the focus is on a web scraping program that collects data from an internet page.

In [1]:
# Importing libraries

from lxml import html   # HTML parser with XPath support
import requests         # HTTP client used to download the page

import pandas as pd
from pandas import DataFrame

In [2]:
# Retrieving data from the web page

url = 'http://econpy.pythonanywhere.com/ex/001.html'
page = requests.get(url)
tree = html.fromstring(page.content)

In [3]:
# This is a look at the code behind the web page. We need to know
# what the element tags are to grab the correct information.

page.text[:1000]

'<!DOCTYPE html>\n<html>\n<head>\n    <meta charset="utf-8">\n    <title>Items 1 to 20 -- Example Page 1</title>\n    <script type="text/javascript">\n      var _gaq = _gaq || [];\n      _gaq.push([\'_setAccount\', \'UA-23648880-1\']);\n      _gaq.push([\'_trackPageview\']);\n      _gaq.push([\'_setDomainName\', \'econpy.org\']);\n    </script>\n</head>\n<body>\n<div align="center">1, <a href="http://econpy.pythonanywhere.com/ex/002.html">[<font color="green">2</font>]</a>, <a href="http://econpy.pythonanywhere.com/ex/003.html">[<font color="green">3</font>]</a>, <a href="http://econpy.pythonanywhere.com/ex/004.html">[<font color="green">4</font>]</a>, <a href="http://econpy.pythonanywhere.com/ex/005.html">[<font color="green">5</font>]</a></div>\n<div title="buyer-info">\n  <div title="buyer-name">Carson Busses</div>\n  <span class="item-price">$29.95</span><br>\n</div>\n<div title="buyer-info">\n  <div title="buyer-name">Earl E. Byrd</div>\n  <span class="item-price">$8.37</span><br>

##### The information we want is each buyer's name and item price.

##### These are in the `<div title="buyer-name">` and `<span class="item-price">` elements.

In [4]:
# This will create a list of buyers
buyers = tree.xpath('//div[@title="buyer-name"]/text()')

# This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')

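To see what these two XPath expressions return without requesting the live page, here is a small sketch that runs them on a fragment copied from the page source printed above (only the first two buyers). The `demo_` names are hypothetical and chosen so they don't overwrite `tree`, `buyers`, or `prices` in the notebook.

```python
from lxml import html

# Fragment copied from the page source shown earlier (first two buyers only).
snippet = """
<div title="buyer-info">
  <div title="buyer-name">Carson Busses</div>
  <span class="item-price">$29.95</span><br>
</div>
<div title="buyer-info">
  <div title="buyer-name">Earl E. Byrd</div>
  <span class="item-price">$8.37</span><br>
</div>
"""

demo_tree = html.fromstring(snippet)

# //div[@title="buyer-name"] matches every <div> whose title attribute is
# "buyer-name", anywhere in the document; /text() returns the text directly inside it.
demo_buyers = demo_tree.xpath('//div[@title="buyer-name"]/text()')
demo_prices = demo_tree.xpath('//span[@class="item-price"]/text()')

print(demo_buyers)  # ['Carson Busses', 'Earl E. Byrd']
print(demo_prices)  # ['$29.95', '$8.37']
```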

In [5]:
# Let's see the buyers

print(buyers)

['Carson Busses', 'Earl E. Byrd', 'Patty Cakes', 'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff', 'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup', 'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire', 'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']


In [6]:
# Let's see the prices

print(prices)

['$29.95', '$8.37', '$15.26', '$19.25', '$19.25', '$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11', '$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68', '$15.00', '$114.07', '$10.09']


###### Each buyer-info block on the page holds one name and one price, and XPath returns matches in page order, so the buyer and price at the same position in the two lists belong together.
###### Knowing that, we can combine the lists into a dataframe that is easier to view and to use in later analysis.
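Pairing two independent lists relies on every buyer having exactly one price; if a block were ever missing a price, every later row would shift. A sketch of a more defensive alternative, assuming the same `buyer-info` page structure, reads both fields from inside each block. The function name `extract_records` and the second, price-less block in the demo are hypothetical, added only to show how a gap is handled.

```python
from lxml import html

def extract_records(tree):
    """Return (buyer, price) pairs, reading both fields from the same buyer-info block."""
    records = []
    for block in tree.xpath('//div[@title="buyer-info"]'):
        # Relative paths (no leading //) only search the children of this block.
        names = block.xpath('div[@title="buyer-name"]/text()')
        price_texts = block.xpath('span[@class="item-price"]/text()')
        # Record a missing field as None instead of letting later rows shift.
        records.append((names[0] if names else None,
                        price_texts[0] if price_texts else None))
    return records

# Hypothetical input: the second block deliberately has no price.
demo = html.fromstring("""
<div title="buyer-info">
  <div title="buyer-name">Carson Busses</div>
  <span class="item-price">$29.95</span><br>
</div>
<div title="buyer-info">
  <div title="buyer-name">Earl E. Byrd</div>
</div>
""")

print(extract_records(demo))  # [('Carson Busses', '$29.95'), ('Earl E. Byrd', None)]
```

Applied to the notebook's `tree`, `extract_records(tree)` should yield the same buyer/price pairs as zipping the two lists, and its output can be passed to `DataFrame` directly.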

In [7]:
# Combining the separate lists into a single dataframe

buyer_price_df = DataFrame(list(zip(buyers, prices)),
                           columns=["Buyer", "Price"])
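The scraped prices are strings such as `'$29.95'`, so they cannot be summed or averaged yet. A minimal sketch of converting the column to numbers, using a small stand-in dataframe (`demo_df`, hypothetical) built from the first three rows printed above. It assumes prices carry a leading `$` and no thousands separators, which holds for the prices shown.

```python
import pandas as pd

# Stand-in for buyer_price_df, using the first three rows printed above.
demo_df = pd.DataFrame(
    [("Carson Busses", "$29.95"), ("Earl E. Byrd", "$8.37"), ("Patty Cakes", "$15.26")],
    columns=["Buyer", "Price"],
)

# Strip the leading dollar sign, then convert the text to floating-point numbers.
demo_df["Price"] = demo_df["Price"].str.lstrip("$").astype(float)

print(demo_df["Price"].tolist())  # [29.95, 8.37, 15.26]
print(demo_df["Price"].dtype)     # float64
```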

A dataframe is convenient to work with in Python, but it doesn't help people who don't use Python.
So that everyone can access and work with the data, we can export it to a .csv file that opens in Excel or any other
spreadsheet program.

In [8]:
# Exporting dataframe to .csv file

buyer_price_df.to_csv('buyer_price_data.csv')
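By default `to_csv` also writes the dataframe's row index as an unnamed first column. A sketch of an optional variant, using a hypothetical two-row `demo_df`: passing `index=False` leaves those row numbers out, and reading the CSV text back with `pd.read_csv` is a quick way to check that the export kept the table intact.

```python
import io
import pandas as pd

demo_df = pd.DataFrame(
    [("Carson Busses", "$29.95"), ("Earl E. Byrd", "$8.37")],
    columns=["Buyer", "Price"],
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
# index=False omits pandas' row numbers, which spreadsheet users rarely need.
csv_text = demo_df.to_csv(index=False)
print(csv_text)

# Parse the text back and compare it with the original table.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.equals(demo_df))  # True
```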