# WEB SCRAPING

### Scraping Real Estate Listings from Web Pages with the BeautifulSoup Library and Saving the Cleaned Data to CSV with Pandas DataFrames

###### Goal: crawl every search-result page and extract the required details of all properties (houses) for sale in the locations below, as listed on the real estate website "Century21.com":

1) Rock Springs is a city in Sweetwater County, Wyoming, United States.

2) Rocksprings is a town in Edwards County, Texas, in the United States.

3) Black Canyon City is a census-designated place (CDP) in Yavapai County, Arizona, United States.

**For each property for sale in the locations above, we need to extract the following data:**

1) Street Address and house/apartment number

2) City, state, and ZIP code

3) Property Price

4) Number of bedrooms, full baths, and half baths

5) Area of property

6) Lot Size

**Note: Scraping a live website can violate its terms of service and, in some cases, the law, so we use archived copies of the pages, for educational purposes only.**

In [104]:
import requests

from bs4 import BeautifulSoup

url = "http://www.pyclass.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS"

# Find the total number of search-result pages available

response = requests.get(url,
                    headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})

first_page = BeautifulSoup(response.content, "html.parser")

# The last pagination link (class "Page") carries the highest page number
Pages = int(first_page.find_all('a', {'class': 'Page'})[-1].text)

print("Total search Pages:", Pages)

Total search Pages: 3
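
The crawl loop in the next cell addresses each results page by a listing offset rather than by a page number. A minimal sketch of how the page count turns into page URLs, assuming the offset advances by 10 per page and the `/t=0&s=<offset>.html` suffix used in this notebook. The helper name `build_page_urls` and its `page_size` parameter are hypothetical, introduced only for illustration:

```python
def build_page_urls(base_url, total_pages, page_size=10):
    """Return one URL per search page, keyed by listing offset (0, 10, 20, ...)."""
    urls = []
    for offset in range(0, total_pages * page_size, page_size):
        # Same URL pattern as the notebook: <base>/t=0&s=<offset>.html
        urls.append(f"{base_url}/t=0&s={offset}.html")
    return urls


urls = build_page_urls(
    "http://www.pyclass.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS", 3
)
for page_url in urls:
    print(page_url)
```

Building the list up front makes it easy to check the URLs before sending any request.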


In [108]:
properties = []

# Each results page is addressed by a listing offset: 0, 10, 20, ...
for page in range(0, Pages*10, 10):

    data = requests.get(url+"/t=0&s="+str(page)+".html",
                    headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})

    soup = BeautifulSoup(data.content, "html.parser")

    # One "propertyRow" div per listing
    mainRow = soup.find_all('div', {'class': "propertyRow"})

    for row in mainRow:

        details = {}

        # First address span: street; second: city, state and ZIP code
        address = row.find_all('span', {'class': 'propAddressCollapse'})
        details['Address_street'] = address[0].text
        details['Address_locality'] = address[1].text
        details['Price'] = row.find('h4', {'class': 'propPrice'}).text.replace('\n', '').strip()

        # Optional fields: store None when a listing omits them
        bed = row.find('span', {'class': 'infoBed'})
        details['Bed Rooms'] = bed.find('b').text if bed is not None else None

        full_bath = row.find('span', {'class': 'infoValueFullBath'})
        details['Full Baths'] = full_bath.find('b').text if full_bath is not None else None

        half_bath = row.find('span', {'class': 'infoValueHalfBath'})
        details['Half Baths'] = half_bath.find('b').text if half_bath is not None else None

        area = row.find('span', {'class': 'infoSqFt'})
        details['Area'] = area.find('b').text if area is not None else None

        # Lot size sits in a feature group; default to None so every record has the key
        details['Lot Size'] = None
        for item in row.find_all('div', {'class': 'columnGroup'}):
            group = item.find('span', {'class': 'featureGroup'})
            if group is not None and "Lot Size" in group.text:
                details['Lot Size'] = item.find('span', {'class': 'featureName'}).text

        properties.append(details)
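
The loop above stores `Price` and `Lot Size` as raw strings, for example `$725,000` and `Under 1/2 Acre,`. A minimal cleaning sketch, assuming prices are whole-dollar amounts and that the trailing comma on lot sizes is scraping residue. The helper names `parse_price` and `clean_lot_size` are hypothetical and not part of the notebook:

```python
def parse_price(raw):
    """Convert a price string such as '$725,000' to an int.

    Assumes whole-dollar prices: every non-digit character is dropped,
    so a value with cents would be misread. Returns None when no digits exist.
    """
    if raw is None:
        return None
    digits = "".join(ch for ch in raw if ch.isdigit())
    return int(digits) if digits else None


def clean_lot_size(raw):
    """Strip surrounding whitespace and trailing commas; return None if nothing is left."""
    if raw is None:
        return None
    cleaned = raw.strip().rstrip(",").strip()
    return cleaned or None


print(parse_price("$725,000"))
print(clean_lot_size("Under 1/2 Acre,"))
```

Returning `None` for missing values keeps the cleaned fields consistent with the `None` defaults used for the optional fields in the loop.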

In [106]:
#print(properties)

In [110]:
import pandas as pd

df = pd.DataFrame(properties)

df

|   | Address_locality | Address_street | Area | Bed Rooms | Full Baths | Half Baths | Lot Size | Price |
|---|---|---|---|---|---|---|---|---|
| 0 | Rock Springs, WY 82901 | 0 Gateway | | | | | | $725,000 |
| 1 | Rock Springs, WY 82901 | 1003 Winchester Blvd. | | 4.0 | 4.0 | | 0.21 Acres | $452,900 |
| 2 | Rock Springs, WY 82901 | 600 Talladega | 3154.0 | 5.0 | 3.0 | | | $396,900 |
| 3 | Rock Springs, WY 82901 | 3239 Spearhead Way | 3076.0 | 4.0 | 3.0 | 1.0 | Under 1/2 Acre, | $389,900 |
| 4 | Rock Springs, WY 82901 | 522 Emerald Street | 1172.0 | 3.0 | 3.0 | | Under 1/2 Acre, | $254,000 |
| 5 | Rock Springs, WY 82901 | 1302 Veteran's Drive | 1932.0 | 4.0 | 2.0 | | 0.27 Acres | $252,900 |
| 6 | Rock Springs, WY 82901 | 1021 Cypress Cir | 1676.0 | 4.0 | 3.0 | | Under 1/2 Acre, | $210,000 |
| 7 | Rock Springs, WY 82901 | 913 Madison Dr | 1344.0 | 3.0 | 2.0 | | Under 1/2 Acre, | $209,000 |
| 8 | Rock Springs, WY 82901 | 1344 Teton Street | 1920.0 | 3.0 | 2.0 | | Under 1/2 Acre, | $199,900 |
| 9 | Rock Springs, WY 82901 | 4 Minnies Lane | 1664.0 | 3.0 | 2.0 | | 2.02 Acres | $196,900 |
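
The requirements list city, state, and ZIP code as separate items, while `Address_locality` holds them in one string such as `Rock Springs, WY 82901`. A minimal sketch of splitting that string, assuming the format "City, ST 12345" with an optional ZIP+4 suffix. The name `split_locality` and the regular expression are illustrative assumptions, not part of the notebook:

```python
import re

# Assumed format: "<city>, <two-letter state> <5-digit ZIP, optionally ZIP+4>"
LOCALITY_RE = re.compile(
    r"^\s*(?P<city>.+?),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def split_locality(text):
    """Split 'City, ST 12345' into its parts; return None if the text does not match."""
    match = LOCALITY_RE.match(text or "")
    return match.groupdict() if match else None


print(split_locality("Rock Springs, WY 82901"))
```

Strings that do not fit the assumed format come back as `None` rather than as a partial guess, so unexpected rows are easy to spot.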


In [111]:
# Exporting data into CSV format

df.to_csv("C:/Users/Karthik/Documents/Output.csv")
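
For comparison, the same kind of export can be sketched with only the standard library's `csv` module. This is an alternative to the pandas call above, not what the notebook uses. The two sample records are modeled on the first two rows of the table, the column order in `FIELDS` is an assumption, and the output goes to an in-memory buffer rather than a file:

```python
import csv
import io

# Assumed column order; keys match the ones built in the scraping loop
FIELDS = ["Address_street", "Address_locality", "Price", "Bed Rooms",
          "Full Baths", "Half Baths", "Area", "Lot Size"]

# Sample records modeled on the first two rows of the table above
records = [
    {"Address_street": "0 Gateway",
     "Address_locality": "Rock Springs, WY 82901",
     "Price": "$725,000"},
    {"Address_street": "1003 Winchester Blvd.",
     "Address_locality": "Rock Springs, WY 82901",
     "Price": "$452,900", "Bed Rooms": "4", "Full Baths": "4",
     "Lot Size": "0.21 Acres"},
]

buffer = io.StringIO()
# restval fills in columns that a record does not define
writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```

`csv.DictWriter` quotes values that contain commas, such as the locality and the price, and writes no index column.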