### Import Libraries 

 - The requests library lets you send HTTP requests from Python to a specific URL. In our case, we send HTTP requests to Zillow
 - The time module handles time-related tasks, including formatting dates, waiting, and representing time
 - The random module lets you generate random numbers and draw random samples
 - The bs4 module lets you pull data out of an HTML document once you have the response to an HTTP request
 - The os module lets you interact with the operating system, including changing the working directory
 - The selenium module lets you automate interaction with a web browser, including requesting a URL and extracting the HTML
   document it returns

In [4]:
import requests
import time
from bs4 import BeautifulSoup
from random import sample 
import pandas as pd 
import os
from selenium import webdriver
import json
import csv
from datetime import datetime


### Set Path
 - Identify your destination folder
 - Use os.chdir to make that folder the working directory. All outputs will be exported there

In [5]:
path = "C:\\Users\\padu\\Desktop\\Zillow\\"
os.chdir(path)
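
`os.chdir` raises `FileNotFoundError` when the destination folder does not exist yet. A minimal sketch, assuming a hypothetical helper named `set_output_dir` (not part of the original notebook), creates the folder first and then switches to it:

```python
import os

def set_output_dir(path):
    """Create the folder if needed, make it the working directory, and return its absolute path."""
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    os.chdir(path)
    return os.getcwd()
```

Calling `set_output_dir(path)` in place of `os.chdir(path)` would let the notebook run on a machine where the Zillow folder has not been created by hand.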

### Create a file name
 - Create an output file name. I called mine ZillowSelium and appended a date-time stamp
 - Note: If you scrape multiple times in a day, include the hour in the time stamp so that you don't overwrite data you have already exported

In [31]:
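# %H is the hour and %M the minute; lowercase %m is the month, so it must not be reused for minutes
finalfile = "ZillowSelium" + "_" + "{:%Y_%m_%d_%H_%M}".format(datetime.now()) + ".csv"
# finalfile has the form 'ZillowSelium_YYYY_MM_DD_HH_MM.csv'
finalfile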
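
The note about including the hour can be sketched as a small helper. This is an illustration only: `make_filename` is a hypothetical name, and the fixed date passed to it is just an example value so the result is reproducible.

```python
from datetime import datetime

def make_filename(prefix, when=None):
    """Return '<prefix>_YYYY_MM_DD_HH_MM.csv' for the given moment (default: now)."""
    when = when or datetime.now()
    # %H is the 24-hour clock hour and %M the minute; lowercase %m is the month
    return "{}_{:%Y_%m_%d_%H_%M}.csv".format(prefix, when)

print(make_filename("ZillowSelium", datetime(2022, 2, 17, 14, 5)))
# prints ZillowSelium_2022_02_17_14_05.csv
```

Two runs in the same minute would still produce the same name, so adding `%S` for seconds may be worth considering if runs are that close together.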

### Main Webscraping 

- Output results
- Page numbers
- URL 
- Selenium Setup


In [30]:
#Create a list that will hold the results

results = []


# Inspect the Zillow website to find the number of result pages for rental ads
# In the Charlotte example there are 20 pages in total, so the range stops at 21

for page in range(1,21,1):
    
    print("This is page :" + str(page))
    
    # Identify the Zillow URL for your city. It follows this format:
    # 1. Default Zillow URL: https://www.zillow.com/
    # 2. Name of your city, e.g. charlotte-nc, atlanta-ga
    # 3. The page number
    # 4. The "_p" suffix that the Zillow website appends to the page number
    # 5. A sample URL for page 15, for example: https://www.zillow.com/charlotte-nc/rentals/15_p/
    
    url = "https://www.zillow.com/charlotte-nc/rentals/" +str(page) + '_p/'
    
    # Here we use Selenium. To automate the behavior of a web browser you
    # need a web driver. Each browser has its own; I am using Google Chrome, so I downloaded the web driver
    # from "https://chromedriver.storage.googleapis.com/index.html?path=98.0.4758.80/"
    
    # After downloading and extracting the web driver (chromedriver.exe), use the webdriver.Chrome() method to start
    # the Chrome browser, passing the path where the driver is saved.
    
    
    CraiglistBrowser = webdriver.Chrome("C:/Users/padu/Downloads/chromedriver_win32/chromedriver.exe")
    CraiglistBrowser.maximize_window()
    
    # After the browser has launched, use get() to load the URL
    Craiglist = CraiglistBrowser.get(url)
    CraiglistHTML = CraiglistBrowser.execute_script("return document.documentElement.outerHTML")
    soup = BeautifulSoup(CraiglistHTML, 'html.parser')
    CraiglistBrowser.quit()
    print(url)


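    # The search results sit in a single <ul>; each of its children is one listing card
    deck = soup.find('ul',{'class': 'photo-cards photo-cards_wow photo-cards_short'})

    for card in deck.contents: 
        # Each card carries its structured data in a JSON-LD <script> tag
        script = card.find('script',  {'type': 'application/ld+json'})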


        try: 
            if script:
            
                script_json = json.loads(script.contents[0])
               
                # Open the listing's own page to read its overview text.
                # Default to None so a failed lookup never reuses the previous card's text.
                loopresults2 = None
                try:
                    descriptions = script_json['url']
                    CraiglistBrowser = webdriver.Chrome("C:/Users/padu/Downloads/chromedriver_win32/chromedriver.exe")
                    CraiglistBrowser.maximize_window()
                    CraiglistBrowser.get(descriptions)
                    CraiglistHTML = CraiglistBrowser.execute_script("return document.documentElement.outerHTML")
                    CraiglistBrowser.quit()

                    loop_soup = BeautifulSoup(CraiglistHTML, 'html.parser')
                    loopresults2 = loop_soup.find('div', {'class' : 'ds-overview-section'}).text

                except Exception:
                    # Missing URL, browser failure, or no overview section: keep None
                    pass


                results.append({
                                    'latitude': script_json['geo']['latitude'],
                                    'longitude': script_json['geo']['longitude'],
                                    'floorsize': script_json['floorSize']['value'],
                                    'streetaddress': script_json['name'],
                                    'zipcode': script_json['address']['postalCode'],
                                    'Locality': script_json['address']['addressLocality'],
                                    'url': script_json['url'],
                                    'price': card.find('div', {'class': 'list-card-price'}).text,
                                    'bedrooms': card.find('ul',{'class': 'list-card-details'}).text[0],
                                    'bedroomsLab': card.find('ul',{'class': 'list-card-details'}).find('li', {'class': ''}).text,
                                    'baths': card.find('ul',{'class': 'list-card-details'}).text[5],
                                    'overview' : loopresults2
                                })
        except KeyError :
            pass

        # Rewrite the CSV after every card so that partial results survive an interrupted run
        Zillowdata =  pd.DataFrame(results)
        Zillowdata.to_csv(finalfile, index = False)


This is page :1
https://www.zillow.com/charlotte-nc/rentals/1_p/
This is page :2
https://www.zillow.com/charlotte-nc/rentals/2_p/
This is page :3
https://www.zillow.com/charlotte-nc/rentals/3_p/
This is page :4
https://www.zillow.com/charlotte-nc/rentals/4_p/
This is page :5
https://www.zillow.com/charlotte-nc/rentals/5_p/
This is page :6
https://www.zillow.com/charlotte-nc/rentals/6_p/
This is page :7
https://www.zillow.com/charlotte-nc/rentals/7_p/
This is page :8
https://www.zillow.com/charlotte-nc/rentals/8_p/
This is page :9
https://www.zillow.com/charlotte-nc/rentals/9_p/
This is page :10
https://www.zillow.com/charlotte-nc/rentals/10_p/
This is page :11
https://www.zillow.com/charlotte-nc/rentals/11_p/
This is page :12
https://www.zillow.com/charlotte-nc/rentals/12_p/
This is page :13
https://www.zillow.com/charlotte-nc/rentals/13_p/
This is page :14
https://www.zillow.com/charlotte-nc/rentals/14_p/
This is page :15
https://www.zillow.com/charlotte-nc/rentals/15_p/
This is page 
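
The scraping cell skips a listing entirely whenever one JSON-LD key is missing (the `except KeyError` branch). As one possible alternative, the sketch below keeps the row and fills missing fields with `None`. The helper name `extract_listing` and the sample record are made up for illustration; only the key names come from the notebook.

```python
import json

def extract_listing(script_json):
    """Flatten one JSON-LD record into a row; missing fields become None."""
    geo = script_json.get("geo") or {}
    address = script_json.get("address") or {}
    floor_size = script_json.get("floorSize") or {}
    return {
        "latitude": geo.get("latitude"),
        "longitude": geo.get("longitude"),
        "floorsize": floor_size.get("value"),
        "streetaddress": script_json.get("name"),
        "zipcode": address.get("postalCode"),
        "Locality": address.get("addressLocality"),
        "url": script_json.get("url"),
    }

# Made-up record for illustration only; it has no floorSize entry
sample = json.loads("""
{"name": "123 Example St", "url": "https://example.com/listing",
 "geo": {"latitude": 35.2, "longitude": -80.8},
 "address": {"postalCode": "28202", "addressLocality": "Charlotte"}}
""")
row = extract_listing(sample)
print(row["zipcode"], row["floorsize"])  # prints: 28202 None
```

Whether partial rows are preferable to dropped rows depends on the analysis; keeping them makes it possible to count how many listings lack a floor size.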

In [7]:
time.sleep(0.0006)
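
`time.sleep(0.0006)` pauses for 0.6 milliseconds, which is effectively no delay. If the intent is to space out page requests, a minimal sketch could look like the following. The helper name `pause_between_requests` and the 2 to 5 second bounds are assumptions for illustration, not values from the original notebook.

```python
import random
import time

def pause_between_requests(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random duration between min_seconds and max_seconds and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)  # the sleep function is a parameter so it can be swapped out when testing
    return delay
```

One place to call it would be at the end of each iteration of the page loop, so that consecutive page loads are separated by a varying pause.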