## Missing Middle CoA Webscraping
Keagan H Rankin 6-07-2022
<br>
This file introduces a framework for automatically webscraping Toronto's Committee of Adjustment (CoA) website for missing middle buildings.<br>
It uses Selenium and BeautifulSoup to automate web operations. <br>
It stores drawings to the machine or a non-local database (if run on the virtual machine?).

In [1]:
# Data
import numpy as np
import scipy as sp
import pandas as pd

import re

# Webscraping
# Selenium
#basics
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By

#wait tools
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# chaining actions
from selenium.webdriver.common.action_chains import ActionChains

# selecting items from a dropdown.
from selenium.webdriver.support.ui import Select

# Python modules: time and tkinter for accessing clipboard
from tkinter import Tk
import time
import os

# Beautiful Soup
from bs4 import BeautifulSoup

### Beautiful Soup Basics

In [2]:
# Docstring
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

In [3]:
souper = BeautifulSoup(html_doc, 'html.parser')

#print(souper.prettify())
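
As a quick illustration of what the parsed object offers, the sketch below pulls the page title and the three sister links out of a trimmed copy of the same sample HTML. It assumes `bs4` is installed; the names `sample_html` and `links` are only for illustration.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the sample document above.
sample_html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Accessing a tag by name returns the first matching tag.
title = soup.title.string

# find_all() returns every matching tag; attributes behave like dict keys.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a", class_="sister")]

print(title)
for name, href in links:
    print(name, href)
```

The scraper further down only needs `get_text()`, but `find_all` is a natural next step if individual tags have to be picked out.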

### Selenium Basics

The code below uses Selenium to search for 405 Huron Road and return the file names within radius <br> <br>
USEFUL SELENIUM TOOLS: <br>
Getting started (https://selenium-python.readthedocs.io/getting-started.html) <br>
How to create explicit/implicit waits (https://selenium-python.readthedocs.io/waits.html) <br>
Clicking and other tasks with waiting!! (https://stackoverflow.com/questions/59130200/selenium-wait-until-element-is-present-visible-and-interactable/59130336#59130336)
(https://stackoverflow.com/questions/44912203/selenium-web-driver-java-element-is-not-clickable-at-point-x-y-other-elem/44916498#44916498)
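
To make the idea of an explicit wait concrete, here is a simplified, Selenium-free sketch of the polling loop that a call such as `WebDriverWait(driver, 20).until(...)` conceptually performs. `wait_until`, its parameters and `fake_element_present` are hypothetical names for illustration; this is not Selenium's implementation.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)


# Stand-in for a page element that only "appears" on the third check.
attempts = {"n": 0}


def fake_element_present():
    attempts["n"] += 1
    return "element" if attempts["n"] >= 3 else None


result = wait_until(fake_element_present, timeout=2.0, poll=0.01)
print(result, attempts["n"])
```

Unlike a fixed `time.sleep`, the loop returns as soon as the condition holds, which is why explicit waits are usually preferred for slow or dynamic pages.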

In [5]:
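# Open the chrome driver. Selenium 4 expects the driver path wrapped in a Service.
from selenium.webdriver.chrome.service import Service

path = "selenium webdriver/chromedriver.exe"
driver = webdriver.Chrome(service=Service(path))

# Open the CoA website
driver.get('https://secure.toronto.ca/AIC/index.do')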

# ------ GOAL: Loop through the first page of the results ------
# Get Search Bar: ID>NAME>CLASS[i]
search = driver.find_element(By.ID, 'address') #HTML id

# Search for address and hit enter
search.send_keys('405 huron rd')
search.send_keys(Keys.RETURN)

# Scroll to make sure the buttons are in view, then find the dropdown button and click it.
# https://stackoverflow.com/questions/41744368/scrolling-to-element-using-webdriver
show_results = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, "showResultsLink")))
driver.execute_script("arguments[0].scrollIntoView(true);", show_results)
time.sleep(0.5)
show_results.click()

# Click on the first address and open the hyperlink of the application.
# click() returns None, so there is nothing useful to assign here.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "195314"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CLASS_NAME, "detailLink"))).click()

# Store the address. Access the panel and return the enclosed HTML information.
address = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, 'main-Property')))
address_soup = address.get_attribute('innerHTML')
print(address_soup)

description = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, 'detail0')))
driver.execute_script("arguments[0].scrollIntoView(true);", description)
time.sleep(0.5)
description_soup = description.get_attribute('innerHTML')
print(description_soup)

# Close the window so we can continue looping through the results.
# only clickable if the XPATH is called.
closer = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="printThis"]/div/div[1]/button'))) 
driver.execute_script("arguments[0].scrollIntoView(true);", closer)
time.sleep(0.5)
driver.execute_script("arguments[0].click();", closer)


time.sleep(3)
# Quit the driver
driver.quit()
print('[INFO] done')




<div class="row">
	<div class="col-xs-12"><strong>6 ADMIRAL RD </strong></div>
</div>
<div class="row">
	<div class="col-md-6 col-xs-9">Ward 11: University-Rosedale</div>
	<div class="col-md-6 col-xs-3 text-right hidden-print" id="btnPrint" style="padding-right:45px;"><span class="glyphicon glyphicon-print" title="Print this application's details" style="font-size:18px;"></span></div>
</div>


	<div class="row">
		<div class="col-sm-4 col-xs-6" style="text-align:right;"><strong>Application Number:</strong></div>
        <div class="col-sm-8 col-xs-6">21 159268 STE 11 MV</div>
	</div>
	<div class="row">
		<div class="col-sm-4 col-xs-6" style="text-align:right;"><strong>Application Type:</strong></div>
        <div class="col-sm-8 col-xs-6">Minor Variance</div>
	</div>
	<div class="row">
		<div class="col-sm-4 col-xs-6" style="text-align:right;"><strong>Date Submitted:</strong></div>
        <div class="col-sm-8 col-xs-6">22/05/2021</div>
	</div>
	<div class="row">
		<div class="col-sm-

That works. Now we can put the above into a loop that extracts the address and description information for every property returned for a given address. <br>
We will also use BeautifulSoup to parse the HTML and return just the human-readable data. This can be dumped into a .csv or similar. <br>
<br>
There are 10 addresses on a page. If there are fewer than 10 addresses, then we are on the last page. This gives the program a stopping condition. If the total number of results is a multiple of 10, the last page is also full, so we need a second stopping condition: the next button doesn't give us new information.<br>
Another thing we probably want to do at the beginning of the webscrape is double the search radius in the "more filters" dropdown. <br>
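
The two stopping conditions described above can be sketched without a browser. In this hypothetical example each page is just a list of result identifiers, `PAGE_SIZE` is assumed to be 10 as stated above, and `fetch_page` stands in for clicking the next button and reading the table. Asking for a page past the end returns the last page again, which mimics a next button that yields no new information.

```python
PAGE_SIZE = 10  # assumed page length, as described in the text


def scrape_all(fetch_page):
    """Collect results page by page until a stopping condition is met."""
    collected = []
    previous = None
    page_number = 0
    while True:
        page = fetch_page(page_number)
        # Second condition: the "next" click produced nothing new.
        if page == previous:
            break
        collected.extend(page)
        # First condition: a short page must be the last page.
        if len(page) < PAGE_SIZE:
            break
        previous = page
        page_number += 1
    return collected


def make_fetcher(total):
    """Build a fake fetch_page over `total` results (illustration only)."""
    items = list(range(total))
    pages = [items[i:i + PAGE_SIZE] for i in range(0, total, PAGE_SIZE)] or [[]]

    def fetch_page(n):
        # Past the end, keep returning the last page.
        return pages[min(n, len(pages) - 1)]

    return fetch_page


print(len(scrape_all(make_fetcher(23))))  # three pages, last one short
print(len(scrape_all(make_fetcher(20))))  # two full pages, then a repeat
```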

### Webscraping Object for CoA
The above is implemented below with OOP for flexible use.

In [2]:
class CoAWebscraper:
    """
    A webscraper that scrapes data from Toronto's
    Committee of Adjustments web portal.
    """
    
    # Initialize with the webdriver and link.
    def __init__(self, path, webaddress):
        """
        instantiate the path to the webdriver
        and the address of the portal
        """
        # Vars def by user
        self.path = path
        self.web_address = webaddress
        
        # Vars for storing webscraped data.
        self.addresses = []
        self.descriptions = []
    
    
    def open_webdriver(self):
        """
        Opens the webdriver to the correct web_address.
        """
        # Open the chrome driver. Selenium 4 expects the driver path wrapped in a Service.
        from selenium.webdriver.chrome.service import Service
        self.driver = webdriver.Chrome(service=Service(self.path))

        # Open the CoA website
        self.driver.get(self.web_address)
        

    def close_webdriver(self, delay):
        """
        Closes the webdriver after a given time delay.
        """
        time.sleep(delay)
        self.driver.quit()
        print('[Info] driver closed :)')
        
        
    def next_page(self):
        """
        Proceeds to the next page of the CoA results table (DataTables_Table_0).
        """
        # Find the next button by ID, scroll to it, click
        next_button = WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((By.ID, "DataTables_Table_0_next")))
        self.driver.execute_script("arguments[0].scrollIntoView(true);", next_button)
        next_button.click()
        

    def search_address(self, address):
        """
        Searches for the given Toronto address in the webdriver. Opens the search results table.
        """
        # First maximize the search radius under the more_filters dropdown.
        WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable((By.ID, "mapSearchBtn1"))).click()
        search_radius = WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((By.ID, "radius")))
        self.driver.execute_script("arguments[0].scrollIntoView(true);", search_radius)
        time.sleep(0.5)
        # find_element_by_id is deprecated (and removed in newer Selenium 4 releases),
        # so reuse the element that was already located above.
        select = Select(search_radius)
        select.select_by_value('1000')
        
        # Get Search Bar: ID>NAME>CLASS[i]
        search = WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((By.ID, "address")))
        self.driver.execute_script("arguments[0].scrollIntoView(true);", search)
        time.sleep(0.5)

        # Search for address and hit enter
        search.send_keys(address)
        search.send_keys(Keys.RETURN)
        
        # Scroll to make sure the buttons are in view. Find the dropdown button. Click on it https://stackoverflow.com/questions/41744368/scrolling-to-element-using-webdriver
        show_results = WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((By.ID, "showResultsLink")))
        self.driver.execute_script("arguments[0].scrollIntoView(true);", show_results)
        #time.sleep(0.8)
        show_results = WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable((By.ID, "showResultsLink")))
        show_results.click()


    def loop_first_instance(self):
        """
        Performs one instance of the scrape loop. Used for testing/tweaking scraper.
        """
        # Scroll to make sure the buttons are in view. Find the dropdown button. Click on it.
        show_results = WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((By.ID, "showResultsLink")))
        self.driver.execute_script("arguments[0].scrollIntoView(true);", show_results)
        time.sleep(0.5)
        show_results.click()

        # Click on the first address and open the hyperlink of the application.
        # click() returns None, so there is nothing useful to assign here.
        WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable((By.ID, "195314"))).click()
        WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable((By.CLASS_NAME, "detailLink"))).click()

        # Store the address. Access the panel and return the enclosed HTML information.
        address = WebDriverWait(self.driver, 20).until(EC.visibility_of_element_located((By.ID, 'main-Property')))
        address_soup = address.get_attribute('innerHTML')
        print(address_soup)

        description = WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((By.ID, 'detail0')))
        self.driver.execute_script("arguments[0].scrollIntoView(true);", description)
        time.sleep(0.5)
        description_soup = description.get_attribute('innerHTML')
        print(description_soup)

        # Close the window so we can continue looping through the results.
        # only clickable if the XPATH is called.
        closer = WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="printThis"]/div/div[1]/button'))) 
        self.driver.execute_script("arguments[0].scrollIntoView(true);", closer)
        self.driver.execute_script("arguments[0].click();", closer)
    
        
    def open_one_page(self):
        """
        Loops through the first page of the results table, opening all development applications.
        This function is cleaner because it uses find_elements, but raises exceptions after the
        previous page has been looped.
        """
        # XPATH seems to not work, try using even/odd class names instead 
        # Store the elements then loop through child instances.
        # Simpler and more stable than a 2n + 1 loop.
        results = self.driver.find_elements(By.CLASS_NAME, "propertyAddr")
        
        # Named `result` rather than `re` to avoid shadowing the imported re module.
        for result in results:
            self.driver.execute_script("arguments[0].scrollIntoView(true);", result)
            result.click()

            
    def count_results_on_page(self):
        """
        Counts the number of applications on a given page.
        Used for stop condition for looping an entire data table.
        """
        c = self.driver.find_elements(By.CLASS_NAME, "propertyAddr")
        count = len(c)
        
        return count

        
    def return_one_page_soup(self, long):        
        """
        Loops through applications on a given page, returning information in a nice text format.
        open_one_page() MUST be run BEFORE this function for a given page to open up the applications.
        long = Boolean, whether to print longform stored data at the end.
        """
        
        # Create a list for the address and description text
        a_list = []
        d_list = []
        
        # Loop through the tr tags in the data table dynamically.
        # We can try using a better XPATH, or use the "detail link" CLASS_NAME and loop through them on the page.
        apps = self.driver.find_elements(By.CLASS_NAME, "detailLink")
        print("[Info] storing addresses and descriptions.")
        
        for app in apps:
            self.driver.execute_script("arguments[0].scrollIntoView(true);", app)
            app.click()
            
            # Store the address. Access the panel and return the enclosed HTML information.
            address_txt = WebDriverWait(self.driver, 20).until(EC.visibility_of_element_located((By.ID, 'main-Property')))
            address_soup = address_txt.get_attribute('innerHTML')
            
            # Use Beautiful soup to return in a nice format
            # How I will output the values.
            address_soup = BeautifulSoup(address_soup, 'html.parser')
            a_text = str(address_soup.get_text())
            a_text_nice = [x.strip().lower() for x in a_text.split('\n')]
            a_text_nice = [x for x in a_text_nice if x]
            
            print("Address instance: {}".format(a_text_nice[0]))
            a_list.append(a_text_nice)
            
            description_txt = WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((By.ID, 'detail0')))
            self.driver.execute_script("arguments[0].scrollIntoView(true);", description_txt)
            time.sleep(0.5)
            description_soup = description_txt.get_attribute('innerHTML')
            description_soup = BeautifulSoup(description_soup, 'html.parser')
            d_text = str(description_soup.get_text())
            d_text_nice = [x.strip().lower() for x in d_text.split('\n')]
            d_text_nice = [x for x in d_text_nice if x]
            
            # Here we get the application link as the final part of the description list.
            # Using the method get_application_link() defined below.
            a_link = self.get_application_link()
            
            # For description info, we want to store them in a list of dictionaries.
            # to retain the format and compare across instances.
            # Try except this. If we get an IndexError (missing description or other),
            # Then we can fill that row with Nulls and continue.
            try:
                d_dict = {"application number": d_text_nice[1],
                          "application type": d_text_nice[3],
                          "date submitted": d_text_nice[5],
                          "status": d_text_nice[7],
                          "description": d_text_nice[9],
                          "link": a_link
                         }
            except (IndexError):
                print('[Info] missing value, filling w/ null')
                d_dict = {"application number": '',
                          "application type": '',
                          "date submitted": '',
                          "status": '',
                          "description": '',
                          "link": a_link
                         }
            
            #print(d_text_nice)
            d_list.append(d_dict)
            
            # Close the window so we can continue looping through the results.
            # only clickable if the XPATH is called.
            closer = WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="printThis"]/div/div[1]/button'))) 
            self.driver.execute_script("arguments[0].scrollIntoView(true);", closer)
            time.sleep(0.5)
            self.driver.execute_script("arguments[0].click();", closer)

        # Print total aggregated data.
        if long:
            print("address headers:")
            print(a_list)
            print("description dict list:")
            print(d_list)
        
        print("[Info] page scraping complete.")
        return a_list, d_list
        
        
    def get_application_link(self):
        """
        Function gets the application link text when in the application details pop up.
        During webscraping this needs to be embedded into the return_one_page_soup() function.
        """
        # Scroll to the accordion dropdown, open it.
        accordion = WebDriverWait(self.driver, 20).until(EC.presence_of_element_located((By.XPATH,'//*[@id="headingAppDtlUrl"]/h2/a')))
        self.driver.execute_script("arguments[0].scrollIntoView(true);", accordion)
        time.sleep(0.1)
        self.driver.execute_script("arguments[0].click();", accordion)
        
        # Copy the app link to the clipboard, return to a variable using built-in python module.
        # https://stackoverflow.com/questions/64720945/python-selenium-code-to-save-text-in-a-variable-from-clipboard-which-is-copied-t
        app_clipboard = WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="collapseAppDtlUrl"]/div/div/div/div/div/span/button')))
        self.driver.execute_script("arguments[0].click();", app_clipboard)
        time.sleep(0.1)
        app_clipboard.send_keys(Keys.CONTROL, "c")
        
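        # Return the link using tkinter. Hide and destroy the temporary Tk root
        # so that a stray window is not left open after every call.
        root = Tk()
        root.withdraw()
        link = root.clipboard_get()
        root.destroy()
        return link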
    
    
        

##### --- The Testing Zone ---
Testing the functions created above in single instances

In [48]:
# Initialize, open
webby = CoAWebscraper("selenium webdriver/chromedriver.exe", 'https://secure.toronto.ca/AIC/index.do')
webby.open_webdriver()



In [49]:
# Search the address
webby.search_address('405 huron rd')



In [50]:
# --- SCRAPING THE DATA AND STORING IT ---
# Comment and uncomment as required

# Here is just one instance of the webscraping loop. Just prints the soup.
#webby.loop_first_instance()

# Here is an instance of looping through the entire page and storing the results.
webby.open_one_page()

In [48]:
#webby.count_results_on_page()

In [51]:
ta, td = webby.return_one_page_soup(long=True)

[Info] storing addresses and descriptions.
Address instance: 35 admiral rd
Address instance: 35 admiral rd
Address instance: 6 admiral rd
Address instance: 26 albany ave
Address instance: 104 avenue rd
Address instance: 110 avenue rd
Address instance: 110 avenue rd
Address instance: 110 avenue rd
Address instance: 110 avenue rd
Address instance: 121 avenue rd
Address instance: 148 avenue rd
address headers:
[['35 admiral rd', 'ward 11: university-rosedale'], ['35 admiral rd', 'ward 11: university-rosedale'], ['6 admiral rd', 'ward 11: university-rosedale'], ['26 albany ave', 'ward 11: university-rosedale'], ['104 avenue rd', 'ward 11: university-rosedale'], ['110 avenue rd', 'ward 11: university-rosedale'], ['110 avenue rd', 'ward 11: university-rosedale'], ['110 avenue rd', 'ward 11: university-rosedale'], ['110 avenue rd', 'ward 11: university-rosedale'], ['121 avenue rd', 'ward 11: university-rosedale'], ['148 avenue rd', 'ward 11: university-rosedale']]
description dict list:
[{'ap

In [55]:
ta

[['35 admiral rd', 'ward 11: university-rosedale'],
 ['35 admiral rd', 'ward 11: university-rosedale'],
 ['6 admiral rd', 'ward 11: university-rosedale'],
 ['26 albany ave', 'ward 11: university-rosedale'],
 ['104 avenue rd', 'ward 11: university-rosedale'],
 ['110 avenue rd', 'ward 11: university-rosedale'],
 ['110 avenue rd', 'ward 11: university-rosedale'],
 ['110 avenue rd', 'ward 11: university-rosedale'],
 ['110 avenue rd', 'ward 11: university-rosedale'],
 ['121 avenue rd', 'ward 11: university-rosedale'],
 ['148 avenue rd', 'ward 11: university-rosedale']]

In [8]:
webby.next_page()

In [38]:
webby.get_application_link()

http://app.toronto.ca/AIC/index.do?folderRsn=X9iL64%2Bim7SgzCVUcfOe6A%3D%3D


In [9]:
#close the webdriver
webby.close_webdriver(0.1)

[Info] driver closed :)


### Further Steps
Try chaining object methods together to do whole loops of the CoA results table for a searched address.

In [5]:
def full_scrap(webscraper, address):
    """
    Tests a full loop of the CoA results table for an address.
    Defines a stopping condition for looping when the final page is reached.
    """
    # Open the driver
    webscraper.open_webdriver()
    time.sleep(0.1)
    
    # Search an address
    webscraper.search_address(address)
    time.sleep(0.1)
    
    i = 0
    res_count = 0
    # Loop through pages
    while True:
        i= i+1 
        try:
            # init the stop condition: count elements on a page.
            res = webscraper.count_results_on_page()
            res_count = res_count + res
            
            # Loop through the pages, skipping pages that hang.
            webscraper.open_one_page()
            time.sleep(0.5)
            
            # Try printing the text data for each page, suppress long print.
            ta, td = webscraper.return_one_page_soup(long=False)
            
            print("some results from this page:")
            print(ta)
            
            # Stop check before proceeding to next page.
            print('Page: {}, Results: {} '.format(i, res))
            if res < 10:
                print('[Stop Condition] reached final page.')
                break
                
            webscraper.next_page()
            time.sleep(0.5)
                      
        except Exception:
            print('[Error] page {} hanging, skipped.'.format(i))
            webscraper.next_page()
            break
 
    print('[Info] Total Results: {} '.format(res_count))
    print('[Info] complete.')
    webscraper.close_webdriver(0.1)
    

In [None]:
pwrful_webby = CoAWebscraper("selenium webdriver/chromedriver.exe", 'https://secure.toronto.ca/AIC/index.do')
full_scrap(pwrful_webby, '405 huron rd')



[Info] storing addresses and descriptions.
Address instance: 35 admiral rd
Address instance: 35 admiral rd
Address instance: 6 admiral rd
Address instance: 26 albany ave
Address instance: 104 avenue rd
Address instance: 110 avenue rd
Address instance: 110 avenue rd
Address instance: 110 avenue rd
Address instance: 110 avenue rd
Address instance: 121 avenue rd
Address instance: 148 avenue rd
[Info] page scraping complete.
some results from this page:
[['35 admiral rd', 'ward 11: university-rosedale'], ['35 admiral rd', 'ward 11: university-rosedale'], ['6 admiral rd', 'ward 11: university-rosedale'], ['26 albany ave', 'ward 11: university-rosedale'], ['104 avenue rd', 'ward 11: university-rosedale'], ['110 avenue rd', 'ward 11: university-rosedale'], ['110 avenue rd', 'ward 11: university-rosedale'], ['110 avenue rd', 'ward 11: university-rosedale'], ['110 avenue rd', 'ward 11: university-rosedale'], ['121 avenue rd', 'ward 11: university-rosedale'], ['148 avenue rd', 'ward 11: universi

Now put it all together. Scrape and store the data in a .csv file, removing/marking exact duplicates. <br>
This is done via pandas, which has more flexibility than Python's basic .csv writing.
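
A minimal sketch of that storage step, assuming pandas is available: two hypothetical pages of scraped rows (the values are placeholders, not real scrape results) are concatenated, exact duplicates are dropped, and the index is reset. The `marked` table shows the alternative of flagging repeats instead of removing them.

```python
import pandas as pd

# Placeholder rows standing in for two scraped pages (made-up values).
page_1 = pd.DataFrame([
    {"address": "1 example st", "application number": "A-001"},
    {"address": "2 example st", "application number": "A-002"},
])
page_2 = pd.DataFrame([
    {"address": "2 example st", "application number": "A-002"},  # exact duplicate
    {"address": "3 example st", "application number": "A-003"},
])

combined = pd.concat([page_1, page_2], axis=0)

# drop_duplicates() keeps the first occurrence of each identical row.
unique_rows = combined.drop_duplicates().reset_index(drop=True)

# duplicated() can be used instead to mark repeats rather than remove them.
marked = combined.reset_index(drop=True)
marked["is_duplicate"] = marked.duplicated()

print(unique_rows)
print(marked)
```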

In [6]:
def full_scrap_store(webscraper, address):
    """
    Tests a full loop of the CoA results table for an address.
    Defines a stopping condition for looping when the final page is reached.
    input -> webscraper instance, an address to search.
    output -> pandas df of unique instances.
    
    Close the webdriver at any time to halt the function, and
    you will get whatever has been stored up to that point.
    """
    # Open the driver
    webscraper.open_webdriver()
    time.sleep(0.1)
    
    # Search an address
    try:
        webscraper.search_address(address)
        time.sleep(0.1)
    except Exception:
        print('[Error] search failed. Something went wrong, try running the function again.')
    
    # Init counters and the dataframe
    i = 0
    res_count = 0
    cols = ['address', 'ward', 'application number', 'application type', 'date submitted', 'status', 'description', 'link']
    output_df = pd.DataFrame(columns = cols)
    
    # Loop through pages
    while True:
        i= i+1
        try:
            # init the stop condition: count elements on a page.
            res = webscraper.count_results_on_page()
            res_count = res_count + res

            # Loop through the pages, skipping pages that hang.
            webscraper.open_one_page()
            time.sleep(0.5)

            # Extract text data for each page, suppress long print.
            ta, td = webscraper.return_one_page_soup(long=False)

            # Append the data to the df, drop exact duplicate rows,
            # reset the index to default increasing integers.
            td_df = pd.DataFrame(td)
            ta_df = pd.DataFrame(ta, columns = ['address','ward'])

            page_data = pd.concat([ta_df, td_df], axis=1)
            output_df = pd.concat([output_df, page_data], axis=0)
            output_df = output_df.drop_duplicates()
            output_df = output_df.reset_index(drop=True)

            # Stop check before proceeding to next page.
            print('Page: {}, Results: {} '.format(i, res))
            if res <= 10:
                print('[Stop Condition] reached final page.')
                break

            webscraper.next_page()
            time.sleep(0.5)
        
        except Exception as e:
            print('[Error] something went wrong. Exception: ')
            print(e)
            break

 
    print('[Info] complete. Summary:')
    print('[Info] Total Addresses looped: {} '.format(res_count))
    print('[Info] Total Unique Results: {}'.format(output_df.shape[0]))
    
    webscraper.close_webdriver(0.1)
    
    return output_df
 

In [4]:
pwrful_webby = CoAWebscraper("selenium webdriver/chromedriver.exe", 'https://secure.toronto.ca/AIC/index.do')
outputs = full_scrap_store(pwrful_webby, '756 dovercourt rd')#'405 huron rd')

#GETS TO PG 14, GET EXCEPTION "LIST INDEX OUT OF RANGE". hmmmm
#FIX THIS BUT PRETTY GOOD -> error is happening when an application has no description!
#ADD TRY EXCEPT ONTO THE ABOVE FUNCTION SO THAT WE CAN EXIT ANYTIME AND STILL GET THE OUTPUT_DF



[Info] storing addresses and descriptions.
Address instance: 990 bloor st w
Address instance: 14 bartlett ave
Address instance: 52 bartlett ave
Address instance: 52 bartlett ave
Address instance: 65 bartlett ave
Address instance: 270 barton ave
Address instance: 90 croatia st
Address instance: 737 bloor st w
Address instance: 834 bloor st w
Address instance: 834 bloor st w
Address instance: 990 bloor st w
Address instance: 990 bloor st w
[Info] page scraping complete.
Page: 1, Results: 11 
[Info] storing addresses and descriptions.
Address instance: 990 bloor st w
Address instance: 990 bloor st w
Address instance: 990 bloor st w
Address instance: 451 1/2 brock ave
Address instance: 744 brock ave
Address instance: 29 carling ave
Address instance: 1049 college st
Address instance: 825 college st
Address instance: 877 college st
Address instance: 877 college st
Address instance: 877 college st
[Info] page scraping complete.
Page: 2, Results: 11 
[Info] storing addresses and descriptions.


In [5]:
# Export the webscraped data.
def scrapped_to_csv(path, output):
    """
    Small function to store scraped data to csv.
    input -> path, the output table for the run.
    checks if path exists then appends.
    """
    print('[Info] printing to {}'.format(path))
    output.to_csv(path, mode='a', header=not os.path.exists(path))
    
    
path = "C:/Users/Keagan Rankin/OneDrive - University of Toronto/Saxe - Rankin/Missing Middle/Missing Middle Code & Results/mm webscraping/756_dovercourt_rd.csv"
#scrapped_to_csv(path, outputs)
outputs 

[Info] printing to C:/Users/Keagan Rankin/OneDrive - University of Toronto/Saxe - Rankin/Missing Middle/Missing Middle Code & Results/mm webscraping/756_dovercourt_rd.csv


Unnamed: 0,address,ward,application number,application type,date submitted,status,description,link
0,990 bloor st w,ward 9: davenport,17 207556 ste 18 oz,rezoning,28/07/2017,omb appeal,zoning by-law amendment for a 11-storey plus m...,http://app.toronto.ca/AIC/index.do?folderRsn=r...
1,14 bartlett ave,ward 9: davenport,22 114538 ste 09 mv,minor variance,15/02/2022,closed,to alter the existing two-storey semi-detached...,http://app.toronto.ca/AIC/index.do?folderRsn=k...
2,52 bartlett ave,ward 9: davenport,13 241061 ste 18 oz,rezoning,24/09/2013,closed,an application has been filed with the city of...,http://app.toronto.ca/AIC/index.do?folderRsn=j...
3,65 bartlett ave,ward 9: davenport,22 103728 ste 09 mv,minor variance,13/01/2022,closed,to alter the existing 2½-storey detached dwell...,http://app.toronto.ca/AIC/index.do?folderRsn=k...
4,270 barton ave,ward 11: university-rosedale,18 250373 ste 19 sa,site plan approval,31/10/2018,noac issued,proposal to construct a new three-storey eleme...,http://app.toronto.ca/AIC/index.do?folderRsn=s...
...,...,...,...,...,...,...,...,...
81,804 shaw st,ward 11: university-rosedale,22 109048 ste 11 mv,minor variance,31/01/2022,accepted,alter the existing semi-detached dwelling by c...,http://app.toronto.ca/AIC/index.do?folderRsn=p...
82,998 shaw st,ward 11: university-rosedale,22 129407 ste 11 mv,minor variance,01/04/2022,tentatively scheduled,to alter the existing two-storey detached dwel...,http://app.toronto.ca/AIC/index.do?folderRsn=b...
83,57 sylvan ave,ward 9: davenport,21 249724 ste 09 mv,minor variance,17/12/2021,closed,to convert the existing two-storey building fr...,http://app.toronto.ca/AIC/index.do?folderRsn=T...
84,19 wallace ave,ward 9: davenport,22 108537 ste 09 mv,minor variance,28/01/2022,hearing scheduled,to alter the existing two-storey detached dwel...,http://app.toronto.ca/AIC/index.do?folderRsn=e...
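
The `header=not os.path.exists(path)` argument in `scrapped_to_csv` writes the column names only when the file is first created, so repeated runs can append to one file. The same pattern is sketched below with the standard `csv` module and a temporary file; `append_rows` and the sample rows are hypothetical.

```python
import csv
import os
import tempfile


def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV, writing the header only for a new file."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


fields = ["address", "status"]
with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "example.csv")
    # Two separate runs appending to the same file.
    append_rows(csv_path, fields, [{"address": "1 example st", "status": "closed"}])
    append_rows(csv_path, fields, [{"address": "2 example st", "status": "accepted"}])
    with open(csv_path, newline="") as handle:
        lines = handle.read().splitlines()

print(lines)
```

Note that appending does not remove rows that were already written by an earlier run, so duplicates across runs would still need to be dropped when the file is read back.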


##### --- The Junk Zone ---
Random ideas and method development while developing the webscraper

In [80]:
foo = """<div class="row">
	<div class="col-xs-12"><strong>35 ADMIRAL RD </strong></div>
</div>
<div class="row">
	<div class="col-md-6 col-xs-9">Ward 11: University-Rosedale</div>
	<div class="col-md-6 col-xs-3 text-right hidden-print" id="btnPrint" style="padding-right:45px;"><span class="glyphicon glyphicon-print" title="Print this application's details" style="font-size:18px;"></span></div>
</div>


	<div class="row">
		<div class="col-sm-4 col-xs-6" style="text-align:right;"><strong>Application Number:</strong></div>
        <div class="col-sm-8 col-xs-6">21 157248 STE 11 MV</div>
	</div>
	<div class="row">
		<div class="col-sm-4 col-xs-6" style="text-align:right;"><strong>Application Type:</strong></div>
        <div class="col-sm-8 col-xs-6">Minor Variance</div>
	</div>
	<div class="row">
		<div class="col-sm-4 col-xs-6" style="text-align:right;"><strong>Date Submitted:</strong></div>
        <div class="col-sm-8 col-xs-6">19/05/2021</div>
	</div>
	<div class="row">
		<div class="col-sm-4 col-xs-6" style="text-align:right;"><strong>Status:</strong></div>
        <div class="col-sm-8 col-xs-6">TLAB Appeal</div>
	</div>
	<div class="row">
		<div class="col-sm-4 col-xs-6" style="text-align:right;"><strong>Description:</strong></div>
        
        <div class="col-sm-8 col-xs-6"></div>
	</div>"""

"""<div class="col-sm-8 col-xs-6">To alter the existing three-storey dwelling by constructing: a front porch, and to permit a front yard parking space and a driveway. Also, to alter the interior of the third floor.</div>
	</div>"""

# How I will output the values.
soup = BeautifulSoup(foo, 'html.parser')
text = str(soup.get_text())
text_nice = [x.strip() for x in text.split('\n')]
text_nice = [x for x in text_nice if x]
# if len(   (unfinished length check, commented out so the cell runs)

text_nice

['35 ADMIRAL RD',
 'Ward 11: University-Rosedale',
 'Application Number:',
 '21 157248 STE 11 MV',
 'Application Type:',
 'Minor Variance',
 'Date Submitted:',
 '19/05/2021',
 'Status:',
 'TLAB Appeal',
 'Description:']
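
The output above shows why positional indexing is fragile: when an application has no description, the 'Description:' label has no value after it and the list is one item short. One possible alternative, sketched here under the assumption that every label ends with a colon and no value does, is to pair each label with the line that follows it. `pair_labels` is a hypothetical helper, not part of the scraper.

```python
def pair_labels(lines):
    """Pair 'Label:' lines with the following value, or '' if it is missing."""
    fields = {}
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.endswith(":"):
            nxt = lines[i + 1] if i + 1 < len(lines) else ""
            if nxt and not nxt.endswith(":"):
                # A value follows the label: store it and skip both lines.
                fields[line[:-1].lower()] = nxt
                i += 2
                continue
            # No value follows: keep the key with an empty string.
            fields[line[:-1].lower()] = ""
        i += 1
    return fields


# The description lines from the output above (address and ward removed).
text_nice = [
    "Application Number:", "21 157248 STE 11 MV",
    "Application Type:", "Minor Variance",
    "Date Submitted:", "19/05/2021",
    "Status:", "TLAB Appeal",
    "Description:",
]

record = pair_labels(text_nice)
print(record)
```

With this approach a missing description yields an empty string for that field only, instead of blanking the whole row.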

In [48]:
cols = ['address', 'ward', 'application number', 'application type', 'date submitted', 'status', 'description']
output = pd.DataFrame(columns = cols)

In [60]:
td_df = pd.DataFrame(td)
ta_df = pd.DataFrame(ta, columns = ['address','ward'])

foo1 = pd.concat([ta_df, td_df], axis=1)

output = pd.concat([output, foo1], axis=0)

output = output.drop_duplicates()

#output.shape[0]
#foo1
output

Unnamed: 0,address,ward,application number,application type,date submitted,status,description
0,35 admiral rd,ward 11: university-rosedale,21 157248 ste 11 mv,minor variance,19/05/2021,tlab appeal,to alter the existing three-storey dwelling by...
1,35 admiral rd,ward 11: university-rosedale,22 117902 s45 11 tlab,toronto local appeal body,28/02/2022,hearing scheduled,to alter the existing three-storey dwelling by...
2,6 admiral rd,ward 11: university-rosedale,21 159268 ste 11 mv,minor variance,22/05/2021,postponed,to alter the existing two-storey semi-detached...
3,26 albany ave,ward 11: university-rosedale,22 127776 ste 11 mv,minor variance,28/03/2022,tentatively scheduled,proposed construction of 2-car garage in rear ...
4,104 avenue rd,ward 11: university-rosedale,21 152076 ste 11 mv,minor variance,08/05/2021,accepted,to alter the existing three-storey semi-detach...
5,110 avenue rd,ward 11: university-rosedale,21 207592 ste 11 oz,rezoning,02/09/2021,under review,zoning by-law amendment to facilitate the deve...
9,121 avenue rd,ward 11: university-rosedale,18 149949 ste 27 sa,site plan approval,27/04/2018,under review,site plan control application for an 8-storey ...
10,148 avenue rd,ward 11: university-rosedale,21 178720 ste 11 oz,rezoning,02/07/2021,under review,zoning by-law amendment application to facilit...
