# Scraping Huizenzoeker.nl to Analyse the Dutch Housing Market

### Introduction
Which places in the Netherlands are hit hardest by the Dutch Housing crisis, and which the least?
Momentarily, the housing crisis is one of the most prominent societal challenges in the Netherlands. This script scrapes information of the Dutch housing market, enabling use to analyse the housing market and clearify which areas are hit hardest by the housing crisis. This script provides information such as gem. vraagprijs, # verkochte woningen, gem. vierkante meter prijs, and % overboden. The dataframe that is generated through this script offers very interesting information, for example for first-time buyers that are having a hard time purchasing their first home on the current stressed Dutch housing market.

The script is divided into seven steps:
* **Step 1. Loading all the basics**: this step loads all the relevent packages and sets up the BeautifulSoup basis.
* **Step 2. Collecting the municipality URLs**: this step collects the urls of the municipalities in the Netherlands. For this step, we first need to create a list of the province URLs (twelve in total; for each province in the Netherlands). From these twelve province URLs, we are able to scrape the municipality URLs, since each province URL offers the opportunity to navigate to their corresponding municipalities.
* **Step 3. Scraping data from URLs (municipality-level)**: this step scrapes the data from the municipality URLs that we have generated in step 2.
* **Step 4: Exporting dataframe as CSV file**: this step exports the final dataframe as CSV file.
* **Step 5: Providing summary statistics**: this step provides summary statistics for the final dataframe that we have generated in R, by loading the CSV file that we exported in step 4 in R.
* **Step 6: Scraping data from woningmarkt dashboard (province-level)**: this step scrapes the data from each province. The same code that was used for scraping data on municipality-level is employed in this step.
* **Step 7: Exporting dashboard data as CSV file**: this step exports the provinces data as CSV file.

## Step 1: Loading all the basics

These packages are needed to run our scraper, so make sure you install/load these first!

In [37]:
from bs4 import BeautifulSoup #for BeautifulSoup
import requests
import re
import pandas as pd 
import time 
import json
from selenium import webdriver #for Selenium
from webdriver_manager.chrome import ChromeDriverManager

To use BeautifulSoup, we request to use the source code of the huizenzoeker woningmarkt page. 

In [42]:
url = 'https://www.huizenzoeker.nl/woningmarkt/'
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')

## Step 2: Collecting the municipality URLs

We first construct a base url and a province_url which once appended together creates the URL to each of the woningmarkt pages for each province. We generate the generate_links() function to append these parts of the URL together. 

In [43]:
base_url = 'https://www.huizenzoeker.nl/woningmarkt/'
province_url = ['noord-holland/', 'zuid-holland/', 'zeeland/', 'noord-brabant/', 'utrecht/', 'flevoland/', 
                'friesland/', 'groningen/', 'drenthe/', 'overijssel/', 'gelderland/', 'limburg/']

In [44]:
def generate_links(base_url,province_url): 
    page_links = []
    for i in province_url:
        full_links = base_url + i
        page_links.append(full_links)  
    return page_links

page_links = generate_links(base_url,province_url)

We then use this list of all provinces, to extract all municipalities from each, making use of window handling. 

In [45]:
driver = webdriver.Chrome(ChromeDriverManager().install())



Current google-chrome version is 94.0.4606
Get LATEST driver version for 94.0.4606
Driver [C:\Users\danie\.wdm\drivers\chromedriver\win32\94.0.4606.61\chromedriver.exe] found in cache


In [46]:
page_urls_full = []

for link in page_links:
    driver.switch_to.window(driver.window_handles[-1])
    driver.get(link)
    time.sleep(2)
    
    for elem in driver.find_elements_by_xpath("//li//div//a[@href]"):
        urls = elem.get_attribute('href')
        page_urls_full.append(urls)

In [16]:
#subset = page_urls_full[30:33] # use subset to try out on few urls (for time convenience)

In [48]:
page_urls_full

['https://www.huizenzoeker.nl/woningmarkt/noord-holland/aalsmeer/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/alkmaar/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/amstelveen/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/amsterdam/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/beemster/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/bergen-nh/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/beverwijk/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/blaricum/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/bloemendaal/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/castricum/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/den-helder/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/diemen/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/drechterland/',
 'https://www.huizenzoeker.nl/woningmarkt/noord-holland/edam-volendam/',
 'https://www.huizenzoeker.nl/w

## Step 3: Scrape data from each url (municipality-level)

For each municipality we extract:
* *Trend data*: gem. vraagprijs, verkochte woningen, gem.vierkantemeter prijs, % overboden (and how these numbers how changed t.o.v. vorige maand) 
* *Other information*: besteedbaar inkomen, aantal inwoners

#### Warning: Running the next cell for 'page_urls_full' will take aprox. 30 minutes. You might want to replace page_urls_full for 'subset'!

In [49]:
fn = 'saved_data.json'

def extract_city_trends(page_urls_full):
    trend_list = []
    for page_url in page_urls_full:
        driver.get(page_url)
        time.sleep(5) 
        soup = BeautifulSoup(driver.page_source, 'html.parser')
            #Province name
        province_name = soup.find_all('a')[6].get_text()
            # City name
        city_name = soup.find_all('h2')[0].get_text()
        city_name = city_name.replace('Woningmarkt','')
        city_name = city_name.replace(' ', '')
            # Gemiddelde vraagprijs
        content = soup.find_all(class_='trend-graph')[0]
        if content.find(class_="trend-graph-icon") == None:
            gem_vraagprijs = content.find("h3",{"class":"trend-graph-value"}).get_text()
            tov_vorige_maand_vraagprijs = "NA"
        else:
            if content.find(class_="trend-graph-pill trend-down") != None:
                gem_vraagprijs = content.find("h3",{"class":"trend-graph-value"}).get_text()
                gem_vraagprijs = gem_vraagprijs.replace("(","")
                gem_vraagprijs = gem_vraagprijs.replace(",)","")
                gem_vraagprijs = gem_vraagprijs.replace(".", ",")
                tov_vorige_maand_vraagprijs = content.find("div",{"class":"trend-graph-pill trend-down"}).get_text()
                tov_vorige_maand_vraagprijs = tov_vorige_maand_vraagprijs.replace("\n\n","")
                tov_vorige_maand_vraagprijs = tov_vorige_maand_vraagprijs.replace(" t.o.v. vorige maand\n","")
            else:
                gem_vraagprijs = content.find("h3",{"class":"trend-graph-value"}).get_text()
                gem_vraagprijs = gem_vraagprijs.replace("(","")
                gem_vraagprijs = gem_vraagprijs.replace(",)","")    
                gem_vraagprijs = gem_vraagprijs.replace(".", ",")
                tov_vorige_maand_vraagprijs = content.find("div",{"class":"trend-graph-pill"}).get_text()
                tov_vorige_maand_vraagprijs = tov_vorige_maand_vraagprijs.replace("\n\n","")
                tov_vorige_maand_vraagprijs = tov_vorige_maand_vraagprijs.replace(" t.o.v. vorige maand\n","")
            # Aantal verkochte woningen
        content = soup.find_all(class_='trend-graph')[1]
        if content.find(class_="trend-graph-icon") == None:
            verk_woningen = content.find("h3",{"class":"trend-graph-value"}).get_text()
            tov_vorige_maand_verkocht = "NA"
        else:
            if content.find(class_="trend-graph-pill trend-down") != None:
                verk_woningen = content.find("h3",{"class":"trend-graph-value"}).get_text()               
                tov_vorige_maand_verkocht = content.find("div",{"class":"trend-graph-pill trend-down"}).get_text()
                tov_vorige_maand_verkocht = tov_vorige_maand_verkocht.replace("\n\n","")
                tov_vorige_maand_verkocht = tov_vorige_maand_verkocht.replace(" t.o.v. vorige maand\n","")
            else:
                verk_woningen = content.find("h3",{"class":"trend-graph-value"}).get_text()             
                tov_vorige_maand_verkocht = content.find("div",{"class":"trend-graph-pill"}).get_text()
                tov_vorige_maand_verkocht = tov_vorige_maand_verkocht.replace("\n\n","")
                tov_vorige_maand_verkocht = tov_vorige_maand_verkocht.replace(" t.o.v. vorige maand\n","")
            # Gemiddelde vierkante meter prijs
        content = soup.find_all(class_='trend-graph')[2]
        if content.find(class_="trend-graph-icon") == None:
            m2_prijs = content.find("h3",{"class":"trend-graph-value"}).get_text()
            tov_vorige_maand_m2_prijs = "NA"
        else:
            if content.find(class_="trend-graph-pill trend-down") != None:
                m2_prijs = content.find("h3",{"class":"trend-graph-value"}).get_text()     
                m2_prijs = m2_prijs.replace(".", ",")
                tov_vorige_maand_m2_prijs = content.find("div",{"class":"trend-graph-pill trend-down"}).get_text()
                tov_vorige_maand_m2_prijs = tov_vorige_maand_m2_prijs.replace("\n\n","")
                tov_vorige_maand_m2_prijs = tov_vorige_maand_m2_prijs.replace(" t.o.v. vorige maand\n","")
            else:
                m2_prijs = content.find("h3",{"class":"trend-graph-value"}).get_text() 
                m2_prijs = m2_prijs.replace(".", ",")
                tov_vorige_maand_m2_prijs = content.find("div",{"class":"trend-graph-pill"}).get_text() 
                tov_vorige_maand_m2_prijs = tov_vorige_maand_m2_prijs.replace("\n\n","")
                tov_vorige_maand_m2_prijs = tov_vorige_maand_m2_prijs.replace(" t.o.v. vorige maand\n","")
            # Percentage overboden
        content = soup.find_all(class_='trend-graph')[3]
        if content.find(class_="trend-graph-icon") == None:
            perc_overboden = content.find("h3",{"class":"trend-graph-value"}).get_text()
            tov_vorige_maand_perc_overboden = "NA"
        else:
            if content.find(class_="trend-graph-pill trend-down") != None:
                perc_overboden = content.find("h3",{"class":"trend-graph-value"}).get_text()               
                tov_vorige_maand_perc_overboden = content.find("div",{"class":"trend-graph-pill trend-down"}).get_text()
                tov_vorige_maand_perc_overboden = tov_vorige_maand_perc_overboden.replace("\n\n","")
                tov_vorige_maand_perc_overboden = tov_vorige_maand_perc_overboden.replace(" t.o.v. vorige maand\n","")
            else:
                perc_overboden = content.find("h3",{"class":"trend-graph-value"}).get_text()             
                tov_vorige_maand_perc_overboden = content.find("div",{"class":"trend-graph-pill"}).get_text()
                tov_vorige_maand_perc_overboden = tov_vorige_maand_perc_overboden.replace("\n\n","")
                tov_vorige_maand_perc_overboden = tov_vorige_maand_perc_overboden.replace(" t.o.v. vorige maand\n","")
            # Besteedbaar inkomen
        bes_inkomen = soup.find_all(class_='detail__income huizenzoeker-card single-value-graph-container')[0].get_text()
        bes_inkomen = bes_inkomen.replace('\n','')
        bes_inkomen = bes_inkomen.replace('Besteedbaar Inkomen Per Huishouden','')
        bes_inkomen = bes_inkomen.replace(".", ",")
            # Inwoners
        content = soup.find("div", {"class": "buurt-info"})
        inwoners = content.find_all('p')[3].get_text
        inwoners = str(inwoners)
        inwoners = re.search('Dat zijn(.+?)inwoners', inwoners)
        if inwoners:
            found_inwoners = inwoners.group(1)
            found_inwoners = found_inwoners.strip()
            found_inwoners = found_inwoners.replace(".", ",")
        else:
            found_inwoners = 'NA'
            # Bevolkingsgroei
        content = soup.find("div", {"class": "buurt-info"})
        populatiegroei = content('p')[4].get_text
        populatiegroei = str(populatiegroei)
        populatiegroei_increase = re.search('afgelopen jaar met (.+?) gegroeid', populatiegroei)
        if populatiegroei_increase:
            found_populatiegroei = populatiegroei_increase.group(1)
            found_populatiegroei = found_populatiegroei.strip()
        else:
            found_populatiegroei = 'NA'
        populatiegroei_decline = re.search('afgelopen jaar met (.+?) gekrompen', populatiegroei)
        if populatiegroei_decline:
            found_populatiegroei_decline = populatiegroei_decline.group(1)
            found_populatiegroei_decline = found_populatiegroei_decline.strip() 
        else:
            found_populatiegroei_decline = 'NA'
            # Append list
        save_obj = {'Province':province_name, "City":city_name, 
                    "Gem. vraagprijs":gem_vraagprijs, "%Δ Vraagprijs (t.o.v vorige maand)": tov_vorige_maand_vraagprijs,
                    "Verkochte woningen":verk_woningen, "%Δ Verkochte woningen (t.o.v vorige maand)":tov_vorige_maand_verkocht,
                    "Gem. m2 prijs":m2_prijs, "%Δ M2 prijs (t.o.v vorige maand)":tov_vorige_maand_m2_prijs,
                    "% Vraagprijs overboden":perc_overboden, "%Δ Overboden (t.o.v vorige maand)":tov_vorige_maand_perc_overboden,
                    "Besteedbaar inkomen (per huishouden)":bes_inkomen,
                    "Aantal inwoners": found_inwoners,
                    "% Populatie stijging":found_populatiegroei, "% Populatie daling":found_populatiegroei_decline}
        trend_list.append(save_obj)
        f=open(fn, 'a', encoding='utf-8')
        f.write(json.dumps(save_obj)+'\n')
        f.close()
    return(trend_list)

In [50]:
df = extract_city_trends(page_urls_full) 
pd.DataFrame(df)

Unnamed: 0,Province,City,Gem. vraagprijs,%Δ Vraagprijs (t.o.v vorige maand),Verkochte woningen,%Δ Verkochte woningen (t.o.v vorige maand),Gem. m2 prijs,%Δ M2 prijs (t.o.v vorige maand),% Vraagprijs overboden,%Δ Overboden (t.o.v vorige maand),Besteedbaar inkomen (per huishouden),Aantal inwoners,% Populatie stijging,% Populatie daling
0,Noord-Holland,Aalsmeer,"€ 685,000",57.47%,12,-7.69%,"€ 4,476",9.22%,10.67%,3.42%,"€ 45,800",31859,0.41%,
1,Noord-Holland,Alkmaar,"€ 362,500",25.00%,38,-39.68%,"€ 3,926",10.62%,14.05%,1.43%,"€ 36,300",109436,0.81%,
2,Noord-Holland,Amstelveen,"€ 570,000",18.13%,21,-56.25%,"€ 4,724",1.88%,305.01%,296.30%,"€ 37,800",91675,0.92%,
3,Noord-Holland,Amsterdam,"€ 450,000",7.78%,230,-27.44%,"€ 6,961",5.90%,16.10%,0.37%,"€ 30,100",872757,1.13%,
4,Noord-Holland,Beemster,"€ 612,000",-12.26%,4,-33.33%,"€ 4,311",-6.89%,-0.23%,-12.10%,"€ 47,300",10022,2.81%,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
347,Limburg,ValkenburgaandeGeul,"€ 365,000",-5.19%,4,-55.56%,"€ 3,308",9.75%,12.60%,5.41%,"€ 35,600",16367,,0.63%
348,Limburg,Venlo,"€ 319,000",16.00%,32,-50.00%,"€ 2,727",17.80%,7.84%,-0.72%,"€ 33,700",101802,0.20%,
349,Limburg,Venray,"€ 297,000",8.99%,8,-33.33%,"€ 2,729",23.09%,8.98%,1.22%,"€ 39,100",43614,0.66%,
350,Limburg,Voerendaal,"€ 287,500",-11.13%,2,-75.00%,"€ 2,185",-12.53%,7.69%,-2.09%,"€ 40,800",12475,0.18%,


In [51]:
final_dataframe1=pd.DataFrame(df) #dataframe with all data for all municipalities in the Netherlands

In [60]:
#exporting the municipalities dataframe as CSV, to be able to use it in R-Studio. 
final_dataframe1.to_csv(r'C:\Users\danie\OneDrive\Documents\Repositories\oDCM-project-team-3\src\collection\Final runs code\huizenzoeker_scraper_data1.csv')

## Step 4: Scrape data from each url (province-level)

Here we use the function that we constructed (generate_links) before to construct the province-urls. This time, we are going to scrape the data from these urls themselves, instead of then navigating to each individual municipality. 

In [61]:
page_links = generate_links(base_url,province_url)

For each Province we again extract:
* *Trend data*: gem. vraagprijs, verkochte woningen, gem.vierkantemeter prijs, % overboden (and how these numbers how changed t.o.v. vorige maand) 
* *Other information*: besteedbaar inkomen, aantal inwoners

In [62]:
def extract_province_trends(page_links):
    trend_list = []
    for page_link in page_links:
        driver.get(page_link)
        time.sleep(1) 
        soup = BeautifulSoup(driver.page_source, 'html.parser')
            # Province name
        province_name = soup.find_all('h2')[0].get_text()
        province_name = province_name.replace('Woningmarkt','')
        province_name = province_name.replace(' ', '')
            # Gemiddelde vraagprijs
        content = soup.find_all(class_='trend-graph')[0]
        if content.find(class_="trend-graph-icon") == None:
            gem_vraagprijs = content.find("h3",{"class":"trend-graph-value"}).get_text()
            tov_vorige_maand_vraagprijs = "NA"
        else:
            if content.find(class_="trend-graph-pill trend-down") != None:
                gem_vraagprijs = content.find("h3",{"class":"trend-graph-value"}).get_text()
                gem_vraagprijs = gem_vraagprijs.replace("(","")
                gem_vraagprijs = gem_vraagprijs.replace(",)","")
                gem_vraagprijs = gem_vraagprijs.replace(".", ",")
                tov_vorige_maand_vraagprijs = content.find("div",{"class":"trend-graph-pill trend-down"}).get_text()
                tov_vorige_maand_vraagprijs = tov_vorige_maand_vraagprijs.replace("\n\n","")
                tov_vorige_maand_vraagprijs = tov_vorige_maand_vraagprijs.replace(" t.o.v. vorige maand\n","")
            else:
                gem_vraagprijs = content.find("h3",{"class":"trend-graph-value"}).get_text()
                gem_vraagprijs = gem_vraagprijs.replace("(","")
                gem_vraagprijs = gem_vraagprijs.replace(",)","")    
                gem_vraagprijs = gem_vraagprijs.replace(".", ",")
                tov_vorige_maand_vraagprijs = content.find("div",{"class":"trend-graph-pill"}).get_text()
                tov_vorige_maand_vraagprijs = tov_vorige_maand_vraagprijs.replace("\n\n","")
                tov_vorige_maand_vraagprijs = tov_vorige_maand_vraagprijs.replace(" t.o.v. vorige maand\n","")
            # Aantal verkochte woningen
        content = soup.find_all(class_='trend-graph')[1]
        if content.find(class_="trend-graph-icon") == None:
            verk_woningen = content.find("h3",{"class":"trend-graph-value"}).get_text()
            tov_vorige_maand_verkocht = "NA"
        else:
            if content.find(class_="trend-graph-pill trend-down") != None:
                verk_woningen = content.find("h3",{"class":"trend-graph-value"}).get_text()               
                tov_vorige_maand_verkocht = content.find("div",{"class":"trend-graph-pill trend-down"}).get_text()
                tov_vorige_maand_verkocht = tov_vorige_maand_verkocht.replace("\n\n","")
                tov_vorige_maand_verkocht = tov_vorige_maand_verkocht.replace(" t.o.v. vorige maand\n","")
            else:
                verk_woningen = content.find("h3",{"class":"trend-graph-value"}).get_text()             
                tov_vorige_maand_verkocht = content.find("div",{"class":"trend-graph-pill"}).get_text()
                tov_vorige_maand_verkocht = tov_vorige_maand_verkocht.replace("\n\n","")
                tov_vorige_maand_verkocht = tov_vorige_maand_verkocht.replace(" t.o.v. vorige maand\n","")
            # Gemiddelde vierkante meter prijs
        content = soup.find_all(class_='trend-graph')[2]
        if content.find(class_="trend-graph-icon") == None:
            m2_prijs = content.find("h3",{"class":"trend-graph-value"}).get_text()
            tov_vorige_maand_m2_prijs = "NA"
        else:
            if content.find(class_="trend-graph-pill trend-down") != None:
                m2_prijs = content.find("h3",{"class":"trend-graph-value"}).get_text()     
                m2_prijs = m2_prijs.replace(".", ",")
                tov_vorige_maand_m2_prijs = content.find("div",{"class":"trend-graph-pill trend-down"}).get_text()
                tov_vorige_maand_m2_prijs = tov_vorige_maand_m2_prijs.replace("\n\n","")
                tov_vorige_maand_m2_prijs = tov_vorige_maand_m2_prijs.replace(" t.o.v. vorige maand\n","")
            else:
                m2_prijs = content.find("h3",{"class":"trend-graph-value"}).get_text() 
                m2_prijs = m2_prijs.replace(".", ",")
                tov_vorige_maand_m2_prijs = content.find("div",{"class":"trend-graph-pill"}).get_text() 
                tov_vorige_maand_m2_prijs = tov_vorige_maand_m2_prijs.replace("\n\n","")
                tov_vorige_maand_m2_prijs = tov_vorige_maand_m2_prijs.replace(" t.o.v. vorige maand\n","")
            # Percentage overboden
        content = soup.find_all(class_='trend-graph')[3]
        if content.find(class_="trend-graph-icon") == None:
            perc_overboden = content.find("h3",{"class":"trend-graph-value"}).get_text()
            tov_vorige_maand_perc_overboden = "NA"
        else:
            if content.find(class_="trend-graph-pill trend-down") != None:
                perc_overboden = content.find("h3",{"class":"trend-graph-value"}).get_text()               
                tov_vorige_maand_perc_overboden = content.find("div",{"class":"trend-graph-pill trend-down"}).get_text()
                tov_vorige_maand_perc_overboden = tov_vorige_maand_perc_overboden.replace("\n\n","")
                tov_vorige_maand_perc_overboden = tov_vorige_maand_perc_overboden.replace(" t.o.v. vorige maand\n","")
            else:
                perc_overboden = content.find("h3",{"class":"trend-graph-value"}).get_text()             
                tov_vorige_maand_perc_overboden = content.find("div",{"class":"trend-graph-pill"}).get_text()
                tov_vorige_maand_perc_overboden = tov_vorige_maand_perc_overboden.replace("\n\n","")
                tov_vorige_maand_perc_overboden = tov_vorige_maand_perc_overboden.replace(" t.o.v. vorige maand\n","")
            # Besteedbaar inkomen
        bes_inkomen = soup.find_all(class_='detail__income huizenzoeker-card single-value-graph-container')[0].get_text()
        bes_inkomen = bes_inkomen.replace('\n','')
        bes_inkomen = bes_inkomen.replace('Besteedbaar Inkomen Per Huishouden','')
        bes_inkomen = bes_inkomen.replace(".", ",")
            # Inwoners
        content = soup.find("div", {"class": "buurt-info"})
        inwoners = content.find_all('p')[3].get_text
        inwoners = str(inwoners)
        inwoners = re.search('Dat zijn(.+?)inwoners', inwoners)
        found_inwoners = 'NA'
        if inwoners:
            found_inwoners = inwoners.group(1)
            found_inwoners = found_inwoners.strip()
            found_inwoners = found_inwoners.replace(".", ",")
            # Bevolkingsgroei
        content = soup.find("div", {"class": "buurt-info"})
        populatiegroei = content('p')[4].get_text
        populatiegroei = str(populatiegroei)
        populatiegroei_increase = re.search('afgelopen jaar met (.+?) gegroeid', populatiegroei)
        if populatiegroei_increase:
            found_populatiegroei = populatiegroei_increase.group(1)
            found_populatiegroei = found_populatiegroei.strip()
        else:
            found_populatiegroei = 'NA'
        populatiegroei_decline = re.search('afgelopen jaar met (.+?) gekrompen', populatiegroei)
        if populatiegroei_decline:
            found_populatiegroei_decline = populatiegroei_decline.group(1)
            found_populatiegroei_decline = found_populatiegroei_decline.strip() 
        else:
            found_populatiegroei_decline = 'NA'
            # Append list
        trend_list.append({"Province":province_name, 
                    "Gem. vraagprijs":gem_vraagprijs, "%Δ Vraagprijs (t.o.v vorige maand)": tov_vorige_maand_vraagprijs,
                    "Verkochte woningen":verk_woningen, "%Δ Verkochte woningen (t.o.v vorige maand)":tov_vorige_maand_verkocht,
                    "Gem. m2 prijs":m2_prijs, "%Δ M2 prijs (t.o.v vorige maand)":tov_vorige_maand_m2_prijs,
                    "% Vraagprijs overboden":perc_overboden, "%Δ Overboden (t.o.v vorige maand)":tov_vorige_maand_perc_overboden,
                    "Besteedbaar inkomen (per huishouden)":bes_inkomen,
                    "Aantal inwoners": found_inwoners,
                    "% Populatie stijging":found_populatiegroei, "% Populatie daling":found_populatiegroei_decline})
    return(trend_list)

In [63]:
df2 = extract_province_trends(page_links) 
province_dataframe1 = pd.DataFrame(df2)
province_dataframe1

Unnamed: 0,Province,Gem. vraagprijs,%Δ Vraagprijs (t.o.v vorige maand),Verkochte woningen,%Δ Verkochte woningen (t.o.v vorige maand),Gem. m2 prijs,%Δ M2 prijs (t.o.v vorige maand),% Vraagprijs overboden,%Δ Overboden (t.o.v vorige maand),Besteedbaar inkomen (per huishouden),Aantal inwoners,% Populatie stijging,% Populatie daling
0,Noord-Holland,"€ 425,000",13.33%,976,-24.75%,"€ 4,437",9.53%,28.66%,16.00%,"€ 36,200",2879527,0.92%,
1,Zuid-Holland,"€ 365,000",7.67%,1283,-38.14%,"€ 3,602",5.54%,20.98%,10.76%,"€ 35,800",3708696,0.95%,
2,Zeeland,"€ 280,000",1.82%,168,-38.91%,"€ 2,676",5.31%,10.05%,1.92%,"€ 36,900",383488,0.12%,
3,Noord-Brabant,"€ 350,000",3.24%,739,-46.76%,"€ 3,195",5.31%,8.74%,0.87%,"€ 38,100",2548585,0.71%,
4,Utrecht,"€ 425,000",9.25%,608,-8.30%,"€ 4,190",4.05%,13.03%,1.06%,"€ 39,500",1354834,0.94%,
5,Flevoland,"€ 339,000",4.31%,153,-39.53%,"€ 2,982",1.64%,16.40%,1.69%,"€ 39,500",423021,1.55%,
6,Friesland,"€ 285,000",3.64%,322,-12.74%,"€ 2,429",-1.82%,11.71%,1.07%,"€ 34,900",649957,0.35%,
7,Groningen,"€ 250,000",11.11%,316,-9.20%,"€ 2,550",6.65%,20.27%,4.51%,"€ 30,600",540009,0.38%,
8,Drenthe,"€ 309,000",4.75%,236,-25.55%,"€ 2,481",1.31%,11.58%,1.09%,"€ 37,100",493682,0.31%,
9,Overijssel,"€ 300,000",1.69%,457,-16.15%,"€ 2,692",4.42%,10.50%,0.66%,"€ 36,900",1162406,0.52%,


In [64]:
#exporting the province-level dataset as CSV, to load it into RStudio
#change this into a relative path instead of an absolute path so that it works for everyone? 
province_dataframe1.to_csv('huizenzoeker_province_data1.csv') #at province-level

### Step 6b: Scrape some more woningmarkt dashboard data

To this dataframe, we now want to add more data from the woningmarkt dashboard per province, e.g. 'aantal geintereseerden per woning', huuraanbod, profiel huizenzoekers (?), over woningen...  

But for now,  we export this dataframe already as CSV to R to fix the characters into numerics; such that it is an useable dataset!