# Scraping Huizenzoeker.nl

## First attempts at scraping the site

### Scraping data from the housing market page: https://www.huizenzoeker.nl/woningmarkt/Noord-Brabant/Veldhoven/Veldhoven/

## Method 1: Using Selenium

#### Scraping trend data for Veldhoven

In [44]:
from selenium import webdriver  # requires the selenium package (e.g. install it from the Anaconda prompt)

In [45]:
driver = webdriver.Chrome()

In [46]:
driver.get('https://www.huizenzoeker.nl/woningmarkt/Noord-Brabant/Veldhoven/Veldhoven/')

In [47]:
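from selenium.webdriver.common.by import By  # Selenium 4 locator constants
element = driver.find_element(By.CLASS_NAME, 'trend-graphs')  # find_element_by_class_name was removed in Selenium 4.3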

In [48]:
print(element.text)

GEM. VRAAGPRIJS
€ 385.000
0.00% t.o.v. vorige maand
VERKOCHTE WONINGEN
19
-51.28% t.o.v. vorige maand
GEM. VIERKANTEMETER PRIJS
€ 3.226
0.25% t.o.v. vorige maand
PERCENTAGE OVERBODEN
8.41%
0.01% t.o.v. vorige maand
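Selenium's `element.text` returns all four cells as one block of lines (title, value, change). A minimal sketch for turning that block into a dictionary, assuming every cell always renders as exactly three non-empty lines, as in the output above; `parse_trend_text` is a hypothetical helper name:

```python
def parse_trend_text(text):
    """Group the text of the 'trend-graphs' element into {title: (value, change)}.

    Assumes each cell is exactly three non-empty lines: title, value, change.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    trends = {}
    # Walk through the lines three at a time
    for i in range(0, len(lines) - 2, 3):
        title, value, change = lines[i:i + 3]
        trends[title] = (value, change)
    return trends

# First two cells of the output above
sample = """GEM. VRAAGPRIJS
€ 385.000
0.00% t.o.v. vorige maand
VERKOCHTE WONINGEN
19
-51.28% t.o.v. vorige maand"""

print(parse_trend_text(sample))
```

In the notebook itself this would be called as `parse_trend_text(element.text)`.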


#### Scraping the 'gemiddeld inkomen' (average income) figure for Veldhoven

In [49]:
from selenium.webdriver.common.by import By
element1 = driver.find_element(By.CLASS_NAME, 'single-value-graph-inner-container')

In [50]:
print(element1.text)

€ 42.000
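The income figure comes back as a string. If you need it as a number, one option is to strip every non-digit character. This sketch assumes whole-euro amounts with `.` as the Dutch thousands separator, so it would give wrong results for amounts with cents (e.g. `€ 3.226,50`); `euro_to_int` is a hypothetical name:

```python
import re

def euro_to_int(text):
    """Convert a whole-euro Dutch amount such as '€ 42.000' to an int.

    Assumes '.' is only ever the thousands separator (no decimal cents).
    """
    digits = re.sub(r'\D', '', text)  # drop '€', spaces, non-breaking spaces and dots
    return int(digits)

print(euro_to_int('€ 42.000'))      # 42000
print(euro_to_int('€\xa0385.000'))  # 385000
```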


## Method 2: Using BeautifulSoup 

#### Scraping trend data for Veldhoven

In [68]:
from bs4 import BeautifulSoup  # requires the beautifulsoup4 package

In [69]:
import requests  # requires the requests package

In [102]:
import re  # regular expressions (standard library)

In [70]:
url = 'https://www.huizenzoeker.nl/woningmarkt/Noord-Brabant/Veldhoven/Veldhoven/'

In [71]:
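res = requests.get(url, timeout=10)  # give up if the server does not answer within 10 s
res.raise_for_status()  # raise an error on a 4xx/5xx response instead of parsing an error page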

In [72]:
soup = BeautifulSoup(res.text, 'html.parser')

First trend cell (gemiddelde vraagprijs, the average asking price):

In [55]:
soup.find_all(class_='trend-graph')[0].find('h3').get_text()  # only the value of 'Gem. Vraagprijs'

'€\xa0385.000'
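The `\xa0` in this output is a non-breaking space (U+00A0) between `€` and the amount. A small sketch of normalising it to an ordinary space with the standard library; a plain `raw.replace('\xa0', ' ')` would work just as well here:

```python
import unicodedata

raw = '€\xa0385.000'  # the value as returned by get_text() above

# NFKC normalisation maps the non-breaking space (U+00A0) to a regular space
clean = unicodedata.normalize('NFKC', raw)
print(repr(clean))  # '€ 385.000'
```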

In [103]:
vraagprijs = soup.find_all(class_='trend-graph')[0].get_text()  # text of the 1st trend cell ('Gem. Vraagprijs')

In [105]:
vraagprijs = vraagprijs.replace(r'\n', '')  # no effect: r'\n' is a literal backslash + 'n', not a newline, hence the line breaks below
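Because of the `r` prefix, `r'\n'` is a two-character string (a backslash and the letter `n`), whereas `'\n'` is a single newline character. A quick check illustrating why the `replace(r'\n', '')` call leaves the scraped text unchanged:

```python
text = 'Gem. Vraagprijs\n€ 385.000\n'

print(len(r'\n'), len('\n'))            # 2 1
print(text.replace(r'\n', '') == text)  # True: the text contains no backslash
print(repr(text.replace('\n', ' ')))    # a real newline does get replaced
```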

In [139]:
vraagprijs = vraagprijs.replace(r'       ', '')

In [140]:
print(vraagprijs) 


Gem. Vraagprijs
€ 385.000


  0.00% t.o.v. vorige maand
     



In [155]:
' '.join(vraagprijs.split())  # collapses all whitespace into single spaces: cleaned 'Gem. Vraagprijs'

'Gem. Vraagprijs € 385.000 0.00% t.o.v. vorige maand'
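The `' '.join(text.split())` idiom does all of the cleaning on its own: `str.split()` with no argument splits on any run of whitespace (newlines, spaces and also the non-breaking space) and drops leading and trailing whitespace, so the `replace` calls before it are not needed. Wrapped as a hypothetical helper:

```python
def squash_whitespace(text):
    """Collapse every run of whitespace (including U+00A0) into one space."""
    return ' '.join(text.split())

# Roughly what get_text() returns for the first trend cell
raw = '\nGem. Vraagprijs\n€\xa0385.000\n\n\n  0.00% t.o.v. vorige maand\n     \n'
print(squash_whitespace(raw))  # Gem. Vraagprijs € 385.000 0.00% t.o.v. vorige maand
```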

Second trend cell (verkochte woningen, the number of houses sold):

In [108]:
verkochtwoning = soup.find_all(class_='trend-graph')[1].get_text()  # text of the 2nd trend cell ('Verkochte woningen')

In [109]:
verkochtwoning = verkochtwoning.replace(r'\n', '')  # no effect: r'\n' is a literal backslash + 'n', not a newline

In [137]:
verkochtwoning = verkochtwoning.replace(r'            ', '')

In [138]:
print(verkochtwoning) 


Verkochte woningen
19


       -51.28% t.o.v. vorige maand
   



In [154]:
' '.join(verkochtwoning.split())  # cleaned 'Verkochte woningen'

'Verkochte woningen 19 -51.28% t.o.v. vorige maand'
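To compare months numerically you may want the change as a float. A sketch using the `re` module imported earlier; it assumes the change always appears as a number followed by `% t.o.v.` ("compared with"), which is why it picks `0.01` rather than `8.41` from the overbid cell. `monthly_change` is a hypothetical name:

```python
import re

def monthly_change(text):
    """Return the month-on-month change in a cleaned trend cell as a float.

    Looks for '<number>% t.o.v.'; returns None if no such pattern is found.
    """
    match = re.search(r'([-+]?\d+(?:\.\d+)?)%\s*t\.o\.v\.', text)
    return float(match.group(1)) if match else None

print(monthly_change('Verkochte woningen 19 -51.28% t.o.v. vorige maand'))      # -51.28
print(monthly_change('Percentage overboden 8.41% 0.01% t.o.v. vorige maand'))  # 0.01
```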

Third trend cell (gem. vierkantemeter prijs, the average price per square metre):

In [111]:
vierkantemeter = soup.find_all(class_='trend-graph')[2].get_text()  # text of the 3rd trend cell ('Gem. vierkantemeter prijs')

In [112]:
vierkantemeter = vierkantemeter.replace(r'\n', '')  # no effect: r'\n' is a literal backslash + 'n', not a newline

In [133]:
vierkantemeter = vierkantemeter.replace(r'            ','')

In [143]:
print(vierkantemeter) 


Gem. vierkantemeter prijs
€ 3.226


  0.25% t.o.v. vorige maand
     



In [144]:
' '.join(vierkantemeter.split())  # cleaned 'Gem. vierkantemeter prijs'

'Gem. vierkantemeter prijs € 3.226 0.25% t.o.v. vorige maand'

Fourth trend cell (percentage overboden, the average overbid percentage):

In [114]:
overboden = soup.find_all(class_='trend-graph')[3].get_text()  # text of the 4th trend cell ('Percentage overboden')

In [115]:
overboden = overboden.replace(r'\n', '')  # no effect: r'\n' is a literal backslash + 'n', not a newline

In [125]:
overboden = overboden.replace(r'           ', '')

In [145]:
print(overboden) 


Percentage overboden
8.41%


       0.01% t.o.v. vorige maand
   



In [153]:
' '.join(overboden.split())  # cleaned 'Percentage overboden'

'Percentage overboden 8.41% 0.01% t.o.v. vorige maand'

Printing the whole trend graph table together: 

In [160]:
soup.find_all(class_='trend-graph-title') #prints all titles of the four cells of trend-graphs

[<h4 class="trend-graph-title">Gem. Vraagprijs</h4>,
 <h4 class="trend-graph-title">Verkochte woningen</h4>,
 <h4 class="trend-graph-title">Gem. vierkantemeter prijs</h4>,
 <h4 class="trend-graph-title">Percentage overboden</h4>]

In [147]:
soup.find_all(class_='trend-graph-value') #prints all values of the four cells of trend-graphs

[<h3 class="trend-graph-value">€ 385.000</h3>,
 <h3 class="trend-graph-value">19</h3>,
 <h3 class="trend-graph-value">€ 3.226</h3>,
 <h3 class="trend-graph-value">8.41%</h3>]

In [148]:
soup.find_all(class_='trend-graphs')  # the whole trend-graphs block at once, including a lot of markup such as the SVG icons

[<div class="trend-graphs">
 <div class="trend-graph">
 <h4 class="trend-graph-title">Gem. Vraagprijs</h4>
 <h3 class="trend-graph-value">€ 385.000</h3>
 <div class="trend-graph-pill">
 <span class="trend-graph-icon"><svg fill="black" height="18px" viewbox="0 0 24 24" width="18px" xmlns="http://www.w3.org/2000/svg"><path d="M0 0h24v24H0z" fill="none"></path><path d="M16 6l2.29 2.29-4.88 4.88-4-4L2 16.59 3.41 18l6-6 4 4 6.3-6.29L22 12V6z"></path></svg></span>
                 0.00% t.o.v. vorige maand
             </div>
 </div>
 <div class="trend-graph">
 <h4 class="trend-graph-title">Verkochte woningen</h4>
 <h3 class="trend-graph-value">19</h3>
 <div class="trend-graph-pill trend-down">
 <span class="trend-graph-icon"><svg fill="black" height="18px" viewbox="0 0 24 24" width="18px" xmlns="http://www.w3.org/2000/svg"><path d="M0 0h24v24H0z" fill="none"></path><path d="M16 18l2.29-2.29-4.88-4.88-4 4L2 7.41 3.41 6l6 6 4-4 6.3 6.29L22 12v6z"></path></svg></span>
                 -51.28% 
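Rather than repeating the same three cells for each of the four trend cells, one could loop over every `trend-graph` element and read its title, value and change. A sketch, run here on a trimmed copy of the `trend-graphs` markup printed above (SVG icons omitted); `trend_table` is a hypothetical helper that assumes each cell keeps the `trend-graph-title`, `trend-graph-value` and `trend-graph-pill` classes:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first two cells of the markup above
html = """
<div class="trend-graphs">
  <div class="trend-graph">
    <h4 class="trend-graph-title">Gem. Vraagprijs</h4>
    <h3 class="trend-graph-value">€ 385.000</h3>
    <div class="trend-graph-pill">
      0.00% t.o.v. vorige maand
    </div>
  </div>
  <div class="trend-graph">
    <h4 class="trend-graph-title">Verkochte woningen</h4>
    <h3 class="trend-graph-value">19</h3>
    <div class="trend-graph-pill trend-down">
      -51.28% t.o.v. vorige maand
    </div>
  </div>
</div>
"""

def trend_table(soup):
    """Map each trend-graph title to a (value, change) tuple of cleaned strings."""
    table = {}
    for cell in soup.select('div.trend-graph'):  # matches the class token, not 'trend-graph-pill'
        title = cell.find(class_='trend-graph-title').get_text(strip=True)
        value = ' '.join(cell.find(class_='trend-graph-value').get_text().split())
        change = ' '.join(cell.find(class_='trend-graph-pill').get_text().split())
        table[title] = (value, change)
    return table

soup = BeautifulSoup(html, 'html.parser')
print(trend_table(soup))
```

On the live page the same function would be called with the `soup` built from `res.text`.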

In [149]:
soup.find_all(class_='single-value-graph-inner-container')

[<div class="single-value-graph-inner-container">
 <span class="large-text">€ 42.000</span>
 </div>]

In [150]:
soup.find_all(class_='detail__income huizenzoeker-card single-value-graph-container')  # the 'Besteedbaar inkomen per huishouden' (disposable income per household) card: title and value

[<div class="detail__income huizenzoeker-card single-value-graph-container">
 <h3>Besteedbaar Inkomen Per Huishouden</h3>
 <div class="single-value-graph-inner-container">
 <span class="large-text">€ 42.000</span>
 </div>
 </div>]
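Passing a multi-class string to `class_`, as in the cell above, only matches when the `class` attribute is exactly that string in that order. A CSS selector on a single class token is less brittle. A sketch on a copy of the card markup shown above:

```python
from bs4 import BeautifulSoup

html = """
<div class="detail__income huizenzoeker-card single-value-graph-container">
  <h3>Besteedbaar Inkomen Per Huishouden</h3>
  <div class="single-value-graph-inner-container">
    <span class="large-text">€ 42.000</span>
  </div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# '.detail__income' matches any element that has this class among others
card = soup.select_one('div.detail__income')
label = card.find('h3').get_text(strip=True)
amount = card.select_one('.large-text').get_text(strip=True)
print(label, '->', amount)  # Besteedbaar Inkomen Per Huishouden -> € 42.000
```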

In [151]:
soup.find_all(class_='buurt-info dynamic-text')  # the 'Woningmarkt en Demografie' (housing market and demographics) section for Veldhoven

[<div class="buurt-info dynamic-text">
 <h3>Woningmarkt en Demografie in Veldhoven</h3>
 <p>
 <strong>Hoeveel woningen zijn er deze maand verkocht in Veldhoven?</strong><br/>
         Deze maand zijn er 19 woningen verkocht in Veldhoven.
     </p>
 <p>
 <strong>Wat is de gemiddelde vierkante meter prijs in Veldhoven?</strong><br/>
         De gemiddelde vierkante meter prijs in de woonplaats Veldhoven is 3226 euro per vierkante meter.
     </p>
 <p>
 <strong>Hoeveel wordt er gemiddeld overboden in de woonplaats Veldhoven?</strong><br/>
         Deze maand is er gemiddeld met 8.41% overboden in Veldhoven.<br/>
                     Dat is 0.01% meer dan de vorige maand.
             </p>
 <p>
 <strong>Hoeveel inwoners heeft de woonplaats Veldhoven?</strong><br/>
         In de gemeente Veldhoven wonen 45.466 inwoners.<br/>
         Daarvan woont 99.94% in Veldhoven.
         Dat zijn 45.440 inwoners. <br/>
 </p>
 <p>
 <strong>Is de woonplaats Veldhoven afgelopen jaar gegroeid?</strong><b
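The paragraphs in this section follow a regular pattern: a bold question, a line break, then the answer. A sketch that collects them as (question, answer) pairs, assuming the question is always the first piece of text in each `<p>`; it runs on the first two paragraphs of the output above, and `questions_and_answers` is a hypothetical name:

```python
from bs4 import BeautifulSoup

# First two paragraphs of the 'buurt-info dynamic-text' output above
html = """
<div class="buurt-info dynamic-text">
  <h3>Woningmarkt en Demografie in Veldhoven</h3>
  <p>
    <strong>Hoeveel woningen zijn er deze maand verkocht in Veldhoven?</strong><br/>
    Deze maand zijn er 19 woningen verkocht in Veldhoven.
  </p>
  <p>
    <strong>Wat is de gemiddelde vierkante meter prijs in Veldhoven?</strong><br/>
    De gemiddelde vierkante meter prijs in de woonplaats Veldhoven is 3226 euro per vierkante meter.
  </p>
</div>
"""

def questions_and_answers(soup):
    """Return (question, answer) pairs from the buurt-info paragraphs."""
    pairs = []
    for p in soup.select('div.buurt-info p'):
        if p.find('strong') is None:
            continue  # skip paragraphs without a bold question
        strings = list(p.stripped_strings)  # text pieces in order, trimmed
        question = strings[0]
        answer = ' '.join(' '.join(strings[1:]).split())  # also flattens inner newlines
        pairs.append((question, answer))
    return pairs

soup = BeautifulSoup(html, 'html.parser')
for question, answer in questions_and_answers(soup):
    print(question, '->', answer)
```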

In [152]:
soup.find_all(class_='graph-canvas-container')  # the charts (including the pie chart) are empty <canvas> elements in this HTML, so their values cannot be read from it directly

[<div class="graph-canvas-container">
 <canvas id="average-price-canvas"></canvas>
 </div>,
 <div class="graph-canvas-container">
 <canvas class="graph-canvas" height="100px" id="houses-amount-canvas" width="100px"></canvas>
 </div>,
 <div class="graph-canvas-container">
 <canvas class="graph-canvas pie" height="100px" id="age-average-canvas" width="100px"></canvas>
 </div>]
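The canvases are empty in the HTML that `requests` receives, so the chart values must reach the browser some other way, for example through inline JavaScript or a separate data request; which of these huizenzoeker.nl uses is not established here. As a first check, this sketch lists, for each canvas `id`, any inline `<script>` blocks that mention it. The HTML fragment and the `drawChart` call in it are hypothetical, for illustration only:

```python
from bs4 import BeautifulSoup

def scripts_mentioning_canvases(soup):
    """Map each <canvas> id to the inline <script> texts that contain that id."""
    ids = [c['id'] for c in soup.find_all('canvas') if c.has_attr('id')]
    scripts = [s.string or '' for s in soup.find_all('script')]  # external scripts have no text
    return {cid: [s for s in scripts if cid in s] for cid in ids}

# Hypothetical fragment: not taken from huizenzoeker.nl
html = """
<div class="graph-canvas-container">
  <canvas class="graph-canvas pie" id="age-average-canvas"></canvas>
</div>
<script>drawChart('age-average-canvas', [10, 20, 30]);</script>
"""
found = scripts_mentioning_canvases(BeautifulSoup(html, 'html.parser'))
print({cid: len(hits) for cid, hits in found.items()})
```

If every list comes back empty for the real page, the data is probably fetched by a separate request, which the Network tab of the browser's developer tools would show.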