Scraping Flat Data From Web
=============
____________________

In [1]:
import pandas as pd
import numpy as np
import datetime
import re

import requests
import warnings

from bs4 import BeautifulSoup

from stem import Signal
from stem.control import Controller

warnings.filterwarnings('ignore')

This notebook requires starting Tor and authentication in NetCat

In [2]:
!service tor start

Redirecting to /bin/systemctl start tor.service


In [3]:
!echo -e 'AUTHENTICATE "ju4n4n4290"' | nc 127.0.0.1 9051

250 OK


Some __functions__ we are going to use later

In [4]:
def set_new_ip():
    """Change IP using TOR"""
    
    with Controller.from_port(port=9051) as controller:
        controller.authenticate(password='ju4n4n4290')
        controller.signal(Signal.NEWNYM)

In [5]:
def get_current_ip():
    """get current ip"""
    
    local_proxy = 'socks5://localhost:9050'
    socks_proxy = {
        'http': local_proxy,
        'https': local_proxy
    }
    
    current_ip = requests.get(url='http://icanhazip.com/',
                              proxies=socks_proxy,
                              verify=False)
    
    return(current_ip.text.strip())

In [6]:
def test_change_ip():
    """test if the IP changes properly"""
    
    old_ip = get_current_ip()
    set_new_ip()
    new_ip = get_current_ip()
    
    if old_ip == new_ip:
        return "Error: IP has not changed"

In [7]:
def get_soup(url):
    """get soup requesting a url through Tor and Privoxy"""
    
    local_proxy = 'socks5://localhost:9050'
    socks_proxy = {
        'http': local_proxy,
        'https': local_proxy
    }
    
    r = requests.get(url, proxies=socks_proxy, verify=False)

    page = r.content
    soup = BeautifulSoup(page, 'html5lib')
    
    return(soup)

_______________

## __Getting ID and Price Properties__ 

(via [idealista.com](https://www.idealista.com))

In this part, we are going to get the price and ID of the flats for sale at the moment of the search for each specific city we are going to focus on (Alcorcon, Mostoles, Leganes, Fuenlabrada and Getafe, the main cities from the South of Madrid). Once we get the ID of the buildings, we will request one by one via web scraping and we will get other flat attributes (area, number of bedrooms and bathrooms, location, etc.).

For doing that we are going to follow this schema:
- Search for flats at the city we want to get data.
- Get the number of properties for sale in this city through `BeautifulSoup`.
- Once we know the number of properties we can also knwo the number of pages that there are for checking up (there are  flats per page).
- Get the link of each page.
- Request the ID and price of each property using `BeautifulSoup` as well.

In [8]:
city = "fuenlabrada"

In [9]:
def get_number_of_properties_for_sale(city):
    """
    Get the number of properties for sale at the moment of requesting. 
    This function has only been tested for the cities of Mostoles, Leganes, Fuenlabrada and Getafe
    """
    
    set_new_ip()
    test_change_ip()
    
    url_search = "https://www.idealista.com/venta-viviendas/"+city+"-madrid/con-pisos/"
    html_search = get_soup(url_search)
    
    string_properties = html_search.find_all("span", class_="breadcrumb-info")[2].get_text().replace(".","")
    
    properties = int(string_properties)
    
    return(properties)

In [10]:
def number_of_pages(number_of_properties):
    """Number of pages to scrap in the first search page"""
    
    pages = int(number_of_properties/30)+1 # there are 30 properties for page
    return(pages)

In [14]:
number_of_properties = get_number_of_properties_for_sale(city)
number_of_pages = number_of_pages(number_of_properties)

In [15]:
print("There are {} properties in {}".format(number_of_properties,city))
print("There are {} pages in the first search page at the city of {}".format(number_of_pages,city))

There are 668 properties in fuenlabrada
There are 23 pages in the first search page at the city of fuenlabrada


Link for each search page

In [16]:
def get_search_links(number_of_pages,city):
    """Get the links of the search page for scraping"""
    
    links = []
    for page in range(number_of_pages):
        page += 1
        
        url = "https://www.idealista.com/venta-viviendas/{}-madrid/con-pisos/pagina-{}.htm".format(city,page)
        links.append(url)
    
    return(links)

In [17]:
search_links = get_search_links(number_of_pages, city)

__Getting the ids and prices of each flat__

In [19]:
def process_properties(ids_properties,prices_properties):
    """
    Save id and price properties in a dataframe.
    Transform prices into integers and get the link of the flat from id
    """
    properties = pd.DataFrame({
        "price" : prices_properties,
        "id" : ids_properties
    })
    
    properties['price'] = properties['price'].map(lambda x:int(x.replace("€","").replace(".","")))
    
    properties['link'] = properties['id'].map(lambda x:"https://www.idealista.com/inmueble/{}/".format(x))
    
    return(properties)

In [20]:
def get_id_and_price(search_links):
    """Get the id and price of each flat from search links"""
    
    ids_properties = []
    prices_properties = []

    set_new_ip()
    
    counter = 0
    
    for link in search_links:

        counter += 1
        print("{} / {}".format(counter,len(search_links)))

        flat_and_price = get_soup(link)

        flat_ids = flat_and_price.find_all("a", class_="item-link")
        flat_prices = flat_and_price.find_all("span", class_="item-price h2-simulated")
        
        for ids in flat_ids:
            print(ids.get("href").split("/")[2])
            id_flat = ids.get("href").split("/")[2]
            
            ids_properties.append(id_flat)

        for prices in flat_prices:
            print(prices.get_text())
            price_flat = prices.get_text()
            
            prices_properties.append(price_flat)
    
    # process ids and prices
    properties = process_properties(ids_properties, prices_properties)
    
    return(properties)

In [21]:
properties = get_id_and_price(search_links)

1 / 23
35135324
39590039
39462776
39579318
38145079
39105801
38829061
39638946
39335327
39129082
38248047
39240268
38805354
30500572
39381411
39574718
39104834
39034238
39338975
35856081
35818641
38196758
39599325
39339215
38220501
39584531
37350454
37350381
36547593
39565698
207.500€
137.500€
158.000€
85.900€
125.000€
141.000€
129.900€
159.000€
143.400€
148.400€
129.900€
126.000€
124.000€
137.500€
239.000€
137.500€
124.000€
148.000€
76.850€
87.400€
74.900€
145.000€
139.000€
87.260€
127.000€
126.000€
208.400€
205.400€
202.750€
79.900€
2 / 23
39246290
38815516
39624366
36996107
39543482
39412243
39432098
38794625
39520969
38537278
38507356
39659153
39652587
39654195
39648438
39035993
39662833
39659078
39641575
33131668
37343114
39632630
39640088
39658781
39647320
39651721
39647120
39653778
39179587
39259908
97.000€
134.900€
125.000€
123.000€
166.000€
118.000€
136.000€
116.000€
133.000€
123.000€
145.000€
145.900€
265.000€
129.900€
156.000€
142.000€
98.630€
121.000€
146.000€
145.200€
170.

37068586
38903874
39502591
39363669
39310847
39200661
39553560
38794621
39411087
39316371
39303072
39238709
38998466
38735338
38622123
36572591
38401550
37343632
37073424
39580362
39551669
39539669
39517907
39506878
39473288
39481934
39396996
39384106
37165465
38813519
133.000€
120.000€
245.000€
115.000€
179.900€
128.000€
134.530€
169.000€
177.600€
190.000€
127.000€
118.700€
125.000€
94.000€
186.300€
113.159€
178.000€
164.000€
117.350€
103.000€
120.000€
82.000€
180.000€
118.000€
136.000€
151.000€
147.000€
149.000€
199.000€
185.000€
18 / 23
38788839
38778319
37725639
39157689
39586227
37998404
39513539
39371920
39528582
39433213
39284527
39250312
39196597
39098679
38935422
38931354
38884770
38878933
38666355
36533557
31700343
37763531
1844238
36912035
36543112
34416032
33074049
1638424
28251226
26018461
160.000€
158.000€
131.500€
94.000€
179.000€
165.000€
175.000€
118.000€
165.000€
118.000€
133.000€
155.800€
136.000€
119.900€
132.900€
131.000€
140.000€
110.000€
110.000€
95.718€
163.300€

In [22]:
properties.head()

Unnamed: 0,id,price,link
0,35135324,207500,https://www.idealista.com/inmueble/35135324/
1,39590039,137500,https://www.idealista.com/inmueble/39590039/
2,39462776,158000,https://www.idealista.com/inmueble/39462776/
3,39579318,85900,https://www.idealista.com/inmueble/39579318/
4,38145079,125000,https://www.idealista.com/inmueble/38145079/


In [23]:
properties.shape

(668, 3)

__It looks great!!__ Just we want it.

__________-

### Getting Attributes From Flats

Now that we have the link, price and id of each flat in the South zone of Madrid published at idealisa, we can access to each one and get the following attributes: 
- Number of bedrooms.
- Number of bathrooms
- Area.
- Whether the flat has terrace or it has not.
- Presence of garage.
- Orientation. 
- Floor number.
- Equipments (elevator, storage, air conditioning, built-in closets, etc.)
- Location.

Let's first create some useful funcions we will use in the main program:

In [24]:
def try_set_new_ip(n_times, link):
    """
    This functions is specific for getting attributes at each property page (link)
    It tries to get the price by url request and if it does not get any information change the IP
    and repeat the process n times.
    """
    attempts = 0
    
    while attempts < n_times:
        try:
            soup = get_soup(link)
            price = soup.find_all('span', class_='h3-simulated txt-bold')[0].get_text()
            break
        
        except:
            "list index out of range"
            print("Trying to set another IP")
            set_new_ip()
            test_change_ip()
            attempts += 1

In [25]:
def process_attributes(att_main, att_build, att_equipment, att_location, att_price):
    """Process the attributes of a flat scraped through web and convert them into a single data frame"""
    
    att_main = pd.DataFrame.from_dict(att_main,orient='index')
    att_build = pd.DataFrame.from_dict(att_build,orient='index')
    att_equipment = pd.DataFrame.from_dict(att_equipment,orient='index')
    att_location = pd.DataFrame.from_dict(att_location,orient='index')
    
    attributes = pd.concat([att_main, att_build, att_equipment, att_location], axis = 1)
    
    attributes['price'] = att_price
    
    return(attributes)

In [31]:
def get_attributes(property_links):
    """
    This function goes over property links and gets some flat attributes.
    The IP has to be changed each n requests for avoiding to be banned.
    """
    
    # Attributes to scrap
    att_id = []
    att_price = []
    att_main = {}
    att_build = {}
    att_equipment = {}
    att_location = {}

    counter = 0

    for link in property_links:

        counter += 1
        print("{} / {}".format(counter,len(properties['link'])))

        print(get_current_ip())
        if counter % 100 == 0:
            set_new_ip()

        html_flat = get_soup(link)

        try_set_new_ip(10, link)

        # id
        id_number = link.split("/")[4]
        att_id.append(id_number) # id
        
        print(id_number)

        # price
        try: 
            price = html_flat.find_all('span', class_='h3-simulated txt-bold')[0].get_text()
    
        except:
            "list index out of range"
            price = [None]
        
        att_price.append(price)
        print(price)

        # attributes
        flat_attributes = html_flat.find_all('div', class_='details-property_features')
        
        # main attributes
        try:
            number_of_main_attributes = len(flat_attributes[0].find_all("li"))

            ids_main = []
            for main_attribute in range(number_of_main_attributes):
                flat_main_attribute = flat_attributes[0].find_all("li")[main_attribute].get_text()
                ids_main.append(flat_main_attribute)
                
                att_main[id_number] = ids_main
        
        except:
            "list index out of range"
            att_main[id_number] = [None]

        # build attributes
        try:    
            number_of_build_attributes = len(flat_attributes[1].find_all("li"))

            ids_build = []
            for build_attribute in range(number_of_build_attributes):
                flat_build_attribute = flat_attributes[1].find_all("li")[build_attribute].get_text()
                ids_build.append(flat_build_attribute)
                
                att_build[id_number] = ids_build
        
        except:
            "list index out of range"
            att_build[id_number] = [None]

        # equipment attributes
        try:
            number_of_equipment_attributes = len(flat_attributes[2].find_all("li"))

            ids_equipment = []
            for equipment_attribute in range(number_of_equipment_attributes):
                flat_equipment_attribute = flat_attributes[1].find_all("li")[equipment_attribute].get_text()
                ids_equipment.append(flat_equipment_attribute)
                
                att_equipment[id_number] = ids_equipment
        
        except:
            "list index out of range"
            att_equipment[id_number] = [None]

        # location
        try:
            location = html_flat.find_all('div', class_='ide-box-detail overlay-box')[2].find_all("li")
            number_of_location_attributes = len(location)
            
            ids_location = []
            for location_attribute in range(number_of_location_attributes):
                flat_location_attribute = location[location_attribute].get_text()
                ids_location.append(flat_location_attribute)
                
                att_location[id_number] = ids_location
        
        except:
            "list index out of range"
            att_location[id_number] = [None]
    
        # processing attributes
        attributes = process_attributes(att_main, att_build, att_equipment, att_location, att_price)
    
    return(attributes)

In [32]:
attributes = get_attributes(properties['link'])

1 / 668
93.115.86.6
35135324
207.500
2 / 668
93.115.86.6
39590039
137.500
3 / 668
93.115.86.6
39462776
158.000
4 / 668
93.115.86.6
39579318
85.900
5 / 668
93.115.86.6
38145079
125.000
6 / 668
93.115.86.6
39105801
141.000
7 / 668
93.115.86.6
38829061
129.900
8 / 668
93.115.86.6
39638946
159.000
9 / 668
93.115.86.6
39335327
143.400
10 / 668
93.115.86.6
39129082
148.400
11 / 668
93.115.86.6
38248047
129.900
12 / 668
93.115.86.6
39240268
126.000
13 / 668
93.115.86.6
38805354
124.000
14 / 668
93.115.86.6
30500572
137.500
15 / 668
93.115.86.6
39381411
239.000
16 / 668
93.115.86.6
39574718
137.500
17 / 668
93.115.86.6
39104834
124.000
18 / 668
93.115.86.6
39034238
148.000
19 / 668
93.115.86.6
39338975
76.850
20 / 668
93.115.86.6
35856081
87.400
21 / 668
93.115.86.6
35818641
74.900
22 / 668
93.115.86.6
38196758
145.000
23 / 668
93.115.86.6
39599325
139.000
24 / 668
93.115.86.6
39339215
87.260
25 / 668
93.115.86.6
38220501
127.000
26 / 668
93.115.86.6
39584531
126.000
27 / 668
93.115.86.6
37350

163.172.45.46
39607293
99.000
209 / 668
163.172.45.46
39596240
132.900
210 / 668
163.172.45.46
39451654
132.000
211 / 668
163.172.45.46
39290537
190.000
212 / 668
163.172.45.46
39281138
160.000
213 / 668
163.172.45.46
32905985
145.000
214 / 668
163.172.45.46
38390491
65.000
215 / 668
163.172.45.46
38007953
179.000
216 / 668
163.172.45.46
37781010
146.000
217 / 668
163.172.45.46
37620169
141.000
218 / 668
163.172.45.46
37617581
146.000
219 / 668
163.172.45.46
37399174
134.000
220 / 668
163.172.45.46
37368315
206.000
221 / 668
163.172.45.46
37350458
263.400
222 / 668
163.172.45.46
37350437
205.400
223 / 668
163.172.45.46
37350453
208.400
224 / 668
163.172.45.46
37350452
205.400
225 / 668
163.172.45.46
37350393
202.750
226 / 668
163.172.45.46
37350382
263.400
227 / 668
163.172.45.46
37350380
208.400
228 / 668
163.172.45.46
37350379
208.400
229 / 668
163.172.45.46
37350378
205.400
230 / 668
163.172.45.46
37349957
233.400
231 / 668
163.172.45.46
37349851
233.400
232 / 668
163.172.45.46
3731

178.20.55.18
38499927
119.000
386 / 668
178.20.55.18
38443023
150.000
387 / 668
178.20.55.18
38410653
70.000
388 / 668
178.20.55.18
37999416
129.000
389 / 668
178.20.55.18
37706988
185.000
390 / 668
178.20.55.18
37467938
125.000
391 / 668
178.20.55.18
36122425
181.000
392 / 668
178.20.55.18
39623461
69.900
393 / 668
178.20.55.18
39529197
205.000
394 / 668
178.20.55.18
38550816
140.000
395 / 668
178.20.55.18
38294484
269.000
396 / 668
178.20.55.18
36415313
250.000
397 / 668
178.20.55.18
39578681
145.000
398 / 668
178.20.55.18
39578400
130.900
399 / 668
178.20.55.18
39578350
120.000
400 / 668
178.20.55.18
Trying to set another IP
39562204
[None]
401 / 668
185.220.101.16
Trying to set another IP
39561977
[None]
402 / 668
87.118.122.30
39509202
113.159
403 / 668
87.118.122.30
39507963
87.750
404 / 668
87.118.122.30
Trying to set another IP
Trying to set another IP
39497698
[None]
405 / 668
88.99.33.103
39410334
119.900
406 / 668
137.74.169.241
39213461
105.000
407 / 668
137.74.169.241
3616

195.22.125.137
26901050
194.515
580 / 668
195.22.125.137
25580064
190.000
581 / 668
195.22.125.137
39192115
180.000
582 / 668
195.22.125.137
39242113
135.000
583 / 668
195.22.125.137
39218754
225.000
584 / 668
195.22.125.137
39038385
134.000
585 / 668
195.22.125.137
38929125
133.000
586 / 668
195.22.125.137
38443213
107.600
587 / 668
195.22.125.137
31182921
78.000
588 / 668
195.22.125.137
34977597
84.000
589 / 668
195.22.125.137
25949445
150.000
590 / 668
195.22.125.137
1657118
219.000
591 / 668
195.22.125.137
39553034
124.000
592 / 668
195.22.125.137
39487824
136.000
593 / 668
195.22.125.137
39473291
119.000
594 / 668
195.22.125.137
39432094
145.000
595 / 668
195.22.125.137
39432088
134.000
596 / 668
195.22.125.137
39187213
130.000
597 / 668
195.22.125.137
39023626
200.000
598 / 668
195.22.125.137
38870602
217.000
599 / 668
195.22.125.137
38478142
250.000
600 / 668
195.22.125.137
Trying to set another IP
38365335
[None]
601 / 668
85.248.227.163
37984524
120.000
602 / 668
85.248.227.16

In [39]:
attributes.head()

Unnamed: 0,att_main_0,att_main_1,att_main_2,att_main_3,att_main_4,att_main_5,att_main_6,att_main_7,att_main_8,att_main_9,...,att_build_0,att_build_1,att_equipment_0,att_equipment_1,att_location_0,att_location_1,att_location_2,att_location_3,att_location_4,price
1630393,"110 m² construidos, 90 m² útiles",3 habitaciones,1 baño,Plaza de garaje incluida en el precio,Segunda mano/buen estado,Armarios empotrados,"Orientación norte, este, oeste",Certificación energética: no indicado,,,...,Planta 4ª exterior,Con ascensor,Planta 4ª exterior,Con ascensor,"Avenida Ocho de Marzo, 3",Urb. LORANCA,Distrito Loranca,Fuenlabrada,"Zona sur, Madrid",207.5
1632340,"90 m² construidos, 82 m² útiles",3 habitaciones,1 baño,Segunda mano/buen estado,Armarios empotrados,Orientación este,Certificación energética: no indicado,,,,...,Planta 8ª exterior,Con ascensor,Planta 8ª exterior,,"Calle Santa Ana, 4",Distrito El Arroyo - La Fuente,Fuenlabrada,"Zona sur, Madrid",,137.5
1636513,"93 m² construidos, 77 m² útiles",3 habitaciones,2 baños,Plaza de garaje incluida en el precio,Segunda mano/buen estado,Armarios empotrados,Trastero,Certificación energética: no indicado,,,...,Planta 7ª exterior,Con ascensor,,,"Calle Arados, 5",Distrito El Arroyo - La Fuente,Fuenlabrada,"Zona sur, Madrid",,158.0
1638424,"115 m² construidos, 98 m² útiles",3 habitaciones,2 baños,Segunda mano/buen estado,Armarios empotrados,Certificación energética: no indicado,,,,,...,Planta 1ª exterior,Con ascensor,Planta 1ª exterior,,"Calle San Joaquín, 3",Distrito Centro,Fuenlabrada,"Zona sur, Madrid",,85.9
1640421,105 m² construidos,3 habitaciones,2 baños,Terraza,Plaza de garaje incluida en el precio,Segunda mano/buen estado,Armarios empotrados,Trastero,Certificación energética: no indicado,,...,Planta 4ª exterior,Con ascensor,,,"Calle Vitoria, 11",Urb. Lorea,Distrito La Serna,Fuenlabrada,"Zona sur, Madrid",125.0


In [29]:
attributes.shape

(5, 16)

### Writing Files

Before writing files, let's change the name of the columns. Now, the column names are something like `[0,1,2...13,0,1,0,...]` and we want to refer which attribute corresponds with each number `[att_main_0, att_main_1,..., att_build_0,...]`

In [36]:
def parse_columns_names(attributes_columns):
    """parsing column names in the form att_main_X, att_build_X, att_equipment_X, att_location_X and price"""
    
    
    names = ["att_main","att_build","att_equipment","att_location"]
    
    new_columns = []
    
    counter_name = 0
    counter_column = 0

    for column_name in attributes_columns:
        try:
            new_name = names[counter_name]+"_"+str(column_name)
            new_columns.append(new_name)

            if attributes_columns[counter_column] >= attributes_columns[counter_column+1]:
                counter_name += 1
            counter_column += 1
        except:
            "'>=' not supported between instances of 'int' and 'str'"
    
    new_columns[-1] = "price"
    
    return(new_columns)

In [37]:
attributes.columns = parse_columns_names(attributes.columns)

In [40]:
attributes.head()

Unnamed: 0,att_main_0,att_main_1,att_main_2,att_main_3,att_main_4,att_main_5,att_main_6,att_main_7,att_main_8,att_main_9,...,att_build_0,att_build_1,att_equipment_0,att_equipment_1,att_location_0,att_location_1,att_location_2,att_location_3,att_location_4,price
1630393,"110 m² construidos, 90 m² útiles",3 habitaciones,1 baño,Plaza de garaje incluida en el precio,Segunda mano/buen estado,Armarios empotrados,"Orientación norte, este, oeste",Certificación energética: no indicado,,,...,Planta 4ª exterior,Con ascensor,Planta 4ª exterior,Con ascensor,"Avenida Ocho de Marzo, 3",Urb. LORANCA,Distrito Loranca,Fuenlabrada,"Zona sur, Madrid",207.5
1632340,"90 m² construidos, 82 m² útiles",3 habitaciones,1 baño,Segunda mano/buen estado,Armarios empotrados,Orientación este,Certificación energética: no indicado,,,,...,Planta 8ª exterior,Con ascensor,Planta 8ª exterior,,"Calle Santa Ana, 4",Distrito El Arroyo - La Fuente,Fuenlabrada,"Zona sur, Madrid",,137.5
1636513,"93 m² construidos, 77 m² útiles",3 habitaciones,2 baños,Plaza de garaje incluida en el precio,Segunda mano/buen estado,Armarios empotrados,Trastero,Certificación energética: no indicado,,,...,Planta 7ª exterior,Con ascensor,,,"Calle Arados, 5",Distrito El Arroyo - La Fuente,Fuenlabrada,"Zona sur, Madrid",,158.0
1638424,"115 m² construidos, 98 m² útiles",3 habitaciones,2 baños,Segunda mano/buen estado,Armarios empotrados,Certificación energética: no indicado,,,,,...,Planta 1ª exterior,Con ascensor,Planta 1ª exterior,,"Calle San Joaquín, 3",Distrito Centro,Fuenlabrada,"Zona sur, Madrid",,85.9
1640421,105 m² construidos,3 habitaciones,2 baños,Terraza,Plaza de garaje incluida en el precio,Segunda mano/buen estado,Armarios empotrados,Trastero,Certificación energética: no indicado,,...,Planta 4ª exterior,Con ascensor,,,"Calle Vitoria, 11",Urb. Lorea,Distrito La Serna,Fuenlabrada,"Zona sur, Madrid",125.0


The output file follows the next structure:

`nameOfTheCity_properties_date.csv` for `properties` dataframe and `nameOfTheCity_attributes_date.csv` for `attributes` dataframe. 

Both file are going to saved into a folder with the name of the city.

In [112]:
date = str(datetime.datetime.now())[:10]

__Writing files...__

In [115]:
properties.to_csv("./data/raw/"+date+"_"+city+"/properties_"+date+".csv", sep = "^")

In [116]:
attributes.to_csv("./data/raw/"+date+"_"+city+"/attributes_"+date+".csv", sep = "^")

In [119]:
df1 = pd.DataFrame([1,2,3,4,5,6,7,8,9])
df2 = pd.DataFrame([2,3,4,6,7,8,12,15,19])

In [133]:
set([2,3,4,4,6,7,8,12,15,19]) 

{2, 3, 4, 6, 7, 8, 12, 15, 19}

In [140]:
set(list(properties['id'])) - set([306849,317461])

{308485,
 445402,
 1624550,
 1633219,
 1670047,
 1695777,
 1722829,
 1761554,
 1789435,
 1814879,
 1834421,
 1853719,
 1862526,
 1911796,
 2030707,
 2048926,
 2050035,
 25382233,
 26358912,
 26822813,
 27148553,
 28039517,
 28219048,
 28495092,
 28741802,
 28742522,
 28868204,
 28892186,
 29955334,
 30878179,
 31671882,
 32525952,
 32594858,
 32772222,
 33083442,
 33083458,
 33083459,
 33083520,
 33083521,
 33325138,
 33596183,
 33629108,
 33689289,
 34068862,
 34246454,
 34355788,
 34664666,
 35081037,
 35133257,
 35151374,
 35243722,
 35603271,
 35703178,
 35731207,
 35735293,
 35735299,
 35735301,
 35736460,
 35736461,
 35736463,
 35736465,
 35736602,
 35736603,
 35736604,
 35736607,
 35736608,
 35736609,
 35736610,
 35736611,
 35736612,
 35736734,
 35736743,
 35736906,
 35736918,
 35736919,
 35736920,
 35736921,
 35736923,
 35736926,
 35736938,
 35736940,
 35737014,
 35737016,
 35737019,
 35737029,
 35737030,
 35737031,
 35737034,
 35737035,
 35737036,
 35737530,
 35737531,
 357375