# Project: Building a rental price barometer for Madrid using Idealista

### Notebook: Data extraction
 
Antonio Montilla

Madrid, December 2023

## Data extraction process

- This notebook is part of the repository for the project "Rental price barometer for Madrid using Idealista". It covers the data extraction from the rental portal https://www.idealista.com.
- Idealista is the most prominent Spanish real estate portal and currently holds the largest database of homes for sale and rent in Spain.
- The aim of the project is to build a model that predicts rental prices in the city of Madrid, using rental listings published on Idealista as of December 11. Listings are selected in order of their distance from Madrid city center (Puerta del Sol).
- The data is obtained through the Idealista API, for which an API key and a secret were requested from, and kindly provided by, Idealista for academic and research purposes.
- Following the portal's instructions, the data extraction process involved building the following functions:
    * **encode_api_key_secret(api_key, secret)**: this function joins the API key and the secret as `api_key:secret` and Base64-encodes the result for use in the authentication step, as required by Idealista. It takes the API key and secret as input and returns the encoded credentials as a string.
    * **fetch_access_token(api_key, secret)**: this function obtains a token to connect to the Idealista API. It takes the API key and secret as input, encodes them with the previous function, and returns an access token, following Idealista's instructions.
    * **fetch_rental_listings(api_key, secret, page)**: this function makes a search request with the defined criteria: rental properties in the city of Madrid, sorted by distance from the geographical position of Puerta del Sol. It also limits the geographical scope to 10 km from Puerta del Sol (the portal requires this parameter). It uses the previous functions to authenticate and returns the response in JSON format, which contains up to 50 listings per request.
    * **process_result(result)**: this function converts a JSON response into a dataframe, with one column per extracted feature. A final wrapper function, **fetch_and_process(api_key, secret, page)**, chains the request and the processing and delivers the dataframe.
    * **save_and_append_data(api_key, secret, total_pages)**: finally, this function loops over multiple requests (up to 50 listings each) by iterating over the 'numPage' parameter. Building on the previous functions, it (i) makes the request for each page, (ii) processes the JSON response into a dataframe, (iii) stores each dataframe in a csv file, and (iv) appends the pages and returns a single dataframe that combines all responses. This dataframe is then combined with the results of other, separately made requests to obtain the final dataframe, saved as a csv file that is imported in the main notebook for analysis and modelling.

## Importing libraries

In [1]:
import pandas as pd
import numpy as np
import requests
import time
import json
import os
import base64
import seaborn as sns
import matplotlib.pyplot as plt

### Step 1: Encoding the API key and secret

In [2]:
# NOTE: replace 'X' with your own credentials, and remove them again after extracting the data
# (real keys should not be left in a shared notebook)
api_key = 'X'
secret = 'X'
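As the note in the cell above suggests, hard-coded credentials have to be removed by hand after each run. One possible alternative is to read them from environment variables. The sketch below is only an illustration: the variable names `IDEALISTA_API_KEY` and `IDEALISTA_SECRET` and the helper `load_credentials` are hypothetical choices for this example, not something defined by Idealista or used elsewhere in this notebook.

```python
import os

def load_credentials(env=None):
    """Return (api_key, secret) read from environment variables.

    IDEALISTA_API_KEY and IDEALISTA_SECRET are hypothetical names chosen
    for this sketch; set them in your shell before starting the notebook.
    """
    env = os.environ if env is None else env
    try:
        return env["IDEALISTA_API_KEY"], env["IDEALISTA_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of sending empty credentials
        raise RuntimeError(f"Missing environment variable: {missing}") from None
```

With both variables set, `api_key, secret = load_credentials()` could replace the two assignments above, so no secret is ever written into the notebook.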

In [3]:
#defining a function to Base64-encode api_key and secret following Idealista's instructions:

def encode_api_key_secret(api_key, secret):
    # Concatenate API key, colon, and secret
    api_key_secret_str = f"{api_key}:{secret}"
    
    # Encode the string as bytes
    api_key_secret_bytes = api_key_secret_str.encode('utf-8')
    
    # Base64 encode the bytes
    encoded_credentials = base64.b64encode(api_key_secret_bytes)
    
    # Convert the bytes back to a string
    return encoded_credentials.decode('utf-8')

In [4]:
#confirming it reproduces Idealista's example (api_key = abc, secret = 123 should give 'YWJjOjEyMw==')
encode_api_key_secret("abc", "123")

'YWJjOjEyMw=='
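The function above applies Base64 only. If the portal's instructions are read as also requiring each part to be URL-encoded before joining (an assumption here, worth checking against the API documentation you received), a variant could look like the following sketch. The name `encode_credentials_quoted` is hypothetical. For purely alphanumeric keys such as the `abc`/`123` example, both versions give the same result.

```python
import base64
from urllib.parse import quote

def encode_credentials_quoted(api_key, secret):
    # Percent-encode each part first (assumed extra step), then join with a colon
    pair = f"{quote(api_key, safe='')}:{quote(secret, safe='')}"
    # Base64-encode the joined pair, exactly as encode_api_key_secret does
    return base64.b64encode(pair.encode("utf-8")).decode("utf-8")
```

The two functions only differ when the key or secret contains characters that percent-encoding changes, such as spaces or `+`.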

### Step 2: Connecting with the API

In [5]:
#defining a function to first connect with API
def fetch_access_token(api_key, secret):
    token_url = 'https://api.idealista.com/oauth/token'
    
    headers = {
        'Authorization': f'Basic {encode_api_key_secret(api_key, secret)}',
        'Content-Type': 'application/x-www-form-urlencoded'
    }
    
    data = {
        'grant_type': 'client_credentials', 
        'scope': 'read'
    }
    

    response = requests.post(token_url, headers=headers, data=data)

    if response.status_code == 200:
        return response.json().get('access_token')
    else:
        print(f"Error getting access token: {response.status_code}")
        print(response.text)
        return None
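Because every page request in the next step asks for a fresh token, and the number of requests is limited, it may be worth reusing a token while it is still valid. The sketch below shows one way to do that. `TokenCache` is a hypothetical helper, and the default lifetime of 3600 seconds is an assumption, not a value taken from Idealista's documentation.

```python
import time

class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch_token, lifetime_seconds=3600, margin_seconds=60,
                 clock=time.monotonic):
        self._fetch_token = fetch_token    # zero-argument callable returning a token
        self._lifetime = lifetime_seconds  # assumed lifetime, adjust to the real one
        self._margin = margin_seconds      # refresh a little before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Fetch again if there is no token yet or the current one is about to expire
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token
```

A cache could be created once with `TokenCache(lambda: fetch_access_token(api_key, secret))` and its `get()` method called wherever a token is needed.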


### Step 3: Making requests with specified criteria

In [6]:
#defining a function to request one page (up to 50 listings) of homes for rent in Madrid, sorted by distance from Puerta del Sol:
def fetch_rental_listings(api_key, secret, page):
    access_token = fetch_access_token(api_key, secret)

    if access_token:
        base_url = 'https://api.idealista.com/3.5/es/search'
        
        headers = {
            'Content-Type': 'application/x-www-form-urlencoded',
            'Authorization': f'Bearer {access_token}'
        }

        params = {
            'country': 'es',
            'operation': 'rent',
            'propertyType': 'homes',
            'center': '40.41690,-3.703500', #this is Plaza Puerta del Sol
            'distance': '10000', #limiting to 10km away from Puerta del Sol
            'maxItems': '50',
            'order': 'distance', #sorted by distance from Puerta del Sol
            'sort': 'asc',
            'numPage': page
        }

        response = requests.post(base_url, headers=headers, params=params)

        if response.status_code == 200:
            return response.json()
        else:
            print(f"Error: {response.status_code}")
            print(response.text) 
            return None
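Since the request restricts results to 10 km around Puerta del Sol, one way to sanity-check the returned `latitude` and `longitude` values is to recompute the great-circle distance locally. The sketch below uses the standard haversine formula with a mean Earth radius of 6,371 km. The name `haversine_m` is illustrative, and comparing its output with the API's `distance` field assumes that field is expressed in metres, as the `'distance': '10000'` parameter suggests.

```python
import math

SOL_LAT, SOL_LON = 40.41690, -3.703500  # search centre used in the request above

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))
```

For a listing row, `haversine_m(SOL_LAT, SOL_LON, row['latitude'], row['longitude'])` should stay below roughly 10,000 if the filter worked as intended.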


### Step 4: Processing the responses into dataframe

In [7]:
#Defining a function to process the results as a df:
def process_result(result):
    listings = result.get('elementList', [])

    data = {
        'propertyCode': [],
        'thumbnail': [],
        'externalReference': [],
        'numPhotos': [],
        'price': [],
        'propertyType': [],
        'operation': [],
        'size': [],
        'rooms': [],
        'bathrooms': [],
        'address': [],
        'province': [],
        'municipality': [],
        'district': [],
        'country': [],
        'latitude': [],
        'longitude': [],
        'showAddress': [],
        'url': [],
        'distance': [],
        'description': [],
        'hasVideo': [],
        'status': [],
        'newDevelopment': [],
        'parkingSpace': [],
        'priceByArea': [],
        'typology': [],
        'subTypology': [],
        'subtitle': [],
        'title': [],
        'hasPlan': [],
        'has3DTour': [],
        'has360': [],
        'hasLift': [],
        'hasStaging': [],
        'luxuryType': [],
        'villaType': [],
        'superTopHighlight': [],
        'topNewDevelopment': [],
        'topPlus': [],
    }

    for listing in listings:
        data['propertyCode'].append(listing.get('propertyCode', ''))
        data['thumbnail'].append(listing.get('thumbnail', ''))
        data['externalReference'].append(listing.get('externalReference', ''))
        data['numPhotos'].append(listing.get('numPhotos', 0))
        data['price'].append(listing.get('price', 0.0))
        data['propertyType'].append(listing.get('propertyType', ''))
        data['operation'].append(listing.get('operation', ''))
        data['size'].append(listing.get('size', 0.0))
        data['rooms'].append(listing.get('rooms', 0))
        data['bathrooms'].append(listing.get('bathrooms', 0))
        data['address'].append(listing.get('address', ''))
        data['province'].append(listing.get('province', ''))
        data['municipality'].append(listing.get('municipality', ''))
        data['district'].append(listing.get('district', ''))
        data['country'].append(listing.get('country', ''))
        data['latitude'].append(listing.get('latitude', 0.0))
        data['longitude'].append(listing.get('longitude', 0.0))
        data['showAddress'].append(listing.get('showAddress', False))
        data['url'].append(listing.get('url', ''))
        data['distance'].append(listing.get('distance', ''))
        data['description'].append(listing.get('description', ''))
        data['hasVideo'].append(listing.get('hasVideo', False))
        data['status'].append(listing.get('status', ''))
        data['newDevelopment'].append(listing.get('newDevelopment', False))
        data['parkingSpace'].append(listing.get('parkingSpace', {}).get('hasParkingSpace', False))
        data['priceByArea'].append(listing.get('priceByArea', 0.0))
        detailed_type = listing.get('detailedType', {})
        data['typology'].append(detailed_type.get('typology', ''))
        data['subTypology'].append(detailed_type.get('subTypology', ''))
        suggested_texts = listing.get('suggestedTexts', {})
        data['subtitle'].append(suggested_texts.get('subtitle', ''))
        data['title'].append(suggested_texts.get('title', ''))
        data['hasPlan'].append(listing.get('hasPlan', False))
        data['hasLift'].append(listing.get('hasLift', False))
        data['has3DTour'].append(listing.get('has3DTour', False))
        data['has360'].append(listing.get('has360', False))
        data['hasStaging'].append(listing.get('hasStaging', False))
        labels = listing.get('labels', [])
        data['luxuryType'].append(any(label['name'] == 'luxuryType' for label in labels))
        data['villaType'].append(any(label['name'] == 'villaType' for label in labels))
        data['superTopHighlight'].append(listing.get('superTopHighlight', False))
        data['topNewDevelopment'].append(listing.get('topNewDevelopment', False))
        data['topPlus'].append(listing.get('topPlus', False))

    df = pd.DataFrame(data)
    return df
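The explicit column-by-column approach above gives full control over defaults. As a more compact alternative, `pd.json_normalize` can flatten nested dictionaries in a single call. The sketch below runs on two made-up records shaped like the fields read above; it is an illustration, not a replacement tested against real API responses.

```python
import pandas as pd

sample_listings = [  # made-up records, only for illustration
    {"propertyCode": "1", "price": 1000.0,
     "parkingSpace": {"hasParkingSpace": True},
     "detailedType": {"typology": "flat"}},
    {"propertyCode": "2", "price": 1500.0,
     "detailedType": {"typology": "flat", "subTypology": "studio"}},
]

flat = pd.json_normalize(sample_listings)
# Nested keys become dotted column names; rename them to match process_result
flat = flat.rename(columns={
    "parkingSpace.hasParkingSpace": "parkingSpace",
    "detailedType.typology": "typology",
    "detailedType.subTypology": "subTypology",
})
# Missing nested values arrive as NaN; treat "no parking info" as False
flat["parkingSpace"] = flat["parkingSpace"].eq(True)
```

One difference to keep in mind: fields absent from a record come out as NaN rather than as the `''`, `0` or `False` defaults used in `process_result`.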

In [8]:
#Defining function to make request for each page using functions from above:
def fetch_and_process(api_key, secret, page):
    result = fetch_rental_listings(api_key, secret, page)
    if result:
        dataframe = process_result(result)
        return dataframe
    else:
        return None

### Step 5: Making multiple requests and storing results

In [52]:
#Defining function to save and append results for each page
def save_and_append_data(api_key, secret, total_pages):
    existing_df = None

    for page in range(1, total_pages + 1):  # always starts at page 1; edit the start by hand (e.g. 7) to fetch a later batch
        dataframe = fetch_and_process(api_key, secret, page)

        if dataframe is not None:
            # Save the DataFrame to a CSV file
            file_name = f"data_{page}.csv"
            dataframe.to_csv(file_name, index=False)
            print(f"Saved data to {file_name}")

            # Append the new DataFrame to the existing one
            if existing_df is None:
                existing_df = dataframe
            else:
                existing_df = pd.concat([existing_df, dataframe], ignore_index=True)

            # Introduce a delay to avoid overwhelming the API
            time.sleep(3)

    return existing_df
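The loop above always starts at page 1 unless the range is edited by hand between runs. A possible variant takes the first and last page as arguments, which makes batches such as pages 7 to 10 explicit. In the sketch below, `save_page_range` and its parameters are hypothetical names, and the page fetcher is passed in as a callable so the function can be tried without calling the API.

```python
import os
import time
import pandas as pd

def save_page_range(fetch_page, start_page, end_page, out_dir=".", delay_seconds=3):
    """Fetch pages start_page..end_page (inclusive), save each one, return them combined.

    fetch_page is any callable that takes a page number and returns a DataFrame
    or None, e.g. lambda page: fetch_and_process(api_key, secret, page).
    """
    frames = []
    for page in range(start_page, end_page + 1):
        dataframe = fetch_page(page)
        if dataframe is None:
            continue  # failed request: skip the page, as the loop above does
        dataframe.to_csv(os.path.join(out_dir, f"data_{page}.csv"), index=False)
        frames.append(dataframe)
        time.sleep(delay_seconds)  # pause between requests
    # Concatenate once at the end instead of growing a DataFrame inside the loop
    return pd.concat(frames, ignore_index=True) if frames else None
```

Collecting the pages in a list and concatenating once gives the same result as appending inside the loop, with less copying.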

### Actual data extraction

In [10]:
#Running the requests for pages 1-6. total_pages sets the last page to fetch (one csv per page, so 2 delivers 2 csv files).
total_pages = 6 #Note: there is a max of 100 requests, each returning up to 50 listings, so 90 pages give up to 50*90 = 4500 listings
data_1_6 = save_and_append_data(api_key, secret, total_pages)

Saved data to data_1.csv
Saved data to data_2.csv
Saved data to data_3.csv
Saved data to data_4.csv
Saved data to data_5.csv
Saved data to data_6.csv


In [12]:
data_1_6

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,103241997,https://img3.idealista.com/blur/WEB_LISTING/90...,ACCM,21,1350.0,flat,rent,55.0,1,2,...,True,False,False,True,False,False,False,False,False,False
1,101227123,https://img3.idealista.com/blur/WEB_LISTING/0/...,3d26340f0da112d7aafd,15,1743.0,studio,rent,25.0,0,1,...,True,False,False,True,False,False,False,False,False,False
2,458306,https://img3.idealista.com/blur/WEB_LISTING/0/...,,24,1595.0,flat,rent,98.0,2,2,...,False,False,False,True,False,False,False,False,False,False
3,99586539,https://img3.idealista.com/blur/WEB_LISTING/0/...,2392,22,2400.0,flat,rent,57.0,1,1,...,False,False,False,True,False,False,False,False,False,False
4,102075463,https://img3.idealista.com/blur/WEB_LISTING/0/...,120413,23,1488.0,flat,rent,42.0,1,1,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
295,102830411,https://img3.idealista.com/blur/WEB_LISTING/0/...,ALQGVPADDD,19,2600.0,flat,rent,60.0,1,1,...,False,False,False,True,False,False,False,False,False,False
296,98518740,https://img3.idealista.com/blur/WEB_LISTING/0/...,MAD-62,18,2450.0,flat,rent,69.0,1,1,...,True,False,False,True,False,False,False,False,False,False
297,103309760,https://img3.idealista.com/blur/WEB_LISTING/0/...,MS2811,18,1850.0,flat,rent,112.0,3,2,...,True,False,False,True,False,False,False,False,False,False
298,102749582,https://img3.idealista.com/blur/WEB_LISTING/0/...,T106255,20,1450.0,flat,rent,85.0,1,1,...,False,False,False,True,False,False,False,False,False,False


In [14]:
data_1_6['propertyCode'].value_counts(dropna = False)

95803117     2
103241997    1
92746017     1
103090467    1
101969400    1
            ..
103309202    1
87019564     1
98783453     1
100041672    1
103378800    1
Name: propertyCode, Length: 299, dtype: int64
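The counts above show that a `propertyCode` can appear more than once across pages. A minimal sketch of removing such repeats with `drop_duplicates` follows. The helper name `drop_repeated_listings` and the toy frame are made up for illustration, and keeping the first occurrence is just one possible choice.

```python
import pandas as pd

def drop_repeated_listings(df, key="propertyCode"):
    """Keep the first row per listing ID and report how many rows were dropped."""
    deduped = df.drop_duplicates(subset=key, keep="first").reset_index(drop=True)
    return deduped, len(df) - len(deduped)

# Toy frame with one repeated ID (values are invented)
demo = pd.DataFrame({"propertyCode": ["A", "B", "A"],
                     "price": [1000.0, 1500.0, 1000.0]})
clean, dropped = drop_repeated_listings(demo)
```

Applied to the combined dataframe, this would leave one row per listing before the data is passed on for analysis.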

In [16]:
data_1_6.to_csv("data_1_6.csv", index=False)

In [18]:
#Requesting pages 7-10 (the start of the range in save_and_append_data was set to 7 for this run)
total_pages = 10 #last page of this batch
data_7_10 = save_and_append_data(api_key, secret, total_pages)

Saved data to data_7.csv
Saved data to data_8.csv
Saved data to data_9.csv
Saved data to data_10.csv


In [19]:
data_7_10 

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,103378800,https://img3.idealista.com/blur/WEB_LISTING/0/...,0100538,25,1350.0,flat,rent,53.0,1,1,...,False,False,False,True,False,False,False,False,False,False
1,100089586,https://img3.idealista.com/blur/WEB_LISTING/0/...,APARTMENTODELUJO,51,14000.0,flat,rent,397.0,6,7,...,False,False,False,True,False,True,False,False,False,False
2,95705077,https://img3.idealista.com/blur/WEB_LISTING/0/...,Atocha XX,12,3375.0,flat,rent,148.0,2,2,...,False,False,False,True,False,False,False,False,False,False
3,101159872,https://img3.idealista.com/blur/WEB_LISTING/0/...,,17,3000.0,flat,rent,148.0,2,2,...,False,True,False,True,False,False,False,False,False,False
4,102628249,https://img3.idealista.com/blur/WEB_LISTING/0/...,83262623,30,4600.0,flat,rent,182.0,2,2,...,False,False,False,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,102075783,https://img3.idealista.com/blur/WEB_LISTING/0/...,120363,27,1437.0,flat,rent,25.0,2,1,...,False,False,False,False,False,False,False,False,False,False
196,97741062,https://img3.idealista.com/blur/WEB_LISTING/0/...,,20,1450.0,flat,rent,45.0,1,1,...,False,False,False,True,False,False,False,False,False,False
197,103301322,https://img3.idealista.com/blur/WEB_LISTING/0/...,3706,16,895.0,studio,rent,35.0,0,1,...,False,False,False,True,False,False,False,False,False,False
198,103210148,https://img3.idealista.com/blur/WEB_LISTING/0/...,San Carlos XII,19,2430.0,flat,rent,70.0,2,2,...,False,False,False,True,False,False,False,False,False,False


In [20]:
#combining up to 10
data_1_10 = pd.concat([data_1_6, data_7_10], ignore_index=True)
data_1_10

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,103241997,https://img3.idealista.com/blur/WEB_LISTING/90...,ACCM,21,1350.0,flat,rent,55.0,1,2,...,True,False,False,True,False,False,False,False,False,False
1,101227123,https://img3.idealista.com/blur/WEB_LISTING/0/...,3d26340f0da112d7aafd,15,1743.0,studio,rent,25.0,0,1,...,True,False,False,True,False,False,False,False,False,False
2,458306,https://img3.idealista.com/blur/WEB_LISTING/0/...,,24,1595.0,flat,rent,98.0,2,2,...,False,False,False,True,False,False,False,False,False,False
3,99586539,https://img3.idealista.com/blur/WEB_LISTING/0/...,2392,22,2400.0,flat,rent,57.0,1,1,...,False,False,False,True,False,False,False,False,False,False
4,102075463,https://img3.idealista.com/blur/WEB_LISTING/0/...,120413,23,1488.0,flat,rent,42.0,1,1,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,102075783,https://img3.idealista.com/blur/WEB_LISTING/0/...,120363,27,1437.0,flat,rent,25.0,2,1,...,False,False,False,False,False,False,False,False,False,False
496,97741062,https://img3.idealista.com/blur/WEB_LISTING/0/...,,20,1450.0,flat,rent,45.0,1,1,...,False,False,False,True,False,False,False,False,False,False
497,103301322,https://img3.idealista.com/blur/WEB_LISTING/0/...,3706,16,895.0,studio,rent,35.0,0,1,...,False,False,False,True,False,False,False,False,False,False
498,103210148,https://img3.idealista.com/blur/WEB_LISTING/0/...,San Carlos XII,19,2430.0,flat,rent,70.0,2,2,...,False,False,False,True,False,False,False,False,False,False


In [21]:
data_1_10['propertyCode'].value_counts(dropna = False)

95803117     2
103378800    2
103042271    1
90217156     1
86368798     1
            ..
103302065    1
103227713    1
102699555    1
92149907     1
103200352    1
Name: propertyCode, Length: 498, dtype: int64

In [22]:
#saving 1-10
data_1_10.to_csv("data_1_10.csv", index=False)
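Since every page is also saved as its own `data_<page>.csv` file, the combined dataframe can be rebuilt from disk if a session is lost. The sketch below shows one way to do this; `load_saved_pages` is a hypothetical helper, and it assumes the per-page files sit in a single folder and follow the naming used above.

```python
import glob
import os
import re
import pandas as pd

def load_saved_pages(folder="."):
    """Concatenate every data_<page>.csv in folder, in numeric page order."""
    pages = []
    for path in glob.glob(os.path.join(folder, "data_*.csv")):
        match = re.fullmatch(r"data_(\d+)\.csv", os.path.basename(path))
        if match:  # skips combined files such as data_1_6.csv
            pages.append((int(match.group(1)), path))
    # Sort by page number so that page 10 comes after page 2, not before it
    frames = [pd.read_csv(path) for _, path in sorted(pages)]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Sorting on the parsed page number avoids the alphabetical order that plain file listing would give.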

In [24]:
#Requesting pages 11-30 (the start of the range in save_and_append_data was set to 11 for this run)
total_pages = 30 #last page of this batch
data_11_30 = save_and_append_data(api_key, secret, total_pages)

Saved data to data_11.csv
Saved data to data_12.csv
Saved data to data_13.csv
Saved data to data_14.csv
Saved data to data_15.csv
Saved data to data_16.csv
Saved data to data_17.csv
Saved data to data_18.csv
Saved data to data_19.csv
Saved data to data_20.csv
Saved data to data_21.csv
Saved data to data_22.csv
Saved data to data_23.csv
Saved data to data_24.csv
Saved data to data_25.csv
Saved data to data_26.csv
Saved data to data_27.csv
Saved data to data_28.csv
Saved data to data_29.csv
Saved data to data_30.csv


In [25]:
data_11_30

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,102464810,https://img3.idealista.com/blur/WEB_LISTING/0/...,San Carlos XI,14,2430.0,penthouse,rent,82.0,2,1,...,False,False,False,True,False,False,False,False,False,False
1,98946080,https://img3.idealista.com/blur/WEB_LISTING/0/...,San Carlos II,13,2430.0,flat,rent,82.0,2,2,...,False,False,False,True,False,False,False,False,False,False
2,100760490,https://img3.idealista.com/blur/WEB_LISTING/0/...,San Carlos VIII,7,2295.0,flat,rent,63.0,2,2,...,False,False,False,True,False,False,False,False,False,False
3,102075824,https://img3.idealista.com/blur/WEB_LISTING/0/...,120377,35,2368.0,flat,rent,70.0,2,1,...,False,False,False,False,False,False,False,False,False,False
4,103252504,https://img3.idealista.com/blur/WEB_LISTING/0/...,,8,2200.0,flat,rent,130.0,3,2,...,False,False,False,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,100523409,https://img3.idealista.com/blur/WEB_LISTING/0/...,PEDRA,27,3725.0,flat,rent,137.0,4,4,...,True,False,True,True,False,False,False,False,False,False
996,98418830,https://img3.idealista.com/blur/WEB_LISTING/0/...,,18,1400.0,penthouse,rent,70.0,1,1,...,False,False,False,False,False,False,False,False,False,False
997,99543702,https://img3.idealista.com/blur/WEB_LISTING/0/...,,40,3200.0,flat,rent,144.0,4,2,...,True,False,False,True,False,False,False,False,False,False
998,103347090,https://img3.idealista.com/blur/WEB_LISTING/0/...,5231,17,2100.0,flat,rent,112.0,3,2,...,True,False,False,True,False,False,False,False,False,False


In [26]:
#combining up to 30
data_1_30 = pd.concat([data_1_10, data_11_30], ignore_index=True)
data_1_30

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,103241997,https://img3.idealista.com/blur/WEB_LISTING/90...,ACCM,21,1350.0,flat,rent,55.0,1,2,...,True,False,False,True,False,False,False,False,False,False
1,101227123,https://img3.idealista.com/blur/WEB_LISTING/0/...,3d26340f0da112d7aafd,15,1743.0,studio,rent,25.0,0,1,...,True,False,False,True,False,False,False,False,False,False
2,458306,https://img3.idealista.com/blur/WEB_LISTING/0/...,,24,1595.0,flat,rent,98.0,2,2,...,False,False,False,True,False,False,False,False,False,False
3,99586539,https://img3.idealista.com/blur/WEB_LISTING/0/...,2392,22,2400.0,flat,rent,57.0,1,1,...,False,False,False,True,False,False,False,False,False,False
4,102075463,https://img3.idealista.com/blur/WEB_LISTING/0/...,120413,23,1488.0,flat,rent,42.0,1,1,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1495,100523409,https://img3.idealista.com/blur/WEB_LISTING/0/...,PEDRA,27,3725.0,flat,rent,137.0,4,4,...,True,False,True,True,False,False,False,False,False,False
1496,98418830,https://img3.idealista.com/blur/WEB_LISTING/0/...,,18,1400.0,penthouse,rent,70.0,1,1,...,False,False,False,False,False,False,False,False,False,False
1497,99543702,https://img3.idealista.com/blur/WEB_LISTING/0/...,,40,3200.0,flat,rent,144.0,4,2,...,True,False,False,True,False,False,False,False,False,False
1498,103347090,https://img3.idealista.com/blur/WEB_LISTING/0/...,5231,17,2100.0,flat,rent,112.0,3,2,...,True,False,False,True,False,False,False,False,False,False


In [27]:
data_1_30['propertyCode'].value_counts(dropna = False)

95803117     2
100674158    2
100686146    2
101171581    2
100684631    2
            ..
103210148    1
103301322    1
97741062     1
102075783    1
102070695    1
Name: propertyCode, Length: 1494, dtype: int64

In [30]:
#Requesting pages 31-60 (the start of the range in save_and_append_data was set to 31 for this run)
total_pages = 60 #last page of this batch
data_31_60 = save_and_append_data(api_key, secret, total_pages)

Saved data to data_31.csv
Saved data to data_32.csv
Saved data to data_33.csv
Saved data to data_34.csv
Saved data to data_35.csv
Saved data to data_36.csv
Saved data to data_37.csv
Saved data to data_38.csv
Saved data to data_39.csv
Saved data to data_40.csv
Saved data to data_41.csv
Saved data to data_42.csv
Saved data to data_43.csv
Saved data to data_44.csv
Saved data to data_45.csv
Saved data to data_46.csv
Saved data to data_47.csv
Saved data to data_48.csv
Saved data to data_49.csv
Saved data to data_50.csv
Saved data to data_51.csv
Saved data to data_52.csv
Saved data to data_53.csv
Saved data to data_54.csv
Saved data to data_55.csv
Saved data to data_56.csv
Saved data to data_57.csv
Saved data to data_58.csv
Saved data to data_59.csv
Saved data to data_60.csv


In [31]:
data_31_60

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,102995748,https://img3.idealista.com/blur/WEB_LISTING/0/...,3577,27,2650.0,flat,rent,130.0,3,2,...,False,False,False,False,False,False,False,False,False,False
1,103368831,https://img3.idealista.com/blur/WEB_LISTING/0/...,Ref. 50012,52,4850.0,flat,rent,176.0,3,3,...,False,False,False,True,False,False,False,False,False,False
2,103390455,https://img3.idealista.com/blur/WEB_LISTING/0/...,Ref. 176711,71,5000.0,flat,rent,150.0,3,2,...,False,False,False,True,False,False,False,False,False,False
3,102946497,https://img3.idealista.com/blur/WEB_LISTING/0/...,106961,5,2600.0,flat,rent,75.0,2,2,...,False,False,False,True,False,False,False,False,False,False
4,103380562,https://img3.idealista.com/blur/WEB_LISTING/0/...,153,21,975.0,flat,rent,35.0,1,1,...,False,False,False,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1495,100353052,https://img3.idealista.com/blur/WEB_LISTING/0/...,Hermosilla XLVIII,12,2565.0,flat,rent,60.0,2,2,...,False,False,False,False,False,False,False,False,False,False
1496,103341399,https://img3.idealista.com/blur/WEB_LISTING/0/...,,31,1329.0,flat,rent,75.0,2,2,...,False,False,False,True,False,False,False,False,False,False
1497,103328055,https://img3.idealista.com/blur/WEB_LISTING/90...,A58,12,1000.0,flat,rent,60.0,2,1,...,False,False,False,False,False,False,False,False,False,False
1498,98332469,https://img3.idealista.com/blur/WEB_LISTING/0/...,OPALA,28,2130.0,flat,rent,75.0,1,2,...,True,False,True,True,False,False,False,False,False,False


In [32]:
#combining up to 60
data_1_60 = pd.concat([data_1_30, data_31_60], ignore_index=True)
data_1_60

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,103241997,https://img3.idealista.com/blur/WEB_LISTING/90...,ACCM,21,1350.0,flat,rent,55.0,1,2,...,True,False,False,True,False,False,False,False,False,False
1,101227123,https://img3.idealista.com/blur/WEB_LISTING/0/...,3d26340f0da112d7aafd,15,1743.0,studio,rent,25.0,0,1,...,True,False,False,True,False,False,False,False,False,False
2,458306,https://img3.idealista.com/blur/WEB_LISTING/0/...,,24,1595.0,flat,rent,98.0,2,2,...,False,False,False,True,False,False,False,False,False,False
3,99586539,https://img3.idealista.com/blur/WEB_LISTING/0/...,2392,22,2400.0,flat,rent,57.0,1,1,...,False,False,False,True,False,False,False,False,False,False
4,102075463,https://img3.idealista.com/blur/WEB_LISTING/0/...,120413,23,1488.0,flat,rent,42.0,1,1,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2995,100353052,https://img3.idealista.com/blur/WEB_LISTING/0/...,Hermosilla XLVIII,12,2565.0,flat,rent,60.0,2,2,...,False,False,False,False,False,False,False,False,False,False
2996,103341399,https://img3.idealista.com/blur/WEB_LISTING/0/...,,31,1329.0,flat,rent,75.0,2,2,...,False,False,False,True,False,False,False,False,False,False
2997,103328055,https://img3.idealista.com/blur/WEB_LISTING/90...,A58,12,1000.0,flat,rent,60.0,2,1,...,False,False,False,False,False,False,False,False,False,False
2998,98332469,https://img3.idealista.com/blur/WEB_LISTING/0/...,OPALA,28,2130.0,flat,rent,75.0,1,2,...,True,False,True,True,False,False,False,False,False,False


In [33]:
data_1_60['propertyCode'].value_counts(dropna = False)

101171581    2
100674158    2
100686146    2
100686270    2
100684631    2
            ..
101575429    1
101575418    1
102075586    1
103166352    1
98763852     1
Name: propertyCode, Length: 2993, dtype: int64

In [37]:
#Requesting pages 61-85 (the start of the range in save_and_append_data was set to 61 for this run)
total_pages = 85 #last page of this batch
data_61_85 = save_and_append_data(api_key, secret, total_pages)

Saved data to data_61.csv
Saved data to data_62.csv
Saved data to data_63.csv
Saved data to data_64.csv
Saved data to data_65.csv
Saved data to data_66.csv
Saved data to data_67.csv
Saved data to data_68.csv
Saved data to data_69.csv
Saved data to data_70.csv
Saved data to data_71.csv
Saved data to data_72.csv
Saved data to data_73.csv
Saved data to data_74.csv
Saved data to data_75.csv
Saved data to data_76.csv
Saved data to data_77.csv
Saved data to data_78.csv
Saved data to data_79.csv
Saved data to data_80.csv
Saved data to data_81.csv
Saved data to data_82.csv
Saved data to data_83.csv
Saved data to data_84.csv
Saved data to data_85.csv


In [38]:
data_61_85

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,103238500,https://img3.idealista.com/blur/WEB_LISTING/0/...,,28,1050.0,flat,rent,35.0,1,1,...,False,False,False,False,False,False,False,False,False,False
1,103367615,https://img3.idealista.com/blur/WEB_LISTING/0/...,2801-018534-02,11,790.0,flat,rent,50.0,2,1,...,False,False,False,False,False,False,False,False,False,False
2,103319969,https://img3.idealista.com/blur/WEB_LISTING/0/...,,34,3850.0,flat,rent,240.0,3,3,...,False,False,False,True,False,False,False,False,False,False
3,103048227,https://img3.idealista.com/blur/WEB_LISTING/0/...,L2207MA,21,2200.0,flat,rent,98.0,2,2,...,True,False,False,True,False,False,False,False,False,False
4,98428762,https://img3.idealista.com/blur/WEB_LISTING/0/...,LIKOU,26,3175.0,flat,rent,133.0,3,2,...,True,False,True,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1245,103059096,https://img3.idealista.com/blur/WEB_LISTING/0/...,107046,11,2400.0,flat,rent,65.0,2,1,...,False,False,False,True,False,False,False,False,False,False
1246,103138399,https://img3.idealista.com/blur/WEB_LISTING/0/...,W-02TWVU,29,1595.0,flat,rent,82.0,2,2,...,False,False,False,True,False,False,False,False,False,False
1247,103015173,https://img3.idealista.com/blur/WEB_LISTING/0/...,2815-028509-01,16,790.0,studio,rent,27.0,0,1,...,False,False,False,True,False,False,False,False,False,False
1248,103379718,https://img3.idealista.com/blur/WEB_LISTING/0/...,Castillesas,50,2640.0,flat,rent,99.0,3,2,...,False,False,False,True,False,False,False,False,False,False


In [40]:
#combining up to 85
data_1_85 = pd.concat([data_1_60, data_61_85], ignore_index=True)
data_1_85

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,103241997,https://img3.idealista.com/blur/WEB_LISTING/90...,ACCM,21,1350.0,flat,rent,55.0,1,2,...,True,False,False,True,False,False,False,False,False,False
1,101227123,https://img3.idealista.com/blur/WEB_LISTING/0/...,3d26340f0da112d7aafd,15,1743.0,studio,rent,25.0,0,1,...,True,False,False,True,False,False,False,False,False,False
2,458306,https://img3.idealista.com/blur/WEB_LISTING/0/...,,24,1595.0,flat,rent,98.0,2,2,...,False,False,False,True,False,False,False,False,False,False
3,99586539,https://img3.idealista.com/blur/WEB_LISTING/0/...,2392,22,2400.0,flat,rent,57.0,1,1,...,False,False,False,True,False,False,False,False,False,False
4,102075463,https://img3.idealista.com/blur/WEB_LISTING/0/...,120413,23,1488.0,flat,rent,42.0,1,1,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4245,103059096,https://img3.idealista.com/blur/WEB_LISTING/0/...,107046,11,2400.0,flat,rent,65.0,2,1,...,False,False,False,True,False,False,False,False,False,False
4246,103138399,https://img3.idealista.com/blur/WEB_LISTING/0/...,W-02TWVU,29,1595.0,flat,rent,82.0,2,2,...,False,False,False,True,False,False,False,False,False,False
4247,103015173,https://img3.idealista.com/blur/WEB_LISTING/0/...,2815-028509-01,16,790.0,studio,rent,27.0,0,1,...,False,False,False,True,False,False,False,False,False,False
4248,103379718,https://img3.idealista.com/blur/WEB_LISTING/0/...,Castillesas,50,2640.0,flat,rent,99.0,3,2,...,False,False,False,True,False,False,False,False,False,False


In [41]:
# Count how often each propertyCode occurs; a count above 1 signals a repeated listing
data_1_85['propertyCode'].value_counts(dropna=False)

95803117     2
100686270    2
100684631    2
101171581    2
103378800    2
            ..
103255150    1
100012656    1
101786655    1
103259102    1
103406517    1
Name: propertyCode, Length: 4243, dtype: int64
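
The counts above show that a few `propertyCode` values appear twice in the combined data. A minimal sketch of how such rows could be inspected with pandas follows. The small `listings` frame is made-up illustration data (only the first two codes are borrowed from the output above, and the prices are invented); it is not the notebook's actual `data_1_85`.

```python
import pandas as pd

# Toy stand-in for the combined listings (hypothetical prices).
listings = pd.DataFrame(
    {
        "propertyCode": [95803117, 100686270, 95803117, 103255150, 100686270],
        "price": [1200.0, 1500.0, 1200.0, 900.0, 1500.0],
    }
)

# Number of occurrences per code, as in the cell above.
counts = listings["propertyCode"].value_counts(dropna=False)

# Codes that occur more than once.
repeated_codes = sorted(counts[counts > 1].index.tolist())

# Every row involved in a repetition (keep=False marks all copies).
duplicated_rows = listings[listings.duplicated(subset="propertyCode", keep=False)]

# Rows beyond the first occurrence of each code, i.e. the surplus rows.
n_surplus = int(listings.duplicated(subset="propertyCode").sum())

print(repeated_codes)
print(len(duplicated_rows), n_surplus)
```

Looking at the full duplicated rows, rather than only the counts, helps to tell whether the copies are identical or whether a listing changed between two requests.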

In [44]:
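# Request pages 86-90
# Note: each page is saved to its own CSV file (e.g. total_pages = 2 delivers 2 CSV files).
# There is also a maximum of 100 requests, each returning up to 50 listings,
# so 90 pages yield at most 50 * 90 = 4500 listings.
total_pages = 90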
data_86_90 = save_and_append_data(api_key, secret, total_pages)

Saved data to data_86.csv
Saved data to data_87.csv
Saved data to data_88.csv
Saved data to data_89.csv
Saved data to data_90.csv
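
The batches in this notebook (pages 1-60, 61-85 and 86-90) each leave one `data_<page>.csv` file per page. As a sketch only, the helper below shows one way to work out which pages still need to be requested from the files already on disk. `pages_still_needed` and its arguments are hypothetical names, and this is not necessarily how `save_and_append_data` decides where to start.

```python
import tempfile
from pathlib import Path

LISTINGS_PER_PAGE = 50  # upper bound per request, as stated in the notebook


def pages_still_needed(total_pages, folder):
    """Return page numbers in 1..total_pages with no data_<page>.csv in folder."""
    folder = Path(folder)
    return [
        page
        for page in range(1, total_pages + 1)
        if not (folder / f"data_{page}.csv").exists()
    ]


# Demonstration in a temporary folder: pretend pages 1-85 were already saved.
with tempfile.TemporaryDirectory() as tmp:
    for page in range(1, 86):
        (Path(tmp) / f"data_{page}.csv").touch()
    missing = pages_still_needed(90, tmp)

print(missing)                           # [86, 87, 88, 89, 90]
print(len(missing) * LISTINGS_PER_PAGE)  # 250, the most listings this batch can add
```

Deriving the remaining pages from the saved files keeps repeated runs from spending requests on pages that were already downloaded, which matters given the request cap.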


In [45]:
data_86_90

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,102526021,https://img3.idealista.com/blur/WEB_LISTING/0/...,SV0010472,16,1800.0,flat,rent,80.0,4,1,...,False,False,False,False,False,False,False,False,False,False
1,102893493,https://img3.idealista.com/blur/WEB_LISTING/0/...,c443c9b863767f0facd1,13,2128.0,flat,rent,78.0,2,1,...,False,False,False,False,False,False,False,False,False,False
2,25488178,https://img3.idealista.com/blur/WEB_LISTING/0/...,HD15 2-77,57,5900.0,flat,rent,360.0,5,5,...,True,False,False,True,False,True,False,False,False,False
3,103288044,https://img3.idealista.com/blur/WEB_LISTING/0/...,,14,2299.0,flat,rent,85.0,2,2,...,False,False,False,True,False,False,False,False,False,False
4,103030201,https://img3.idealista.com/blur/WEB_LISTING/0/...,20231024-FV,24,6000.0,flat,rent,384.0,6,5,...,False,False,False,True,False,True,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
245,100627014,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline VI,13,2160.0,flat,rent,66.0,2,2,...,False,False,False,True,False,False,False,False,False,False
246,100930829,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline V,16,2160.0,flat,rent,55.0,1,1,...,False,False,False,True,False,False,False,False,False,False
247,100097037,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline I,11,2700.0,flat,rent,85.0,2,2,...,False,False,False,True,False,False,False,False,False,False
248,100931418,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline VIII,13,2160.0,flat,rent,50.0,1,1,...,False,False,False,True,False,False,False,False,False,False


In [46]:
# Combine pages 1-85 with pages 86-90 into a single DataFrame
data_1_90 = pd.concat([data_1_85, data_86_90], ignore_index=True)
data_1_90

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,103241997,https://img3.idealista.com/blur/WEB_LISTING/90...,ACCM,21,1350.0,flat,rent,55.0,1,2,...,True,False,False,True,False,False,False,False,False,False
1,101227123,https://img3.idealista.com/blur/WEB_LISTING/0/...,3d26340f0da112d7aafd,15,1743.0,studio,rent,25.0,0,1,...,True,False,False,True,False,False,False,False,False,False
2,458306,https://img3.idealista.com/blur/WEB_LISTING/0/...,,24,1595.0,flat,rent,98.0,2,2,...,False,False,False,True,False,False,False,False,False,False
3,99586539,https://img3.idealista.com/blur/WEB_LISTING/0/...,2392,22,2400.0,flat,rent,57.0,1,1,...,False,False,False,True,False,False,False,False,False,False
4,102075463,https://img3.idealista.com/blur/WEB_LISTING/0/...,120413,23,1488.0,flat,rent,42.0,1,1,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4495,100627014,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline VI,13,2160.0,flat,rent,66.0,2,2,...,False,False,False,True,False,False,False,False,False,False
4496,100930829,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline V,16,2160.0,flat,rent,55.0,1,1,...,False,False,False,True,False,False,False,False,False,False
4497,100097037,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline I,11,2700.0,flat,rent,85.0,2,2,...,False,False,False,True,False,False,False,False,False,False
4498,100931418,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline VIII,13,2160.0,flat,rent,50.0,1,1,...,False,False,False,True,False,False,False,False,False,False


In [47]:
# Re-check for repeated propertyCode values after adding pages 86-90
data_1_90['propertyCode'].value_counts(dropna=False)

95803117     2
100686270    2
100684631    2
101171581    2
100686146    2
            ..
100592666    1
102810920    1
101755619    1
103303152    1
100426560    1
Name: propertyCode, Length: 4493, dtype: int64
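
Because some codes occur twice, a later step will probably need exactly one row per listing. The sketch below applies `DataFrame.drop_duplicates` to a made-up frame. Keeping the first occurrence is an assumption made for illustration, not a choice documented in this notebook.

```python
import pandas as pd

# Made-up data: codes 1 and 2 each appear twice.
combined = pd.DataFrame(
    {
        "propertyCode": [1, 2, 2, 3, 1],
        "price": [1350.0, 1743.0, 1743.0, 1595.0, 1350.0],
    }
)

# Keep the first row seen for each code and rebuild a clean 0..n-1 index.
deduplicated = (
    combined.drop_duplicates(subset="propertyCode", keep="first")
    .reset_index(drop=True)
)

# Sanity check: every code now occurs exactly once.
assert deduplicated["propertyCode"].is_unique
print(len(combined), len(deduplicated))
```

Since listings are sorted by distance from Puerta del Sol, the same property can plausibly shift between pages while requests are being made, which is one possible source of such repeats.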

In [48]:
# Save the combined listings for pages 1-90 to CSV, without writing the DataFrame index
data_1_90.to_csv("data_1_90.csv", index=False)
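
The `Unnamed: 0` column visible in the tables above is what pandas produces when a CSV written with its index is read back without `index_col`. Whether that is what happened to the per-page files here is an assumption, since the saving code is not shown in this part of the notebook. The sketch below uses an in-memory buffer and made-up data to show the effect and why `index=False` avoids it.

```python
import io

import pandas as pd

df = pd.DataFrame({"propertyCode": [101, 102], "price": [1350.0, 1743.0]})

# Default to_csv writes the index as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False writes only the data columns.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'propertyCode', 'price']
print(cols_without_index)  # ['propertyCode', 'price']
```

Note that `index=False` only prevents a new index column from being written; a column already named `Unnamed: 0` in the frame is still saved unless it is dropped first.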

### Further steps continue in a separate notebook: Rental_price_barometer2_EDA_modelling