# Reading Rent Flats

This dataset is composed of multiple files, where each file corresponds to the set of rental flats in one capital in Spain. For instance, we have the rental flats in Lleida provided by Idealista, and we can see different features such as the price, the location, etc.

All the dataset files share a single format (JSON), because Idealista returns its information in JSON.

In [1]:
import json
from pathlib import Path

import pandas as pd

## Reading JSON files

The files in this dataset are not *pure* JSON; they use the `JSON Lines` text format \[1\], also known as newline-delimited JSON. JSON Lines is a convenient format for storing structured data that may be processed one record at a time (which seems pretty handy for flats data).
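
As a minimal sketch of what record-at-a-time reading looks like, the helper below parses a JSON Lines file one line at a time with the standard library. The name `read_json_lines` and the small sample file it writes are assumptions made for illustration, not part of the dataset. Keep in mind that `json.load` expects a single JSON document, so it only works on a file that holds one JSON value (for example, one array of flats) rather than one record per line.

```python
import json
import tempfile
from pathlib import Path


def read_json_lines(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


# Hypothetical two-record file, written only to demonstrate the reader.
sample_file = Path(tempfile.mkdtemp()) / 'sample_rent.jsonl'
sample_file.write_text(
    '{"propertyCode": "1", "price": 550.0}\n'
    '{"propertyCode": "2", "price": 650.0}\n',
    encoding='utf-8',
)

records = list(read_json_lines(sample_file))
print(len(records))
```

pandas can also read this layout directly with `pd.read_json(path, lines=True)`.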

The data will be read from the `data/raw` directory.



In [15]:
DATA_PATH = Path('../data/raw')

Once the path is set, we can load a JSON file, for example the rental flats of Lleida.

In [18]:
lleida_flats = DATA_PATH / 'Lleida_rent.json'

with open(lleida_flats) as f:
    data = json.load(f)

print(data[:1])

[{'propertyCode': '86254994', 'thumbnail': 'https://img3.idealista.com/blur/WEB_LISTING/0/id.pro.es.image.master/18/ef/f8/782172025.jpg', 'externalReference': '1587', 'numPhotos': 10, 'price': 550.0, 'propertyType': 'flat', 'operation': 'rent', 'size': 105.0, 'exterior': True, 'rooms': 3, 'bathrooms': 2, 'address': 'Avenida de Rosa Parks', 'province': 'Lleida', 'municipality': 'Lleida', 'district': 'Balafia', 'country': 'es', 'latitude': 41.6281639, 'longitude': 0.6294471, 'showAddress': False, 'url': 'https://www.idealista.com/inmueble/86254994/', 'distance': '2495', 'hasVideo': False, 'newDevelopment': False, 'hasLift': True, 'priceByArea': 5.0, 'detailedType': {'typology': 'flat'}, 'suggestedTexts': {'subtitle': 'Balafia, Lleida', 'title': 'Piso en Avenida de Rosa Parks'}, 'hasPlan': False, 'has3DTour': False, 'has360': False, 'topNewDevelopment': False}]


Now, we are going to load them as a `pandas.DataFrame`.

In [19]:
df = pd.DataFrame(data)
df.head()

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,exterior,rooms,...,priceByArea,detailedType,suggestedTexts,hasPlan,has3DTour,has360,topNewDevelopment,floor,status,parkingSpace
0,86254994,https://img3.idealista.com/blur/WEB_LISTING/0/...,1587,10,550.0,flat,rent,105.0,True,3,...,5.0,{'typology': 'flat'},"{'subtitle': 'Balafia, Lleida', 'title': 'Piso...",False,False,False,False,,,
1,90045751,https://img3.idealista.com/blur/WEB_LISTING/0/...,,14,650.0,flat,rent,95.0,True,4,...,7.0,{'typology': 'flat'},"{'subtitle': 'Universitat, Lleida', 'title': '...",False,False,False,False,7.0,good,"{'hasParkingSpace': True, 'isParkingSpaceInclu..."
2,89446908,https://img3.idealista.com/blur/WEB_LISTING/0/...,1029,27,640.0,flat,rent,96.0,True,3,...,7.0,{'typology': 'flat'},"{'subtitle': 'Cappont, Lleida', 'title': 'Piso...",False,False,False,False,1.0,good,
3,90027477,https://img3.idealista.com/blur/WEB_LISTING/0/...,SRB_ALTA_SRB_ALTAMIRA_24749,11,550.0,flat,rent,109.0,False,3,...,5.0,{'typology': 'flat'},"{'subtitle': 'Princep de Viana-Clot, Lleida', ...",False,False,False,False,,,
4,90009506,https://img3.idealista.com/blur/WEB_LISTING/0/...,14400,7,700.0,flat,rent,99.0,True,4,...,7.0,{'typology': 'flat'},"{'subtitle': 'Centre Històric, Lleida', 'title...",False,False,False,False,1.0,good,
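
Some columns, such as `detailedType`, `suggestedTexts` and `parkingSpace`, hold nested dictionaries rather than scalar values. One possible way to handle them, sketched here with a hypothetical helper `flatten_record`, is to expand nested keys into dotted names before building the DataFrame. The sample record is a shortened copy of the first Lleida flat printed earlier.

```python
def flatten_record(record, parent_key='', sep='.'):
    """Return a copy of `record` with nested dicts expanded into dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Shortened version of the first record shown above.
sample_record = {
    'propertyCode': '86254994',
    'price': 550.0,
    'detailedType': {'typology': 'flat'},
    'suggestedTexts': {
        'subtitle': 'Balafia, Lleida',
        'title': 'Piso en Avenida de Rosa Parks',
    },
}

flat_record = flatten_record(sample_record)
print(sorted(flat_record))
```

`pd.json_normalize(data)` performs a similar expansion on a whole list of records.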


We can merge the content from the different cities in Spain into a single DataFrame.

In [26]:
import glob

file_list = glob.glob('../data/raw/*.json')

# Map a 1-based index to each file path
allFilesDict = {i: path for i, path in enumerate(file_list, 1)}

allFilesDict

{1: '../data/raw/Barna_rent.json',
 2: '../data/raw/Girona_rent.json',
 3: '../data/raw/Lleida_rent.json',
 4: '../data/raw/Valencia_rent.json',
 5: '../data/raw/Madrid_rent.json',
 6: '../data/raw/Mallorca_rent.json',
 7: '../data/raw/Zaragoza_rent.json',
 8: '../data/raw/Malaga_rent.json',
 9: '../data/raw/Tarragona_rent.json'}
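
The numeric keys carry no meaning beyond the listing order, and `glob.glob` does not guarantee a sorted result. If a more descriptive key is useful, one option is to derive the city name from each file name. The helper `city_from_path` is hypothetical and assumes every file follows the `<City>_rent.json` pattern seen in the listing.

```python
from pathlib import Path


def city_from_path(path):
    """Return the part of the file name before the final '_' (e.g. 'Lleida')."""
    return Path(path).stem.rsplit('_', 1)[0]


# Two of the paths from the listing, used here as plain strings.
paths = ['../data/raw/Barna_rent.json', '../data/raw/Lleida_rent.json']

# Sorting the paths first makes the iteration order reproducible.
cities = {city_from_path(p): p for p in sorted(paths)}
print(cities)
```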

In [28]:
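data = []

# Each file holds a list of flats. append() stores that whole list as ONE
# element, so the DataFrame below has one row per file (see the output).
for path in allFilesDict.values():
    with open(path, 'r') as d:
        jdata = json.load(d)
        if jdata:  # skip empty files
            data.append(jdata)

df = pd.DataFrame(data)
df.head()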

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2490,2491,2492,2493,2494,2495,2496,2497,2498,2499
0,"{'propertyCode': '88707408', 'thumbnail': 'htt...","{'propertyCode': '90050445', 'thumbnail': 'htt...","{'propertyCode': '89790054', 'thumbnail': 'htt...","{'propertyCode': '89813394', 'thumbnail': 'htt...","{'propertyCode': '90050238', 'thumbnail': 'htt...","{'propertyCode': '90050078', 'thumbnail': 'htt...","{'propertyCode': '39196118', 'thumbnail': 'htt...","{'propertyCode': '81451487', 'thumbnail': 'htt...","{'propertyCode': '90049959', 'thumbnail': 'htt...","{'propertyCode': '89209350', 'thumbnail': 'htt...",...,"{'propertyCode': '88129920', 'thumbnail': 'htt...","{'propertyCode': '39458483', 'thumbnail': 'htt...","{'propertyCode': '87958150', 'thumbnail': 'htt...","{'propertyCode': '87957563', 'thumbnail': 'htt...","{'propertyCode': '87755061', 'thumbnail': 'htt...","{'propertyCode': '87754964', 'thumbnail': 'htt...","{'propertyCode': '87733061', 'thumbnail': 'htt...","{'propertyCode': '87462782', 'thumbnail': 'htt...","{'propertyCode': '87303659', 'thumbnail': 'htt...","{'propertyCode': '86979694', 'thumbnail': 'htt..."
1,"{'propertyCode': '90047939', 'thumbnail': 'htt...","{'propertyCode': '90045373', 'thumbnail': 'htt...","{'propertyCode': '90043757', 'thumbnail': 'htt...","{'propertyCode': '38453848', 'thumbnail': 'htt...","{'propertyCode': '86398762', 'thumbnail': 'htt...","{'propertyCode': '89812878', 'thumbnail': 'htt...","{'propertyCode': '90036228', 'thumbnail': 'htt...","{'propertyCode': '90036014', 'thumbnail': 'htt...","{'propertyCode': '27300814', 'thumbnail': 'htt...","{'propertyCode': '90019463', 'thumbnail': 'htt...",...,,,,,,,,,,
2,"{'propertyCode': '86254994', 'thumbnail': 'htt...","{'propertyCode': '90045751', 'thumbnail': 'htt...","{'propertyCode': '89446908', 'thumbnail': 'htt...","{'propertyCode': '90027477', 'thumbnail': 'htt...","{'propertyCode': '90009506', 'thumbnail': 'htt...","{'propertyCode': '90003319', 'thumbnail': 'htt...","{'propertyCode': '88121348', 'thumbnail': 'htt...","{'propertyCode': '89912836', 'thumbnail': 'htt...","{'propertyCode': '89976558', 'thumbnail': 'htt...","{'propertyCode': '89944817', 'thumbnail': 'htt...",...,,,,,,,,,,
3,"{'propertyCode': '90050626', 'thumbnail': 'htt...","{'propertyCode': '26385635', 'thumbnail': 'htt...","{'propertyCode': '90050455', 'thumbnail': 'htt...","{'propertyCode': '86825567', 'thumbnail': 'htt...","{'propertyCode': '90050143', 'thumbnail': 'htt...","{'propertyCode': '81941300', 'thumbnail': 'htt...","{'propertyCode': '90050102', 'thumbnail': 'htt...","{'propertyCode': '90049883', 'thumbnail': 'htt...","{'propertyCode': '90049882', 'thumbnail': 'htt...","{'propertyCode': '84752845', 'thumbnail': 'htt...",...,,,,,,,,,,
4,"{'propertyCode': '90050942', 'thumbnail': 'htt...","{'propertyCode': '1196489', 'thumbnail': 'http...","{'propertyCode': '88852928', 'thumbnail': 'htt...","{'propertyCode': '85712217', 'thumbnail': 'htt...","{'propertyCode': '37964690', 'thumbnail': 'htt...","{'propertyCode': '88333780', 'thumbnail': 'htt...","{'propertyCode': '30076038', 'thumbnail': 'htt...","{'propertyCode': '29827272', 'thumbnail': 'htt...","{'propertyCode': '82439014', 'thumbnail': 'htt...","{'propertyCode': '28725706', 'thumbnail': 'htt...",...,,,,,,,,,,
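
In the output above, each row is a whole file and each column is a position within that file's list, so shorter files are padded with missing values. That happens because every file's list of flats was added to `data` as a single element. If the goal is one row per flat, a possible approach is to flatten the lists with `list.extend`. The sketch below uses a hypothetical helper `load_all_records` and two small temporary files so that it can run on its own; the `source_file` field is also an assumption, added to keep track of where each record came from.

```python
import json
import tempfile
from pathlib import Path


def load_all_records(paths):
    """Read every JSON file (each a list of dicts) into one flat list."""
    records = []
    for path in paths:
        with open(path, encoding='utf-8') as f:
            flats = json.load(f)
        name = Path(path).name
        # extend() adds the flats individually instead of nesting the list.
        records.extend({**flat, 'source_file': name} for flat in flats)
    return records


# Hypothetical stand-ins for the real city files.
tmp_dir = Path(tempfile.mkdtemp())
demo_paths = []
for name, flats in {
    'A_rent.json': [{'propertyCode': '1'}, {'propertyCode': '2'}],
    'B_rent.json': [{'propertyCode': '3'}],
}.items():
    demo_path = tmp_dir / name
    demo_path.write_text(json.dumps(flats), encoding='utf-8')
    demo_paths.append(demo_path)

all_records = load_all_records(demo_paths)
print(len(all_records))
```

Passing `all_records` to `pd.DataFrame` would then give one row per flat, with the same columns as the single-city frame plus `source_file`.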
