Week 02 (W47 Nov16) London RE
When requesting data from the Zoopla API, we found that page numbers above 100 return status code 400 and cannot be processed. At 100 results per page, this meant we could only fetch the first 10,000 properties. As a workaround, we added the postcode parameter to the URL and now request results per postcode. The list of all London postcodes can be found on Wikipedia. This should allow us to get all the data we need.
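The per-postcode workaround can be sketched roughly as follows. This is a minimal illustration, not the script we actually ran: the endpoint URL, the parameter names (`postcode`, `page_number`, `page_size`, `listing_status`, `api_key`) and the helper names `build_url` and `fetch_all` are assumptions, and the HTTP call is passed in as a `fetch_page` function so the paging logic can be checked without network access.

```python
from typing import Callable, Dict, Iterable, Iterator, List
from urllib.parse import urlencode

# Assumed values for illustration; check them against the Zoopla API docs.
BASE_URL = "https://api.zoopla.co.uk/api/v1/property_listings.json"
PAGE_SIZE = 100   # results requested per page
MAX_PAGE = 100    # page numbers above this came back with status code 400


def build_url(postcode: str, page_number: int, api_key: str) -> str:
    """Build the request URL for one page of rental listings in one postcode."""
    params = {
        "postcode": postcode,
        "listing_status": "rent",
        "page_size": PAGE_SIZE,
        "page_number": page_number,
        "api_key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_all(
    postcodes: Iterable[str],
    fetch_page: Callable[[str, int], List[Dict]],
) -> Iterator[Dict]:
    """Yield every listing, paging through each postcode separately.

    fetch_page(postcode, page_number) is a hypothetical callable that
    performs the HTTP request and returns the listings of that page.
    """
    for postcode in postcodes:
        for page_number in range(1, MAX_PAGE + 1):
            listings = fetch_page(postcode, page_number)
            yield from listings
            # A short (or empty) page means this postcode is exhausted.
            if len(listings) < PAGE_SIZE:
                break
```

Because the page limit applies per query, each postcode can return up to 10,000 listings on its own, which should be enough as long as no single postcode exceeds that number.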
The final dataset can be found here: https://drive.google.com/open?id=0B2WhEEEx5z8zcmt6dXViTS11Sk0
The dataset contains 55,313 property instances with 35 attributes for each entry (there are almost no empty cells in the dataset). The attributes are the following:
- agent_address
- agent_logo
- agent_name
- agent_phone
- category (note: all properties are 'residential' and no 'commercial')
- country (note: all 'England')
- country_code (note: all 'gb')
- county (note: all 'London')
- latitude
- longitude
- outcode
- post_town (note: all 'London')
- street_name


- available_from_date
- description
- short_description
- details_url
- displayable_address
- letting_fees
- listing_status (note: all 'rent' and no 'sale')
- num_bathrooms
- num_bedrooms
- num_floors
- num_recepts
- price (price per week)
- rental_prices_accurate
- rental_prices_per_month
- rental_prices_per_week
- property_type
- status
- first_published_date
- last_published_date
- property_report_url
- The textual descriptions were provided in HTML format. Since the tags serve no purpose for the data mining, the data was cleaned and all HTML tags were removed. This leaves less content to process, improves readability, and lowers the chance of errors caused by irrelevant text.
- Around 1,700 agents/offices in total were responsible for all of the listings in the dataset.
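
The removal of HTML tags from the description fields could look roughly like the sketch below. It uses only Python's standard-library `html.parser` and is an illustration under assumptions, not necessarily the cleanup code that was used; the helper name `strip_html` and the sample text are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data: str) -> None:
        self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with a space so text on either side of a tag does not
        # run together, then collapse repeated whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html(raw: str) -> str:
    """Return the plain text of an HTML description (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return parser.text()


sample = "<p>Bright <b>two-bed</b> flat</p><br/>Near the tube &amp; shops."
print(strip_html(sample))
```

Joining the text chunks with a space is a deliberate trade-off: it keeps words from adjacent paragraphs apart, at the cost of occasionally inserting a space where a tag sits in the middle of a word.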