## **Makeover Monday | 2026 W2: The Biggest Housing Bubble Risks Globally**
Data came from the Visual Capitalist article "[Mapped: The Biggest Housing Bubble Risks Globally](https://www.visualcapitalist.com/sp/ter01-the-biggest-housing-bubble-risks-globally/)" (by Jenna Ross, Jennifer West and Zack Aboulazm), with data sourced from "UBS, collected through August 28 2025". The data used in this notebook and in the final Tableau dashboard was prepared by [Makeover Monday](https://makeovermonday.co.uk/) for use in the Tableau Makeover Monday data visualization challenge.

* **Article Link**: https://www.visualcapitalist.com/sp/ter01-the-biggest-housing-bubble-risks-globally/
* **Compiled Data** *(published by Makeover Monday)*: https://data.world/makeovermonday/2026wk2-the-biggest-housing-bubble-risks-globally


Taken from the article:

**Methodology:**

> **What is a Real Estate Bubble?**
>
> A “bubble” is a large and long-term mispricing of an asset, which can only be identified in hindsight when the bubble bursts and prices plummet.
>
> UBS examined five factors to gauge bubble risks:
> * Home prices outpace local incomes
> * Home prices rise faster than rents
> * Mortgage lending expands too quickly
> * Construction activity surges
> * City prices far exceed national averages

This notebook supports a Makeover Monday visualization exploring global housing bubble risk across major cities, based on Visual Capitalist data.  
The focus is on regional comparison, risk categorization, and explanatory context.

In [None]:
# !pip install geopy  # uncomment to install geopy, used below to geocode the city names into coordinates



In [None]:
import pandas as pd
import pandas_gbq
import numpy as np

In [None]:
df = pd.read_excel('https://query.data.world/s/lttbm6noasqpyimt6qvtkbo4nv45cl?dws=00000')
df.head()

| | Rank | City | Bubble Risk Category | Bubble Risk Score |
|---|---|---|---|---|
| 0 | 1 | Miami | High | 1.7 |
| 1 | 2 | Tokyo | High | 1.6 |
| 2 | 3 | Zurich | High | 1.6 |
| 3 | 4 | Los Angeles | Elevated | 1.1 |
| 4 | 5 | Dubai | Elevated | 1.1 |


In [None]:
df.dtypes

| column | dtype |
|---|---|
| Rank | int64 |
| City | object |
| Bubble Risk Category | object |
| Bubble Risk Score | float64 |


In [None]:
df.describe(include='all').T

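| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Rank | 21.0 | | | | 11.0 | 6.204837 | 1.0 | 6.0 | 11.0 | 16.0 | 21.0 |
| City | 21.0 | 21.0 | Miami | 1.0 | | | | | | | |
| Bubble Risk Category | 21.0 | 4.0 | Low | 7.0 | | | | | | | |
| Bubble Risk Score | 21.0 | | | | 0.761905 | 0.505447 | -0.1 | 0.3 | 0.8 | 1.1 | 1.7 |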


## The Shape of the Data
What does the shape of this data tell us, and what are we looking at when we inspect the bubble risk of these 21 cities?
The dataset includes only **21** cities (as expected) and has no missing values. The bubble risk score ranges from -0.1 to 1.7 (Miami), with a mean of 0.76 across the 21 cities. "Bubble Risk Category" has 4 unique values, and the most frequent category reported by `describe()` is "Low" (frequency 7), which means that 1/3 of the **21 cities** are in the "**Low**" category. However, `describe()` reports only a single top value; from reading the article I also know that 7 cities are in "**Moderate**", so "Low" and "Moderate" are tied.
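Because `describe()` shows only one `top` value, a tie such as "Low" vs. "Moderate" stays hidden. A minimal sketch of a per-category summary that makes ties and score bands explicit is below. The helper names `summarize_categories` and `tied_top_categories` are hypothetical, the default column names assume the original (pre-rename) headers, and the toy rows are made up for illustration rather than taken from the dataset.

```python
import pandas as pd


def summarize_categories(df: pd.DataFrame,
                         category_col: str = "Bubble Risk Category",
                         score_col: str = "Bubble Risk Score") -> pd.DataFrame:
    """Count cities per category and show the score range each category covers."""
    return (
        df.groupby(category_col)[score_col]
          .agg(count="count", min_score="min", max_score="max")
          .sort_values("max_score", ascending=False)
    )


def tied_top_categories(df: pd.DataFrame,
                        category_col: str = "Bubble Risk Category") -> list:
    """Return every category sharing the highest frequency (describe() shows only one)."""
    counts = df[category_col].value_counts()
    return sorted(counts[counts == counts.max()].index.tolist())


# Toy data for illustration only (NOT the Makeover Monday dataset)
toy = pd.DataFrame({
    "City": ["A", "B", "C", "D", "E", "F"],
    "Bubble Risk Category": ["High", "Elevated", "Moderate", "Moderate", "Low", "Low"],
    "Bubble Risk Score": [1.7, 1.1, 0.6, 0.4, 0.2, -0.1],
})

print(summarize_categories(toy))
print(tied_top_categories(toy))  # ['Low', 'Moderate']
```

On the real data this would be called as `summarize_categories(df)` before the columns are renamed; the min/max columns also show which score band each category label covers in this dataset.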
As far as data completeness for Tableau is concerned, we could try to pull this in as is, but mapping by city name alone would require some manual manipulation and assignment. Instead, I will add latitude and longitude using geopy so that the geocoding data comes into Tableau cleanly.

# Plan for Cleaning and Reshaping the Data: My Plan for a Viz
In the next steps I will:
* Rename the columns to lowercase snake_case, for BigQuery compatibility and consistency
* Enrich the city names with geospatial data (latitude and longitude)
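The rename step can be hard-coded (as in the cell below) or derived from the existing names. As an alternative sketch, a small regex-based helper can produce snake_case names for any column list. The names `to_snake_case` and `snake_case_columns` are hypothetical, and prefixing an underscore to names that start with a digit is a conservative assumption rather than a rule quoted from the BigQuery documentation.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lower-case a column label and replace runs of non-alphanumerics with '_'."""
    snake = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip()).strip("_").lower()
    # Assumption: avoid names that start with a digit by prefixing an underscore
    if snake and snake[0].isdigit():
        snake = f"_{snake}"
    return snake


def snake_case_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every column renamed to snake_case."""
    return df.rename(columns=to_snake_case)


demo = pd.DataFrame(columns=["Rank", "City", "Bubble Risk Category", "Bubble Risk Score"])
print(list(snake_case_columns(demo).columns))
# ['rank', 'city', 'bubble_risk_category', 'bubble_risk_score']
```

The explicit dictionary is easier to read for four columns; a helper like this mainly pays off when a weekly dataset arrives with many or unpredictable headers.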

In [None]:
# Column names with spaces or special characters read nicely in a table, but they are awkward
# (and, depending on the tool, not accepted) in BigQuery and SQL.
## Rename the columns to lowercase snake_case so exports and SQL queries are painless
df = df.rename(columns={
    "Rank": "rank",
    "City": "city",
    "Bubble Risk Category": "bubble_risk_category",
    "Bubble Risk Score": "bubble_risk_score",
})

df.head()

| | rank | city | bubble_risk_category | bubble_risk_score |
|---|---|---|---|---|
| 0 | 1 | Miami | High | 1.7 |
| 1 | 2 | Tokyo | High | 1.6 |
| 2 | 3 | Zurich | High | 1.6 |
| 3 | 4 | Los Angeles | Elevated | 1.1 |
| 4 | 5 | Dubai | Elevated | 1.1 |


In [None]:
# Using the geopy library, I will now *attempt* to get the coordinates from the city name
# (this dataset has no state/region or country columns, so the query is the city alone).
## See the Nominatim documentation (nominatim.org) for further information.
# Note: geocoding uses Nominatim via geopy and may take a few minutes because of rate limits.
# For a larger dataset, or if I ran this often, I would geocode in batches and save/cache
# the results to a BigQuery table or CSV.

from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

geolocator = Nominatim(user_agent="allison_mom_bubble_risk_cities")  # use a unique, descriptive name for your app (Nominatim's usage policy asks for one)
geocode = RateLimiter(
    geolocator.geocode,
    min_delay_seconds=1,      # be kind to the service
    max_retries=2,
    error_wait_seconds=2,
    swallow_exceptions=True
)

# Build the query string from whichever location parts are present
# (this dataset only has a city column; use the commented-out line if yours also has state_region/country)
def build_query(row):
    # parts = [row.get("city"), row.get("state_region"), row.get("country")]
    parts = [row.get("city")]
    parts = [str(p).strip() for p in parts if pd.notna(p) and str(p).strip() != ""]
    return ", ".join(parts)

df["geo_query"] = df.apply(build_query, axis=1)

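# Cache dictionary so repeated queries (and re-runs of this cell) don't re-hit the API;
# only create it if it doesn't exist yet, otherwise re-running the cell would reset it
if "cache" not in globals():
    cache = {}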

def geocode_cached(q):
    if q in cache:
        return cache[q]
    loc = geocode(q)
    if loc is None:
        cache[q] = (None, None)
    else:
        cache[q] = (loc.latitude, loc.longitude)
    return cache[q]

df[["latitude", "longitude"]] = df["geo_query"].apply(
    lambda q: pd.Series(geocode_cached(q))
)

df[["geo_query", "latitude", "longitude"]].head(13)

| | geo_query | latitude | longitude |
|---|---|---|---|
| 0 | Miami | 25.774157 | -80.193597 |
| 1 | Tokyo | 35.67686 | 139.763895 |
| 2 | Zurich | 47.374449 | 8.541042 |
| 3 | Los Angeles | 34.053691 | -118.242766 |
| 4 | Dubai | 25.074282 | 55.188539 |
| 5 | Amsterdam | 52.37308 | 4.892453 |
| 6 | Geneva | 46.201756 | 6.146601 |
| 7 | Toronto | 43.653482 | -79.383935 |
| 8 | Sydney | -33.869844 | 151.208285 |
| 9 | Madrid | 40.416782 | -3.703507 |
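The in-memory `cache` above only lasts for the current kernel session. Saving results to a CSV, as the comment in the geocoding cell suggests, could look something like the sketch below: it keeps a CSV on disk and only calls the geocoder for queries it has not seen before. The function names are hypothetical, and `geocode_fn` is assumed to behave like the rate-limited `geocode` above, returning an object with `.latitude`/`.longitude` or `None`. The demo uses a fake geocoder with made-up coordinates so it runs without network access.

```python
import csv
import os
import tempfile
from types import SimpleNamespace
from typing import Callable, Dict, Iterable, Optional, Tuple

Coords = Tuple[Optional[float], Optional[float]]


def load_cache(path: str) -> Dict[str, Coords]:
    """Read previously geocoded queries from a CSV file (empty dict if it doesn't exist)."""
    cache: Dict[str, Coords] = {}
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                lat = float(row["latitude"]) if row["latitude"] else None
                lon = float(row["longitude"]) if row["longitude"] else None
                cache[row["query"]] = (lat, lon)
    return cache


def save_cache(path: str, cache: Dict[str, Coords]) -> None:
    """Write the whole cache to CSV; failed lookups (None) become empty cells."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["query", "latitude", "longitude"])
        for query, (lat, lon) in cache.items():
            writer.writerow([query, "" if lat is None else lat, "" if lon is None else lon])


def geocode_with_disk_cache(queries: Iterable[str], geocode_fn: Callable, path: str) -> Dict[str, Coords]:
    """Call geocode_fn only for queries that are not already in the CSV cache."""
    cache = load_cache(path)
    for q in queries:
        if q not in cache:
            loc = geocode_fn(q)  # assumed: object with .latitude/.longitude, or None
            cache[q] = (None, None) if loc is None else (loc.latitude, loc.longitude)
    save_cache(path, cache)
    return cache


# Demo with a fake geocoder (no network calls, made-up coordinates)
calls = []


def fake_geocode(q):
    calls.append(q)
    return None if q == "missing_city" else SimpleNamespace(latitude=1.5, longitude=-2.25)


path = os.path.join(tempfile.mkdtemp(), "geocode_cache.csv")
geocode_with_disk_cache(["city_a", "missing_city"], fake_geocode, path)
geocode_with_disk_cache(["city_a", "city_b"], fake_geocode, path)  # only city_b is new
print(calls)  # ['city_a', 'missing_city', 'city_b']
```

In the notebook it might be called as `geocode_with_disk_cache(df["geo_query"].unique(), geocode, "geocode_cache.csv")` (the file name is hypothetical). Failed lookups are stored as empty cells, so they are not retried until their rows are deleted from the file.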


In [None]:
# Sanity check to make sure every city has a latitude and a longitude
has_nulls = df[['latitude','longitude']].isnull().any()
print(has_nulls)

latitude     False
longitude    False
dtype: bool
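The null check only shows that every query returned *something*. Geocoding on a bare city name can still match the wrong place, so a hypothetical helper like `flag_suspect_coordinates` can also flag out-of-range values and cities that resolved to identical coordinates. This is a sketch demonstrated on toy data; it cannot prove that a match is correct, so a quick look at the points on a map is still worthwhile.

```python
import pandas as pd


def flag_suspect_coordinates(df: pd.DataFrame,
                             lat_col: str = "latitude",
                             lon_col: str = "longitude") -> pd.DataFrame:
    """Return rows whose coordinates are missing, out of range, or shared with another row."""
    lat, lon = df[lat_col], df[lon_col]
    missing = lat.isna() | lon.isna()
    # between() is False for NaN, so missing values also count as out of range here
    out_of_range = ~lat.between(-90, 90) | ~lon.between(-180, 180)
    # keep=False marks every member of a duplicated coordinate pair
    duplicated = df.duplicated(subset=[lat_col, lon_col], keep=False) & ~missing
    return df[missing | out_of_range | duplicated]


# Toy data for illustration only (NOT real geocoding output)
toy = pd.DataFrame({
    "geo_query": ["city_a", "city_b", "city_c", "city_d", "city_e"],
    "latitude": [25.77, 35.68, 35.68, None, 95.0],
    "longitude": [-80.19, 139.76, 139.76, 8.54, 10.0],
})

print(flag_suspect_coordinates(toy)["geo_query"].tolist())
# ['city_b', 'city_c', 'city_d', 'city_e']
```

On the notebook's data, `flag_suspect_coordinates(df)` returning an empty frame would be the hoped-for result.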


# Save to BigQuery Table
That's it! Now all that's left to do is save the DataFrame to a dedicated table in my BigQuery warehouse.
This is my final step before pushing the data to Google Sheets, which I will connect to Tableau Public.
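Before writing, a small guard can confirm that the column names follow a conservative naming rule (letters, digits and underscores, not starting with a digit). The helper `unsafe_columns` is hypothetical, and the rule is an assumption that may be stricter than what BigQuery accepts; it simply mirrors the reason for renaming the columns earlier.

```python
import re

import pandas as pd

# Assumed conservative rule: letters, digits and underscores, not starting with a digit
SAFE_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def unsafe_columns(df: pd.DataFrame) -> list:
    """List column names that break the conservative naming rule above."""
    return [c for c in df.columns if not SAFE_COLUMN.match(str(c))]


ready = pd.DataFrame(columns=["rank", "city", "bubble_risk_score", "latitude", "longitude"])
not_ready = pd.DataFrame(columns=["Rank", "Bubble Risk Score"])
print(unsafe_columns(ready))      # []
print(unsafe_columns(not_ready))  # ['Bubble Risk Score']
```

In the notebook it could run as `assert not unsafe_columns(df)` just before `pandas_gbq.to_gbq`. Note that "Rank" passes: capital letters are allowed by this rule, so lowercase names are a consistency choice rather than a requirement.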

In [None]:
# Write the df to a target table in a BigQuery dataset
UPLOAD_TO_BQ = False  # set to True when you actually want to write the table

if UPLOAD_TO_BQ:
    project_id = 'your_project_id'
    destination_table = 'your_dataset.another_new_table'  # format: dataset.table

    pandas_gbq.to_gbq(
        dataframe=df,
        destination_table=destination_table,
        project_id=project_id,
        if_exists='replace',  # 'if_exists' options: 'fail', 'replace', 'append'
    )



## **Tableau Public Dashboard**
Check out the final [Tableau Dashboard](https://public.tableau.com/app/profile/allison.jones/viz/MoM2026_w2RealEstateBubbleRisks/RealEstateBubbleRisks), which is a slightly different take on the graphic created by the authors of the Visual Capitalist article in partnership with Terzo. Hopefully it is just as clear and powerful.

![What does the 2025 Real Estate Market Tell us about potential Bubble Risks across the globe](images/tableaudashboard_mom_2026_w2_Real%20Estate%20Bubble%20Risks.png)