index.qmd

---
title: "Housing ownership"
author: "James Goldie"
format: html
---

```{r}
#| label: setup

library(tidyverse)
library(readxl)
library(corrr)
library(here)

dir.create(here("out"), showWarnings = FALSE)
dir.create(here("out", "tenure-by-country-and-income"), showWarnings = FALSE)
```

There're a few interesting sheets in this workbook. All of the indicators are annual and by country:

- `HM1.3.A3` is the share of home-owner households in each income quintile (regardless of whether the home owned is mortgaged or owned outright). It can be used to answer questions like, "Are the people with the least income houses less than they used to?"
- `HM1.3.A4` breaks proportions down across _every_ type of housing tenure, not just the two types of home ownership. If the previous indicator shows a change in home ownership within an income bracket, this indicator could tell us whether they are going to rents or to unknown tenure. If it doesn't show a change, this could tell us whether mortgages are shifting to full ownership, or vice-versa.
- `HM1.3.A5` reframes the proportions from - `HM1.3.A4`: instead of the proportion of tenure types within an income bracket, it shows the proportion of income brackets within a tenure type. 

```{r}
#| label: download
oecd_tenure_path <- here("data", "oced-housing-tenture.xlsx")
download.file(
  "https://www.oecd.org/els/family/HM1.3-Housing-tenures.xlsx",
  oecd_tenure_path)
```

# Tenure mode across all incomes

```{r}
#| label: import-tenure-all

# import the id cols separately
oecd_tenure_path |>
  read_excel(sheet = "HM1.3.A1", range = "A4:B209", na = "..",
    col_types = "text") |>
  fill(Country, .direction = "down") |>
  rename(tenure_mode = "...2") ->
tenure_all_header
  
# now import the spreadsheet body, bolt the id cols on and pivot
oecd_tenure_path |>
  read_excel(sheet = "HM1.3.A1", range = "C4:N209", na = "..",
    col_types = "numeric") |>
  bind_cols(x = tenure_all_header, y = _) |>
  pivot_longer(cols = where(is.numeric), names_to = "year",
    values_to = "value") ->
tenure_all
```

```{r}
#| label: vis-tenure-all
tenure_all |>
  write_csv(here("data", "tenure-allincomes.csv")) |>
  mutate(year = ymd(paste0(year, "-01-01"))) |>
  ggplot() +
    aes(x = year, y = value, fill = tenure_mode) +
    geom_area(alpha = 0.75) +
    facet_wrap(vars(Country)) +
    scale_fill_brewer(palette = "Set1") +
    scale_y_continuous(labels = scales::label_percent(scale = 1)) +
    theme_minimal() +
    labs(title = "Mode of tenure")

tenure_all |>
  filter(year < 2021, !is.na(value)) |>
  arrange(Country, year, tenure_mode) |>
  pivot_wider(names_from = tenure_mode, values_from = value) |>
  write_csv(here("data", "tenure-allincomes-wide.csv"))

```

# Tenure mode by income

Let's look at `HM1.3.A4`. This will allow us to focus on a particular income bracket and see how its housing tenures are changing.

This sheet needs some cleanup, as it has a grouped header (income quintile and then year). We'll import that separately and add the merged headers back in once we have the body.

```{r}
#| label: import-tenure-by-income

# first, we'll load the grouped header, fuse the merged cells and glue it all
oecd_tenure_path |>
  read_excel(sheet = "HM1.3.A4", skip = 2, n_max = 2, col_names = FALSE) |>
  t() |>
  as_tibble(.name_repair = "universal") |>
  slice(-(1:2)) |>
  fill(...1, .direction = "down") |>
  filter(!is.na(...2)) |>
  mutate(merged_header = paste(...1, ...2, sep = "_")) ->
tenure_by_income_header

# now import the spreadsheet proper, dropping the blank columns (which come in
# as logical) and reconstructing the header. then we can pivot the quintile and
# year back out
# NOTE - needs adjustment if number of countries changes in the future
oecd_tenure_path |>
  read_excel(sheet = "HM1.3.A4", skip = 3, na = "..", n_max = 200) |>
  select(-where(is.logical)) |>
  set_names(
    c("country", "tenure_type", tenure_by_income_header$merged_header)) |>
  fill(country, .direction = "down") |>
  pivot_longer(
    cols = -c(country, tenure_type),
    names_to = c("income", "year"),
    names_sep = "_") |>
  mutate(
    tenure_type = factor(tenure_type,
      levels = c("Rent (subsidized)", "Rent (private)",
        "Owner with mortgage", "Own outright",  "Other, unknown")),
    income = factor(income,
      levels = c("Bottom Quintile", "2nd Quintile", "3rd Quintile",
        "4th Quintile", "Top Quintile"),
      ordered = TRUE),
    year = as.integer(year),
    value = value / 100) ->
tenure_by_income

tenure_by_income |>
  write_csv(here("data", "tenure-byincome.csv"))
```

Let's focus in on the bottom quintile:

```{r}
#| label: bottom-quntile

tenure_by_income |>
  filter(income == "Bottom Quintile") |>
  ggplot() +
    aes(x = year, y = value, fill = tenure_type) +
    geom_area(alpha = 0.75) +
    facet_wrap(vars(country)) +
    scale_fill_brewer(palette = "Set1") +
    scale_y_continuous(labels = scales::label_percent()) +
    theme_minimal() +
    labs(title = "Tenure type for bottom 20% of people by income")

```

There's a lot going on here, but it's just an exploratory graphic! A few things I see:

- Sudden changes in the rate of subsidised rentals in several countries like France and Germany. This probably indicates policy changes in those schemes. They generally trade with non-subsidised rents, which makes sense.
- Italy looks like it's seen some growth in outright ownership in recent years, but since it comes from "Other (unknown)", this could just be a methodological change (ie. some sales data was missing).
- Similar situation with Norway, where it looks like (apart from some perhaps time-limited trials of subsidised rents) some rental data recently became available.
- Slovakia has seen substantial growth in its mortgage market—but coming from outright ownership, not rentals.
- UK and US both have outright ownership replacing rentals to a small extent. Colombia has this to a larger extent. (This could be a move from owner-occupied houses to investment properties?)
- Sweden has outright ownership replacing mortgages

```{r}
#| label: all-quintiles-australia

tenure_by_income |>
  filter(country == "Australia") |>
  ggplot() +
    aes(x = year, y = value, colour = tenure_type) +
    geom_point(size = 1) +
    geom_line(linetype = 1, size = 0.5) +
    # geom_smooth(method = "lm", se = FALSE) +
    facet_wrap(vars(income)) +
    scale_color_brewer(palette = "Set1") +
    scale_y_continuous(labels = scales::label_percent()) +
    theme_minimal(base_family = "Libre Franklin") +
    theme(
      legend.position = "top", legend.direction = "horizontal"
    ) +
    labs(title = "Tenure type in Australia, broken down by income group")

```
In Australia:

- For all income groups (incl. poorest 20%), rents and mortgages trade against each other: when one goes up, the other goes down, and vice-versa.
- For middle and higher income, outright ownership has been declining over the last decade, boosting mortgage rates. I think this reflects lengthening mortgages.
- Outright ownership has not changed in the poorest 20%. I think these are retirees who enter this income group when they retire (ie. people are in other income groups when they’re paying off the mortgage, then they go to low income after it’s paid off)

```{r}
#| label: tenure-correlations
tenure_by_income |>
  pivot_wider(
    names_from = tenure_type,
    values_from = value) |>
  select(-year) |>
  nest(.by = c(country, income), .key = "tenures") |>
  mutate(corr = map(tenures, correlate)) ->
  # mutate(corr = map(tenures, ~ correlate(.x) |> shave())) ->
tenure_correlations

tenure_correlations |>
  select(-tenures) |>
  unnest_longer(corr) |>
  unpack(corr) |>
  rename(tenure_x = term) |>
  pivot_longer(
    -c(country, income, tenure_x),
    names_to = "tenure_y",
    values_to = "corr") ->
tenure_corr_tidy

tenure_by_income |>
  write_csv(here("data", "tenure-income-correlations.csv"))
```

```{r}
#| label: tenure-correlations-vis

tenure_corr_tidy |>
  pull(country) |>
  unique() |>
  length() ->
n_countries

tenure_corr_tidy |>
  # filter(country == "Australia") |>
  # mutate(across(starts_with("tenure"), str_replace, " ", "\n")) |>
  mutate(tenure_y = str_replace(tenure_y, " ", "\n")) |>
  ggplot() +
    aes(x = tenure_x, y = tenure_y, colour = corr, size = abs(corr)) +
    geom_point() +
    facet_grid(rows = vars(country), cols = vars(income)) +
    # facet_wrap(vars(income), ncol = 2) +
    scale_x_discrete(position = "top", labels = scales::label_wrap(15)) +
    scale_y_discrete(labels = scales::label_wrap(15)) +
    scale_colour_fermenter(type = "div", palette = "RdBu", direction = 1,
      limits = c(-1, 1), breaks = seq(-1, 1, by = 0.2)) +
    guides(size = guide_none()) +
    theme_minimal() +
    theme(
      legend.direction = "horizontal",
      legend.position = "top",
      legend.key.width = unit(0.075, "npc"),
      axis.text.x = element_text(angle = 90),
      panel.border = element_rect(colour = "grey", fill = NA)) +
    labs(
      x = NULL, y = NULL, colour = "Correlation"
    ) ->
tenure_corr_plot

ggsave(
  here("out", "housing-tenure-income-global.pdf"),
  plot = tenure_corr_plot,
  width = 5, height = n_countries + 1, dpi = 150, units = "in", scale = 2,
  limitsize = FALSE)
ggsave(
  here("out", "housing-tenure-income-global.png"),
  plot = tenure_corr_plot,
  width = 5, height = n_countries + 1, dpi = 150, units = "in", scale = 2,
  limitsize = FALSE)
```

Now I feel like we're getting toward a place where we can compare countries. Perhaps it'd be helpful if we could interactively compare them:

```{r}
#| label: network-plots

# create and save a network plot for each country and income bracket
tenure_correlations |>
  select(-tenures) |>
  pwalk(function(country, income, corr) {
    corr |>
      mutate(across(where(is.numeric), ~ replace_na(.x, 0))) ->
    corr_fix
    
    # modify the network plot to use binner 
    corr_plot <-
      network_plot(corr_fix) +
      scale_colour_fermenter(
        type = "div",
        palette = "RdBu",
        direction = 1,
        limits = c(-1, 1),
        breaks = seq(-1, 1, by = 0.2)) +
      scale_size_continuous(range = c(0.5, 3)) +
      guides(colour = guide_none())
      

    ggsave(
      here("out", "tenure-by-country-and-income",
        paste0(country, "-", income, ".png")),
      corr_plot,
      width = 600, height = 600, units = "px", dpi = 150, scale = 2)
  })


```