Consider VRK's building address and voting district data (rakennusten osoitetiedot ja äänestysalueet) #13

Open
muuankarski opened this issue Aug 7, 2019 · 2 comments
Labels: documentation, enhancement

muuankarski commented Aug 7, 2019

Väestörekisterikeskus (VRK) annually publishes a dataset covering all buildings in Finland. The data is a zipped, semicolon-delimited file with an .OPT extension and about 3.6 million rows. It can be read and processed in R (slowly) with the following code:

# 2019
library(dplyr)
library(sp)
library(sf)
tmpfile <- tempfile()
tmpdir <- tempdir()
download.file("https://www.avoindata.fi/data/dataset/cf9208dc-63a9-44a2-9312-bbd2c3952596/resource/ae13f168-e835-4412-8661-355ea6c4c468/download/suomi_osoitteet_2019-05-15.zip",
              destfile = tmpfile)
unzip(zipfile = tmpfile,
      exdir = tmpdir)

opt <- read.csv(glue::glue("{tmpdir}/Suomi_osoitteet_2019-05-15.OPT"), 
                sep = ";", 
                stringsAsFactors = FALSE, 
                header = FALSE)

names(opt) <- c("rakennustu","sijaintiku",
                "sijaintima","rakennusty",
                "CoordY","CoordX",
                "osoitenume", "katunimi_f",
                "katunimi_s", "katunumero",
                "postinumer", "vaalipiirikoodi",
                "vaalipiirinimi","tyhja",
                "idx", "date")
if (FALSE) { # optional: subset the data just to make the conversions faster
  opt_orig <- as_tibble(opt)
  opt <- sample_n(opt_orig, size = 2000)
}

opt$katunimi_f <- iconv(opt$katunimi_f, from = "windows-1252", to = "UTF-8")
opt$katunimi_s <- iconv(opt$katunimi_s, from = "windows-1252", to = "UTF-8")
opt$katunumero <- iconv(opt$katunumero, from = "windows-1252", to = "UTF-8")
opt$vaalipiirinimi <- iconv(opt$vaalipiirinimi, from = "windows-1252", to = "UTF-8")

sp.data <- SpatialPointsDataFrame(opt[, c("CoordX", "CoordY")], 
                                  opt, 
                                  proj4string = CRS("+init=epsg:3067"))

# Project the spatial data to lat/lon
# sp.data <- spTransform(sp.data, CRS("+proj=longlat +datum=WGS84"))

shape <- st_as_sf(sp.data)

# Inspect only the first coordinates; printing all 3.6 million rows is slow
head(st_coordinates(shape))

# shape %>% select(rakennustu) %>% plot()

saveRDS(shape, file = "./sf19_buildings.RDS")

Any ideas on how to incorporate this into geofi? It is useful, for instance, when geocoding sensitive addresses.
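
To illustrate the geocoding use case: with the building table available locally, sensitive addresses can be matched by a plain join, so nothing is sent to an external service. The sketch below is only an illustration; the rows in both tables are invented, and the choice of join keys (`katunimi_f`, `katunumero`, `postinumer`) is an assumption about how addresses would be matched.

```r
# Toy stand-in for the building table; all values are invented for illustration.
buildings <- data.frame(
  katunimi_f = c("Esimerkkikatu", "Esimerkkikatu", "Mallitie"),
  katunumero = c("1", "3", "12"),
  postinumer = c("00100", "00100", "33100"),
  CoordX = c(385000, 385020, 327000),
  CoordY = c(6672000, 6672010, 6822000),
  stringsAsFactors = FALSE
)

# Hypothetical sensitive addresses that should stay on the local machine.
addresses <- data.frame(
  id = c("a", "b"),
  katunimi_f = c("Mallitie", "Esimerkkikatu"),
  katunumero = c("12", "3"),
  postinumer = c("33100", "00100"),
  stringsAsFactors = FALSE
)

# Left join: every address is kept, unmatched ones get NA coordinates.
geocoded <- merge(addresses, buildings,
                  by = c("katunimi_f", "katunumero", "postinumer"),
                  all.x = TRUE)
geocoded <- geocoded[order(geocoded$id), ]
```

With the real file, `read.csv()` may parse postal codes as integers and drop leading zeros, so the key columns would probably need to be read or converted as character before joining.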

However, this would require storage, as the data should be preprocessed. Do you think this is suitable data for geofi, and should we create a data repo such as geofi_data?
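
One lightweight alternative to a separate data repo would be caching the preprocessed object on the user's machine. A minimal sketch, assuming a hypothetical helper `cached_rds()` (not part of geofi) that runs the expensive step once and reuses the stored RDS file afterwards:

```r
# Hypothetical helper: run `build` once, store the result as RDS, reuse it later.
cached_rds <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- build()
  saveRDS(obj, path)
  obj
}

# Demo with a cheap placeholder instead of the real download-and-convert step.
cache_file <- file.path(tempdir(), "sf19_buildings_demo.RDS")
unlink(cache_file)  # start from a clean state
first <- cached_rds(cache_file, function() data.frame(x = 1:3))
# The second call reads the cache, so the builder is never evaluated.
second <- cached_rds(cache_file, function() stop("not called when cache exists"))
```

In practice the builder would be the download and conversion code above, and the cache path would be a persistent directory rather than `tempdir()`.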

muuankarski added this to To do in Data Aug 7, 2019

muuankarski commented

Created a new branch and wrote a simple function and tutorial example here:
b67fcd3

antagomir added the documentation and enhancement labels Jan 20, 2020

muuankarski commented

Removed the stale branch feature-vrk-building.

Its content is below:

#' @title Get geospatial data with all buildings and electoral districts from Väestörekisterikeskus
#' @description Downloads the VRK building address data and returns it as an sf object.
#' @author Markus Kainu <markus.kainu@kela.fi>
#' @return sf-object
#' @export
#' @examples
#'  \dontrun{
#'  f <- get_buildings()
#'  plot(f)
#'  }
#'
#' @rdname get_buildings

get_buildings <- function(){
  
  library(dplyr)
  library(sp)
  library(sf)
  tmpfile <- tempfile()
  tmpdir <- tempdir()
  download.file("https://www.avoindata.fi/data/dataset/cf9208dc-63a9-44a2-9312-bbd2c3952596/resource/ae13f168-e835-4412-8661-355ea6c4c468/download/suomi_osoitteet_2019-08-15.zip",
                destfile = tmpfile)
  unzip(zipfile = tmpfile,
        exdir = tmpdir)
  
  opt <- read.csv(glue::glue("{tmpdir}/suomi_osoitteet_2019-08-15.OPT"),
                  fileEncoding = "latin1",
                  sep = ";", 
                  # nrows = 50000,
                  stringsAsFactors = FALSE, 
                  header = FALSE)
  
  names(opt) <- c("rakennustu","sijaintiku",
                  "sijaintima","rakennusty",
                  "CoordY","CoordX",
                  "osoitenume", "katunimi_f",
                  "katunimi_s", "katunumero",
                  "postinumer", "vaalipiirikoodi",
                  "vaalipiirinimi","tyhja",
                  "idx", "date")

  sp.data <- SpatialPointsDataFrame(opt[, c("CoordX", "CoordY")], 
                                    opt, 
                                    proj4string = CRS("+init=epsg:3067"))
  
  # Project the spatial data to lat/lon
  # sp.data <- spTransform(sp.data, CRS("+proj=longlat +datum=WGS84"))
  
  shape <- st_as_sf(sp.data)
  
  # st_coordinates(shape)
  
  # shape %>% select(rakennustu) %>% plot()
  
  return(shape)
}
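
The detour through sp is probably not needed: `sf::st_as_sf()` can build the point geometry directly from the coordinate columns. The sketch below is not the code from the removed branch; it uses invented toy rows in place of `opt` and assumes the same EPSG:3067 coordinate reference system as above.

```r
library(sf)

# Toy rows standing in for `opt`; the coordinate values are invented.
opt <- data.frame(
  rakennustu = c("A", "B"),
  CoordX = c(385000, 327000),
  CoordY = c(6672000, 6822000)
)

# remove = FALSE keeps CoordX/CoordY as attribute columns,
# matching what the sp-based version returns.
shape <- st_as_sf(opt, coords = c("CoordX", "CoordY"),
                  crs = 3067, remove = FALSE)

# Optional reprojection to WGS84 longitude/latitude.
shape_wgs84 <- st_transform(shape, 4326)
```

This would also remove the need to load sp inside the function.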
