# Importing data from BBR into a relational database

This document shows how to retrieve data from the Danish building registry, BBR, and insert it into a relational database for easier use.

Prerequisites: you need to be able to run Python, and you need a PostgreSQL database with a table ready to receive the BBR data. The code below uses the Python packages *psycopg* (version 3) and *ijson*.

# 1) Obtaining the data

1. Go to https://datafordeler.dk/, click on "log in" and create a new web user. You will get a username and password to access Datafordeler.

2. With your username and password, log in to the Datafordeler self-service portal, where data is requested: https://selfservice.datafordeler.dk/

3. This is not intuitive, but your account is linked to several "users", each with different permissions. Check the Users tab (Brugere): if you only have the user "Webbruger", you need to create a new one. Click on the + tab and create a service user with the "user name and access code" method.

4. You are now ready to request public data from Datafordeler. Go to the Downloads tab (Filudtræk). The list is empty because you have not requested any data yet. To get access to data, you need to create a download. You have three choices:
   - Clicking Opret creates a permanent download that is kept up to date and that you can use multiple times.
   - Clicking Download requests a one-time download of the dataset.
   - Clicking Predefined downloads a dataset with a fixed set of parameters, instead of customizing everything. In particular, you can use Predefined to download the BBR dataset with only up-to-date entries, in JSON or XML format.

5. You should now see a list of all available downloads. Give your download a name (Visningsnavn) and select BBR Totaludtræk in the list (or BBR Aktuelt Totaludtræk if using Predefined). Click Next. If you chose Opret or Download, you can now adjust many parameters, for example to download only the entries for a specific municipality. If you used Predefined, the parameters are locked.

6. Click Save (Gem). You are taken back to the Downloads tab. If you used Opret or Predefined, your data subscription is listed there: you can modify it, or delete it if you do not expect to download the data again. You will receive an email explaining how to get your data.

7. Actually getting the data is a bit tricky: you cannot download it from Datafordeler directly. You need an FTP client such as FileZilla (https://filezilla-project.org/). Download and install FileZilla. When you launch it, enter the address provided in the email from Datafordeler, as well as your Datafordeler service user number (*not* your initial username: this is the service user you created in step 3) and its password. Click Connect, and you should finally be able to see and download your files!
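If you prefer to stay in Python, the download can also be scripted with the standard-library *ftplib* module. This is a minimal sketch, assuming the server accepts FTP logins with your service user; `download_file` and its parameters are illustrative names, and the host, file name and credentials are the ones from the Datafordeler email. If the server requires an encrypted connection, pass `ftp_class=ftplib.FTP_TLS`.

```python
from ftplib import FTP


def download_file(host, user, password, remote_name, local_path, ftp_class=FTP):
    """Download one file from an FTP server in binary mode (illustrative helper).

    Pass ftp_class=ftplib.FTP_TLS if the server requires an encrypted connection.
    """
    with ftp_class(host) as ftp:          # connect; the connection is closed on exit
        ftp.login(user=user, passwd=password)
        if hasattr(ftp, 'prot_p'):        # FTP_TLS only: also encrypt the data transfer
            ftp.prot_p()
        with open(local_path, 'wb') as f:
            # RETR sends the file in chunks; each chunk is written straight to disk
            ftp.retrbinary('RETR ' + remote_name, f.write)
```

To see which files are available before downloading, an open `FTP` object's `nlst()` method returns the names in the current directory.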



### Note on file format

Because BBR is a very large dataset (if you download the whole thing), your computer will run out of memory if you try to parse it in one go. For this reason, we parse the file iteratively. Doing this with an XML file is relatively straightforward in Python, but slow. Working with a JSON file requires installing a specific package, *ijson*, to allow iterative parsing, but it is faster. In this notebook, we work with the JSON format.

# 2) Inserting values into the PostgreSQL database

Before retrieving values from the BBR file, we define a function that inserts them into the database. The function inserts one row at a time: it takes as input a Python dictionary whose keys are BBR parameter names and whose values are the values recorded for one building. You need to adjust the SQL code to match the names of your table and columns.

### Setup

```python
import psycopg as pg  # psycopg 3, used to communicate between Python and PostgreSQL
```

```python
# Write the parameters to connect to your PostgreSQL database here (a libpq connection string).
dbparams = 'dbname=macrocomponents user=postgres password=mypassword'
```

```python
# Write here the parameters you want to retrieve from the BBR data file.
# They must have exactly the same names as in BBR, in Danish. See the end of this notebook for a full list of parameters.
parameters = [
    'id_lokalId', 'kommunekode', 'byg007Bygningsnummer', 'jordstykke', 'grund', 'husnummer',
    'byg404Koordinat', 'byg026Opførelsesår', 'byg027OmTilbygningsår', 'byg021BygningensAnvendelse',
    'byg041BebyggetAreal', 'byg038SamletBygningsareal', 'byg039BygningensSamledeBoligAreal',
    'byg040BygningensSamledeErhvervsAreal', 'byg042ArealIndbyggetGarage', 'byg043ArealIndbyggetCarport',
    'byg044ArealIndbyggetUdhus', 'byg045ArealIndbyggetUdestueEllerLign',
    'byg046SamletArealAfLukkedeOverdækningerPåBygningen', 'byg047ArealAfAffaldsrumITerrænniveau',
    'byg048AndetAreal', 'byg049ArealAfOverdækketAreal', 'byg032YdervæggensMateriale',
    'byg050ArealÅbneOverdækningerPåBygningenSamlet', 'byg051Adgangsareal', 'byg054AntalEtager',
    'byg055AfvigendeEtager', 'byg033Tagdækningsmateriale', 'byg034SupplerendeYdervæggensMateriale',
    'byg035SupplerendeTagdækningsMateriale', 'byg036AsbestholdigtMateriale', 'byg056Varmeinstallation',
    'byg057Opvarmningsmiddel', 'byg058SupplerendeVarme', 'byg071BevaringsværdighedReference',
    'byg130ArealAfUdvendigEfterisolering', 'byg150Gulvbelægning', 'byg151Frihøjde',
]
```
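The insertion function below expects a table named `buildings` whose primary key constraint is called `buildings_pkey`. If you still need to create that table, one option is to generate the CREATE TABLE statement from the parameters list. This is a minimal sketch, assuming `id_lokalId` is the primary key and that every column can be stored as text; `create_table_sql` is a hypothetical helper, and you would adjust the column types to your analysis.

```python
def create_table_sql(columns, table='buildings', key='id_lokalId'):
    """Return a CREATE TABLE statement with one column per BBR parameter (hypothetical helper).

    Every column is typed as text for simplicity (an assumption: adjust the types as needed).
    Declaring the primary key inline gives it PostgreSQL's default name, <table>_pkey.
    """
    defs = [f'{c} text PRIMARY KEY' if c == key else f'{c} text' for c in columns]
    return f'CREATE TABLE IF NOT EXISTS {table} (\n    ' + ',\n    '.join(defs) + '\n);'


print(create_table_sql(['id_lokalId', 'kommunekode']))
```

The statement could then be run once with `cur.execute(create_table_sql(parameters))` followed by a commit. The column names are left unquoted, so PostgreSQL folds them to lower case exactly as it does in the unquoted INSERT query, and the two stay consistent.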

### Insertion function

```python
def insert_bbr_from_dict(row_dict):
    # The dictionary keys are BBR parameter names (in Danish); they must match the column names of the table.
    # psycopg expects the values as a tuple, in the same order as the columns in the query.
    columns = list(row_dict.keys())
    row_tuple = tuple(row_dict[k] for k in columns)

    # Build the SQL query. If a building with the same primary key already exists, its row is updated instead.
    col_list = ', '.join(columns)
    placeholders = ', '.join(['%s'] * len(columns))
    excluded = ', '.join('EXCLUDED.' + k for k in columns)
    sql = (f'INSERT INTO buildings({col_list}) VALUES({placeholders}) '
           f'ON CONFLICT ON CONSTRAINT buildings_pkey DO UPDATE SET ({col_list}) = ({excluded});')

    connector = None
    try:
        # Connect to the PostgreSQL database
        connector = pg.connect(dbparams)
        # The cursor is closed automatically at the end of the with block
        with connector.cursor() as cur:
            cur.execute(sql, row_tuple)
        # Commit the changes to the database
        connector.commit()
    except Exception as error:
        # Report the row that failed and carry on with the next building
        print(row_tuple)
        print(error)
    finally:
        if connector is not None:
            connector.close()
```

Note: it is possible to write a slightly simpler function that inserts all rows at once, using the *executemany* method instead of *execute*. However, that requires building a list of all rows first, which we want to avoid given the size of the dataset.
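*executemany* does not strictly need every row at once: it can also be called repeatedly on fixed-size batches. This is a minimal sketch of the batching step, with the hypothetical helper `batched_rows`. Because one query is shared by the whole batch, every row is laid out in the same column order, and a parameter missing from a building becomes `None` (NULL in the database).

```python
from itertools import islice


def batched_rows(building_dicts, columns, batch_size=1000):
    """Yield lists of at most batch_size tuples, one tuple per building (hypothetical helper).

    Each tuple follows the order of `columns`; missing parameters become None,
    so that every row fits the same INSERT statement.
    """
    rows = (tuple(d.get(c) for c in columns) for d in building_dicts)
    while True:
        batch = list(islice(rows, batch_size))  # take the next batch_size rows from the stream
        if not batch:
            return
        yield batch
```

Each batch could then be sent with `cur.executemany(sql, batch)`, where `sql` is an INSERT built for the full `parameters` list. Keep in mind that with an `ON CONFLICT ... DO UPDATE` clause, a NULL sent for a missing parameter would overwrite an existing value. On Python 3.12 and later, `itertools.batched` provides the same grouping.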

# 3) Retrieving values from the BBR file

Python has an easy-to-use *json* package to read JSON files and convert them into Python objects. However, the *json* package loads the entire file into memory before processing it. With a file as large as BBR, this will likely cause errors and crashes. Instead, we parse BBR iteratively using the *ijson* package.

### Setup

```python
import ijson  # package to parse JSON iteratively
```

```python
# Write the location of your JSON file here.
jsonfile = 'C:/Users/KJ35FA/Documents/BBR_Aktuelt_Totaludtraek_JSON_20221107180020.json'
```

### Parsing the JSON file

The *ijson.kvitems* function reads the JSON file as a stream, without loading it all into memory. For every object found under a given path (the prefix, here `BygningList.item`, i.e. each item of the list of buildings), it yields the object's keys and values one pair at a time. The following function parses the JSON file this way and records the parameters we are interested in. When a new building is encountered, the parameters recorded for the previous building are inserted into the database, then cleared so that the values of the new building can be recorded.
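The grouping step can be tried on its own, separately from the database, on a hand-made list of (parameter, value) pairs. This is a minimal sketch with the hypothetical generator `group_buildings`; like the notebook, it assumes that `forretningshændelse` is the first key of every building.

```python
def group_buildings(pairs, first_key='forretningshændelse', wanted=None):
    """Group a stream of (parameter, value) pairs into one dict per building.

    A new building starts every time first_key is seen. If wanted is given,
    only those parameters are kept; buildings with none of them are skipped.
    """
    building = {}
    for param, value in pairs:
        if param == first_key:
            if building:
                yield building  # the previous building is complete
            building = {}
        elif wanted is None or param in wanted:
            building[param] = value
    if building:  # the last building is not followed by another first_key
        yield building


pairs = [('forretningshændelse', 'x'), ('id_lokalId', 'a'), ('status', '6'),
         ('forretningshændelse', 'y'), ('id_lokalId', 'b')]
print(list(group_buildings(pairs, wanted={'id_lokalId'})))  # [{'id_lokalId': 'a'}, {'id_lokalId': 'b'}]
```

With the real file, `pairs` would be `ijson.kvitems(jsondata, 'BygningList.item')` and `wanted` the `parameters` list.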

As of November 2022, BBR includes around 5 800 000 building records. Importing the entire BBR database with this function on a regular desktop or laptop takes several days of continuous running, which makes it vulnerable to crashes. As a safeguard, the function therefore records the id of the last building it inserted in the database. After a crash, the function can be run again with that id as the *last_recorded_id* parameter: it then skips all buildings in the JSON file until it reaches the last building previously recorded, and only inserts the buildings after it. This makes the new run considerably faster.
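The resume logic can also be illustrated on its own, as a filter over a stream of building dictionaries. This is a minimal sketch with the hypothetical generator `skip_until`: it drops every building up to and including the one whose `id_lokalId` equals the last recorded id, and passes everything through when no id is given.

```python
def skip_until(buildings, last_recorded_id):
    """Yield only the buildings that come after last_recorded_id.

    With an empty id, nothing is skipped. If the id never appears, nothing
    is yielded, which suggests the id does not belong to this file.
    """
    found = (last_recorded_id == '')
    for building in buildings:
        if found:
            yield building
        elif building.get('id_lokalId') == last_recorded_id:
            found = True  # this building was already inserted: start with the next one


buildings = [{'id_lokalId': i} for i in 'abcd']
print([b['id_lokalId'] for b in skip_until(buildings, 'b')])  # ['c', 'd']
```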

The parameter names given as input to the function must be written exactly as in the JSON records, or they will not be recognized. We open the file with *encoding='utf8'* so that the letters ø, æ and å are read correctly.
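A quick way to catch misspelled names before a multi-day run is to compare your list with the names found in the file, for example the ones printed in the appendix. This is a minimal sketch with the hypothetical helper `unknown_parameters`. It also flags names that only differ in how a letter such as å is encoded (one precomposed character versus a plain letter followed by a combining mark): such names look identical on screen but do not compare equal in Python.

```python
import unicodedata


def unknown_parameters(parameters, known_names):
    """Return (name, hint) for every requested name not spelled exactly like a known name.

    The hint is 'unicode form' if the name only differs in how a letter is encoded,
    otherwise 'not found'.
    """
    known = set(known_names)
    known_nfc = {unicodedata.normalize('NFC', k) for k in known_names}
    problems = []
    for p in parameters:
        if p in known:
            continue
        hint = 'unicode form' if unicodedata.normalize('NFC', p) in known_nfc else 'not found'
        problems.append((p, hint))
    return problems


print(unknown_parameters(['id_lokalId', 'kommunekod'], ['id_lokalId', 'kommunekode']))
# [('kommunekod', 'not found')]
```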

```python
def retrieve_values(parameters, jsonfile, last_recorded_id='', progress_file='last_recorded_id.txt'):
    # progress_file: text file where the id of the last inserted building is saved (the name is just an example).
    # After a crash, pass its content as last_recorded_id for the next run.
    wanted = set(parameters)  # looking up names in a set is faster than in a list
    # Have we passed the buildings already recorded in a previous (unfinished) run?
    # By default (no id given) we assume that no building has been recorded yet.
    isNewBuilding = (last_recorded_id == '')

    def process(building_dict):
        # Insert one complete building, unless it was already inserted in a previous run
        nonlocal isNewBuilding
        if not building_dict:
            return
        if isNewBuilding:
            insert_bbr_from_dict(building_dict)
            # Save the id to start again from there if the program crashes
            with open(progress_file, 'w', encoding='utf8') as f:
                f.write(building_dict['id_lokalId'])
        elif building_dict.get('id_lokalId') == last_recorded_id:
            # Last building recorded by the previous run: all buildings after this point must be recorded
            isNewBuilding = True

    with open(jsonfile, encoding='utf8') as jsondata:
        building_dict = dict()
        # Parse the JSON file, reading the name and value of each parameter for each building
        for param, value in ijson.kvitems(jsondata, 'BygningList.item'):
            if param == 'forretningshændelse':
                # First parameter of each building in the JSON file: it marks the start of a new building
                process(building_dict)
                building_dict = dict()
            elif param in wanted:
                # The parameter is on our list: record it
                building_dict[param] = value
        # The last building in the file is not followed by a new 'forretningshændelse'
        process(building_dict)
```

Each use of *ijson.kvitems* consumes the file, so the data file needs to be opened again to "reset" it: every occurrence of *ijson.kvitems* must be preceded by *jsondata = open(jsonfile, encoding='utf8')*. The function above already does this, but keep it in mind if you need to parse the file again later.

Now all we need to do is call the function we just defined.

```python
retrieve_values(parameters, jsonfile, last_recorded_id='')
```

_______________________________________

# Appendix: list of BBR parameters

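The following code prints all BBR parameters for buildings. It reads the keys of the first building and stops as soon as a key repeats, i.e. when the second building starts:

```python
with open(jsonfile, encoding='utf8') as jsondata:
    buildings = ijson.kvitems(jsondata, 'BygningList.item')
    l = list()
    for param, value in buildings:
        if param in l:
            # A repeated key means the second building has started: print the list and stop
            for p in l:
                print(p)
            break
        else:
            l.append(param)
```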

```
forretningshændelse
forretningsområde
forretningsproces
id_namespace
id_lokalId
kommunekode
registreringFra
registreringsaktør
registreringTil
virkningFra
virkningsaktør
virkningTil
status
byg007Bygningsnummer
byg021BygningensAnvendelse
byg024AntalLejlighederMedKøkken
byg025AntalLejlighederUdenKøkken
byg026Opførelsesår
byg027OmTilbygningsår
byg029DatoForMidlertidigOpførtBygning
byg030Vandforsyning
byg031Afløbsforhold
byg032YdervæggensMateriale
byg033Tagdækningsmateriale
byg034SupplerendeYdervæggensMateriale
byg035SupplerendeTagdækningsMateriale
byg036AsbestholdigtMateriale
byg037KildeTilBygningensMaterialer
byg038SamletBygningsareal
byg039BygningensSamledeBoligAreal
byg040BygningensSamledeErhvervsAreal
byg041BebyggetAreal
byg042ArealIndbyggetGarage
byg043ArealIndbyggetCarport
byg044ArealIndbyggetUdhus
byg045ArealIndbyggetUdestueEllerLign
byg046SamletArealAfLukkedeOverdækningerPåBygningen
byg047ArealAfAffaldsrumITerrænniveau
byg048AndetAreal
byg049ArealAfOverdækketAreal
byg050ArealÅbneO
```

The following code prints all BBR parameters for floors (for information):

```python
with open(jsonfile, encoding='utf8') as jsondata:
    floors = ijson.kvitems(jsondata, 'EtageList.item')
    l = list()
    for param, value in floors:
        if param in l:
            # A repeated key means the second floor has started: print the list and stop
            for p in l:
                print(p)
            break
        else:
            l.append(param)
```

```
forretningshændelse
forretningsområde
forretningsproces
id_namespace
id_lokalId
kommunekode
registreringFra
registreringsaktør
registreringTil
virkningFra
virkningsaktør
virkningTil
status
eta006BygningensEtagebetegnelse
eta020SamletArealAfEtage
eta021ArealAfUdnyttetDelAfTagetage
eta022Kælderareal
eta023ArealAfLovligBeboelseIKælder
eta024EtagensAdgangsareal
eta025Etagetype
eta026ErhvervIKælder
eta500Notatlinjer
bygning
```
