# Importing data from BBR into a relational database

This document shows how to retrieve data from the Danish building registry, BBR, and insert it into a relational database for easier use. <br>

As prerequisites, you need to be able to run Python and have a PostgreSQL database ready with a table to receive the BBR data.

# 1) Obtaining the data

1. Go to https://datafordeler.dk/ <br>
Click on "log in" and create a new web user. You should get a user name and password to access Datafordeler. <br>
<br>
2. With your username and password, log in to the Datafordeler self-service to retrieve data: <br> https://selfservice.datafordeler.dk/ <br>
<br>
3. It's not intuitive, but your account is linked to several "users", each with different permissions. Check the Users tab (Brugere) - if you only have the user "Webbruger", you need to create a new one. Click on the + tab and create a service user with the "user name and access code" method.<br>
<br>
4. You are now ready to request public data on Datafordeler. Go to the Downloads tab (Filudtræk). You should see an empty field: that's because you haven't requested any data yet. To get access to data, you need to create a download. You have three choices:
   - Clicking Opret creates a permanent download that is kept up to date and that you can use multiple times.
   - Clicking Download requests a one-time download of the dataset.
   - Clicking Predefined downloads a dataset with a fixed set of parameters (instead of customizing everything). In particular, you can use Predefined to download the BBR dataset with only up-to-date entries, in JSON or XML format. <br>
<br>
5. You should now see a list of all available downloads. Give your download a name (Visningsnavn) and select BBR Totaludtræk in the list (or BBR Aktuelt Totaludtræk if using Predefined). Click Next. If you chose Opret or Download, you can now adjust a lot of parameters, such as downloading entries for only a specific municipality. If you used Predefined, the parameters are locked.<br><br>

6. Click Save (Gem). You will be taken back to the Downloads tab. If you used Opret or Predefined, you should see your data subscription there. You can modify it later, or delete it if you don't expect to download the data again. You will receive an email with information on how to get your data.<br>
<br>
7. Actually getting the data is a bit tricky: you cannot download it from Datafordeler directly. You need to use an FTP client like https://filezilla-project.org/ <br>
Download and install FileZilla. When you launch it, enter the address provided in the email you got from Datafordeler, as well as your Datafordeler service user number (*not* your initial username: this is the user number you created in step 3) and password. Click Connect, and you should finally be able to see and download your files!



### Note on file format

Because BBR is a very large dataset (if you download the whole thing), your computer will likely run out of memory if it tries to parse the file in one go. For this reason, we want to parse the file iteratively. Doing this with an XML file is relatively straightforward in Python, but takes a lot of time. Working with a JSON file requires installing a specific package, *ijson*, to allow for iterative parsing, but it is faster. In this notebook, we will work with the JSON format.

# 2) Inserting values in the PostgreSQL database

Before retrieving values from the BBR file, we need to define a function that will process these values and insert them into the database. The idea is to write a function that inserts one row into the database. It takes as input a Python dictionary that holds the values of the various parameters recorded for one building. You need to adjust the SQL code to fit the names of your table and columns.
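The insertion code in this section writes to a table named *buildings* and relies on a constraint called *buildings_pkey*, which is the name PostgreSQL gives by default to the primary key of a table called *buildings*. As a minimal sketch, such a table could be created as shown here. Only four of the English column names used later in this notebook are included, and every column type is an assumption that you should adapt to your own data. The helper name `create_buildings_table` is hypothetical.

```python
# Hypothetical table definition: the column types are assumptions, adjust them to your data.
CREATE_BUILDINGS_SQL = """
CREATE TABLE IF NOT EXISTS buildings (
    bbr_id text PRIMARY KEY,     -- BBR id_lokalId; the constraint is named buildings_pkey by default
    municipality_nr text,        -- BBR kommunekode
    construction_year integer,   -- BBR byg026Opførelsesår
    built_area integer           -- BBR byg041BebyggetAreal
);
"""


def create_buildings_table(connection):
    """Create the table using an open database connection (for example from psycopg.connect)."""
    with connection.cursor() as cur:
        cur.execute(CREATE_BUILDINGS_SQL)
    connection.commit()
```

With a real database, this would be called once as `create_buildings_table(pg.connect(params))` before any rows are inserted.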

### Setup

In [None]:
import psycopg as pg # Package to communicate between Python and PostgreSQL

In [None]:
params='dbname=macrocomponents3 user=postgres password=<your-password>' # Write the parameters to connect to your PostgreSQL database here. Avoid sharing a notebook that contains a real password.

In [3]:
# Write here the parameters you want to retrieve from the BBR data file. These need to be the same names as in BBR, in Danish
parameters=['id_lokalId','kommunekode', 'byg007Bygningsnummer', 'jordstykke', 'grund', 'husnummer', 'byg404Koordinat', 'byg026Opførelsesår', 'byg027OmTilbygningsår', 'byg021BygningensAnvendelse', 'byg041BebyggetAreal', 'byg038SamletBygningsareal', 'byg039BygningensSamledeBoligAreal', 'byg040BygningensSamledeErhvervsAreal', 'byg042ArealIndbyggetGarage', 'byg043ArealIndbyggetCarport', 'byg044ArealIndbyggetUdhus', 'byg045ArealIndbyggetUdestueEllerLign', 'byg046SamletArealAfLukkedeOverdækningerPåBygningen', 'byg047ArealAfAffaldsrumITerrænniveau', 'byg048AndetAreal',  'byg049ArealAfOverdækketAreal', 'byg032YdervæggensMateriale', 'byg050ArealÅbneOverdækningerPåBygningenSamlet', 'byg051Adgangsareal',  'byg054AntalEtager', 'byg055AfvigendeEtager', 'byg033Tagdækningsmateriale', 'byg034SupplerendeYdervæggensMateriale', 'byg035SupplerendeTagdækningsMateriale', 'byg036AsbestholdigtMateriale', 'byg056Varmeinstallation', 'byg057Opvarmningsmiddel', 'byg058SupplerendeVarme', 'byg071BevaringsværdighedReference', 'byg130ArealAfUdvendigEfterisolering', 'byg150Gulvbelægning', 'byg151Frihøjde']

### Insertion function

In [13]:
def insert_bbr_from_dict(row_dict):
    # The keys of row_dict are the names of parameters from BBR, in Danish.
    # This version assumes that the columns of the buildings table have the same names.
    columns = list(row_dict.keys())
    # The insertion function takes the values as a tuple, in the same order as the columns
    row_tuple = tuple(row_dict[c] for c in columns)
    
    # Building the SQL query: insert the row, or update it if the building already exists.
    column_list = ', '.join(columns)
    placeholders = ', '.join(['%s'] * len(columns))
    excluded_list = ', '.join('EXCLUDED.' + c for c in columns)
    sql = ("INSERT INTO buildings(" + column_list + ") VALUES(" + placeholders
           + ") ON CONFLICT ON CONSTRAINT buildings_pkey DO UPDATE SET ("
           + column_list + ") = (" + excluded_list + ");")
    
    connector = None
    
    try:
        # connect to the PostgreSQL database
        connector = pg.connect(params)
        # create a new cursor
        cur = connector.cursor()
        # execute the INSERT statement
        cur.execute(sql, row_tuple)
        # commit the changes to the database
        connector.commit()
        # close communication with the database
        cur.close()
        
    except (Exception, pg.DatabaseError) as error:
        print(error)
        
    finally:
        if connector is not None:
            connector.close()
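To check what the assembled query looks like before touching the database, the string-building step can be isolated in a small helper. The sketch below is not part of the original notebook: the function name `build_upsert_sql` and its arguments are hypothetical, and it interpolates table and column names directly into the SQL text, so it assumes those names come from your own code and not from untrusted input.

```python
def build_upsert_sql(table, columns, constraint):
    """Return an INSERT ... ON CONFLICT DO UPDATE statement with one %s placeholder per column."""
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    excluded = ", ".join("EXCLUDED." + c for c in columns)
    return (
        f"INSERT INTO {table}({column_list}) VALUES({placeholders}) "
        f"ON CONFLICT ON CONSTRAINT {constraint} "
        f"DO UPDATE SET ({column_list}) = ({excluded});"
    )


# Example with two BBR parameters; the values themselves are passed separately to cur.execute.
sql = build_upsert_sql("buildings", ["id_lokalId", "kommunekode"], "buildings_pkey")
print(sql)
```

Printing the query for two or three columns is a quick way to verify that the number of placeholders matches the number of columns.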

In [73]:
def insert_bbr_from_dict(row_dict):
    # Alternative to the version above, for tables whose column names differ from the BBR parameter names.
    # Running this cell replaces the previous definition of insert_bbr_from_dict.
    # The BBR parameters (in Danish) are listed in the same order as the columns in the SQL query below.
    bbr_keys = ('id_lokalId', 'kommunekode', 'grund', 'husnummer', 'byg404Koordinat', 'byg026Opførelsesår', 'byg027OmTilbygningsår', 'byg021BygningensAnvendelse', 'byg041BebyggetAreal', 'byg038SamletBygningsareal', 'byg054AntalEtager', 'byg032YdervæggensMateriale', 'byg033Tagdækningsmateriale')
    # We convert the dictionary into a tuple, since the insertion function takes a tuple as input.
    # Parameters missing from row_dict are inserted as NULL.
    row_tuple = tuple(row_dict.get(key) for key in bbr_keys)
    
    # SQL query to insert values in the database. Replace table and column names by the ones in your PostgreSQL database.
    # The number of columns must match the number of values in row_tuple (13 here).
    # To fill more columns, add the matching BBR parameter to bbr_keys and one more %s to VALUES.
    sql = """INSERT INTO buildings(
    bbr_id, 
    municipality_nr, 
    grund, 
    husnummer, 
    coord, 
    construction_year, 
    latest_renovation_year,
    bbr_use_category, 
    built_area, 
    total_building_area, 
    n_floors, 
    wall_material,
    roof_material) 
    
    VALUES(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
    
    ON CONFLICT ON CONSTRAINT buildings_pkey DO UPDATE SET (
    bbr_id, 
    municipality_nr, 
    grund, 
    husnummer, 
    coord, 
    construction_year, 
    latest_renovation_year,
    bbr_use_category, 
    built_area, 
    total_building_area, 
    n_floors, 
    wall_material,
    roof_material) =
    
    (EXCLUDED.bbr_id, 
    EXCLUDED.municipality_nr, 
    EXCLUDED.grund, 
    EXCLUDED.husnummer, 
    EXCLUDED.coord, 
    EXCLUDED.construction_year, 
    EXCLUDED.latest_renovation_year,
    EXCLUDED.bbr_use_category, 
    EXCLUDED.built_area, 
    EXCLUDED.total_building_area, 
    EXCLUDED.n_floors, 
    EXCLUDED.wall_material,
    EXCLUDED.roof_material);"""
    
    connector = None
    
    try:
        # connect to the PostgreSQL database
        connector = pg.connect(params)
        # create a new cursor
        cur = connector.cursor()
        # execute the INSERT statement
        cur.execute(sql, row_tuple)
        # commit the changes to the database
        connector.commit()
        # close communication with the database
        cur.close()
        
    except (Exception, pg.DatabaseError) as error:
        print(error)
        
    finally:
        if connector is not None:
            connector.close()

Note: it is possible to write a slightly simpler function that inserts all rows at once, using the *executemany* method instead of *execute*. However, that requires collecting all rows in a list first, and we want to avoid this due to the size of the dataset.
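A possible middle ground, not used in this notebook, is to send rows to *executemany* in batches of limited size, so that only one batch is held in memory at a time. The following is a minimal sketch under stated assumptions: `batched` and `insert_in_batches` are hypothetical helper names, `cursor` is assumed to be a psycopg cursor from an open connection, and the default batch size of 1000 is an arbitrary choice.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable, without storing it all."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def insert_in_batches(cursor, sql, rows, batch_size=1000):
    """Send rows to the database batch by batch; return the number of rows sent."""
    total = 0
    for batch in batched(rows, batch_size):
        cursor.executemany(sql, batch)  # one call per batch instead of one per row
        total += len(batch)
    return total
```

Here `rows` can be a generator that yields one tuple per building, so the full dataset never has to exist as a single list. The caller is still responsible for committing and closing the connection.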

# 3) Retrieving values from the BBR file

Python has a very easy to use *json* package to read JSON files and convert them into Python objects. However, the *json* package works by loading the entire JSON file into memory before processing it. With a very large file like BBR, this will likely cause errors and crashes. Instead, we want to parse BBR iteratively using the *ijson* package. 

### Setup

In [14]:
import ijson # package to parse JSON iteratively 

In [15]:
import reprlib # package to limit print size if you want to avoid very long prints. 
# If you have many very long prints, the notebook might crash when you open it later.
# Tip: on Jupyter, press Esc then R then Y to reset a cell and delete its output.

In [16]:
jsonfile='C:/Users/KJ35FA/Documents/BBR_Aktuelt_Totaludtraek_JSON_20221107180020.json' #Write the location of your JSON file here.

### Parsing the JSON file

The *ijson.parse* function reads the JSON file as a stream, without loading it all into memory. It scans through the file and reports each event it encounters (events include the start and end of each object or array, each new key, and each value). For each event, it yields the current location/path in the JSON document (prefix), the type of the event, and the corresponding value, if any.

The following function will parse iteratively through the JSON file, and record the parameters we're interested in. When a new building is encountered, existing parameters are inserted into the database and their values are cleared to record values for the new building. Note: it is important that the parameters we give as inputs to the function are written in the same way as in the JSON record, or they won't be recognized. 

In [77]:
def retrieve_values(parameters,jsonfile):
    building_dict=dict()
    
    # Open in binary mode (ijson decodes UTF-8 itself); the with statement closes the file when parsing ends.
    with open(jsonfile, 'rb') as jsondata:
        for prefix, event, value in ijson.parse(jsondata):
            if 'BygningList.item.forretningshændelse' in prefix: 
                # This is the first parameter recorded in the JSON file for each building, so it indicates the start of a new building.
                # Insert previously recorded values, if any:
                if len(building_dict)>0:
                    insert_bbr_from_dict(building_dict)
                # Reset the building dictionary to record values for the new building:
                building_dict=dict()

            elif 'BygningList.item.' in prefix:
                # Look for a desired parameter that corresponds to the value we're reading (if any)
                for p in parameters:
                    if p in prefix:
                        # If we find a matching parameter, record it and exit the loop
                        building_dict[p]=value
                        break
    
    # Insert the last building, which is not followed by a new 'forretningshændelse' entry:
    if len(building_dict)>0:
        insert_bbr_from_dict(building_dict)

In [17]:
def retrieve_values(parameters,jsonfile):
    building_dict=dict()
    
    # Open in binary mode (ijson decodes UTF-8 itself); the with statement closes the file when parsing ends.
    with open(jsonfile, 'rb') as jsondata:
        # kvitems yields the (key, value) pairs of every building object in BygningList
        for param, value in ijson.kvitems(jsondata, 'BygningList.item'):
            if param == 'forretningshændelse': 
                # This is the first parameter recorded in the JSON file for each building, so it indicates the start of a new building.
                # Insert previously recorded values, if any:
                if len(building_dict)>0:
                    insert_bbr_from_dict(building_dict)
                # Reset the building dictionary to record values for the new building:
                building_dict=dict()

            elif param in parameters:
                # If the parameter we're reading is one we're interested in, record it.
                building_dict[param]=value
    
    # Insert the last building, which is not followed by a new 'forretningshændelse' entry:
    if len(building_dict)>0:
        insert_bbr_from_dict(building_dict)
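The grouping logic can also be separated from the file reading and the database insertion, which makes it easy to try out on a small in-memory sample. The sketch below is an illustration only: `group_buildings` is a hypothetical helper, the sample pairs are invented, and it assumes, as the functions above do, that *forretningshændelse* is the first key of every building.

```python
def group_buildings(pairs, wanted, start_key="forretningshændelse"):
    """Group a flat stream of (key, value) pairs into one dictionary per building."""
    wanted = set(wanted)
    current = {}
    for key, value in pairs:
        if key == start_key:
            # A new building starts: hand over the previous one, if it holds any values.
            if current:
                yield current
            current = {}
        elif key in wanted:
            current[key] = value
    # The last building has no following start marker, so it is yielded after the loop.
    if current:
        yield current


# Invented sample standing in for the pairs produced by ijson.kvitems.
sample_pairs = [
    ("forretningshændelse", "a"), ("id_lokalId", "1"), ("kommunekode", "0101"),
    ("forretningshændelse", "b"), ("id_lokalId", "2"), ("status", "ignored"),
]
buildings = list(group_buildings(sample_pairs, ["id_lokalId", "kommunekode"]))
print(buildings)
```

With the real file, the same generator could be fed with `ijson.kvitems(jsondata, 'BygningList.item')`, and each yielded dictionary passed to `insert_bbr_from_dict`.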


Note: ijson reads the file object as a stream and does not rewind it. Whenever you call *ijson.parse* or *ijson.kvitems* again, open the data file again first to "reset" it. The functions above do this on every call, but it is good to know if you need to parse the file again later.

Now all we need to do is call the function we just defined.

In [None]:
retrieve_values(parameters, jsonfile)