# Estimation of building height using digital elevation data

This work tests the hypothesis that the height of a building can be estimated from the difference between a digital surface model (DSM) and a digital terrain model (DTM). To test the hypothesis, the city of Halle in Germany is considered, using openly available data from Saxony-Anhalt (https://www.lvermgeo.sachsen-anhalt.de/de/kostenfreie_geobasisdaten_lvermgeo.html) and OpenStreetMap (https://www.openstreetmap.org).
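
As a minimal sketch of the hypothesis, assuming the DTM and the DSM are sampled at the same grid positions, the height above ground at each position is the DSM elevation minus the DTM elevation. The coordinates and elevations below are invented for illustration and are not taken from the Halle data; `height_above_ground` is a hypothetical helper.

```python
# Toy elevations in metres keyed by (x, y) grid position; all values are made up.
dtm = {(0, 0): 100.0, (1, 0): 100.5, (2, 0): 101.0}   # bare ground
dsm = {(0, 0): 100.0, (1, 0): 109.5, (2, 0): 110.0}   # ground plus objects standing on it


def height_above_ground(dsm, dtm):
    """Return DSM - DTM for every position present in both models."""
    return {xy: dsm[xy] - dtm[xy] for xy in dsm.keys() & dtm.keys()}


heights = height_above_ground(dsm, dtm)
for xy in sorted(heights):
    print(xy, heights[xy])
```

Positions with a difference close to zero are open ground; positions inside a building footprint carry the candidate building height.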

The following procedure is used to test the hypothesis. In the __first__ part, the DSM, the DTM, and the building geometries are extracted. In the __second__ part, the DSM and DTM data are intersected with the building geometries, and the differences between the DSM and DTM heights are calculated. In the __third__ part, a weighted-average method is used to estimate a flat height for each building, together with a confidence interval.
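
The weighted-average step of the third part is not spelled out in this notebook, so the following is only a sketch under stated assumptions. The weights are assumed to be overlap areas, and the confidence interval uses the weighted variance together with Kish's effective sample size, which is one common approximation and not necessarily the method intended here. The function name `weighted_mean_ci` and all numbers are hypothetical.

```python
import math


def weighted_mean_ci(values, weights, z=1.96):
    """Weighted mean with an approximate confidence interval.

    Assumption: the standard error is derived from the weighted variance and
    Kish's effective sample size, n_eff = (sum w)^2 / sum(w^2).
    """
    total_w = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total_w
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total_w
    n_eff = total_w ** 2 / sum(w * w for w in weights)
    se = math.sqrt(var / n_eff)
    return mean, mean - z * se, mean + z * se


# Hypothetical height differences (m) for one building and their weights
z_meter = [9.8, 10.1, 10.4, 3.2]
weights = [4.0, 4.0, 4.0, 1.0]   # e.g. overlap area in m^2 (assumed)

mean, lower, upper = weighted_mean_ci(z_meter, weights)
print(f"{mean:.2f} m (95% CI {lower:.2f} to {upper:.2f})")
```

Down-weighting the partially covered cell keeps a single low reading, for example at a roof edge, from dominating the estimate.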

In [1]:
import geopandas as gpd
import pandas as pd
import osmnx as ox
from shapely.geometry import LineString, Point, Polygon

import os 

In [2]:
# Toggle between users

# Babak
upath = r'.'

## Part 1

#### Reading and concatenation of DTM and DSM files

In [3]:
def get_file_list(srcdir) -> list:
    return [os.path.join(srcdir, fn) for fn in os.listdir(srcdir) if os.path.isfile(os.path.join(srcdir, fn))]
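
`os.listdir` returns entries in arbitrary order and does not filter by file type. A hedged variant, assuming the elevation tiles all carry the `.xyz` extension mentioned in the reading step, could restrict the list to those files and sort it so that the tile order is reproducible. The name `get_xyz_files` is hypothetical.

```python
from pathlib import Path


def get_xyz_files(srcdir) -> list:
    """Return the sorted paths of all regular files in srcdir that end in .xyz."""
    return sorted(str(p) for p in Path(srcdir).iterdir()
                  if p.is_file() and p.suffix.lower() == '.xyz')
```

It could be used in place of `get_file_list`, for example `get_xyz_files(os.path.join(upath, 'input', 'DTM'))`.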

In [4]:
%%time
# get list of DTM files in the following directory
fd_dtm = os.path.join(upath, 'input', 'DTM')
dtmfiles = get_file_list(fd_dtm)


# get list of DSM files in the following directory
fd_dsm = os.path.join(upath, 'input', 'DSM')
dsmfiles = get_file_list(fd_dsm)

CPU times: user 1.5 ms, sys: 733 µs, total: 2.23 ms
Wall time: 2.52 ms


In [5]:
%%time
# Extract the .xyz files associated with the DTM and DSM data, convert them to GeoDataFrames, and concatenate them.

# GeoDataFrames containing the DTM and DSM points
gdtm = gpd.GeoDataFrame(columns = ['x','y','z','geometry'], geometry = 'geometry')
gdsm = gpd.GeoDataFrame(columns = ['x','y','z','geometry'], geometry = 'geometry')

for idx, dtm in enumerate(dtmfiles):  # use dtmfiles[:1] for a quick test
    print('Index number of the read DTM file = ' + str(idx))
    # whitespace-separated columns: x, y, z
    df = pd.read_table(dtm, sep=r'\s+', names=['x', 'y', 'z'])
    gdtm = pd.concat([gdtm,
                      gpd.GeoDataFrame(df, 
                                       crs = 'epsg:25832', 
                                       geometry = gpd.points_from_xy(df.x,df.y))], 
                     ignore_index=True)

for idx, dsm in enumerate(dsmfiles):
    print('Index number of the read DSM file = ' + str(idx))
    # whitespace-separated columns: x, y, z
    df = pd.read_table(dsm, sep=r'\s+', names=['x', 'y', 'z'])
    gdsm = pd.concat([gdsm,
                      gpd.GeoDataFrame(df, 
                                       crs = 'epsg:25832', 
                                       geometry = gpd.points_from_xy(df.x,df.y))], 
                     ignore_index=True)

# .buffer(1,cap_style = 3)

Index number of the read DTM file = 0
Index number of the read DTM file = 1
Index number of the read DTM file = 2
Index number of the read DTM file = 3
Index number of the read DTM file = 4
Index number of the read DTM file = 5
Index number of the read DTM file = 6
Index number of the read DTM file = 7
Index number of the read DTM file = 8
Index number of the read DTM file = 9
Index number of the read DTM file = 10
Index number of the read DTM file = 11
Index number of the read DTM file = 12
Index number of the read DTM file = 13
Index number of the read DTM file = 14
Index number of the read DTM file = 15
Index number of the read DTM file = 16
Index number of the read DTM file = 17
Index number of the read DTM file = 18
Index number of the read DTM file = 19
Index number of the read DTM file = 20
Index number of the read DTM file = 21
Index number of the read DTM file = 22
Index number of the read DTM file = 23
Index number of the read DTM file = 24
Index number of the read DTM file = 25
Index number of the read DTM file 
Index number of the read DSM file = 1
Index number of the read DSM file = 2
Index number of the read DSM file = 3
Index number of the read DSM file = 4
Index number of the read DSM file = 5
Index number of the read DSM file = 6
Index number of the read DSM file = 7
Index number of the read DSM file = 8
Index number of the read DSM file = 9
Index number of the read DSM file = 10
Index number of the read DSM file = 11
Index number of the read DSM file = 12
Index number of the read DSM file = 13
Index number of the read DSM file = 14
Index number of the read DSM file = 15
Index number of the read DSM file = 16
Index number of the read DSM file = 17
Index number of the read DSM file = 18
Index number of the read DSM file = 19
Index number of the read DSM file = 20
Index number of the read DSM file = 21
Index number of the read DSM file = 22
Index number of the read DSM file = 23
Index number of the read DSM file = 24
Index number of the read DSM file = 25
Index number of the read DSM file 
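
Calling `pd.concat` inside the loop copies the growing frame on every iteration. A common alternative, sketched here with two in-memory stand-ins for `.xyz` tiles (the values are invented), is to read all tiles into a list and concatenate once. The conversion to a GeoDataFrame with `gpd.points_from_xy` can then be done a single time on the combined frame.

```python
import io

import pandas as pd

# Two tiny in-memory stand-ins for .xyz tiles (whitespace-separated x y z); values are invented.
tiles = [io.StringIO("0 0 100.0\n1 0 100.5\n"),
         io.StringIO("0 1 101.0\n1 1 101.5\n")]

# Read every tile first, then concatenate once instead of growing a frame inside the loop.
frames = [pd.read_csv(tile, sep=r'\s+', names=['x', 'y', 'z']) for tile in tiles]
points = pd.concat(frames, ignore_index=True)
print(points.shape)
```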

#### Extraction of building geometries from OpenStreetMap

In [6]:
%%time
# Setting a bounding box for the city of Halle in Germany
# south, east, north, west = [51.3617,12.2793,51.5837,11.6651]
south, east, north, west = (51.3435, 12.5023, 51.6020, 11.4419)

# Extracting data associated with building
buildings = ox.geometries_from_bbox(north, south, east, west, tags = {'building': True})

# attributes of interest associated with buildings
bcols = ['geometry', 'building', 'building:levels']
bdata = buildings[bcols]

# keeping only polygon geometries for buildings
bdata = bdata[bdata['geometry'].geom_type == 'Polygon']

CPU times: user 1min, sys: 5.61 s, total: 1min 6s
Wall time: 1min 6s


In [7]:
bdata.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

## Clipping

In [8]:
# Load administrative boundary 
border = gpd.read_file('halle.geojson')

In [9]:
border.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [10]:
bdata = gpd.clip(bdata, border)



In [11]:
bdata.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [12]:
len(bdata)

40644

In [13]:
bdata = bdata.to_crs(epsg=25832)

In [14]:
bdata.crs

<Derived Projected CRS: EPSG:25832>
Name: ETRS89 / UTM zone 32N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Europe between 6°E and 12°E: Austria; Belgium; Denmark - onshore and offshore; Germany - onshore and offshore; Norway including - onshore and offshore; Spain - offshore.
- bounds: (6.0, 38.76, 12.01, 84.33)
Coordinate Operation:
- name: UTM zone 32N
- method: Transverse Mercator
Datum: European Terrestrial Reference System 1989 ensemble
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
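
EPSG:25832 is ETRS89 / UTM zone 32N, whose nominal area of use spans 6°E to 12°E, as the output above shows. The sketch below applies the standard UTM zone formula (ignoring the special zones around Norway and Svalbard) to the bounding-box longitudes of the OSM query. It suggests that the eastern edge of the box lies just beyond the nominal zone boundary. Because the elevation points are read in as EPSG:25832, reprojecting the buildings to that same CRS is what keeps the two data sets aligned. The helper name `utm_zone` is hypothetical.

```python
import math


def utm_zone(lon_deg: float) -> int:
    """Standard 6-degree UTM zone number for a longitude in degrees east."""
    return int(math.floor((lon_deg + 180.0) / 6.0)) + 1


west, east = 11.4419, 12.5023   # bounding-box longitudes used in the OSM query
print(utm_zone(west), utm_zone(east))
```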

#### Carrying out a series of GIS processing steps

In [15]:
%%time
# Keep only the DTM and DSM points that overlay a building geometry.
# bdata is already in EPSG:25832 (metres), the CRS of the elevation points.
bfoot = bdata.reset_index()

def points_in_buildings(points):
    return gpd.overlay(bfoot,
                       points.set_crs(epsg=25832),
                       how='intersection',
                       keep_geom_type=False)

b_gdtm = points_in_buildings(gdtm)
b_gdsm = points_in_buildings(gdsm)

# Buffer each remaining point by 1 m with square caps, giving a 2-by-2 meter square
def square_buffer(points):
    squares = points.copy()
    squares['geometry'] = points.buffer(1, cap_style = 3)
    return squares

bb_gdtm = square_buffer(b_gdtm)
bb_gdsm = square_buffer(b_gdsm)

# Filter out buffered points that are not 100% within a building geometry:
# a square that lies fully inside a building keeps its area of 4 m^2 after the intersection
def squares_within_buildings(squares):
    clipped = gpd.overlay(bfoot,
                          squares[['x','y','z','geometry']],
                          how='intersection',
                          keep_geom_type=False)
    return clipped[clipped.area == 4]

bbw_gdtm = squares_within_buildings(bb_gdtm)
bbw_gdsm = squares_within_buildings(bb_gdsm)


# intersecting DTM and DSM
dem = gpd.overlay(bbw_gdtm[['osmid','building','building:levels','x','y','z','geometry']],
                  bbw_gdsm[['z','geometry']],
                  how='intersection',
                  keep_geom_type=False)

# calculating the difference 
dem['z_meter'] = (dem['z_2']-dem['z_1']).astype(float)

CPU times: user 8min 2s, sys: 40.9 s, total: 8min 43s
Wall time: 9min 36s
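
The buffering and area filter can be illustrated with plain shapely on a hypothetical 10 m by 10 m footprint. A point buffered by 1 m with square caps becomes a 2 m by 2 m square. If the square lies entirely inside the footprint, the intersection keeps its area of 4 m²; otherwise the area shrinks. The sketch compares areas with a tolerance rather than with `== 4`; this is a suggestion to absorb floating-point noise, not what the cell above does.

```python
import math

from shapely.geometry import Point, Polygon

# Hypothetical 10 m x 10 m building footprint in a metric CRS
building = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])

for x, y in [(5, 5), (0.5, 5)]:
    square = Point(x, y).buffer(1, cap_style=3)      # 2 m x 2 m square around the point
    clipped_area = square.intersection(building).area
    # Tolerant comparison instead of `== 4`, to absorb floating-point noise
    fully_inside = math.isclose(clipped_area, 4.0, rel_tol=1e-9)
    print((x, y), round(clipped_area, 3), fully_inside)
```

The second point sits 0.5 m from the western wall, so a quarter of its square falls outside the footprint and the cell would be discarded.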


## Reproject 

In [16]:
dem = dem.to_crs(epsg=3035)

#### Estimation of building height
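In this approach, the `dem` dataframe is grouped by each building's OSM ID, so that summary statistics of the height difference (mean, median, minimum, and maximum) can be calculated per building. The cell below computes the mean.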

In [17]:
save_dir = '.'

# dissolving the created "dem" dataframe by OSM ID and building type
dem_diss_b = dem.dissolve(by=['osmid', 'building'],
                          aggfunc={'z_meter': 'mean'})

# save as shapefile
dem_diss_b.to_file(os.path.join(upath , save_dir, 'dem_diss_b.shp'))
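
The per-building statistics named in the text (mean, median, min, and max) could be obtained with a plain pandas `groupby`, sketched below on an invented table with the same `osmid` and `z_meter` column names. Unlike `dissolve`, this does not merge geometries, so it is only suitable when the attribute table alone is needed.

```python
import pandas as pd

# Hypothetical per-cell height differences (m) for two buildings; all values are invented.
dem_toy = pd.DataFrame({
    'osmid':   [1, 1, 1, 2, 2],
    'z_meter': [9.0, 10.0, 11.0, 4.0, 6.0],
})

# One row per building, one column per statistic
stats = dem_toy.groupby('osmid')['z_meter'].agg(['mean', 'median', 'min', 'max'])
print(stats)
```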

In [18]:
# with open('dem_diss_b.geojson', 'w') as fout:
#     fout.write(dem_diss_b.to_json())

#### CUBE representation of DEM  (creating a raster file using DTM as the reference geometry)
In this approach, the `dem` dataframe is grouped by the DTM 'x' and 'y' coordinates, and the mean DTM height, DSM height, and height difference are calculated for each group. NB: cubes that are not 2-by-2 meters are filtered out by the area filter in the GIS processing step.

In [19]:
# dissolving the created "dem" dataframe by DTM x and y 
dem_diss_c = dem.dissolve(by=['x','y'], 
                          aggfunc = {'osmid': 'first',  # an ID is a label, so it is not averaged
                                     'building': 'first',
                                     'z_1': 'mean',
                                     'z_2': 'mean',
                                     'z_meter': 'mean'
                                    })
# rename some columns
dem_diss_c = dem_diss_c.rename(columns = {'z_1':'z_terrain',
                                          'z_2': 'z_surface'})



# save as shapefile
dem_diss_c.to_file(os.path.join(upath , save_dir, 'dem_diss_c.shp'))

In [20]:
dem_diss_c.columns.tolist()

['geometry', 'osmid', 'building', 'z_terrain', 'z_surface', 'z_meter']

## Rasterization

In [21]:
from geocube.api.core import make_geocube

In [22]:
# Rasterize
rastheights = make_geocube(vector_data=dem_diss_c, measurements=['z_meter'], resolution=[-10, 10], fill=65535)

In [23]:
rastheights.rio.to_raster('dem_dtm_heights.tiff',  driver='GTiff')
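
To illustrate what a rasterization with a fill value produces, the sketch below bins invented cell centres into a coarse grid with NumPy, averages the values per grid cell, and writes the fill value into empty cells. It is not a description of how `make_geocube` works internally. It only mirrors the idea of a 10 m grid with `fill=65535`, with row 0 at the northern edge, as implied by a negative y resolution. The function name `rasterize_mean` is hypothetical.

```python
import numpy as np


def rasterize_mean(x, y, values, resolution, fill):
    """Average point values per square grid cell; empty cells receive `fill`."""
    x, y, values = map(np.asarray, (x, y, values))
    cols = np.floor((x - x.min()) / resolution).astype(int)
    rows = np.floor((y.max() - y) / resolution).astype(int)   # row 0 is the northern edge
    total = np.zeros((rows.max() + 1, cols.max() + 1))
    count = np.zeros_like(total)
    np.add.at(total, (rows, cols), values)    # sum of values per cell
    np.add.at(count, (rows, cols), 1)         # number of points per cell
    return np.where(count > 0, total / np.maximum(count, 1), fill)


# Invented cell centres (m) and height differences (m)
grid = rasterize_mean(x=[1, 3, 15], y=[1, 3, 15], values=[8.0, 10.0, 20.0],
                      resolution=10, fill=65535)
print(grid)
```

Because the fill value is far outside any plausible building height, it should be masked or declared as nodata before the raster is used for statistics.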