# Check LAUs and NUTS3 with no buildings

We scan the overview files for LAUs and NUTS3 regions that contain zero buildings.

This is a first sanity check; it is complemented by the coverage analysis that compares against GHSL.

In [84]:
import pandas as pd
import os
import glob 
from collections import Counter

root = '/home/nmd/Desktop/'

# Microsoft

There are important gaps.

The results are consistent with the map provided by MSFT and with our own investigations, e.g. via eubucco-analysis.

In [31]:
# collect the per-country MSFT overview files
paths_msft = glob.glob(os.path.join(root,'v1','msft_*'))
overviews_msft = pd.DataFrame()

for path in paths_msft:
    tmp = pd.read_csv(path)
    # the country name is the second '_'-separated token of the file name
    tmp.insert(0,'country',os.path.split(path)[1].split('_')[1])
    overviews_msft = pd.concat([overviews_msft,tmp])
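
The same loading loop is used for each source in this notebook, so it could be factored into one helper. The following is only a sketch: `load_overviews` is a hypothetical name, and the file names in the demo (`msft_spain_overview.csv`, `msft_france_overview.csv`) are made up, since the notebook only shows the `msft_*` pattern. It collects the frames in a list and concatenates once, which avoids re-copying the growing frame on every iteration.

```python
import glob
import os
import tempfile

import pandas as pd


def load_overviews(root: str, source: str) -> pd.DataFrame:
    """Concatenate all `<source>_*` overview CSVs found under `<root>/v1`."""
    frames = []
    for path in sorted(glob.glob(os.path.join(root, "v1", f"{source}_*"))):
        tmp = pd.read_csv(path)
        # country = second '_'-separated token of the file name
        tmp.insert(0, "country", os.path.basename(path).split("_")[1])
        frames.append(tmp)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Demo on throwaway files; names and values are hypothetical.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "v1"))
    pd.DataFrame(
        {"LAU_ID": ["A1", "A2"], "NUTS3_ID": ["XX001", "XX001"], "n_bldgs": [0, 5]}
    ).to_csv(os.path.join(demo_root, "v1", "msft_spain_overview.csv"), index=False)
    pd.DataFrame(
        {"LAU_ID": ["B1"], "NUTS3_ID": ["YY001"], "n_bldgs": [7]}
    ).to_csv(os.path.join(demo_root, "v1", "msft_france_overview.csv"), index=False)
    demo = load_overviews(demo_root, "msft")
```

With such a helper, the three loading cells would reduce to calls like `load_overviews(root, 'osm')`.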

In [32]:
# empty laus
Counter(overviews_msft[overviews_msft.n_bldgs==0]['country'])

Counter({'spain': 4723,
         'france': 2719,
         'czechia': 1634,
         'switzerland': 511,
         'italy': 289,
         'portugal': 156,
         'germany': 137,
         'slovakia': 118,
         'hungary': 74,
         'netherlands': 31,
         'estonia': 11,
         'norway': 9,
         'poland': 9,
         'ireland': 5,
         'latvia': 3,
         'denmark': 3,
         'finland': 2,
         'sweden': 2,
         'lithuania': 2,
         'bulgaria': 1,
         'uk': 1})
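
Absolute counts favour countries with many LAUs, so a share per country may be easier to compare. The sketch below assumes the overview frame has the `country` and `n_bldgs` columns used above; `empty_lau_share` is a hypothetical helper and the toy rows are invented for illustration, not taken from the data.

```python
import pandas as pd


def empty_lau_share(overviews: pd.DataFrame) -> pd.DataFrame:
    """Per country: number of empty LAUs, total LAUs, and the empty share."""
    out = (
        overviews.assign(is_empty=overviews["n_bldgs"].eq(0))
        .groupby("country")["is_empty"]
        .agg(n_empty="sum", n_laus="size")
    )
    out["share_empty"] = out["n_empty"] / out["n_laus"]
    return out.sort_values("share_empty", ascending=False)


# Toy rows (hypothetical values)
toy = pd.DataFrame(
    {
        "country": ["spain", "spain", "spain", "france", "france"],
        "n_bldgs": [0, 0, 12, 0, 40],
    }
)
shares = empty_lau_share(toy)
```

In the notebook this would be called as `empty_lau_share(overviews_msft)`.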

In [33]:
# empty NUTS
gb = overviews_msft.groupby('NUTS3_ID')['n_bldgs'].sum().reset_index()
gb[gb.n_bldgs==0]

Unnamed: 0,NUTS3_ID,n_bldgs
114,CH025,0
115,CH031,0
116,CH032,0
146,CZ072,0
656,ES533,0
668,ES703,0
671,ES706,0
672,ES707,0
1122,PT200,0


# OSM

The results are encouraging: relatively few LAUs are completely empty.

At the NUTS3 level, mostly the Canary Islands (ES70x) are missing.

At the LAU level, the results are consistent with our other analyses (lack of coverage in Greece, Romania, Spain and Portugal in particular).

In [5]:
paths_osm = glob.glob(os.path.join(root,'v1','osm_*'))

In [19]:
overviews_osm = pd.DataFrame()

for path in paths_osm:
    tmp = pd.read_csv(path)
    tmp.insert(0,'country',os.path.split(path)[1].split('_')[1])
    overviews_osm = pd.concat([overviews_osm,tmp])

In [25]:
# empty laus
Counter(overviews_osm[overviews_osm.n_bldgs==0]['country'])

Counter({'greece': 862,
         'romania': 227,
         'spain': 176,
         'portugal': 53,
         'cyprus': 44,
         'italy': 20,
         'germany': 6,
         'switzerland': 4,
         'netherlands': 2,
         'france': 1})

In [29]:
# empty NUTS
gb = overviews_osm.groupby('NUTS3_ID')['n_bldgs'].sum().reset_index()

In [30]:
gb[gb.n_bldgs==0]

Unnamed: 0,NUTS3_ID,n_bldgs
308,DE502,0
668,ES703,0
669,ES704,0
670,ES705,0
671,ES706,0
672,ES707,0
673,ES708,0
674,ES709,0
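
To see which NUTS3 regions are empty in both MSFT and OSM, the two lists can be intersected. This is a sketch: `empty_nuts3` is a hypothetical helper, the toy frame is invented, and the two ID sets are copied by hand from the MSFT and OSM outputs in this notebook.

```python
import pandas as pd


def empty_nuts3(overviews: pd.DataFrame) -> set:
    """Return the NUTS3_IDs whose summed n_bldgs is zero."""
    totals = overviews.groupby("NUTS3_ID")["n_bldgs"].sum()
    return set(totals[totals == 0].index)


# Toy check of the helper (hypothetical IDs)
toy = pd.DataFrame(
    {"NUTS3_ID": ["XX001", "XX001", "XX002"], "n_bldgs": [0, 3, 0]}
)
toy_empty = empty_nuts3(toy)  # only XX002 sums to zero

# IDs copied from the MSFT and OSM outputs in this notebook
empty_msft = {"CH025", "CH031", "CH032", "CZ072", "ES533",
              "ES703", "ES706", "ES707", "PT200"}
empty_osm = {"DE502", "ES703", "ES704", "ES705",
             "ES706", "ES707", "ES708", "ES709"}

empty_in_both = sorted(empty_msft & empty_osm)
empty_only_osm = sorted(empty_osm - empty_msft)
print(empty_in_both)   # ['ES703', 'ES706', 'ES707']
print(empty_only_osm)  # ['DE502', 'ES704', 'ES705', 'ES708', 'ES709']
```

In the notebook itself the sets would come from `empty_nuts3(overviews_msft)` and `empty_nuts3(overviews_osm)` rather than from hand-copied IDs.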


# Government

Here we expect a very high coverage rate.

Check what is happening with Cyprus, ITH1 and maybe Czechia. Otherwise OK.

In [85]:
paths_gov = glob.glob(os.path.join(root,'v1','gov_*'))
overviews_gov = pd.DataFrame()

for path in paths_gov:
    tmp = pd.read_csv(path)
    tmp.insert(0,'country',os.path.split(path)[1].split('_')[1])
    overviews_gov = pd.concat([overviews_gov,tmp])

In [86]:
laus_gov = pd.read_csv('../3-choose-msft-osm/laus_gov.csv')

In [87]:
overviews_gov = overviews_gov[overviews_gov.LAU_ID.isin(laus_gov.LAU_ID)]

In [89]:
# empty laus
Counter(overviews_gov[overviews_gov.n_bldgs==0]['country'])

Counter({'cyprus': 185,
         'czechia': 12,
         'germany': 4,
         'switzerland': 4,
         'italy': 2,
         'france': 1})

In [90]:
# empty NUTS
gb = overviews_gov.groupby('NUTS3_ID')['n_bldgs'].sum().reset_index()
gb[gb.n_bldgs==0]

Unnamed: 0,NUTS3_ID,n_bldgs


## Issues

Cyprus: OK, because the government data contains no data for the Turkish side. Should we remove these LAUs or combine them?

Czechia: a few seemingly random cities are empty.
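
To follow up on these issues, the affected LAUs can be listed for inspection and, if the decision is to remove them, filtered out. The sketch below assumes the `country`, `LAU_ID`, `NUTS3_ID` and `n_bldgs` columns used above; `empty_laus` and `drop_empty_laus` are hypothetical helpers, and the toy IDs are placeholders rather than real codes. It only covers the "remove" option; what "combine" would involve is left open here.

```python
import pandas as pd


def empty_laus(overviews: pd.DataFrame, country: str) -> pd.DataFrame:
    """List the LAUs of one country that have zero buildings."""
    mask = overviews["country"].eq(country) & overviews["n_bldgs"].eq(0)
    return overviews.loc[mask, ["LAU_ID", "NUTS3_ID"]].reset_index(drop=True)


def drop_empty_laus(overviews: pd.DataFrame, country: str) -> pd.DataFrame:
    """Remove the empty LAUs of one country, keeping every other row."""
    mask = overviews["country"].eq(country) & overviews["n_bldgs"].eq(0)
    return overviews.loc[~mask].reset_index(drop=True)


# Toy rows (placeholder IDs and counts)
toy = pd.DataFrame(
    {
        "country": ["cyprus", "cyprus", "czechia", "czechia"],
        "LAU_ID": ["L1", "L2", "L3", "L4"],
        "NUTS3_ID": ["N1", "N1", "N2", "N2"],
        "n_bldgs": [0, 150, 0, 80],
    }
)
to_inspect = empty_laus(toy, "cyprus")
cleaned = drop_empty_laus(toy, "cyprus")
```

Here `to_inspect` holds the single empty Cyprus LAU, and `cleaned` keeps the empty Czechia LAU because only Cyprus was filtered.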