Add new Indicator about buildings using ML #265

matthiasschaub · 2022-03-03T20:34:52Z

Description

Predicts the building area of the AOI using a trained Random Forest Regressor.
The result is the ratio of the building area mapped in OSM to the predicted
building area.

The input parameters (X or covariates) to the model are population and population
density (GHSL GHS-POP), settlement typologies (GHSL SMOD), the Subnational Human
Development Index (GDL SHDI) and nightlights (EOG VNL).

The spatial resolution of the model is hex-cells at zoom level 12. The input AOI is
split into hex-cells and a prediction is made for each of those hex-cells. The
model is trained on hex-cells in Africa. Therefore, the Indicator is restricted to
input AOIs within the bounding box of Africa.
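
The restriction to Africa can be pictured as a simple bounding-box test on the AOI vertices. The sketch below is only an illustration: `AFRICA_BBOX` holds rough, assumed values and `aoi_within_bbox` is a hypothetical helper. Neither is taken from this PR.

```python
from typing import Iterable, Tuple

# Assumed, approximate bounding box of Africa as
# (min_lon, min_lat, max_lon, max_lat). Illustrative values only;
# not the values used by the indicator.
AFRICA_BBOX = (-26.0, -35.0, 64.0, 38.0)


def aoi_within_bbox(
    coords: Iterable[Tuple[float, float]],
    bbox: Tuple[float, float, float, float],
) -> bool:
    """Return True if every (lon, lat) vertex lies inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return all(
        min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        for lon, lat in coords
    )


# Two vertices inside the assumed box and one hypothetical point outside it.
aoi_inside = [(5.779109, 33.164272), (6.064626, 33.179127)]
aoi_outside = [(13.4, 52.5)]

inside = aoi_within_bbox(aoi_inside, AFRICA_BBOX)
outside = aoi_within_bbox(aoi_outside, AFRICA_BBOX)
```

A vertex-only test is enough for a rectangular box, because a polygon whose vertices all lie inside a rectangle lies inside it entirely.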

Corresponding issue

Closes #243

New or changed dependencies

building-completeness-model

Checklist

#265

matthiasschaub · 2022-05-24T08:26:30Z

@Gigaszi could you have another look at the resulting figure? Does this resemble your implementation?

matthiasschaub · 2022-05-24T08:27:29Z

{
  "apiVersion": "0.9.0",
  "attribution": {
    "url": "https://github.com/GIScience/ohsome-quality-analyst/blob/main/data/COPYRIGHTS.md",
    "text": "© OpenStreetMap contributors"
  },
  "type": "Feature",
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": [
      [
        [
          [
            5.779109,
            33.164272
          ],
          [
            5.779597,
            33.165833
          ],
          [
            5.873785,
            33.161541
          ],
          [
            5.953008,
            33.158386
          ],
          [
            5.994273,
            33.156437
          ],
          [
            6.00104,
            33.156494
          ],
          [
            6.003383,
            33.156322
          ],
          [
            6.010534,
            33.159302
          ],
          [
            6.032845,
            33.167492
          ],
          [
            6.050291,
            33.174145
          ],
          [
            6.064626,
            33.179127
          ],
          [
            6.065389,
            33.176319
          ],
          [
            6.065595,
            33.173397
          ],
          [
            6.06603,
            33.169556
          ],
          [
            6.06764,
            33.166409
          ],
          [
            6.069221,
            33.163601
          ],
          [
            6.070499,
            33.160736
          ],
          [
            6.071112,
            33.159073
          ],
          [
            6.071789,
            33.155117
          ],
          [
            6.07125,
            33.152485
          ],
          [
            6.07192,
            33.150482
          ],
          [
            6.075404,
            33.150593
          ],
          [
            6.076807,
            33.147846
          ],
          [
            6.076922,
            33.145035
          ],
          [
            6.07785,
            33.143661
          ],
          [
            6.078469,
            33.138214
          ],
          [
            6.079002,
            33.135067
          ],
          [
            6.079151,
            33.133175
          ],
          [
            6.079191,
            33.130653
          ],
          [
            6.077878,
            33.128132
          ],
          [
            6.077868,
            33.126587
          ],
          [
            6.077988,
            33.124237
          ],
          [
            6.077168,
            33.117935
          ],
          [
            6.080079,
            33.117073
          ],
          [
            6.075289,
            33.109226
          ],
          [
            6.072396,
            33.108307
          ],
          [
            6.063847,
            33.101608
          ],
          [
            6.062587,
            33.094959
          ],
          [
            6.057929,
            33.095989
          ],
          [
            6.049627,
            33.095417
          ],
          [
            6.046957,
            33.091751
          ],
          [
            6.045289,
            33.081322
          ],
          [
            6.044991,
            33.079891
          ],
          [
            6.019088,
            33.086136
          ],
          [
            6.001361,
            33.092896
          ],
          [
            5.900273,
            33.090488
          ],
          [
            5.778012,
            33.084068
          ],
          [
            5.758799,
            33.099312
          ],
          [
            5.779109,
            33.164272
          ]
        ]
      ]
    ]
  },
  "properties": {
    "metadata": {
      "name": "Building Area",
      "description": "Building Area"
    },
    "layer": {
      "name": "Building Area",
      "description": "All buildings as defined by all objects tagged with 'building=*'.\n"
    },
    "result": {
      "timestamp_oqt": "2022-05-24T08:25:01.333304+00:00",
      "timestamp_osm": "2022-05-15T20:00:00+00:00",
      "label": "red",
      "value": 0.1429655218503864,
      "description": "For the AOI the building area mapped in OSM is 957956.53 sqkm and\nthe predicted building area is 6700612.27 sqkm. The weighted\naverage of the ratio between the building area mapped in OSM and the\npredicted building area is 14.3 %. The weight is the\npredicted building area.\nThe building area mapped in OSM is significantly less than predicted.\nThis indicates that many buildings have not been mapped yet.\n"
    },
    "data": {
      "model_name": "Random Forest Regressor",
      "building_area_osm": [
        0,
        0,
        0,
        52486.14,
        841764.76,
        0,
        0,
        63705.63,
        0
      ],
      "building_area_prediction": [
        53320.37,
        5264.09625,
        57319.51,
        875144.24,
        3987139.72,
        840.03723294476,
        8152.7,
        1712591.56,
        840.03723294476
      ],
      "covariates": [
        {
          "ghs_pop": 0,
          "ghs_pop_density": 0,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 40.20217514038086
        },
        {
          "ghs_pop": 23.039753437042236,
          "ghs_pop_density": 2.4016750284244584e-7,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 3.270573616027832
        },
        {
          "ghs_pop": 0,
          "ghs_pop_density": 0,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 50.22721481323242
        },
        {
          "ghs_pop": 16318.097746707499,
          "ghs_pop_density": 0.000170098219228493,
          "water": 0,
          "very_low_density_rural": 0.83,
          "low_density_rural": 0.1,
          "rural_cluster": 0.02,
          "suburban_or_peri_urban": 0.01,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0.04,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 1710.898681640625
        },
        {
          "ghs_pop": 143944.96820783615,
          "ghs_pop_density": 0.0015004918277776384,
          "water": 0,
          "very_low_density_rural": 0.5591397849462365,
          "low_density_rural": 0.07526881720430108,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0.08602150537634409,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0.27956989247311825,
          "shdi": 0.749352689479516,
          "vnl": 9393.3876953125
        },
        {
          "ghs_pop": 0,
          "ghs_pop_density": 0,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 0
        },
        {
          "ghs_pop": 0,
          "ghs_pop_density": 0,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 4.482884883880615
        },
        {
          "ghs_pop": 62610.777252197266,
          "ghs_pop_density": 0.000652669167169395,
          "water": 0,
          "very_low_density_rural": 0.826530612244898,
          "low_density_rural": 0.061224489795918366,
          "rural_cluster": 0.02040816326530612,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0.09183673469387756,
          "shdi": 0.749352689479516,
          "vnl": 3988.3798828125
        },
        {
          "ghs_pop": 0,
          "ghs_pop_density": 0,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 0
        }
      ],
      "covariates_values": null,
      "hex_cell_geohash": [
        4171694,
        4172685,
        4171692,
        4170334,
        4170335,
        4172686,
        4171693,
        4170336,
        4172684
      ],
      "completeness_ratio": [
        0,
        0,
        0,
        0.059974273498046446,
        0.2111199554351208,
        0,
        0,
        0.03719837904608148,
        0
      ]
    }
  }
}
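
The numbers in this response can be checked by hand. The sketch below recomputes the per-cell `completeness_ratio` and the overall `value` from the two arrays in `data`, assuming the weighted average is taken exactly as the result description states (weight = predicted building area). Variable names are chosen for illustration and are not necessarily those used in the indicator code.

```python
# Values copied from the "data" object of the response above.
building_area_osm = [0, 0, 0, 52486.14, 841764.76, 0, 0, 63705.63, 0]
building_area_prediction = [
    53320.37,
    5264.09625,
    57319.51,
    875144.24,
    3987139.72,
    840.03723294476,
    8152.7,
    1712591.56,
    840.03723294476,
]

# Per hex-cell ratio of mapped to predicted building area.
completeness_ratio = [
    osm / pred for osm, pred in zip(building_area_osm, building_area_prediction)
]

# Weighted average of the ratios, weighted by the predicted building area.
weighted_sum = sum(
    ratio * weight
    for ratio, weight in zip(completeness_ratio, building_area_prediction)
)
value = weighted_sum / sum(building_area_prediction)

print(round(value * 100, 1))  # 14.3
```

Because the weight equals the denominator of each ratio, this weighted average reduces to `sum(building_area_osm) / sum(building_area_prediction)`, which is why cells with a large predicted area dominate the result.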

#265

Gigaszi · 2022-05-30T22:02:34Z

Does this resemble your implementation

Seems all right. The graph is less meaningful for a small number of hex-cells.

workers/ohsome_quality_analyst/indicators/building_area/indicator.py

Hagellach37 · 2022-06-01T14:11:02Z

workers/ohsome_quality_analyst/indicators/building_area/indicator.py

+            group_by_boundary=True,
+        )
+        # Extract OSM data
+        timestamps = []


should be only 1 timestamp, or always the same timestamp for each feature

Changed to just take the first timestamp
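
A minimal sketch of that change, assuming the grouped query returns one timestamp string per feature. The list contents and the guard are illustrative and not taken from the PR.

```python
from datetime import datetime

# Hypothetical per-feature timestamps from a query grouped by boundary.
timestamps = [
    "2022-05-15T20:00:00+00:00",
    "2022-05-15T20:00:00+00:00",
    "2022-05-15T20:00:00+00:00",
]
parsed = [datetime.fromisoformat(t) for t in timestamps]

# All features come from the same OSM snapshot, so they should agree.
if len(set(parsed)) != 1:
    raise ValueError("Expected the same timestamp for every feature")

# Taking the first element is then sufficient.
timestamp_osm = parsed[0]
```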

Hagellach37 · 2022-06-01T14:18:44Z

workers/ohsome_quality_analyst/indicators/building_area/indicator.py

+        # output and to a nested list of the covariate values (scaled) for input to the
+        # model.
+        to_be_scaled = []
+        for hex_cell, ghs_pop, smod, shdi, vnl in zip(


maybe use for i, hex_cell in ...? just an idea

Changed to suggested syntax
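
The suggested form might look like the sketch below. The lists are shortened, rounded stand-ins for the real covariates, and `covariates_by_cell` is a hypothetical name. The point is only the `enumerate` pattern in place of a long `zip(...)` unpacking.

```python
# Hypothetical, shortened inputs: one entry per hex-cell, same order in each list.
hex_cells = [4171694, 4172685, 4171692]
ghs_pop = [0.0, 23.04, 0.0]
vnl = [40.2, 3.27, 50.2]

covariates_by_cell = {}
to_be_scaled = []
for i, hex_cell in enumerate(hex_cells):
    # Index into the parallel lists instead of unpacking a long zip().
    covariates_by_cell[hex_cell] = {"ghs_pop": ghs_pop[i], "vnl": vnl[i]}
    to_be_scaled.append([ghs_pop[i], vnl[i]])
```

Indexing keeps the loop header short when many parallel lists are involved, at the cost of relying on all lists having the same length.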

Hagellach37 · 2022-06-01T14:23:25Z

workers/ohsome_quality_analyst/indicators/building_area/indicator.py

+            building_area_prediction=round(sum(self.building_area_prediction), 2),
+            completeness_ratio=round(self.result.value * 100, 2),
+        )
+        if self.threshhold_green() <= self.result.value:


maybe easier to understand if you flip it:

if value >= green_threshold

Changed to suggested order
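
For illustration, the flipped comparison reads as below. The threshold values are hypothetical placeholders because the PR conversation does not state them, and `label_for` is an assumed helper name.

```python
THRESHOLD_GREEN = 0.8   # hypothetical value, not taken from the PR
THRESHOLD_YELLOW = 0.2  # hypothetical value, not taken from the PR


def label_for(value: float) -> str:
    """Map a completeness ratio to a traffic-light label."""
    # The value comes first, so each branch reads "value is at least ...".
    if value >= THRESHOLD_GREEN:
        return "green"
    if value >= THRESHOLD_YELLOW:
        return "yellow"
    return "red"


label = label_for(0.1429655218503864)
```

With these assumed thresholds, the example value of about 0.143 falls below both and is labelled `red`.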

Hagellach37 · 2022-06-01T14:28:09Z

workers/ohsome_quality_analyst/indicators/building_area/metadata.yaml

+  label_description:
+    red: |
+      The building area mapped in OSM is significantly less than predicted.
+      This indicates that many buildings have not been mapped yet.


many --> the vast majority

Hagellach37 · 2022-06-01T14:30:22Z

workers/ohsome_quality_analyst/indicators/building_area/metadata.yaml

+      good amount of buildings but not all are already mapped.
+    green: |
+      The building area mapped in OSM matches or exceeds the predicted building
+      area. This indicates good coverage of buildings are mapped in OSM.


check grammar.

Grammar has been checked and is fixed

Hagellach37 · 2022-06-01T14:32:12Z

workers/ohsome_quality_analyst/utils/definitions.py

@@ -76,6 +76,7 @@ class RasterDataset:

 # Possible indicator layer combinations
 INDICATOR_LAYER = (
+    ("BuildingArea", "building_area"),


BuildingAreaCompleteness

Random Forest Regression based Building Area Completeness indicator

OSM Building Completeness based on Random Forest Building Area Prediction

Building Completeness based on Random Forest Regression

Hagellach37 · 2022-06-01T14:38:46Z

workers/ohsome_quality_analyst/indicators/building_area/indicator.py

+    def calculate(self) -> None:
+        # # Scale covariates
+        # Predict
+        random_forest_regressor = load_sklearn_model(


maybe just use regressor here. it could be also another model, not only random forest.

Changed to model

Hagellach37 · 2022-06-01T14:41:11Z

workers/tests/integrationtests/test_indicator_building_area.py

+        # Covariates
+        self.assertIsNotNone(self.indicator.covariates)
+        self.assertGreater(len(self.indicator.covariates), 0)
+        self.assertIsNotNone(self.indicator.covariates[0].ghs_pop)


maybe you can use a loop here? e.g. use the covariates dataclass?

Now a loop is used
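
One way to write such a loop is with `dataclasses.fields`, sketched below. The `Covariates` dataclass here is a reduced, hypothetical stand-in with only four of the covariate names, and the test class is not the one from the PR.

```python
import unittest
from dataclasses import dataclass, fields


@dataclass
class Covariates:
    # Reduced, hypothetical version of the covariates dataclass.
    ghs_pop: float
    ghs_pop_density: float
    shdi: float
    vnl: float


class TestCovariates(unittest.TestCase):
    def setUp(self):
        self.covariates = [
            Covariates(
                ghs_pop=0,
                ghs_pop_density=0,
                shdi=0.749352689479516,
                vnl=40.20217514038086,
            )
        ]

    def test_covariates_not_none(self):
        self.assertGreater(len(self.covariates), 0)
        # Iterate over the declared fields instead of one assert per attribute.
        for field in fields(Covariates):
            with self.subTest(field=field.name):
                self.assertIsNotNone(getattr(self.covariates[0], field.name))
```

`subTest` reports each failing field separately, so a single missing covariate does not hide the others.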

Hagellach37 · 2022-06-01T14:46:40Z

workers/tests/unittests/test_indicator_building_area.py

+        explained_variance = np.mean(scores["test_explained_variance"])
+
+        # Compare with the cross validation scores obtained from training the model
+        self.assertAlmostEqual(r2, 0.8889414736742092, delta=0.01)


delta can be even bigger, e.g. 0.03 (also for explained variance)
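
A small sketch of what the wider tolerance means in practice. The reference score is the one from the diff above. The `r2` value is a made-up example of a score from a fresh cross-validation run.

```python
import unittest

# Reference score from the diff above.
EXPECTED_R2 = 0.8889414736742092


class TestModelScores(unittest.TestCase):
    def test_r2_within_tolerance(self):
        r2 = 0.87  # hypothetical score, for illustration only
        # |0.87 - 0.8889| is about 0.019: this would fail with delta=0.01
        # but passes with delta=0.03.
        self.assertAlmostEqual(r2, EXPECTED_R2, delta=0.03)
```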

#265

joker234

please remove the two joblib files (ideally from the history as well)

.pre-commit-config.yaml

workers/pyproject.toml

joker234 · 2022-06-14T14:30:07Z

workers/pyproject.toml

@@ -14,6 +14,11 @@ keywords = [
  "quality",
  ]

+[[tool.poetry.source]]
+name = "gistools"


I would not name this source gistools but something more specific to the package, since the URL points to this one project rather than to the GitLab instance in general.

I left the name unchanged but changed the URL to resolve to the GitLab group-level PyPI repository.

shouldn't the name of the source be something like `building-completeness-model`?

#265

Hagellach37

looks good to me.

Hagellach37 · 2022-06-20T08:48:26Z

workers/pyproject.toml

@@ -14,6 +14,11 @@ keywords = [
  "quality",
  ]

+[[tool.poetry.source]]
+name = "gistools"


shouldn't the name of the source be something like `building-completeness-model`?

Add the building-completeness-model Python package as a dependency. Use this library to preprocess data and make predictions.

Predicts the building area of the AOI using a trained Random Forest Regressor. The result is the ratio of the building area mapped in OSM to the predicted building area.

The input parameters (X or covariates) to the model are population and population density (GHSL GHS-POP), settlement typologies (GHSL SMOD), the Subnational Human Development Index (GDL SHDI) and nightlights (EOG VNL).

The spatial resolution of the model is hex-cells at zoom level 12. The input AOI is split into hex-cells and a prediction is made for each of those hex-cells. The model is trained on hex-cells in Africa. Therefore, the Indicator is restricted to input AOIs within the bounding box of Africa.

matthiasschaub mentioned this pull request Mar 7, 2022

WIP: Building area indicator #244

Closed

6 tasks

matthiasschaub force-pushed the building_area_indicator_2 branch from ae0f373 to e70b54e Compare March 9, 2022 05:25

matthiasschaub changed the title ~~Building area indicator 2~~ Building Area Indicator (Machine Learning Approach) Mar 9, 2022

matthiasschaub changed the title ~~Building Area Indicator (Machine Learning Approach)~~ Building Area Indicator Mar 9, 2022

matthiasschaub force-pushed the building_area_indicator_2 branch from a28ec2e to 4700571 Compare March 15, 2022 10:41

matthiasschaub added enhancement New feature or request indicator labels Apr 13, 2022

matthiasschaub added this to the Release 0.10.0 milestone Apr 13, 2022

matthiasschaub mentioned this pull request May 3, 2022

add hex-cells to test and dev DB setup #314

Merged

4 tasks

joker234 added the priority:high Should be addressed as soon as possible (next release) label May 10, 2022

matthiasschaub force-pushed the building_area_indicator_2 branch from 9d74d80 to 0d04fa0 Compare May 11, 2022 12:08

matthiasschaub changed the title ~~Building Area Indicator~~ Add new Indicator about building area using a ML approach May 11, 2022

matthiasschaub force-pushed the building_area_indicator_2 branch 3 times, most recently from bff0ae1 to c71b774 Compare May 18, 2022 13:44

matthiasschaub changed the title ~~Add new Indicator about building area using a ML approach~~ Add new Indicator about buildings using ML May 19, 2022

matthiasschaub added a commit that referenced this pull request May 19, 2022

add new Indicator about buildings using ML

04a4f05

#265

matthiasschaub force-pushed the building_area_indicator_2 branch from c71b774 to 04a4f05 Compare May 19, 2022 14:41

matthiasschaub added a commit that referenced this pull request May 19, 2022

add new Indicator about buildings using ML

d263b22

#265

matthiasschaub force-pushed the building_area_indicator_2 branch from 04a4f05 to d263b22 Compare May 19, 2022 14:42

matthiasschaub added the waiting An issue or PR which is waiting for an upstream bugfix, further information or is somehow blocked label May 19, 2022

matthiasschaub added a commit that referenced this pull request May 19, 2022

add new Indicator about buildings using ML

f7562e7

#265

matthiasschaub force-pushed the building_area_indicator_2 branch from d263b22 to f7562e7 Compare May 19, 2022 15:10

matthiasschaub marked this pull request as ready for review May 19, 2022 15:10

matthiasschaub added a commit that referenced this pull request May 19, 2022

add new Indicator about buildings using ML

2a024ff

#265

matthiasschaub force-pushed the building_area_indicator_2 branch from f7562e7 to 2a024ff Compare May 19, 2022 15:11

matthiasschaub added a commit that referenced this pull request May 24, 2022

add new Indicator about buildings using ML

31d4a41

#265

matthiasschaub force-pushed the building_area_indicator_2 branch from 2a024ff to 31d4a41 Compare May 24, 2022 09:45

matthiasschaub requested review from joker234 and Hagellach37 May 24, 2022 09:48

matthiasschaub mentioned this pull request May 31, 2022

update dependencies #340

Closed

3 tasks

Hagellach37 reviewed Jun 1, 2022

matthiasschaub added a commit that referenced this pull request Jun 2, 2022

add new Indicator about buildings using ML

5cbcdd0

#265

matthiasschaub force-pushed the building_area_indicator_2 branch from a1462cf to 54f6a96 Compare June 2, 2022 16:05

matthiasschaub added a commit that referenced this pull request Jun 8, 2022

add new Indicator about buildings using ML

6350148

#265

matthiasschaub force-pushed the building_area_indicator_2 branch from 54f6a96 to 1cd0802 Compare June 8, 2022 08:42

joker234 reviewed Jun 13, 2022

matthiasschaub requested review from joker234 and Hagellach37 June 14, 2022 13:43

joker234 reviewed Jun 14, 2022

matthiasschaub added a commit that referenced this pull request Jun 14, 2022

add new Indicator about buildings using ML

1d6a7bb

#265

matthiasschaub force-pushed the building_area_indicator_2 branch 5 times, most recently from 45170e2 to de142e3 Compare June 20, 2022 07:28

matthiasschaub removed the waiting An issue or PR which is waiting for an upstream bugfix, further information or is somehow blocked label Jun 20, 2022

Hagellach37 previously approved these changes Jun 20, 2022

matthiasschaub dismissed Hagellach37’s stale review via e2754c1 June 20, 2022 16:05

matthiasschaub force-pushed the building_area_indicator_2 branch from e2754c1 to 078c93c Compare June 20, 2022 16:27

Hagellach37 approved these changes Jun 21, 2022

matthiasschaub merged commit 3f11b69 into main Jun 21, 2022

matthiasschaub deleted the building_area_indicator_2 branch June 21, 2022 08:42
