Add new Indicator about buildings using ML #265

Merged
matthiasschaub merged 1 commit into main from building_area_indicator_2
Jun 21, 2022

Conversation

matthiasschaub
Collaborator

@matthiasschaub matthiasschaub commented Mar 3, 2022

Description

Predicts the building area of the AOI using a trained Random Forest Regressor.
The result is the ratio of the building area mapped in OSM to the predicted
building area.

The input parameters (X or covariates) to the model are population and population
density (GHSL GHS-POP), settlement typologies (GHSL SMOD), subnational Human
Development Index (GDL SHDI) and nightlights (EOG VNL).

The spatial resolution of the model is hex-cells at zoom level 12. The input AOI is
split into hex-cells and a prediction is made for each of those hex-cells. The
model is trained on hex-cells in Africa. Therefore, the Indicator is restricted to
input AOIs within the bounding box of Africa.

Corresponding issue

Closes #243

New or changed dependencies

Checklist

@matthiasschaub matthiasschaub mentioned this pull request Mar 7, 2022
6 tasks
@matthiasschaub matthiasschaub changed the title Building area indicator 2 Building Area Indicator (Machine Learning Approach) Mar 9, 2022
@matthiasschaub matthiasschaub changed the title Building Area Indicator (Machine Learning Approach) Building Area Indicator Mar 9, 2022
@matthiasschaub matthiasschaub added enhancement New feature or request indicator labels Apr 13, 2022
@matthiasschaub matthiasschaub added this to the Release 0.10.0 milestone Apr 13, 2022
@joker234 joker234 added the priority:high Should be addressed as soon as possible (next release) label May 10, 2022
@matthiasschaub matthiasschaub changed the title Building Area Indicator Add new Indicator about building area using a ML approach May 11, 2022
@matthiasschaub matthiasschaub force-pushed the building_area_indicator_2 branch 3 times, most recently from bff0ae1 to c71b774 Compare May 18, 2022 13:44
@matthiasschaub matthiasschaub changed the title Add new Indicator about building area using a ML approach Add new Indicator about buildings using ML May 19, 2022
@matthiasschaub matthiasschaub added the waiting An issue or PR which is waiting for an upstream bugfix, further information or is somehow blocked label May 19, 2022
@matthiasschaub matthiasschaub marked this pull request as ready for review May 19, 2022 15:10
@matthiasschaub
Collaborator Author

matthiasschaub commented May 24, 2022

@Gigaszi could you have another look at the resulting figure? Does this resemble your implementation?

[Image: screenshot of the resulting figure]

@matthiasschaub
Collaborator Author

{
  "apiVersion": "0.9.0",
  "attribution": {
    "url": "https://github.com/GIScience/ohsome-quality-analyst/blob/main/data/COPYRIGHTS.md",
    "text": "© OpenStreetMap contributors"
  },
  "type": "Feature",
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": [
      [
        [
          [
            5.779109,
            33.164272
          ],
          [
            5.779597,
            33.165833
          ],
          [
            5.873785,
            33.161541
          ],
          [
            5.953008,
            33.158386
          ],
          [
            5.994273,
            33.156437
          ],
          [
            6.00104,
            33.156494
          ],
          [
            6.003383,
            33.156322
          ],
          [
            6.010534,
            33.159302
          ],
          [
            6.032845,
            33.167492
          ],
          [
            6.050291,
            33.174145
          ],
          [
            6.064626,
            33.179127
          ],
          [
            6.065389,
            33.176319
          ],
          [
            6.065595,
            33.173397
          ],
          [
            6.06603,
            33.169556
          ],
          [
            6.06764,
            33.166409
          ],
          [
            6.069221,
            33.163601
          ],
          [
            6.070499,
            33.160736
          ],
          [
            6.071112,
            33.159073
          ],
          [
            6.071789,
            33.155117
          ],
          [
            6.07125,
            33.152485
          ],
          [
            6.07192,
            33.150482
          ],
          [
            6.075404,
            33.150593
          ],
          [
            6.076807,
            33.147846
          ],
          [
            6.076922,
            33.145035
          ],
          [
            6.07785,
            33.143661
          ],
          [
            6.078469,
            33.138214
          ],
          [
            6.079002,
            33.135067
          ],
          [
            6.079151,
            33.133175
          ],
          [
            6.079191,
            33.130653
          ],
          [
            6.077878,
            33.128132
          ],
          [
            6.077868,
            33.126587
          ],
          [
            6.077988,
            33.124237
          ],
          [
            6.077168,
            33.117935
          ],
          [
            6.080079,
            33.117073
          ],
          [
            6.075289,
            33.109226
          ],
          [
            6.072396,
            33.108307
          ],
          [
            6.063847,
            33.101608
          ],
          [
            6.062587,
            33.094959
          ],
          [
            6.057929,
            33.095989
          ],
          [
            6.049627,
            33.095417
          ],
          [
            6.046957,
            33.091751
          ],
          [
            6.045289,
            33.081322
          ],
          [
            6.044991,
            33.079891
          ],
          [
            6.019088,
            33.086136
          ],
          [
            6.001361,
            33.092896
          ],
          [
            5.900273,
            33.090488
          ],
          [
            5.778012,
            33.084068
          ],
          [
            5.758799,
            33.099312
          ],
          [
            5.779109,
            33.164272
          ]
        ]
      ]
    ]
  },
  "properties": {
    "metadata": {
      "name": "Building Area",
      "description": "Building Area"
    },
    "layer": {
      "name": "Building Area",
      "description": "All buildings as defined by all objects tagged with 'building=*'.\n"
    },
    "result": {
      "timestamp_oqt": "2022-05-24T08:25:01.333304+00:00",
      "timestamp_osm": "2022-05-15T20:00:00+00:00",
      "label": "red",
      "value": 0.1429655218503864,
      "description": "For the AOI the building area mapped in OSM is 957956.53 sqkm and\nthe predicted building area is 6700612.27 sqkm. The weighted\naverage of the ratio between the building area mapped in OSM and the\npredicted building area is 14.3 %. The weight is the\npredicted building area.\nThe building area mapped in OSM is significantly less than predicted.\nThis indicates that many buildings have not been mapped yet.\n"
    },
    "data": {
      "model_name": "Random Forest Regressor",
      "building_area_osm": [
        0,
        0,
        0,
        52486.14,
        841764.76,
        0,
        0,
        63705.63,
        0
      ],
      "building_area_prediction": [
        53320.37,
        5264.09625,
        57319.51,
        875144.24,
        3987139.72,
        840.03723294476,
        8152.7,
        1712591.56,
        840.03723294476
      ],
      "covariates": [
        {
          "ghs_pop": 0,
          "ghs_pop_density": 0,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 40.20217514038086
        },
        {
          "ghs_pop": 23.039753437042236,
          "ghs_pop_density": 2.4016750284244584e-7,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 3.270573616027832
        },
        {
          "ghs_pop": 0,
          "ghs_pop_density": 0,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 50.22721481323242
        },
        {
          "ghs_pop": 16318.097746707499,
          "ghs_pop_density": 0.000170098219228493,
          "water": 0,
          "very_low_density_rural": 0.83,
          "low_density_rural": 0.1,
          "rural_cluster": 0.02,
          "suburban_or_peri_urban": 0.01,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0.04,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 1710.898681640625
        },
        {
          "ghs_pop": 143944.96820783615,
          "ghs_pop_density": 0.0015004918277776384,
          "water": 0,
          "very_low_density_rural": 0.5591397849462365,
          "low_density_rural": 0.07526881720430108,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0.08602150537634409,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0.27956989247311825,
          "shdi": 0.749352689479516,
          "vnl": 9393.3876953125
        },
        {
          "ghs_pop": 0,
          "ghs_pop_density": 0,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 0
        },
        {
          "ghs_pop": 0,
          "ghs_pop_density": 0,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 4.482884883880615
        },
        {
          "ghs_pop": 62610.777252197266,
          "ghs_pop_density": 0.000652669167169395,
          "water": 0,
          "very_low_density_rural": 0.826530612244898,
          "low_density_rural": 0.061224489795918366,
          "rural_cluster": 0.02040816326530612,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0.09183673469387756,
          "shdi": 0.749352689479516,
          "vnl": 3988.3798828125
        },
        {
          "ghs_pop": 0,
          "ghs_pop_density": 0,
          "water": 0,
          "very_low_density_rural": 1,
          "low_density_rural": 0,
          "rural_cluster": 0,
          "suburban_or_peri_urban": 0,
          "semi_dense_urban_cluster": 0,
          "dense_urban_cluster": 0,
          "urban_centre": 0,
          "shdi": 0.749352689479516,
          "vnl": 0
        }
      ],
      "covariates_values": null,
      "hex_cell_geohash": [
        4171694,
        4172685,
        4171692,
        4170334,
        4170335,
        4172686,
        4171693,
        4170336,
        4172684
      ],
      "completeness_ratio": [
        0,
        0,
        0,
        0.059974273498046446,
        0.2111199554351208,
        0,
        0,
        0.03719837904608148,
        0
      ]
    }
  }
}
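
The `value` in the response above can be reproduced from the `data` arrays. The sketch below is not the indicator's code; it assumes, as the result description states, that the per-cell ratios are averaged with the predicted building area as the weight. The function name `weighted_completeness` is made up for illustration.

```python
# Values copied from the "data" section of the response above.
building_area_osm = [0, 0, 0, 52486.14, 841764.76, 0, 0, 63705.63, 0]
building_area_prediction = [
    53320.37, 5264.09625, 57319.51, 875144.24, 3987139.72,
    840.03723294476, 8152.7, 1712591.56, 840.03723294476,
]


def weighted_completeness(osm, prediction):
    """Average the per-cell ratio osm/prediction, weighted by the prediction.

    Hypothetical helper; assumes every prediction is greater than zero.
    """
    ratios = [o / p for o, p in zip(osm, prediction)]
    weighted_sum = sum(r * p for r, p in zip(ratios, prediction))
    return weighted_sum / sum(prediction)


value = weighted_completeness(building_area_osm, building_area_prediction)
print(round(value * 100, 1))  # prints 14.3, matching the "14.3 %" above
```

Because the weights are the predictions themselves, this weighted average is the same as dividing the summed OSM building area by the summed predicted building area.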

@Gigaszi
Contributor

Gigaszi commented May 30, 2022

> Does this resemble your implementation?

Seems all right. The graph is less meaningful for a small number of hex-cells.

@matthiasschaub matthiasschaub mentioned this pull request May 31, 2022
3 tasks
group_by_boundary=True,
)
# Extract OSM data
timestamps = []
Contributor

should be only 1 timestamp, or always the same timestamp for each feature

Collaborator Author

Changed to just take the first timestamp

# output and to a nested list of the covariate values (scaled) for input to the
# model.
to_be_scaled = []
for hex_cell, ghs_pop, smod, shdi, vnl in zip(
Contributor

maybe use for i, hex_cell in ...? just an idea

Collaborator Author

Changed to suggested syntax
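
A sketch of the suggested pattern, for illustration only. The list names are placeholders rather than the indicator's real attributes, and the values are rounded from the response earlier in this thread; only the `for i, hex_cell in enumerate(...)` shape is the point.

```python
# Placeholder inputs: one entry per hex-cell, in the same order.
hex_cells = [4171694, 4172685, 4171692]
ghs_pop = [0.0, 23.04, 0.0]
vnl = [40.2, 3.27, 50.23]

rows = []
# Iterate once with an index and look up the parallel lists by position,
# instead of zipping every list together in the loop header.
for i, hex_cell in enumerate(hex_cells):
    rows.append({"hex_cell": hex_cell, "ghs_pop": ghs_pop[i], "vnl": vnl[i]})

print(rows[1])
```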

building_area_prediction=round(sum(self.building_area_prediction), 2),
completeness_ratio=round(self.result.value * 100, 2),
)
if self.threshhold_green() <= self.result.value:
Contributor

maybe easier to understand if you flip it:

if value >= green_threshold

Collaborator Author

Changed to suggested order
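
The flipped comparison reads naturally when written as a small labelling function. This is only a sketch: the threshold values and the `yellow` label are assumptions made for illustration, since the thread does not state the real ones, and `label_for` is a hypothetical name.

```python
def label_for(value, threshold_yellow=0.2, threshold_green=0.8):
    """Map a completeness ratio to a traffic-light label.

    The default thresholds are illustrative assumptions, not the indicator's.
    """
    # Value on the left, threshold on the right, as suggested in the review.
    if value >= threshold_green:
        return "green"
    if value >= threshold_yellow:
        return "yellow"
    return "red"


print(label_for(0.1429655218503864))  # prints red with these assumed thresholds
```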

label_description:
red: |
The building area mapped in OSM is significantly less than predicted.
This indicates that many buildings have not been mapped yet.
Contributor

many --> the vast majority

Collaborator Author

changed

good amount of buildings but not all are already mapped.
green: |
The building area mapped in OSM matches or exceeds the predicted building
area. This indicates good coverage of buildings are mapped in OSM.
Contributor

check grammar.

Collaborator Author

Grammar has been checked and is fixed

@@ -76,6 +76,7 @@ class RasterDataset:

# Possible indicator layer combinations
INDICATOR_LAYER = (
("BuildingArea", "building_area"),
Contributor

BuildingAreaCompleteness

Random Forest Regression based Building Area Completeness indicator

OSM Building Completeness based on Random Forest Building Area Prediction

Building Completeness based on Random Forest Regression

def calculate(self) -> None:
# # Scale covariates
# Predict
random_forest_regressor = load_sklearn_model(
Contributor

maybe just use regressor here. it could be also another model, not only random forest.

Collaborator Author

Changed to model

# Covariates
self.assertIsNotNone(self.indicator.covariates)
self.assertGreater(len(self.indicator.covariates), 0)
self.assertIsNotNone(self.indicator.covariates[0].ghs_pop)
Contributor

maybe you can use a loop here? e.g. use the covariates dataclass?

Collaborator Author

Now a loop is used
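
One way such a loop could look, assuming the covariates are held in a dataclass. The `Covariates` class here is a cut-down stand-in with three of the field names from the response earlier in this thread, not the project's class.

```python
import unittest
from dataclasses import dataclass, fields


@dataclass
class Covariates:
    # Stand-in with a subset of the covariate names; the real class may differ.
    ghs_pop: float
    shdi: float
    vnl: float


class TestCovariates(unittest.TestCase):
    def test_no_field_is_none(self):
        covariates = [Covariates(ghs_pop=0.0, shdi=0.749, vnl=40.2)]
        self.assertGreater(len(covariates), 0)
        for covariate in covariates:
            # fields() yields every dataclass field, so fields added later
            # are checked without touching the test.
            for field in fields(covariate):
                self.assertIsNotNone(getattr(covariate, field.name), field.name)
```

Passing `field.name` as the assertion message means a failure names the offending covariate.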

explained_variance = np.mean(scores["test_explained_variance"])

# Compare with the cross validation scores obtained from training the model
self.assertAlmostEqual(r2, 0.8889414736742092, delta=0.01)
Contributor

delta can be even bigger, e.g. 0.03 (also for explained variance)
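
For reference, `assertAlmostEqual` with `delta` passes when the absolute difference between the two values is at most `delta`. A sketch with the suggested tolerance follows; the `r2` value is a made-up stand-in for a freshly computed cross-validation score, while the reference value is the one from the diff above.

```python
import unittest


class TestScores(unittest.TestCase):
    def test_r2_close_to_training_score(self):
        r2 = 0.87  # hypothetical score from a new cross-validation run
        # Passes: |0.87 - 0.8889...| is about 0.019, which is within 0.03.
        self.assertAlmostEqual(r2, 0.8889414736742092, delta=0.03)
```

With the original `delta=0.01` the same hypothetical score would fail, which is the kind of run-to-run variation a wider tolerance absorbs.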

Member

@joker234 joker234 left a comment

please remove the two joblib files (ideally from the history as well)

.pre-commit-config.yaml (outdated, resolved)
workers/pyproject.toml (outdated, resolved)
@@ -14,6 +14,11 @@ keywords = [
"quality",
]

[[tool.poetry.source]]
name = "gistools"
Member

I would not name this source gistools but something more specific to the package, as the URL does not point to the GitLab instance in general but to this one project.

Collaborator Author

I left the name unchanged but changed the URL to resolve to the GitLab group-level PyPI repository.

Contributor

Shouldn't the name of the source be something like `building-completeness-model`?

@matthiasschaub matthiasschaub force-pushed the building_area_indicator_2 branch 5 times, most recently from 45170e2 to de142e3 Compare June 20, 2022 07:28
@matthiasschaub matthiasschaub removed the waiting An issue or PR which is waiting for an upstream bugfix, further information or is somehow blocked label Jun 20, 2022
Hagellach37
Hagellach37 previously approved these changes Jun 20, 2022
Contributor

@Hagellach37 Hagellach37 left a comment

looks good to me.

Add the building-completeness-model Python package as a dependency.
Use this library to preprocess data and make predictions.

Predicts the building area of the AOI using a trained Random Forest
Regressor. The result is the ratio of the building area mapped in OSM
to the predicted building area.

The input parameters (X or covariates) to the model are population and
population density (GHSL GHS-POP), settlement typologies (GHSL SMOD),
subnational Human Development Index (GDL SHDI) and nightlights (EOG
VNL).

The spatial resolution of the model is hex-cells at zoom level 12. The
input AOI is split into hex-cells and a prediction is made for each of
those hex-cells. The model is trained on hex-cells in Africa. Therefore,
the Indicator is restricted to input AOIs within the bounding box of
Africa.
@matthiasschaub matthiasschaub merged commit 3f11b69 into main Jun 21, 2022
@matthiasschaub matthiasschaub deleted the building_area_indicator_2 branch June 21, 2022 08:42
Labels
enhancement New feature or request indicator priority:high Should be addressed as soon as possible (next release)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add new indicator for building completeness based on building area and ML
4 participants