Add new Indicator about buildings using ML #265
@Gigaszi could you have another look at the resulting figure? Does this resemble your implementation?
Seems all right. The graph is less meaningful for a small number of hex-cells.
workers/ohsome_quality_analyst/indicators/building_area/indicator.py
group_by_boundary=True,
)
# Extract OSM data
timestamps = []
should be only 1 timestamp, or always the same timestamp for each feature
Changed to just take the first timestamp
# output and to a nested list of the covariate values (scaled) for input to the
# model.
to_be_scaled = []
for hex_cell, ghs_pop, smod, shdi, vnl in zip(
Maybe use `for i, hex_cell in ...`? Just an idea.
Changed to suggested syntax
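The suggestion above can be sketched as follows. This is only an illustration of the index-based loop: the list names mirror the covariates in the loop under review, the values are invented, and the loop the PR actually ended up with is not shown on this page.

```python
# Hypothetical per-hex-cell inputs; names mirror the loop under review,
# values are invented for illustration.
hex_cells = ["cell-a", "cell-b"]
ghs_pop = [120.0, 30.5]
smod = [21, 11]

rows = []
# enumerate() yields the index alongside each hex cell, so the covariate
# lists can be indexed instead of being zipped together.
for i, hex_cell in enumerate(hex_cells):
    rows.append((hex_cell, ghs_pop[i], smod[i]))
```

Compared with `zip`, this keeps the loop header short when many covariates are involved, at the cost of relying on all lists having the same length.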
building_area_prediction=round(sum(self.building_area_prediction), 2),
completeness_ratio=round(self.result.value * 100, 2),
)
if self.threshhold_green() <= self.result.value:
maybe easier to understand if you flip it:
if value >= green_threshold
Changed to suggested order
label_description:
  red: |
    The building area mapped in OSM is significantly less than predicted.
    This indicates that many buildings have not been mapped yet.
many --> the vast majority
changed
    good amount of buildings but not all are already mapped.
  green: |
    The building area mapped in OSM matches or exceeds the predicted building
    area. This indicates good coverage of buildings are mapped in OSM.
check grammar.
Grammar has been checked and is fixed
@@ -76,6 +76,7 @@ class RasterDataset:

# Possible indicator layer combinations
INDICATOR_LAYER = (
    ("BuildingArea", "building_area"),
BuildingAreaCompleteness
Random Forest Regression based Building Area Completeness indicator
OSM Building Completeness based on Random Forest Building Area Prediction
Building Completeness based on Random Forest Regression
def calculate(self) -> None:
    # # Scale covariates
    # Predict
    random_forest_regressor = load_sklearn_model(
Maybe just use `regressor` here. It could also be another model, not only a random forest.
Changed to model
# Covariates
self.assertIsNotNone(self.indicator.covariates)
self.assertGreater(len(self.indicator.covariates), 0)
self.assertIsNotNone(self.indicator.covariates[0].ghs_pop)
maybe you can use a loop here? e.g. use the covariates dataclass?
Now a loop is used
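A loop over the dataclass fields, as suggested above, could look roughly like this. The `Covariates` class here is a hypothetical stand-in (its field names are taken from the covariates mentioned in this review, the values are invented), not the indicator's actual class.

```python
import unittest
from dataclasses import dataclass, fields


@dataclass
class Covariates:
    # Hypothetical stand-in for the indicator's covariates dataclass.
    ghs_pop: float
    smod: int
    shdi: float
    vnl: float


class TestCovariates(unittest.TestCase):
    def test_all_fields_are_set(self):
        # Invented values; in the real test these would come from the indicator.
        covariates = [Covariates(ghs_pop=120.0, smod=21, shdi=0.55, vnl=1.3)]
        self.assertGreater(len(covariates), 0)
        for covariate in covariates:
            # fields() lists every dataclass field, so a newly added
            # covariate is checked without touching the test.
            for field in fields(covariate):
                with self.subTest(field=field.name):
                    self.assertIsNotNone(getattr(covariate, field.name))
```

Using `subTest` reports each failing field separately instead of stopping at the first one.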
explained_variance = np.mean(scores["test_explained_variance"])

# Compare with the cross validation scores obtained from training the model
self.assertAlmostEqual(r2, 0.8889414736742092, delta=0.01)
delta can be even bigger, e.g. 0.03 (also for explained variance)
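To illustrate the effect of the wider tolerance, here is a small sketch. The reference score is the one quoted in the test above; the re-run score of 0.87 is an invented value, chosen so that it falls outside a delta of 0.01 but inside a delta of 0.03.

```python
import unittest

# Reference R2 score quoted in the test under review.
EXPECTED_R2 = 0.8889414736742092


class TestScoreTolerance(unittest.TestCase):
    def test_r2_within_delta(self):
        r2 = 0.87  # hypothetical score from a re-run, about 0.019 below the reference
        # Accepted with the wider tolerance ...
        self.assertAlmostEqual(r2, EXPECTED_R2, delta=0.03)
        # ... but rejected with the original, tighter one.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(r2, EXPECTED_R2, delta=0.01)
```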
please remove the two joblib files (ideally from the history as well)
workers/pyproject.toml
@@ -14,6 +14,11 @@ keywords = [
    "quality",
]

[[tool.poetry.source]]
name = "gistools"
I would not name this source gistools but something more specific to the package, since the URL points to this one project rather than to the GitLab instance in general.
I left the name unchanged but changed the URL to resolve to the GitLab group-level PyPI repository.
Shouldn't the name of the source be something like `building-completeness-model`?
looks good to me.
Add dependency building-completeness-model Python package. Use this library to preprocess data and make predictions. Predicts the building area of the AOI using a trained Random Forest Regressor. The result is the ratio between the predicted building area and the building area mapped in OSM. The input parameters (X or covariates) to the model are population and population density (GHSL GHS-POP), settlement typologies (GHSL SMOD), subnational Human Development Index (GDL SHDI) and nightlights (EOG VNL). The spatial resolution of the model is hex-cells at zoom level 12. The input AOI is split into hex-cells and the prediction is done for each of those hex-cells. The model is trained on hex-cells in Africa. Therefore the Indicator is restricted to input AOIs within the bounding box of Africa.
Description
Predicts the building area of the AOI using a trained Random Forest Regressor.
The result is the ratio between the predicted building area and the building
area mapped in OSM.
The input parameters (X or covariates) to the model are population and population
density (GHSL GHS-POP), settlement typologies (GHSL SMOD), subnational Human
Development Index (GDL SHDI) and nightlights (EOG VNL).
The spatial resolution of the model is hex-cells at zoom level 12. The input AOI is
split into hex-cells and the prediction is done for each of those hex-cells. The
model is trained on hex-cells in Africa. Therefore the Indicator is restricted to
input AOIs within the bounding box of Africa.
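The aggregation described above can be sketched as follows. This is a minimal illustration, not the indicator's code: the function name and the areas are invented, and it assumes the ratio is mapped area divided by predicted area (so that a value of 1 or more means OSM matches or exceeds the prediction), with per-hex-cell values summed over the AOI first.

```python
def completeness_ratio(mapped_area_per_cell, predicted_area_per_cell):
    """Return total mapped building area divided by total predicted building area."""
    predicted_total = sum(predicted_area_per_cell)
    if predicted_total <= 0:
        # Guard against division by zero when nothing is predicted.
        raise ValueError("predicted building area must be positive")
    return sum(mapped_area_per_cell) / predicted_total


# Invented building areas (square metres) for three hex-cells of an AOI.
mapped = [800.0, 150.0, 50.0]
predicted = [1000.0, 200.0, 50.0]

ratio = completeness_ratio(mapped, predicted)
print(f"completeness: {round(ratio * 100, 2)} %")
```

Summing before dividing weights each hex-cell by its predicted area; averaging per-cell ratios instead would give small cells the same influence as large ones.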
Corresponding issue
Closes #243
New or changed dependencies
Checklist
main (e.g. through `git rebase main`)
rasterstats to provide access to third-party raster datasets stored on disk #227
GroupBy/boundaries queries to the ohsome API client #272
get_shdi #333