
HRSL vector v1 #6

Merged: 11 commits, merged Sep 9, 2019


@maning commented Aug 16, 2019

Resolves #2

This PR adds the HRSL-derived vectors for prioritizing mapping projects with AI. The data consists of the following files:

  • data/README.md
  • data/hrsl_ph_500m_grid_v1.geojson.zip
  • data/hrsl_ph_buffer100m_v1.geojson.zip
  • data/ph_admin_code.csv

Each file is described in README.md.

[Image: ph_hrsl_v1]

@seav @govvin Can one of you download and inspect the data locally before I merge? 🙇‍♀

@maning requested review from @seav and @govvin on August 16, 2019 07:20
@maning self-assigned this on Aug 16, 2019
@seav commented Aug 16, 2019

Some comments:

  • The coordinates of the GeoJSON have 9 decimal places. This is too much precision. 5 should be enough.
  • The GeoJSON file contains 1600+ MultiPolygon features with complex shapes. This is very slow to load in tools/applications such as QGIS. Maybe it would be better to split the GeoJSON into smaller pieces, maybe grouped by ADM2 (province) level.
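Trimming coordinate precision is a simple recursive pass over the GeoJSON. A minimal sketch, assuming a plain FeatureCollection loaded with the standard `json` module and geometries that carry a `coordinates` member (GeometryCollection is not handled); the function names are hypothetical. Five decimal places of a degree is roughly 1.1 m of latitude, which is ample for settlement outlines.

```python
import json

def round_coords(coords, ndigits=5):
    """Recursively round every number in a GeoJSON coordinates array."""
    if isinstance(coords, (int, float)):
        return round(coords, ndigits)
    return [round_coords(c, ndigits) for c in coords]

def round_feature_collection(fc, ndigits=5):
    """Return a copy of a FeatureCollection with all coordinates rounded.

    Assumes each non-null geometry has a "coordinates" member."""
    features = []
    for feat in fc["features"]:
        geom = feat.get("geometry")
        if geom is not None:
            geom = {**geom, "coordinates": round_coords(geom["coordinates"], ndigits)}
        features.append({**feat, "geometry": geom})
    return {**fc, "features": features}

if __name__ == "__main__":
    fc = {"type": "FeatureCollection", "features": [{
        "type": "Feature", "properties": {},
        "geometry": {"type": "Point", "coordinates": [121.123456789, 12.987654321]}}]}
    # Compact separators also shave bytes off the written file.
    print(json.dumps(round_feature_collection(fc), separators=(",", ":")))
```

Writing with compact separators removes the whitespace the default `json.dumps` inserts, which helps file size on top of the rounding.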

@seav commented Aug 16, 2019

Nice to have: If the MultiPolygon inner rings have a small area (FSVO small), it might be better to just remove them to simplify the shapes further.
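Dropping small inner rings only needs an area estimate per ring. A minimal sketch, assuming lon/lat GeoJSON: each ring is projected with a local equirectangular approximation and measured with the shoelace formula, which is adequate for small rings but is not a geodesic area. The 1 ha threshold is a hypothetical stand-in for "FSVO small", and the function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres

def ring_area_m2(ring):
    """Approximate area of a closed lon/lat ring in square metres.

    Local equirectangular projection around the ring's first vertex and
    mean latitude, then the shoelace formula. Not a geodesic area."""
    lon0, lat_origin = ring[0][0], ring[0][1]
    lat0 = math.radians(sum(pt[1] for pt in ring) / len(ring))
    xy = [(EARTH_RADIUS_M * math.radians(pt[0] - lon0) * math.cos(lat0),
           EARTH_RADIUS_M * math.radians(pt[1] - lat_origin)) for pt in ring]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(xy, xy[1:] + xy[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def drop_small_holes(polygon, min_hole_m2):
    """Keep the exterior ring and only interior rings of at least min_hole_m2."""
    exterior, *holes = polygon
    return [exterior] + [h for h in holes if ring_area_m2(h) >= min_hole_m2]

def remove_small_holes(geom, min_hole_m2=10_000.0):  # hypothetical threshold: 1 ha
    """Apply drop_small_holes to a GeoJSON Polygon or MultiPolygon geometry."""
    if geom["type"] == "Polygon":
        return {**geom, "coordinates": drop_small_holes(geom["coordinates"], min_hole_m2)}
    if geom["type"] == "MultiPolygon":
        return {**geom, "coordinates": [drop_small_holes(p, min_hole_m2)
                                        for p in geom["coordinates"]]}
    return geom
```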

@govvin commented Aug 16, 2019

Thank you, @maning.

Please consider the following:


  • Can we further optimize the file size by simplifying vertices?

[Screenshot: settlement polygons, with red lines simulating roads outside the feature outlines]
The polygon features look ideal for settlements, but we could miss the roads that connect these settlements when we use them to create tasks (the red lines simulate roads that fall outside the feature outlines).

  • We should probably merge polygons that lie within 150 m of each other.
  • How can we better handle polygon islands within polygons?
  • Could we be splitting the features prematurely by intersecting them with ADM3 boundaries?
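One way to act on the 150 m merge suggestion is to group polygons whose outlines come within that distance, then dissolve each group in a GIS tool. A minimal sketch, assuming coordinates already projected to metres (for example a UTM CRS, an assumption here), exterior rings only, and no polygon lying entirely inside another; it is O(n²) with no spatial index, so it only illustrates the grouping logic. The function names are hypothetical.

```python
import math
from itertools import combinations

def _point_seg_dist(p, a, b):
    """Distance from point p to segment ab (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def _orient(p, q, r):
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def _segments_cross(a, b, c, d):
    """True if segments ab and cd properly intersect."""
    return (_orient(c, d, a) * _orient(c, d, b) < 0 and
            _orient(a, b, c) * _orient(a, b, d) < 0)

def ring_distance(r1, r2):
    """Minimum edge-to-edge distance between two closed rings."""
    best = math.inf
    for a, b in zip(r1, r1[1:]):
        for c, d in zip(r2, r2[1:]):
            if _segments_cross(a, b, c, d):
                return 0.0
            # Non-crossing segments: the minimum is at one of the four endpoints.
            best = min(best, _point_seg_dist(a, c, d), _point_seg_dist(b, c, d),
                       _point_seg_dist(c, a, b), _point_seg_dist(d, a, b))
    return best

def group_within(rings, max_dist=150.0):
    """Union-find grouping of rings that lie within max_dist of each other."""
    parent = list(range(len(rings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(rings)), 2):
        if ring_distance(rings[i], rings[j]) <= max_dist:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(rings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group lists the indices of polygons to dissolve together; a buffer-and-dissolve in QGIS is the more practical route for a national dataset.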

@govvin commented Aug 16, 2019

Before we publish a version of this, how about we run a test over the same area covered by Ompong last year? We could take all features added while the task was running (Sep-Oct) and compare their coverage against this one.

@maning commented Sep 6, 2019

I updated the data; see the OP.

Changes include the following, based on your comments:

The coordinates of the GeoJSON have 9 decimal places. This is too much precision. 5 should be enough.

👍 The GeoJSON coordinates now use 5 decimal places.

The GeoJSON file contains 1600+ MultiPolygon features with complex shapes. This is very slow to load in tools/applications such as QGIS. Maybe it would be better to split the GeoJSON into smaller pieces, maybe grouped by ADM2 (province) level.

👍 I converted the features to single-part polygons, but it is still one big GeoJSON. It's much faster to load now than before.
Also, to reduce the file size, I removed the Pcode attribute and moved it to a separate CSV file. Users can join these attributes if they need them for analysis.
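Both steps, splitting multipart features and joining attributes back from a CSV, can be sketched with the standard library alone. This is a minimal illustration, not the process used to build the files: the shared key column `id` and the column `adm3_pcode` in the usage are hypothetical names, so substitute whatever identifiers the real GeoJSON and CSV share.

```python
import csv
import io

def explode(fc):
    """Split each MultiPolygon feature into one Polygon feature per part."""
    out = []
    for feat in fc["features"]:
        geom = feat.get("geometry")
        if geom and geom["type"] == "MultiPolygon":
            for part in geom["coordinates"]:
                out.append({**feat,
                            # copy properties so later joins don't alias across parts
                            "properties": dict(feat.get("properties") or {}),
                            "geometry": {"type": "Polygon", "coordinates": part}})
        else:
            out.append(feat)
    return {**fc, "features": out}

def join_csv(fc, csv_text, key="id"):
    """Copy CSV columns onto features whose properties[key] matches the CSV key column.

    `key` is a hypothetical column name. Features are modified in place."""
    rows = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    for feat in fc["features"]:
        props = feat.get("properties") or {}
        feat["properties"] = props
        row = rows.get(str(props.get(key)))
        if row is not None:
            props.update({k: v for k, v in row.items() if k != key})
    return fc
```

For example, `join_csv(explode(fc), open("ph_admin_code.csv").read())` would attach the CSV columns to every single-part feature, provided both files really share an `id`-style key.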

Can we further optimize the file size by simplifying vertices?

👍 I simplified hrsl_ph_buffer100m_v1 by removing vertices within 10 m.
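"Removing vertices within 10 m" matches the behaviour of a radial-distance simplification, where any vertex closer than the tolerance to the last kept vertex is dropped. The exact tool used is not stated, so this is a minimal sketch of that technique under assumptions: closed lon/lat rings, and an equirectangular distance approximation that is fine at 10 m scales. Function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres

def approx_dist_m(p, q):
    """Equirectangular approximation of the distance between two lon/lat points, in metres."""
    lat_mid = math.radians((p[1] + q[1]) / 2)
    dx = math.radians(q[0] - p[0]) * math.cos(lat_mid)
    dy = math.radians(q[1] - p[1])
    return EARTH_RADIUS_M * math.hypot(dx, dy)

def simplify_ring(ring, tol_m=10.0):
    """Radial-distance simplification of a closed ring (first point == last point).

    Drops every vertex closer than tol_m to the previously kept vertex.
    Returns the original ring if simplification would leave fewer than
    4 points (the GeoJSON minimum for a closed linear ring)."""
    kept = [ring[0]]
    for pt in ring[1:-1]:
        if approx_dist_m(kept[-1], pt) >= tol_m:
            kept.append(pt)
    kept.append(ring[-1])  # re-close the ring
    return kept if len(kept) >= 4 else ring
```

Radial distance is cheap but cruder than Douglas-Peucker, which removes vertices by their distance from the simplified line rather than from neighbouring vertices.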

The polygon features look ideal for settlements, but we could miss the roads that connect these settlements when we use them to create tasks (the red lines simulate roads that fall outside the feature outlines).

👍 To resolve this, I created a new GeoJSON of 500 m grid cells that intersect hrsl_ph_buffer100m_v1 (hrsl_ph_500m_grid_v1). This should cover most cases. I also think this is the ideal vector for preparing the tasks.
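Selecting the 500 m grid cells that intersect the buffer polygons can be sketched as a bounding-box pass followed by an exact cell/polygon test. A minimal illustration, assuming coordinates already projected to metres (a hypothetical CRS choice), 2D exterior rings only (holes ignored), and cells aligned to multiples of the cell size; degenerate cases where a cell only touches a polygon at a single boundary point may be classified either way. Function names are hypothetical.

```python
import math

def point_in_ring(x, y, ring):
    """Ray-casting point-in-polygon test for a closed ring (planar coordinates)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def _orient(p, q, r):
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def _cross(a, b, c, d):
    """True if segments ab and cd properly intersect."""
    return (_orient(c, d, a) * _orient(c, d, b) < 0 and
            _orient(a, b, c) * _orient(a, b, d) < 0)

def cell_intersects(x0, y0, size, ring):
    """Does the square cell [x0, x0+size] x [y0, y0+size] intersect the ring's polygon?"""
    x1, y1 = x0 + size, y0 + size
    # 1. A polygon vertex inside the cell (covers polygons smaller than a cell).
    if any(x0 <= x <= x1 and y0 <= y <= y1 for x, y in ring):
        return True
    # 2. A cell corner inside the polygon (covers cells inside large polygons).
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    if any(point_in_ring(cx, cy, ring) for cx, cy in corners):
        return True
    # 3. Otherwise the boundaries must cross.
    cell_edges = list(zip(corners, corners[1:] + corners[:1]))
    return any(_cross(a, b, c, d) for a, b in zip(ring, ring[1:]) for c, d in cell_edges)

def grid_cells_touching(rings, size=500.0):
    """Return sorted (col, row) indices of grid cells intersecting any ring."""
    cells = set()
    for ring in rings:
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        # Only test cells overlapping the ring's bounding box.
        for i in range(math.floor(min(xs) / size), math.floor(max(xs) / size) + 1):
            for j in range(math.floor(min(ys) / size), math.floor(max(ys) / size) + 1):
                if cell_intersects(i * size, j * size, size, ring):
                    cells.add((i, j))
    return sorted(cells)
```

Keeping whole cells rather than clipping them is what lets the grid layer pick up connecting roads just outside the 100 m buffers, at the cost of including some empty area.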

Before we publish a version of this, how about we run a test over the same area covered by Ompong last year? We could take all features added while the task was running (Sep-Oct) and compare their coverage against this one.

👍 Yes, I did a quick evaluation; see the next comment.
👎 Not in the Ompong area, though. :)

@maning commented Sep 6, 2019

To quickly evaluate whether the derived vectors cover most settlements in a given area, I compared the vector data against OSM coverage on Banton Island. I chose Banton Island because I had already used it to test whether HRSL is appropriate data for prioritizing remote mapping (see my OSM diary).

Details of the comparison.

An OSM feature was counted as within a layer if it lies at least partially inside one of its polygons.
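The "partially within" rule can be approximated by checking whether any vertex of a building outline or road line falls inside any polygon. This is a minimal sketch, not the method used for the table: it assumes planar 2D coordinates and exterior rings only, and a feature that crosses a polygon without having a vertex inside it would be missed, so a proper intersects test in a GIS would be stricter. Function names are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting point-in-polygon test for a closed ring (planar coordinates)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def partially_within(vertices, rings):
    """Vertex-based approximation: True if any vertex of the feature is inside any ring."""
    return any(point_in_ring(x, y, ring) for x, y in vertices for ring in rings)

def count_within_outside(features, rings):
    """Return (within, outside) counts for a list of features given as vertex lists."""
    within = sum(1 for verts in features if partially_within(verts, rings))
    return within, len(features) - within
```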

| | hrsl_ph_buffer100m_v1 | hrsl_ph_500m_grid_v1 |
|---|---|---|
| Buildings within polygon | 1486 | 1746 |
| Buildings outside polygon | 376 | 116 |
| Roads within polygon | 85 | 91 |
| Roads outside polygon | 10 | 4 |
| Building "detection rate" (%) | 79.8 | 93.8 |
| Road "detection rate" (%) | 89.5 | 95.8 |
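The detection rate is the share of OSM features counted as within, i.e. within / (within + outside) × 100. Recomputing it from the raw counts is a quick consistency check on the table; the sketch below uses only the counts reported above.

```python
def detection_rate(within, outside):
    """Percentage of OSM features counted as (partially) within the polygons."""
    return 100.0 * within / (within + outside)

# (within, outside) counts from the Banton Island comparison table
COUNTS = {
    ("hrsl_ph_buffer100m_v1", "Building"): (1486, 376),
    ("hrsl_ph_buffer100m_v1", "Road"): (85, 10),
    ("hrsl_ph_500m_grid_v1", "Building"): (1746, 116),
    ("hrsl_ph_500m_grid_v1", "Road"): (91, 4),
}

if __name__ == "__main__":
    for (layer, kind), (within, outside) in COUNTS.items():
        print(f"{layer:22} {kind:8} {detection_rate(within, outside):.1f}%")
```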

Summary

  • hrsl_ph_buffer100m_v1 covers much less area than hrsl_ph_500m_grid_v1, which means there is less area to map. In addition, hrsl_ph_500m_grid_v1 can include areas with no features to map, for example grid cells that straddle land and sea.
  • Both layers captured most buildings and roads (79.8% to 95.8%), but hrsl_ph_500m_grid_v1 has the higher detection rate.
  • Using this data to map roads and buildings does not guarantee 100% coverage, but it is good enough for remote mapping. I discuss this in detail in the diary.

@maning commented Sep 6, 2019

@govvin @seav Please give this a review again. Thanks!

Commit: minor formatting correction

@govvin commented Sep 6, 2019

Let's specify the license for the data. Also, double-check the license of the HRSL dataset.

It is very challenging to achieve 100% coverage in emergency mapping, and there is always the risk of missing isolated settlements; even the current approach gives no assurance against that. Helping contributors focus on smaller areas is extremely helpful. Emergency response resources are limited, and DRR managers will often focus initially on where the impact is greatest.

Once it is released, it might be a good idea to hold a public workshop (e.g., for potential TM project managers, DRR practitioners, etc.) and record it on video.

@maning commented Sep 7, 2019

Let's specify the license for the data. Also, double-check the license of the HRSL dataset.

Good point! HRSL is CC BY 4.0, while the PH admin boundaries are "Humanitarian use only".
What's the appropriate license for derived data coming from these two?

@seav

@maning commented Sep 8, 2019

Let's specify the license for the data. Also, double-check the license of the HRSL dataset.

Let's resolve this with a project-wide license, tracked in #8.

@govvin left a review comment

Looks good, and all the inputs on the test version have been incorporated.

@maning merged commit 8da5a14 into master on Sep 9, 2019
@maning deleted the hrsl-buffer-v1 branch on September 9, 2019 10:38
This pull request was closed.