Skip to content

Addresses #193

Merged
Victor Schappert (vcschapp) merged 11 commits into dev from addresses
Jul 3, 2024

@jwass
Collaborator

@jwass Jacob Wasserman (jwass) commented May 21, 2024

See initial PR #179

The addresses theme defines a point feature describing a single address. All addresses have:

  • country
  • street
  • number
  • unit (optional)
  • postcode

Furthermore, we need three "admin levels". Depending on the country or region, some may be required or forbidden. We have not yet defined these rules or added them to the schema. The three fields, one per level, are:

  • region
  • city
  • district

Note that addresses can be arbitrary and often lack significant rules or structure, so we cannot restrict the "region" value to be an ISO 3166-2 subdivision code. Different regions refer to the admin levels in their own way, but we map each one to one of the above fields depending on the region.

This first version does not link addresses to divisions; that is left for future work.
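
A minimal sketch of the first-version record as a Python dataclass, assuming the field names listed above and plain strings for every value (the real schema may type `number` or `postcode` differently). The class name `Address` and the sample values are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    """Hypothetical in-memory shape of a first-version address point."""

    # Fields every address carries
    country: str
    street: str
    number: str
    postcode: str
    # Optional unit within the building
    unit: Optional[str] = None
    # The three "admin levels": free text, not validated against ISO 3166-2
    region: Optional[str] = None
    city: Optional[str] = None
    district: Optional[str] = None


# Illustrative values only
addr = Address(country="US", street="Main Street", number="1",
               postcode="00000", region="MA", city="Boston")
```

Keeping `region` a free-form string matches the point above that it cannot be restricted to subdivision codes.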

@vcschapp
Collaborator

I know TristanDiet-TomTom will provide the deep insights on property names and edge cases around the world, so I'll steer clear of that.

One thing I think is important (as discussed a bit in this morning's Schema TF meeting) relates to the 3 "admin levels" referenced in the summary, currently stated as region, city, and district.

  • If the properties have the same content, validation rules, and meaning as in divisions, we should use the same property names to make the schema as coherent as possible.
  • If they have different content, validation rules, and meaning, we should use different property names to reduce opportunities for customer confusion.
  • Ideally we also avoid "weird overlap" where the concept is called "locality" in one theme, but "city" in the other.

What if we did this? 👇 It's annoying in the short term due to schema refactoring, but it probably gives us the most coherent overall schema in the longer term.

  1. Rename country to country_code in divisions and places.
  2. Rename region to region_code in divisions.
  3. In addresses use:
    • country_code is the ISO 3166-1 alpha-2 country code (required)
    • region, not region_code, is a less-structured region field that might not be ISO 3166-2.
    • locality is a string array field, minimum length 1, maximum length 2. This allows you to support "district" in postal geographies that require it, but avoids introducing the word "district" and having to explain that it doesn't have a strict hierarchical relationship to "locality"......
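
A sketch of the validation rules this proposal implies for addresses, assuming `country_code` only needs a format check (two uppercase letters, which does not prove the code is assigned in ISO 3166-1) and `locality` is a list of 1 to 2 strings. The helper name `check_proposal` is hypothetical; `region` is left unchecked because the proposal makes it free text.

```python
import re
from typing import Sequence

# Format check only: two uppercase ASCII letters. This does not confirm the
# code is actually assigned in ISO 3166-1.
_ALPHA2 = re.compile(r"[A-Z]{2}")


def check_proposal(country_code: str, locality: Sequence[str]) -> list[str]:
    """Return problems with a record under the proposed rules (hypothetical helper)."""
    problems = []
    if not _ALPHA2.fullmatch(country_code):
        problems.append("country_code must look like an ISO 3166-1 alpha-2 code")
    # Minimum length 1, maximum length 2, per the proposal
    if not 1 <= len(locality) <= 2:
        problems.append("locality must contain 1 or 2 entries")
    return problems
```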

@jwass
Collaborator Author

Jacob Wasserman (jwass) commented May 22, 2024

Victor Schappert (@vcschapp) Thanks. At the addresses meeting today, people liked the idea of just getting very generic here and calling it "admin1", "admin2", "admin3", etc. (exact name TBD but something like that is where we're leaning) and not try to map everything neatly to a known hierarchical value like the division subtypes.

The idea would not be to treat these like admin_levels but rather just populate them in order. So in US, admin1 is state, admin2 is city. But some places might have admin1 as state, admin2 is county, admin3 is city, etc. Not sure.

I think country could stay, as these will always be valid ISO 3166-1 alpha-2 country codes.

@vcschapp
Collaborator


This seems reasonable. It can also be an array property so we don't need 3 columns for it.
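
A sketch of the array idea, assuming hypothetical `admin1`..`admin3` columns populated in order as described in the previous comment: the values are kept in order and trailing empty levels are dropped, so a US record becomes `["MA", "Boston"]` (state, then city). Rejecting interior gaps is an assumption drawn from "populate them in order", not a settled rule.

```python
from typing import Optional


def levels_to_array(*levels: Optional[str]) -> list[Optional[str]]:
    """Collapse ordered admin columns (admin1, admin2, ...) into one array.

    Trailing missing levels are dropped. An interior gap is rejected because,
    under in-order population, a later level should not appear without the
    earlier ones (an assumption for this sketch).
    """
    values = list(levels)
    while values and values[-1] is None:
        values.pop()
    if any(v is None for v in values):
        raise ValueError("levels must be populated in order without gaps")
    return values
```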

Jacob Wasserman (jwass) force-pushed the addresses branch 2 times, most recently from bff7266 to 1d99ef9, June 9, 2024 18:03
@jwass
Collaborator Author

One alternative considered is a list of 5 elements with null elements, e.g. address_level: ["MA", null, "Boston", null, null]. While allowable, I worry that lists with a few null values in the middle are non-standard enough to cause headaches for certain consumers.
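
To illustrate the consumer concern, here is a sketch using the fixed-length example list, with Python `None` standing in for JSON `null`. Position carries the meaning, so a consumer must read by index; simply filtering out the nulls collapses the list and loses which level each value belonged to. Treating index 2 as the city is only true for this example, and `level_at` is a hypothetical helper.

```python
from typing import Optional

# The padded 5-element example from the comment; None plays the role of null.
address_level: list[Optional[str]] = ["MA", None, "Boston", None, None]


def level_at(levels: list[Optional[str]], index: int) -> Optional[str]:
    """Read one level by position, treating out-of-range indexes as missing."""
    return levels[index] if 0 <= index < len(levels) else None


# Index-based access keeps the meaning of each slot (index 2 is the city
# only in this example layout).
city = level_at(address_level, 2)

# Naive filtering drops the nulls but also shifts "Boston" from index 2 to 1.
compacted = [v for v in address_level if v is not None]
```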

@vcschapp
Collaborator


Sharing a couple of thoughts.

Naming

What if we used the word sector/sectors for the "administrative" parts of the address, i.e. what is address_level in the current version of the PR?

My thoughts are:

  • The word sector is a bit like division in that it's a fairly generic word for a distinct part or subdivision of a larger entity, but it's a different word and thus doesn't compete with/conflict with division.
  • The word sector is mostly not used as the official word for an administrative subdivision, except in parts of 5-10 smaller countries or areas. While there is OSM tagging using sector, it is fairly limited and local; sector does not have its own OSM key wiki page.
  • The word sector is aligned with number and street in terms of being a terse, single word.

Structure

Having five top-level columns suffixed 1-5 really bugs me. It seems to be calling out for some kind of structured usage.

As discussed in today's Schema TF meeting, my map idea won't work. (Or would have to be made really awkward to make it work.)

I don't mind your array with null values idea - I prefer it to five top-level columns.

One pattern we have used successfully in other parts of the Overture schema is an array of structures. While it's a little less amenable to JSON Schema validation, it does provide some other benefits. We use it in names.rules, and Brad Richardson (@brad-richardson) recently restructured transportation's when: vehicle: ... scoping construct to behave similarly. So one idea that strikes a balance between consistency with the schema, structuredness, and usability might be:

---
properties:
  theme: addresses
  type: address
  number: 10
  street: Downing Street
  sectors:
    - level: 1
      value: London
    - level: 3
      value: UK
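
A sketch of how a consumer might read the proposed `sectors` array of structures, assuming each entry carries an integer `level` and a string `value` as in the YAML above. A dict gives direct lookup with no null placeholders, and sorting by `level` recovers the order. The `Sector` type name is hypothetical.

```python
from typing import TypedDict


class Sector(TypedDict):
    level: int
    value: str


# Mirrors the sectors list in the YAML example above.
sectors: list[Sector] = [
    {"level": 1, "value": "London"},
    {"level": 3, "value": "UK"},
]

# Direct lookup by level; levels with no entry are simply absent.
by_level = {s["level"]: s["value"] for s in sectors}

# Values in level order, regardless of the order entries were stored in.
ordered = [s["value"] for s in sorted(sectors, key=lambda s: s["level"])]
```

Unlike the padded list, a missing level here is just an absent entry, which avoids nulls in the middle of an array at the cost of weaker JSON Schema checks.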

@jwass
Collaborator Author

Victor Schappert (@vcschapp) Thanks for the feedback. After discussing with a bunch of people on address-tf I think I'm leaning towards a single list. We are not going to align the levels across countries, they'll still just be populated in order on a per-country basis. So each country will have a known length, but each individual record might not have all values populated and some can be null.

I also discussed this with Matt Travis (@mtravis) from AddressCloud, who said the phrase "address_level" is sometimes used to describe these fields, so that's how I'm leaning.

We'll have a set of rules for each country that describes how address fields map to indexes within address_level.
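
A minimal sketch of such per-country rules, assuming a simple table from country code to an ordered list of level names. The US layout (state, then city) follows the example earlier in this thread; the `COUNTRY_RULES` table and `build_address_levels` helper are hypothetical, since the real rules were still to be defined.

```python
from typing import Optional

# Hypothetical rules: for each country, which field occupies each index of
# address_levels. Each country therefore has a known list length.
COUNTRY_RULES: dict[str, list[str]] = {
    "US": ["state", "city"],
}


def build_address_levels(country: str, fields: dict[str, str]) -> list[Optional[str]]:
    """Place named fields at their per-country indexes, using None when absent."""
    layout = COUNTRY_RULES[country]
    unknown = set(fields) - set(layout)
    if unknown:
        raise ValueError(f"fields not defined for {country}: {sorted(unknown)}")
    return [fields.get(name) for name in layout]
```

Every record for a given country gets a list of the same length, and a record missing a value gets a null at that index, matching the comment above.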

@mtravis


Jacob Wasserman (@jwass) Victor Schappert (@vcschapp) I'm really keen to use address_levels as opposed to admin or something else. And using a list is a great idea and allows us more scope for change in the future if needed.

@vcschapp
Collaborator


This is another little papercut reason why I'd like to replace level with z_order: level has little to do with other fields like address_level.

@jwass
Collaborator Author

Victor Schappert (@vcschapp) Are you okay with address_levels as a list? Wanted to make sure before making the update

We recently decided that the initial addresses schema would be based on the
output of the OpenAddresses (OA) "conform" schema. The definition of the fields
can be found at
https://github.com/openaddresses/openaddresses/blob/master/schema/layers/address_conform.json

This initial schema is a strawman intended for feedback and discussion.

Our initial schema covers all properties included in the US Northeast distribution,
which includes the following fields:
* id
* country (the first element in the filepath e.g. us/ma/city_of_boston-addresses-city.geojson is "us")
* subcountry (the second element in the filepath e.g. us/ma/city_of_boston-addresses-city.geojson is "ma"); this field should be renamed or removed
* region
* district
* city
* postcode
* street
* number
* unit
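
The `country` and `subcountry` derivation described above can be sketched as splitting the OA source path on `/`. This assumes paths always have the `country/subcountry/file` shape of the example; `parse_oa_path` is a hypothetical helper, not part of OpenAddresses.

```python
from pathlib import PurePosixPath


def parse_oa_path(path: str) -> tuple[str, str]:
    """Return (country, subcountry) from the first two path components."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"expected country/subcountry/file, got {path!r}")
    return parts[0], parts[1]
```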

Update the address schema to contain the following fields:
* country (required)
* street (required)
* number (required)
* unit (optional)
* postcode (required)
* region (optional)
* city (optional)
* district (optional)

In the future we may add rules that, depending on the country,
make certain fields required or forbidden.
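
A sketch of a check for the required and optional fields listed above, with a hook for the per-country required/forbidden rules mentioned as future work. The `EXTRA_RULES` table and its `XX` entry are placeholders, not real country rules; only the base required and optional sets come from this commit.

```python
from typing import Any

BASE_REQUIRED = {"country", "street", "number", "postcode"}
OPTIONAL = {"unit", "region", "city", "district"}

# Placeholder for future per-country rules; "XX" is not a real country code.
EXTRA_RULES: dict[str, dict[str, set[str]]] = {
    "XX": {"required": {"region"}, "forbidden": {"district"}},
}


def validate(record: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    present = {k for k, v in record.items() if v is not None}
    rules = EXTRA_RULES.get(record.get("country", ""), {})
    required = BASE_REQUIRED | rules.get("required", set())
    forbidden = rules.get("forbidden", set())
    problems = [f"missing {f}" for f in sorted(required - present)]
    problems += [f"forbidden {f}" for f in sorted(forbidden & present)]
    problems += [f"unknown {f}" for f in sorted(present - BASE_REQUIRED - OPTIONAL)]
    return problems
```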

Remove region, city, and district and replace them with a more
generic/flexible levels name.
Contributor


Ideally we would have references to other entities in divisions and other themes, like named road segments instead of street names… But all of that can be added later as additional properties. I think this is a good starting schema for addresses.

6 participants