
[WIP] Strawman Initial Addresses Schema #179

Closed
Jacob Wasserman (jwass) wants to merge 1 commit into dev from addresses


Collaborator

@jwass Jacob Wasserman (jwass) commented Apr 28, 2024

This is an initial schema for the addresses theme and is not ready to be merged. I'm hoping this sparks discussion with the schema group and the addresses folks.

Background

We recently decided to use the Open Addresses (OA) address output schema as a starting point for Overture addresses (https://github.com/openaddresses/openaddresses/blob/master/schema/layers/address_conform.json).

This Pull Request

This initial PR copies OA's fields, including:

  • region
  • district
  • city
  • postcode
  • street
  • number
  • unit

We can convert OA outputs into this schema fairly straightforwardly. The schema also includes country and country_district, which are pulled from the OA path, e.g. us/ma/... This would give "us" as the country and "ma" as the country_district.
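As a rough illustration of that conversion, here is a minimal sketch that maps one OA "conform" GeoJSON feature onto the fields listed above. It assumes blank strings should be omitted rather than copied (every string property in the strawman declares `minLength: 1`); the function name `oa_to_address_properties` is hypothetical, not part of OA or Overture tooling.

```python
import json

# Fields this PR copies from the OA "conform" output.
OA_FIELDS = ("region", "district", "city", "postcode", "street", "number", "unit")


def oa_to_address_properties(feature: dict) -> dict:
    """Map one OA conform GeoJSON feature to strawman address properties.

    Blank strings are omitted rather than copied, because every string
    property in the strawman schema declares minLength: 1 (an assumption
    about how blanks should be handled, not a settled decision).
    """
    source = feature.get("properties") or {}
    return {field: source[field] for field in OA_FIELDS if source.get(field)}


# One of the Plainfield, CT records quoted in the questions below.
line = (
    '{"type":"Feature","properties":{"hash":"fd84e78f7c30c1ec","number":"14",'
    '"street":"PINECREST DR","unit":"","city":"","district":"","region":"",'
    '"postcode":"","id":""},"geometry":{"type":"Point",'
    '"coordinates":[-71.9059254,41.7584033]}}'
)
props = oa_to_address_properties(json.loads(line))
print(props)  # {'street': 'PINECREST DR', 'number': '14'}
```

Dropping blanks keeps converted records valid against `minLength: 1`, but it also makes the missing city and region visible as absent keys, which is exactly the gap raised in the questions below.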

Todo / Questions to Answer

  • Which fields if any should be required?
  • Should we always add the country from the OA top-level folder into each entry?
  • What should we do when an address source doesn't contain the region or city, even though it's probably known? For example, us/ct/city_of_plainfield-addresses-city.geojson has entries like:
{"type":"Feature","properties":{"hash":"fd84e78f7c30c1ec","number":"14","street":"PINECREST DR","unit":"","city":"","district":"","region":"","postcode":"","id":""},"geometry":{"type":"Point","coordinates":[-71.9059254,41.7584033]}}
{"type":"Feature","properties":{"hash":"3f059ec63fee9030","number":"558","street":"PUTNAM RD","unit":"","city":"","district":"","region":"","postcode":"","id":""},"geometry":{"type":"Point","coordinates":[-71.900128,41.7582586]}}

Note that city and region are blank. Should we replace all of these with "Plainfield" and "CT" here and in similar instances? Another option is to replace these in OA directly with less modification here. (We should follow up with OA).

  • How do we reconcile this with the existing address field in places?
  • How should we reference buildings/places to these addresses? Likely an address_ids field or similar.
  • More I haven't thought of at the moment
  • Would we want to link the country / region / district / city to their divisions? e.g. with country_division_id, etc?
  • Would we want to link the address's street to the corresponding transportation segment(s)?
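If we did fill in values like "Plainfield" and "CT" on our side rather than upstream in OA, one way to do it is a per-source table of defaults applied only to blank fields. This is a minimal sketch under that assumption; `SOURCE_DEFAULTS` and `backfill` are hypothetical names, and the defaults table would have to be curated by hand or taken from OA source metadata.

```python
# Hypothetical per-source defaults. In practice they might come from OA source
# metadata, or the fix could be made upstream in OA instead.
SOURCE_DEFAULTS = {
    "us/ct/city_of_plainfield-addresses-city.geojson": {
        "city": "Plainfield",
        "region": "CT",
    },
}


def backfill(properties: dict, source_path: str) -> dict:
    """Return a copy with blank fields filled from the source's defaults.

    Non-blank values from the record itself are never overwritten.
    """
    filled = dict(properties)
    for field, default in SOURCE_DEFAULTS.get(source_path, {}).items():
        if not filled.get(field):
            filled[field] = default
    return filled


record = {"number": "558", "street": "PUTNAM RD", "city": "", "region": ""}
result = backfill(record, "us/ct/city_of_plainfield-addresses-city.geojson")
print(result)
# {'number': '558', 'street': 'PUTNAM RD', 'city': 'Plainfield', 'region': 'CT'}
```

Never overwriting non-blank values means a record that already carries a different locality keeps it, so the defaults only cover the gap rather than asserting a city for every point in the file.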

References

Other address schemas for reference:

@jwass Jacob Wasserman (jwass) changed the base branch from main to dev April 28, 2024 17:33
We recently decided that the initial addresses schema would be based on the
output of the Open Addresses (OA) "conform" schema. The definition of the fields
can be found at
https://github.com/openaddresses/openaddresses/blob/master/schema/layers/address_conform.json

This initial schema is a strawman intended for feedback and discussion.

Our initial schema covers all properties included in the US Northeast distribution,
which includes the following fields:
* id
* country (the first element in the filepath e.g. us/ma/city_of_boston-addresses-city.geojson is "us")
* subcountry (the second element in the filepath e.g. us/ma/city_of_boston-addresses-city.geojson is "ma") - This field should be renamed or removed
* region
* district
* city
* postcode
* street
* number
* unit
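Deriving country and subcountry from the file path, as described above, can be sketched as follows. This assumes the `<country>/<subcountry>/<file>` layout from the examples, and it upper-cases the country on the assumption that OA's top-level folders are lower-case ISO 3166-1 alpha-2 codes; `country_and_subcountry` is a hypothetical helper, and the `xx/countrywide.geojson` path is invented for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def country_and_subcountry(oa_path: str) -> Tuple[str, Optional[str]]:
    """Derive (country, subcountry) from an OA distribution file path.

    Assumes the layout <country>/<subcountry>/<file>.geojson. The country
    folder is upper-cased on the assumption that OA folder names are
    lower-case ISO 3166-1 alpha-2 codes. Paths with no subcountry folder
    (e.g. a hypothetical "xx/countrywide.geojson") yield None.
    """
    parts = PurePosixPath(oa_path).parts
    if len(parts) < 2:
        raise ValueError(f"expected <country>/.../<file>, got {oa_path!r}")
    country = parts[0].upper()
    subcountry = parts[1] if len(parts) > 2 else None
    return country, subcountry


print(country_and_subcountry("us/ma/city_of_boston-addresses-city.geojson"))
# ('US', 'ma')
```

Returning `None` for a missing subcountry folder, rather than guessing, reflects the note above that this field may be renamed or removed.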
@DavidKarlas
Contributor

Just came here to comment on "Would we want to link the country / region / district / city to their divisions? e.g. with country_division_id, etc?" In my opinion, the answer is yes. My main question is how to do that. One possibility is to use hierarchies with the same definition as in the divisions theme, to form a kind of standard across themes. It looks like this:

[
  [
    {
      "division_id": "085395e73fffffff01d732732c18bb4e",
      "subtype": "country",
      "name": "New Zealand"
    },
    {
      "division_id": "0857c822bfffffff01d1d4553101e1c8",
      "subtype": "region",
      "name": "Chatham Islands"
    },
    {
      "division_id": "0857c75cffffffff017471111ebf8169",
      "subtype": "county",
      "name": "Chatham Islands Territory"
    },
    {
      "division_id": "0855485affffffff01f40c7dd929bdb3",
      "subtype": "locality",
      "name": "Kāingaroa"
    }
  ]
]

Victor Schappert (@vcschapp) do you think hierarchies would be a good match?

Of course, this will be additional work on addresses (matching each raw input record to the correct city), but I also think it will significantly improve the quality and value of the dataset.

@jwass
Collaborator Author

David Karlaš (@DavidKarlas)

Just came here to comment on "Would we want to link the country / region / district / city to their divisions? e.g. with country_division_id, etc?" in my opinion answer is yes, my main question is on how to do that, one possibility is to use hierarchies

If the divisions already contain the hierarchies, would it make sense to link to the most granular possible division, such as the locality, and derive the rest of the hierarchy (region, country, etc.) from that? I suppose we could also just copy the division's hierarchy, which is what I think you're suggesting.
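A minimal sketch of "store only the most granular division and derive the rest": walk parent links from the locality up to the country and emit entries in the same shape as the hierarchy example above. The IDs, subtypes, and names come from that New Zealand example; the `parent` links and the in-memory `DIVISIONS` index are assumptions for illustration, not the divisions theme's actual structure.

```python
from typing import Dict, List, Optional

# Hypothetical division index: IDs, subtypes, and names come from the New
# Zealand example above; the "parent" links are an assumption for illustration.
DIVISIONS: Dict[str, dict] = {
    "085395e73fffffff01d732732c18bb4e": {
        "subtype": "country", "name": "New Zealand", "parent": None},
    "0857c822bfffffff01d1d4553101e1c8": {
        "subtype": "region", "name": "Chatham Islands",
        "parent": "085395e73fffffff01d732732c18bb4e"},
    "0857c75cffffffff017471111ebf8169": {
        "subtype": "county", "name": "Chatham Islands Territory",
        "parent": "0857c822bfffffff01d1d4553101e1c8"},
    "0855485affffffff01f40c7dd929bdb3": {
        "subtype": "locality", "name": "Kāingaroa",
        "parent": "0857c75cffffffff017471111ebf8169"},
}


def hierarchy_from(division_id: str) -> List[dict]:
    """Walk parent links from the most granular division up to the country.

    Returns entries ordered country-first, matching the hierarchy shape above.
    """
    chain: List[dict] = []
    seen = set()
    current: Optional[str] = division_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in division parents at {current}")
        seen.add(current)
        d = DIVISIONS[current]
        chain.append({"division_id": current, "subtype": d["subtype"], "name": d["name"]})
        current = d["parent"]
    chain.reverse()
    return chain


# An address would store only the locality ID; the rest is derived.
for entry in hierarchy_from("0855485affffffff01f40c7dd929bdb3"):
    print(entry["subtype"], entry["name"])
```

The trade-off is one stored ID per address versus a lookup at read time; copying the full hierarchy onto each address avoids the lookup but duplicates division data.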

Another idea I had last night: we could also link the street to the transportation segment. I just added another bullet about it.

@jwass
Collaborator Author

Jacob Wasserman (jwass) commented Apr 30, 2024

Of course this will be additional work on addresses to find matches with correct city based on raw input data, but I also think it will drive quality and value of data set up significantly.

David Karlaš (@DavidKarlas) I've been doing some more digging here. It looks like Pelias actually ignores admin hierarchy data from OA and builds it itself. See https://github.com/pelias/wof-admin-lookup?tab=readme-ov-file: "So, for Pelias we actually ignore all admin hierarchy information from individual records, and generate it ourselves from the polygon data in Who's on First." I bet we could do something similar using divisions data, as you've laid out, and end up with a nice, coherent dataset. There's an extra step, too: figuring out when an address belongs to a different admin area than the polygon that contains it.
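The polygon-based approach boils down to a point-in-polygon test of each address coordinate against division geometries. The sketch below uses even-odd ray casting on single rings as a stand-in technique; the boxes in `TOY_DIVISIONS` are invented axis-aligned rectangles, not real division polygons, and the sketch ignores holes, multipolygons, spatial indexing, and the "different admin area" exception mentioned above.

```python
from typing import List, Sequence, Tuple

LonLat = Tuple[float, float]


def point_in_ring(pt: LonLat, ring: Sequence[LonLat]) -> bool:
    """Even-odd ray-casting test against a single polygon ring (no holes)."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def admin_lookup(pt: LonLat, divisions: List[dict]) -> List[Tuple[str, str]]:
    """Return (subtype, name) for every division whose ring contains pt."""
    return [(d["subtype"], d["name"]) for d in divisions if point_in_ring(pt, d["ring"])]


# Toy axis-aligned boxes standing in for real division polygons.
TOY_DIVISIONS = [
    {"subtype": "region", "name": "Connecticut",
     "ring": [(-73.8, 40.9), (-71.7, 40.9), (-71.7, 42.1), (-73.8, 42.1)]},
    {"subtype": "region", "name": "Massachusetts",
     "ring": [(-73.5, 42.1), (-69.9, 42.1), (-69.9, 42.9), (-73.5, 42.9)]},
    {"subtype": "locality", "name": "Plainfield",
     "ring": [(-72.0, 41.7), (-71.8, 41.7), (-71.8, 41.8), (-72.0, 41.8)]},
]

# Coordinates of the 14 PINECREST DR record from the PR description.
print(admin_lookup((-71.9059254, 41.7584033), TOY_DIVISIONS))
# [('region', 'Connecticut'), ('locality', 'Plainfield')]
```

At dataset scale this would run against real division geometries through a spatial index; the loop here just shows the per-address logic.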

Collaborator

Nice to see progress on addresses.

  • This PR reflects a change toward a more "structured" address as informed by OA, versus the previous design idea Paul Heersink (@pheersink) socialized that had more of a free-form text field. Do we want to increase structure like this? I thought the idea behind free-form was that it would be more flexible in the face of non-Western/less structured addressing schemes.
  • Consider adding pattern: ^(\S.*)?\S$ validation on string properties to reject leading and trailing whitespace?
  • Suggested some property name tweaks to align more closely with divisions.
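To see what the suggested pattern accepts and rejects, here is a small check in Python. In the schema it would presumably be set as `pattern: ^(\S.*)?\S$` on each string property; the helper name `has_no_edge_whitespace` is hypothetical. One Python-specific caveat: `$` also matches just before a final newline, so the sketch uses `fullmatch` to keep a trailing `"\n"` from slipping through.

```python
import re

# The pattern suggested in the review: the first and last characters must be
# non-whitespace (a single non-whitespace character also matches).
NO_EDGE_WHITESPACE = re.compile(r"^(\S.*)?\S$")


def has_no_edge_whitespace(value: str) -> bool:
    # fullmatch (not match) so a trailing "\n" cannot slip past Python's `$`,
    # which also matches just before a final newline.
    return NO_EDGE_WHITESPACE.fullmatch(value) is not None


for s in ["PUTNAM RD", "A", " PUTNAM RD", "PUTNAM RD ", "", "PUTNAM RD\n"]:
    print(repr(s), has_no_edge_whitespace(s))
```

Because `.` does not match line breaks, the pattern also rejects values with embedded newlines, which is probably desirable for address components. The empty string is rejected as well, which overlaps with the existing `minLength: 1`.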

Comment on lines +11 to +14
Division geometry MUST be a Point as defined by GeoJSON schema.
It represents the approximate location of a position commonly
associated with the real-world entity modeled by the division
feature.
Collaborator

heh! Address geometry?

Comment on lines +39 to +42
city:
description: The city/locality for the address
type: string
minLength: 1
Collaborator

Suggest locality to align with subtype=locality in the divisions schema.

Comment on lines +27 to +30
country_district:
description: The region folder name in the Open Addresses distribution
type: string
minLength: 1
Collaborator

This one's a bit wacky

  • I don't fully understand what it means.
  • Is it a subdirectory within the OpenAddresses distribution? Do we want to marry the schema to OA to that extent?

Collaborator Author

Removed in latest PR

- "$ref": ../defs.yaml#/$defs/propertyContainers/overtureFeaturePropertiesContainer
properties: # JSON Schema: properties within GeoJSON top-level object 'properties' property
country:
description: The country code in the Open Addresses distribution
Collaborator

Does "in the OpenAddresses distribution" add value? Can we just say it's the ISO 3166-1 alpha-2 country code?

Comment on lines +35 to +38
district:
description: The county for the address
type: string
minLength: 1
Collaborator

Suggest county to align with subtype=county within the divisions schema.

unit:
description: The suite/unit/apartment for the address row
type: string
minLength: 1
\ No newline at end of file
Collaborator

Gar - missing NL at EOF!

@vcschapp
Collaborator


Going back to Relationships in the Schema, one thing we said is that:

Many relationships are implied by geometry ("contained by the polygon for the US"), or transitively through some standard value like an ISO country code ("US") or a highway network ("I-55"). Direct foreign key references should be used frugally, in cases where unambiguously deducing the relationship for a high priority use case is impossible or unnecessarily burdensome.

IMO the best thing to do is use country and region in exactly the same way in both addresses and divisions so you can join them at that level.

I'm reluctant to add any direct FK relationships between them because there are lots of subtleties around addresses that aren't easy to resolve, and that we probably shouldn't resolve.

For example, the city name of an address might be one thing from a postal delivery perspective and another from the perspective of the actual municipality.

I imagine Drew Breunig (@PreciselyDrew) could list 10 complexities.

I think the combination of the address feature's coordinates and the country/region aspect will already prove very useful and we might want to draw the line at adding more complexity above that.
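Joining "at that level" without a stored foreign key could look like the following minimal sketch: index region divisions by the shared `(country, region)` key and left-join addresses onto it. It assumes both themes carry `country` as an ISO 3166-1 alpha-2 code and `region` as an ISO 3166-2 code; all records and IDs below are hypothetical, and `join_on_region` is an illustrative name.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical records. Assumes both themes carry `country` as an ISO 3166-1
# alpha-2 code and `region` as an ISO 3166-2 code; the IDs are made up.
addresses = [
    {"id": "addr-1", "country": "US", "region": "US-CT", "number": "14", "street": "PINECREST DR"},
    {"id": "addr-2", "country": "US", "region": "US-MA", "number": "1", "street": "MAIN ST"},
    {"id": "addr-3", "country": "US", "region": "US-RI", "number": "9", "street": "ELM ST"},
]
divisions = [
    {"id": "div-ct", "subtype": "region", "country": "US", "region": "US-CT", "name": "Connecticut"},
    {"id": "div-ma", "subtype": "region", "country": "US", "region": "US-MA", "name": "Massachusetts"},
]


def join_on_region(addresses: List[dict], divisions: List[dict]) -> List[Tuple[str, Optional[str]]]:
    """Left-join addresses to region divisions on the shared (country, region) key.

    No foreign key is stored on the address; the relationship is recovered
    from the standard codes at query time.
    """
    index: Dict[Tuple[str, str], dict] = {
        (d["country"], d["region"]): d for d in divisions if d["subtype"] == "region"
    }
    joined = []
    for a in addresses:
        match = index.get((a["country"], a["region"]))
        joined.append((a["id"], match["name"] if match else None))
    return joined


print(join_on_region(addresses, divisions))
# [('addr-1', 'Connecticut'), ('addr-2', 'Massachusetts'), ('addr-3', None)]
```

The unmatched `addr-3` shows the cost of this approach: a join on codes silently yields nothing when one side lacks a value, but it avoids committing the schema to a specific, possibly contested, division for each address.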
