[WIP] Strawman Initial Addresses Schema #179
Jacob Wasserman (jwass) wants to merge 1 commit into dev
We recently decided that the initial addresses schema would be based on the output of the Open Addresses (OA) "conform" schema. The definition of the fields can be found at https://github.com/openaddresses/openaddresses/blob/master/schema/layers/address_conform.json

This initial schema is a strawman intended for feedback and discussion. It covers all properties included in the US Northeast distribution, which includes the following fields:

* id
* country (the first element in the filepath, e.g. for us/ma/city_of_boston-addresses-city.geojson it is "us")
* subcountry (the second element in the filepath, e.g. for us/ma/city_of_boston-addresses-city.geojson it is "ma")
  - This field should be renamed or removed
* region
* district
* city
* postcode
* street
* number
* unit
Just came here to comment on "Would we want to link the country / region / district / city to their divisions? e.g. with country_division_id, etc?" In my opinion the answer is yes; my main question is how to do that. One possibility is to use hierarchies with the same definition as in the divisions theme, to form some kind of standard across themes. Victor Schappert (@vcschapp), do you think hierarchies would be a good match? Of course, this will mean additional work on addresses to match each one to the correct city based on the raw input data, but I also think it will significantly improve the quality and value of the data set.
If the divisions already contain the hierarchies, would it make sense to link to the most granular possible division? For example, link to the locality, and then the hierarchy (region, country, etc.) would just be derived from that? I suppose we could just copy the division's hierarchy, which is what I think you're suggesting. Another thing I thought about last night was that we could also link the street to the transportation segment. Just added another bullet about it.
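To make the "link to the most granular division and derive the rest" idea concrete, here is a minimal sketch. The record layout, the `parent_id` field, and the `div-*` identifiers are hypothetical stand-ins for illustration, not the actual divisions schema.

```python
# Hypothetical division records; ids and field names are illustrative only.
DIVISIONS = {
    "div-us": {"subtype": "country", "name": "United States", "parent_id": None},
    "div-ma": {"subtype": "region", "name": "Massachusetts", "parent_id": "div-us"},
    "div-boston": {"subtype": "locality", "name": "Boston", "parent_id": "div-ma"},
}


def derive_hierarchy(division_id, divisions):
    """Walk parent links from the most granular division up to the root.

    Returns the chain ordered from least to most granular.
    """
    chain = []
    seen = set()
    current = division_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in division parents at {current!r}")
        seen.add(current)
        division = divisions[current]
        chain.append(
            {
                "division_id": current,
                "subtype": division["subtype"],
                "name": division["name"],
            }
        )
        current = division["parent_id"]
    chain.reverse()
    return chain


# An address would store only the locality link; the rest is derived.
hierarchy = derive_hierarchy("div-boston", DIVISIONS)
```

With this shape an address carries a single division reference, and the region and country are looked up rather than copied onto every address row.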
David Karlaš (@DavidKarlas) I've been doing some more digging here. It looks like Pelias actually ignores admin hierarchy data from OA and builds it itself. See https://github.com/pelias/wof-admin-lookup?tab=readme-ov-file:

"So, for Pelias we actually ignore all admin hierarchy information from individual records, and generate it ourselves from the polygon data in Who's on First."

I bet we could do something similar using divisions data, as you've laid out, and have a nice coherent dataset here. There's an extra step of figuring out when some addresses might go to a different admin area than the polygon that contains them, too.
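As a rough illustration of generating admin fields from polygons instead of trusting per-record values, the sketch below runs a plain ray-casting point-in-polygon test against division areas. The division records, their `ring` field, and the square geometries are made up for the example; this is not how Pelias or the divisions theme implement the lookup, and it ignores holes, multipolygons, and the postal-versus-containing-area cases mentioned above.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test for a closed ring of (lon, lat) pairs (first == last)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            x_at_lat = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_at_lat:
                inside = not inside
    return inside


def admin_lookup(lon, lat, divisions):
    """Return {subtype: name} for every division whose ring contains the point."""
    return {
        d["subtype"]: d["name"]
        for d in divisions
        if point_in_ring(lon, lat, d["ring"])
    }


# Hypothetical division areas, using simple squares as geometry.
EXAMPLE_DIVISIONS = [
    {
        "subtype": "region",
        "name": "Example Region",
        "ring": [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)],
    },
    {
        "subtype": "locality",
        "name": "Example Town",
        "ring": [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)],
    },
]

in_town = admin_lookup(3, 3, EXAMPLE_DIVISIONS)
outside_town = admin_lookup(5, 5, EXAMPLE_DIVISIONS)
```

A real pipeline would more likely use a spatial index and a geometry library; the point here is only that the address coordinates plus division polygons are enough to populate the admin fields.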
Victor Schappert (vcschapp) left a comment
Nice to see progress on addresses.
- This PR reflects a change toward a more "structured" address as informed by OA, versus the previous design idea Paul Heersink (@pheersink) socialized that had more of a free-form text field. Do we want to increase structure like this? I thought the idea behind free-form was that it would be more flexible in the face of non-Western/less structured addressing schemes.
- Consider a pattern: ^(\S.*)?\S$ validation on properties to deny leading and trailing whitespace?
- Suggested some property name tweaks to align more closely with divisions.
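The effect of the suggested pattern can be checked with a short sketch. It uses Python's `re` module rather than a JSON Schema validator, so it is only an approximation of how the schema would behave; the helper name `has_clean_edges` is invented for the example. `re.fullmatch` is used because Python's `$` would otherwise also match just before a trailing newline.

```python
import re

# The pattern proposed in the review comment.
NO_EDGE_WHITESPACE = re.compile(r"^(\S.*)?\S$")


def has_clean_edges(value: str) -> bool:
    """True when value is non-empty and neither starts nor ends with whitespace."""
    return NO_EDGE_WHITESPACE.fullmatch(value) is not None


examples = ["Main St", " Main St", "Main St ", "a", ""]
results = {value: has_clean_edges(value) for value in examples}
```

Note that the pattern also rejects the empty string, so it overlaps with the `minLength: 1` constraint already present on these properties.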
    Division geometry MUST be a Point as defined by GeoJSON schema.
    It represents the approximate location of a position commonly
    associated with the real-world entity modeled by the division
    feature.
heh! Address geometry?
    city:
      description: The city/locality for the address
      type: string
      minLength: 1
Suggest locality to align with subtype=locality in the divisions schema.
    country_district:
      description: The region folder name in the Open Addresses distribution
      type: string
      minLength: 1
This one's a bit wacky
- I don't fully understand what it means.
- Is it a subdirectory within the OpenAddresses distribution? Do we want to marry the schema to OA to that extent?
Removed in latest PR
    - "$ref": ../defs.yaml#/$defs/propertyContainers/overtureFeaturePropertiesContainer
    properties: # JSON Schema: properties within GeoJSON top-level object 'properties' property
      country:
        description: The country code in the Open Addresses distribution
Does "in the OpenAddresses distribution" add value? Can we just say it's the ISO 3166-1 alpha-2 country code?
    district:
      description: The county for the address
      type: string
      minLength: 1
Suggest county to align with subtype=county within the divisions schema.
    unit:
      description: The suite/unit/apartment for the address row
      type: string
      minLength: 1
    \ No newline at end of file
Gar - missing NL at EOF!
Going back to Relationships in the Schema, one thing we said is that:
IMO the best thing to do is use

I'm reluctant to add any direct FK relationships between them because there are lots of subtleties around addresses that aren't easy to resolve, and that we probably shouldn't resolve. For example, the city name of an address might be one thing from a postal delivery perspective and another thing from an actual municipal existence perspective. I imagine Drew Breunig (@PreciselyDrew) could list 10 complexities. I think the combination of the address feature's coordinates and the
This is an initial schema for the addresses theme and is not ready to be merged. I'm hoping this sparks discussion with the schema group and the addresses folks.
Background
We recently decided to use the Open Addresses (OA) address output schema as a starting point for Overture addresses (https://github.com/openaddresses/openaddresses/blob/master/schema/layers/address_conform.json).
This Pull Request
This initial PR copies OA's by including:
Converting OA outputs into this schema is pretty straightforward. The schema also includes country and country_district, which are pulled from the OA path, e.g. us/ma/... This would give "us" as the country and "ma" as the country_district.
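A minimal sketch of pulling these two values from an OA path follows. The function name `oa_path_fields` is invented for the example, and returning `None` for the district when the path has no second directory is an assumption about how shorter paths might be handled, not something decided in this PR.

```python
from pathlib import PurePosixPath


def oa_path_fields(source_path: str) -> dict:
    """Derive country and country_district from an OA-style relative path.

    Assumes the layout <country>/<district>/<file>; a path with only
    <country>/<file> yields a district of None.
    """
    parts = PurePosixPath(source_path).parts
    if len(parts) < 2:
        raise ValueError(f"expected at least <country>/<file>, got {source_path!r}")
    district = parts[1] if len(parts) >= 3 else None
    return {"country": parts[0], "country_district": district}


fields = oa_path_fields("us/ma/city_of_boston-addresses-city.geojson")
```

For the path used in the description, `fields` holds "us" as the country and "ma" as the country_district.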
Todo / Questions to Answer
Note that city and region are blank. Should we replace all of these with "Plainfield" and "CT" here and in similar instances? Another option is to replace these in OA directly with less modification here. (We should follow up with OA).
address_ids field or similar.
References
Other address schemas for reference: