Historic San José city boundaries #28

Closed
1ec5 opened this issue Sep 14, 2022 · 8 comments
Comments

@1ec5
Member

1ec5 commented Sep 14, 2022

The City of San José maintains a public domain dataset of annexation areas. The evolution of San José’s boundary is especially notable for its expansion during the Dutch Hamann days and the slow process since then to fill in unincorporated “islands”. There is a well-known animation of the city’s boundary evolution on YouTube, but it only covers the period from 1900 to 2014 and doesn’t show the city in much detail.

We can turn the annexation areas from this dataset into a series of boundary relations representing the city limits from the time of one annexation to the next, and upload these boundary relations to OpenHistoricalMap. OHM’s coverage of the South Bay is currently quite sparse, but adding these boundaries could provide a structure within which more data can be added.

@1ec5
Member Author

1ec5 commented Sep 14, 2022

As with OHM’s import of historical county boundaries in OpenHistoricalMap/issues#418, we’ll need to end up with a minimal set of ways that are members of the relations. Even so, an influx of thousands of relations will be challenging to navigate. OpenHistoricalMap/issues#431 and OpenHistoricalMap/issues#430 would make these relations easier to tell apart in lists of relations by appending start_date and end_date to their names.

@1ec5
Member Author

1ec5 commented Sep 14, 2022

I’ve added Wikidata statements to suggest whether official_name:en should be “San Jose” or “San José” on each boundary. name:es should be “San José” in all cases. name:en is less clear. It should be “San José” for boundaries in the 19th century, but I’m unsure of the exact cutoff.

I’m also uncertain whether name:vi should be retroactively applied to the first boundary in 1850, which predates the modern practice of borrowing place names directly from English, or whether it should even be tagged on boundaries before around 1976 when Vietnamese began to be used locally.

@impiaaa
Collaborator

impiaaa commented Sep 24, 2022

I've uploaded the processing scripts I used here. The general process was something like:

  1. Load the OPN/OPN_OpenDataService/Annexation Areas layer into a new QGIS project
  2. Run the "Check validity" and "Fix geometries" processes
  3. Run the "Date dissolve" script linked above. The script:
    1. Makes a list of all unique values of the "ANNEXDATE" and "DISANNEXDATE" columns, resulting in a timeline of dates to check
    2. For each date in the timeline, searches for all rows where "ANNEXDATE" is on or before that date and "DISANNEXDATE" is on or after the next date in the timeline, or NULL (meaning the area is still annexed today); for the last date in the timeline, searches for all rows that are still annexed today
    3. Runs the "Dissolve" process on the search results
    4. In the resulting dissolved feature, replaces the values of "ANNEXDATE" and "DISANNEXDATE" with the current date and the next date in the timeline, as temporary storage
    5. Adds the dissolved feature to the output
  4. Export the new layer of dissolved features to a GeoJSON
  5. Run the "san jose annexations" script. This script:
    1. Loads the GeoJSON
    2. Makes a list of every line segment, and all features it belongs to
    3. Joins all line segments that share nodes and belong to all the same features
    4. Writes an OSM XML file with a relation for every feature, including all of the now-joined line segments as members (without roles), and using the temporary "ANNEXDATE" and "DISANNEXDATE" fields for the actual start_date and end_date
  6. Use the "uniq" command to remove duplicate members (the script shouldn't have added these, but it did)
  7. Open the OSM XML file in JOSM
  8. Search for remaining lines with only one connection at one end and join them (the script should have caught these, but it didn't)
  9. Simplify all lines (this step had to wait because the prior processes required exactly matching coordinates between features)
  10. Assign roles to all relation members, since the script didn't keep track of which lines made up inner rings and which made up outer rings. This step was done semi-automatically by (hackily) patching JOSM's "MultipolygonTest" validator to offer an automatic fix. That validator can't handle self-intersecting rings (i.e., figure-eight shapes, where two corners of one or two rings touch), so those cases had to be handled manually. The whole process also had to be done in many small steps, because memory leaks would slow down and eventually crash JOSM.
  11. Tweak tagging
  12. Fix end_dates (it should be one day earlier than the next date, not the same)
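Step 3 above (building a timeline of dates and selecting the annexation areas in effect for each period) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the actual "Date dissolve" script: rows are plain dicts rather than QGIS features, the "ID" key is a hypothetical stand-in for whatever identifies an annexation area, "ANNEXDATE" is assumed never to be NULL, and the geometric Dissolve (step 3.3) is left to QGIS.

```python
from datetime import date


def timeline(rows: list[dict]) -> list[date]:
    """Step 3.1: sorted unique ANNEXDATE/DISANNEXDATE values, ignoring NULLs."""
    dates = {r["ANNEXDATE"] for r in rows} | {r["DISANNEXDATE"] for r in rows}
    dates.discard(None)
    return sorted(dates)


def rows_in_effect(rows: list[dict], dates: list[date], i: int) -> list[dict]:
    """Step 3.2: rows that are part of the city from dates[i] until dates[i + 1]."""
    start = dates[i]
    if i + 1 == len(dates):
        # Last date in the timeline: every area that is still annexed today.
        return [r for r in rows if r["DISANNEXDATE"] is None]
    nxt = dates[i + 1]
    return [
        r
        for r in rows
        # Assumes ANNEXDATE is never NULL; a None here would raise TypeError.
        if r["ANNEXDATE"] <= start
        and (r["DISANNEXDATE"] is None or r["DISANNEXDATE"] >= nxt)
    ]


def date_dissolve_groups(rows: list[dict]):
    """Yield (start, next_date, ids) for each period; next_date is None for the last.

    Each group's geometries would then be merged with QGIS's Dissolve (step 3.3),
    which is not reproduced here.
    """
    dates = timeline(rows)
    for i, start in enumerate(dates):
        nxt = dates[i + 1] if i + 1 < len(dates) else None
        # "ID" is a hypothetical identifier column used only for illustration.
        yield start, nxt, [r["ID"] for r in rows_in_effect(rows, dates, i)]
```

The temporary start and next dates yielded here correspond to the values step 3.4 writes into "ANNEXDATE" and "DISANNEXDATE", which step 5 later turns into start_date and end_date.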

@1ec5
Member Author

1ec5 commented Sep 24, 2022

> Fix end_dates (it should be one day earlier than the next date, not the same)

Is this due to inaccuracies in the source data? I’ve been assuming OHM allows two features’ date ranges to “touch”, for example in OpenHistoricalMap/iD#137, because OHM doesn’t yet support precision finer than a day.

@impiaaa
Collaborator

impiaaa commented Sep 24, 2022

The "san jose annexations" script that I wrote makes a list of all relevant dates, then creates a boundary relation for each item in that list, with start_date set equal to that one item in the date list, and end_date set equal to the next item in the date list. This results in, for example, one relation ending on 2019-06-25 and the next relation starting on 2019-06-25. I haven't found confirmation, but I assume that end_date includes the last day the feature existed; that means that both of these two boundaries existed on 2019-06-25, which is incorrect. I expect the relations that the script produces to be temporally exclusive, so I see this as a bug, but I fixed the data after running the script, rather than fixing the bug and repeating steps 5 through 11.
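The difference between the script's original output and the step 12 fix can be sketched as follows. This is an illustration, not the "san jose annexations" script: the function name `periods` and its `inclusive_end` flag are hypothetical, and only dates are handled, not geometries. It assumes, as above, that end_date is inclusive of the last day a boundary existed.

```python
from datetime import date, timedelta
from typing import Optional


def periods(dates: list[date], inclusive_end: bool = True) -> list[tuple[date, Optional[date]]]:
    """Pair each timeline date with the next one as (start_date, end_date).

    The last period is open-ended (end_date None). With inclusive_end=True,
    end_date is the day before the next start_date (the step 12 fix), so
    consecutive relations never share a day. With inclusive_end=False,
    end_date equals the next start_date, as the script originally produced.
    """
    dates = sorted(set(dates))
    out = []
    for start, nxt in zip(dates, dates[1:] + [None]):
        if nxt is None:
            end = None
        elif inclusive_end:
            end = nxt - timedelta(days=1)
        else:
            end = nxt
        out.append((start, end))
    return out
```

For example, with timeline dates of 2019-06-01, 2019-06-25, and 2019-06-26 (the first and last are made up for illustration), `inclusive_end=False` yields a relation ending on 2019-06-25 and another starting that same day, while `inclusive_end=True` ends the first on 2019-06-24. It also yields a one-day relation running from 2019-06-25 to 2019-06-25, which is what happens whenever two annexations fall on consecutive days.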

@1ec5
Copy link
Member Author

1ec5 commented Sep 25, 2022

If you’re satisfied that the data is correct, then I’m satisfied. I looked at some city council annexation resolutions and saw that they don’t have language explicitly saying when they take effect, so each one would take effect on the day it’s adopted. Making the previous version of the boundary end on the previous day is like saying that the annexation took effect at midnight before the vote. I’m OK with that, since getting hour-level precision would require detailed research on a case-by-case basis. If it ever matters, we can deal with it manually.

By the way, if city council has ever annexed a property one day and another property the following day, step 12 would cause a boundary relation to have the same start_date and end_date, so it isn’t possible to prevent dates from overlapping on paper.

@impiaaa
Collaborator

impiaaa commented Sep 28, 2022

I've finished uploading the data.

I ran into issues uploading relations. It turns out that relations as large as the ones in this data set each take the server around a minute to process, and processing must finish before the server returns confirmation to the uploading client. The OHM server's timeout is 5 minutes, so a request containing more than 2 or 3 relations can time out. The uploading client (in my case JOSM) then never receives confirmation and assumes the uploaded objects were not processed, which leaves it in an inconsistent state and produces duplicate objects in subsequent uploads. My workaround has been to upload all nodes and ways in large batches first, then set JOSM to upload the relations one per request.
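The upload workaround amounts to a particular batching order, sketched below. This is hypothetical and does not talk to JOSM or the OHM API: objects are bare `(type, id)` pairs, and the `chunks`/`plan_uploads` names and the default `batch_size` are illustrative choices, not JOSM settings.

```python
from itertools import islice


def chunks(items, size):
    """Split a sequence into consecutive batches of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def plan_uploads(objects, batch_size=1000):
    """Plan requests: nodes and ways in large batches, then one relation each.

    Sending each relation in its own request keeps any single request from
    holding several slow-to-process relations that could hit the timeout.
    """
    objects = list(objects)
    nodes = [o for o in objects if o[0] == "node"]
    ways = [o for o in objects if o[0] == "way"]
    relations = [o for o in objects if o[0] == "relation"]
    # Nodes are listed before ways, so each way goes out in the same batch as
    # its new nodes or a later one, even if a batch boundary falls in between.
    plan = list(chunks(nodes + ways, batch_size))
    plan.extend([r] for r in relations)
    return plan
```

Keeping nodes and ways in large batches preserves upload throughput for the cheap objects, while isolating the relations trades more requests for a much lower chance of a timeout leaving the client out of sync.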

The number and size of the relations also means area downloads are much slower—downloading an area encompassing all of San José in one request is no longer possible. Hopefully this will either not harm casual editing, or get better over time.

@1ec5
Member Author

1ec5 commented May 2, 2024

> By the way, if city council has ever annexed a property one day and another property the following day, step 12 would cause a boundary relation to have the same start_date and end_date, so it isn’t possible to prevent dates from overlapping on paper.

This import has turned out to be a major source of inconsistency in OHM’s usage of end_date.


2 participants