Add two connector scripts: ArcGIS Feature Layer (for Survey123 or other), and GeoJSON to Postgres
#81
Oh, I should add that I briefly considered creating a Flow to connect the two scripts, instead of importing one script directly into the other. Open to other thoughts.
This script uses the [ArcGIS REST API Query Feature Service / Layer](https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service-layer/) endpoint.
Note: we have opted not to use the [ArcGIS API for Python](https://developers.arcgis.com/python/latest/) library because it requires installing `libkrb5-dev` as a system-level dependency. Workers in Windmill can [preinstall binaries](https://www.windmill.dev/docs/advanced/preinstall_binaries), but that requires modifying the Windmill `docker-compose.yml`, which is too heavy-handed an approach for this simple fetch script.
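For reviewers less familiar with the ArcGIS REST API, here is a minimal sketch of the two requests involved: exchanging a username and password for a token, then querying the layer for GeoJSON. It assumes ArcGIS Online's `generateToken` endpoint and the standard `where`, `outFields`, `returnGeometry`, `f`, and `token` query parameters; the helper names `token_request_params` and `query_url` are hypothetical and not taken from `arcgis_feature_layer.py`. It only builds the requests, so it needs neither network access nor the ArcGIS API for Python.

```python
from urllib.parse import urlencode

# Assumption: ArcGIS Online; an ArcGIS Enterprise portal exposes the same
# operation at <portal>/sharing/rest/generateToken.
TOKEN_URL = "https://www.arcgis.com/sharing/rest/generateToken"


def token_request_params(username: str, password: str) -> dict[str, str]:
    """Form fields for exchanging account credentials for a short-lived token."""
    return {
        "username": username,
        "password": password,
        "client": "referer",                  # tie the token to a referer...
        "referer": "https://www.arcgis.com",  # ...whose value is an assumption
        "f": "json",                          # response is JSON containing "token"
    }


def query_url(layer_url: str, token: str) -> str:
    """Query (Feature Service/Layer) URL that returns every feature as GeoJSON."""
    params = {
        "where": "1=1",            # no filter: match all features
        "outFields": "*",          # include every attribute field
        "returnGeometry": "true",
        "f": "geojson",            # serialize the result as a FeatureCollection
        "token": token,
    }
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"
```

With `requests`, the token fields could be sent via `requests.post(TOKEN_URL, data=token_request_params(...))`, reading `token` from the JSON response, and the query URL fetched with `requests.get(...)`.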
👍
f/connectors/geojson/README.md
@@ -0,0 +1,3 @@
# `geojson_to_postgres`: Upload a GeoJSON file to the data warehouse
This script imports data from a GeoJSON file into a database table. It reads a file containing spatial data, transforms the data into a structured format, and inserts it into a PostgreSQL database table. Optionally, it then deletes the GeoJSON file.
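A minimal sketch of that transformation, assuming each property becomes its own column and the geometry is split into `g__type` and `g__coordinates` (no PostGIS types). `flatten_feature` and `load_rows` are hypothetical names; the real script may, for example, serialize coordinates as a JSON string before inserting.

```python
import json
import os
from typing import Any


def flatten_feature(feature: dict[str, Any]) -> dict[str, Any]:
    """One GeoJSON Feature -> one flat row: a key per property plus
    g__type / g__coordinates for the geometry (no PostGIS types)."""
    row = dict(feature.get("properties") or {})
    geometry = feature.get("geometry") or {}  # RFC 7946 allows a null geometry
    row["g__type"] = geometry.get("type")
    row["g__coordinates"] = geometry.get("coordinates")
    return row


def load_rows(path: str, delete_after: bool = False) -> list[dict[str, Any]]:
    """Read a FeatureCollection from disk, flatten it, optionally delete the file."""
    with open(path, encoding="utf-8") as fh:
        collection = json.load(fh)
    rows = [flatten_feature(f) for f in collection.get("features", [])]
    if delete_after:
        os.remove(path)  # mirrors the optional "Delete GeoJSON?" flag
    return rows
```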
I would highlight, front and center, some unique characteristics of the GeoJSON -> Postgres script:
- It flattens properties into their own columns
- It's not using PostGIS
This is far from the only (or even the most intuitive?) way to store GeoJSON in the database. Other ways might be:
- Just create a single JSONB column and throw the entire blob in there.
- Install the PostGIS extension, and then you can:
  - represent the "geometry" more faithfully.
  - just use ogr2ogr to inject, or I wouldn't be surprised if PostGIS has other functions to help flatten JSON.
I'm not suggesting you switch to either of these -- just make clear what it is that we are doing.
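To make the contrast in this review concrete, here is a hypothetical sketch of the row shape under the single-JSONB-column alternative, with comments describing the flattened layout this PR uses. The table and column names are illustrative only.

```python
import json
from typing import Any

# Flattened layout (this PR): one column per property, plus g__type and
# g__coordinates for the geometry; reading a property is a plain column lookup.
#
# Single-JSONB alternative (hypothetical DDL):
#   CREATE TABLE features (id text PRIMARY KEY, feature jsonb NOT NULL);
# Reading a property then goes through JSONB operators, e.g.
#   SELECT feature -> 'properties' ->> 'name' FROM features;


def to_jsonb_row(feature: dict[str, Any]) -> dict[str, Any]:
    """Keep the whole Feature, geometry included, as one serialized JSON value."""
    return {"id": feature.get("id"), "feature": json.dumps(feature)}
```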
Also document that it looks for a unique "id" as the primary key. And where is that: inside "properties" or outside?
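RFC 7946 places a Feature's optional identifier in a top-level `id` member, not inside `properties`, though some producers put it in `properties` instead. A hypothetical sketch of a lookup that handles both cases and enforces uniqueness before the value is used as a primary key; the order in which the real script checks these locations, and whether it checks both, is not stated in this PR.

```python
from typing import Any, Iterable


def feature_id(feature: dict[str, Any]) -> str:
    """Hypothetical lookup: RFC 7946 top-level "id" first, then properties["id"].
    Numeric ids are converted to str (an assumption about the column type)."""
    if feature.get("id") is not None:
        return str(feature["id"])
    props = feature.get("properties") or {}
    if props.get("id") is not None:
        return str(props["id"])
    raise ValueError("feature has no 'id' to use as a primary key")


def unique_ids(features: Iterable[dict[str, Any]]) -> list[str]:
    """Collect ids in order, failing fast on duplicates (a primary-key clash)."""
    seen: set[str] = set()
    ids = []
    for feature in features:
        fid = feature_id(feature)
        if fid in seen:
            raise ValueError(f"duplicate id: {fid!r}")
        seen.add(fid)
        ids.append(fid)
    return ids
```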
@pytest.fixture
def arcgis_server(mocked_responses):
nice
import requests

from f.common_logic.db_connection import postgresql
from f.connectors.geojson.geojson_to_postgres import main as save_geojson_to_postgres
You asked about calling another script from this script. I see no problem with it. Let's try it out and see how it goes.
Your future extension ideas sound good too.
Goal
Closes #30.
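This PR introduces two connector scripts: an ArcGIS Feature Layer connector (for Survey123 or other sources) and a GeoJSON to Postgres connector.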
Why both scripts?
ArcGIS Feature Layer content can be requested directly as GeoJSON, so rather than building a specialized ingestion script, I created a reusable "write GeoJSON to the data warehouse" script. The ArcGIS script leverages this to handle database writes.
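A minimal sketch of that hand-off, assuming the ArcGIS script saves the fetched GeoJSON to disk and then passes the file path to the imported writer. `save_then_ingest` is hypothetical, and the writer is injected as a plain callable because the real `main` signature of `geojson_to_postgres` is not shown in this PR.

```python
import json
import os
from typing import Any, Callable


def save_then_ingest(
    geojson: dict[str, Any],
    output_dir: str,
    filename: str,
    ingest: Callable[[str], object],
) -> str:
    """Write fetched GeoJSON to disk, then hand the path to a writer such as
    geojson_to_postgres (hypothetical wiring; the real signature may differ)."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(geojson, fh)
    ingest(path)  # the database write lives entirely in the reusable script
    return path
```

Keeping the writer injectable also makes the ArcGIS side testable without a database, since a test can pass a stub that records the path.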
What I changed
- `c_arcgis_account` resource type composed of a username and password (which is what the ArcGIS API requires to exchange for a token to make requests).
- `arcgis_feature_layer.py` script that requests an access token, feature layer content, and attachments from the ArcGIS API. The structure of this script is broadly similar to that of the other connector scripts, aside from its use of the specific API endpoints.
- `geojson_to_postgres.py` script that takes a GeoJSON file path, applies "`geometry` --> `g__type` & `g__coordinates`" transformations, and writes to the database, with an optional "Delete GeoJSON?" boolean and a function to delete the file from disk if true. (Similar to the Locus Map script.)

Future extensions
(1) Much of the data we ingest (e.g., from CoMapeo or Locus Map) already follows a GeoJSON-esque schema, where `g__type` and `g__coordinates` store the spatial data that would be in a feature's `geometry` object, and the rest of the columns are its properties. A similar approach could work for those scripts: use `geojson_to_postgres` to handle database writes. This would minimize redundant DB write logic and, if we ever adopt PostGIS, would allow us to update the implementation in one place.
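Under extension (1), those scripts would need to turn their flat rows back into GeoJSON before handing off to `geojson_to_postgres`. A hypothetical sketch of that inverse mapping, assuming `g__coordinates` may arrive either as a list or as a JSON-encoded string:

```python
import json
from typing import Any

GEOMETRY_COLUMNS = ("g__type", "g__coordinates")


def row_to_feature(row: dict[str, Any]) -> dict[str, Any]:
    """Rebuild a GeoJSON Feature from a flat row: the g__ columns become the
    geometry and every other column becomes a property."""
    coords = row["g__coordinates"]
    if isinstance(coords, str):  # assumption: some sources store it JSON-encoded
        coords = json.loads(coords)
    properties = {k: v for k, v in row.items() if k not in GEOMETRY_COLUMNS}
    return {
        "type": "Feature",
        "geometry": {"type": row["g__type"], "coordinates": coords},
        "properties": properties,
    }


def rows_to_collection(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap converted rows in a FeatureCollection."""
    return {"type": "FeatureCollection", "features": [row_to_feature(r) for r in rows]}
```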
(2) Additionally, we might see fit to rewrite the `gc_uploader` app to accept a GeoJSON file, a Locus Map export, or other formats in the future. The app would detect the file's data schema and apply any necessary transformations to ensure GeoJSON compliance before passing it to the `geojson_to_postgres` script. The broader goal here is to move toward an "Upload Anything" UI, similar to Felt, where users can upload different geospatial data formats seamlessly.

If we agree on the above two possible extensions, I will create issues to track the proposed refactors.
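One possible shape for the detection step in the `gc_uploader` idea under (2) is a registry of converters keyed by file extension, each returning a GeoJSON FeatureCollection. This is a hypothetical sketch: only a GeoJSON pass-through is registered, and a Locus Map converter would be added the same way once its export format is pinned down. A real implementation might sniff file contents rather than trust the extension.

```python
import json
import os
from typing import Any, Callable

Converter = Callable[[bytes], dict[str, Any]]
CONVERTERS: dict[str, Converter] = {}


def register(extension: str) -> Callable[[Converter], Converter]:
    """Decorator that registers a converter for one lower-case file extension."""
    def decorator(fn: Converter) -> Converter:
        CONVERTERS[extension] = fn
        return fn
    return decorator


@register(".geojson")
def from_geojson(data: bytes) -> dict[str, Any]:
    """GeoJSON needs no transformation, only a sanity check."""
    collection = json.loads(data)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return collection

# A Locus Map (or other) converter would be registered the same way:
# @register(".<ext>") on a function that returns a FeatureCollection.


def to_geojson(filename: str, data: bytes) -> dict[str, Any]:
    """Pick a converter by file extension and return a FeatureCollection."""
    extension = os.path.splitext(filename)[1].lower()
    converter = CONVERTERS.get(extension)
    if converter is None:
        raise ValueError(f"no converter registered for {extension!r}")
    return converter(data)
```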