rudokemper
Member

Goal

Closes #30.

This PR introduces:

  1. A script to download the contents of an ArcGIS Feature Layer, including attachments, using the ArcGIS REST API. (Survey123 results are stored as a Feature Layer, with the same schema as a generic Feature Layer.)
  2. A script to write a GeoJSON file to Postgres.

Why both scripts?

ArcGIS Feature Layer content can be requested directly as GeoJSON, so rather than building a specialized ingestion script, I created a reusable "write GeoJSON to the data warehouse" script. The ArcGIS script leverages this to handle database writes.
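As a rough illustration of requesting layer content as GeoJSON: the Query endpoint accepts an `f` parameter, and `f=geojson` asks for a GeoJSON FeatureCollection. The sketch below only builds the request. The helper name `build_query_request`, the example layer URL, and the choice of `where=1=1` and `outFields=*` are assumptions for illustration, not necessarily what arcgis_feature_layer.py does.

```python
def build_query_request(layer_url: str, token: str) -> tuple[str, dict]:
    """Return the URL and query parameters for a 'fetch everything as GeoJSON' call.

    Hypothetical helper; the parameter choices below are assumptions.
    """
    url = f"{layer_url.rstrip('/')}/query"
    params = {
        "where": "1=1",    # match every feature in the layer
        "outFields": "*",  # return all attribute fields
        "f": "geojson",    # ask for GeoJSON rather than Esri JSON
        "token": token,    # token obtained from the username/password exchange
    }
    return url, params


url, params = build_query_request(
    "https://services.example.com/arcgis/rest/services/demo/FeatureServer/0/",
    token="example-token",
)
```

The resulting pair could then be passed to `requests.get(url, params=params)`.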

What I changed

  • Created a c_arcgis_account resource type composed of a username and password (which the ArcGIS API requires in exchange for a token to make requests).
  • Created an arcgis_feature_layer.py script that requests an access token, feature layer content, and attachments from the ArcGIS API. The structure of this script is broadly similar to that of the other connector scripts, apart from the ArcGIS-specific API endpoints.
  • Created a geojson_to_postgres.py script that takes a GeoJSON file path, applies "geometry() --> g__type & g__coordinates" transformations, and writes to the database, with an optional "Delete GeoJSON?" boolean and a function to delete the file from disk if true. (Similar to the Locus Map script.)
  • Added e2e tests for both scripts. These are also in line with how we do tests for other connector scripts. For the ArcGIS script, mocked server responses are provided that match the schema of what the ArcGIS REST API returns. (I tested this with a real ArcGIS account.)
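To make the "geometry() --> g__type & g__coordinates" transformation concrete, here is a minimal sketch of flattening a single GeoJSON feature into a row-like dict. The function name `flatten_feature` is hypothetical, and keeping the coordinates as a plain list is an assumption; the actual script may name things differently or serialize values before writing them.

```python
def flatten_feature(feature: dict) -> dict:
    """Flatten one GeoJSON Feature into a single row-like dict.

    Hypothetical helper: properties become top-level keys, and the geometry
    is split into g__type and g__coordinates.
    """
    geometry = feature.get("geometry") or {}  # geometry may be null in GeoJSON
    row = dict(feature.get("properties") or {})  # copy so the input is not mutated
    row["g__type"] = geometry.get("type")
    row["g__coordinates"] = geometry.get("coordinates")
    return row


feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-73.9, 40.7]},
    "properties": {"name": "Site A", "visited": True},
}
row = flatten_feature(feature)
```

Each key of `row` would then correspond to one column in the target table.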

Future extensions

(1) Much of the data we ingest (e.g., from CoMapeo or Locus Map) follows a GeoJSON-esque schema anyway—where g__type and g__coordinates store the spatial data that would be in a feature's geometry object, and the rest of the columns are its properties. A similar approach could work for those scripts:

  • Download attachments
  • Apply necessary transformations for GeoJSON compliance
  • Use geojson_to_postgres to handle database writes

This would minimize redundant DB write logic and, if we ever adopt PostGIS, would allow us to update the implementation in one place.

(2) Additionally, we might see fit to rewrite the gc_uploader app to accept either a GeoJSON file, a Locus Map export, or other formats in the future. The app would detect the file's data schema and apply any necessary transformations to ensure GeoJSON compliance before passing it to the geojson_to_postgres script. The broader goal here is to move toward an "Upload Anything" UI—similar to Felt—where users can upload different geospatial data formats seamlessly.

If we agree on the above two possible extensions, I will create issues to track the proposed refactors.

@rudokemper rudokemper requested a review from IamJeffG March 4, 2025 21:22
@rudokemper
Member Author

Oh, I will add that I briefly considered creating a Flow to connect the two scripts, instead of directly importing geojson_to_postgres as a module. However, I opted not to do this in order to keep the ArcGIS script as a standalone script, much like the other connector scripts. I also didn't see any downsides to importing from another script.

Open to other thoughts.


This script uses the [ArcGIS REST API Query Feature Service / Layer](https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service-layer/) endpoint.

Note: we have opted not to use the [ArcGIS API for Python](https://developers.arcgis.com/python/latest/) library because it requires installing `libkrb5-dev` as a system-level dependency. Workers in Windmill can [preinstall binaries](https://www.windmill.dev/docs/advanced/preinstall_binaries), but it requires modifying the Windmill `docker-compose.yml`, which is too heavy-handed an approach for this simple fetch script.
Contributor

👍

@@ -0,0 +1,3 @@
# `geojson_to_postgres`: Upload a GeoJSON file to the data warehouse

This script imports data from a GeoJSON file into a database table. It reads a file containing spatial data, transforms the data into a structured format, and inserts it into a PostgreSQL database table. Optionally, it then deletes the export file.
Contributor

I would highlight front-and-center some unique characteristics of the GeoJSON -> pg approach:

  1. It flattens properties into their own columns
  2. It's not using PostGIS

This is far from the only (or even most intuitive?) way to store GeoJSON in the database. For example, other ways might be:

  1. just create a single JSONB column and throw the entire blob in there.
  2. Install PostGIS extension, and then you can
    • represent the "geometry" more faithfully.
    • just use ogr2ogr to inject, or I wouldn't be surprised if PostGIS has other functions to help flatten json.

I'm not suggesting you switch to either of these -- just make clear what it is that we are doing.

Contributor

Also document that it looks for a unique "id" as primary key. And where is that, inside "properties" or outside?
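For reference, the GeoJSON specification (RFC 7946) places a feature's optional `id` as a top-level member of the Feature object, alongside `geometry` and `properties`, though some producers put an identifier inside `properties` instead. This thread does not say which location the script reads. A hypothetical lookup that checks both, with the top-level member taking precedence (an assumption), might look like:

```python
def resolve_feature_id(feature: dict):
    """Return the feature's identifier, or None if neither location has one.

    Hypothetical helper: checks the top-level "id" first, then properties["id"].
    """
    if feature.get("id") is not None:
        return feature["id"]
    return (feature.get("properties") or {}).get("id")


top_level = {"type": "Feature", "id": "abc", "geometry": None, "properties": {"id": "ignored"}}
nested = {"type": "Feature", "geometry": None, "properties": {"id": 42}}
```

Whichever location is actually used, stating it in the README would answer the question above.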



@pytest.fixture
def arcgis_server(mocked_responses):
Contributor

nice

import requests

from f.common_logic.db_connection import postgresql
from f.connectors.geojson.geojson_to_postgres import main as save_geojson_to_postgres
Contributor

You asked about calling another script from this script. I see no problem with it. Let's try it out and see how it goes.

@IamJeffG
Contributor

your future extension ideas sound good too

@rudokemper rudokemper merged commit 4fc0f28 into main Apr 11, 2025
1 check passed
@rudokemper rudokemper deleted the arcgis-script branch April 11, 2025 17:58
Linked issue: Add script to download Survey123 responses