4 changes: 2 additions & 2 deletions README.md
@@ -11,14 +11,14 @@ secrets managements, etc).

Some of the tools available in the Guardian Connector Scripts Hub are:

* Connector scripts to ingest data from data collection tools such as KoboToolbox, ODK, CoMapeo, and Locus Map,
* Connector scripts to ingest data from data collection tools such as KoboToolbox, ODK, CoMapeo, ArcGIS, and Locus Map,
and store this data (tabular and media attachments) in a data lake.
* A flow to download and store GeoJSON and GeoTIFF change detection alerts, post these to a CoMapeo Archive Server
API, and send a message to WhatsApp recipients via Twilio.
* Scripts to export data from a database into a specific format (e.g., GeoJSON).

![Available scripts, flows, and apps in gc-scripts-hub](gc-scripts-hub.jpg)
_A Windmill Workspace populated with the tools in this repository._
_A Windmill Workspace populated with some of the tools in this repository._

## Deploying the code to a Windmill workspace

25 changes: 25 additions & 0 deletions c_arcgis_account.resource-type.json
@@ -0,0 +1,25 @@
{
"type": "object",
"order": [
"username",
"password"
],
"$schema": "https://json-schema.org/draft/2020-12/schema",
"required": [
"server_url"
],
"properties": {
"username": {
"type": "string",
"default": "",
"nullable": false,
"description": "The username of your ArcGIS account"
},
"password": {
"type": "string",
"default": "",
"nullable": false,
"description": "The password of your ArcGIS account"
}
}
}
13 changes: 13 additions & 0 deletions f/connectors/arcgis/README.md
@@ -0,0 +1,13 @@
# `arcgis_feature_layer`: Download Feature Layer from ArcGIS REST API

This script fetches the contents of an ArcGIS feature layer and stores it in a PostgreSQL database. Additionally, it downloads any attachments (e.g. from Survey123) and saves them to a specified directory.

Using this script requires an ArcGIS account, which is needed to generate an authentication token.

The feature layer URL can be found on the item details page of your layer on ArcGIS Online:

![Screenshot of a feature layer item page](arcgis.jpg)

This script uses the [ArcGIS REST API Query Feature Service / Layer](https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service-layer/) endpoint.
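
For illustration, here is a minimal sketch of the two calls the script makes against this endpoint: generate a token with the account credentials, then request the layer contents as GeoJSON. The `fetch_features` helper and its signature are illustrative only; see `arcgis_feature_layer.py` in this PR for the full logic, including error handling and attachment downloads.

```python
# Minimal sketch, assuming an ArcGIS Online account. Error handling and
# attachment downloads are omitted; this is not the full connector.
import requests


def fetch_features(username: str, password: str, feature_layer_url: str, referer: str) -> list:
    # Generate a short-lived token tied to the given referer URL.
    token = requests.post(
        "https://www.arcgis.com/sharing/rest/generateToken",
        data={
            "username": username,
            "password": password,
            "client": "referer",
            "referer": referer,
            "f": "json",
        },
    ).json().get("token")

    # Query all features of layer 0, returned as GeoJSON.
    response = requests.get(
        f"{feature_layer_url}/0/query",
        params={
            "where": "1=1",  # all features
            "outFields": "*",  # all fields
            "returnGeometry": "true",
            "f": "geojson",
            "token": token,
        },
    )
    response.raise_for_status()
    return response.json().get("features", [])
```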

Note: we have opted not to use the [ArcGIS API for Python](https://developers.arcgis.com/python/latest/) library because it requires installing `libkrb5-dev` as a system-level dependency. Workers in Windmill can [preinstall binaries](https://www.windmill.dev/docs/advanced/preinstall_binaries), but it requires modifying the Windmill `docker-compose.yml`, which is too heavy-handed an approach for this simple fetch script.
Contributor comment: 👍

Binary file added f/connectors/arcgis/arcgis.jpg
274 changes: 274 additions & 0 deletions f/connectors/arcgis/arcgis_feature_layer.py
@@ -0,0 +1,274 @@
# requirements:
# psycopg2-binary
# requests~=2.32

import json
import logging
import os
from pathlib import Path

import requests

from f.common_logic.db_connection import postgresql
from f.connectors.geojson.geojson_to_postgres import main as save_geojson_to_postgres
Contributor comment: You asked about calling another script from this script. I see no problem with it. Let's try it out and see how it goes.


# type names that refer to Windmill Resources
c_arcgis_account = dict

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def main(
arcgis_account: c_arcgis_account,
feature_layer_url: str,
db: postgresql,
db_table_name: str,
attachment_root: str = "/persistent-storage/datalake",
):
storage_path = Path(attachment_root) / db_table_name

arcgis_token = get_arcgis_token(arcgis_account)

features = get_features_from_arcgis(feature_layer_url, arcgis_token)

features_with_attachments = download_feature_attachments(
features, feature_layer_url, arcgis_token, storage_path
)

features_with_global_ids = set_global_id(features_with_attachments)

save_geojson_file_to_disk(features_with_global_ids, storage_path)

# At this point, the ArcGIS data is GeoJSON-compliant, and we don't need anything
# from the REST API anymore. The data can therefore be handled further using the
# existing GeoJSON connector.
save_geojson_to_postgres(
db,
db_table_name,
str(storage_path / "data.geojson"),
storage_path,
False, # do not delete the GeoJSON file after its contents are written to the database;
# users might want direct access to the GeoJSON file, in addition to the data
# in the database.
)


def get_arcgis_token(arcgis_account: c_arcgis_account):
"""
Generate an ArcGIS token using the provided account credentials.

Parameters
----------
arcgis_account : dict
A dictionary containing the ArcGIS account credentials with keys "username" and "password".

Returns
-------
str
The generated ArcGIS token.
"""
arcgis_username = arcgis_account["username"]
arcgis_password = arcgis_account["password"]

# According to the ArcGIS REST API documentation, you can set `client` to `requestip`
# to generate a token based on the IP address of the request. However, this does not
# seem to work well, neither in local development nor in production. Therefore, we use
# `referer` as the client type, and use the base URL of the Windmill app as the referer.
# https://developers.arcgis.com/rest/services-reference/enterprise/generate-token/
token_response = requests.post(
"https://www.arcgis.com/sharing/rest/generateToken",
data={
"username": arcgis_username,
"password": arcgis_password,
"client": "referer",
"referer": os.environ.get("WM_BASE_URL"),
"f": "json",
},
)

arcgis_token = token_response.json().get("token")

return arcgis_token


def get_features_from_arcgis(feature_layer_url: str, arcgis_token: str):
"""
Fetch features from an ArcGIS feature layer using the provided token.

Parameters
----------
feature_layer_url : str
The URL of the ArcGIS feature layer.
arcgis_token : str
The ArcGIS token for authentication.

Returns
-------
list
A list of features retrieved from the ArcGIS feature layer.
"""
response = requests.get(
f"{feature_layer_url}/0/query",
params={
"where": "1=1", # get all features
"outFields": "*", # get all fields
"returnGeometry": "true",
"f": "geojson",
"token": arcgis_token,
},
)

if (
response.status_code != 200 or "error" in response.json()
): # ArcGIS sometimes returns 200 with an error message e.g. if a token is invalid
try:
error_message = (
response.json().get("error", {}).get("message", "Unknown error")
)
except (KeyError, ValueError):
error_message = "Unknown error"
raise ValueError(f"Error fetching features: {error_message}")

features = response.json().get("features", [])

logger.info(f"{len(features)} features fetched from the ArcGIS feature layer")
return features


def download_feature_attachments(
features: list, feature_layer_url: str, arcgis_token: str, storage_path: str
):
"""
Download attachments for each feature and save them to the specified directory.

Parameters
----------
features : list
A list of features for which attachments need to be downloaded.
feature_layer_url : str
The URL of the ArcGIS feature layer.
arcgis_token : str
The ArcGIS token for authentication.
storage_path : str
The directory where attachments should be saved.

Returns
-------
list
The list of features with updated properties including attachment information.
"""
total_downloaded_attachments = 0
skipped_attachments = 0

for feature in features:
object_id = feature["properties"]["objectid"]

attachments_response = requests.get(
f"{feature_layer_url}/0/{object_id}/attachments",
params={"f": "json", "token": arcgis_token},
)

attachments_response.raise_for_status()

attachments = attachments_response.json().get("attachmentInfos", [])

if not attachments:
logger.info(f"No attachments found for object_id {object_id}")
continue

for attachment in attachments:
attachment_id = attachment["id"]
attachment_name = attachment["name"]
attachment_content_type = attachment["contentType"]
attachment_keywords = attachment["keywords"]

feature["properties"][f"{attachment_keywords}_filename"] = attachment_name
feature["properties"][f"{attachment_keywords}_content_type"] = (
attachment_content_type
)

attachment_path = Path(storage_path) / "attachments" / attachment_name

if attachment_path.exists():
logger.debug(
f"File already exists, skipping download: {attachment_path}"
)
skipped_attachments += 1
continue

attachment_response = requests.get(
f"{feature_layer_url}/0/{object_id}/attachments/{attachment_id}",
params={"f": "json", "token": arcgis_token},
)

attachment_response.raise_for_status()

attachment_data = attachment_response.content

attachment_path.parent.mkdir(parents=True, exist_ok=True)

with open(attachment_path, "wb") as f:
f.write(attachment_data)

logger.info(
f"Downloaded attachment {attachment_name} (content type: {attachment_content_type})"
)

total_downloaded_attachments += 1

logger.info(f"Total downloaded attachments: {total_downloaded_attachments}")
logger.info(f"Total skipped attachments: {skipped_attachments}")
return features


def set_global_id(features: list):
"""
Set the feature ID of each feature to its global ID (which is a UUID).
ArcGIS uses global IDs to uniquely identify features, but the
feature ID is set to the object ID by default (which is an integer
incremented by 1 for each feature). UUIDs are more reliable for
uniquely identifying features, and using them instead is consistent
with how we store other data in the data warehouse.
https://support.esri.com/en-us/gis-dictionary/globalid

Parameters
----------
features : list
A list of features to update.

Returns
-------
list
The list of features with updated feature IDs.
"""
for feature in features:
feature["id"] = feature["properties"]["globalid"]

return features


def save_geojson_file_to_disk(
features: list,
storage_path: str,
):
"""
Save the GeoJSON file to disk.

Parameters
----------
features : list
A list of features to save.
storage_path : str
The directory where the GeoJSON file should be saved.
"""
geojson = {"type": "FeatureCollection", "features": features}

geojson_filename = Path(storage_path) / "data.geojson"

geojson_filename.parent.mkdir(parents=True, exist_ok=True)

with open(geojson_filename, "w") as f:
json.dump(geojson, f)

logger.info(f"GeoJSON file saved to: {geojson_filename}")
6 changes: 6 additions & 0 deletions f/connectors/arcgis/arcgis_feature_layer.script.lock
@@ -0,0 +1,6 @@
certifi==2024.12.14
charset-normalizer==3.4.1
idna==3.10
requests==2.32.3
urllib3==2.3.0
psycopg2-binary==2.9.10
50 changes: 50 additions & 0 deletions f/connectors/arcgis/arcgis_feature_layer.script.yaml
@@ -0,0 +1,50 @@
summary: 'ArcGIS: Download Feature Layer'
description: This script fetches the contents of an ArcGIS feature layer and stores it in a PostgreSQL database.
lock: '!inline f/connectors/arcgis/arcgis_feature_layer.script.lock'
concurrency_time_window_s: 0
kind: script
schema:
$schema: 'https://json-schema.org/draft/2020-12/schema'
type: object
order:
- arcgis_account
- feature_layer_url
- db
- db_table_name
- attachment_root
properties:
arcgis_account:
type: object
description: The ArcGIS account to use for fetching the feature layer.
default: null
format: resource-c_arcgis_account
originalType: string
attachment_root:
type: string
description: >-
A path where ArcGIS attachments (e.g., from Survey123) will be stored. Attachment
files like photo and audio will be stored in the following directory schema:
`{attachment_root}/{db_table_name}/attachments/...`
default: /persistent-storage/datalake
originalType: string
db:
type: object
description: A database connection for storing tabular data.
default: null
format: resource-postgresql
db_table_name:
type: string
description: The name of the database table where the data will be stored.
default: null
originalType: string
pattern: '^.{1,54}$'
feature_layer_url:
type: string
description: The URL of the ArcGIS feature layer to fetch.
default: null
originalType: string
required:
- arcgis_account
- feature_layer_url
- db
- db_table_name