
# HDX Datastore API: Python Quickstart Notebook

This notebook shows how to query the HDX Datastore API from Python using both the native endpoint (`datastore_search`) and the SQL endpoint (`datastore_search_sql`).

**Please note:**  
- Use your own API token. Do not commit or share it. Treat it like a password and keep it safe.
- If using Google Colab, you can set your token at runtime in a cell. Do not save it to Drive or source control.  
- Replace placeholder values like `<YOUR_RESOURCE_ID>` with real ones.



## 1. Set your API token

Save your API token as an environment variable named `HDX_API_TOKEN`. In the notebook, load it from a config file or from a secret created in a notebook editor such as Google Colab. Do not hardcode the token in the notebook or commit it to source control.


In [None]:
import os

# Store the token as a secret or environment variable named "HDX_API_TOKEN".
# The line below reads it from the environment; it is None if the variable is not set.
HDX_API_TOKEN = os.getenv("HDX_API_TOKEN")
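
If the environment variable is not set, one option is to prompt for the token at runtime so that it never appears in a saved cell. The following is a minimal sketch under that assumption; `load_hdx_token` is a hypothetical helper name and is not part of any HDX library.

```python
import os
from getpass import getpass


def load_hdx_token(env=None, prompt=getpass):
    """Return the token from HDX_API_TOKEN, or ask for it without echoing input.

    `load_hdx_token` is a hypothetical helper written for this notebook.
    """
    env = os.environ if env is None else env
    token = (env.get("HDX_API_TOKEN") or "").strip()
    if not token:
        # Fallback: interactive prompt; getpass does not echo the typed value.
        token = prompt("HDX API token: ").strip()
    if not token:
        raise ValueError("No HDX API token provided")
    return token
```

To use it, call `HDX_API_TOKEN = load_hdx_token()` in place of the `os.getenv` line, and avoid printing the returned value.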


## 2. Install dependencies and import libraries
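This notebook needs the `requests` and `pandas` packages. If they are missing from your environment, install them with `pip install requests pandas` before running the cell below.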


In [None]:
import json
import time
import urllib.parse
import requests
import pandas as pd

BASE = "https://blue.demo.data-humdata-org.ahconu.org/api/3/action"
RESOURCE_ID = "221ec163-6307-42de-b77e-876e6f0d351b"  # replace with your resource id
API_TOKEN = os.getenv("HDX_API_TOKEN")  # should be set above
HEADERS = {"Authorization": API_TOKEN} if API_TOKEN else {}
print("Using resource:", RESOURCE_ID)

Using resource: 221ec163-6307-42de-b77e-876e6f0d351b



## 3. Helper functions
Basic backoff, native calls, SQL calls, and paginated fetch helpers. You may not use all of these!

In [None]:
# GET with basic exponential backoff (1, 2, 4, ... seconds); returns the parsed JSON.
def get_with_backoff(url, headers=None, tries=5, timeout=60):
    for i in range(tries):
        r = requests.get(url, headers=headers or {}, timeout=timeout)
        if r.status_code in (429, 500, 502, 503, 504):
            # Transient error: wait before retrying, but not after the last attempt
            if i < tries - 1:
                time.sleep(2 ** i)
            continue
        r.raise_for_status()
        return r.json()
    raise RuntimeError(f"Request failed after {tries} tries: {url}")

# Native search: each call returns one page of up to `limit` rows (32,000 by default)
def datastore_search(resource_id, limit=32000, offset=0, filters=None, fields=None):
    params = {
        "resource_id": resource_id,
        "limit": str(limit),
        "offset": str(offset),
    }
    if filters:
        params["filters"] = json.dumps(filters)
    if fields:
        params["fields"] = ",".join(fields)
    url = f"{BASE}/datastore_search?{urllib.parse.urlencode(params)}"
    return get_with_backoff(url, headers=HEADERS)

# Native fetch with pagination using limit and offset
def fetch_all_native(resource_id, page_size=32000, filters=None, fields=None):
    out = []
    offset = 0
    total = None
    while True:
        resp = datastore_search(resource_id, limit=page_size, offset=offset, filters=filters, fields=fields)
        if not resp.get("success"):
            raise RuntimeError(f"API indicated failure: {resp}")
        result = resp["result"]
        if total is None:
            total = result.get("total", 0)
        rows = result.get("records", [])
        if not rows:
            break
        out.extend(rows)
        offset += len(rows)
        if offset >= total:
            break
    return pd.DataFrame(out)

# SQL search
def datastore_search_sql(sql):
    q = {"sql": sql}
    url = f"{BASE}/datastore_search_sql?{urllib.parse.urlencode(q)}"
    return get_with_backoff(url, headers=HEADERS)

# Iterate using SQL with stable ORDER BY and OFFSET
def fetch_all_sql(resource_id, order_by, page_size=32000, where=None, fields="*"):
    dfs = []
    offset = 0
    while True:
        sql = f'SELECT {fields} FROM "{resource_id}"'
        if where:
            sql += f" WHERE {where}"
        sql += f" ORDER BY {order_by} LIMIT {page_size} OFFSET {offset}"
        data = datastore_search_sql(sql)
        if not data.get("success"):
            raise RuntimeError(f"API indicated failure: {data}")
        rows = data.get("result", {}).get("records", [])
        if not rows:
            break
        dfs.append(pd.DataFrame(rows))
        offset += len(rows)
        if len(rows) < page_size:
            break
    return pd.concat(dfs, ignore_index=True) if dfs else pd.DataFrame()
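
To see what the SQL helper actually sends, it can help to build the request URL without calling the API. The sketch below repeats the `BASE` and `RESOURCE_ID` values from the setup cell so that it runs on its own; the query itself is only an illustration.

```python
import urllib.parse

# Values copied from the setup cell so this sketch is self-contained.
BASE = "https://blue.demo.data-humdata-org.ahconu.org/api/3/action"
RESOURCE_ID = "221ec163-6307-42de-b77e-876e6f0d351b"

# The table name is the resource id wrapped in double quotes.
sql = f'SELECT * FROM "{RESOURCE_ID}" WHERE org=\'IOM\' ORDER BY _id LIMIT 5 OFFSET 0'

# urlencode percent-encodes quotes and turns spaces into "+".
url = f"{BASE}/datastore_search_sql?{urllib.parse.urlencode({'sql': sql})}"
print(url)

# Decoding the query string recovers the original SQL text.
decoded = urllib.parse.parse_qs(urllib.parse.urlsplit(url).query)["sql"][0]
print(decoded == sql)
```

Printing the URL this way is a quick check that the table name is quoted and that the query is fully encoded before any request is made.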


## 4. Minimal probe
Confirm that your API token and `resource_id` work, and that the resource contains rows, by making a small native HDX Datastore API call that requests a single record.

*If this minimal probe fails, refer to the troubleshooting guidance in the documentation.*


In [None]:
# Minimal probe to confirm datastore access
try:
    probe = datastore_search(RESOURCE_ID, limit=1)
    if not probe.get("success"):
        raise RuntimeError(f"Probe failed: {probe}")

    result = probe.get("result", {})
    total = result.get("total")
    records = result.get("records", [])

    print("Probe success:", probe.get("success"))
    print("Total rows in table:", total)
    if records:
        print("Example record:")
        print(records[0])
    else:
        print("No rows returned. Try removing filters or checking the resource_id.")

except Exception as e:
    print("Probe request failed:", e)

Probe success: True
Total rows in table: 300813
Example record:
{'_id': 1, 'org': 'COOPI', 'org_acronym': 'Cooperazione Internazionale', 'org_type': 'INGO', 'sector_cluster': 'Early Recovery & Livelihoods', 'activity_name': 'Provision of Cash-Grant', 'status': 'Completed', 'adm1_name': 'Yobe', 'adm1_code': 'NGA036', 'adm2_name': 'Geidam', 'adm2_code': 'NG036006', 'HRP': 'No', 'response_type': 'Humanitarian', 'reporting_implementing': 'Yes', 'date_month': 'April', 'date_year': 2022, 'operation_type': 'Reporting'}
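
The probe response can also be used to list column names and types before writing filters. The sketch below assumes the response carries a `result.fields` list of `{"id": ..., "type": ...}` entries, which is how CKAN-style datastore responses are commonly shaped; `describe_fields` is a hypothetical helper, and the sample dictionary is hand-written with illustrative types rather than real API output.

```python
def describe_fields(response):
    """Map column name -> declared type from a datastore_search response dict.

    Assumes the response has a result.fields list; returns {} if it does not.
    """
    fields = response.get("result", {}).get("fields", [])
    return {f["id"]: f.get("type", "unknown") for f in fields}


# Hand-written sample shaped like a response; the types are illustrative only.
sample = {
    "success": True,
    "result": {
        "fields": [
            {"id": "_id", "type": "int"},
            {"id": "org", "type": "text"},
            {"id": "date_year", "type": "numeric"},
        ],
        "records": [],
        "total": 0,
    },
}

print(describe_fields(sample))
```

With a live response, `describe_fields(probe)` would give the same kind of mapping for the resource queried above.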



## 5. Native fetch with filters and pagination
Adjust `filters` and `fields` to match the schema of the resource you are calling. The example below is deliberately simple; edit it to use the column names and data types of your own resource.

In [None]:
# Example: native fetch with filters and pagination
filters = {
    "org": "IOM"        # replace with a valid column:value in your dataset
}
fields = None

try:
    df_native = fetch_all_native(
        RESOURCE_ID,
        page_size=32000,
        filters=filters,
        fields=fields
    )

    n_rows = len(df_native)
    print(f"Native fetch returned {n_rows} rows")
    if n_rows == 0:
        print("No rows matched the filter. Try adjusting filters or check column names/types.")
    else:
        display(df_native.head())

except Exception as e:
    print("Native fetch failed:", e)

Native fetch returned 1769 rows


Unnamed: 0,_id,org,org_acronym,org_type,sector_cluster,activity_name,status,adm1_name,adm1_code,adm2_name,adm2_code,HRP,response_type,reporting_implementing,date_month,date_year,operation_type
0,6,IOM,International Organization For Migration,UN Agency,Early Recovery & Livelihoods,Vocational skills training,Ongoing,Borno,NGA008,Bama,NG008003,Yes,Humanitarian,Yes,April,2022,Reporting
1,101,IOM,International Organization For Migration,UN Agency,Early Recovery & Livelihoods,School/ classroom rehab/const,Completed,Adamawa,NGA002,Mubi North,NG002014,Yes,Humanitarian,Yes,June,2022,Reporting
2,102,IOM,International Organization For Migration,UN Agency,Early Recovery & Livelihoods,Cash for Work,Completed,Adamawa,NGA002,Mubi North,NG002014,Yes,Humanitarian,Yes,June,2022,Reporting
3,103,IOM,International Organization For Migration,UN Agency,Early Recovery & Livelihoods,Vocational skills training,Completed,Adamawa,NGA002,Mubi North,NG002014,Yes,Humanitarian,Yes,June,2022,Reporting
4,104,IOM,International Organization For Migration,UN Agency,Early Recovery & Livelihoods,Provide start up kit,Completed,Adamawa,NGA002,Mubi North,NG002014,Yes,Humanitarian,Yes,June,2022,Reporting
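
Before sending a request, it can be useful to check locally that every column named in `filters` or `fields` exists. The sketch below compares them against the column names seen in the example record from the probe; `unknown_columns` is a hypothetical helper, and the comparison is an exact, case-sensitive string match done in Python, not a statement about how the API treats names.

```python
# Column names taken from the example record printed by the probe.
KNOWN_COLUMNS = {
    "_id", "org", "org_acronym", "org_type", "sector_cluster", "activity_name",
    "status", "adm1_name", "adm1_code", "adm2_name", "adm2_code", "HRP",
    "response_type", "reporting_implementing", "date_month", "date_year",
    "operation_type",
}


def unknown_columns(filters=None, fields=None, known=KNOWN_COLUMNS):
    """Return the sorted names used in filters/fields that are not known columns."""
    requested = set(filters or {}) | set(fields or [])
    return sorted(requested - set(known))


# A filter on two columns plus a reduced field list: every name is known.
print(unknown_columns({"org": "IOM", "date_year": 2022}, ["org", "adm1_name"]))

# A mis-cased column name is reported so it can be fixed before the request.
print(unknown_columns({"Org": "IOM"}))
```

An empty list means all requested names match the known schema.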



## 6. SQL fetch with stable ordering
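This SQL example orders results by a deterministic column so that paging with `LIMIT` and `OFFSET` returns rows in a stable order. Use a primary key such as `_id`, or a composite of columns (for example `"date"` and `"id"`) that uniquely identifies each row.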


In [None]:
# Example: SQL fetch with stable ordering and pagination
order_by = "_id"
where    = "org='IOM'"
fields   = "*"

try:
    df_sql = fetch_all_sql(
        RESOURCE_ID,
        order_by=order_by,
        page_size=32000,
        where=where,
        fields=fields
    )

    n_rows = len(df_sql)
    print(f"SQL fetch returned {n_rows} rows")
    if n_rows == 0:
        print("No rows matched the WHERE clause. Check column names, values, and case sensitivity.")
    else:
        display(df_sql.head())

except Exception as e:
    print("SQL fetch failed:", e)

SQL fetch returned 1769 rows


Unnamed: 0,_id,_full_text,org,org_acronym,org_type,sector_cluster,activity_name,status,adm1_name,adm1_code,adm2_name,adm2_code,HRP,response_type,reporting_implementing,date_month,date_year,operation_type
0,6,'2022':23 'agency':7 'april':22 'bama':17 'bor...,IOM,International Organization For Migration,UN Agency,Early Recovery & Livelihoods,Vocational skills training,Ongoing,Borno,NGA008,Bama,NG008003,Yes,Humanitarian,Yes,April,2022,Reporting
1,101,'2022':24 'adamawa':15 'agency':7 'classroom':...,IOM,International Organization For Migration,UN Agency,Early Recovery & Livelihoods,School/ classroom rehab/const,Completed,Adamawa,NGA002,Mubi North,NG002014,Yes,Humanitarian,Yes,June,2022,Reporting
2,102,'2022':24 'adamawa':15 'agency':7 'cash':11 'c...,IOM,International Organization For Migration,UN Agency,Early Recovery & Livelihoods,Cash for Work,Completed,Adamawa,NGA002,Mubi North,NG002014,Yes,Humanitarian,Yes,June,2022,Reporting
3,103,'2022':24 'adamawa':15 'agency':7 'completed':...,IOM,International Organization For Migration,UN Agency,Early Recovery & Livelihoods,Vocational skills training,Completed,Adamawa,NGA002,Mubi North,NG002014,Yes,Humanitarian,Yes,June,2022,Reporting
4,104,'2022':25 'adamawa':16 'agency':7 'completed':...,IOM,International Organization For Migration,UN Agency,Early Recovery & Livelihoods,Provide start up kit,Completed,Adamawa,NGA002,Mubi North,NG002014,Yes,Humanitarian,Yes,June,2022,Reporting



## 7. Optional: Save to CSV
You can save the output to a CSV file or use it in whatever tool or pipeline you are building. See the documentation to learn more.

In [None]:
df_native.to_csv("native_results.csv", index=False)
df_sql.to_csv("sql_results.csv", index=False)
print("Wrote native_results.csv and sql_results.csv")


## 8. Troubleshooting quick checks

If you see empty results or errors, run through these quick checks:
1. Verify your `RESOURCE_ID` is correct and the resource has HDX Datastore API access enabled.  
2. Remove filters and try `limit=1` again.  
3. For SQL, ensure the table name is quoted `"<resource_id>"` and the query is URL-encoded by the helper.  
4. Slow responses or intermittent errors can indicate rate limiting. Re-run the cells after a short pause, and add throttling between requests.
5. Confirm `os.environ["HDX_API_TOKEN"]` is set in this session and is correct for your account.
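
Checks 1 and 5 can be partly automated before any request is sent. The sketch below assumes resource ids are UUID strings, as the example id in this notebook is; `preflight` is a hypothetical helper, and it only checks that the values look plausible, not that the API accepts them.

```python
import os
import uuid


def preflight(resource_id, env=None):
    """Return a list of human-readable problems found before calling the API."""
    env = os.environ if env is None else env
    problems = []
    try:
        # Assumption: resource ids are UUIDs, like the example id in this notebook.
        uuid.UUID(resource_id)
    except (ValueError, AttributeError, TypeError):
        problems.append("resource_id is not a valid UUID string")
    if not (env.get("HDX_API_TOKEN") or "").strip():
        problems.append("HDX_API_TOKEN is not set in this session")
    return problems


# A leftover placeholder and a missing token are both reported.
for problem in preflight("<YOUR_RESOURCE_ID>", env={}):
    print("Check:", problem)
```

An empty list from `preflight(RESOURCE_ID)` means both basic checks passed, and the minimal probe is the next thing to try.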

For all other questions, please refer to our documentation.
