# API sync demonstration

This program syncs U.S. General Services Administration (GSA) lease and property
holding records from a federal API to a local drive, then uploads any new
files to Big Local News (BLN) using BLN's API.

This is not a complex program, but it's intended to show how easy it can be to build things like this.

We start by bringing in some required modules.

The `bln` package provides the Big Local News client, which handles communication with BLN's API.

`requests` is a very common module for making web requests. Here we use it to
query the data.gov catalog and download files, but it can do a lot more.

The rest are part of Python's standard library: `datetime` for timestamping each sync run;
`json` for reading and writing JSON-formatted files, like our sync log;
`os` for doing some things with the operating system, like creating directories;
and `sys`, which here we use just to stop the program if there's nothing else to do.
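Here is a quick look at `datetime` and `json` as this notebook uses them: a UTC timestamp like the one that keys each entry in the sync log, and a dictionary making a round trip through JSON text. The `entry` dictionary is made-up sample data, not real sync output.

```python
import datetime
import json

# A timezone-aware UTC timestamp in ISO 8601 format, e.g. "2025-03-01T12:34:56.789012+00:00".
timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()

# Hypothetical sample entry, shaped loosely like one run of the sync log.
entry = {timestamp: {"additions": [], "downloaded_files": {}}}

# json.dumps turns the dictionary into text; json.loads turns it back into a dictionary.
text = json.dumps(entry, indent=4)
restored = json.loads(text)
print(restored == entry)  # True

# The ISO string can be parsed back into a datetime, which allows date arithmetic.
parsed = datetime.datetime.fromisoformat(timestamp)
elapsed = datetime.datetime.now(datetime.timezone.utc) - parsed
print(elapsed)  # time elapsed since `timestamp` was created
```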

In [None]:
from bln.client import Client
import requests

import datetime
import json
import os
import sys

## API preparation

Next, we're going to retrieve the credentials (API keys) for the APIs.
API is short for Application Programming Interface,
which ... is a lot to say. But it's basically a defined way for
one computer to do things with another computer: retrieve information,
upload a file, get a listing of projects, that sort of thing.

A lot of the work you want to do will typically involve several
API calls to bring it all together, in the same way few recipes
are just one step.
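To make the recipe idea concrete: later in this notebook, one API call lists the files already archived on Big Local News, and another asks the data.gov catalog which files exist at the source. Comparing the two tells us what to download. Below is a minimal, offline sketch of that comparison. `find_new_resources` is a hypothetical helper, and the sample data only mimics the shape the notebook relies on (`result` → `resources` → `url` from the catalog, and a dictionary keyed by file name for the archive); the URLs and dates are made up.

```python
def find_new_resources(catalog_response: dict, archived_names: set) -> list:
    """Return catalog resources whose file name is not already archived.

    Assumes a CKAN-style package_show response:
    {"result": {"resources": [{"url": ...}, ...]}}
    """
    new_resources = []
    for resource in catalog_response["result"]["resources"]:
        # Like the notebook, treat the last part of the URL as the file name.
        filename = resource["url"].split("/")[-1]
        if filename not in archived_names:
            new_resources.append(resource)
    return new_resources


# Made-up sample data standing in for the two API responses.
catalog_response = {
    "result": {
        "resources": [
            {"url": "https://example.com/files/leases-2025-01.csv"},
            {"url": "https://example.com/files/leases-2025-02.csv"},
        ]
    }
}
archived = {"leases-2025-01.csv": "2025-01-15T00:00:00Z"}

for resource in find_new_resources(catalog_response, set(archived)):
    print(resource["url"])  # https://example.com/files/leases-2025-02.csv
```

Neither response is useful alone; the answer only appears once the two are combined.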

To request a federal API key, visit https://api.data.gov/signup/
if that link still works. Things were ... changing ... quickly in
early 2025.

Big Local News has instructions on getting an API key for its
service at https://bln-python-client.readthedocs.io/en/latest/gettingstarted.html#setup
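This notebook reads both keys from environment variables named `BLN_API_TOKEN` and `DATA_DOT_GOV` rather than pasting them into the notebook, so they don't end up in a shared file. If a variable isn't set, `os.environ[...]` raises a bare `KeyError`. A small helper, hypothetical and not part of either API, can turn that into a friendlier message:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a helpful message."""
    value = os.environ.get(name)
    if not value:
        # Treat both a missing variable and an empty one as "not set".
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            f"Set it in your shell (for example, `export {name}=...`) before starting Jupyter."
        )
    return value
```

With that in place, `bln_api = require_env("BLN_API_TOKEN")` could replace the direct `os.environ` lookup.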

In [None]:
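bln_api = os.environ["BLN_API_TOKEN"]     # A consistent naming scheme is the hobgoblin of little minds
fed_api = os.environ["DATA_DOT_GOV"]      # Note: not used below; the catalog.data.gov request doesn't send a key

sync_log_file = "sync-log.json"           # Local JSON record of each sync run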

data_dir = "data/"
os.makedirs(data_dir, exist_ok=True)      # Create the data directory if it doesn't already exist

Now, let's take the BLN API key and use it to create an instance of BLN's client. That's just one line.

In [None]:
bln = Client(bln_api)

Let's build a couple of functions to read and write our sync log. They make the code somewhat more readable, though plenty of problems remain.
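One of those remaining problems is that both functions depend on the module-level `sync_log_file`. As a sketch of an alternative, not the notebook's own code, the hypothetical `load_log` and `write_log` take the path as an argument, which makes them easy to try against a throwaway file:

```python
import json
import os
import tempfile


def load_log(path: str) -> dict:
    """Return the sync log stored at `path`, or an empty dict if the file doesn't exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as infile:
        return json.load(infile)


def write_log(path: str, log: dict) -> None:
    """Write the sync log to `path` as indented JSON, replacing any earlier contents."""
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(log, outfile, indent=4)


# Try the round trip in a temporary directory so no real log is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sync-log.json")
    print(load_log(path))  # {}
    write_log(path, {"2025-01-01T00:00:00+00:00": {"additions": []}})
    print(len(load_log(path)))  # 1
```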

In [None]:
def fetch_log():
    """Load the sync log from disk, or start an empty one if none exists yet."""
    if not os.path.exists(sync_log_file):
        local_log = {}
        print("No log data found.")
    else:
        with open(sync_log_file, "r", encoding="utf-8") as infile:
            local_log = json.load(infile)
        print(f"{len(local_log):,} log entries found.")
    return local_log

In [None]:
def save_log(local_log):
    """Write the sync log to disk as indented JSON, replacing the previous file."""
    with open(sync_log_file, "w", encoding="utf-8") as outfile:
        json.dump(local_log, outfile, indent=4)

In [None]:
# Get the GSA project.
project = bln.get_project_by_name("GSA leases and properties")

In [None]:
# Get all the files in the project, mapping each file name to when it was last updated.
archived_files = {}
for f in project['files']:
    archived_files[f['name']] = f['updatedAt']
print(f"{len(archived_files):,} archived files found.")

In [None]:
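# Ask the data.gov catalog (a CKAN site) for the metadata on GSA's Inventory of Owned and Leased Properties (IOLP) dataset.
r = requests.get("https://catalog.data.gov/api/3/action/package_show?id=inventory-of-owned-and-leased-properties-iolp", timeout=60)
r.raise_for_status()    # Stop here on an HTTP error instead of failing later on r.json()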

In [None]:
additions = []          # Catalog entries whose files aren't archived on BLN yet
downloaded_files = {}   # File name -> whether it was downloaded and uploaded in this run

catalog_entries = r.json()['result']['resources']

for catalog_entry in catalog_entries:
    remoteurl = catalog_entry['url']
    remotefilename = remoteurl.split("/")[-1]    # Treat the last part of the URL as the file name
    if remotefilename not in archived_files:
        additions.append(catalog_entry)
        downloaded_files[remotefilename] = False
print(f"{len(additions):,} new entries found among {len(catalog_entries):,} source files.")

In [None]:
# UTC timestamp for this run; it becomes the key for this run's entry in the sync log.
timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()

In [None]:
log_data = fetch_log()

In [None]:
log_data[timestamp] = {
    "additions": additions,
    "downloaded_files": downloaded_files,
    "archived_files": archived_files,
}

In [None]:
save_log(log_data)

In [None]:
if len(additions) == 0:
    print("No new records found. Stopping.")
    sys.exit()

In [None]:
print(f"{len(additions):,} new records found.")
project_id = project['id']
for addition in additions:
    remoteurl = addition['url']
    basefilename = remoteurl.split("/")[-1]
    targetfilename = os.path.join(data_dir, basefilename)
    print(f"Trying to fetch {remoteurl} to {targetfilename}.")
    r = requests.get(remoteurl, timeout=300)
    if not r.ok:
        print(f"Error downloading {remoteurl} to {targetfilename}: HTTP {r.status_code}.")
    else:
        with open(targetfilename, "wb") as outfile:
            outfile.write(r.content)
        print(f"Trying to send {basefilename} to Big Local News.")
        bln.upload_file(project_id, targetfilename)
        # Mark the file as handled only after both the download and the upload succeed
        log_data[timestamp]["downloaded_files"][basefilename] = True

In [None]:
save_log(log_data)