# API sync demonstration

This program syncs U.S. General Services Administration lease and property
holding records from a federal API to a local drive, then syncs them with
Big Local News' server using BLN's API.

This is not a complex program, but it is intended to show how easy it is to build things.

We start with bringing in some required modules.

The `bln` client itself is helpful.

`requests` is a rather common module used to, among other things,
download files. It can do a lot more.

The rest are part of Python's standard library: `datetime` for calculating dates;
`json` for reading and writing JSON-formatted data files;
`os` for doing some things with the operating system, like creating directories;
and `sys`, which here we use just to quit the Python program if we don't need to do anything else.

In [None]:
from bln.client import Client
import requests

import datetime
import json
import os
import sys

## API preparation

Next, we're going to retrieve the credentials for the APIs.
API is an abbreviation for Application Programming Interface,
which ... is a lot to say. But it's basically a defined way for
one computer to do stuff with another computer -- retrieve information,
upload a file, get a listing of projects, all stuff like that.

A lot of the work you want to do will typically involve several
API calls to bring it all together, in the same way few recipes
are just one step.

To request a federal API key, visit https://api.data.gov/signup/
if that link still works. Things were ... changing ... quickly in
early 2025.

Big Local News has instructions on getting an API key for their
service at https://bln-python-client.readthedocs.io/en/latest/gettingstarted.html#setup ...

In [None]:
bln_api = os.environ["BLN_API_TOKEN"]     # A consistent naming scheme is the hobgoblin of little minds
fed_api = os.environ["DATA_DOT_GOV"]

sync_log_file = "sync-log.json"

data_dir = "data/"
os.makedirs(data_dir, exist_ok=True)      # Create the data directory if it doesn't already exist
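
If either environment variable is missing, `os.environ[...]` stops with a bare `KeyError`. Here is a minimal sketch of a friendlier check. `require_env` is a hypothetical helper written for this illustration, not part of `bln` or `requests`, and the token value is a placeholder.

```python
import os


def require_env(name, environ=os.environ):
    """Return an environment variable's value, or fail with a readable message."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value


# Demonstration with a stand-in mapping rather than the real environment.
fake_environ = {"BLN_API_TOKEN": "not-a-real-token"}
print(require_env("BLN_API_TOKEN", fake_environ))
```

In the notebook itself, the call would be something like `require_env("BLN_API_TOKEN")`, which reads from the real environment.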

Let's build a couple functions to make the code somewhat more readable, though this code has plenty of problems remaining.

In [None]:
def fetch_log():
    # Load the sync log from disk, or start with an empty log on the first run.
    if not os.path.exists(sync_log_file):
        local_log = {}
        print("No log data found.")
    else:
        with open(sync_log_file, "r", encoding="utf-8") as infile:
            local_log = json.load(infile)
        print(f"{len(local_log):,} log entries found.")
    return local_log

In [None]:
def save_log(local_log):
    # Write the whole log back to disk as indented JSON.
    with open(sync_log_file, "w", encoding="utf-8") as outfile:
        json.dump(local_log, outfile, indent=4)
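
To see what the log looks like on disk, here is a small round trip through a temporary directory. The entry is hypothetical and only mimics the shape this notebook writes later. The file name matches `sync_log_file`, but nothing touches the real log.

```python
import json
import os
import tempfile

# Hypothetical entry, shaped like the ones this notebook writes later on.
sample_log = {
    "2025-02-07T12:00:00+00:00": {
        "additions": [],
        "downloaded_files": {},
        "archived_files": {"example.csv": "2025-02-01T00:00:00Z"},
    }
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sync-log.json")
    # Write the log, then read it straight back.
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(sample_log, outfile, indent=4)
    with open(path, "r", encoding="utf-8") as infile:
        restored = json.load(infile)

print(restored == sample_log)
```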

## Get needed data from Big Local News' API

We use the BLN API key to create an instance of BLN's client. That's just one line.

In [None]:
bln = Client(bln_api)

We need to look up the project and then pull information from it.

This builds out a dictionary of filenames and a timestamp for when the files
last changed. The update time is helpful for logging and analysis, potentially.

In [None]:
# Get the GSA project.
project = bln.get_project_by_name("GSA leases and properties")

In [None]:
# Get all the files in the project.
archived_files = {}
for f in project['files']:
    archived_files[f['name']] = f['updatedAt']
print(f"{len(archived_files):,} archived files found.")
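
The same dictionary can be built in one expression. This sketch uses a hypothetical stand-in for `project`, assuming only the `files`, `name` and `updatedAt` keys used above. The file names and timestamps are made up.

```python
# Hypothetical stand-in for the project returned by the BLN client.
sample_project = {
    "id": "hypothetical-project-id",
    "files": [
        {"name": "2025-1-31-example-leases.xlsx", "updatedAt": "2025-02-01T00:00:00Z"},
        {"name": "2025-2-7-example-leases.xlsx", "updatedAt": "2025-02-08T00:00:00Z"},
    ],
}

# Map each filename to the time it last changed.
sample_archived = {f["name"]: f["updatedAt"] for f in sample_project["files"]}
print(f"{len(sample_archived):,} archived files found.")
```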

## Pull information from GSA

This next part is just one line, but it took a bit of effort.

The landing page for some of this stuff no longer had a link
to get an API key, and crucial parts of the documentation had
to be pieced together.

The data.gov index isn't actually at www.data.gov or 
api.data.gov. The API is accessed through ...
catalog.data.gov.

The API follows the CKAN standard, except you have to change
the hostname, the computer part of the URL, in the examples. For instance,
to see what files are associated with a collection, check the
docs at https://docs.ckan.org/en/2.10/api/index.html#get-able-api-functions
    
Right near the top it describes the `package_show` call. So rewriting
the example a little, and finding the GSA owned and leased properties
link in data.gov, we can write a single line to get a JSON-formatted
collection of details of that package, including the filenames.

In [None]:
r = requests.get("https://catalog.data.gov/api/3/action/package_show?id=inventory-of-owned-and-leased-properties-iolp")
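
CKAN's action API wraps its answer in an envelope with a `success` flag and a `result` object, and for `package_show` the files sit in `result["resources"]`. Here is a sketch of pulling out the URLs. The sample below is a trimmed, hypothetical stand-in for `r.json()`, with made-up resource names and URLs, and `resource_urls` is a helper named for this example only.

```python
# Hypothetical, heavily trimmed stand-in for the parsed JSON response.
sample_response = {
    "success": True,
    "result": {
        "name": "inventory-of-owned-and-leased-properties-iolp",
        "resources": [
            {"name": "Example buildings file",
             "url": "https://example.gov/files/2025-2-7-example-buildings.xlsx"},
            {"name": "Example leases file",
             "url": "https://example.gov/files/2025-2-7-example-leases.xlsx"},
        ],
    },
}


def resource_urls(payload):
    """Return the download URL of every resource in a package_show payload."""
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request.")
    return [resource["url"] for resource in payload["result"]["resources"]]


urls = resource_urls(sample_response)
for url in urls:
    print(url)
```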

## Bringing pieces together

Now we have identified the files that Big Local News has through its API.
We have details that include the filenames through the data.gov API.

In this case, there's something handy -- the regular files for the GSA
appear to be released each week, with date prefixes like `2025-2-7` and
`2025-1-31`. The date prefix already works as a version number, you see.

In this case, we just need to check to see if the GSA site has any files
the BLN site doesn't have. If it does, we can fetch them to our own
computer, then pass them back to BLN.

The GSA stuff is viewable within the JSON, in a result:resources
section. Calling `r.json()` on the `requests` response lets us treat the
download as JSON and convert it into a Python-friendly object instantly.

So let's look at each GSA file and see if it exists on the BLN site;
if not, we need to add it to a to-do list or two.

In [None]:
additions = []
downloaded_files = {}

catalog_entries = r.json()['result']['resources']

for catalog_entry in catalog_entries:
    remoteurl = catalog_entry['url']
    remotefilename = remoteurl.split("/")[-1]
    if remotefilename not in archived_files:
        additions.append(catalog_entry)
        downloaded_files[remotefilename] = False
print(f"{len(additions):,} new entries found among {len(catalog_entries):,} source files.")
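
Splitting the URL on `/` works as long as the URL ends with the filename. If a URL ever carried a query string, the last piece would include it. A slightly more defensive sketch uses the standard library's URL parser. The URLs and archive contents here are hypothetical, and `filename_from_url` is a name made up for this example.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


# Hypothetical archive listing and remote URLs.
archived = {"2025-1-31-example-leases.xlsx": "2025-02-01T00:00:00Z"}
remote_urls = [
    "https://example.gov/files/2025-1-31-example-leases.xlsx",
    "https://example.gov/files/2025-2-7-example-leases.xlsx?download=1",
]

# Keep only the URLs whose filename is not already archived.
missing = [u for u in remote_urls if filename_from_url(u) not in archived]
print(missing)
```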

In [None]:
timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()

In [None]:
log_data = fetch_log()

In [None]:
log_data[timestamp] = {
    "additions": additions,
    "downloaded_files": downloaded_files,
    "archived_files": archived_files,
}

In [None]:
save_log(log_data)
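
Because each log key is an ISO 8601 timestamp, the most recent run is easy to find later. Here is a small sketch with a hypothetical two-entry log.

```python
import datetime

# Hypothetical log with two runs, keyed the same way as the real one.
sample_runs = {
    "2025-01-31T09:00:00.000000+00:00": {"additions": []},
    "2025-02-07T09:00:00.000000+00:00": {"additions": ["hypothetical entry"]},
}

# Parse each key back into a datetime and pick the newest.
latest_key = max(sample_runs, key=datetime.datetime.fromisoformat)
latest_run = datetime.datetime.fromisoformat(latest_key)
print(latest_key, len(sample_runs[latest_key]["additions"]))
```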

## We might be done ... ?

If we have no additions to process, there's nothing left to download.
We've already updated the log file with our latest effort and we
can just quit, in that case.

In [None]:
if len(additions) == 0:
    print("No new records found. Stopping.")
    sys.exit()

## But if we weren't done ...

All the work above here has been just to identify if we have
new files to process. If we've reached this point,
`additions` has at least one new file for us to download.
Everything has built up to this.

Fortunately, it's pretty darn easy to download from GSA
and then upload to BLN. (It's even possible to not download
the file to the local computer, but that's ugly and looks
more confusing and also local copies are good, actually.)

So in this case we're looking at our additions, finding
the base filename, and trying to download it to our computer.

If the download process is successful, we then send it to
Big Local News. If that's successful, we update the
in-memory log and later save it. And then we're truly done.

In [None]:
print(f"{len(additions):,} new records found.")
project_id = project['id']
for addition in additions:
    remoteurl = addition['url']
    basefilename = remoteurl.split("/")[-1]
    targetfilename = data_dir + basefilename
    print(f"Trying to fetch {remoteurl} to {targetfilename}.")
    r = requests.get(remoteurl)
    if not r.ok:
        print(f"Error downloading {remoteurl} to {targetfilename}.")
    else:
        with open(targetfilename, "wb") as outfile:
            outfile.write(r.content)
        print(f"Trying to send {basefilename} to Big Local News.")
        bln.upload_file(project_id, targetfilename)
        log_data[timestamp]["downloaded_files"][basefilename] = True
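
The loop above holds each file in memory as `r.content` before writing it out. For larger files, one option is to write the download in chunks. This is a sketch, not what the notebook does: `write_chunks` is a hypothetical helper, and the chunks here are stand-in bytes rather than a real download. With `requests`, the chunks could come from `requests.get(url, stream=True).iter_content(chunk_size=65536)`.

```python
import os
import tempfile


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path; return the number of bytes written."""
    written = 0
    with open(path, "wb") as outfile:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                outfile.write(chunk)
                written += len(chunk)
    return written


# Stand-in chunks instead of a real network response.
fake_chunks = [b"abc", b"", b"defg"]
with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "example.bin")
    total = write_chunks(fake_chunks, target)
    size_on_disk = os.path.getsize(target)

print(total, size_on_disk)
```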

In [None]:
save_log(log_data)