These scripts were written for use at Distilled. They take ranking data from Stat and upload it to BigQuery. In general, we use this to visualize rankings in Data Studio.
To use this tool, you need a Stat account with API credentials, as well as a GCP project where you can upload BigQuery data.
./init
This writes your Stat API credentials to a plain text file. The scripts are assumed to run on a machine that uses a cron job to update client ranking data.
./client_init name
This command creates a new dataset named after the client, using the name provided. The client
is given an (initially empty) table for ranking data, and a view on that table
defined by the contents of clustered_legacy.sql.
./fix_utf <infile.csv | ./nld_from_csv.py | ./nld_file_insert name
Stat CSV exports are in UTF-16, which breaks BQ's load command. This series of
commands converts the export to UTF-8, then creates a newline-delimited JSON
representation, and finally inserts that JSON into the BQ table in the client's
dataset, given as the name argument.
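The conversion steps above can be sketched in a few lines of Python. This is only an illustration, not the code in fix_utf or nld_from_csv.py: the function name utf16_csv_to_ndjson and the sample columns are made up, and it assumes the export is comma-delimited with a header row.

```python
import csv
import io
import json


def utf16_csv_to_ndjson(raw: bytes) -> str:
    """Turn UTF-16 CSV bytes into newline-delimited JSON text."""
    # The "utf-16" codec reads the byte-order mark to pick the byte order.
    text = raw.decode("utf-16")
    # Assumption: the first row holds the column names.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    # One JSON object per line, which is the layout BigQuery's JSON load expects.
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in reader)


# Hypothetical two-column export, encoded the way Stat is said to encode it.
sample = "Keyword,Rank\ncaf\u00e9 near me,3\n".encode("utf-16")
ndjson = utf16_csv_to_ndjson(sample)
print(ndjson, end="")
```

Every value comes out as a string here, so any type conversion needed to match the table schema would have to happen before the insert.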
./client_update name site_id
This gets data from Stat's API, structures it to match the BQ schema, and
inserts it into the client's ranking table, without any other side effects or
temporary files. The site_id argument should be the site ID of the
corresponding Stat project.
./job
If you want to update rankings automatically, job can be executed
with cron every day. You'll need to make sure all clients are listed
in clients.txt.
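As a rough sketch of what a daily run could look like, the Python below turns a client list into one client_update call per client. The format of clients.txt is not described here, so the layout used (one client per line, name and site_id separated by whitespace) is an assumption, as are the function name update_commands and the sample clients.

```python
import shlex
from typing import List


def update_commands(clients_text: str) -> List[str]:
    """Build one ./client_update command line per listed client."""
    commands = []
    for line in clients_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Assumption: each line is "<name> <site_id>".
        name, site_id = line.split()
        # shlex.join quotes arguments so the line is safe to hand to a shell.
        commands.append(shlex.join(["./client_update", name, site_id]))
    return commands


# Hypothetical contents of clients.txt.
example = "acme 1234\n\nglobex 5678\n"
for command in update_commands(example):
    print(command)
```

With that layout, the sample prints `./client_update acme 1234` and `./client_update globex 5678`. A line with more or fewer than two fields raises a ValueError rather than being skipped silently.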