Skip to content

GCL-UCC-NG911/shillelagh

 
 

Repository files navigation

Shillelagh

https://coveralls.io/repos/github/betodealmeida/shillelagh/badge.svg?branch=master Documentation Status

PyPI - Python Version

Shillelagh (ʃɪˈleɪlɪ) is a Python library and CLI that allows you to query many resources (APIs, files, in memory objects) using SQL. It's both user and developer friendly, making it trivial to access resources and easy to add support for new ones.

The library is an implementation of the Python DB API 2.0 based on SQLite (using the APSW library):

from shillelagh.backends.apsw.db import connect

connection = connect(":memory:")
cursor = connection.cursor()

query = "SELECT * FROM a_table"
for row in cursor.execute(query):
    print(row)

There is also a SQLAlchemy dialect:

from sqlalchemy.engine import create_engine

engine = create_engine("shillelagh://")
connection = engine.connect()

query = "SELECT * FROM a_table"
for row in connection.execute(query):
    print(row)

And a command-line utility:

$ shillelagh
sql> SELECT * FROM a_table

Why SQL?

Sharks have been around for a long time. They're older than trees and the rings of Saturn, actually! The reason they haven't changed that much in hundreds of millions of years is because they're really good at what they do.

SQL has been around for some 50 years for the same reason: it's really good at what it does.

Why "Shillelagh"?

Picture a leprechaun hitting APIs with a big stick so that they accept SQL.

How is it different?

Shillelagh allows you to easily query non-SQL resources. For example, if you have a Google Spreadsheet you can query it directly as if it were a table in a database:

SELECT country, SUM(cnt)
FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0"
WHERE cnt > 0
GROUP BY country

You can even run INSERT/DELETE/UPDATE queries against the spreadsheet:

UPDATE "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=0"
SET cnt = cnt + 1
WHERE country != 'BR'

Queries like this are supported by adapters. Currently Shillelagh has the following adapters:

Name Type URI pattern Example URI
CSV File /path/to/file.csv /home/user/sample_data.csv
Datasette API http(s)://* https://global-power-plants.datasettes.com/global-power-plants/global-power-plants
Generic JSON API http(s)://* https://api.stlouisfed.org/fred/series?series_id=GNPCA&api_key=XXX&file_type=json#$.seriess[*]
GitHub API https://api.github.com/repos/${owner}/{$repo}/pulls https://api.github.com/repos/apache/superset/pulls
GSheets API https://docs.google.com/spreadsheets/d/${id}/edit#gid=${sheet_id} https://docs.google.com/spreadsheets/d/1LcWZMsdCl92g7nA-D6qGRqg1T5TiHyuKJUY1u9XAnsk/edit#gid=0
HTML table API http(s)://* https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
Pandas In memory Any variable name (local or global) my_df
S3 API s3://bucket/path/to/file s3://shillelagh/sample_data.csv
Socrata API https://${domain}/resource/${dataset-id}.json https://data.cdc.gov/resource/unsk-b7fc.json
System API system://${resource} system://cpu?interval=2
WeatherAPI API https://api.weatherapi.com/v1/history.json?key=${key}&q=${location} https://api.weatherapi.com/v1/history.json?key=XXX&q=London
NglsAPI API ngls://${host}/reporting?api_key=${key} ngls://192.168.64.9/reporting?api_key=XXXX

There are also 3rd-party adapters:

A query can combine data from multiple adapters:

INSERT INTO "/tmp/file.csv"
SELECT time, chance_of_rain
FROM "https://api.weatherapi.com/v1/history.json?q=London"
WHERE time IN (
  SELECT datetime
  FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=1648320094"
)

The query above reads timestamps from a Google sheet, uses them to filter weather data from WeatherAPI, and writes the chance of rain into a (pre-existing) CSV file.

New adapters are relatively easy to implement. There's a step-by-step tutorial that explains how to create a new adapter to an API or filetype.

NGLS

Passing query parameters from superset to NGLS

To pass parameters from Superset the shillelagh middleware must recognize some fields as "SQL where" clauses that can then be used to pass some parameters to the reporting service using the reporting API. A possible SQL command could be for example: SELECT * FROM call_summary WHERE interval='month'

This means that the interval will be caught on shillelagh adapter as a "predicate" and use the value to define the Reporting API interval parameter.

To be possible to use a value as a predicate, it is mandatory to:

  1. the column must be available in the result set from Reporting API. Meaning that it is needed to add a column with the exact value on the reporting YAML definitions and resulting reporting API request.
  2. On shillelagh it is needed to add the column WITH filter to become a predicate.
  3. The other columns just need the proper type without filters, so in this case sort and filtering can be enabled directly on superset.

About

Making it easy to query APIs via SQL

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

 
 
 

Languages

  • Python 99.9%
  • Other 0.1%