Proposal: Modular data sources #412

eyeseast · 2016-04-26T16:54:53Z

Tarbell is currently built around Google Sheets as a primary data source. It's possible to pull in other types of data -- Google Docs, text files, remote APIs -- but it requires a deeper understanding of both Tarbell and Flask. We can make this better.

I propose a modular approach, with data sources attached to the core TarbellSite instance. Each source would be a class that would:

load data from some source (filesystem, URL, GDrive)
process that data into something a template can use
add processed data to site context

Tarbell should include common data sources like Google Sheets and Docs, as well as a base class for custom data sources. This would make it easier to support things like YAML frontmatter posts (#374), EditData or Airtable.
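To make the three responsibilities above concrete, here is a minimal sketch of what a base class might look like. The names `DataSource`, `load`, `process`, `add_to_context`, and `JSONStringSource` are all hypothetical and not part of Tarbell; the toy subclass reads JSON from a settings string so the example runs without network access.

```python
import json


class DataSource:
    """Hypothetical base class: load, process, then add to site context."""

    # Key under which processed data lands in the context (assumed convention).
    name = "data"

    def __init__(self, site=None, **settings):
        self.site = site
        self.settings = settings

    def load(self):
        """Fetch raw data (filesystem, URL, GDrive). Subclasses override this."""
        raise NotImplementedError

    def process(self, raw):
        """Turn raw data into something a template can use. Default: pass through."""
        return raw

    def add_to_context(self, context):
        """Run load and process, then store the result under self.name."""
        context[self.name] = self.process(self.load())
        return context


class JSONStringSource(DataSource):
    """Toy subclass: reads JSON from a 'payload' setting instead of a real URL."""

    name = "custom"

    def load(self):
        return self.settings["payload"]

    def process(self, raw):
        return json.loads(raw)


context = JSONStringSource(payload='{"title": "Hello"}').add_to_context({})
```

A Google Sheets or Docs source would presumably override the same two methods and leave `add_to_context` alone.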

Custom backends would need a way to tell Tarbell how to find them, sort of like Flask extensions. Here are a couple of possible workflows:

# tarbell_config.py
import custom_backend
custom_backend.register()

CUSTOM_BACKEND_URL = "http://example.com/data.json"

The custom_backend package would register itself with Tarbell in tarbell_config.py, and then use any settings defined there to fetch and process data.
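As a rough sketch of what `register()` could do behind the scenes, assuming Tarbell kept a simple registry of source classes: the `DATA_SOURCES` dict, the `build_sources` helper, and the `<NAME>_URL` settings convention are all assumptions made for illustration, not existing Tarbell APIs.

```python
# Hypothetical registry; nothing like this is confirmed to exist in Tarbell.
DATA_SOURCES = {}


class CustomBackend:
    """Placeholder data source class (illustrative only)."""

    def __init__(self, url):
        self.url = url


def register(registry=DATA_SOURCES, name="custom_backend"):
    """Add CustomBackend to the registry so the site can find it later."""
    registry[name] = CustomBackend
    return CustomBackend


def build_sources(registry, settings):
    """Instantiate each registered source from its <NAME>_URL setting."""
    return {
        name: cls(settings[name.upper() + "_URL"])
        for name, cls in registry.items()
    }


register()
sources = build_sources(
    DATA_SOURCES, {"CUSTOM_BACKEND_URL": "http://example.com/data.json"}
)
```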

Another approach:

# tarbell_config.py

DATA_SOURCE = 'custom_backend.CustomBackend'
CUSTOM_BACKEND_URL = "http://example.com/data.json"

This would use a string pointing to a data source class, which Tarbell would import and instantiate. I'm not sure which approach makes more sense.
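For the string-based approach, the import step could be sketched with the standard library's `importlib`. The helper name `import_string` is my own; the demo resolves a standard library class so it runs without a real `custom_backend` package.

```python
from importlib import import_module


def import_string(dotted_path):
    """Resolve 'package.module.ClassName' to the object it names."""
    module_path, _, attr = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted path")
    module = import_module(module_path)
    try:
        return getattr(module, attr)
    except AttributeError as exc:
        raise ImportError(f"{module_path!r} has no attribute {attr!r}") from exc


# Stand-in for DATA_SOURCE = 'custom_backend.CustomBackend'
source_cls = import_string("collections.OrderedDict")
source = source_cls()
```

The tradeoff is that errors surface when the string is resolved rather than at the `import` line, so a typo in `DATA_SOURCE` needs a clear error message.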

Ideally, a project could have multiple data sources. I might want to write my long text article in a Google Doc while storing tabular data in a spreadsheet. The trick here is making sure variables don't collide when added to site context.
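One possible way to avoid collisions is to give each source its own namespace in the context rather than merging everything into globals. This is only a sketch of that idea; `build_context` and the namespace names are hypothetical.

```python
def build_context(sources):
    """Merge (namespace, variables) pairs into one context, one key per source.

    Raises ValueError if two sources claim the same namespace.
    """
    context = {}
    for namespace, variables in sources:
        if namespace in context:
            raise ValueError(f"duplicate data source namespace: {namespace!r}")
        # Copy so later changes to the source's dict don't leak into the context.
        context[namespace] = dict(variables)
    return context


context = build_context([
    ("doc", {"title": "Long article"}),
    ("sheet", {"title": "Tabular data"}),
])
```

In a template, both titles stay reachable as `doc.title` and `sheet.title` instead of one silently overwriting the other.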

Finally, we should decide how much processing is composable. If I pull down a spreadsheet, I may want to process it into Agate tables, or turn it into JSON, or something else. Maybe I want to filter it or do additional processing. Or maybe this is something to handle per-project in tarbell_config.py.
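If processing is meant to be composable, one simple shape is a chain of plain functions applied in order. The sketch below assumes rows arrive as a list of dicts; `compose`, `only_published`, and `to_json` are illustrative names, not Tarbell functions.

```python
import json
from functools import reduce


def compose(*steps):
    """Return a function that feeds data through each step, left to right."""
    def pipeline(data):
        return reduce(lambda acc, step: step(acc), steps, data)
    return pipeline


def only_published(rows):
    """Example filter step: keep rows with a truthy 'published' value."""
    return [row for row in rows if row.get("published")]


def to_json(rows):
    """Example output step: serialize rows to a JSON string."""
    return json.dumps(rows, sort_keys=True)


process = compose(only_published, to_json)
rows = [
    {"name": "a", "published": True},
    {"name": "b", "published": False},
]
result = process(rows)
```

A project could define its own steps in tarbell_config.py and pass them to a source, which keeps the per-project option open as well.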


eyeseast · 2016-05-06T20:12:50Z

Here's another option for discovering custom sources: importing a class and instantiating it on a blueprint or app, much like a Flask extension. That would solve both discovery and configuration.

# tarbell_config.py
from flask import Blueprint
from custom_source import CustomSource

blueprint = Blueprint('project', __name__)
CustomSource(blueprint)

# settings as usual
CUSTOM_BACKEND_URL = "http://example.com/data.json"

This is a little less magical than using custom_source.register() (which would have to do imports in the background) but also asks a little more of users. I have to assume that if you're using a custom data source, you're advanced and writing a little Python is OK.
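On the source side, the extension pattern could look roughly like this. The class body is a hypothetical sketch: the only Flask API it relies on is `context_processor`, which both `flask.Flask` and `flask.Blueprint` provide for registering a function that returns extra template variables. The returned data is hard-coded here in place of a real fetch.

```python
class CustomSource:
    """Hypothetical data source that attaches itself to a blueprint or app."""

    def __init__(self, blueprint=None):
        # Mirror the Flask extension convention: allow deferred setup.
        if blueprint is not None:
            self.init_blueprint(blueprint)

    def init_blueprint(self, blueprint):
        # Register get_context so its return value is merged into templates.
        blueprint.context_processor(self.get_context)

    def get_context(self):
        # A real source would load and process data here (assumed behavior).
        return {"custom": {"title": "Example"}}
```

Note that on a blueprint, `context_processor` only affects templates rendered for that blueprint's requests, which may or may not be what a site-wide data source wants.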

Separate but related: if multiple data sources are allowed, how do they interact? The current spreadsheet backend adds global variables to the template context (the values sheet and sheet names), so other sources need to be careful not to overwrite them or be overwritten.

eyeseast · 2016-05-06T20:20:16Z

One way to make this all easier: include common data types by default. The requests I've seen, and things we've used at FRONTLINE, come down to three things:

Google Sheets
Google Docs (google docs as storage backend for long text #356)
text files with frontmatter (Jekyll-style) (Support YAML front-matter in Markdown and HTML files #374)

Whatever the discovery and configuration process, including those by default probably solves most people's issues without ever having to add a new source.

eyeseast added this to the 2.0 milestone Apr 26, 2016

eyeseast added the code, priority: high, and tag: feature labels Apr 26, 2016
