Proposal: Modular data sources #412

Open
eyeseast opened this issue Apr 26, 2016 · 2 comments

Comments

@eyeseast
Contributor

Tarbell is currently built around Google Sheets as a primary data source. It's possible to pull in other types of data -- Google Docs, text files, remote APIs -- but it requires a deeper understanding of both Tarbell and Flask. We can make this better.

I propose a modular approach, with data sources attached to the core TarbellSite instance. Each source would be a class that would:

  • load data from some source (filesystem, URL, GDrive)
  • process that data into something a template can use
  • add processed data to site context

Tarbell should include common data sources like Google Sheets and Docs, as well as a base class for custom data sources. This would make it easier to support things like YAML frontmatter posts (#374), EditData or Airtable.
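
To make the three responsibilities above concrete, here is a minimal sketch of what such a base class might look like. None of these names (`DataSource`, `load`, `process`, `get_context`, `JSONStringSource`) exist in Tarbell; they are assumptions made up for illustration.

```python
import json


class DataSource:
    """Hypothetical base class; not part of Tarbell's actual API."""

    def __init__(self, site=None, **options):
        self.site = site        # the TarbellSite instance, if any
        self.options = options  # per-source settings

    def load(self):
        """Fetch raw data (filesystem, URL, Google Drive)."""
        raise NotImplementedError

    def process(self, raw):
        """Turn raw data into something a template can use."""
        return raw

    def get_context(self):
        """Return a dict of variables to add to the site context."""
        return self.process(self.load())


class JSONStringSource(DataSource):
    """Toy source that parses a JSON string passed in as an option."""

    def load(self):
        return self.options["text"]

    def process(self, raw):
        return {"data": json.loads(raw)}


source = JSONStringSource(text='{"headline": "Hello"}')
context = source.get_context()
```

A custom source would only need to override `load` and, optionally, `process`; how the returned dict reaches the site context is left open here.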

Custom backends would need a way to tell Tarbell how to find them, sort of like Flask extensions. Here are a couple of possible workflows:

# tarbell_config.py
import custom_backend
custom_backend.register()

CUSTOM_BACKEND_URL = "http://example.com/data.json"

The custom_backend package would register itself with Tarbell in tarbell_config.py, and then use any settings defined there to fetch and process data.
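
Roughly, the registration approach could work through a module-level registry. This is only a sketch: `register_source`, `get_sources` and the `CustomBackend` class are hypothetical names, and a plain dict stands in for whatever settings object Tarbell would pass along.

```python
# Hypothetical registry that Tarbell could keep; these names are assumptions.
_registry = {}


def register_source(name, source_class):
    """Record a data source class under a unique name."""
    if name in _registry:
        raise ValueError(f"data source {name!r} is already registered")
    _registry[name] = source_class


def get_sources():
    """Return a copy of the registered sources."""
    return dict(_registry)


class CustomBackend:
    """Stand-in for the class a custom_backend package would provide."""

    def __init__(self, settings):
        self.url = settings.get("CUSTOM_BACKEND_URL")


def register():
    """What custom_backend.register() might do behind the scenes."""
    register_source("custom_backend", CustomBackend)


register()

# Later, Tarbell would instantiate each registered source with project settings.
settings = {"CUSTOM_BACKEND_URL": "http://example.com/data.json"}
sources = {name: cls(settings) for name, cls in get_sources().items()}
```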

Another approach:

# tarbell_config.py

DATA_SOURCE = 'custom_backend.CustomBackend'
CUSTOM_BACKEND_URL = "http://example.com/data.json"

This would use a string pointing to a data source class, which Tarbell would import and instantiate. I'm not sure which approach makes more sense.
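
For the string approach, the import step could be done with the standard library's `importlib`. The helper name `import_string` is an assumption, and the demo resolves a standard library class rather than a real `custom_backend.CustomBackend`, which doesn't exist.

```python
from importlib import import_module


def import_string(dotted_path):
    """Resolve 'package.module.ClassName' to the class it names."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted path")
    module = import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"module {module_path!r} has no attribute {class_name!r}"
        ) from exc


# Demo with a standard library class standing in for a data source.
DATA_SOURCE = "collections.OrderedDict"
source_class = import_string(DATA_SOURCE)
source = source_class()
```

The tradeoff is that import errors surface when Tarbell loads the config, not when the user writes it.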

Ideally, a project could have multiple data sources. I might want to write my long text article in a Google Doc while storing tabular data in a spreadsheet. The trick here is making sure variables don't collide when added to site context.
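
One way to handle collisions would be to refuse to overwrite a variable and say which source set it first. A rough sketch, where `merge_contexts` is a hypothetical helper and each source is assumed to return a plain dict:

```python
def merge_contexts(named_contexts):
    """Merge (source_name, context) pairs, refusing to overwrite a key."""
    merged = {}
    owners = {}  # key -> name of the source that set it
    for source_name, context in named_contexts:
        for key, value in context.items():
            if key in owners:
                raise KeyError(
                    f"{key!r} from {source_name!r} collides with {owners[key]!r}"
                )
            merged[key] = value
            owners[key] = source_name
    return merged


context = merge_contexts([
    ("spreadsheet", {"headline": "Hello", "rows": [1, 2, 3]}),
    ("doc", {"body": "Long article text"}),
])
```

Failing loudly is just one policy; last-write-wins with a warning would be another.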

Finally, we should decide how much processing is composable. If I pull down a spreadsheet, I may want to process it into Agate tables, or turn it into JSON, or something else. Maybe I want to filter it or do additional processing. Or maybe this is something to handle per-project in tarbell_config.py.
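
If processing were composable, it might look like a chain of small functions applied in order. This is a sketch under assumptions: `pipeline`, `rows_to_dicts` and `only_published` are made-up names, and the rows are a toy stand-in for a downloaded spreadsheet.

```python
import json
from functools import reduce


def pipeline(*steps):
    """Build a function that feeds data through each step in order."""
    def run(data):
        return reduce(lambda acc, step: step(acc), steps, data)
    return run


def rows_to_dicts(rows):
    """Treat the first row as a header and turn the rest into dicts."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def only_published(records):
    """Example filter step: keep rows marked as published."""
    return [r for r in records if r.get("published") == "yes"]


# Parse, filter, then serialize to JSON.
process = pipeline(rows_to_dicts, only_published, json.dumps)

rows = [
    ["title", "published"],
    ["First", "yes"],
    ["Draft", "no"],
]
result = process(rows)
```

A project could then define its own step list in tarbell_config.py, which keeps the per-project option open.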

@eyeseast
Contributor Author

eyeseast commented May 6, 2016

Here's another option for discovering custom sources: importing a class and instantiating it on a blueprint or app, much like a Flask extension. That would solve both discovery and configuration.

# tarbell_config.py
from flask import Blueprint
from custom_source import CustomSource

blueprint = Blueprint('project', __name__)
CustomSource(blueprint)

# settings as usual
CUSTOM_BACKEND_URL = "http://example.com/data.json"

This is a little less magical than using custom_source.register() (which would have to do imports in the background) but also asks a little more of users. I have to assume that if you're using a custom data source, you're advanced and writing a little Python is OK.
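
For reference, the source class itself could be quite small. In this sketch, `CustomSource` is hypothetical and deliberately doesn't import Flask: it only assumes the object it receives has an `app_context_processor` method, which Flask's `Blueprint` provides. The `custom` context key and the `data` attribute are also assumptions.

```python
class CustomSource:
    """Hypothetical data source configured like a Flask extension."""

    def __init__(self, blueprint=None):
        self.data = {}
        if blueprint is not None:
            self.init_blueprint(blueprint)

    def init_blueprint(self, blueprint):
        # Blueprint.app_context_processor registers a function whose
        # returned dict is merged into the template context app-wide.
        blueprint.app_context_processor(self.get_context)

    def get_context(self):
        """Expose this source's data to templates under one key."""
        return {"custom": self.data}
```

Because the context function is registered on the blueprint, the source could read settings like CUSTOM_BACKEND_URL lazily, once the project config has been fully loaded.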

Separate but related: If multiple data sources are allowed, how do they interact? The current spreadsheet backend adds global variables to template context (values sheet and sheet names), so other sources need to be careful not to overwrite or be overwritten.
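
One possible answer is to leave the spreadsheet's globals alone and put every other source under its own name. A sketch, assuming each source yields a dict; `build_context` and the `sources` key are hypothetical and would themselves need to be reserved:

```python
def build_context(spreadsheet_context, extra_sources, namespace="sources"):
    """Keep spreadsheet variables global; nest other sources by name."""
    if namespace in spreadsheet_context:
        raise KeyError(f"{namespace!r} is already used by the spreadsheet")
    context = dict(spreadsheet_context)
    # Copy so later changes to extra_sources don't leak into the context.
    context[namespace] = dict(extra_sources)
    return context


context = build_context(
    {"headline": "From the values sheet"},
    {"doc": {"headline": "From a Google Doc"}},
)
```

Templates would then read `headline` for the spreadsheet value and `sources.doc.headline` for the other one, so neither overwrites the other.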

@eyeseast
Contributor Author

eyeseast commented May 6, 2016

One way to make this all easier: include common data types by default. The requests I've seen, and things we've used at FRONTLINE, come down to three things:

Whatever the discovery and configuration process, including those by default probably solves most people's issues without ever having to add a new source.
