Initial documentation fleshed out, refs #21
Also removed cogapp since I do not need it.
simonw committed Nov 25, 2023
1 parent cb97052 commit 944df4a
Showing 8 changed files with 194 additions and 18 deletions.
10 changes: 2 additions & 8 deletions Justfile
@@ -18,18 +18,12 @@
   echo "Linters..."
   echo " Black"
   pipenv run black . --check
-  echo " cog"
-  pipenv run cog --check \
-    README.md docs/*.md
   echo " ruff"
   pipenv run ruff .

-# Rebuild docs with cog
-@cog:
-  pipenv run cog -r *.md docs/*.md

 # Serve live docs on localhost:8000
-@docs: cog
+@docs:
   rm -rf docs/_build
   cd docs && pipenv run make livehtml

@@ -38,7 +32,7 @@
   pipenv run black .

 # Run automatic fixes
-@fix: cog
+@fix:
   pipenv run ruff . --fix
   pipenv run black .

140 changes: 140 additions & 0 deletions docs/developing.md
@@ -0,0 +1,140 @@
# Developing a new enrichment

Enrichments are implemented as Datasette plugins.

An enrichment plugin should implement the `register_enrichments()` plugin hook, which should return a list of instances of subclasses of the `Enrichment` base class.

The function can also return an awaitable function which returns that list of instances. This is useful if your plugin needs to do some asynchronous work before it can return the list of enrichments.

## The plugin hook

Your enrichment plugin should register new enrichments using the `register_enrichments()` plugin hook:

```python
from datasette import hookimpl

@hookimpl
def register_enrichments():
    # MyEnrichment is your Enrichment subclass, described below
    return [MyEnrichment()]
```

`register_enrichments()` can optionally accept a `datasette` argument. This can then be used to read plugin configuration or run database queries.

The plugin hook can return an awaitable function if it needs to do some asynchronous work before it can return the list of enrichments, for example:

```python
from datasette import hookimpl

@hookimpl
def register_enrichments(datasette):
    async def inner():
        db = datasette.get_database("mydb")
        settings = [
            row["setting"]
            for row in await db.execute(
                "select setting from special_settings"
            )
        ]
        return [
            MyEnrichment(setting)
            for setting in settings
        ]
    return inner
```
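
To see why both return styles can work, here is a minimal sketch of how calling code could normalise either one. `resolve_hook_result()`, `plain_hook()` and `async_hook()` are hypothetical names used only for this illustration; this is not the plugin's actual implementation.

```python
import asyncio
import inspect


async def resolve_hook_result(result):
    """Return the final value whether `result` is a list, a function or an awaitable."""
    # A returned function is called first...
    if callable(result):
        result = result()
    # ...and anything awaitable (such as a coroutine) is then awaited
    if inspect.isawaitable(result):
        result = await result
    return result


def plain_hook():
    # Style 1: return the list directly
    return ["enrichment-a"]


def async_hook():
    # Style 2: return an awaitable function that produces the list
    async def inner():
        return ["enrichment-b"]

    return inner


print(asyncio.run(resolve_hook_result(plain_hook())))
print(asyncio.run(resolve_hook_result(async_hook())))
```

In both cases the caller ends up with a plain list, which is why a plugin can choose whichever style suits it.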

## Enrichment subclasses

Most of the code you write will be in a subclass of `Enrichment`:

```python
from datasette_enrichments import Enrichment

class MyEnrichment(Enrichment):
    name = "Name of My Enrichment"
    slug = "my-enrichment"
    description = "One line description of what it does"
```

The `name`, `slug` and `description` attributes are required. They are used to display information about the enrichment to users.

Try to ensure your `slug` is unique among all of the other enrichments your users might have installed.

You can also set a `batch_size` attribute. This defaults to 100 but you can set it to another value to control how many rows are passed to your `enrich_batch()` method at a time. You may want to set it to 1 to process rows one at a time.
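
The effect of `batch_size` can be pictured with a small stand-alone sketch. `iter_batches()` is a hypothetical helper, not part of datasette-enrichments; it only illustrates how a set of rows would be divided into the lists that each `enrich_batch()` call receives.

```python
from typing import Iterator, List


def iter_batches(rows: List[dict], batch_size: int = 100) -> Iterator[List[dict]]:
    """Yield successive slices of `rows`, each at most `batch_size` long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start : start + batch_size]


# 250 rows with the default batch size produce three batches
rows = [{"id": i} for i in range(250)]
batch_lengths = [len(batch) for batch in iter_batches(rows, batch_size=100)]
print(batch_lengths)  # [100, 100, 50]
```

With `batch_size=1` the same 250 rows would produce 250 single-row batches, which is simpler to reason about but means many more method calls.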

### initialize()

Your class can optionally implement an `initialize()` method. This will be called once at the start of each enrichment run.

This method is often used to prepare the database - for example, adding a new table column that the enrichment will then populate.

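```python
from datasette.app import Datasette
from datasette.database import Database

async def initialize(
    self,
    datasette: Datasette,
    db: Database,
    table: str,
    config: dict
):
    ...  # Preparation logic goes here
```

- `datasette` is the [Datasette instance](https://docs.datasette.io/en/stable/internals.html#datasette-class).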
- `db` is the [Database instance](https://docs.datasette.io/en/stable/internals.html#database-class) for the database that the enrichment is being run against.
- `table` is the name of the table.
- `config` is a dictionary of configuration options that the user set for the enrichment, using the configuration form (if one was provided).
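
As a sketch of the "add a column if it is missing" pattern mentioned above, the helper below could be called from `initialize()`. `ensure_column()` is a hypothetical name; it assumes `db` offers the `table_columns()` and `execute_write()` methods of Datasette's `Database` class, and that the table and column names contain no double quotes.

```python
async def ensure_column(db, table: str, column: str, column_type: str = "text") -> bool:
    """Add `column` to `table` if it does not exist. Returns True if it was added."""
    existing = await db.table_columns(table)
    if column in existing:
        return False
    # Identifiers are wrapped in double quotes; names containing a double
    # quote would need extra escaping (not handled in this sketch)
    await db.execute_write(
        'alter table "{}" add column "{}" {}'.format(table, column, column_type)
    )
    return True
```

Inside `initialize()` this might be used as `await ensure_column(db, table, config["output_column"])`, where `output_column` is a key your own configuration form would need to supply.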

### enrich_batch()

You must implement the following method:

```python
from typing import List

from datasette.app import Datasette
from datasette.database import Database

async def enrich_batch(
    self,
    datasette: Datasette,
    db: Database,
    table: str,
    rows: List[dict],
    pks: List[str],
    config: dict,
    job_id: int,
):
    ...  # Enrichment logic goes here
```

This method will be called multiple times, each time with a different list of rows.

It should perform whatever enrichment logic is required, using the `db` object ([documented here](https://docs.datasette.io/en/stable/internals.html#database-class)) to write any results back to the database.

`enrich_batch()` is an `async def` method, so you can use `await` within the method to perform asynchronous operations such as HTTP calls ([using HTTPX](https://www.python-httpx.org/async/)) or database queries.

The arguments passed to `enrich_batch()` are as follows:

- `datasette` is the [Datasette instance](https://docs.datasette.io/en/stable/internals.html#datasette-class). You can use this to read plugin configuration, check permissions, render templates and more.
- `db` is the [Database instance](https://docs.datasette.io/en/stable/internals.html#database-class) for the database that the enrichment is being run against. You can use this to execute SQL queries against the database.
- `table` is the name of the table that the enrichment is being run against.
- `rows` is a list of dictionaries for the current batch, each representing a row from the table. These are the same shape as JSON dictionaries returned by the [Datasette JSON API](https://docs.datasette.io/en/stable/json_api.html). The batch size defaults to 100 but can be customized by your class.
- `pks` is a list of primary key column names for the table.
- `config` is a dictionary of configuration options that the user set for the enrichment, using the configuration form (if one was provided).
- `job_id` is a unique integer ID for the current job. This can be used to log additional information about the enrichment execution.
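
The following sketch shows one way `rows` and `pks` could be combined to write a result back to the database. `write_value()` is a hypothetical helper, not part of the plugin; it assumes `db.execute_write()` accepts a SQL string and a list of parameters, that each row dictionary contains its primary key columns, and that identifiers contain no double quotes.

```python
from typing import Any, List


async def write_value(
    db, table: str, row: dict, pks: List[str], column: str, value: Any
) -> None:
    """Store `value` in `column` for the row identified by its primary key values."""
    # Build a where clause such as: "id" = ?  (or several clauses for compound keys)
    where = " and ".join('"{}" = ?'.format(pk) for pk in pks)
    sql = 'update "{}" set "{}" = ? where {}'.format(table, column, where)
    # The new value comes first, followed by the primary key values in order
    await db.execute_write(sql, [value] + [row[pk] for pk in pks])
```

Within `enrich_batch()` you could then loop over `rows`, compute a value for each one and call `await write_value(db, table, row, pks, "my_column", value)`, where `my_column` is a placeholder for whichever column your enrichment populates.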

### get_config_form()

The `get_config_form()` method can optionally be implemented to return a [WTForms](https://wtforms.readthedocs.io/) form class that the user can use to configure the enrichment.

This example defines a form with two fields: a `template` text area field and an `output_column` single line input:

```python
from wtforms import Form, StringField, TextAreaField
from wtforms.validators import DataRequired

# ...
    async def get_config_form(self, db, table):
        # Assumption: the original snippet does not show where `default`
        # is defined; an empty string is used here as a placeholder
        default = ""

        class ConfigForm(Form):
            template = TextAreaField(
                "Template",
                description="Template to use",
                default=default,
            )
            output_column = StringField(
                "Output column name",
                description="The column to store the output in - will be created if it does not exist.",
                validators=[DataRequired(message="Column is required.")],
                default="template_output",
            )

        return ConfigForm
```

The validated dictionary produced by submitting this form will be passed as `config` to both the `initialize()` and `enrich_batch()` methods.
20 changes: 19 additions & 1 deletion docs/index.md
@@ -2,9 +2,27 @@

Datasette Enrichments is a plugin for [Datasette](https://datasette.io/) that adds support for enriching data in different ways.

An **enrichment** is a bulk operation that can be applied to a set of rows from a table, executing code for each of those rows.

Potential use-cases for enrichments include:

- Geocoding an address and populating a latitude and longitude column
- Executing a template to generate output based on the values in each row
- Fetching data from a URL and populating a column with the result
- Executing OCR against a linked image or PDF file

Each enrichment is implemented as its own plugin.

The Datasette ecosystem includes a growing number of enrichment plugins. They are also intended to be easy to write.

## Table of contents

```{toctree}
---
maxdepth: 3
---
setup
usage
permissions
developing
```
12 changes: 12 additions & 0 deletions docs/permissions.md
@@ -0,0 +1,12 @@
# Configuring permissions

Enrichments are only available to users with the `enrichments` permission.

By default the `root` user has this permission. You can start Datasette like this to sign in as that user:

```bash
datasette mydb.db --root
```
Then click the link provided to sign in as root.

To use enrichments in a deployed instance of Datasette, you can use an authentication plugin such as [datasette-auth-github](https://datasette.io/plugins/datasette-auth-github) or [datasette-auth-passwords](https://datasette.io/plugins/datasette-auth-passwords) to authenticate users, then grant the `enrichments` permission to those users using [permissions in datasette.yaml](https://docs.datasette.io/en/latest/authentication.html#access-permissions-in-datasette-yaml), or one of the permissions plugins.
6 changes: 0 additions & 6 deletions docs/requirements.txt

This file was deleted.

4 changes: 3 additions & 1 deletion docs/setup.md
@@ -1,4 +1,4 @@
-# Setup
+# Installation and setup

To install the Datasette Enrichments plugin, run this:
```bash
@@ -10,3 +10,5 @@ You need to install additional plugins for enrichments that you want to use befo
datasette install datasette-enrichments-jinja
```
Users with the `enrichments` permission (or the `--root` user) will then be able to select rows for enrichment using the cog actions menu on the table page.

Once you have installed an enrichment you can {ref}`run it against some data<usage>`.
17 changes: 17 additions & 0 deletions docs/usage.md
@@ -0,0 +1,17 @@
(usage)=
# Running an enrichment

Enrichments are run against data in a Datasette table.

They can be executed against every row in a table, or you can filter that table first and then run the enrichment against the filtered results.

Start on the table page and filter it to just the set of rows that you want to enrich. Then click the cog icon near the table heading and select "Enrich selected data".

This will present a list of available enrichments, provided by plugins that have been installed.

Select the enrichment you want to run to see a form with options to configure for that enrichment.

Enter your settings and click "Enrich data" to start the enrichment running.

Enrichments can take between several seconds and several minutes to run, depending on the number of rows and the complexity of the enrichment.

3 changes: 1 addition & 2 deletions setup.py
@@ -34,14 +34,13 @@ def get_long_description():
     entry_points={"datasette": ["enrichments = datasette_enrichments"]},
     install_requires=["datasette>=1.0a7", "WTForms"],
     extras_require={
-        "test": ["pytest", "pytest-asyncio", "black", "cogapp", "ruff"],
+        "test": ["pytest", "pytest-asyncio", "black", "ruff"],
         "docs": [
             "sphinx==7.2.6",
             "furo==2023.9.10",
             "sphinx-autobuild",
             "sphinx-copybutton",
             "myst-parser",
-            "cogapp",
         ],
     },
     package_data={"datasette_enrichments": ["templates/*"]},
