Initial documentation fleshed out, refs #21

Also removed cogapp since I do not need it.
datasette · Nov 25, 2023 · 944df4a
1 parent cb97052
commit 944df4a

Showing 8 changed files with 194 additions and 18 deletions.
diff --git a/Justfile b/Justfile
@@ -18,18 +18,12 @@
   echo "Linters..."
   echo "  Black"
   pipenv run black . --check
-  echo "  cog"
-  pipenv run cog --check \
-    README.md docs/*.md
   echo "  ruff"
   pipenv run ruff .
 
-# Rebuild docs with cog
-@cog:
-  pipenv run cog -r *.md docs/*.md
 
 # Serve live docs on localhost:8000
-@docs: cog
+@docs:
   rm -rf docs/_build
   cd docs && pipenv run make livehtml
 
@@ -38,7 +32,7 @@
   pipenv run black .
 
 # Run automatic fixes
-@fix: cog
+@fix:
   pipenv run ruff . --fix
   pipenv run black .
 

diff --git a/docs/developing.md b/docs/developing.md
@@ -0,0 +1,140 @@
+# Developing a new enrichment
+
+Enrichments are implemented as Datasette plugins.
+
+An enrichment plugin should implement the `register_enrichments()` plugin hook, which should return a list of instances of subclasses of the `Enrichment` base class.
+
+The function can also return an awaitable function which returns that list of instances. This is useful if your plugin needs to do some asynchronous work before it can return the list of enrichments.
+
+## The plugin hook
+
+Your enrichment plugin should register new enrichments using the `register_enrichments()` plugin hook:
+
+```python
+from datasette import hookimpl
+
+@hookimpl
+def register_enrichments():
+    return [MyEnrichment()]
+```
+`register_enrichments()` can optionally accept a `datasette` argument. This can then be used to read plugin configuration or run database queries.
+
+The plugin hook can return an awaitable function if it needs to do some asynchronous work before it can return the list of enrichments, for example:
+
+```python
+@hookimpl
+def register_enrichments(datasette):
+    async def inner():
+        db = datasette.get_database("mydb")
+        settings = [
+            row["setting"]
+            for row in await db.execute(
+                "select setting from special_settings"
+            )
+        ]
+        return [
+            MyEnrichment(setting)
+            for setting in settings
+        ]
+    return inner
+```
+
+## Enrichment subclasses
+
+Most of the code you write will be in a subclass of `Enrichment`:
+
+```python
+from datasette_enrichments import Enrichment
+
+class MyEnrichment(Enrichment):
+    name = "Name of My Enrichment"
+    slug = "my-enrichment"
+    description = "One line description of what it does"
+```
+The `name`, `slug` and `description` attributes are required. They are used to display information about the enrichment to users.
+
+Try to ensure your `slug` is unique among all of the other enrichments your users might have installed.
+
+You can also set a `batch_size` attribute. This defaults to 100 but you can set it to another value to control how many rows are passed to your `enrich_batch()` method at a time. You may want to set it to 1 to process rows one at a time.
+
+### initialize()
+
+Your class can optionally implement an `initialize()` method. This will be called once at the start of each enrichment run.
+
+This method is often used to prepare the database - for example, adding a new table column that the enrichment will then populate.
+
+```python
+async def initialize(
+    self,
+    datasette: Datasette,
+    db: Database,
+    table: str,
+    config: dict
+): ...  # Optional setup logic goes here
+```
+- `datasette` is the [Datasette instance](https://docs.datasette.io/en/stable/internals.html#datasette-class).
+- `db` is the [Database instance](https://docs.datasette.io/en/stable/internals.html#database-class) for the database that the enrichment is being run against.
+- `table` is the name of the table.
+- `config` is a dictionary of configuration options that the user set for the enrichment, using the configuration form (if one was provided).
+
+### enrich_batch()
+
+You must implement the following method:
+
+```python
+async def enrich_batch(
+    self,
+    datasette: Datasette,
+    db: Database,
+    table: str,
+    rows: List[dict],
+    pks: List[str],
+    config: dict,
+    job_id: int,
+):
+    ...  # Enrichment logic goes here
+```
+This method will be called multiple times, each time with a different list of rows.
+
+It should perform whatever enrichment logic is required, using the `db` object ([documented here](https://docs.datasette.io/en/stable/internals.html#database-class)) to write any results back to the database.
+
+`enrich_batch()` is an `async def` method, so you can use `await` within the method to perform asynchronous operations such as HTTP calls ([using HTTPX](https://www.python-httpx.org/async/)) or database queries.
+
+The arguments passed to `enrich_batch()` are as follows:
+
+- `datasette` is the [Datasette instance](https://docs.datasette.io/en/stable/internals.html#datasette-class). You can use this to read plugin configuration, check permissions, render templates and more.
+- `db` is the [Database instance](https://docs.datasette.io/en/stable/internals.html#database-class) for the database that the enrichment is being run against. You can use this to execute SQL queries against the database.
+- `table` is the name of the table that the enrichment is being run against.
+- `rows` is a list of dictionaries for the current batch, each representing a row from the table. These are the same shape as JSON dictionaries returned by the [Datasette JSON API](https://docs.datasette.io/en/stable/json_api.html). The batch size defaults to 100 but can be customized by your class.
+- `pks` is a list of primary key column names for the table.
+- `config` is a dictionary of configuration options that the user set for the enrichment, using the configuration form (if one was provided).
+- `job_id` is a unique integer ID for the current job. This can be used to log additional information about the enrichment execution.
+
+### get_config_form()
+
+The `get_config_form()` method can optionally be implemented to return a [WTForms](https://wtforms.readthedocs.io/) form class that the user can use to configure the enrichment.
+
+This example defines a form with two fields: a `template` text area field and an `output_column` single line input:
+```python
+from wtforms import Form, StringField, TextAreaField
+from wtforms.validators import DataRequired
+
+# ...
+    async def get_config_form(self, db, table):
+        default = ""  # placeholder for the default template text
+        class ConfigForm(Form):
+            template = TextAreaField(
+                "Template",
+                description='Template to use',
+                default=default,
+            )
+            output_column = StringField(
+                "Output column name",
+                description="The column to store the output in - will be created if it does not exist.",
+                validators=[DataRequired(message="Column is required.")],
+                default="template_output",
+            )
+
+        return ConfigForm
+```
+The dictionary of validated data produced when the user fills in this form will be passed as `config` to both the `initialize()` and `enrich_batch()` methods.
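The `enrich_batch()` docs above say results are written back through `db`, and that `pks` lists the table's primary key columns. Below is a minimal stdlib sketch of that write-back step, assuming the enrichment stores one computed value per row in the `output_column` chosen on the config form. `build_update()` is a hypothetical helper, and the in-memory `sqlite3` connection only stands in for the Datasette `Database` object; in a real plugin the statement would more likely be sent with `await db.execute_write(sql, params)`.

```python
import sqlite3
from typing import Any, List, Tuple


def quote(identifier: str) -> str:
    # Quote a SQLite identifier, doubling any embedded double quotes
    return '"' + identifier.replace('"', '""') + '"'


def build_update(
    table: str, pks: List[str], row: dict, column: str, value: Any
) -> Tuple[str, list]:
    # Hypothetical helper: an UPDATE that targets exactly one row by
    # matching every primary key column (so compound keys work too)
    where = " and ".join(f"{quote(pk)} = ?" for pk in pks)
    sql = f"update {quote(table)} set {quote(column)} = ? where {where}"
    params = [value] + [row[pk] for pk in pks]
    return sql, params


# In-memory stand-in for the database the enrichment runs against
conn = sqlite3.connect(":memory:")
conn.execute("create table places (id integer primary key, address text)")
conn.executemany(
    "insert into places (id, address) values (?, ?)",
    [(1, "1 Main St"), (2, "2 High St")],
)
# The kind of column an initialize() method might add before the run
conn.execute("alter table places add column address_upper text")

config = {"output_column": "address_upper"}  # as if from the config form
pks = ["id"]
cursor = conn.execute("select id, address from places")
rows = [dict(zip(("id", "address"), r)) for r in cursor.fetchall()]

for row in rows:
    sql, params = build_update(
        "places", pks, row, config["output_column"], row["address"].upper()
    )
    conn.execute(sql, params)

print(conn.execute("select address_upper from places order by id").fetchall())
# [('1 MAIN ST',), ('2 HIGH ST',)]
```

Matching on every column in `pks` rather than assuming a single `id` column keeps the update correct for tables with compound primary keys.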
diff --git a/docs/index.md b/docs/index.md
@@ -2,9 +2,27 @@
 
 Datasette Enrichments is a plugin for [Datasette](https://datasette.io/) that adds support for enriching data in different ways.
 
+An **enrichment** is a bulk operation that can be applied to a set of rows from a table, executing code for each of those rows.
+
+Potential use-cases for enrichments include:
+
+- Geocoding an address and populating a latitude and longitude column
+- Executing a template to generate output based on the values in each row
+- Fetching data from a URL and populating a column with the result
+- Executing OCR against a linked image or PDF file
+
+Each enrichment is implemented as its own plugin.
+
+The Datasette ecosystem includes a growing number of enrichment plugins. They are also intended to be easy to write.
+
+## Table of contents
+
 ```{toctree}
 ---
 maxdepth: 3
 ---
 setup
-```
+usage
+permissions
+developing
+```
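Because each enrichment ships as its own plugin, something has to gather every plugin's `register_enrichments()` result. Per docs/developing.md, each result is either a list of instances or a function that returns an awaitable resolving to that list. A minimal sketch of normalizing both shapes, assuming the hook results arrive as one entry per plugin (which is how pluggy reports a hook call). `collect_enrichments()` and `FakeEnrichment` are hypothetical; Datasette has a similar general-purpose helper, `await_me_maybe()`, in `datasette.utils`.

```python
import asyncio
import inspect
from typing import Any, List


async def collect_enrichments(hook_results: List[Any]) -> List[Any]:
    # Hypothetical: flatten every plugin's register_enrichments() result
    enrichments: List[Any] = []
    for result in hook_results:
        if callable(result):
            # An async function such as `inner` must be called first...
            result = result()
        if inspect.isawaitable(result):
            # ...and the coroutine it returns awaited to get the list
            result = await result
        enrichments.extend(result)
    return enrichments


class FakeEnrichment:
    # Stand-in for an instance of an Enrichment subclass
    def __init__(self, slug: str):
        self.slug = slug


def sync_plugin():
    # Plugin that returns its list directly
    return [FakeEnrichment("sync-one")]


def async_plugin():
    # Plugin that returns an async function, as in the docs example
    async def inner():
        await asyncio.sleep(0)  # e.g. a database query in a real plugin
        return [FakeEnrichment("async-one"), FakeEnrichment("async-two")]

    return inner


found = asyncio.run(collect_enrichments([sync_plugin(), async_plugin()]))
print([e.slug for e in found])  # ['sync-one', 'async-one', 'async-two']
```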
diff --git a/docs/permissions.md b/docs/permissions.md
@@ -0,0 +1,12 @@
+# Configuring permissions
+
+Enrichments are only available to users with the `enrichments` permission.
+
+By default the `root` user has this permission. You can execute Datasette like this to sign in with that user:
+
+```bash
+datasette mydb.db --root
+```
+Then click the link provided to sign in as root.
+
+To use enrichments in a deployed instance of Datasette, you can use an authentication plugin such as [datasette-auth-github](https://datasette.io/plugins/datasette-auth-github) or [datasette-auth-passwords](https://datasette.io/plugins/datasette-auth-passwords) to authenticate users, then grant the `enrichments` permission to those users using [permissions in datasette.yaml](https://docs.datasette.io/en/latest/authentication.html#access-permissions-in-datasette-yaml), or one of the permissions plugins.
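Granting the `enrichments` permission in `datasette.yaml` relies on Datasette's allow blocks, which are matched against the signed-in actor dictionary. Below is a simplified, stdlib-only approximation of those documented rules: `true`/`false` allow or deny outright, a key's value (or list of values) must match the actor, `"*"` matches any actor that has that key, and anonymous users match only `unauthenticated: true`. It is meant to show what a given block would let through and is not Datasette's own implementation; `matches_allow()` is hypothetical and the actor IDs are illustrative.

```python
from typing import Optional, Union


def matches_allow(actor: Optional[dict], allow: Union[bool, dict]) -> bool:
    # Simplified approximation of Datasette's allow-block rules
    if isinstance(allow, bool):
        # allow: true / allow: false short-circuit everything else
        return allow
    if actor is None:
        # Anonymous users only match an explicit unauthenticated: true
        return allow.get("unauthenticated") is True
    for key, wanted in allow.items():
        if wanted == "*" and key in actor:
            # Wildcard: any actor that has this key at all
            return True
        wanted_values = wanted if isinstance(wanted, list) else [wanted]
        actor_value = actor.get(key)
        if actor_value is None:
            continue
        actor_values = actor_value if isinstance(actor_value, list) else [actor_value]
        # A match on any single value is enough
        if set(actor_values) & set(wanted_values):
            return True
    return False


# Example allow block; the IDs are illustrative
allow = {"id": ["simonw", "cleopaws"]}
print(matches_allow({"id": "simonw"}, allow))        # True
print(matches_allow({"id": "someone-else"}, allow))  # False
print(matches_allow(None, allow))                    # False
print(matches_allow({"id": "anyone"}, {"id": "*"}))  # True
```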
diff --git a/docs/requirements.txt b/docs/requirements.txt
diff --git a/docs/setup.md b/docs/setup.md
@@ -1,4 +1,4 @@
-# Setup
+# Installation and setup
 
 To install the Datasette Enrichments plugin, run this:
 ```bash
@@ -10,3 +10,5 @@ You need to install additional plugins for enrichments that you want to use befo
 datasette install datasette-enrichments-jinja
 ```
 Users with the `enrichments` permission (or the `--root` user) will then be able to select rows for enrichment using the cog actions menu on the table page.
+
+Once you have installed an enrichment you can {ref}`run it against some data<usage>`.
diff --git a/docs/usage.md b/docs/usage.md
@@ -0,0 +1,17 @@
+(usage)=
+# Running an enrichment
+
+Enrichments are run against data in a Datasette table.
+
+They can be executed against every row in a table, or you can filter that table first and then run the enrichment against the filtered results.
+
+Start on the table page and filter it to just the set of rows that you want to enrich. Then click the cog icon near the table heading and select "Enrich selected data".
+
+This will present a list of available enrichments, provided by plugins that have been installed.
+
+Select the enrichment you want to run to see a form with options to configure for that enrichment.
+
+Enter your settings and click "Enrich data" to start the enrichment running.
+
+Enrichments can take between several seconds and several minutes to run, depending on the number of rows and the complexity of the enrichment.
+
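Run time grows with the number of selected rows because the rows are handed to the enrichment in batches. A rough model of that loop, assuming the rows are sliced into chunks of the enrichment's `batch_size` (default 100, per docs/developing.md) and `enrich_batch()` is awaited once per chunk. `run_enrichment()` and `CountingEnrichment` are hypothetical, the simplified `enrich_batch()` takes only `rows` (the documented method also receives `datasette`, `db`, `table`, `pks`, `config` and `job_id`), and the plugin's real runner may differ, for example by paging rows out of the database instead of holding them all in memory.

```python
import asyncio
from typing import List


class CountingEnrichment:
    # Hypothetical stand-in for an Enrichment subclass
    batch_size = 100  # the documented default

    def __init__(self):
        self.batch_sizes_seen: List[int] = []

    async def enrich_batch(self, rows: List[dict]) -> None:
        # A real enrichment would geocode, call an API, etc. here
        self.batch_sizes_seen.append(len(rows))


async def run_enrichment(enrichment, rows: List[dict]) -> int:
    # Slice the selected rows into batch_size chunks, one await per chunk
    calls = 0
    size = enrichment.batch_size
    for start in range(0, len(rows), size):
        await enrichment.enrich_batch(rows[start : start + size])
        calls += 1
    return calls


rows = [{"id": i} for i in range(250)]
enrichment = CountingEnrichment()
calls = asyncio.run(run_enrichment(enrichment, rows))
print(calls, enrichment.batch_sizes_seen)  # 3 [100, 100, 50]
```

Under this model, setting `batch_size = 1` turns 250 rows into 250 separate `enrich_batch()` calls, which is why slow per-row work such as HTTP requests dominates the total run time.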
diff --git a/setup.py b/setup.py
@@ -34,14 +34,13 @@ def get_long_description():
     entry_points={"datasette": ["enrichments = datasette_enrichments"]},
     install_requires=["datasette>=1.0a7", "WTForms"],
     extras_require={
-        "test": ["pytest", "pytest-asyncio", "black", "cogapp", "ruff"],
+        "test": ["pytest", "pytest-asyncio", "black", "ruff"],
         "docs": [
             "sphinx==7.2.6",
             "furo==2023.9.10",
             "sphinx-autobuild",
             "sphinx-copybutton",
             "myst-parser",
-            "cogapp",
         ],
     },
     package_data={"datasette_enrichments": ["templates/*"]},