Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mechanism for specifying that an Enrichment subclass uses a secret #46

Closed
simonw opened this issue Apr 25, 2024 · 18 comments
Closed

Mechanism for specifying that an Enrichment subclass uses a secret #46

simonw opened this issue Apr 25, 2024 · 18 comments
Labels
enhancement New feature or request

Comments

@simonw
Copy link
Collaborator

simonw commented Apr 25, 2024

Split from:

@simonw
Copy link
Collaborator Author

simonw commented Apr 25, 2024

Here's how datasette-enrichments-gpt does that at the moment: https://github.com/datasette/datasette-enrichments-gpt/blob/1353a46f08f73e9b18a58115d74730fdab7427ad/datasette_enrichments_gpt/__init__.py#L32-L116

    async def get_config_form(self, datasette, db, table):
        class ConfigForm(Form):
            # Select box to pick model from gpt-3.5-turbo or gpt-4-turbo
            model = SelectField(
                "Model",
                choices=[
                    ("gpt-3.5-turbo", "gpt-3.5-turbo"),
                    ("gpt-4-turbo", "gpt-4-turbo"),
                    ("gpt-4-vision", "gpt-4-turbo vision"),
                ],
                default="gpt-3.5-turbo",
            )
            # ... more stuff like that

        def stash_api_key(form, field):
            if not (field.data or "").startswith("sk-"):
                raise ValidationError("API key must start with sk-")
            if not hasattr(datasette, "_enrichments_gpt_stashed_keys"):
                datasette._enrichments_gpt_stashed_keys = {}
            key = secrets.token_urlsafe(16)
            datasette._enrichments_gpt_stashed_keys[key] = field.data
            field.data = key

        class ConfigFormWithKey(ConfigForm):
            api_key = PasswordField(
                "API key",
                description="Your OpenAI API key",
                validators=[
                    DataRequired(message="API key is required."),
                    stash_api_key,
                ],
                render_kw={"autocomplete": "off"},
            )

        plugin_config = datasette.plugin_config("datasette-enrichments-gpt") or {}
        api_key = plugin_config.get("api_key")

        return ConfigForm if api_key else ConfigFormWithKey

Then it accesses that stashed key here:

https://github.com/datasette/datasette-enrichments-gpt/blob/1353a46f08f73e9b18a58115d74730fdab7427ad/datasette_enrichments_gpt/__init__.py#L229-L244

def resolve_api_key(datasette, config):
    plugin_config = datasette.plugin_config("datasette-enrichments-gpt") or {}
    api_key = plugin_config.get("api_key")
    if api_key:
        return api_key
    # Look for it in config
    api_key_name = config.get("api_key")
    if not api_key_name:
        raise ApiKeyError("No API key reference found in config")
    # Look it up in the stash
    if not hasattr(datasette, "_enrichments_gpt_stashed_keys"):
        raise ApiKeyError("No API key stash found")
    stashed_keys = datasette._enrichments_gpt_stashed_keys
    if api_key_name not in stashed_keys:
        raise ApiKeyError("No API key found in stash for {}".format(api_key_name))
    return stashed_keys[api_key_name]

@simonw simonw added the enhancement New feature or request label Apr 25, 2024
@simonw
Copy link
Collaborator Author

simonw commented Apr 25, 2024

Since I'm encouraging integration with datasette-secrets anyway, I'm going to allow Enrichment subclasses to indicate that they need an API key. If they do that, the API key form trick will be applied for them.

One potential design:

class GptEnrichment(Enrichment):
    name = "AI analysis with OpenAI GPT"
    slug = "gpt"
    description = "Analyze data using OpenAI's GPT models"
    runs_in_process = True
    batch_size = 1
    needs_api_key = "OPENAI_API_KEY"
    needs_api_key_label = "OpenAI API Key"
    needs_api_key_obtain_url = "https://..."

Or... since we'll be using register_secrets() from https://github.com/datasette/datasette-secrets for this anyway, maybe we reuse that datasette_secrets.Secret() object here? That would be pretty neat.

@simonw simonw changed the title Utility function to help implement the stashed API key pattern Utility function to help implement API key patterns Apr 25, 2024
@simonw
Copy link
Collaborator Author

simonw commented Apr 26, 2024

I'm going to experiment with secret = Secret(...) - and then a pattern where register_secrets() returns those secrets.

@simonw
Copy link
Collaborator Author

simonw commented Apr 26, 2024

Maybe don't even tell plugins they have to call register_secrets() themselves - the datasette-enrichments plugin could automatically return ANY secrets property of a subclass as part of its own register_secrets() hook implementation.

@simonw
Copy link
Collaborator Author

simonw commented Apr 26, 2024

This depends on datasette-secrets and that depends on datasette>=1.0a13 so this can only be in an alpha release for the moment.

@simonw
Copy link
Collaborator Author

simonw commented Apr 26, 2024

... which means the entire enrichments ecosystem is about to go alpha-only, which isn't ideal.

@simonw
Copy link
Collaborator Author

simonw commented Apr 26, 2024

OK, datasette-secrets is now compatible with <1.0 and >=1.0a Datasette.

@simonw
Copy link
Collaborator Author

simonw commented Apr 27, 2024

Writing the documentation first - documentation-driven development.

@simonw
Copy link
Collaborator Author

simonw commented Apr 27, 2024

Here's that initial documentation: https://github.com/datasette/datasette-enrichments/blob/5e06d06db0572880668e3dd8e6fbde8b04a9db73/docs/developing.md#enrichments-that-use-secrets-such-as-api-keys

And the code example (which won't work yet):

from datasette_enrichments import Enrichment
from datasette_secrets import Secret

# Then later in your enrichments class
class TrainEnthusiastsEnrichment(Enrichment):
    name = "Train Enthusiasts"
    slug = "train-enthusiasts"
    description = "Enrich with extra data from the Train Enthusiasts API"
    secret = Secret(
        name="TRAIN_ENTHUSIASTS_API_KEY",
        description="An API key from train-enthusiasts.doesnt.exist",
        obtain_url="https://train-enthusiasts.doesnt.exist/api-keys",
        obtain_label="Get an API key"
    )

@simonw
Copy link
Collaborator Author

simonw commented Apr 27, 2024

The register_secrets() bit will be handled by the datasette-enrichments plugin itself, no need for actual enrichments plugin to register their secret, it will be picked up from secret = on all subclasses of Enrichment.

@simonw
Copy link
Collaborator Author

simonw commented Apr 27, 2024

I'll use this pattern for that:

class BaseClass:
    subclasses = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.subclasses.append(cls)

@simonw
Copy link
Collaborator Author

simonw commented Apr 27, 2024

I tested this and it works! The secrets show up in the /-/secrets interface as they should.

Running test server like this at the moment:

export KEY=$(datasette secrets generate-encryption-key)
datasette *.db -p 8026 --root \
  -s plugins.datasette-secrets.encryption-key $KEY \
  -s permissions.manage-secrets.id root \
  --secret 1

CleanShot 2024-04-26 at 18 20 30@2x

I applied this change to my local copy of datasette-enrichments-gpt:

diff --git a/datasette_enrichments_gpt/__init__.py b/datasette_enrichments_gpt/__init__.py
index 84e7ac4..15501df 100644
--- a/datasette_enrichments_gpt/__init__.py
+++ b/datasette_enrichments_gpt/__init__.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 from datasette_enrichments import Enrichment
+from datasette_secrets import Secret
 from datasette import hookimpl
 from datasette.database import Database
 import httpx
@@ -28,6 +29,7 @@ class GptEnrichment(Enrichment):
     description = "Analyze data using OpenAI's GPT models"
     runs_in_process = True
     batch_size = 1
+    secret = Secret("OPENAI_API_KEY_FOR_ENRICHMENTS")
 
     async def get_config_form(self, datasette, db, table):
         columns = await db.table_columns(table)

@simonw
Copy link
Collaborator Author

simonw commented Apr 27, 2024

Next step: make it so this works - it should try get_secret() and, if it fails, it should apply that pattern where it adds an extra _secret key to the form.

@simonw
Copy link
Collaborator Author

simonw commented Apr 27, 2024

CleanShot 2024-04-26 at 19 19 49@2x

Need to think a bit more about how that field is displayed. Also the field doesn't do anything yet.

@simonw
Copy link
Collaborator Author

simonw commented Apr 27, 2024

Better:

CleanShot 2024-04-26 at 22 11 33@2x

That's for:

class GptEnrichment(Enrichment):
    name = "AI analysis with OpenAI GPT"
    slug = "gpt"
    description = "Analyze data using OpenAI's GPT models"
    runs_in_process = True
    batch_size = 1
    secret = Secret(
        name="OPENAI_API_KEY",
        description="OpenAI API key",
        obtain_label="Get an OpenAI API key",
        obtain_url="https://platform.openai.com/api-keys",
    )

simonw added a commit that referenced this issue Apr 27, 2024
@simonw simonw changed the title Utility function to help implement API key patterns Mechanism for specifying that an Enrichment subclass uses a secret Apr 27, 2024
@simonw
Copy link
Collaborator Author

simonw commented Apr 27, 2024

Next step is to figure out how to have an Enrichment subclass use the resolved secret. I think an await self.get_secret() method might be the way to do that.

@simonw
Copy link
Collaborator Author

simonw commented Apr 27, 2024

Everything is working now, needs tests before I land it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant