Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating providers - backend #744

Closed
koenedaele opened this issue Jan 4, 2023 · 7 comments
Closed

Creating providers - backend #744

koenedaele opened this issue Jan 4, 2023 · 7 comments

Comments

@koenedaele
Copy link
Member

koenedaele commented Jan 4, 2023

This is a major update that will allow creating new providers without editing code or configuration as part of #724

We will split provider creation from conceptscheme editing so we can have a technical admin that can configure providers while regular users can edit conceptscheme metadata. This will also allow separating internal providers (with an editable conceptscheme) from external providers (such as Getty where the conceptscheme is uneditable)

Reading of providers will be handled in #790:

  • GET /providers: Read all configured providers
  • GET /providers/{id}: Read a single configured provider

Tasks:

  • Create a providers table in the DB
  • Create CRUD endpoints for providers:
    • POST /providers: Register a new SQLAlchemyProvider, and create a conceptscheme for it as well unless a conceptscheme_id is provided.
    • PUT /providers/{id}: Update a configured provider
    • DELETE /providers/{id}: Delete a configured provider. Main question is, does this delete the conceptscheme and all concepts as well? Makes it a very dangerous operation.
  • Update https://github.com/OnroerendErfgoed/atramhasis/blob/develop/atramhasis/skos/__init__.py#L22 to read the configured providers from the DB

Creating a provider, immediately creates a conceptscheme as well. Changing the conceptscheme_id once it's been set is impossible.

Payload for a creating a SQLAlchemyprovider:

{
   'id': 'ERFGOEDTYPES', // string, required on edit, if not set on create will be equal to the id of the created conceptscheme, passes to provider.metadata.id
   'conceptscheme_uri': 'https://id.erfgoed.net/thesauri/conceptschemes' // string, required on create, will be used to create the conceptscheme, will be ignored on edit and is immutable once created (unless edited in database for people who really know what they're doing)
   'uri_pattern': 'https://id.erfgoed.net/thesauri/erfgoedtypes/%s', // string, required, needs to contain one %s, will be passed to a UriPatternGenerator at provider.uri_generator, should be not be changed once concepts have been created unless you really know what you're doing, perhaps add a warning in the UI
   'default_language': 'nl-BE', // string, not required, if not present or None don't pass to provider, if present must be exactly one IANA language string, passes to provider.metadata.default_language
   'subject': [], // list of strings, not required, if not present or None don't pass to provider, if present must be a list of strings, only valid value at the moment is 'hidden'
   'force_display_language: None, // string, not required, if not present or None don't pass to provider, if present must be exactly one IANA language string, , passes to provider.metadata.atramhasis.force_display_language,
   'id_generation_strategy: 'numeric, // string, not required, if not present or None don't pass to provider, if present must be one of numeric, guid or manual, passes to provider.metadata.atramhasis.id_generation_strategy,
}

In the DB this will require a table provider with the fields:

  • id, string, PK
  • conceptscheme_id, integer, required, FK to conceptscheme.id
  • uri_pattern, string, required
  • default_language: string, not required
  • subject: either a list serialised as string or something like an array, not required
  • force_display_language: string, not required
  • id_generation_strategy: string, not required or required and defaults to numeric

An alternative would be to have a table with an id field and put some or all of the config in a JSON field to make it easy to extend the configuration. Might work well since we don't really need to query for providers having language X or strategy Y. Maybe we'd query for providers with subject hidden, but I don't think so. And the number of providers in one instance is generally small, so reading them one by one is not a big deal. So, open to suggestions. If we do add the dataset attribute back in (see further down) I do expect that to be a JSON field.

So, to create a new provider, the smallest POST possible is:

{
  'conceptscheme_uri': https://id.erfgoed.net/thesauri/conceptschemes',
  'uri_pattern': 'https://id.erfgoed.net/thesauri/erfgoedtypes/%s'
}

In this case a new provider will be created, the id of the provider will be set to the DB id of the created conceptscheme (cast to a string), a urigenerator will be initialised and a conceptscheme will be created with the provided uri.

A more common way to create a provider would be:

{
  'id': 'ERFGOEDTYPES',
  'conceptscheme_uri': https://id.erfgoed.net/thesauri/conceptschemes',
  'uri_pattern': 'https://id.erfgoed.net/thesauri/erfgoedtypes/%s'
}

In this case a provider will be created with id 'ERFGOEDTYPES', a urigenerator will be initialised and a conceptscheme will be created with the provided uri.

Initialising the providers on a request will have to call a function that reads providers from the DB. As opposed to earlier deisgn, manual config of SQLAlchemyProviders will no longer be possible. Manual config is still necessary for external providers. So, we still expect the user to have a create_registry function, but for most users the default version will never need to be edited and will look something like:

def create_registry(request):
    try:
        registry = Registry(instance_scope='threaded_thread')
        
        registry = register_providers_from_db(registry, request)

        return registry
    except Exception as e:
        // handle errors

Someone who wants to add Getty would have something like:

def create_registry(request):
    try:
        registry = Registry(instance_scope='threaded_thread')
        
        registry = register_providers_from_db(registry, request)

        # use 'subject': ['external'] for read only external providers
        # (only available in REST service)

        getty_session = CacheControl(requests.Session(), heuristic=ExpiresAfter(weeks=1))

        aat = AATProvider(
            {'id': 'AAT', 'subject': ['external']},
            session=getty_session
        )
        registry.register_provider(aat)

        return registry
    except Exception as e:
        // handle errors

One drawback of our new implementation is that working with void datasets is no longer supported. However, this is an experimental feature that was never advertised to end users, so we're not really breaking anything. If someone absolutely wants to retain BC, they can still choose not to use the register_providers_from_db method to fall back to the old way. Or somehow add those old datasets somewhere after initialisation. Version 2.0.0 will deprecate the void.ttl support. A later version will try to integrate something like DCAT. So, for this feature it's easier to ignore it.

We will also need an upgrade path for version 1.0.0 users, something like a script that creates a provider in the database for every existing conceptscheme.

@Wim-De-Clercq
Copy link
Contributor

Wim-De-Clercq commented Mar 22, 2023

My current proposal for the model would look like this:

class Provider(Base):
    id = Column(String, primary_key=True)
    conceptscheme_id = Column(
        Integer,
        ForeignKey(ConceptScheme.id),
        nullable=False,
    )  # metadata.conceptscheme_id or metadata.id
    metadata = Column(JSON, nullable=False)
    # + other things not in metadata like uri pattern.

    # For every known field which is taken out of metadata, define a hybrid_property
    @hybrid_property
    def default_language(self):
        return self.metadata.get("default_language")

    @default_language.setter
    def default_language(self, value):
        self.metadata["default_language"] = value

    # Same for expand_strategy, subject, atramhasis.force_display_language, atramhasis.id_generation_strategy

I would add metadata to the JSON, all fields which are already part of the JSON object will be removed from this metadata.
For example, if the metadata contains {"default_language": "nl", "extra": "test"} then the provider JSON would contain

{
  "default_language": "nl",
  ...
  "metadata": {
    "extra": "test"
  }
}

If people were to send default_language in metadata, this would raise a bad request.


1 more thing which is not backwards compatible is the urigenerator, which could be anything and will now be exclusively a UriPatternGenerator

@koenedaele
Copy link
Member Author

I know about the uripatterngenerator. Don't think we've ever see a use for another one? Other option would be to pass both the generator and it's arguments, but that seems like it can get complicated quickly?

Only other sensible form of uri patterns I can think of are uri's that are basically the document's url with and attached #id. Something like https://thesaurus.onroerenderfgoed.be/conceptschemes/TREES/c/1#id. Can be done with a pattern or with a generator that uses the pyramid routing to generate a url. That last one would be the cleanest solution I think, but more work than I want to handle right now.

@koenedaele
Copy link
Member Author

@JonasGe and @Wim-De-Clercq What do we do about the DELETE provider?

  • Option 1: Delete the provider, but retain the conceptscheme and concepts. But since there's no way to create a new provider for an existing conceptscheme, this seems rather pointless.
  • Option 2: Don't delete the provider, but make it non-functional? Something like giving at a subject or some other kind of flag that it shouldn't be loaded by Atramhasis?
  • Option 3: Have it delete the provider and cascade to the conceptscheme and concepts. Makes it a dangerous operation to perform.

All in all, I think option 3 is the best one. I would like to see different permissions for CUD providers than creating concepts. Or is this something that's completely up to the implementor? I would assume we have editors who can edit a conceptscheme and create, edit and delete concepts and collections and admins who can create, edit and delete providers. Reading providers is allowed to all I would say.

@JonasGe
Copy link
Member

JonasGe commented Mar 24, 2023

I also think option 3 is the best. The documentation (and probably a comment block in the code) should clearly reflect that this is an irreversible action which will cascade delete every associated conceptscheme and concepts.

@koenedaele
Copy link
Member Author

koenedaele commented Mar 24, 2023

Once this feature and #732 land, we'll need to review all the docs anyway. A lot has changed.

@Wim-De-Clercq
Copy link
Contributor

Wim-De-Clercq commented Mar 24, 2023

yes 3 is the best, the other 2 seem rather pointless. Option 2 could be achieved by disabling the view with permissions or something. It's better to have the option.

I would like to see different permissions for CUD providers than creating concepts

it's partially up to us. We have to define unique permission names on the views for the provider permissions vs the concept permissions.

We do, however, not implement a security factory. So by default these permissions are ignored.
But an implementor can add one, and over there define which role gets which permission.

@cedrikv
Copy link
Contributor

cedrikv commented May 9, 2023

@Wim-De-Clercq I get an error when trying to delete a provider (created with the initializedb script):
2023-05-09 15:53:41,484 WARNI [atramhasis.views.exception_views][waitress-2] (sqlite3.IntegrityError) NOT NULL constraint failed: concept.conceptscheme_id
[SQL: UPDATE concept SET conceptscheme_id=? WHERE concept.id = ?]
[parameters: ((None, 14), (None, 15), (None, 16), (None, 17), (None, 18), (None, 19), (None, 20), (None, 21)  ... displaying 10 of 73 total bound parameter sets ...  (None, 85), (None, 86))]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
2023-05-09 15:53:41,486 INFO  [pyramid_debugtoolbar][waitress-2] Squashed sqlalchemy.exc.IntegrityError at http://0.0.0.0:6543/providers/3
traceback url: http://0.0.0.0:6543/_debug_toolbar/313339373538393639333739353230/exception

image

@cedrikv cedrikv added the test label May 9, 2023
vancamti added a commit that referenced this issue May 10, 2023
vancamti added a commit that referenced this issue May 16, 2023
@cedrikv cedrikv modified the milestones: Sprint 207, Sprint 208 May 22, 2023
vancamti added a commit that referenced this issue May 22, 2023
* #744 testfix delete provider with conceptscheme with concepts

* #744 fix unittests

* #744 review fix
koenedaele added a commit that referenced this issue May 25, 2023
* develop:
  Bump requests from 2.28.2 to 2.31.0 (#838)
  #833 providers json text to numeric (#835)
  #831 create via put (#836)
  #744 testfix delete provider with conceptscheme with concepts (#828)
@cedrikv cedrikv modified the milestones: Sprint 208, Sprint 2089, Sprint 209 Jun 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants