Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/dashboard: controller models proposal #29383

Draft
wants to merge 1 commit into
base: master
from

Conversation

@rjfd
Copy link
Contributor

commented Jul 29, 2019

This draft PR contains a new proposal for the improvement of the current dashboard backend infrastructure.

No implementation is done yet. The idea is to discuss this proposal here in this PR, and when the reviewers have approved it we can start with the implementation.

Signed-off-by: Ricardo Dias rdias@suse.com

mgr/dashboard: controller models proposal
Signed-off-by: Ricardo Dias <rdias@suse.com>
complex rules. Currently, such validation must be done manually in the endpoint method code, which
leads to a lot of boilerplate code, and most of the times, to be completely inexistent.

The purpose of this proposal is enhance the controller infrastructure with notion of data models

This comment has been minimized.

Copy link
@alfonsomthd

alfonsomthd Jul 30, 2019

Contributor

The request-response conversation is the fundamental process that drives all communication on the web.
Because of this, these models shouldn't be be called data models but "RequestModel / ResponseModel".
Examples:
Django: https://docs.djangoproject.com/en/2.2/ref/request-response/
Symfony: https://symfony.com/doc/current/introduction/http_fundamentals.html#symfony-request-object
Spring: https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/http/RequestEntity.html

@epuertat
Copy link
Contributor

left a comment

First of all, there's been a lot effort put here, so thanks a lot @rjfd for taking the time to do it!

My comments are mostly about:

  • Identifying and potential use of established patterns and practices: Model, DataMapper, DTO, primary keys, etc.
  • Naming/conventions. For example "Controller models" sounds a bit confusing to me, as controllers and models are quite opposite terms in web development. So IMHO this proposal mixes Controller concerns (handling input data) with Model concerns (type definition and validation).
  • HTTP 1.1 syntax & semantics.
  • Looking for a type-definition DSL (OpenAPI schema, JSON-schema) and aligning these to this proposal, and using any existing tools.
  • Designing first complex behaviours meeting requirements (top-down) rather than infrastructure (bottom-up). E.g.: design first Pagination/Versioning proposals and then coming up with a PaginatedController vs. providing a HeaderModel for a yet undecided Pagination/Mime-type versioning approach. I find it more focused on solving the technical issues than to provide groundwork for missing features that directly relate to controllers & models (routing, versioning, paginating, etc).
  • Related to the above: complexity hiding and tight coupling (HTTP responses, etc).

Nonetheless, I think this proposal is a great start for improving our codebase.

* `BodyModel`
* `QueryModel`
* `PathModel`
* `HeaderModel`

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 30, 2019

Contributor

I find misleading putting at the same level and with the Model suffix the: URL path, query string, body and HTTP headers, as they mean completely different things in HTTP 1.1 (OpenAPI DSL allows specified data sources other than body by means of in keyword, which allows also cookie as data source, but IMHO that's just to comply with and support as many existing APIs as possible):

  • URL Path acts as a the locator of a Resource/ResourceCollection endpoint.
  • Query string & HTTP Headers are representation directives/modifiers. Contrary to query string params, HTTP Headers are pretty standardized.
  • HTTP payload ("body") conveys Resource representations. HTTP payload must comply with the Content-Type specified in the HTTP headers.

This comment has been minimized.

Copy link
@rjfd

rjfd Jul 31, 2019

Author Contributor

From a controller's implementation perspective it's all data. What I did here was come up with a single abstraction for all these different sources of data that come in an HTTP request.
And as you mentioned, this also helps the automatic generation of the OpenAPI specification, which our swagger-ui page uses.

If you think this is not the best abstraction for this case, can you please suggest a different one so we can discuss other alternatives.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 31, 2019

Contributor

I think BodyModel would be essentially the Model part (aka DTO aka DataMapper aka...) in MVC/MVP/MVVM/... architectures (we mainly called them services so far).

The remaining data sources (Path, Query string and HTTP Headers) are a concern of the Presenter/Controller/ViewModel.

The name Model implies an abstract/base class (never a final class) that needs to be derived prior to be instantiated. However, I don't think that we should have/don't need such level of abstraction for URL Paths, HTTP Headers and Query strings. They would work just as concrete classes.

So my suggestion would be:

  • BodyModel becomes just Model, without any reference to how it's encoded (the Body part). We may later discuss whether it's a plain DTO or an smart one (ActiveRecord).
  • PathModel does not have much entity IMHO. Most REST frameworks provide templated expressions for specifying the data types (e.g.: /osd/<int:id>, /user/<str:id>), but I'd even go further and defer that validation to the Model, as technically the ID (and its type) is a concern of the Model, not the Presenter/Controller (even if we want to perform some kind of endpoint overloading based on data types: /user/1 vs. /user/self).
  • QueryModel and HeaderModel (this latter brings a special case) depend on the purpose of the specific params. If we want to use either to implement pagination, filtering, sorting, etc., I'd rather design and provide the higher-level constructs (e.g.: ControllerMixins: PaginableMixin, SortableMixin, etc.). If we need it we may end up implementing some HeaderParam or QuerystringParam classes (but only if those are really needed).
    • HeaderModel special case: HTTP 1.1 leverages Headers for providing with content negotiation and other tools (e.g.: a legacy client sends a request to /api/osd under Content-Type: application/vnd.ceph.dashboard.osd.v1+json, while the current dashboard API has deprecated such version and only provides Content-Type: application/vnd.ceph.dashboard.osd.v2+json, so a 406 - Not Acceptable response is returned). Most REST frameworks provide helpers for this functionality:
@cherrypy.tools.accept(media='application/vnd.ceph.dashboard.osd.v2+json')
def list(...):

The purpose of this proposal is enhance the controller infrastructure with notion of data models
(and please don't associate this notion with database models used to represent relational database
data in an object-oriented way) that allows to specify the structure of the data that is received

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 30, 2019

Contributor

I was then going to suggest you to change the term "Model" as it's a pretty standard term in MVC/MVVM/etc. architectures (same way as we're using Controller in a very specific case, while we could also call Controllers to our Cherrypy tools), but the truth is that the BodyModel is defining a Model with exactly the same syntax and semantics as most ORM/Web frameworks, but without the methods for retrieving/storing data from/to the back-end. I'd however suggest to change the Model suffix from everything else (Path, Headers, Query), as the term Model only applies to classes representing back-end entities.

This comment has been minimized.

Copy link
@rjfd

rjfd Jul 31, 2019

Author Contributor

In this case the term Model should not be associated with any backend entities. Here Model means a schema of the data that comes from the HTTP requests independent of the type of data.

Maybe change from Model to Schema?

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 31, 2019

Contributor

Models are exactly for decoupling back-end entities from Controllers & Views. If we create something else called "Model" that has nothing to do with the back-end entity, we will then need another "ModelModel" to do that. So we are basically over-engineering.

Model and HTTP should not be related topics at all. The Model is a data conveyor, while the final representation of those data (HTTP 1.1 JSON, Thrift RPC, gRPC, etc.) belongs to the scope of the Presenter/View.

user_id = Attr.String(
description="CephX user ID",
validator=(Val.NotEmpty(), Val.Length(64), ),

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 30, 2019

Contributor

As you are using inheritance, I'd leverage it and instead of validating a string by means of specific string validators, I'd overload it: Attr.String(..., max_length=None, allow_empty=True). Same would apply to other types. You may check JSON-schema or OpenAPI schema to get an idea on the kind of validations applied.

This comment has been minimized.

Copy link
@rjfd

rjfd Jul 31, 2019

Author Contributor

I don't have a strong opinion here. The benefits I see with the approach I propose is that we can add new kinds of validators without changing the attributes implementation.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 31, 2019

Contributor

My point is that we are mixing here 2 OOP strategies: inheritance (Attr.String) and composition (__init__(validator=[ ])). Lately, composition is preferred over inheritance, but the case of Attribute/Field/etc is the classic example where inheritance beats composition (most ORMs/Web frameworks do that).


```json
{
"export_id": 1,

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 30, 2019

Contributor

HTTP 1.1 states that the POST request returns a Location header pointing to the created resource. In fact, under strict HTTP 1.1, no entity should be returned; only a description of the response status (if status is 201).

Specifically for PUT partial updates, HTTP Header Prefer: return=[mimimal|representation] allows to specify whether the response should include or not a full representation of the updated entity.

This comment has been minimized.

Copy link
@rjfd

rjfd Jul 31, 2019

Author Contributor

The example I show in the document is to show how to create the model classes for the JSON data, it's not demonstrating how to "correctly" implement the POST or PUT methods in a REST API.

Indeed, the current dashboard REST API implementation needs a revision to make sure all REST endpoints have a consistent response according to the HTTP method invoked. But that's not an objective of this document.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 31, 2019

Contributor

Understood. I thought that given this is a refactoring of Controllers, and Controllers are in most part responsible for the HTTP conversation, we should ensure this is aligned to the current standards we are using.

Since the attributes of an export are a superset of the attributes to create an export we create
a `CreateExportModel` model class with all attributes of an export except for the `export_id`
attribute and create an `ExportModel` that extends `CreateExportModel` and adds the missing
attribute `export_id`. Extensability of models via class inheritance prevents code duplication.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 30, 2019

Contributor

No need of this IMHO. We may discuss, as with relational databases, whether the primary/unique key is part of the data model, but most OOP database patterns (ORM/Active Record/Data Mapper) expose the primary key in the Model. We may leave that defined with an Attribute whose property auto=true or using an Attr.Auto (like Django's AutoField).

Identifiers could also result from multiple attributes (composite keys, as in most NoSQL DBs). This example is taken from my PR on fixing the RBD latency issue (#28387), where RbdImages can be uniquely addressed by (pool_name, name):

class RbdImage(Resource):
    name = StaticAttribute(key=True)
    pool_name = StaticAttribute(key=True)
    ...

This comment has been minimized.

Copy link
@rjfd

rjfd Jul 31, 2019

Author Contributor

This example is to show that we can create a model class based on another and add more attributes. It's not particular related to indexes of data objects or keys.

As I said in the document, this models should not be confused with ORM models. There is no need for an attribute to generate a key sequence automatically.

The models in this document are just a way to specify the structure of the data (plus validation) that is transported by the HTTP requests and responses. Nothing more.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 31, 2019

Contributor

Sorry I don't understand why we would want to specify the data structure in the HTTP payload without taking into consideration the data structure we will use for generating/parsing that HTTP payload data. I think we are creating here an unnecessary layer. Given we are not owners/designers of the Ceph data structures, we should start modelling/validating data from the back-end to the front-end, and not enforcing structure at the opposite side.


In the `CreateExportModel` we can see how to include other models as attributes. The `fsal` class
attribute is assigned to the `Model` class attribute instance that receives as the first parameter
of the constructor the class of the model. In this case the `FsalModel`.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 30, 2019

Contributor

If this is for complex type definition and reuse, the proper way to do it would be by Attr.UDT (User-Defined Types). Models should be left for "queryable" entities (CRUD), while UDT's are not meant to be stand-alone entities.

However, if this attribute would represent another model with stand-alone back-end representation, I'd say this attribute should be a ForeignKey or a ModelReference, referenced by key/id, to make resources relationships explicit (IANA registry for rel=relations). Nested resources should be exposed via URL sub-resources:

# new user (returns user id: 13)
POST /users

# new user role
POST /users/13/roles

This comment has been minimized.

Copy link
@rjfd

rjfd Jul 31, 2019

Author Contributor

The models described in this document are very simple and are not supposed to take care of data normalization. So there is no need to over-engineer and use foreign keys to reference other models.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 31, 2019

Contributor

Entity relations/hierarchies are a cardinal part of HTTP 1.1/REST (URLs is an example of that idea), and also Ceph and Dashboard itself are full of them (Hosts - OSDs - Pools - RBD Images - RBD Snapshots - ...). A good proposal for the data modelling of a REST API should cover that part (I don't think it's over-engineer).

class FsalModel(BodyModel):
name = Attr.String(
description="Export path",
validator=Val.Enum("CEPH", "RGW"))

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 30, 2019

Contributor

As mentioned, we have considered adopting some DSL for type definition (JSON-schema or OpenAPI Schema Object, which is an extension to JSON-schema), in order to improve stability, quality and end-to-end testability/maintainability of the Dashboard API (both back-end + front-end).

If that's the case, we should try to evaluate existing tools and see whether this proposal is aligned to those. For example, you may check the kind of types and validations that exist in JSON-schema.

This comment has been minimized.

Copy link
@rjfd

rjfd Jul 31, 2019

Author Contributor

We are already using OpenAPI. Our swagger-ui page consumes an OpenAPI specification to generate the forms to try the REST API. And we are already automatically generating that OpenAPI specification. Therefore, yes, the objective is that this proposal should be aligned with the OpenAPI specification so that we can easily generate the specification.

I haven't yet gone through the OpenAPI documentation to check if there are incompatibilities with what I wrote in this proposal, but I intend to that after we reach an agreement that this is the way to go.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 31, 2019

Contributor

Great thanks. Probably you won't need a thorough reading of that, just the notions/syntax/semantics. IMHO it doesn't make sense to have a specification if it's not somehow enforced: by generating AND consuming that specification, we are ensuring we fully comply with our spec, and we'll avoid lots of code duplication (models in front-end and back-end which sometimes which are sometimes mismatching).

```python
class ExportKeyModel(PathModel):
cluster_id = Attr.String(
description="Cluster identifier")

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 30, 2019

Contributor

Most ORMs solve this by allowing you to define which specific attributes/fields/columns make up the primary key/unique identifier, so that the Controller can automatically query & retrieve instances by primary key.

By doing that you can also immediately get the __eq__ magic method implementation so you can have Model1 == Model2 checks. That also allows you to implement caching (as __eq__ and __hash__ are a must if you plan to store instances in a dictionary cache or WeakRef*Dict).

The thing here is whether we want to adopt a plain DataMapper/DTO pattern (with minimal/no logic: validation and that's all) or an ActiveRecord pattern, which allows for storing/retrieving objects and other duties. I think this proposal currently walks the DataMapper approach, but I personally would go with a smarter ActiveRecord-like way. Basically because the plain DataMapper approach provides IMHO a small gain with respect to what we have right now (data typing and input validation). If we don't cover the retrieval/storage logic, we will still need another abstraction/scattered code to deal with that. I don't think tight coupling would be a concern here, as long as the whole DataMapper/ActiveRecord approach is in itself a decoupling strategy.

This comment has been minimized.

Copy link
@rjfd

rjfd Jul 31, 2019

Author Contributor

Once again, this is just an example and not a methodology on how represent primary keys of data.

Regarding the DataMapper vs ActiveRecord topic, the ActiveRecord was what we had in openATTIC and it was really painful in several aspects, including development of new models and performace. The main problem is that the backend is not consuming data from a single kind of source (like a relational database, or a key-value store), it's getting data from completely different kinds of sources. Trying to create a one-size-fits-all solution does not work for this case because what works good for one kind of source does not work quite well with other kinds of sources.

This is why I state that the "models" in this proposal are only to be used to specify the structure of data of the endpoints and not in the services. I believe the best solution is what we currently have that is custom code for each service.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 31, 2019

Contributor

ORMs have cultivated a bad reputation of giving birth to bloated and underperforming solutions, but that claim doesn't differ too much from the ones that are blamed on: OOP/C++, relational databases, NoSQL databases, interpreted languages, distributed systems, etc. And here we're using a bunch of them.

That said, I've partaken and suffered in the past from the pains of badly designed/implemented ORM code. However, IMHO it had more to do with the lack of proper documentation, learning/knowledge sharing, and poor debuggability/logging/profiling groundworks, than to the ORM concept per se. I don't know your pains with OpenATTIC and Django, but I'd really like to know, as I find that kind of pain-sharing very enriching (and therapeutic).

I'm really sorry to disagree on the last part, but I don't think our current services implementation is the best we can have and do. With our current approach, as we lack a mapping between inbound and outbound data (something Models essentially mean to provide), we cannot (or hardly can) do essential things to ensure a performant API:

  • Lazy loading/fetching: only the data requested on the controller side is fetched from the back-end. A Model basically provides the best interface for that.
    • Without this it's really hard to implement pagination and sorting.
  • Fine-grained caching: per-piece-of-information caching with different policies and expiration times.
infrastructure abstraction.
To fix this problem we can get the headers data the same way as we get other types of data of an
HTTP request, using a model class.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 30, 2019

Contributor

I don't like too much the cherrypy thread-local singleton approach but headers are easily available from every controller by cherrypy.request.headers and cherrypy.serving.response.headers. Is that not working?

On having headers fed as parameters to the controller, while some MVCs like Spring follow this practice, it makes limited sense to me. What do we want headers for:

  • Pagination: ok, in that case let's come with PaginatedController class which provides the whole functionality, rather than just the validated input parameters.
  • Mime-type versioning: same thing, probably the controller should act as a router to the versioned model (NFSGaneshaExportModel["v1"] or NFSGaneshaExportModel["v2"]).
  • ??

I don't mean it's not useful to have per-header validation, but I'd rather design the abstractions after the core use-case we want to have (pagination, mime-type versioning, etc).

This comment has been minimized.

Copy link
@rjfd

rjfd Jul 31, 2019

Author Contributor

Currently there is no controller in the dashboard code that uses the headers.
One reason I add the HeaderModel to the proposal was because it fitted well in the abstraction of representing any kind of data as a "model". Another reason is that by completely abstracting the web server technology we use below our "controller infrastrcuture" we could in theory change from using cherrypy to another solution without changing most of the code.

Regarding the way we will implement pagination is out of the scope of this proposal.

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 31, 2019

Contributor

So that means we are open to other web frameworks? I though we didn't want to move from Cherrypy.

IMHO pagination should be part of this proposal as it's a hot topic.

def list(self) -> Response:
# build body model
# build header model
return Response(header=header, body=body)

This comment has been minimized.

Copy link
@epuertat

epuertat Jul 30, 2019

Contributor

One thing I really like about most web frameworks (cherrypy included) is that they provide efficient abstractions for the HTTP Headers/Request/Response workflow. Instead of worrying about a per-controller customization of HTTP Headers, Responses, etc., I think we should focus on cross-controller behaviours: if we are able to map certain Exceptions/return codes to: Headers and HTTP response codes, we don't need this kind of low-level/highly-coupled tuning at controller basis.

This comment has been minimized.

Copy link
@rjfd

rjfd Jul 31, 2019

Author Contributor

if we are able to map certain Exceptions/return codes to: Headers and HTTP response codes, we don't need this kind of low-level/highly-coupled tuning at controller basis.

We are already doing that. The example with the Response class is just to be complete and have a solution for the rare situation where you need for some reason to customize the response headers, body, and return code.

In 99% of the cases you will just return a BodyModel instance in the endpoint method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.