MultiReference and ReferenceFilter concept #336

Closed
mingard opened this Issue Oct 12, 2017 · 17 comments

mingard commented Oct 12, 2017

Overview

I have been looking into the most effective way of introducing some flexibility into the way content creators build articles. Previously we've used a layout field concept which required a rather verbose collection schema and a hook in API. It was also very difficult to edit outside of Publish as the format was complex.

Example setup

In this setup, we are using collections to define modular parts of a page.

Other Collections

  • Articles (our primary)
  • Galleries
  • Competition Forms
  • Blog posts
    ... + a lot more

MultiReference concept

If we want to allow the editor to link galleries and blog posts to an article, we need to add a separate Reference field for each. This is fine if we're only using a few, but things get complicated if the list gets long. On top of that, the interface in Publish gets rather messy, with a lot of rarely used Reference fields being displayed. A MultiReference field would instead let a single field point at documents from several collections:

{
  "modules": {
    "type": "MultiReference",
    "label": "Modules",
    "settings": {
      "collections": ["galleries", "blog_posts", "competition_forms"]
    }
  }
}

ReferenceFilter concept

Disclaimer: This one steps into the datasource territory and I know it might seem a bit odd. I'll try my best to justify!

The current Reference field requires the user to supply the ObjectIds of the referenced documents. What it doesn't do is give them the flexibility to reference documents through a filter instead.

In this example, we're going to create a collection called blogmodules and we're going to use it in pages. Our first page is called News and we want to include a blog module.

The blog module collection has two fields.

  • Title e.g. 'Recent news'
  • Posts (ReferenceFilter)

{
  "posts": {
    "type": "ReferenceFilter",
    "label": "Posts",
    "settings": {
      "collection": "blog_posts",
      "filter": {
        "tags.handle": "news"
      }
    }
  }
}
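To make the idea concrete, here is a rough sketch of how a filter such as "tags.handle": "news" could be matched against documents. It is not how API resolves anything today: the names FilterDoc, valuesAtPath and matchesFilter are made up for this example, the documents live in a plain array, and only exact-value matches are handled.

```typescript
type FilterValue = string | number | boolean;
type FilterDoc = { [key: string]: unknown };

// Collects every value found at a dot-separated path, descending into arrays
// along the way (so "tags.handle" works when `tags` is an array of objects).
function valuesAtPath(value: unknown, path: string[]): unknown[] {
  if (path.length === 0) {
    return Array.isArray(value) ? value : [value];
  }
  if (Array.isArray(value)) {
    let found: unknown[] = [];
    for (const item of value) {
      found = found.concat(valuesAtPath(item, path));
    }
    return found;
  }
  if (typeof value === "object" && value !== null) {
    return valuesAtPath((value as FilterDoc)[path[0]], path.slice(1));
  }
  return [];
}

// A document matches when every filter path yields the expected value.
function matchesFilter(
  doc: FilterDoc,
  filter: { [path: string]: FilterValue }
): boolean {
  return Object.keys(filter).every(
    (path) => valuesAtPath(doc, path.split(".")).indexOf(filter[path]) !== -1
  );
}

// Hypothetical blog_posts documents.
const blogPosts: FilterDoc[] = [
  { title: "Launch day", tags: [{ handle: "news" }, { handle: "product" }] },
  { title: "Team offsite", tags: [{ handle: "culture" }] }
];

// Only "Launch day" carries a tag whose handle is "news".
const newsPosts = blogPosts.filter((post) =>
  matchesFilter(post, { "tags.handle": "news" })
);
```

In a real implementation the filter would presumably be handed to the database query rather than evaluated in memory.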

Why do this at API level and not in Web?

In most situations there's really no need. Datasources do a great job of formatting queries.

This feature simply gives a document editor the flexibility to dynamically reference content that relates to a post, without requiring a datasource to be created.

jimlambie commented Oct 12, 2017

Can we see a more detailed explanation of how MultiReference works? So far you've only made the collection property accept an array.

mingard commented Oct 12, 2017

Sure:

A single collection would just use the normal Reference field, so no examples there.

Specified collections example

{
  "modules": {
    "type": "MultiReference",
    "label": "Modules",
    "settings": {
      "collections": ["galleries", "blog_posts", "competition_forms"]
    }
  }
}

All collections example

{
  "modules": {
    "type": "MultiReference",
    "label": "Modules"
  }
}

Excluded collections example

{
  "modules": {
    "type": "MultiReference",
    "label": "Modules",
    "settings": {
      "excludeCollections": ["pages", "authors"]
    }
  }
}
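As a sketch only, the three variants could be reduced to one lookup that answers "which collections may this field point at?". Everything here is assumed for illustration: MultiReferenceSettings mirrors the proposed settings block, allowedCollections is a made-up helper, and the list of collection names stands in for whatever registry API keeps internally.

```typescript
// Hypothetical shape of the proposed MultiReference settings block.
interface MultiReferenceSettings {
  collections?: string[]; // allow-list ("Specified collections")
  excludeCollections?: string[]; // deny-list ("Excluded collections")
}

// Returns the collections a MultiReference field may point at.
function allowedCollections(
  allCollections: string[],
  settings: MultiReferenceSettings = {}
): string[] {
  if (settings.collections && settings.collections.length > 0) {
    // Allow-list: keep only names that actually exist.
    const wanted = settings.collections;
    return allCollections.filter((name) => wanted.indexOf(name) !== -1);
  }
  // No allow-list: everything, minus any excluded collections.
  const excluded = settings.excludeCollections || [];
  return allCollections.filter((name) => excluded.indexOf(name) === -1);
}

// Made-up registry of collection names.
const registry = [
  "articles",
  "galleries",
  "blog_posts",
  "competition_forms",
  "pages",
  "authors"
];

const specified = allowedCollections(registry, {
  collections: ["galleries", "blog_posts", "competition_forms"]
});
const everything = allowedCollections(registry);
const withoutPages = allowedCollections(registry, {
  excludeCollections: ["pages", "authors"]
});
```

What should happen when both collections and excludeCollections are set is not covered above; this sketch lets the allow-list win.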

mingard commented Oct 12, 2017

Rather than just inserting ObjectIds, it could accept an object.

Payload example option 1

{
  "title": "Foo",
  "modules": [
    {
      "collection": "galleries",
      "_id": "59df358f47884d2e1fda774e"
    },
    {
      "collection": "blog_posts",
      "_id": "58df358f47824d4e1fca774a"
    }
  ]
}

Payload example option 2

{
  "title": "Foo",
  "modules": [
    "galleries.59df358f47884d2e1fda774e",
    "blog_posts.58df358f47824d4e1fca774a"
  ]
}

jimlambie commented Oct 14, 2017

Option 1 above is where I was heading with this, too. It's more readable and requires less parsing of data.
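To illustrate the parsing cost of option 2, this is roughly what a consumer would need before it could use the string form. ModuleRef and parseModuleRef are invented names, and splitting on the last dot plus the 24-character hex check are assumptions, not agreed behaviour.

```typescript
// Option 1 shape: the collection and the ID are already separate.
interface ModuleRef {
  collection: string;
  _id: string;
}

// Splits an option 2 string such as "galleries.59df358f47884d2e1fda774e".
// The last dot is used as the separator so a collection name could itself
// contain dots; the hex check guards against malformed IDs.
function parseModuleRef(value: string): ModuleRef {
  const separator = value.lastIndexOf(".");
  const collection = value.slice(0, separator);
  const id = value.slice(separator + 1);
  if (separator < 1 || !/^[0-9a-fA-F]{24}$/.test(id)) {
    throw new Error(`Malformed reference: ${value}`);
  }
  return { collection, _id: id };
}

// The option 2 payload from above, converted to the option 1 shape.
const parsedModules = [
  "galleries.59df358f47884d2e1fda774e",
  "blog_posts.58df358f47824d4e1fca774a"
].map(parseModuleRef);
```

With option 1 none of this is needed, since the payload already arrives as objects.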

jimlambie commented Oct 14, 2017

"elements": [
  {
    "_id": "59e21f67ae114ddab6b4d7ee",
    "uid": "1-page",
    "title": "About Us",
    "template": "text",
    "url": "/about-us",
    "apiVersion": "1.0",
    "createdAt": 1507991399299,
    "createdBy": "testClient",
    "history": [],
    "v": 1
  },
  {
    "_id": "59e227464d908ce4cdf0c8b1",
    "uid": "23333",
    "template": "video",
    "title": "Video 1",
    "apiVersion": "1.0",
    "createdAt": 1507993414076,
    "createdBy": "testClient",
    "history": [],
    "v": 1
  }
],
"composed": {
  "elements": [
    {
      "collection": "pages",
      "_id": "59e21f67ae114ddab6b4d7ee"
    },
    {
      "collection": "videos",
      "_id": "59e227464d908ce4cdf0c8b1"
    }
  ]
}
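A minimal sketch of how a response of this shape could be assembled, assuming the stored value is a list of {collection, _id} pairs. The in-memory Store and the composeField helper are hypothetical and only mirror the example output above; they are not taken from the API codebase.

```typescript
interface ElementRef {
  collection: string;
  _id: string;
}
type StoredDoc = { _id: string; [key: string]: unknown };
type Store = { [collection: string]: StoredDoc[] };

interface ComposedField {
  elements: StoredDoc[]; // fully resolved documents, in reference order
  composed: { elements: ElementRef[] }; // which collection each one came from
}

// Resolves each reference against its own collection. References that
// cannot be resolved are skipped so both lists stay aligned.
function composeField(refs: ElementRef[], store: Store): ComposedField {
  const elements: StoredDoc[] = [];
  const resolved: ElementRef[] = [];
  for (const ref of refs) {
    const candidates = store[ref.collection] || [];
    const match = candidates.filter((doc) => doc._id === ref._id)[0];
    if (match) {
      elements.push(match);
      resolved.push(ref);
    }
  }
  return { elements, composed: { elements: resolved } };
}

// Made-up store holding the two documents from the example.
const sampleStore: Store = {
  pages: [{ _id: "59e21f67ae114ddab6b4d7ee", title: "About Us", template: "text" }],
  videos: [{ _id: "59e227464d908ce4cdf0c8b1", title: "Video 1", template: "video" }]
};

const composedField = composeField(
  [
    { collection: "pages", _id: "59e21f67ae114ddab6b4d7ee" },
    { collection: "videos", _id: "59e227464d908ce4cdf0c8b1" }
  ],
  sampleStore
);
```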

mingard commented Oct 14, 2017

That’s my preference too. Better to avoid straying from ObjectIds.

mingard commented Oct 14, 2017

Which field determines the collection to populate?

mingard commented Oct 14, 2017

Also, will whichever field is required for determining the collection be required with reference updates?

jimlambie commented Oct 14, 2017

Looks like I forgot to copy in the collection identifier when I returned the data. I can fix that.

mingard commented Oct 17, 2017

"Also, will whichever field is required for determining the collection be required with reference updates?"

@jimlambie what are you thinking for this?

eduardoboucas commented Mar 13, 2018

The discussion in #395 led me here. Here are my thoughts on how Reference fields could handle all the requirements we now have, including the question raised by @mingard on how single vs. multiple values are represented.

I propose a single field type (Reference) to hold all reference values. Single reference values are returned as objects, multiples are returned as arrays. API will sanitise each object accordingly.

To insert data into a reference field, there are two different methods available.

Method 1

The collection referenced by the field must be declared in the schema, in the settings.collection property. This is backward-compatible with the current implementation of API 2.0.

collection.books.json

{
  "author": {
    "type": "Reference",
    "settings": {
      "collection": "authors"
    }
  }
}
  • Inserting a book with a single existing author

    POST /1.0/test/books
    
    {
      "title": "Building cool APIs",
      "author": "59e227464d908ce4cdf0c8b1"
    }
  • Inserting a book with two existing authors

    POST /1.0/test/books
    
    {
      "title": "Building cool APIs",
      "author": [
        "59e227464d908ce4cdf0c8b1",
        "59e227464d908ce4cdf0c8b2"
      ]
    }
  • Inserting a book with a single author that doesn't yet exist

    POST /1.0/test/books
    
    {
      "title": "Building cool APIs",
      "author": {
        "name": "James Lambie"
      }
    }

    API will create a document in the authors collection with {"name": "James Lambie"}.

  • Inserting a book with two authors that don't yet exist

    POST /1.0/test/books
    
    {
      "title": "Building cool APIs",
      "author": [
        {
          "name": "James Lambie"
        },
        {
          "name": "Arthur Mingard"
        }
      ]
    }

    API will create two documents in the authors collection with {"name": "James Lambie"} and {"name": "Arthur Mingard"}.
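A rough sketch of how the four Method 1 cases could be told apart on insert: strings are treated as IDs of existing documents, objects as documents to create in the collection named by settings.collection. The names Method1Plan and planMethod1 are invented here, and nothing is written to a database.

```typescript
type NewDocument = { [key: string]: unknown };
type Method1Value = string | NewDocument | Array<string | NewDocument>;

interface Method1Plan {
  collection: string; // taken from settings.collection in the schema
  existingIds: string[]; // IDs to store as they are
  toCreate: NewDocument[]; // documents to insert into `collection` first
  multiple: boolean; // true when the caller sent an array
}

function planMethod1(value: Method1Value, collection: string): Method1Plan {
  const items: Array<string | NewDocument> = Array.isArray(value)
    ? value
    : [value];
  const plan: Method1Plan = {
    collection,
    existingIds: [],
    toCreate: [],
    multiple: Array.isArray(value)
  };
  for (const item of items) {
    if (typeof item === "string") {
      plan.existingIds.push(item);
    } else {
      plan.toCreate.push(item);
    }
  }
  return plan;
}

// Single existing author.
const singleExisting = planMethod1("59e227464d908ce4cdf0c8b1", "authors");

// Two authors that don't yet exist.
const twoNew = planMethod1(
  [{ name: "James Lambie" }, { name: "Arthur Mingard" }],
  "authors"
);
```

The multiple flag is what would let the response return an object for single values and an array otherwise.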

Method 2

This method does not rely on the field schema declaring the name of the referenced collection. Instead, it allows a single field to reference documents from multiple collections.

collection.books.json

{
  "author": {
    "type": "Reference"
  }
}
  • Inserting a book with a single existing author

    POST /1.0/test/books
    
    {
      "title": "Building cool APIs",
      "author": {
        "collection": "authors",
        "data": "59e227464d908ce4cdf0c8b1"
      }
    }
  • Inserting a book with two existing authors

    POST /1.0/test/books
    
    {
      "title": "Building cool APIs",
      "author": [
        {
          "collection": "authors",
          "data": "59e227464d908ce4cdf0c8b1"
        },
        {
          "collection": "authors",
          "data": "59e227464d908ce4cdf0c8b2"
        }
      ]
    }
  • Inserting a book with two existing authors from different collections

    POST /1.0/test/books
    
    {
      "title": "Building cool APIs",
      "author": [
        {
          "collection": "authors",
          "data": "59e227464d908ce4cdf0c8b1"
        },
        {
          "collection": "nonEnglishAuthors",
          "data": "59e227464d908ce4cdf0c8b3"
        },
        {
          "collection": "authors",
          "data": "59e227464d908ce4cdf0c8b2"
        }
      ]
    }
  • Inserting a book with a single author that doesn't yet exist

    POST /1.0/test/books
    
    {
      "title": "Building cool APIs",
      "author": {
        "collection": "authors",
        "data": {
          "name": "James Lambie"
        }
      }
    }

    API will create a document in the authors collection with {"name": "James Lambie"}.

  • Inserting a book with two authors that don't yet exist

    POST /1.0/test/books
    
    {
      "title": "Building cool APIs",
      "author": [
        {
          "collection": "authors",
          "data": {
            "name": "James Lambie"
          }
        },
        {
          "collection": "authors",
          "data": {
            "name": "Arthur Mingard"
          }
        }
      ]
    }

    API will create two documents in the authors collection with {"name": "James Lambie"} and {"name": "Arthur Mingard"}.

  • Inserting a book with two authors, from different collections, that don't yet exist

    POST /1.0/test/books
    
    {
      "title": "Building cool APIs",
      "author": [
        {
          "collection": "authors",
          "data": {
            "name": "James Lambie"
          }
        },
        {
          "collection": "nonEnglishAuthors",
          "data": {
            "name": "Eduardo Bouças"
          }
        }
      ]
    }

    API will create a document in the authors collection with {"name": "James Lambie"} and one in nonEnglishAuthors with {"name": "Eduardo Bouças"}.
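For Method 2, the same classification could be done per collection, since each item names its own. This is only a sketch: Method2Item mirrors the payloads above, while CollectionPlan and planMethod2 are hypothetical names and the grouping is one possible approach.

```typescript
type AuthorDoc = { [key: string]: unknown };

// One entry of a Method 2 payload: `data` is an ID or a new document.
interface Method2Item {
  collection: string;
  data: string | AuthorDoc;
}

interface CollectionPlan {
  existingIds: string[];
  toCreate: AuthorDoc[];
}

// Groups the work by target collection so each collection is touched once.
function planMethod2(
  value: Method2Item | Method2Item[]
): { [collection: string]: CollectionPlan } {
  const items: Method2Item[] = Array.isArray(value) ? value : [value];
  const plans: { [collection: string]: CollectionPlan } = {};
  for (const item of items) {
    if (!plans[item.collection]) {
      plans[item.collection] = { existingIds: [], toCreate: [] };
    }
    const data = item.data;
    if (typeof data === "string") {
      plans[item.collection].existingIds.push(data);
    } else {
      plans[item.collection].toCreate.push(data);
    }
  }
  return plans;
}

// An existing author mixed with a new one from a different collection.
const mixedPlan = planMethod2([
  { collection: "authors", data: "59e227464d908ce4cdf0c8b1" },
  { collection: "nonEnglishAuthors", data: { name: "Eduardo Bouças" } }
]);
```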

Notes

The decision to separate the name of the collection from the pre-composed document in Method 2, into the collection and data properties respectively, is based on:

  1. Removing the need for meta/prefixed fields

    If we were to inject the name of the collection into the body of the pre-composed document (e.g. {"_collection": "authors", "name": "John Doe"}), we'd need to make sure the property where the collection is defined doesn't clash with data from the document. We could introduce a prefix, but API 3.0 introduces the concept of configurable prefix characters, where it's possible to even remove prefixes completely. This makes this option a lot more complex and prone to issues.

  2. Easier for consumer applications

    For consumer applications that are inserting data into Publish, injecting a meta property means cloning an object (or mutating it by assigning a new property, which is probably a bad idea). It's easier to just wrap the pre-composed document in a parent object with a data property.
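The two points can be shown side by side. In this sketch injectMeta is the rejected approach and wrapDocument is the wrapping used by Method 2; both function names are made up, and _collection is the example property name from point 1.

```typescript
type PlainDoc = { [key: string]: unknown };

// Rejected option: inject a meta property into a copy of the document.
// A document that already has its own `_collection` field loses that value.
function injectMeta(doc: PlainDoc, collection: string): PlainDoc {
  return { ...doc, _collection: collection };
}

// Method 2 option: leave the document untouched and wrap it.
function wrapDocument(
  doc: PlainDoc,
  collection: string
): { collection: string; data: PlainDoc } {
  return { collection, data: doc };
}

// A document whose own data happens to use the `_collection` key.
const record: PlainDoc = { name: "John Doe", _collection: "first-editions" };

const injected = injectMeta(record, "authors"); // `_collection` is now "authors"
const wrapped = wrapDocument(record, "authors"); // original value is still intact
```

Wrapping also avoids the clone entirely: wrapped.data is the very same object the consumer passed in.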

mingard commented Mar 14, 2018

@eduardoboucas one scenario that drove the original request was the need to define multiple collections. Method 1 allows a single collection and Method 2 is unrestricted. Perhaps the ability to define an array of collections would be a third method. It’s more about restrictions in editing. Perhaps this could be a Publish setting, but it feels like a form of field validation to me, with an error thrown when an insert fails: Field ‘authors’ must be one of xxxxxxx.

Note that it could also be important to be able to define fields on a per-collection basis, and whilst this is something we can handle in a datasource when using Web, it might need to exist in other use cases.

eduardoboucas commented Mar 14, 2018

I see the restriction on the referenced collections as a validation rule. Not limited to Publish, but part of the new field-specific validation rules that we’ve been discussing for a while (which I’m hoping to progress in the next few days).

As for limiting the fields returned from the referenced documents, I’d rather do that in the existing fields parameter for consistency, where you’d define the fields of the various levels if you don’t want to get them all. We might need to introduce a special notation here, but I think it’s still worth doing it here rather than introducing a third method.

Would that be any good?

mingard commented Mar 14, 2018

@eduardoboucas how do you propose the validation rule be formatted? For example, how would I achieve this with validation?

{
  "author": {
    "type": "Reference",
    "settings": {
      "collections": ["authors", "people", "users"]
    }
  }
}

Regarding the fields, this is currently supported:

{
  "author": {
    "type": "Reference",
    "settings": {
      "collection": "authors",
      "fields": ["name", "title"]
    }
  }
}

How would this look with multiple collections?

eduardoboucas commented Mar 14, 2018

Regarding validation, I see it being declared in a way that is very similar to what you posted, but on a validation block, where field-specific validation parameters could be added. Here's an example:

{
  "title": {
    "type": "String",
    "validation": {
      "regex": {
        "pattern": "^[0-9a-fA-F]{24}$"
      }
    }
  },
  "email": {
    "type": "Email",
    "validation": {
      "domains": ["dadi.co", "dadi.tech"]
    }
  },
  "author": {
    "type": "Reference",
    "validation": {
      "collections": ["authors", "people", "users"]
    }
  }
}
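As a sketch of how the author rule might be enforced for Method 2 values: validateReference is a hypothetical helper, and the error wording is only modelled on the "must be one of" message suggested earlier in the thread. It is not an existing API message.

```typescript
// Mirrors the proposed `validation` block of a Reference field.
interface ReferenceValidation {
  collections?: string[];
}

// A Method 2 value: the collection is named alongside the data.
interface RefValue {
  collection: string;
  data: unknown;
}

// Returns an error message for the first disallowed collection, or null
// when the value is valid (or the field is unrestricted).
function validateReference(
  fieldName: string,
  value: RefValue | RefValue[],
  validation: ReferenceValidation
): string | null {
  const allowed = validation.collections;
  if (!allowed) {
    return null; // no rule declared: any collection is accepted
  }
  const items: RefValue[] = Array.isArray(value) ? value : [value];
  for (const item of items) {
    if (allowed.indexOf(item.collection) === -1) {
      return `Field '${fieldName}' must reference one of: ${allowed.join(", ")}`;
    }
  }
  return null;
}

const authorRule: ReferenceValidation = {
  collections: ["authors", "people", "users"]
};

const accepted = validateReference(
  "author",
  { collection: "people", data: "59e227464d908ce4cdf0c8b1" },
  authorRule
);
const rejected = validateReference(
  "author",
  { collection: "videos", data: "59e227464d908ce4cdf0c8b1" },
  authorRule
);
```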

As for fields, I had no idea we had settings.fields in the current implementation. This is not what I meant (and, to be honest, I'm not sure I'm a big fan of it existing in the settings block, because you're not configuring how the field works, you're just formatting its output).

What I meant was relying on the fieldLimiters property, which is used to limit the fields sent in a response. So your example would look something like:

{
  "settings": {
    "fieldLimiters": {
      "author.name": 1,
      "author.title": 1
    }
  }
}

... or, ideally, with the array notation (not sure if we support this):

{
  "settings": {
    "fieldLimiters": [
      "author.name",
      "author.title"
    ]
  }
}

If you're talking about getting different fields based on the referenced collection, this is where that special notation I mentioned could come in. One option would be to do something like this:

{
  "settings": {
    "fieldLimiters": [
      "author@authors.name",
      "author@authors.title",
      "author@people.age"
    ]
  }
}

... or some variation of it. Basically what I'm saying is that, in my opinion, limiting the fields should happen using the mechanism we already have in place for limiting the fields. This would have the important side effect of allowing people to customise the returned fields on a per-request basis, using the fields URL parameter, bringing us closer to how GraphQL allows requests to define the exact schema of the data for output.

e.g. http://api.somedomain.tech/1.0/test/users?fields=["title","author.name","author.title","author@people.age"]
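To show what parsing that notation might involve, here is a small sketch. The field@collection.path syntax is just the variation floated above, and FieldSelector, parseFieldSelector and pathsFor are invented names; it assumes neither field nor collection names contain "." or "@".

```typescript
interface FieldSelector {
  field: string; // top-level field, e.g. "author"
  collection: string | null; // set only when "@collection" is present
  path: string | null; // property inside the referenced document
}

// Parses "title", "author.name" or "author@people.age".
function parseFieldSelector(selector: string): FieldSelector {
  const dot = selector.indexOf(".");
  const head = dot === -1 ? selector : selector.slice(0, dot);
  const path = dot === -1 ? null : selector.slice(dot + 1);
  const at = head.indexOf("@");
  return {
    field: at === -1 ? head : head.slice(0, at),
    collection: at === -1 ? null : head.slice(at + 1),
    path
  };
}

// Properties to keep for a document referenced by `field` that came from
// `collection`. Selectors without "@" apply to every collection.
function pathsFor(
  selectors: FieldSelector[],
  field: string,
  collection: string
): string[] {
  const paths: string[] = [];
  for (const selector of selectors) {
    const sameField = selector.field === field;
    const sameCollection =
      selector.collection === null || selector.collection === collection;
    if (sameField && sameCollection && selector.path !== null) {
      paths.push(selector.path);
    }
  }
  return paths;
}

const fieldSelectors = [
  "title",
  "author.name",
  "author.title",
  "author@people.age"
].map(parseFieldSelector);

const forPeople = pathsFor(fieldSelectors, "author", "people");
const forAuthors = pathsFor(fieldSelectors, "author", "authors");
```

Here an author resolved from people would keep name, title and age, while one resolved from authors would keep only name and title.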

mingard commented Mar 14, 2018

@eduardoboucas I like the move from the settings block to validation. For backwards compatibility, I assume we'd be keeping settings.collection for single collection Referencing.

I actually don't see the point in the field limiters at all, at least not at collection schema level. It makes sense to support them in queries, but I don't think we need to consider collection-specific field limitations in, for example, a datasource. I don't see why a multiple-source Reference field would need to allow author@authors.name specifically, as the resulting payload could be filtered at template level in Web. Pseudocode:

if author.results[0].data.name && author.results[0].data.collection === 'people': do x

TL;DR

I like the validation. Fields are probably not important; we just need to make sure that settings.fields is ignored when no settings.collection is defined.

eduardoboucas commented Mar 14, 2018

"For backwards compatibility, I assume we'd be keeping settings.collection for single collection Referencing."

Absolutely!

"I actually don't see the point in the field limiters at all, at least not at collection schema level."

It's indeed very rarely used. I think the rationale is to offer a sensible fallback for when the URL parameter is not present in the request, much like what happens with other parameters (e.g. count, cache or includeHistory). I don't think it's about allowing a field specifically, it's about having the ability to specify a sensible default response format (e.g. to keep the payload size manageable by eliminating fields that will never be required).

But the important part is that we're able to specify the fields (and override the defaults) at query level, and, like you say, any filtering can be handled downstream using data sources or similar.
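In code, that fallback could look roughly like this. resolveFieldList is a made-up helper; it assumes the fields parameter arrives as a JSON array string (as in the URL example earlier) and that the schema default has already been flattened to a list of field names.

```typescript
// `rawFields` is the decoded value of the `fields` URL parameter, or
// undefined when the request did not include it.
function resolveFieldList(
  rawFields: string | undefined,
  schemaDefaults: string[]
): string[] {
  if (rawFields === undefined) {
    return schemaDefaults; // no parameter: fall back to the schema default
  }
  const parsed: unknown = JSON.parse(rawFields);
  if (!Array.isArray(parsed)) {
    throw new Error("fields must be a JSON array");
  }
  return parsed.map(String); // the request overrides the default entirely
}

// Hypothetical default taken from the collection schema.
const defaultFields = ["title", "author.name"];

const fromDefault = resolveFieldList(undefined, defaultFields);
const fromRequest = resolveFieldList(
  '["title","author.name","author.title"]',
  defaultFields
);
```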
