Define a federated Open Terms Archive collections APIs #1016
Comments
I have a note about duplicates: I think I agree that returning all results is the best way to go, but that still leaves the question of how we'd handle duplicates on the ToS;DR side. The RFC mentions "defining an arbitrary priority based on data quality" -- what are the criteria for "data quality" in this case? Does this mean that the result with the "highest" data quality would be returned? Is there a real-life example of duplicates that I could inspect, just to see what the returned data might look like? Thank you! |
Hi @madoleary,
The idea is to give each client of the federated API the responsibility for handling duplicates: the API returns all the results and the client chooses the collection from which it wants to obtain the document.
As mentioned in the RFC, the idea of "defining an arbitrary priority based on data quality" was not retained, so a priori the question of a data quality criterion will not be addressed on the OTA side.
For example, a result for a query like
{
"results": [
{
"collection": "pga",
"service": {
"id": "facebook",
"name": "Facebook",
"url": "http://173.173.173.173/api/v1/service/facebook",
"termsTypes": [ "Terms of Service", "Privacy Policy", "Developer Terms", "Trackers Policy", "Data Processor Agreement"]
}
},
{
"collection": "contrib",
"service": {
"id": "facebook",
"name": "Facebook",
"url": "http://162.162.162.162/api/v1/service/facebook",
"termsTypes": [ "Terms of Service", "Privacy Policy"]
}
}
],
"failures": []
}
And on your side, you could define that you prefer to use data from the |
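A minimal sketch of that client-side choice, assuming the response shape shown in the example above. The helper name `pick_preferred` and the preference order are hypothetical and not part of the RFC.

```python
import json

# Response shape copied from the example above, trimmed to the fields used here.
RESPONSE = json.loads("""
{
  "results": [
    {"collection": "pga",
     "service": {"id": "facebook", "url": "http://173.173.173.173/api/v1/service/facebook"}},
    {"collection": "contrib",
     "service": {"id": "facebook", "url": "http://162.162.162.162/api/v1/service/facebook"}}
  ],
  "failures": []
}
""")


def pick_preferred(response, preferred_collections):
    """Return the result from the highest-priority collection present, or None."""
    by_collection = {r["collection"]: r for r in response["results"]}
    for collection in preferred_collections:
        if collection in by_collection:
            return by_collection[collection]
    return None


# Hypothetical client preference: use "contrib" first, then fall back to "pga".
chosen = pick_preferred(RESPONSE, ["contrib", "pga"])
print(chosen["service"]["url"])
```

Because the federated API returns every match, the preference list lives entirely in the client and can differ from one consumer to another.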
Very helpful, thank you, @Ndpnt ! |
Thanks @Ndpnt for this clear RFC!
Proposition 1.B
This is a suggested improvement of proposition 1 (initially posted) on
|
I think that the |
I have another question: what would the response object look like for an index of services? For example, if I were to retrieve all the services for each collection. I ask this because eventually Phoenix is supposed to retrieve an index of services from OTA, per the MOU. Let me know if this question is outside the scope of this RFC. |
Also: is there a specific message returned when a service is not found? |
Sorry, I see the HTTP 404 note! |
Thanks @MattiSG for your propositions. I fully agree with the Proposition 1.B. For proposition 2:
|
As I suggest to have the search action being only a filtering on the |
Proposition 3
This is a suggested improvement on proposition one
|
Parameter | Type | Description |
---|---|---|
name | URL-encoded string | The string to search for in service names |
termsType | URL-encoded string | The string to search for in service terms |
Returns
A JSON array of all matching services across all collections that also include the terms type, as indicated by the `termsType` query param, in their `termsTypes` fields.
Returns all matching services if no `termsType` param is passed.
Returns an empty array if no matching service with the terms type is found.
Example
GET /services?name=facebook&termsType=cookies%20policy
{
"results": [
{
"collection": "contrib",
"service": {
"id": "facebook",
"name": "Facebook",
"url": "http://162.162.162.162/api/v1/service/facebook",
"termsTypes": ["Terms of Service", "Cookies Policy"]
}
}
],
"failures": []
}
Hi @madoleary,
Proposition 3.B
|
Parameter | Type | Description |
---|---|---|
name | URL-encoded string | The string to search for in service names |
termsTypes | URL-encoded string | A comma-separated string that represents the array of terms types to search for |
Returns
A JSON array of all matching services across all collections that also include the terms types, as indicated by the `termsTypes` query param, in their `termsTypes` fields.
Returns all matching services if no `termsTypes` param is passed.
Returns an empty array if no matching service with the terms types is found.
Example
GET /services?name=facebook&termsTypes=Cookies%20Policy,Terms%20of%20Service
{
"results": [
{
"collection": "contrib",
"service": {
"id": "facebook",
"name": "Facebook",
"url": "http://162.162.162.162/api/v1/service/facebook",
"termsTypes": ["Terms of Service", "Cookies Policy"]
}
}
],
"failures": []
}
That looks great, @Ndpnt ! I'm in favor of proposition 3.B |
Love it!
💯 Thank you both for your contributions, I fully support 3.B! |
Hi everyone, This RFC has received no further feedback for a month, so I think we can conclude that proposal 3.B seems acceptable to everyone and will therefore be implemented. Thanks again for your contributions 🙏. Please note that we will probably not be able to work on its implementation for a few weeks, as we have a lot of things to handle this month. |
Thanks @Ndpnt! It's not entirely clear to me what will be implemented: 3.B is concerned with |
Proposed final API layout:
|
Parameter | Type | Description |
---|---|---|
name | URL-encoded string | The string to search for in service names |
termsTypes | URL-encoded string | A comma-separated string that represents the array of terms types to search for |
Returns
A JSON array of all matching services across all collections that also include the terms types, as indicated by the `termsTypes` query param, in their `termsTypes` fields.
Returns all matching services if no `termsTypes` param is passed.
Returns an empty array if no matching service with the terms types is found.
Example
GET /services?name=facebook&termsTypes=Cookies%20Policy,Terms%20of%20Service
{
"results": [
{
"collection": "contrib",
"service": {
"id": "facebook",
"name": "Facebook",
"url": "http://162.162.162.162/api/v1/service/facebook",
"termsTypes": ["Terms of Service", "Cookies Policy"]
}
}
],
"failures": []
}
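As a rough illustration of how a client might build this request with correctly percent-encoded parameters, here is a sketch using only the Python standard library. The function name `build_services_url` and the `v1` version segment are assumptions made for the example; the RFC only gives the base URL as `http://api.opentermsarchive.org/:version`.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the RFC; the version segment "v1" is an assumption.
BASE_URL = "http://api.opentermsarchive.org/v1"


def build_services_url(name=None, terms_types=None):
    """Build a GET /services URL with percent-encoded query parameters."""
    params = {}
    if name:
        params["name"] = name
    if terms_types:
        # Join into one comma-separated value, as described in the table above.
        params["termsTypes"] = ",".join(terms_types)
    if not params:
        return f"{BASE_URL}/services"
    # quote_via=quote encodes spaces as %20 (not "+"); safe="," keeps the separators literal.
    query = urlencode(params, quote_via=quote, safe=",")
    return f"{BASE_URL}/services?{query}"


print(build_services_url("facebook", ["Cookies Policy", "Terms of Service"]))
```

Letting the library do the encoding avoids hand-written escapes, where a missing digit in a `%20` sequence would produce an invalid URL.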
GET /service/:serviceId
Parameters
Parameter | Type | Description |
---|---|---|
serviceId | URL-encoded string | The ID of the service. |
Returns
A JSON array of services with the given ID across all collections with the URL where they can be found.
Returns an HTTP 404 if no matching service is found.
Example
GET /service/service1
{
"results": [
{
"collection": "demo",
"service": {
"id": "service1",
"name": "Service 1",
"url": "http://173.173.173.173/api/v1/service/service1",
"termsTypes": [ "Terms of Service"]
}
},
{
"collection": "contrib",
"service": {
"id": "service1",
"name": "Service 1",
"url": "http://162.162.162.162/api/v1/service/service1",
"termsTypes": [ "Terms of Service", "Privacy Policy"]
}
}
],
"failures": []
}
Much clearer, thank you very much! 😃 |
In 3.B (#1016 (comment)), we did not specify whether passing multiple terms types means we want to get only the service declarations that track all those terms types, or all service declarations that track at least one of those terms types 🙃 @Ndpnt you were the one expanding on @madoleary’s initial request, to include multiple terms types. Do you remember what your intention was with this addition? |
We also did not specify what happens if |
My intention was to make it possible to search for a service containing at least the specified terms types, in order to help me find the most appropriate collection for the terms types I was interested in. So for me, it was an AND logical operator for terms types. |
I don't agree with that, I'm in favor of returning all the services. At the moment, we don't have too many services, and when we do, we'll be able to set up pagination. It's important to bear in mind that this means just one request to each collection API and not a request per service. |
After some discussion, it seems that we don't currently have a use case for searching with multiple term types on |
If we have no |
Complement note: we also found that all hypothetical use cases (AND, OR) could be implemented with the basic function provided here and a tiny bit of client-side logic. It will always be time to add more power to the API later on when we gather more understanding of most usual use cases 🙂 |
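To illustrate the kind of client-side logic meant here, the following sketch filters a `results` array by terms types with either AND or OR semantics. The function name `filter_by_terms_types`, the `require_all` flag, and the exact, case-sensitive string comparison are assumptions for the example, not behaviour specified by the RFC.

```python
# Sample results, shaped like the "results" array in the examples of this thread.
RESULTS = [
    {"collection": "pga",
     "service": {"id": "facebook",
                 "termsTypes": ["Terms of Service", "Privacy Policy", "Developer Terms"]}},
    {"collection": "contrib",
     "service": {"id": "facebook",
                 "termsTypes": ["Terms of Service", "Privacy Policy"]}},
]


def filter_by_terms_types(results, wanted, require_all=True):
    """Keep results tracking all (AND) or at least one (OR) of the wanted terms types."""
    wanted = set(wanted)
    kept = []
    for result in results:
        tracked = set(result["service"]["termsTypes"])
        # AND: every wanted type is tracked. OR: the two sets overlap.
        matches = wanted <= tracked if require_all else bool(wanted & tracked)
        if matches:
            kept.append(result)
    return kept


both = ["Privacy Policy", "Developer Terms"]
print([r["collection"] for r in filter_by_terms_types(RESULTS, both)])
print([r["collection"] for r in filter_by_terms_types(RESULTS, both, require_all=False)])
```

With the sample data, the AND call keeps only `pga`, while the OR call keeps both collections.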
After discussion I agree, this was premature optimisation on my side. This “no parameter” route is very easy to cache. If it becomes very popular and the contents grow big, we can just decrease the poll rate and warn that this route only updates every hour / every day… |
Hi all, I appreciate the discussion about multiple terms types. In my specs, I only have us searching for one terms type at a time, e.g., cookies policy. I, too, don't think searching for multiple terms types is necessary. I also think all services should be returned on |
Context and Problem Statement
Open Terms Archive is a decentralised system that tracks collections of services and documents across multiple servers. Each collection operates its own API, which exposes the services and terms it tracks, but because these APIs are decentralised, identifying which services and documents are currently tracked requires searching across all of them.
We propose the creation of a federated API to enable easy querying of the distributed database and thus facilitate collaboration with external applications.
Proposed solution
Base URL
http://api.opentermsarchive.org/:version
Endpoints
Note: The `failures` object is detailed below in the Error Handling section.
GET /collections
Enumerate all collections
Returns
A JSON array of all collections
Example
GET /services?searchName=:searchName
Parameters
Returns
A JSON array of all matching services across all collections with the URL where they can be found.
Returns all services if no `searchName` param is passed.
Returns an empty array if no matching service is found.
Example
GET /service/:serviceId
Retrieve a specific service, identified by its ID, from all collections
Parameters
Returns
A JSON array of services with the given ID across all collections with the URL where they can be found.
Returns an HTTP `404` if no matching service is found.
Example
Notes
Duplicates
We have considered multiple duplicate resolution solutions (specifying priority order as query params, defining an arbitrary priority based on data quality, returning an arbitrary result with a key `alternatives` to other results, using HTTP code `300 Multiple Choices`, …) but we have come to the conclusion that they do not align with our fundamental philosophy of decentralization and resilience. The idea is therefore to embrace the fact that it is possible to have the same service declared in multiple collections and thus to always return an array of results.
Error Handling
To handle errors in the underlying APIs, the idea is to return a `failures` array containing objects describing the collection that failed and why. For example:
Compatibility with different underlying API versions
By definition, a federated API may interact with multiple versions of underlying APIs. To effectively manage this, the proposed approach is to only gather the necessary fields and directly provide the resource URL in the underlying API. Moreover, to allow the client to determine the shape of the result, it is proposed to include the API version in the response headers of each underlying API.
Naming convention for collection ID
As the collection ID will then become a differentiating element that should be easy to handle with scripts and other tools, we suggest the following naming convention:
france-élections → france-elections.
-): France Elections → france-elections.
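A possible way to derive such an ID in a script is sketched below. The rules it applies (strip diacritics, lowercase, replace spaces with dashes) are inferred from the two examples above and are an assumption, as is the function name `collection_id`.

```python
import re
import unicodedata


def collection_id(name):
    """Derive a collection ID: strip diacritics, lowercase, turn spaces into dashes."""
    # Decompose accented characters (é -> e + combining accent), then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase and collapse each run of whitespace into a single dash.
    return re.sub(r"\s+", "-", without_marks.strip().lower())


print(collection_id("france-élections"))  # france-elections
print(collection_id("France Elections"))  # france-elections
```

Both inputs map to the same ID, which is what makes the result easy to handle with scripts and other tools.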