Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for allowing multiple schemas in a single instance #1106

Closed
steve-chavez opened this issue May 10, 2018 · 27 comments
Closed

Proposal for allowing multiple schemas in a single instance #1106

steve-chavez opened this issue May 10, 2018 · 27 comments
Labels
enhancement a feature, ready for implementation

Comments

@steve-chavez
Copy link
Member

This is needed both for api versioning and schema based multitenancy, currently it can be done with many PostgREST instances but it's unwieldy as there's considerable ops overhead, I'll make my proposal from the api versioning perspective.

Specifying the API version

Traditionally this has been done by having a prefix url path like /v1/projects, other options include a header that specifies the version or a date, like Stripe does, or a mime type that appends the version(application/vnd.com.v2+json).

All of this approaches are global and if we want to make them work with just SQL it would imply having to copy all the unmodified views from v1 schema to v2, ideally only the views that contain breaking changes would be in v2.

We can reuse PostgreSQL dot notation in the url to version the resources, this also has the advantage of being able to define embeds and specify the target resource version:

GET /v1.projects?select=id,name,v2.clients(id,name,active)&clients.active=is.true

(Notice that in the filter v2.clients is just referred as clients, the schema can only be specified at the url path and in the select part, this will ensure that we don't break our current filter syntax)

Maintaning backwards compatibility

I think that REST API evolution can be handled in a similar way to regular software packages evolution,
we can use schema names that follow SemVer, or maybe just a MAJOR MINOR convention(like v1_2, underscore used to avoid conflicts with .) to represent additions/breaking changes, comments on columns/views/schemas can notify deprecations and maybe release date.

This combined with the schemas in the search_path, specified through db-schema as previously discussed, would allow us to maintain backwards compatibility:

-- For maintaing backward compatibility on breaking change
SET search_path TO v1_0, v2_0;
GET /projects -- This would query FROM v1_0.projects
GET /v2_0.projects -- Client would have to specify the schema to use the latest version

-- For maintaing backward compatibility on new addition
SET search_path TO v1_1, v1_0, v2_0;
GET /projects -- This would query FROM v1_1.projects

Dropping support can be achieved by removing the schema from the search_path and maybe doing a redirect with a 3xx to the new url, this part would have to be done in the proxy layer(maybe later when adding #1088 this could be done in the pre-request function).

Security considerations

That we can specify the schema on the url will not mean that a client can access any schema like pg_catalog, we will only allow querying the schemas specified in the db-schema, internally we'll explicitly qualify all the SQL queries as we have done until now with a single schema.

@rsubr
Copy link

rsubr commented May 19, 2018

I'm building a schema based multi-tenancy app and for the most part PostgREST ver 0.4.4.0 works for me.

The ability to set search_path via a JWT claim would be highly useful. I'm presently doing this via pre-request DB function, but it has some limitations described below.

My schema structure:

 tenant1: tables + data
 tenant2: tables + data
 base: empty tables and views (read-only), DB functions
 public: pgcrypto, pgjwt, etc 

Tenant schemas are exact copies from base. DB functions are created only once in base and use search_path to access appropriate tenant schemas.

Default DB search_path is set to base, public so anon users can login and create a JWT token. JWT contains a claim {"schema": "tenant1"} then I SET search_path TO tenant1, base, public.

I'm presently doing this by setting schema="" in postgrest.conf and using pre-request function to set search_path as above. This works for simple table queries and /rpc. But I can't use Resource Embedding with select, I get this error:

{
    "message": "Could not find foreign keys between these entities, No relation found between mytable_a and mytable_b"
}

If I set schema="tenant1" in postgrest.conf then the select works correctly. But then I can't access other tenant schemas :-(

I'm happy to contribute in anyway.

@kaiwu
Copy link

kaiwu commented Jun 15, 2018

the hack mentioned by @rsubr somehow affects the upsert feature and it no longer does the on conflict clause

@ppKrauss
Copy link
Contributor

ppKrauss commented Jul 24, 2018

Hi, I am using docker with external PostgreSQL and need multiple endpoints at the same machine (so same PostgreSQL and databases).... So, no matter if it is implemented as "one PostgREST instance" or "multiple PostgREST instances". The focus of a user like me is "multiple endpoints" and the best solution (faster and good reuse of resources).

So, I am perhaps backing to #1090 (that was redirected to this issue) ... The scenario is:

  1. commom docker pull postgrest/postgrest

  2. endpoint1 installation docker run --rm --net=host -p 3000:3000 -e PGRST_DB_SCHEMA="sh1" -e PGRST_DB_URI="postgres://postgres:_pwd_@localhost:5432/db1" ...

  3. (is an error but imagine something as) endpoint2 installation docker run --rm --net=host -p 3002:3002 -e PGRST_DB_SCHEMA="sh2" ... & (all "..." same as above)

  4. (is an error but imagine something as) endpoint2 installation docker run --rm --net=host -p 3003:3003 -e PGRST_DB_URI="postgres://postgres:_pwd_@localhost:5432/otherdb" ....&

So, to obtain SQL's table sh1.t at endpoint1, etc. of each schema and database:

  • endpoint1 URL for table sh1.t of database db1 = http://myHost:3000/t
  • endpoint2 URL for table sh2.t of database db1 = http://myHost:3002/t
  • endpoint3 URL for table public.t of database otherdb = http://myHost:3003/t

What is the best solution to obtain this scenario?

@steve-chavez
Copy link
Member Author

Multiple instances with docker would be better handled with docker-compose, I posted an example configuration in PostgREST/postgrest-docs#160 (comment).

For mapping the instances to different endpoints in the same domain we recommend Nginx, we're missing an example in the docs for this, but should be something like:

location /endpoint1/ {
    proxy_pass http://localhost:3001/; # Reverse proxy to a PostgREST instance
}
location /endpoint2/ {
    proxy_pass http://localhost:3002/; # Reverse proxy to a PostgREST instance
}
...

@owais
Copy link

owais commented Feb 26, 2019

Support for dynamically selecting the schema at runtime would be fantastic. Would open up a whole new suite of applications that can be built with PostgREST.

@ghost
Copy link

ghost commented Nov 8, 2019

Hello everyone, my team requires this functionality as well. Initially we tried using the db-extra-search-path configuration parameter. The issue with that is that if 2 different schemata contain a table with the same name it would query the table in the schema that is mentioned first in the config. So I went ahead and looked into how to implement the possibility without breaking any existing functions(or so I hope). What I ended up doing is giving the possibility to specify a schema in the request URI given that the configured user has the proper permissions. Today I created a fork https://github.com/mahmoudkassem/postgrest/commit/6dd5bf1d80eb33adf7c71d161395da64a8463179, I'm aware that this just the first step, for instance the Swagger Doc generation needs to reflect that extension as well(so instead of using just one DbStructure Instance it would require a Map Schema DbStructure). Is there some other part of the application that this extension would need to affect?

Initially I had about a week to enable the functionality(which was mostly used to get familiar with the code), but now things cooled down and I have the capacity to further work on this, but I require support on the conceptual level and I can't grasp all the ramifications this change would have for the existing functionalities. With your support we can hopefully have this feature properly implemented.

@steve-chavez
Copy link
Member Author

@MahmoudKassem Great idea!

I think your /schema/table approach can work great for schema based multitenancy, where each schema database objects don't touch each other and have no relationships.

The main thing we need to ensure is that our sandbox(safe) approach is maintained toward the exposed schemata . For this we shouldn't think about db-extra-search-path but on db-schema, the latter would have to be extended to support a schema list.

The schema shouldn't be dynamic, because you risk exposing unsafe database objects, like the ones in pg_catalog or public(which usually contains extensions). So if we have db-schema=a,b it should be possible to query /a/table and /b/table but not /pg_catalog/pg_class.

If /pg_catalog/pg_class gets requested, the idea would be to whitelist based on the db-schema. For every request with an unspecified schema, quote the full identifier. So "select * from "pg_catalog.pg_class" results in relation "pg_catalog.pg_class" does not exist.

We also need to make sure is that there's no chance that something like /schema/table?select=*,table_from_another_schema_not_included_in_db_schema(*) can succeed, despite table and table_from_another_schema_not_included_in_db_schema having an o2m relationship between them.

For ensuring this, we have been very careful until now, we took a schema parameter in many functions(unnecessarily in many cases) and recently I have been removing these parameters, you can see #1393 for some refactor commits.

Fortunately, now the OpenAPI output takes a schema as a parameter. So, if we restrict your solution to schema based multitenancy it shouldn't be a problem if we don't detect relationships to other schema database objects.

The findRelation(finds relationships between two tables) function also takes a schema parameter.

For the above concerns, I think It'd be a matter of passing the schema from the url to those functions plus the whitelist, and of course adding a new test suite for making sure all of this works.

DbStructure would be a different beast. I haven't yet fully grasped its whole complexity for doing mutli schema.

But I think we can work on this together gradually one PR at a time, I'm also interested on adding this feature for this release.

Let me know what you think and if you have any question!

@ruslantalpa
Copy link
Contributor

ruslantalpa commented Nov 8, 2019

@steve-chavez all the above is good except the "one PR at a time", i don't see how you can do that, it's a single (big) feature that either works correctly in all cases (and does not compromise previous security) or it does not. If in the middle of the work (after a few PRs) some corner case is discovered that makes this unfeasible or too complicated, it would be hard to revers things.
The best way would be to have a new branch and work there until it's final

For specifying the "dynamic" schema, a header should be used. This will not break the current url format and not introduce ambiguity. Ppl that want it in a URL can use nginx to do the "translation"

[edit] removed the part about fully qualifying which you already mentioned in the original post

@ghost
Copy link

ghost commented Dec 2, 2019

Thanks for your input/insight. https://github.com/mahmoudkassem/postgrest/commit/6415217b42f6af5d517d6aa880bff02c1716cc6f I've extended the db-schema parameter to be a list of schemas separated by comma. Whereas the first value of the list is the default. The passing of the schema is now via header instead of the url as suggested. The passed value has to be a member of the list otherwise the default schema is used. So now only explicitly added schemas to db-schema can be used. What is still to do is the check on the query level. Any input/suggestions whether this is going into the right direction are most welcome.

@ppKrauss
Copy link
Contributor

ppKrauss commented Dec 2, 2019

@ruslantalpa

For specifying the "dynamic" schema, a header should be used. This will not break the current url format and not introduce ambiguity.

Perhaps something as using a URI syntax that is not a PosgreSQL syntax, as the character ~: a standard public table, t; a schema sh1 table, sh1~t.

@steve-chavez
Copy link
Member Author

steve-chavez commented Dec 2, 2019

@MahmoudKassem I think erring in case of a wrong schema instead of picking a default would be better. It seems easy to mistype a number in /v4/table(when you wanted v3 and you don't have v4 defined) and not erring could mean we let a client make a mistake.

Regarding the header for the schema, I don't think there's a standard for this(unless we go with versioning on the media type application/vnd.com.v2+json).

My vote would be for versioning on the url. I think it wouldn't imply a breaking change. Unless in some case the user has named his schema rpc(too rare). We can take the empty prefix / as default as usual and then the other schemas can use another prefix like /v1/table and /v1/rpc/proc for their table/views/functions. And of course this will only happen if more than one schema was specified on db-schema.

Having the schema in the url makes the API easier to explore.

@ppKrauss
Copy link
Contributor

ppKrauss commented Dec 2, 2019

I think your /schema/table approach can work great for schema based multitenancy, where each schema database objects don't touch each other and have no relationships.

As I comment, there is no mistype when using a schema separator mark like ~ (URI consistent)... So /schema~table is most reliable tham /schema/table.

@ghost
Copy link

ghost commented Dec 3, 2019

https://github.com/mahmoudkassem/postgrest/commit/750698bf05f07349b787c74ce32412a07502020f The easier discovery + that it doesn't break backwards compatibility(for the most part) convinced me, so I've added the dynamically specified schema back to the url and now instead of falling back to the default schema it will result in an 404 not found if the schema is not specified in db-schema.

I find the /schema/table approach more intuitive, maybe that's the influence from the time I spend working with node.js.

@ppKrauss could you please elaborate on how /schema/table causes Postgresql specific issues? I'm not too familiar with Postgresqls' syntax.

I'm currently in the process of looking into how the request is transformed into a query to implement the schema check on the query level.

@ppKrauss
Copy link
Contributor

ppKrauss commented Dec 3, 2019

Hi @MahmoudKassem , it is a syntax suggestion, an alternative to your schema/table syntax suggestion: I am suggesting to replace your / syntax to a ~ syntax, schema~table.

It is valid as URI (no conflict) and valid to express with "no ambiguity risk", to confuse the PostgreSQL schema names and table names, or to confuse with other endpoint's REST syntax, like API version. Rules:

  1. tableName bypass, not need translation. Internally PostgreSQL translates to public.tableName.

  2. schemaName~tableName is translated to PostgreSQL as schemaName.tableName.


NOTE. You can suppose "rule 3. schemaName~". It can be used (if necessary in a new application) as reference to schemaName. It is not the REST-syntax tradition, but is functionally equivalent to /, to express hierarchy. Perhaps schemaName~tableName~columnName can be used also in the future (for example for documentation).

@steve-chavez
Copy link
Member Author

steve-chavez commented Dec 4, 2019

To solve ambiguity, how about if when db-schema is a list we force that the request must contain the prefix. So, if we had db-schema=v1,v2:

  • / no longer defaults to the first schema in the search path. It gives 404.
  • /v1/ gives the openapi schema of v1.
  • /v1/table and /v1/rpc/proc work as usual.
  • /v2/ follows the same rules.

That would only be the case when db-schema is a list, so it wouldn't imply a breaking change.
Looking for any feedback on that idea.

@ppKrauss #1421 could be solved in another way(check my latest comment).

Since these schemas are independent, I think the slash / represents namespaces better than ~. Like what we do with /rpc/(since a pg table and function can share the same name). Also elucidates that our openapi output is only for one schema.

@tsingson
Copy link

tsingson commented Dec 5, 2019

@steve-chavez I like this /schema-name/rpc/.........

@ppKrauss
Copy link
Contributor

ppKrauss commented Dec 5, 2019

Hi, this issue is confuse for me, I not see utility on versioning for my applications, but I need access to multiple schemas at PostgREST endpoint... Anyone who has understood, please explain to me, or copy/paste the final consensual definitions:

  1. The multitenancy concept used here;

  2. The syntax proposal for only one endpoint for multiple schemas (not multiple versioning endpoints);

  3. The syntax of queries, it is the SQL standard schemaName.tableName, or something changed?
    (of course only those schemaNames that belong to the set of exposed ones)

  4. There are a pending syntax problem/consensus in this discussion?

@steve-chavez
Copy link
Member Author

I've edited my previous comment. I'll refrain from discussing migration strategies here, this could be done later in the docs.

@MahmoudKassem What do you think of that url syntax? I think with that we can also reject an unknown schema at the http level and we wouldn't need extra work at the sql level.

@ppKrauss
Copy link
Contributor

ppKrauss commented Dec 18, 2019

Thanks @steve-chavez for edits on explanations. Commenting it:

To solve ambiguity, how about if when db-schema is a list we force that the request must contain the prefix. So, if we had db-schema=v1,v2:

* `/` no longer defaults to the first schema in the search path. It gives 404.

* `/v1/`  gives the openapi schema of v1.

* `/v1/table` and `/v1/rpc/proc` work as usual.

* `/v2/` follows the same rules.

This endpoint-syntax proposal breaks compatibility of classic PostgREST, where the endpoint path /string1 is a table reference... And it is good to remember that versioning endpoint is not the best practice for "all REST community", there are some criticisms, and some alternatives.

Since these schemas are independent (...) our openapi output is only for one schema.

Hum... Maybe you want to say "only for one default schema"... Or are you also affirming that it is not a proposal for multiple schemas?
(about acess to "schemaName.tableName" in a valid schemaName of the db-schema list)


Note about versioning demands

There are two independent main motivations for PostgREST endpoint versioning, not only one:

  • Database changes
  • PostgREST changes

Ideal is 3 independent proposals for issue discusstion: database versioning, PostgREST versioning and multiple schema.

Your proposal merges multiple schema demand with database versioning demand.

@steve-chavez
Copy link
Member Author

steve-chavez commented Dec 21, 2019

@ppKrauss You're right about the shortcomings of the url versioning. I think we can agree that api versioning has no right way. So the best we can do is to provide flexibility and let the user do the url versioning if he wants. The only way to do this is to switch schemas with a header.

I've never been a fan of adding the version in the media type since it makes the api harder to explore, but one of the articles you shared proposes the Accept-Version header, which is not so bad to type in an app like postwoman/postman/insomnia/etc. I think we can use that. It's not standard but it seems to be gaining traction(seen it recommended here). It would be better to use that than coming up with our own X-custom-header.

The only problem is that that header is not that "semantic" for schema based mutlitenancy. I've seen two other headers that seemed more fit for this: Accept-Profile and Accept-Schema. But I'm not sure if we would be abusing them by just providing a string value like Accept-Profile: my_schema.

All in all, I think the safest way would be to go with a header(whichever we choose). We could then provide an nginx snippet for users that'd like the url versioning.

@skariel
Copy link

skariel commented Dec 25, 2019

I think pREST can do it. But I haven't tried it yet...

@steve-chavez
Copy link
Member Author

I think we can go with the Accept-Version header for now, it's commonly used and if there's a need we can later include more semantic headers.

@MahmoudKassem What do you think?

@ghost
Copy link

ghost commented Jan 13, 2020

@steve-chavez I agree on community conses being more important than semantic coherence, so I've modified the schema passing to use the Accept-Version header instead of the URL https://github.com/mahmoudkassem/postgrest/commit/be330be9e74fa695a8ba278db39f15e3061c7183.

@steve-chavez
Copy link
Member Author

@MahmoudKassem Design-wise I think this is done! All what's left would be adding tests.

You could add a v1 and v2 schemas with a couple tables in fixtures.sql:

CREATE SCHEMA public;
CREATE SCHEMA postgrest;
CREATE SCHEMA private;
CREATE SCHEMA test;
CREATE SCHEMA تست;
CREATE SCHEMA extensions;

Then add a new spec in test/Feature/. Could be named MultipleSchemasSpec.hs.

This spec would take a config-schema="v1,v2". Here's a config you can clone and modify for that purpose:

testUnicodeCfg :: Text -> AppConfig
testUnicodeCfg testDbConn = (testCfg testDbConn) { configSchema = "تست" }

In MultipleSchemasSpec.hs you should test:

  • A request to v1.table and a request to v2.table
  • A request with an unknown schema gives 404.
  • A request without a specified schema should give the default schema.
  • A few OpenAPI tests for the tables, similar to:
    it "includes definitions to tables" $ do
    r <- simpleBody <$> get "/"
    let def = r ^? key "definitions" . key "child_entities"
    liftIO $
    def `shouldBe` Just
    [aesonQQ|
    {
    "type": "object",
    "description": "child_entities comment",
    "properties": {
    "id": {
    "description": "child_entities id comment\n\nNote:\nThis is a Primary Key.<pk/>",
    "format": "integer",
    "type": "integer"
    },
    "name": {
    "description": "child_entities name comment. Can be longer than sixty-three characters long",
    "format": "text",
    "type": "string"
    },
    "parent_id": {
    "description": "Note:\nThis is a Foreign Key to `entities.id`.<fk table='entities' column='id'/>",
    "format": "integer",
    "type": "integer"
    }
    },
    "required": [
    "id"
    ]
    }
    |]

You can open a PR while you do this if you'd like, we can keep working on the tests there.

@ghost
Copy link

ghost commented Jan 19, 2020

@steve-chavez I opened the PR #1436 and will add the tests now. Thanks to your detailed outline writing the tests should be straightforward I will probably require support with maintaining the sandbox on the query level though. This is not implemented yet ofc this would also need to have tests as well.

@steve-chavez
Copy link
Member Author

steve-chavez commented Mar 31, 2020

Solved in #1450!

Basically, the schema to be used can be selected through the Accept-Profile: schema2 header for GET/HEAD requests. For POST/PATCH/PUT/DELETE requests Content-Profile: schema2 can be used.

The headers are based on this nascent spec: Content Negotiation by Profile.

Edit: This is only possible if db-schema has a list of schemas. Like db-schema = schema1, schema2.

@akshaysasidrn
Copy link

akshaysasidrn commented Oct 13, 2022

Hello team, first of all thank you for the feature. I'm trying to make use of multiple schemas in a single instance, but I have a limitation of not being able to know the db-schemas beforehand. How do I dynamically update the list on the posgrest server such that I can then trigger NOTIFY pgrst, 'reload config' to have the schemas available at the API?

Edit: I found the answer within the documentation itself. I had missed it earlier. Please ignore this comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement a feature, ready for implementation
Development

No branches or pull requests

9 participants