
PostGIS support #1601

Merged: 6 commits into ariga:master on May 20, 2023

Conversation

@crossworth (Contributor) commented on Apr 23, 2023

Hello,
while Atlas supports using PostGIS in some capacity, we could improve it. This PR is an attempt to add support for the PostGIS extension to the Atlas test suite.

PostGIS is the most used extension for Postgres.


The goals of this PR are small:

  • Run all the Postgres tests successfully with the PostGIS image
  • Create a pattern for testing Postgres extensions and serve as an example for other extensions (I'm looking at you, TimescaleDB)

I think that being able to pass all the tests with PostGIS would solve the problems related to the schemas created by the extension (#1568 and #1577) and would pave the way to solving the problem related to geometry detection (#1577).

The problem

PostGIS can create a few schemas/tables that Atlas recognizes as user defined. These schemas/tables provide extra functionality and most of the time should not be changed by the user (or Atlas). There are ways to exclude them, but I think Atlas should be able to identify schemas/tables created by extensions and make the flow as easy as possible.

schemas: tiger, tiger_data, topology

tables: spatial_ref_sys
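
As a side note, Postgres records extension membership in the pg_depend catalog (deptype 'e'), so it is possible to list which tables belong to installed extensions. A sketch, just for illustration (not part of this PR):

-- Sketch: tables owned by installed extensions, according to pg_depend.
SELECT e.extname, n.nspname, c.relname
FROM pg_depend d
JOIN pg_extension e ON e.oid = d.refobjid AND d.refclassid = 'pg_extension'::regclass
JOIN pg_class c ON c.oid = d.objid AND d.classid = 'pg_class'::regclass
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE d.deptype = 'e'
ORDER BY 1, 2, 3;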

Declarative:

When running inspect on an empty database (with the PostGIS extension enabled), Atlas picks up the spatial_ref_sys table:

atlas schema inspect -u "postgres://localhost:15432/postgis_test?sslmode=disable" > schema.hcl
schema.hcl
table "spatial_ref_sys" {
  schema = schema.public
  column "srid" {
    null = false
    type = integer
  }
  column "auth_name" {
    null = true
    type = character_varying(256)
  }
  column "auth_srid" {
    null = true
    type = integer
  }
  column "srtext" {
    null = true
    type = character_varying(2048)
  }
  column "proj4text" {
    null = true
    type = character_varying(2048)
  }
  primary_key {
    columns = [column.srid]
  }
  check "spatial_ref_sys_srid_check" {
    expr = "((srid > 0) AND (srid <= 998999))"
  }
}
schema "public" {
}

It's possible to mitigate the problem using exclude:

atlas schema inspect -u "postgres://localhost:15432/postgis_test?sslmode=disable" --exclude "*.spatial_ref_sys" > schema.hcl
# or
atlas schema inspect -u "postgres://localhost:15432/postgis_test?search_path=public&sslmode=disable" --exclude "spatial_ref_sys" > schema.hcl

When applying the schema, we have to exclude it as well, otherwise we face an error:

Error: drop "spatial_ref_sys" table: pq: cannot drop table spatial_ref_sys because extension postgis requires it

Versioned:

atlas migrate diff doesn't support the exclude flag, so when generating from a database with the PostGIS extension already enabled, we get the spatial_ref_sys table definition. One way to solve it would be to edit the file and hash the migration directory again.

atlas migrate apply works as expected as well, but atlas migrate lint causes problems.

When running with the docker image, we get a pq: extension "postgis" is not available error as expected.

When using an external database with PostGIS installed, we get different errors:

Error: drop "spatial_ref_sys" table: pq: cannot drop table spatial_ref_sys because extension postgis requires it

and

Error: taking database snapshot: sql/migrate: connected database is not clean: found table "spatial_ref_sys" in connected schema

Since the atlas migrate commands don't support the exclude flag, we don't really have a workaround in this case.

The solution (‽)

This PR tries to solve the problem by changing the query used to inspect tables so that it ignores tables created/used by extensions. This fixes the problem with the spatial_ref_sys table and most (if not all) of the problems in both the declarative and versioned flows.
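
For illustration only (not the exact query in this PR), the filter is essentially the complement of the pg_depend lookup above: skip any table that pg_depend marks as belonging to an extension:

-- Sketch: list user tables in a schema, skipping tables that belong to an extension.
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
  AND NOT EXISTS (
      SELECT 1
      FROM pg_depend d
      WHERE d.classid = 'pg_class'::regclass
        AND d.objid = c.oid
        AND d.refclassid = 'pg_extension'::regclass
        AND d.deptype = 'e'
  );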

But...

The tests are still failing due to the schemas created by the PostGIS Docker image and the Driver.CheckClean checks.
On my machine I don't have those schemas (because I only use the base postgis extension), so the tests against my local Postgres instance pass.

Tests error:

sql/migrate: connected database is not clean: found schema "tiger". baseline version or allow-dirty is required

I tried to change the query used to inspect the schemas as well and got it almost working, but tiger_data still causes problems.

test=# SELECT 
    schema_name 
FROM information_schema.schemata 
WHERE 
    schema_name NOT IN ('information_schema', 'pg_catalog', 'pg_toast', 'crdb_internal', 'pg_extension') 
        AND schema_name NOT LIKE 'pg_%temp_%'
        AND NOT EXISTS (
                SELECT 
                    1
                FROM pg_extension
                WHERE 
                    extnamespace = (
                        SELECT 
                            oid 
                        FROM pg_namespace 
                        WHERE 
                            nspname = schema_name
                    )
        )
ORDER BY 
    schema_name;
 schema_name 
-------------
 tiger_data
(1 row)

Tests error:

sql/migrate: connected database is not clean: found schema "tiger_data". baseline version or allow-dirty is required

The way I see it, there is no perfect way to ensure a schema/table is used (only) by an extension in some cases (like this one), so while this PR makes some progress, it's not the complete solution and the tests are still failing.
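
For example, the check could also consider schemas that pg_depend records as extension members, though I haven't verified whether that actually catches tiger_data on the official image:

-- Sketch: schemas recorded as members of an extension in pg_depend.
SELECT n.nspname, e.extname
FROM pg_depend d
JOIN pg_namespace n ON n.oid = d.objid AND d.classid = 'pg_namespace'::regclass
JOIN pg_extension e ON e.oid = d.refobjid AND d.refclassid = 'pg_extension'::regclass
WHERE d.deptype = 'e';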

Maybe the best solution would be to accept that sometimes we cannot automatically exclude the schemas/tables created by extensions and let the user exclude them as required.

That means adding an exclude flag to the atlas migrate lint and atlas migrate diff commands; to make the tests pass, we would also need to allow providing InspectOptions to Driver.CheckClean/Driver.Snapshot and skip the PostGIS schemas.

Alternatively, we could create a custom PostGIS image without the postgis_topology and postgis_tiger_geocoder extensions and handle them at a later stage.


Comments or ideas are appreciated.

@a8m (Member) left a comment:

Thanks for the contribution, @crossworth. Can you check the CI? Let me know if I can help with anything.

@plutino commented on May 9, 2023

@crossworth @a8m I took a look at the CI failures, and they are all caused by the extra schemas added by PostGIS, namely tiger, tiger_data, and topology.

I'm not familiar enough with the code base to know the best way to fix this, but this change fixes the tests:

--- a/sql/postgres/inspect.go
+++ b/sql/postgres/inspect.go
@@ -1073,7 +1073,7 @@ const (
        // Query to list database schemas.
-       schemasQuery = "SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('information_schema', 'pg_catalog', 'pg_toast', 'crdb_internal', 'pg_extension') AND schema_name NOT LIKE 'pg_%temp_%' ORDER BY schema_name"
+       schemasQuery = "SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('information_schema', 'pg_catalog', 'pg_toast', 'crdb_internal', 'pg_extension', 'tiger', 'tiger_data', 'topology') AND schema_name NOT LIKE 'pg_%temp_%' ORDER BY schema_name"

These schemas are created because the postgis image adds the postgis_topology and postgis_tiger_geocoder extensions to all databases. A better solution might be not to create them and rely on user migrations. Unfortunately, the official postgis image does not provide an option to disable the creation of these extensions. Maybe use a custom image instead?

@crossworth (Contributor, Author) commented on May 9, 2023

Hello @plutino, thanks for taking a look at the issue.

While this change fixes the problem, it's not the best solution: it would reserve the names tiger, tiger_data and topology, which are not that "unique" and could lead to hard-to-understand bugs. It won't help with future extensions either.

To make the CI pass, we only need to provide a way to exclude these schemas as needed (rather than hardcoding them in the query); Atlas already has support for this in a few places (the exclude flag). Alternatively, we could use a lean PostGIS Docker image like you mentioned.

In my opinion, the best solution is to add support for the exclude flag to the atlas migrate lint and atlas migrate diff commands.

I plan to take a look at it this week and see if I can make progress, but if you would like to give it a try, please let me know.

@plutino commented on May 9, 2023

Sounds good. Thanks, @crossworth. This will help us a lot!

@crossworth (Contributor, Author) commented:

Alright, I tried my idea of adding support for the exclude flag, but it seems there were only two ways of doing it (in a backward-compatible way):

  • Adding an optional argument to Driver.CheckClean and Driver.Snapshot (to allow controlling the InspectSchema and InspectRealm calls).
  • Adding the options to the driver itself.

The first option seems like the best way to me, since it creates a better API, but it requires changes to a lot of packages until we get to the packages used by cmd/atlas. The second option seemed easier, but feels a lot more hacked together.

After thinking a bit, I remembered that my goal with this PR was quite simple: making the CI pass with PostGIS with as little code change as possible. So I took a step back and implemented a different solution, a little hacky as well, but at least only in the test code.

We drop the unused schemas. It would be better to have a lean Docker image, but maintaining a custom Docker image may not be worth it for this case.

This means the problem related to the postgis_tiger_geocoder extension still exists; we can always come back to it when/if the need arises.
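
For reference, the cleanup boils down to statements roughly like these (extension/schema names taken from the postgis/postgis image; the exact statements in the test code may differ):

-- Sketch: drop the extra extensions and the schemas they ship with.
DROP EXTENSION IF EXISTS postgis_tiger_geocoder CASCADE;
DROP EXTENSION IF EXISTS postgis_topology CASCADE;
DROP SCHEMA IF EXISTS tiger, tiger_data, topology CASCADE;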

Now, with this change, both the declarative and versioned migration strategies work without the need for the exclude flag:

NOTE: Each test was done with an empty database running on Docker:

docker run --rm -e POSTGRES_PASSWORD=pass -e POSTGRES_DB=test -p 25432:5432 postgis/postgis:latest

Declarative

atlas schema inspect

# note the search_path
atlas schema inspect --url "postgres://postgres:pass@localhost:25432/test?search_path=public&sslmode=disable"
atlas version v0.10.2-30a2c72-canary
table "spatial_ref_sys" {
  schema = schema.public
  column "srid" {
    null = false
    type = integer
  }
  column "auth_name" {
    null = true
    type = character_varying(256)
  }
  column "auth_srid" {
    null = true
    type = integer
  }
  column "srtext" {
    null = true
    type = character_varying(2048)
  }
  column "proj4text" {
    null = true
    type = character_varying(2048)
  }
  primary_key {
    columns = [column.srid]
  }
  check "spatial_ref_sys_srid_check" {
    expr = "((srid > 0) AND (srid <= 998999))"
  }
}
schema "public" {
}
This PR
schema "public" {
}

Versioned

atlas migrate diff

# note the search_path
atlas migrate diff --dir "file://migrations" --to "postgres://postgres:pass@localhost:25432/test?search_path=public&sslmode=disable" --dev-url "docker://postgres/15"
atlas version v0.10.2-30a2c72-canary

File migrations/20230510042843.sql created.

-- Create "spatial_ref_sys" table
CREATE TABLE "public"."spatial_ref_sys" ("srid" integer NOT NULL, "auth_name" character varying(256) NULL, "auth_srid" integer NULL, "srtext" character varying(2048) NULL, "proj4text" character varying(2048) NULL, PRIMARY KEY ("srid"), CONSTRAINT "spatial_ref_sys_srid_check" CHECK ((srid > 0) AND (srid <= 998999)));
This PR

No file is created.

The migration directory is synced with the desired state, no changes to be made

atlas migrate apply

# note the search_path
atlas migrate apply --dir "file://migrations" --url "postgres://postgres:pass@localhost:25432/test?search_path=public&sslmode=disable"
atlas version v0.10.2-30a2c72-canary
Error: sql/migrate: connected database is not clean: found table "atlas_schema_revisions" in schema "public". baseline version or allow-dirty is required
This PR
Migrating to version 20230510043225 (1 migrations in total):

  -- migrating version 20230510043225
    -> create table users (
id integer not null
);
  -- ok (6.722041ms)

  -------------------------
  -- 154.249ms
  -- 1 migrations
  -- 1 sql statements

atlas migrate lint

We are required to use a dev database with the PostGIS extension, so no docker support 😔.

atlas migrate lint --dir "file://migrations" --dev-url "postgres://postgres:pass@localhost:25432/test?search_path=public&sslmode=disable" --latest 1
atlas version v0.10.2-30a2c72-canary
Error: taking database snapshot: sql/migrate: connected database is not clean: found table "spatial_ref_sys" in connected schema
This PR

No output

@a8m (Member) left a comment:

Great work, @crossworth!

@a8m merged commit 214f9fa into ariga:master on May 20, 2023
25 checks passed
@plutino commented on May 22, 2023

Thanks @crossworth. This would really resolve our problem.

@a8m do you have an estimate on when this can be released? We currently use a custom build and I'd really love to switch to an official release.

@a8m (Member) commented on May 22, 2023

The binary is released twice a day. You can install the latest version with:

curl -sSf https://atlasgo.sh | sh
