Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new connection type - Greenplum #10641

Closed
mebelousov opened this issue Aug 29, 2020 · 5 comments
Closed

Add new connection type - Greenplum #10641

mebelousov opened this issue Aug 29, 2020 · 5 comments
Labels
kind:feature Feature Requests

Comments

@mebelousov
Copy link
Contributor

Greenplum DB is a popular open source relational DBMS[1].

I want to implement some database specific behaviour for a common class. It's easy to check connection type for postgres, mysql or oracle. Now I use postgres connection type for Greenplum and cannot explicitly check type.

Airflow has PostgresHook based on psycopg2. As Greenplum supports PostgreSQL libpq API [2], PostgresHook could be reused for Greenplum.

As I see it's enough to change 2 code line in the connectoin.py.

I'm not sure that it's nice to reuse Hook.
Please confirm or reject my idea.

Index: airflow/models/connection.py
===================================================================
--- airflow/models/connection.py  (revision 6416d898060706787861ff8ecbc4363152a35f45)
+++ airflow/models/connection.py  (date 1598717245079)
@@ -109,6 +109,7 @@
         ('grpc', 'GRPC Connection'),
         ('yandexcloud', 'Yandex Cloud'),
         ('spark', 'Spark'),
+        ('greenplum', 'Greenplum'),
     ]
 
     def __init__(
@@ -243,7 +244,7 @@
         elif self.conn_type == 'google_cloud_platform':
             from airflow.contrib.hooks.bigquery_hook import BigQueryHook
             return BigQueryHook(bigquery_conn_id=self.conn_id)
-        elif self.conn_type == 'postgres':
+        elif self.conn_type in ('postgres', 'greenplum'):
             from airflow.hooks.postgres_hook import PostgresHook
             return PostgresHook(postgres_conn_id=self.conn_id)
         elif self.conn_type == 'pig_cli':

  1. https://github.com/greenplum-db/gpdb
  2. https://gpdb.docs.pivotal.io/6-10/security-guide/topics/ports_and_protocols.html
@boring-cyborg
Copy link

boring-cyborg bot commented Aug 29, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@turbaszek
Copy link
Member

@mebelousov would you like to open a PR?

@mik-laj
Copy link
Member

mik-laj commented Sep 14, 2020

Why don't you configure the connection as PostgreSQL and then add the missing information in the description? See: https://github.com/apache/airflow/pull/10873/files
Wasn't the ability to add a description for the connection enough for you?

@mebelousov
Copy link
Contributor Author

@mik-laj good idea! Thank you! Seems it's a good work-around.

I found out than method _generate_insert_sql in PostgresHook [1] is incompatible with Greenplum syntax [2]. Now Greenplum doesn't allow to ON CONFLICT ({0}) DO UPDATE SET {1}.
I believe to add GreenplumHook for full compatibility. GreenplumHook will be copy of PostgresHook with exeption of _generate_insert_sql. It's enough to use _generate_insert_sql from DbApiHook.
Is it good idea to add a new hook?

In general we have two choices:

  1. Add GreenplumHook.
  2. Use connection description.

  1. def _generate_insert_sql(table, values, target_fields, replace, **kwargs):
  2. http://docs.greenplum.org/6-0/ref_guide/sql_commands/INSERT.html

@RosterIn
Copy link
Contributor

@mebelousov I think in that case you don't need to duplicate code. you can inherit from PostgresHook and overwrite only the methods that are incomparable.

You hook can be:

class GreenplumHook(PostgresHook):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def _generate_insert_sql():
        pass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:feature Feature Requests
Projects
None yet
Development

No branches or pull requests

4 participants