Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vertica timestamp(0) causes an unable to get column information error in DataHub CLI #5295

Closed
martinontcode opened this issue Jun 30, 2022 · 10 comments · Fixed by #6295
Closed
Assignees
Labels
bug Bug report ingestion PR or Issue related to the ingestion of metadata

Comments

@martinontcode
Copy link

Describe the bug
A warning is received in Datahub CLI while executing ingestion from Vertica DB via Vertica plugin.
Warning output: 'warnings': {'XXX_TABLE': ['unable to get column information due to an error -> __init__() got an unexpected keyword ' "argument 'precision'"]}

It seems that the error is caused when an attribute data type is set to timestamp(0). The error can be bypassed when commenting out the following lines in venv\Lib\site-packages\datahub\ingestion\source\sql\vertica.py, _get_column_info function.

elif attype in ("timestamp", "time"):
        kwargs["timezone"] = False
        # if charlen:
        #     kwargs["precision"] = int(charlen)
        args = ()

The created kwargs object precision attribute isn't compatible with the timestamp parameters: {'timezone': False, 'precision': 0}, it's possible that the same error would appear when working with timestampz data type.

Commenting most likely works as timestamp precision is optional - TIMESTAMP/TIMESTAMPTZ

To Reproduce
Create a table in Vertica including an attribute with timestamp(0):

CREATE TABLE YYY.XXX_TABLE
(
    load_timestamp timestamp(0)
);

Execute the pipeline using datahub CLI.

Expected behavior
Load meta from Vertica to file using datahub CLI w/o warnings.

Desktop (please complete the following information):

  • Python 3.8.2
  • acryl-datahub 0.8.40 / DataHub CLI 0.8.40
  • wheel 0.37.1
  • setuptools 62.6.0
  • pip 22.1.2
@martinontcode martinontcode added the bug Bug report label Jun 30, 2022
@maggiehays maggiehays added the ingestion PR or Issue related to the ingestion of metadata label Jun 30, 2022
@martinontcode
Copy link
Author

Hi, is there a possibility of receiving an ETA on when this might be resolved?

@martinontcode
Copy link
Author

Some additional information regarding this issue.

In case of timestamp(x) or timestamptz(x) as mentioned, the following warning is raised:
'warnings': {'XSCHEMA.YTABLE': ['unable to get column information due to an error -> __init__() got an unexpected keyword ' "argument 'precision'"]}

The precision argument is initiated in \Lib\site-packages\datahub\ingestion\source\sql\vertica.py:

    elif attype in ("timestamptz", "timetz"):
        kwargs["timezone"] = True
        if charlen:
            kwargs["precision"] = int(charlen)  # type: ignore
        args = ()  # type: ignore
    elif attype in ("timestamp", "time"):
        kwargs["timezone"] = False
        if charlen:
            kwargs["precision"] = int(charlen)  # type: ignore

This calls for the TIMESTAMP class in \Lib\site-packages\sqlalchemy\sql\sqltypes.py:

class TIMESTAMP(DateTime):

    """The SQL TIMESTAMP type.

    :class:`_types.TIMESTAMP` datatypes have support for timezone
    storage on some backends, such as PostgreSQL and Oracle.  Use the
    :paramref:`~types.TIMESTAMP.timezone` argument in order to enable
    "TIMESTAMP WITH TIMEZONE" for these backends.

    """

    __visit_name__ = "TIMESTAMP"

    def __init__(self, timezone=False):
        """Construct a new :class:`_types.TIMESTAMP`.

        :param timezone: boolean.  Indicates that the TIMESTAMP type should
         enable timezone support, if available on the target database.
         On a per-dialect basis is similar to "TIMESTAMP WITH TIMEZONE".
         If the target database does not support timezones, this flag is
         ignored.


        """
        super(TIMESTAMP, self).__init__(timezone=timezone)

    def get_dbapi_type(self, dbapi):
        return dbapi.TIMESTAMP

Which returns the precision error as there's no precision argument defined.

@martinontcode
Copy link
Author

So an easy fix would be to add the following argument and code line into the class:

class TIMESTAMP(DateTime):

    """The SQL TIMESTAMP type.

    :class:`_types.TIMESTAMP` datatypes have support for timezone
    storage on some backends, such as PostgreSQL and Oracle.  Use the
    :paramref:`~types.TIMESTAMP.timezone` argument in order to enable
    "TIMESTAMP WITH TIMEZONE" for these backends.

    """

    __visit_name__ = "TIMESTAMP"

    def __init__(self, timezone=False, precision=None):
        """Construct a new :class:`_types.TIMESTAMP`.

        :param timezone: boolean.  Indicates that the TIMESTAMP type should
         enable timezone support, if available on the target database.
         On a per-dialect basis is similar to "TIMESTAMP WITH TIMEZONE".
         If the target database does not support timezones, this flag is
         ignored.


        """
        super(TIMESTAMP, self).__init__(timezone=timezone)
        self.precision = precision

    def get_dbapi_type(self, dbapi):
        return dbapi.TIMESTAMP

Which results in the following schema output:

{
"fieldPath": "DATE_CREATED",
"jsonPath": null,
"nullable": true,
"description": null,
"created": null,
"lastModified": null,
"type": {
    "type": {
        "com.linkedin.pegasus2avro.schema.TimeType": {}
    }
},
"nativeDataType": "TIMESTAMP(precision=0)",
"recursive": false,
"globalTags": null,
"glossaryTerms": null,
"isPartOfKey": false,
"isPartitioningKey": null,
"jsonProps": null
}

@maggiehays
Copy link
Collaborator

Hi martinontcode! Would you mind opening a PR with the fix so we can move it through review?

@github-actions
Copy link

github-actions bot commented Aug 4, 2022

This issue is stale because it has been open for 15 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

@github-actions github-actions bot added the stale label Aug 4, 2022
@martinontcode
Copy link
Author

Hi, the issue is still present.
The complexity of this issue comes from the fact that the used import from sqlalchemy.sql import sqltypes has the missing "precision" parameter.

Is there a way to fix this from Datahub perspective?

@github-actions github-actions bot removed the stale label Sep 4, 2022
@github-actions
Copy link

github-actions bot commented Oct 4, 2022

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

@inancdokurel
Copy link
Contributor

I submitted a PR attempting to fix this problem. It basically is a hacky implementation of the suggestion @martinontcode made.

@github-actions github-actions bot removed the stale label Oct 27, 2022
@github-actions
Copy link

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

@github-actions github-actions bot added the stale label Nov 27, 2022
@inancdokurel
Copy link
Contributor

This still is a problem in the current version. Could anybody review and comment on the opened PR?

@github-actions github-actions bot removed the stale label Nov 30, 2022
hsheth2 pushed a commit that referenced this issue Dec 7, 2022
Co-authored-by: İnanç Dokurel <inancdokurel@users.noreply.github.com>
Fixes #5295
cccs-Dustin pushed a commit to CybercentreCanada/datahub that referenced this issue Feb 1, 2023
…ahub-project#6295)

Co-authored-by: İnanç Dokurel <inancdokurel@users.noreply.github.com>
Fixes datahub-project#5295
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug report ingestion PR or Issue related to the ingestion of metadata
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants