don't convert date to iso string format if export format is parquet in PostgresToGCSOperator#25691
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
pingzh
left a comment
can you write a corresponding test for it? thanks
|
If I remember correctly we had to modify things to be able to export them to
|
Converting time, datetime and date values to a string format broke the operator when parquet is the export format. |
The schema generated by this operator uses types that are safe for BigQuery ( Parquet has, on top of that, an additional mapping from BigQuery types to pyarrow types. (See I would expect the parquet export to succeed when columns are dates, and the result to be importable into BigQuery with a correct schema definition. (This is how it works for Note: I took a quick look and couldn't find what changed. In my bucket I found a working extract from 19 April 2022, with |
Yes, I confirm this field will have the right type in BigQuery. As you can see in my terminal screenshot, I parsed the schema of the parquet file I exported. |
Unfortunately, we don't have unit tests for the parquet export format that could confirm it was working :( |
We should not compare how the json and csv formats work with how the parquet format works, because they are different file types: json and csv are string-serialized formats, whereas parquet is a binary format, which makes it type-aware. For example:
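As an illustration only (a minimal sketch using the standard library, not the example from the original comment): JSON has no date type, so a date must be turned into a string before it can be written, while Parquet's DATE logical type stores the value as an integer number of days since the Unix epoch. That is consistent with pyarrow complaining that a string "cannot be converted to int". The `row` data below is made up for the illustration.

```python
import datetime
import json

# Hypothetical row, as a database driver might return it.
row = {"id": 1, "created": datetime.date(2022, 4, 19)}

# JSON is a text format without a date type: serializing a raw date fails.
try:
    json.dumps(row)
    json_accepts_date = True
except TypeError:
    json_accepts_date = False

# For JSON (and CSV) the date therefore has to become a string first.
as_text = json.dumps({**row, "created": row["created"].isoformat()})

# A type-aware format such as Parquet keeps the date as a number instead:
# its DATE logical type is the count of days since 1970-01-01.
days_since_epoch = (row["created"] - datetime.date(1970, 1, 1)).days

print(json_accepts_date)
print(as_text)
print(type(days_since_epoch).__name__)
```

Handing the ISO string to a writer that expects the integer-backed date type is the mismatch this thread is about.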
|
|
needs rebase and test fixes. |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
Issue description:
In the PostgresToGCSOperator operator we can't write a DATE column when export_format="parquet", because in the convert_type method the date values returned from Postgres are converted to strings, so pyarrow raises this exception: pyarrow.lib.ArrowTypeError: object of type <class 'str'> cannot be converted to int
Solution:
Don't convert date to string when export format is parquet.
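The idea can be sketched roughly as follows. This is a hedged, standalone illustration and not the code from this pull request: `convert_value` and its `export_format` parameter are hypothetical names, and the real `convert_type` method in Airflow has a different signature and handles more cases. The branch also covers time and datetime values as an assumption, since the discussion mentions them; the description above only names date.

```python
import datetime


def convert_value(value, export_format):
    """Hypothetical stand-in for the operator's per-value conversion step."""
    # datetime.datetime is a subclass of datetime.date, so it is matched here too.
    if isinstance(value, (datetime.date, datetime.time)):
        if export_format == "parquet":
            # Keep the native Python object so a typed writer can map it itself.
            return value
        # Text formats (csv, json) need a string representation.
        return value.isoformat()
    # Everything else passes through unchanged in this sketch.
    return value


d = datetime.date(2022, 4, 19)
print(convert_value(d, "parquet"))
print(convert_value(d, "csv"))
```

With this shape, the csv and json paths keep their current string output and only the parquet path changes.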
Here is my solution and a screenshot of the DagRun: