Destination S3 to Parquet/Avro: Handle empty objects with additionalProperties=true #4608

Closed
Tracked by #6996
blotouta2 opened this issue Jul 7, 2021 · 4 comments · Fixed by #7288

Comments

@blotouta2
Contributor

Environment

  • Airbyte version: example is 0.25.0-alpha
  • OS Version / Instance: AWS EC2
  • Deployment: Docker
  • Source Connector and version: Slack 0.2.3
  • Destination Connector and version: Destination S3 0.1.7
  • Severity: High
  • Step where error happened: Sync job

Current Behavior

I was trying to send data from Facebook Marketing to Destination S3 in Parquet format, but the sync fails with an error. The same sync previously worked well with the S3 CSV format.

Expected Behavior

Data is synced in Parquet format.

Logs


slack-s3-parquet.txt

LOG

Logs are attached as a txt file.

@blotouta2 blotouta2 added the type/bug Something isn't working label Jul 7, 2021
@sherifnada sherifnada added the area/connectors Connector related issues label Jul 7, 2021
@ChristopheDuong ChristopheDuong added this to the Core - 2021-07-14 milestone Jul 7, 2021
@sherifnada
Contributor

sherifnada commented Jul 7, 2021

Schema of the faulty object (reference):

"pinned_info": {
  "type": [
    "null",
    "object"
  ],
  "additionalProperties": true,
  "properties": {}
}

@sherifnada sherifnada changed the title Destination S3 Parquet conversion failed Destination S3 to Parquet/Avro: Handle empty objects with additionalProperties=true Jul 7, 2021
@tuliren
Contributor

tuliren commented Jul 7, 2021

The root cause is that pinned_info does not have any predefined properties. Parquet is not compatible with an object that has no fields, and currently we don't support additionalProperties. A related issue and potential solution is here: #4124.
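
To make the root cause concrete, here is a minimal sketch that walks a JSON schema and reports object-typed fields that declare no properties, such as pinned_info above. The function name find_empty_objects and the sample stream schema are hypothetical and only for illustration; this is not the connector's actual code.

```python
from typing import Any, Dict, Iterator


def find_empty_objects(schema: Dict[str, Any], path: str = "$") -> Iterator[str]:
    """Yield the path of every object-typed node that declares no properties."""
    types = schema.get("type", [])
    if isinstance(types, str):
        types = [types]  # "type" may be a single string or a list of strings

    if "object" in types:
        properties = schema.get("properties") or {}
        if not properties:
            # An object without predefined fields: the case described above.
            yield path
        for name, sub_schema in properties.items():
            if isinstance(sub_schema, dict):
                yield from find_empty_objects(sub_schema, f"{path}.{name}")

    if "array" in types and isinstance(schema.get("items"), dict):
        yield from find_empty_objects(schema["items"], f"{path}[]")


# Hypothetical stream schema containing the faulty object from this issue.
stream_schema = {
    "type": "object",
    "properties": {
        "text": {"type": ["null", "string"]},
        "pinned_info": {
            "type": ["null", "object"],
            "additionalProperties": True,
            "properties": {},
        },
    },
}

print(list(find_empty_objects(stream_schema)))  # ['$.pinned_info']
```

A check like this could be run against a source's catalog before a sync to list the fields that would need special handling.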

@cgardens cgardens added this to the Core 2021-10-06 milestone Sep 22, 2021
@tuliren
Contributor

tuliren commented Sep 30, 2021

Ping Rytis Zolubas when this is done.

@sherifnada
Contributor

Desired behavior:

If a record contains any data that is not present in the schema, we should put it in a hashmap field named additionalProperties. Whenever we see extra fields, we serialize their values as strings and put them in that field.
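
The desired behavior can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the implementation that fixed the issue: the helper name fold_extra_fields is hypothetical, only top-level fields are handled, and every extra value is serialized with json.dumps so it can be parsed back later.

```python
import json
from typing import Any, Dict

# Field name taken from the comment above; the exact name used by the
# connector is an assumption here.
ADDITIONAL_PROPERTIES_FIELD = "additionalProperties"


def fold_extra_fields(record: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Keep declared fields as-is and move undeclared ones into a string map."""
    declared = schema.get("properties") or {}
    result: Dict[str, Any] = {}
    extras: Dict[str, str] = {}

    for key, value in record.items():
        if key in declared:
            result[key] = value
        else:
            # Serialize the extra value as a JSON string.
            extras[key] = json.dumps(value)

    if extras:
        result[ADDITIONAL_PROPERTIES_FIELD] = extras
    return result


# The faulty schema from this issue: an object with no declared properties.
pinned_info_schema = {
    "type": ["null", "object"],
    "additionalProperties": True,
    "properties": {},
}

# Hypothetical record value, for illustration only.
record = {"channel": "C1", "pinned_ts": 162}
print(fold_extra_fields(record, pinned_info_schema))
# {'additionalProperties': {'channel': '"C1"', 'pinned_ts': '162'}}
```

With this shape, an object such as pinned_info always has at least one well-defined field (a map from string to string), which is something Avro and Parquet can represent.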
