Destination S3 SchemaParseException: Illegal initial character #5149

Closed
marcosmarxm opened this issue Aug 3, 2021 · 1 comment · Fixed by #5729

Comments

@marcosmarxm
Member

Environment

  • Airbyte version: 0.29.2-alpha
  • OS Version / Instance: Ubuntu 18.04
  • Deployment: Docker
  • Source Connector and version: Facebook Marketing
  • Destination Connector and version: S3 Parquet (default option)
  • Severity: Very Low / Low / Medium / High / Critical
  • Step where error happened: Sync job

Current Behavior

Syncing Facebook Marketing to S3 Parquet does not work.
The connection hangs a
API sources can have complex JSON schemas, and some formats may not have native support.

Expected Behavior

The sync should work, possibly by removing the illegal character.

Logs

If applicable, please upload the logs from the failing operation.
For sync jobs, you can download the full logs from the UI by going to the sync attempt page and
clicking the download logs button at the top right of the logs display window.

LOG

2021-08-03 01:22:47 INFO () WorkerRun(call):62 - Executing worker wrapper. Airbyte version: 0.29.2-alpha
2021-08-03 01:22:47 INFO () TemporalAttemptExecution(get):110 - Executing worker wrapper. Airbyte version: 0.29.2-alpha
2021-08-03 01:22:48 INFO () DefaultReplicationWorker(run):102 - start sync worker. job id: 6 attempt id: 0
2021-08-03 01:22:48 INFO () DefaultReplicationWorker(run):111 - configured sync modes: {null.ads_insights_platform_and_device=incremental - append, null.ads=incremental - append, null.ad_creatives=full_refresh - overwrite, null.campaigns=incremental - append, null.ad_sets=incremental - append, null.ads_insights=incremental - append, null.ads_insights_region=incremental - append, null.ads_insights_age_and_gender=incremental - append, null.ads_insights_country=incremental - append, null.ads_insights_dma=incremental - append}
2021-08-03 01:22:48 INFO () DefaultAirbyteDestination(start):78 - Running destination...
2021-08-03 01:22:48 INFO () LineGobbler(voidCall):85 - Checking if airbyte/destination-s3:0.1.9 exists...
2021-08-03 01:22:48 INFO () LineGobbler(voidCall):85 - airbyte/destination-s3:0.1.9 was found locally.
2021-08-03 01:22:48 INFO () DockerProcessFactory(create):146 - Preparing command: docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/6/0 --network host --log-driver none airbyte/destination-s3:0.1.9 write --config destination_config.json --catalog destination_catalog.json
2021-08-03 01:22:48 INFO () LineGobbler(voidCall):85 - Checking if airbyte/source-facebook-marketing:0.2.14 exists...
2021-08-03 01:22:48 INFO () LineGobbler(voidCall):85 - airbyte/source-facebook-marketing:0.2.14 was found locally.
2021-08-03 01:22:48 INFO () DockerProcessFactory(create):146 - Preparing command: docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/6/0 --network host --log-driver none airbyte/source-facebook-marketing:0.2.14 read --config source_config.json --catalog source_catalog.json
2021-08-03 01:22:48 INFO () DefaultReplicationWorker(run):139 - Waiting for source thread to join.
2021-08-03 01:22:48 INFO () DefaultReplicationWorker(lambda$getDestinationOutputRunnable$3):246 - Destination output thread started.
2021-08-03 01:22:48 INFO () DefaultReplicationWorker(lambda$getReplicationRunnable$2):210 - Replication thread started.
2021-08-03 01:22:51 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-08-03 01:22:51 INFO i.a.i.b.IntegrationRunner(run):81 - {} - Running integration: io.airbyte.integrations.destination.s3.S3Destination
2021-08-03 01:22:51 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-08-03 01:22:51 INFO i.a.i.b.IntegrationCliParser(parseOptions):135 - {} - integration args: {catalog=destination_catalog.json, write=null, config=destination_config.json}
2021-08-03 01:22:51 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-08-03 01:22:51 INFO i.a.i.b.IntegrationRunner(run):85 - {} - Command: WRITE
2021-08-03 01:22:51 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-08-03 01:22:51 INFO i.a.i.b.IntegrationRunner(run):86 - {} - Integration config: IntegrationConfig{command=WRITE, configPath='destination_config.json', catalogPath='destination_catalog.json', statePath='null'}
2021-08-03 01:22:51 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-08-03 01:22:51 INFO i.a.i.d.s.S3FormatConfigs(getS3FormatConfig):42 - {} - S3 format config: {"format_type":"Parquet","page_size_kb":1024,"block_size_mb":128,"compression_codec":"UNCOMPRESSED","dictionary_encoding":true,"max_padding_size_mb":8,"dictionary_page_size_kb":1024}
2021-08-03 01:22:51 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-08-03 01:22:51 WARN i.a.i.d.s.a.JsonToAvroSchemaConverter(getAvroSchema):126 - {} - Schema name contains illegal character(s) and is standardized: adlabels.items -> adlabels_items
2021-08-03 01:22:51 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-08-03 01:22:51 WARN i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):78 - {} - Airbyte message consumer: failed.

Steps to Reproduce

  1. Create a Facebook Marketing source (using the integration account)
  2. Create S3 Destination with Parquet format (default values)
  3. Sync using Incremental Append

Are you willing to submit a PR?

Not at the moment.

@marcosmarxm marcosmarxm added the type/bug Something isn't working label Aug 3, 2021
@sherifnada
Contributor

Implementation hint: the issue seems to be that some fields in the Facebook source start with a numeric character. This is an illegal first character when writing to Avro/Parquet. The right behavior is probably to prefix such names with _.
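
A minimal sketch of the suggested behavior, in Python for brevity. This is not the connector's actual code: the function name `standardize_avro_name` and the sample field `7d_view` are hypothetical. It relies only on the Avro naming rule that a name must start with a letter or underscore and may otherwise contain only letters, digits, and underscores.

```python
import re


def standardize_avro_name(name: str) -> str:
    """Return a name that satisfies the Avro naming rule (hypothetical helper)."""
    # Replace every character outside [A-Za-z0-9_] with an underscore,
    # e.g. the dot in "adlabels.items" seen in the log above.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Avro names must begin with a letter or underscore, so prefix "_"
    # when the first character is a digit (or the name is empty).
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    for field in ["adlabels.items", "7d_view", "campaign_id"]:
        print(field, "->", standardize_avro_name(field))
```

One caveat with any renaming scheme like this: two distinct source fields (for example `a.b` and `a_b`) map to the same output name, so a real fix would also need to decide how to handle collisions.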
