Normalization fails when catalog has stream with no properties #3999

Closed
xavidop opened this issue Jun 9, 2021 · 1 comment · Fixed by #4020
Labels
type/bug Something isn't working

Comments

xavidop commented Jun 9, 2021

Expected Behavior

When normalization is enabled, all the data imported from MongoDB should be written correctly to the corresponding destination tables.

Current Behavior

When I enable the normalization toggle with a MongoDB source, normalization fails with the following logs:

Logs

```log
2021-06-09 14:24:14 INFO (/workspace/288/1) DefaultNormalizationWorker(run):61 - Running normalization.
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - Running: transform-config --config destination_config.json --integration-type bigquery --out /workspace/288/1/normalize
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - Namespace(config='destination_config.json', integration_type=<DestinationType.bigquery: 'bigquery'>, out='/workspace/288/1/normalize')
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - transform_bigquery
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - Running: transform-catalog --integration-type bigquery --profile-config-dir /workspace/288/1/normalize --catalog destination_catalog.json --out /workspace/288/1/normalize/models/generated/ --json-column _airbyte_data
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - Processing destination_catalog.json...
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - Traceback (most recent call last):
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - File "/usr/local/bin/transform-catalog", line 8, in <module>
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - sys.exit(main())
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/transform.py", line 102, in main
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - TransformCatalog().run(args)
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/transform.py", line 55, in run
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - self.process_catalog()
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/transform.py", line 82, in process_catalog
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - processor.process(catalog_file=catalog_file, json_column_name=json_col, default_schema=schema)
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/catalog_processor.py", line 72, in process
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - stream_processors = self.build_stream_processor(
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/catalog_processor.py", line 148, in build_stream_processor
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - raise EOFError("Invalid Catalog: Unexpected empty properties in catalog")
2021-06-09 14:24:17 INFO (/workspace/288/1) LineGobbler(voidCall):85 - EOFError: Invalid Catalog: Unexpected empty properties in catalog
```

The catalog.json is as follows:

```json
{
  "streams": [{
    "stream": {
      "name": "api-keys",
      "json_schema": {
        "properties": {}
      },
      "supported_sync_modes": ["full_refresh", "incremental"],
      "source_defined_cursor": false,
      "default_cursor_field": [],
      "source_defined_primary_key": []
    },
    "sync_mode": "full_refresh",
    "cursor_field": [],
    "destination_sync_mode": "append",
    "primary_key": []
  }, {
    "stream": {
      "name": "diagrams",
      "json_schema": {
        "properties": {
          "_id": {"type": "string"},
          "name": {"type": "string"},
          "zoom": {"type": "string"},
          "nodes": {"type": "string"},
          "offsetX": {"type": "string"},
          "offsetY": {"type": "string"},
          "children": {"type": "array"},
          "modified": {"type": "integer"},
          "creatorID": {"type": "integer"},
          "variables": {"type": "array"},
          "versionID": {"type": "string"}
        }
      },
      "supported_sync_modes": ["full_refresh", "incremental"],
      "source_defined_cursor": false,
      "default_cursor_field": [],
      "source_defined_primary_key": []
    },
    "sync_mode": "full_refresh",
    "cursor_field": [],
    "destination_sync_mode": "append",
    "primary_key": []
  }, {
    "stream": {
      "name": "integration-users",
      "json_schema": {
        "properties": {}
      },
      "supported_sync_modes": ["full_refresh", "incremental"],
      "source_defined_cursor": false,
      "default_cursor_field": [],
      "source_defined_primary_key": []
    },
    "sync_mode": "full_refresh",
    "cursor_field": [],
    "destination_sync_mode": "append",
    "primary_key": []
  }, {
    "stream": {
      "name": "oktas",
      "json_schema": {
        "properties": {}
      },
      "supported_sync_modes": ["full_refresh", "incremental"],
      "source_defined_cursor": false,
      "default_cursor_field": [],
      "source_defined_primary_key": []
    },
    "sync_mode": "full_refresh",
    "cursor_field": [],
    "destination_sync_mode": "append",
    "primary_key": []
  }, {
    "stream": {
      "name": "programs",
      "json_schema": {
        "properties": {
          "_id": {"type": "string"},
          "lines": {"type": "string"},
          "startId": {"type": "string"},
          "commands": {"type": "array"},
          "skill_id": {"type": "string"},
          "variables": {"type": "array"}
        }
      },
      "supported_sync_modes": ["full_refresh", "incremental"],
      "source_defined_cursor": false,
      "default_cursor_field": [],
      "source_defined_primary_key": []
    },
    "sync_mode": "full_refresh",
    "cursor_field": [],
    "destination_sync_mode": "append",
    "primary_key": []
  }, {
    "stream": {
      "name": "projects",
      "json_schema": {
        "properties": {
          "_id": {"type": "string"},
          "name": {"type": "string"},
          "image": {"type": "string"},
          "teamID": {"type": "integer"},
          "members": {"type": "array"},
          "privacy": {"type": "string"},
          "linkType": {"type": "string"},
          "platform": {"type": "string"},
          "creatorID": {"type": "integer"},
          "prototype": {"type": "string"},
          "devVersion": {"type": "string"},
          "platformData": {"type": "string"}
        }
      },
      "supported_sync_modes": ["full_refresh", "incremental"],
      "source_defined_cursor": false,
      "default_cursor_field": [],
      "source_defined_primary_key": []
    },
    "sync_mode": "full_refresh",
    "cursor_field": [],
    "destination_sync_mode": "append",
    "primary_key": []
  }, {
    "stream": {
      "name": "prototype-programs",
      "json_schema": {
        "properties": {
          "_id": {"type": "string"},
          "lines": {"type": "string"},
          "startId": {"type": "string"},
          "commands": {"type": "array"},
          "skill_id": {"type": "string"},
          "variables": {"type": "array"}
        }
      },
      "supported_sync_modes": ["full_refresh", "incremental"],
      "source_defined_cursor": false,
      "default_cursor_field": [],
      "source_defined_primary_key": []
    },
    "sync_mode": "full_refresh",
    "cursor_field": [],
    "destination_sync_mode": "append",
    "primary_key": []
  }, {
    "stream": {
      "name": "runtime-sessions",
      "json_schema": {
        "properties": {}
      },
      "supported_sync_modes": ["full_refresh", "incremental"],
      "source_defined_cursor": false,
      "default_cursor_field": [],
      "source_defined_primary_key": []
    },
    "sync_mode": "full_refresh",
    "cursor_field": [],
    "destination_sync_mode": "append",
    "primary_key": []
  }, {
    "stream": {
      "name": "versions",
      "json_schema": {
        "properties": {
          "_id": {"type": "string"},
          "name": {"type": "string"},
          "creatorID": {"type": "integer"},
          "projectID": {"type": "string"},
          "prototype": {"type": "string"},
          "variables": {"type": "array"},
          "platformData": {"type": "string"},
          "rootDiagramID": {"type": "string"}
        }
      },
      "supported_sync_modes": ["full_refresh", "incremental"],
      "source_defined_cursor": false,
      "default_cursor_field": [],
      "source_defined_primary_key": []
    },
    "sync_mode": "full_refresh",
    "cursor_field": [],
    "destination_sync_mode": "append",
    "primary_key": []
  }]
}
```
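
To see which streams in a catalog like this one have an empty `properties` object (the condition named in the `EOFError` above), a small script along these lines may help. It is only a sketch: the helper name `streams_without_properties` is made up for illustration and is not part of Airbyte, and the inline catalog is a trimmed-down stand-in for the real file.

```python
import json
from typing import Any, Dict, List


def streams_without_properties(catalog: Dict[str, Any]) -> List[str]:
    """Return the names of streams whose json_schema has missing or empty properties.

    Hypothetical helper for inspecting a catalog; not an Airbyte function.
    """
    empty = []
    for configured in catalog.get("streams", []):
        stream = configured.get("stream", {})
        properties = stream.get("json_schema", {}).get("properties")
        if not properties:  # covers both a missing key and an empty {}
            empty.append(stream.get("name", "<unnamed>"))
    return empty


# Trimmed-down stand-in for the catalog in this issue (most fields omitted).
catalog = json.loads("""
{
  "streams": [
    {"stream": {"name": "api-keys", "json_schema": {"properties": {}}}},
    {"stream": {"name": "diagrams", "json_schema": {"properties": {"_id": {"type": "string"}}}}},
    {"stream": {"name": "oktas", "json_schema": {"properties": {}}}}
  ]
}
""")

print(streams_without_properties(catalog))
```

In the full catalog above, the streams with empty `properties` are `api-keys`, `integration-users`, `oktas` and `runtime-sessions`.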

Steps to Reproduce

Add MongoDB as a source, use BigQuery as the destination, and then enable the normalization toggle.

Severity of the bug for you

Critical

Airbyte Version

0.24.7-alpha

Connector Version (if applicable)

Mongo connector version 0.3.1

Additional context

I have Airbyte deployed on Kubernetes (EKS).

@xavidop xavidop added the type/bug Something isn't working label Jun 9, 2021
@ChristopheDuong ChristopheDuong changed the title from "Normalization data from mongo fails" to "Normalization fails when catalog has stream with no properties" Jun 10, 2021
@ChristopheDuong
Contributor

In the meantime, if you avoid selecting collections with no data (empty properties in a stream, or empty columns in a table), the exception won't be thrown.
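
To illustrate what deselecting those collections amounts to, here is a rough sketch that drops streams with empty `properties` from a catalog dictionary. The function name `drop_empty_streams` and the sample data are assumptions for illustration only. In practice the catalog is produced from the streams selected in the connection settings, so this is not meant as a supported way to patch it.

```python
import copy
from typing import Any, Dict


def drop_empty_streams(catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the catalog without streams that have empty properties.

    Hypothetical helper mirroring the manual workaround; not an Airbyte API.
    """
    filtered = copy.deepcopy(catalog)  # leave the caller's catalog untouched
    filtered["streams"] = [
        configured
        for configured in filtered.get("streams", [])
        if configured.get("stream", {}).get("json_schema", {}).get("properties")
    ]
    return filtered


# Sample data shaped like the catalog in this issue (most fields omitted).
sample = {
    "streams": [
        {"stream": {"name": "api-keys", "json_schema": {"properties": {}}}},
        {"stream": {"name": "programs", "json_schema": {"properties": {"_id": {"type": "string"}}}}},
        {"stream": {"name": "runtime-sessions", "json_schema": {"properties": {}}}},
    ]
}

kept = [s["stream"]["name"] for s in drop_empty_streams(sample)["streams"]]
print(kept)
```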
