Source Mailchimp: AssertionError: Mismatched number of tables #8825

Closed
Tracked by #11031
marcosmarxm opened this issue Dec 16, 2021 · 17 comments · Fixed by #10975

@marcosmarxm
Member

marcosmarxm commented Dec 16, 2021

Environment

  • Airbyte version: 0.33.12-alpha
  • OS Version / Instance: macOS
  • Deployment: Docker
  • Source Connector and version: Mailchimp 0.2.9
  • Destination Connector and version: Postgres 0.3.13
  • Severity: Very Low / Low / Medium / High / Critical
  • Step where error happened: Sync job

Current Behavior

The sync fails during the normalization step.

Expected Behavior

The sync completes and normalization runs without errors.

Logs

normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -   File "/usr/local/bin/transform-catalog", line 8, in <module>
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -     sys.exit(main())
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -   File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/transform.py", line 82, in main
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -     TransformCatalog().run(args)
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -   File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/transform.py", line 35, in run
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -     self.process_catalog()
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -   File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/transform.py", line 62, in process_catalog
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -     processor.process(catalog_file=catalog_file, json_column_name=json_col, default_schema=schema)
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -   File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/catalog_processor.py", line 63, in process
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -     for conflict in tables_registry.resolve_names():
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -   File "/usr/local/lib/python3.8/site-packages/normalization/transform_catalog/table_name_registry.py", line 196, in resolve_table_names
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 -     assert (table_count * 2) == registry_size, f"Mismatched number of tables {table_count * 2} vs {registry_size} being resolved"
normalization - 2021-12-16 00:39:06 ERROR () LineGobbler(voidCall):82 - AssertionError: Mismatched number of tables 62 vs 60 being resolved

Complete logs: logs-5-0 (2).txt
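Reading the assertion at face value, table_count is 31 (62 / 2) while registry_size is 60. If each distinct table contributes two entries to the resolved registry (an assumption inferred only from the `* 2` in the assertion), that is 30 distinct tables against 31 registrations, i.e. one table registered twice. The toy model below is hypothetical and is not the code in table_name_registry.py; `build_registry` and `sizes` are illustrative names that just reproduce the arithmetic.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical toy model of the size check in table_name_registry.py.
# It is NOT Airbyte's implementation; it only reproduces the arithmetic.
Registry = DefaultDict[str, List[str]]


def build_registry(table_keys: List[str]) -> Registry:
    """Group registrations by table key; a repeated key lands in the same bucket."""
    registry: Registry = defaultdict(list)
    for key in table_keys:
        registry[key].append(key)
    return registry


def sizes(registry: Registry) -> Tuple[int, int]:
    # Every registration counts, duplicates included.
    table_count = sum(len(entries) for entries in registry.values())
    # Assumption: each distinct table contributes two resolved entries.
    registry_size = 2 * len(registry)
    return table_count * 2, registry_size


keys = [f"stream_{i}" for i in range(29)]
keys.append("campaigns_recipients_segment_opts_conditions")
keys.append("campaigns_recipients_segment_opts_conditions")  # registered twice

expected, actual = sizes(build_registry(keys))
if expected != actual:
    print(f"Mismatched number of tables {expected} vs {actual} being resolved")
```

With 29 unrelated streams plus one stream registered twice, this prints the same "62 vs 60" message as the log.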

Steps to Reproduce

  1. Create a Mailchimp source with the integration account
  2. Use a Postgres destination
  3. Run a sync: it fails during normalization


@marcosmarxm marcosmarxm added type/bug Something isn't working needs-triage labels Dec 16, 2021
@zkid18
Contributor

zkid18 commented Dec 16, 2021

+1, I faced the same issue recently in a similar environment.

@sherifnada sherifnada added area/connectors Connector related issues and removed needs-triage labels Dec 17, 2021
@sherifnada sherifnada added this to the Connectors Jan 14 2022 milestone Dec 24, 2021
@htrueman htrueman self-assigned this Dec 27, 2021
@htrueman
Contributor

Scoping report

  • I ran the normalization step separately with docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/29/2/normalize --network host --log-driver none airbyte/normalization:0.1.61 run --integration-type postgres --config destination_config.json --catalog destination_catalog.json and it fails with the same error as described in the issue.

  • So it seems that the issue is either with the destination or with normalization.

  • While debugging, I found a duplicate table name: campaigns_recipients__gment_opts_conditions.

  • Since the normalization script is supposed to handle name collisions, I assume the issue lies specifically within the normalization script.

  • To fix it, we need to resolve this collision and make it impossible in the future.
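One way to make such a collision impossible to miss is a fail-fast check over the generated table names before name resolution runs. A minimal sketch, assuming the names are available as a plain list; `find_duplicate_tables` is a hypothetical helper, not part of base-normalization.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_tables(table_names: Iterable[str]) -> List[str]:
    """Return every table name that occurs more than once, in first-seen order."""
    counts = Counter(table_names)
    return [name for name, count in counts.items() if count > 1]


names = [
    "campaigns",
    "campaigns_recipients",
    "campaigns_recipients__gment_opts_conditions",
    "campaigns_recipients__gment_opts_conditions",
]
duplicates = find_duplicate_tables(names)
if duplicates:
    # A real guard would raise here; printing keeps the example runnable.
    print(f"Duplicate table names: {duplicates}")
```

Reporting the offending names gives a far more actionable message than a bare count mismatch.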

@htrueman
Contributor

htrueman commented Jan 4, 2022

The issue is no longer appearing. I tested under the same conditions, running both the full connector sync and normalization separately.
It seems that a recent Postgres destination update resolved the issue.
Lib versions:
airbyte/normalization: 0.1.63,
airbyte-cdk: 0.1.47,
source-mailchimp: 0.2.11,
destination-postgres: 0.3.13

@htrueman
Contributor

Reopened, as a user reported the issue again. I'm going to update normalization to explicitly remove duplicates.
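Explicitly removing duplicates could look like keeping the first entry per identifying key. A minimal sketch: `TableEntry` is a hypothetical stand-in for the registry's metadata, and treating (schema, stream name, JSON path) as the identity of a table is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TableEntry:
    """Hypothetical stand-in for one registry entry (not Airbyte's class)."""
    schema: str
    stream_name: str
    json_path: Tuple[str, ...]


def dedupe(entries: List[TableEntry]) -> List[TableEntry]:
    """Drop repeated entries, keeping the first occurrence and the original order.

    frozen=True makes TableEntry hashable by its fields, so dict.fromkeys
    treats two entries with the same schema/stream/path as one key.
    """
    return list(dict.fromkeys(entries))


path = ("campaigns", "recipients", "segment_opts", "conditions")
conditions = TableEntry("airbyte_mailchimp", "conditions", path)
entries = [TableEntry("airbyte_mailchimp", "campaigns", ("campaigns",)), conditions, conditions]
print(len(dedupe(entries)))  # 2
```

Silently dropping repeats can hide whatever registered the stream twice, so logging the dropped entries may be preferable to a bare dedupe.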

@sherifnada
Contributor

this is potentially related to the fact that normalization shortens the name. Could there be anything leading to two instances of the shortened name being created?
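To see how shortening can create two instances of the same name, consider middle truncation with a `__` marker. The sketch below is an assumption, not normalization's actual truncation code: with a 43-character limit it happens to turn campaigns_recipients_segment_opts_conditions into the reported campaigns_recipients__gment_opts_conditions, and the second, longer path is hypothetical.

```python
def truncate_middle(name: str, limit: int) -> str:
    """Shorten name to at most `limit` chars, keeping prefix and suffix around '__'."""
    if limit < 3:
        raise ValueError("limit must be at least 3")
    if len(name) <= limit:
        return name
    keep = limit - 2              # characters left after the "__" marker
    suffix_len = (keep + 1) // 2  # give the odd character to the suffix
    prefix_len = keep - suffix_len
    return name[:prefix_len] + "__" + name[-suffix_len:]


LIMIT = 43  # assumed limit, chosen only because it reproduces the reported name
original = "campaigns_recipients_segment_opts_conditions"
other = "campaigns_recipients_a_segment_opts_conditions"  # hypothetical second path

print(truncate_middle(original, LIMIT))  # campaigns_recipients__gment_opts_conditions
print(truncate_middle(original, LIMIT) == truncate_middle(other, LIMIT))  # True
```

Any two names that differ only in the dropped middle characters collapse to the same identifier, which is why a truncating scheme needs a collision-resolution step afterwards.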

@htrueman
Copy link
Contributor


You're right. I looked into this further and found that the issue is indeed fixed.
Previously, the campaigns_recipients__gment_opts_conditions table was duplicated, but that is now fixed and all duplicates are resolved. I'm going to contact the end user and ask them to try again, perhaps after reinstalling the connector.

@htrueman
Contributor

htrueman commented Jan 19, 2022

@sherifnada a new issue has appeared.
See logs-854-0.txt.
It seems to be the same as #5870 (still unresolved),
so it is likely an airbyte-workers issue.

What should we do next?

@arimbr
Contributor

arimbr commented Feb 9, 2022

@htrueman I think I am also experiencing the same error during normalization in a Mailchimp to BigQuery sync:
Mismatched number of tables 62 vs 60 being resolved

See logs: logs-48031.txt

@ChristopheDuong
Contributor

Yes, here are some notes for whoever will be working on looking deeper into fixing this:

  • We would need to set up a source-mailchimp connection similar to the one Ari configured: https://cloud.airbyte.io/workspaces/bed3b473-1518-4461-a37f-730ea3d3a848/connections/88ef12b6-808a-4bfd-8cf6-b65fd682d2b6
  • Extract the catalog.json generated by the source.
  • Run the transform_catalog function, especially the airbyte-integrations/bases/base-normalization/normalization/transform_catalog/table_name_registry.py class, and debug why it finds conflicting names.
  • Since this is happening in BigQuery, it may not be linked to truncated names (which we do apply in Postgres destinations): BigQuery's name length limit is much higher, yet the exception is still thrown.
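For the catalog-inspection steps, a quick first pass is to enumerate the nested tables a stream schema would produce and look for repeats. The walker below is a simplified, hypothetical approximation: it treats objects with `properties`, and arrays of such objects, as child tables and joins names with `_`. It is not find_children_streams and ignores anyOf, truncation and hashing; the schema is a trimmed toy, not the real Mailchimp one.

```python
from typing import Any, Dict, List


def _types(node: Dict[str, Any]) -> List[str]:
    """Normalize a JSON Schema "type" (string or list) to a list."""
    t = node.get("type", [])
    return [t] if isinstance(t, str) else list(t)


def nested_table_names(stream: str, schema: Dict[str, Any]) -> List[str]:
    names: List[str] = []

    def walk(prefix: str, node: Dict[str, Any]) -> None:
        for key, child in node.get("properties", {}).items():
            # Arrays of objects are unwrapped to their item schema.
            if "array" in _types(child) and isinstance(child.get("items"), dict):
                child = child["items"]
            if "object" in _types(child) and "properties" in child:
                name = f"{prefix}_{key}"
                names.append(name)
                walk(name, child)

    walk(stream, schema)
    return names


campaigns_schema = {  # toy schema, heavily simplified
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "recipients": {
            "type": ["null", "object"],
            "properties": {
                "segment_opts": {
                    "type": ["null", "object"],
                    "properties": {
                        "conditions": {
                            "type": ["null", "array"],
                            "items": {"type": "object", "properties": {"field": {"type": "string"}}},
                        }
                    },
                }
            },
        },
    },
}
print(nested_table_names("campaigns", campaigns_schema))
```

Feeding the resulting names into a duplicate counter narrows down which stream path is registered more than once.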

@arimbr
Copy link
Contributor

arimbr commented Feb 14, 2022

I was able to debug this locally following @ChristopheDuong's advice. It seems that the function find_children_streams in stream_processor.py returns two StreamProcessor objects for stream_name = conditions.

'airbyte_mailchimp.campaigns_recipients_segment_opts_conditions': [<normalization.transform_catalog.table_name_registry.NormalizedNameMetadata object at 0x7fea0083d490>, <normalization.transform_catalog.table_name_registry.NormalizedNameMetadata object at 0x7fea0083d450>

I believe it's the second one that's messing things up. I was able to run transform-catalog --integration-type bigquery --profile-config-dir . --catalog destination_catalog.json --out . --json-column _airbyte_data successfully after removing the following lines from destination_catalog.json:

"x-discriminator": {
  "type": "string",
  "propertyName": "condition_type"
}

These lines are coming from here: https://us1.api.mailchimp.com/schema/3.0/Definitions/SegmentCondition.json

Can we safely remove these lines from the Mailchimp schema in the following file?

"x-discriminator": {
  "type": "string",
  "propertyName": "condition_type"
},
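The local workaround described here (deleting the x-discriminator blocks from destination_catalog.json before running transform-catalog) can be scripted. A minimal sketch, assuming the catalog is plain JSON and that dropping every x-discriminator key is acceptable for a test run; `strip_key` and `clean_catalog` are hypothetical helpers, and this edits a copy of the catalog, not the connector's bundled schema.

```python
import json
from typing import Any


def strip_key(node: Any, key: str = "x-discriminator") -> Any:
    """Return a copy of node with every `key` entry removed at any depth."""
    if isinstance(node, dict):
        return {k: strip_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_key(item, key) for item in node]
    return node


def clean_catalog(src_path: str, dst_path: str) -> None:
    """Write a copy of the catalog at src_path without any x-discriminator keys."""
    with open(src_path) as src:
        catalog = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(strip_key(catalog), dst, indent=2)


sample = {"items": {"type": "object", "x-discriminator": {"type": "string", "propertyName": "condition_type"}}}
print(strip_key(sample))  # {'items': {'type': 'object'}}
```

For example, `clean_catalog("destination_catalog.json", "destination_catalog.clean.json")` followed by the same transform-catalog command with `--catalog destination_catalog.clean.json`.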

@arimbr
Copy link
Contributor

arimbr commented Feb 14, 2022

@VitaliiMaltsev I saw you did some related changes here: #7975. Wonder whether my previous comment rings a bell?

@iporollo

Seeing the same issue with Mailchimp to Snowflake sync:
Mismatched number of tables 62 vs 60 being resolved

See logs: logs-269.txt

@arimbr
Copy link
Contributor

arimbr commented Mar 17, 2022

@zkid18 @htrueman @iporollo I released a new version of the connector in #1936 that solved the issue for me. Could you upgrade the Mailchimp connector version to 0.2.12 and see if the error is solved?

@zkid18
Copy link
Contributor

zkid18 commented Mar 19, 2022

@arimbr Hey, thanks! I'll have a look on Monday and will message you if we run into any issues.

@bmikaili

I am getting this same error with Clickhouse.

@eloymc98

eloymc98 commented Jan 26, 2023


Me too. Any news on this issue?

@evantahler
Copy link
Contributor

Can you please open a new issue for problems with source-clickhouse so we can track it separately?
