
Destination MSSQL duplicate array records when using incremental append (nested tables) #9465

Open
marcosmarxm opened this issue Jan 13, 2022 · 8 comments


Is this your first time deploying Airbyte: No
OS Version / Instance: Ubuntu 20.04 (2 vCPU ARM)
Memory / Disk: 8GB / 40GB
Deployment: Docker compose
Airbyte Version: 0.35.4-alpha
Source name/version: Custom connector
Destination name/version: MSSQL 0.1.13
Step: Syncing nested array data from a custom connector to MSSQL using incremental append. Records in the nested tables are duplicated on each sync run; even when no records are found, new rows are added, and the logs say "Read 0 records from forms stream".
Description: Even when no records are found, normalization still executes and duplicates the rows in the nested tables generated from arrays.

From the Slack conversation:
Hello @Marcos Marx (Airbyte), yes, the data is very simple, something like this: [{id: "1", name: "test", sub_objects: [{name: "abcd"}]}]. It works perfectly for the main table, but for the sub_objects table, every sync with no extra data re-runs normalization, and at that step the data in the sub_objects table is duplicated.

marcosmarxm added the type/bug, area/connectors, and normalization labels on Jan 13, 2022
agrass (Contributor) commented Jan 13, 2022

Hi, thanks for creating the issue @marcosmarxm. I would like to help with this; any idea where I can start looking? Is it normal that normalization re-runs when no new data is fetched? Thanks.

ChristopheDuong (Contributor) commented Jan 13, 2022

Nested streams are not de-duplicated; see this comment in the normalization code:

# nested streams can't be deduped like their parents (as they may not share the same cursor/primary keys)

You could use a custom transformation, in which you can specify how to de-duplicate sub-streams.
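In practice, such a custom transformation often reduces to a window-function dedup. Here is a minimal T-SQL sketch, assuming the example data above lands in a nested table named forms_sub_objects, with _airbyte_forms_hashid linking back to the parent row and _airbyte_emitted_at as the sync timestamp (Airbyte's default metadata column naming); the partition keys are assumptions and should be whatever uniquely identifies a sub-record in your data:

```sql
-- Keep only the most recently emitted copy of each sub-object.
-- Table and column names are assumptions based on the example data above.
WITH ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY _airbyte_forms_hashid, name  -- assumed natural key of the sub-object
            ORDER BY _airbyte_emitted_at DESC         -- latest sync wins
        ) AS row_num
    FROM forms_sub_objects
)
SELECT *
FROM ranked
WHERE row_num = 1;
```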

agrass (Contributor) commented Jan 13, 2022

Thanks for the response @ChristopheDuong. I'm working with incremental append, and what you say makes sense because there is no shared cursor/primary key, I agree. But the problem is more that data is duplicated on every run, even runs with no data. This generates thousands of rows on each sync, every 5 minutes, and re-running normalization on every sync is not very efficient. Is this expected behavior, or what do you think? Shouldn't normalization run only on new data?

For example, these 3 syncs with no data (0 bytes) each generated new rows in the nested table:
[screenshot: sync history, 2022-01-11 at 18:53]

agrass (Contributor) commented Jan 14, 2022

Is there a way I can check, here on this line, whether the process is a sync with no new records, so as to avoid re-running normalization? Does that make sense? @ChristopheDuong

ChristopheDuong (Contributor) commented:

No, that's not right.

If there is even one record for some other stream in your connection, unrelated to your sub-stream, normalization will be triggered and rows will be appended to your un-nested table too.

You should really look at custom transformations, where you can specify how to de-duplicate sub-streams:
https://docs.airbyte.com/operator-guides/transformation-and-normalization/transformations-with-airbyte
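Per that docs page, the dedup can live in a small dbt project that Airbyte runs as a custom transformation after each sync. A hedged sketch of such a model, reusing the query above (the model name, source definition, and key columns are illustrative assumptions, not code from the Airbyte repo):

```sql
-- models/forms_sub_objects_deduped.sql (hypothetical model name)
-- Materializes a deduplicated copy of the nested table after every sync.
{{ config(materialized='table') }}

with ranked as (
    select
        *,
        row_number() over (
            partition by _airbyte_forms_hashid, name  -- assumed natural key
            order by _airbyte_emitted_at desc
        ) as row_num
    from {{ source('airbyte_raw', 'forms_sub_objects') }}  -- assumes a matching sources.yml entry
)

select *
from ranked
where row_num = 1
```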

agrass (Contributor) commented Jan 14, 2022

Thanks for the response @ChristopheDuong. Sorry, I didn't understand fully; I'm probably missing some context. Is there an example case where normalization needs to re-run when no new records are found? Not counting when you reset your data.

marcosmarxm (Member, Author) commented:

Zendesk ticket #1758 has been linked to this issue.

marcosmarxm (Member, Author) commented:

Comment made from Zendesk by Marcos Marx on 2022-08-01 at 12:32:

Hello Jaafar, your issue is similar to #9465.
Currently you need to de-duplicate nested records yourself, probably by exporting the normalization project and executing the dedup there.
The main reason for this is:
# nested streams can't be deduped like their parents (as they may not share the same cursor/primary keys)
 
 
