Skip to content

Conversation

@yevgenypats
Copy link
Contributor

@yevgenypats yevgenypats commented Jan 11, 2023

I think I missed it in the review but the api should include an option struct rather then the ...option as otherwise it's just introduce un-necessary boilerplate for each client.

Also, I've added BackendNone as otherwise there is no way to specify that you don't want incremental syncing.

@github-actions
Copy link

github-actions bot commented Jan 11, 2023

⏱️ Benchmark results

  • DefaultConcurrencyDFS-2 resources/s: 10,978
  • DefaultConcurrencyRoundRobin-2 resources/s: 11,285
  • Glob-2 ns/op: 185.8
  • TablesWithChildrenDFS-2 resources/s: 28,111
  • TablesWithChildrenRoundRobin-2 resources/s: 27,491
  • TablesWithRateLimitingDFS-2 resources/s: 28.22
  • TablesWithRateLimitingRoundRobin-2 resources/s: 817.9
  • BufferedScanner-2 ns/op: 11.26
  • LogReader-2 ns/op: 34.21

@github-actions github-actions bot added fix and removed fix labels Jan 11, 2023
@yevgenypats
Copy link
Contributor Author

yevgenypats commented Jan 11, 2023

Now that Im thinking more about it I think we might have a problem in our incremental syncing approach.

If we specify incremental, only incremental tables should be synced as otherwise it can cause issues with destinations such as gcs.

also, on the destination side we always don't delete incremental tables which basically now we disabled an option of overwrite-delete-stale for a lot of tables (i.e let's say we just want overwrite-delete-stale without any incremental syncing).

So I think if backend is specified it will mean that only incremental fields will be synced and also this information should be passed to the destination as otherwise it can't know if we are syncing incremental source or not.

@hermanschaaf
Copy link
Member

If we specify incremental, only incremental tables should be synced as otherwise it can cause issues with destinations such as gcs.

@yevgenypats what kind of issues are you thinking about here? Maybe let's sync up on this if you have time later :)

@disq
Copy link
Member

disq commented Jan 12, 2023

@yevgenypats backend could've been nil before (not specifying the option would make it nil) so it worked for non-incrementals. I think???

@disq
Copy link
Member

disq commented Jan 12, 2023

I don't think incremental-only should be automatically decided based on backend. If the user wants to sync incrementals on short periods but don't spam the full resources, they can have separate configs and use skip_tables. (Could also add an only_incremental bool option for convenience)

Copy link
Member

@hermanschaaf hermanschaaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's catch up about the new changes? I think we should consider some alternatives

// Enable incremental syncing for tables that supports that
Incremental Incremental `json:"incremental,omitempty"`
// Sync on
OnlyIncrementalTables bool `json:"only_incremental_tables,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for the OnlyIncrementalTables boolean? I don't understand why the same couldn't be achieved by splitting your config into incremental / non-incremental tables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be hard to determine which tables are incremental and generate a config and keep it up to date. This way you can have one config with all tables, fetched daily, and another one, fetched hourly, updating only incrementals (taking shorter bursts, not wasting API limits and other resources)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I think keeping your list of tables up-to-date (which to include, which to skip) and how regularly you want to sync each is a problem separate from incremental-ness, adding this one boolean is not going to solve it except maybe for a narrow use case

Comment on lines 27 to 36
func (r *Incremental) UnmarshalJSON(data []byte) (err error) {
var registry string
if err := json.Unmarshal(data, &registry); err != nil {
return err
}
if *r, err = IncrementalFromString(registry); err != nil {
return err
}
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (r *Incremental) UnmarshalJSON(data []byte) (err error) {
var registry string
if err := json.Unmarshal(data, &registry); err != nil {
return err
}
if *r, err = IncrementalFromString(registry); err != nil {
return err
}
return nil
}
func (r *Incremental) UnmarshalJSON(data []byte) (err error) {
var incremental string
if err := json.Unmarshal(data, &incremental); err != nil {
return err
}
if *r, err = IncrementalFromString(incremental); err != nil {
return err
}
return nil
}

case "both":
return IncrementalBoth, nil
default:
return IncrementalNone, fmt.Errorf("unknown registry %s", s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return IncrementalNone, fmt.Errorf("unknown registry %s", s)
return IncrementalNone, fmt.Errorf("unknown incremental value %s", s)

Comment on lines 12 to 14
IncrementalNone Incremental = iota
IncrementalTablesOnly
IncrementalBoth
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior expected to be for all these settings? What is the difference between setting IncrementalTablesOnly here and using the OnlyIncrementalTables boolean at the source spec level?

@yevgenypats
Copy link
Contributor Author

Removed Incremental per discussion and using only backend config.

Comment on lines 92 to 94
sourceSpec = specs.Source{
Name: sourceName,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we overwriting sourceSpec, shortly after unmarshaling from r.SourceSpec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. a copy paste from the previous if

Copy link
Member

@hermanschaaf hermanschaaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just had the one question about overwriting the variable shortly after unmarshalling it

@kodiakhq kodiakhq bot merged commit f548a54 into main Jan 12, 2023
@kodiakhq kodiakhq bot deleted the fix/new_api branch January 12, 2023 16:13
kodiakhq bot pushed a commit that referenced this pull request Jan 15, 2023
🤖 I have created a release *beep* *boop*
---


## [1.25.1](v1.25.0...v1.25.1) (2023-01-14)


### Bug Fixes

* Change options for new client ([#603](#603)) ([f548a54](f548a54))
* PK Addition Order ([#607](#607)) ([eff40e7](eff40e7))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants