-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding the incremental entity provider backend #14356
Adding the incremental entity provider backend #14356
Conversation
Changed Packages
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work. I added a few comments for the readme.
Whoa! Nice :) Review to be had later on. Regarding the DI error, note that all our in-repo dependencies (to other |
Thanks for the heads-up! I also need to trim some other dependencies that are unneeded in this version. |
plugins/incremental-ingestion-backend/src/database/migrations.ts
Outdated
Show resolved
Hide resolved
plugins/incremental-ingestion-backend/src/engine/IncrementalIngestionEngine.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright! Sorry it took some time, we were away and I also spent some real focus time on this :) I hope the amount of comments doesn't feel overwhelming. Let me know if you prefer some form of arrangement with an eager merge, with the most important things addressed and deferring the rest, or so
plugins/incremental-ingestion-backend/src/service/IncrementalCatalogBuilder.ts
Outdated
Show resolved
Hide resolved
} from '../types'; | ||
|
||
/** @public */ | ||
export class IncrementalIngestionDatabaseManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no tests for this class (might be good for at least specifically this one since it might be the riskier one in future refactors)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into writing one. Is this a showstopper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah. Would be good to get them in eventually of course.
plugins/incremental-ingestion-backend/src/database/IncrementalIngestionDatabaseManager.ts
Outdated
Show resolved
Hide resolved
|
||
const ingestionsDeleted = await tx('ingestion.ingestions') | ||
.delete() | ||
.where('provider_name', provider); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah a lot of the cleanup would go away if you had on cascade delete foreign relations as suggested in the migration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find a way to do this without losing the information about how many of each type of record was deleted. I'm still trying to decide if I actually care about that data, though.
plugins/incremental-ingestion-backend/src/database/IncrementalIngestionDatabaseManager.ts
Outdated
Show resolved
Hide resolved
} | ||
|
||
/** | ||
* Performs a lookup of all providers that have duplicate active ingestion records. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a UNIQUE constraint on the table instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is left over from before we added the unique constraint that is present in the current migration. I'll have to think about whether this is still a concern.
This PR has been automatically marked as stale because it has not had recent activity from the author. It will be closed if no further activity occurs. If the PR was closed and you want it re-opened, let us know and we'll re-open the PR so that you can continue the contribution! |
d74979c
to
227e882
Compare
Just closing and opening to try to nudge the builds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, some followup. Do note that I pushed a PR to strip out all of the database related code from the API report; you want to pull this branch before you continue work on it. I hope that's not a problem. 🙏
// before incremental builder migrations are executed | ||
const { incrementalAdminRouter } = await incrementalBuilder.build(); | ||
|
||
router.use('/incremental', incrementalAdminRouter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, let's leave it as is then for now. We will want to write a module for the new backend system for this eventually, and at that point I think we'll discover that some similar deferral mechanism as suggested above might come into play under the hood too. So we can worry about that then.
} from '../types'; | ||
|
||
/** @public */ | ||
export class IncrementalIngestionDatabaseManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah. Would be good to get them in eventually of course.
plugins/incremental-ingestion-backend/src/database/IncrementalIngestionDatabaseManager.ts
Outdated
Show resolved
Hide resolved
plugins/incremental-ingestion-backend/src/engine/IncrementalIngestionEngine.ts
Outdated
Show resolved
Hide resolved
plugins/incremental-ingestion-backend/src/engine/IncrementalIngestionEngine.ts
Outdated
Show resolved
Hide resolved
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Fredrik Adelöw <freben@gmail.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Damon Kaswell <damon.kaswell1@hp.com>
Signed-off-by: Fredrik Adelöw <freben@gmail.com>
de21a79
to
3f5d620
Compare
Signed-off-by: Fredrik Adelöw <freben@gmail.com>
b198472
to
72b1847
Compare
Signed-off-by: Fredrik Adelöw <freben@gmail.com>
72b1847
to
d7209f1
Compare
Signed-off-by: Fredrik Adelöw <freben@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I think this is in a good shape now to merge and iterate on!
One large remaining note that I'd like to have addressed in a follow-up, is that it directly accesses catalog tables. That should ideally not be necessary. Since we already know from our own tables what the previous ingestion round returned, we should consult with that to compute the deleted items. The catalog, especially since it goes via the search
and final_entities
tables, is not a good source of truth since there may be problems and delays related to processing and stitching meaning that the final state of the db isn't what you think it is, next time that the burst starts over again.
Signed-off-by: Damon Kaswell damon.kaswell1@hp.com
Hey, I just made a Pull Request!
This PR introduces the concept of incremental entity providers to Backstage. An incremental entity provider is a form of entity provider that can ingest from very large data sources where a full mutation to achieve initial ingestion may be impossible.
Incremental providers split a large ingestion task into smaller chunks which can be streamed into the catalog (using deltas under the hood). The size and length of these chunks are configurable, as are the lengths of the pauses between them and the amount of time to rest after ingestion is complete before starting a fresh run.
The incremental entity provider tracks a cursor to indicate its current location in the data stream. This allows the provider's work to pick up where it leaves off after a burst of activity, on any replica running the provider's task.
In order to accomplish this, the incremental provider introduces a new schema called
ingestion
to the Postgres database, with three tables:ingestions
- Contains the status record for an incremental entity provider. Over time, there will always be two records per provider: One for the previous run, and one for the current run. Older ingestion records are purged automatically.ingestion_marks
- Contains the cursor records used by the incremental entity provider. They have a many-to-one relationship with theingestions
table records. Ingestion marks for old ingestion records are purged automatically as well.ingestion_mark_entities
- Contains an entry for each discovered entity, with a many-to-one relationship with records from theingestion_marks
table. Again, older records are purged automatically. This table is used for comparisons to thefinal_entities
table, to allow entities for assets that have been removed to be deleted.It's important to note that due to the fact that any replica of Backstage could pick up the task of ingestion, stateful protocols such as LDAP will not work, and should be ingested either via a proxy or with a standard entity provider.
✔️ Checklist
Signed-off-by
line in the message. (more info)