Remove orphaned orchestrator nodes on environmentd startup #16200
Changes from all commits
```diff
@@ -8,3 +8,4 @@
 # by the Apache License, Version 2.0.
 
 class DatabaseError(Exception): ...
+class InterfaceError(Exception): ...
```
```diff
@@ -207,7 +207,7 @@ impl k8s_openapi::Metadata for PodMetrics {
 impl NamespacedKubernetesOrchestrator {
     /// Return a `ListParams` instance that limits results to the namespace
     /// assigned to this orchestrator.
-    fn list_params(&self) -> ListParams {
+    fn list_pod_params(&self) -> ListParams {
         let ns_selector = format!(
             "environmentd.materialize.cloud/namespace={}",
             self.namespace
```
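For context, this helper builds a Kubernetes label selector so that list and watch calls only see objects created by this orchestrator. A minimal sketch of the idea, assuming the kube crate's `ListParams`; the rename reflects that the helper now only applies to pods, since the statefulsets do not carry this label:

```rust
use kube::api::ListParams;

/// Sketch of a label-selector helper like `list_pod_params`: only pods
/// labeled with this orchestrator's namespace are returned by list/watch.
fn list_pod_params(namespace: &str) -> ListParams {
    let ns_selector = format!("environmentd.materialize.cloud/namespace={}", namespace);
    ListParams::default().labels(&ns_selector)
}
```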
```diff
@@ -740,6 +740,7 @@ impl NamespacedOrchestrator for NamespacedKubernetesOrchestrator {
 
     /// Drops the identified service, if it exists.
     async fn drop_service(&self, id: &str) -> Result<(), anyhow::Error> {
+        fail::fail_point!("kubernetes_drop_service", |_| Err(anyhow!("failpoint")));
         self.service_scales
             .lock()
             .expect("poisoned lock")
```
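The added `fail_point!` line makes it possible to inject a failure into `drop_service` from tests. A rough sketch of how such a failpoint might be exercised, assuming the `fail` crate with its `failpoints` feature enabled (the helper name below is made up for illustration):

```rust
/// Hypothetical test helper: turn the new failpoint on, run code that calls
/// `drop_service`, and expect it to fail with the "failpoint" error.
fn with_drop_service_failure(test_body: impl FnOnce()) {
    // Configure the failpoint so the registered closure returns an error.
    // (Failpoints can also be configured via the FAILPOINTS environment
    // variable when a `fail::FailScenario` is set up.)
    fail::cfg("kubernetes_drop_service", "return").unwrap();
    test_body();
    // Clean up so later tests see normal behavior again.
    fail::remove("kubernetes_drop_service");
}
```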
```diff
@@ -768,7 +769,7 @@ impl NamespacedOrchestrator for NamespacedKubernetesOrchestrator {
 
     /// Lists the identifiers of all known services.
     async fn list_services(&self) -> Result<Vec<String>, anyhow::Error> {
-        let stateful_sets = self.stateful_set_api.list(&self.list_params()).await?;
+        let stateful_sets = self.stateful_set_api.list(&Default::default()).await?;
         let name_prefix = format!("{}-", self.namespace);
         Ok(stateful_sets
             .into_iter()
```

Review comment on the changed line: Maybe I'm missing some context, but do we not want to only list the ones with the correct namespace label?

Reply: After discussion in chat, this label is not set on the statefulsets, so this was previously returning an empty list.
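Since the namespace label is not present on the statefulsets, `list_services` now lists everything in the Kubernetes namespace and filters by the `{namespace}-` name prefix instead. A rough sketch of that filtering, assuming the kube and k8s-openapi crates (an illustration of the approach rather than the exact Materialize code):

```rust
use k8s_openapi::api::apps::v1::StatefulSet;
use kube::{api::ListParams, Api};

/// Sketch: list every StatefulSet in the Kubernetes namespace and keep only
/// the ones whose name starts with "{namespace}-", stripping the prefix to
/// recover the service id.
async fn list_service_ids(
    stateful_set_api: &Api<StatefulSet>,
    namespace: &str,
) -> Result<Vec<String>, anyhow::Error> {
    let stateful_sets = stateful_set_api.list(&ListParams::default()).await?;
    let name_prefix = format!("{}-", namespace);
    Ok(stateful_sets
        .into_iter()
        .filter_map(|ss| {
            ss.metadata
                .name
                .and_then(|name| name.strip_prefix(&name_prefix).map(|id| id.to_string()))
        })
        .collect())
}
```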
```diff
@@ -818,7 +819,7 @@ impl NamespacedOrchestrator for NamespacedKubernetesOrchestrator {
             })
         }
 
-        let stream = watcher(self.pod_api.clone(), self.list_params())
+        let stream = watcher(self.pod_api.clone(), self.list_pod_params())
            .touched_objects()
            .filter_map(|object| async move {
                match object {
```
Review discussion:

- It's possible that bootstrapping creates new objects with a user ID higher than `next_ids.0` due to the builtin schema migration.
- You can probably fix this by getting the next IDs after bootstrapping.
- I think it's fine to get those before the migration: I'm interested in objects left over from a previous environmentd run, so there's no need to look at the IDs being allocated during bootstrap and normal operation. The only thing I'm worried about is that someone else does a `create replica` and we see the increased replica ID in `next_replica_id` but not the replica object itself; then `remove_orphans` would gladly remove it from the orchestrator. As far as I know this cannot happen, because both the initial catalog load and the replica create happen inside a transaction.
- If we see a new object created during migration, will this code then panic in the storage controller when removing orphans?
- I think we're OK here. A user can't create a replica until we're done with bootstrapping and finished removing orphaned nodes. If a new Coordinator creates a new replica and increases `next_replica_id`, then our read of `next_replica_id` will fail because we're no longer the leader.
- You're right, it won't be deleted, but we will panic: we need to fetch the `next_id` after migration. When I fetch them right after the bootstrap (using `peek_key_one` on the ID allocation collection), does that call do an epoch check?
- Yep. All stash reads and writes do an epoch check.
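To tie the discussion together, here is a rough sketch of the orphan check being debated, with hypothetical names and a made-up service-id format (the real logic lives in the coordinator and its failure handling differs): services whose id was allocated by an earlier run but which no longer exist in the catalog are orphans, while an id at or above `next_replica_id` indicates another environmentd has taken over.

```rust
use std::collections::HashSet;

/// Sketch of the orphan check: given the services reported by the
/// orchestrator and the catalog's view of the world, return the ids to drop.
/// Service names are assumed to embed the replica id after the last '-'.
fn find_orphans(
    orchestrator_services: &[String],
    next_replica_id: u64,
    known_replica_ids: &HashSet<u64>,
) -> Result<Vec<String>, anyhow::Error> {
    let mut orphans = Vec::new();
    for service in orchestrator_services {
        let id = match service.rsplit('-').next().and_then(|s| s.parse::<u64>().ok()) {
            Some(id) => id,
            None => continue,
        };
        if id >= next_replica_id {
            // An id we never allocated: a newer environmentd must be running,
            // so refuse to clean anything up (the real code would halt).
            anyhow::bail!("service {service} has id >= next_replica_id {next_replica_id}");
        }
        if !known_replica_ids.contains(&id) {
            // Allocated by a previous run but no longer in the catalog.
            orphans.push(service.clone());
        }
    }
    Ok(orphans)
}
```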