Control startup order / readiness check for services #921

Closed
aL3891 opened this issue Nov 17, 2023 · 17 comments
Labels
area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication
Milestone

Comments

@aL3891

aL3891 commented Nov 17, 2023

First, it's awesome to see this project. I was really sad to see Project Tye get fewer and fewer updates, because I found it really useful and perhaps took a larger dependency on it than you should for a preview project :)

With Tye and our current, rather hacky local development orchestration, we have an issue where some services need to be running for other services to start correctly.
It would be great if Aspire started the projects in the order they were defined and also had the option to wait for health checks to pass before starting the next service.

Perhaps this is already the case; if so, my apologies, but I didn't find anything in the docs about it. I guess Aspire already has the data to work like this, since the services are defined in sequence in code and health checks and dependencies can also be defined. Some services won't have health checks in the same sense, a database container for example, so maybe they need a custom method supplied to determine whether they are ready.

@danmoseley danmoseley added area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication and removed area-dashboard labels Nov 17, 2023
@davidfowl
Member

Controlling startup order of services is something we generally do not want to offer because that notion does not exist in reality when you deploy. There are a few tasks and jobs that you do want to run in some order (migrations for example), but we’re not convinced that adding startup order generally is a positive.

@aL3891
Author

aL3891 commented Nov 17, 2023

I agree that this is not the typical case in production, but when running locally I don't think it's uncommon for the database to not be running when you start, whereas in production that would be very unusual. Likewise, starting your entire service solution from scratch is less common in production, where you'd typically have the previous version already running, so services would still start.

I admit I'm a bit selfish here though, trying to solve an issue that I personally have, specifically with Orleans, where clients crash if there is no silo available. Ideally apps should be resilient to things like that, but there are a lot of non-ideal apps out there.
Since the information about dependencies is already there, it seems friction could be reduced by using it :)

I guess the issue could be mitigated by having restart policies for services. Those typically do exist in production, but as far as I understand they are not something Aspire offers at the moment.

Having the services restart could be a bit disruptive when debugging locally though, whereas in production it probably doesn't matter as much.

@davidfowl davidfowl added this to the backlog milestone Nov 24, 2023
@3GDXC

3GDXC commented Jan 2, 2024

Controlling startup order of services is something we generally do not want to offer because that notion does not exist in reality when you deploy. There are a few tasks and jobs that you do want to run in some order (migrations for example), but we’re not convinced that adding startup order generally is a positive.

@davidfowl how would you suggest developers control the start-up/time-out/delay where a project has a dependency on a container and this is specified in the AppHost project? i.e. two projects that use a RabbitMQContainer, one publisher, one consumer, both require the RabbitMQContainer to be running prior to starting; at present a hack of await Task.Delay(xx) works; but as I'm sure you'll agree that is a nasty hack ;)
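
The difference between a fixed delay and a readiness check can be sketched outside Aspire. The Python below is only an illustration and not an Aspire API: `wait_until_ready` and `tcp_port_open` are hypothetical helper names, and an open TCP port is at best a rough proxy for a broker such as RabbitMQ being ready to accept work.

```python
import socket
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], timeout: float = 30.0,
                     interval: float = 0.5) -> bool:
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True          # dependency is ready, stop waiting immediately
        if time.monotonic() >= deadline:
            return False         # give up instead of blocking forever
        time.sleep(interval)


def tcp_port_open(host: str, port: int) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return True
    except OSError:
        return False
```

Unlike a fixed delay, a call such as `wait_until_ready(lambda: tcp_port_open("localhost", 5672))` returns as soon as the port accepts connections and reports failure if it never does. The host and port here are assumptions about a local setup.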

@davidfowl
Member

In the next release, these startup errors will go away for the most part because we have a proxy between services that we can make work without your code having to retry or delay.

That said, Aspire components are resilient to connection failures by default and retry connections for transient failures (up to some limit). In a way, this makes sure you are building with an understanding that the network can fail at any time and that your code should recover from transient errors.
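
As a rough illustration of what retrying transient failures up to a limit means, here is a generic Python sketch. It is not how Aspire components implement resilience; `retry_transient`, its parameters, and the choice of exponential backoff are all assumptions made for the example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_transient(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except transient:
            if attempt == attempts - 1:
                raise                          # limit reached, surface the error
            sleep(base_delay * 2 ** attempt)   # 0.2s, 0.4s, 0.8s, ...
    raise ValueError("attempts must be at least 1")
```

Only the exception types listed in `transient` are retried; anything else propagates immediately, so genuine bugs are not hidden behind retries.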

@jsheetzmt

It would be nice if we can toggle containers created in AppHost as a required dependency before projects are started. For us, a SQL container is only applicable for local dev. When deployed, our apps are targeting Azure SQL and don't have the same issue. Migrations run in our CI/CD pipeline, and we will retry the pipeline if it fails.

@diegomgarcia

I haven't looked deeply into the internals yet, but I believe it might be feasible to implement a wait mechanism for services added with the WithReference method. Essentially, this would ensure these services are fully ready before starting the service that depends on them. Alternatively, a dedicated extension like DependsOn(service) could opt in to this behavior, so that not every referenced service has to finish starting and slow down startup.

@davidfowl
Member

Here's a spike for this feature based on the latest preview (6 at the time of writing) https://github.com/davidfowl/WaitForDependenciesAspire

@cisionmarkwalls

Here's a spike for this feature based on the latest preview (6 at the time of writing) https://github.com/davidfowl/WaitForDependenciesAspire

I've tested it in our applications for local development (used it to wait for database setup/seeding across Postgres, Redis and Kafka topics before services start up, so they are all in a known state) and it worked really well. Have you considered spinning that out into a NuGet package as an optional Aspire feature for local development?

@SteveSandersonMS
Member

Adding to this just to point out how helpful it would be for #4177. Local LLM hosting may involve downloading many-gigabyte models before startup if not already cached locally, which can easily take 10+ minutes depending on connection speed. If developers don't realise this is going on, the only indication they might get is errors from dependent projects that are trying to call the local LLM service. They will likely start debugging and rechecking configuration, and may shut down the AppHost many times in the process which stops the download, when all they need to do is just wait.

Similarly, waiting for readiness could be considered a prerequisite for a data seeding mechanism. For example, seeding a Qdrant vector DB can take a minute or two even with just 10k records or so. If you don't realise what's going on and just see it as errors, you'll waste a lot of time debugging.

@jhancock-taxa

This happens all of the time with integration tests. We need something that causes blocking on database spin up/readiness etc. for development which can be ignored in production: .WaitForReadyDev() or something so it's really obvious what it's doing.

@feO2x

feO2x commented Jul 5, 2024

I'd like to second that. WaitFor, as shown in the WaitForDependenciesAspire example, would be extremely helpful. Docker Compose has a similar mechanism with depends_on and, in my opinion, not having this in Aspire could be a deal breaker for people coming from Docker Compose or similar technologies.

@gingters

Yes, this is required here too.

When I use docker compose and depends_on, the container startup will wait until the dependency is up and running. When a container has a configured healthcheck, then the dependent containers will only start up when the healthcheck of the dependency returns fine.

That way, for example, a migrations container can wait until the database is up and running, the API can wait for the migrations, the container that has the data seed script (which uses the API) can wait until the API reports back healthy, and all other services that need the data can wait for the seed container to be started up.

So, this "wait until a certain service reports back healthy before starting this one" behavior, just like in docker compose, would be really great. There are a lot of scenarios that would be enabled with that functionality.
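
The chain described above (database, then migrations, then API, then seed, then everything else) amounts to starting services in dependency order and gating each step on a health check. A minimal Python sketch of that idea follows; it is not docker compose or Aspire code, and `start_in_dependency_order`, `start`, and `is_healthy` are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List


def start_in_dependency_order(
    depends_on: Dict[str, Iterable[str]],
    start: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Start each service only after everything it depends on is healthy."""
    started: List[str] = []
    # static_order() yields every dependency before the services that need it.
    for name in TopologicalSorter(depends_on).static_order():
        start(name)
        if not is_healthy(name):
            # Stop here so dependents never start against a broken dependency.
            raise RuntimeError(f"{name} did not become healthy")
        started.append(name)
    return started
```

With `depends_on = {"migrations": ["db"], "api": ["migrations"], "seed": ["api"]}` the services start as db, migrations, api, seed. In practice `is_healthy` would poll a health endpoint with a timeout rather than check once.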

@JeroMiya

We also possibly have a need for this. For local development, we have to incorporate a custom identity server container whose implementation is not entirely under our control. It does not participate directly with aspire service discovery other than injecting connection strings to the database. This identity server implementation reads configuration data from the database on startup (outside of our control), so it must be able to connect to the database on startup. The database container does have a health check defined, and we use a docker compose dependency to ensure the database container is up, running, and healthy before starting the identity server container.

We could probably get away with using a longer retry/timeout setting for the connection string (although this would be more fragile in practice than waiting for a health check), but the default Aspire behavior when adding a SqlServer resource reference is to generate a connection string without any retry/timeout settings. So, we'd have to customize the connection string, and that's a bit more complicated than it needs to be just to support (unavoidable) resource dependencies for local development.

@rosieks

rosieks commented Aug 1, 2024

Controlling startup order of services is something we generally do not want to offer because that notion does not exist in reality when you deploy. There are a few tasks and jobs that you do want to run in some order (migrations for example), but we’re not convinced that adding startup order generally is a positive.

@davidfowl So what's the solution for running migrations right now? I tried the sample from the playground but got stuck:

Npgsql.PostgresException (0x80004005): 57P03: the database system is starting up

@onionhammer
Contributor

onionhammer commented Aug 1, 2024

Controlling startup order of services is something we generally do not want to offer because that notion does not exist in reality when you deploy.

Unless you're using init containers; EF migrations are a perfect example of when you might use them. Init containers are also a thing in Azure Container Apps.

@jhancock-taxa

Controlling startup order of services is something we generally do not want to offer because that notion does not exist in reality when you deploy.

Unless you're using init containers; EF migrations are a perfect example of when you might use them. Init containers are also a thing in Azure Container Apps.

They're also a thing in Kubernetes and every other cloud platform that does orchestration. Aspire needs to be able to model these.

@mitchdenny
Member

Closing this in favor of: #5275

@github-actions github-actions bot locked and limited conversation to collaborators Sep 12, 2024