Merged
22 changes: 22 additions & 0 deletions docs/docs/guides/server-deployment.md
@@ -400,6 +400,28 @@ export DSTACK_DB_MAX_OVERFLOW=80
Make sure your Postgres installation supports that many connections by
configuring [`max_connections`](https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-MAX-CONNECTIONS) and/or using a connection pooler.
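
As a rough sizing aid, the sketch below estimates how many connections Postgres must accept. It relies on the SQLAlchemy pool opening at most `pool_size + max_overflow` connections per process. The function name, the pool size of 80, and the replica count are illustrative assumptions; only the max overflow of 80 comes from the setting above.

```python
def required_connections(pool_size: int, max_overflow: int, replicas: int) -> int:
    """Upper bound on connections opened by all server replicas together.

    Each replica has its own pool, which can open up to
    pool_size + max_overflow connections.
    """
    return (pool_size + max_overflow) * replicas


# Hypothetical deployment: pool size 80 (assumed), max overflow 80, two replicas.
needed = required_connections(pool_size=80, max_overflow=80, replicas=2)
print(f"max_connections should be at least {needed}, plus headroom for other clients")
```

If the estimate exceeds what the database can serve, a connection pooler in front of Postgres is the usual alternative to raising `max_connections`.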

## Server upgrades

When upgrading the `dstack` server, follow these guidelines to ensure a smooth transition and minimize downtime.

### Before upgrading

1. **Check the changelog**: Review the [release notes :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/releases){:target="_blank"} for breaking changes, new features, and migration notes.
2. **Review backward compatibility**: Understand the [backward compatibility](#backward-compatibility) policy.
3. **Back up your data**: Always create a backup before upgrading.

### Best practices

- **Test in staging**: Always test upgrades in a non-production environment first.
- **Monitor logs**: Watch server logs during and after the upgrade for any errors or warnings.
- **Keep backups**: Retain backups for at least a few days after a successful upgrade.

### Troubleshooting

**Deadlock when upgrading a multi-replica PostgreSQL deployment**

If a deployment is stuck on a deadlock while applying DB migrations, scale the server down to one replica and retry the deployment; it may take several attempts. Some releases do not support rolling deployments, and the release notes always say so. If you think there is a bug, please [file an issue](https://github.com/dstackai/dstack/issues).
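
The retry advice above can be automated in a deployment script. The following is a minimal sketch, not part of `dstack`: `LockTimeout`, `retry_deployment`, and the `deploy` callable are hypothetical names standing in for whatever your tooling uses to start the server and detect a failed migration.

```python
import time


class LockTimeout(Exception):
    """Hypothetical error: a migration gave up waiting for a lock or hit a deadlock."""


def retry_deployment(deploy, attempts=5, delay_seconds=0.0, sleep=time.sleep):
    """Call deploy() until it succeeds; return the number of the successful attempt.

    Re-raises LockTimeout if the last attempt also fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            deploy()
            return attempt
        except LockTimeout:
            if attempt == attempts:
                raise
            sleep(delay_seconds)


# Usage with a fake deployment that fails twice before succeeding.
calls = {"count": 0}


def flaky_deploy():
    calls["count"] += 1
    if calls["count"] < 3:
        raise LockTimeout("migration could not acquire a lock")


succeeded_on = retry_deployment(flaky_deploy)
print(f"deployment succeeded on attempt {succeeded_on}")
```

Scale the server down to one replica before running such a loop, so that retries do not compete with other replicas applying the same migrations.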

## FAQs

??? info "Can I run multiple replicas of dstack server?"
10 changes: 9 additions & 1 deletion src/dstack/_internal/server/migrations/env.py
@@ -36,7 +36,6 @@ def run_migrations_offline():
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )

    with context.begin_transaction():
        context.run_migrations()

@@ -61,12 +60,21 @@ def run_migrations(connection: Connection):
    # https://alembic.sqlalchemy.org/en/latest/batch.html#dealing-with-referencing-foreign-keys
    if connection.dialect.name == "sqlite":
        connection.execute(text("PRAGMA foreign_keys=OFF;"))
    elif connection.dialect.name == "postgresql":
        # lock_timeout is needed so that migrations that acquire locks
        # do not wait for locks forever, blocking live queries.
        # Better to fail and retry a deployment.
        connection.execute(text("SET lock_timeout='10s';"))
    connection.commit()
    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        compare_type=True,
        render_as_batch=True,
        # Running each migration in a separate transaction.
        # Running all migrations in one transaction may lead to deadlocks in HA deployments
        # because lock ordering is not respected across all migrations.
        transaction_per_migration=True,
    )
    with context.begin_transaction():
        context.run_migrations()
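
To illustrate what running each migration in a separate transaction means in practice, here is a small standalone sketch using only the standard-library `sqlite3` module. It is not how Alembic implements `transaction_per_migration`; the `apply_migrations` helper, the migration names, and the table names are all hypothetical. It shows that a failing migration is rolled back on its own while earlier migrations stay committed, so locks are held only for the duration of one migration.

```python
import sqlite3


def apply_migrations(conn, migrations):
    """Apply each (name, statements) migration in its own transaction.

    Stops at the first failing migration and returns the names that committed.
    """
    applied = []
    for name, statements in migrations:
        conn.execute("BEGIN")
        try:
            for statement in statements:
                conn.execute(statement)
        except sqlite3.Error:
            # Only the failing migration is undone; earlier ones are already committed.
            conn.execute("ROLLBACK")
            break
        conn.execute("COMMIT")
        applied.append(name)
    return applied


# isolation_level=None puts the connection in autocommit mode,
# so the explicit BEGIN/COMMIT/ROLLBACK above control every transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
migrations = [
    ("001_create_runs", ["CREATE TABLE runs (id INTEGER PRIMARY KEY)"]),
    ("002_add_status", ["ALTER TABLE runs ADD COLUMN status TEXT"]),
    # The second statement fails because the table does not exist.
    ("003_broken", ["CREATE TABLE jobs (id INTEGER)", "ALTER TABLE missing ADD COLUMN x TEXT"]),
]
applied = apply_migrations(conn, migrations)
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
]
print(applied, tables)
```

With a single transaction around all three migrations, the failure in the third would also discard the first two, and every lock taken along the way would be held until the end.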