Include original error in ConnectionShutdown messages#1266
Closed
dkropachev wants to merge 777 commits intoapache:trunkfrom
Closed
Include original error in ConnectionShutdown messages#1266dkropachev wants to merge 777 commits intoapache:trunkfrom
dkropachev wants to merge 777 commits intoapache:trunkfrom
Conversation
It does not match orthodox python style.
And trigger warnings:
cqlengine/query/test_queryoperators.py:157: SyntaxWarning: "is" with 'int' literal. Did you mean "=="?
self.assertTrue(len(first_page) is 1)
Invalidate tablets when table or keyspace is deleted
Remove is <int>
Following warnings shows up all over the code: ``` SyntaxWarning: invalid escape sequence ... ``` Let's fix it.
Fix wrong escape sequences
python 3.13 is now released, and we should supply wheel for it.
seems like scylla 6.2 cqlsh isn't working as expect on old versions of glibc ``` 2025-01-19 13:51:07.901 DEBUG [cluster:757]: node1: (EE) /home/runner/.ccm/scylla-repository/release/6.2.2/python3/bin/../libexec/python3.12.bin: symbol lookup error: /lib/x86_64-linux-gnu/libpthread.so.0: undefined symbol: __libc_pthread_init, version GLIBC_PRIVATE ```
`setup.py` is pinning to use `Cython>=0.20,!=0.25,<0.30` code generated form those old versions is not compiling anymore with python3.13 in turn it cause libev part to be missing in the driver, which surface other issue with asyncio eventloop we currently use as default in this chnage we pin to newer version of cython when we build the wheel Ref: #409
github actions: user newer cython versions
PyEval_InitThreads is depricated, since 3.7 it is not necessary to call it. On python 3.12 it is not present anymore, which cases build to fail. Ref: https://docs.python.org/3/c-api/init.html#c.PyEval_InitThreads
Otherwise it fails to convert unicode to 'str':
```
Error compiling Cython file:
------------------------------------------------------------
...
if metadata_request_timeout is None:
return stmt
ms = int(metadata_request_timeout / datetime.timedelta(milliseconds=1))
if ms == 0:
return stmt
return f"{stmt} USING TIMEOUT {ms}ms"
^
------------------------------------------------------------
cassandra/util.py:1813:11: Cannot convert Unicode string to 'str' implicitly. This is not portable and requires explicit encoding.
```
We need to see when extension is failed to build on CICD. Let's introduce CASS_DRIVER_BUILD_EXTENSIONS_ARE_MUST that is False by default. When it is set, setup.py fails when building of any extension is failed.
There is a bug in distutils that does not allow it to pick up cython for python 3.12 and later.
move to pyproject.toml that supports PEP517/518 this would help us use isolated build env, and avoid all the fragile situation we have with Cython installation dependcy missing in some CI variation
By default docker/setup-qemu-action uses `binmt:master`, which leads to unstable cicd that hard to debug, randm crashes, bogus errors. It fixes one of the issue with aarch64 builds.
We use old 20.04, it is time to update it.
Runner os should match cibuildwheels `MACOSX_DEPLOYMENT_TARGET`, otherwise brew pulls wrong packages that not supported by runner OS.
VSCode cl.exe fails to recognize some of the syntax:
```
cassandra\c_shard_info.c(3083): error C2065: '__uint128_t': undeclared identifier
cassandra\c_shard_info.c(3083): error C2146: syntax error: missing ')' before identifier '__pyx_v_biased_token'
cassandra\c_shard_info.c(3083): error C2059: syntax error: ')'
cassandra\c_shard_info.c(3083): error C2059: syntax error: ')'
cassandra\c_shard_info.c(3083): error C2059: syntax error: ')'
```
Let's switch to clang.
fix extensions building
Full scan queries are slower even if there is only one record.
Old package 3.3.2 was removed from repo and not available anymore
* cicd: trigger workflows only when it is needed * cicd: make build-push workflow to be reusable * cicd: make pre-release workflow manually triggerable 1. Make it reuse `lib-build-and-push` 2. Make it manually triggerable only This workflow is checking if driver works with next pre-release python version. We don't want to have them in PRs, if it fails it does not mean that PR is bad. It would be better to manually trigger it for now, and later we will make it check on existing pre-release versions and run on them once a month.
On python 3.12.x cqlsh fails with: ``` libexec/python3.12.bin: symbol lookup error: /lib/x86_64-linux-gnu/libpthread.so.0: undefined symbol: __libc_pthread_init, version GLIBC_PRIVATE ``` So we need to work it around by not using cqlsh
* PYTHON-1364 Fix ssl.wrap_socket errors (from eventlet) for Python 3.12 (apache#1181) * PYTHON-1366 Handle removal of asyncore in Python 3.12 (apache#1187) * Remove outdated Python pre-3.7 references (apache#1186) * PYTHON-1371 Add explicit exception type for serialization failures (apache#1193) * PYTHON-1331 ssl.match_hostname() is deprecated in 3.7 (apache#1191) * Documentation (and other) updates for 3.29.0 (apache#1194) * fix(test-requirements): remove python 3.13 restrictions Original commit c9b24b7 made it not install certain modules on python 3.13 because at the time python was broken and did not support these modules, now they work ok, roll it back. --------- Co-authored-by: Bret McGuire <absurdfarce@users.noreply.github.com> Co-authored-by: Brad Schoening <5796692+bschoening@users.noreply.github.com>
* cicd/lib-build-and-push: fix python-version description and default * cicd/lib-build-and-push: make choco dont print progress * cicd/lib-build-and-push: use fixed windows version * cicd/lib-build-and-push: allow prerelease python for aarch64 * cicd/lib-build-and-push: use correct `macos-14` instead of `macos-latest` * cicd/lib-build-and-push: load calculated matrix * cicd: move cibuildwheel configuration to pyproject.toml * setup.py: don't panic on wrong command line * cicd/lib-build-and-push: merge aarch64 job to build-wheels * cicd/lib-build-and-push: build pypy and native in single blow
following code:
```
.update(
bin_map__remove={123, 456}
)
```
Would fail if map is defined with non-nullable value type:
```
columns.Map(columns.BigInt, columns.Bytes, required=False, default={})
```
…utofix Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…utofix Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…ation Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
…n permissions Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…ng alert no. 7: Workflow does not contain permissions Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Node pools should be stable, if cc fails to connect it is not good enough reason to neither to kill it nor to mark node down.
Continued effort to remove protocol versions < 3 as was done in the native Python code. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
… starting point The `test_profile_lb_swap` test logic assumed that `populate` was called before control connection (cc) was created, meaning only the contact points from the cluster configuration were known (a single host). Due to that the starting point was not random. This commit updates the test to reflect the new behavior, where `populate` is called on the load-balancing policy after the control connection is created. This allows the policy to be updated with all known hosts and ensures the starting point is properly randomized.
Previously, the driver relied on the load-balancing policy (LBP) to determine the order of hosts to connect to. Since the default LBP is Round Robin, each reconnection would start from a different host. After removing fake hosts with random IDs at startup, this behavior changed. When the LBP is not yet initialized, the driver now uses the endpoints provided by the control connection (CC), so there is no guarantee that different hosts will be selected on reconnection. This change updates the test logic to first establish a connection and initialize the LBP, and only then verify that two subsequent reconnections land on different hosts in a healthy cluster.
Only compare hosts endpoints not whole Host instances as we don't know hosts ids.
In DC aware lbp when local_dc is not provided we set it in on_add and it needs to be initialized for distance to give proper results.
Previously, we used endpoints provided to the cluster to create Host instances with random host_ids in order to populate the LBP before the ControlConnection was established. This logic led to creating many connections that were opened and then quickly closed, because once we learned the correct host_ids from system.peers, we removed the old Hosts with random IDs and created new ones with the proper host_ids. This commit introduces a new approach. To establish the ControlConnection, we now use only the resolved contact points from the cluster configuration. Only after a successful connection do we populate Host information in the LBP. If the LBP is already initialized during ControlConnection reconnection, we reuse the existing values.
…eck if a table is using tablets table_has_tablets() performs the same self._tablets.get((keyspace, table) that get_tablet_for_key() does which is a called a line later does, so it's redundant. Removed the former. Note - perhaps table_has_tablets() is not needed and can be removed? Unsure, it's unclear if it's part of the API or not. It's now completely unused across the code. Adjusted unit tests as well. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Replace sleep-based timing with proper Event synchronization: - Remove executor_init with 0.5s sleep - Replace time.sleep(0.5) with connection_created.wait() - Replace time.sleep(3) with executor.shutdown(wait=True) - Reduce iterations from 20 to 3 (deterministic with proper sync) - Add explicit assertions for shutdown state Performance improvement: - Before: 70.13s call tests/unit/test_host_connection_pool.py::HostConnectionTests::test_fast_shutdown - After: 0.01s call tests/unit/test_host_connection_pool.py::HostConnectionTests::test_fast_shutdown Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
When a host changes its IP address, the driver previously updated the host endpoint to the new IP before calling on_down. This caused on_down to mistakenly target the new IP for connection cleanup. This change reorders the operations to ensure on_down cleans up the old IP's resources before the host object is updated and on_up is called for the new IP. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
When a connection is closed or becomes defunct, include the last_error in the ConnectionShutdown exception message. This helps investigate issues where the original cause (e.g., "Bad file descriptor") was previously lost when subsequent operations tried to use the connection. Refs #614
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When a connection is closed or becomes defunct, include the
last_errorin theConnectionShutdownexception message.Before:
After:
This helps investigate #614 by preserving the original error reason when subsequent operations try to use a closed connection.
Test plan
send_msg()andwait_for_responses()verifyinglast_erroris included