Fuseki 6.0.0: canceled federated SERVICE queries can wedge the target dataset until restart #3837

### Version

6.0.0. Also reproduced on 5.5.0.

### What happened?

  We can reproduce a failure mode where repeated canceled federated `SERVICE` queries leave
  the target dataset effectively wedged until Fuseki is restarted.

  The pattern is:

  1. A direct query to dataset `target` succeeds.
  2. A federated query from dataset `source` to dataset `target` using
  `SERVICE <http://127.0.0.1:3030/target/sparql>` succeeds.
  3. We then issue a burst of heavy federated queries from `source` to `target`, with the
  client canceling/timing out the outer HTTP request almost immediately.
  4. After that, even a simple direct query to dataset `target` times out.
  5. Restarting Fuseki clears the problem.

  This is reproducible for us on Jena/Fuseki 6.0.0 and also on 5.5.0.

  ### Why this looks distinct from ordinary timeout behavior

  This does not look like the outer client request simply timing out.

  After the cancellation storm:
  - a direct query to the target dataset also times out
  - the store recovers only after restart

  So the target dataset/server appears to be left in a bad runtime state.

  ### Reproducing it

  We reproduced this against an isolated standalone Fuseki 6.0.0 container built from the
  official Apache release tarball.

  Our production datasets are private, but the failure can be described with this structure:

  - source dataset: `source`
  - target dataset: `target`

  Baseline direct probe against `target`:

  ```sparql
  SELECT * WHERE {
    <urn:probe-subject> ?p ?o
  }
  LIMIT 5
```
  Baseline federated probe from `source` to `target`:
```sparql
  SELECT * WHERE {
    SERVICE <http://127.0.0.1:3030/target/sparql> {
      <urn:probe-subject> ?p ?o
    }
  }
  LIMIT 5
```
  Cancellation-storm query:
```sparql
  SELECT * WHERE {
    SERVICE <http://127.0.0.1:3030/target/sparql> {
      ?s ?p ?o
    }
  }
```
  We then repeatedly send that last query to the source dataset and cancel the outer HTTP
  request almost immediately, for example:

```sh
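  # Send 40 heavy federated queries to `source`. --max-time 0.05 makes curl
  # abort each outer request after 50 ms; `|| true` ignores the resulting error.
  for i in $(seq 1 40); do
    curl -sS --max-time 0.05 -G \
      --data-urlencode 'query=SELECT * WHERE { SERVICE <http://127.0.0.1:3030/target/sparql> { ?s ?p ?o } }' \
      http://127.0.0.1:3030/source/sparql >/dev/null || true
  done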
```

  ### Actual result

  Before stress:

  - direct query succeeds
  - federated query succeeds

  After the canceled federated-query burst:

  - direct query to target times out
  - federated query also fails/times out
  - Fuseki restart is required to recover
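
  The before/after states listed above can be checked with a small probe helper. This is
  a minimal sketch and was not part of the original reproduction: the function names
  `classify_probe` and `probe_dataset`, the `FUSEKI_BASE` and `PROBE_TIMEOUT` variables, and
  the 10-second budget are all assumptions chosen for illustration.

```sh
# Hypothetical helper: names, defaults, and output labels are illustrative
# and are not part of Fuseki or the original report.
FUSEKI_BASE="${FUSEKI_BASE:-http://127.0.0.1:3030}"
PROBE_TIMEOUT="${PROBE_TIMEOUT:-10}"

# Map a curl exit status to a short label.
# 0 is success; 28 is curl's "operation timed out" exit code.
classify_probe() {
  case "$1" in
    0)  echo "ok" ;;
    28) echo "timeout" ;;
    *)  echo "error($1)" ;;
  esac
}

# Run the direct probe query against one dataset and print "<dataset>: <label>".
probe_dataset() {
  dataset="$1"
  curl -sS -f --max-time "$PROBE_TIMEOUT" -G \
    --data-urlencode 'query=SELECT * WHERE { <urn:probe-subject> ?p ?o } LIMIT 5' \
    "$FUSEKI_BASE/$dataset/sparql" >/dev/null 2>&1
  status=$?
  echo "$dataset: $(classify_probe "$status")"
}
```

  Calling `probe_dataset target` before and after the cancellation loop should, if the
  wedge occurs as described, report `ok` for the first call and `timeout` for the second.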

  ### Expected result

  Canceled outer federated queries should not leave the target dataset/server wedged.
  After the canceled requests, normal direct queries to the target dataset should still work.

  ### Relevant logs

  From the Jena 6.0.0 Fuseki log, after the stress starts we see many inner requests like:
```
  GET http://127.0.0.1:3030/target/sparql?query=SELECT++%2A%0AWHERE%0A++%7B+?s++?p++?o+%7D%0A
```
  The outer requests are being canceled by the client, but the inner SERVICE subqueries
  continue to run. After enough of these, the target dataset stops responding to even direct
  queries.
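
  To compare the number of inner requests against the number of canceled outer requests,
  the log can be searched for the request line shown above. This is a rough sketch:
  `count_inner_requests` is a hypothetical helper, and the log file location is an
  assumption that depends on how Fuseki is run.

```sh
# Count inner SERVICE requests to the target dataset recorded in a Fuseki log.
# Hypothetical helper; pass the path to wherever your Fuseki log is written.
count_inner_requests() {
  logfile="$1"
  # -F matches the request prefix as a literal string, -c prints the match count.
  # grep exits non-zero when the count is 0, so `|| true` keeps the function quiet.
  grep -c -F 'GET http://127.0.0.1:3030/target/sparql?query=' "$logfile" || true
}
```

  For example, `count_inner_requests /path/to/fuseki.log` (path is a placeholder) prints
  the number of matching lines, which can be compared with the 40 outer requests sent by
  the loop.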

  ### Notes

  - This was reproduced using loopback 127.0.0.1, so it does not appear to require Docker
    DNS or container-name routing.
  - We specifically tested 6.0.0 because the changelog mentions query-cancellation
    improvements, but we still reproduce this failure.
