SOLR-16536 Replace OpenTracing instrumentation with OpenTelemetry #1841

Merged (16 commits) on Aug 17, 2023

stillalex
Member

@stillalex stillalex commented Aug 14, 2023

https://issues.apache.org/jira/browse/SOLR-16536

Description

Move all tracing code to OTEL.

Solution

  • Updated OTEL dependencies to 1.29.0.
  • OTEL tracing is 'always on', but the implementations are no-ops if OTEL is not enabled. This may need closer profiling to see whether it adds much overhead.
  • Moved as much OTEL code as possible into the TraceUtils utility class.
  • TestDistributedTracing has some race problems: OTEL will not initialize properly if the test runs in 'beast' mode. Will leave as is for now and revisit if there is too much noise.
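
To illustrate the 'always on, but no-op when disabled' idea from the list above, here is a rough sketch. The types `SketchTracer`, `SketchSpan`, and `RecordingTracer` are hypothetical stand-ins made up for this example; they are not the OpenTelemetry API and not the code in this PR. The point is only that the instrumented code path stays the same, and the cost when tracing is off is a few calls that do nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT OpenTelemetry API types.
final class NoopTracingSketch {

  /** Minimal span abstraction: set an attribute, then end the span. */
  interface SketchSpan {
    SketchSpan setAttribute(String key, String value);
    void end();
  }

  /** Minimal tracer abstraction: instrumented code only ever sees this. */
  interface SketchTracer {
    SketchSpan startSpan(String name);
  }

  /** No-op span: every call returns immediately and records nothing. */
  static final SketchSpan NOOP_SPAN = new SketchSpan() {
    @Override public SketchSpan setAttribute(String key, String value) { return this; }
    @Override public void end() {}
  };

  /** No-op tracer: always hands out the shared no-op span. */
  static final SketchTracer NOOP_TRACER = name -> NOOP_SPAN;

  /** Recording tracer, standing in for the case where tracing is enabled. */
  static final class RecordingTracer implements SketchTracer {
    final List<String> finished = new ArrayList<>();

    @Override public SketchSpan startSpan(String name) {
      StringBuilder record = new StringBuilder(name);
      return new SketchSpan() {
        @Override public SketchSpan setAttribute(String key, String value) {
          record.append(' ').append(key).append('=').append(value);
          return this;
        }
        @Override public void end() { finished.add(record.toString()); }
      };
    }
  }

  /** Instrumented code path: identical whether or not tracing is enabled. */
  static String handleRequest(SketchTracer tracer, String path) {
    SketchSpan span = tracer.startSpan("get:" + path).setAttribute("db.type", "solr");
    try {
      return "handled " + path;
    } finally {
      span.end(); // always end the span, even if the handler throws
    }
  }

  public static void main(String[] args) {
    RecordingTracer recording = new RecordingTracer();
    System.out.println(handleRequest(NOOP_TRACER, "/{collection}/select"));
    System.out.println(handleRequest(recording, "/{collection}/select"));
    System.out.println(recording.finished);
  }
}
```

With the no-op tracer nothing is recorded; with the recording tracer one finished span is collected. Whether the real no-op path adds measurable overhead is exactly what the profiling note above is about, and this sketch does not answer that.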

Open items:

  • TracerConfigurator: I don't think SpanThreadLocalProvider is still needed. Happy to be shown otherwise; I would actually like to understand better where this was used.
  • The documentation probably needs an update.

Tests

Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@stillalex
Member Author

Posting a screenshot from the Jaeger UI to demo what this data looks like for a query.

[Screenshot: otel-dashboard]

@janhoy
Contributor

janhoy commented Aug 15, 2023

Hi, please connect this PR to SOLR-16536 instead (and not the umbrella issue).

Contributor

@janhoy janhoy left a comment

Huge PR, but I think I followed along.

Main comment is whether we should kill TracerConfigurator and initialize tracer statically.
Rest are a bunch of questions. Have run the unit tests, but not spun up a cluster to compare new spans with old ones.

@janhoy
Contributor

janhoy commented Aug 15, 2023

Dependency analysis found issues.
usedUndeclaredArtifacts

  • io.opentelemetry:opentelemetry-api:1.28.0@jar
  • org.apache.solr:solrj:10.0.0-SNAPSHOT@

unusedDeclaredArtifacts

  • io.opentelemetry:opentelemetry-api-events:1.28.0-alpha@jar
  • io.opentelemetry:opentelemetry-sdk-trace:1.28.0@jar
  • io.opentelemetry:opentelemetry-sdk:1.28.0@jar
  • io.opentelemetry:opentelemetry-semconv:1.28.0-alpha@jar
This hints at changes needed in the Gradle build. The permitUnusedDeclared keyword can be used in build.gradle to tell the build that these dependencies are declared on purpose.

@janhoy
Contributor

janhoy commented Aug 15, 2023

WARNING: there were unreferenced files under license folder:

  • /home/runner/work/solr/solr/solr/licenses/opentracing-api-LICENSE-ASL.txt
  • /home/runner/work/solr/solr/solr/licenses/opentracing-api-NOTICE.txt
  • /home/runner/work/solr/solr/solr/licenses/opentracing-mock-LICENSE-ASL.txt
  • /home/runner/work/solr/solr/solr/licenses/opentracing-mock-NOTICE.txt
  • /home/runner/work/solr/solr/solr/licenses/opentracing-noop-LICENSE-ASL.txt
  • /home/runner/work/solr/solr/solr/licenses/opentracing-noop-NOTICE.txt
  • /home/runner/work/solr/solr/solr/licenses/opentracing-util-LICENSE-ASL.txt
  • /home/runner/work/solr/solr/solr/licenses/opentracing-util-NOTICE.txt

These files from licenses/ folder can now be removed as we don't have opentracing as a dependency anymore.

@janhoy janhoy requested a review from gus-asf August 15, 2023 12:34
@stillalex stillalex changed the title from "SOLR-16354 Migrate from Jaeger/OpenTracing to OTEL" to "SOLR-16536 Replace OpenTracing instrumentation with OpenTelemetry" Aug 15, 2023
@stillalex
Member Author

> Hi, please connect this PR to SOLR-16536 instead (and not the umbrella issue).

You are right. I renamed the PR to reflect the correct issue.

@stillalex
Member Author

Thank you @janhoy for the review!

I think the biggest question was the removal of span.log. It felt strange to have logs mixed with spans. Maybe I misunderstood the use; happy to look into OTEL alternatives.

I am seeing some backup-related tests failing locally and on the Crave build. I will dig into those a bit more; I want to reach a stable state before making more changes.

@stillalex
Member Author

sorry, had to do a rebase on top of main branch to fix some conflicts.

@stillalex
Member Author

stillalex commented Aug 15, 2023

quick comparison of Query span data (exporting from jaeger on 9.3 vs this PR):

Query spans on this PR
{
    "data": [
        {
            "traceID": "b3e378158f072905b2920df7714ddd37",
            "spans": [
                {
                    "traceID": "b3e378158f072905b2920df7714ddd37",
                    "spanID": "86487341790e7776",
                    "operationName": "post:/{core}/select",
                    "references": [
                        {
                            "refType": "CHILD_OF",
                            "traceID": "b3e378158f072905b2920df7714ddd37",
                            "spanID": "bcf1836208e5a7a4"
                        }
                    ],
                    "startTime": 1692133668595665,
                    "duration": 30937,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "POST"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8982/solr/test_shard2_replica_n1/select"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test_shard2_replica_n1"
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "http.status",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                },
                {
                    "traceID": "b3e378158f072905b2920df7714ddd37",
                    "spanID": "bcf1836208e5a7a4",
                    "operationName": "get:/{collection}/select",
                    "references": [],
                    "startTime": 1692133668566962,
                    "duration": 63467,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "GET"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8983/solr/test/select"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test"
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "http.status",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "http.params",
                            "type": "string",
                            "value": "_=1692133666339\u0026indent=true\u0026q=*:*\u0026q.op=OR\u0026useParams="
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                },
                {
                    "traceID": "b3e378158f072905b2920df7714ddd37",
                    "spanID": "3a9142862158b626",
                    "operationName": "post:/{core}/select",
                    "references": [
                        {
                            "refType": "CHILD_OF",
                            "traceID": "b3e378158f072905b2920df7714ddd37",
                            "spanID": "bcf1836208e5a7a4"
                        }
                    ],
                    "startTime": 1692133668595437,
                    "duration": 20848,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "POST"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8983/solr/test_shard1_replica_n2/select"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test_shard1_replica_n2"
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "http.status",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                }
            ],
            "processes": {
                "p1": {
                    "serviceName": "solr",
                    "tags": [
                        {
                            "key": "host.name",
                            "type": "string",
                            "value": "localhost"
                        },
                        {
                            "key": "telemetry.sdk.language",
                            "type": "string",
                            "value": "java"
                        },
                        {
                            "key": "telemetry.sdk.name",
                            "type": "string",
                            "value": "opentelemetry"
                        },
                        {
                            "key": "telemetry.sdk.version",
                            "type": "string",
                            "value": "1.28.0"
                        }
                    ]
                }
            },
            "warnings": null
        }
    ],
    "total": 0,
    "limit": 0,
    "offset": 0,
    "errors": null
}

vs

Query spans on Solr 9.3
{
    "data": [
        {
            "traceID": "36d81b003d238f7278c467132917ee3c",
            "spans": [
                {
                    "traceID": "36d81b003d238f7278c467132917ee3c",
                    "spanID": "0ea50951fdf8c5ed",
                    "operationName": "post:/{core}/select",
                    "references": [
                        {
                            "refType": "CHILD_OF",
                            "traceID": "36d81b003d238f7278c467132917ee3c",
                            "spanID": "9cf9b219437d234c"
                        },
                        {
                            "refType": "CHILD_OF",
                            "traceID": "36d81b003d238f7278c467132917ee3c",
                            "spanID": "9cf9b219437d234c"
                        }
                    ],
                    "startTime": 1692134969573648,
                    "duration": 5985,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "opentracing-shim"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8982/solr/test_shard2_replica_n1/select"
                        },
                        {
                            "key": "http.status_code",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "POST"
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test_shard2_replica_n1"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "internal"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                },
                {
                    "traceID": "36d81b003d238f7278c467132917ee3c",
                    "spanID": "7d539974b1cf00bf",
                    "operationName": "post:/{core}/select",
                    "references": [
                        {
                            "refType": "CHILD_OF",
                            "traceID": "36d81b003d238f7278c467132917ee3c",
                            "spanID": "9cf9b219437d234c"
                        },
                        {
                            "refType": "CHILD_OF",
                            "traceID": "36d81b003d238f7278c467132917ee3c",
                            "spanID": "9cf9b219437d234c"
                        }
                    ],
                    "startTime": 1692134969542767,
                    "duration": 26760,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "opentracing-shim"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8982/solr/test_shard2_replica_n1/select"
                        },
                        {
                            "key": "http.status_code",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "POST"
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test_shard2_replica_n1"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "internal"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                },
                {
                    "traceID": "36d81b003d238f7278c467132917ee3c",
                    "spanID": "9cf9b219437d234c",
                    "operationName": "get:/{collection}/select",
                    "references": [],
                    "startTime": 1692134969508568,
                    "duration": 72193,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "opentracing-shim"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8983/solr/test/select"
                        },
                        {
                            "key": "http.status_code",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "GET"
                        },
                        {
                            "key": "http.params",
                            "type": "string",
                            "value": "_=1692134966416\u0026indent=true\u0026q=*:*\u0026q.op=OR\u0026useParams="
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "internal"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                },
                {
                    "traceID": "36d81b003d238f7278c467132917ee3c",
                    "spanID": "4899cdb8a9e4200d",
                    "operationName": "post:/{core}/select",
                    "references": [
                        {
                            "refType": "CHILD_OF",
                            "traceID": "36d81b003d238f7278c467132917ee3c",
                            "spanID": "9cf9b219437d234c"
                        },
                        {
                            "refType": "CHILD_OF",
                            "traceID": "36d81b003d238f7278c467132917ee3c",
                            "spanID": "9cf9b219437d234c"
                        }
                    ],
                    "startTime": 1692134969543674,
                    "duration": 14919,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "opentracing-shim"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8983/solr/test_shard1_replica_n2/select"
                        },
                        {
                            "key": "http.status_code",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "POST"
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test_shard1_replica_n2"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "internal"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                }
            ],
            "processes": {
                "p1": {
                    "serviceName": "solr",
                    "tags": [
                        {
                            "key": "host.name",
                            "type": "string",
                            "value": "localhost"
                        },
                        {
                            "key": "telemetry.sdk.language",
                            "type": "string",
                            "value": "java"
                        },
                        {
                            "key": "telemetry.sdk.name",
                            "type": "string",
                            "value": "opentelemetry"
                        },
                        {
                            "key": "telemetry.sdk.version",
                            "type": "string",
                            "value": "1.21.0"
                        }
                    ]
                }
            },
            "warnings": null
        }
    ],
    "total": 0,
    "limit": 0,
    "offset": 0,
    "errors": null
}

Diff:

  • On 9.3, every child span has two entries under references, but they are identical; this looks like a duplication problem in the 9.3 data.
  • On 9.3, every span has two span.kind entries (one server, one internal), whereas the PR has a single server entry.
  • 9.3 spans have an http.status_code attribute, where the PR has http.status (the name was fixed in the PR to match).
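
For anyone who wants to repeat this comparison on their own exports, the check can be sketched roughly as below. This is only an illustration: the class `SpanTagDiffSketch` and its helpers are made up for this example, and the key lists are copied by hand from the shard-level `post:/{core}/select` spans in the two exports above rather than parsed from the JSON.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Solr or this PR.
final class SpanTagDiffSketch {

  /** Keys present in left but absent from right, in sorted order. */
  static Set<String> onlyIn(List<String> left, List<String> right) {
    Set<String> result = new TreeSet<>(left);
    result.removeAll(right);
    return result;
  }

  /** Entries that occur more than once, in first-seen order. */
  static Set<String> duplicates(List<String> entries) {
    Set<String> seen = new LinkedHashSet<>();
    Set<String> dups = new LinkedHashSet<>();
    for (String entry : entries) {
      if (!seen.add(entry)) {
        dups.add(entry); // add() returned false, so this entry was already seen
      }
    }
    return dups;
  }

  public static void main(String[] args) {
    // Tag keys of a shard-level select span, as exported from Solr 9.3
    List<String> solr93 = List.of(
        "otel.library.name", "http.url", "http.status_code", "http.method",
        "db.type", "db.instance", "span.kind", "span.kind", "internal.span.format");
    // Tag keys of the equivalent span exported from this PR
    List<String> pr = List.of(
        "otel.library.name", "http.method", "http.url", "db.instance",
        "db.type", "http.status", "span.kind", "internal.span.format");
    // References of a 9.3 child span, written as refType:spanID
    List<String> refs93 = List.of(
        "CHILD_OF:9cf9b219437d234c", "CHILD_OF:9cf9b219437d234c");

    System.out.println("only in 9.3: " + onlyIn(solr93, pr));
    System.out.println("only in PR: " + onlyIn(pr, solr93));
    System.out.println("duplicate tags in 9.3: " + duplicates(solr93));
    System.out.println("duplicate tags in PR: " + duplicates(pr));
    System.out.println("duplicate references in 9.3: " + duplicates(refs93));
  }
}
```

Comparing key sets this way only catches renamed, missing, or repeated keys; it would not notice a tag whose value changed while the key stayed the same.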

@stillalex
Member Author

stillalex commented Aug 15, 2023

quick comparison of Indexing span data (exporting from jaeger on 9.3 vs this PR):

Update spans on this PR
{
    "data": [
        {
            "traceID": "d0cd6efa699d993637bfc1f313742f1f",
            "spans": [
                {
                    "traceID": "d0cd6efa699d993637bfc1f313742f1f",
                    "spanID": "ac76ee5c9f41ceb6",
                    "operationName": "post:/{core}/update",
                    "references": [
                        {
                            "refType": "CHILD_OF",
                            "traceID": "d0cd6efa699d993637bfc1f313742f1f",
                            "spanID": "f9d87bf73cd5a669"
                        }
                    ],
                    "startTime": 1692134062932096,
                    "duration": 222281,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "POST"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8982/solr/test_shard2_replica_n1/update"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test_shard2_replica_n1"
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "http.status",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "http.params",
                            "type": "string",
                            "value": "update.distrib=TOLEADER\u0026distrib.from=http%3A%2F%2Flocalhost%3A8983%2Fsolr%2Ftest_shard1_replica_n2%2F\u0026wt=javabin\u0026version=2"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                },
                {
                    "traceID": "d0cd6efa699d993637bfc1f313742f1f",
                    "spanID": "f9d87bf73cd5a669",
                    "operationName": "post:/{collection}/update",
                    "references": [],
                    "startTime": 1692134062686908,
                    "duration": 468961,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "POST"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8983/solr/test/update"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test"
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "http.status",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "http.params",
                            "type": "string",
                            "value": "_=1692133972530\u0026commitWithin=10\u0026overwrite=true\u0026wt=json"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                }
            ],
            "processes": {
                "p1": {
                    "serviceName": "solr",
                    "tags": [
                        {
                            "key": "host.name",
                            "type": "string",
                            "value": "localhost"
                        },
                        {
                            "key": "telemetry.sdk.language",
                            "type": "string",
                            "value": "java"
                        },
                        {
                            "key": "telemetry.sdk.name",
                            "type": "string",
                            "value": "opentelemetry"
                        },
                        {
                            "key": "telemetry.sdk.version",
                            "type": "string",
                            "value": "1.28.0"
                        }
                    ]
                }
            },
            "warnings": null
        }
    ],
    "total": 0,
    "limit": 0,
    "offset": 0,
    "errors": null
}
Update spans on Solr 9.3 PR
{
    "data": [
        {
            "traceID": "49503807dd79f2bf5e20fddfabbca6b0",
            "spans": [
                {
                    "traceID": "49503807dd79f2bf5e20fddfabbca6b0",
                    "spanID": "e79354cfcc9a1c8d",
                    "operationName": "post:/{core}/update",
                    "references": [
                        {
                            "refType": "CHILD_OF",
                            "traceID": "49503807dd79f2bf5e20fddfabbca6b0",
                            "spanID": "3b5a75c06c661620"
                        },
                        {
                            "refType": "CHILD_OF",
                            "traceID": "49503807dd79f2bf5e20fddfabbca6b0",
                            "spanID": "3b5a75c06c661620"
                        }
                    ],
                    "startTime": 1692134739803056,
                    "duration": 216899,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "opentracing-shim"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8982/solr/test_shard2_replica_n1/update"
                        },
                        {
                            "key": "http.status_code",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "POST"
                        },
                        {
                            "key": "http.params",
                            "type": "string",
                            "value": "update.distrib=TOLEADER\u0026distrib.from=http%3A%2F%2Flocalhost%3A8983%2Fsolr%2Ftest_shard1_replica_n2%2F\u0026wt=javabin\u0026version=2"
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test_shard2_replica_n1"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "internal"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                },
                {
                    "traceID": "49503807dd79f2bf5e20fddfabbca6b0",
                    "spanID": "3b5a75c06c661620",
                    "operationName": "post:/{collection}/update",
                    "references": [],
                    "startTime": 1692134739529315,
                    "duration": 491930,
                    "tags": [
                        {
                            "key": "otel.library.name",
                            "type": "string",
                            "value": "opentracing-shim"
                        },
                        {
                            "key": "http.url",
                            "type": "string",
                            "value": "http://localhost:8983/solr/test/update"
                        },
                        {
                            "key": "http.status_code",
                            "type": "int64",
                            "value": 200
                        },
                        {
                            "key": "http.method",
                            "type": "string",
                            "value": "POST"
                        },
                        {
                            "key": "http.params",
                            "type": "string",
                            "value": "_=1692133972530\u0026commitWithin=10\u0026overwrite=true\u0026wt=json"
                        },
                        {
                            "key": "db.type",
                            "type": "string",
                            "value": "solr"
                        },
                        {
                            "key": "db.instance",
                            "type": "string",
                            "value": "test"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "server"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "internal"
                        },
                        {
                            "key": "internal.span.format",
                            "type": "string",
                            "value": "proto"
                        }
                    ],
                    "logs": [],
                    "processID": "p1",
                    "warnings": null
                }
            ],
            "processes": {
                "p1": {
                    "serviceName": "solr",
                    "tags": [
                        {
                            "key": "host.name",
                            "type": "string",
                            "value": "localhost"
                        },
                        {
                            "key": "telemetry.sdk.language",
                            "type": "string",
                            "value": "java"
                        },
                        {
                            "key": "telemetry.sdk.name",
                            "type": "string",
                            "value": "opentelemetry"
                        },
                        {
                            "key": "telemetry.sdk.version",
                            "type": "string",
                            "value": "1.21.0"
                        }
                    ]
                }
            },
            "warnings": null
        }
    ],
    "total": 0,
    "limit": 0,
    "offset": 0,
    "errors": null
}

Diff:

  • 9.3: every child span has 2 entries under references, but they are identical; this looks like a duplication problem in the 9.3 data (same as in the Query data)
  • 9.3: every span has 2 span.kind entries, one server and one internal, whereas the PR has a single server entry (same as in the Query data)
  • 9.3: spans have an http.status_code attribute, where the PR has http.status (same as in the Query data); the name was fixed in the PR to match
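
The duplicated span.kind entry can be spotted mechanically. The following is a minimal sketch, not code from the PR: the class and method names are hypothetical, and the key list is copied by hand from the tags of the 9.3 child span shown above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper for inspecting a Jaeger JSON dump; not part of Solr. */
class DuplicateTagKeys {

  /** Returns every key that occurs more than once, with its count, in first-seen order. */
  static Map<String, Long> duplicates(List<String> keys) {
    Map<String, Long> counts =
        keys.stream()
            .collect(Collectors.groupingBy(k -> k, LinkedHashMap::new, Collectors.counting()));
    // Keep only the keys that were seen at least twice.
    counts.values().removeIf(n -> n < 2);
    return counts;
  }

  public static void main(String[] args) {
    // Tag keys of the 9.3 child span, in the order they appear in the dump.
    List<String> keys =
        List.of(
            "otel.library.name", "http.url", "http.status_code", "http.method", "http.params",
            "db.type", "db.instance", "span.kind", "span.kind", "internal.span.format");
    System.out.println(duplicates(keys));
  }
}
```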

@janhoy
Contributor

janhoy commented Aug 15, 2023

  • 9.3 spans have http.status_code attrib, where PR has http.status

Please consult the otel semantic conventions for what attribute names to use. The former correct tag was http.status_code, but last month the OTEL project adopted ECS (Elastic Common Schema) as the new standard, so those conventions will change and stabilize, thus the correct tag will be http.response.status_code.

Perhaps we should emit both for some versions, or implement the suggested env var OTEL_SEMCONV_STABILITY_OPT_IN?

@stillalex
Member Author

Perhaps we should emit both for some versions, or implement the suggested env var OTEL_SEMCONV_STABILITY_OPT_IN?

if this is for 2 fields only (http.status_code => http.response.status_code and http.method => http.request.method) it feels like overkill to implement this mechanism. I would emit both attributes old and new for a while and then drop them. let me know if you feel differently.

@janhoy
Contributor

janhoy commented Aug 16, 2023

if this is for 2 fields only (http.status_code => http.response.status_code and http.method => http.request.method) it feels like overkill to implement this mechanism. I would emit both attributes old and new for a while and then drop them. let me know if you feel differently.

I'm ok with duplicating tags. But we should probably (now or as a follow-up) review every tag name that we explicitly define in our code, to see how it aligns with the recommendations, to make sure we covered all of them.
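
Emitting both names could look roughly like the sketch below. This is only an illustration under stated assumptions: the class and method names are hypothetical (this is not Solr's TraceUtils), and a plain Map stands in for the span so the example runs without the OpenTelemetry libraries. In real code each put would be a set-attribute call on the span.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that writes HTTP tags under both the old and the new name. */
class DualHttpAttributes {
  // Old and new attribute names as discussed in this thread.
  static final String OLD_STATUS = "http.status_code";
  static final String NEW_STATUS = "http.response.status_code";
  static final String OLD_METHOD = "http.method";
  static final String NEW_METHOD = "http.request.method";

  /** Records the response status under both keys (the Map is a stand-in for a span). */
  static void setStatus(Map<String, Object> attrs, int status) {
    attrs.put(OLD_STATUS, (long) status);
    attrs.put(NEW_STATUS, (long) status);
  }

  /** Records the request method under both keys. */
  static void setMethod(Map<String, Object> attrs, String method) {
    attrs.put(OLD_METHOD, method);
    attrs.put(NEW_METHOD, method);
  }

  public static void main(String[] args) {
    Map<String, Object> attrs = new LinkedHashMap<>();
    setMethod(attrs, "POST");
    setStatus(attrs, 200);
    attrs.forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

Keeping the duplication in one helper means the old names can later be dropped in a single place.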

@janhoy
Contributor

janhoy commented Aug 16, 2023

Tests seem to fail with SEVERE: 9 threads leaked from SUITE scope at org.apache.solr.opentelemetry.TestDistributedTracing. Cannot reproduce locally. Re-running crave tests.

@janhoy
Contributor

janhoy commented Aug 16, 2023

I reproduced the thread leak locally, but it does not happen every time. Also, the new Crave test run was successful. Looks like there is a timing issue. @stillalex how is the tracer life cycle handled now that we don't explicitly shut down the tracer in CoreContainer?

@janhoy
Contributor

janhoy commented Aug 16, 2023

Regarding documentation: as long as the user-facing config and env vars are the same, and the spans created are almost identical, I think the only Ref Guide documentation to add is in the "Major changes in Solr 10" section of the upgrade notes, where we can mention the deprecation of http.status_code and friends. We should also mention as a breaking change that OpenTracing is gone, so old Java agents providing an OT tracer won't work.

If we plan to keep sending both for a long time (e.g. 9.x and 10.x) then we're fine and would not need to implement OTEL_SEMCONV_STABILITY_OPT_IN.

Thinking about it: if this PR is purely getting rid of OpenTracing libs and we do not change configuration in any way, could it be a candidate for backporting to the 9.4 release? Edit: In 9.x we still have the jaegertracer-configurator module, which relies on opentracing libs, so this has to be 10.0 only.

Would you like to give the docs a shot @stillalex ? Also, please add a line to CHANGES.txt.

@stillalex
Member Author

Would you like to give the docs a shot @stillalex ? Also, please add a line to CHANGES.txt.

@janhoy updated, please take a look.

Just to run one idea by you: the otel library upgrade is causing a lot of conflicts, and in itself it would be good to have it on 9.x branch too. what do you think about splitting only the lib upgrade to a different PR which can be simple and merged to 9.x branch?

@stillalex
Member Author

quick smoke test on benchmarking side, I tried running the existing CloudIndexing benchmark and numbers seem pretty close (I am sure there are more opportunities for improvement but I did not spend much time on this):

  • 5 threads indexing over 60 seconds
./jmh.sh -wi 1 -i 1 -r 60 CloudIndexing -t 5 -p nodeCount=1 -p numShards=1 -p numReplicas=1 -p useStringUtf8Over=0 -p directBuffer=true
  • PR
Benchmark                    (directBuffer)  (nodeCount)  (numReplicas)  (numShards)  (scale)  (useStringUtf8Over)   Mode  Cnt      Score   Error  Units
CloudIndexing.indexLargeDoc            true            1              1            1        1                    0  thrpt        1116.343          ops/s
CloudIndexing.indexSmallDoc            true            1              1            1        1                    0  thrpt       29966.343          ops/s
  • MAIN
Benchmark                    (directBuffer)  (nodeCount)  (numReplicas)  (numShards)  (scale)  (useStringUtf8Over)   Mode  Cnt      Score   Error  Units
CloudIndexing.indexLargeDoc            true            1              1            1        1                    0  thrpt        1419.507          ops/s
CloudIndexing.indexSmallDoc            true            1              1            1        1                    0  thrpt       30111.511          ops/s
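
For reference, the relative difference between the two runs can be worked out as below. This is a minimal sketch: the class and method names are hypothetical, and the scores are copied from the tables above. On these numbers indexLargeDoc comes out about 21% lower on the PR and indexSmallDoc about 0.5% lower. Each run used a single warmup and a single measurement iteration (-wi 1 -i 1) and reports no error column, so part of the gap may be noise.

```java
import java.util.Locale;

/** Hypothetical helper for comparing two JMH throughput scores. */
class ThroughputDelta {

  /** Percent change of candidate relative to baseline; negative means the candidate is slower. */
  static double percentChange(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
  }

  public static void main(String[] args) {
    // Scores in ops/s: MAIN is the baseline, PR is the candidate.
    double largeDoc = percentChange(1419.507, 1116.343);
    double smallDoc = percentChange(30111.511, 29966.343);
    System.out.println(String.format(Locale.ROOT, "indexLargeDoc: %.2f%%", largeDoc));
    System.out.println(String.format(Locale.ROOT, "indexSmallDoc: %.2f%%", smallDoc));
  }
}
```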

@janhoy
Contributor

janhoy commented Aug 16, 2023

Yes, splitting out the otel version upgrade is a good idea!

@stillalex
Member Author

moved the simple otel update to #1846; will rebase once that is in

@stillalex stillalex mentioned this pull request Aug 17, 2023
7 tasks
@stillalex
Member Author

@janhoy otel upgrade split, merged and rebase completed. running checks now to verify if I missed anything.

Contributor

@janhoy janhoy left a comment

This looks good to me now. Tests are hopefully stable. Great work!

PS: Did you do a broader review of tag name changes apart from the two we covered, or should we put it up as a new JIRA?

@stillalex
Member Author

PS: Did you do a broader review of tag name changes apart from the two we covered, or should we put it up as a new JIRA?

No review yet. I was thinking I could do one as part of SOLR-16935 because I wanted to add 2 more spans.

@stillalex stillalex merged commit 92dede1 into apache:main Aug 17, 2023
2 checks passed
@stillalex stillalex deleted the SOLR-16536-otel branch August 17, 2023 12:57