Skip to content

Support custom OTLP CA bundles#119

Merged
ptone merged 3 commits intoGoogleCloudPlatform:mainfrom
mfreeman451:fix/telemetry-cloud-ca-file
Apr 11, 2026
Merged

Support custom OTLP CA bundles#119
ptone merged 3 commits intoGoogleCloudPlatform:mainfrom
mfreeman451:fix/telemetry-cloud-ca-file

Conversation

@mfreeman451
Copy link
Copy Markdown
Contributor

@mfreeman451 mfreeman451 commented Apr 10, 2026

Summary

  • add config support for custom OTLP CA bundle files
  • thread the CA file through telemetry config conversion and exporter setup
  • add focused coverage for schema/config conversion and CA-backed TLS loading
  • proper context support

Testing

  • go test ./pkg/config ./pkg/sciontool/telemetry -run 'Test.*(Telemetry|OTLP|CA|Bundle).*'\n

Copy link
Copy Markdown
Contributor Author

@mfreeman451 mfreeman451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mfreeman451 mfreeman451 marked this pull request as ready for review April 10, 2026 04:33
@mfreeman451 mfreeman451 force-pushed the fix/telemetry-cloud-ca-file branch from 0452363 to 38b00d1 Compare April 11, 2026 05:15
Copy link
Copy Markdown
Contributor Author

@mfreeman451 mfreeman451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ptone
Copy link
Copy Markdown
Member

ptone commented Apr 11, 2026

PR #119 Review: Support custom OTLP CA bundles

Executive Summary

This PR adds support for custom PEM-encoded CA bundle files in the OTLP telemetry export path, threading the configuration through schema, settings, API types, config conversion, and exporter initialization. The change is low-to-medium risk — the new code is well-structured, follows existing patterns, and includes solid test coverage, with a couple of minor observations around redundant TLS config loading and a missing ctx parameter in an unused code path.


Critical Issues

None. No blocking bugs or security flaws identified.


Observations

1. Redundant loadOTLPTLSConfig calls in initGRPC (Efficiency)

File: pkg/sciontool/telemetry/exporter.go, lines 141 and 158

In initGRPC, the CA file is read and parsed twice — once via appendOTLPTraceGRPCSecurityOption (line 141) and again via otlpGRPCTransportCredentials (line 158). Both ultimately call loadSecureOTLPTLSConfig -> loadOTLPTLSConfig -> os.ReadFile. Similarly, in newOTLPProviders (providers.go), the CA file is read three times (trace, log, metric security options).

This isn't a correctness issue, but it's unnecessary I/O and parsing. Consider loading the TLS config once and passing it down, or caching it on the Config struct.

Suggested Fix (for initGRPC):

func (e *CloudExporter) initGRPC(ctx context.Context, config *Config) error {
    gcpDialOpts, err := loadSecureGCPDialOptions(ctx, config)
    if err != nil {
        return err
    }

    // Load TLS config once for reuse
    tlsConfig, err := loadSecureOTLPTLSConfig(config)
    if err != nil {
        return err
    }

    opts := []otlptracegrpc.Option{
        otlptracegrpc.WithEndpoint(config.Endpoint),
    }
    if config.Insecure {
        opts = append(opts, otlptracegrpc.WithInsecure())
    } else {
        opts = append(opts, otlptracegrpc.WithTLSCredentials(credentials.NewTLS(tlsConfig)))
    }
    // ... use same tlsConfig for transport credentials below
}

2. CAFile as string vs *string — cannot unset via merge (Design)

File: pkg/config/templates.go, line 820; pkg/api/types.go, line 260

CAFile is a plain string, which means an override with ca_file: "" will not clear a base value set to a non-empty path. This is consistent with how the existing fields like Endpoint and Protocol work elsewhere in the codebase, so this is acceptable — just noting the semantic gap for awareness.

3. No path validation on CAFile (Security — Low Risk)

Files: pkg/sciontool/telemetry/exporter.go (line 37), pkg/config/settings_v1.go (line 467)

The CAFile path is used directly in os.ReadFile without any sanitization (e.g., filepath.Clean, symlink resolution). In this context, the config is written by the user/operator themselves, so this is low risk — an attacker who can modify the config already has broader access. However, if the config ever becomes writable by less-trusted sources (e.g., remote template injection), this could become a file-read primitive. Worth documenting as an assumption.

4. Unused ctx parameter in initHTTP (Minor)

File: pkg/sciontool/telemetry/exporter.go, line 180

The ctx parameter was added to initHTTP's signature and correctly threaded to otlptracehttp.New. Good — this is an improvement over the prior context.Background().

5. Inconsistent error checking style in clearTelemetryEnv (Minor)

File: pkg/sciontool/telemetry/config_test.go, lines 567-569

The Unsetenv(EnvCAFile) call has explicit error checking (if err := os.Unsetenv(EnvCAFile); err != nil { panic(err) }) while all surrounding Unsetenv calls do not. This inconsistency is cosmetic but could confuse future contributors.

Similarly, os.Setenv(EnvCAFile, ...) in TestLoadConfig_EnvOverrides (line 57) uses if err := os.Setenv(...) while all surrounding Setenv calls ignore the error. Either all should check or none should.


Positive Feedback

  1. Clean refactoring of otlp_options.go: Extracting the security option helpers into a dedicated file with a uniform pattern (appendOTLP*SecurityOption) is well-designed and eliminates duplicated insecure/TLS branching logic across multiple call sites.

  2. Proper context threading: Replacing context.Background() with caller-provided ctx in initHTTP, initGRPC, and NewCloudExporter is a correctness improvement that enables proper cancellation and timeout propagation.

  3. Comprehensive test coverage: Every layer — schema, settings loading, config conversion, env-var mapping, TLS loading, and merge — has corresponding test updates. The generateTestCertificatePEM helper is a clean utility for validating real certificate parsing.

  4. Good error handling in newOTLPProviders: The cascading shutdown of already-created exporters on failure (e.g., shutting down traceExporter when log TLS fails) prevents resource leaks.

  5. buildProviders refactor: Flipping to early-return for the !batch case and using struct literals directly is cleaner and eliminates unnecessary intermediate variables.

  6. Schema and env-var alignment: The SCION_OTEL_CA_FILE env var, x-env-var annotation in the JSON schema, and mapOtelEnvKey mapping are all consistent and correctly wired.


Final Verdict

Approve. The PR is well-implemented with thorough test coverage and clean refactoring. The observations above are non-blocking improvements — the redundant file reads (#1) are the most impactful and could be addressed in a follow-up if the CA file is expected to be large or loaded frequently.

@ptone ptone merged commit 68cda6e into GoogleCloudPlatform:main Apr 11, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants