fix: align Vector component schemas with v0.53.0 spec #80

Merged
TerrifiedBug merged 3 commits into main from fix/schema-audit-alignment
Mar 9, 2026


@TerrifiedBug
Owner

@TerrifiedBug TerrifiedBug commented Mar 9, 2026

Summary

  • Bumps Vector from 0.44.0 → 0.53.0 in agent Dockerfile, install script, and server Dockerfile
  • Audits all 114 VectorFlow component schemas against vector generate-schema → 112/114 clean (remaining 2 are informational missing optional fields for dnstap and kubernetes_logs)
  • Validates field names, required status, encoding codecs, compression options, enum values, default values, and field types against Vector's official JSON schema
  • Adds dependsOn conditional visibility to 6 modal components so fields show/hide based on selected mode

Field presence & required status

  • Add encoding to required arrays for 25 sinks
  • Fix required arrays for 8 components (kafka group_id, nats connection_name, vector address, opentelemetry grpc/http, throttle window_secs, gcp stackdriver resource)
  • Remove deprecated fields: http headers, elasticsearch suppress_type_name, splunk_hec token, databend tls
  • Remove invalid TLS from 12 components, invalid fields from 10 components
  • Fix aws_sns queue_url → topic_arn (Vector docs have a copy-paste bug)

Encoding codecs

  • Remove invalid ndjson codec from all sinks (NDJSON = codec:json + framing:newline_delimited)
  • Add missing otlp and syslog codecs to 20+ sinks
  • Standardize all sinks to Vector's canonical 13-codec list
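
The ndjson equivalence above can be sketched as a small config rewrite. This is only an illustration, not code from the PR: the `SinkEncodingConfig` shape and the `normalizeNdjson` helper are hypothetical names, and the sketch assumes the sink accepts a `framing.method` option alongside `encoding.codec`.

```typescript
// Hypothetical shape for the encoding-related part of a sink config.
interface SinkEncodingConfig {
  encoding: { codec: string };
  framing?: { method: string };
}

// Rewrite the invalid `ndjson` codec as `json` plus newline-delimited framing.
// Configs that use any other codec are returned unchanged.
function normalizeNdjson(config: SinkEncodingConfig): SinkEncodingConfig {
  if (config.encoding.codec !== "ndjson") {
    return config;
  }
  return {
    ...config,
    encoding: { ...config.encoding, codec: "json" },
    framing: { method: "newline_delimited" },
  };
}

const migrated = normalizeNdjson({ encoding: { codec: "ndjson" } });
const untouched = normalizeNdjson({ encoding: { codec: "text" } });
```

Here `migrated` carries `codec: "json"` with `newline_delimited` framing, while `untouched` keeps its original codec and gains no framing entry.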

Compression options

  • Add missing algorithms (snappy, zlib, zstd) to AWS, GCP, network, and search-db sinks
  • Keep Kafka's unique list (lz4 instead of zlib)

Enum values

  • source:http_server/http_client method — add missing HTTP methods
  • sink:http method — add trace
  • sink:clickhouse format — add arrow_stream
  • source:host_metrics collectors — add tcp
  • source:vector version — remove deprecated v1
  • source:docker_logs multiline mode: halt_after → halt_with
  • sink:gcp_chronicle region — remove invalid australia-southeast1
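
Enum fixes like the ones above fall out of a two-way set comparison between a local enum and the reference one. The sketch below is an assumption about how such a check could look, not the audit script itself: `diffEnum` and `EnumDiff` are hypothetical names, and the method lists are illustrative values rather than the repository's actual schemas.

```typescript
interface EnumDiff {
  missing: string[]; // present in the reference enum, absent locally
  invalid: string[]; // present locally, absent from the reference enum
}

// Compare a local enum list against a reference list in both directions.
function diffEnum(local: readonly string[], reference: readonly string[]): EnumDiff {
  const localSet = new Set(local);
  const referenceSet = new Set(reference);
  return {
    missing: reference.filter((value) => !localSet.has(value)),
    invalid: local.filter((value) => !referenceSet.has(value)),
  };
}

// Illustrative values only (assumed, not copied from the schemas).
const localMethods = ["GET", "POST", "PUT"];
const referenceMethods = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"];
const methodDiff = diffEnum(localMethods, referenceMethods);
```

Values in `missing` are candidates to add, and values in `invalid` are candidates to remove.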

Default values

  • source:aws_sqs visibility_timeout_secs — 30 → 300 (was 10x too low, would cause premature message redelivery)

UX — conditional field visibility

  • source:syslog, source:socket, source:fluent, source:dnstap, source:statsd, sink:socket
  • Fields like address, path, socket_file_mode show/hide based on mode selection
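
A mode-gated schema might be declared roughly as follows. Only the `dependsOn.field` / `dependsOn.value` pair comes from this PR; the `FieldSchema` interface, the field types, and the exact mode-to-field mapping are assumptions made for illustration.

```typescript
// Hypothetical field shape; only `dependsOn.field` and `dependsOn.value`
// are named in the PR description.
interface FieldSchema {
  type: string;
  dependsOn?: { field: string; value: string | string[] };
}

// Assumed declaration for a syslog-style source: `address` is tied to the
// network modes, `path` and `socket_file_mode` to the unix mode.
const syslogFields: Record<string, FieldSchema> = {
  mode: { type: "string" },
  address: { type: "string", dependsOn: { field: "mode", value: ["tcp", "udp"] } },
  path: { type: "string", dependsOn: { field: "mode", value: "unix" } },
  socket_file_mode: { type: "number", dependsOn: { field: "mode", value: "unix" } },
};

// Names of the fields whose visibility is gated on another field.
const gatedFields = Object.entries(syslogFields)
  .filter(([, field]) => field.dependsOn !== undefined)
  .map(([name]) => name);
```

Declaring the condition on the field itself keeps the form renderer generic: it only needs to read `dependsOn`, not know anything about syslog.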

Test plan

  • vector generate-schema 2>/dev/null > /tmp/vector-schema.json && python3 scripts/audit-vector-schemas.py → 112/114 clean
  • npx tsc --noEmit passes
  • Pipeline editor: add syslog source → address shows for tcp/udp, path shows for unix
  • Pipeline editor: add socket sink → address for tcp/udp, path for unix modes
  • Pipeline editor: add HTTP sink → verify 13 encoding codecs, no ndjson
  • Pipeline editor: add aws_sqs source → verify visibility_timeout default shows 300

Bump Vector to 0.53.0 and audit all VectorFlow component schemas against
`vector generate-schema` to fix mismatches:

- Bump Vector from 0.44.0 to 0.53.0 in agent and server Dockerfiles
- Add encoding to required arrays for 25 sinks
- Remove invalid TLS fields from 12 components (implicit via HTTPS/rediss)
- Remove deprecated fields (http headers, elasticsearch suppress_type_name,
  splunk_hec token, databend tls)
- Remove invalid fields (mqtt qos, pulsar dead_letter_queue_topic, etc.)
- Fix required arrays for 8 components (kafka group_id, nats connection_name,
  vector address, opentelemetry grpc/http, throttle window_secs, etc.)
- Fix aws_sns queue_url → topic_arn (Vector docs have copy-paste bug from SQS)
- Add dependsOn for 6 modal components (syslog, socket, fluent, dnstap,
  statsd sources + socket sink) so fields show/hide based on selected mode
@greptile-apps
Contributor

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR bumps Vector from 0.44.0 → 0.53.0 in the agent Dockerfile, install script, and server Dockerfile, and performs a comprehensive audit of all 114 VectorFlow component schemas against vector generate-schema, with 112/114 fully aligned after the changes.

Key changes include:

  • Codec standardization: removes invalid ndjson from all sinks (replaced by codec:json + framing:newline_delimited); adds otlp and syslog to 20+ sinks; establishes the canonical 13-codec list as the default in encodingSchema()
  • Required field corrections: adds encoding to the required arrays of 25 sinks, fixes required fields for kafka (group_id), nats (connection_name), vector source (address), opentelemetry source (grpc/http), throttle (window_secs), and GCP Stackdriver (resource)
  • Correctness fix: renames aws_sns.queue_url → topic_arn; the old name came from a copy-paste bug in Vector's own docs and would have caused invalid YAML generation and deploy failures
  • Default value fix: aws_sqs.visibility_timeout_secs corrected from 30 → 300 seconds, preventing premature message redelivery at production SQS throughput
  • UX — conditional field visibility: adds dependsOn to 6 sources/sinks so address, path, and socket_file_mode show/hide based on the selected mode; the FieldRenderer correctly normalizes value to an array regardless of whether a string or array is passed
  • Deprecation cleanup: removes http.headers, elasticsearch.suppress_type_name, splunk_hec.token, and databend.tls, plus invalid TLS fields from 12 components and other invalid fields from 10 components

Confidence Score: 4/5

  • Safe to merge; two pre-existing concerns flagged in prior review threads remain open but are the only blocking items.
  • The bulk of the PR is a well-audited mechanical schema alignment — codec lists, required arrays, enum fixes — verified against Vector's own schema output. The aws_sns.queue_url → topic_arn fix and the aws_sqs.visibility_timeout_secs default correction are especially valuable correctness fixes. The dependsOn conditional visibility is implemented correctly (FieldRenderer normalizes both string and array values). Score is 4 rather than 5 because the two concerns already raised in prior review comments — store_access_key in aws_kinesis_firehose's required array and requiring both grpc and http on the opentelemetry source — have not yet been addressed.
  • src/lib/vector/schemas/sources/messaging.ts (store_access_key in required) and src/lib/vector/schemas/sources/network.ts (opentelemetry requiring both grpc and http)

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/lib/vector/schemas/sinks/aws.ts | Updated codecs (otlp/syslog added), compression (snappy/zlib/zstd added for most), fixed aws_sns queue_url→topic_arn, added encoding to required arrays for 5 sinks. |
| src/lib/vector/schemas/sinks/network.ts | Removed deprecated headers fields; updated HTTP method enums; added dependsOn for socket sink address/path/send_buffer_bytes; removed ndjson from all network sinks; added encoding to required arrays. |
| src/lib/vector/schemas/sources/local.ts | Removed internal_metrics object and the malformed decodingSchema spread from the file source schema per Vector 0.53.0 spec. |
| src/lib/vector/schemas/sources/messaging.ts | Added group_id/connection_name to required arrays; removed deprecated fields; fixed aws_sqs visibility_timeout_secs default (30→300); added store_access_key to aws_kinesis_firehose required (previously flagged concern). |
| src/lib/vector/schemas/sources/network.ts | Added dependsOn conditional visibility to syslog/socket/fluent/dnstap/statsd; updated HTTP method enums; removed deprecated fields; vector source v1 removed; opentelemetry now requires both grpc and http (previously flagged concern). |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User selects component in Pipeline Editor] --> B[SchemaForm renders configSchema]
    B --> C[FieldRenderer iterates over schema properties]
    C --> D{Field has dependsOn?}
    D -- No --> E[Render field unconditionally]
    D -- Yes --> F[Read parentValues for dependsOn.field]
    F --> G{dependsOn.value\nis array?}
    G -- Yes --> H[Use value as-is]
    G -- No --> I[Wrap string in array]
    H --> J{parentValues includes\ncurrent mode?}
    I --> J
    J -- No --> K[return null — field hidden]
    J -- Yes --> E
    E --> L[Rendered field in config form]

    subgraph "New in this PR: mode-gated fields"
        M["socket sink: address / path / send_buffer_bytes"]
        N["syslog source: address / path / socket_file_mode"]
        O["socket source: address / path / socket_file_mode"]
        P["fluent source: address / path / socket_file_mode"]
        Q["dnstap source: address / socket_path / socket_file_mode"]
        R["statsd source: address / path"]
    end
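
The decision path in the flowchart can be sketched as a single predicate. This is a minimal reading of the diagram, not the actual FieldRenderer code: `isFieldVisible` and the `DependsOn` type are hypothetical names, and the sketch assumes the controlling value is a plain string.

```typescript
type DependsOn = { field: string; value: string | string[] };

// Decide whether a field should render, following the flowchart:
// no dependsOn means always visible; otherwise normalize the allowed
// values to an array and check the controlling field's current value.
function isFieldVisible(
  dependsOn: DependsOn | undefined,
  parentValues: Record<string, unknown>,
): boolean {
  if (dependsOn === undefined) {
    return true;
  }
  const allowed = Array.isArray(dependsOn.value) ? dependsOn.value : [dependsOn.value];
  const current = parentValues[dependsOn.field];
  return typeof current === "string" && allowed.includes(current);
}

const showsAddress = isFieldVisible({ field: "mode", value: ["tcp", "udp"] }, { mode: "udp" });
const showsPath = isFieldVisible({ field: "mode", value: "unix" }, { mode: "udp" });
```

Normalizing to an array first lets schema authors write a single mode as a bare string and several modes as a list, with one membership check covering both.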

Last reviewed commit: d17d194

…tor spec

Deep value-level validation against `vector generate-schema`:

Encoding codecs:
- Remove invalid `ndjson` codec from all sinks and shared.ts default
  (NDJSON in Vector = codec:json + framing:newline_delimited)
- Add missing `otlp` and `syslog` codecs to 20+ sinks
- Standardize all sinks to Vector's canonical 13-codec list

Compression options:
- Add missing algorithms to AWS, GCP, network, and search-db sinks
  (snappy, zlib, zstd where missing)
- Keep Kafka's unique list (lz4 instead of zlib)

Enum value fixes:
- source:http_server method — add DELETE, HEAD, OPTIONS, PATCH
- source:http_client method — add DELETE, OPTIONS, PATCH
- sink:http method — add trace
- sink:clickhouse format — add arrow_stream
- source:host_metrics collectors — add tcp
- source:vector version — remove deprecated v1
- source:docker_logs multiline mode — halt_after → halt_with
- sink:gcp_chronicle region — remove invalid australia-southeast1

Default value fixes:
- source:aws_sqs visibility_timeout_secs — 30 → 300 (10x too low)
@github-actions github-actions bot added fix and removed fix labels Mar 9, 2026
Small ExternalLink icon next to the component name in the detail panel
that opens the Vector documentation for that component type. URL is
computed from kind/type: vector.dev/docs/reference/configuration/{kind}s/{type}/
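
The URL computation described in this commit could look something like the sketch below. The `vectorDocsUrl` name, the `ComponentKind` union, and the `https://` scheme are assumptions; only the path pattern comes from the commit message.

```typescript
// Assumed set of component kinds; the commit message only says "kind".
type ComponentKind = "source" | "transform" | "sink";

// Build the documentation link from kind/type using the pattern
// vector.dev/docs/reference/configuration/{kind}s/{type}/
function vectorDocsUrl(kind: ComponentKind, type: string): string {
  return `https://vector.dev/docs/reference/configuration/${kind}s/${type}/`;
}

const snsDocs = vectorDocsUrl("sink", "aws_sns");
```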
@TerrifiedBug
Owner Author

@greptile review

@TerrifiedBug TerrifiedBug merged commit 23d3f4c into main Mar 9, 2026
10 checks passed
@TerrifiedBug TerrifiedBug deleted the fix/schema-audit-alignment branch March 9, 2026 19:24