New compiler: Words() produces empty result for 2-char [A-Z][a-z] enum values, causing false JSON name collisions #4475

@ido-lazer-cogntiv

GitHub repository with your minimal reproducible example (do not leave this field blank or fill out this field with "github.com/bufbuild/buf" or we will automatically close your issue, see the instructions above!)

https://github.com/ido-lazer-cogntiv/buf-json-name-collision-repro

Commands

buf build .

Output

example.proto:8:3:enum values have the same JSON name

Expected Output

Build succeeds (as it does on buf <= 1.67.1 and on protoc).

Anything else?

Bug

Starting with buf v1.68.0 (the new compiler), buf build and buf breaking fail on proto3 enums containing two-character values matching [A-Z][a-z] (e.g. Ab, Xq, Yz).

The error is:

enum value Ab has the same JSON name "" as enum value Xq

Reproduction

syntax = "proto3";
package example.v1;

enum Foo {
  FOO_UNSPECIFIED = 0;
  Ab = 1;
  Xq = 2;
}

# Fails on buf >= 1.68.0
buf build .

# Error:
# example.proto:6:3: enum value Ab has the same JSON name "" as enum value Xq

This works fine on buf 1.67.1 and on protoc.

Root cause

The Words() function in bufbuild/protocompile/internal/cases/words.go returns an empty list for any 2-character string matching [A-Z][a-z].

Tracing the algorithm for "Ab":

  1. i=0, next='A': No switch case matches. Sets prev='A', first=false.
  2. i=1, next='b': Two conditions are simultaneously true: unicode.IsUpper(prev) && unicode.IsLower(next) and str == "" (last rune). But the upper+lower case is checked first in the switch.
  3. The upper+lower case inserts a word boundary before prev (index 0), producing word = input[:0] = "" — empty, so it's discarded. It then sets input = input[0:].
  4. The loop increments i to 2, exits the for loop, never reaching the str == "" (last rune) handler.
  5. Result: Words("Ab") returns [] (empty list).
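
The trace above can be replayed with a small model. This is a simplified sketch written for this report, not the actual code from words.go: the function name wordsModel and its variables are hypothetical, and it models only the two switch cases mentioned in the trace (upper followed by lower, and last rune), in the order described.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// wordsModel is a hypothetical, simplified model of the loop described in
// the trace. It is NOT the protocompile implementation; it only mirrors the
// case ordering that the trace relies on.
func wordsModel(s string) []string {
	var out []string
	start := 0     // byte offset where the pending word begins
	prevStart := 0 // byte offset of the previous rune
	var prev rune
	first := true

	for i, next := range s {
		last := i+utf8.RuneLen(next) == len(s)
		switch {
		case !first && unicode.IsUpper(prev) && unicode.IsLower(next):
			// Word boundary before prev. For "Ab" this slices s[0:0],
			// which is empty and therefore discarded.
			if w := s[start:prevStart]; w != "" {
				out = append(out, w)
			}
			start = prevStart
		case last:
			// Last-rune handler: flush the pending word. It is skipped
			// whenever the case above matched on the same iteration.
			if w := s[start:]; w != "" {
				out = append(out, w)
			}
		}
		prev, prevStart, first = next, i, false
	}
	return out
}

func main() {
	for _, in := range []string{"Ab", "Xq", "Abc"} {
		fmt.Printf("%q -> %q\n", in, wordsModel(in))
	}
}
```

In this model "Ab" and "Xq" produce no words, while "Abc" produces the single word "Abc", because the last-rune case gets its own iteration at i=2.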

Since the enum JSON name is computed from Words(), every such value gets the empty JSON name "", so any two of them in the same enum collide with each other.
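
To illustrate why an empty result turns into a collision error, here is a minimal sketch of a duplicate-name check. Both function names are hypothetical: buggyJSONName is a stand-in that returns "" for the affected two-character names and the name itself otherwise (it does not reproduce buf's real JSON name computation), and findCollisions is an assumed, generic grouping helper rather than the compiler's actual check.

```go
package main

import "fmt"

// buggyJSONName is a hypothetical stand-in for the JSON name computation.
// It returns "" for 2-character [A-Z][a-z] names, as described in this
// report, and returns every other name unchanged (an assumption made only
// to keep the sketch small).
func buggyJSONName(name string) string {
	if len(name) == 2 &&
		name[0] >= 'A' && name[0] <= 'Z' &&
		name[1] >= 'a' && name[1] <= 'z' {
		return ""
	}
	return name
}

// findCollisions groups enum value names by computed JSON name and keeps
// only the groups that contain more than one value.
func findCollisions(values []string, jsonName func(string) string) map[string][]string {
	byName := make(map[string][]string)
	for _, v := range values {
		k := jsonName(v)
		byName[k] = append(byName[k], v)
	}
	for k, vs := range byName {
		if len(vs) < 2 {
			delete(byName, k) // deleting during range is safe in Go
		}
	}
	return byName
}

func main() {
	values := []string{"FOO_UNSPECIFIED", "Ab", "Xq"}
	for name, vs := range findCollisions(values, buggyJSONName) {
		fmt.Printf("JSON name %q is shared by %v\n", name, vs)
	}
}
```

With the enum from the reproduction, the only colliding group in this sketch is the one keyed by the empty string, containing Ab and Xq.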

For comparison, Words("Abc") (3 chars) works correctly because after the upper+lower case fires at i=1, the loop continues to i=2 where the last-rune handler yields the full word.

Affected values

Any enum value that is exactly 2 characters matching [A-Z][a-z]. For example: Ab, Xq, Yz, Hu, Fn, etc.
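
For anyone who wants to scan their own enum value names for this pattern, a quick check could look like the sketch below. The helper name isAffected is hypothetical, and the pattern is the ASCII-only [A-Z][a-z] from this report; whether non-ASCII upper/lower pairs are also affected is not verified here.

```go
package main

import (
	"fmt"
	"regexp"
)

// affectedPattern matches names that are exactly one ASCII uppercase letter
// followed by one ASCII lowercase letter, per the pattern in this report.
var affectedPattern = regexp.MustCompile(`^[A-Z][a-z]$`)

// isAffected reports whether an enum value name matches the pattern that
// this report identifies as triggering the empty Words() result.
func isAffected(name string) bool {
	return affectedPattern.MatchString(name)
}

func main() {
	for _, name := range []string{"FOO_UNSPECIFIED", "Ab", "Xq", "Abc", "AB"} {
		fmt.Printf("%-16s affected=%v\n", name, isAffected(name))
	}
}
```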

Environment

  • buf v1.68.1 (also v1.68.0) — affected
  • buf v1.67.1 — works fine
  • protoc v24.4 — works fine
