
@WZhuo
Contributor

@WZhuo WZhuo commented Dec 30, 2025

The URL encode and decode functions will be used to generate partition paths, so this moves them from the REST catalog to the common util for later reuse.

@WZhuo WZhuo changed the title refactor: move url encoder from rest catalog to util for common use feat: a simple implement of url encoder Dec 31, 2025
Member

@wgtmac wgtmac left a comment


Putting all my comments together, the code might look like this (disclaimer: Gemini wrote it):

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Assumes UrlEncoder is declared in its header with static
// Encode(std::string_view) and Decode(std::string_view) members.

namespace {

// RFC 3986 unreserved characters are never percent-encoded.
bool IsUnreserved(unsigned char c) {
  return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         c == '-' || c == '.' || c == '_' || c == '~';
}

// Maps the low 4 bits of v to a lowercase hex digit.
char ToHex(unsigned char v) {
  static constexpr std::array<char, 16> kHexChars = {
      '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
  return kHexChars[v & 0x0F];
}

// Returns the value of a hex digit, or -1 if c is not a hex digit.
int FromHex(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

}  // namespace

std::string UrlEncoder::Encode(std::string_view str_to_encode) {
  std::string escaped;
  escaped.reserve(str_to_encode.size() * 3 / 2);  // Heuristic reservation

  for (unsigned char c : str_to_encode) {
    if (IsUnreserved(c)) {
      escaped += static_cast<char>(c);
    } else {
      // Emit %XX: high nibble first, then low nibble.
      escaped += '%';
      escaped += ToHex(c >> 4);
      escaped += ToHex(c);
    }
  }
  return escaped;
}

std::string UrlEncoder::Decode(std::string_view str_to_decode) {
  std::string result;
  result.reserve(str_to_decode.size());

  for (size_t i = 0; i < str_to_decode.size(); ++i) {
    char c = str_to_decode[i];
    if (c == '%' && i + 2 < str_to_decode.size()) {
      int h1 = FromHex(str_to_decode[i + 1]);
      int h2 = FromHex(str_to_decode[i + 2]);

      if (h1 >= 0 && h2 >= 0) {
        result += static_cast<char>((h1 << 4) | h2);
        i += 2;  // Skip the two hex digits just consumed.
      } else {
        result += c;  // Malformed escape: keep the '%' literally.
      }
    } else if (c == '+') {
      result += ' ';  // Form-style decoding: '+' means space.
    } else {
      result += c;
    }
  }

  return result;
}
```
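The suggested `Decode` is lenient: a `%` that is not followed by two hex digits (for example `%zz`, or a trailing `%`) is copied through unchanged, and `+` becomes a space. If a caller needs to reject malformed input instead, one option is a strict variant that returns `std::optional`. A minimal sketch; `StrictDecode` and `HexValue` are hypothetical names, and this variant deliberately leaves `+` alone (path semantics rather than form semantics):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

namespace {

// Returns the value of a hex digit, or -1 if c is not one.
int HexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

}  // namespace

// Hypothetical strict decoder: every '%' must be followed by two hex digits,
// otherwise std::nullopt is returned. '+' is copied as-is.
std::optional<std::string> StrictDecode(std::string_view in) {
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] != '%') {
      out += in[i];
      continue;
    }
    if (i + 2 >= in.size()) return std::nullopt;  // Truncated escape.
    const int hi = HexValue(in[i + 1]);
    const int lo = HexValue(in[i + 2]);
    if (hi < 0 || lo < 0) return std::nullopt;  // Non-hex digit.
    out += static_cast<char>((hi << 4) | lo);
    i += 2;  // Skip the two hex digits just consumed.
  }
  return out;
}
```

Returning `std::nullopt` moves the error handling to the call site; the trade-off is that input the lenient version tolerates, such as a literal `100%`, now fails.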

@WZhuo
Contributor Author

WZhuo commented Jan 5, 2026

Rewrote the code as a translation of curl_easy_escape and curl_easy_unescape.
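For reference, the curl documentation describes `curl_easy_escape` as converting every byte other than `a-z`, `A-Z`, `0-9`, `-`, `.`, `_` and `~` into a `%NN` escape, and `curl_easy_unescape` as converting `%XX` sequences back to bytes (without special-casing `+`). The sketch below mimics the escaping side, using uppercase hex digits (assumed here to match curl's output), and applies it to a partition path. `CurlStyleEscape` and `PartitionPath` are hypothetical names, and the Hive-style `name=value` layout is an assumption; this is not the merged code:

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Sketch of curl_easy_escape-like behavior: bytes in the RFC 3986 unreserved
// set [A-Za-z0-9-._~] are copied, every other byte becomes %XX (uppercase).
std::string CurlStyleEscape(std::string_view in) {
  static constexpr char kHex[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in) {
    const bool unreserved = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
                            (c >= 'A' && c <= 'Z') || c == '-' || c == '.' ||
                            c == '_' || c == '~';
    if (unreserved) {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += kHex[c >> 4];    // High nibble.
      out += kHex[c & 0x0F];  // Low nibble.
    }
  }
  return out;
}

// Hypothetical helper: builds a Hive-style "name=value/name=value" path,
// escaping each name and value so a '/' or '=' inside them stays literal.
std::string PartitionPath(
    const std::vector<std::pair<std::string, std::string>>& fields) {
  std::string path;
  for (const auto& [name, value] : fields) {
    if (!path.empty()) path += '/';
    path += CurlStyleEscape(name);
    path += '=';
    path += CurlStyleEscape(value);
  }
  return path;
}
```

With these definitions, `PartitionPath({{"dt", "2025-12-30"}, {"city", "a/b c"}})` yields `dt=2025-12-30/city=a%2Fb%20c`, so the `/` inside the value can no longer be mistaken for a directory separator. Multi-byte UTF-8 input is escaped byte by byte, e.g. `é` (`0xC3 0xA9`) becomes `%C3%A9`.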

@wgtmac wgtmac changed the title feat: a simple implement of url encoder feat: add simple url encoder & decoder Jan 5, 2026
@wgtmac
Member

wgtmac commented Jan 5, 2026

Thanks @WZhuo for working on this and @HuaHuaY for the review!

@wgtmac wgtmac merged commit 68fe381 into apache:main Jan 5, 2026
10 checks passed

3 participants