Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decoded view #229

Merged
merged 1 commit into from Jul 16, 2022
Merged

decoded view #229

merged 1 commit into from Jul 16, 2022

Conversation

alandefreitas
Copy link
Member

@alandefreitas alandefreitas commented Jul 12, 2022

Context (const_string)

We have been seeing many foundational issues with const_string, including the recent issues #159 and #223. When compared to std::string or std::string_view, its design has been becoming more complex over time (#163, #164) and we are constantly exploring new solutions around it (#112, #127).

Some of its problems can be improved by implementing it as a concrete type with the appropriate operators (#127). However, while recently exploring examples of use cases of URIs (#226, #224), we have identified problems that cannot be solved with these extra operators.

For instance, routers almost never need to allocate memory and convert decoded values to std::string. const_string was forcing the application to perform ~2 or 3 allocations in a context where no allocations were really required, not to mention the lack of ergonomics in handling static_pools to avoid these allocations. Although const_string contains an inline buffer, the user still needs to handle these potential allocations.

Another smaller problem is mentioned in the mailing list. A url_view has functions such as url_view::XXXXXX() and url_view::encoded_XXXXXX() for all URI components, where the default function url_view::XXXXXX() is the expensive one. Some people have found this counterintuitive. Other people have considered only using the functions for the encoded components, which could defy the purpose of the library to a large extent.

This PR (decoded_view)

This PR introduces a "transform-view" type for percent decoded strings. Thus, all pairs of functions for URI components (url_view::XXXXXX() and url_view::encoded_XXXXXX()) would return cheap reference types. The user can use both functions without worrying about unnecessary memory allocations.

In cases where memory allocation is required, the allocation is deferred until the user really needs it. The user has total control over how the memory will be allocated. It also avoids an extra allocation when const_string would be converted to std::string and avoids all allocations when the decoded string is only required for comparisons and checking conditions. When the user needs common conversions, such as std::string, helper functions such as to_string() are provided for new and recycled objects.

The view type has the usual string operators, and should be almost as convenient as a string_view. The main difference is iterators are bidirectional, so operator[] is not available. One would need to advance the iterator or allocate memory. In practice, we don't need operator[] for many common use cases, like comparing segments or query values.

A summary of the allowed operations:

  • Iterators: begin, cbegin, end, cend, rbegin, crbegin, rend, crend
  • Access: front, back, encoded_data
  • Capacity: size, length, encoded_size, encoded_length, max_size, empty
  • Operations: copy, compare, to_string, append_to
  • Conversions: explicit std::basic_string
  • Non-member: All operators, oeprator<<(ostream, decoded_view)

About the operations:

  • Assignment can construct new objects or reuse recycled objects.
  • Appending can avoid temporary values.
  • Comparisons don't require any allocations.
  • Adding other operations that exist in string_view are possible with the same time complexity: starts_with, ends_with, contains, find, find_*
  • Adding other operations that exist in string_view are possible with the linear time complexity, which would still be faster than allocating memory: substr, remove_prefix, remove_suffix

Tests

Besides testing all member functions, as usual, the tests include use cases people have brought up (testConversion() and testAppend()), snippets that came up in the examples (mostly in testCompare()), and use cases analogous to all examples in the Tony Table of PR #127 (testPR127Cases()).

Naming

The initial name for this class is decoded_view.

  • The suffix view indicates it doesn't own the data. The suffix range could denote both views and containers, which makes the suffix range useless.

  • The prefix decoded is short and, in the context, is enough to convey that the string is percent decoded. So this avoids very long unergonomic names such as percent_decoded_range or even pct_decoded_range.

@alandefreitas
Copy link
Member Author

@akrzemi1, would you like to chip in?

namespace boost {
namespace urls {

/** A view on a string as percent decoded characters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs work

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've improved this (locally for now) but I'll let you let me know when we can resolve this.

return n_ == 0;
}

/** Copies characters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs work

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've improved this (locally for now) but I'll let you let me know when we can resolve this.

@alandefreitas alandefreitas force-pushed the decoded_view branch 2 times, most recently from 667063a to e18e113 Compare July 15, 2022 19:47
@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://229.url.prtest.cppalliance.org/libs/url/doc/html/index.html

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://229.url.prtest.cppalliance.org/libs/url/doc/html/index.html

namespace detail
{
template<class T, class = void>
struct has_value_type : std::false_type {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could have done this all in is_mutable_string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do that but I couldn't because of some other implementation details.

detail::has_value_type<T>::value &&
detail::has_assign_and_append<T, I>::value
>::type>
: std::is_same<typename T::value_type, char>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you can do better than this :) do it all in is_mutable_string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything was in is_mutable_string, but I didn't manage to include the value_type requirement without splitting it.

template<class T, class I>
using is_mutable_string = __see_below__;
#else
namespace detail
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't put detail stuff in headers that docca sees

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the detail namespace was excluded in Jamfile. In any case, this is ifdefed out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the detail/ directory is excluded by the jamfile but there's no way to exclude the namespace from being parsed. docca throws them out, but it still has to lexically analyze it. Better that doxygen never sees it, so that the docs generate faster.

struct MutableString
{
template< class InputIt >
MutableString& assign( InputIt first, InputIt last );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the requirement that the function return T&

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced MutableString& with void. But that would imply the requirement is void. Both are wrong, but MutableString& is more common although void might be safer (? not sure).

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://229.url.prtest.cppalliance.org/libs/url/doc/html/index.html

@sehe
Copy link

sehe commented Jul 15, 2022

In fact there is no such thing as a "percent-decoded string" - this is just a plain string (here)

You could also say there is no plain string (there's only a view over pct-encoded storage). Again, I think "[pct_]decoding_view" would be clear? UPDATE decode_view works as well (HT @alandefreitas)

W.r.t. naming, I agree that it's never going to read like poetry, and that's probably fine. I suppose _view suffixed will go over better with reviewers.

@cppalliance-bot
Copy link

An automated preview of the documentation is available at https://229.url.prtest.cppalliance.org/libs/url/doc/html/index.html

@alandefreitas alandefreitas merged commit 1d1a09f into boostorg:develop Jul 16, 2022
@alandefreitas alandefreitas deleted the decoded_view branch August 3, 2022 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants