New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow building an array_string from a const char* #5

Open
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@matthijskooijman
Copy link

matthijskooijman commented Jan 14, 2018

This adds a public converting constructor that accepts a const char*,
and modifies the existing private converting constructor to simply
accept any parameter type (as long as it supports constexpr indexing, it
should work).

By adding this, you can do more complicated string processing in constexpr functions, and then store the result in a new char array using this new constructor. This allows, for example, to implement a compile-time basename function (see my blog for an example: https://www.stderr.nl/Blog/Software/Cpp/CompiletimeBasename.html).

Allow building an array_string from a const char*
This adds a public converting constructor that accepts a const char*,
and modifies the existing private converting constructor to simply
accept any parameter type (as long as it supports constexpr indexing, it
should work).
@akrzemi1

This comment has been minimized.

Copy link
Owner

akrzemi1 commented Jan 15, 2018

I am not comfortable providing a constructor from a pointer. Instead, I have added function offset_literal, which should allow you to acheive the same goal. Here is an illustration: https://wandbox.org/permlink/ztFuvdg4LbADSgdU

Would that be satisfactory for your use case?

@matthijskooijman

This comment has been minimized.

Copy link

matthijskooijman commented Jan 17, 2018

I am not comfortable providing a constructor from a pointer.

Why not? In a constexpr context, AFAICS the compiler will perform static bounds checks so this should be safe?

What I don't like about your suggested solution is that:

  • I need to repeat the string literal twice (though I guess that could be rewritten to store and pass a string_literal instead)
  • The return value of compute_basename_start is no longer self-contained, but is only valid in a specific context tied to the string it operates on, and the operation that still needs to be performed (offseting).

What I liked about the const char* approach is that the basename and strrchr implementations are quite clean and return a self-contained result, and only at the end do some slightly less elegant template strlen stuff.

Ideally, you could do more kinds of processing (also replacing characters in a string, or more generic substring operations, etc.) in the same way, but I now see that the const char* doesn't quite allow this either (since you can't change characters inside a const char*, you would need to convert to an array_string, but you can only do that conversion when you actually know how long the string will be, which requires knowledge of the full starting string, which you can only do at the final end of the operation, not halfway relying on a string passed in as a parameter).

Carrying this thought further, it seems you would get more flexibility if you interpret the N not as the actual string size, but as the maximum size (in other words, if you allow NUL bytes inside the array_string buffer). Then you can write your string processing functions to pass around and return array_string objects, possibly stuffed with zeroes at the end. You can implement a substring and a "replace character" that simply return a new array_string and all kinds of other operations on top of these. At the top level, you can shrink the resulting string by calling strlen() on the result and passing that as the new N parameter, similar to the example in my code.

One thing you cannot really implement in this way is string operations that increase the string length in a way that is not bounded by type parameters (e.g. when the resulting length depends on string contents and a upper bound cannot be selected statically), but in these cases you can probably explicitly specify an upper bound, and raise a compile-time error when the bound is exceeded, which should probably be sufficient.

How does that sound?

@akrzemi1

This comment has been minimized.

Copy link
Owner

akrzemi1 commented Jan 17, 2018

I am not comfortable providing a constructor from a pointer.

Why not? In a constexpr context, AFAICS the compiler will perform static bounds checks so this should be safe?

When you pass me a pointer, it is just a pointer. I have no guarantee if it points to a null-terminated char sequence. I cannot apply some validations. Also, I cannot statically figure out the size.

What I don't like about your suggested solution is that:

  • I need to repeat the string literal twice (though I guess that could be rewritten to store and pass a string_literal instead)
  • The return value of compute_basename_start is no longer self-contained, but is only valid in a specific context tied to the string it operates on, and the operation that still needs to be performed (offseting).

I appreciate why this is a problem. Let me think about a satisfactory solution.

How does that sound?
If you mean that I can add more ops, then yes, but I would rather add them selectively, as needs arise and are reported to me.

If you are referring to the limitations of this implementation, then yes, it is not a full static string processing: only the sizes of strings.

@akrzemi1

This comment has been minimized.

Copy link
Owner

akrzemi1 commented Jan 17, 2018

Maybe what you need is a C++17 string_view:

#include <experimental/string_view>
#include <iostream>

using std::experimental::string_view;
constexpr string_view s {"/user/me/project/file.cpp"};
constexpr auto i = s.find_last_of("/");
constexpr auto j = i == s.npos ? 0 : i + 1;
constexpr string_view fname = s.substr(j);

int main()
{
  std::cout << fname << std::endl;
}

Here's a live example: https://wandbox.org/permlink/0kNenlD54cvtZVOf

@matthijskooijman

This comment has been minimized.

Copy link

matthijskooijman commented Jan 23, 2018

string_view is certainly interesting. For my particular usecase (basename) it suffices, except that I would still need your static_string class to allocate a new array containing just file.cpp. The code you wrote above will allocate the full /user/me/project/file.cpp in the .elf file, and then refer to just a part of it (so you would save the runtime cost, but not the memory cost).

Furthermore, string_view only handles contiguous subsets of strings. For complete string processing, you would also need concatenation (I guess you can pretty much build anything else from those two), but concatenation does require allocating a new buffer as your array_string does now (or keep two pointers to the concatenated strings, but then each concatenating changes the type of the resulting string, which might be problematic when the concatenation is data-dependent).

@matthijskooijman

This comment has been minimized.

Copy link

matthijskooijman commented Feb 6, 2018

I translated my ideas into some code, since that is often easier to talk about. See the last commit I pushed to this PR.

I essentially changed the meaning of the N template parameter, so it now means "The size of the storage", instead of "The length of the string". The length of the string can be smaller. In the case of a literal_ref, the offset and actual string length is stored explictly, in the case of the char_array, it is deduced using a static version of strlen. The actual string length is returned by len() in both cases.

I also added a substr() implementation for both string versions, which allows generating these "shorter-than-N" strings. Using this substr will generate strings that are shortened, but the storage they point to will still take up the full N bytes. To shrink this storage, the actual string length must be passed as a template parameter, so that can only happen at the outer level of processing, where the result is available in a constexpr variable (and specifically can not happen with strings that are available as arguments to constexpr functions, since those are not guaranteed to be constexpr). e.g. to shrink an array_string in the file variable:

    constexpr auto small_file = sstr::array_string<file.len()>(file);

I also added the static_basename code to the test file, which can now be written in a fairly natural way (it would be even better in C++14).

Some remarks regarding the code:

  • When concatenating or converting strings, a int sequence was used. Since the actual string length is now no longer a template parameter, we cannot generate an int sequence based on the actual string length. Instead, I'm still generating int sequences based on the full storage size (N), but introduced a concat_elem helper that figures out how many elements to use from the first string, when to switch to the second string, and fills up the rest of the storage with zeroes.
  • For the converting constructor, I would need something similar (to use up to len() bytes from the converted string and fill the rest with zeroes). I solved this quick and dirty by reusing the concatenating private constructor and concatenating a zero-length string. This probably needs to be solved more elegantly.
  • I also added a converting constructor for array_string that can expand or shrink the storage size N. Expanding is needed when processing a string where you conditionally concatenate two strings: the resulting types will be different depending on wether the concatenation happened or not. By allowing the storage to expand, you can always return a value of the bigger type. Shrinking is needed to shrink the storage by passing an explicit length template parameter as shown above, and has an assert in place to make sure you won't shrink to smaller than the actual string length.
  • I did not add a converting constructor that changes N for literal_ref, since that would further change the meaning of N from "the size of the storage pointed to", to "An upper bound on the actual string length". Changing this might be ok, and is needed for some string processing (e.g. when choosing one of two strings with different lengths to return, you need to unify both into a single type that will hold both references), but I haven't done this yet.
  • When using substr() on a literal_ref, the length of the string is separately stored, but the NUL-termination is not changed. This means that when calling c_str() on such a string you will see the original NUL-terminator (i.e. any suffix removed using substr() will be present again). I don't think this can be fixed, other than forcing the substr() return value into an array_string (either directly, or letting the caller do that before calling c_str()).
  • I added a new len() method, to return the actual string length, while keeping the size() method to return the value of the storage (e.g. of N). I'm not sure if having both is really meaningful, though.
  • I used concatenation with an empty string to quickly convert a literal_ref into an array_string, just for convenience.
  • Having to support both literal_ref and char_array is a bit cumbersome. I now implemented two versions of substr(), and my static_basename also uses templates to support both kinds of string, but it also needs a decltype for the return type since the result of substr() depends on the type it is called upon. For this kind of string processing, it is probably a lot more convenient to only use the array_string version, which should work in all cases where the literal_ref also works (and with a proper constructor, you can convert a string literal into an array_string directly). The downside is probably performance: Allocating a new array all the time can be more work than just passing around references, but since this is all at compiletime, that's probably not really a big issue. A related issue is that string literals can share storage, while the array_string storage is probably not shared (unless the linker is really smart). However, once you do any string concatenation, you would lose sharing anyway, and when you're only doing substring-processing, you likely want to shrink the storage from the original anyway.

I guess the last point would be the biggest point to discuss. I think that to conveniently support generic string processing, the literal_ref string should be removed, but I can imagine you want to keep it in, especially in reference to your blogpost about it. Or do you have other compelling reasons to keep it?

@akrzemi1

This comment has been minimized.

Copy link
Owner

akrzemi1 commented Feb 6, 2018

Thank you for the submission. When I have some more spare time, I will analyze this in detail. I understand that the main idea is to track the "capacity" statically, and "length" dynamically.

@matthijskooijman

This comment has been minimized.

Copy link

matthijskooijman commented Mar 28, 2018

@akrzemi1, did you perhaps find some time to look at this yet? No pressure, just a gentle reminder :-)

@akrzemi1

This comment has been minimized.

Copy link
Owner

akrzemi1 commented Mar 28, 2018

@matthijskooijman: I am sorry but I had to put it in queue due to other commitments and personal life. I did look at it. I think I know how to handle it (provide three specializations of the template: for references to static array, for copied array, and for copied array with fixed capacity but dynamic length). But in honesty, it will take me some time to do this. Sorry for the delay.

@akrzemi1

This comment has been minimized.

Copy link
Owner

akrzemi1 commented Apr 25, 2018

I have implemented part of your proposed change: you can take suffixes but not arbitrary substrings.

constexpr auto fpath = sstr::literal("path/fname");
constexpr auto fname = sstr::suffix(fpath, 5);
// decltype(fname) is sstr::string_literal_suffix<10>
// fname.size() == 5

constexpr auto full_fname = fname + ".txt";
// decltype(fname) is sstr::array_string_suffix<10 + 4>
// fname.size() == 9

std::cout << full_fname << std::endl;
// outputs: "fname.txt"

I think it should handle your use case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment