Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New feature: validate function for String class. #3096

Closed
wants to merge 2 commits into from

Conversation

Chris--A
Copy link
Contributor

@Chris--A Chris--A commented May 7, 2015

As people can access the internal buffer using various provided methods: str.c_str(), &str[0] It gives reason to believe that someone will modify the string externally, or provide these pointers to functions that do. It makes sense as a user can reserve memory using str.reserve(), or simply overwrite existing data already in the pointer.

Once this happens the String can reflect incorrect values despite having valid data inside it.

Take for example, someone buffering a string to receive data provided by some source external to the library.

  String result;
  result.reserve( 32 ); //Buffer enough space for the String data.

  I2C_EEPROM.get( address, result.c_str(), 32 );  //'get' the data directly into the String buffer.

  Serial.print( result ); //Prints an empty string.

This problem is solved with this PR:

  String result;
  result.reserve( 32 ); //Buffer enough space for the String data.

  I2C_EEPROM.get( address, result.c_str(), 32 );  //'get' the data directly into the String buffer.

  result.validate(); // Allow the String to detect the changed data.

  Serial.print( result ); //The internal 'len' is now correct, and data is printed.

The function simply searches for the null character. There is one parameter called remainder which if set to:

  • true: the function only searches the unused buffer (capacity - len).
  • false: the entire buffer is re-validated.

The default is to validate the entire buffer.

This seems important as the Print library uses the string's internal data to determine length, not the actual null character. If someone was to simply cut a string in half (`str[5] = '\0';) then functions using the null will perform differently to what is seen in debugging (may hinder debugging as Serial.print() is affected).

@ffissore ffissore assigned bugst-bot and cmaglie and unassigned bugst-bot May 7, 2015
@ffissore ffissore added Component: Core Related to the code for the standard Arduino API Architecture: AVR Applies only to the AVR microcontrollers (Uno, etc.) Architecture: SAM Applies only to the SAM microcontrollers (Due) labels May 7, 2015
@matthijskooijman
Copy link
Collaborator

This seems like a useful addition. But is validate() really a useful name? That suggests it returns a validity status, not reinitializes the length. Also, this doesn't really concern valid vs invalid strings (especially since, as far as I'm concerned, strings with embedded nul characters could be considered valid as well).

Since the function essentially updates the length to match the content, assuming a nul-terminated string, resetLength() or detectLength() or reinitializedLength() or findLength() might be more appropriate (though I don't really like any of them much I think). Any other ideas?

@cmaglie
Copy link
Member

cmaglie commented Jan 8, 2016

As far as I can see c_str returns a const char * that cannot (or at least should not) be modified, the following sketch:

void setup() {
  String result;
  result.reserve(32);
  result.c_str()[0] = 'A';
  Serial.print(result);
}

void loop() {
}

gives an:

ketch_jan08a.ino: In function 'void setup()':
sketch_jan08a:5: error: assignment of read-only location '* result.String::c_str()'
   result.c_str()[0] = 'A';
                     ^
exit status 1
assignment of read-only location '* result.String::c_str()'

@matthijskooijman
Copy link
Collaborator

@cmaglie, that's correct, but simply doing (char*)s.c_str() will circumvent that limitation by removing the constness. Also, I suspect that there are other ways to get embedded nul characters, like s.setCharAt(idx, 0) (didn't check if this is allowed, though).

So, I can see that this new method can be useful in some cases (though probably only in unusual cases, that touch the edges of the String object interface), but I still do not believe validate() is the right name for it.

@Chris--A
Copy link
Contributor Author

Chris--A commented Jan 9, 2016

Can we discuss some names, I'll gladly change it to have this feature in.

This finds its usefulness when some task needs to add data to a string. It can use up the available buffer in an efficient single step. For instance, but not limited to passing the String to functions that only accept c-string whether it be strconcat, or a function to read an interface.

That suggests it returns a validity status, not reinitializes the length. Also, this doesn't really concern valid vs invalid strings (especially since, as far as I'm concerned, strings with embedded nul characters could be considered valid as well).

As for the name, validate() seems fine to me. As in validate my String data: myStr.validate().
It is nice and short but still to the point. However, we could try validateContents() or validateData().

Validate is a verb describing an action. Whereas 'reinitialized' or 'validated' is past tense. As in, "Please validate my String" as opposed to "I need the reinitialized Length".

As a validity test, I'd expect to see something like isValid() or isValidated(). IMO plain validate() is asking the String to validate.

resetLength() seems ok, but on closer inspection it seems to imply something is cleared, or undone. detectLength() or reinitializedLength() or findLength() all tell me that they return something, and more so, do not imply that the actual String object is modified.


If you use toCharArray() or pass c_str() to any standard c-string based tool, it is only going to consider it a valid string up to the first null. However unrelated custom functionality could use multiple nulls in the one 'custom-string', or the non-string memory functions like memcpy (but doesn't seem relevant to the idea of a String class which basically wraps a null terminated string).

And there are three ways identified in this issue where a null can be entered in the middle of a string.

  String aStr = "Some text data";

  aStr[1] = '\0';
  aStr.setCharAt(2, '\0');

  char *cstr = (char*) aStr.c_str();
  cstr[3] = '\0';

Cheers

@Chris--A
Copy link
Contributor Author

Just had another question where this would be very beneficial.
http://stackoverflow.com/q/34701335/4057102

Without a feature like this, the String library still needs to be used in conjunction with char arrays, and kind of defeats its 'easy to use' functionality.

@matthijskooijman
Copy link
Collaborator

For instance, but not limited to passing the String to functions that only accept c-string whether it be strconcat, or a function to read an interface.

IMHO, using c_str() to modify the string contents is not actually part of the String interface (as indicated by it returning a const pointer). It will work, if you remove the const, but you'll run into problems such as this one. I guess the "canonical" way to solve this would be to use an external buffer and copy bytes, but that isn't very efficient. I'm not sure if there can be an API that is both elegant and efficient, within the current way String works. Having said that, adding the method you propose would indeed allow people to move outside of the bounds of the API and make their code work, even if it is not entirely elegant.

Regarding the name, I agree that validate has the right verb tense, and reinitalizedLength does not - I think I actually meant reinitializeLength in my previous comment.

However, I do not think the concept of "validating" matches the operation here. Validating, to me, is to check whether something is valid. Validating by itself doesn't change anything about the subject. Even if you'd assume validating means making the string valid, I'm not convinced that the string is invalid when there is an embedded NUL and only valid when the length matches the position of the first NUL. And even if you'd accept that definition of validity and that validate makes the string valid, why would it update length? It might as well remove any embedded NULs, or replace them with spaces, etc.

@cmaglie
Copy link
Member

cmaglie commented Apr 20, 2016

Looking at this again, I still think that changing the string buffer through the c_str() pointer should not be allowed and the API speak for itself by returning a const char *. Of course you can always force the constness limit with a couple of tricks, but the fact remains: this is not the intended usage of the String class.

Moreover, how will you check for boundaries? c_str() returns the pointer to the String buffer but not the current buffer size that may arbitrarily change (depending on the String class implementation).

@Chris--A
Copy link
Contributor Author

@cmaglie

I still think that changing the string buffer through the c_str() pointer should not be allowed.

It is a relatively common method for doing efficient bulk inserts. And allows a String to be compatible with functions and classes that only accept a pointer (and maybe a length).

Moreover, how will you check for boundaries? c_str() returns the pointer to the String buffer but not the current buffer size that may arbitrarily change (depending on the String class implementation).

When using c_str() to insert data into a String, one would naturally use the reserve() function before hand. This reflects functionality that is akin to creating a char array buffer: the size is known.

Besides, if c_str() is not to be used for writing, that is fine. There are still methods to change the string, which are also quite common in answers provided on the forum. Take this for example:

  String s = "A String!";
  char *c = &s[0];

In the code above, there is no trickery as the pointer is not const, and the API intends s[0] to be writable. Also this is the staple method synonymous with creating pointers to arbitrary places in an array:

  char arr[] = "A String\0Another String";

  char *first  = &arr[0];
  char *second = &arr[8];

Essentially the validate() function I proposed is a valuable addition because it allows people to use the String class in a vast range of functionality. Not to mention the advantages that could be added to the API or other libraries.

Take a very common example, i.e. A beginner wants to use something that works with c-strings, not Strings:

If a user is required to modify code they have found, simply so they can use it with their existing code (String use) then it significantly lessens the value of the Arduino API, and essentially, what it was designed for. This is especially important when considering the target demographic:

From the introduction

  • The Arduino software is easy-to-use for beginners.
  • Arduino is an open-source prototyping platform based on easy-to-use hardware and software.

Now I understand that adding an additional function is an increase in complexity. However, it does not have to be used, and will come with documentation and examples where users expect to find them (Arduino reference).

This is opposed to forcing the requirement to learn about the very topics the users avoided by utilizing the String class in the first place. And is not covered in the Arduino reference.

@facchinm facchinm added Print and Stream class The Arduino core library's Print and Stream classes and removed Architecture: AVR Applies only to the AVR microcontrollers (Uno, etc.) Architecture: SAM Applies only to the SAM microcontrollers (Due) labels Jan 24, 2017
@facchinm
Copy link
Member

As pointed out in the comments, the constness of the pointer returned by c_str() disallows any modification of that buffer, and for the principle of least astonishment I'd leave it as is.
Modifying the backing storage can still be done via cast but I can't see much benefits.

@Chris--A I'd close the issue here, if you still want to see this in the core please reopen in https://github.com/arduino/ArduinoCore-api , thanks!

@facchinm facchinm closed this Sep 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Core Related to the code for the standard Arduino API Print and Stream class The Arduino core library's Print and Stream classes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants