New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle BOM in the beginning of the script #439
Handle BOM in the beginning of the script #439
Conversation
unittests/compiled_tests.cpp
Outdated
|
||
chai.eval("def func() { return \"Hello World\"; };"); | ||
|
||
CHECK(chai.eval<std::string>("\xef\xbb\xbf(func())") == "Hello World"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this whole script should be inline (i.e. don't use func()
), since you want to test the script running, not the function call working. Also you should probably remove line 359 as well.
Codecov Report
@@ Coverage Diff @@
## develop #439 +/- ##
===========================================
+ Coverage 72.05% 72.09% +0.04%
===========================================
Files 59 59
Lines 10884 10912 +28
===========================================
+ Hits 7842 7867 +25
- Misses 3042 3045 +3
Continue to review full report at Codecov.
|
@@ -52,6 +52,7 @@ | |||
|
|||
|
|||
#include "../dispatchkit/exception_specification.hpp" | |||
#include "chaiscript_parser.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure it's a good idea to pull in the parser for this very simple test one could easily inline, especially considering it's only used in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, I originally intended it to be done differently... I've refactored the code and removed the include.
unittests/compiled_tests.cpp
Outdated
{ | ||
chaiscript::ChaiScript_Basic chai(create_chaiscript_stdlib(),create_chaiscript_parser()); | ||
CHECK_THROWS_AS(chai.eval<std::string>("\xef\xbb\xbfprint \"Hello World\""), chaiscript::exception::eval_error); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needs a test case for other binary/non-ANSI garbage in the input at random positions (beginning, middle, end).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left just one non-ASCII character and moved it to the middle of the string.
We're getting an inexplicable error on Visual C++ - can you make sure you're up to date with the latest |
I've merged the latest changes from the upstream and pushed to my branch. If it doesn't help, I'll try to compile my project on VS and see what happens ( I'm on G++ right now). |
infile.read(&v[0], static_cast<std::streamsize>(bytes_needed)); | ||
std::string buffer_string(v.begin(), v.end()); | ||
|
||
if ((buffer_string.size() > 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement will always be true, since the vector always contains bytes_needed
elements. In addition, this should use bytes_needed
rather than a constant.
Overall I'd just check whether the stream has reached the end of file or not.
infile.seekg(0, std::ios::beg); | ||
|
||
assert(size >= 0); | ||
|
||
if (skip_bom(infile)) { | ||
size-=3; // decrement the BOM size from file size, otherwise we'll get parsing errors | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably do another asset(size >= 0);
here to verify there's actually something beyond the BOM.
bool SkipWS(bool skip_cr=false) { | ||
bool retval = false; | ||
|
||
while (m_position.has_more()) { | ||
if(static_cast<size_t>(*m_position) > 0x7e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cast looks wrong to me. Why no unsigned char
instead?
unittests/compiled_tests.cpp
Outdated
TEST_CASE("Non-ASCII characters in string") | ||
{ | ||
chaiscript::ChaiScript_Basic chai(create_chaiscript_stdlib(),create_chaiscript_parser()); | ||
CHECK_THROWS_AS(chai.eval<std::string>("prin\xeft \"Hello World\""), chaiscript::exception::eval_error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You misunderstood me. I think there should be at least three test cases, I could even imagine five:
- Read and run a UTF-8 BOM file – no exception, normal execution
- Run a string with UTF-8 BOM – exception? could need others' opinion on this one I guess
- Run a string with an invalid character at the start – exception
- Run a string with an invalid character in the middle – exception
- Run a string with an invalid character at the end – exception
std::vector<char> v(bytes_needed); | ||
|
||
infile.read(&v[0], static_cast<std::streamsize>(bytes_needed)); | ||
std::string buffer_string(v.begin(), v.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This step feels completely unnecessary, considering the comparison happens on a character by character level.
infile.read(&v[0], static_cast<std::streamsize>(bytes_needed)); | ||
std::string buffer_string(v.begin(), v.end()); | ||
|
||
if (!infile.eof() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about this check, the less sense it makes. Right now the BOM wouldn't be skipped if the file includes only the BOM (I think; not 100% sure if the eof bit is set already).
Either way, I think the better approach would be to allocate something like char header[3] = {0};
, read the first 3 bytes from the stream and just compare them 1:1 with the BOM sequence, not even testing the length/position (since failed reads would result in \0
).
Thank you both for the diligence on this. Once you have something sorted out that you like, if you're still having problems with MSVC, I'll debug that bit. |
detect BOM in processed file.
anything inside it.
std::vector<char> v(bytes_needed); | ||
std::streamsize bytes_needed = 3; | ||
std::streamsize bytes_read = 0; | ||
char buffer[3] = { '\0' }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to initialize the array elements to \0
if you check length anyway. This was more an idea in case you just do the three comparisons only and skip the length check.
But besides that, nothing more to comment on. :)
This reverts commit 0e964da.
FYI @AlekMosingiewicz and @MarioLiebisch I've just narrowed down why the tests are failing on Appveyor - it is because of an upgrade to cmake. I'll get it fixed shortly. |
OK, Appveyor builds are officially fixed now! If you are both happy with this I'll merge it in. I'd like to see this before the next release, and I'd like to make the next release very soon. |
I've nothing to add, only waiting for @MarioLiebisch opinion |
Ditto, fine with me. |
Ok, I'm going to merge and run through some fuzz testing, since this is hitting the top of the parser. |
Issue this pull request references: #436
Changes proposed in this pull request