Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle BOM in the beginning of the script #439

Merged
merged 27 commits into from May 26, 2018
Merged

Handle BOM in the beginning of the script #439

merged 27 commits into from May 26, 2018

Conversation

AlekMosingiewicz
Copy link
Contributor

Issue this pull request references: #436

Changes proposed in this pull request

  • detect BOM in the same way a symbolic link is detected before the parsing begins and skip it


chai.eval("def func() { return \"Hello World\"; };");

CHECK(chai.eval<std::string>("\xef\xbb\xbf(func())") == "Hello World");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this whole script should be inline (i.e. don't use func()), since you want to test the script running, not the function call working. Also you should probably remove line 359 as well.

@codecov-io
Copy link

codecov-io commented May 13, 2018

Codecov Report

Merging #439 into develop will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #439      +/-   ##
===========================================
+ Coverage    72.05%   72.09%   +0.04%     
===========================================
  Files           59       59              
  Lines        10884    10912      +28     
===========================================
+ Hits          7842     7867      +25     
- Misses        3042     3045       +3
Impacted Files Coverage Δ
include/chaiscript/language/chaiscript_parser.hpp 98.27% <100%> (-0.24%) ⬇️
include/chaiscript/language/chaiscript_engine.hpp 93.75% <100%> (+0.38%) ⬆️
unittests/compiled_tests.cpp 92.2% <100%> (+0.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 062f821...b3f77f0. Read the comment docs.

@@ -52,6 +52,7 @@


#include "../dispatchkit/exception_specification.hpp"
#include "chaiscript_parser.hpp"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not sure it's a good idea to pull in the parser for this very simple test one could easily inline, especially considering it's only used in one place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, I originally intended it to be done differently... I've refactored the code and removed the include.

{
chaiscript::ChaiScript_Basic chai(create_chaiscript_stdlib(),create_chaiscript_parser());
CHECK_THROWS_AS(chai.eval<std::string>("\xef\xbb\xbfprint \"Hello World\""), chaiscript::exception::eval_error);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs a test case for other binary/non-ANSI garbage in the input at random positions (beginning, middle, end).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left just one non-ASCII character and moved it to the middle of the string.

@lefticus
Copy link
Member

We're getting an inexplicable error on Visual C++ - can you make sure you're up to date with the latest develop and see if the error persists? Thanks

@AlekMosingiewicz
Copy link
Contributor Author

I've merged the latest changes from the upstream and pushed to my branch. If it doesn't help, I'll try to compile my project on VS and see what happens ( I'm on G++ right now).

infile.read(&v[0], static_cast<std::streamsize>(bytes_needed));
std::string buffer_string(v.begin(), v.end());

if ((buffer_string.size() > 2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement will always be true, since the vector always contains bytes_needed elements. In addition, this should use bytes_needed rather than a constant.

Overall I'd just check whether the stream has reached the end of file or not.

infile.seekg(0, std::ios::beg);

assert(size >= 0);

if (skip_bom(infile)) {
size-=3; // decrement the BOM size from file size, otherwise we'll get parsing errors
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably do another asset(size >= 0); here to verify there's actually something beyond the BOM.

bool SkipWS(bool skip_cr=false) {
bool retval = false;

while (m_position.has_more()) {
if(static_cast<size_t>(*m_position) > 0x7e) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast looks wrong to me. Why no unsigned char instead?

TEST_CASE("Non-ASCII characters in string")
{
chaiscript::ChaiScript_Basic chai(create_chaiscript_stdlib(),create_chaiscript_parser());
CHECK_THROWS_AS(chai.eval<std::string>("prin\xeft \"Hello World\""), chaiscript::exception::eval_error);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You misunderstood me. I think there should be at least three test cases, I could even imagine five:

  • Read and run a UTF-8 BOM file – no exception, normal execution
  • Run a string with UTF-8 BOM – exception? could need others' opinion on this one I guess
  • Run a string with an invalid character at the start – exception
  • Run a string with an invalid character in the middle – exception
  • Run a string with an invalid character at the end – exception

std::vector<char> v(bytes_needed);

infile.read(&v[0], static_cast<std::streamsize>(bytes_needed));
std::string buffer_string(v.begin(), v.end());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step feels completely unnecessary, considering the comparison happens on a character by character level.

infile.read(&v[0], static_cast<std::streamsize>(bytes_needed));
std::string buffer_string(v.begin(), v.end());

if (!infile.eof()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about this check, the less sense it makes. Right now the BOM wouldn't be skipped if the file includes only the BOM (I think; not 100% sure if the eof bit is set already).

Either way, I think the better approach would be to allocate something like char header[3] = {0};, read the first 3 bytes from the stream and just compare them 1:1 with the BOM sequence, not even testing the length/position (since failed reads would result in \0).

@lefticus
Copy link
Member

Thank you both for the diligence on this. Once you have something sorted out that you like, if you're still having problems with MSVC, I'll debug that bit.

std::vector<char> v(bytes_needed);
std::streamsize bytes_needed = 3;
std::streamsize bytes_read = 0;
char buffer[3] = { '\0' };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to initialize the array elements to \0 if you check length anyway. This was more an idea in case you just do the three comparisons only and skip the length check.

But besides that, nothing more to comment on. :)

@lefticus
Copy link
Member

FYI @AlekMosingiewicz and @MarioLiebisch I've just narrowed down why the tests are failing on Appveyor - it is because of an upgrade to cmake. I'll get it fixed shortly.

@lefticus
Copy link
Member

OK, Appveyor builds are officially fixed now! If you are both happy with this I'll merge it in. I'd like to see this before the next release, and I'd like to make the next release very soon.

@AlekMosingiewicz
Copy link
Contributor Author

I've nothing to add, only waiting for @MarioLiebisch opinion

@MarioLiebisch
Copy link

Ditto, fine with me.

@lefticus
Copy link
Member

Ok, I'm going to merge and run through some fuzz testing, since this is hitting the top of the parser.

@lefticus lefticus merged commit 61dfb22 into ChaiScript:develop May 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants