Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-1849 Fix C++ schema to JSON converter #97

Closed
wants to merge 9 commits into from

Conversation

Simon24601
Copy link
Contributor

A ValidSchema with no records produced invalid JSON when converting to JSON in C++. This is now fixed.

Copy link
Contributor

@zivanfi zivanfi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Simon,

I noticed your pull request in the issue, it is a pity that it went unnoticed for so long. Thanks for the fix, if you are still willing to go forward with it, please see the suggestion in my comment.

Thanks,

Zoltan

std::ostringstream os;
compiledSchema.toJson(os);
std::string result = os.str();
result.erase( std::remove_if( result.begin(), result.end(), ::isspace ), result.end() ); // Remove whitespace
Copy link
Contributor

@zivanfi zivanfi Sep 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of removing all whitespaces, consecutive whitespaces should be collapsed into a single space. It is true that it doesn't matter how many whitespaces there are, but it does matter whether there is or isn't whitespace between specific characters. A slight modification will give the desired result as described in http://stackoverflow.com/a/8362145/5613485

Additionally this line uses a different style than the other lines, there should be no spaces after '('-s or before ')'-s.

@Simon24601
Copy link
Contributor Author

Thanks Zoltan.
I certainly agree about the style - not sure why I have a different spacing
on that line.

With regards whitespace in general, the reason for eradicating whitespace
in this test is that the round-trip of JSON->Schema->JSON can insert exotic
whitespace like '\n' and '\r'. So we could have
{"type" : "double"}
being replaced with
{"type"\r\n:"double"\r\n}
In this case, compression to a single whitespace character isn't going to
allow these to compare equal - we need to convert all whitespace to ' '
first, then compress. Furthermore, whitespace in a schema is mostly
irrelevant, except inside tokens, so that
{"type" : "newType"}
and
{"type" : "new Type"}
are not considered identical. However, from what I understand of the Avro
specification, whitespace is not permitted inside tokens.

But if you think it's a change worth making, I'm happy to make it.

With regards the length of time it took to pick up this Jira, obviously I
could have been more aggressive in chasing it, but is there a separate
mailing list for e.g. C++ Avro or C# Avro developers? I have encountered
some issues in C# that I will be submitting pull requests for as soon as
I've written the appropriate tests, but I gather most devs are working on
the java implementation, so it would be better for me to target emails more
appropriately.

Thanks and regards
Simon

On Wed, Sep 21, 2016 at 12:49 PM, Zoltan Ivanfi notifications@github.com
wrote:

@Zicl requested changes on this pull request.

Hey Simon,

I noticed your pull request in the issue, it is a pity that it went
unnoticed for so long. Thanks for the fix, if you are still willing to go
forward with it, please see the suggestion in my comment.

Thanks,

Zoltan

In lang/c++/test/SchemaTests.cc
#97 (review):

@@ -154,6 +205,19 @@ static void testCompile(const char* schema)
compileJsonSchemaFromString(std::string(schema));
}

+// Test that the JSON output from a valid schema matches the JSON that was
+// used to construct it, apart from whitespace changes.
+static void testRoundTrip(const char* schema)
+{

  • BOOST_CHECKPOINT(schema);
  • avro::ValidSchema compiledSchema = compileJsonSchemaFromString(std::string(schema));
  • std::ostringstream os;
  • compiledSchema.toJson(os);
  • std::string result = os.str();
  • result.erase( std::remove_if( result.begin(), result.end(), ::isspace ), result.end() ); // Remove whitespace

Instead of removing all whitespaces, consecutive whitespaces should be
collapsed into a single space. It is true that it doesn't matter how many
whitespaces there are, but it does matter whether there is or isn't
whitespace between specific characters. A slight modification will give the
desired result as described in http://stackoverflow.com/a/8362145/5613485

Additionally this line uses a different style than the other lines, there
should be no spaces after '('-s or before ')'-s.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#97 (review), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AIUPmn6QLWdduBVvQeRHPuPp18D5kTGmks5qsRnDgaJpZM4Ik-X4
.

@zivanfi
Copy link
Contributor

zivanfi commented Sep 22, 2016

Regarding the mixture of different whitespaces, my suggestion was meant to include replacing all of them with spaces before collapsing them. I should have mentioned it explicitly, sorry.

On the other hand, your argument is valid as well, and what's more, there are cases where the existence of a whitespace doesn't make a difference - for example "[{" vs. "[ {". I haven't noticed it yesterday, but your test cases do not contain spaces in any location where their removal would matter. As such, apart from the minor styling difference, the current implementation is fine by me.

Regarding your question about specific mailing lists for different programming languages, I don't believe they exist.

"{\"name\":\"f2\",\"type\":\"int\"}]}",
/* Avro-C++ cannot do a round-trip on error schemas.
* "{\"type\":\"error\",\"name\":\"Test\",\"fields\":"
"[{\"name\":\"f1\",\"type\":\"long\"},"
Copy link
Contributor

@zivanfi zivanfi Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Comment style is inconsistent, some lines start with a *, while others don't.

@Simon24601
Copy link
Contributor Author

Thanks for flagging up these style issues. I believe I've fixed them all now.
If you're happy with the fix, can you push it into avro/master? I don't have write-access.
Thanks for reviewing my changes.
Regards
Simon

Copy link
Contributor

@zivanfi zivanfi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Simon,

I added a minor comment regarding BOOST_CHECKPOINT.

I also noticed that avro/lang/c++/build.sh doesn't run SchemaTests. I know that you only added one test case to this already existing test, but if it doesn't take too much of your time could you please update the shell script to actually run the test you extended? If you argue that it is out of scope, I can also do it in a follow-up change.

Thanks,

Zoltan

// used to construct it, apart from whitespace changes.
static void testRoundTrip(const char* schema)
{
BOOST_CHECKPOINT(schema);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other tests have been updated to use BOOST_TEST_CHECKPOINT, please update this as well.

srw added 2 commits September 24, 2016 01:47
…ELinux (see Jira 1925). lang/c++/build.sh refers to the lang/c++/build directory which is empty. These are now fixed.
@Simon24601
Copy link
Contributor Author

Hi Zoltan,

I'm glad you asked for the fix to lang/c++/build.sh - I was going to start
a new fork to patch the build script (I opened Jiras 1925 and 1926 to raise
concerns about build scripts). So I've committed this and the change to
BOOST_TEST_CHECKPOINT.

I've also committed a minor change to the root level build.sh script. When
running "build.sh docker", it would load the docker container without
giving me write permission to most directories. This prevented building,
because compilation requires write access. This is due to SELinux default
policies (see Jira 1925 for details). Adding the :z that you will see in my
branch grants write access and makes it all work fine in SELinux. So when
you review my changes to the lang/c++/build.sh script, please review the
changes to the root-level build.sh.

Thanks
Simon

On Fri, Sep 23, 2016 at 7:33 PM, Zoltan Ivanfi notifications@github.com
wrote:

@Zicl requested changes on this pull request.

Hi Simon,

I added a minor comment regarding BOOST_CHECKPOINT.

I also noticed that avro/lang/c++/build.sh doesn't run SchemaTests. I know
that you only added one test case to this already existing test, but if it
doesn't take too much of your time could you please update the shell script
to actually run the test you extended? If you argue that it is out of
scope, I can also do it in a follow-up change.

Thanks,

Zoltan

In lang/c++/test/SchemaTests.cc
#97 (review):

@@ -154,6 +205,19 @@ static void testCompile(const char* schema)
compileJsonSchemaFromString(std::string(schema));
}

+// Test that the JSON output from a valid schema matches the JSON that was
+// used to construct it, apart from whitespace changes.
+static void testRoundTrip(const char* schema)
+{

  • BOOST_CHECKPOINT(schema);

Other tests have been updated to use BOOST_TEST_CHECKPOINT, please update
this as well.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#97 (review), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AIUPmuo2vUVNPgmkHN0G3SbifeHkCOcMks5qtBuTgaJpZM4Ik-X4
.

@zivanfi
Copy link
Contributor

zivanfi commented Sep 26, 2016

Hi Simon,

I feel that your fix for AVRO-1925 in the top-level build.sh is out of scope for this change, could you create a separate pull request for it instead?

Your change to lang/c++/build.sh is not exactly what I meant either. On my machine, ./build.sh test works correctly apart from not executing SchemaTests, which can be fixed simply by adding an && ./build/SchemaTests \ line. You seem to have addressed a different problem though by running the make command in the current directory instead of in build. This breaks the build script for me, as CMake generates no Makefile in lang/c++, only in lang/c++/build:

~/git/avro/lang/c++ $ git pull github pull/97/head
Updating fcd0a65..f7826c6
 4 files changed, 82 insertions(+), 19 deletions(-)
~/git/avro/lang/c++ $ ./build.sh test
-- Configuring done
-- Generating done
-- Build files have been written to: /home/zi/git/avro/lang/c++/build
make: *** No targets specified and no makefile found.  Stop.

Based on your description in AVRO-1926 it seems to me that things work differently on your computer: Your makefiles are not generated in lang/c++/build but in lang/c++ instead. I don't know why this happens so, but this behaviour seems to be specific to something in your environment and the fix you suggested would break it for other people.

Zoltan

…SchemaTests to the list of tests. Also revert SELinux changes to build.sh as these should be committed separately
@Simon24601
Copy link
Contributor Author

Hi Zoltan,

I've reverted the changes to the build.sh and lang/c++/build.sh files. I'm
not sure why I encountered the issues with the C++ tests, but reverting to
the master version and starting again has resolved it. I've also added the
SchemaTests to the build.sh file so these are now run as part of the build
process.

Apologies for the multiple changes - I hope this is converging on something
useful.
Thanks
Simon

On Mon, Sep 26, 2016 at 2:13 PM, Zoltan Ivanfi notifications@github.com
wrote:

Hi Simon,

I feel that your fix for AVRO-1925 in the top-level build.sh is out of
scope for this change, could you create a separate pull request for it
instead?

Your change to lang/c++/build.sh is not exactly what I meant either. On
my machine, ./build.sh test works correctly apart from not executing
SchemaTests, which can be fixed simply by adding an &&
./build/SchemaTests \ line. You seem to have addressed a different
problem though by running the make command in the current directory
instead of in build. This breaks the build script for me, as CMake
generates no Makefile in lang/c++, only in lang/c++/build:

~/git/avro/lang/c++ $ git pull github pull/97/head
Updating fcd0a65..f7826c6
4 files changed, 82 insertions(+), 19 deletions(-)
~/git/avro/lang/c++ $ ./build.sh test
-- Configuring done
-- Generating done
-- Build files have been written to: /home/zi/git/avro/lang/c++/build
make: *** No targets specified and no makefile found. Stop.

Based on your description in AVRO-1926 it seems to me that things work
differently on your computer: Your makefiles are not generated in
lang/c++/build but in lang/c++ instead. I don't know why this happens so,
but this behaviour seems to be specific to something in your environment
and the fix you suggested would break it for other people.

Zoltan


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#97 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AIUPmvcTddbX_CM_EXfGgrq8oKy-7FpGks5qt8T5gaJpZM4Ik-X4
.

Copy link
Contributor

@zivanfi zivanfi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Simon,

No problem, multiple rounds are normal in code reviews. Your contribution is definitely useful, thanks for your time!

Zoltan

@zivanfi
Copy link
Contributor

zivanfi commented Sep 28, 2016

Hi Simon,

Tom committed your change, could you please close the pull request?

Thanks,

Zoltan

@Simon24601 Simon24601 closed this Sep 28, 2016
iemejia pushed a commit that referenced this pull request Jun 11, 2021
…ity based on parsing canonical form (#97)

I noticed that we were deriving PartialEq for Schema while my understanding
is that the spec asserts that equality between schemas has to rely on
parsing canonical form. So I manually impemented PartialEq and Eq based
on that.

I also created a bunch of issues to follow with the tests that I had to
comment because they were failing.
SingingBush pushed a commit to SingingBush/avro that referenced this pull request Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants