Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove cassandra-all dependency #3245

Merged
merged 2 commits into from
Nov 9, 2022

Conversation

farodin91
Copy link
Contributor

Done

  1. cleanup janusgraph-cql. Move cassandra hadoop impl to us.

Todo:

  1. switch to v4 driver
  2. Merge and refactor

Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you written and/or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

@janusgraph-bot janusgraph-bot added the cla: external Externally-managed CLA label Oct 10, 2022
@farodin91 farodin91 marked this pull request as ready for review October 12, 2022 06:37
@FlorianHockmann
Copy link
Member

I haven't really reviewed the PR, but since you seem to have copied files here completely from Apache Cassandra, I think that we have to include this in our NOTICE.txt. The TinkerPop NOTICE file contains an example where a source file was also copied from another project: https://github.com/apache/tinkerpop/blob/master/NOTICE#L23

@farodin91
Copy link
Contributor Author

@FlorianHockmann I added a section to the NOTICE.txt

@farodin91
Copy link
Contributor Author

@FlorianHockmann I added a section to the NOTICE.txt

the codecov/patch will fail.

Copy link
Member

@porunov porunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to copy so many classes from cassandra dependency to janusgraph-cql?
Why can't we just introduce a necessary dependency?
There are too many classes we will need to support after this PR.

@farodin91
Copy link
Contributor Author

Why do we need to copy so many classes from cassandra dependency to janusgraph-cql?

I copied as aleast as possible. It seems that the hadoop part is really support in cassandra. We could switch to datastax variant which uses spark instead of hadoop. That would require much more work.

Why can't we just introduce a necessary dependency?

Their was a dependency cassandra-all which a depedency hell.

@porunov
Copy link
Member

porunov commented Nov 7, 2022

Their was a dependency cassandra-all which a depedency hell.

I'm just thinking maybe it makes sense to refactor cassandra to provide a small dependency with necessary classes we are using. I.e. to create something like cassandra-hadoop-util library or something like that and publish it as a Maven artifact. If so, we won't need to support these classes by our own and anytime we bump cassandra version we would be able to get all features and bugfixes without copy pasting source code every time there is a new Cassandra version. Also by copy pasting these classes user's won't be able to replace transitive dependencies with the newer / older cassandra versions because these classes are hard coded into JanusGraph logic now.
If it's complicated to introduce a new dependency on cassandra side then I would prefer just creating our own library (or at least another module) and make janusgraph-cql module depend on that module with these copy pasted classes. Otherwise, people who start to learn janusgraph-cql module will be overwhelmed with some classes they don't really need because those classes are meant to be copied from cassandra and not developed by JanusGraph developers.

@farodin91
Copy link
Contributor Author

farodin91 commented Nov 7, 2022

I will move the classes into a own module. Afterward, i will open issue on cassandra.

Besides, i don't think we have to update on each cassandra update as they just depending open the datastax driver.

Copy link
Member

@porunov porunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that cassandra-hadoop-util is copy paste of classes from cassandra and assuming that all tests pass it LGTM. Thank you @farodin91 !

@farodin91
Copy link
Contributor Author

farodin91 commented Nov 7, 2022

@porunov @FlorianHockmann Should i try to create patch file with difference between cassandra and cassandra-hadoop-util?

@FlorianHockmann
Copy link
Member

Should i try to create patch file with difference between cassandra and cassandra-hadoop-util?

Doesn't your second commit here already show the full diff? That helped me at least to review this change and the PR still looks good to me.

It was definitely a good suggestion to move this into its own module by the way and asking the folks at Apache Cassandra whether they would be willing to move these files into its own module on their side of course also makes sense.

@farodin91
Copy link
Contributor Author

I want show a diff between cassandra and cassandra-hadoop-util.

@FlorianHockmann
Copy link
Member

I want show a diff between cassandra and cassandra-hadoop-util.

You mean there are changes between the original files and the ones added with this PR? I just assumed that you completely copied them. In that case, a diff would be good I guess.

@porunov
Copy link
Member

porunov commented Nov 7, 2022

I want show a diff between cassandra and cassandra-hadoop-util.

If that's only headers then I'm fine without the diff. If there are any other changes then it would be good to see the diff

@farodin91
Copy link
Contributor Author

@FlorianHockmann @porunov i've added a extra commit.

Copy link
Member

@FlorianHockmann FlorianHockmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just reviewed the diff. You seem to have mostly removed code that we aren't using which should be fine.
I'm wondering however about users who use JanusGraph embedded in their Java project and also Cassandra. They might need to exclude cassandra-hadoop-util as a dependency and add a dependency on cassandra-all themselves now if they use any of the removed code themselves. I guess this something we should mention in the upgrade docs.

Some changes are however also changing the behavior if I'm not mistaken, e.g., special handling for cases with a single datacenter. I would have preferred a follow-up PR for such changes as that would make the review easier and you could better explain these changes. But I'm also fine with keeping them in here if it's too much effort to extract them from the PR now.

@farodin91
Copy link
Contributor Author

@FlorianHockmann Changelog is a good idea.

@farodin91
Copy link
Contributor Author

Removal of cassandra-all dependency

We have extracted the hadoop part from cassandra to reduce the dependency tree of JanusGraph. If you are running JanusGraph with
an embeded Cassandra, you have to exclude the cassandra-hadoop-util from janusgraph-cql.

@FlorianHockmann what do you think?

@@ -88,6 +88,11 @@ For more information on features and bug fixes in 1.0.0, see the GitHub mileston

#### Upgrade Instructions

##### Removal of cassandra-all dependency

We have extracted the hadoop part from cassandra to reduce the dependency tree of JanusGraph. If you are running JanusGraph with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

JanusGraph had a dependency on cassandra-all only for some Hadoop-related classes. We moved these few classes into a new module cassandra-hadoop-util to reduce the dependencies of JanusGraph. "If you are running JanusGraph [...]".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebased and changed

Signed-off-by: Jan Jansen <jan.jansen@gdata.de>
Signed-off-by: Jan Jansen <jan.jansen@gdata.de>
@farodin91 farodin91 merged commit f5ab554 into JanusGraph:master Nov 9, 2022
@farodin91 farodin91 added this to the Release v1.0.0 milestone Nov 9, 2022
@farodin91 farodin91 deleted the use-only-driver branch November 10, 2022 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: external Externally-managed CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants