
Use a single repository for the entire rdf4j code base (merging storage, tools, and testsuite back into rdf4j) #1467

Closed
6 tasks done
abrokenjester opened this issue Jun 26, 2019 · 5 comments

abrokenjester commented Jun 26, 2019

Motivation

Back in 2017 (see https://www.eclipse.org/lists/rdf4j-dev/msg00410.html ) we made a decision to split the rdf4j project over multiple repositories. The main motivation for this was that a full build + verification of the project was taking too long, and this encouraged contributors to take shortcuts. The theory was that by splitting the project, we could get verification time down.

However, that expected speed gain has not really materialized. It's true that individual repo builds are quicker, but when we make a change in, for example, the rdf4j repo, we still need to run verification in rdf4j-storage and -tools, and that change can still turn out to break things there.

A further downside is that compliance tests in the rdf4j repo often use code from rdf4j-storage (e.g. a sail impl) or from rdf4j-tools (e.g. to spin up an rdf4j server), but because of the order of dependencies, those modules have not been built yet when the rdf4j repo runs its verification. That's no big deal as long as we're on a develop branch and everything just uses "the latest SNAPSHOT", but it seriously messes things up when release time rolls around and we have to set fixed versions: suddenly the rdf4j repo build fails because its tests can't find rdf4j-sail-memory 3.0.0. I have been attempting to mitigate this by moving tests around in the project, as well as by setting fixed (older) versions for these kinds of test dependencies, but it's not ideal.
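In a single Maven reactor build, modules are built in the order of their inter-module dependencies (test-scoped ones included), so a compliance-test module that needs a Sail implementation is only built after that implementation, at the same version. The sketch below illustrates that ordering with Python's `graphlib` on a hypothetical, heavily simplified module graph; the module names are illustrative assumptions, not rdf4j's actual module layout.

```python
from graphlib import TopologicalSorter

# Hypothetical, heavily simplified module graph: module -> modules it depends on
# (test-scoped dependencies included, since the reactor orders by those too).
MODULES = {
    "rdf4j-model": set(),
    "rdf4j-sail-api": {"rdf4j-model"},
    "rdf4j-repository-api": {"rdf4j-model"},
    "rdf4j-sail-memory": {"rdf4j-sail-api"},
    # Compliance tests need a concrete Sail implementation at test time.
    "rdf4j-compliance-tests": {"rdf4j-repository-api", "rdf4j-sail-memory"},
}


def build_order(graph):
    """Return an order in which every module appears after all of its dependencies."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for module in build_order(MODULES):
        print(module)
```

With separate repositories, no single build sees this whole graph, which is why the test dependency on a not-yet-released rdf4j-sail-memory can only be satisfied by a SNAPSHOT or an older fixed version.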

Finally: the Jenkins pipeline we currently have to build, verify and deploy all of this is just incredibly convoluted. We now have 24(!) separate Jenkins jobs to coordinate all of this, and it's painful to maintain tbh.

Proposed change

We move back to a mono-repo for the entire codebase, which will live in the rdf4j GitHub repository. rdf4j-doc will remain a separate repository, where the project website and documentation are maintained.

To make sure we get decent build and verification times, we will make the following improvements:

  • culling and cleaning in our compliance and integration tests (there are quite a few tests in there that are either very slow, or redundant, or both).
  • better unit testing with mocking and stubbing instead of cramming all our verification into massive compliance/integration test suites that spin up full servers every time.

Tasks

  • merge rdf4j-storage (with tags and history) into rdf4j
  • merge rdf4j-tools (with tags and history) into rdf4j
  • merge rdf4j-testsuites (with tags and history) into rdf4j
  • reconfigure maven to handle the single-repo build
  • reconfigure Jenkins to handle the simplified build pipeline
  • mark old repositories as no longer in use (possibly make them read-only)

(See also discussion at https://www.eclipse.org/lists/rdf4j-dev/msg01147.html )

abrokenjester commented Jun 29, 2019

Scheduled to be done immediately after the 3.0 release.

abrokenjester self-assigned this Jun 29, 2019
abrokenjester changed the title "Merge rdf4j-storage and rdf4j repos back together" to "Merge rdf4j-storage, rdf4j-tools, rdf4j-testsuites and rdf4j repos back together" Jul 28, 2019
abrokenjester changed the title "Merge rdf4j-storage, rdf4j-tools, rdf4j-testsuites and rdf4j repos back together" to "Use a single repository for the entire rdf4j code base (merging storage, tools, and testsuite back into rdf4j)" Aug 3, 2019
abrokenjester commented

Merging the testsuite repo including all history is giving me a lot of headaches, due to the many files that were moved and deleted when things were first split: it's nearly impossible to reconcile. So instead I'll just manually copy over the benchmarks (the only part of the testsuites repo that is still relevant).

abrokenjester commented

Turns out the idea of skipping the compliance tests unless the -Pcompliance profile was activated has problems. For now I'll just make sure the compliance modules are not actually deployed. We'll look into how to deal with PR verification vs full compliance later.

abrokenjester commented

The Jenkins configuration looks to be set up well now, after a few tries:

  1. the PR verification job skips integration tests by means of the -DskipITs flag.
  2. verification of the master branch has been configured as an incremental build, so (in theory) it should only build/test those modules that have had changes applied.
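A minimal sketch of the module selection behind an incremental build: start from the modules with changes, add every module that transitively depends on them (otherwise breakage downstream of a change would go unnoticed), and pass the result to Maven via `-pl`, appending `-DskipITs` (the Maven Failsafe plugin's property for skipping integration tests) for PR jobs. The module graph and function names are hypothetical, and this is not how the Jenkins job itself is implemented.

```python
from collections import deque

# Hypothetical, simplified module graph: module -> modules it depends on.
DEPENDS_ON = {
    "rdf4j-model": set(),
    "rdf4j-sail-api": {"rdf4j-model"},
    "rdf4j-repository-api": {"rdf4j-model"},
    "rdf4j-sail-memory": {"rdf4j-sail-api"},
    "rdf4j-compliance-tests": {"rdf4j-repository-api", "rdf4j-sail-memory"},
}


def affected_modules(graph, changed):
    """Return the changed modules plus every module that transitively depends on them."""
    # Invert the graph so we can walk from a module to the modules that use it.
    dependents = {module: set() for module in graph}
    for module, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    selected = set(changed)
    queue = deque(changed)
    while queue:  # breadth-first walk over reverse dependencies
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in selected:
                selected.add(dependent)
                queue.append(dependent)
    return selected


def maven_command(graph, changed, pull_request):
    """Build a hypothetical `mvn verify` invocation for the affected modules."""
    modules = sorted(affected_modules(graph, changed))
    # ':artifactId' tells -pl to select modules by artifactId rather than by path.
    command = ["mvn", "verify", "-pl", ",".join(":" + m for m in modules)]
    if pull_request:
        # PR verification: run unit tests only, skip Failsafe integration tests.
        command.append("-DskipITs")
    return command


if __name__ == "__main__":
    # prints: mvn verify -pl :rdf4j-compliance-tests,:rdf4j-sail-api,:rdf4j-sail-memory -DskipITs
    print(" ".join(maven_command(DEPENDS_ON, {"rdf4j-sail-api"}, pull_request=True)))
```

Maven can also compute the dependent closure itself: `mvn verify -pl :rdf4j-sail-api -amd` builds the listed module plus every module that depends on it.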

On a separate note: the develop branch has not yet been set up correctly.

abrokenjester commented

Happy with the setup for now.

abrokenjester unpinned this issue Aug 31, 2019