Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making the content syncs more resilient #214

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

mzgubin
Copy link

@mzgubin mzgubin commented Aug 29, 2018

  • Changes that improve resiliency like catching some of the errors encountered to keep the jobs running. It would be great if that was perhaps put into a configuration whether to fail-fast or just log those errors as warnings and go on. I don't know enough about the best approach for something like that would be.
  • Converted the servlet on the source being pulled from to handle post requests instead of get. That way the requests won't be limited by the GET URL length. This can be useful if there are a ton of exclusions as we've encountered.
  • Also I made a change to how the exclusion filters are matching the paths. Currently the paths will be searched to find the exclusion string in any part of the path, instead I've made it so that the string is matched against just the startsWith instead. Not sure if that's how it's intended to be used, but perhaps that can also be configured somehow in the exclusion filter section itself.
  • Added more debug logging

Maxim Gubin and others added 18 commits June 6, 2018 11:38
* Continuing on error when writing nodes on the client (catching and logging the ConstraintViolationException and the InvalidItemStateException)
…Logging the errors. Going to see if this helps with giving more information when something goes wrong on the server side like reading too large of a file.
…s in org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore on line 676 of oak-blob-plugins-1.8.2.jar
… post method so that we are not limited by URL length in a GET call. This way we can pass a large exclusion list.
…re-logging

Fixes/continue on error and more logging
…rt of the path to see what it starts with.

We see if it matches or if it starts with that path (plus a forward slash to denote that it's a parent path)
Fixing the ExclusionPathNodeIterator to not just look at the first pa…
# Conflicts:
#	gradle.properties
#	src/main/groovy/com/twcable/grabbit/server/batch/steps/jcrnodes/JcrNodesWriter.groovy
#	src/main/groovy/com/twcable/grabbit/server/services/ExcludePathNodeIterator.groovy

# Please keep alphabetical
apache_httpclient_version = 4.5.4
apache_httpcore_version = 4.4.8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using the OkHttpClient already, it would be better to continue to use that than having another way to create connections using the Apache HttpClient.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok will switch to that one

@@ -41,6 +43,7 @@ servlet_api_version = 2.5
slf4j_version = 1.7.6
sling_api_version = 2.9.0
sling_base_version = 2.2.2
sling_commons_json_version = 2.0.6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to continue using Groovy's built-in JSONBuilder for consistency as used elsewhere in the codebase rather than adding this new dependency!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll switch it

"path={}\nafter={}\nexcludePathList={}\n\n\n", path,
afterDateString, StringUtils.join(excludePathsList, ",")

serverService.getContentForRootPath(serverUsername, path, excludePathsList ?: null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the need to have a POST here instead of GET due to GET's limitations. However, since this is an internal call (from Grabbit client to Grabbit server), I don't think we need to keep both GET and POST.

I would thus suggest removing doGet() and also updating the GrabbitContentPullServletSpec accordingly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok will do

@@ -0,0 +1,344 @@
package com.twcable.grabbit.server.services.impl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mzgubin I did not see info about this in your PR description. Could you please let us know the need to re-implement this instead of using the TreeTraverser from Jackrabbit commons as it was before?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the reason was that it allowed me to log what the error was when it was traversing. By default it was set to ErrorHandler.IGNORE so the errors were just being swallowed up.

log.debug "Sending NodeProto : ${node.name}"
node.writeDelimitedTo(servletOutputStream)
try {
//log.info "\n\n### NodeProtos ###\n\n${StringUtils.join(nodeProtos.toString(), "\n")}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get rid of all the commented out code (here and elsewhere in the PR). Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem will do

@mzgubin
Copy link
Author

mzgubin commented Oct 4, 2018

Sorry have been bogged down but will definitely get to this in the next week

Maxim Gubin added 2 commits January 10, 2019 13:07
- Removing the commons sling json dependencies and using the the google groovy builder for json
- Removing commented out code
- Removing the GET method from the GrabbitContentPullServlet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants