SOLR-15089: Allow backup/restoration to Amazon's S3 blobstore #120
Conversation
Implementation of Solr's BackupRepository that uses Amazon S3 as the backing store.
solr/contrib/blob-repository/src/java/org/apache/solr/blob/backup/BlobBackupRepository.java
solr/contrib/blob-repository/src/java/org/apache/solr/blob/client/LocalStorageClient.java
Hey Andy and Pierre! It took me too long to get to this, I'm sorry about that. But overall, the PR looks great as things stand today: there are a few open questions that I think we need to answer, but in terms of quality and testing this already looks really close. Thanks for taking the effort to contribute this!
As is a bad habit of mine I left a lot of comments inline and got very verbose here, but I do that mostly as a way of taking notes and thinking aloud, so don't let the uh, volume, scare you. Almost all comments boil down to me thinking about two questions:
- What are the tradeoffs of the BlobRepository abstraction in contrast to having BackupRepository implementations for each blob store? What value does the abstraction bring us?
- What distinguishes a few of the I/O classes in this PR (e.g. BlobIndexInput) from existing alternatives (e.g. BufferedChecksumIndexInput) that do similar things at a quick glance?
BlobRepository Abstraction
The biggest open question I think is whether or not to do the "BlobRepository" abstraction over having repository impls for each blob store we eventually want to support.
Here are the "pros" I can see of having the abstraction:
- Code Reuse: BlobRepository allows some code reuse across blob stores. e.g. input validation and debug logging on BlobRepository methods. copyIndexFileFrom/copyIndexFileTo implementations. BlobIndexInput (though as I've admitted elsewhere, I don't fully understand the role this class plays). Perhaps some NamedList/argument parsing?
- Partial Configuration Uniformity: BlobRepository allows some configuration properties to be defined uniformly across different blob stores. (i.e. 'blob.store.bucket.name' could be used for GCS as well as S3 backups).
Here are the "cons" that I see in having the abstraction:
- Classpath Bloat: Having all blob-store implementations in the same config means that a user using S3 for backups will also be loading a bunch of the GCS (and eventually ABS) related dependencies that they have no intention of ever using. How much it'll matter in practice, idk, but in theory this expanded but unused surface area is a liability in terms of triggering vulnerability scans or true security issues whenever one of these deps has a CVE.
- Limited Code Reuse: While BlobRepository does allow some pieces to be reused, it's not as much as I was hoping might be possible. IMO the actual code interfacing with the S3/GCS client comprises such a large chunk of the logic here that it dwarfs the smaller bits that are able to be reused. As an "example" of this: GCSBackupRepository and S3StorageClient both come in at roughly the same LOC count despite S3StorageClient taking advantage of all the shared logic up in BlobRepository. (Admittedly this is a real shaky comparison: the two code samples use different client libraries, implement different interfaces, and are written in different coding styles with different use of whitespace, comments, etc. And LOC is a dubious stand-in for complexity in any case. But the point stands that S3StorageClient's code reuse still leaves the two in the same ballpark.)
- Documentation Complexity: Combining multiple blob-store implementations under the same BackupRepository impl will add some documentation challenges: in calling out which config props affect which blob store "types", in describing the differing expectations for identifiers like bucket names and paths, etc.
Am I missing any pros/cons here? Anything my descriptions above miss? Weighing these as they stand at least, I'd vote that we're probably better without the abstraction layer and just making this an s3 specific contrib. But I'm curious what I might be missing here.
One side note on this question: some of the code reuse that the abstraction does get us (specifically the input validation, debug-logging, and default method implementations) still seems achievable long-term if it's moved into BackupRepository itself. I can imagine a version of BackupRepository, for example, whose copyFileFrom method validates its inputs, debug-logs the invocation, and then calls an abstract doCopyFileFrom method that subclasses are required to implement. Or alternatively a BackupRepository implementation that does this work on each method before delegating to a BackupRepository object that it wraps. One nice side-effect here is that all BackupRepository impls would benefit, not just the blob-store ones.
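The template-method idea described above can be sketched roughly like this. Note this is a hypothetical illustration, not code from the PR: the class and method names (ValidatingBackupRepository, doCopyFileFrom) are placeholders matching the comment's suggestion.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical sketch: the base class performs shared validation and
// logging once, and subclasses only implement the store-specific work.
abstract class ValidatingBackupRepository {

    public final void copyFileFrom(URI source, String fileName, URI dest) {
        // Shared input validation that every implementation benefits from
        Objects.requireNonNull(source, "source");
        Objects.requireNonNull(fileName, "fileName");
        Objects.requireNonNull(dest, "dest");
        // Shared debug logging (stdout stands in for a real logger here)
        System.out.println("copyFileFrom " + fileName + ": " + source + " -> " + dest);
        doCopyFileFrom(source, fileName, dest);
    }

    // Store-specific logic (S3, GCS, local FS, ...) lives in subclasses
    protected abstract void doCopyFileFrom(URI source, String fileName, URI dest);
}
```

The alternative the comment mentions, a wrapper that delegates to another BackupRepository, is the decorator pattern applied to the same idea; both put the shared checks in exactly one place.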
Anyways, curious on your responses to some of the rambling above, looking forward to getting the ball rolling here; things look great so far!
@Override
public URI resolve(URI baseUri, String... pathComponents) {
    Objects.requireNonNull(baseUri);
    Preconditions.checkArgument(baseUri.isAbsolute());
[0] I love the rigorous input-checking you guys are doing throughout here.
[0] I wonder whether some of these URI assumptions will hold up across other blob stores. I know GCS supports "relative looking" URIs (and in fact the GCS-mock used today for tests actually requires them!)
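As a quick illustration of the concern (a sketch, not code from the PR): java.net.URI treats a URI as absolute only when it has a scheme, so a "relative looking" GCS-style path would fail the isAbsolute() precondition above.

```java
import java.net.URI;

public class UriCheckDemo {
    public static void main(String[] args) {
        URI withScheme = URI.create("s3://my-bucket/backups/index");
        URI relativeLooking = URI.create("my-bucket/backups/index");

        // isAbsolute() is true only when a scheme ("s3:", "gs:", ...) is present
        System.out.println(withScheme.isAbsolute());      // true
        System.out.println(relativeLooking.isAbsolute()); // false
    }
}
```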
/**
 * Class representing the {@code backup} blob config bundle specified in solr.xml. All user-provided config can be
 * overridden via environment variables (use uppercase, with '_' instead of '.'), see {@link BlobBackupRepositoryConfig#toEnvVar}.
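The property-to-env-var mapping the javadoc describes (uppercase, '_' instead of '.') presumably boils down to something like the following. The method name toEnvVar comes from the javadoc above, but this body is a guessed reconstruction, not the PR's actual code.

```java
import java.util.Locale;

public class EnvVarMapping {
    // Hypothetical reconstruction of the documented rule:
    // "blob.s3.bucket.name" -> "BLOB_S3_BUCKET_NAME"
    static String toEnvVar(String property) {
        return property.toUpperCase(Locale.ROOT).replace('.', '_');
    }

    public static void main(String[] args) {
        System.out.println(toEnvVar("blob.s3.bucket.name")); // BLOB_S3_BUCKET_NAME
    }
}
```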
[+1] I love the uniformity of this approach, that all config props can be set by env-var. It's something I didn't do with the GCS repository impl and that now looks like a defect in hindsight.
...contrib/blob-repository/src/java/org/apache/solr/blob/backup/BlobBackupRepositoryConfig.java
solr/contrib/blob-repository/src/java/org/apache/solr/blob/backup/BlobIndexInput.java
solr/contrib/blob-repository/src/java/org/apache/solr/blob/client/S3StorageClient.java
solr/contrib/blob-repository/src/test/org/apache/solr/blob/backup/BlobBackupRepositoryTest.java
@@ -106,6 +106,8 @@ grant {
  permission java.lang.RuntimePermission "writeFileDescriptor";
  // needed by hadoop http
  permission java.lang.RuntimePermission "getProtectionDomain";
  // needed by aws s3 sdk
  permission java.lang.RuntimePermission "accessClassInPackage.jdk.internal.reflect";
[0] Note to self to look into this a bit more.
@@ -0,0 +1,366 @@
/*
[+1] The unit-test coverage in the following files is laudable. The attention to thoroughness is awesome and appreciated!
[-1] That said, there's one checkbox that's missing I think: integration-level testing. We have a base class (AbstractIncrementalBackupTest) that does a passable job here and gets extended by each of the existing BackupRepository impls. We should probably create an S3IncrementalBackupTest extending this, similar to GCSIncrementalBackupTest (as one example).
Good idea on adding the S3IncrementalBackupTest, will do that before merging!
I see that GCSIncrementalBackupTest has a set of locales that it works for, do you think that this is necessary for the S3 test as well, or should I be ok skipping it?
Ok, test added, and it found that the index files are not validated for corruption when building a backup. So very good that we added the test!
Had to modify some of the other tests, but everything should work now, and we can be even more confident with the integration testing.
@gerlowskija I'll address the cons list first, that seems to me the most salient part to discuss:
In my mind, if we go the path of using the BlobRepository abstraction, the ultimate goal would be to have it be a separate unit, apart from the S3 or GCS stuff. That could mean...
There may be other options, but for the bloat reasons you mentioned, I never really considered putting all the blob stores into one multi-cloud repo.
Very fair criticism. For me, it's right on the edge of "are we over-architecting this?". It would of course help to have a third blob client (Azure?); perhaps with just two we're underestimating the value that middle layer could provide?
Is this con still relevant if we pick one of the three options I listed above (and not have them all in one)? If each is separate, their docs would also be separate. Or do you mean internal docs (e.g., Javadoc)? You also asked
In regards to BlobIndexInput, I'll have to defer the 'why' question to @psalagnac. That impl was written when we first wrote this plugin -- it's possible it was to work around a bug, or possible it was an oversight that we built it ourselves and didn't use a pre-built impl. If I get some time, I can also try swapping our use for […]
👍 I was assuming a different structure, or maybe reading too much into the current structure of the WIP, but this makes a lot of sense and would do a lot to alleviate both the classpath and documentation concerns I had in mind previously. Of the specific implementations you mentioned, I'm partial to (2) as it seems like a nice middle ground between (1) and (3). It also seems like the approach that'd make it easiest for a user to use BlobRepository in writing their own plugin. But admittedly that's a bit handwavy. Do you have a preference among those options or see other pros/cons there?
That's definitely the question. I lean I think towards skipping the abstraction until we're in a situation where its value is more clear cut. If someone adds an Azure store next year and more repeated bits of code crop up, we can always add the middle-layer in at that point. But that's just my leaning - if you guys are confident that it'll be better to leave it in, I'm happy to go with that. The BlobIndexInput question might shed light on this too.
I agree that (2) is the best option out of the 3.
I agree with Jason here as well. I might have missed something, but did you mention at some point that this […]
Thanks for all your feedback on this PR @gerlowskija
Any implementation of […] The only two things we have when working with S3 are an […] My understanding is […]

BTW, I just found a bug in your implementation of […]. With AWS, it happens for real. With GCP, it may be just theoretical? That may be some code to share between different blob implementations? 😄
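The partial-read pitfall alluded to here (InputStream.read is allowed to return fewer bytes than requested, which happens in practice with buffered, network-backed streams like S3's) is typically fixed with a read-fully loop. A minimal sketch of the general fix, not the PR's code:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    // read() may legally return fewer bytes than requested, so keep
    // reading until the buffer is full or the stream ends.
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                throw new EOFException("Stream ended after " + off + " bytes");
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes();
        byte[] out = new byte[data.length];
        readFully(new ByteArrayInputStream(data), out);
        System.out.println(new String(out)); // hello world
    }
}
```

Code that assumes a single read() fills the buffer usually passes tests against local files (which tend to fill the buffer in one call) and fails only against real cloud storage, which matches the "with AWS it happens for real, with GCP it may be just theoretical" observation.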
Thanks for the clarification @psalagnac. I'll summarize your points to make sure I've got the right idea:
If I've got all that right, then I'm 👍 on keeping it around. Though maybe it should be renamed to remove the suggestion that the class is blob-specific in some way. (AFAICT it's equally useful whether you're wrapping a S3InputStream or a StringBufferInputStream, unless I'm missing something?)
Good catch! As I understand your description - this bug requires (1) buffering somewhere in the stack underlying the InputStream and (2) non-blocking reads from that InputStream. I'll take a look and see whether the GCS code "checks" both of those boxes, unless you've already done that yourself and seen that it does?
I raised this possibility in my previous comment, and I think I am sold on BlobIndexInput's necessity, but I still lean towards skipping the BlobBackupRepository abstraction for now. (Just wanted to close the loop there.)
Hey @athrog - anything holding this up, or did you just miss the notification on the latest round of comments? (If it's a time thing, lmk and I can try to help out a bit myself.) Seems like the consensus so far would be to (1) remove the BlobRepository abstraction and (2) repurpose/rebrand the contrib you're adding now to be s3-specific, pending any big objections or arguments from your side of things?
Hey @gerlowskija, Thanks for your feedback. I think we reached an agreement to keep […]. I will work with @athrog on the following changes:
|
Remove blob abstraction layer.
Rename package to s3.
Hi @gerlowskija, The PR is updated with the new class layout. I tried to align with the existing implementation for GCS. All classes are now in package […]. I kept calls to the AWS API in a standalone class […]. Still have to do end-to-end testing and polish the docs.
@psalagnac could you merge this with […]
Just pushed a merge commit @HoustonPutman, along with an update for our README. Let me know if you have any trouble testing locally, I can add more documentation if needed.
Also, I'm not sure if it's a problem with my setup or maybe I missed some gradle setting somewhere, but just doing a […]
solr/contrib/blob-repository/src/java/org/apache/solr/s3/S3BackupRepository.java
@@ -0,0 +1,47 @@
/*
Moderate OSS Vulnerability:
pkg:maven/com.google.guava/guava@25.1-jre
0 Critical, 0 Severe, 1 Moderate and 0 Unknown vulnerabilities have been found in a direct dependency
MODERATE Vulnerabilities (1)
[CVE-2020-8908] A temp directory creation vulnerability exists in all versions of Guava, allowin...
A temp directory creation vulnerability exists in all versions of Guava, allowing an attacker with access to the machine to potentially access data in a temporary directory created by the Guava API com.google.common.io.Files.createTempDir(). By default, on unix-like systems, the created directory is world-readable (readable by an attacker with access to the system). The method in question has been marked @deprecated in versions 30.0 and later and should not be used. For Android developers, we recommend choosing a temporary directory API provided by Android, such as context.getCacheDir(). For other Java developers, we recommend migrating to the Java 7 API java.nio.file.Files.createTempDirectory() which explicitly configures permissions of 700, or configuring the Java runtime's java.io.tmpdir system property to point to a location whose permissions are appropriately configured.
CVSS Score: 3.3
CVSS Vector: CVSS:3.0/AV:L/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N
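For reference, the migration the advisory itself recommends looks like this (a sketch of the general fix, not code from this PR):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempDirDemo {
    public static void main(String[] args) throws IOException {
        // Vulnerable (deprecated Guava API): com.google.common.io.Files.createTempDir()
        // creates a world-readable directory on unix-like systems.

        // Recommended replacement: JDK 7+ NIO, which creates the directory
        // with 700 permissions on POSIX file systems.
        Path dir = Files.createTempDirectory("solr-backup-");
        System.out.println(Files.isDirectory(dir)); // true
        Files.delete(dir); // clean up
    }
}
```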
I think the bones are good, but I have a few concerns, mostly around the use of guava (a personal strong aversion to it), and a few minor clean up notes but overall thanks for putting this together!
solr/contrib/blob-repository/src/java/org/apache/solr/s3/AdobeMockS3StorageClient.java
solr/contrib/blob-repository/src/java/org/apache/solr/s3/S3BackupRepository.java
@Override
@SuppressWarnings("unchecked")
public <T> T getConfigProperty(String name) {
This doesn't benefit from generic types AFAICT, we should fix the interface to not do this. Can do so in this issue or a separate one.
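To illustrate the concern (a hypothetical standalone sketch, not the actual Solr interface): the generic return type only moves an unchecked cast from inside the method to the call site, where a mismatch surfaces as a runtime ClassCastException rather than a compile error, so the signature provides no real type safety.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigDemo {
    static final Map<String, Object> props = new HashMap<>();

    // The type parameter is erased at runtime; the compiler inserts the
    // cast at each call site, where it can fail unpredictably.
    @SuppressWarnings("unchecked")
    static <T> T getConfigProperty(String name) {
        return (T) props.get(name);
    }

    public static void main(String[] args) {
        props.put("bucket", "my-bucket");
        String bucket = getConfigProperty("bucket"); // fine
        System.out.println(bucket);
        try {
            Integer bad = getConfigProperty("bucket"); // compiles, fails at runtime
            System.out.println(bad);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at call site");
        }
    }
}
```

A plain `Object getConfigProperty(String name)` (or a typed `String getConfigProperty(...)`) makes the caller's cast, and therefore the risk, explicit.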
Let's do it in a separate issue.
solr/contrib/blob-repository/src/java/org/apache/solr/s3/S3BackupRepository.java
solr/contrib/blob-repository/src/java/org/apache/solr/s3/S3IndexInput.java
solr/contrib/blob-repository/src/java/org/apache/solr/s3/S3OutputStream.java
solr/contrib/blob-repository/src/java/org/apache/solr/s3/S3StorageClient.java
solr/contrib/blob-repository/src/test/org/apache/solr/s3/S3PathsTest.java
Thank you @madrob for the review. I fixed some of the low-hanging fruit, and will circle back later for the other comments (those will require testing after changes).
@athrog Contribs aren't loaded to the classpath by default. What I'm doing for testing this (not at all recommended for the final product), is adding something to
@athrog & @gerlowskija, I've gotten the precommit to pass (wow this adds a lot of dependencies). The full tests also pass 🎉 As per my comment above, I believe there are 3 areas left to finish before this work can be merged.
Once those are done, I will be happy to merge and backport this to 8x. Let me know what your bandwidth is, and when you think you can get these done by 🙂 No rush, just want to be able to plan accordingly on my side. Also just realized that upgrading the jackson smile dependency broke tests, so looking into fixing those currently...
…ckson-dataformat-smile.
Fixed a weird issue with the upgrade of the smile library. Tests pass now.
Happy to take the other two documentation tasks. @gerlowskija already wrote a short section in the ref guide for the GCS backup repo, so I can modify that for S3 easily. Will try to have those done by the end of the week if I have enough time.
Okay, took my first shot at the ref guide -- let me know what you all think! One thing that stood out to me while documenting all the params was our use of the […]
Yeah I agree. We should remove blob entirely (variables, methods, config, etc); it shouldn't be mentioned at all in the s3-repository contrib module. And will take a look at the docs!
Documentation looks good to me! The only thing we should possibly add is how to add contribs to the classpath/runtime. I imagine this is already in the ref-guide somewhere, and we should be able to link to that pretty easily. Would be good to add for the GCS section as well.
…ludes changing the config bundle parameters specified in solr.xml.
Yeah good idea. I tried adding this to my
This approach is what some other contribs recommend (tika, clustering)...any reason why it wouldn't work for this repo?
To document how to install plugins, I would simply link here:
Furthermore, I would discourage anyone from declaring […]
Ok, so I tested the async stuff and it's working now. The only thing I've found is that the […]

One last thing I want to do now is remove […]
@athrog @gerlowskija I think I'm happy with this PR.
Do y'all think there is anything else that needs to be done (other than adding a CHANGES entry)?
solr/contrib/s3-repository/src/java/org/apache/solr/s3/S3IndexInput.java
solr/contrib/s3-repository/src/java/org/apache/solr/s3/S3OutputStream.java
…#120)
See solr/contrib/s3-repository/README.md for more information.
Co-authored-by: Andy Throgmorton <athrogmorton@salesforce.com>
Co-authored-by: Pierre Salagnac <psalagnac@salesforce.com>
Co-authored-by: Houston Putman <houston@apache.org>
… those were actually correct with the old github repo anyway.
Description
Solr provides a BackupRepository interface with which users can create backups to arbitrary backends. There is now a GCS implementation (see #39), but no S3 impl yet.

Solution
This PR adds a BackupRepository implementation for communicating with S3.

Tests
We've added new unit tests at the BackupRepository level, as well as tests for the S3 interactions (using the S3Mock framework).

Checklist
Please review the following and check all that apply:
- […] the main branch.
- […] ./gradlew check.

We have not yet done the work of adding license files for all the newly added libraries/dependencies. These will be added in a future commit.