Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow refreshing Geo-Location Processor files from an S3 bucket #13204

Merged
merged 21 commits into from Aug 23, 2022

Conversation

kingzacko1
Copy link
Contributor

@kingzacko1 kingzacko1 commented Aug 8, 2022

Description

Geo-Location processor city and ASN database files can now be stored in an S3 bucket and pulled onto the Graylog server's filesystem automatically. These changes rely on the AWS SDK DefaultCredentialsProvider and not any settings that may be configured in the Graylog AWS plugin configuration. Using the S3 bucket option is disabled by default and not required. Users who would like to continue to manage their on disk files on their own may continue to do so without making any changes to their processor configuration settings.

Motivation and Context

Currently users must update the database files on each of their Graylog server's filesystems which can be a cumbersome and error prone process. This change allows users to put their files in one place and have their Graylog nodes pull in those files automatically.

How Has This Been Tested?

Tested locally in a development environment using the default credential profiles file option for the DefaultCredentialsProvider (see AWS documentation). Will continue testing with the other options to confirm they work as well.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (non-breaking change)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.

Copy link
Contributor

@ryan-carroll-graylog ryan-carroll-graylog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really awesome @kingzacko1 ! I was able to test successfully with the environment variables option.
I noticed one thing I left a comment about. I think your last push might have actually fixed it, but I'll leave it there for your consideration. I'll keep testing this as well this week.

Copy link
Contributor

@danotorrey danotorrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome @kingzacko1, and it delivers a lot of value by allowing those GeoLocation files to be refreshed dynamically. I also like how it fits in nicely within the existing Geo service framework and config.

I also think it would be beneficial to add some tests around the new S3 parts, since there is some specific business logic additions.

@danotorrey
Copy link
Contributor

Thanks for all the work and updates on this @kingzacko1! Feel free to dismiss my review if y'all get to the point of merging before I am back next Tuesday.

@kingzacko1 kingzacko1 force-pushed the feature/pull-geoip-files-from-s3 branch from 3d0fbbc to 6d01560 Compare August 15, 2022 14:27
@kingzacko1 kingzacko1 marked this pull request as draft August 15, 2022 17:22
@kingzacko1
Copy link
Contributor Author

While functionally this PR is at a good place to test, I have converted it to a draft while the pom inclusions are worked out so the build size doesn't increase so drastically. Just want to be sure it isn't merged before that is resolved.

@kingzacko1 kingzacko1 marked this pull request as ready for review August 16, 2022 19:42
@mpfz0r
Copy link
Member

mpfz0r commented Aug 17, 2022

@kingzacko1
I didn't have much time to look at this yet,
but just as a thought, we have S3 code in multiple places already.
Maybe we can unify and reuse some of the code?

e.g.:
https://github.com/Graylog2/graylog-plugin-enterprise/blob/master/enterprise/src/main/java/org/graylog/plugins/archive/backends/s3/AWSAuthFactory.java#L36

@kingzacko1
Copy link
Contributor Author

I definitely think it would be worthwhile to consolidate that duplicated code, but for this particular task we decided on only using the DefaultAWSCredentialsProviderChain for authentication. So I don't think there would be much reuse since the only generic S3 code we use is the creation of a default S3 client with S3Client.create(). Everything else is pretty specific to the Geo Location Processor files.

If we wanted to expand on this feature in the future and allow for more customized authentication, we could definitely reuse that AWSAuthFactory. I'd be happy to pull that class into this repo and refactor its usages in the others in another set of PRs. @mpfz0r

@danotorrey danotorrey self-assigned this Aug 17, 2022
@danotorrey
Copy link
Contributor

Thanks for all of the updates @kingzacko1! The code LGTM. I did not have a chance to test this out today unfortunately. But, I will do that first thing on Monday when I am back in the office. Github does not let me dismiss my own review, but feel free to dismiss my review if you get to the point of merging this before then.

Copy link
Contributor

@ryan-carroll-graylog ryan-carroll-graylog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Re-ran through some testing just to be safe, and all is as it should be.

@kingzacko1 kingzacko1 force-pushed the feature/pull-geoip-files-from-s3 branch from 0d45e80 to d755fd9 Compare August 19, 2022 13:43
@kingzacko1
Copy link
Contributor Author

Since @ryan-carroll-graylog and I have had our eyes all over this one for awhile now, I am going to wait until you get back in and can give this a final run through @danotorrey.

@danotorrey
Copy link
Contributor

Thanks @kingzacko1! Doing the final run-through now...

Copy link
Contributor

@danotorrey danotorrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and tests successfully. Well done @kingzacko1!

@kingzacko1 kingzacko1 merged commit 1e6a8c0 into master Aug 23, 2022
@kingzacko1 kingzacko1 deleted the feature/pull-geoip-files-from-s3 branch August 23, 2022 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants