METRON-283 Migrate Geo Enrichment outside of MySQL #421
Conversation
Expanding the Stellar GEO_GET function to allow for just returning a map of specified fields (if more than one), or just the direct value (if just one field requested) would be very nice. e.g. GEO_GET('ip', ['country', 'city']) returns {"country": "US", "city": "Chicago"} and GEO_GET('ip', ['country']) just returns 'US'.
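The suggested dispatch could look roughly like this; a minimal sketch, assuming a stubbed lookup in place of the real GeoLite2 query (class and helper names here are illustrative, not the PR's actual code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GeoGetSketch {
  // Hypothetical stand-in for the real GeoLite2 database query.
  static Map<String, String> lookup(String ip) {
    Map<String, String> m = new HashMap<>();
    m.put("country", "US");
    m.put("city", "Chicago");
    return m;
  }

  // One requested field -> the bare value; several -> a map of just those fields.
  static Object geoGet(String ip, List<String> fields) {
    Map<String, String> all = lookup(ip);
    if (fields.size() == 1) {
      return all.get(fields.get(0));
    }
    Map<String, String> subset = new HashMap<>();
    for (String f : fields) {
      subset.put(f, all.get(f));
    }
    return subset;
  }
}
```

The point is only the return-shape split: a single-element field list unwraps to the value itself, anything longer stays a map.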
This looks great...first pass through and a few comments, but I love it so far!
@@ -210,6 +210,12 @@ This product bundles some test examples from the Stix project (metron-platform/m

This product bundles wait-for-it.sh, which is available under a "MIT Software License" license. For details, see https://github.com/vishnubob/wait-for-it

------------------------------------------------------------------------------------
Could you make the license explicit here?
Added header with license similar to the MIT one above
Updated with another license. The testing DB used is under a ShareAlike 3.0 (not 4.0) license, so I explicitly called it out and added a link to the GitHub repo.
@@ -225,6 +163,8 @@ indexing.executors=0
kafka.zk={{ zookeeper_quorum }}
kafka.broker={{ kafka_brokers }}
kafka.start=WHERE_I_LEFT_OFF
##### GEO #####
geo.hdfs.file={{ geoip_hdfs_file }}
Why is the geo IP data file in HDFS configured in flux rather than zookeeper?
Shouldn't be there at all, deleted it.
@@ -81,7 +81,6 @@ public JSONObject enrich(CacheKey value) {
Map<String, Object> globalConfig = value.getConfig().getConfiguration();
Map<String, Object> sensorConfig = value.getConfig().getEnrichment().getConfig();
if(handler == null) {
_LOG.trace("Stellar ConfigHandler is null.");
Didn't like the trace statements there?
Added back in. Didn't intend to drop them at all.
@@ -161,6 +167,7 @@ protected void initializeStellar() {
stellarContext = new Context.Builder()
  .with(Context.Capabilities.ZOOKEEPER_CLIENT, () -> client)
  .with(Context.Capabilities.GLOBAL_CONFIG, () -> getConfigurations().getGlobalConfig())
  .with(Context.Capabilities.GEO_IP, () -> GeoLiteDatabase.INSTANCE)
Since the INSTANCE is static, is there a reason to not just refer to it directly in the Stellar function rather than adding a capability? I always envisioned capabilities to be for configs which were changing at runtime.
Also- why is this in the GenericEnrichment bolt? Am I mistaken, or wouldn't this be better located in the GEO bolt?
@cestella Capability was left over from earlier testing. Everything referred to the static instance anyway for precisely that reason. Dropped it entirely and removed from the enum.
@dlyle65535 There's not a geo specific bolt, it's an instance of GenericEnrichment. To the best of my knowledge (and correct me if I'm wrong), you can't grab global configs in the adapters. This specific one gets dropped, but the prepare() method references the static INSTANCE Casey refers to. Is there somewhere that could live?
@@ -63,12 +65,15 @@ public ThreatIntelJoinBolt(String zookeeperUrl) {
@Override
public void prepare(Map map, TopologyContext topologyContext) {
  super.prepare(map, topologyContext);
  GeoLiteDatabase.INSTANCE.update((String)getConfigurations().getGlobalConfig().get(GeoLiteDatabase.GEO_HDFS_FILE));
This can certainly stand now, but it opens the question as to whether we want to register some easier way to do this than remembering to call the right set of functions in all of the bolts that use stellar as a follow-on JIRA. Just a thought.
I completely agree with this, particularly because this shows up a couple times, but without a particularly good way to avoid it that I saw.
I'm definitely open to suggestions if there's anything we want to do short term to make this a little better or clearer.
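One short-term shape for the follow-on JIRA could be a tiny helper that centralizes the "remember to update geo data in prepare()" step, so each Stellar-using bolt makes one call. This is a sketch only; the helper class, interface, and method names are hypothetical, not anything in the PR:

```java
import java.util.Map;

// Hypothetical helper centralizing the geo-update step each bolt's prepare()
// currently repeats by hand.
class GeoUpdateHelper {
  static final String GEO_HDFS_FILE = "geo.hdfs.file";

  // Abstraction over the thing that actually reloads the database
  // (GeoLiteDatabase.INSTANCE in the PR).
  interface GeoUpdater {
    void update(String hdfsFile);
  }

  // Pull the geo file location out of the global config, if present, and
  // hand it to the updater.
  static void updateFromGlobalConfig(Map<String, Object> globalConfig,
                                     GeoUpdater updater) {
    Object loc = globalConfig.get(GEO_HDFS_FILE);
    if (loc != null) {
      updater.update((String) loc);
    }
  }
}
```

Each bolt's prepare() would then reduce to a single `GeoUpdateHelper.updateFromGlobalConfig(...)` call instead of the repeated cast-and-get.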
As a note to anyone coming in late: at least one comment (David's) is still relevant but hidden behind a collapsed, outdated diff, because the line he commented on needed to be deleted. The point he brought up is still in active conversation.
Forgot to comment on it, but the GEO_GET call was updated a bit ago per my comment about lists and fields. That change should also be reviewed and given any feedback.
This is a really nice contribution. Love that we don't need MySQL any longer. This was a ton of work. Thanks, Justin.
@@ -317,6 +316,8 @@ This package installs the Metron Profiler %{metron_home}
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

%changelog
* Thu Jan 19 2017 Justin Leet <justinjleet@gmail.com> - 0.3.0
Unrelated to this PR, but I thought it was un-Apache to put individuals' names and emails in the source code?
I'm not even sure who we should talk to about how it should be handled. I'm more than happy to adjust the changelog as needed, but I need to know how to change it first.
It is true that Apache recommends strongly (or maybe outright forbids) things like author tags in source code (see here for a discussion); the reasoning was mostly around author tags not being accurately representative.
For this situation, however, this is a changelog, so it doesn't have the problem of accurate representation that author tags do. That being said, it is redundant information because such information is stored in git.
I'd recommend doing a dev list discussion with a subject that starts with [MENTORS] on this to see what the ASF wants.
I'd just yank that section or leave it blank if it's required. No point in figuring out how to keep it if we're agreed we don't need it.
+1 to that @dlyle65535
I sent out the email before I saw this, but I'm +1 on just dropping it. I'll pull it in this PR, and whatever answer we get from the mentors will just be for our info.
But so we're clear- with the changelog in place you can interrogate the rpm directly to find out what's changed "rpm -q --changelog". We'd lose that capability. I'm still good with omitting it, but I wanted to make sure everyone was aware of that consequence.
Hmm, you're right @dlyle65535, we do lose some capabilities by not having the changelog in there. You know what would be cool: if we could generate the spec files as part of Maven's generate-sources phase and fill in the changelog with the information from git. Anyway, a best-of-both-worlds kind of thing, but not effort that should be done as part of this PR.
I think that's where I'm at with it. How about we let @justinleet's mentor thread confirm/deny that it's okay to keep it and then handle it as a directed effort after?
sounds good :)
@@ -586,7 +586,6 @@ include $RULE_PATH/community.rules
# include $RULE_PATH/malware-tools.rules
# include $RULE_PATH/misc.rules
# include $RULE_PATH/multimedia.rules
# include $RULE_PATH/mysql.rules
I don't think that we actually want to remove the MySQL rules from Snort. These rules help Snort create alerts based on MySQL databases that may be running in the protected network.
Good catch. I'll add it back in.
The shell script `$METRON_HOME/bin/geo_enrichment_load.sh` will retrieve MaxMind GeoLite2 data, load it into HDFS, and update the configuration.

THIS SCRIPT WILL NOT UPDATE AMBARI'S GLOBAL.JSON, JUST THE ZK CONFIGS. CHANGES WILL GO INTO EFFECT, BUT WILL NOT PERSIST PAST AN AMBARI RESTART UNTIL UPDATED THERE.
This seems like something that could cause a really annoying problem for a user. Is there no simple action we can take in this PR to stop Ambari from clobbering things?
Obviously, deploying/updating the geo database in an Ambari "service action" would be ideal. Would be nice to have one way to do it that does not break on restart.
Or maybe we need to accept this for now and once we have Ansible deployments using the MPack, we can implement an Ambari "service action" for this?
It's annoying, for sure. It's a persistent problem and one of the reasons for that discuss thread about how to handle configurations. I believe the conclusion arrived at there is that in the future state, we should push changes through ambari which will then update zookeeper, thereby avoiding this unfortunate state.
This gets into the whole "how do we manage configs?" discussion. Unfortunately, it's in a really awkward spot. I might be able to add a service action to sorta take care of it, but it still probably does the same end-around of Ambari's management, except going through the UI. I don't know that there's a good solution to this until we unify our config management, which is the real answer here.
Given that I don't think there was originally support for updating the db, I'm inclined to clean it up as part of unifying config management. It's ugly and I don't like it, but I can't come up with a good short-term alternative that isn't ugly for other reasons anyway.
return null;
}
if(args.size() > 2) {
  return null;
Should we throw an exception? If a user misuses the function, then let's tell them.
Is there a well defined expectation for what happens in cases like this? I don't think I've seen anything, and on reflection I'm pretty sure I just based that off what other functions did.
@cestella Can you shed some light on what perceptions/expectations exist for this case? Or shame my reading comprehension by pointing me to where we have them documented, if we do?
So it is true that we do not tend to throw exceptions on some errors in Stellar functions for reasons that may or may not be valid. The original intent was to not stop topologies because an error has happened in a specific function. I happen to think that this isn't correct reasoning anymore, but that's a discussion for another thread.
In this specific case, that the user has passed in too many arguments, I think you could make a reasonable argument for a couple of different semantics:
- do nothing, passing in too many arguments does no harm
- throw an exception as Nick suggests
- return null
I actually think I agree in this case with Nick, I'd rather err on the side of caution and throw an exception here so we don't have people using functions wrong and not realizing it.
Now throwing an IllegalArgumentException and adjusted the test.
It is a little tricky because we don't want a data quality issue to impact the system negatively, but at the same time failing fast can make the user experience much better.
I have been thinking about it this way. If it is a condition that can be triggered by a data quality issue, then try to be as forgiving as possible. If it is clearly a 'static' condition, like I passed the wrong number of arguments, then it is appropriate to fail-fast.
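The fail-fast rule discussed above — throw on a "static" misuse like too many arguments, stay forgiving on data-quality issues — can be sketched roughly like this (class and method names are hypothetical, not the PR's actual validation code):

```java
import java.util.List;

class GeoGetArgCheck {
  // A wrong argument count is a "static" condition: the call site itself is
  // wrong, so fail fast with an exception rather than silently return null.
  static void validateArgs(List<Object> args) {
    if (args == null || args.size() > 2) {
      throw new IllegalArgumentException(
          "GEO_GET takes at most 2 arguments, got: " + args);
    }
    // A null/empty IP inside args, by contrast, would be a data-quality
    // issue and should be handled forgivingly (e.g. by returning null).
  }
}
```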
return addr.isAnyLocalAddress() || addr.isLoopbackAddress()
    || addr.isSiteLocalAddress() || addr.isMulticastAddress()
    || !ipvalidator.isValidInet4Address(ipStr);
}
Method name may be a little misleading. It is more answering the question of whether an IP is eligible for geo-enrichment; not whether it is a local address.
I agree, I'll go ahead and change the method name.
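After the rename, the check reads as the eligibility question Nick describes. A self-contained sketch (the method name is illustrative, and a simple format check stands in for the commons-validator `isValidInet4Address` call in the real class):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class GeoEligibility {
  // Answers "is this IP ineligible for geo enrichment?" rather than the
  // misleading "is this a local address?".
  static boolean isIneligibleForGeoEnrichment(String ipStr) {
    // Stand-in for ipvalidator.isValidInet4Address(ipStr).
    if (ipStr == null || !ipStr.matches("(\\d{1,3}\\.){3}\\d{1,3}")) {
      return true;
    }
    try {
      InetAddress addr = InetAddress.getByName(ipStr);
      // Local, loopback, site-local, and multicast addresses have no
      // meaningful geo location.
      return addr.isAnyLocalAddress() || addr.isLoopbackAddress()
          || addr.isSiteLocalAddress() || addr.isMulticastAddress();
    } catch (UnknownHostException e) {
      return true;
    }
  }
}
```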
"fields - Optional list of GeoIP fields to grab. Options are locID, country, city, postalCode, dmaCode, latitude, longitude, location_point" | ||
} | ||
,returns = "If a Single field is requested a string of the field, If multiple fields a map of string of the fields, and null otherwise" | ||
) |
Seems useful. Would we ever deprecate the current way of doing enrichment in favor of just using Stellar and a function like this? May take some more work, but that might simplify things.
(Comment for future state; obviously outside the scope of your PR)
Outside of the scope of this PR, I agree, but I just wanted to chime in and agree with the sentiment. Now that we have geo IP enrichment, honestly, stellar enrichments aren't missing any features that the other enrichments have. Worthy of a discussion, for sure.
Do we want to kick off a discuss thread for this? Seems like we might as well start a discussion, rather than potentially having it buried here. And the discussion is useful even before everyone's set with this PR.
Sure. Will do.
@@ -149,9 +154,10 @@ public JSONObject load(CacheKey key) throws Exception {
cache = CacheBuilder.newBuilder().maximumSize(maxCacheSize)
  .expireAfterWrite(maxTimeRetain, TimeUnit.MINUTES)
  .build(loader);
GeoLiteDatabase.INSTANCE.update((String)getConfigurations().getGlobalConfig().get(GeoLiteDatabase.GEO_HDFS_FILE));
boolean success = adapter.initializeAdapter();
Does this mean that we will attempt to load the geo database into memory in every GenericEnrichmentBolt, even ones not doing geo-enrichment? Maybe that is one reason we have a singleton for the GeoLiteDatabase: to avoid repeated initialization?
Along the same lines, if I choose to not do geo-enrichment, do I still need to have a valid MaxMind file in HDFS? Or would that cause all GenericEnrichmentBolts to fail to initialize?
Would it make sense to update the EnrichmentAdapter interface to accept configuration values, so that this initialization occurs in the GeoAdapter where it makes more sense? Then only the bolts doing geo-enrichment attempt to initialize the geo data?
You're right that if we adjust the enrichment adapter to accept configuration values, this can be pushed to the GeoAdapter, where I agree it makes more sense.
Are there concerns over changing the interface method directly, or would we prefer to give EnrichmentAdapter an initializeAdapter(Map<String, Object> config) with a default impl that calls the no-arg initializeAdapter()? The second option makes it a bit ugly that GeoAdapter still has the original initializeAdapter(), but it has the benefit of keeping the interface backwards compatible for anybody with custom adapters (is this a thing that people have?).
And you are correct that there is an issue with initializing a missing DB (great catch!). If we make this change, I believe the only time you'd have an issue is if you set up a geo enrichment and didn't actually set up the DB, which is an entirely reasonable issue to raise an error on.
@dlyle65535 Would this address your concerns about having the init in the GenericEnrichmentBolt in a satisfactory manner?
@justinleet - I think that's a great solution, thanks!
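The backwards-compatible option described above — a config-accepting `initializeAdapter` with a default implementation — could look roughly like this. The interface and class names are sketches, not the real Metron types:

```java
import java.util.Map;

// Sketch of the default-method option: existing custom adapters that only
// implement the no-arg version keep working unchanged.
interface EnrichmentAdapterSketch {
  boolean initializeAdapter();

  default boolean initializeAdapter(Map<String, Object> config) {
    // Adapters that don't need config (everything but geo) fall through.
    return initializeAdapter();
  }
}

class GeoAdapterSketch implements EnrichmentAdapterSketch {
  @Override
  public boolean initializeAdapter() {
    return false; // no config means no geo database location to load
  }

  @Override
  public boolean initializeAdapter(Map<String, Object> config) {
    // The real adapter would call GeoLiteDatabase.INSTANCE.update(...) with
    // the configured HDFS location here.
    return config.containsKey("geo.hdfs.file");
  }
}
```

With this shape, only the bolt actually doing geo-enrichment attempts to load the geo data, addressing the missing-DB concern.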
import java.util.zip.GZIPInputStream;

public enum GeoLiteDatabase {
  INSTANCE;
Why did you go with a singleton? I'm sure you thought through this and there are good reasons, so just want to understand. I just don't like the extra burden that using a singleton brings on (like the extra locking).
So, I'll defend the singleton use:
- We can reuse the same database across multiple storm tasks on the same storm worker
- Even without a singleton, you'd have to deal with the locking in the stellar function since those are effectively static. i.e. You're reading a database that could be being updated/replaced.
I'll let Justin respond with his reasons, but those are the ones that came to my head.
It's primarily because the database file could be used and accessed by multiple threads of execution, depending on Storm's parallelism. If multiple tasks are running, they could potentially start stomping on each other's data.
The burden shouldn't be bad, because the workload primarily consists of reading. The alternative is having to manage potentially every thread pulling down its own version of the file and loading it while avoiding tripping up other threads.
Having said that, I am definitely open to other ideas if we have good alternatives.
I wanted to look over the locking code one more time.
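The enum-singleton-plus-read/write-lock shape being discussed can be sketched as follows. This is illustrative only — field and method names are assumptions, and the real class reloads a MaxMind DatabaseReader where this just swaps a string:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// One shared database per JVM: many reader threads proceed concurrently,
// while a rare update takes the write lock and swaps the underlying data.
enum GeoDatabaseSketch {
  INSTANCE;

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private volatile String loadedFile = null;

  // synchronized so concurrent config updates can't race each other;
  // called only when the global config changes, so contention is rare.
  public synchronized void updateIfNecessary(String hdfsFile) {
    // Always update if we don't have data yet, or the location changed.
    if (loadedFile == null || !loadedFile.equals(hdfsFile)) {
      Lock write = lock.writeLock();
      write.lock();
      try {
        loadedFile = hdfsFile; // real code reloads the DatabaseReader here
      } finally {
        write.unlock();
      }
    }
  }

  public String currentFile() {
    Lock read = lock.readLock();
    read.lock();
    try {
      return loadedFile;
    } finally {
      read.unlock();
    }
  }
}
```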
try {
  readLock.lock();
  addr = InetAddress.getByName(ip);
  CityResponse cityResponse = reader.city(addr);
Would it make sense to constrain the scope of the read lock? Once we have a CityResponse object we don't need the lock any longer.
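The suggested narrowing would look roughly like this: hold the read lock only for the call that touches the shared reader, then assemble the result outside it. A sketch with a stub standing in for MaxMind's DatabaseReader:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class NarrowReadLockSketch {
  // Stub standing in for MaxMind's DatabaseReader.city(InetAddress).
  static class ReaderStub {
    String city(String ip) { return "Chicago"; }
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final ReaderStub reader = new ReaderStub();

  public String getCity(String ip) {
    String cityResponse;
    lock.readLock().lock();
    try {
      // Only the lookup against the shared reader needs the lock...
      cityResponse = reader.city(ip);
    } finally {
      lock.readLock().unlock();
    }
    // ...building the result map from the response can happen afterwards,
    // shrinking the window during which updates are blocked.
    return cityResponse.toUpperCase();
  }
}
```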
}

// Always update if we don't have a DatabaseReader
if (reader == null || !hdfsLoc.equals(hdfsFile)) {
Does this null check need to be protected by a lock? I am really not sure.
updateIfNecessary is actually synchronized, so it's already locked. Given that updates are rare and this only runs when the global config is updated, the impact will be very low.
Updated to push the initialization into the adapter. This removes the direct dependency on GeoLiteDatabase from GenericEnrichmentBolt. I'll want to spin it up again to further validate, but this addresses @nickwallen's catch on topologies not using geo needing geo data to exist and @dlyle65535's concern about the init living in GenericEnrichmentBolt.
Looks great. +1
I ran it up on "Quick Dev" and geo-enrichment was working as expected.
+1 as well, great job, Justin.
Ditto +1. Thanks for all the hard work!
@cestella @dlyle65535 @nickwallen I added one more change.
+1 looks good
Still +1 @justinleet - thanks for checking.
reaffirmed +1
MySQL Removed
Drops MySQL entirely from the project. This is done for a couple of reasons outlined in a discussion thread on the dev lists. They boil down to a combination of licensing, eliminating a single point of failure, using MaxMind's libraries for handling GeoLite2 data, and a couple of other concerns.
This PR includes dependencies, installation paths, READMEs, etc. The only places MySQL is left are MySQLConfig and MySQLConfigTest, in case anyone wants to use them. The vast majority of removed files / code are simply from stripping this out. If any traces of MySQL outside of this are found in review, they should almost certainly be removed.
This moves to a system based on using MaxMind's binary DB directly, per their page.
Our LICENSE file has been updated with the notification that we use their database (and some of their test data, which is CC ShareAlike 3.0). Both of these licenses are acceptable for us, as stated on Apache Legal.
Let me know if that notification should be in a different spot or spots, and I can adjust it appropriately.
GeoLite2 Database
The main portion of the PR is in `GeoLiteDatabase.java`, which manages access to the GeoLite2 database.
Raw Database
The raw database is stored on HDFS. By default, it will be in `/apps/metron/geo`. If no explicit location is given, `/apps/metron/geo/default/<db_filename>` will be used. Otherwise, updates will use `/apps/metron/geo/<millis>/<db_filename>`. Given the low rate of churn on the DB (updated once per week) and the potential for replay use cases, I haven't implemented any pruning or anything fancy on top of this.
Updating DB
A script is provided for updates, `geo_enrichment_load.sh`, in `metron-data-management`. Usage details are provided in `metron-data-management/README.md`. Note that the original didn't appear to have update capabilities.
The script will pull down a new instance of the GeoLite2 database. This location can be either their standard web address (or somewhere else hosted), or even a file:// URL. Once the db file is pulled down, it will push to the appropriate HDFS location. Finally, it will pull down and update the global config with the new location. This will not require a topology restart.
Note that there have been conversations about how we manage config updates (specifically leaning towards Ambari). This has not been finalized, and we have two non-Ambari testing environments (quickdev and docker-metron), so this just hits ZK. Ambari is not updated based on this script, and it is the user's responsibility to update global.json.
This leads to a question people may have preferences on.
Code
It is a singleton that allows the database to be updated when the global config is updated. It is (hopefully!) correctly locked to avoid threading issues when updating or reading from the DB (and I've been able to update without issues).
The various Bolts have been updated to make sure they initialize the adapter to have it grab the current data appropriately.
In addition, a Stellar function, GEO_GET(), has been provided, which takes an IPv4 address. It probably works with an IPv6 address, but I didn't really dig into it, given that the goal was initially to match parity.
Given the somewhat core nature of this, and my relative unfamiliarity going in with how all these pieces tie together, I'm definitely looking for feedback on how things are implemented, or if I missed conventions we've used in the code.
Testing
Unit testing is added for the database and Stellar portions of the code as needed. The DB testing uses one of MaxMind's test DBs that they've published, because we can't create the binary format correctly ourselves. It does not use the full (20+ MB) version of the data, but rather a stripped-down version (on the order of several KB).
Three environments were tested during this. Having these three disparate environments makes features that cut across like this more complicated to test, so additional scrutiny would be merited (I would definitely like at least one person to run through one of these themselves and make sure it's transparent). Notably, quickdev requires the Ansible setup scripts to align; the mpack requires layout, internal configuration, and handling of additional files and ownership of scripts to work properly; and docker-metron requires essentially cheating the scripts and just running a wget on the file because things aren't actually set up.
QuickDev
Ansible scripts are updated. Running data through topologies kicked out the data.
Ambari Management Pack
RPMs updated where needed. Config Screen layout changed, updates made to properly handle configs and ownership. Ran Stellar on this install. Again, ran data through the topology.
Docker
Essentially this just involved cheating the scripts and running a wget on the GeoLite2 dbfile, because there's no Hadoop. Ran through the instructions to run the topologies (which are a little different than the others because Docker) and again was able to get data back out.
Additional Notes
Increased `withMaxTimeMS` in the indexing integration test. This seems unrelated to my changes (and I believe had been seen elsewhere), so if anybody has found the root cause, I can adjust my code appropriately. Outside issue (and the code is reverted in this PR), see: https://issues.apache.org/jira/browse/METRON-672.
The new data includes an `is_anonymous_proxy` key, etc. I didn't validate if everything new is in the free version of the dataset.