This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-2149: Shaded jar classifier is not consistent #1436

Closed
wants to merge 9 commits into from


merrimanr
Contributor

@merrimanr merrimanr commented Jun 3, 2019

Contributor Comments

This PR updates the appropriate Maven pom.xml files to use a classifier when building shaded jars. After this change all shaded jars will follow the pattern ${project.artifactId}-${project.version}-uber.jar whereas before some jars would follow this pattern while others would follow ${project.artifactId}-${project.version}.jar.
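
As a rough illustration of the two naming patterns described above, the sketch below builds a jar file name with and without a classifier. The helper name `shaded_jar_name` and the version `1.0.0` are hypothetical placeholders. In a pom the classifier is usually set through the shade plugin's `shadedArtifactAttached` and `shadedClassifierName` parameters; that this PR uses exactly those settings is an assumption.

```shell
# Hypothetical helper: build the file name of a shaded jar.
# With an empty or missing classifier it yields the old, suffix-free name.
shaded_jar_name() {
  artifact_id="$1"
  version="$2"
  classifier="${3:-}"
  if [ -n "$classifier" ]; then
    printf '%s-%s-%s.jar\n' "$artifact_id" "$version" "$classifier"
  else
    printf '%s-%s.jar\n' "$artifact_id" "$version"
  fi
}

# The version is a placeholder, not taken from the PR.
shaded_jar_name metron-common 1.0.0 uber
shaded_jar_name metron-common 1.0.0
```

A helper like this could be useful when updating scripts and RPM spec files, where the jar name is otherwise repeated by hand.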

I believe this will provide a substantial benefit to the project because it will alleviate 2 significant issues. When a module that uses the shaded plugin without a classifier is added to another module as a dependency:

  1. Any Maven excludes added to that dependency are ignored
  2. The Maven dependency:tree tool does not accurately report the transitive dependencies pulled in by that dependency

These issues make it extremely difficult to troubleshoot and resolve classpath version problems.
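
One way to investigate the second issue is to ask Maven where a single library enters a module's dependency tree and compare that with what actually lands in the jar. The sketch below only prints the command to run. The helper name `dependency_tree_cmd` is hypothetical, and guava is used purely as an example library.

```shell
# Hypothetical helper: print the Maven command that reports where one
# library (selected by groupId) appears in a module's dependency tree.
dependency_tree_cmd() {
  module="$1"
  group_id="$2"
  printf 'mvn -pl %s dependency:tree -Dincludes=%s\n' "$module" "$group_id"
}

# Example: where does guava come from in metron-enrichment-common?
dependency_tree_cmd \
  metron-platform/metron-enrichment/metron-enrichment-common \
  com.google.guava
```

Run the printed command from the root of the repository so that `-pl` can resolve the module path.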

Changes Included

  • Appropriate modules were updated to use a classifier
  • Scripts and RPM spec files were updated with the new jar names
  • New classpath version issues introduced with this change were fixed
  • Minor formatting issues were fixed in a couple pom files
  • All stellar-common maven dependencies were changed to provided
  • Various scripts were updated to add stellar-common to the classpath separately
  • The metron-data-management pom file needed a plugin to include stellar as an external dependency

Testing

Since this PR effectively changes the contents and ordering of dependencies in our jars, exhaustive regression testing is needed. I have started this and will update these instructions as I think of more tests and get further along. Please feel free to add other tests. These instructions have all been performed on full dev.

Smoke Test

This is a quick test to verify things are working normally:

  • Make sure sensor data flows to the Alerts UI
  • Make sure there are no errors in Storm or the error ES Index

Typosquat Test

The use case is detailed here.

I had to perform these extra steps:

  • Install wget with yum install -y wget
  • Increase the OBJECT_GET cache size in the global config:
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "es.clustername": "metron",
  "es.ip": "node1:9200",
  "es.date.format": "yyyy.MM.dd.HH",
  "parser.error.topic": "indexing",
  "update.hbase.table": "metron_update",
  "update.hbase.cf": "t",
  "es.client.settings": {},
  "profiler.client.period.duration": "15",
  "profiler.client.period.duration.units": "MINUTES",
  "enrichment.list.hbase.provider.impl": "org.apache.metron.hbase.HTableProvider",
  "enrichment.list.hbase.table": "enrichment_list",
  "enrichment.list.hbase.cf": "t",
  "user.settings.hbase.table": "user_settings",
  "user.settings.hbase.cf": "cf",
  "bootstrap.servers": "node1:6667",
  "source.type.field": "source:type",
  "threat.triage.score.field": "threat:triage:score",
  "enrichment.writer.batchSize": "15",
  "enrichment.writer.batchTimeout": "0",
  "profiler.writer.batchSize": "15",
  "profiler.writer.batchTimeout": "0",
  "geo.hdfs.file": "/apps/metron/geo/default/GeoLite2-City.tar.gz",
  "asn.hdfs.file": "/apps/metron/asn/default/GeoLite2-ASN.tar.gz",
  "object.cache.max.file.size": 10485760
}' 'http://user:password@node1:8082/api/v1/global/config'

REST Test

I issued several requests against REST and verified there were no errors:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: */*' -d '{
  "facetFields": [
    "test"
  ]
}' 'http://user:password@node1:8082/api/v1/alerts/ui/settings'

curl -X GET --header 'Accept: application/json' 'http://user:password@node1:8082/api/v1/alerts/ui/settings'

curl -X GET --header 'Accept: application/json' 'http://user:password@node1:8082/api/v1/global/config'

curl -X GET --header 'Accept: application/json' 'http://user:password@node1:8082/api/v1/hdfs/list?path=%2Fapps%2Fmetron'

curl -X GET --header 'Accept: text/plain' 'http://user:password@node1:8082/api/v1/kafka/topic/indexing/sample'

curl -X GET --header 'Accept: application/json' 'http://user:password@node1:8082/api/v1/pcap?state=RUNNING'

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "from": 0,
  "indices": [
    "bro"
  ],
  "query": "*",
  "size": 5
}' 'http://user:password@node1:8082/api/v1/search/search'

curl -X GET --header 'Accept: application/json' 'http://user:password@node1:8082/api/v1/sensor/enrichment/config/bro'

curl -X GET --header 'Accept: application/json' 'http://user:password@node1:8082/api/v1/sensor/parser/config/bro'

curl -X GET --header 'Accept: application/json' 'http://user:password@node1:8082/api/v1/sensor/indexing/config/bro'

curl -X GET --header 'Accept: application/json' 'http://user:password@node1:8082/api/v1/storm'

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "sampleData": { "field": "value" },
  "sensorParserConfig": {
    "fieldTransformations": [
      {
        "config": { 
          "output": "TO_UPPER(field)"
        },
        "input": [ "field" ],
        "output": ["output"],
        "transformation": "STELLAR"
      }
    ]
  }
}' 'http://user:password@node1:8082/api/v1/stellar/apply/transformations'

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "comment": "test",
  "guid": "6a3c4270-fbc1-4ad6-b168-40128a9f0060",
  "sensorType": "bro"
}' 'http://user:password@node1:8082/api/v1/update/add/comment'
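
The GET requests above can also be checked in a loop rather than by eye. This is a minimal sketch, assuming that any 2xx status counts as "no errors". The function names `is_success_status` and `check_endpoints` are hypothetical, and the base URL reuses the placeholder credentials and host from the requests above.

```shell
# Base URL with the same placeholder credentials and host as above.
BASE_URL='http://user:password@node1:8082/api/v1'

# Succeeds only for 2xx HTTP status codes.
is_success_status() {
  case "$1" in
    2??) return 0 ;;
    *)   return 1 ;;
  esac
}

# Issue a GET for each path argument and report every non-2xx response.
# Returns non-zero if at least one request failed.
check_endpoints() {
  failures=0
  for path in "$@"; do
    status=$(curl -s -o /dev/null -w '%{http_code}' \
      --header 'Accept: application/json' "$BASE_URL$path")
    if ! is_success_status "$status"; then
      echo "FAILED ($status): $path" >&2
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}

# Example invocation (needs a running full dev environment):
# check_endpoints /global/config /alerts/ui/settings /storm
```

A status check only shows that the endpoint responded. The response bodies and the REST logs still need a look for classpath-related errors.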

Profiler Test

I followed the instructions here and verified everything worked correctly.

Zeppelin Test

I followed the instructions here and verified everything worked correctly.

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks has been executed in the root metron folder via:

    mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and/or integration tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, run the following commands and then verify the changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    
  • Have you ensured that any documentation diagrams have been updated, along with their source files, using draw.io? See Metron Development Guidelines for instructions.

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.

@mmiklavc
Contributor

mmiklavc commented Jun 3, 2019

I did a crude grep through our Stellar function classes looking for non-java, non-metron dependencies, and this is what I came up with. This is a hit-list of functions that are probably worth a test of some sort or another to verify we haven't broken anything. It might be worth adding a more robust automated test at some point that we can spin up as a topology and include in release validation.

EDIT - even this list is not sufficient because in instances of more complex functions we don't have a direct dependency on the 3rd party jar. E.g. HLLP uses helper classes, and those don't show up here.

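# Print the imports of every class that mentions @Stellar, skipping import
# lines that contain " java" or " org.apache.metron".
for file in $(grep -lR --include \*.java "@Stellar" .)
do
    if grep "^import" "$file" | grep -v " java\| org.apache.metron";
    then
        echo "    $file matches"
        echo "====================="
    fi
done > /tmp/matchlist.txt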


import org.apache.commons.validator.routines.LongValidator;
    ./metron-platform/metron-common/src/main/java/org/apache/metron/common/field/validation/primitive/IntegerValidation.java matches
=====================
import org.apache.commons.validator.routines.EmailValidator;
    ./metron-platform/metron-common/src/main/java/org/apache/metron/common/field/validation/network/EmailValidation.java matches
=====================
import org.apache.commons.validator.routines.DomainValidator;
    ./metron-platform/metron-common/src/main/java/org/apache/metron/common/field/validation/network/DomainValidation.java matches
=====================
import org.apache.commons.validator.routines.InetAddressValidator;
    ./metron-platform/metron-common/src/main/java/org/apache/metron/common/field/validation/network/IPValidation.java matches
=====================
import org.apache.commons.validator.routines.UrlValidator;
    ./metron-platform/metron-common/src/main/java/org/apache/metron/common/field/validation/network/URLValidation.java matches
=====================
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/stellar/GeoEnrichmentFunctions.java matches
=====================
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/stellar/AsnEnrichmentFunctions.java matches
=====================
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/stellar/ObjectGet.java matches
=====================
import ch.hsr.geohash.WGS84Point;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/stellar/GeoHashFunctions.java matches
=====================
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/stellar/SimpleHBaseEnrichmentFunctions.java matches
=====================
import com.fasterxml.jackson.core.JsonProcessingException;
import org.apache.curator.framework.CuratorFramework;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-management/src/main/java/org/apache/metron/management/ConfigurationFunctions.java matches
=====================
import com.jakewharton.fliptables.FlipTable;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-management/src/main/java/org/apache/metron/management/FileSystemFunctions.java matches
=====================
import com.jakewharton.fliptables.FlipTable;
import oi.thekraken.grok.api.Grok;
import oi.thekraken.grok.api.Match;
import oi.thekraken.grok.api.exception.GrokException;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-management/src/main/java/org/apache/metron/management/GrokFunctions.java matches
=====================
import org.apache.commons.lang3.exception.ExceptionUtils;
import org.apache.curator.framework.CuratorFramework;
    ./metron-platform/metron-management/src/main/java/org/apache/metron/management/ParserFunctions.java matches
=====================
import com.fasterxml.jackson.core.JsonProcessingException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-management/src/main/java/org/apache/metron/management/IndexingConfigFunctions.java matches
=====================
import com.fasterxml.jackson.core.JsonProcessingException;
import com.jakewharton.fliptables.FlipTable;
import org.apache.commons.collections4.ListUtils;
import org.apache.commons.lang3.ClassUtils;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-management/src/main/java/org/apache/metron/management/ThreatTriageFunctions.java matches
=====================
import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.lang3.ClassUtils;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-management/src/main/java/org/apache/metron/management/KafkaFunctions.java matches
=====================
import com.fasterxml.jackson.core.JsonProcessingException;
import com.jakewharton.fliptables.FlipTable;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-management/src/main/java/org/apache/metron/management/ParserConfigFunctions.java matches
=====================
import com.fasterxml.jackson.core.JsonProcessingException;
import com.jakewharton.fliptables.FlipTable;
import org.slf4j.LoggerFactory;
    ./metron-platform/metron-management/src/main/java/org/apache/metron/management/EnrichmentConfigFunctions.java matches
=====================
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
    ./metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/stellar/WindowLookback.java matches
=====================
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/stellar/GetProfile.java matches
=====================
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/stellar/VerboseProfile.java matches
=====================
import org.apache.commons.lang3.ClassUtils;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.slf4j.LoggerFactory;
    ./metron-analytics/metron-profiler-repl/src/main/java/org/apache/metron/profiler/repl/ProfilerFunctions.java matches
=====================
import com.google.common.collect.ImmutableList;
    ./metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/StellarStatisticsFunctions.java matches
=====================
import com.codahale.metrics.Reservoir;
    ./metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/sampling/SamplingOpsFunctions.java matches
=====================
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import org.apache.curator.framework.CuratorFramework;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-analytics/metron-maas-common/src/main/java/org/apache/metron/maas/functions/MaaSFunctions.java matches
=====================
import static org.hamcrest.CoreMatchers.equalTo;
import static org.hamcrest.CoreMatchers.instanceOf;
import static org.junit.Assert.assertThat;
import com.google.common.base.Joiner;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;
import org.apache.commons.lang3.StringUtils;
import org.junit.Assert;
import org.junit.Ignore;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.ExpectedException;
    ./metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/BasicStellarTest.java matches
=====================
import static org.hamcrest.CoreMatchers.equalTo;
import static org.junit.Assert.assertThat;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.ExpectedException;
    ./metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/resolver/BaseFunctionResolverTest.java matches
=====================
import com.google.common.collect.Lists;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
    ./metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/resolver/SimpleFunctionResolverTest.java matches
=====================
import com.google.common.collect.Iterables;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/DataStructureFunctions.java matches
=====================
import com.google.common.collect.Lists;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/FunctionalFunctions.java matches
=====================
import org.apache.commons.lang.StringUtils;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/EncodingFunctions.java matches
=====================
import org.apache.commons.lang3.StringUtils;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/RegExFunctions.java matches
=====================
import com.github.benmanes.caffeine.cache.CacheLoader;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/DateFunctions.java matches
=====================
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/RestFunctions.java matches
=====================
import com.google.common.collect.Iterables;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/SetFunctions.java matches
=====================
import com.google.common.collect.Iterables;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/MapFunctions.java matches
=====================
import org.apache.commons.codec.EncoderException;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/HashFunctions.java matches
=====================
import com.jakewharton.fliptables.FlipTable;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.text.WordUtils;
import org.jboss.aesh.console.Console;
import org.slf4j.LoggerFactory;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/ShellFunctions.java matches
=====================
import com.google.common.collect.Iterables;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/OrdinalFunctions.java matches
=====================
import com.google.common.collect.ImmutableList;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.text.similarity.FuzzyScore;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java matches
=====================
import com.fasterxml.jackson.core.JsonProcessingException;
import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import com.google.common.collect.Iterables;
import org.apache.commons.lang3.StringUtils;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/StringFunctions.java matches
=====================
import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import com.google.common.collect.Iterables;
import com.google.common.net.InternetDomainName;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.net.util.SubnetUtils;
    ./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/NetworkFunctions.java matches
=====================

Just the files without the import context (there are currently 43 that I've found)

./metron-platform/metron-common/src/main/java/org/apache/metron/common/field/validation/primitive/IntegerValidation.java
./metron-platform/metron-common/src/main/java/org/apache/metron/common/field/validation/network/EmailValidation.java
./metron-platform/metron-common/src/main/java/org/apache/metron/common/field/validation/network/DomainValidation.java
./metron-platform/metron-common/src/main/java/org/apache/metron/common/field/validation/network/IPValidation.java
./metron-platform/metron-common/src/main/java/org/apache/metron/common/field/validation/network/URLValidation.java
./metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/stellar/GeoEnrichmentFunctions.java
./metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/stellar/AsnEnrichmentFunctions.java
./metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/stellar/ObjectGet.java
./metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/stellar/GeoHashFunctions.java
./metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/stellar/SimpleHBaseEnrichmentFunctions.java
./metron-platform/metron-management/src/main/java/org/apache/metron/management/ConfigurationFunctions.java
./metron-platform/metron-management/src/main/java/org/apache/metron/management/FileSystemFunctions.java
./metron-platform/metron-management/src/main/java/org/apache/metron/management/GrokFunctions.java
./metron-platform/metron-management/src/main/java/org/apache/metron/management/ParserFunctions.java
./metron-platform/metron-management/src/main/java/org/apache/metron/management/IndexingConfigFunctions.java
./metron-platform/metron-management/src/main/java/org/apache/metron/management/ThreatTriageFunctions.java
./metron-platform/metron-management/src/main/java/org/apache/metron/management/KafkaFunctions.java
./metron-platform/metron-management/src/main/java/org/apache/metron/management/ParserConfigFunctions.java
./metron-platform/metron-management/src/main/java/org/apache/metron/management/EnrichmentConfigFunctions.java
./metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/stellar/WindowLookback.java
./metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/stellar/GetProfile.java
./metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/stellar/VerboseProfile.java
./metron-analytics/metron-profiler-repl/src/main/java/org/apache/metron/profiler/repl/ProfilerFunctions.java
./metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/StellarStatisticsFunctions.java
./metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/sampling/SamplingOpsFunctions.java
./metron-analytics/metron-maas-common/src/main/java/org/apache/metron/maas/functions/MaaSFunctions.java
./metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/BasicStellarTest.java
./metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/resolver/BaseFunctionResolverTest.java
./metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/resolver/SimpleFunctionResolverTest.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/DataStructureFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/FunctionalFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/EncodingFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/RegExFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/DateFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/RestFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/SetFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/MapFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/HashFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/ShellFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/OrdinalFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/TextFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/StringFunctions.java
./metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/functions/NetworkFunctions.java
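
A file-only list like the one above can be derived from the grep report with a short filter. This is a sketch rather than the command that was actually used. The function name `list_matched_files` is hypothetical, and it assumes the report format produced by the loop earlier in this comment (an indented path followed by the word "matches").

```shell
# Read the grep report on stdin and print only the file paths,
# i.e. the lines of the form "    <path>.java matches".
list_matched_files() {
  sed -n 's/^[[:space:]]*\(.*\.java\) matches$/\1/p'
}

# Example usage against the report file written by the loop:
# list_matched_files < /tmp/matchlist.txt
# list_matched_files < /tmp/matchlist.txt | wc -l
```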

@merrimanr
Contributor Author

As I worked through resolving the final failing test I realized there is a use case that is not properly handled by the original changes in this PR. There are 2 versions of guava required by different classes (Stellar and HBase testing utility) so we need a way to relocate one of them. It's not possible to do this without depending on a shaded module because transitive dependencies have already been resolved (meaning only 1 version remains) by the time the final shaded jar is built. Relocating at this point just relocates the single remaining version.
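
To check which copy of a library a shaded jar contains, the jar listing can be counted by package path. The sketch below is one possible way to do that. The function name `count_classes_under`, the jar name, and the relocated package `org/apache/metron/guava/` are hypothetical examples, not values taken from this PR.

```shell
# Read a jar listing on stdin (one entry per line, as printed by `jar tf`
# or `unzip -Z1`) and count the class files under a package path.
# The prefix is used as a regular expression, so keep it to plain paths.
count_classes_under() {
  prefix="$1"
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -c "^${prefix}.*\.class$" || true
}

# Example usage; the jar name and relocated package are hypothetical:
# jar tf metron-common-1.0.0-uber.jar | count_classes_under com/google/common/
# jar tf metron-common-1.0.0-uber.jar | count_classes_under org/apache/metron/guava/
```

If only the relocated path has a non-zero count, the single remaining version was relocated, which is the situation described above.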

At this point I want to summarize my findings and present some options. Here are the requirements I see:

  1. Transitive dependency resolution should be predictable and easy to troubleshoot. Maven configuration settings (excludes, etc) should work as expected.
  2. Versions reported by mvn dependency:tree should match what's included in the uber jar.
  3. There should be a well understood and robust strategy for relocating classes that conflict.

Using classifiers on all modules solves 1 and 2 but does not support 3 as described above. We currently support 3 but not 1 and 2 because lower level modules (like metron-common) do not use a classifier. This means other modules that depend on it inherit relocated classes. The problem is that transitive dependencies from these modules overwrite other dependencies, making it harder to determine which versions end up in the final uber jar.

To get to a point where we can satisfy all 3 requirements above, I can think of a couple options. Both of these options are based on the assumption that most class version conflicts involve Stellar classes. Stellar contains most of our business logic and contains a long list of dependencies, including several that commonly conflict with other projects (guava, log4j, jackson, etc). The idea behind both of these options involves isolating Stellar from the rest of the project. Here they are:

  1. Make Stellar the exception and remove the classifier on stellar-common. This module would be the only one that does this. This satisfies 3 as long as code requiring different class versions is located in this module. This means we may need to move classes into this module (or do this with other modules too). To satisfy 1 and 2 we would need to ensure we are rewriting ALL transitive dependencies or tolerate relocating classes as we run into issues. The advantage of this approach is there would still be a single uber jar so changes to scripts and classpath setup would not change. The disadvantage is there is still the risk of transitive dependencies leaking into the main uber jar.

  2. Deploy Stellar code in a separate jar and add it to classpath after the main uber jar, whatever that is (metron-data-management, metron-enrichment-storm, etc). This satisfies 3 because the separate Stellar jar can contain the relocated classes but other dependencies will not overwrite dependencies of the main uber jar (because it's listed after the main uber jar). 1 and 2 are not a concern when classifiers are used which is the case here. The main disadvantage I see is that there will be work adding this extra jar in the various scripts or startup options and we may have to reorganize some classes.
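
Option 2 depends on classpath order: the JVM searches classpath entries from left to right, so when a class exists in several jars, the copy from the earliest entry is loaded. A minimal sketch of assembling such a classpath follows. The function name `build_classpath`, the jar names, and the version are hypothetical placeholders.

```shell
# Join jar paths into a classpath string, preserving argument order.
# Earlier entries take precedence when the same class is in several jars.
build_classpath() {
  classpath=""
  for jar in "$@"; do
    if [ -z "$classpath" ]; then
      classpath="$jar"
    else
      classpath="$classpath:$jar"
    fi
  done
  printf '%s\n' "$classpath"
}

# Main uber jar first, separate Stellar jar second (placeholder names).
build_classpath \
  metron-enrichment-storm-1.0.0-uber.jar \
  stellar-common-1.0.0-uber.jar
```

Keeping the ordering in one helper would make it harder for an individual script to list the Stellar jar first by mistake.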

I tested both options and was able to get both working for these use cases:

This is all fairly complex so if anything is not clear I can elaborate. Are there other options I'm not thinking of? Thoughts?

@mmiklavc
Contributor

mmiklavc commented Jul 2, 2019

To get to a point where we can satisfy all 3 requirements above, I can think of a couple options. Both of these options are based on the assumption that most class version conflicts involve Stellar classes. Stellar contains most of our business logic and contains a long list of dependencies, including several that commonly conflict with other projects (guava, log4j, jackson, etc). The idea behind both of these options involves isolating Stellar from the rest of the project. Here they are:

  1. Make Stellar the exception and remove the classifier on stellar-common. This module would be the only one that does this. This satisfies 3 as long as code requiring different class versions is located in this module. This means we may need to move classes into this module (or do this with other modules too). To satisfy 1 and 2 we would need to ensure we are rewriting ALL transitive dependencies or tolerate relocating classes as we run into issues. The advantage of this approach is there would still be a single uber jar, so scripts and classpath setup would not need to change. The disadvantage is there is still the risk of transitive dependencies leaking into the main uber jar.
  2. Deploy Stellar code in a separate jar and add it to the classpath after the main uber jar, whatever that is (metron-data-management, metron-enrichment-storm, etc). This satisfies 3 because the separate Stellar jar can contain the relocated classes but other dependencies will not overwrite dependencies of the main uber jar (because it's listed after the main uber jar). 1 and 2 are not a concern when classifiers are used, which is the case here. The main disadvantage I see is that there will be work adding this extra jar in the various scripts or startup options and we may have to reorganize some classes.

This is all fairly complex so if anything is not clear I can elaborate. Are there other options I'm not thinking of? Thoughts?

I really like your 3 proposed main requirements - those sound reasonable to me, with varying degrees of importance depending on how you look at them. I'd say that 3 is the most important in practice, with 1 and 2 supporting that goal.

First big question I have is what about Stellar functions in metron-analytics? As I pointed out in this comment those are not core Stellar functions, but I suspect there's potential trouble there as well.

Other thoughts

Would it make sense to shade and relocate all of the core libs for Stellar and mark all others as scope provided? The litmus test here might be similar to how this is handled in J2EE web applications.

  1. If the web container (analogous to Storm in this case) provides a specific library, mark it as provided.
  2. If we absolutely want/need a conflicting version, then we would shade/relocate it. e.g. com.fasterxml.jackson is relocated as org.apache.metron.jackson.${metron_jackson_version} (credit @nickwallen on his work in the profiler libs with Guava relocation)
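
As a rough illustration of case 2, a shade plugin configuration along these lines would attach the shaded jar under the uber classifier and relocate Jackson. This is a sketch, not the project's actual pom: the execution layout is an assumption, and the relocated package name is simply the one mentioned above.

```xml
<!-- Sketch only: execution layout is assumed, not copied from a Metron pom -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Keep the plain jar as the main artifact and attach the shaded jar as *-uber.jar -->
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>uber</shadedClassifierName>
        <relocations>
          <!-- Rewrite the conflicting library into a Metron-owned package -->
          <relocation>
            <pattern>com.fasterxml.jackson</pattern>
            <shadedPattern>org.apache.metron.jackson.${metron_jackson_version}</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With `shadedArtifactAttached` set, the unshaded jar remains the module's main artifact and the shaded jar is produced alongside it as `${project.artifactId}-${project.version}-uber.jar`.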

The catch with still using an uber jar (ie dep with the "uber" jar classifier) is that modules depending on it would still need to exclude the transitive dependencies that were relocated. The short of it being that the uber jar would have the relocated packages (e.g. org.apache.metron.jackson), but the transitive dependency still carries through as com.fasterxml.jackson when our uber jar is referenced as a dep. This is effectively very very similar to what you outlined and solved in approach 2 with the separate uber jar deploy, but should also cover the broader project classpath for all cases.
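
To make the catch concrete, a downstream module would need something like the following. This is a hypothetical sketch: the version property and the choice of jackson-databind as the excluded artifact are assumptions for illustration only.

```xml
<!-- Hypothetical downstream dependency on the uber jar -->
<dependency>
  <groupId>org.apache.metron</groupId>
  <artifactId>stellar-common</artifactId>
  <!-- Assumed version property; the real poms may use a different one -->
  <version>${project.parent.version}</version>
  <classifier>uber</classifier>
  <exclusions>
    <!-- The uber jar already carries the relocated copy, so keep the
         original coordinates from coming through transitively -->
    <exclusion>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Every relocated library would need a matching exclusion in each module that depends on the uber jar, which is the maintenance cost described above.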

We spent multiple hours working on this together offline, so I won't rehash all of that here - you've done a reasonable job of summarizing a lot of the main bits. Thanks @merrimanr!

@merrimanr
Contributor Author

Thanks for the feedback. A module that includes Stellar functions is not necessarily a problem. The reason I focused on stellar-common is because it contains a substantial amount of custom code with specific dependency versions that often do not line up with versions required by other 3rd party dependencies (HBase, etc). Moving all Stellar functions to stellar-common is an option but I don't think it's necessary right now. We may want to do this for some classes but it can be done on a case-by-case basis as conflicts arise. I think the important thing is to isolate stellar-common now since we know there are several conflicts and have a strategy in place for when conflicts in other modules happen. The more extreme option would be to move all Stellar code to their own separate modules but I think that's overkill.

I agree with your approach of relocating core libraries and marking others as provided where appropriate. That makes sense to me. For the case where a module depends on another module that uses the shade plugin and uber classifier, would we just mark that dependency as provided? For example, if a module depends on stellar-common, stellar-common would be set to provided. This way we wouldn't need to exclude dependencies that are rewritten. This would also mean we need to go with the option of adding the uber jars to the classpath separately (option 2 above) but I think that is the correct approach anyways.
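
A minimal sketch of what that would look like in a dependent module's pom, assuming the usual Metron groupId and a version property that may differ from the real poms:

```xml
<!-- Sketch only: the version property is an assumption -->
<dependency>
  <groupId>org.apache.metron</groupId>
  <artifactId>stellar-common</artifactId>
  <version>${project.parent.version}</version>
  <!-- provided: on the compile and test classpath, but not packaged
       and not passed on as a transitive dependency -->
  <scope>provided</scope>
</dependency>
```

With this scope the module compiles against stellar-common but does not bundle it, so the stellar-common uber jar has to be supplied on the runtime classpath separately.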

@mmiklavc
Contributor

mmiklavc commented Jul 3, 2019

For example, if a module depends on stellar-common, stellar-common would be set to provided. This way we wouldn't need to exclude dependencies that are rewritten. This would also mean we need to go with the option of adding the uber jars to the classpath separately (option 2 above) but I think that is the correct approach anyways.

This sounds sensible to me. I think the challenge with any approach we take here is that it's going to require us to effectively get it 80% of the way done before we find out if there are any gotchas we didn't think of.

@merrimanr
Contributor Author

Due to the extensive testing that will be required for this, I am planning on closing this PR and reopening it against the METRON-2088-support-hdp-3.1 feature branch. These changes should directly help with that effort and take advantage of the testing that will be required.

@merrimanr merrimanr closed this Jul 30, 2019
@merrimanr merrimanr reopened this Aug 15, 2019
@ottobackwards
Contributor

I think you are going to have to look at the Zeppelin Stellar stuff. We have to add the jars to the Zeppelin configuration by hand, etc., so you may break things there.

@merrimanr
Contributor Author

Thanks @ottobackwards. I am developing a test plan right now and will be sure to include this.

@ottobackwards
Contributor

It might just be the instructions or screenshots, but it's worth a look.
We don't test that often.

@merrimanr
Contributor Author

merrimanr commented Aug 19, 2019

The latest commits implement the change I proposed above (option 2). For clarity's sake, here it is again:

Deploy Stellar code in a separate jar and add it to the classpath after the main uber jar, whatever that is (metron-data-management, metron-enrichment-storm, etc). This satisfies 3 because the separate Stellar jar can contain the relocated classes but other dependencies will not overwrite dependencies of the main uber jar (because it's listed after the main uber jar). 1 and 2 are not a concern when classifiers are used, which is the case here. The main disadvantage I see is that there will be work adding this extra jar in the various scripts or startup options and we may have to reorganize some classes.

I have finished some fairly exhaustive testing and will update the original PR description with that test plan next. This approach was fairly straightforward with the exception of an integration test in metron-data-management (the pom.xml file contains a comment with an explanation). I will also update the Changes Included section in the original PR description to include the changes needed for this approach.

@tigerquoll
Contributor

tigerquoll commented Aug 22, 2019

commit 297b3b3cd3c6a2b138724692d95b70494d70341f
mvn clean install
...
Running org.apache.metron.stellar.dsl.functions.RestFunctionsIntegrationTest
2019-08-22 11:08:32 ERROR RestFunctions:271 - Stellar REST request to http://localhost:43408/post expected status code to be one of [200] but failed with http status code 404: 
java.io.IOException: Stellar REST request to http://localhost:43408/post expected status code to be one of [200] but failed with http status code 404: 
...
Failed tests: 
  RestFunctionsIntegrationTest.restGetShouldTimeoutWithSuppliedTimeout:279 expected null, but was:<{get=success}>

@tigerquoll
Contributor

tigerquoll commented Aug 22, 2019

I am getting a lot of flaky test results.

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.metron.profiler.storm.integration.ProfilerIntegrationTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 28.351 sec <<< FAILURE! - in org.apache.metron.profiler.storm.integration.ProfilerIntegrationTest
org.apache.metron.profiler.storm.integration.ProfilerIntegrationTest  Time elapsed: 28.351 sec  <<< FAILURE!
java.lang.AssertionError: Partition [indexing,0] metadata not propagated after 5000 ms
	at org.junit.Assert.fail(Assert.java:88)
	at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:771)
	at kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:812)
	at kafka.utils.TestUtils.waitUntilMetadataIsPropagated(TestUtils.scala)
	at org.apache.metron.integration.components.KafkaComponent.waitUntilMetadataIsPropagated(KafkaComponent.java:298)
	at org.apache.metron.integration.components.KafkaComponent.createTopic(KafkaComponent.java:310)
	at org.apache.metron.integration.components.KafkaComponent.start(KafkaComponent.java:178)
	at org.apache.metron.integration.ComponentRunner.start(ComponentRunner.java:131)
	at org.apache.metron.profiler.storm.integration.ProfilerIntegrationTest.setupBeforeClass(ProfilerIntegrationTest.java:476)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)


Results :

Failed tests: 
  ProfilerIntegrationTest.setupBeforeClass:476 Partition [indexing,0] metadata not propagated after 5000 ms

EDIT -
Some of my unit test flakiness issues were fixed with
#1493

@mmiklavc
Contributor

commit 297b3b3cd3c6a2b138724692d95b70494d70341f
mvn clean install
...
Running org.apache.metron.stellar.dsl.functions.RestFunctionsIntegrationTest
2019-08-22 11:08:32 ERROR RestFunctions:271 - Stellar REST request to http://localhost:43408/post expected status code to be one of [200] but failed with http status code 404: 
java.io.IOException: Stellar REST request to http://localhost:43408/post expected status code to be one of [200] but failed with http status code 404: 
...
Failed tests: 
  RestFunctionsIntegrationTest.restGetShouldTimeoutWithSuppliedTimeout:279 expected null, but was:<{get=success}>

Any diagnostics or further detail on this? We don't appear to be having this failure in Travis, at least in the latest run.

@anandsubbu
Contributor

@MohanDV and I spent some time testing this PR on a multi-node cluster. We were able to run the following and all tests passed:

  1. Regression tests for basic parsers (bro, snort, yaf, squid)
  2. Parser chaining use case
  3. Tests for REST spanning most of the functions.
  4. Command line tests for Stellar REPL
  5. Typosquatting use case
  6. Stellar Zeppelin interpreter functionality
  7. Zeppelin canned notebooks import and execution
  8. Profiler tests
  9. JSON MapQuery parser test
  10. Smoke tests of the Alerts and Management UIs

+1 based on the above.

@mmiklavc
Contributor

@merrimanr How did you get through the instructions for https://github.com/apache/metron/tree/master/use-cases/typosquat_detection? I just ran up to the Stellar Bloom Filter test and am running into a stack trace from the OBJECT_GET cache's maximum file size limit.

[Stellar]>>> BLOOM_EXISTS(OBJECT_GET('/tmp/reference/alexa10k_filter.ser'), 'gogle')
[!] Unable to parse: BLOOM_EXISTS(OBJECT_GET('/tmp/reference/alexa10k_filter.ser'), 'gogle') due to: File at path '/tmp/reference/alexa10k_filter.ser' is larger than the configured max file size of 1048576
org.apache.metron.stellar.dsl.ParseException: Unable to parse: BLOOM_EXISTS(OBJECT_GET('/tmp/reference/alexa10k_filter.ser'), 'gogle') due to: File at path '/tmp/reference/alexa10k_filter.ser' is larger than the configured max file size of 1048576
	at org.apache.metron.stellar.common.BaseStellarProcessor.createException(BaseStellarProcessor.java:166)
	at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:154)
	at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.executeStellar(DefaultStellarShellExecutor.java:407)
	at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.execute(DefaultStellarShellExecutor.java:257)
	at org.apache.metron.stellar.common.shell.cli.StellarShell.execute(StellarShell.java:359)
	at org.jboss.aesh.console.AeshProcess.run(AeshProcess.java:53)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: File at path '/tmp/reference/alexa10k_filter.ser' is larger than the configured max file size of 1048576
	at org.apache.metron.enrichment.cache.ObjectCache$Loader.load(ObjectCache.java:72)
	at org.apache.metron.enrichment.cache.ObjectCache$Loader.load(ObjectCache.java:46)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalLoadingCache.lambda$new$0(BoundedLocalCache.java:3366)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2039)
	at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2037)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2020)
	at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:112)
	at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:67)
	at org.apache.metron.enrichment.cache.ObjectCache.get(ObjectCache.java:82)
	at org.apache.metron.enrichment.stellar.ObjectGet.apply(ObjectGet.java:66)
	at org.apache.metron.stellar.common.StellarCompiler.lambda$exitTransformationFunc$13(StellarCompiler.java:664)
	at org.apache.metron.stellar.common.StellarCompiler$Expression.apply(StellarCompiler.java:259)
	at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:151)
	... 7 more

I took a spin back through #1399 and it looks like the README documentation for the config options, whether local to the Stellar function or global, never made it into a PR. Added a bug to track it here: https://issues.apache.org/jira/browse/METRON-2228

@mmiklavc
Contributor

Note for anyone else testing the typosquat use case: the only way I was able to get this working was to add the following property to global.json.

{
...
 "object.cache.max.file.size" : 5000000
...
}

The file size of the summarized object is larger than the default configured maximum of 1MB.

# hdfs dfs -ls /tmp/reference/alexa10k_filter.ser
-rw-r--r--   1 root hdfs    4710254 2019-08-22 19:16 /tmp/reference/alexa10k_filter.ser

@merrimanr
Contributor Author

Increasing the OBJECT_CACHE size is covered in the Typosquat testing instructions of this PR. There is a curl command that sets it higher than the size of the bloom filter object.

@mmiklavc
Contributor

Based on getting through that issue in the typosquat use case (and confirming we didn't introduce a new regression of some sort on this PR), and the added full range test comfort from @anandsubbu and @MohanDV, I'm giving this my +1.
