This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-627: Add HyperLogLogPlus implementation to Stellar #397

Closed
wants to merge 9 commits

Conversation

mmiklavc
Contributor

@mmiklavc mmiklavc commented Dec 15, 2016

This PR addresses https://issues.apache.org/jira/browse/METRON-627

Leverages the HLLP implementation from https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java

4 new Stellar functions have been added that allow a user to initialize a cardinality estimator, add items, merge estimators, and calculate cardinality estimates.

HLLP_ADD

  • Description: Add value to the HyperLogLogPlus estimator set.
  • Input:
    • hyperLogLogPlus - the hllp estimator to add a value to
    • value+ - value to add to the set. Takes a single item or a list.
  • Returns: The HyperLogLogPlus set with a new value added

HLLP_CARDINALITY

  • Description: Returns HyperLogLogPlus-estimated cardinality for this set.
  • Input:
    • hyperLogLogPlus - the hllp set
  • Returns: Long value representing the cardinality for this set

HLLP_INIT

  • Description: Initializes the HyperLogLogPlus estimator set. p must be a value between 4 and sp, and sp must be greater than 4 and less than 32.
  • Input:
    • p - the precision value for the normal set
    • sp - the precision value for the sparse set. If p is set, but sp is 0 or not specified, the sparse set will be disabled.
  • Returns: A new HyperLogLogPlus set
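
The choice of p trades accuracy against memory: the standard HyperLogLog bound puts the relative error at roughly 1.04/sqrt(2^p). A quick back-of-the-envelope check in Python (this constant is from the HyperLogLog literature, not from the Metron code; the stream-lib implementation may behave differently while its sparse representation is active):

```python
import math

# Rough relative-error bound for a HyperLogLog-family estimator
# with 2^p registers. Illustration only; not the Metron implementation.
def hll_relative_error(p):
    m = 2 ** p  # number of registers
    return 1.04 / math.sqrt(m)

for p in (4, 10, 14):
    print("p=%2d  ~%.2f%% relative error" % (p, 100 * hll_relative_error(p)))
```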

HLLP_MERGE

  • Description: Merge hllp sets together. The resulting estimator is initialized with p and sp precision values from the first provided hllp estimator set.
  • Input:
    • hllp - List of hllp estimators to merge. Takes a single hllp set or a list.
  • Returns: A new merged HyperLogLogPlus estimator set

Note: Added new library to metron-common pom and added 3 new items to dependencies_with_url.csv.

Testing

Stellar REPL

Spun up the Stellar REPL in quick-dev and verified that function composition works as expected, returning correct cardinality estimates for simple sparse-set cases. For example:

[Stellar]>>> HLLP_CARDINALITY(HLLP_MERGE( HLLP_ADD(HLLP_ADD(HLLP_INIT(), "runnings"), "cool"), HLLP_ADD(HLLP_ADD(HLLP_INIT(), "bobsled"), "team")))
4
[Stellar]>>> HLLP_CARDINALITY(HLLP_ADD(null, [1, 2, 3, 4, 4, 5]))
5
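
For sets this small the estimates coincide with exact distinct counts, which makes the REPL output easy to sanity-check; in plain Python terms the two calls above are estimating:

```python
# Exact distinct counts corresponding to the two REPL examples above.
merged = {"runnings", "cool"} | {"bobsled", "team"}
print(len(merged))        # 4, as HLLP_CARDINALITY estimated

values = [1, 2, 3, 4, 4, 5]
print(len(set(values)))   # 5 distinct values
```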

Profiler

Note: In order to fully test this PR with the profiler, the fixes in #420 are required.

Testing in a single-node cluster with Bro will not give you a reasonable distribution of data, so I've ginned up a script to create some. This script replaces the timestamp and destination IP address in a DNS and an HTTP record template. The split of distinct IPs is set to 60%, 20%, 20%.

#!/usr/bin/python
import random
import sys
import time
import datetime

def main():
  freq_s = float(sys.argv[1])
  r1 = '{"dns": {"ts":%TIMESTAMP%,"uid":"CXtY9x4GUiVhHwLLAa","id.orig_h":"192.168.66.1","id.orig_p":5353,"id.resp_h":"%DEST_ADDR%","id.resp_p":5353,"proto":"udp","trans_id":0,"query":"_googlecast._tcp.local","qclass":1,"qclass_name":"C_INTERNET","qtype":12,"qtype_name":"PTR","AA":false,"TC":false,"RD":false,"RA":false,"Z":0,"rejected":false}}'
  r2 = '{"http": {"ts":%TIMESTAMP%,"uid":"C9RfeyXjNU00nvE8i","id.orig_h":"192.168.138.158","id.orig_p":49184,"id.resp_h":"%DEST_ADDR%","id.resp_p":80,"trans_depth":1,"method":"GET","host":"va872g.g90e1h.b8.642b63u.j985a2.v33e.37.pa269cc.e8mfzdgrf7g0.groupprograms.in","uri":"/?285a4d4e4e5a4d4d4649584c5d43064b4745","user_agent":"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)","request_body_len":0,"response_body_len":560,"status_code":200,"status_msg":"OK","tags":[],"resp_fuids":["FvZ5o85OV1E3xiZH"],"resp_mime_types":["text/html"]}}'

  while True:
    # print str(random.gauss(mu, sigma))
    num = random.randint(1,10)
    dest = "224.0.0.25"
    if (num <= 6):
        dest = dest + str(1)
    elif (num <= 8):
        dest = dest + str(2)
    else:
        dest = dest + str(3)

    ts = datetime.datetime.now()
    if (random.randint(1,2) == 1):
        record = r1
    else:
        record = r2
    print record.replace("%DEST_ADDR%", dest).replace("%TIMESTAMP%", ts.strftime("%s.%f"))
    sys.stdout.flush()
    time.sleep(freq_s)

if __name__ == '__main__':
  main()

Run as follows to create a random record every 3 seconds:
python bro-data.py 3 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic bro
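
As a quick sanity check on the 60/20/20 split (not part of the test plan): the randint branching in the script maps draws 1-6, 7-8, and 9-10 to the three destination addresses, and counting over many draws lands near those proportions:

```python
import random
from collections import Counter

random.seed(42)  # reproducible check
counts = Counter()
for _ in range(100000):
    num = random.randint(1, 10)
    if num <= 6:
        counts["224.0.0.251"] += 1
    elif num <= 8:
        counts["224.0.0.252"] += 1
    else:
        counts["224.0.0.253"] += 1

total = float(sum(counts.values()))
for addr in sorted(counts):
    print(addr, round(counts[addr] / total, 3))  # ~0.6, ~0.2, ~0.2
```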

Set up $METRON_HOME/config/profiler.properties

profiler.period.duration=1
profiler.period.duration.units=MINUTES

Also adjust $METRON_HOME/config/zookeeper/global.json to set the capture duration:

"profiler.client.period.duration" : "1",
"profiler.client.period.duration.units" : "MINUTES"

Set up profiler.json with 3 profiles. (Note that if you delete a profile, you will need to restart the topology in order to remove it fully.)

{
    "profiles": [
    {
        "profile": "cardinality",
        "foreach": "ip_src_addr",
        "onlyif": "true",
        "init" : {
            "hllp": "HLLP_INIT()"
        },
        "update": {
            "hllp" : "HLLP_ADD(hllp, ip_dst_addr)"
        },
        "result": "HLLP_CARDINALITY(hllp)"
    },
    {
        "profile": "sketchy_mad_state",
        "foreach": "ip_src_addr",
        "onlyif": "true",
        "init" : {
            "last_cardinality" : "GET_FIRST(PROFILE_GET('cardinality', ip_src_addr, 1, 'MINUTES'))",
            "trailing_outlier_state" : "OUTLIER_MAD_STATE_MERGE(PROFILE_GET('sketchy_mad_state', ip_src_addr, 5, 'MINUTES'))",
            "s": "OUTLIER_MAD_ADD(trailing_outlier_state, last_cardinality)"
        },
        "update": { },
        "result": "s"
    },
    {
        "profile": "sketchy_mad_score",
        "foreach": "ip_src_addr",
        "onlyif": "true",
        "init" : {
            "last_cardinality" : "GET_FIRST(PROFILE_GET('cardinality', ip_src_addr, 1, 'MINUTES'))",
            "trailing_outlier_state" : "OUTLIER_MAD_STATE_MERGE(PROFILE_GET('sketchy_mad_state', ip_src_addr, 5, 'MINUTES'))",
            "score" : "OUTLIER_MAD_SCORE(trailing_outlier_state, last_cardinality)"
        },
        "update": { },
        "result": "score"
    }
    ]
}

The 3 profiles are set up as follows:

  1. Initialize the HLLP set and maintain an HLLP cardinality set per ip_src_addr encountered.
  2. Add the latest cardinality calculation to the MAD state.
  3. Generate a MAD score every tick over a sliding window of the last 5 minutes of hllp scores for this ip_src_addr.
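
OUTLIER_MAD_SCORE is based on the modified z-score built from the median absolute deviation (MAD); a minimal sketch of that idea in Python (an illustration of the math, not the actual Metron implementation):

```python
import statistics

def mad_score(window, value, k=0.6745):
    """Modified z-score of `value` against a window of past observations."""
    med = statistics.median(window)
    mad = statistics.median([abs(x - med) for x in window])
    if mad == 0:
        return 0.0  # degenerate window; real implementations need a fallback scale
    return k * abs(value - med) / mad

# Cardinalities hovering around 3 distinct IPs, then a spike:
window = [2, 2, 3, 4, 4]
print(mad_score(window, 3))   # 0.0 - nothing unusual
print(mad_score(window, 40))  # large score; > 3.5 is a common outlier cutoff
```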

Set up $METRON_HOME/config/zookeeper/enrichments/bro.json

{
  "enrichment" : {
    "fieldMap": {
      "geo": ["ip_dst_addr", "ip_src_addr"],
      "host": ["host"],
      "stellar" : {
          "config" : {
              "profiler_score" : "GET_FIRST(PROFILE_GET( 'sketchy_mad_score', ip_src_addr, 1, 'MINUTES'))",
              "is_alert" : "if profiler_score > 3.5 then true else false"
          }
      }
    }
  },
  "threatIntel": {
    "fieldMap": {
      "hbaseThreatIntel": ["ip_src_addr", "ip_dst_addr"]
    },
    "fieldToTypeMap": {
      "ip_src_addr" : ["malicious_ip"],
      "ip_dst_addr" : ["malicious_ip"]
    },
    "triageConfig" : {
      "riskLevelRules" : {
        "profiler_score > 3.5" : 10
      },
      "aggregator" : "MAX"
    }
  }
}

Push the config to Zookeeper
$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper/ -z node1:2181

Start the profiler topology
$METRON_HOME/bin/start_profiler_topology.sh

@@ -21,6 +21,8 @@ com.jcraft:jsch:jar:0.1.42:compile,BSD,http://www.jcraft.com/jsch/
com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile,CDDL,http://jaxb.java.net/
com.sun.xml.bind:jaxb-impl:jar:2.2.5-2:compile,CDDL,http://jaxb.java.net/
com.twitter:jsr166e:jar:1.1.0:compile,CC0 1.0 Universal,http://github.com/twitter/jsr166e
it.unimi.dsi:fastutil:jar:6.5.11:compile,ASLv2,https://github.com/vigna/fastutil
it.unimi.dsi:fastutil:jar:7.0.6:compile,ASLv2,https://github.com/vigna/fastutil
Member

Any idea if we can rely on just 7.0.6 rather than both 6.5.11 and 7.0.6? Is stream-lib depending on both?

Contributor Author

I have to look at this further. I just did another cursory dependency review and I only see 7.0.6 referenced. 6.5.11 was definitely showing up in the report.

Contributor Author

Update - this 6.5.11 dep was added somewhere else along the way: org.apache.solr:solr-test-framework:jar:5.2.1:test

Contributor Author

This is part of the transitive test dependencies for Solr. I excluded fastutil and ran the integration tests for metron-solr and nothing broke. Worst case scenario, we can later add 7.0.6 as a test dep if needed, or simply keep the original. But seeing as it works without it, I'm inclined to keep the exclusion unless anyone has any objections.

Member

@cestella cestella left a comment

Cool! Good contribution @mmiklavc! A few small things, but overall great. :)

* fill - the fill character string
* len - the required length
* Returns: Last element of the list
* p (required) - the precision value for the normal set
Member

What are the bounds for p and sp?

* Description: Add value to the set
* Input:
* hyperLogLogPlus - the hllp set
* o - Object to add to the set
Member

Can we make o be an object or a list? This way you can call HLLP_OFFER(hllp, [ field1, field2]) or HLLP_OFFER(hllp, field1).

Contributor Author

Heh, I just made the same comment above. Incidentally, our BLOOM_ADD currently works like a vararg. I may open a separate PR to change that as well.

Member

Sounds good

* hllpn - additional sets to merge
* Returns: A new merged HyperLogLogPlus estimator set

### `HLLP_OFFER`
Member

@cestella cestella Dec 16, 2016

I'm not in love with HLLP_OFFER... I'd prefer something like HLLP_ADD to match BLOOM_ADD. I understand that HLLP is probabilistic, so they're probably being cagey about having their API calls make checks that their maths can't cash, but I think we can educate sufficiently to make it safe to call this ADD, IMO.

Contributor Author

In addition to this terminology change, what do you think of changing the implementation to accept a single item or array?
HLLP_ADD(hllp, "value1")
or
HLLP_ADD(hllp, [ "value1", "value2", "value3" ] )

Member

I like that idea! We should probably make BLOOM_ADD function that way too.

Contributor

That is a great idea. That seems like a common API design best practice for Stellar; we could probably use a Stellar programmer's guide with stuff like that:

  • DO consider accepting scalar and list inputs to your functions
  • DO return NaN from xxxxxx

This is a 'plan for success' kind of thing. We want the Stellar library to be large, so large that you can't rely on looking through all the other functions to find the 'right' way to do things.

, name="INIT"
, description="Initializes the set"
, params = {
"p (required) - the precision value for the normal set"
Member

Are there any documents we can link to that describe the tradeoffs for these values? I'm thinking of something like here, discussing the tradeoff between accuracy and size.

Contributor Author

There are some diagrams. Think it's sufficient to link in the description, or should we consider embedding these images right into the README?

Member

I think a link and any description or guidance you can provide in the description will help.

, name="MERGE"
, description="Merge hllp sets together"
, params = {
"hllp1 - first hllp set"
Member

Can we make this fit the pattern from BLOOM_MERGE to just take a list? That was done because interactions with the profiler, which is largely where this will be used, will be made easier with that arrangement (i.e. PROFILE_GET returns a list of these, so you'd call this 99% of the time via HLLP_MERGE(PROFILE_GET(....)).

Contributor Author

Agreed.

* @return New List with the elements cast to the desired type
*/
public static <T, U> List<U> convertList(List<T> from, Class<U> clazz) {
return Lists.transform(from, s -> clazz.cast(s));
Member

@cestella cestella Dec 16, 2016

My issue with this is that above we are calling a method convert and it is a coercive operation (e.g. convert("1", Integer.class) == 1) rather than just a simple cast. Can you make convertList call convert on the elements? If you really want a naive version of this, feel free to make a castList method.


import java.io.Serializable;
import java.util.List;

Member

Can you link the google paper from arxiv on which this is based in the Javadoc comments and in the Stellar description?

Contributor Author

Yep

Contributor Author

Heads up, I searched arxiv and it doesn't appear that this paper is there.

Contributor Author

I'm going to link to the Google research link - https://research.google.com/pubs/pub40671.html

@cestella
Member

cestella commented Dec 16, 2016

Because this is a function that will 99% be used from the profiler, could you please design a simple test around using this with the profiler and give us quick rundown on the test procedure as a comment on this PR. I'm thinking something similar (but perhaps less intense) to here.

@ottobackwards
Contributor

Should functions like this have a more expansive description, that perhaps includes a set of sample data and the outcome? Similar to what the Profiler functions have?

@ottobackwards
Contributor

I also often find myself wondering if we need peer jiras for documentation/wiki work on this stuff created

@mmiklavc
Contributor Author

@ottobackwards +1 on some working examples. We currently plop the whole lot of Stellar documentation in the metron-common README and I wonder if we should start splitting any of this out into separate md files. This set of functions really works as a small suite. Same for the Bloom filter functions. And some deeper-dive might prove useful. I like @cestella 's recommendation of using the profiler for an end-to-end test. This really gets at illustrating the value from a user/customer perspective.

I think we want to consider how we choose to use the Metron wiki vs the README files. It feels like we could keep the READMEs lean and mean, dealing specifically with relevant options and links to external papers and documentation. And then on the wiki we can go into greater technical detail and maybe even start assembling a cookbook of examples. Thoughts?

@cestella
Member

A couple things. First off, these functions (Bloom and HLL) really should exist in stellar-statistics rather than in common, IMO (thoughts?). Second, Stellar should exist in its own project with a bare set of common functions and the Stellar infrastructure.

@ottobackwards
Contributor

I agree with keeping the READMEs lean. What I'm struggling with (and this may just be me as an FOSS noob) is that there seems to be no process around calling out documentation. On internal company projects, we would require Jira and Confluence entries for product documentation and internal documentation needs, to put in the backlog. I don't see where we do that here. There is a general understanding that "even if you don't contribute code, contributing documentation is good", but where are we tracking what needs it, etc.?

@cestella
Member

Now, regarding the wiki vs the Readme. The reason I am hesitant to move things to the wiki is that the wiki doesn't really track with the code well. What happens when we change things around and want examples from an older version? We don't have a solution for that at the moment.

@ottobackwards
Contributor

In that case, we should have multiple concise READMEs with more detail and a top-level overview README.

@ottobackwards
Contributor

ottobackwards commented Dec 16, 2016

Maybe we could build a readme-wiki, like a section of our wiki which is just the readme's from the current build?

That would also give doc-contributors a way to just work in confluence and not have to go through the code. That might help?

@cestella
Member

I imagine the wiki is for things like process docs (bylaws, etc.) and high level coverage of end-to-end usecases aimed at multiple features working in concert to achieve an end (e.g. the blogs). These get updated per release. The README examples are more targeted and aimed at demonstrating the feature without necessarily being tied to a specific usecase (e.g. the MAD example just generates random data, not security data).

@ottobackwards
Contributor

But, I just download the binaries or install through ambari? You mean I have to look through the codebase?

@cestella
Member

@ottobackwards Good point, there is some overlap between end-user docs and the stuff in the README.md's. Things like the stellar language and function docs should be part of the wiki, I suspect. We probably need something more high level than the examples in the stellar-statistics README that I cited earlier.

@ottobackwards
Contributor

what we need is a plan, whatever it is ;)

@cestella
Member

Ok, I propose a [discuss] around docs. These are great questions, but they are probably not going to result in getting us HLL functionality soon ;)

@nickwallen
Contributor

I was thinking how I could use your HLLP functionality with the Profiler. I think I could use this functionality to track the in-degree and out-degree of a host over time. This might be an interesting motivating example.

For example, calculating the in-degree would look like the following.

{
  "profile": "in-degree",
  "onlyif": "source.type == 'yaf'",
  "foreach": "ip_dst_addr",
  "init": {
    "in": "HLLP_INIT(5, 6)"
  },
  "update": {
    "in": "HLLP_ADD(in, ip_src_addr)"
  },
  "result": "HLLP_CARDINALITY(in)"
}

Calculating the out-degree would look like this.

{
  "profile": "out-degree",
  "onlyif": "source.type == 'yaf'",
  "foreach": "ip_src_addr",
  "init": {
    "out": "HLLP_INIT(5, 6)"
  },
  "update": {
    "out": "HLLP_ADD(out, ip_dst_addr)"
  },
  "result": "HLLP_CARDINALITY(out)"
}

Of course, with your HLLP_MERGE it might be more interesting to store the HLLP object itself and call HLLP_CARDINALITY after-the-fact, but I just want to make sure I'm using the API correctly.

@nickwallen
Contributor

@mmiklavc Meant to ask previously, does this look right? Am I using your API correctly?

@cestella
Member

@nickwallen That does look right, but you probably want the result to just be out so you can decouple the in-degree and out-degree calculation from the window size in the profiler.

@nickwallen
Contributor

Yes, understood. I think that's what I wrote at the bottom of my last post. It's clearer for a motivating example to actually do the calculation as the result.

@cestella
Member

@nickwallen Fair point; I missed that last sentence. Sorry for the confusion.

@nickwallen
Contributor

Very cool stuff, @mmiklavc. Can't wait to see it hit master.

Just to lend my $.02, I do agree with @ottobackwards that a more detailed example would be helpful. Maybe something in a REPL session that uses multiple lines. Kind of hard for me to grok the single-line example in the PR description.

Also, per @cestella, I agree that this makes sense in metron-analytics/metron-statistics. He mentioned stellar-statistics, but I think he meant metron-statistics.

@mmiklavc
Contributor Author

mmiklavc commented Jan 3, 2017

@nickwallen Thanks for the feedback! - I'm currently working on finishing up some performance metrics (p/sp vals vs accuracy, cardinality, serialized mem consumption) in addition to an example utilizing the profiler. Your example above is exactly in line with what I was thinking might make for a good demo of this functionality.

Also, agreed on the single-line example - that was meant to make cut-and-paste easy. The unit and integration tests are much easier to read. I'll provide similar examples here.

@mmiklavc
Contributor Author

@nickwallen @cestella Moved the HLLP implementation wrapper and Stellar functions to the metron-statistics project. The Bloom filter Stellar functions should probably be moved to the stats project as well, but that's for a separate PR, I feel.

I also added a utility for generating performance metrics that includes a README (HLLP.md) that is linked to in the main metron-statistics README as well as the metron-common README. There is a link provided to the Google paper, an exposition of performance results, and a new default precision for the sparse and dense (normal) sets.

Demo sample soon to follow using the profiler.

@mmiklavc
Contributor Author

Hm, something fishy going on here with Travis. All tests run fine from Maven (on OS X) locally.

[INFO] metron-analytics ................................... SUCCESS [  1.079 s]
[INFO] metron-maas-common ................................. SUCCESS [ 10.757 s]
[INFO] metron-maas-service ................................ SUCCESS [01:25 min]
[INFO] metron-statistics .................................. SUCCESS [ 46.321 s]
[INFO] metron-profiler-common ............................. SUCCESS [ 13.036 s]
[INFO] metron-profiler-client ............................. SUCCESS [01:21 min]
[INFO] metron-profiler .................................... SUCCESS [04:34 min]

@dlyle65535
Contributor

I was able to duplicate the failures in my environment by doing the following (based on what Travis does).

mkdir mike_test
cd mike_test
git clone --depth=50 https://github.com/apache/incubator-metron.git apache/incubator-metron
cd apache/incubator-metron/
git fetch origin +refs/pull/397/merge
git checkout -qf FETCH_HEAD
mvn -q integration-test install && build_utils/verify_licenses.sh

It bombs out with:

cardinality_gives_distinct_value_estimate_with_precisions_set(org.apache.metron.statistics.approximation.HyperLogLogPlusFunctionsIntegrationTest)  Time elapsed: 0.004 sec  <<< ERROR!
org.apache.metron.common.dsl.ParseException: Unable to parse HLLP_CARDINALITY(
  HLLP_ADD(
    HLLP_ADD(
      HLLP_INIT(5, 6),
      val1
    ),
    val2
  )
)
: Syntax error @ 8:3 no viable alternative at input 'HLLP_CARDINALITY(HLLP_ADD(HLLP_ADD(HLLP_INIT(5,6),val1),val2)\n'

@jjmeyer0
Contributor

This looks like an issue with the grammar. My guess is it isn't properly handling whitespace at the end. If you change the variable hllpDefaultConstructorRule to the below, it should work. I know this isn't a permanent solution, but it narrows the problem down.

  /**
   *HLLP_CARDINALITY(
   *  HLLP_ADD(
   *    HLLP_ADD(
   *      HLLP_INIT(),
   *      val1
   *    ),
   *    val2
   *  )
   *  )*/
  @Multiline
  private static String hllpDefaultConstructorRule;

@mmiklavc
Contributor Author

@dlyle65535 the thing I'm not fully understanding is how this is working in my local branch. Doing a git fetch and checkout shouldn't merge anything from master, so the most recent master shouldn't be in this test run. Doing a git pull does a fetch and merge, which I would anticipate causing problems.

@jjmeyer0 Yeah, that's what I thought as well. Again, unclear yet why this is not a problem locally. Furthermore, it wasn't a problem in the last commit I had that I believe had the newline. I'll give it a shot anyhow.

@cestella
Member

@jjmeyer0 @mmiklavc It appears that a regression was added relatively recently that affects whitespace at the end of Stellar rules (the whitespace isn't sent to the skip channel in the lexer in that situation, for some reason... not sure why). A stopgap is what JJ said; a slightly more permanent fix, considering whitespace has no meaning in Stellar, is to trim the rule in BaseStellarProcessor (line 111). I'd still consider it important to investigate what is confusing the grammar suddenly, mind you.

@jjmeyer0
Contributor

Okay, I took a look. It's because EOL isn't defined as a fragment.

If EOL : '\n'; is changed to fragment EOL : '\n'; it should work. Without the fragment keyword, ANTLR4 makes a token for EOL and gets confused about how to parse it.

@cestella
Member

@jjmeyer0 Makes sense and that explains things. @mmiklavc despite it being out of the scope of this ticket, it should just be a 1-liner and we should probably correct it as part of this IMO.

@jjmeyer0
Contributor

@mmiklavc have you rebased lately? If not, I'm sure the PR I submitted for Stellar comparisons is the culprit.

@mmiklavc
Contributor Author

Ok, this is making more sense now. GitHub has merged this PR out of band of the build; that fetch is not doing a merge, it's fetching the merge commit GitHub already created - git fetch origin +refs/pull/397/merge. This also explains how GitHub can tell us whether a PR has merge conflicts long before Travis runs its build. So yes, the grammar corrections should work. On it.

@mmiklavc
Contributor Author

Merged with latest master and modified Stellar.g4 to handle EOL fragments. Test went from failing to passing locally.

@mmiklavc mmiklavc closed this Jan 18, 2017
@mmiklavc
Contributor Author

Kick Travis

@mmiklavc mmiklavc reopened this Jan 18, 2017
@cestella
Member

Did you verify that these functions run on:

  • the profiler
  • the parser topology via field transformations
  • the enrichment topology via stellar enrichments
  • the Stellar REPL

If so, then I'm +1

@mmiklavc
Contributor Author

@cestella Verified the above - modifying the testing plan now

@cestella
Member

@mmiklavc Can you please merge master in and commit this?

@asfgit asfgit closed this in f3ca3c0 Jan 27, 2017