-
Notifications
You must be signed in to change notification settings - Fork 224
Spot-166 module code cleanup pass #60
Spot-166 module code cleanup pass #60
Conversation
… be sparkSession.
Unit testing added to improve coverage, current lines covered 84%.
* @param ldaAlpha LDA alpha parameter | ||
* @param ldaBeta LDA beta parameter | ||
* @param ldaOptimizerOption LDA optimizer, em or online | ||
* @param maxIterations number of maximum iteration LDA will run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say, "maximum number of iterations for the optimizer" or something similar? The current line is a little unclear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your suggestion, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ldaAlpha - shouldn't this be explained as document concentration
ldaBeta - should this be explained as topic concentration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change that, thanks.
@@ -66,6 +75,13 @@ object WideUDFs { | |||
:: Nil).getOrElse(Nil) | |||
UserDefinedFunction(f, ScalaReflection.schemaFor(typeTag[RT]).dataType, Option(inputTypes)) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these comments.
@@ -32,7 +32,7 @@ object DNSFeedback { | |||
/** | |||
* Load the feedback file for DNS data. | |||
* | |||
* @param spark Spark Session | |||
* @param sparkSession Spark Session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
@@ -150,9 +150,12 @@ class ProxySuspiciousConnectsAnalysisTest extends TestingSparkContextFlatSpec wi | |||
val data = sparkSession.createDataFrame(Seq(anomalousRecord, typicalRecord, typicalRecord, typicalRecord, typicalRecord, | |||
typicalRecord, typicalRecord, typicalRecord, typicalRecord, typicalRecord)) | |||
|
|||
val model = ProxySuspiciousConnectsModel.trainModel(sparkSession, logger, onlineTestConfigProxy, data) | |||
// val model = ProxySuspiciousConnectsModel.trainModel(sparkSession, logger, onlineTestConfigProxy, data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these lines commented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my bad. Thank you and @brandon-edwards for noticing this.
@@ -19,6 +19,9 @@ package org.apache.spot.lda | |||
|
|||
import org.apache.spark.sql.types.{LongType, StringType, StructField} | |||
|
|||
/** | |||
* Schemas and column names used in SpotLDAWrapper and *SuspiciousConnectsModel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any meaning to the * next to SuspiciousConnectsModel ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- --> Any pipeline, Flow, DNS or Proxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Since there's just three right now, you might as well list them. As it is, it looks like a typo.
* | ||
* @param domain domain name | ||
* @return | ||
*/ | ||
def domainBelongsToSafeList(domain: String) = domain == "intel" // TBD parameterize this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this returns a Booelan - the return type should be noted in the signature and the @return parameter should informally describe the result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding that, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good - just a few small questions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Done. |
Implements changes requested on SPOT-166
Main changes