Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Spot-166 module code cleanup pass #60

Merged
merged 3 commits into from
Jun 21, 2017

Conversation

rabarona
Copy link

Implements changes requested on SPOT-166

Main changes

  • Fixed Scala docs issues; reference to spark replaced with sparkSession
  • Added missing Scala docs
  • Updated and added unit test; test coverage 84% lines.

Ricardo Barona added 2 commits June 20, 2017 16:42
Unit testing added to improve coverage, current lines covered 84%.
* @param ldaAlpha LDA alpha parameter
* @param ldaBeta LDA beta parameter
* @param ldaOptimizerOption LDA optimizer, em or online
* @param maxIterations number of maximum iteration LDA will run
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say, "maximum number of iterations for the optimizer" or something similar? The current line is a little unclear to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your suggestion, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ldaAlpha - shouldn't this be explained as document concentration
ldaBeta - should this be explained as topic concentration?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change that, thanks.

@@ -66,6 +75,13 @@ object WideUDFs {
:: Nil).getOrElse(Nil)
UserDefinedFunction(f, ScalaReflection.schemaFor(typeTag[RT]).dataType, Option(inputTypes))
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these comments.

@@ -32,7 +32,7 @@ object DNSFeedback {
/**
* Load the feedback file for DNS data.
*
* @param spark Spark Session
* @param sparkSession Spark Session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

@@ -150,9 +150,12 @@ class ProxySuspiciousConnectsAnalysisTest extends TestingSparkContextFlatSpec wi
val data = sparkSession.createDataFrame(Seq(anomalousRecord, typicalRecord, typicalRecord, typicalRecord, typicalRecord,
typicalRecord, typicalRecord, typicalRecord, typicalRecord, typicalRecord))

val model = ProxySuspiciousConnectsModel.trainModel(sparkSession, logger, onlineTestConfigProxy, data)
// val model = ProxySuspiciousConnectsModel.trainModel(sparkSession, logger, onlineTestConfigProxy, data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these lines commented?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my bad. Thank you and @brandon-edwards for noticing this.

@@ -19,6 +19,9 @@ package org.apache.spot.lda

import org.apache.spark.sql.types.{LongType, StringType, StructField}

/**
* Schemas and column names used in SpotLDAWrapper and *SuspiciousConnectsModel.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any meaning to the * next to SuspiciousConnectsModel ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • --> Any pipeline, Flow, DNS or Proxy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Since there's just three right now, you might as well list them. As it is, it looks like a typo.

*
* @param domain domain name
* @return
*/
def domainBelongsToSafeList(domain: String) = domain == "intel" // TBD parameterize this!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this returns a Booelan - the return type should be noted in the signature and the @return parameter should informally describe the result

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding that, thanks.

Copy link
Contributor

@NathanSegerlind NathanSegerlind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good - just a few small questions

Copy link
Contributor

@lujangus lujangus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@rabarona
Copy link
Author

Done.

@asfgit asfgit merged commit 5f25c95 into apache:master Jun 21, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
5 participants