# Sentiment UDF

Sentiment analysis, the ability to gauge the general positive or negative connotation of free-form text, can play an important part in understanding what your customers (and critics) are saying about you. In this exercise, we will create a Java UDF that uses an existing open source library, [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/), to analyze the sentiment of text in Snowflake.

We are going to leverage the ability to create Java UDFs that use multiple .jar files (libraries), not just the ones we authored. Along with this, we will use more intricate code that calls these libraries to perform sentiment analysis inside Snowflake.

- [ ] Review the .scala code, built using SBT and packaged to a jar file
- [ ] Upload our custom .jar
- [ ] Create a function using our jar, and extra libraries
- [ ] Test our sentiment analysis using ANALYZE_TEXT

![](../assets/java_sent_overview.gif)

## Connect to Snowflake

In [None]:
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._
import com.snowflake.snowpark.types._

In [None]:
// Load the connection properties created in de_snowpark/A-Dataframes/01-Sessions.ipynb
val pwd = sys.env.getOrElse("PWD", "")
val filename = s"$pwd/de_snowpark/connect.properties"

val session = Session.builder.configFile(filename).create
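If `PWD` is not set, the path above becomes `/de_snowpark/connect.properties`, which most likely does not exist, and the problem only surfaces when the session is created. As a sketch, assuming the notebook runs from the repository root, the hypothetical `ConnectConfig.locate` helper below (not part of Snowpark) resolves the same file with `java.nio` and fails early with a clear message if it is missing.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper: resolves de_snowpark/connect.properties under a base
// directory and fails fast, with a clear message, if the file is missing.
object ConnectConfig {
  def locate(baseDir: String): Path = {
    val file = Paths.get(baseDir, "de_snowpark", "connect.properties")
    require(Files.isRegularFile(file), s"connection properties not found: $file")
    file
  }
}
```

It could be used as `Session.builder.configFile(ConnectConfig.locate(sys.env.getOrElse("PWD", ".")).toString).create`.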

In [None]:
// Create a Snowflake internal stage that will be used by our Java UDFs
session.sql("create stage if not exists raw.JAVA_UDF_STAGE").collect

## Review Scala Code

Here is the code that has been written and compiled for you to perform the sentiment analysis:


```scala
import java.util.Properties // for initializing StanfordCoreNLP
 
// Import StanfordCoreNLP
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.StanfordCoreNLP
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations

// Converts the java.util.List returned by CoreNLP into a Scala collection
import scala.collection.JavaConverters._

// Scala class used as the UDF handler; it needs a public no-argument constructor
class SentimentAnalyzer {

    // Initialize our processor, and load our model ONCE during the constructor
    val props = new Properties()
    props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
    val pipeline: StanfordCoreNLP = new StanfordCoreNLP(props)

    // This public method is called with every UDF invocation
    def analyzeSentence(s: String): String = {

        // A SQL NULL argument arrives as a null String; return NULL instead of failing
        if (s == null) return null

        val processed = pipeline.process(s)
        // Treat a missing sentence list as empty (e.g. for blank input)
        val sentences = Option(processed.get(classOf[CoreAnnotations.SentencesAnnotation]))
            .map(_.asScala.toList)
            .getOrElse(Nil)

        // One predicted class per sentence: 0 = very negative ... 4 = very positive
        val sentimentScores: List[Int] = sentences.map { sentence =>
            val tree = sentence.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])
            RNNCoreAnnotations.getPredictedClass(tree)
        }

        val numberSentences = sentimentScores.size
        // Use Double division so the average is not truncated; emit JSON null when
        // there are no sentences instead of dividing by zero
        val averageSentiment =
            if (numberSentences == 0) "null"
            else (sentimentScores.sum.toDouble / numberSentences).toString
        val sentimentString = sentimentScores.mkString(",")

        // We create a JSON string, which Snowflake will parse as VARIANT
        val myJson = raw"""{"AVERAGE_SENTIMENT" : ${averageSentiment}, "NUMBER_SENTENCES" : ${numberSentences}, "SENTIMENTS" : [${sentimentString}]}"""

        // return
        myJson
    }
}
```

This jar has already been assembled with the `sbt` build tool and is named `edu-sentiment-udf_2.12-1.0.jar`.
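The only arithmetic in the handler is the aggregation at the end. To try that step without CoreNLP or Snowflake, the sketch below starts from per-sentence classes that CoreNLP would normally produce and builds the same JSON shape. `SentimentJson.build` is a hypothetical helper, not part of the prebuilt jar; it computes the average as a `Double`, so scores 4 and 1 average to 2.5 rather than the 2 that integer division of `Int` values would give, and it emits `null` when there are no sentences.

```scala
// Hypothetical helper, not part of edu-sentiment-udf_2.12-1.0.jar: it mirrors
// only the aggregation step, starting from one sentiment class per sentence.
object SentimentJson {

  /** Builds the JSON string returned to Snowflake for a list of per-sentence classes. */
  def build(scores: Seq[Int]): String = {
    val numberSentences = scores.size
    // Empty input has no meaningful average, so emit JSON null instead of dividing by zero
    val average =
      if (numberSentences == 0) "null"
      else (scores.sum.toDouble / numberSentences).toString
    s"""{"AVERAGE_SENTIMENT" : $average, "NUMBER_SENTENCES" : $numberSentences, "SENTIMENTS" : [${scores.mkString(",")}]}"""
  }
}
```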

### Progress: Check

- [X] Review the .scala code, built using SBT and packaged to a jar file
- [ ] Upload our custom .jar
- [ ] Create a function using our jar, and extra libraries
- [ ] Test our sentiment analysis using ANALYZE_TEXT

## Upload the Custom Jar

In [None]:
session.file.put(
    "edu-sentiment-udf_2.12-1.0.jar",
    "@raw.JAVA_UDF_STAGE",
    Map(
        "OVERWRITE" -> "true",
        "AUTO_COMPRESS" -> "false"
    )
)
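`AUTO_COMPRESS` is set to `false` because, by default, PUT gzips files it does not recognize as already compressed (a `.jar` is one of them) and appends `.gz` to the staged name, while the `IMPORTS` path used when creating the function must match the staged file name exactly. The hypothetical `StagedName` helper below only predicts that name under this assumption about PUT's behavior; it can be handy when double-checking an `IMPORTS` entry.

```scala
import java.nio.file.Paths

// Hypothetical helper: predicts the name PUT gives a file on the stage, assuming
// the file type is not one PUT already treats as compressed (true for a .jar).
object StagedName {
  def of(localFile: String, autoCompress: Boolean): String = {
    // PUT stores the file under its base name, without the local directory
    val base = Paths.get(localFile).getFileName.toString
    // With AUTO_COMPRESS = TRUE the file is gzipped and ".gz" is appended
    if (autoCompress) s"${base}.gz" else base
  }
}
```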

### Progress: Check

- [X] Review the .scala code, built using SBT and packaged to a jar file
- [X] Upload our custom .jar
- [ ] Create a function using our jar, and extra libraries
- [ ] Test our sentiment analysis using ANALYZE_TEXT

## Create Function (Using Extra Jars)

In [None]:
session.sql("""

CREATE OR REPLACE FUNCTION raw.ANALYZE_TEXT(arg1 STRING)
RETURNS VARIANT
LANGUAGE JAVA IMPORTS = (
'@training_db.traininglab.datasets_stage/dependencies/javax.json.jar'
,'@training_db.traininglab.datasets_stage/dependencies/istack-commons-runtime-3.0.7.jar'
,'@training_db.traininglab.datasets_stage/dependencies/jaxb-impl-2.4.0-b180830.0438.jar'
,'@training_db.traininglab.datasets_stage/dependencies/stanford-corenlp-4.2.2-javadoc.jar'
,'@training_db.traininglab.datasets_stage/dependencies/jaxb-api-2.4.0-b180830.0359.jar'
,'@training_db.traininglab.datasets_stage/dependencies/protobuf-java-3.11.4.jar'
,'@training_db.traininglab.datasets_stage/dependencies/stanford-corenlp-4.2.2.jar'
,'@training_db.traininglab.datasets_stage/dependencies/ejml-simple-0.39.jar'
,'@training_db.traininglab.datasets_stage/dependencies/xom.jar'
,'@training_db.traininglab.datasets_stage/dependencies/ejml-core-0.39.jar'
,'@training_db.traininglab.datasets_stage/dependencies/stanford-corenlp-4.2.2-models.jar'
,'@training_db.traininglab.datasets_stage/dependencies/slf4j-simple.jar'
,'@training_db.traininglab.datasets_stage/dependencies/javax.activation-api-1.2.0.jar'
,'@training_db.traininglab.datasets_stage/dependencies/jollyday.jar'
,'@training_db.traininglab.datasets_stage/dependencies/ejml-ddense-0.39.jar'
,'@training_db.traininglab.datasets_stage/dependencies/joda-time.jar'
,'@training_db.traininglab.datasets_stage/dependencies/slf4j-api.jar'
,'@training_db.traininglab.datasets_stage/dependencies/scala-library-2.12.11.jar'
,'@raw.JAVA_UDF_STAGE/edu-sentiment-udf_2.12-1.0.jar'
) HANDLER='SentimentAnalyzer.analyzeSentence';

""").collect
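The `IMPORTS` list above is one single-quoted stage path per jar, separated by commas. As a sketch of keeping that list in a single Scala `Seq` instead of hand-editing the SQL, the hypothetical `ImportsClause` helper below (not part of Snowflake or Snowpark) builds the stage paths and renders the clause in the same leading-comma style. It assumes no path contains a single quote, so no escaping is done.

```scala
// Hypothetical helper, not part of Snowflake or Snowpark: renders the IMPORTS
// clause of CREATE FUNCTION from a list of stage paths.
object ImportsClause {

  /** Prefixes each jar file name with a stage directory, e.g. "@stage/dir/lib.jar". */
  def paths(stageDir: String, jars: Seq[String]): Seq[String] =
    jars.map(jar => s"$stageDir/$jar")

  /** Quotes each path and joins them with leading commas, one path per line. */
  def render(stagePaths: Seq[String]): String = {
    require(stagePaths.nonEmpty, "IMPORTS needs at least one stage path")
    stagePaths.map(p => s"'$p'").mkString("IMPORTS = (\n", "\n,", "\n)")
  }
}
```

The rendered string can then be interpolated into the statement passed to `session.sql`, in place of the literal IMPORTS block; the custom jar on `@raw.JAVA_UDF_STAGE` is simply appended to the dependency paths.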

### Progress: Check

- [X] Review the .scala code, built using SBT and packaged to a jar file
- [X] Upload our custom .jar
- [X] Create a function using our jar, and extra libraries
- [ ] Test our sentiment analysis using ANALYZE_TEXT

## Test Sentiment Analysis Running in Snowflake

Now let's pass in some example text and run the sentiment analysis in Snowflake using our ANALYZE_TEXT UDF.

In [None]:
val exampleSentencesDF = session.createDataFrame(Seq(
    "This is great!  I definitely hated this thing!",
    "This is AWFUL!!!"
)).toDF("input")

In [None]:
exampleSentencesDF
    .withColumn("SENTIMENT", callUDF("raw.ANALYZE_TEXT", col("input")))
    .show
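Each value in the `SENTIMENTS` array is the class CoreNLP's sentiment model predicted for one sentence, on a five-point scale from 0 (very negative) to 4 (very positive). A small helper such as the hypothetical `SentimentLabels` below can turn those classes into readable labels when inspecting results; the label strings follow CoreNLP's five-class naming, and the helper itself is not part of the UDF.

```scala
// Hypothetical helper: maps CoreNLP's five sentiment classes
// (0 = very negative ... 4 = very positive) to readable labels.
object SentimentLabels {
  private val labels = Vector("Very negative", "Negative", "Neutral", "Positive", "Very positive")

  /** Returns the label for one predicted class, or None if the value is out of range. */
  def label(sentimentClass: Int): Option[String] = labels.lift(sentimentClass)

  /** Converts the SENTIMENTS array (one class per sentence) into labels. */
  def labelAll(classes: Seq[Int]): Seq[String] =
    classes.map(c => label(c).getOrElse("Unknown"))
}
```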

### Progress: Check

- [X] Review the .scala code, built using SBT and packaged to a jar file
- [X] Upload our custom .jar
- [X] Create a function using our jar, and extra libraries
- [X] Test our sentiment analysis using ANALYZE_TEXT