#### Problem Tutorial 1: Sentiment Analysis Model

We want to perform sentiment analysis using the [VaderSentiment](https://medium.com/analytics-vidhya/simplifying-social-media-sentiment-analysis-using-vader-in-python-f9e6ec6fc52f) package, which MLflow does not support as a built-in flavor, so we wrap it in a custom Python model.
The goal of sentiment analysis is to "gauge the attitude, sentiments, evaluations, attitudes and emotions of a speaker/writer based on the computational treatment of subjectivity in a text."

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.

VADER has several advantages over traditional methods of sentiment analysis:

 * It works exceedingly well on social media type text, yet readily generalizes to multiple domains
 * It doesn’t require any training data but is constructed from a generalizable, valence-based, human-curated gold standard sentiment lexicon
 * It is fast enough to be used online with streaming data, and
 * It does not severely suffer from a speed-performance tradeoff.
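
To make "lexicon and rule-based" concrete, here is a toy sketch of the general idea. It is not VADER's implementation: the `TOY_LEXICON` words, their valence values, and the exclamation-mark rule are invented for illustration. Only the final squashing step, `score / sqrt(score^2 + alpha)` with `alpha = 15`, follows the form that, to the best of our knowledge, vaderSentiment uses for its compound score.

```python
import math

# Hypothetical mini-lexicon: word -> valence. Values are invented for illustration.
TOY_LEXICON = {"good": 1.9, "love": 3.2, "horrible": -2.5, "bad": -2.5}


def toy_compound(text, alpha=15.0):
    """Sum word valences, apply one simple rule, and squash into (-1, 1)."""
    total = 0.0
    for token in text.lower().split():
        word = token.strip("!?.,")
        total += TOY_LEXICON.get(word, 0.0)

    # Toy rule: exclamation marks (capped at 3) amplify whatever sentiment exists.
    emphasis = 0.3 * min(text.count("!"), 3)
    if total > 0:
        total += emphasis
    elif total < 0:
        total -= emphasis

    # Normalize the unbounded sum into the open interval (-1, 1).
    return total / math.sqrt(total * total + alpha)


print(toy_compound("I love this"))
print(toy_compound("I love this!!!"))
print(toy_compound("What a horrible day"))
```

No training data is involved: the score comes entirely from the lexicon lookup and the hand-written rule, which is why this style of analyzer is cheap enough for streaming use.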


<table>
  <tr><td>
    <img src="https://github.com/dmatrix/olt-mlflow/raw/master/models/images/sentiment_analysis.jpg"
         alt="Sentiment Analysis with Vader " width="600">
  </td></tr>
</table>

[image source](https://medium.com/analytics-vidhya/sentiment-analysis-with-vader-label-the-unlabeled-data-8dd785225166)

In [9]:
%pip install vaderSentiment

Note: you may need to restart the kernel to use updated packages.


### VaderSentiment Python Package

You can read the original paper by the authors [here](http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf).

In [10]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd
import mlflow.pyfunc

In [11]:
# Define some input text

INPUT_TEXTS = [{'text': "This is a bad ass movie. You got to see it! :-)"},
               {'text': "Ricky Gervais is smart, witty, and creative!!!!!! :D"},
               {'text': "LOL, this guy fell off a chair while sleeping and snoring in a meeting"},
               {'text': "Men shoots himself while trying to steal a dog, OMG"},
               {'text': "Yay!! Another good phone interview. I nailed it!!"},
               {'text': "This is INSANE! I can't believe it. How could you do such a horrible thing?"}]

### Define a SocialMediaAnalyserModel

The model is a subclass of [PythonModel](https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.PythonModel), MLflow's base class for custom Python models.

In [12]:
class SocialMediaAnalyserModel(mlflow.pyfunc.PythonModel):

    def __init__(self):
        """
        Constructor for our customized PyFunc PythonModel class
        """
        super().__init__()
        # Initialize an instance of the VADER analyser
        self._analyser = SentimentIntensityAnalyzer()

    def _score(self, text):
        """
        Private function to analyse the scores. It invokes the analyser's polarity_scores
        param: text to analyse
        return: sentiment analysis scores
        """
        scores = self._analyser.polarity_scores(text)
        return scores

    def predict(self, context, model_input):
        """
        Implement the predict function required for PythonModel
        """
        # DataFrame.apply invokes _score once per column of model_input
        model_output = model_input.apply(lambda col: self._score(col))
        return model_output

In [13]:
def mlflow_run():
  
  # Define the conda environment for this model.
  # pip packages go in a nested {'pip': [...]} entry under 'dependencies'.
  conda_env = {
    'channels': ['defaults', 'conda-forge'],
    'dependencies': [
        'python=3.7.6',
        'pip',
        {'pip': [
          'mlflow',
          'cloudpickle==1.3.0',
          'vaderSentiment==3.3.2'
        ]}
    ],
    'name': 'mlflow-env'
  }
  
  mlflow.set_tracking_uri("sqlite:///mlruns.db")

  # Set the artifact path and create an instance of the custom PythonModel
  model_path = "vader"
  vader_model = SocialMediaAnalyserModel()
  with mlflow.start_run(run_name="Vader Sentiment Analysis") as run:
    # Log MLflow entities: params and model
    mlflow.pyfunc.log_model(model_path, python_model=vader_model, conda_env=conda_env, registered_model_name="PyFuncVader")
    mlflow.log_param("algorithm", "VADER")
    mlflow.log_param("total_sentiments", len(INPUT_TEXTS))

In [14]:
mlflow_run()

INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
Registered model 'PyFuncVader' already exists. Creating a new version of this model...
2021/06/07 09:24:03 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: PyFuncVader, version 2
Created version '2' of model 'PyFuncVader'.

### Load as a pyfunc_model from the model registry
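
Registered models are addressed with a URI of the form `models:/<name>/<version>`. The helper below is a hypothetical convenience, not part of MLflow; it only builds the string. It can be handy here because the log above reports that version 2 of `PyFuncVader` was just created, while version 1 also remains in the registry.

```python
def registry_model_uri(name: str, version: int) -> str:
    """Build a Model Registry URI of the form models:/<name>/<version>."""
    if version < 1:
        raise ValueError("model versions start at 1")
    return f"models:/{name}/{version}"


# Point at the version reported in the registration log
print(registry_model_uri("PyFuncVader", 2))
```

The resulting string can be passed to `mlflow.pyfunc.load_model` in place of a hard-coded URI.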

In [15]:
# Load back the model as a pyfunc_model for scoring
input_texts = ["Got to love this code snippet!", 
               "Men shoots himself while trying to steal a dog, OMG", 
               "Yay!! Another good phone interview. I nailed it!!"]

model_uri = "models:/PyFuncVader/1"
pyfunc_model = mlflow.pyfunc.load_model(model_uri)
for text in input_texts:
  score = pyfunc_model.predict(pd.DataFrame([text]))
  print(f"<{text}> --> {str(score[0])}>")

INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.


<Got to love this code snippet!> --> {'neg': 0.0, 'neu': 0.527, 'pos': 0.473, 'compound': 0.6696}>
<Men shoots himself while trying to steal a dog, OMG> --> {'neg': 0.262, 'neu': 0.738, 'pos': 0.0, 'compound': -0.4939}>
<Yay!! Another good phone interview. I nailed it!!> --> {'neg': 0.0, 'neu': 0.446, 'pos': 0.554, 'compound': 0.816}>
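
Each result holds the negative, neutral, and positive proportions plus a single `compound` score, which is the value most often used to assign one label per text. A minimal sketch, assuming the cutoffs of 0.05 and -0.05 suggested in the vaderSentiment README; `label_from_compound` is a hypothetical helper name, and the score dicts are copied from the output above.

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER-style score dict to 'positive', 'negative', or 'neutral'."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Score dicts copied from the output above
examples = {
    "Got to love this code snippet!":
        {'neg': 0.0, 'neu': 0.527, 'pos': 0.473, 'compound': 0.6696},
    "Men shoots himself while trying to steal a dog, OMG":
        {'neg': 0.262, 'neu': 0.738, 'pos': 0.0, 'compound': -0.4939},
    "Yay!! Another good phone interview. I nailed it!!":
        {'neg': 0.0, 'neu': 0.446, 'pos': 0.554, 'compound': 0.816},
}

for text, scores in examples.items():
    print(f"{label_from_compound(scores):>8}  {text}")
```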


In [8]:
model_input = pd.DataFrame([["Got to love this code snippet!", "Men shoots himself while trying to steal a dog, OMG", "Yay!! Another good phone interview. I nailed it!!"]])
model_input.to_json(orient='records')

'[{"0":"Got to love this code snippet!","1":"Men shoots himself while trying to steal a dog, OMG","2":"Yay!! Another good phone interview. I nailed it!!"}]'
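
With `orient='records'`, each DataFrame row becomes one JSON object keyed by column label. Because the three texts were passed as a single row, the result is a one-element list whose object has the keys `"0"`, `"1"`, and `"2"`. The check below uses only the standard library, with the string copied from the output above.

```python
import json

# JSON string copied from the output above
payload = '[{"0":"Got to love this code snippet!","1":"Men shoots himself while trying to steal a dog, OMG","2":"Yay!! Another good phone interview. I nailed it!!"}]'

records = json.loads(payload)
print(len(records))        # number of rows in the original DataFrame
print(sorted(records[0]))  # column labels, serialized as JSON string keys

# Recover the texts in column order
texts = [records[0][key] for key in sorted(records[0], key=int)]
for text in texts:
    print(text)
```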