**Sample Materials, provided under license. <br>
Licensed Materials - Property of IBM. <br>
© Copyright IBM Corp. 2019, 2020. All Rights Reserved. <br>
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. <br>**

# Import required packages

In [1]:
import json
import os
import datetime
import time  # needed for time.sleep() when a deployment space is deleted and recreated
from ibm_watson_machine_learning import APIClient

# Table of Contents

* [0. About](#about)
* [1. Deploying Functions](#1)
    * [1.1 Clustering](#1-1)
        * [1.1.1 Create Function to Deploy](#1-1-1)
        * [1.1.2 Deploy Function](#1-1-2)
        * [1.1.3 Test Deployed Function](#1-1-3)
    * [1.2 Sentiment Analysis](#1-2)
        * [1.2.1 Create Function to Deploy](#1-2-1)
        * [1.2.2 Deploy Function](#1-2-2)
        * [1.2.3 Test Deployed Function](#1-2-3)
* [2. View Deployed Functions](#2)

# 0. About <a class="anchor" id="about"></a>

In `1_Data_Exploration_and_Model_Training.ipynb`, we saw how all the functions work. This notebook will focus on using the Watson Machine Learning client to deploy functions. 

We will walk through the steps for setting up Watson Machine Learning and deploying functions in the next sections. However, if you need more help:
* For more information about Watson Machine Learning Client, see the documentation [here](http://ibm-wml-api-pyclient.mybluemix.net/)
* For more information on deploying functions see the documentation [here](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-deploy-functions_local.html).

# 1. Deploying Functions <a class="anchor" id="1"></a>

We will deploy two main functions:
1. Clustering comments
2. Sentiment analysis on comments

Deployable functions should follow the structure:
```
def deployable_function():
    def score(payload):
        return payload['values']
    return score
```

The `payload` is a dictionary that must include `values` as a key, and each deployable function must return a JSON-serializable object. The formats of the `payload` and of the JSON response for each function are:

1. Clustering Function
    * Payload (input):
    ```
    {
        "values": list of comments (required parameter),
        "number_of_clusters": number of clusters to create 
            if this is empty then the model automatically 
            chooses the number but caps it at 5 (optional parameter)
        "number_of_terms": number of top terms to display 
            for each cluster, if the number of comments is 
            less than this value then the number of comments 
            will be used (optional parameter)
    }
    ```
    * Response:
    ```
    [
        {
            "comment": {
                "idx": int,
                "group": int,
                "topics": [str]
            },
        },
    ]
    ```
Where "idx" is the index of the original comment. "group" is the group number assigned. "topics" are the most important terms in that group.

2. Sentiment Analysis Function
    * Payload (input):
    ```
    {
        "values": list of comments (required parameter)
    }
    ```
    * Response:
    ```
    [
        {
            "comment": {
                "idx": int,
                "sentiment": str,
                "sentence_sentiment": [
                    {
                        "idx": int,
                        "sentence": str,
                        "sentiment": str
                    },
                ]
            }
        },
    ]
    ```
Where "idx" is the index of the original comment. "sentiment" is the sentiment of the entire comment. "sentence_sentiment" is for every sentence of the comment, its "idx" is the index of the sentence, "sentence" is the entire sentence string, its "sentiment" is the sentence's sentiment.
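
The functions defined later in this notebook read their input from `payload['input_data'][0]['values']` and return their result under `predictions`. The following is a minimal sketch of that request and response envelope, run locally without Watson Machine Learning. The `text` key and the sample comments are made up for illustration and are not part of the deployed functions.

```python
def deployable_function():
    def score(payload):
        # Scoring requests arrive wrapped in "input_data"; take the first entry.
        values = payload["input_data"][0]["values"]
        comments = values.get("test", [])
        # Echo each comment back with its index, wrapped in "predictions".
        # The "text" key is hypothetical and only used in this sketch.
        result = [{"comment": {"idx": i, "text": text}}
                  for i, text in enumerate(comments)]
        return {"predictions": [{"values": result}]}
    return score


# Call the inner function directly, the same way a deployment would call it.
score = deployable_function()
request = {"input_data": [{"values": {"test": ["Nice socks.", "Late delivery."]}}]}
response = score(request)
print(response)
```

Calling the closure locally like this is a quick way to check the payload shape before storing and deploying a function.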


### Setting Up Watson Machine Learning

To set up Watson Machine Learning, you can follow the steps below or see [here](http://ibm-wml-api-pyclient.mybluemix.net/#api-for-ibm-cloud-pak-for-data-ibm-watson-machine-learning-server) for more help.

In [2]:
token = os.environ['USER_ACCESS_TOKEN']

wml_credentials = {
   "token": token,
   "instance_id" : "openshift",
   "url": os.environ['RUNTIME_ENV_APSX_URL'],
   "version": "3.5"
}


client = APIClient(wml_credentials)

#### Create the Deployment Space
Create a new deployment space using the space name specified in the cell below. The space name is used later to identify this space.
If a space named `space_name` already exists, you can either reuse it by setting `use_existing_space=True`, or delete it and create a new one by setting `use_existing_space=False`. By default, `use_existing_space` is set to `True`.


In [3]:
space_name = 'Comments Organizer Space'

use_existing_space=True

In [4]:
space_uid=""
for space in client.spaces.get_details()['resources']:

    if space['entity']['name'] ==space_name:
        print("Deployment space with ",space_name,"already exists . .")
        space_uid=space['metadata']['id']
        client.set.default_space(space_uid)
        if(use_existing_space==False):

            for deployment in client.deployments.get_details()['resources']:
                print("Deleting deployment",deployment['entity']['name'], "in the space",)
                deployment_id=deployment['metadata']['id']
                client.deployments.delete(deployment_id)
            print("Deleting Space ",space_name,)
            client.spaces.delete(space_uid)
            time.sleep(5)
        else:
            print("Using the existing space")
            
            
if (space_uid=="" or use_existing_space==False):
    print("\nCreating a new deployment space -",space_name)
    # create the space and set it as default
    space_meta_data = {
        client.spaces.ConfigurationMetaNames.NAME : space_name

        }

    stored_space_details = client.spaces.store(space_meta_data)

    space_uid = stored_space_details['metadata']['id']

    client.set.default_space(space_uid)


Creating a new deployment space - Comments Organizer Space
Space has been created. However some background setup activities might still be on-going. Check for 'status' field in the response. It has to show 'active' before space can be used. If its not 'active', you can monitor the state with a call to spaces.get_details(space_id)
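
As the message above notes, a newly created space may not be usable until its status is `active`. The following is a minimal polling sketch, not part of the Watson Machine Learning client: `wait_until_active` is a hypothetical helper, and the field path shown in the trailing comment (`['entity']['status']['state']`) is an assumption about the layout of `client.spaces.get_details(space_uid)` that should be checked against the actual response.

```python
import time


def wait_until_active(get_state, timeout_s=60.0, poll_s=2.0, sleep=time.sleep):
    """Poll get_state() until it returns 'active' or timeout_s has been waited."""
    waited = 0.0
    while True:
        if get_state() == "active":
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


# Demonstration with simulated states instead of a real client.
states = iter(["preparing", "preparing", "active"])
ready = wait_until_active(lambda: next(states), sleep=lambda seconds: None)
print(ready)

# With a real client, the call might look like this (field path is an assumption):
# wait_until_active(
#     lambda: client.spaces.get_details(space_uid)["entity"]["status"]["state"])
```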


## 1.1 Clustering <a class="anchor" id="1-1"></a>
This section will be focusing on wrapping our clustering methods and data preprocessing into a function that will be deployed. 

### 1.1.1 Create Function to Deploy<a class="anchor" id="1-1-1"></a>

Reminder that the deployed function's input and response is:
* Payload (input):
    
    ```
    {
        "values": {
            "test": list of comments (required parameter),
            "number_of_clusters": number of clusters to create;
                if this is empty then the model automatically 
                chooses the number but caps it at 5 (optional parameter),
            "number_of_terms": number of top terms to display 
                for each cluster; if the number of comments is 
                less than this value then the number of comments 
                will be used (optional parameter)
        }
    }
    ```
* Response:
    ```
    [
        {
            "comment": {
                "idx": int,
                "group": int,
                "topics": [str]
            },
        },
    ]
    ```

In [5]:
def run_clustering():
    import json
    from collections import defaultdict

    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.cluster import homogeneity_score, completeness_score, v_measure_score

    
    def get_top_n_terms_per_cluster(km_model, terms, n=5):
        """
        Gets the top terms used to cluster text

        :param km_model: KMeans model
        :param terms: list of terms from a TfidfVectorizer object
        :return: dictionary mapping cluster number to top n terms
                 {cluster_number: [term1, term2,..., termn]}
        """
        cluster_terms = defaultdict(list)

        tfidf_values = km_model.cluster_centers_
        order_centroids = km_model.cluster_centers_.argsort()[:, ::-1]
        for i in range(len(order_centroids)):
            cluster = order_centroids[i]
            for term_idx in cluster[:n]:
                # Term should exist in the cluster in order to include
                if tfidf_values[i, term_idx] != 0:
                    cluster_terms[i].append(terms[term_idx])

        return cluster_terms
    
    def run_kmeans(number_of_clusters, tfidf_matrix):
        """
        :param number_of_clusters: int
        :param tfidf_matrix: matrix from TfidfVectorizer object
        :return: KMeans model, list of cluster labels
        """
        km_model = KMeans(n_clusters=number_of_clusters, init='k-means++')
        km_model.fit(tfidf_matrix.toarray())
        clusters = km_model.labels_.tolist()
        return km_model, clusters
    
    def run_model(X, number_of_clusters=None, number_of_terms=5, max_number_of_groups=5):
        """
        Runs the entire modeling process
        1. create TFIDF matrix
        2. run KMeans with TFIDF matrix
        3. get top terms used

        :return: (list of cluster assignments, 
                  dictionary mapping cluster number to terms)
        """
        # First find TFIDF matrix
        tfidf_vectorizer = TfidfVectorizer(max_df=0.75 if len(X)>1 else 1, 
                                           min_df=0.1 if len(X)>1 else 1,
                                           stop_words='english',
                                           use_idf=True, 
                                           ngram_range=(1,3),
                                          )

        tfidf_matrix = tfidf_vectorizer.fit_transform(X)
        terms = tfidf_vectorizer.get_feature_names()

        # silhouette_score requires 2 <= number of clusters <= number of comments - 1.
        # If there are 2 or fewer comments, use one cluster per comment instead.
        number_of_clusters = len(X) if len(X) <= 2 else number_of_clusters

        if number_of_clusters:
            # If there's a specific number of clusters specified, then run with that number.
            km_model, best_clusters = run_kmeans(number_of_clusters, tfidf_matrix)
            cluster_terms = get_top_n_terms_per_cluster(km_model, terms, number_of_terms)
        else:
            # Automatically find number of clusters with silhouette_score
            # but have a maximum of max_number_of_groups 
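            # Start below the lowest possible silhouette score (-1) so the first k
            # always sets best_clusters and cluster_terms, even if no score is positive.
            max_silhouette_score = float('-inf')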
            for k in range(2, min(max_number_of_groups, len(X))):
                km_model, clusters = run_kmeans(k, tfidf_matrix)
                current_silhouette_score = silhouette_score(tfidf_matrix, clusters)
                if current_silhouette_score > max_silhouette_score:
                    max_silhouette_score = current_silhouette_score
                    cluster_terms = get_top_n_terms_per_cluster(km_model, terms, number_of_terms)
                    best_clusters = clusters

        return best_clusters, cluster_terms
    
    def format_clusters(final_labels, group_topics):
        """
        return:{
                "comment": {
                    "idx": 1,
                    "group": 2
                    "topics": [""]
                }
               }
        """
        
        result = []
        for i in range(len(final_labels)):
            group = int(final_labels[i])
            comment = {
                "comment": {
                    "idx": i,
                    "group": group,
                    "topics": list(group_topics[group])
                }
            }
            result.append(comment)
        return json.dumps(result)

    def score(payload):
        '''
        :param payload: dictionary of the form {'input_data': [{'values': {...}}]}.
            'values' expects the key 'test' (list of comments);
            optional keys are 'number_of_clusters' and 'number_of_terms', both int
        :return: dictionary {'predictions': [{'values': str}]} where the string is JSON with format 
                [
                    {
                        "comment": {
                            "idx": int,
                            "group": int,
                            "topics": [str]
                        },
                    },
                ]
        '''

        comments = payload['input_data'][0]['values'].get('test') 
        n = payload['input_data'][0]['values'].get('number_of_clusters')
        n_terms = payload['input_data'][0]['values'].get('number_of_terms', 5)  # default is 5
        final_labels, top_terms = run_model(comments, number_of_clusters=n, number_of_terms=n_terms)
        
        score_response = {'predictions': [{ 'values': format_clusters(final_labels, top_terms)}]}        

        return score_response
    
    return score
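
To make the term-ranking step in `get_top_n_terms_per_cluster` easier to follow, here is a minimal sketch of the same idea using plain Python lists instead of a fitted `KMeans` model. The function name `top_terms_per_cluster`, the vocabulary, and the centroid weights are made up for illustration; the deployed function takes its weights from `km_model.cluster_centers_`.

```python
def top_terms_per_cluster(centers, terms, n=5):
    """centers: one list of TF-IDF weights per cluster; terms: the vocabulary."""
    result = {}
    for cluster_id, weights in enumerate(centers):
        # Rank term indices by centroid weight, largest first.
        ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
        # Keep the top n, skipping terms that do not occur in the cluster at all.
        result[cluster_id] = [terms[j] for j in ranked[:n] if weights[j] != 0]
    return result


# Toy vocabulary and centroids (illustrative values only).
terms = ["socks", "shirt", "ugly", "green"]
centers = [
    [0.0, 0.0, 0.9, 0.0],  # cluster 0 only uses "ugly"
    [0.7, 0.2, 0.0, 0.1],  # cluster 1 is mostly "socks"
]
top = top_terms_per_cluster(centers, terms, n=2)
print(top)
```

Because zero-weight terms are skipped, a cluster can end up with fewer than `n` topics, which is also how the deployed function behaves.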

### 1.1.2 Deploy Function <a class="anchor" id="1-1-2"></a>

Now that we have a deployable function, we store the function details.

In [6]:
software_spec_id = client.software_specifications.get_id_by_name("default_py3.7")

In [7]:
# Store function details

meta_data = {
    client.repository.FunctionMetaNames.NAME : 'Function for Clustering Text',
    client.repository.FunctionMetaNames.TAGS : ['clustering_function_tag'],
    client.repository.FunctionMetaNames.SOFTWARE_SPEC_UID: software_spec_id

}

function_details = client.repository.store_function( meta_props=meta_data, function=run_clustering )

In [8]:
# Get function id
function_id = function_details["metadata"]["id"]

In [9]:
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: 'Clustering Deployment',
    client.deployments.ConfigurationMetaNames.TAGS : ['clustering_function_deployment_tag'],
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

# deploy the function
function_deployment_details = client.deployments.create(artifact_uid=function_id, meta_props=meta_props)



#######################################################################################

Synchronous deployment creation for uid: 'c10d45d4-835c-46b9-802c-a0b46c34ec40' started

#######################################################################################


initializing..
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='e5436813-0de6-4d3e-99c3-d3b72d617e2e'
------------------------------------------------------------------------------------------------




In [10]:
# Get deployment id for clustering
clustering_deployment_id = client.deployments.get_uid(function_deployment_details)

### 1.1.3 Test Deployed Function <a class="anchor" id="1-1-3"></a>

In [11]:
# Example 1
# Input for payload to be passed into function
input_sentences = [
    'Customer service was polite.',
    'The socks are a pretty color.',
    'The shirt I bought was green.',
    'I think the sweater and socks were perfect.',
    'I do not like the shoes, so ugly.',
]
#payload = {"values" : input_sentences}

payload = {client.deployments.ScoringMetaNames.INPUT_DATA: [{'values': 
                                                             {"test": input_sentences,
                                                              "number_of_clusters": 3,
                                                              "number_of_terms": 2}} 
                                                           ]}


# Send data to deployment for processing
client.deployments.score(clustering_deployment_id, payload)

{'predictions': [{'values': '[{"comment": {"idx": 0, "group": 0, "topics": ["ugly", "shoes ugly"]}}, {"comment": {"idx": 1, "group": 1, "topics": ["socks", "socks pretty color"]}}, {"comment": {"idx": 2, "group": 2, "topics": ["bought", "bought green"]}}, {"comment": {"idx": 3, "group": 1, "topics": ["socks", "socks pretty color"]}}, {"comment": {"idx": 4, "group": 0, "topics": ["ugly", "shoes ugly"]}}]'}]}

In [12]:
# Example 2: Convert function string output to json format
comments = [
    'I bought several items: socks, shirt, sweater. By far my most favorite was the shirt because it is so soft. However, the sweater and socks missed the mark.',
    'My order arrived several days late. But when I contacted customer serivce they were very helpful and refunded me.',
    'Horrible, horrible customer service, I have never met such rude people. Why is it so bad? Would not recommend at all.',
    'Everything I ordered arrived on time and looked exactly like in the pictures! This company has high quality products.',
    'I bought somethings on sale, they were a great deal. Will be buying more next time.'
]


payload = {client.deployments.ScoringMetaNames.INPUT_DATA: [{'values': 
                                                             {"test": comments,
                                                              "number_of_clusters": 3,
                                                              "number_of_terms": 2}} 
                                                           ]}

labels = client.deployments.score(clustering_deployment_id, payload)


In [13]:
labels

{'predictions': [{'values': '[{"comment": {"idx": 0, "group": 2, "topics": ["shirt", "sweater"]}}, {"comment": {"idx": 1, "group": 0, "topics": ["horrible", "customer"]}}, {"comment": {"idx": 2, "group": 0, "topics": ["horrible", "customer"]}}, {"comment": {"idx": 3, "group": 1, "topics": ["time", "somethings"]}}, {"comment": {"idx": 4, "group": 1, "topics": ["time", "somethings"]}}]'}]}
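
The `values` field in the response above is a JSON string, not a list. The following sketch shows one way to convert it with `json.loads` and group the comment indices by cluster. The response is copied from the Example 2 output above; the variable names `comments_by_group` and `topics_by_group` are only illustrative.

```python
import json
from collections import defaultdict

# Response copied from the Example 2 output.
labels = {'predictions': [{'values': (
    '[{"comment": {"idx": 0, "group": 2, "topics": ["shirt", "sweater"]}}, '
    '{"comment": {"idx": 1, "group": 0, "topics": ["horrible", "customer"]}}, '
    '{"comment": {"idx": 2, "group": 0, "topics": ["horrible", "customer"]}}, '
    '{"comment": {"idx": 3, "group": 1, "topics": ["time", "somethings"]}}, '
    '{"comment": {"idx": 4, "group": 1, "topics": ["time", "somethings"]}}]'
)}]}

# Convert the JSON string into a list of dictionaries.
parsed = json.loads(labels['predictions'][0]['values'])

# Collect the comment indices and the topics for each group.
comments_by_group = defaultdict(list)
topics_by_group = {}
for item in parsed:
    comment = item['comment']
    comments_by_group[comment['group']].append(comment['idx'])
    topics_by_group[comment['group']] = comment['topics']

for group in sorted(comments_by_group):
    print(group, topics_by_group[group], comments_by_group[group])
```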

## 1.2 Sentiment Analysis <a class="anchor" id="1-2"></a>

This section will be focusing on wrapping our sentiment analysis methods and data preprocessing into a function that will be deployed. For more help on deploying functions, see [here](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-deploy-functions.html).

### 1.2.1 Create Function to Deploy <a class="anchor" id="1-2-1"></a>

Reminder that the deployed function's input and response will be:

* Payload (input):
    ```
    {
        "values": {
            "test": list of comments (required parameter)
        }
    }
    ```
* Response:
    ```
    [
        {
            "comment": {
                "idx": int,
                "sentiment": str,
                "sentence_sentiment": [
                    {
                        "idx": int,
                        "sentence": str,
                        "sentiment": str
                    },
                ]
            }
        },
    ]
    ```

In [14]:
def run_sentiment_analysis():
    import subprocess
    subprocess.check_output( "pip install nltk --user", stderr=subprocess.STDOUT, shell=True )
    subprocess.check_output( "pip install xlrd --user", stderr=subprocess.STDOUT, shell=True )
    
    import json
    import itertools
    import os
    import requests
    import tarfile

    import string
    import nltk
    import numpy as np
    import pandas as pd
    from nltk import word_tokenize
    
    nltk.download('punkt')
    nltk.download('averaged_perceptron_tagger')
    
    class Adjective:
        FAST = 'FAST_TOKENS'
        SLOW = 'SLOW_TOKENS'
        HIGH = 'HIGH_TOKENS'
        LOW = 'LOW_TOKENS'

    class Sign:
        POSITIVE = 'Positive'
        NEGATIVE = 'Negative'
        NEUTRAL = 'Neutral'
        
    def download_data(url_base, version, data_file_name):
        # Downloading the dataset
        url = "{}/{}/{}".format(url_base, version, data_file_name)
        response = requests.get(url)

        # Check for errors
        if not response.ok:
            print("There are some errors when downloading {}".format(url))

        # Open tar file
        with open(data_file_name, 'wb') as file_name:
            file_name.write(response.content)
    
    def extract_data(data_directory, data_file_name):
        # Extracting the dataset
        with tarfile.open(data_file_name) as file_name:
            file_name.extractall(path='./' + data_directory)
            
    def clean_adjective_df(df):
        adjectives = []
        for i in range(4):
            adjectives.extend(df.iloc[:, i+1].dropna().tolist())

        adj_category = df.iloc[0,0]

        return [(adj, adj_category) for adj in adjectives]
            
    
    # Download the dataset
    sentiment_data_directory = 'data/sentiment-composition-lexicons'
    url_base = 'https://dax-cdn.cdn.appdomain.cloud/dax-sentiment-composition-lexicons'
    version = '1.0.2'
    data_file_name = 'sentiment-composition-lexicons.tar.gz'
    download_data(url_base, version, data_file_name)

    # Extract the dataset
    extract_data(sentiment_data_directory, data_file_name)
    
    # 1. Read unigram data
    unigram_df = pd.read_csv(os.path.join(sentiment_data_directory, 'LEXICON_UG.txt'), sep=" ")
    # Add sentiment column
    unigram_df['sentiment'] = np.where(unigram_df['SENTIMENT_SCORE'] > 0, 1, 0)  # 1 is positive, 0 is negative
    unigram_sentiment_dict = pd.Series(unigram_df.SENTIMENT_SCORE.values,
                               index=unigram_df.UNIGRAM.values).to_dict()

    # 2. Read bigram data
    bigrams_df = pd.read_csv(os.path.join(sentiment_data_directory, 'LEXICON_BG.txt'), sep=" ")
    # Add sentiment column
    bigrams_df['sentiment'] = np.where(bigrams_df['SENTIMENT_SCORE'] > 0, 1, 0)  # 1 is positive, 0 is negative
    bigram_sentiment_dict = pd.Series(bigrams_df.SENTIMENT_SCORE.values,
                               index=bigrams_df.BIGRAM.str.split('-').apply(lambda l: tuple(l))).to_dict()

    # 3.1 Adjective Classes
    xls_file = pd.ExcelFile(os.path.join(sentiment_data_directory, 'ADJECTIVES.xlsx'))
    adjective_expansion = pd.read_excel(xls_file, 'ADJECTIVE_EXPANSION').dropna(how='all').reset_index(drop=True)
    high_low_PN = pd.read_excel(xls_file, '(HIGH,LOW)_POS_NEG', header=None)[0].values.tolist()
    high_low_NP = pd.read_excel(xls_file, '(HIGH,LOW)_NEG_POS', header=None)[0].values.tolist()
    fast_slow_PN = pd.read_excel(xls_file, '(FAST,SLOW)_POS_NEG', header=None)[0].values.tolist()
    fast_slow_NP = pd.read_excel(xls_file, '(FAST,SLOW)_NEG_POS', header=None)[0].values.tolist()
    
    # Get tokens
    tokens = []
    token_rows = 5
    for i in range(0, len(adjective_expansion), token_rows+1):
        tokens.append(clean_adjective_df(adjective_expansion.loc[i:i+token_rows]))
    high_tokens, low_tokens, fast_tokens, slow_tokens = tokens

    adjective_class_map = dict(high_tokens + low_tokens + fast_tokens + slow_tokens)

    # 3.2 Composition Classes
    semantic_classes_file = pd.ExcelFile(os.path.join(sentiment_data_directory, 'SEMANTIC_CLASSES.xlsx'))
    dominator_neg = pd.read_excel(semantic_classes_file, 'DOMINATOR_NEG', header=None)[0].values.tolist()
    dominator_pos = pd.read_excel(semantic_classes_file, 'DOMINATOR_POS', header=None)[0].values.tolist()
    propagator_pos = pd.read_excel(semantic_classes_file, 'PROPAGATOR_POS', header=None)[0].values.tolist()
    propagator_neg = pd.read_excel(semantic_classes_file, 'PROPAGATOR_NEG', header=None)[0].values.tolist()
    reverser_pos = pd.read_excel(semantic_classes_file, 'REVERSER_POS', header=None)[0].values.tolist()
    reverser_neg = pd.read_excel(semantic_classes_file, 'REVERSER_NEG', header=None)[0].values.tolist()

    adjective_conditions = {
        Adjective.FAST: [(fast_slow_PN, Sign.POSITIVE), (fast_slow_NP, Sign.NEGATIVE)],
        Adjective.SLOW: [(fast_slow_PN, Sign.NEGATIVE), (fast_slow_NP, Sign.POSITIVE)],
        Adjective.HIGH: [(high_low_PN, Sign.POSITIVE), (high_low_NP, Sign.NEGATIVE)],
        Adjective.LOW: [(high_low_PN, Sign.NEGATIVE), (high_low_NP, Sign.POSITIVE)],
    }

    sentiment_to_score = {
        Sign.POSITIVE: +1,
        Sign.NEGATIVE: -1,
    }


    # Simple implementation: if a token is found in unigram_sentiment_dict, add its sentiment.
    # The sentiment for the sentence is the sum over its tokens (it is not averaged).
    def calculate_unigram_sentiment(tokenized_sentence, sentiment_map=unigram_sentiment_dict):
        sentiment_score = 0
        for token in tokenized_sentence:
            token_sentiment = sentiment_map.get(token)
            if token_sentiment is None:
                continue
            else:
                sentiment_score += token_sentiment

        return sentiment_score
    
    def calculate_bigram_sentiment_sentence(sentence_bigrams):
        bigrams = []
        bigram_sentiment_score = 0
        for bigram in sentence_bigrams:
            bigram_sentiment = calculate_bigram_sentiment(bigram)
            if bigram_sentiment is not None:
                bigram_sentiment_score += bigram_sentiment
                bigrams.append(bigram)

        return bigram_sentiment_score, bigrams


    def calculate_bigram_sentiment(bigram):
        bigram_sentiment = bigram_sentiment_dict.get(bigram)
        if bigram_sentiment:
            # -0.02 and 0.02 allows for margin of error for neutrals
            if bigram_sentiment < -0.02:
                return -1
            elif bigram_sentiment > 0.02:
                return 1
            else:
                return 0
        return None

    def is_given_sentiment(sentiment, word, sentiment_map):
        if word in sentiment_map:
            if sentiment==Sign.NEGATIVE and sentiment_map[word] < 0:
                return True
            elif sentiment==Sign.POSITIVE and sentiment_map[word] > 0:
                return True

    def calculate_composition_or_adj_sentiment(bigram):
        # Adjective
        adjective_token = adjective_class_map.get(bigram[0])
        if adjective_token is not None:
            for expansions_list, sentiment_sign in adjective_conditions[adjective_token]:
                if bigram[1] in expansions_list:
                    return sentiment_to_score[sentiment_sign]

        # Composition: Reverser
        elif bigram[0] in reverser_pos and is_given_sentiment(Sign.NEGATIVE, bigram[1], unigram_sentiment_dict):
            return sentiment_to_score[Sign.POSITIVE]
        elif bigram[0] in reverser_neg and is_given_sentiment(Sign.POSITIVE, bigram[1], unigram_sentiment_dict):
            return sentiment_to_score[Sign.NEGATIVE]

        # Composition: Propagator
        elif bigram[0] in propagator_pos and is_given_sentiment(Sign.NEGATIVE, bigram[0], unigram_sentiment_dict) and is_given_sentiment(Sign.POSITIVE, bigram[1], unigram_sentiment_dict):
            return sentiment_to_score[Sign.POSITIVE]
        elif bigram[0] in propagator_neg and is_given_sentiment(Sign.POSITIVE, bigram[0], unigram_sentiment_dict) and is_given_sentiment(Sign.NEGATIVE, bigram[1], unigram_sentiment_dict):
            return sentiment_to_score[Sign.NEGATIVE]

        # Composition: Dominator
        elif bigram[0] in dominator_neg:
            return sentiment_to_score[Sign.NEGATIVE]
        elif bigram[0] in dominator_pos:
            return sentiment_to_score[Sign.POSITIVE]

        return None


    def calculate_composition_or_adj_sentiment_sentence(sentence):
        sentiment_count = 0
        bigrams = []
        sentence_bigrams = list(nltk.bigrams(word_tokenize(sentence.lower())))
        for bigram in sentence_bigrams:
            sentiment = calculate_composition_or_adj_sentiment(bigram)
            if sentiment is not None:
                sentiment_count += sentiment
                bigrams.append(bigram)
            else:
                continue 

        # if sentiment_count > 0 then positive
        return sentiment_count, bigrams
    
    def calculate_sentiment_combined(sentence):
        sentiment_score = 0
        table = str.maketrans(dict.fromkeys(string.punctuation))
        cleaned_sentence = sentence.translate(table)  # remove punctuation
        sentence_bigrams = list(nltk.bigrams(word_tokenize(cleaned_sentence.lower())))

        for bigram in sentence_bigrams:
            current_sentiment = calculate_bigram_sentiment(bigram)
            if current_sentiment is None:
                current_sentiment = calculate_composition_or_adj_sentiment(bigram)
                if current_sentiment is None:
                    unigram_sentiment = calculate_unigram_sentiment(bigram)
                    if unigram_sentiment < - 0.1:
                        current_sentiment = -1
                    elif unigram_sentiment > 0.1:
                        current_sentiment = 1
                    else:
                        current_sentiment = 0

            sentiment_score += current_sentiment

        return sentiment_score
    
    
    def convert_score_to_sentiment(score):
        if score < 0:
            return Sign.NEGATIVE
        elif score > 0:
            return Sign.POSITIVE
        else:
            return Sign.NEUTRAL


    def calculate_sentence_level_sentiment(comment_by_sentence):
        """
        :param comment_by_sentence: comment broken down by sentence [sentence1, sentence2, ...]
                                    each sentence is string.
        :return: (overall_score, [(sentiment, sentence), ...])
        """
        sentence_level_sentiment = []
        overall_score = 0
        for sentence in comment_by_sentence:
            score = calculate_sentiment_combined(sentence)
            overall_score += score
            sentiment_sentence_pair = (convert_score_to_sentiment(score), sentence)
            sentence_level_sentiment.append(sentiment_sentence_pair)
        return overall_score, sentence_level_sentiment


    def format_comment_sentiment(comments):
        """
        :param comments: [comment, comment, ...]
        :return:{
                "comment": {
                    "idx": int,
                    "sentiment": str,
                    "sentence_sentiment": [
                        {
                            "idx": int,
                            "sentence": str,
                            "sentiment": str
                        },
                    ]
                }
            }
        """
        
        result = []
        for i in range(len(comments)):
            comment_by_sentence = nltk.tokenize.sent_tokenize(comments[i])
            overall_score, sentence_level_sentiment = calculate_sentence_level_sentiment(comment_by_sentence)
            
            sentence_jsons = []
            for j in range(len(sentence_level_sentiment)):
                sentence_json = {
                    "idx": j,
                    "sentence": sentence_level_sentiment[j][1],
                    "sentiment": sentence_level_sentiment[j][0]
                }
                sentence_jsons.append(sentence_json)
            
            comment_json = {
                "comment": {
                    "idx": i,
                    "sentiment": convert_score_to_sentiment(overall_score),
                    "sentence_sentiment": sentence_jsons
                }
            }
            
            result.append(comment_json)
    
        return json.dumps(result)
    
    
    def score(payload):
        '''
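        :param payload: dictionary of the form {'input_data': [{'values': {...}}]}.
            'values' expects the key 'test' (list of comments).
        :return: dictionary {'predictions': [{'values': str}]} where the string is JSON with format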
                [
                    {
                        "comment": {
                            "idx": int,
                            "sentiment": str,
                            "sentence_sentiment": [
                                {
                                    "idx": int,
                                    "sentence": str,
                                    "sentiment": str
                                },
                            ]
                        }
                    },
                ]
        '''
        comments = payload['input_data'][0]['values'].get('test') 
        score_response = {'predictions': [{ 'values': format_comment_sentiment(comments)}]}  
        return score_response
        
    
    return score
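
The scoring logic in `calculate_sentiment_combined` falls back from bigram scores to composition and adjective rules, and finally to unigram scores. The following is a simplified sketch of that fallback that omits the composition and adjective step. The two lexicons are toy dictionaries invented for illustration, while the thresholds (0.02 for bigrams, 0.1 for unigrams) are the ones used in the function above.

```python
# Toy lexicons, invented for this sketch; the notebook loads the real ones
# from the sentiment-composition-lexicons dataset.
BIGRAM_SCORES = {("very", "helpful"): 0.8, ("so", "bad"): -0.7}
UNIGRAM_SCORES = {"rude": -0.6, "great": 0.5, "late": -0.3}


def bigram_sign(bigram):
    """Return -1, 0 or 1 for a known bigram, or None if it is not in the lexicon."""
    score = BIGRAM_SCORES.get(bigram)
    if score is None:
        return None
    return -1 if score < -0.02 else 1 if score > 0.02 else 0


def unigram_sign(bigram):
    """Sum the unigram scores of both tokens and convert the sum to a sign."""
    total = sum(UNIGRAM_SCORES.get(token, 0.0) for token in bigram)
    return -1 if total < -0.1 else 1 if total > 0.1 else 0


def sentence_label(tokens):
    score = 0
    for bigram in zip(tokens, tokens[1:]):
        sign = bigram_sign(bigram)
        if sign is None:  # unknown bigram: fall back to the unigram lexicon
            sign = unigram_sign(bigram)
        score += sign
    return "Negative" if score < 0 else "Positive" if score > 0 else "Neutral"


print(sentence_label(["they", "were", "very", "helpful"]))
print(sentence_label(["such", "rude", "people"]))
print(sentence_label(["the", "shirt"]))
```

Note that a token in the middle of a sentence belongs to two bigrams, so its unigram score can be counted twice; the function above behaves the same way.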

In [15]:
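# TODO: Uncomment to test the deployable function locally
# local_payload = {"input_data": [{"values": {"test": comments}}]}
# json.loads(run_sentiment_analysis()(local_payload)["predictions"][0]["values"])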

### 1.2.2 Deploy Function <a class="anchor" id="1-2-2"></a>

In [16]:
# Store function details

meta_data = {
    client.repository.FunctionMetaNames.NAME : 'Function for Sentiment Analysis',
    client.repository.FunctionMetaNames.TAGS : ['sentiment_analysis_function_tag'],
    client.repository.FunctionMetaNames.SOFTWARE_SPEC_UID: software_spec_id

}

function_details = client.repository.store_function( meta_props=meta_data, function=run_sentiment_analysis )

In [17]:
# Get function id
function_id = function_details["metadata"]["id"]

In [18]:
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: 'Sentiment Analysis Deployment',
    client.deployments.ConfigurationMetaNames.TAGS : ['sentiment_analysis_function_deployment_tag'],
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

# deploy the function
function_deployment_details = client.deployments.create(artifact_uid=function_id, meta_props=meta_props)



#######################################################################################

Synchronous deployment creation for uid: '19c61b69-7d69-4735-8ce5-1df006df5105' started

#######################################################################################


initializing....
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='c77d930a-7392-4569-a88c-cb8696bbef5f'
------------------------------------------------------------------------------------------------




In [19]:
# Get deployment id for sentiment analysis
sentiment_deployment_id = client.deployments.get_uid(function_deployment_details)

### 1.2.3 Test Deployed Function <a class="anchor" id="1-2-3"></a>

After successfully deploying, test the deployed function.

In [20]:
payload = {client.deployments.ScoringMetaNames.INPUT_DATA: [{'values': 
                                                             {"test": comments}} 
                                                           ]}


# Send data to deployment for processing
client.deployments.score(sentiment_deployment_id, payload)

{'predictions': [{'values': '[{"comment": {"idx": 0, "sentiment": "Negative", "sentence_sentiment": [{"idx": 0, "sentence": "I bought several items: socks, shirt, sweater.", "sentiment": "Negative"}, {"idx": 1, "sentence": "By far my most favorite was the shirt because it is so soft.", "sentiment": "Negative"}, {"idx": 2, "sentence": "However, the sweater and socks missed the mark.", "sentiment": "Negative"}]}}, {"comment": {"idx": 1, "sentiment": "Positive", "sentence_sentiment": [{"idx": 0, "sentence": "My order arrived several days late.", "sentiment": "Negative"}, {"idx": 1, "sentence": "But when I contacted customer serivce they were very helpful and refunded me.", "sentiment": "Positive"}]}}, {"comment": {"idx": 2, "sentiment": "Negative", "sentence_sentiment": [{"idx": 0, "sentence": "Horrible, horrible customer service, I have never met such rude people.", "sentiment": "Negative"}, {"idx": 1, "sentence": "Why is it so bad?", "sentiment": "Negative"}, {"idx": 2, "sentence": "Wou