# Software Design Using ML&AI nWave


# 1. Setup

To prepare your environment, install the packages listed below.

## 1.1 Install the necessary packages

You need the latest versions of these packages:<br>
**Watson Developer Cloud:** a client library for Watson services, used here for Object Storage, which comes free with an IBM Bluemix account.<br>
**spaCy:** a library for natural language processing (NLP).<br>
**Pandas:** for dataframes.<br>
**stop_words:** a list of common stop words.<br>
**boto3:** a Python client for communicating with AWS.<br>
**websocket-client:** a Python client for WebSockets.<br>
**pyorient:** a Python client for OrientDB.<br><br>



**Install NLTK:**

In [10]:
!pip install --upgrade nltk


Requirement already up-to-date: nltk in /anaconda3/lib/python3.6/site-packages (3.2.5)
Requirement not upgraded as not directly required: six in /anaconda3/lib/python3.6/site-packages (from nltk) (1.11.0)
Collecting python-swiftclient
  Downloading https://files.pythonhosted.org/packages/95/d4/393d9f835abdc8085edf6d1007e9954af33ac4ff7c160db694c25dcbadb1/python_swiftclient-3.5.0-py2.py3-none-any.whl (77kB)
    100% |████████████████████████████████| 81kB 225kB/s
Installing collected packages: python-swiftclient
Successfully installed python-swiftclient-3.5.0


**Install the Boto3 client for AWS communication through the CLI:**

In [14]:
!pip install boto3 

awscli 1.15.10 has requirement botocore==1.10.10, but you'll have botocore 1.10.11 which is incompatible.


**Install stop_words:**

In [2]:
!pip install stop-words



**Install websocket-client:**

In [3]:
!pip install websocket-client



**Install awscli and pyorient:**

In [15]:
! pip install awscli
! pip install pyorient --user

Collecting botocore==1.10.10 (from awscli)
  Using cached https://files.pythonhosted.org/packages/77/c9/a40ebce24bbab4c7986fccdac9dade097385ad2feae73dcc47d31a1b4dc8/botocore-1.10.10-py2.py3-none-any.whl
boto3 1.7.11 has requirement botocore<1.11.0,>=1.10.11, but you'll have botocore 1.10.10 which is incompatible.
Installing collected packages: botocore
  Found existing installation: botocore 1.10.11
    Uninstalling botocore-1.10.11:
      Successfully uninstalled botocore-1.10.11
Successfully installed botocore-1.10.10
boto3 1.7.11 has requirement botocore<1.11.0,>=1.10.11, but you'll have botocore 1.10.10 which is incompatible.


## 1.2 Import packages and libraries

Import the packages and libraries that you'll use:

In [16]:
import json
import spacy

import re
import nltk
from nltk.cluster.util import cosine_distance
from stop_words import get_stop_words
import numpy

import boto3
from botocore.client import Config

import websocket
import _thread
import time

from io import BytesIO
import pandas as pd
import sys

# 2. Configuration

Add the configurable items of the notebook below.

## 2.1 Add your service credentials, if any are required

This is where you add the credentials of the infrastructure you use to store data.

Run the cell.

## 2.2 Add your service credentials for S3

You must create an S3 bucket on AWS. To access data in a file in Object Storage, you need the Object Storage authentication credentials (for S3, an access key ID and a secret access key). Insert them as credentials_1 in the following cell, replacing the current contents of the cell.

In [17]:
# @hidden_cell
# The following code contains the credentials for your S3 bucket.
# Remove these credentials before you share your notebook.
credentials_1 = {
    'ACCESS_KEY_ID': '<YOUR_ACCESS_KEY_ID>',
    'ACCESS_SECRET_KEY': '<YOUR_SECRET_ACCESS_KEY>',
    'BUCKET': 'software-testing-pyscript'
}
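
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch, assuming the keys are exported as environment variables, `credentials_1` could instead be built at run time. The helper name `load_credentials` and the variable name `S3_BUCKET` are illustrative assumptions and not part of the original notebook.

```python
import os

def load_credentials(env=os.environ):
    """Build the credentials_1 dictionary from environment variables.

    Hypothetical helper: S3_BUCKET is an assumed variable name.
    """
    required = {
        'ACCESS_KEY_ID': 'AWS_ACCESS_KEY_ID',
        'ACCESS_SECRET_KEY': 'AWS_SECRET_ACCESS_KEY',
        'BUCKET': 'S3_BUCKET',
    }
    # Fail early with a clear message if any variable is unset or empty
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {key: env[name] for key, name in required.items()}
```

With the variables set, `credentials_1 = load_credentials()` would yield a dictionary with the same three keys that the rest of the notebook reads.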

# 3. spaCy Text Classification

Write the classification-related utility functions in modular form.
Define the functions that perform the text classification.
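
The `classify_text` function called in section 6.2 is not defined in the cells shown here. The sketch below is a hypothetical stand-in, not the spaCy implementation: it uses only the standard library and illustrates the output shape that `add_keywords_entities` reads, namely a dictionary with a `keywords` list and an `entities` list. The stop-word set, the `max_keywords` parameter, and the assumption that `config` maps an entity type to a list of terms are all illustrative.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); the stop_words package
# imported by the notebook provides a fuller one.
STOP_WORDS = {'a', 'an', 'and', 'the', 'is', 'are', 'to', 'of', 'in',
              'on', 'for', 'with', 'when', 'be', 'must', 'should'}

def classify_text(text, config=None, max_keywords=5):
    """Return {'keywords': [...], 'entities': [...]} for a piece of text.

    Hypothetical stand-in: keywords are the most frequent non-stop words,
    and entities are terms listed in config (assumed: type -> list of terms)
    that occur in the text.
    """
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9_-]*", text)
    lowered = [token.lower() for token in tokens]
    counts = Counter(token for token in lowered if token not in STOP_WORDS)
    keywords = [{'text': word, 'count': count}
                for word, count in counts.most_common(max_keywords)]
    entities = []
    for entity_type, terms in (config or {}).items():
        for term in terms:
            if term.lower() in lowered:
                entities.append({'type': entity_type, 'text': term})
    return {'keywords': keywords, 'entities': entities}
```

A spaCy-based version would presumably fill the same two lists from the parsed document, so the downstream functions would not need to change.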


# 4. Correlate text content
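
Section 6.2 calls `compute_text_similarity` with two descriptions and two keyword lists, but its definition is not shown. As a minimal sketch, assuming the score is a bag-of-words cosine similarity over the description words plus the keywords, it could look like the following. The tokenization rule is an assumption, and the arithmetic is written out by hand so that the block has no dependencies.

```python
import math
import re
from collections import Counter

def compute_text_similarity(text1, text2, keywords1, keywords2):
    """Cosine similarity between two texts, each extended by its keywords.

    Sketch under assumptions: lower-cased alphanumeric tokens, raw counts.
    """
    def bag(text, keywords):
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for keyword in keywords:
            tokens.extend(re.findall(r"[a-z0-9]+", keyword.lower()))
        return Counter(tokens)

    vector1, vector2 = bag(text1, keywords1), bag(text2, keywords2)
    # Dot product over the tokens that both vectors share
    dot = sum(vector1[t] * vector2[t] for t in vector1.keys() & vector2.keys())
    norm = (math.sqrt(sum(c * c for c in vector1.values()))
            * math.sqrt(sum(c * c for c in vector2.values())))
    return dot / norm if norm else 0.0
```

The score ranges from 0 (no shared tokens) to 1 (identical token distributions), which matches the 0.4 cut-off applied in `populate_text_similarity_score`.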

# 5. Persistence and Storage
## 5.1 Configure Object Storage Client

In [21]:
# Create the S3 resource from the credentials defined in section 2.2
s3 = boto3.resource('s3',
                    aws_access_key_id=credentials_1['ACCESS_KEY_ID'],
                    aws_secret_access_key=credentials_1['ACCESS_SECRET_KEY'],
                    config=Config(signature_version='s3v4'))

def get_file(filename):
    '''Retrieve file from Cloud Object Storage'''
    body = s3.Object(credentials_1['BUCKET'], filename).get()['Body']
    # Buffer the streaming body in memory so that pandas can seek in it
    fileobject = BytesIO(body.read())
    return fileobject

def load_string(fileobject):
    '''Load the file contents into a Python string'''
    text = fileobject.read()
    return text

def load_df(fileobject, sheetname):
    '''Load file contents into a Pandas dataframe'''
    excelFile = pd.ExcelFile(fileobject)
    df = excelFile.parse(sheetname)
    return df

def put_file(filename, filecontents):
    '''Write file to Cloud Object Storage'''
    # put_object is an action of the Bucket resource, not of the service resource
    resp = s3.Bucket(credentials_1['BUCKET']).put_object(Key=filename, Body=filecontents)
    return resp



## 5.2 OrientDB client - functions to connect, store and retrieve data

**Connect to OrientDB**

**OrientDB Core functions**

**OrientDB Insights**
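
The OrientDB cells are not shown here. As a rough sketch of what the core functions might produce, the helpers below build OrientDB SQL statements for storing an artifact as a vertex and a match as an edge. The function names, the class names in the examples, and the use of JSON-style double-quoted literals are assumptions for illustration, not the notebook's actual implementation.

```python
import json

def quote(value):
    """Render a Python value as a SQL literal.

    Assumption: JSON-style double-quoted strings with backslash escapes
    are acceptable as OrientDB string literals.
    """
    return json.dumps(value, ensure_ascii=False)

def create_vertex_sql(class_name, fields):
    """Build a CREATE VERTEX statement from a field dictionary."""
    assignments = ', '.join(f'{name} = {quote(value)}'
                            for name, value in fields.items())
    return f'CREATE VERTEX {class_name} SET {assignments}'

def create_edge_sql(edge_class, from_class, from_id, to_class, to_id, score):
    """Build a CREATE EDGE statement linking two artifacts by their ID field."""
    return (f'CREATE EDGE {edge_class} '
            f'FROM (SELECT FROM {from_class} WHERE ID = {quote(from_id)}) '
            f'TO (SELECT FROM {to_class} WHERE ID = {quote(to_id)}) '
            f'SET score = {quote(score)}')

print(create_vertex_sql('Requirement', {'ID': 'R1'}))
print(create_edge_sql('Matches', 'Requirement', 'R1', 'Defect', 'D7', 0.62))
```

With a pyorient client that has already opened a database, statements like these would typically be passed to its `command` method. The class and edge names must exist in the database schema before they are used.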

# 6. Data Preparation

## 6.1 Global variables and functions

In [None]:
# Name of the Excel file with data in Object Storage
dataFileName = "sample_data.xlsx"

# Name of the config file in Object Storage
configFileName = "sample_config.txt"

# Config contents
config = None

# Data file
datafile = None

# Requirements dataframe
requirements_sheet_name = "Requirements"
requirements_df = None

# Defects dataframe
defects_sheet_name = "Defects"
defects_df = None

# Testcases dataframe
testcases_sheet_name = "TestCases"
testcases_df = None

def load_artifacts():
    """ Load the artifacts into a pandas dataframe
    """
    global requirements_df 
    global defects_df 
    global testcases_df 
    global config
    global datafile
    config = load_string(get_file(configFileName))
    datafile = get_file(dataFileName)
    excel_file = pd.ExcelFile(datafile)
    requirements_df = excel_file.parse(requirements_sheet_name)
    defects_df = excel_file.parse(defects_sheet_name)
    testcases_df = excel_file.parse(testcases_sheet_name)
    
def prepare_artifact_dataframes():
    """ Prepare artifact dataframes by creating necessary output columns
    """
    global requirements_df 
    global defects_df 
    global testcases_df 
    req_cols_len = len(requirements_df.columns)
    def_cols_len = len(defects_df.columns)
    tcs_cols_len = len(testcases_df.columns)
    requirements_df.insert(req_cols_len, "ClassifiedText","")
    requirements_df.insert(req_cols_len+1, "Keywords","")
    requirements_df.insert(req_cols_len+2, "DefectsMatchScore","")

    defects_df.insert(def_cols_len, "ClassifiedText","")
    defects_df.insert(def_cols_len+1, "Keywords","")
    defects_df.insert(def_cols_len+2, "TestCasesMatchScore","")

    testcases_df.insert(tcs_cols_len, "ClassifiedText","")
    testcases_df.insert(tcs_cols_len+1, "Keywords","")
    testcases_df.insert(tcs_cols_len+2, "RequirementsMatchScore","")

## 6.2 Utility functions for Engineering Insights

In [None]:
def add_text_classifier_output(artifact_df, config, output_column_name):
    """ Add the text classifier output to the artifact dataframe
    """
    for index, row in artifact_df.iterrows():
        summary = row["Description"]
        classifier_journey_output = classify_text(summary, config)
        # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
        artifact_df.at[index, output_column_name] = classifier_journey_output
    return artifact_df
           
def add_keywords_entities(artifact_df, classify_text_column_name, output_column_name):
    """ Add keywords and entities to the artifact dataframe"""
    for index, artifact in artifact_df.iterrows():
        keywords_array = []
        # Collect each keyword text once, in order of first appearance
        for row in artifact[classify_text_column_name]['keywords']:
            if row['text'] not in keywords_array:
                keywords_array.append(row['text'])

        # Add both the text and the type of each entity
        for entities in artifact[classify_text_column_name]['entities']:
            if entities['text'] not in keywords_array:
                keywords_array.append(entities['text'])
            if entities['type'] not in keywords_array:
                keywords_array.append(entities['type'])
        # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
        artifact_df.at[index, output_column_name] = keywords_array
    return artifact_df

def populate_text_similarity_score(artifact_df1, artifact_df2, keywords_column_name, output_column_name):
    """ Populate text similarity score to the artifact dataframes
    """
    for index1, artifact1 in artifact_df1.iterrows():
        matches = []
        top_matches = []
        for index2, artifact2 in artifact_df2.iterrows():
            cosine_score = compute_text_similarity(
                artifact1['Description'],
                artifact2['Description'],
                artifact1['Keywords'],
                artifact2['Keywords'])
            # Append the finished record; indexing matches by index2 breaks
            # when the dataframe index is not 0..n-1
            matches.append({'ID': artifact2['ID'],
                            'cosine_score': cosine_score,
                            'SubjectID': artifact1['ID']})

        sorted_obj = sorted(matches, key=lambda x: x['cosine_score'], reverse=True)

        # Keep only the matches above the similarity threshold
        for obj in sorted_obj:
            if obj['cosine_score'] > 0.4:
                top_matches.append(obj)

        # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
        artifact_df1.at[index1, output_column_name] = top_matches
    return artifact_df1

## 6.3 Process flow

**Prepare data**
* Load the artifacts from Object Storage and create pandas dataframes.
* Prepare the pandas dataframes by adding the columns required for further processing.

In [None]:
load_artifacts()
prepare_artifact_dataframes()

**Run the spaCy text classifier on the data**
* Add the text classification output to the artifact dataframes

In [None]:
output_column_name = "ClassifiedText"
defects_df = add_text_classifier_output(defects_df,config, output_column_name)
testcases_df = add_text_classifier_output(testcases_df,config, output_column_name)
requirements_df = add_text_classifier_output(requirements_df,config, output_column_name)

**Populate keywords and entities**
* Add the keywords and entities extracted from the unstructured text to the artifact dataframes
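
The cell for this step is not shown. To illustrate what `add_keywords_entities` computes for each row, the sketch below applies the same flattening logic to one classifier result. The function name `collect_keywords` and the sample data are made up for illustration.

```python
def collect_keywords(classified):
    """Flatten one classifier result into unique terms, in first-seen order."""
    terms = []
    for keyword in classified['keywords']:
        if keyword['text'] not in terms:
            terms.append(keyword['text'])
    # For entities, both the matched text and the entity type are kept
    for entity in classified['entities']:
        for value in (entity['text'], entity['type']):
            if value not in terms:
                terms.append(value)
    return terms

# Made-up sample result with a duplicate keyword
sample = {
    'keywords': [{'text': 'login'}, {'text': 'password'}, {'text': 'login'}],
    'entities': [{'text': 'login', 'type': 'Component'}],
}
print(collect_keywords(sample))  # ['login', 'password', 'Component']
```

In the notebook, the equivalent call for a dataframe would presumably be `add_keywords_entities(defects_df, "ClassifiedText", "Keywords")`, repeated for the test case and requirement dataframes.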

**Correlate keywords between artifacts**
* Add the text similarity score of associated artifacts to the dataframe
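
The cell for this step is not shown either. The selection rule used by `populate_text_similarity_score` (sort by score, keep everything above 0.4) can be illustrated in isolation. The helper name `select_top_matches` and the sample scores are made up for this sketch.

```python
def select_top_matches(matches, threshold=0.4):
    """Return the matches scoring above threshold, best first."""
    ranked = sorted(matches, key=lambda m: m['cosine_score'], reverse=True)
    return [m for m in ranked if m['cosine_score'] > threshold]

# Made-up candidate matches for requirement R1
candidates = [
    {'ID': 'D1', 'cosine_score': 0.35, 'SubjectID': 'R1'},
    {'ID': 'D2', 'cosine_score': 0.72, 'SubjectID': 'R1'},
    {'ID': 'D3', 'cosine_score': 0.40, 'SubjectID': 'R1'},
    {'ID': 'D4', 'cosine_score': 0.55, 'SubjectID': 'R1'},
]
print([m['ID'] for m in select_top_matches(candidates)])  # ['D2', 'D4']
```

Note that the comparison is strict, so a match scoring exactly 0.4 is dropped.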

**Utility functions to store entities and relations in OrientDB**

# 7. Transform results for Visualization
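
The transformation code is not shown. One plausible target, assumed here purely for illustration, is a nodes-and-links structure that many graph visualization libraries accept. The sketch below converts the match lists stored in the `*MatchScore` columns into that shape. The function name `to_graph` and the output field names are hypothetical.

```python
import json

def to_graph(match_lists):
    """Convert lists of match dictionaries into a nodes/links structure.

    Each match is expected to carry 'SubjectID', 'ID' and 'cosine_score',
    as produced by populate_text_similarity_score.
    """
    nodes, links = [], []
    seen = set()
    for matches in match_lists:
        for match in matches:
            # Register each artifact once as a node
            for node_id in (match['SubjectID'], match['ID']):
                if node_id not in seen:
                    seen.add(node_id)
                    nodes.append({'id': node_id})
            links.append({'source': match['SubjectID'],
                          'target': match['ID'],
                          'value': match['cosine_score']})
    return {'nodes': nodes, 'links': links}

# Made-up example: one requirement with one matching defect, one without
graph = to_graph([[{'ID': 'D2', 'cosine_score': 0.72, 'SubjectID': 'R1'}], []])
print(json.dumps(graph))
```

For a dataframe, the input would be a column converted to a list, for example `requirements_df["DefectsMatchScore"].tolist()`.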

# 8. Expose integration point with a websocket client

## 8.1 Start websocket client

In [None]:
start_websocket_listener()
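
The `start_websocket_listener` function is not defined in the cells shown. As a sketch of the part that can be tested without a server, the function below dispatches one incoming JSON message to a handler and returns a JSON reply. The message shape (`cmd` and `args` fields), the reply shape, and the name `handle_message` are assumptions for illustration.

```python
import json

def handle_message(message, handlers):
    """Dispatch one JSON request and return a JSON response string.

    Assumed (hypothetical) message shape: {"cmd": "<name>", "args": {...}}.
    """
    try:
        request = json.loads(message)
        handler = handlers[request['cmd']]
        result = handler(**request.get('args', {}))
        return json.dumps({'status': 'ok', 'result': result})
    except (ValueError, KeyError, TypeError) as error:
        # Malformed JSON, unknown command, or wrong arguments
        return json.dumps({'status': 'error', 'message': str(error)})

# Made-up handler table with a single echo command
handlers = {'echo': lambda text: text}
print(handle_message('{"cmd": "echo", "args": {"text": "hi"}}', handlers))
```

With the websocket-client package, a function like this could be called from the `on_message` callback of `websocket.WebSocketApp`, with the reply sent back through the app's `send` method and the listener kept alive by `run_forever()`.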