# Usage Instructions - Mphasis DeepInsights Text Summarizer

DeepInsights is a cloud-based cognitive computing platform that offers data extraction and predictive analytics capabilities. Text summarization tackles information overload by condensing a long document into a few sentences or paragraphs. Recent advances in neural network architectures and training algorithms have shown the effectiveness of representation learning: neural-network-based models automatically learn distributed representations of sentences and documents, and these representations are generally better than those produced by traditional methods. This summarizer is built using transfer learning on Transformer-based models, which use self-attention.



## Contents

1. [Prerequisites](#Prerequisites)
1. [Data Dictionary](#Data-Dictionary)
1. [Set Up The Environment](#Set-up-the-environment)
1. [Create The Model](#Create-Model)
1. [Batch Transform Job](#Batch-Transform-Job)
1. [Invoke Endpoint](#Invoking-through-Endpoint)

### Prerequisites

To run this algorithm, you need the following:
- Access to AWS SageMaker and the model package.
- An S3 bucket for the input and output files.
- A role that allows AWS SageMaker to read the input from and write the output to S3.


### Data Dictionary

- The input must be a '.txt' file with 'UTF-8' encoding. Please note: if the input file is not 'UTF-8' encoded, the model will not perform as expected.
- The input can contain a maximum of 512 words.
- To make sure that the input file is 'UTF-8' encoded, use 'Save As' and set the encoding to 'UTF-8'.
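
The rules above can be checked locally before a file is uploaded. The following is a minimal sketch rather than part of the model package: the function name `check_summarizer_input` is hypothetical, and it assumes that a "word" means a whitespace-separated token, which the model may count differently.

```python
from pathlib import Path

MAX_WORDS = 512  # limit stated in the Data Dictionary


def check_summarizer_input(path):
    """Return the word count of a .txt input, or raise ValueError if it breaks a rule."""
    file_path = Path(path)
    if file_path.suffix.lower() != ".txt":
        raise ValueError(f"{file_path.name}: expected a .txt file")
    try:
        # 'utf-8-sig' decodes UTF-8 and also drops a leading byte order mark, if any
        text = file_path.read_bytes().decode("utf-8-sig")
    except UnicodeDecodeError as err:
        raise ValueError(f"{file_path.name}: not valid UTF-8 ({err})") from err
    # Assumption: a "word" is a whitespace-separated token
    word_count = len(text.split())
    if word_count > MAX_WORDS:
        raise ValueError(
            f"{file_path.name}: {word_count} words exceeds the {MAX_WORDS}-word limit"
        )
    return word_count
```

Calling `check_summarizer_input('./self_driving_test.txt')` returns the word count when the file passes, so the same call can also be used to see how close a document is to the limit.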

## Set up the environment


### Update Boto Client and AWS SDK

We are launching new APIs in SageMaker to support this new functionality. The next cell sets it up for you to invoke the new APIs.

### Private Beta Setup

The private beta is limited to the us-east-2 region. The client we are creating below will be hard-coded to talk to our us-east-2 endpoint only.



### Sample input data

In [8]:
with open('./self_driving_test.txt', 'rb') as file_stream:
    input_text = file_stream.read().decode('utf-8')

In [9]:
print(input_text)

﻿An autonomous car, also known as a robotic car, self-driving car, or driverless car,[1][2] is a vehicle that is capable of sensing its environment and moving with little or no human input.[3]

Autonomous cars combine a variety of sensors to perceive their surroundings, such as radar, Lidar, sonar, GPS, odometry and inertial measurement units. Advanced control systems interpret sensory information to identify appropriate navigation paths, as well as obstacles and relevant signage.[4][5]

Long distance trucks are seen as being in the forefront of adopting and implementing the technology.[6]

History

Main article: History of self-driving cars
Experiments have been conducted on automated driving systems (ADS) since at least the 1920s;[7] trials began in the 1950s. The first semi-automated car was developed in 1977, by Japan's Tsukuba Mechanical Engineering Laboratory, which required specially marked streets that were interpreted by two cameras on the vehicle and an analog comput

### Create the session

The session remembers our connection parameters to SageMaker. We'll use it to perform all of our SageMaker operations.

In [10]:
import sagemaker as sage
from time import gmtime, strftime
from sagemaker import get_execution_role

sess = sage.Session()
role = get_execution_role()

## Create Model

Now we use the Model Package to create a model.

In [11]:
from sagemaker import ModelPackage
import sagemaker as sage
from sagemaker import get_execution_role

# Please use the appropriate ARN obtained after subscribing to the model to define 'model_package_arn'
model_package_arn = 'arn:aws:sagemaker:us-east-2:786796469737:model-package/marketplace-text-summarizer-11-4'

role = get_execution_role()
sagemaker_session = sage.Session()
# Create a deployable model from the model package
model = ModelPackage(model_package_arn=model_package_arn,
                     role=role,
                     sagemaker_session=sagemaker_session)


## Input File

Now we pull a sample input file for testing the model.

In [12]:
sample_txt="s3://aws-marketplace-mphasis-assets/Text Summarizer/self_driving.txt"

## Batch Transform Job

Now let's use the model built to run a batch inference job and verify it works.

In [13]:
# Run the batch inference job on a single ml.m5.xlarge instance
transformer = model.transformer(1, 'ml.m5.xlarge')
transformer.transform(sample_txt, content_type='text/plain')
# Block until the transform job has finished
transformer.wait()
print("Batch Transform complete")


........................
.2020-04-11T17:51:29.109:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
 * Serving Flask app "serve" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)
169.254.255.130 - - [11/Apr/2020 17:51:29] "GET /ping HTTP/1.1" 200 -
169.254.255.130 - - [11/Apr/2020 17:51:29] "GET /execution-parameters HTTP/1.1" 404 -
 * Serving Flask app "serve" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)
169.254.255.130 - - [11/Apr/2020 17:51:29] "GET /ping HTTP/1.1" 200 -
169.254.255.130 - - [11/Apr/2020 17:51:29] "GET /execution-parameters HTTP/1.1" 404 -
---input--- An autonomous car, also known

## Output from Batch Transform

Note: Ensure that the boto3 package is installed on the local system.

In [14]:
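import boto3
from urllib.parse import urlparse

print(transformer.output_path)
# Split the s3://bucket/prefix output path into the bucket name and the key prefix
parsed_path = urlparse(transformer.output_path)
bucket_name = parsed_path.netloc
bucketFolder = parsed_path.path.lstrip('/')
s3_conn = boto3.client("s3")
# Batch Transform names each result after its input file, with '.out' appended
with open('result.txt', 'wb') as f:
    s3_conn.download_fileobj(bucket_name, bucketFolder + '/self_driving.txt.out', f)
    print("Output file loaded from bucket")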

s3://sagemaker-us-east-2-786796469737/marketplace-text-summarizer-11-4-2020-0-2020-04-11-17-47-35-070
Output file loaded from bucket


In [15]:
with open('./result.txt', 'rb') as file_stream:
    output_text = file_stream.read().decode('utf-8')

In [16]:
print(output_text)

Autonomous cars combine a variety of sensors to perceive their surroundings , such as radar , lidar , sonar , gps , odometry and inertial measurement units. ﻿an autonomous car , also known as a robotic car , self-driving car , or driverless car ,   is a vehicle that is capable of sensing its environment and moving with little or no human input. Advanced control systems interpret sensory information to identify appropriate navigation paths , as well as obstacles and relevant signage.
Execution time : 1.93seconds
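
In the sample output above, the summary is followed by an "Execution time" line, and the text may carry a byte order mark (U+FEFF) copied over from the input file. The following is a minimal sketch of how the two parts could be separated for downstream use. The function name `split_summarizer_output` is hypothetical, and the pattern assumes the timing line always has the form shown in the sample; the model package does not document this format.

```python
import re

# Assumption: the last line looks like "Execution time : 1.93seconds", as in the sample output
_TIMING_LINE = re.compile(r"^Execution time\s*:\s*(\d+(?:\.\d+)?)\s*seconds\s*$")


def split_summarizer_output(raw):
    """Split raw model output into (summary, seconds); seconds is None if no timing line is found."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    lines = text.strip().splitlines()
    seconds = None
    if lines:
        match = _TIMING_LINE.match(lines[-1].strip())
        if match:
            seconds = float(match.group(1))
            lines = lines[:-1]
    # Remove any byte order mark (U+FEFF) carried over from the input text
    summary = "\n".join(lines).replace("\ufeff", "").strip()
    return summary, seconds
```

The function accepts either the decoded text or the raw bytes, so it can be applied to the contents of the downloaded result file as well as to a response returned by an endpoint.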



## Invoking through Endpoint
Deploying the model to an endpoint is another option, which provides results through real-time inference. A sample endpoint deployment is shown here for reference.

In [17]:
from sagemaker import ModelPackage
import sagemaker as sage
from sagemaker import get_execution_role
import boto3

role = get_execution_role()

sagemaker_session = sage.Session()
bucket=sagemaker_session.default_bucket()

In [18]:
content_type='text/plain'
model_name='summarizer-model'
real_time_inference_instance_type='ml.c4.2xlarge'

In [19]:
# Please use the appropriate ARN obtained after subscribing to the model to define 'model_package_arn'
model_package_arn = 'arn:aws:sagemaker:us-east-2:786796469737:model-package/marketplace-text-summarizer-11-4'

In [21]:
# Written against SageMaker Python SDK v1; in SDK v2, RealTimePredictor was renamed to Predictor.
# Factory function that wraps a deployed endpoint in a predictor sending 'text/plain' requests
def predict_wrapper(endpoint, session):
    return sage.RealTimePredictor(endpoint, session, content_type=content_type)

# Create a deployable model from the model package.
model = ModelPackage(role=role,
                     model_package_arn=model_package_arn,
                     sagemaker_session=sagemaker_session,
                     predictor_cls=predict_wrapper)

In [22]:
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)

-------------!

### 1. Invoking the endpoint through a CLI command

In [23]:
file_name="self_driving_test.txt"

In [24]:
!aws sagemaker-runtime invoke-endpoint --endpoint-name $model_name --body fileb://$file_name --content-type 'text/plain' --region us-east-2 result.txt

{
    "ContentType": "text/plain; charset=utf-8",
    "InvokedProductionVariant": "AllTraffic"
}


In [25]:
# result.txt holds the endpoint response written by the CLI command above
with open('./result.txt', 'rb') as file_stream:
    output_text = file_stream.read().decode('utf-8')
print(output_text)

Autonomous cars combine a variety of sensors to perceive their surroundings , such as radar , lidar , sonar , gps , odometry and inertial measurement units. ﻿an autonomous car , also known as a robotic car , self-driving car , or driverless car ,   is a vehicle that is capable of sensing its environment and moving with little or no human input. Advanced control systems interpret sensory information to identify appropriate navigation paths , as well as obstacles and relevant signage.
Execution time : 2.20seconds



### 2. Invoking the endpoint through Python code

In [26]:
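# Read the sample input as UTF-8 and close the file once it has been read
with open('./self_driving_test.txt', mode='r', encoding='utf-8') as f:
    data = f.read()
prediction = predictor.predict(data)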

In [33]:
# The predictor returns raw bytes; decode them as UTF-8 before printing
print(prediction.decode('utf-8'))

Autonomous cars combine a variety of sensors to perceive their surroundings , such as radar , lidar , sonar , gps , odometry and inertial measurement units. ﻿an autonomous car , also known as a robotic car , self-driving car , or driverless car ,   is a vehicle that is capable of sensing its environment and moving with little or no human input. Advanced control systems interpret sensory information to identify appropriate navigation paths , as well as obstacles and relevant signage.
Execution time : 1.73seconds

