# Corporate Credit Rating Endpoint Demo

In this endpoint demo notebook, we demonstrate how to send inference requests to a pre-deployed endpoint and get the model response.

For an end-to-end solution covering model training and deployment with SageMaker, see the solution notebook `corporate-credit-rating.ipynb`. It shows how to train an AutoGluon model on a financial dataset that combines the 5 financial ratios of the Altman Z-score with 11 NLP scores derived from SEC filings text, with the aim of materially improving the prediction of credit ratings. The exposition in this notebook is deliberately brief.

>**<span style="color:RED">Important</span>**: 
>This solution is for demonstrative purposes only. It is not financial advice and should not be relied on as financial or investment advice. The associated notebooks, including the trained model, use synthetic data, and are not intended for production.

### Step 1: Read in the solution config

In [None]:
import json

# Read the solution's stack outputs; the context manager closes the file afterwards
with open("stack_outputs.json") as f:
    SOLUTION_CONFIG = json.load(f)
ROLE = SOLUTION_CONFIG["IamRole"]
SOLUTION_BUCKET = SOLUTION_CONFIG["SolutionS3Bucket"]
REGION = SOLUTION_CONFIG["AWSRegion"]
SOLUTION_NAME = SOLUTION_CONFIG["SolutionName"]
BUCKET = SOLUTION_CONFIG["S3Bucket"]
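
A missing key in `stack_outputs.json` only surfaces as a `KeyError` at the point of use. The following is a minimal sketch, not part of the original solution, of checking up front that the keys this notebook reads are present. The helper name `missing_config_keys` is hypothetical, and the example values are made up; only the key names come from the cells in this notebook.

```python
# Keys this notebook reads from stack_outputs.json (taken from its code cells).
REQUIRED_KEYS = (
    "IamRole",
    "SolutionS3Bucket",
    "AWSRegion",
    "SolutionName",
    "S3Bucket",
    "SolutionPrefix",
)


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the config dict, in order."""
    return [key for key in required if key not in config]


# Illustrative config with made-up values; the real one comes from json.load.
example_config = {
    "IamRole": "arn:aws:iam::123456789012:role/example",
    "SolutionS3Bucket": "example-solution-bucket",
    "AWSRegion": "us-east-1",
    "SolutionName": "example-solution",
    "S3Bucket": "example-bucket",
}
print(missing_config_keys(example_config))  # ['SolutionPrefix']
```

In the notebook, the same check could be run as `missing_config_keys(SOLUTION_CONFIG)` right after loading the file.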

### Step 2: Download and read in the synthetic multimodal dataset for inference

The synthetic multimodal dataset consists of text from the MD&A section of 10-K/10-Q SEC filings, industrial classification codes, and simulated tabular data with the 8 financial variables needed to calculate Altman's Z-score.

In [None]:
from sagemaker.s3 import S3Downloader

input_data_bucket = f"s3://{SOLUTION_BUCKET}-{REGION}/{SOLUTION_NAME}/data"
print("original data: ")
S3Downloader.list(input_data_bucket)

#### Download the data for inference from S3

In [None]:
inference_data = f"{input_data_bucket}/inference_data.csv"
!aws s3 cp $inference_data .

In [None]:
import pandas as pd

df = pd.read_csv("inference_data.csv")
print(df.shape)
df.head()

Next, we convert the 8 inputs into the following 5 financial ratios.

In [None]:
# Five Altman Z-score ratios computed from the eight raw financial variables
df["A"] = df["EBIT"] / df["TotalAssets"]
df["B"] = df["NetSales"] / df["TotalAssets"]
df["C"] = df["MktValueEquity"] / df["TotalLiabs"]
df["D"] = (df["CurrentAssets"] - df["CurrentLiabs"]) / df["TotalAssets"]
df["E"] = df["RetainedEarnings"] / df["TotalAssets"]
# Drop the raw inputs so that only the ratios remain
df = df.drop(["TotalAssets", "CurrentLiabs", "TotalLiabs", "RetainedEarnings", "CurrentAssets",
              "NetSales", "EBIT", "MktValueEquity"], axis=1)
df.head()
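
For reference, the five ratios are the inputs of the classic Altman Z-score. The sketch below assumes the original 1968 coefficients for publicly traded manufacturing firms, with the sales coefficient rounded to 1.0. The notebook itself does not compute Z; it passes the ratios to the model as features. The function name `altman_z` and the sample figures are illustrative only.

```python
def altman_z(a, b, c, d, e):
    """Classic Altman Z-score from the five ratios, named as in this notebook.

    a: EBIT / total assets
    b: net sales / total assets
    c: market value of equity / total liabilities
    d: working capital / total assets
    e: retained earnings / total assets
    """
    return 3.3 * a + 1.0 * b + 0.6 * c + 1.2 * d + 1.4 * e


# Made-up balance-sheet figures for one hypothetical company.
raw = {
    "EBIT": 100.0, "TotalAssets": 1000.0, "NetSales": 1500.0,
    "MktValueEquity": 800.0, "TotalLiabs": 400.0,
    "CurrentAssets": 500.0, "CurrentLiabs": 300.0, "RetainedEarnings": 200.0,
}

# Same ratio definitions as the cell above, applied to a single record.
ratios = {
    "a": raw["EBIT"] / raw["TotalAssets"],
    "b": raw["NetSales"] / raw["TotalAssets"],
    "c": raw["MktValueEquity"] / raw["TotalLiabs"],
    "d": (raw["CurrentAssets"] - raw["CurrentLiabs"]) / raw["TotalAssets"],
    "e": raw["RetainedEarnings"] / raw["TotalAssets"],
}

z = altman_z(**ratios)
print(round(z, 2))  # 3.55
```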

In [None]:
df.to_csv("inference_data_input.csv", index=False)

### Step 3: Add NLP scores to the multimodal dataset

We add 11 NLP scores to the multimodal dataset using the <span style="color:lightgreen">SageMaker JumpStart Industry Python SDK</span>. This client library triggers a SageMaker processing job that computes the scores. The NLP-scoring job takes about 10 minutes to complete.

#### Download dependencies and install SageMaker JumpStart Industry Python SDK

In [None]:
dependency_bucket = f"s3://{SOLUTION_BUCKET}-{REGION}/{SOLUTION_NAME}/python-dependencies"

!mkdir -p python-dependencies
!aws s3 sync $dependency_bucket python-dependencies/

!pip install smjsindustry --no-index --find-links file://$PWD/python-dependencies/wheelhouse


Here, we use `ml.c5.18xlarge` for the NLPScorer processing job to reduce the running time. If `ml.c5.18xlarge` is not available in your Region or account, choose another processing instance type. If you get an error saying that you have exceeded your quota, use AWS Support to request a service limit increase for the [SageMaker resources](https://console.aws.amazon.com/support/home#/) you want to scale up.

In [None]:
import sagemaker
from smjsindustry import NLPScoreType, NLPSCORE_NO_WORD_LIST
from smjsindustry import NLPScorer, NLPScorerConfig

# Score types that take a word list are given an empty list here;
# those in NLPSCORE_NO_WORD_LIST are given None.
score_type_list = [
    NLPScoreType(score_type, [])
    for score_type in NLPScoreType.DEFAULT_SCORE_TYPES
    if score_type not in NLPSCORE_NO_WORD_LIST
]
score_type_list.extend(NLPScoreType(score_type, None) for score_type in NLPSCORE_NO_WORD_LIST)
nlp_scorer_config = NLPScorerConfig(score_type_list)

nlp_score_processor = NLPScorer(
    ROLE,              # execution role read from the solution config
    1,                 # number of instances
    "ml.c5.18xlarge",  # instance type
    volume_size_in_gb=30,
    volume_kms_key=None,
    output_kms_key=None,
    max_runtime_in_seconds=None,
    sagemaker_session=sagemaker.Session(),
    tags=None,
)

nlp_score_processor.calculate(
    nlp_scorer_config,                         # NLP scorer configuration
    "MDNA",                                    # name of the text column to score
    "inference_data_input.csv",                # input file written in Step 2
    "s3://{}/{}".format(BUCKET, "nlp_score"),  # output S3 prefix
    "ccr_nlp_score_inference.csv",             # output file name
)

In [None]:
import boto3
client = boto3.client('s3')
client.download_file(BUCKET, '{}/{}'.format("nlp_score", 'ccr_nlp_score_inference.csv'), 'ccr_nlp_score_inference.csv')
df_tabtext_score = pd.read_csv('ccr_nlp_score_inference.csv')
df_tabtext_score.head()

### Step 4: Test the endpoint

To use the demo endpoint successfully, the columns of your DataFrame must be identical to those of `df_tabtext_score` shown in the previous step.
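
One way to catch a column mismatch before calling the endpoint is to compare the column names explicitly. This is a minimal sketch, not part of the original solution; the helper `column_mismatch` is hypothetical and works on plain lists of names, such as `list(df.columns)`. The column names in the example are placeholders.

```python
def column_mismatch(expected, actual):
    """Describe how the `actual` column names differ from the `expected` ones.

    Returns a dict with the names missing from `actual`, the unexpected extra
    names, and whether the two lists are identical, including their order.
    """
    expected, actual = list(expected), list(actual)
    return {
        "missing": [c for c in expected if c not in actual],
        "extra": [c for c in actual if c not in expected],
        "same_order": expected == actual,
    }


# Placeholder names only; use list(df_tabtext_score.columns) as the reference.
reference = ["MDNA", "A", "B", "C"]
candidate = ["A", "MDNA", "B", "D"]
report = column_mismatch(reference, candidate)
print(report)  # {'missing': ['C'], 'extra': ['D'], 'same_order': False}
```

Order is checked as well because passing `.values` to the predictor drops the column names, so the endpoint only sees values by position.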

In [None]:
import sagemaker
from sagemaker import Predictor
from sagemaker.deserializers import JSONDeserializer
from sagemaker.serializers import CSVSerializer

endpoint_name = SOLUTION_CONFIG["SolutionPrefix"] + "-demo-endpoint"

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker.Session(),
    serializer=CSVSerializer(),       # request rows are sent as CSV
    deserializer=JSONDeserializer(),  # response is parsed as JSON
)

predictor.predict(df_tabtext_score.values)
