## Synthetic Data Generation - SWIFT MT103

This listing generates curated synthetic data for SWIFT MT103 (Single Customer Credit Transfer) messages. Banks and other financial institutions can use it to test their systems, run simulation exercises, train employees, and perform compliance testing without exposure to actual SWIFT messages. The synthetic messages mimic actual transactions in format and structure. Data is generated for both mandatory and optional tags, so users can obtain data for every kind of transaction carried by message type MT103. Users can also see what a SWIFT transaction would look like between countries, currencies, and banks of their choice, and they can supply custom banks so that data is generated specifically for particular banks, financial institutions, or countries.

This sample notebook shows how to run the Synthetic Data Generation - SWIFT MT103 algorithm from AWS Marketplace.

> **Note**: This is a reference notebook and it cannot run unless you make changes suggested in the notebook.

#### Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. Some hands-on experience using [Amazon SageMaker](https://aws.amazon.com/sagemaker/).
1. To use this algorithm successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used:
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**
    2. or your AWS account has a subscription to Synthetic Data Generation - SWIFT MT103.

#### Contents:
1. [Subscribe to the algorithm](#1.-Subscribe-to-the-algorithm)
1. [Prepare dataset](#2.-Prepare-dataset)
	1. [Dataset format expected by the algorithm](#A.-Dataset-format-expected-by-the-algorithm)
	1. [Configure dataset](#B.-Configure-dataset)
	1. [Upload datasets to Amazon S3](#C.-Upload-datasets-to-Amazon-S3)
1. [Execute the training process](#3.-Execute-the-training-process)
	1. [Set up environment](#3.1-Set-up-environment)
	1. [Execute model](#3.2-Execute-model)
	1. [Inspect the Output in S3](#3.3-Inspect-the-Output-in-S3)
1. [Clean-up](#4.-Clean-up)
	1. [Unsubscribe to the listing (optional)](#Unsubscribe-to-the-listing-(optional))


#### Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

### 1. Subscribe to the algorithm

To subscribe to the algorithm:
1. Open the algorithm listing page **Synthetic Data Generation - SWIFT MT103**.
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you agree with the EULA, pricing, and support terms.
1. Once you click the **Continue to configuration** button and choose a **region**, you will see a **Product Arn**. This is the algorithm ARN that you need to specify when running the training job. Copy the ARN corresponding to your region and specify it in the following cell.

In [1]:
algo_arn = "swift-listing-mt-103"
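
The value in the cell above is a placeholder, not an ARN. As a minimal sketch, a shape check like the one below can catch a forgotten substitution before a training job is submitted. It assumes the usual SageMaker algorithm ARN layout, `arn:<partition>:sagemaker:<region>:<account-id>:algorithm/<name>`; the helper name `looks_like_algorithm_arn` is hypothetical and not part of the SageMaker SDK, and the check does not confirm that the ARN exists or that your account is subscribed.

```python
import re

# Assumed layout: arn:<partition>:sagemaker:<region>:<12-digit account>:algorithm/<name>
_ALGORITHM_ARN = re.compile(
    r"^arn:aws[a-z-]*:sagemaker:[a-z0-9-]+:\d{12}:algorithm/[A-Za-z0-9-]+$"
)


def looks_like_algorithm_arn(value: str) -> bool:
    """Return True if the string has the shape of a SageMaker algorithm ARN.

    Hypothetical helper: it only checks the format of the string.
    """
    return bool(_ALGORITHM_ARN.match(value))
```

For example, `looks_like_algorithm_arn(algo_arn)` returns `False` for the placeholder `"swift-listing-mt-103"`, which signals that the ARN copied from the Marketplace configuration page still needs to be pasted in.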


### 2. Prepare dataset

In [2]:
import os
import json 
import uuid
import boto3
import pickle
import base64
import tarfile
from pprint import pprint
from PIL import Image

import numpy as np
import pandas as pd

import urllib.request
from urllib.parse import urlparse

import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role

#### A. Dataset format expected by the algorithm

For best results, provide the input data in the following format:
* The input file must be named `input_zip.zip`.
* The archive must contain one JSON file (listing the countries, banks, and currencies across which you want to see transactions) and/or an optional CSV file listing extra banks, currencies, or countries to include.
* For detailed instructions, refer to the sample notebook and the algorithm input details.
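
As a rough illustration of the packaging step, the sketch below writes a JSON file and an optional CSV file into a zip archive. Everything about the file contents is an assumption: the JSON keys (`countries`, `banks`, `currencies`), the member names `input.json` and `custom_banks.csv`, and the helper `build_input_zip` are hypothetical, and the CSV header simply reuses the column names printed in the training log later in this notebook. Check the algorithm input details for the actual schema.

```python
import csv
import io
import json
import zipfile

# Column names copied from the training log shown later in this notebook.
# Treating them as the expected CSV header is an assumption.
CUSTOM_BANK_COLUMNS = [
    "ISO COUNTRY CODE", "COUNTRY NAME", "INSTITUTION NAME",
    "IBAN BIC", "ADDRESS_BANK", "CURRENCY",
]


def build_input_zip(zip_path, selection, custom_banks=None):
    """Write a zip with one JSON file and, optionally, one CSV file.

    selection: dict of countries/banks/currencies (hypothetical schema).
    custom_banks: optional list of dicts keyed by CUSTOM_BANK_COLUMNS.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Member names are illustrative, not prescribed by the listing.
        zf.writestr("input.json", json.dumps(selection, indent=2))
        if custom_banks:
            buffer = io.StringIO()
            writer = csv.DictWriter(buffer, fieldnames=CUSTOM_BANK_COLUMNS)
            writer.writeheader()
            writer.writerows(custom_banks)  # missing columns are left empty
            zf.writestr("custom_banks.csv", buffer.getvalue())
    return zip_path
```

Calling `build_input_zip("Input/input_zip.zip", {...})` after creating the `Input` directory would produce the archive name the algorithm expects.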

#### B. Configure dataset

In [3]:
training_dataset="Input/input_zip.zip"

#### C. Upload datasets to Amazon S3

In [4]:
role = get_execution_role()

In [None]:
sagemaker_session = sage.Session()

bucket = sagemaker_session.default_bucket()
bucket

In [None]:
# training input location
common_prefix = "swift-listing"
training_input_prefix = common_prefix + "/training-input-data"
TRAINING_WORKDIR = "Input" #Input directory in Jupyter Server
training_input = sagemaker_session.upload_data(TRAINING_WORKDIR, key_prefix=training_input_prefix) #uploads data from jupyter server to S3
print("Training input uploaded to " + training_input)

## 3. Execute the training process

Now that the dataset is available in an accessible Amazon S3 bucket, we are ready to run a training job that generates synthetic SWIFT MT103 messages using the Synthetic Data Generation - SWIFT MT103 algorithm.

### 3.1 Set up environment

In [7]:
output_location = 's3://{}/swift-listing/{}'.format(bucket, 'Output')

### 3.2 Execute model

For information on creating an `Estimator` object, see the [documentation](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html).

In [8]:
training_instance_type="ml.m5.4xlarge"

In [9]:
# Create an estimator object for running a training job.
# instance_count / instance_type are the SageMaker Python SDK v2 names;
# the older train_instance_* duplicates are not needed alongside them.
estimator = sage.algorithm.AlgorithmEstimator(
    algorithm_arn=algo_arn,
    base_job_name="swift-listing-algo",
    role=role,
    instance_count=1,
    instance_type=training_instance_type,
    input_mode="File",
    output_path=output_location,
    sagemaker_session=sagemaker_session,
)

#Run the training job.
estimator.fit({"training": training_input})

2023-11-28 08:43:41 Starting - Starting the training job...
2023-11-28 08:44:04 Starting - Preparing the instances for training
ProfilerReport-1701161020: InProgress
......
2023-11-28 08:45:04 Downloading - Downloading input data...
2023-11-28 08:45:24 Training - Downloading the training image.........
True
Empty DataFrame
Columns: [ISO COUNTRY CODE, COUNTRY NAME, INSTITUTION NAME, IBAN BIC, ADDRESS_BANK, CURRENCY]
Index: []
Sampling rows: 100%|██████████| 100/100 [00:00<00:00, 1623.04it/s]
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  synth_data['tag_23_e'][i] = synth_data['tag_23_e'][i]+random.choice(['','/'+str(random.randint(10 ** (num_digits - 1), 10 ** num_digit

See this [blog post](https://aws.amazon.com/blogs/machine-learning/easily-monitor-and-visualize-metrics-while-training-models-on-amazon-sagemaker/) for more information on how to visualize metrics during the process. You can also open the training job from the [Amazon SageMaker console](https://console.aws.amazon.com/sagemaker/home?#/jobs/) and monitor the metrics and logs in the **Monitor** section.

In [None]:
#output is available on following path
estimator.output_path

> **Note**: Inference is performed within the training pipeline. A real-time inference endpoint or batch transform job is not required.

### 3.3 Inspect the Output in S3

In [11]:
# Locate the model artifact written by the training job
parsed_url = urlparse(estimator.output_path)
bucket_name = parsed_url.netloc
file_key = parsed_url.path[1:]+'/'+estimator.latest_training_job.job_name+'/output/'+"model.tar.gz"

# Fetch the object from the bucket named in the output path
s3_client = sagemaker_session.boto_session.client('s3')
response = s3_client.get_object(Bucket = bucket_name, Key = file_key)

In [12]:
bucketFolder = estimator.output_path.rsplit('/')[3] +'/Output/'+ estimator.latest_training_job.job_name+'/output/'+"model.tar.gz"
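
The two cells above build the same object key in two different ways, one with `urlparse` and one by splitting the path on `/`. A minimal sketch of a single helper that derives both the bucket and the key is shown below. It assumes the layout used in those cells, `<output_path>/<job_name>/output/model.tar.gz`; the function name `model_artifact_location` is hypothetical.

```python
from urllib.parse import urlparse


def model_artifact_location(output_path, job_name):
    """Return (bucket, key) for the model.tar.gz of a training job.

    Assumes artifacts land at <output_path>/<job_name>/output/model.tar.gz,
    as in the cells above. Hypothetical helper, not a SageMaker SDK call.
    """
    parsed = urlparse(output_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {output_path!r}")
    prefix = parsed.path.strip("/")  # tolerate a trailing slash
    parts = [p for p in (prefix, job_name, "output", "model.tar.gz") if p]
    return parsed.netloc, "/".join(parts)
```

In the notebook this could be called as `model_artifact_location(estimator.output_path, estimator.latest_training_job.job_name)`, and the resulting pair passed to `download_fileobj`.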

In [13]:
s3_conn = boto3.client("s3")
bucket_name=bucket
with open('output.tar.gz', 'wb') as f:
    s3_conn.download_fileobj(bucket_name, bucketFolder, f)
    print("Output file loaded from bucket")



Output file loaded from bucket


In [14]:
with tarfile.open('output.tar.gz') as file:
    file.extractall('./output')
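
`extractall` writes whatever paths the archive contains. For an archive produced by your own training job this is usually fine, but a more defensive variant can check member paths first. The sketch below is one way to do that with the standard library only; `safe_extract` is a hypothetical helper, and rejecting all links is a deliberately strict assumption.

```python
import os
import tarfile


def safe_extract(archive_path, dest_dir):
    """Extract a tar archive after checking that every member stays in dest_dir.

    Hypothetical helper: rejects links and any path that escapes dest_dir.
    Returns the list of member names that were extracted.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        for member in members:
            if member.issym() or member.islnk():
                raise ValueError(f"Link not allowed in archive: {member.name}")
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Path escapes destination: {member.name}")
        tar.extractall(dest_root, members=members)
    return [m.name for m in members]
```

Used as `safe_extract('output.tar.gz', './output')`, it also returns the member names, which makes it easy to confirm that `output/synth_data.json` is present before loading it.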

In [15]:
output_path = "output/output"
# The output file holds a list of JSON strings; load it and close the file
with open(os.path.join(output_path, 'synth_data.json')) as f:
    json_strings = json.load(f)

# Parse each JSON string into a dict, then build a dataframe
data = [json.loads(js) for js in json_strings]

df = pd.DataFrame(data)
df.head(10)


Unnamed: 0,ID,MESSAGE TYPE,SENDER,RECEIVER,:20:,:23B:,:23E:,:32A:,:33B:,:50A:,...,:56C:,:56D:,:57A:,:57B:,:57C:,:57D:,:59:,:59A:,:59F:,:71A:
0,0,fin.103,BCSIBRRSXXX,BROUBRSPXXX,KJZ7ZC7NUGSGAEHW,SSTD,TELE,"210623BRL1116,50","GBP7487841,50",/X17632329256582851536BCSIBRRSXXX,...,ABBYGB2LXXX,ABBYGB2LXXX SIMLTK XALOF BANK UNITED KINGDOM,BCSIBRRSXXX,BCSIBRRSXXX BRAZIL,BCSIBRRSXXX,BCSIBRRSXXX TBCZVV XYUXX BANK BRAZIL,/S2412791\nBALCYE THSKMQWBTX\n634 Howard keys\...,/T497120904545335293HDFCINBB,/W2083346537\n1\TJYZLU OMKCVYOZXX\n2\Studio 56...,OUR
1,1,fin.103,ABBYGB2LXXX,BCSIBRRSXXX,57SPWKTUGSQC4NW6,SSTD,TELB/64957810637712,"220518BRL919982054,90","BRL4760690,30",/C12197648810508998500BVSPBRSP,...,ABBYGB2LXXX,ABBYGB2LXXX QTRYXR EYLLO BANK UNITED KINGDOM,CHASBRSPXXX,CHASBRSPXXX BRAZIL,CHASBRSPXXX,CHASBRSPXXX LUJIHW JPSSC BANK BRAZIL,/Z9036039\nFVZFDT BEEMCMKZOD\nFlat 21\nGilbert...,/A879108566593063636BCSIBRRSXXX,/N7764134276\n1\WBIFOY EOKWJMLJAD\n2\Studio 28...,SHA
2,2,fin.103,ABBYGB2LXXX,BCSIBRRSXXX,HSSU6QPZX3HXHOCM,SPRI,CHQB/79966398,"210721GBP7615037,50","INR2937963,20",/G43645187412082124521BCSIBRRSXXX,...,PARABRPRXXX,PARABRPRXXX RSZACI MNJKU BANK BRAZIL,ABBYGB2LXXX,ABBYGB2LXXX UNITED KINGDOM,ABBYGB2LXXX,ABBYGB2LXXX WHWTKW TTAHJ BANK UNITED KINGDOM,/W3653113\nSYCQGN AWBARMQISG\n38 Hill canyon\n...,/X425865724734676749BCSIBRRSXXX,/V9656040837\n1\WJNHGT DTKDBOFSIK\n2\Studio 75...,OUR
3,3,fin.103,ABCBBRSPXXX,ABBYGB2LXXX,YN36VQXZA7VULKES,CRTS,HOLD/359453261,"210502BRL1116,50","GBP47766,90",/O98079440969742544431YORKGB22,...,BCSIBRRSXXX,BCSIBRRSXXX KUJULU DIUXK BANK BRAZIL,ABBYGB2LXXX,ABBYGB2LXXX United Kingdom,ABBYGB2LXXX,ABBYGB2LXXX KKGHIA LMUWS BANK UNITED KINGDOM,/Y1628057\nLYVCKL YKIIXVRKLU\n6 Bernard summit...,/V838759465724051813ABBYGB2LXXX,/D5891142008\n1\LUZMBD TDPATPFGYB\n2\134 Megan...,OUR
4,4,fin.103,PUNBINBBISB,BFATBRS1XXX,DET3OUUTOB8ROLYI,SPAY,PHOI,"220830GBP10264220,40","GBP4719104,80",/B62083417322237904403ABBYGB2LXXX,...,BCSIBRRSXXX,BCSIBRRSXXX OSWCIP QCLNS BANK BRAZIL,ABBYGB2LXXX,ABBYGB2LXXX UNITED KINGDOM,ABBYGB2LXXX,ABBYGB2LXXX IYECHQ RIWVZ BANK UNITED KINGDOM,/C8841536\nVKPUHM PNBRLBRICK\n749 Dixon turnpi...,/G874551919922471904PARABRPRXXX,/P7464361768\n1\QMHPAX GJTKVLJQQQ\n2\718 Melan...,BEN
5,5,fin.103,LOYDGB2L,BCSIBRRSXXX,HMGJCPBYLAQC3RKS,SSTD,PHOB/67155,"211208GBP1002839,50","GBP1056,90",/O24663090317178612075BTGPBRSP,...,GOLDBRSPXXX,GOLDBRSPXXX RCOFZY SUAZJ BANK BRAZIL,ABBYGB2LXXX,ABBYGB2LXXX UNITED KINGDOM,ABBYGB2LXXX,ABBYGB2LXXX DBEGBM AHGHK BANK UNITED KINGDOM,/M9921262\nZERCEU HOZBFTJWXR\n791 Denise strea...,/C559942106949140595UBININBBXXX,/P6955121921\n1\XYNVOE NTNYYSHYBN\n2\4 Smith j...,BEN
6,6,fin.103,ABBYGB2LXXX,ABBYGB2LXXX,KPN8IDRVAYNIPQAR,SSTD,REPA,"210327GBP732037378,10","BRL1056,90",/A32425732061563809141PARABRPRXXX,...,BCSIBRRSXXX,BCSIBRRSXXX OGXYXK CZXZH BANK BRAZIL,BCSIBRRSXXX,BCSIBRRSXXX BRAZIL,BCSIBRRSXXX,BCSIBRRSXXX OGXYXK CZXZH BANK BRAZIL,/J9317152\nKZXYLF ZHURHYWPYR\nStudio 66\nSusan...,/Y642043149058574298HSBCGB2L,/N6044191711\n1\FIYJNJ XMXGJVMYQN\n2\Studio 09...,OUR
7,7,fin.103,MSDWBRSPXXX,ABBYGB2LXXX,TMWF2MIWCO83GW1G,CRTS,PHOI,"220419GBP10713033,00","BRL6198032,20",/E57158898134958564055BCSIBRRSXXX,...,CLYDGB2S,CLYDGB2S CPTAYH RTACC BANK UNITED KINGDOM,PARABRPRXXX,PARABRPRXXX BRAZIL,PARABRPRXXX,PARABRPRXXX EUHAVX VHZBT BANK BRAZIL,/M4747512\nUATZWZ EMVRFDZJBU\nStudio 60u\nDami...,/F534940023536855290BCSIBRRSXXX,/L1938983540\n1\XYVRSS RWSCEBPVME\n2\399 Burns...,OUR
8,8,fin.103,TOPZBRRSXXX,ABBYGB2LXXX,XXM3JS5U7DHZJSYM,SSTD,TELB,"211226GBP1116,50","INR222986387,30",/H65485469958373849012ABBYGB2LXXX,...,TSBSGB2A,TSBSGB2A MPWQJT EDLVC BANK UNITED KINGDOM,ABBYGB2LXXX,ABBYGB2LXXX UNITED KINGDOM,ABBYGB2LXXX,ABBYGB2LXXX OJJQHK FYSRX BANK UNITED KINGDOM,/N8536450\nHIYCGZ MBLMFIQSEE\nStudio 8\nCarol ...,/H679415063184588923BSFRBRSPXXX,/E8222515548\n1\QENHUN OBPKSUGECC\n2\2 Russell...,OUR
9,9,fin.103,BSCHBRSPXXX,ABBYGB2LXXX,7C4YF4QSYDV9RSKF,SSTD,CHQB/876423,"220209INR792459847,20","BRL340354,50",/L77982798248725853373BCSIBRRSXXX,...,ABBYGB2LXXX,ABBYGB2LXXX SWXCZA CNQQO BANK UNITED KINGDOM,//IN86360598611SBININBBXXX,//IN86360598611SBININBBXXX India,SBININBBXXX,//IN86360598611SBININBBXXX FWOULI ZCYKI BANK I...,/T1924765\nYASHOO VGASFWOELA\n17 White groves\...,/A848244270211364473ABBYGB2LXXX,/O1932811560\n1\BDJZXC CQPCTUDAJD\n2\Studio 19...,SHA
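
Values in the `:32A:` column, such as `210623BRL1116,50`, pack three items into one string: a value date in `YYMMDD` form, a three-letter currency code, and an amount that uses a comma as the decimal separator. As a minimal sketch, the hypothetical helper below splits such a value for downstream checks. It assumes every value follows that shape and that two-digit years are resolved by Python's `%y` rule.

```python
import re
from datetime import datetime
from decimal import Decimal

# YYMMDD date, 3-letter currency, amount with a comma decimal separator
_FIELD_32A = re.compile(r"^(\d{6})([A-Z]{3})(\d+,\d*)$")


def parse_32a(value):
    """Split a :32A: value into (value_date, currency, amount).

    Hypothetical helper for inspecting the generated data.
    """
    match = _FIELD_32A.match(value)
    if match is None:
        raise ValueError(f"Unexpected :32A: value: {value!r}")
    yymmdd, currency, amount = match.groups()
    value_date = datetime.strptime(yymmdd, "%y%m%d").date()
    # Decimal avoids float rounding on monetary amounts
    return value_date, currency, Decimal(amount.replace(",", "."))
```

On the dataframe above, `df[":32A:"].map(parse_32a)` would yield one tuple per row, which can then be expanded into separate date, currency, and amount columns.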


### 4. Clean-up

#### Unsubscribe to the listing (optional)

If you would like to unsubscribe from the algorithm, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__  to cancel the subscription.