# Prerequisites

1. Read this notebook carefully before running CCAI
2. Ensure that the user has all the permissions required to run Amazon SageMaker
3. Ensure that the user has all the permissions required to read, write, edit, and copy S3 buckets
4. Some hands-on experience with Amazon SageMaker is expected
5. Conversation audio files must be transcribed to text using Amazon Transcribe
6. To use CCAI successfully, ensure that your IAM role has the following three permissions and that you have the authority to make AWS Marketplace subscriptions in the AWS account used:
    * aws-marketplace:ViewSubscriptions
    * aws-marketplace:Unsubscribe
    * aws-marketplace:Subscribe
7. The CCAI pipeline requires a CSV file, **df_date.csv**, containing the dates of the conversations and their unique filenames as input. Refer to https://github.com/Cedrusco/AWS-CCAI/tree/main/Sample%20Dates%20Dataframe for a sample file
8. For best results, a minimum of 20K transcripts/conversations is recommended to run the CCAI pipeline
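
The three Marketplace permissions in prerequisite 6 can be expressed as an IAM policy document. The following is a minimal sketch, not an official CCAI policy: the `Sid` label and the wildcard `Resource` are assumptions, and your account may require a narrower scope.

```python
import json

# The three actions listed in prerequisite 6.
MARKETPLACE_ACTIONS = [
    "aws-marketplace:ViewSubscriptions",
    "aws-marketplace:Unsubscribe",
    "aws-marketplace:Subscribe",
]


def marketplace_policy():
    """Return an IAM policy document granting the three Marketplace actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CcaiMarketplaceSubscriptions",  # hypothetical label
                "Effect": "Allow",
                "Action": MARKETPLACE_ACTIONS,
                "Resource": "*",  # assumption: not scoped to a specific product
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the IAM policy editor.
    print(json.dumps(marketplace_policy(), indent=2))
```

The printed JSON can be attached to the IAM role used to run the notebook.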

# Contents

1. [Create & Load S3 Buckets](#1)
1. [Create a training job on AWS SageMaker](#2)  
1. [Review Results](#3)



CCAI uses a data modeling pipeline that runs remotely. The instructions below will help the user extract insights from transcribed telephone conversations.

<a id="1"></a> 
## 1. Create & Load S3 Buckets

The CCAI pipeline requires the user to create two S3 buckets with the following names:
 * "cba-transcribe-input"
 * "cba-call-analytics-output"
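
If you prefer to create the two buckets programmatically rather than through the console, a sketch along these lines may help. It assumes `boto3` is installed and credentials are configured; the helper names and the default region are illustrative, not part of CCAI.

```python
# The two bucket names required by the CCAI pipeline.
BUCKETS = ["cba-transcribe-input", "cba-call-analytics-output"]


def create_bucket_kwargs(name, region):
    """Build the arguments for S3 create_bucket in the given region."""
    kwargs = {"Bucket": name}
    # us-east-1 is the default location and must not be passed as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_ccai_buckets(region="us-east-1"):
    """Create both buckets (hypothetical helper; needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    s3 = boto3.client("s3", region_name=region)
    for name in BUCKETS:
        s3.create_bucket(**create_bucket_kwargs(name, region))
```

Calling `create_ccai_buckets("eu-west-1")`, for example, would create both buckets in that region.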

### 1.1 cba-transcribe-input

This bucket stores all the transcribed conversations. The input files must be in .json format and follow a specific structure. Review sample input files here: **https://github.com/Cedrusco/AWS-CCAI/tree/main/Sample%20Input%20Data**
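
Before uploading, it can be useful to confirm that every transcript is at least well-formed JSON. The sketch below only checks that each file parses; it does not validate the CCAI-specific structure shown in the sample input files. The helper names are hypothetical, and uploading each file to the bucket root under its own file name is an assumption.

```python
import json
from pathlib import Path

INPUT_BUCKET = "cba-transcribe-input"


def find_transcripts(folder):
    """Return the .json files in `folder` that parse as JSON, sorted by name."""
    valid = []
    for path in sorted(Path(folder).glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except ValueError:  # covers malformed JSON and undecodable bytes
            continue
        valid.append(path)
    return valid


def upload_transcripts(folder):
    """Upload every parseable transcript (needs boto3 and AWS credentials)."""
    import boto3  # imported here so find_transcripts works without boto3

    s3 = boto3.client("s3")
    for path in find_transcripts(folder):
        # Assumption: the object key is simply the local file name.
        s3.upload_file(str(path), INPUT_BUCKET, path.name)
```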


### 1.2 cba-call-analytics-output

1. The S3 output bucket will contain the output files once the CCAI pipeline has completed successfully.
2. Every time the pipeline is run, a new folder is created with the format "CCAI_batchId_(date/time)", specifying the date and time the pipeline was started. Review sample outputs here: **https://github.com/Cedrusco/AWS-CCAI/tree/main/Sample%20Output%20Data**.
3. After creating the output bucket, create a folder named Date_dataframe and upload the **df_date.csv** file containing the fileNames and dates into it. The resulting path must be: s3://cba-call-analytics-output/Date_dataframe/df_date.csv.
4. A sample date dataframe is available here: https://github.com/Cedrusco/AWS-CCAI/tree/main/Sample%20Dates%20Dataframe.
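
The date dataframe from item 3 could be assembled and uploaded as sketched below. The column names `fileName` and `date` and the ISO date format in the usage note are assumptions; check them against the sample date dataframe linked above before relying on this. Only the bucket name and the `Date_dataframe/df_date.csv` key come from this document.

```python
import csv
import io

OUTPUT_BUCKET = "cba-call-analytics-output"
DATE_KEY = "Date_dataframe/df_date.csv"


def build_date_csv(rows, header=("fileName", "date")):
    """Return CSV text for (file name, date) pairs; the header is hypothetical."""
    rows = list(rows)
    names = [name for name, _ in rows]
    # The pipeline expects unique file names, so reject duplicates early.
    if len(names) != len(set(names)):
        raise ValueError("file names must be unique")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def upload_date_csv(csv_text):
    """Write the CSV to the required S3 path (needs boto3 and credentials)."""
    import boto3  # imported here so build_date_csv works without boto3

    boto3.client("s3").put_object(
        Bucket=OUTPUT_BUCKET, Key=DATE_KEY, Body=csv_text.encode("utf-8")
    )
```

For example, `build_date_csv([("call_001.json", "2021-01-05")])` yields a two-line CSV with the header row followed by one record.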


<a id="2"></a> 
## 2. Create a training job on AWS SageMaker

1. Navigate to Amazon SageMaker in the AWS console and click **Training Jobs**, listed under **Training** on the left side of the page
2. Click **Create training job** in the top right corner of the same page
3. A new page loads with the heading **Job Settings**. Under **Job Settings**, enter the desired name in **Job Name**
4. Scroll down. Under **Algorithm source**, select the radio button next to **An algorithm subscription from AWS Marketplace** and select **CCAI**
5. Under **Resource configuration**, select **ml.m4.xlarge**
6. Scroll down to **Input data configuration**. Under **S3 location**, specify the path to the cba-transcribe-input S3 bucket created in section 1.1
7. Scroll further down to **Output data configuration**. Under **S3 output path**, specify the path to the cba-call-analytics-output S3 bucket created in section 1.2
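
The console steps above correspond roughly to a single `create_training_job` call in `boto3`. The following is a sketch under several assumptions: the channel name `training`, the instance count, the volume size, and the maximum runtime are placeholders, and the algorithm ARN and role ARN must be replaced with the values from your own Marketplace subscription and account. Only the instance type and the two bucket names come from this document.

```python
def training_job_request(job_name, role_arn, algorithm_arn):
    """Build a create_training_job request mirroring the console steps."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            # ARN of the subscribed Marketplace algorithm (supplied by you).
            "AlgorithmName": algorithm_arn,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "training",  # assumption: channel name
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": "s3://cba-transcribe-input/",
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": "s3://cba-call-analytics-output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m4.xlarge",
            "InstanceCount": 1,    # assumption
            "VolumeSizeInGB": 10,  # assumption
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},  # assumption
    }


def start_training_job(job_name, role_arn, algorithm_arn):
    """Submit the job (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3

    request = training_job_request(job_name, role_arn, algorithm_arn)
    return boto3.client("sagemaker").create_training_job(**request)
```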

<a id="3"></a> 
## 3. Review Results

After the training job on SageMaker is completed, review the **cba-call-analytics-output** bucket for the results, which should contain 3 reports:
  * Insights_Report.csv - Provides a snapshot of each indicator and KPI by Issue. 
  * Transcripts.csv - Each transcript with Date and indicator
  * Call_Duration.csv - Call Volume and Avg Call Duration by Date
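
To check quickly whether a run produced all three reports, the object keys in the output bucket can be grouped by run folder. This sketch assumes that the reports are written inside the per-run folder and that run folder names start with `CCAI_`; both are inferred from the folder format described in section 1.2 rather than confirmed, and the helper names are hypothetical.

```python
EXPECTED_REPORTS = ("Insights_Report.csv", "Transcripts.csv", "Call_Duration.csv")


def missing_reports(keys):
    """Map each CCAI_* run folder to the expected reports it lacks."""
    found = {}
    for key in keys:
        folder, _, rest = key.partition("/")
        # Skip objects outside run folders and bare folder placeholders.
        if not folder.startswith("CCAI_") or not rest:
            continue
        found.setdefault(folder, set()).add(rest.rsplit("/", 1)[-1])
    return {
        folder: [r for r in EXPECTED_REPORTS if r not in names]
        for folder, names in found.items()
    }


def list_output_keys(bucket="cba-call-analytics-output"):
    """List every object key in the bucket (needs boto3 and credentials)."""
    import boto3  # imported here so missing_reports works without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

An empty list for a run folder in the result of `missing_reports(list_output_keys())` means all three reports were found for that run.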
  
