# Coswara Audio Classification

In this notebook, we demonstrate how to use a custom SageMaker PyTorch container to train an acoustic classification model in SageMaker script mode.

The model in this example is based on the paper *Very Deep Convolutional Neural Networks for Raw Waveforms* by Wei Dai et al.; refer to the paper for details of the architecture.

### Dataset

We will use the Coswara dataset to train our network. It is freely available at <https://github.com/iiscleap/Coswara-Data>, and an overview of how the data is distributed is available at <https://iiscleap.github.io/coswara-blog/coswara/2020/11/23/visualize_coswara_data_metadata.html>.

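The task is framed as binary classification: recordings with the status `healthy` are class 0, and all other status categories are grouped into class 1: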
```
0 = healthy 
1 = resp_illness_not_identified
1 = no_resp_illness_exposed 
1 = recovered_full
1 = positive_mild
1 = positive_asymp 
1 = positive_moderate
```
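As a sketch of how these statuses collapse into two classes, the helper below maps each Coswara status string to its binary label. `STATUS_TO_LABEL` and `status_to_label` are hypothetical names used only for illustration; the actual mapping is applied by this project's preprocessing code and may handle other or differently spelled statuses.

```python
# Hypothetical mapping from Coswara status strings to the binary labels above:
# 0 = healthy, 1 = every other status category.
STATUS_TO_LABEL = {
    "healthy": 0,
    "resp_illness_not_identified": 1,
    "no_resp_illness_exposed": 1,
    "recovered_full": 1,
    "positive_mild": 1,
    "positive_asymp": 1,
    "positive_moderate": 1,
}


def status_to_label(status: str) -> int:
    """Return the binary class label for a Coswara status string."""
    try:
        # Strip stray whitespace that often appears in CSV fields
        return STATUS_TO_LABEL[status.strip()]
    except KeyError:
        raise ValueError(f"Unknown Coswara status: {status!r}") from None
```

For example, `status_to_label("positive_asymp")` returns `1`, and an unrecognized status raises `ValueError` instead of silently being assigned a class.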

The dataset is expected at the following location, with this directory structure:

```
/home/ec2-user/SageMaker/Coswara-Data/
|-- 20200413
|   |-- 20200413.csv
|   |-- 20200413.tar.gz.aa
|   |-- 20200413.tar.gz.ab
|   |-- 20200413.tar.gz.ac
|   |-- 20200413.tar.gz.ad
...
|   
`-- combined_data.csv
```

Let's take a look at a sample file to make sure the dataset was downloaded to the correct location.

### First, process the raw Coswara data
Uncompress the audio recordings and generate a metadata file for each recording type:  
- breathing-deep-metadata.csv  
- breathing-shallow-metadata.csv  
- cough-heavy-metadata.csv  
- cough-shallow-metadata.csv  
- counting-fast-metadata.csv  
- counting-normal-metadata.csv  
- vowel-a-metadata.csv  
- vowel-e-metadata.csv  
- vowel-o-metadata.csv  
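Each date folder holds a `.tar.gz` archive split into parts (`.aa`, `.ab`, ...). `preprocess.sh` takes care of extraction in this notebook; purely to illustrate the technique, the Python sketch below joins the parts in suffix order and extracts the result. It assumes the parts are plain byte-level splits of a single gzip-compressed tarball (as produced by the Unix `split` command); `extract_split_archive` is a hypothetical name, not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_split_archive(date_dir, dest) -> Path:
    """Join <date>.tar.gz.aa, .ab, ... into <date>.tar.gz and extract it into dest.

    Assumes the parts are raw byte splits of one .tar.gz file, so
    concatenating them in suffix order restores the original archive.
    """
    date_dir, dest = Path(date_dir), Path(dest)
    # The pattern requires a suffix after ".tar.gz.", so the joined file is not re-read
    parts = sorted(date_dir.glob(f"{date_dir.name}.tar.gz.*"))
    if not parts:
        raise FileNotFoundError(f"No archive parts found in {date_dir}")
    combined = date_dir / f"{date_dir.name}.tar.gz"
    with combined.open("wb") as out:
        for part in parts:  # sorted() orders .aa before .ab before .ac ...
            out.write(part.read_bytes())
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(combined, "r:gz") as tar:
        tar.extractall(dest)
    return combined
```

Extracting into the date folder itself should give a `20200413/20200413/<participant id>/...` layout like the one used when playing a sample recording later on.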

In [53]:
!chmod u+x ../preprocess.sh
!../preprocess.sh
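Once the script finishes, a quick check that all nine metadata files were written can save a failed training job later. A minimal sketch, assuming the CSVs are written to the `Coswara-Data` root (which is where the test cell further down reads `breathing-deep-metadata.csv` from); `missing_metadata` is a hypothetical helper.

```python
from pathlib import Path

# Recording types listed above; each should get a <type>-metadata.csv file
RECORDING_TYPES = [
    "breathing-deep", "breathing-shallow",
    "cough-heavy", "cough-shallow",
    "counting-fast", "counting-normal",
    "vowel-a", "vowel-e", "vowel-o",
]


def missing_metadata(data_dir) -> list:
    """Return the expected metadata CSV names that are not present in data_dir."""
    data_dir = Path(data_dir)
    expected = [f"{t}-metadata.csv" for t in RECORDING_TYPES]
    return [name for name in expected if not (data_dir / name).is_file()]
```

After a successful run, `missing_metadata("/home/ec2-user/SageMaker/Coswara-Data")` should return an empty list.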

In [1]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch
import warnings
warnings.filterwarnings('ignore')

role = get_execution_role()
ecr_repository_name = 'coswara-audio-classification'
account_id = role.split(':')[4]
region = boto3.Session().region_name
sagemaker_session = sagemaker.Session(default_bucket='sagemaker-audio-classification-{}'.format(account_id))  ## this S3 bucket was created by the same CloudFormation stack that created this notebook instance
bucket = sagemaker_session.default_bucket()


print('Account: {}'.format(account_id))
print('Region: {}'.format(region))
print('Role: {}'.format(role))
print('S3 Bucket: {}'.format(bucket))

Account: 948279062543
Region: us-east-1
Role: arn:aws:iam::948279062543:role/SageMakerAPIExecutionRole
S3 Bucket: sagemaker-audio-classification-948279062543


In [35]:
## play a sample audio recording
from IPython.display import Audio

coswarapath = '/home/ec2-user/SageMaker/Coswara-Data/20200413/20200413'
audioid = 'l3umDXECeOUTZH8pFN19c2WM4m43'
audiotype = 'counting-normal.wav'
filename = '/'.join([coswarapath, audioid, audiotype])
Audio(filename, autoplay=False)

In [32]:
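with open('Dockerfile', 'w') as f:
    # Start from the AWS Deep Learning Containers PyTorch 1.5.1 GPU training image for this region
    f.write("FROM 763104351884.dkr.ecr.{}.amazonaws.com/pytorch-training:1.5.1-gpu-py3\n".format(region))
    # libsndfile1 is the system library that Python audio I/O packages such as soundfile load at runtime
    f.write("RUN apt-get update && apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends libsndfile1\n")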

In [2]:
%%writefile build_and_push.sh

ACCOUNT_ID=$1
REGION=$2
REPO_NAME=$3
DOCKERFILE=$4
SERVER="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"

echo "ACCOUNT_ID: ${ACCOUNT_ID}"
echo "REPO_NAME: ${REPO_NAME}"
echo "REGION: ${REGION}"
echo "DOCKERFILE: ${DOCKERFILE}"

# Log in to the AWS Deep Learning Containers registry to pull the base image
aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin 763104351884.dkr.ecr.${REGION}.amazonaws.com

# Build the custom image and tag it for this account's ECR repository
docker build -q -f ${DOCKERFILE} -t ${REPO_NAME} .

docker tag ${REPO_NAME} ${SERVER}/${REPO_NAME}:latest

# Log in to this account's registry and create the repository if it does not exist yet
aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin ${SERVER}
aws ecr describe-repositories --region ${REGION} --repository-names ${REPO_NAME} || aws ecr create-repository --region ${REGION} --repository-name ${REPO_NAME}

# Push the image
docker push ${SERVER}/${REPO_NAME}:latest

Overwriting build_and_push.sh


In [3]:
!bash build_and_push.sh $account_id $region $ecr_repository_name Dockerfile

ACCOUNT_ID: 948279062543
REPO_NAME: coswara-audio-classification
REGION: us-east-1
DOCKERFILE: Dockerfile
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
sha256:914d3711f853599bf370f6d71df22a958e9c1c0068fe2bfd9e133a229b33a853
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
{
    "repositories": [
        {
            "repositoryArn": "arn:aws:ecr:us-east-1:948279062543:repository/coswara-audio-classification",
            "registryId": "948279062543",
            "repositoryName": "coswara-audio-classification",
            "repositoryUri": "948279062543.dkr.ecr.us-east-1.amazonaws.com/coswara-audio-classification",
            "createdAt": 1610312698.0,
            "imageTagMutability": "MUTABLE",
            "imageScanningConfiguration": {
                "scanOnPush": false
            },
            "encryptionConfiguration": {
                "encryptionType": "AES256"
            }
    

In [6]:
## Run this the first time to upload the data to S3
train_data = sagemaker_session.upload_data(
    "/home/ec2-user/SageMaker/Coswara-Data/",
    bucket=bucket,
    key_prefix="Coswara-Data",
)

In [None]:
## On later runs, reuse the data already in S3 instead of uploading it again
train_data = "s3://sagemaker-audio-classification-{}/Coswara-Data".format(account_id)

train_input = sagemaker.session.TrainingInput(train_data,
                                    distribution='FullyReplicated',
                                    content_type='csv',
                                    s3_data_type='S3Prefix')

train_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)
print('ECR training container ARN: {}'.format(train_image_uri))

hyperparams = {'lr': 0.0001388900761687841, # learning rate
               'gamma': 0.6165182113724552, # learning-rate decay factor applied at each step
               'weight-decay': 0.001, # optimizer weight decay (L2 regularization)
               'stepsize': 5, # learning-rate step size
               'epochs': 30, # number of training epochs
               'batch-size': 256, # training batch size
               'num-workers': 30, # data-loader worker processes
               'csv-file': 'counting-normal-metadata.csv' ## recording type to train on; one of breathing-deep-metadata.csv, breathing-shallow-metadata.csv, cough-heavy-metadata.csv, cough-shallow-metadata.csv, counting-fast-metadata.csv, counting-normal-metadata.csv, vowel-a-metadata.csv, vowel-e-metadata.csv, vowel-o-metadata.csv
              }

pytorch_estimator = PyTorch(image_uri=train_image_uri,
                            entry_point='train.py',
                            source_dir='./',
                            role=role,
                            instance_type='ml.c5.2xlarge',
                            instance_count=1,
                            output_path = "s3://{}/".format(bucket),
                            hyperparameters = hyperparams,
                            metric_definitions = [
                                {'Name': 'test:loss', 'Regex': 'Average loss: ([0-9\\.]+)'},
                                {'Name': 'test:f1', 'Regex': 'F1: ([0-9\\.]+)'},
                                {'Name': 'test:f2', 'Regex': 'F2: ([0-9\\.]+)'},
                                {'Name': 'test:precision', 'Regex': 'Precision: ([0-9\\.]+)'},
                                {'Name': 'test:recall', 'Regex': 'Recall: ([0-9\\.]+)'},
                                {'Name': 'test:accuracy', 'Regex': 'Accuracy: ([0-9\\.]+)'}
                            ]
                           )


pytorch_estimator.fit({'training': train_input}, wait=True)

ECR training container ARN: 948279062543.dkr.ecr.us-east-1.amazonaws.com/coswara-audio-classification:latest
2021-01-11 19:40:21 Starting - Starting the training job...
2021-01-11 19:40:46 Starting - Launching requested ML instancesProfilerReport-1610394020: InProgress
......
2021-01-11 19:41:46 Starting - Preparing the instances for training......
2021-01-11 19:42:47 Downloading - Downloading input data.....................
2021-01-11 19:46:10 Training - Downloading the training image...............
2021-01-11 19:48:52 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2021-01-11 19:48:43,585 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2021-01-11 19:48:43,587 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2021-01-11 19:48:43,595 sagemak

Namespace(batch_size=256, csv_file='counting-normal-metadata.csv', data_dir='/opt/ml/input/data/training', epochs=30, gamma=0.6165182113724552, localpath='data', log_interval=10, lr=0.0001388900761687841, model='m3', model_dir='/opt/ml/model', num_workers=30, seed=1, stepsize=5, test_batch_size=64, weight_decay=0.001)
Device: cpu
Running on sagemaker
csv_path /opt/ml/input/data/training/counting-normal-metadata.csv
file_path /opt/ml/input/data/training
{}
train_size: 1166, test_size:292
Loading model: m3
Learning rate: 0.0001388900761687841
Train Epoch: 1, Loss: 0.7243, Accuracy: 0.5000
Test set: Average loss: 0.6941, F1: 0.3333, F2: 0.2778, Precision: 0.2500, Recall: 0.5000, ROCAUC: 0.5706, Accuracy: 0.5000, corrected prediction ratio: 219/438
Learning rate: 0.0001388900761687841
Train Epoch: 2, Loss: 0.6935, Accuracy: 0.5261
Test set: Average loss: 0.6926, F1: 0

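SageMaker script mode passes each entry of `hyperparams` to the entry point as a `--<name> <value>` command-line flag, and the `Namespace(...)` line in the log above shows what `train.py` parsed from them. The sketch below is a hypothetical reconstruction of that parsing with `argparse`, not the repository's actual `train.py`: flag names follow the logged Namespace, defaults are placeholders except where the log shows them (`test_batch_size=64`, `log_interval=10`, `seed=1`, `model='m3'`), and the data and model directories come from the `SM_CHANNEL_TRAINING` and `SM_MODEL_DIR` environment variables that the SageMaker training toolkit sets.

```python
import argparse
import os


def parse_args(argv=None):
    """Hypothetical sketch of how train.py could parse its hyperparameters.

    argparse stores a flag such as '--batch-size' as the attribute 'batch_size',
    which is why the log shows batch_size, csv_file, weight_decay, and so on.
    """
    parser = argparse.ArgumentParser(description="Coswara audio classifier (sketch)")
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--gamma", type=float, default=0.5)
    parser.add_argument("--weight-decay", type=float, default=1e-3)
    parser.add_argument("--stepsize", type=int, default=5)
    parser.add_argument("--epochs", type=int, default=30)
    parser.add_argument("--batch-size", type=int, default=256)
    parser.add_argument("--test-batch-size", type=int, default=64)
    parser.add_argument("--num-workers", type=int, default=30)
    parser.add_argument("--log-interval", type=int, default=10)
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--model", type=str, default="m3")
    parser.add_argument("--csv-file", type=str, default="counting-normal-metadata.csv")
    # The 'training' channel and the model output directory are exposed
    # through these environment variables inside the training container.
    parser.add_argument("--data-dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAINING", "data"))
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "model"))
    return parser.parse_args(argv)
```

For example, `parse_args(['--batch-size', '256', '--csv-file', 'counting-normal-metadata.csv'])` returns a Namespace with `batch_size=256` and `csv_file='counting-normal-metadata.csv'`, matching the log.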
In [None]:
## hyperparameter tuning (optional to run)

objective_metric_name = 'test:f2'
objective_type = 'Maximize'
metric_definitions = [
    {'Name': 'test:loss', 'Regex': 'Average loss: ([0-9\\.]+)'},
    {'Name': 'test:f1', 'Regex': 'F1: ([0-9\\.]+)'},
    {'Name': 'test:f2', 'Regex': 'F2: ([0-9\\.]+)'},
    {'Name': 'test:precision', 'Regex': 'Precision: ([0-9\\.]+)'},
    {'Name': 'test:recall', 'Regex': 'Recall: ([0-9\\.]+)'},
    {'Name': 'test:accuracy', 'Regex': 'Accuracy: ([0-9\\.]+)'}
]

hyperparameter_ranges = {
    'lr': sagemaker.tuner.ContinuousParameter(0.0001, 0.1),
    'gamma': sagemaker.tuner.ContinuousParameter(0.001, 1),
    'weight-decay': sagemaker.tuner.CategoricalParameter([0.000001, 0.00001, 0.001]), 
    'stepsize': sagemaker.tuner.CategoricalParameter([1,5,10])
}


tuner = sagemaker.tuner.HyperparameterTuner(pytorch_estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            max_jobs=2,
                            max_parallel_jobs=2,
                            objective_type=objective_type)

tuner.fit({'training': train_input})
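Both the estimator and the tuner rely on `metric_definitions` to scrape metrics from the training log, and the tuner then maximizes `test:f2`. The sketch below approximates that scraping with Python's `re.search` on a single log line, which makes it easy to check a pattern locally before launching jobs. It is an approximation of SageMaker's behaviour, not its implementation; `scrape_metrics` is a hypothetical name.

```python
import re

# Same Name/Regex pairs as the metric_definitions above
METRIC_DEFINITIONS = [
    {"Name": "test:loss", "Regex": "Average loss: ([0-9\\.]+)"},
    {"Name": "test:f1", "Regex": "F1: ([0-9\\.]+)"},
    {"Name": "test:f2", "Regex": "F2: ([0-9\\.]+)"},
    {"Name": "test:precision", "Regex": "Precision: ([0-9\\.]+)"},
    {"Name": "test:recall", "Regex": "Recall: ([0-9\\.]+)"},
    {"Name": "test:accuracy", "Regex": "Accuracy: ([0-9\\.]+)"},
]


def scrape_metrics(log_line: str) -> dict:
    """Apply every metric regex to one log line and collect the captured values."""
    found = {}
    for metric in METRIC_DEFINITIONS:
        match = re.search(metric["Regex"], log_line)
        if match:
            found[metric["Name"]] = float(match.group(1))
    return found


test_line = ("Test set: Average loss: 0.6941, F1: 0.3333, F2: 0.2778, Precision: 0.2500, "
             "Recall: 0.5000, ROCAUC: 0.5706, Accuracy: 0.5000, corrected prediction ratio: 219/438")
train_line = "Train Epoch: 1, Loss: 0.7243, Accuracy: 0.5000"
print(scrape_metrics(test_line))
print(scrape_metrics(train_line))
```

The second call shows that the `Accuracy: ` pattern also matches the `Train Epoch` lines, so `test:accuracy` as defined will mix training and test accuracy. Anchoring the pattern on the test line, for example `'Test set:.*Accuracy: ([0-9\\.]+)'`, would avoid that.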

In [18]:
from sagemaker.pytorch import PyTorchModel

pytorch_model = PyTorchModel(model_data=pytorch_estimator.model_data, 
                             role=role, 
                             entry_point='inference.py',
                             source_dir='./',
                             py_version='py3',
                             framework_version='1.6.0',
                            )
predictor = pytorch_model.deploy(initial_instance_count=1, instance_type='ml.c5.2xlarge', wait=True)
## The endpoint name is used later to call the endpoint through the SageMaker runtime client
print("Inference endpoint name: {}".format(pytorch_model.endpoint_name))

---------------!
Inference endpoint name: pytorch-inference-2021-01-11-19-24-18-557


The voice classification model has been deployed as a SageMaker inference endpoint. 
We will test it below. 
First, install the `torchaudio` dependency:  

In [20]:
!pip install torchaudio

Collecting torchaudio
  Downloading torchaudio-0.7.2-cp36-cp36m-manylinux1_x86_64.whl (7.6 MB)
     |████████████████████████████████| 7.6 MB 10.7 MB/s
Collecting torch==1.7.1
  Downloading torch-1.7.1-cp36-cp36m-manylinux1_x86_64.whl (776.8 MB)
     |████████████████████████████████| 776.8 MB 11 kB/s

In [21]:
from coswara_dataset import CoswareDataset
from pathlib import Path
import torch

datapath = Path("/home/ec2-user/SageMaker/Coswara-Data")
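# Note: the estimator above was trained on counting-normal-metadata.csv (the 'csv-file' hyperparameter);
# use the same recording type here for a like-for-like check
csvpath = datapath / "breathing-deep-metadata.csv"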

test_set = CoswareDataset(
    csv_path=csvpath,
    file_path=datapath,
    new_sr=8000,
    audio_len=20,
    sampling_ratio=5,
)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=5)

In [22]:
X, y = next(iter(test_loader))
print(X.shape, y)

torch.Size([5, 1, 32000]) tensor([0, 0, 0, 0, 0])


In [23]:
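import numpy as np

# Send the batch of raw waveforms to the endpoint
response = predictor.predict(X.numpy())
# The response is assumed to have shape (batch, 1, n_classes); swapping the first
# two axes makes response[0] a (batch, n_classes) matrix of class scores
response = np.transpose(response, (1, 0, 2))
# Predicted class = index of the highest score for each sample
prediction = response[0].argmax(axis=1)
print(prediction)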

[1 1 0 1 1]
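The `transpose` plus `[0]` indexing above turns the endpoint response into one row of class scores per sample. A small self-contained sketch of that step on a synthetic array, assuming the response has shape `(batch, 1, n_classes)` (inferred from the single-sample `invoke_endpoint` response `[[[-0.918..., -0.509...]]]` at the end of this notebook); `batch_predictions` is a hypothetical name. Selecting `response[:, 0, :]` directly gives the same result with less index juggling.

```python
import numpy as np


def batch_predictions(response: np.ndarray) -> np.ndarray:
    """Turn an endpoint response of shape (batch, 1, n_classes) into class ids."""
    swapped = np.transpose(response, (1, 0, 2))  # -> (1, batch, n_classes)
    return swapped[0].argmax(axis=1)             # -> (batch,)


# Synthetic log-probabilities for three samples and two classes
fake_response = np.log(np.array([[[0.4, 0.6]], [[0.7, 0.3]], [[0.2, 0.8]]]))
print(batch_predictions(fake_response))
print(fake_response[:, 0, :].argmax(axis=1))  # equivalent, without the transpose
```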


In [19]:
import boto3

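client = boto3.client('sagemaker-runtime')
# The request body is the S3 URI of a single .wav recording; reading that URI
# is expected to be handled by the custom inference.py entry point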
response = client.invoke_endpoint(
    EndpointName=pytorch_model.endpoint_name,
    Body='s3://sagemaker-audio-classification-{}/Coswara-Data/20200413/20200413/0Rlzhiz6bybk77wdLjxwy7yLDhg1/breathing-shallow.wav'.format(account_id),
    ContentType='text/csv',
)
response['Body'].read().decode("utf-8")

'[[[-0.9181905388832092, -0.5095610022544861]]]'
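The body returned by the endpoint is a JSON-encoded nested list of per-class scores. The two values in the sample response exponentiate to probabilities that sum to 1, which suggests the model's final layer is a `log_softmax`. Assuming that holds, the sketch below parses the body and converts the scores to class probabilities; `parse_log_probs` is a hypothetical helper.

```python
import json
import math


def parse_log_probs(body: str) -> list:
    """Convert the endpoint's JSON body into per-class probabilities.

    Assumes the scores are log-probabilities (log_softmax output) with shape
    (1, 1, n_classes), as in the single-sample response above.
    """
    scores = json.loads(body)
    log_probs = scores[0][0]
    return [math.exp(v) for v in log_probs]


body = '[[[-0.9181905388832092, -0.5095610022544861]]]'
probs = parse_log_probs(body)
predicted_class = probs.index(max(probs))
print([round(p, 3) for p in probs], predicted_class)  # prints [0.399, 0.601] 1
```

With the `0 = healthy, 1 = other` labelling above, this recording would be assigned class 1.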