# Workshop: Audio Analytics
## Transcribe, analyze, and index audio using Steamship's Audio Analytics Package 


This notebook demonstrates how to create and use the `audio-analytics` package. 

Each instance of the package is created in its own workspace, where its data is stored. Because workspaces are isolated from each other, they can serve as personal data vaults for clients.

In [1]:
from pathlib import Path
from utils import upload_audio_file
import time

In [2]:
import logging

logging.getLogger().setLevel(logging.ERROR)  # Set an appropriate logging level

In [3]:
from steamship import Steamship, PackageInstance, File, MimeTypes, Tag
from steamship.base import TaskState

### Constants

Using this notebook requires a Steamship API key. If you do not have one, you can create one: 

> npm install -g @steamship/cli && steamship login

In [4]:
# The globally unique name of the package. Think of this like an NPM or PyPI package name.
PACKAGE_HANDLE = "audio-analytics"

# The name that identifies the instance of the package we'll be working with.
INSTANCE_HANDLE = "audio-analytics-test"

# The workspace in which that instance will reside.
# In practice, it's best for each package instance to reside in its own workspace.
WORKSPACE_HANDLE = INSTANCE_HANDLE

# Authenticate with Steamship

Authentication with Steamship is handled using a Steamship client object. To instantiate a Steamship client you'll need your own API key. 

In [5]:
ship = Steamship(workspace=WORKSPACE_HANDLE)

# Create a new instance of the package

In [6]:
instance = ship.use(PACKAGE_HANDLE, INSTANCE_HANDLE)

print(f"""
{'Invocation URL': <20}: {instance.invocation_url}
{'Bearer Token': <20}: {instance.client.config.api_key} 
{'Instance ID': <20}: {instance.id} 
{'Version ID': <20}: {instance.package_version_id} 
{'App ID': <20}: {instance.package_id}
{'Workspace ID': <20}: {instance.workspace_id}
"""
)


Invocation URL      : https://steamship.apps.staging.steamship.com/audio-analytics-test/audio-analytics-test/
Bearer Token        : FD08F643-9408-46E6-A6BB-2D935B8BF836 
Instance ID         : 4EB9855D-9254-4BE0-AB0D-C52CBCF9A1E3 
Version ID          : F091E799-DD1B-40D2-89C8-6F40FCA1E07E 
App ID              : 26EC7831-D149-4D01-8C35-66C704549F42
Workspace ID        : 53300572-C932-407D-844E-5B38342F6B1A



# Submitting mp3 files for analysis

The audio analytics package transcribes mp3 files that are accessible via URL. Any publicly accessible URL will work, including pre-signed URLs to S3 or Google Storage.

The code sample below shows how to submit the URL of your MP3 file to the `analyze_url` endpoint.

The `POST` request triggers the asynchronous transcription and analysis of your file and stores the results in your workspace for future access. 

After submitting your `POST` request you will receive a response that includes a `task_id` and `status` key. 

The `status` key shows you the status of your analysis task. It will start with `"waiting"`, and then proceed to `"processing"`, and finally to `"completed"` or `"failed"`. 


If you want to upload local audio files directly to your workspace you can use the helper method `upload_audio_file`. 

In [7]:
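# URL pointing to your publicly accessible audio file.
# A non-empty string is truthy, so the `or` branch below only runs if you replace
# the URL with an empty string (or None) to upload a local file instead.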
audio_url = "https://s3.us-east-2.amazonaws.com/static-assets.steamship.com/audio.mp3" or upload_audio_file(
    ship, 
    Path("./../tests/data/inputs/audio.mp3"),
    MimeTypes.MP3,
)

In [8]:
# Transcribe and analyze your audio file
response = instance.invoke("analyze_url", url=audio_url)

task_id = response["task_id"]
status = response["status"]

print(
    f"""
task ID: {task_id}
status: {status}
"""
)


task ID: 7B0E8FBD-BD73-4163-93E7-953CB140CF80
status: waiting



### Retrieving audio analysis results

As your file is being processed, the `"status"` will go from `"waiting"` to `"processing"` to `"completed"` or `"failed"`. You can check the progress of your analysis task by passing its `"task_id"` to the `status` endpoint. 

You'll have to make repeated `GET` requests until the status converges to `"completed"` or `"failed"`. Once the `status` key is set to `"completed"`, you'll see a `file` key that represents the transcription augmented with language AI features such as entities and emotion. 


To facilitate future file retrieval Steamship will store and index the augmented transcription in your workspace. 

In this notebook, we'll inspect the contents of the `file` response. For more info on how to query audio files in your workspace using Steamship's query language, scroll to [Query your workspace](#query_workspace).  

In [9]:
response = instance.invoke("status", task_id=task_id)

In [10]:
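n_retries = 0
# Poll until the task succeeds, fails, or the retry budget is used up
while n_retries <= 100 and response["status"] not in (TaskState.succeeded, TaskState.failed):
    time.sleep(2)
    response = instance.invoke("status", task_id=task_id)
    n_retries += 1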

file = response["file"]
file = File.parse_obj(file)
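
The polling pattern above can also be wrapped in a small reusable helper. The sketch below is not part of the package: `poll_until_done` and its `fetch_status` argument are hypothetical names, and the terminal state strings are assumptions that should be replaced with the values your `status` endpoint actually returns. A fake status source stands in for the real call so the block runs without network access.

```python
import time
from typing import Callable, Dict, Iterable


def poll_until_done(
    fetch_status: Callable[[], Dict],
    terminal_states: Iterable[str] = ("succeeded", "failed"),  # assumed state names
    max_retries: int = 100,
    delay_seconds: float = 2.0,
) -> Dict:
    """Call fetch_status until it reports a terminal state or retries run out."""
    terminal = set(terminal_states)
    response = fetch_status()
    retries = 0
    while response["status"] not in terminal:
        if retries >= max_retries:
            raise TimeoutError(
                f"Task still {response['status']!r} after {retries} retries"
            )
        time.sleep(delay_seconds)
        response = fetch_status()
        retries += 1
    return response


# Fake status source for illustration: two non-terminal responses, then success.
_fake_responses = iter(
    [
        {"status": "waiting"},
        {"status": "running"},
        {"status": "succeeded", "file": {"id": "hypothetical-file-id"}},
    ]
)

result = poll_until_done(lambda: next(_fake_responses), delay_seconds=0)
print(result["status"])
```

With the real package, `fetch_status` would be something like `lambda: instance.invoke("status", task_id=task_id)`. Raising on timeout, rather than silently falling through, keeps a stalled task from being mistaken for a finished one.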

# Extracting transcription and Language AI features

A file consists of one or more blocks. Steamship uses blocks to split large corpora into reasonably sized chunks for scalability. Most transcripts fit into a single block, so you rarely need to access more than one block per audio file. 

A block contains the transcription augmented with language AI tags bundled into the `text` and `tags` fields. 

In [11]:
block = file.blocks[0]

### Getting the transcription

The `text` field contains the raw transcription. 

In [12]:
transcription = block.text
print(transcription[:1_000])

Hello? Hello. Can I speak to Sally, please? Speaking. Hi. This is Hannah. Hi, Hannah. What's up? Kate is sick. That's too bad. How about going to see her? That's a good idea. What time shall we meet? How about at two? Sounds good. Let's meet at the bus stop. Okay. See you then. How are you? I'm okay now. I can go to school on Monday. Good. Kate, here's an apple pie. I made it for you. Thanks. I like apple pie. Let's roleplay.


### Getting the language AI features

The `tags` field contains language AI features anchored to the transcription. Anchoring language AI features to character positions in the transcript enables you to navigate overlapping features more easily using [queries](#query_workspace). For example, the positional indexes enable you to ask questions such as "Give me the audio fragments where `Paul` talked about `Feature X`". 

In [13]:
tags = block.tags
feature_types = {tag.kind for tag in tags}
feature_types_str = "\n * " + "\n * ".join(feature_types)
print(
    f"There are {len(tags)} language AI features "
    f"across {len(feature_types)} unique language AI feature types: {feature_types_str}"
)

There are 160 language AI features across 7 unique language AI feature types: 
 * speaker
 * sentiment
 * chapter
 * entity
 * topic_summary
 * timestamp
 * topic
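
Because anchored tags carry `start_idx` and `end_idx`, deciding whether two features overlap reduces to an interval test on character positions. The sketch below illustrates that idea locally. `SpanTag` is a hypothetical stand-in for a tag, the spans are illustrative, and the code assumes `end_idx` is exclusive, which the notebook does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpanTag:
    """Hypothetical stand-in for a tag: only the fields needed for overlap checks."""

    kind: str
    name: str
    start_idx: int
    end_idx: int


def overlaps(a: SpanTag, b: SpanTag) -> bool:
    """True when two half-open ranges [start_idx, end_idx) share a character."""
    return a.start_idx < b.end_idx and b.start_idx < a.end_idx


def tags_overlapping(target: SpanTag, candidates: List[SpanTag]) -> List[SpanTag]:
    """Return the candidates whose character range intersects the target's."""
    return [tag for tag in candidates if overlaps(target, tag)]


# Illustrative spans: one entity and three sentence-level sentiment tags.
entity = SpanTag("entity", "Sally", 29, 35)
sentences = [
    SpanTag("sentiment", "NEUTRAL", 0, 6),
    SpanTag("sentiment", "NEUTRAL", 14, 43),
    SpanTag("sentiment", "NEUTRAL", 44, 53),
]

matching = tags_overlapping(entity, sentences)
print([(tag.start_idx, tag.end_idx) for tag in matching])
```

Only the sentence spanning characters 14 to 43 contains the entity, so it is the only match.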


### Timestamps 

Timestamp tags are used to attach timestamps, in milliseconds, to transcribed words.  

#### Fields: 

* `kind`: "timestamp".
* `name`: The word itself that was detected.
* `start_idx`: Starting index, in characters, of the transcribed word. 
* `end_idx`:  Ending index, in characters, of the transcribed word. 
* `value.start_time`: Starting timestamp, in milliseconds, of the word in the transcript
* `value.end_time`: Ending timestamp, in milliseconds, of the word in the transcript

In [14]:
timestamp_tags = [tag for tag in tags if tag.kind == "timestamp"]

In [15]:
timestamp_tags[0].__dict__

{'client': None,
 'id': 'BBDAB88C-AF8F-4EED-B408-DE4C3AE6DC54',
 'file_id': 'D4F142CF-D65F-49A8-857B-495595D69B45',
 'block_id': 'CD5EC1C4-50D3-4C84-8DFC-AB5C5A9DB8B3',
 'kind': 'timestamp',
 'name': 'Hello?',
 'value': {'end_time': 8000, 'start_time': 7437},
 'start_idx': 0,
 'end_idx': 6}
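
Millisecond offsets are easier to read once converted to a clock-style format. A minimal sketch, where `format_ms` is a hypothetical helper and the values are copied from the timestamp tag shown above:

```python
def format_ms(milliseconds: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, remainder = divmod(milliseconds, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


# Plain dict mirroring the fields of the timestamp tag printed above.
word = {"name": "Hello?", "value": {"start_time": 7437, "end_time": 8000}}

start = word["value"]["start_time"]
end = word["value"]["end_time"]
duration_ms = end - start
print(f"{word['name']}: {format_ms(start)} -> {format_ms(end)} ({duration_ms} ms)")
```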

### Speakers 

Speaker tags are used to attribute utterances to speakers. Speakers are labeled using sequential letters of the alphabet.   

#### Fields: 

* `kind`: "speaker"
* `name`: The speaker label. 
* `start_idx`: Starting index, in characters, of the text in the transcript. 
* `end_idx`: Ending index, in characters, of the text in the transcript. 
* `value.start_time`: Starting timestamp, in milliseconds, of the text in the transcript.
* `value.end_time`: Ending timestamp, in milliseconds, of the text in the transcript. 

In [16]:
speaker_tags = [tag for tag in tags if tag.kind == "speaker"]

In [17]:
speaker_tags[0].__dict__

{'client': None,
 'id': '3E03FA1E-79FE-497A-BA73-F6D6654A322C',
 'file_id': 'D4F142CF-D65F-49A8-857B-495595D69B45',
 'block_id': 'CD5EC1C4-50D3-4C84-8DFC-AB5C5A9DB8B3',
 'kind': 'speaker',
 'name': 'A',
 'value': {'end_time': 74112, 'start_time': 7437},
 'start_idx': 0,
 'end_idx': 429}
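
Speaker spans can be combined with the transcript text to produce a dialogue-style view. The sketch below uses plain dicts and made-up speaker spans for illustration (in the sample output above, a single speaker covers the whole transcript). `render_dialogue` is a hypothetical helper, not part of the package.

```python
from typing import Dict, List

example_transcript = "Hello? Hello. Can I speak to Sally, please? Speaking."

# Hypothetical speaker spans over the example transcript.
example_speaker_tags: List[Dict] = [
    {"name": "B", "start_idx": 7, "end_idx": 43},
    {"name": "A", "start_idx": 0, "end_idx": 6},
    {"name": "A", "start_idx": 44, "end_idx": 53},
]


def render_dialogue(text: str, tags: List[Dict]) -> List[str]:
    """Return one 'SPEAKER: utterance' line per tag, in transcript order."""
    ordered = sorted(tags, key=lambda tag: tag["start_idx"])
    return [
        f"{tag['name']}: {text[tag['start_idx']:tag['end_idx']].strip()}"
        for tag in ordered
    ]


dialogue = render_dialogue(example_transcript, example_speaker_tags)
print("\n".join(dialogue))
```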

### Sentiments

Sentiment tags are used to assign a sentiment to each sentence of your transcription. Sentiment is labeled using three possible values: `POSITIVE`, `NEGATIVE`, and `NEUTRAL`.    

#### Fields: 

* `kind`: "sentiment"
* `name`: The detected sentiment, `POSITIVE`, `NEGATIVE`, or `NEUTRAL`.
* `start_idx`: Starting index, in characters, of the text in the transcript. 
* `end_idx`: Ending index, in characters, of the text in the transcript. 
* `value.confidence`: Confidence score for the detected sentiment.
* `value.start_time`: Starting time offset, in milliseconds, of the text in the transcript.
* `value.end_time`: Ending time offset, in milliseconds, of the text in the transcript. 

In [18]:
sentiment_tags = [tag for tag in tags if tag.kind == "sentiment"]
sentiment_tags[0].__dict__

{'client': None,
 'id': '1D755AEE-4B54-419C-A491-BD50554CC282',
 'file_id': 'D4F142CF-D65F-49A8-857B-495595D69B45',
 'block_id': 'CD5EC1C4-50D3-4C84-8DFC-AB5C5A9DB8B3',
 'kind': 'sentiment',
 'name': 'NEUTRAL',
 'value': {'start_time': 7437,
  'end_time': 8000,
  'confidence': 0.6430088877677917},
 'start_idx': 0,
 'end_idx': 6}
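
Since each sentence carries a label and a confidence score, a quick way to summarize a recording is to count labels above a confidence threshold. A minimal sketch with made-up tags as plain dicts (`sentiment_counts` and the threshold value are illustrative assumptions):

```python
from collections import Counter
from typing import Dict, List

# Hypothetical sentiment tags mirroring the `name` and `value.confidence` fields.
example_sentiment_tags: List[Dict] = [
    {"name": "NEUTRAL", "value": {"confidence": 0.64}},
    {"name": "NEGATIVE", "value": {"confidence": 0.81}},
    {"name": "POSITIVE", "value": {"confidence": 0.45}},
    {"name": "NEUTRAL", "value": {"confidence": 0.92}},
]


def sentiment_counts(tags: List[Dict], min_confidence: float = 0.0) -> Counter:
    """Count sentiment labels, ignoring predictions below min_confidence."""
    return Counter(
        tag["name"] for tag in tags if tag["value"]["confidence"] >= min_confidence
    )


all_counts = sentiment_counts(example_sentiment_tags)
confident_counts = sentiment_counts(example_sentiment_tags, min_confidence=0.6)
print(all_counts)
print(confident_counts)
```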

### Entities

Entity tags are used to highlight entities in your transcription such as people, company names, email addresses, dates, and locations. 

#### Fields: 

* `kind`: "entity"
* `name`: The detected entity.
* `start_idx`: Starting index, in characters, of the text in the transcript. 
* `end_idx`: Ending index, in characters, of the text in the transcript. 
* `value.type`: The entity type of the detected entity.
* `value.start_time`: Starting time offset, in milliseconds, of the text in the transcript.
* `value.end_time`: Ending time offset, in milliseconds, of the text in the transcript. 

In [19]:
entity_tags = [tag for tag in tags if tag.kind == "entity"]
entity_tags[0].__dict__

{'client': None,
 'id': 'DAA1011F-FB7D-4517-B1E1-F26236236702',
 'file_id': 'D4F142CF-D65F-49A8-857B-495595D69B45',
 'block_id': 'CD5EC1C4-50D3-4C84-8DFC-AB5C5A9DB8B3',
 'kind': 'entity',
 'name': 'Sally',
 'value': {'start_time': 10930, 'end_time': 11322, 'type': 'person_name'},
 'start_idx': 29,
 'end_idx': 35}
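
The `value.type` field makes it straightforward to group detected entities by category. The sketch below works on plain dicts. Only `person_name` appears in the output above, so the `date` type and the sample names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical entity tags; "date" is an assumed type name.
example_entity_tags: List[Dict] = [
    {"name": "Sally", "value": {"type": "person_name"}},
    {"name": "Hannah", "value": {"type": "person_name"}},
    {"name": "Monday", "value": {"type": "date"}},
    {"name": "Sally", "value": {"type": "person_name"}},
]


def names_by_entity_type(tags: List[Dict]) -> Dict[str, List[str]]:
    """Map each entity type to the sorted, de-duplicated names found for it."""
    grouped = defaultdict(set)
    for tag in tags:
        grouped[tag["value"]["type"]].add(tag["name"])
    return {entity_type: sorted(names) for entity_type, names in grouped.items()}


entities_by_type = names_by_entity_type(example_entity_tags)
print(entities_by_type)
```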

### Chapters

Chapter tags are used to summarize your transcription over time. Each chapter tag spans a logical chapter that covers one main topic and summarizes its content via a `gist`, a `headline`, and a `summary`.    

#### Fields: 

* `kind`: "chapter"
* `name`: The sequence number of the chapter.
* `start_idx`: Starting index, in characters, of the text in the chapter. 
* `end_idx`: Ending index, in characters, of the text in the chapter. 
* `value.gist`: An ultra-short summary of the text in the chapter.
* `value.headline`: A single sentence summary of the text in the chapter.
* `value.summary`: A one paragraph summary of the text in the chapter.
* `value.start_time`: Starting timestamp, in milliseconds, of the text in the transcript.
* `value.end_time`: Ending timestamp, in milliseconds, of the text in the transcript. 

In [20]:
chapter_tags = [tag for tag in tags if tag.kind == "chapter"]
chapter_tags[0].__dict__

{'client': None,
 'id': '07132EF2-1AA8-4036-BF34-1ED258333751',
 'file_id': 'D4F142CF-D65F-49A8-857B-495595D69B45',
 'block_id': 'CD5EC1C4-50D3-4C84-8DFC-AB5C5A9DB8B3',
 'kind': 'chapter',
 'name': '0',
 'value': {'end_time': 74112,
  'start_time': 7437,
  'headline': 'Hannah: Kate is sick; how about going to see Sally',
  'gist': "Kate's illness",
  'summary': "Hannah: Kate is sick. How about going to see her? That's a good idea. What time shall we meet? How about at two? Sounds good. Let's meet at the bus stop."},
 'start_idx': 0,
 'end_idx': 429}
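
Chapter tags lend themselves to a simple table of contents built from each chapter's sequence number, start time, and `gist`. In the sketch below, the first chapter reuses values from the output above and the second is made up. `table_of_contents` is a hypothetical helper that assumes chapter names are numeric strings, as in the sample.

```python
from typing import Dict, List

example_chapter_tags: List[Dict] = [
    # Hypothetical second chapter, listed first to show that ordering is handled.
    {"name": "1", "value": {"start_time": 74112, "gist": "Hypothetical second chapter"}},
    # Values taken from the chapter tag printed above.
    {"name": "0", "value": {"start_time": 7437, "gist": "Kate's illness"}},
]


def table_of_contents(chapters: List[Dict]) -> List[str]:
    """Return 'MM:SS  N. gist' lines ordered by chapter sequence number."""
    ordered = sorted(chapters, key=lambda chapter: int(chapter["name"]))
    lines = []
    for chapter in ordered:
        start_seconds = chapter["value"]["start_time"] // 1000
        minutes, seconds = divmod(start_seconds, 60)
        lines.append(
            f"{minutes:02d}:{seconds:02d}  {chapter['name']}. {chapter['value']['gist']}"
        )
    return lines


toc = table_of_contents(example_chapter_tags)
print("\n".join(toc))
```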

### Topics

Topic tags are used to highlight topic-specific sequences in your transcription. Each sequence is labeled with one or more topic labels according to the [IAB Taxonomy](https://www.iab.com/guidelines/content-taxonomy/). 

#### Fields: 

* `kind`: "topic"
* `name`: The predicted topic label.
* `start_idx`: Starting index, in characters, of the text that was classified with topic labels. 
* `end_idx`: Ending index, in characters, of the text that was classified with topic labels. 
* `value.confidence`: Confidence score between 0 and 1 for the detected topic. 
* `value.start_time`: Starting timestamp, in milliseconds, of the text that was classified with topic labels.
* `value.end_time`: Ending timestamp, in milliseconds, of the text that was classified with topic labels. 

In [21]:
topic_tags = [tag for tag in tags if tag.kind == "topic"]
topic_tags[0].__dict__

{'client': None,
 'id': '27F76BD4-807C-478D-92D5-2F38682DA6F6',
 'file_id': 'D4F142CF-D65F-49A8-857B-495595D69B45',
 'block_id': 'CD5EC1C4-50D3-4C84-8DFC-AB5C5A9DB8B3',
 'kind': 'topic',
 'name': 'FamilyAndRelationships>Parenting',
 'value': {'start_time': 7437,
  'end_time': 56225,
  'confidence': 0.0039001856930553913},
 'start_idx': 0,
 'end_idx': 341}
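
Topic labels such as `FamilyAndRelationships>Parenting` appear to encode a hierarchy, with `>` separating the levels. That reading is an assumption based on the sample label. Under that assumption, the sketch below rolls topic tags up to their top-level category and keeps the highest confidence seen for each. The tags are plain dicts, and all labels except the first are made up.

```python
from typing import Dict, List, Tuple

example_topic_tags: List[Dict] = [
    {"name": "FamilyAndRelationships>Parenting", "value": {"confidence": 0.0039}},
    {"name": "FamilyAndRelationships>Dating", "value": {"confidence": 0.012}},
    {"name": "Education>PrimaryEducation", "value": {"confidence": 0.41}},
]


def top_level_categories(tags: List[Dict]) -> List[Tuple[str, float]]:
    """Return (top-level category, best confidence) pairs, most confident first."""
    best: Dict[str, float] = {}
    for tag in tags:
        category = tag["name"].split(">", 1)[0]  # assumes '>' separates levels
        confidence = tag["value"]["confidence"]
        best[category] = max(confidence, best.get(category, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


categories = top_level_categories(example_topic_tags)
print(categories)
```

Confidence scores can be very low, as in the sample tag above, so ranking or thresholding by `value.confidence` before acting on a topic is usually worthwhile.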

### Summary

Topic summary tags are used to assign topics to your transcription as a whole. In contrast to topic tags, they are not anchored to parts of your transcription, because they attempt to summarize the whole audio file. Topics are labeled according to the [IAB Taxonomy](https://www.iab.com/guidelines/content-taxonomy/).   

#### Fields: 

* `kind`: "topic_summary"
* `name`: The predicted topic label.
* `start_idx`: None
* `end_idx`:  None
* `value.confidence`: Confidence score between 0 and 1 for the detected topic.
* `value.start_time`: None
* `value.end_time`: None

In [22]:
summary_tags = [tag for tag in tags if tag.kind == "topic_summary"]
summary_tags[0].__dict__

{'client': None,
 'id': '67F5CF3F-D9A0-4E67-89B5-2001C3DBC773',
 'file_id': 'D4F142CF-D65F-49A8-857B-495595D69B45',
 'block_id': 'CD5EC1C4-50D3-4C84-8DFC-AB5C5A9DB8B3',
 'kind': 'topic_summary',
 'name': 'Food&Drink>DiningOut',
 'value': {'end_time': None,
  'confidence': 0.0022771090734750032,
  'start_time': None},
 'start_idx': None,
 'end_idx': None}

<a id='query_workspace'></a>
# Query your workspace

Now that the data has been loaded into your workspace, you can use Steamship's built-in query system to find the parts most relevant to you. More information about Steamship's query language can be found [here](https://steamship.notion.site/Tag-Query-System-f549c912059642cbbaef7773dbaa0d93).

## Examples

### Query the people that are mentioned across all your audio files

In [23]:
entity_tags = Tag.query(ship, 'kind "entity" and value("type") = "person_name"').tags
unique_entities = {tag.name for tag in entity_tags}
print(f"There are {len(unique_entities)} people referenced in your workspace:")
print(" * " + "\n * ".join(unique_entities))

There are 3 people referenced in your workspace:
 * Kate
 * Sally
 * Hannah


### Query the sentiment over an entity

In [24]:
entity_name = "Kate"
query_tags = Tag.query(
    ship, f'kind "sentiment" and overlaps {{ kind "entity" and name "{entity_name}" }}'
).tags
print(
    "* "
    + "\n* ".join(
        [
            f"{tag.name} | {transcription[tag.start_idx: tag.end_idx]}"
            for tag in query_tags
        ]
    )
)

* POSITIVE | Kate, here's an apple pie.
* NEGATIVE | Kate is sick.
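
The overlap queries in this section all share the same shape, so the query string can be built by a small helper. `overlap_query` below is a hypothetical convenience function that only formats a string in the syntax used by the examples here. The notebook does not describe the query language's escaping rules, so rejecting double quotes in the name is a conservative assumption.

```python
def overlap_query(outer_kind: str, inner_kind: str, inner_name: str) -> str:
    """Build a query for tags of outer_kind that overlap a named tag of inner_kind."""
    if '"' in inner_name:
        # Escaping rules are unknown here, so refuse rather than guess.
        raise ValueError("inner_name must not contain double quotes")
    return (
        f'kind "{outer_kind}" and overlaps '
        f'{{ kind "{inner_kind}" and name "{inner_name}" }}'
    )


sentiment_over_kate = overlap_query("sentiment", "entity", "Kate")
print(sentiment_over_kate)
```

The resulting string can then be passed to `Tag.query(ship, ...)` exactly as in the cells above.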


### Retrieve the entities mentioned in a negative context

In [25]:
# Return the entities that overlap a sentence with negative sentiment
query_tags = Tag.query(
    ship, 'kind "entity" and overlaps { kind "sentiment" and name "NEGATIVE" }'
).tags
unique_entities = {tag.name for tag in query_tags}
print(
    f"There are {len(unique_entities)} people who have been referenced in a negative context:"
)
print(" * " + "\n * ".join(unique_entities))

There are 1 people who have been referenced in a negative context:
 * Kate


### Play the audio fragments talking about a specific entity

In [26]:
# Retrieve the timestamps where the given entity appears
entity_name = "Kate"
timestamp_tags = Tag.query(
    ship, f'kind "timestamp" and overlaps {{ kind "entity" and name "{entity_name}" }}'
).tags

In [27]:
# Download the audio file linked to the transcription
audio_file = File.get(ship, timestamp_tags[0].file_id)
audio_stream = audio_file.raw()

In [28]:
print("Kate is featured at the following timestamps (in seconds):")
print(
    "* " + "\n* ".join({str(tag.value["start_time"] // 1000) for tag in timestamp_tags})
)

Kate is featured at the following timestamps (in seconds):
* 20
* 59


In [29]:
from IPython.display import Audio

Audio(audio_stream, rate=22_050)
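
The cell above plays the whole file. To jump to the fragments that mention an entity, the word-level timestamps first need to be turned into playback windows. The sketch below pads each word's time range and merges windows that touch or overlap. `playback_windows`, the padding value, and the sample timestamps are all illustrative assumptions. Cutting the audio itself would require a decoding library, which is outside the scope of this notebook.

```python
from typing import List, Tuple


def playback_windows(
    word_times_ms: List[Tuple[int, int]], padding_ms: int = 2_000
) -> List[Tuple[int, int]]:
    """Pad each (start, end) range in milliseconds and merge overlapping windows."""
    padded = sorted(
        (max(0, start - padding_ms), end + padding_ms) for start, end in word_times_ms
    )
    merged: List[Tuple[int, int]] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Extend the previous window instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical (start_time, end_time) pairs in milliseconds for one entity.
example_word_times = [(20_100, 20_500), (59_300, 59_700), (21_000, 21_400)]

windows = playback_windows(example_word_times)
for start_ms, end_ms in windows:
    print(f"{start_ms / 1000:.1f}s -> {end_ms / 1000:.1f}s")
```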