In [1]:
%pip install steamship==2.1.31

Note: you may need to restart the kernel to use updated packages.


# Demo: Audio Markdown Package
## Transcribe audio using Steamship's Audio Markdown Package 


This notebook demonstrates how to create and use the `audio-markdown` package. 

Each instance of the package gets its own workspace, where its data is stored. Because workspaces are isolated from each other, they can serve as personal data vaults for individual clients.
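
To make the one-workspace-per-client idea concrete, here is a minimal sketch that derives a distinct handle for each client. The helper `workspace_handle_for`, the `audio-markdown` prefix, and the lowercase-with-dashes naming scheme are assumptions made for illustration; Steamship's actual handle rules may differ.

```python
import re


def workspace_handle_for(client_name: str, prefix: str = "audio-markdown") -> str:
    """Build a lowercase, dash-separated handle for one client (hypothetical helper)."""
    # Collapse every run of characters other than a-z and 0-9 into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", client_name.lower()).strip("-")
    if not slug:
        raise ValueError("client_name must contain at least one letter or digit")
    return f"{prefix}-{slug}"


# Example client names are made up for this sketch.
handles = [workspace_handle_for(name) for name in ["Acme Corp", "Globex, Inc."]]
print(handles)
```

Each handle could then be passed as the second argument to `Steamship.use("audio-markdown", handle)`, the same call this notebook uses to create its instance, so that every client's data lands in a separate workspace.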

In [2]:
from steamship import Steamship, File, MimeTypes, Space, Block, Tag, App, AppInstance
from steamship.data.space import SignedUrl
from steamship.utils.signed_urls import upload_to_signed_url
from steamship.base import TaskState

In [3]:
from pathlib import Path
from datetime import datetime
from uuid import uuid4
from typing import Union
import time
from pprint import pprint

### Constants

Using this notebook requires a Steamship API key. If you opened this notebook using `npx try-steamship audio-markdown`, an API key is already configured in `~/steamship.json`. If you do not have one, you can create one by running the following in a terminal:

> npm install -g @steamship/cli && steamship login
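
Before creating an instance, it can help to confirm that the configuration file exists. The sketch below only assumes that `~/steamship.json` is a JSON object; it lists the top-level key names without printing their values, so the API key is never echoed into the notebook. The function name `config_keys` is hypothetical.

```python
import json
from pathlib import Path


def config_keys(path: Path = Path.home() / "steamship.json") -> list:
    """Return the sorted top-level keys of the config file, never its values."""
    if not path.exists():
        raise FileNotFoundError(f"No config found at {path}; run `steamship login` first.")
    with path.open() as f:
        config = json.load(f)  # assumption: the file holds a single JSON object
    return sorted(config.keys())


try:
    print(config_keys())
except (FileNotFoundError, ValueError) as err:
    # ValueError also covers json.JSONDecodeError for a malformed file.
    print(err)
```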

# Create a new instance of the package

In [15]:
instance = Steamship.use("audio-markdown", "audio-markdown-client-test-01")

INFO:root:[Client] Creating/Fetching workspace with handle/id: audio-markdown-client-test-01/None.
INFO:root:Making POST to https://api.steamship.com/api/v1/space/create in space None/None
INFO:root:From POST to https://api.steamship.com/api/v1/space/create got HTTP 200
INFO:root:[Client] Switched to workspace audio-markdown-client-test-01/7E4972E1-7836-4658-B10C-1C6D3B9D8032
INFO:root:Making POST to https://api.steamship.com/api/v1/app/instance/create in space None/None
INFO:root:From POST to https://api.steamship.com/api/v1/app/instance/create got HTTP 200


In [16]:
print(
    f"""
{'Invocation URL': <20}: {instance.invocation_url}
{'Instance ID': <20}: {instance.id}
{'Version ID': <20}: {instance.app_version_id} 
{'App ID': <20}: {instance.app_id}
{'Workspace ID': <20}: {instance.space_id}
"""
)


Invocation URL      : https://doug.steamship.run/audio-markdown-client-test-01/audio-markdown-client-test-01/
Instance ID         : 9179C560-2F40-46FA-886B-F07A904ED781
Version ID          : 2AC208B4-92FB-4F64-A4E1-8D8182010F32 
App ID              : D4B4D028-D4E4-48DA-92F7-C4E4E4C28918
Workspace ID        : 7E4972E1-7836-4658-B10C-1C6D3B9D8032



# Submitting mp3 files for analysis

The `audio-markdown` package transcribes audio files, such as MP3s, that are accessible via URL. Any publicly accessible URL will work, including pre-signed URLs to S3 or Google Storage.

In the code sample below we show you how to submit the URL of your audio file to the `transcribe_url` endpoint.
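
Since the package fetches the file itself, a local path or a `file://` URL will not work. As a rough pre-flight check, the sketch below tests only that a string is an absolute `http` or `https` URL; it does not verify that the URL is reachable or that it points to audio. The helper name `looks_like_public_url` is hypothetical.

```python
from urllib.parse import urlparse


def looks_like_public_url(url: str) -> bool:
    """Return True if `url` is an absolute http(s) URL with a host part."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# The first candidate is a made-up example URL; the others are deliberately invalid.
for candidate in ["https://example.com/audio/test.mp3", "test.mp3", "file:///tmp/test.mp3"]:
    print(candidate, looks_like_public_url(candidate))
```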

The `POST` request triggers the asynchronous transcription and analysis of your file and stores the results in your workspace for future access. 

After submitting your `POST` request you will receive a response that includes a `task_id` key and a `status` key. 

The `status` key shows you the status of your analysis task. It will start with `"waiting"`, then proceed to `"running"`, and finally to `"succeeded"` or `"failed"`. 


If you want to upload local audio files directly to your workspace you can use the helper method `upload_audio_file`. 

In [17]:
# URL pointing to an audio file
audio_url = "https://api.webm.to/static/downloads/c0b449c32c9e4b4f9444167dcc1bcd1d/markdown_test_4.webm"

In [18]:
# Transcribe and analyze your audio file
transcribe_url_response = instance.post("transcribe_url", url=audio_url).data

task_id = transcribe_url_response["task_id"]
status = transcribe_url_response["status"]

print(
    f"""
task ID: {task_id}
status: {status}
"""
)

INFO:root:Making POST to https://api.steamship.com/api/v1/space/get in space None/None
INFO:root:From POST to https://api.steamship.com/api/v1/space/get got HTTP 200
INFO:root:Making POST to https://doug.steamship.run/audio-markdown-client-test-01/audio-markdown-client-test-01/transcribe_url in space None/7E4972E1-7836-4658-B10C-1C6D3B9D8032
INFO:root:From POST to https://doug.steamship.run/audio-markdown-client-test-01/audio-markdown-client-test-01/transcribe_url got HTTP 200



task ID: 57462055-3997-4925-988E-A9EC0A89B3E0
status: waiting



### Retrieving audio analysis results

As your file is being processed, the `"status"` will go from `"waiting"` to `"running"` to `"succeeded"` or `"failed"`. You can check on the progress of your analysis task by passing the `"task_id"` to the `get_markdown` endpoint. 

You'll have to make repeated requests until the status converges to `"succeeded"` or `"failed"`. Once the `status` key is set to `"succeeded"`, the response includes a `markdown` key that contains the transcription formatted as Markdown. 
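
The repeated-request pattern can be wrapped in a small helper with a bounded number of attempts. This is a sketch under assumptions, not part of the package: `poll_until_done` and its parameters are hypothetical, and the terminal status strings are `succeeded` (the value that appears in this notebook's polling code and output) and `failed`. The example drives it with a fake fetcher so it runs without network access.

```python
import time
from typing import Callable, Dict

# Assumed terminal states; adjust if the endpoint reports different strings.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_until_done(
    fetch: Callable[[], Dict],
    interval_s: float = 2.0,
    max_tries: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call `fetch` until its result reports a terminal status or tries run out."""
    for _ in range(max_tries):
        result = fetch()
        if result.get("status") in TERMINAL_STATES:
            return result
        sleep(interval_s)
    raise TimeoutError(f"Task did not finish within {max_tries} tries")


# Fake responses standing in for successive calls to the endpoint.
fake_responses = iter([
    {"status": "waiting"},
    {"status": "running"},
    {"status": "succeeded", "markdown": "# hi"},
])
result = poll_until_done(lambda: next(fake_responses), interval_s=0)
print(result["status"])
```

With a real instance, `fetch` could be `lambda: instance.post("get_markdown", task_id=task_id).data`, which mirrors the call used in the cells of this notebook.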


To facilitate future file retrieval Steamship will store and index the augmented transcription in your workspace. 

In this notebook, we'll inspect the returned Markdown. For more info on how to query audio files in your workspace using Steamship's query language, scroll to [Query your workspace](#query_workspace).

In [19]:
task_id

'57462055-3997-4925-988E-A9EC0A89B3E0'

In [20]:
instance.post("get_markdown", task_id=task_id)

INFO:root:Making POST to https://doug.steamship.run/audio-markdown-client-test-01/audio-markdown-client-test-01/get_markdown in space None/7E4972E1-7836-4658-B10C-1C6D3B9D8032
INFO:root:From POST to https://doug.steamship.run/audio-markdown-client-test-01/audio-markdown-client-test-01/get_markdown got HTTP 200


Response(expect=None, task=None, data_={'task_id': '57462055-3997-4925-988E-A9EC0A89B3E0', 'status': 'running'}, error=None, client=Steamship(config=Configuration(api_key='19A61049-3F5C-4928-A250-463C3D053D81', api_base=HttpUrl('https://api.steamship.com/api/v1/', scheme='https', host='api.steamship.com', tld='com', host_type='domain', port='443', path='/api/v1/'), app_base=HttpUrl('https://steamship.run/', scheme='https', host='steamship.run', tld='run', host_type='domain', port='443', path='/'), web_base=HttpUrl('https://app.steamship.com/', scheme='https', host='app.steamship.com', tld='com', host_type='domain', port='443', path='/'), space_id='7E4972E1-7836-4658-B10C-1C6D3B9D8032', space_handle='audio-markdown-client-test-01', profile=None)))

In [22]:
n_retries = 0

In [23]:
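# Poll until the task reaches a terminal state or the retry budget runs out
while n_retries <= 100 and status != "succeeded":
    response = instance.post("get_markdown", task_id=task_id)

    if response.task and response.task.state == TaskState.failed:
        print(f"[FAILED] {response.task.status_message}")
        break

    status = response.data["status"]

    print(f"[Try {n_retries}] Transcription is {status}.")
    if status in ("succeeded", "failed"):
        break
    time.sleep(2)
    n_retries += 1

# Stop with a clear error instead of a KeyError if the task never succeeded
if status != "succeeded":
    raise RuntimeError(f"Transcription did not finish; last status was {status!r}")

# Fetch the finished result; the transcription is under the "markdown" key
response = instance.post("get_markdown", task_id=task_id)
file = response.data["markdown"]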


INFO:root:Making POST to https://doug.steamship.run/audio-markdown-client-test-01/audio-markdown-client-test-01/get_markdown in space None/7E4972E1-7836-4658-B10C-1C6D3B9D8032
INFO:root:From POST to https://doug.steamship.run/audio-markdown-client-test-01/audio-markdown-client-test-01/get_markdown got HTTP 200
INFO:root:Making POST to https://doug.steamship.run/audio-markdown-client-test-01/audio-markdown-client-test-01/get_markdown in space None/7E4972E1-7836-4658-B10C-1C6D3B9D8032


[Try 0] Transcription is succeeded.


INFO:root:From POST to https://doug.steamship.run/audio-markdown-client-test-01/audio-markdown-client-test-01/get_markdown got HTTP 200




In [24]:
from IPython.display import display, Markdown
display(Markdown(file))

# testing this Deam Ship Audio Markdown Package 

Here's a list of things we'd like to check, 
1. whisper model, 
2. markdown extraction, 
3. webpage, 

Here's a list of things we don't care about, 
* misspellings, 
* weird grammar, 
# this is the first heading 
## this is the second heading 
### this is the third heading 
#### this is the fourth heading 
##### this is the sixth element. 
###### this is the sixth heading 

Thank you for listening to our tests.
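
Because the transcription comes back as a plain Markdown string, it can be post-processed like any other text. As an illustration, the sketch below pulls the heading lines out of a Markdown string with a regular expression. The helper `extract_headings` is hypothetical, and the pattern is a simplification that does not account for fenced code blocks or Setext-style headings.

```python
import re

# One to six '#' characters, at least one space or tab, then the heading text.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def extract_headings(markdown_text: str) -> list:
    """Return (level, title) pairs for each ATX-style heading in the text."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(markdown_text)]


# A short excerpt in the style of the transcription shown above.
sample = (
    "# this is the first heading \n"
    "## this is the second heading \n"
    "\n"
    "Thank you for listening to our tests."
)
for level, title in extract_headings(sample):
    print(level, title)
```

In this notebook, the same function could be applied to the `file` variable that holds the returned Markdown.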