In [10]:
# Automatically reload imported modules when their source files change
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## General instructions

1. Install the required packages from `requirements.txt` (for example, `pip install -r requirements.txt`).

### Define the data we are going to use

In [11]:
# Edit DATA_DIR to match your local setup
import json
import pathlib

DATA_DIR = pathlib.Path('/Users/edwardatkins/Downloads/TakeHomeTaskData')
data_path = DATA_DIR / 'my-first-million_how-to-build-a-community.json'
wave_file_loc = DATA_DIR / 'my-first-million_how-to-build-a-community.wav'
data = json.loads(data_path.read_text())  # transcript data
audio_data = wave_file_loc.read_bytes()  # raw WAV bytes; the file handle is closed for us

## ACCESS 1: Python Package

In [14]:
from clipper import Clipper

In [24]:
# Initialize the object with the data
clipper = Clipper(audio_data, data, save_loc='/tmp/clip.wav')

In [25]:
# Run the process that identifies the best clip
result = clipper.run(); result

Calculating embeddings for entire text list
Calculating average embedding for entire document
Finding best sliding window
be like an investment you're basic betting . Today that I want to . I want to hold a block of Sam's kind of like advice or coaching , or you know just hang out time , because I think that's going to go up in value over time yeah . So anyway , it interests me - and I agree - I think , we're aligned there . Can I tell you about one more interesting thing that is related to rich people . I yeah so have you ever heard of this company called the Wellfx ? No , what is it okay , so every year wealthx is at Waltex is a company that is a database company and what they do is they basically use publicly available records , so property records and and sometimes you can get people's tax returns and things like that ,
5529 5679
Find better start time
Max gap:  1.3600000000001273
Max gap index:  5506
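
The log above suggests three stages: embed each piece of text, compare sliding windows against the document's average embedding, then move the window start back to a long pause (the start shifts from 5529 to 5506, the reported max gap index). The sketch below illustrates that idea in plain Python. It is not the `Clipper` implementation: the function names, the cosine-similarity scoring, and the `lookback` parameter are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_window(embeddings: Sequence[Vector], size: int) -> Tuple[int, int]:
    """Return (start, end) of the window whose mean embedding is most
    similar to the mean embedding of the whole document (assumed scoring)."""
    doc = mean_vector(embeddings)
    best_start, best_score = 0, -2.0  # cosine is always >= -1
    for start in range(len(embeddings) - size + 1):
        score = cosine(mean_vector(embeddings[start:start + size]), doc)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_start + size


def snap_start_to_pause(
    start_times: Sequence[float],
    end_times: Sequence[float],
    start: int,
    lookback: int,
) -> Tuple[int, float]:
    """Move `start` back to the token that follows the longest silence
    within `lookback` tokens (hypothetical parameter). Returns (index, gap)."""
    best_idx, best_gap = start, 0.0
    for i in range(max(1, start - lookback), start + 1):
        gap = start_times[i] - end_times[i - 1]  # pause before token i
        if gap > best_gap:
            best_idx, best_gap = i, gap
    return best_idx, best_gap
```

Starting a clip right after a long pause tends to avoid cutting in mid-sentence, which is presumably why the start token is adjusted after the window is chosen.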


In [26]:
# Cut the audio file
clipper.cut_audio(result)

In [27]:
# Play the audio in the notebook!
import IPython.display as ipd
ipd.Audio('/tmp/clip.wav') # load a local WAV file
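
For readers curious what cutting a clip involves, here is a minimal sketch using only the standard-library `wave` module. It assumes the clip boundaries are already known in seconds; `cut_wav` and its parameters are hypothetical names, and this is not necessarily how `Clipper.cut_audio` works.

```python
import wave


def cut_wav(src_path: str, dst_path: str, start_s: float, end_s: float) -> int:
    """Copy the audio between start_s and end_s (seconds) into a new WAV file.

    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame indices and clamp them to the file length.
        first = min(max(0, int(start_s * rate)), total)
        last = min(max(first, int(end_s * rate)), total)
        src.setpos(first)
        frames = src.readframes(last - first)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width and frame rate from the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return last - first
```

A call such as `cut_wav(str(wave_file_loc), '/tmp/clip.wav', 10.0, 40.0)` would write a 30-second clip, assuming the source file is at least 40 seconds long.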

## ACCESS 2: Using FastAPI
1. Start the FastAPI server using `uvicorn main:app --reload`

In [28]:
# Send a POST request to http://127.0.0.1:8000/clip with the transcript data

import requests
import json

with open(data_path) as f:
    data = json.load(f)

r = requests.post('http://127.0.0.1:8000/clip', json=data)
result = r.json(); result

{'_id': 'd5a2aba6-c2f1-4d96-9106-71b415886f20'}

In [29]:
# Use the returned ID to upload the audio file, if desired
url = f'http://127.0.0.1:8000/clip/upload_audio/{result["_id"]}'
with open(wave_file_loc, 'rb') as audio_file:  # close the handle once the upload finishes
    r = requests.post(url, files={'file': audio_file})
r.json()

{'_id': 'd5a2aba6-c2f1-4d96-9106-71b415886f20'}

In [35]:
# Get the clip result
r = requests.get(f'http://127.0.0.1:8000/clip/get_text/{result["_id"]}')
r.json()

{'text': "so I think that's the other thing you could offer a value is basically time with people of value and that time could be like an investment you're basic betting . Today that I want to . I want to hold a block of Sam's kind of like advice or coaching , or you know just hang out time , because I think that's going to go up in value over time yeah . So anyway , it interests me - and I agree - I think , we're aligned there . Can I tell you about one more interesting thing that is related to rich people . I yeah so have you ever heard of this company called the Wellfx ? No , what is it okay , so every year wealthx is at Waltex is a company that is a database company and what they do is they basically use publicly available records , so property records and and sometimes you can get people's tax returns and things like that ,",
 'window_start_token': 5506,
 'window_end_token': 5679}

In [36]:
# Get the location of the audio clip. This currently assumes a local file system;
# ideally the clip would be uploaded to an accessible S3 bucket or similar.

r = requests.get(f'http://127.0.0.1:8000/clip/get_audio/{result["_id"]}')
result = r.json(); result

{'audio_loc': '/tmp/d5a2aba6-c2f1-4d96-9106-71b415886f20_clip.wav'}

In [37]:
import IPython.display as ipd
ipd.Audio(result['audio_loc']) # load a local WAV file