<a href="https://colab.research.google.com/github/MK316/OpenAI/blob/main/SR_Whisper_tutorialoriginal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OpenAI Whisper

* [Video tutorial](https://www.youtube.com/watch?v=Wc4bQxuypo0&t=210s)
* [Online blog](https://openai.com/blog/whisper/)

1. Set the runtime to GPU (Colab menu: Runtime > Change runtime type)
2. Install the OpenAI Whisper Python package

In [None]:
!pip install git+https://github.com/openai/whisper.git -q

In [None]:
# Checking which GPU is being used in the current runtime
!nvidia-smi -L

Since we'll be working with the audio track of a YouTube video, we also install and import pytube.

In [None]:
!pip install pytube -q

In [None]:
import whisper
from pytube import YouTube

"Whisper has a variety of models of varying sizes. The large model will be more accurate but will be more resource intensive."

|Size|Parameters|English-only model|Multilingual model|
|--|--|--|--|
|tiny|39M|V|V|
|base|74M|V|V|
|small|244M|V|V|
|medium|769M|V|V|
|large|1550M|N/A|V|

The base model will be sufficient for our needs.

In [None]:
model = whisper.load_model('base')
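
The table above can be turned into a small lookup for the name passed to `whisper.load_model()`. This is only a sketch: `whisper_model_name` is a hypothetical helper, not part of the Whisper package. It relies on the convention that English-only checkpoints carry a `.en` suffix (for example `base.en`) and that `large` is published only as a multilingual model.

```python
# Sizes from the table above; "large" has no English-only checkpoint.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}
ALL_SIZES = ENGLISH_ONLY_SIZES | {"large"}


def whisper_model_name(size, english_only=False):
    """Return the checkpoint name to pass to whisper.load_model().

    Hypothetical helper for illustration; not part of the whisper package.
    """
    if size not in ALL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only:
        if size not in ENGLISH_ONLY_SIZES:
            raise ValueError(f"no English-only checkpoint for {size!r}")
        return f"{size}.en"
    return size


print(whisper_model_name("base"))                     # base
print(whisper_model_name("base", english_only=True))  # base.en
```

For English-only audio such as the video used here, `whisper.load_model(whisper_model_name("base", english_only=True))` would be an alternative to the multilingual `base` model.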

In [None]:
youtube_video_url = "https://www.youtube.com/watch?v=NT2H9iyd-ms"
youtube_video = YouTube(youtube_video_url)

In [None]:
youtube_video.title

In [None]:
# List the attributes and methods available on the YouTube object
dir(youtube_video)

In [None]:
youtube_video.streams

We'll be using the audio channel. Thus, we'll filter down to audio streams only. 

In [None]:
for stream in youtube_video.streams:
  print(stream)

In [None]:
# Filtering
streams = youtube_video.streams.filter(only_audio=True)
streams

In [None]:
stream = streams.first()
stream

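How do we know the audio file name? We choose it ourselves with the `filename` argument of `download()`. If the argument is omitted, pytube falls back to the stream's default file name (`stream.default_filename`), which is derived from the video title.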

In [None]:
stream.download(filename = 'fed_meeting.mp4')

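Cleaning step: trim the recording to the single-speaker portion.

In the command below, `-ss 378` skips the first 378 seconds of the file and `-t 2715` keeps the following 2715 seconds. Note that `-t` is a duration, not an end time.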

In [None]:
!ffmpeg -ss 378 -i fed_meeting.mp4 -t 2715 fed_meeting_trimmed.mp4
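
To check which part of the original video the trimmed file covers, the start offset and duration can be converted to clock times. This is a minimal sketch using only the standard library; `trim_window` is a hypothetical helper name, and the numbers are the ones passed to `ffmpeg` above.

```python
import datetime


def trim_window(start_s, duration_s):
    """Return (start, end) of the kept clip as H:MM:SS strings.

    start_s    -- seconds skipped at the beginning (ffmpeg -ss)
    duration_s -- seconds kept after the start (ffmpeg -t)
    """
    start = datetime.timedelta(seconds=start_s)
    end = datetime.timedelta(seconds=start_s + duration_s)
    return str(start), str(end)


start, end = trim_window(378, 2715)
print(start, end)  # 0:06:18 0:51:33
```

So the trimmed file runs from 6 minutes 18 seconds to 51 minutes 33 seconds of the original recording.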

In [None]:
import datetime

# save a timestamp before transcription
t1 = datetime.datetime.now()
print(f"started at {t1}")

# do the transcription
output = model.transcribe("fed_meeting_trimmed.mp4")

# show time elapsed after transcription is complete.
t2 = datetime.datetime.now()
print(f"ended at {t2}")
print(f"time elapsed: {t2 - t1}")

In [None]:
# The full output dict is large (text, segments, language),
# so show only the transcript text.
output['text']

In [None]:
for segment in output['segments']:
  print(segment)
  # Round the segment's start time down to the nearest 5-second mark
  second = int(segment['start'])
  second = second - (second % 5)
  print(second)

## Combining Speech Data with Price Data

Now that we have the transcript and its timestamps, we can go further and merge the segments into a dataframe containing price data. Let's see how the speech maps to the price of the S&P 500. I have retrieved 5-second OHLCV data for SPY using Interactive Brokers. A copy of this data and the code used to retrieve it are located on the website. We can upload spy.csv to Colab and process it using pandas.

Download the data (spy.csv): [file link](https://gist.githubusercontent.com/hackingthemarkets/c6ca7834d2af4932e3ab0d847679c14e/raw/b28fde61c41465565042d75fb2438adc9684d77b/spy.csv)

In [None]:
import pandas as pd
spy = pd.read_csv("spy.csv")

In [None]:
spy

In [None]:
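# Attach each segment's text to the 5-second price bar in which it starts.
# Row i of spy is assumed to cover seconds 5*i to 5*i + 5 of the trimmed audio.
for segment in output['segments']:
  second = int(segment['start'])
  row = second // 5  # integer row label (second / 5 would be a float)
  spy.loc[row, 'text'] = segment['text']

spy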
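
One caveat of the loop above: when two segments start inside the same 5-second bar, the later assignment overwrites the earlier one, so only the last segment's text survives. A possible way to keep everything is to collect the text per bar first and join it. The sketch below uses plain Python and made-up segments shaped like the entries of `output['segments']`. `group_text_by_bar` is a hypothetical helper name.

```python
from collections import defaultdict

BAR_SECONDS = 5  # width of one price bar in the SPY data


def group_text_by_bar(segments, bar_seconds=BAR_SECONDS):
    """Map bar row number -> all segment text that starts within that bar."""
    texts = defaultdict(list)
    for segment in segments:
        row = int(segment["start"]) // bar_seconds
        texts[row].append(segment["text"].strip())
    return {row: " ".join(parts) for row, parts in texts.items()}


# Made-up segments for illustration only (not taken from the transcript).
sample_segments = [
    {"start": 0.0, "text": " Good afternoon."},
    {"start": 3.2, "text": " My colleagues and I"},
    {"start": 6.8, "text": " are strongly committed."},
]
print(group_text_by_bar(sample_segments))
# {0: 'Good afternoon. My colleagues and I', 1: 'are strongly committed.'}
```

The resulting dictionary could then be written into the dataframe one row at a time, in the same way the loop above assigns `segment['text']`.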

In [None]:
spy['percent'] = ((spy['close'] - spy['open']) / spy['open']) * 100

In [None]:
big_downmoves = spy[spy.percent < -0.2]
big_downmoves
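
As a worked example of the percent calculation and the -0.2 threshold used above, here is a small plain-Python sketch. The prices are invented for illustration and are not taken from spy.csv. `percent_move` and `big_down_rows` are hypothetical helper names.

```python
def percent_move(open_price, close_price):
    """Percent change from open to close within one bar."""
    return (close_price - open_price) / open_price * 100


def big_down_rows(bars, threshold=-0.2):
    """Return the positions of bars whose move is below the threshold (in percent)."""
    return [
        i for i, bar in enumerate(bars)
        if percent_move(bar["open"], bar["close"]) < threshold
    ]


# Invented bars: the first barely moves, the second falls by about 0.23 percent.
bars = [
    {"open": 385.00, "close": 385.10},
    {"open": 385.10, "close": 384.20},
]
print(big_down_rows(bars))  # [1]
```

A move of -0.2 percent within a single 5-second bar is large for SPY, which is why the filter isolates only a handful of rows.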

Visualize the price action from 14:36 to 14:39 using mplfinance:

In [None]:
!pip install mplfinance -q
import mplfinance as mpf

df = spy
df.index = pd.DatetimeIndex(df['date'])

mpf.plot(df['2022-11-02 14:36':'2022-11-02 14:39'],type='candle')

In [None]:
spy[50:70]

What does this mean? Why are we doing this?

=> The code has done what a human does when interpreting speech while listening: the AI extracted meaningful information from the audio signal, and we could line it up with the price moves. Isn't it cool?